Using 2% of RLHF's computing power to eliminate harmful LLM output: ByteDance releases an unlearning technique

As large language models (LLMs) develop, practitioners face growing challenges. How can harmful replies from an LLM be avoided? How can copyright-protected content be quickly deleted from the training data? How can LLM hallucinations (fabricated facts) be reduced? How can an LLM be quickly iterated after a data policy changes? These issues are critical to the safe and trustworthy deployment of LLMs as legal and ethical compliance requirements for artificial intelligence mature.

The current mainstream solution in the industry is alignment via reinforcement learning: fine-tuning the LLM on comparison data (positive and negative samples) so that its output is consistent with human expectations and values. However, this alignment process is often limited by data collection and computing resources.

ByteDance has proposed a method for aligning LLMs through unlearning. This article studies how to perform a "forgetting" operation on an LLM, that is, machine unlearning of harmful behaviors. The authors demonstrate the clear effect of unlearning in three LLM alignment scenarios: (1) removing harmful output; (2) removing copyright-protected content; (3) eliminating LLM hallucinations.

Unlearning has three advantages: (1) it needs only negative samples (harmful samples), which are much easier to collect (for example, via red-team testing or user reports) than the positive samples RLHF requires (high-quality, human-written output); (2) its computational cost is low; (3) it is particularly effective when it is known which training samples cause the LLM's harmful behavior.

The authors argue that practitioners with limited resources should prioritize stopping harmful outputs rather than pursuing overly idealized outputs, and that unlearning is a convenient way to do so. Despite having only negative samples, the research shows that unlearning can still achieve better alignment performance than RLHF while using only 2% of its computation time.

  • Paper address: https://arxiv.org/abs/2310.10683
  • Code address: https://github.com/kevinyaobytedance/llm_unlearn

Usage scenarios

With limited resources, this approach maximizes the return. When there is no budget to hire people to write high-quality samples, or computing resources are insufficient, the priority should be stopping the LLM from producing harmful output rather than trying to make it produce beneficial output.

The damage caused by harmful output cannot be compensated for by beneficial output. If a user asks an LLM 100 questions and gets a harmful answer, he will lose trust, no matter how many helpful answers the LLM provides later. The expected output for harmful questions may be spaces, special characters, or meaningless strings; in short, it must be harmless text.

The paper shows three successful cases of LLM unlearning: (1) stopping the generation of harmful replies (similar to the RLHF scenario, except that the goal of this method is a harmless reply rather than a helpful one; this is the best that can be expected when only negative samples are available); (2) after being trained on infringing data, the LLM successfully deletes that data, in a setting where retraining the LLM from scratch is ruled out by cost; (3) the LLM successfully forgets "hallucinations".


Method

At fine-tuning step t, the LLM is updated as follows:

θ_{t+1} = θ_t − ε1·∇_θ L_fgt − ε2·∇_θ L_rdn − ε3·∇_θ L_nor

where L_fgt, L_rdn, and L_nor are the forgetting, random-mismatch, and normal-performance losses described below, and ε1, ε2, ε3 are their weights.
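The update above can be sketched in code. This is a minimal illustration of the weighted three-gradient step, not the paper's implementation; the weights and learning rate are hypothetical placeholders.

```python
def unlearning_step(theta, g_fgt, g_rdn, g_nor,
                    eps=(1.0, 1.0, 1.0), lr=0.1):
    """One parameter update: theta <- theta - lr * (e1*g_fgt + e2*g_rdn + e3*g_nor).

    g_fgt, g_rdn, g_nor are the gradients of the forgetting, random-mismatch,
    and normal-performance losses; eps holds their (hypothetical) weights.
    """
    e1, e2, e3 = eps
    return [t - lr * (e1 * gf + e2 * gr + e3 * gn)
            for t, gf, gr, gn in zip(theta, g_fgt, g_rdn, g_nor)]

# Toy 2-parameter example: all three gradients are 1 at every coordinate,
# so each parameter moves by -lr * (1 + 1 + 1).
theta_next = unlearning_step([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
print(theta_next)  # each entry is approximately -0.3
```

In a real training loop the three gradients would come from automatic differentiation of the three losses; here they are passed in directly to keep the step itself visible.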

The first loss applies gradient ascent, with the purpose of forgetting the harmful samples:

L_fgt = − Σ_{(x^fgt, y^fgt) ∈ D_fgt} L(h_θ(x^fgt), y^fgt)

Here x^fgt is a harmful prompt and y^fgt is the corresponding harmful reply, drawn from the forget set D_fgt. The negative sign turns minimization into gradient ascent on the harmful samples' loss, which makes the LLM "forget" them.
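A minimal sketch of this forgetting loss: the token-level negative log-likelihood of the harmful reply, negated so that minimizing the loss pushes the harmful reply's probability down. This is a pure-Python toy, with a hypothetical per-position probability table standing in for the LLM.

```python
import math

def token_nll(probs, target_ids):
    """Average negative log-likelihood of the target tokens, where probs[i]
    is the model's distribution over the vocabulary at position i."""
    return -sum(math.log(probs[i][t]) for i, t in enumerate(target_ids)) / len(target_ids)

def forgetting_loss(probs, harmful_reply_ids):
    """Negated NLL of the harmful reply: minimizing this loss *increases*
    the loss on the harmful sample (gradient ascent)."""
    return -token_nll(probs, harmful_reply_ids)

# Toy 4-token vocabulary; the model currently assigns the harmful reply's
# tokens probability 0.7 at each position.
probs = [[0.7, 0.1, 0.1, 0.1],
         [0.1, 0.7, 0.1, 0.1],
         [0.1, 0.1, 0.7, 0.1]]
harmful = [0, 1, 2]
print(forgetting_loss(probs, harmful))  # negative; ascent drives it lower still
```

The more confidently the model produces the harmful reply, the lower (more negative) this loss is, so the optimizer is rewarded for reducing that confidence.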

The second loss is a random-mismatch loss, which requires the LLM to predict irrelevant replies given harmful prompts. This is similar to label smoothing [2] in classification. Its purpose is to make the LLM forget harmful output on harmful prompts more thoroughly; experiments also show that it improves the LLM's output performance on normal prompts.

L_rdn = Σ_{(x^fgt, ·) ∈ D_fgt} (1/|Y_rdn|) Σ_{y^rdn ∈ Y_rdn} L(h_θ(x^fgt), y^rdn)

where Y_rdn is a set of randomly sampled, unrelated replies.
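The random-mismatch idea can be sketched as an ordinary NLL whose targets are randomly chosen unrelated replies rather than the harmful one. This is an illustrative toy with hypothetical names, not the paper's code.

```python
import math

def token_nll(probs, target_ids):
    """Average NLL of target tokens given per-position distributions."""
    return -sum(math.log(probs[i][t]) for i, t in enumerate(target_ids)) / len(target_ids)

def mismatch_loss(probs_on_harmful_prompt, random_replies):
    """Average NLL over a set of random, unrelated replies: minimizing it
    nudges the model toward irrelevant output on harmful prompts,
    similar in spirit to label smoothing."""
    return sum(token_nll(probs_on_harmful_prompt, r)
               for r in random_replies) / len(random_replies)

# Toy: uniform distributions over a 4-token vocabulary on a harmful prompt,
# and two randomly sampled 2-token replies.
probs = [[0.25, 0.25, 0.25, 0.25],
         [0.25, 0.25, 0.25, 0.25]]
random_replies = [[3, 2], [1, 0]]
print(mismatch_loss(probs, random_replies))  # ln(4) for uniform distributions
```

Unlike the first loss, this term is minimized normally (gradient descent), pulling probability mass toward harmless, unrelated text on harmful prompts.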

The third loss maintains performance on normal tasks:

L_nor = Σ_{(x^nor, y^nor) ∈ D_nor} KL( h_{θ0}(x^nor) ‖ h_θ(x^nor) )

where D_nor is the normal data and θ0 denotes the frozen pre-trained LLM.

As in RLHF, computing the KL divergence against the pre-trained LLM helps maintain the LLM's performance.
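The retention term can be sketched as a per-position KL divergence between the frozen pre-trained model's distributions and the current model's. This pure-Python version is for illustration; a real implementation would use the framework's KL operation over logits.

```python
import math

def kl_div(p, q):
    """KL(p || q) for one distribution over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def retention_loss(pretrained_probs, current_probs):
    """Average KL(h_theta0 || h_theta) over sequence positions: small when
    the updated model stays close to the pre-trained one on normal data."""
    return sum(kl_div(p, q)
               for p, q in zip(pretrained_probs, current_probs)) / len(pretrained_probs)

# If the current model matches the pre-trained model exactly, the loss is 0.
p0 = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
print(retention_loss(p0, p0))  # 0.0
```

Any drift away from the pre-trained distributions on normal data makes this loss positive, penalizing collateral damage from the forgetting updates.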

Additionally, all gradient ascent and descent is performed only on the output part (y), not on the prompt-output pair (x, y) as in RLHF.
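Restricting the loss to the reply tokens can be done with a simple mask over the concatenated prompt-plus-reply sequence. This is an illustrative sketch; real implementations typically mask token positions inside the framework's loss function.

```python
import math

def masked_nll(probs, token_ids, loss_mask):
    """NLL averaged only over positions where loss_mask is 1 (the reply y);
    prompt positions (x) contribute nothing to the loss."""
    total = sum(-math.log(probs[i][t]) * m
                for i, (t, m) in enumerate(zip(token_ids, loss_mask)))
    return total / sum(loss_mask)

# 4-position sequence: the first two tokens are the prompt (masked out),
# the last two are the reply (the loss is computed only there).
probs = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
tokens = [0, 0, 1, 1]
mask = [0, 0, 1, 1]
print(masked_nll(probs, tokens, mask))  # ln(2): prompt positions are ignored
```

Changing the prompt-position probabilities would not change this value, since only the reply positions carry loss.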

Application scenarios: forgetting harmful content, etc.

This article uses the PKU-SafeRLHF data as the forget data and TruthfulQA as the normal data. Figure 2 shows the harmful rate of the LLM's output on the forgotten harmful prompts after unlearning. The methods used are GA (gradient ascent) and GA+Mismatch (gradient ascent with random mismatch). The harmful rate after unlearning is close to zero.

Figure 2

Figure 3 shows the output on harmful prompts that were not forgotten, i.e., not seen during unlearning. Even on these held-out harmful prompts, the LLM's harmful rate is close to zero, which shows that the LLM forgets not just the specific samples but generalizes to content containing the same harmful concepts.


Figure 3

The performance of the LLM on normal samples remains similar to that before unlearning.

Table 1 shows the generated samples. Under harmful prompts, the samples generated by the LLM are meaningless strings, i.e., harmless output.


Table 1

The original paper describes in detail how the method applies in other scenarios, such as forgetting infringing content and forgetting hallucinations.

Comparison with RLHF

Table 2 shows the comparison between this method and RLHF. RLHF uses positive examples, while the unlearning method uses only negative examples, so the method is at a disadvantage from the start. Even so, unlearning still achieves alignment performance similar to RLHF.

Table 2

Figure 4 shows the comparison of computation time: this method requires only 2% of the computation time of RLHF.

Figure 4

Even with only negative samples, unlearning can achieve a harmless rate comparable to RLHF while using only 2% of the computing power. Therefore, if the goal is to stop outputting harmful content, unlearning is more efficient than RLHF.

Conclusion

This study is among the first to explore unlearning on LLMs. The findings show that unlearning is a promising approach to alignment, especially when practitioners lack resources. The paper shows three situations in which unlearning succeeds: deleting harmful replies, deleting infringing content, and eliminating hallucinations. The research shows that even with only negative samples, unlearning can still achieve alignment effects similar to RLHF using only 2% of RLHF's computation time.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn for deletion.
