


ByteDance releases an unlearning technique that eliminates harmful LLM output with 2% of the compute of RLHF
With the development of large language models (LLMs), practitioners face a growing set of challenges. How can harmful replies from an LLM be avoided? How can copyright-protected content be quickly removed from the training data? How can LLM hallucinations (fabricated facts) be reduced? How can an LLM be iterated quickly after a data policy changes? These issues are critical to the safe and trustworthy deployment of LLMs as legal and ethical compliance requirements for artificial intelligence continue to mature.
The current mainstream solution in the industry is LLM alignment: fine-tuning with reinforcement learning from human feedback (RLHF) on comparison data (positive and negative samples) so that the LLM's output is consistent with human expectations and values. However, this alignment process is often limited by data collection and computing resources.
ByteDance proposes using unlearning to align LLMs. This article studies how to perform a "forgetting" operation on an LLM, that is, forgetting harmful behaviors, also known as machine unlearning. The authors demonstrate clear effects of unlearning in three LLM alignment scenarios: (1) removing harmful output; (2) removing copyright-protected content; (3) eliminating LLM hallucinations.
Unlearning has three advantages: (1) only negative samples (harmful samples) are needed, and they are much easier to collect (for example, through red-teaming or user reports) than the positive samples (high-quality, manually written outputs) required by RLHF; (2) the computational cost is low; (3) unlearning is particularly effective when it is known which training samples cause the LLM's harmful behavior.
The authors argue that practitioners with limited resources should prioritize stopping the model from producing harmful outputs rather than pursuing overly idealized outputs, and that unlearning is a convenient method for this. Despite having only negative samples, the research shows that unlearning can still achieve better alignment performance than RLHF while using only 2% of its computation time.
- Paper address: https://arxiv.org/abs/2310.10683
- Code address: https://github.com/kevinyaobytedance/llm_unlearn
Usage scenarios
With limited resources, this approach lets practitioners maximize their advantages. When there is no budget to hire people to write high-quality samples, or the computing resources are insufficient, the priority should be to stop the LLM from producing harmful output rather than trying to make it produce beneficial output.
The damage caused by harmful output cannot be compensated for by beneficial output. If a user asks an LLM 100 questions and gets a harmful answer, they will lose trust, no matter how many helpful answers the LLM provides later. The expected output for harmful prompts can be spaces, special characters, meaningless strings, and so on; in short, it must be harmless text.
The paper shows three successful cases of LLM unlearning: (1) stopping the generation of harmful replies; this is similar to the RLHF scenario, except that the goal here is to generate harmless replies rather than helpful replies, which is the best that can be expected when only negative samples are available. (2) After being trained on infringing data, the LLM successfully deletes it, in a setting where retraining the LLM from scratch is not feasible for cost reasons. (3) The LLM successfully forgets "hallucinations".
Method
At fine-tuning step t, the LLM is updated as follows:
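A sketch of the update, written in terms of the three loss terms described below: gradient ascent on the forgetting loss L_fgt, and gradient descent on the mismatch loss L_rdn and the normal-performance loss L_nor. The weights ε₁, ε₂, ε₃ are assumed positive hyperparameters, and the notation is a reconstruction rather than the paper's exact formula:

$$\theta_{t+1} = \theta_t + \epsilon_1 \nabla_{\theta_t} L_{\text{fgt}} - \epsilon_2 \nabla_{\theta_t} L_{\text{rdn}} - \epsilon_3 \nabla_{\theta_t} L_{\text{nor}}$$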
The first loss is gradient ascent, and its purpose is to forget the harmful samples: the loss is computed on a harmful prompt and its corresponding harmful reply, and the usual descent direction is reversed so that the loss on harmful samples increases, which makes the LLM "forget" them.
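As a hedged sketch in symbols (the notation is assumed): writing $D_{\text{fgt}}$ for the set of harmful prompt–reply pairs $(x^{\text{fgt}}, y^{\text{fgt}})$ and $L(x, y; \theta_t)$ for the standard language-modeling loss, the forgetting term is

$$L_{\text{fgt}} = \sum_{(x^{\text{fgt}},\, y^{\text{fgt}}) \in D_{\text{fgt}}} L\left(x^{\text{fgt}}, y^{\text{fgt}}; \theta_t\right),$$

and it is ascended rather than descended in the update above.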
The second loss is a random mismatch loss, which requires the LLM to predict irrelevant replies when given harmful prompts. This is similar to label smoothing [2] in classification. Its purpose is to help the LLM forget harmful output on harmful prompts more thoroughly, and experiments also show that it improves the LLM's output on normal prompts.
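A hedged sketch of this term (notation assumed): pairing each harmful prompt $x^{\text{fgt}}$ with a reply $y^{\text{rdn}}$ drawn at random from an unrelated corpus,

$$L_{\text{rdn}} = \sum_{x^{\text{fgt}} \in D_{\text{fgt}}} L\left(x^{\text{fgt}}, y^{\text{rdn}}; \theta_t\right),$$

which is minimized (descended) in the update.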
The third loss maintains performance on normal tasks: as in RLHF, a KL-divergence term computed against the pre-trained LLM on normal data helps preserve the LLM's original performance.
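A hedged sketch of this term (notation and KL direction assumed): with $\theta_0$ the frozen pre-trained model and $D_{\text{nor}}$ the normal data,

$$L_{\text{nor}} = \sum_{(x^{\text{nor}},\, y^{\text{nor}}) \in D_{\text{nor}}} \mathrm{KL}\left( p_{\theta_0}\left(\cdot \mid x^{\text{nor}}\right) \,\Big\|\, p_{\theta_t}\left(\cdot \mid x^{\text{nor}}\right) \right),$$

which is also minimized in the update.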
Additionally, all gradient ascent and descent is performed only on the output part (y), not on the prompt-output pair (x, y) as in RLHF.
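To make the procedure concrete, here is a minimal PyTorch-style sketch of one unlearning update step. It assumes Hugging Face-style causal LMs (a forward pass that accepts `input_ids` and `labels` and returns a `.loss`); the batch keys, helper names, and loss weights are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def lm_loss(model, prompt_ids, response_ids):
    """Cross-entropy computed only on the response (y) tokens; the prompt is masked out."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[-1]] = -100  # -100 = ignored by the LM loss
    return model(input_ids=input_ids, labels=labels).loss

def unlearning_step(model, pretrained_model, optimizer, batch, eps=(0.5, 1.0, 1.0)):
    """One update combining the forgetting, random-mismatch, and normal-performance terms."""
    e1, e2, e3 = eps  # assumed loss weights (hyperparameters)

    # (1) Forgetting term: gradient ascent on harmful (prompt, reply) pairs,
    #     implemented here by negating the loss and descending.
    l_fgt = -lm_loss(model, batch["harmful_prompt"], batch["harmful_reply"])

    # (2) Random-mismatch term: predict an irrelevant reply given the harmful prompt.
    l_rdn = lm_loss(model, batch["harmful_prompt"], batch["random_reply"])

    # (3) Normal-performance term: KL divergence to the frozen pre-trained model
    #     on normal data, to preserve the model's original behaviour.
    normal_ids = torch.cat([batch["normal_prompt"], batch["normal_reply"]], dim=-1)
    with torch.no_grad():
        ref_logits = pretrained_model(input_ids=normal_ids).logits
    cur_logits = model(input_ids=normal_ids).logits
    l_nor = F.kl_div(
        F.log_softmax(cur_logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    loss = e1 * l_fgt + e2 * l_rdn + e3 * l_nor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```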
Application scenarios: forgetting harmful content, etc.
This article uses the PKU-SafeRLHF data as the forget data and TruthfulQA as the normal data. Figure 2 shows the harmful rate of the LLM's output on the forgotten harmful prompts after unlearning. The methods used are GA (Gradient Ascent) and GA+Mismatch (Gradient Ascent + Random Mismatch). The harmful rate after unlearning is close to zero.
Figure 2
Figure 3 shows the output on harmful prompts that were not forgotten, i.e., prompts not seen during unlearning. Even for these harmful prompts, the LLM's harmful rate is close to zero, which shows that the LLM forgets not only the specific samples but generalizes to content containing the same harmful concepts.
Figure 3
The performance of the LLM on normal samples remains similar to that before unlearning.
Table 1 shows generated samples. Under harmful prompts, the samples generated by the unlearned LLM are meaningless strings, i.e., harmless output.
Table 1
Other scenarios, such as forgetting infringing content and forgetting hallucinations, are described in detail in the original paper.
RLHF comparison
Table 2 shows the comparison between this method and RLHF. RLHF uses positive examples, while the unlearning method uses only negative examples, so the method starts at a disadvantage. Even so, unlearning still achieves alignment performance similar to RLHF.
Table 2
Figure 4 compares computation time: this method requires only 2% of the computation time of RLHF.
Figure 4
Even with only negative samples, unlearning achieves a harmless rate comparable to RLHF while using only 2% of the compute. Therefore, if the goal is to stop the model from outputting harmful content, unlearning is more efficient than RLHF.
Conclusion
This study is one of the first to explore unlearning on LLMs. The findings show that unlearning is a promising approach to alignment, especially when practitioners lack sufficient resources. The paper demonstrates three scenarios: unlearning can successfully remove harmful replies, delete infringing content, and eliminate hallucinations. The research shows that even with only negative samples, unlearning can still achieve alignment effects similar to RLHF while using only 2% of RLHF's computation time.