GPT-4 coding ability improved by 21%! MIT's new method lets LLMs learn to reflect; netizens: it thinks the way humans do
This article is reprinted with the authorization of AI New Media Qubit (public account ID: QbitAI). Please contact the source for reprint permission.
GPT-4 evolves again!
With one simple method, large language models such as GPT-4 can learn to self-reflect, improving performance by as much as 30%.
Until now, when large language models gave wrong answers, they would typically just apologize and then keep making random guesses.
That changes with this new method: GPT-4 not only reflects on where it went wrong, but also proposes strategies for improvement.
For example, it will automatically analyze why it is "stuck in a loop":
Or reflect on its own flawed search strategy:
This is the method in the latest paper published by Northeastern University and MIT: Reflexion.
The method applies not only to GPT-4 but also to other large language models, letting them acquire the distinctly human ability to reflect.
The paper has been published on the preprint platform arXiv.
This prompted netizens to remark: "AI is evolving faster than we can adapt; we will be destroyed."
Some netizens even sent a "job warning" to developers:
Writing code this way costs less per hour than an ordinary developer.
As netizens noted, the reflection ability that Reflexion gives GPT-4 resembles the human thinking process, and it can be summed up in one word: feedback.
This feedback process can be divided into three major steps:
The first step of this process is the LLM's (large language model's) self-assessment.
That is, with no external feedback available, the LLM must first evaluate its own answer.
How to conduct self-reflection?
The research team used a binary reward mechanism to score the action the LLM takes in the current state:
1 means the generated result is good; 0 means it is not.
Binary rewards are used instead of richer mechanisms such as multi-valued or continuous outputs precisely because there is no external input.
Restricting the self-assessment to two states forces the LLM to make a meaningful judgment.
Once self-evaluation is complete, a binary reward of 1 leaves the reflection module inactive, while a reward of 0 switches the LLM into reflection mode.
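This binary gate can be sketched as follows. The evaluator here is a toy stand-in (in Reflexion the LLM scores its own output); the function names are illustrative, not the paper's:

```python
def self_evaluate(answer: str) -> int:
    """Binary self-assessment stub. In Reflexion, the LLM judges its own
    answer with no external feedback; here a toy heuristic stands in:
    an empty or "I don't know" answer scores 0, anything else scores 1."""
    if not answer.strip() or "i don't know" in answer.lower():
        return 0
    return 1

def maybe_reflect(answer: str) -> str:
    # Reward 1: accept the answer and skip the reflection module.
    # Reward 0: switch the model into reflection mode.
    if self_evaluate(answer) == 1:
        return "accepted"
    return "reflecting"

print(maybe_reflect("Paris is the capital of France."))  # accepted
print(maybe_reflect("I don't know"))                     # reflecting
```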
During reflection, the model triggers a heuristic function h, which, by analogy with human thinking, plays the role of a supervisor.
However, like human thinking, the LLM's reflection has its limits, captured by the Ω and ε in the function.
Ω is the threshold on consecutive repetitions of the same action, generally set to 3: if a step repeats three times during reflection, the process jumps directly to the next step.
And ε is the maximum number of operations the reflection process is allowed to perform.
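A minimal sketch of such a heuristic, under the two limits just described (parameter names `omega` and `epsilon` stand in for Ω and ε; the paper's h also inspects the trajectory itself, which is omitted here):

```python
def h(repeat_count: int, num_ops: int, omega: int = 3, epsilon: int = 30) -> str:
    """Toy supervisor heuristic for the reflection process.
    omega:   how many times the same action may repeat before moving on
             (the article sets this to 3).
    epsilon: the maximum number of operations allowed in one pass."""
    if num_ops > epsilon:
        return "halt"        # reflection budget exhausted
    if repeat_count >= omega:
        return "skip_step"   # stuck in a loop: jump to the next step
    return "continue"

print(h(repeat_count=3, num_ops=10))  # skip_step
print(h(repeat_count=1, num_ops=31))  # halt
print(h(repeat_count=1, num_ops=5))   # continue
```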
With supervision in place, correction must follow. The correction process works like this:
Here, the self-reflection model is trained on pairs of domain-specific failure trajectories and ideal reflections, and it is not given access to domain-specific solutions for a given problem in the dataset.
This way, the LLM can produce more "innovative" ideas while reflecting.
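Putting the stages together, the whole feedback loop might look like the sketch below. The solver, evaluator, and reflector are hypothetical stubs standing in for LLM calls, not the paper's actual models:

```python
def reflexion_loop(task, solve, evaluate, reflect, max_trials=4):
    """Sketch of the Reflexion feedback loop: attempt the task,
    self-evaluate with a binary reward, and on failure feed a verbal
    self-reflection back into the next attempt."""
    reflections = []                  # accumulated self-reflections (the "memory")
    for _ in range(max_trials):
        answer = solve(task, reflections)
        if evaluate(answer) == 1:     # binary reward: 1 = good enough, stop
            return answer
        reflections.append(reflect(task, answer))
    return answer                     # best effort after the trial budget

# Toy stubs: this solver only succeeds once it has seen a reflection.
solve = lambda task, refs: "correct" if refs else "wrong"
evaluate = lambda ans: 1 if ans == "correct" else 0
reflect = lambda task, ans: f"'{ans}' failed; try another approach"

print(reflexion_loop("demo task", solve, evaluate, reflect))  # correct
```

The key design point the article describes is that the reflection is verbal and carried forward: each failed trial adds text to the context of the next one, rather than updating any model weights.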
Since LLMs such as GPT-4 can perform self-reflection, what is the specific effect?
The research team evaluated this approach on the ALFWorld and HotpotQA benchmarks.
In a HotpotQA test of 100 question-answer pairs, the LLM using Reflexion showed a clear advantage: after multiple rounds of reflection and repeated questioning, its performance improved by nearly 30%.
Without Reflexion, repeated rounds of Q&A produced no change in performance.
In a 134-question HotpotQA test, Reflexion lifted the LLM's accuracy to 97% after multiple rounds of reflection.
In a separate blog post, team members also demonstrated the method's effect on GPT-4, this time on code-writing tasks.
The result is equally clear: with Reflexion, GPT-4's programming performance improved by 21%.
Now that even GPT-4 has learned to "think", are you panicking yet?
Paper address: https://arxiv.org/abs/2303.11366