Tencent's research team conducted a study on the scalability of agents. They found that, with simple sampling and voting, the performance of large language models (LLMs) increases with the number of instantiated agents. The study verifies the generality of this phenomenon across various scenarios for the first time, compares it with other, more complex methods, explores the reasons behind it, and proposes methods to further strengthen the scaling effect.
Paper title: More Agents Is All You Need
Paper address: https://arxiv.org/abs/2402.05120
Code address: https://github.com/MoreAgentsIsAllYouNeed/More-Agents-Is-All-You-Need
In this article, researchers from Tencent report that, with a simple sampling-and-voting method, the performance of large language models increases with the number of instantiated agents. This scaling property (scalability) appears without the support of complex multi-LLM-agent collaboration frameworks or prompt engineering methods. Furthermore, the method is orthogonal to existing sophisticated methods and, when combined with them, can further enhance LLMs to a degree related to task difficulty. The paper presents the first study of the scaling property of raw agents (LLM agents that do not rely on complex prompt engineering or collaboration frameworks). It conducts comprehensive experiments on various LLM benchmarks to verify the generality of this finding and examines strategies that can facilitate its occurrence. The code is now open source.

## Multiple small models exceed a large model

The paper discusses in detail a range of related work on LLM ensembles, including LLM self-ensembles, heterogeneous LLM ensembles, and multi-LLM-agent collaboration frameworks. Compared with these works, the paper conducts a more comprehensive study and analysis.
To study how the performance of large language models improves as the number of instantiated agents increases, the paper uses a simple sampling-and-voting method (the authors write "simple(st)", which suggests they consider it possibly one of the simplest methods available). Notably, this method can be combined orthogonally with existing complex methods. It is divided into two stages:
- Input the task query into a single LLM or a multi-LLM-agent collaboration framework to generate multiple outputs;
- Determine the final result by majority voting.
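The two stages above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' released code: `generate` is a hypothetical callable standing in for one sampled LLM call, and exact string matching is assumed as the voting criterion, which only suits tasks with short, closed-form answers.

```python
from collections import Counter
from typing import Callable, List


def sample_and_vote(query: str,
                    generate: Callable[[str], str],
                    num_agents: int) -> str:
    """Sample `num_agents` outputs for `query` and return the most frequent one.

    `generate` is a hypothetical stand-in for a single sampled LLM call
    (an assumption for illustration, not an API from the paper).
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")

    # Stage 1: query the model (or agent framework) several times.
    outputs: List[str] = [generate(query).strip() for _ in range(num_agents)]

    # Stage 2: majority vote by exact match; ties go to the answer seen first.
    winner, _count = Counter(outputs).most_common(1)[0]
    return winner
```

In practice `generate` would wrap a model call with a non-zero sampling temperature, since identical deterministic outputs would make the vote meaningless.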
The paper evaluates language models of different scales from the Llama2 and GPT series on task datasets covering multiple domains such as reasoning and generation. The experimental results show that, across all tasks and across LLMs of different types and sizes, performance increases with the number of instantiated agents.
For example, the improvement is 12% to 24% on the GSM8K task and 6% to 10% on the MATH task. Interestingly, ensembles of multiple small LLMs can match or even exceed the performance of larger LLMs. For example, an ensemble of multiple Llama2-13Bs achieved 59% accuracy on GSM8K, exceeding the 54% accuracy of a single Llama2-70B.
Further, the authors also explored the method's compatibility with other methods. Although these methods are implemented differently, combining them with sampling-and-voting further improves performance, and the results are consistent with the observation that the more agents are instantiated, the larger the performance gain. The experimental results show gains ranging from 1% to 27%, indicating that this simple method can further enhance LLM performance when used orthogonally with other methods.

[Figure panels: results based on Llama2-13B, Llama2-70B, and GPT-3.5-Turbo]

In addition, the paper analyzes the relationship between performance improvement and problem difficulty.
- Intrinsic difficulty: As the inherent difficulty of the task increases, the performance improvement (i.e., the relative performance gain) also increases, but once the difficulty reaches a certain level, the gain gradually decreases. This suggests that when the task is too complex, the model's reasoning ability may not keep up, so the performance improvement shows diminishing marginal returns.
- Number of steps: As the number of steps required to solve a task increases, so does the performance gain. This suggests that in multi-step tasks, increasing the number of agents helps the model handle each step better, improving overall task-solving performance.
- Prior probability: The higher the prior probability of the correct answer, the greater the performance improvement. In other words, increasing the number of agents is more likely to yield a significant improvement when the correct answer is more likely to be sampled.
Figure legend: nodes represent steps and dashed lines represent possible alternative steps. Node depth indicates the number of steps, and color intensity indicates the level of inherent difficulty. The illustration helps the reader understand how task complexity is measured along these dimensions.
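One way to build intuition for the prior-probability observation is an idealized model that is not taken from the paper: assume only two answers are possible and each agent independently returns the correct one with probability `p`. Under these assumptions, the chance that a strict majority of `n` agents is correct follows from the binomial distribution. The function name and the independence assumption below are illustrative only.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Idealized assumptions (not from the paper): binary answers, each sample
    correct with probability p, samples independent, n odd so no ties occur.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")

    # Sum the binomial probabilities of having more than n/2 correct samples.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

With `p = 0.6`, the value rises from 0.6 for one agent to roughly 0.648 for three and roughly 0.683 for five, while with `p = 0.4` adding agents lowers it. Real LLM samples are neither binary nor independent, so this is only a rough picture of why a higher prior probability of the correct answer favors voting.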
Based on this, the paper proposes two optimization strategies to further improve the effectiveness of the method:
- Step-wise Sampling-and-Voting: This method breaks the task into multiple steps and applies sampling and voting at each step to reduce accumulated errors and improve overall performance.
- Hierarchical Sampling-and-Voting: This method decomposes low-probability tasks into multiple high-probability subtasks and solves them hierarchically. Different models can also be used to handle subtasks with different probabilities in order to reduce cost.

Finally, the paper proposes directions for future work, including optimizing the sampling stage to reduce cost and continuing to develop mechanisms that mitigate the potential negative impacts of LLM hallucinations, so that the deployment of these powerful models is both responsible and beneficial.
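The step-wise strategy described above could be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's implementation: `generate_step` is a hypothetical callable that proposes the next step given the task and the steps accepted so far, the number of steps is assumed to be known in advance, and candidate steps are compared by exact match.

```python
from collections import Counter
from typing import Callable, List


def stepwise_sample_and_vote(task: str,
                             generate_step: Callable[[str, List[str]], str],
                             num_steps: int,
                             num_agents: int) -> List[str]:
    """Build a solution step by step, voting among sampled candidates each time.

    `generate_step(task, accepted_steps)` is a hypothetical stand-in for one
    sampled LLM call that proposes the next step (illustrative assumption).
    """
    if num_steps < 1 or num_agents < 1:
        raise ValueError("num_steps and num_agents must be at least 1")

    accepted: List[str] = []
    for _ in range(num_steps):
        # Sample several candidates for the next step from the same prefix.
        candidates = [generate_step(task, list(accepted)).strip()
                      for _ in range(num_agents)]
        # Keep the most frequent candidate so later steps build on it.
        best, _count = Counter(candidates).most_common(1)[0]
        accepted.append(best)
    return accepted
```

Voting after every step means a rare wrong intermediate step is discarded before it can propagate, at the price of `num_steps * num_agents` model calls.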