Where will ChatGPT go from here? LeCun's new work: a comprehensive review of the next generation of "augmented language models"
ChatGPT has ignited the language model fire, and NLP practitioners are reflecting on and summarizing future research directions.
Recently, Turing Award winner Yann LeCun co-authored a survey on "augmented language models". It reviews work that combines language models with reasoning skills and the ability to use tools, and concludes that this new research direction has the potential to address limitations of traditional language models such as interpretability, consistency, and scalability.
Paper link: https://arxiv.org/abs/2302.07842
In an augmented language model, reasoning means decomposing a complex task into simpler subtasks, and tools are external modules the model can call (such as code interpreters or calculators). An LM can exploit these augmentations individually or in combination through heuristics, or learn to do so from demonstrations.
While still following the standard missing-token prediction objective, an augmented LM can use various external modules, possibly non-parametric ones, to extend its context-processing capabilities, and so departs from the pure language modeling paradigm. Such models are called Augmented Language Models (ALMs).
The missing-token prediction objective allows ALMs to learn to reason, use tools, and even act, while still being able to perform standard natural language tasks and even outperforming most regular LMs on several benchmark datasets.
Large language models (LLMs) have driven tremendous progress in natural language processing and have gradually become the technical core of products used by millions of users, including the coding assistant Copilot, the Google search engine, and the recently released ChatGPT.
Memorization combined with compositionality enables LLMs to perform a variety of tasks, such as language understanding or conditional and unconditional text generation, at unprecedented performance levels, opening up a practical path toward a wider range of human-computer interaction.
However, LLMs are still subject to many limitations that hinder their deployment in a wider range of application scenarios.
For example, LLMs often provide non-factual but seemingly plausible predictions, also known as hallucinations. Many of these errors are in fact entirely avoidable, including mistakes in arithmetic and small errors within chains of reasoning.
Furthermore, many of the breakthrough capabilities of LLMs appear to emerge with scale, as measured by the number of trainable parameters: previous researchers have shown that once a model reaches a certain scale, it can complete some BIG-bench tasks through few-shot prompting.
Although there have been recent efforts to train smaller LMs that retain some of the capabilities of large models, the scale and data requirements of current LLMs make them impractical to train and maintain, and continual learning for large models remains an open research problem.
The Meta researchers believe that these problems stem from a basic flaw in LLMs: they are trained to perform statistical language modeling given a single parametric model and a limited context (usually the n preceding or surrounding tokens).
Although the context size n has been growing in recent years thanks to advances in software and hardware, most models still use relatively small contexts. A huge model size is therefore needed to store knowledge that is not present in the context but is critical for performing downstream tasks.
Therefore, a growing research trend is to solve these problems in a way that slightly deviates from the purely statistical language modeling paradigm mentioned above.
For example, one line of work circumvents the limited context size of LLMs by increasing the relevance of the context, adding information extracted from relevant external documents. By equipping an LM with a retrieval module that fetches such documents from a database for a given context, it is possible to match some of the capabilities of larger LMs with fewer parameters.
It should be noted that the resulting model is now non-parametric, since it can query external data sources. More generally, an LM can also improve its context through reasoning strategies, generating more relevant context before producing an answer and so improving performance at the cost of more computation.
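The retrieval idea described above can be illustrated with a minimal sketch. It is not the method of any particular paper in the survey: the bag-of-words cosine scoring, the function names (`retrieve`, `build_augmented_prompt`) and the prompt layout are all assumptions chosen for illustration, and a real system would typically use a learned retriever and pass the resulting prompt to an LM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (hypothetical retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_augmented_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Prepend retrieved documents to the query so the LM sees them in its context."""
    context = "\n".join(f"Document: {doc}" for doc in retrieve(query, documents, k))
    return f"{context}\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    corpus = [
        "The Eiffel Tower is located in Paris.",
        "Python is a programming language.",
        "The Great Wall is in China.",
    ]
    print(build_augmented_prompt("Where is the Eiffel Tower located?", corpus))
```

The knowledge needed to answer now lives in the external corpus rather than in the model weights, which is why a smaller LM can remain competitive in this setting.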
Another strategy is to allow the LM to leverage external tools to enhance the current context with important missing information not included in the LM’s weights. Although most of these works aim at mitigating the above-mentioned shortcomings of LM, it is straightforward to think that enhancing LM with reasoning and tools more systematically may lead to significantly more powerful agents.
Researchers collectively refer to these models as Augmented Language Models (ALMs).
As this trend accelerates, it becomes difficult to keep track of and understand the numerous models. This calls for a classification of work on ALMs and for definitions of technical terms that are sometimes used with different meanings.
Reasoning
In the context of ALMs, reasoning is the decomposition of a potentially complex task into simpler subtasks that the LM can solve more easily, either by itself or using tools.
There are various ways to decompose a task into subtasks, such as recursively or iteratively. In a sense, reasoning is similar to planning as defined in LeCun's 2022 paper "A Path Towards Autonomous Machine Intelligence".
Paper link: https://openreview.net/pdf?id=BZ5a1r-kVsf
In this survey, reasoning refers to various strategies for improving the reasoning capabilities of LMs, such as step-by-step reasoning prompted with a small number of examples. It is not yet fully understood, however, whether the LM is actually reasoning or simply generating a larger context that increases the likelihood of correctly predicting the missing tokens.
Reasoning may be an overused term given the current state of the technology, but it is already widely used within the community. In the context of ALMs, a more pragmatic definition of reasoning is giving the model more computational steps before it arrives at the answer to a prompt.
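Step-by-step reasoning from a few examples, as mentioned above, can be sketched as plain prompt construction. The `Demonstration` class, the "Q:/A:" layout and the "The answer is" marker are illustrative assumptions rather than a format prescribed by the survey, and the LM call itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Demonstration:
    """One worked example: a question, its intermediate steps, and the answer."""
    question: str
    steps: list[str]
    answer: str


def format_demonstration(demo: Demonstration) -> str:
    """Render a worked example with its reasoning written out before the answer."""
    reasoning = " ".join(demo.steps)
    return f"Q: {demo.question}\nA: {reasoning} The answer is {demo.answer}."


def build_step_by_step_prompt(demos: list[Demonstration], question: str) -> str:
    """Concatenate the worked examples, then leave the new question open for the LM."""
    blocks = [format_demonstration(d) for d in demos]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion that follows the demonstrated format."""
    marker = "The answer is "
    index = completion.rfind(marker)
    if index == -1:
        return None
    return completion[index + len(marker):].strip().rstrip(".")


if __name__ == "__main__":
    demo = Demonstration(
        question="Roger has 5 balls and buys 2 cans of 3 balls each. How many balls does he have?",
        steps=["He buys 2 * 3 = 6 balls.", "5 + 6 = 11."],
        answer="11",
    )
    print(build_step_by_step_prompt([demo], "What is 4 + 4?"))
```

The demonstrations encourage the model to emit intermediate steps before its answer, which matches the pragmatic definition above: more computation, in the form of generated tokens, before the final prediction.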
Tool
For an ALM, a tool is an external module, usually called using a rule or a special token, whose output is included in the ALM's context. Tools can be used to collect external information or to have an effect on the virtual or physical world (an effect generally perceived by the ALM). For example, a document retriever is a tool that obtains external information, while a robotic arm is a tool that has an external effect.
Tools can be called at training time or at inference time. More generally, the model needs to learn to interact with the tool, including learning to call its API.
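A minimal sketch of the mechanism described above, in which a special token triggers a tool and the tool's output is written back into the context. The `[Calculator(...)]` call syntax, the `TOOLS` registry and the `->` result marker are hypothetical conventions invented for this example; they are not taken from the survey.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without resorting to eval()."""

    def evaluate(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return str(evaluate(ast.parse(expression, mode="eval").body))


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"Calculator": calculator}

# Hypothetical special-token syntax for a tool call: [ToolName(argument)]
TOOL_CALL = re.compile(r"\[(\w+)\((.*?)\)\]")


def execute_tool_calls(text: str) -> str:
    """Run each recognized tool call and insert its output into the text."""

    def replace(match: re.Match) -> str:
        name, argument = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # Leave unknown tools untouched.
        return f"[{name}({argument}) -> {tool(argument)}]"

    return TOOL_CALL.sub(replace, text)


if __name__ == "__main__":
    generated = "The total is [Calculator(12*7)]."
    print(execute_tool_calls(generated))
```

In an actual ALM, generation would pause when the call is emitted, the result would be appended to the context, and the model would continue predicting tokens conditioned on it.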
Act
For an ALM, acting means calling a tool that has an effect on the virtual or physical world and observing the result, usually by bringing it into the ALM's current context. Some of the work introduced in this survey discusses searching the web or manipulating robotic arms through LMs. In a slight abuse of terminology, an ALM's call to a tool is sometimes referred to as an action, even if it has no effect on the outside world.
Why discuss reasoning and tools at the same time?
The combination of reasoning and tools in an LM should allow a wide range of complex tasks to be solved without heuristics, that is, with better generalization. Typically, reasoning helps the LM decompose a given problem into potentially simpler subtasks, while tools help complete each step correctly, for example by obtaining the result of a mathematical operation. In other words, reasoning is a way for the LM to combine different tools to solve complex tasks, while tools are a way to avoid reasoning failures and to decompose problems effectively. Each should benefit from the other, and reasoning and tools can be placed under the same umbrella, since both work by augmenting the LM's context to better predict missing tokens, albeit in different ways.
Why discuss tools and actions at the same time?
Tools that gather additional information and tools that have an effect on the virtual or physical world can be called by an LM in the same way. For example, there seems to be little difference between an LM that outputs Python code to solve a mathematical operation and an LM that outputs Python code to manipulate a robotic arm. Some of the work discussed in the survey already uses LMs that have effects on the virtual or physical world. From this point of view, LMs have the potential to act, and important progress can be expected in the direction of LMs as autonomous agents.
Classification method
The researchers organize the work covered in the survey along the three dimensions above, introduce each in turn, and finally discuss related work along other dimensions. Readers should keep in mind that many of these techniques were originally introduced in contexts other than LMs, and are encouraged to consult the introductions and related-work sections of the cited papers where needed. Finally, although the survey focuses on LLMs, not all of the work it covers uses large models, so it uses the term LM for accuracy.