
After GPT-4 is released, what will happen to other large models? Yann LeCun: Augmented language models may be the way forward

王林
2023-04-12 23:28:01

The popularity of ChatGPT and GPT-4 has brought large language models to their brightest moment yet. But where do they go from here?

Yann LeCun recently co-authored a study arguing that augmented language models may be a promising direction.


The paper is a survey; this article briefly introduces its main content.

Research background

Large language models have greatly advanced natural language processing, and the underlying technology powers several products with millions of users, including the coding assistant Copilot, the Google search engine, and the recently popular ChatGPT. By combining memorization with compositionality, large language models can perform tasks such as language understanding and conditional or unconditional text generation at unprecedented levels, making higher-bandwidth human-computer interaction a reality.

However, large language models still have limitations that prevent wider deployment. They often produce plausible but non-factual predictions, commonly called hallucinations, which leads to many avoidable errors, for example in arithmetic or within reasoning chains. In addition, many breakthrough capabilities of large language models seem to emerge only with scale, as measured by the number of trainable parameters: for example, some researchers have shown that once a model reaches a certain size, it can solve some BIG-bench tasks through few-shot prompting. Although a series of recent works has produced smaller language models that retain some characteristics of the largest ones, the size and data requirements of large language models keep training and maintenance costs high. Continual learning for large models remains an open research problem, and Goldberg has previously discussed other limitations of large language models in the context of ChatGPT, a chatbot built on GPT-3.

In a recent survey, researchers from Meta and other institutions argue that these problems stem from an essential flaw of large language models: they are usually trained to perform statistical language modeling given (i) a single parametric model and (ii) a limited context, typically the n preceding or surrounding tokens. Although n has grown thanks to recent software and hardware innovations, most models still use a small context compared with the potentially large context needed to always model language correctly. A model therefore needs enormous scale to store the knowledge that is absent from its context but necessary for the task at hand.
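
To make the limited-context point concrete, here is a minimal sketch (not from the paper) of fixed-window next-token prediction; `model` and `tokenize` are hypothetical stand-ins for a decoder-only language model and its tokenizer:

```python
# Minimal sketch of a fixed context window of n tokens.
# `model` and `tokenize` are hypothetical stand-ins, not a real API.

def predict_next_token(model, tokenize, text: str, n: int = 2048):
    tokens = tokenize(text)
    context = tokens[-n:]  # everything before the last n tokens is invisible
    # Knowledge that lived in the truncated prefix must instead be
    # stored in the model's weights, which is what drives up scale.
    return model.next_token_distribution(context)
```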


Paper link: https://arxiv.org/pdf/2302.07842v1.pdf

A growing body of research therefore aims to solve these problems by deviating slightly from the purely statistical language-modeling paradigm described above.

For example, one line of work circumvents the limited context size by making large language models better informed, adding information extracted from relevant external documents. Equipping a large language model with a module that retrieves such documents from a database for a given context makes it possible to match some capabilities of the largest language models with far fewer parameters. Note that the resulting model is now non-parametric, since it can query external data sources. More generally, language models can also improve their context through reasoning strategies, trading extra computation before generating an answer for more relevant context.
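
As an illustration of the retrieval idea only (the paper surveys many concrete systems, none of which is reproduced here), a minimal sketch with hypothetical `embed`, `index`, and `llm` components:

```python
# Sketch of retrieval augmentation: fetch relevant documents from an
# external store and prepend them to the prompt. All component names
# are placeholders, not a specific system from the survey.

def retrieval_augmented_answer(question: str, embed, index, llm, k: int = 3):
    query_vec = embed(question)                    # dense query representation
    docs = index.top_k(query_vec, k)               # external, non-parametric memory
    context = "\n\n".join(d.text for d in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm.generate(prompt)
```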

Another strategy is to let the language model use external tools to augment the current context with important missing information that is not contained in its weights. Although much of this work aims to mitigate the shortcomings mentioned above, it also directly suggests that more systematic use of reasoning and tools may lead to significantly more powerful agents. These models are called augmented language models (ALMs). As this trend accelerates, the number of related studies has grown dramatically, making it necessary to classify the works and to define the technical terms as they are used.

The terms used in this paper are defined as follows:

Reasoning. In the context of augmented language models, reasoning is the decomposition of a potentially complex task into simpler subtasks that the language model can solve more easily, either on its own or with tools. Subtasks can be decomposed in various ways, for example recursively or iteratively. In this sense, reasoning is akin to "planning" as defined in LeCun's 2022 paper "A Path Towards Autonomous Machine Intelligence". In this article, reasoning will often refer to various strategies for improving the reasoning skills of language models, such as step-by-step reasoning from few-shot examples. It is not entirely clear whether the language model is actually reasoning or simply producing a larger context that increases the likelihood of correctly predicting missing tokens. The discussion of this topic by Huang and Chang (2022) may be helpful here: although "reasoning" may be an abuse of language given current SOTA results, the term is already in use in the community. A more pragmatic definition of reasoning in augmented language models is giving the model more computational steps before it generates the answer to a prompt.
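
One widely used strategy of this kind is few-shot, step-by-step prompting (chain-of-thought). A minimal sketch, with an invented demonstration and a placeholder `llm`:

```python
# Sketch of few-shot step-by-step prompting. The demonstration is
# invented for illustration; `llm` is a placeholder model object.

COT_DEMO = (
    "Q: A box holds 12 eggs. How many eggs are in 3 boxes?\n"
    "A: One box holds 12 eggs, so 3 boxes hold 3 * 12 = 36 eggs. The answer is 36.\n\n"
)

def answer_step_by_step(llm, question: str) -> str:
    # The demonstration nudges the model to emit intermediate steps,
    # i.e. to spend more computational steps before the final answer.
    return llm.generate(COT_DEMO + f"Q: {question}\nA:")
```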

Tool. For an augmented language model, a tool is an external module, typically invoked via a rule or a special token, whose output is included in the model's context. A tool can gather external information or have an effect on the virtual or physical world (which the augmented language model usually perceives). An example of a tool that gathers external information is a document retriever; an example of a tool with external effects is a robotic arm. A tool can be invoked at training time or at inference time. In general, learning to interact with a tool may amount to learning to call its API.
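
A common implementation pattern, sketched here with an invented `[TOOL:name(arg)]` syntax (the paper does not prescribe a specific one): scan the model's output for the special token, run the tool, and splice its output back into the context before continuing generation.

```python
import re

# Sketch of tool invocation via a special token. The "[TOOL:name(arg)]"
# syntax and the `tools` registry are invented for illustration.

TOOL_CALL = re.compile(r"\[TOOL:(\w+)\((.*?)\)\]")

def generate_with_tools(llm, prompt: str, tools: dict) -> str:
    text = llm.generate(prompt)              # the model may emit a tool call
    m = TOOL_CALL.search(text)
    if m is None:
        return text                          # no tool needed, plain answer
    name, arg = m.group(1), m.group(2)
    observation = tools[name](arg)           # e.g. a document retriever
    # Splice the tool output into the context and let the model continue.
    new_prompt = prompt + text[: m.start()] + str(observation)
    return generate_with_tools(llm, new_prompt, tools)
```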

Action. For an augmented language model, an action is the invocation of a tool that has an effect on the virtual or physical world, together with observation of the result, typically by including it in the model's current context. For example, some works discussed in this article perform web searches or manipulate robotic arms through language models. Stretching the terminology slightly, researchers sometimes call any tool invocation by an augmented language model an action, even when it has no external effect.

Why discuss reasoning and tools together? Combining reasoning and tools lets language models solve a large number of complex tasks without heuristics, and therefore generalize better. Typically, reasoning helps the language model decompose a given problem into potentially simpler subtasks, while tools help get each step right, for example by computing the result of a mathematical operation. In other words, reasoning is a way for language models to combine different tools to solve complex tasks, and tools are a way to avoid reasoning failures when a valid decomposition is available. Each should benefit from the other. Moreover, reasoning and tools can be placed under the same "hood": both augment the context of the language model so that it better predicts missing tokens, albeit in different ways.
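
A toy illustration of this division of labor (the decomposition below is hand-written; in a real augmented language model it would be generated by the model itself):

```python
# Toy illustration: reasoning decomposes the problem, a tool makes
# each step exact. The subtasks are hard-coded here; an augmented
# language model would produce them itself.

def calculator(expr: str):
    # Exact arithmetic where a language model might hallucinate digits.
    # eval with empty builtins is a sketch, not a safe sandbox.
    return eval(expr, {"__builtins__": {}})

subtasks = ["23 * 17", "391 + 58"]   # step 2 reuses step 1's result (391)
results = [calculator(s) for s in subtasks]
print(results)  # [391, 449] -- each arithmetic step is correct by construction
```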

Why discuss tools and actions together? Tools that gather additional information and tools that have an effect on the virtual or physical world can be invoked by a language model in the same way. For example, there is seemingly no difference between a language model outputting Python code to solve a mathematical operation and a language model outputting Python code to operate a robotic arm. Some of the work discussed in the paper already uses language models with effects on virtual or physical worlds. From this point of view, language models have the potential to act, and their progress as a direction toward automated agents is worth looking forward to.

This article divides the surveyed research into three parts. Section 2 examines work on enhancing the reasoning capabilities of language models, as defined above. Section 3 focuses on work that allows language models to interact with external tools and to act. Finally, Section 4 explores whether reasoning and tool use are achieved through heuristics or through learning, for example via supervision or reinforcement. Other components of the survey are discussed by the authors in Section 5. For brevity, the survey focuses on work that combines reasoning or tools with language models. Finally, although this article focuses on large language models, not all of the studies considered employ large models, so for accuracy the remainder of the survey sticks to the term language models.

Reasoning

Previous work has shown that large language models can solve simple reasoning problems but fail at complex ones: this section therefore focuses on strategies for enhancing the reasoning skills of language models. One challenge that complex reasoning problems pose to language models is correctly obtaining the solution by composing the answers the model correctly predicts for the subproblems. For example, a language model may accurately predict the birth and death dates of a famous person yet fail to accurately predict their age at death. Some researchers call this discrepancy the compositionality gap of language models. The remainder of this section discusses work on three popular paradigms for eliciting reasoning in language models. Since the present work focuses on reasoning combined with tools, readers are referred to other researchers' more in-depth discussions of large language model reasoning.
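
A minimal sketch of closing that gap by explicit decomposition (the sub-questions and the arithmetic glue are hand-written here; in practice the model would generate the decomposition, and `llm` is a placeholder):

```python
# Sketch: bridging the compositionality gap by asking sub-questions
# and composing their answers outside the model. `llm` is a placeholder;
# the prompts and glue code are illustrative only.

def age_at_death(llm, person: str) -> int:
    birth = int(llm.generate(f"In what year was {person} born? Answer with a year only."))
    death = int(llm.generate(f"In what year did {person} die? Answer with a year only."))
    # The model may answer both facts correctly yet still fail the
    # composed question; doing the subtraction here removes that failure.
    return death - birth
```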

Using Tools and Taking Actions

Recent lines of language model research allow a model to access knowledge that is not necessarily stored in its weights, such as factual knowledge. More precisely, tasks such as precise computation or information retrieval can be offloaded to external modules, such as a Python interpreter or a search engine queried by the model; in these cases, the modules serve as tools. Furthermore, when a tool has an effect on the external world, we can say the language model performed an action. Tool calls and actions can easily be expressed in the form of special tokens, a feature that combines conveniently with Transformer language modeling.
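
A hedged sketch of two such offload modules behind one interface; the search function is a stub, and the subprocess call is illustrative rather than a production-grade sandbox:

```python
import subprocess

# Sketch of offload modules: a Python interpreter for exact computation
# and a (stubbed) search engine for retrieval. Illustrative only.

def python_interpreter(code: str) -> str:
    # Run a model-emitted snippet in a subprocess and capture stdout.
    # A real system would sandbox this far more carefully.
    proc = subprocess.run(
        ["python3", "-c", code], capture_output=True, text=True, timeout=5
    )
    return proc.stdout.strip()

def search_engine(query: str) -> str:
    raise NotImplementedError("stub: plug in a real retrieval backend")

OFFLOAD = {"python": python_interpreter, "search": search_engine}
print(OFFLOAD["python"]("print(2**38)"))  # 274877906944, computed exactly
```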

After reviewing how language models can be augmented to exercise their ability to reason and to apply tools, the survey also describes how models can be taught to apply these abilities.

For more research details, please refer to the original paper.

