Where will ChatGPT go from here? LeCun's new work: a comprehensive review of the next generation of "augmented language models"


WBOY

Apr 11, 2023 pm 11:58 PM


ChatGPT has set off a surge of interest in language models, and NLP practitioners are taking stock and summarizing future research directions.


Recently, Turing Award winner Yann LeCun co-authored a survey of "augmented language models". The survey reviews work that combines language models with reasoning skills and the ability to use tools, and concludes that this new research direction has the potential to address limitations of traditional language models, such as interpretability, consistency, and scalability.


Paper link: https://arxiv.org/abs/2302.07842

In an augmented language model, reasoning means decomposing a complex task into simpler subtasks, and tools are external modules that the model can call (such as code interpreters or calculators). An LM can exploit these augmentations individually or in combination, either through heuristics or by learning from demonstrations.

While still following the standard missing-token prediction objective, such an augmented LM can use various external modules, possibly non-parametric ones, to extend its context-processing capabilities, and so departs from the pure language modeling paradigm. These models can therefore be called Augmented Language Models (ALMs).

The missing-token prediction objective allows an ALM to learn to reason, use tools, and even act, while still being able to perform standard natural language tasks and even outperforming most regular LMs on several benchmark datasets.

Augmented Language Models

Large language models (LLMs) have driven tremendous progress in natural language processing and have gradually become the technical core of products used by millions of users, including the coding assistant Copilot, the Google search engine, and the recently released ChatGPT.

Memorization combined with compositionality enables LLMs to perform a variety of tasks, such as language understanding or conditional and unconditional text generation, at unprecedented levels of performance, opening a practical path toward a wider range of human-computer interaction.

However, the current development of LLM is still subject to many restrictions, hindering its deployment to a wider range of application scenarios.

For example, LLMs often produce non-factual but plausible-sounding predictions, known as hallucinations. Many of these errors are entirely avoidable, including mistakes in arithmetic and small slips within a chain of reasoning.

Furthermore, many of the breakthrough capabilities of LLMs appear to emerge with scale, as measured by the number of trainable parameters: previous work has shown that once a model reaches a certain scale, it can complete some BIG-bench tasks through few-shot prompting.

Although there have been recent efforts to train smaller LMs that retain some of the capabilities of large models, the size and data requirements of current LLMs can make them impractical to train and maintain: continual learning for large models remains an open research problem.

Meta researchers believe that these problems stem from a basic flaw in LLMs: they are trained to perform statistical language modeling given a single parametric model and a limited context (usually the n preceding or surrounding tokens).

Although the context size n has grown in recent years thanks to advances in software and hardware, most models still use a relatively small context. As a result, a model must be very large in order to store knowledge that is not present in the context but is necessary for performing downstream tasks.


Therefore, a growing research trend is to solve these problems in a way that slightly deviates from the purely statistical language modeling paradigm mentioned above.

For example, one line of work circumvents the limited context size of LLMs by making the context more relevant: information extracted from relevant external documents is added to it. By equipping an LM with a retrieval module that fetches such documents from a database for a given context, it is possible to match some of the capabilities of much larger LMs while using fewer parameters.

It should be noted that the resulting model is now non-parametric, since it can query external data sources. More generally, an LM can also improve its context through reasoning strategies, producing a more relevant context before generating an answer and trading additional computation for better performance.
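The retrieval idea described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the `retrieve` and `build_prompt` helpers, the three sample documents, and the word-overlap score are all hypothetical and are not taken from the survey, where retrievers are typically learned or sparse models working over large databases.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, doc):
    """Count the query tokens that also occur in the document (with multiplicity)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, documents, k=1):
    """Return the k documents with the highest overlap score."""
    ranked = sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved documents to the question (hypothetical prompt layout)."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Toy external "database" (assumption: real systems index far larger corpora).
DOCS = [
    "The Eiffel Tower is located in Paris.",
    "Python is a programming language.",
    "The Great Wall is located in China.",
]

prompt = build_prompt("Where is the Eiffel Tower located?", DOCS)
print(prompt)
```

The LM itself is unchanged here; only its input grows, which is why a smaller model can draw on knowledge that is not stored in its weights.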

Another strategy is to allow the LM to leverage external tools to augment the current context with important missing information that is not contained in the LM's weights. Although most of this work aims to mitigate the shortcomings of LMs mentioned above, it is natural to expect that augmenting LMs with reasoning and tools more systematically may lead to significantly more powerful agents.

Researchers collectively refer to these models as Augmented Language Models (ALMs).

As this trend accelerates, it becomes difficult to keep track of and understand the many models involved, which calls for a classification of work on ALMs and clear definitions of technical terms that are sometimes used with different meanings.

Reasoning

In the context of ALMs, reasoning is the decomposition of a potentially complex task into simpler subtasks that the LM can solve more easily, either by itself or using tools.

There are various ways to decompose a task into subtasks, such as recursively or iteratively. In this sense, reasoning is similar to planning as defined in LeCun's 2022 paper "A Path Towards Autonomous Machine Intelligence".


Paper link: https://openreview.net/pdf?id=BZ5a1r-kVsf

In this survey, reasoning refers to the various strategies for improving the reasoning capabilities of LMs, such as step-by-step reasoning prompted with a small number of examples. It is not yet fully understood, however, whether the LM is actually reasoning or simply generating a larger context that increases the likelihood of correctly predicting the missing tokens.

Reasoning may be an overused term given the current state of the technology, but it is already widely used within the community. In the context of ALMs, a more pragmatic definition of reasoning is giving the model more computational steps before it arrives at the answer to a prompt.
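To make "step-by-step reasoning with a small number of examples" concrete, here is a minimal sketch of how such a few-shot prompt might be assembled. The helper names, the worked example, and the exact wording of the cue ("Let's think step by step.") are assumptions chosen for illustration, not a format prescribed by the survey.

```python
def format_example(question, steps, answer):
    """Render one solved example with its intermediate reasoning steps."""
    lines = [f"Q: {question}", "A: Let's think step by step."]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append(f"The answer is {answer}.")
    return "\n".join(lines)


def build_few_shot_prompt(examples, question):
    """Concatenate the solved examples, then leave the new question open."""
    blocks = [format_example(*example) for example in examples]
    blocks.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(blocks)


# A single hypothetical demonstration; real prompts usually contain several.
EXAMPLES = [
    (
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        ["Tom starts with 3 apples.", "He buys 2 more, so 3 + 2 = 5."],
        "5",
    ),
]

cot_prompt = build_few_shot_prompt(
    EXAMPLES, "A box holds 4 pens. How many pens are in 3 boxes?"
)
print(cot_prompt)
```

The model is expected to continue the prompt with its own numbered steps, which is one way of giving it more computational steps before it commits to an answer.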

Tool

For an ALM, a tool is an external module that is typically called using a rule or a special token, and whose output is included in the ALM's context.

A tool can gather external information or have an effect on the virtual or physical world (an effect generally perceived by the ALM). For example, a document retriever is a tool that fetches external information, while a robotic arm is a tool that has an external effect.

Tools can be called at training time or at inference time. More generally, the model needs to learn to interact with a tool, which may include learning to call its API.
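The following sketch illustrates a tool that is called through a special marker and whose output is written back into the context. The `[Calculator(...)]` marker syntax, the `TOOLS` registry, and the `run_tool_calls` helper are hypothetical choices made for this example; the survey does not fix a single calling convention.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)


TOOLS = {"Calculator": calculator}          # hypothetical tool registry
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")    # matches markers like [Name(args)]


def run_tool_calls(text):
    """Rewrite each known [Tool(args)] marker to also carry the tool's output."""
    def substitute(match):
        name, args = match.group(1), match.group(2)
        if name not in TOOLS:
            return match.group(0)           # leave unknown tools untouched
        return f"[{name}({args}) -> {TOOLS[name](args)}]"
    return CALL.sub(substitute, text)


draft = "The total is [Calculator(12 * 7)] units."   # text an LM might emit
augmented = run_tool_calls(draft)
print(augmented)
```

After the substitution, the result of the call sits in the context, so the model can condition on it when predicting the tokens that follow.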

Act

For an ALM, acting means calling a tool that has an effect on the virtual or physical world and observing the result, typically by including it in the ALM's current context.

Some of the work covered in this survey discusses searching the web or manipulating robotic arms through LMs. With a slight abuse of terminology, an ALM's call to a tool is sometimes referred to as an action, even when it has no effect on the outside world.
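As a rough sketch of this act-and-observe pattern, the example below applies actions to a toy virtual world and appends each observation to the context. The `GridWorld` class, the `act` helper, and the hard-coded action sequence are all assumptions for illustration; in a real system the actions would be produced by the ALM itself.

```python
class GridWorld:
    """A toy virtual world: an agent standing on a one-dimensional line of cells."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def move(self, step):
        """Side effect: shift the agent, clamped to the grid, and report the state."""
        self.position = max(0, min(self.size - 1, self.position + step))
        return f"position={self.position}"


def act(context, world, step):
    """Apply one action to the world and append the observation to the context."""
    observation = world.move(step)
    return context + [f"Action: move({step})", f"Observation: {observation}"]


world = GridWorld()
context = ["Goal: reach position 3"]
for step in (2, 2, -1):  # hypothetical actions an ALM might emit
    context = act(context, world, step)

print("\n".join(context))
```

Unlike a pure information-gathering tool, `move` changes the state of the world, and the observation is what brings that change back into the context.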

Why discuss reasoning and tools at the same time?

Combining reasoning and tools within an LM should make it possible to solve a wide range of complex tasks without heuristics, and therefore with better generalization.

Typically, reasoning helps the LM decompose a given problem into potentially simpler subtasks, while tools help it complete each step correctly, for example by obtaining the result of a mathematical operation.

In other words, reasoning is a way for the LM to combine different tools to solve complex tasks, while tools are a way to keep a valid decomposition from failing at an individual step.

Each should benefit from the other. Reasoning and tools can also be placed under the same umbrella, since both augment the context of the LM so that it better predicts the missing tokens, although in different ways.

Why discuss tools and actions at the same time?

Tools that gather additional information and tools that have an effect on the virtual or physical world can be invoked by an LM in the same way. For example, there seems to be little difference between an LM that outputs Python code to solve a mathematical operation and an LM that outputs Python code to manipulate a robotic arm. Some of the work discussed in the survey already uses LMs that have effects on the virtual or physical world. From this point of view, LMs have the potential to act, and important progress can be expected in the direction of LMs as autonomous agents.

Classification method

The researchers organize the work covered in the survey along the three dimensions above (reasoning, tools, and acting), present each in turn, and finally discuss related work along other dimensions. Readers should keep in mind that many of these techniques were originally introduced in contexts other than LMs, and should consult the introductions and related-work sections of the cited papers where needed. Finally, although the survey focuses on LLMs, not all of the work it covers uses large models, so the survey uses the more general term LM for correctness.


Statement

This article is reproduced from 51CTO.COM. In case of any infringement, please contact admin@php.cn to have it removed.
