
With the blessing of ChatGPT, large decision-making models are one step closer to AGI

王林
2023-04-12 18:19:09

In less than a year, ChatGPT and GPT-4 have been released one after another, continually reshaping people's understanding of AI.

New technologies bring change and have also triggered discussion about whether AI will replace people. OpenAI CEO Sam Altman has also publicly expressed concern about the powerful capabilities of artificial intelligence technology.

Recently, Wang Jun, a professor in the Department of Computer Science at University College London (UCL), said in an interview with AI Technology Review that although ChatGPT has strong language and dialogue capabilities, it cannot make systematic decisions such as machine control, group collaboration, and dynamic scheduling. These are the more revolutionary parts of the AI technology wave.

Wang Jun is a professor in the Department of Computer Science at University College London (UCL) and a Turing Fellow at the Alan Turing Institute. His research focuses on intelligent information systems, including machine learning, reinforcement learning, multi-agent systems, data mining, computational advertising, and recommendation systems. He has published more than 120 academic papers, has been cited more than 16,000 times according to Google Scholar, and has won best paper awards multiple times.


Wang Jun

In April 2022, the Shanghai Digital Brain Research Institute was officially established, and Enigma Tech was incubated within it. Wang Jun serves as co-founder and dean of the Shanghai Digital Brain Research Institute and as chief scientist of Enigma Tech. In the second half of that year, the institute developed the world's first large multi-agent decision-making model, which integrates CV, NLP, reinforcement learning, and multi-agent techniques and is dedicated to helping enterprises solve decision-making problems across multiple scenarios.

Wang Jun believes that the emergence of ChatGPT has solved the long-standing problem of how to lower the threshold for using large models. By combining natural language processing with large decision-making models, ChatGPT can offer more than chat: building on AIGC (AI Generated Content), the team will further explore AIGA (AI Generated Actions), so that a model's thinking and decision-making capabilities can be applied to specific scenarios, genuinely helping enterprises and people solve decision-making problems and freeing humans for more creative activities.

1. Towards “intelligence” in multi-agent systems

The process of exploring AI intelligence is inseparable from the ultimate pursuit of definitional issues.

Wang Jun divides the path to intelligence into two steps. The first step is to clarify the difference between biological (living) systems, to which humans belong, and non-biological systems.

In 2013, biophysicist Jeremy England proposed the groundbreaking theory of "dissipation-driven adaptation", which treats the origin of life as an inevitable result of thermodynamics: under certain conditions, an inanimate molecular system metabolizes energy through chemical reactions, promoting the continued dissipation of energy and the increase of "entropy".

In terms of entropy increase and decrease, a living body moves from disorder to order by continuously absorbing energy and continuously reducing entropy. Wang Jun believes that because AI is created by humans, it likewise absorbs energy to help humans accomplish the task of reducing entropy. The key to the fundamental question is how to define intelligence and how much energy AI needs to absorb to reach a given level of intelligence.

When AI is used for image classification and recognition, the accuracy of the classification algorithm can reach 98%. Through classification, AI helps us turn a disordered collection of images into an orderly, organized one: uncertainty in the system is reduced, and so is entropy. Reducing entropy requires computation, which determines the computing power an algorithm needs, and computing power in turn reflects the energy consumed.
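The link between classification and entropy can be made concrete with Shannon entropy. The sketch below is illustrative only: the ten balanced classes and the assumption that the 2% of errors are spread evenly over the wrong labels are not from the article, and `entropy_bits` and `label_distribution_after_classifier` are hypothetical helper names.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def label_distribution_after_classifier(num_classes, accuracy):
    """Assumed belief over the true label once the classifier has answered:
    `accuracy` on the predicted class, the remainder spread evenly over the
    other classes (a simplifying assumption)."""
    wrong = (1.0 - accuracy) / (num_classes - 1)
    return [accuracy] + [wrong] * (num_classes - 1)

NUM_CLASSES = 10   # assumed for illustration, not from the article
ACCURACY = 0.98    # the accuracy figure quoted in the text

# Before classification: every label is equally likely.
before = entropy_bits([1.0 / NUM_CLASSES] * NUM_CLASSES)
# After classification: almost all probability mass sits on one label.
after = entropy_bits(label_distribution_after_classifier(NUM_CLASSES, ACCURACY))

print(f"uncertainty before: {before:.3f} bits, after: {after:.3f} bits")
```

Under these assumptions, the uncertainty about each image's label drops from a little over three bits to a small fraction of a bit, which is the sense in which classification "reduces entropy".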

The second step towards intelligence, Wang Jun believes, is to distinguish the consciousness of biological systems from that of so-called AI systems. At present, artificial intelligence exists as a tool: algorithms can only measure how well an AI performs a task, and the machine itself does not think. Getting machines to eventually think as humans do first requires understanding the various phenomena of the human brain and paying more attention to the question of AI consciousness.

In Wang Jun's view, consciousness is an important manifestation of intelligence. Mammals can detect and perceive consciousness and form subjective feelings. Moreover, when multiple individuals interact with an environment, there must be, besides any single individual, another conscious individual that affects and resonates with the environment, so that subjective feelings can be expressed.

For this reason, Wang Jun and his team propose that in AI research, consciousness must be induced through multi-agent interaction.

Take large models as an example. Cross-task ability is defined by humans and limited to a given set of specific tasks. Designing an algorithm and simply letting the machine run it is unlikely to produce more intelligent AI, because the model's thinking and decision-making abilities do not improve.

Wang Jun told AI Technology Review, "When advancing several things at once, you need a big idea to guide you. Without one, an inherent law is clearly missing." That law is the critical path that leads machine models towards greater intelligence.

In May 2022, DeepMind released "GATO", a general-purpose agent that combines CV and NLP. It can play Atari games, caption images, stack blocks with a robotic arm, chat with people, and more, deciding from context whether to output text, joint torques, button presses, or other tokens. The work sparked a great deal of discussion at the time, and Wang Jun was among those following it.

In fact, starting in 2021, Wang Jun and his team had begun to consider the possibility of a cross-task decision-making model that brings CV, NLP, reinforcement learning, and multi-agent techniques together in one unified model. The emergence of "GATO" showed Wang Jun how much room there is to explore in large models: "This is enough to prove that having one model solve tasks in multiple fields is the general trend."

A large decision-making model cannot be understood simply in terms of model size. In essence, it reaches a certain level of cognition through reinforcement learning, that is, through continuous interaction with the environment over its data sets. How can this be made tractable? The biggest technical challenge is reducing the complexity of the interaction between reinforcement learning and the environment.

Raw data plays a key role in this step.

A pre-trained model is built by training on the raw data generated when other tasks or algorithms interact with the environment. Such a model can be applied quickly when it faces a new task, maximizing the value of the regularities, relationships, and data already collected. As the pre-training data set keeps expanding, the model also grows larger until it can cover all tasks.

The end result is that the methods for solving problems are gathered together, and multiple directions converge into a unified multi-agent model that can be scheduled and can generalize across tasks. Multi-agent systems often need to account for equilibrium: each agent pursues its own goal while the other party can also achieve its goal, and the agents constrain one another to maintain a stable balance.
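One common way to formalize such a stable balance is a Nash equilibrium, in which no agent can do better by changing its own action alone. The article does not say which equilibrium concept the team uses, so the sketch below is only an illustration: the two-vehicle lane-choice game, its payoffs, and the function name `pure_nash_equilibria` are all assumptions.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return every (row, col) action pair from which neither player can
    improve its own payoff by unilaterally switching actions."""
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Player A picks the row; check it is a best response to column j.
            a_is_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
            # Player B picks the column; check it is a best response to row i.
            b_is_best = all(payoff_b[i][j] >= payoff_b[i][l] for l in range(cols))
            if a_is_best and b_is_best:
                equilibria.append((i, j))
    return equilibria

# Hypothetical game: two vehicles each choose lane 0 or lane 1.
# Each gets payoff 1 if they pick different lanes and 0 if they collide.
vehicle_a = [[0, 1],
             [1, 0]]
vehicle_b = [[0, 1],
             [1, 0]]

stable_pairs = pure_nash_equilibria(vehicle_a, vehicle_b)
print(stable_pairs)
```

In this toy game the stable outcomes are the two assignments in which the vehicles take different lanes: each agent reaches its own goal without preventing the other from reaching its goal.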

In practical applications, multi-agent methods can also help solve many real problems, such as search, recommendation, and even Internet advertising. Each is essentially a decision-making process that helps users find the content they need, content that matches their preferences. "Recommending something to you is actually a decision."

The advantage of multi-agent is that it can give full play to its cross-task capabilities.

In fact, as early as 2017, Wang Jun and his student Zhang Weinan (a professor at Shanghai Jiao Tong University) began cross-task work that brought reinforcement learning into natural language processing (NLP).

Previously, when GANs were used to generate text in natural language processing, the data were discrete at the point where word indices are converted to word vectors, so small parameter adjustments often had no effect. In addition, a GAN's discriminator scores only the generated data as a whole, whereas text is generally generated word by word, which makes fine-grained control difficult.

To address this, they proposed the SeqGAN model, which solved the problem of applying GANs to discrete data by borrowing reinforcement learning strategies. It was also one of the earliest papers to use reinforcement learning to train a generative language model for text generation, and it has a wide range of applications in fields such as natural language processing and information retrieval.


Paper address: https://arxiv.org/pdf/1609.05473.pdf
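The core idea, treating the generator as a policy and the discriminator's score as a reward, can be sketched with a plain policy-gradient (REINFORCE) update. This is a heavily simplified illustration and not the SeqGAN implementation: the per-position softmax "generator", the fixed `toy_discriminator` that rewards one token, and all names and hyperparameters are assumptions. The actual model uses a recurrent generator, a discriminator trained adversarially, and Monte Carlo rollouts to score partial sequences.

```python
import math
import random

VOCAB = ["a", "b", "c"]   # toy vocabulary (assumed)
SEQ_LEN = 4               # toy sequence length (assumed)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_sequence(logits, rng):
    """Sample one discrete token index per position from the policy."""
    return [rng.choices(range(len(VOCAB)), weights=softmax(row))[0]
            for row in logits]

def toy_discriminator(seq):
    """Stand-in reward that scores only the finished sequence as a whole:
    the fraction of tokens equal to 'a' (index 0)."""
    return sum(1 for tok in seq if tok == 0) / len(seq)

def reinforce_step(logits, rng, batch=64, lr=0.5):
    """One policy-gradient update. Sampling is not differentiable, so the
    gradient flows through log-probabilities weighted by the reward."""
    samples = [sample_sequence(logits, rng) for _ in range(batch)]
    rewards = [toy_discriminator(s) for s in samples]
    baseline = sum(rewards) / batch  # mean reward, used to reduce variance
    grads = [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)]
    for seq, reward in zip(samples, rewards):
        advantage = reward - baseline
        for pos, tok in enumerate(seq):
            probs = softmax(logits[pos])
            for k in range(len(VOCAB)):
                # d log pi(tok) / d logit_k = 1[k == tok] - pi(k)
                indicator = 1.0 if k == tok else 0.0
                grads[pos][k] += advantage * (indicator - probs[k]) / batch
    for pos in range(SEQ_LEN):
        for k in range(len(VOCAB)):
            logits[pos][k] += lr * grads[pos][k]
    return baseline

def train(steps=200, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)]
    history = [reinforce_step(logits, rng) for _ in range(steps)]
    return logits, history

logits, history = train()
print(f"mean reward: first step {history[0]:.2f}, last step {history[-1]:.2f}")
```

Because the reward arrives only for the complete sequence while tokens are chosen one at a time, the update credits every token in a sample with that sample's advantage. This is the whole-sequence scoring problem described above, handled here in the crudest possible way.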

"Reinforcement learning and decision-making are essentially the same. Reinforcement learning can solve some decision-making problems." In Wang Jun's view, decision-making is a long-term research problem. Once generalized, a large multi-agent decision-making model can develop unique advantages in certain specific fields, and most AI problems can be solved with the help of large decision-making models.

2. AIGA goes one step further than AIGC

The excitement around ChatGPT has not yet subsided. On March 15, with the release of the multimodal pre-trained large model GPT-4, another disruptive change arrived.

In this race towards artificial general intelligence, ChatGPT and GPT-4 are not the finish line. The real focus of the competition is the more valuable industrial revolution and innovation riding this wave.

During this period, Wang Jun also stayed in close contact with friends who follow the capital markets.

In Wang Jun's view, some problems in academia are not tackled boldly enough: research is subject to resource constraints, and thinking is limited by certain factors. In industry, large decision-making models can find much richer application scenarios. Traditional industries, Internet search and recommendation, the industrial Internet, and others all require a wide variety of decisions.

With this in mind, Wang Jun began to consider bringing industry, academia, and research together around large decision-making models.

After a year of preparation, the Shanghai Digital Brain Research Institute was officially established in April 2022. Enigma Tech was incubated within it and is mainly responsible for bringing the institute's research results into industry and for providing the institute with real-world scenarios and real business data. Wang Jun serves as co-founder and dean of the institute and as chief scientist of Enigma Tech.

When large models enter real application scenarios, enterprises often face two major pain points: models are not broadly applicable, and the barrier to entry is high.

Classic machine learning follows a customized workflow. When an enterprise assigns a task, the team first defines the problem, collects data for training, and tests the model. When a second task arrives, the team must again define the problem, collect data, train, and test. Deployment therefore costs enterprises a great deal of money and labor, and the resulting models do not generalize broadly. At the same time, using large models demands strong technical skills and optimization experience from engineers, so the barrier for enterprises is high.

Wang Jun believes that combining ChatGPT with large decision-making models can effectively address both the barrier-to-entry problem and the broad-applicability problem.

Following this line of thinking, Wang Jun led the Enigma Tech team to propose the DB large model, a large model in the AIGA (AI Generated Actions) direction. Its first version, DB1, is the world's first multimodal large decision-making model. Benchmarked against DeepMind's GATO, it fully supports multi-agent settings and can handle more than a thousand decision-making tasks concurrently.


Performance of DB1 in vehicle collaboration tasks

By combining ChatGPT with large decision-making models, ChatGPT brings more than chat. Building on AIGC, the team further explores AIGA, applying the model's thinking and decision-making capabilities to specific scenarios. The resulting system interacts with the environment of each scenario and can complete large tasks with small amounts of data, so it can be aimed directly at real industrial settings. With the help of large models, tasks can be closed end to end, enabling wider applications such as robot collaboration, equipment dynamics, autonomous enterprise scheduling, and software development.

This in turn genuinely helps companies and people solve decision-making problems and frees humans for more creative activities. "Ultimately, it will greatly advance the progress of humanity as a whole. In that case, we can give birth to true AGI (artificial general intelligence)."

The basic structure of the Digital Brain Research Institute is now in place, and its work spans algorithms, systems, and specific engineering projects. Its results can be applied to recommendation systems, fault prediction, autonomous driving, market design, game scenarios, EDA optimization, and other scenarios to solve practical problems in enterprise operations.

For Wang Jun, stepping out of the laboratory to establish the Digital Brain Research Institute has been a completely different experience. In research, it is impossible to consider every factor at once: to solve a problem, everything else must first be simplified, and the core problem is solved before moving on to the next. Putting research into practice, by contrast, usually means facing a collection of problems, each of which must be solved one by one, with the solutions then applied together.

In July last year, AI Technology Review had the honor of an in-depth discussion with Dean Wang Jun. At that time, he said the institute's goal was to advance research on decision intelligence and AI and to do the best and most fundamental research in China.

In just one year, the emergence of models such as Stable Diffusion, ChatGPT, and GPT-4 has impressed on Wang Jun how revolutionary the progress of AI technology has been, and has given him a more concrete goal for the institute: applying large decision-making models to specific scenarios to solve problems of practical significance.

From academia to industry, the Digital Brain Research Institute has not been around for long, and its early shape reflects the direction of Wang Jun's exploration in artificial intelligence. "We just want to follow our own path: how can we combine industry, academia, and research to open a new road and ask questions that have not been asked before?"

3. Dialogue with Wang Jun

Putting large decision-making models into practice at the Digital Brain Research Institute

AI Technology Review: Please introduce the work and progress the Digital Brain Research Institute has made over the past year on large multi-agent decision-making models.

Wang Jun: We started planning a new topic last summer. We felt that large models are useful not only in NLP and CV but can also play a big role in decision-making. At that time, DeepMind's "GATO", which tried to put a variety of tasks into one large model and learn them with a Transformer, inspired us, so we decided to explore further on that basis and built a large decision-making model. Its training data include video and image data, natural language data, robot data, and even solver data, such as how to perform optimization tasks, arrange production schedules, and optimize vehicles. We built a large model with about 1 billion or 1.5 billion parameters. Although it was an early exploration, it showed that large models are not only for natural language processing but can also play a significant role in decision-making.

Some time ago we were working on a football game and came across an unsolved problem. In the reinforcement learning research behind game systems such as AlphaGo, StarCraft, and Dota, the more players there are, the more complex the decision space becomes.

So we used football as a game-scenario research focus and made many attempts with a large multi-agent decision-making model, moving from simple 2-player football to 5-player and then 11-player games. This is a relatively large and challenging scenario for reinforcement learning. The problem has not yet been fully solved, or solved well, so we have spent a lot of time on it and hope to achieve something.

AI Technology Review: What impact has the release of ChatGPT had on the institute's research?

Wang Jun: Our focus has always been on decision-making. But when ChatGPT came out, we were very surprised by its language capabilities, which far exceeded our expectations and have also helped advance decision-making tasks.

In decision optimization, two major pain points need to be addressed: broad applicability and a low barrier to entry.

A large decision-making model addresses broad applicability to a certain extent. New tasks are placed into the large model for iteration and fine-tuning, so a single large model can handle a variety of decision-making problems.

The barrier-to-entry problem is common among AI companies. Previously, using large models demanded a great deal of engineers: people with optimization experience often had to take part in the decision-making process. The barrier for both individuals and enterprises was very high, which also raised the cost of using AI.

To lower that barrier, we had previously envisioned inventing a relatively simple language, more rigorous than natural language but simpler than real programming, that anyone could use. The emergence of ChatGPT suddenly made us realize that machines' natural language can reach the level of normal communication with people, and the barrier-to-entry pain point was solved at a stroke. For us, the impact of this change has been considerable.

What is more interesting is that ChatGPT has some logical reasoning ability, which can help us decompose a complex problem into several sub-problems. This decomposition originally had to be done manually by professionals. With ChatGPT's semantic understanding and a few examples, a problem can be broken down into basic problems, and the large decision-making model's existing capabilities for those basic problems can then be invoked directly.

ChatGPT Lowering the Decision-Making Threshold

AI Technology Review: Large multi-agent decision-making models cover many fields. What are their data requirements? After combining them with ChatGPT, are there special data needs in particular fields?

Wang Jun: There are some specific requirements.

Natural language data is offline, and learning from it is a matter of methodology; decision-making requires substantial data-generation capability and therefore a simulator. For example, when we train a robot dog to walk, we do not send it out in the rain or into other environments to collect data. We usually first build a simulator that closely resembles the outside world and use it to generate data. After the model has learned, it is placed in a real scene to gather feedback and then returns for further learning, so that its decision-making capabilities transfer quickly to real applications. Large model technology covers a variety of scenes: rain, stairs, or sand, none of them is a problem.


Robot dog walking in different environments

The second difficulty is that training on decision-making data is harder than in natural language processing. Data are generated continuously during training, so how efficiently data are generated, where they are generated, and how they are distributed to the various learning modules all call for a unified, system-level solution. We previously developed a large-scale learning framework specifically for this kind of reinforcement learning training. However, after ChatGPT came out, the training method based on the large language model was not suitable.

AI Technology Review: In specific scenarios, how is ChatGPT combined with large decision-making models?

Wang Jun: Take the robot dog as a case. At first, we trained the robot dog with classic methods. The problem was that it walked fine on a road in a single environment but could not walk in rain or snow. Once we added the large-model solution, the robot dog began to have basic interactive capabilities and could reason. Give the robot dog an instruction to deliver a message, and the model automatically decomposes the task into 1 to 5 basic steps. Each module has corresponding logic for the delivery, such as path planning from point A to point B.

Since the robot dog itself has no concept of going east or west, only coordinates, interactive instructions must be matched to specific semantics. With ChatGPT, we no longer need to convert instructions into a programming language; we can interact with the robot dog directly. After receiving a request, the robot dog decomposes the instruction into several sub-problems. We first optimize the chat component, aligning actions, decisions, and their semantics with the natural language generated by ChatGPT.
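The flow described here, from a natural-language instruction to a few basic steps and then to coordinates and a path, can be sketched as follows. Everything in the sketch is hypothetical: the rule-based `decompose` function merely stands in for the language model, the `PLACES` table and grid size are invented, and breadth-first search is just one simple way to do path planning, not necessarily what the team uses.

```python
from collections import deque

# Hypothetical mapping from place names (semantics) to grid coordinates.
PLACES = {"lobby": (0, 0), "lab": (3, 2)}

def decompose(instruction):
    """Rule-based stand-in for the language model: turn an instruction of
    the form '... from X to Y' into a short list of basic steps."""
    words = instruction.lower().split()
    src = words[words.index("from") + 1]
    dst = words[words.index("to") + 1]
    return [("goto", src), ("pickup", None), ("goto", dst), ("dropoff", None)]

def plan_path(start, goal, width=4, height=3):
    """Breadth-first search for a shortest path on a 4-connected grid."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def execute(instruction, start=(0, 0)):
    """Run each basic step, resolving place names to coordinates."""
    position = start
    log = []
    for action, place in decompose(instruction):
        if action == "goto":
            path = plan_path(position, PLACES[place])
            log.append((action, path))
            position = path[-1]
        else:
            log.append((action, position))
    return log

log = execute("deliver message from lobby to lab")
for step in log:
    print(step)
```

The point of the split is that the language side only has to produce step names and place names, while the decision side works purely in coordinates.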

This has become the main direction of our next phase of research. We call it AIGA (AI Generated Actions). ChatGPT first brought AIGC; combined with large decision-making models, we go a step further, from generating content to generating actions and decisions.

The Digital Brain Research Institute's strength lies in large decision-making models, so we are staying the course in that direction. We want AI not only to communicate but, more importantly, to help you optimize and make decisions, which we think is very valuable. Once ChatGPT is combined with a large decision-making model, the interaction is no longer limited to answering questions; what matters is whether it can also understand complex structures. By interacting with the environment of specific scenarios, the combination can enable robot collaboration, equipment dynamics, autonomous enterprise scheduling, software development, and other wider applications.

Natural language is the foundation

AI Technology Review: After training on multimodal data, at what parameter count will more capabilities emerge? Among text, image, voice, and video, which modality will have the greatest impact on a multimodal model?

Wang Jun: In terms of data, the idea that "brute force works miracles" has its limits. Although we have not yet fully seen that limit, I feel we should not focus only on learning from the way ChatGPT is trained.

ChatGPT's language and conversational skills are strong, but does ChatGPT truly understand what it absorbs? I do not think so. Ask it to play a number-guessing game: on the surface it can play, but in fact it does not know and cannot guess the number in your mind. ChatGPT is largely memorizing the logical content in its training data. Its ability to match information is very strong, but its ability to truly understand is very weak.

How can this limitation be overcome? I think the model's understanding of the whole world needs to be added to training. If it does not build a mathematical model that describes the world and put its understanding into that world model, it will not gain a deeper understanding of the world around it. A simple example: suppose we give ChatGPT all the game data of human chess players rated below 2000. If the model only imitates people, it cannot imitate intelligence above a 2000 rating.


The AI creation assistant previously built by Wang Jun's team

Data is very important, but so is the size of the model, and different training methods are needed to improve it.

In multimodality, natural language is the foundation. When people think, language is the carrier of thought. It constructs a relatively clear logical description, which may not be 100% rigorous and has some vague aspects, but it is enough for us to express very complex logical relationships.

But at the same time, we must recognize that the semantic information implicit in natural language is what really matters. In other words, language may be able to state a problem very clearly, but that is only the surface; the most important thing is the semantic relationships contained in the dialogue. When other modalities are added, these can be transferred to them by matching the corresponding semantic representations.

With natural language as the base, we can bring more modalities into the model.

AI Technology Review: How do you see the impact of “human feedback” data on multimodal large models or decision-making large models?

Wang Jun: Some human feedback data is needed, but far less than supervised learning used to require. A foundation model needs only a few demonstrations, whose purpose is to guide it to adapt to new task scenarios and let it reveal the capabilities it already has. This is a departure from the classic machine learning training paradigm.

In the past, most AI companies applied machine learning in a customized way. When a task arrived, they first defined the problem, collected data for training, and tested the model. When a second task arrived, they defined the problem, collected data, trained, and tested all over again. This is not only hard to replicate; deployment also consumes a great deal of money and labor.

Machine learning after ChatGPT starts from large models. I do not need to know what the specific problem is: I can build the model first and then distribute it to customers or vendors, including companies that lack the ability to train large models themselves. The company deploys the model and then defines the overall process in reverse. In essence, the large model is activated and applied to a specific task, and only then are the task and outputs defined. This greatly reduces the model's dependence on "human feedback" data and truly achieves broadly applicable, low-barrier AI.

AI Technology Review: Some people believe that in this round of ChatGPT competition, computing power and models are no longer as important as in the previous two periods, and that scenarios and data will be the key. What do you think?

Wang Jun: Models are very important. Current improvements in the language capabilities of large models make them appear able to understand people, but this is only the surface. A basic training method that merely uses a few words to predict the next word is unlikely to produce greater intelligence: the model's thinking and decision-making abilities do not improve, and these two are the most basic abilities of artificial intelligence. A model needs to know how to interact with the environment.

From this perspective, models still need innovation. The Transformer architecture is very good, but that does not mean we can stop moving forward. We still need innovative, creative, thoughtful neural network models to emerge.

Computing power, models, data, and scenarios are all very important. When data and computing power reach a certain level, a new innovation needs to appear. After the innovation, data and computing power accumulate again until they reach a new height and prompt further innovation. It is a spiral process.

Scenarios are the purpose. Ultimately, we need to define and solve problems in real scenarios rather than leaving research at the academic level. Driven by a scenario, a new model or method emerges, and data and computing power then push it to another extreme.

ChatGPT is broadly applicable, but that does not mean it can solve every AI problem. What should we consider for the next scenario, and what problems can it solve? The core issue is to make the model's thinking and decision-making capabilities truly applicable to specific scenarios. At the same time, the model must interact with the environment, with people, and with various scenarios, ultimately empowering entire industries and greatly advancing the progress of humanity as a whole.

In this case, we can give birth to real AGI. This is also the goal of the Digital Brain Research Institute.


Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn for removal.