ChatGPT can choose models by itself! Microsoft Research Asia and Zhejiang University's hot new paper: the HuggingGPT project is now open source
The AI craze triggered by ChatGPT has also swept through the financial world.
Recently, researchers at Bloomberg developed a GPT for the financial field, BloombergGPT, with 50 billion parameters.
The emergence of GPT-4 has given many people a taste of the powerful capabilities of large language models.
However, OpenAI is not open. Many in the industry have begun to clone GPT, and many ChatGPT alternatives are built on open-source models, especially Meta's open-source LLaMA model.
For example, Stanford's Alpaca; Vicuna, from UC Berkeley together with CMU, Stanford, and others; and Dolly, from the startup Databricks.
ChatGPT-like large language models built for different tasks and applications have turned the field into a scene where a hundred schools of thought contend.
So the question is, how do researchers choose an appropriate model, or even multiple models, to complete a complex task?
Recently, a research team from Microsoft Research Asia and Zhejiang University released HuggingGPT, a large-model collaboration system.
Paper address: https://arxiv.org/pdf/2303.17580.pdf
HuggingGPT uses ChatGPT as a controller to connect various AI models in the HuggingFace community to complete multi-modal complex tasks.
This means you get a kind of superpower: through HuggingGPT, you gain multimodal capabilities covering images, video, and speech.
HuggingGPT Bridge
The researchers point out that solving the current problems of large language models (LLMs) may be the first, and also a critical, step towards AGI.
Because current large language model technology still has shortcomings, there are pressing challenges on the road to building AGI systems:
- Limited to text generation as their input and output form, current LLMs lack the ability to process complex information such as vision and speech;
- In real application scenarios, complex tasks usually consist of multiple subtasks and therefore require the scheduling and collaboration of multiple models, which is also beyond the capabilities of a language model alone;
- On some challenging tasks, LLMs deliver excellent results in zero-shot or few-shot settings, but they are still weaker than some expert models (such as fine-tuned models).
To handle complex AI tasks, LLMs should be able to coordinate with external models to leverage their capabilities. Therefore, the key point is how to choose the appropriate middleware to bridge LLMs and AI models.
The researchers found that each AI model can be expressed in language form by summarizing its function.
This leads to a concept: language is a universal interface for LLMs, namely ChatGPT, to connect AI models.
By incorporating the AI model descriptions into the prompt, ChatGPT can be treated as the brain that manages those models. This method therefore allows ChatGPT to call external models to solve practical tasks.
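As a rough illustration of that idea, the sketch below packs model descriptions into a single prompt so the controller LLM can pick an external model. The model ids are real HuggingFace models, but the prompt wording and the build_selection_prompt helper are hypothetical, not the paper's actual prompt.

```python
# A minimal sketch (not the paper's actual prompt) of packing model descriptions
# into a single prompt so the controller LLM can pick an external model.
model_descriptions = [
    {"id": "facebook/detr-resnet-50", "task": "object-detection",
     "description": "Detects objects in an image and returns bounding boxes."},
    {"id": "nlpconnect/vit-gpt2-image-captioning", "task": "image-to-text",
     "description": "Generates a natural-language caption for an image."},
]

def build_selection_prompt(user_request: str) -> str:
    """Combine the model list and the user request into one prompt string."""
    lines = [f"- {m['id']} ({m['task']}): {m['description']}" for m in model_descriptions]
    return (
        "You are a controller that manages expert AI models.\n"
        "Available models:\n" + "\n".join(lines) + "\n\n"
        f"User request: {user_request}\n"
        "Reply with the id of the single most suitable model."
    )

print(build_selection_prompt("What objects are in this photo?"))
```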
To put it simply, HuggingGPT is a collaboration system, not a large model.
Its function is to connect ChatGPT and HuggingFace to process input in different modalities and solve many complex artificial intelligence tasks.
Specifically, each AI model from the HuggingFace community has a corresponding model description in the HuggingGPT library, and these descriptions are integrated into the prompt to establish the connection with ChatGPT.
HuggingGPT then uses ChatGPT as the brain to determine the answer to the question.
So far, HuggingGPT has integrated hundreds of HuggingFace models around ChatGPT, covering 24 tasks including text classification, object detection, semantic segmentation, image generation, question answering, text-to-speech, and text-to-video.
Experimental results prove that HuggingGPT has the ability to handle multi-modal information and complex artificial intelligence tasks.
Four-step workflow
The entire HuggingGPT workflow can be divided into the following four stages (a minimal sketch of this loop follows the list):
- Task planning: ChatGPT parses the user request, breaks it into multiple tasks, and plans the task order based on its knowledge and the dependencies between tasks
- Model selection: the LLM assigns the parsed tasks to expert models based on the model descriptions on HuggingFace
- Task execution: the expert models execute the assigned tasks on inference endpoints and return the execution information and inference results to the LLM
- Response generation: the LLM summarizes the execution logs and inference results and returns the summary to the user
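The sketch below strings these four stages into one loop. The function names, the use of the pre-1.0 openai Python client, and the call to the HuggingFace Inference API are assumptions for illustration; the real project handles many models, task dependencies, and much richer prompts.

```python
import requests
import openai  # assumes the pre-1.0 openai Python client; set openai.api_key first

HF_API = "https://api-inference.huggingface.co/models/"

def ask_llm(prompt: str) -> str:
    """Send a prompt to the controller LLM (ChatGPT) and return its reply."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding, as in the paper's setup
    )
    return resp["choices"][0]["message"]["content"]

def run_expert(model_id: str, payload: dict, hf_token: str) -> dict:
    """Stage 3: task execution on a HuggingFace inference endpoint."""
    r = requests.post(HF_API + model_id, json=payload,
                      headers={"Authorization": f"Bearer {hf_token}"})
    return r.json()

def hugginggpt(user_request: str, hf_token: str) -> str:
    # Stage 1: task planning - the LLM breaks the request into tasks.
    plan = ask_llm(f"Decompose this request into a list of tasks: {user_request}")
    # Stage 2: model selection - the LLM picks an expert model from descriptions.
    model_id = ask_llm(f"Pick one HuggingFace model id for these tasks:\n{plan}")
    # Stage 3: task execution - run the chosen expert model.
    result = run_expert(model_id.strip(), {"inputs": user_request}, hf_token)
    # Stage 4: response generation - the LLM summarizes results for the user.
    return ask_llm(f"Summarize these results for the user:\n{result}")
```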
Experimental settings
In the experiments, the researchers used the gpt-3.5-turbo and text-davinci-003 variants of the GPT models as the large language models (LLMs); both are publicly accessible through the OpenAI API.
To make the LLM's output more stable, they set the decoding temperature to 0.
At the same time, to make the LLM's output conform to the expected format, they set logit_bias to 0.1 on the format constraints.
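As a hedged sketch of these two settings, the call below uses the pre-1.0 openai Python client with temperature=0 and a small logit_bias; the prompt text and the token ids are placeholders, not the paper's actual values.

```python
import openai  # assumes the pre-1.0 openai Python client; set openai.api_key first

prompt = "Parse the user request into a JSON list of tasks: draw a cat and describe it."

# Placeholder token ids standing in for the tokens of the expected output format;
# the real ids depend on the model's tokenizer and the paper's format constraints.
FORMAT_TOKEN_IDS = [90, 1298]  # hypothetical values

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic decoding for stable outputs
    logit_bias={str(t): 0.1 for t in FORMAT_TOKEN_IDS},  # nudge toward format tokens
)
print(response["choices"][0]["message"]["content"])
```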
The researchers provide the detailed prompts designed for the task planning, model selection, and response generation stages in the following table, where {{variable}} indicates a field that must be filled with the corresponding text before the prompt is fed to the LLM.
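A minimal sketch of that filling step is shown below; the template text is illustrative rather than one of the paper's actual prompts, and fill_prompt is a hypothetical helper.

```python
import re

# Illustrative template only, not one of the paper's actual prompts.
TEMPLATE = (
    "The chat history is {{chat_history}}. "
    "Select one model from {{candidate_models}} for the task {{task}}."
)

def fill_prompt(template: str, values: dict) -> str:
    """Replace each {{variable}} field with its corresponding text."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)

print(fill_prompt(TEMPLATE, {
    "chat_history": "[user asked for an image caption]",
    "candidate_models": "['nlpconnect/vit-gpt2-image-captioning']",
    "task": "image-to-text",
}))
```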
Researchers tested HuggingGPT on a wide range of multi-modal tasks.
With ChatGPT and expert models working together, HuggingGPT can solve tasks across modalities such as language, image, audio, and video, covering detection, generation, classification, and question answering.
Although these tasks may seem simple, they are the basic capabilities HuggingGPT must master before it can tackle complex tasks.
For example, a visual question answering task:
Text generation:
Text-to-image:
HuggingGPT can integrate multiple inputs to perform simple reasoning. Even when a request involves multiple task resources, HuggingGPT can decompose the main task into several basic tasks and finally integrate the inference results of multiple models to arrive at the correct answer.
In addition, the researchers evaluated HuggingGPT on complex task scenarios, demonstrating its ability to handle multiple complex tasks.
User requests may contain multiple implicit tasks or require several kinds of information; in such cases, relying on a single expert model is not enough.
HuggingGPT can organize the collaboration of multiple models through task planning.
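As a hypothetical illustration of such a plan, the sketch below shows a request decomposed into two dependent subtasks and executed in dependency order; the field names and the "<result-N>" placeholder convention are assumptions, not the paper's exact schema.

```python
# A hypothetical plan for "caption this image, then read the caption aloud".
plan = [
    {"id": 0, "task": "image-to-text", "args": {"image": "photo.jpg"}, "dep": []},
    {"id": 1, "task": "text-to-speech", "args": {"text": "<result-0>"}, "dep": [0]},
]

def resolve(value, results):
    """Replace a '<result-N>' placeholder with the output of subtask N."""
    if isinstance(value, str) and value.startswith("<result-"):
        return results[int(value.strip("<>").split("-")[1])]
    return value

def run_plan(plan, run_task):
    """Run subtasks in id (dependency) order, feeding earlier results forward."""
    results = {}
    for task in sorted(plan, key=lambda t: t["id"]):
        args = {k: resolve(v, results) for k, v in task["args"].items()}
        results[task["id"]] = run_task(task["task"], args)
    return results

# Stub executor that just echoes what it would run.
print(run_plan(plan, lambda name, args: f"{name}({args})"))
```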
A user request may explicitly contain multiple tasks:
The figure below shows HuggingGPT’s ability to handle complex tasks in multi-turn dialogue scenarios.
A user splits a complex request into several steps and reaches the final goal through multiple rounds of requests. HuggingGPT can track the state of user requests through dialogue context management in the task planning stage, and can resolve both the resources the user refers to and the corresponding task plan.
"Jarvis" open sourceCurrently, this project has been open sourced on GitHub. But the code has not been fully released.
Interestingly, the researchers named the project Jarvis, after the all-powerful AI in "Iron Man".
JARVIS: a system connecting LLMs with the ML community
By the way, using HuggingGPT requires the OpenAI API.
JARVIS/HuggingGPT is similar to the Toolformer previously proposed by Meta: both act as connectors.
So, for that matter, do ChatGPT plugins.
One netizen commented: "I strongly suspect that the first artificial general intelligence (AGI) will appear earlier than expected. It will rely on 'glue' AI that can intelligently stitch together a series of narrow AIs and practical tools.
I was given access to the plugins, which turned it from a math noob into a math genius overnight. Of course, this is only a small step, but it is a sign of where things are heading.
I predict that within the next year or so we will see AI assistants connected to dozens of large language models (LLMs) and similar tools, with end users simply giving instructions to their assistant to complete tasks for them. That sci-fi moment is coming."
Other netizens said this is how research will be done in the future.
Put GPT in front of a pile of tools, and it knows how to use them.