


The AI craze triggered by ChatGPT has also spread to the financial world.
Recently, researchers at Bloomberg developed a GPT for the financial field, BloombergGPT, with 50 billion parameters.
The emergence of GPT-4 has given many people a taste of the powerful capabilities of large language models.
However, OpenAI is not open. Many people in the industry have begun to clone GPT, and many ChatGPT alternatives are built on open source models, especially Meta's open source LLaMA model.
Examples include Stanford's Alpaca; Vicuna, built by UC Berkeley together with CMU, Stanford and others; and Dolly from the startup Databricks.
Various ChatGPT-like large language models built for different tasks and applications are flourishing across the field, like a hundred schools of thought contending.
So the question is, how do researchers choose an appropriate model, or even multiple models, to complete a complex task?
Recently, the research team from Microsoft Research Asia and Zhejiang University released HuggingGPT, a large model collaboration system.
Paper address: https://arxiv.org/pdf/2303.17580.pdf
HuggingGPT uses ChatGPT as a controller to connect various AI models in the HuggingFace community to complete multi-modal complex tasks.
This works almost like magic: through HuggingGPT, you gain multi-modal capabilities covering images, video, and speech.
HuggingGPT Bridge
Researchers pointed out that solving the current problems of large language models (LLMs) may be the first step, and a critical one, towards AGI.
Because current large language model technology still has some shortcomings, there are pressing challenges on the road to building AGI systems:
- Limited by the text-based input and output of text generation, current LLMs lack the ability to process complex information (such as vision and speech);
- In actual application scenarios, some complex tasks usually consist of multiple subtasks, so the scheduling and collaboration of multiple models are required, which is also beyond the capabilities of the language model;
- For some challenging tasks, LLMs show excellent results in zero-shot or few-shot settings, but they are still weaker than some expert models (such as fine-tuned models).
To handle complex AI tasks, LLMs should be able to coordinate with external models to leverage their capabilities. Therefore, the key point is how to choose the appropriate middleware to bridge LLMs and AI models.
Researchers found that each AI model can be described in language by summarizing its functions.
Thus, a concept is introduced: "Language is a generic interface for LLMs, such as ChatGPT, to connect AI models."
By incorporating AI model descriptions into the prompts, ChatGPT can be regarded as the brain that manages the AI models. This method therefore allows ChatGPT to call external models to solve practical tasks.
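To make the idea of putting model descriptions into a prompt concrete, here is a minimal sketch. It is not HuggingGPT's actual code: the `ModelCard` record, the `build_selection_prompt` helper, the model ids, and the prompt wording are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    """Hypothetical record for one expert model (not a HuggingFace API type)."""
    model_id: str
    task: str
    description: str


def build_selection_prompt(task: str, candidates: list[ModelCard]) -> str:
    """Render the descriptions of matching models as text an LLM could choose from."""
    lines = [f"Task: {task}", "Candidate models:"]
    for card in candidates:
        # Only models whose declared task matches are shown to the LLM.
        if card.task == task:
            lines.append(f"- {card.model_id}: {card.description}")
    lines.append("Reply with the id of the most suitable model.")
    return "\n".join(lines)


# Made-up model ids and descriptions, for illustration only.
cards = [
    ModelCard("example/detector-a", "object-detection", "Detects common objects in photos."),
    ModelCard("example/captioner-b", "image-to-text", "Writes a one-sentence caption for an image."),
]
prompt = build_selection_prompt("object-detection", cards)
print(prompt)
```

The resulting string would be sent to the LLM as part of its prompt, so the model descriptions act as the "language interface" between the LLM and the expert models.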
To put it simply, HuggingGPT is a collaboration system, not a large model.
Its function is to connect ChatGPT and HuggingFace to process input in different modalities and solve many complex artificial intelligence tasks.
So, every AI model in the HuggingFace community has a corresponding model description in the HuggingGPT library and is integrated into the prompt to build a ChatGPT connection.
HuggingGPT then uses ChatGPT as the brain to determine the answer to the question.
So far, HuggingGPT has integrated hundreds of models on HuggingFace around ChatGPT, covering 24 tasks including text classification, object detection, semantic segmentation, image generation, question answering, text-to-speech, and text-to-video.
Experimental results show that HuggingGPT can handle multi-modal information and complex artificial intelligence tasks.
Four-step workflow
HuggingGPT's entire workflow can be divided into the following four stages:
- Task planning: ChatGPT parses the user request, breaks it into multiple tasks, and plans the task order and dependencies based on its knowledge
- Model selection: the LLM assigns the parsed tasks to expert models based on the model descriptions in HuggingFace
- Task execution: each expert model executes its assigned task on an inference endpoint and returns the execution information and inference results to the LLM
- Response generation: the LLM summarizes the execution logs and inference results, and returns the summary to the user
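The four stages above can be sketched as a small pipeline. This is an illustrative sketch only: every function name, the task dictionary layout, and the `registry` lookup are assumptions, and the LLM and expert models are replaced by simple stand-ins so that the example runs offline.

```python
from typing import Any, Callable


def plan_tasks(request: str) -> list[dict]:
    """Stage 1 (stand-in): a real system would ask the LLM to parse the request."""
    return [{"id": 0, "task": "image-to-text", "args": {"image": "photo.jpg"}}]


def select_model(task: dict, registry: dict[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Stage 2 (stand-in): a plain lookup instead of an LLM choosing from descriptions."""
    return registry[task["task"]]


def execute(task: dict, model: Callable[..., Any]) -> dict:
    """Stage 3: run the expert model and record its result."""
    return {"id": task["id"], "task": task["task"], "result": model(**task["args"])}


def generate_response(request: str, results: list[dict]) -> str:
    """Stage 4 (stand-in): a real system would ask the LLM to summarize the results."""
    summary = "; ".join(f"{r['task']} -> {r['result']}" for r in results)
    return f"Request: {request} | Results: {summary}"


def run_pipeline(request: str, registry: dict[str, Callable[..., Any]]) -> str:
    results = []
    for task in plan_tasks(request):
        model = select_model(task, registry)
        results.append(execute(task, model))
    return generate_response(request, results)


# A fake "expert model" that only formats a string.
registry = {"image-to-text": lambda image: f"a caption for {image}"}
answer = run_pipeline("Describe photo.jpg", registry)
print(answer)
```

Keeping each stage as a separate function mirrors the description above: the planning and response stages are where an LLM would sit, while selection and execution deal with the expert models.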
Multi-modal capabilities
Experimental settings
In the experiments, the researchers used the gpt-3.5-turbo and text-davinci-003 variants of GPT models as the large language models (LLMs), both of which are publicly accessible through the OpenAI API.
To make the LLM output more stable, they set the decoding temperature to 0.
At the same time, to steer the LLM output toward the expected format, they set logit_bias to 0.1 on the format constraints.
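As a rough illustration of these two settings, the sketch below builds the decoding parameters as a plain dictionary without calling any API. The helper name `build_decoding_params` is hypothetical, and the token ids passed in are placeholders rather than real tokenizer ids; which tokens count as "format constraints" is an assumption here.

```python
def build_decoding_params(format_token_ids: list[int]) -> dict:
    """Assemble decoding settings matching the values stated in the article.

    format_token_ids: ids of the tokens that make up the expected output
    format (placeholder values in this sketch).
    """
    return {
        # Temperature 0 makes decoding as deterministic as the backend allows.
        "temperature": 0,
        # logit_bias maps token ids (as string keys) to a bias value.
        "logit_bias": {str(token_id): 0.1 for token_id in format_token_ids},
    }


# 90 and 92 are placeholder ids, NOT looked up from a real tokenizer.
params = build_decoding_params([90, 92])
print(params)
```

In practice the token ids would have to come from the tokenizer of the model being called, and the dictionary would be passed along with the prompt in the request.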
The researchers provide detailed prompts designed for the task planning, model selection, and response generation stages in the following table, where {{variable}} marks a field that must be filled in with the corresponding text before the prompt is fed into the LLM.
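Filling in `{{variable}}` fields before a prompt is sent can be sketched in a few lines. The template text and the field names `task_list` and `user_input` below are made up for illustration and are not taken from the paper's prompt table.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} in the template with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than send a half-filled prompt to the LLM.
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template and values.
template = "Available tasks: {{task_list}}. User request: {{user_input}}"
filled = fill_prompt(
    template,
    {"task_list": "image-to-text, object-detection", "user_input": "Describe photo.jpg"},
)
print(filled)
```

Raising an error on a missing field is a design choice in this sketch; a prompt with a leftover `{{variable}}` would otherwise reach the LLM unnoticed.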
Researchers tested HuggingGPT on a wide range of multi-modal tasks.
With the cooperation of ChatGPT and expert models, HuggingGPT can solve tasks across multiple modalities such as language, image, audio and video, including detection, generation, classification and question answering tasks.
Although these tasks may seem simple, mastering these basic capabilities is a prerequisite for HuggingGPT to solve complex tasks.
For example, a visual question answering task:
Text generation:
Text-to-image:
HuggingGPT can integrate multiple inputs to perform simple reasoning. Even when a request involves multiple resources, HuggingGPT can decompose the main task into several basic tasks and then integrate the inference results of multiple models to obtain the correct answer.
In addition, the researchers evaluated the effectiveness of HuggingGPT on complex tasks through tests, which demonstrated its ability to handle multiple complex tasks.
A single request may contain multiple implicit tasks or require several kinds of information. In this case, relying on a single expert model to solve the problem is not enough.
HuggingGPT can organize the collaboration of multiple models through task planning.
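One way to picture how planned tasks could be ordered when some depend on others is a topological sort. The sketch below is an assumption about how such ordering might be done, not the project's implementation; the task dictionary layout with `id` and `dep` fields is hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(tasks: list[dict]) -> list[int]:
    """Return task ids in an order where every task runs after its dependencies."""
    # Map each task id to the set of task ids it depends on.
    graph = {task["id"]: set(task["dep"]) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical plan: detect objects, then caption the image, then read the caption aloud.
tasks = [
    {"id": 2, "task": "text-to-speech", "dep": [1]},
    {"id": 0, "task": "object-detection", "dep": []},
    {"id": 1, "task": "image-to-text", "dep": [0]},
]
order = execution_order(tasks)
print(order)
```

Tasks with no dependency between them could in principle run in parallel; this sketch only produces one valid sequential order.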
A user request may explicitly contain multiple tasks:
The figure below shows HuggingGPT’s ability to handle complex tasks in multi-turn dialogue scenarios.
Users can divide a complex request into several steps and reach the final goal through multiple rounds of requests. HuggingGPT can track the contextual state of user requests through dialogue context management in the task planning stage, and can correctly resolve the resources the user refers to and plan tasks accordingly.
Currently, this project has been open sourced on GitHub, but the code has not been fully released.
Interestingly, the researchers named the project after JARVIS, the invincible AI in "Iron Man".
JARVIS: A system connecting LLMs and the ML community
By the way, using HuggingGPT requires access to the OpenAI API.
Netizen: The future of research
JARVIS / HuggingGPT resembles Toolformer, proposed earlier by Meta: both act as connectors.
The same is true of ChatGPT plugins.
One netizen said: "I strongly suspect that the first artificial general intelligence (AGI) will appear earlier than expected. It will rely on 'glue' AI that can intelligently glue together a series of narrow AIs and practical tools.
I was given access to the plug-ins, which transformed ChatGPT from a math noob into a math genius overnight. Of course, this is only a small step, but it is a sign of where things are heading.
I predict that in the next year or so we will see an AI assistant connected to dozens of large language models (LLMs) and similar tools, and end users will simply give instructions to their assistant to complete tasks for them. This sci-fi moment is coming."
Some netizens said that this is the future research method.
Put GPT in front of a lot of tools, and it knows how to use them.