AI Agent: An Important Implementation Direction in the Era of Large Models

1. Overall architecture of LLM-based Agent


An LLM-based Agent consists of four main modules:

1. Portrait module: describes the background information of the Agent

The main content and generation strategies of the portrait module are as follows.

Portrait content: based mainly on three types of information: demographic information, personality information, and social information.

Generation strategy: three strategies are mainly used to generate portrait content:

  • Manual design method: write the portrait content into the prompt of the large model by hand; suitable when the number of Agents is relatively small;
  • Large model generation method: first specify a small number of portraits as examples, then use a large language model to generate more portraits; suitable when the number of Agents is large;
  • Data alignment method: use the background information of the people in a pre-specified dataset as the prompt of the large language model, which then makes the corresponding predictions.
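The first two generation strategies can be sketched roughly as follows. This is only an illustration: the `Profile` class, its fields, and both helper functions are hypothetical names chosen here, not part of any published Agent framework, and the actual call to a large language model is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical container for the three kinds of portrait information."""
    name: str
    age: int                    # demographic information
    occupation: str             # demographic information
    personality: str            # personality information
    relations: list[str] = field(default_factory=list)  # social information


def profile_to_prompt(p: Profile) -> str:
    """Manual design method: write the portrait directly into the prompt."""
    social = ", ".join(p.relations) if p.relations else "no recorded relations"
    return (
        f"You are {p.name}, a {p.age}-year-old {p.occupation}. "
        f"Personality: {p.personality}. "
        f"Social relations: {social}."
    )


def few_shot_generation_prompt(examples: list[Profile], n_new: int) -> str:
    """Large model generation method: seed portraits become few-shot examples."""
    shots = "\n".join(f"- {profile_to_prompt(e)}" for e in examples)
    return (
        f"Here are example agent portraits:\n{shots}\n"
        f"Write {n_new} new portraits in the same style."
    )


alice = Profile("Alice", 34, "nurse", "patient and curious", ["friend of Bob"])
print(profile_to_prompt(alice))
print(few_shot_generation_prompt([alice], n_new=5))
```

The second prompt would be sent to a large language model, whose output then needs to be parsed back into portraits; that step depends on the model and is not shown.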

2. Memory module: records the Agent's behavior and supports the Agent's future decisions

Memory structure:

  • Unified memory: only short-term memory is considered; long-term memory is not.
  • Hybrid memory: a combination of long-term and short-term memory.

Memory form: mainly the following four forms.

  • Language
  • Database
  • Vector representation
  • List

Memory operations: the following three operations are common:

  • Memory reading
  • Memory writing
  • Memory reflection
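A minimal sketch of a hybrid memory with the three operations, assuming list-form memory and keyword matching for reading. The class name `HybridMemory`, the overflow rule in `write`, and the `summarize` callback (which stands in for an LLM call) are all assumptions made for illustration.

```python
from collections import deque


class HybridMemory:
    """Hypothetical hybrid memory: a bounded short-term buffer plus a long-term list."""

    def __init__(self, short_capacity: int = 3):
        self.short_term = deque(maxlen=short_capacity)
        self.long_term: list[str] = []

    def write(self, observation: str) -> None:
        # Assumption: when the buffer is full, the oldest item moves to long-term memory.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(observation)

    def read(self, keyword: str) -> list[str]:
        # Reading: return every stored memory that mentions the keyword.
        pool = self.long_term + list(self.short_term)
        return [m for m in pool if keyword.lower() in m.lower()]

    def reflect(self, summarize) -> str:
        # Reflection: condense all memories into an insight and store it long-term.
        insight = summarize(self.long_term + list(self.short_term))
        self.long_term.append(insight)
        return insight


mem = HybridMemory(short_capacity=2)
for obs in ["saw a movie ad", "talked to Bob about movies", "left the site"]:
    mem.write(obs)
print(mem.read("movie"))
print(mem.reflect(lambda ms: f"{len(ms)} memories so far"))
```

A unified memory in the sense above would keep only the `short_term` buffer.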

3. Planning module

  • Planning without feedback: the large language model does not need feedback from the external environment during reasoning. This type of planning falls into three subtypes: single-path reasoning, which calls the large language model only once to output the complete reasoning steps; multi-path reasoning, which borrows the idea of crowdsourcing by letting the large language model generate multiple reasoning paths and then selecting the best one; and borrowing an external planner.
  • Planning with feedback: the large language model needs feedback from the external environment to plan the next and subsequent steps. This feedback comes from three sources: environmental feedback, human feedback, and model feedback.
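The difference between these planning styles can be sketched as below. The large language model and the environment are replaced by small stub functions, and majority voting is used as the rule for picking the "best" path; that rule is an assumption, since the text above does not say how the best path is determined.

```python
import itertools
from collections import Counter


def single_path_plan(llm, task):
    """Single-path reasoning: one model call returns the whole plan."""
    return llm(task)


def multi_path_plan(llm, task, n_paths=5):
    """Multi-path reasoning: sample several plans, keep the most frequent one."""
    paths = [tuple(llm(task)) for _ in range(n_paths)]
    best, _ = Counter(paths).most_common(1)[0]
    return list(best)


def plan_with_feedback(propose_step, environment, task, max_steps=10):
    """Planning with feedback: each step is proposed after seeing earlier feedback."""
    history = []
    for _ in range(max_steps):
        step = propose_step(task, history)
        if step is None:          # the planner signals that it is finished
            break
        history.append((step, environment(step)))
    return history


def make_fake_llm():
    """Stub standing in for a sampled LLM; cycles through canned plans."""
    canned = itertools.cycle([
        ["search", "summarize"],
        ["search", "read", "summarize"],
        ["search", "summarize"],
    ])
    return lambda task: next(canned)


def fake_propose(task, history):
    steps = ["open page", "click next", "done"]
    return steps[len(history)] if len(history) < len(steps) else None


def fake_env(step):
    return f"ok: {step}"


print(multi_path_plan(make_fake_llm(), "find a paper", n_paths=3))
print(plan_with_feedback(fake_propose, fake_env, "browse"))
```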

4. Action module

  • Action goal: some Agents aim to complete a specific task, some to communicate, and some to explore.
  • Action generation: some Agents generate actions by recalling memory, while others carry out specific actions according to a predefined plan.
  • Action space: some action spaces are a collection of tools, while others rely on the large language model's own knowledge and consider the whole action space from the model's own perspective.
  • Action impact: the impact on the environment, on the Agent's internal state, and on new actions in the future.
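As a rough illustration of an action space built from tools, the sketch below dispatches an action to a named tool and records the result in the Agent's internal state. The `TOOLS` registry, the two toy tools, and the `act` function are hypothetical and only show the shape of the idea.

```python
from typing import Callable

# Hypothetical tool collection forming the action space.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(float(x) for x in expr.split("+"))),
    "echo": lambda text: text,
}


def act(tool_name: str, argument: str, state: dict) -> str:
    """Run one tool and record the outcome (the action's impact on internal state)."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    result = TOOLS[tool_name](argument)
    state.setdefault("history", []).append((tool_name, argument, result))
    return result


state: dict = {}
print(act("calculator", "1+2.5", state))
print(state["history"])
```

Because the recorded history can be read back when choosing the next action, the same structure also covers the impact of an action on future actions.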

The above is the overall framework of an Agent. For more information, please refer to the following paper:

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen: A Survey on Large Language Model based Autonomous Agents. CoRR abs/2308.11432 (2023)

2. Key & Difficult Issues of LLM-based Agent


The key and difficult issues of current LLM-based Agents mainly include:

1. How to improve the Agent's role-playing ability

The most important function of an Agent is to complete specific tasks, or to carry out various simulations, by playing a certain role, so the Agent's role-playing ability is crucial.

(1) Agent role-playing ability definition

Agent role-playing ability is divided into two dimensions:

  • The behavioral relationship between the role and the Agent
  • The evolution mechanism of the role in the environment

(2) Agent role-playing ability evaluation

After defining the role-playing ability, the next step is to evaluate it from the following two aspects:

  • Role-playing evaluation index
  • Role-playing evaluation scenario

(3) Improvement of the Agent's role-playing ability

Building on the evaluation, the Agent's role-playing ability needs to be improved further. There are two methods:

  • Improve role-playing ability through prompts: in essence, this method elicits the ability the original large language model already has by designing prompts;
  • Improve role-playing ability through fine-tuning: this method usually fine-tunes the large language model again on external data to improve its role-playing ability.

2. How to design the Agent memory mechanism

The biggest difference between an Agent and a large language model is that an Agent can continually self-evolve and self-learn in its environment, and the memory mechanism plays a very important role in this. The Agent's memory mechanism can be analyzed along three dimensions:

(1) Agent memory mechanism design

The following two memory mechanisms are common:

  • Memory mechanism based on vector retrieval
  • Memory mechanism based on LLM summary
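A memory mechanism based on vector retrieval can be sketched as follows. To keep the example self-contained, a toy bag-of-words vector replaces a learned embedding model; that substitution, and the function names `embed`, `cosine`, and `retrieve`, are assumptions for illustration only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]


memories = [
    "watched a science fiction movie",
    "posted a photo of lunch",
    "recommended a movie to Bob",
]
print(retrieve("science fiction movie", memories, k=2))
```

The summary-based alternative would instead pass the stored memories to a large language model and keep its condensed output, trading retrieval precision for a shorter context.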

(2) Agent memory ability evaluation

To evaluate the Agent’s memory ability, it is mainly necessary to determine the following two points:

  • Evaluation indicators
  • Evaluation scenarios

(3) Agent memory mechanism evolution

Finally, the evolution of Agent memory mechanism needs to be analyzed, including:

  • The evolution of memory mechanism
  • Autonomous update of memory mechanism

3. How to improve Agent’s reasoning/planning ability

(1) Agent’s task decomposition ability

  • Sub-task definition and decomposition
  • Optimal order of task execution
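Once a task has been decomposed, one simple way to obtain an execution order is a topological sort over the sub-task dependencies. The sub-task names below are made up, and a topological sort only guarantees a valid order, not an optimal one; choosing among valid orders would need an additional cost model.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition: each sub-task maps to the sub-tasks it depends on.
subtasks = {
    "write report": {"analyze data"},
    "analyze data": {"collect data"},
    "collect data": set(),
}

# static_order() yields every sub-task after all of its dependencies.
order = list(TopologicalSorter(subtasks).static_order())
print(order)
```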

(2) Integration of Agent reasoning and external feedback

  • Design the integration mechanism of external feedback during the reasoning process: let the Agent and the environment form an interactive whole;
  • Improve the Agent's ability to respond to external feedback: on the one hand, the Agent needs to respond faithfully to the external environment; on the other hand, it needs to be able to ask questions of the external environment and seek solutions from it.

4. How to design an efficient multi-Agent collaboration mechanism

(1) Multi-Agent collaboration mechanism

  • Definition of the different Agent roles
  • Design of the cooperation mechanism between Agents

(2) Multi-Agent debate mechanism

  • Design of the debate mechanism between Agents
  • Determination of the convergence condition of the debate
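A debate loop with an explicit convergence condition might look like the sketch below. The stopping rule used here (stop when all Agents give the same answer or a round limit is reached, then take the majority answer) is one possible choice rather than an established standard, and the two stub Agents stand in for LLM-backed ones.

```python
from collections import Counter


def debate(agents, question, max_rounds=5):
    """Run debate rounds until all answers agree or max_rounds is reached."""
    answers = [agent(question, []) for agent in agents]
    rounds = 0
    while len(set(answers)) > 1 and rounds < max_rounds:
        # Every Agent sees the previous round's answers and may revise its own.
        answers = [agent(question, answers) for agent in agents]
        rounds += 1
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, rounds


def stubborn(question, others):
    """Stub Agent that never changes its answer."""
    return "A"


def follower(question, others):
    """Stub Agent that starts with "B" and then adopts the majority answer."""
    return Counter(others).most_common(1)[0][0] if others else "B"


print(debate([stubborn, stubborn, follower], "Which option is better?"))
```

The round limit matters in practice: without it, Agents that never agree would debate forever.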

3. User behavior simulation Agent based on large language models


The following sections present some concrete Agent cases. The first is a user behavior simulation Agent based on a large language model, which is also an early work combining LLM-based Agents with user behavior analysis. In this work, each Agent is divided into three modules:

1. Portrait module

Specifies different attributes for different Agents, such as ID, name, occupation, age, interests, and characteristics.

2. Memory module

The memory module includes three sub-modules:

(1) Sensory memory

(2) Short-term memory

  • Raw observations are processed into more informative observations, which are then stored in short-term memory;
  • The contents of short-term memory are stored for a relatively short time.

(3) Long-term memory

  • The contents of short-term memory are automatically transferred to long-term memory after being repeatedly triggered and activated.
  • The contents of long-term memory are stored for a relatively long time.
  • Based on existing memories, the contents of long-term memory undergo autonomous reflection, sublimation, and refinement.
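The transfer from short-term to long-term memory after repeated activation can be sketched with a simple counter. The threshold of three activations and the class name `TieredMemory` are assumptions; the text above does not state the actual rule used in the work.

```python
class TieredMemory:
    """Hypothetical sketch: promote a memory once it has been activated often enough."""

    def __init__(self, promote_after: int = 3):
        self.promote_after = promote_after
        self.short_term: dict[str, int] = {}   # memory -> activation count
        self.long_term: list[str] = []

    def observe(self, item: str) -> None:
        if item in self.long_term:
            return                              # already consolidated
        count = self.short_term.get(item, 0) + 1
        if count >= self.promote_after:
            self.short_term.pop(item, None)     # leave short-term memory
            self.long_term.append(item)         # enter long-term memory
        else:
            self.short_term[item] = count


tm = TieredMemory(promote_after=3)
for obs in ["likes sci-fi", "likes sci-fi", "ate lunch", "likes sci-fi"]:
    tm.observe(obs)
print(tm.long_term, tm.short_term)
```

A fuller version would also expire short-term entries over time, matching the short storage time described above.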

3. Action module

Each Agent can perform three kinds of actions:

  • Behavior in the recommendation system, including watching movies, going to the next page, and leaving the recommendation system;
  • Conversations between Agents;
  • Posting on social media.

Throughout the simulation, each Agent freely chooses among the three kinds of actions in every round without external interference. Different Agents talk to one another and autonomously produce various behaviors on social media or in the recommendation system. After multiple rounds of simulation, some interesting social phenomena and patterns of user behavior on the Internet can be observed.
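The round-based simulation described above can be sketched as a simple loop. Here a seeded random choice stands in for the decision a large language model would make, and the action labels and the `simulate` function are hypothetical names.

```python
import random
from collections import Counter

# Hypothetical labels for the three kinds of actions.
ACTIONS = ("use_recommender", "chat_with_agent", "post_on_social_media")


def simulate(agent_names, n_rounds, seed=0):
    """Let every Agent pick one action per round and log the choices."""
    rng = random.Random(seed)
    log = []
    for round_no in range(n_rounds):
        for name in agent_names:
            action = rng.choice(ACTIONS)   # placeholder for an LLM decision
            log.append((round_no, name, action))
    return log


log = simulate(["Alice", "Bob", "Carol"], n_rounds=4)
print(Counter(action for _, _, action in log))
```

Aggregating the log across many rounds is what makes population-level patterns visible.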

For more information, please refer to the following paper:

Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen: When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm

4. Multi-Agent software development based on large language models


The next Agent example is software development with multiple Agents. This is also an early work on multi-Agent cooperation, and its main purpose is to use different Agents to develop a complete piece of software. The system can be viewed as a software company in which different Agents play different roles: some Agents are responsible for design, including roles such as CEO, CTO, and CPO; some are responsible for coding; some are mainly responsible for testing; and some are responsible for writing documentation. Each Agent thus handles a different task, the Agents coordinate and update their cooperation through communication, and together they complete a full software development process.

5. Future direction of LLM-based Agent


LLM-based Agents can currently be divided into two major directions:

  • Solving specific tasks, such as MetaGPT, ChatDev, Ghost, DESP, etc.
    This type of Agent should ultimately be a "superman" aligned with correct human values, which involves two "qualifiers":
    aligned with correct human values;
    beyond the capabilities of ordinary people.
  • Simulating the real world, such as Generative Agent, Social Simulation, RecAgent, etc.
    The abilities required by this type of Agent are the opposite of the first type:
    allow the Agent to present a variety of values;
    the Agent should conform to ordinary people as closely as possible rather than surpass them.

In addition, current LLM-based Agents have the following two pain points:

  • Hallucination problem
    Because the Agent interacts with the environment continuously, the hallucinations from each step accumulate and compound, making the problem more serious; the hallucination problem of large models therefore needs further attention here. Possible solutions include:
    designing an efficient human-machine collaboration framework;
    designing an efficient human intervention mechanism.
  • Efficiency problem
    Efficiency is a very important issue in the simulation process; a table in the original presentation summarizes the time consumption of different Agents under different numbers of APIs.
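The compounding of hallucinations across steps can be illustrated with a small calculation. It assumes, purely for illustration, that every step is free of hallucination with the same probability and that steps are independent; real Agents satisfy neither assumption exactly, and the 0.95 figure is invented.

```python
def chain_success_probability(per_step_accuracy: float, n_steps: int) -> float:
    """Probability that all n_steps are hallucination-free, assuming independent steps."""
    return per_step_accuracy ** n_steps


for n in (1, 5, 20, 50):
    print(n, round(chain_success_probability(0.95, n), 3))
```

Under these assumptions, a step that is right 95% of the time yields a fully correct 20-step interaction only about 36% of the time, which is why human intervention at intermediate steps can help.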


The above is the content shared this time, thank you all.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.