


ProAgent: Intelligent agents led by OpenAI liberate manpower, released by Tsinghua University and other universities
- Project address: https://github.com/OpenBMB/ProAgent
- Paper address: https://github.com/OpenBMB/ProAgent/blob/main/paper/paper.pdf
Throughout the history of human technology, automation has been a main driving force, helping people free themselves from complex, dangerous, and tedious work. From waterwheel irrigation in the early agricultural era to steam engines in the industrial era, humans have constantly pursued more advanced automation technologies to liberate themselves from heavy labor.
With the arrival of the information age, software, as the basis for information processing, storage, and communication, has become an inseparable part of human production and life, which in turn gave rise to Robotic Process Automation (RPA) technology. RPA coordinates multiple pieces of software into a fixed workflow through manually written rules, and executes it efficiently by simulating the way a human interacts with the software.
Figure 1 Comparison of Robotic Process Automation (RPA) and Agentic Process Automation (APA)
RPA uses software robots, or "bots", to simulate and perform repetitive, rule-based tasks, freeing up human resources and improving work efficiency. Its range of application is very wide: enterprises in banking, insurance, manufacturing, retail, and other industries commonly use RPA robots to automate routine and tedious tasks such as data entry, data extraction, and data processing. By automating these tasks, RPA can significantly reduce error rates and run around the clock (24/7), improving business reliability and responsiveness.
According to market research, the RPA market is growing rapidly. Gartner predicts that global RPA market revenue will reach US$3.3 billion by 2023, with a growth rate of 17.5%. This shows that enterprises have strong demand for, and recognition of, RPA.
However, RPA can only replace simple, mechanical human work, and some complex processes still rely on manual labor:
- Writing an RPA workflow itself requires heavy human labor and is costly.
- Complex tasks are very flexible and usually involve dynamic decision-making, which is difficult to encode as fixed rules.
Figure 2 Comparison of efficiency and intelligence between RPA and APA
Fortunately, the recent emergence of large language model based agents (LLM-based Agents) in the AI field may open up new possibilities for automation technology. Is it possible to bring the flexibility of agent technology into the RPA field and further reduce human involvement?
The team's research explores a new automation paradigm for the era of large-model agents: "Agentic Process Automation" (APA). Compared with traditional RPA, in the APA paradigm the agent can autonomously construct the workflow according to human needs. It can also identify the parts of those needs that require dynamic decision-making, automatically orchestrate them into the workflow, and actively take over those parts when the workflow is executed in order to make the corresponding complex decisions.
To explore the possibilities of APA, this research implemented an automation agent, ProAgent, which receives human instructions and builds workflows by generating code. It also introduces DataAgent and ControlAgent into the workflow to handle complex data processing and logical control. ProAgent demonstrates the feasibility of APA in the era of large-model agents and reveals new possibilities for automation technology in the era of LLMs.
Method introduction
In RPA, a workflow is a graph structure composed of a series of tool calls: nodes represent atomic tool calls (such as Gmail, Twitter, Google Sheets), while edges represent the logical order of execution (connection, branch, loop). A workflow usually contains all prior knowledge about a task or a type of task, including problem-solving paths and exception-handling logic. As a result, a hand-written, fixed workflow is often very stable, thorough, and efficient.
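The graph view described above can be made concrete with a small sketch. The `Node` and `Workflow` classes, the node names, and the tool identifiers are illustrative assumptions, not the data model of any particular RPA platform or of ProAgent; the sketch only shows nodes as tool calls and labelled edges as branches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One atomic tool call (hypothetical names, for illustration only)."""
    name: str
    tool: str


@dataclass
class Workflow:
    nodes: dict = field(default_factory=dict)   # node name -> Node
    edges: dict = field(default_factory=dict)   # node name -> [(condition, next node)]

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def connect(self, src: str, dst: str, condition: str = "always") -> None:
        # An "always" edge is a plain connection; any other label is a branch.
        self.edges[src].append((condition, dst))

    def successors(self, src: str, outcome: str = "always") -> list:
        # Follow unconditional edges plus the edges matching this outcome.
        return [dst for cond, dst in self.edges[src] if cond in ("always", outcome)]


wf = Workflow()
for name, tool in [("read_sheet", "googleSheets"), ("send_mail", "gmail"), ("post_tweet", "twitter")]:
    wf.add_node(Node(name, tool))
wf.connect("read_sheet", "send_mail", condition="has_rows")
wf.connect("read_sheet", "post_tweet", condition="empty")

print(wf.successors("read_sheet", outcome="has_rows"))
```

Because every branch has to be spelled out as a labelled edge in advance, this representation is stable but cannot express a decision that was not anticipated when the workflow was written.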
Figure 3 Example of the Agentic Workflow Description Language
In ProAgent, because LLMs are pre-trained on code data and have learned strong coding capabilities, the research adopts a code-based Agentic Workflow Description Language. This language uses JSON to organize and manage the data in the workflow, and uses Python syntax to implement the workflow's logical control. Jumps, loops, and other control flow are expressed directly in Python syntax, while each tool call in the workflow is encapsulated as a Python function. For ProAgent, the workflow-building task therefore becomes a code-generation task. When it receives human instructions, ProAgent writes the corresponding Agentic Workflow Description Language code, thereby automating workflow construction.
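As a rough illustration of this idea, the sketch below shows what a generated workflow could look like: JSON-style dictionaries carry the data, plain Python carries the control flow, and each tool call is wrapped in a function. Only the `mainWorkflow` name comes from the article; the two `action_*` functions, their parameters, and the canned rows are hypothetical stand-ins for real tool calls.

```python
import json


def action_read_sheet(params: dict) -> list:
    """Hypothetical wrapper around a spreadsheet tool; returns canned rows here."""
    return [
        {"line": "Cloud", "profit": 120},
        {"line": "Games", "profit": -30},
    ]


def action_send_message(params: dict) -> dict:
    """Hypothetical wrapper around a messaging tool; echoes what would be sent."""
    return {"sent": True, "text": params["text"]}


def mainWorkflow(trigger_input: dict) -> list:
    rows = action_read_sheet({"sheet": trigger_input["sheet"]})
    results = []
    for row in rows:                  # a loop is ordinary Python
        if row["profit"] < 0:         # a branch is ordinary Python
            text = f"{row['line']} is losing money"
        else:
            text = f"{row['line']} is profitable"
        results.append(action_send_message({"text": text}))
    return results


output = mainWorkflow({"sheet": "profits"})
print(json.dumps(output, indent=2))
```

Keeping inputs and outputs as JSON-compatible dictionaries means the data passed between steps can be serialized and inspected at any point.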
Figure 4 Example of the Agentic Workflow Description Language combining DataAgent and ControlAgent
Complex real-world tasks usually involve dynamic decision-making, and simple Python-style control logic and JSON-style data organization fall short when facing such flexible requirements. This is where agents need to be introduced. The research therefore defines two further agent operations:
1. DataAgent: For a complex data processing requirement, the processing is described in natural language when the workflow is built. At execution time a DataAgent is initialized, and it autonomously carries out the data processing task based on that natural language description.
2. ControlAgent: For control logic that is difficult to express as rules, the logic is described in natural language when the workflow is built. At runtime a ControlAgent is initialized, and it autonomously selects which branch of the workflow to execute next based on that natural language description.
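The two operations above can be sketched as small classes that hold a natural language description at build time and call a model only at run time. This is a minimal sketch under stated assumptions: the class interfaces, the prompt layout, and `fake_llm` (a toy stand-in so the example runs offline) are all hypothetical and are not ProAgent's actual implementation.

```python
from typing import Callable

# Stand-in type for a language-model call: takes a prompt, returns text.
LLM = Callable[[str], str]


class DataAgent:
    """Holds a natural language task; processes data when run (assumed interface)."""

    def __init__(self, task: str, llm: LLM):
        self.task = task
        self.llm = llm

    def run(self, data: dict) -> str:
        return self.llm(f"Task: {self.task}\nData: {data}")


class ControlAgent:
    """Holds a natural language question; picks one branch when run (assumed interface)."""

    def __init__(self, query: str, options: list, llm: LLM):
        self.query = query
        self.options = options
        self.llm = llm

    def run(self, data: dict) -> str:
        prompt = f"Question: {self.query}\nOptions: {self.options}\nData: {data}"
        answer = self.llm(prompt).strip()
        if answer not in self.options:
            raise ValueError(f"unexpected branch: {answer!r}")
        return answer


def fake_llm(prompt: str) -> str:
    """Toy replacement for a real model call, for illustration only."""
    if prompt.startswith("Question:"):
        return "2C" if "consumer" in prompt else "2B"
    return "Draft: profit summary for " + prompt.split("Data: ")[-1]


control = ControlAgent("Is this business line 2B or 2C?", ["2B", "2C"], fake_llm)
writer = DataAgent("Write a short profit email", fake_llm)

branch = control.run({"line": "Games", "description": "consumer mobile games"})
email = writer.run({"line": "Cloud", "profit": 120})
print(branch)
print(email)
```

Restricting the ControlAgent's answer to a fixed list of options keeps the surrounding Python control flow deterministic even though the decision itself is made by a model.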
ProAgent uses the ReACT mode to build the workflow step by step, with four workflow construction steps:
- Action_Define: Decide which tools to add to the workflow.
- Action_Implement: Convert the input/output parameters of the tool into a JSON structure, and encapsulate the tool call as a Python function.
- Workflow_Implement: Define a mainWorkflow function that organizes the logical control and data processing of the entire workflow.
- Task_Submit: When ProAgent has finished building the workflow, this operation marks the end of the construction process.
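The four steps can be pictured as named calls applied to a build state. In the sketch below the step names are written with underscores so they can serve as identifiers, and the step list is hard-coded where an agent would normally choose each step; the `WorkflowBuilder` class, its fields, and the argument names are assumptions made for illustration.

```python
class WorkflowBuilder:
    """Toy build state; method names mirror the four steps, everything else is assumed."""

    def __init__(self):
        self.tools = {}         # action name -> tool it wraps
        self.functions = {}     # function name -> generated source text
        self.submitted = False

    def action_define(self, action, tool):
        self.tools[action] = tool

    def action_implement(self, action, source):
        if action not in self.tools:
            raise KeyError(f"{action} must be defined before it is implemented")
        self.functions[action] = source

    def workflow_implement(self, source):
        self.functions["mainWorkflow"] = source

    def task_submit(self):
        if "mainWorkflow" not in self.functions:
            raise RuntimeError("cannot submit without a mainWorkflow")
        self.submitted = True

    def apply(self, step):
        # Dispatch one named step with its arguments to the matching method.
        handlers = {
            "Action_Define": self.action_define,
            "Action_Implement": self.action_implement,
            "Workflow_Implement": self.workflow_implement,
            "Task_Submit": self.task_submit,
        }
        handlers[step["name"]](**step.get("arguments", {}))


# Hard-coded here; in an agent these steps would be produced one at a time.
steps = [
    {"name": "Action_Define",
     "arguments": {"action": "action_read_sheet", "tool": "googleSheets"}},
    {"name": "Action_Implement",
     "arguments": {"action": "action_read_sheet",
                   "source": "def action_read_sheet(params): ..."}},
    {"name": "Workflow_Implement",
     "arguments": {"source": "def mainWorkflow(trigger_input): ..."}},
    {"name": "Task_Submit"},
]

builder = WorkflowBuilder()
for step in steps:
    builder.apply(step)
print(builder.submitted, sorted(builder.functions))
```

Checking preconditions inside each step (an action must be defined before it is implemented, a `mainWorkflow` must exist before submission) gives the builder a natural place to reject out-of-order steps.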
Figure 5 Example of the ProAgent workflow construction process
In addition, several optimization techniques are introduced to improve ProAgent's performance:
- Testing-on-Constructing: During construction, ProAgent tests the workflow each time it modifies it, to ensure its correctness.
- Function Calling: All workflow construction operations are encapsulated as GPT-4 Functions, which improves control over the construction process.
- Chain-of-Thought: When ProAgent writes workflow code, it must provide comments and a writing plan for each function, which improves the quality of the workflows it builds.
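The Testing-on-Constructing idea from the list above can be sketched as a loop that runs every revision against sample input and keeps the failure message as feedback. The two draft functions stand in for successive revisions an agent might produce; they, the sample input, and the `run_test` helper are hypothetical.

```python
def run_test(workflow, sample_input):
    """Run one draft on sample input and return (passed, feedback)."""
    try:
        workflow(sample_input)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


def draft_v1(row):
    return row["profit"] / row["months"]          # breaks when "months" is absent


def draft_v2(row):
    return row["profit"] / row.get("months", 1)   # revised after the failure


sample = {"profit": 12}
log = []
for draft in (draft_v1, draft_v2):   # stands in for successive agent revisions
    passed, feedback = run_test(draft, sample)
    log.append((draft.__name__, passed, feedback))
    if passed:
        break

for entry in log:
    print(entry)
```

Returning the exception text instead of raising lets the failure be handed back to whatever produced the draft, so the next revision can address it.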
Workflow execution is based on the Python interpreter. Given a workflow, the corresponding mainWorkflow function is used as the entry point, which starts the entire execution process. Execution follows the rules of Python code, that is, statements are executed in order, line by line. Once the mainWorkflow function returns, the workflow has completed successfully.
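A minimal sketch of this execution model, assuming the workflow arrives as Python source text: the source is loaded into a fresh namespace and `mainWorkflow` is called as the entry point. The `run_workflow` helper and the toy workflow are illustrative assumptions, not ProAgent's actual runner.

```python
workflow_source = '''
def action_double(params):
    return {"value": params["value"] * 2}

def mainWorkflow(trigger_input):
    step = action_double({"value": trigger_input["value"]})
    return {"result": step["value"] + 1}
'''


def run_workflow(source: str, trigger_input: dict) -> dict:
    namespace = {}
    exec(source, namespace)                 # defines the workflow's functions
    entry = namespace.get("mainWorkflow")
    if not callable(entry):
        raise ValueError("workflow must define a mainWorkflow function")
    return entry(trigger_input)             # a normal return means success


result = run_workflow(workflow_source, {"value": 20})
print(result)
```

Note that `exec` runs whatever code it is given, so generated workflows would normally be executed in an isolated environment rather than in the host process.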
Feasibility Verification
To verify the feasibility of Agentic Process Automation, this research implements ProAgent using OpenAI GPT-4 as the base model and n8n, an open source RPA platform, as the carrier. We also designed a task that requires both flexibility and efficiency. It is a typical business scenario: profit data for various business lines must be extracted from Google Sheets, and the follow-up action depends on whether each business line is 2B or 2C. If a business line is determined to be 2C, a message is sent to a Slack channel. For 2B business lines, an email is sent to the respective manager, containing an assessment of the business line and a brief profitability overview.
Figure 6 Task Instruction Display
Three things stand out about this task. First, it is repetitive: the same process should be applied to multiple product lines. Second, distinguishing whether a business line is 2C or 2B is difficult, and it requires dynamic decision-making by an agent to determine the subsequent workflow. Finally, writing the evaluation email for a business line requires a certain amount of intelligence, so an agent needs to be involved.
For this task, ProAgent generated a workflow containing four atomic operations, one DataAgent, and one ControlAgent. The overall process is roughly as shown in the figure below:
Figure 7 ProAgent workflow construction process display
It can be seen that ProAgent completes the workflow construction automatically by writing code, without manual intervention. Where it is necessary to determine whether a business line is 2B or 2C, ProAgent introduces a ControlAgent to make the judgment, with its prompt set to "Decide Whether the business line is toC or toB". When the business line is 2B, ProAgent also introduces a DataAgent, whose task is set to "Write an email of the business line of profit, together with your suggestion", so that the agent's intelligence is used to write the email according to the actual situation of each business line.
Once the workflow has been written and fixed, it automatically branches to different logic depending on the data, so that the data is processed efficiently.
Figure 8 ProAgent workflow execution process display
When processing 2C business line data, ControlAgent can determine the type of the current business line from its description and choose the Slack tool for communication. When processing 2B business line data, DataAgent can compose an email and send it to the corresponding manager's mailbox.
Summary
This study proposes a new automation paradigm, Agentic Process Automation, suited to the era of large models. Compared with traditional Robotic Process Automation, Agentic Process Automation can automate the construction of workflows and automate dynamic decisions during workflow execution. The research also developed ProAgent and experimentally demonstrated the feasibility and potential of large-model agents in automation. We believe that large-model agent technology will help humans reach a higher level of automation and free themselves from heavy labor.
Team related research
Currently, the research team has conducted many studies in the direction of large model agents, including:
- XAgent: a very powerful large-model agent application framework that can break down complex tasks on its own and execute them efficiently.
- Project address: https://github.com/OpenBMB/XAgent
- ChatDev: a multi-agent collaborative development framework that lets multiple agents with different roles collaborate to develop software applications automatically.
- Project address: https://github.com/OpenBMB/ChatDev
- AgentVerse: a general platform for large-model-driven agents that recruits a variety of agent experts to work together and help users solve complex tasks.
- Project address: https://github.com/OpenBMB/AgentVerse