
The Beihang University team proposes a new embodied intelligence architecture that lets large models control drones

WBOY (reposted), 2023-12-15 10:49:10

Entering the multi-modal era, large models can also control drones!

When the vision module detects a trigger condition, the large model acting as the "brain" generates action instructions, which the drone then executes quickly and accurately.


The intelligent drone team led by Professor Zhou Yaoming at Beijing University of Aeronautics and Astronautics (Beihang University) has proposed an embodied intelligence architecture based on multimodal large models.

This architecture has already been used to control unmanned aerial vehicles (UAVs). How does this new agent perform, and what are the technical details?

"Agent as the brain"

The research team uses large models to understand multimodal data, integrating multi-source information from the real physical world, such as images, sounds, and sensor data, so that the agent can perceive its surroundings and act accordingly.

At the same time, the team proposed an "Agent as Cerebrum, Controller as Cerebellum" control architecture (the agent is the cerebrum, the controller is the cerebellum):

The agent, acting as the cerebrum's decision generator, focuses on generating high-level behaviors.

The controller, acting as the cerebellum's motion controller, is mainly responsible for converting high-level behaviors (such as desired target points) into low-level system commands (such as rotor speeds).
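To make this division of labor concrete, here is a minimal Python sketch. It assumes a hypothetical `plan_next_waypoint` function standing in for the agent (cerebrum) and a simple proportional controller standing in for the cerebellum. A real flight stack would go further, through attitude and rate loops down to rotor speeds; this sketch stops at a clamped velocity setpoint for brevity, and none of the names come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec3:
    x: float
    y: float
    z: float


def plan_next_waypoint(observation: str) -> Vec3:
    """Hypothetical cerebrum: in the real system an LMM would produce this
    high-level behavior; a fixed rule stands in for it here."""
    if "fire" in observation:
        return Vec3(10.0, 5.0, 20.0)  # fly toward the reported fire, 20 m up
    return Vec3(0.0, 0.0, 20.0)       # otherwise hold over the origin


def velocity_command(current: Vec3, target: Vec3,
                     kp: float = 0.5, v_max: float = 3.0) -> Vec3:
    """Hypothetical cerebellum: proportional control from position error to a
    velocity setpoint, scaled down so its magnitude never exceeds v_max (m/s)."""
    vx = kp * (target.x - current.x)
    vy = kp * (target.y - current.y)
    vz = kp * (target.z - current.z)
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    if speed > v_max:
        scale = v_max / speed
        vx, vy, vz = vx * scale, vy * scale, vz * scale
    return Vec3(vx, vy, vz)


target = plan_next_waypoint("smoke and fire detected to the north-east")
cmd = velocity_command(Vec3(0.0, 0.0, 20.0), target)
print(target, cmd)
```

The agent only ever reasons about waypoints; changing `kp` or `v_max` changes how the drone flies without touching the agent, which is the kind of separation the cerebrum/cerebellum split aims for.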

According to the research team, the work makes three main contributions.

A new system architecture for real-world robots

The research team proposed a new system architecture that can be applied to real robots. In this architecture, the agent built on a multimodal large model is embodied as the cerebrum, while the robot's motion planner and controller are embodied as the cerebellum. The robot's perception system is analogous to human eyes, ears, and other sense organs, and its actuators are analogous to human hands and other effectors.

△Figure 1 Hardware system architecture

These nodes are connected through ROS (Robot Operating System) and communicate either by publishing and subscribing to ROS messages or through ROS service requests and responses. This differs from traditional end-to-end control of robots by large models.

This architecture lets the agent focus on generating high-level commands, making it more capable on high-level tasks while giving better robustness and reliability in actual execution.
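The two ROS communication patterns mentioned above can be illustrated without installing ROS. The Python sketch below is a toy in-process stand-in, not the ROS API (real nodes would use rospy or rclpy). With topics, every subscriber receives each published message and the publisher gets no reply. With services, a single request gets a single response. The topic and service names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class ToyBus:
    """In-process stand-in for ROS topics (publish/subscribe) and
    services (request/response). This is not the ROS API."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Every subscriber of the topic receives the message; no reply is returned.
        for callback in self._subscribers[topic]:
            callback(message)

    def advertise_service(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._services[name] = handler

    def call_service(self, name: str, request: Any) -> Any:
        # Synchronous call: exactly one handler computes and returns the response.
        return self._services[name](request)


bus = ToyBus()
received = []
# Hypothetical controller node listening for high-level goals from the agent.
bus.subscribe("/agent/goal", received.append)
# Hypothetical sensor node answering battery queries.
bus.advertise_service("/drone/get_battery", lambda _req: {"percent": 87})

bus.publish("/agent/goal", {"x": 10.0, "y": 5.0, "z": 20.0})
battery = bus.call_service("/drone/get_battery", {})
```

Topics suit streams such as goals or sensor readings. Services suit one-off queries where the agent needs an answer before it can continue.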

△Figure 2 Software system architecture

A new agent

Within this architecture, the authors built AeroAgent, an agent that serves as the cerebrum.

The agent mainly consists of three parts:

• An automatic plan generation module with multimodal perception and monitoring capabilities, good at handling emergencies that arise while on standby.

• A multimodal data memory module for multimodal memory retrieval and reflection, which gives the agent few-shot learning ability.

• An embodied action module that bridges embodied intelligence and the other modules on ROS for stable control, giving the agent access to other ROS nodes through operations. Completing an action may require several interactions with sensors to obtain the necessary parameters; this ensures that the agent acts on comprehensive situational awareness and the actuators it has, and produces stable output for each specific action.
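The memory module's contribution to few-shot learning can be pictured as retrieval. Past episodes are stored as embedding vectors, and the ones most similar to the current situation are fetched as examples for the next prompt. The Python sketch below assumes precomputed embeddings and cosine similarity. The `EpisodeMemory` class, its notes, and the vectors are all hypothetical, and the paper's actual memory and reflection mechanism may work differently.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class EpisodeMemory:
    """Hypothetical multimodal memory: each entry pairs an embedding (e.g. of an
    image plus its caption) with the note the agent wrote about that episode."""
    entries: List[Tuple[List[float], str]] = field(default_factory=list)

    def add(self, embedding: List[float], note: str) -> None:
        self.entries.append((embedding, note))

    def retrieve(self, query: List[float], k: int = 2) -> List[str]:
        # Rank stored episodes by similarity to the query, most similar first.
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [note for _, note in ranked[:k]]


memory = EpisodeMemory()
memory.add([0.9, 0.1, 0.0], "Episode 1: smoke seen from above; circled once before descending.")
memory.add([0.0, 1.0, 0.0], "Episode 2: landing pad pitching with the waves; waited for a level deck.")
memory.add([0.8, 0.2, 0.1], "Episode 3: fire near a power line; kept extra lateral distance.")

few_shot_notes = memory.retrieve([1.0, 0.0, 0.0], k=2)
```

Here the query resembles the fire-related episodes, so Episodes 1 and 3 are returned and could be placed in the prompt as worked examples.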

△Figure 3 AeroAgent module architecture

A bridge connecting large models and ROS

To link the embodied agent with the ROS robot system, the team designed ROSchain, a bridge connecting LLMs/LMMs with ROS. It ensures that operations generated by the agent are sent to ROS correctly and stably and executed successfully by other nodes, and that information provided by other nodes can be read and understood by the LMM.

Through a set of modules and application programming interfaces (APIs), ROSchain simplifies integrating large models with a robot's sensing devices, execution units, and control mechanisms, providing a stable middleware through which agents can access the ROS system.
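One job of such middleware is to turn free-form model output into a message that ROS nodes can execute safely, and to turn node data back into text the model can read. The Python sketch below assumes the model is prompted to reply in JSON. The action schema, altitude limit, and function names are hypothetical and are not ROSchain's actual API.

```python
import json
from typing import Any, Dict

# Hypothetical whitelist: action name -> required numeric parameters.
ACTION_SCHEMA = {
    "goto": ("x", "y", "z"),
    "hover": ("seconds",),
    "land": (),
}
MAX_ALTITUDE_M = 120.0  # hypothetical safety limit, in metres


def parse_action(model_reply: str) -> Dict[str, Any]:
    """Turn an LMM reply such as '{"action": "goto", "x": 1, "y": 2, "z": 30}'
    into a validated command dict; raise ValueError if malformed or unsafe."""
    data = json.loads(model_reply)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    name = data.get("action")
    if not isinstance(name, str) or name not in ACTION_SCHEMA:
        raise ValueError(f"unknown action: {name!r}")
    command: Dict[str, Any] = {"action": name}
    for key in ACTION_SCHEMA[name]:
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"missing or non-numeric parameter: {key}")
        command[key] = float(value)
    if name == "goto" and not 0.0 < command["z"] <= MAX_ALTITUDE_M:
        raise ValueError("target altitude out of range")
    return command


def describe_state(state: Dict[str, float]) -> str:
    """Render readings from other nodes as a line of text for the model's prompt."""
    return ", ".join(f"{key}={value:.1f}" for key, value in sorted(state.items()))


cmd = parse_action('{"action": "goto", "x": 12, "y": -3, "z": 40}')
state_text = describe_state({"battery_pct": 87.0, "altitude_m": 40.0})
```

Rejecting anything outside a small whitelist is one way to keep a hallucinated or malformed reply from reaching the flight controller; ROSchain itself may handle this differently.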

Why choose drones

The research team gave three reasons for choosing drones to test and simulate the system architecture.

First, most of the web-scale world knowledge in today's LMMs is captured from a third-person perspective, whereas embodied intelligence in fields such as humanoid robots works from a first-person perspective, with the robot standing in for a human subject. A drone's camera, especially a downward-looking one, is closer to a third-person perspective (a "God's-eye view") for embodied intelligence.

In addition, current LMMs, whether deployed locally or accessed through API services, are usually limited by computing resources, which introduces some response latency.

Because a UAV can hover, its mission planning can tolerate such delays, whereas latency is an obstacle to applying these models in fields such as autonomous driving.

Given the current level of technological development, these two points make UAVs well suited as pioneers for verifying the relevant theories and applications.

Second, in industrial drone applications such as wildfire rescue, agriculture, forestry and plant protection, unmanned grazing, and power-line inspection, pilots and experts currently work together on actual operations, so there is genuine industrial demand for intelligent task execution.

Third, looking ahead, multi-agent collaboration is clearly needed in logistics, construction, factories, and other fields. There, a drone, as an embodied agent with a "God's-eye view", is well suited to act as the central leader node that allocates tasks, while other robots can be treated as the drone's actuators. This gives the research promising prospects for future development.

The team ran simulation experiments in the AirGen simulator, using deep reinforcement learning (DRL) and other methods as baselines. The results are as follows:

In the wildfire search-and-rescue scenario, AeroAgent achieved the top normalized score of 100, averaging 2.04 points per step.

Agents that simply call an LLM, as well as DRL-based agents, scored only 29.4, averaging 0.2 points per step, less than one-tenth of AeroAgent's per-step score.
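The "less than one-tenth" comparison follows directly from the two per-step averages reported above. A quick Python check, using only those two figures:

```python
aeroagent_per_step = 2.04  # reported AeroAgent average per step (wildfire scenario)
baseline_per_step = 0.2    # reported average per step for the LLM-only / DRL agents

ratio = baseline_per_step / aeroagent_per_step
print(f"{ratio:.3f}")  # 0.098, i.e. just under one-tenth
```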

△Figure 4-1 Wildfire rescue scenario

In the landing task, AeroAgent also outperformed the other models, with an overall score of 97.4 and an average of 48.7 points per step.

△Figure 4-2 Offshore landing platform scenario

In the wind turbine inspection test, AeroAgent was the only model able to complete the task.

△Figure 4-3 Wind turbine inspection scenario

In the navigation task, AeroAgent's per-step score of 4.44 was 40 times that of DRL and nearly 10 times that of the pure LLM agent.
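Working backwards from AeroAgent's 4.44 points per step, the stated multiples imply per-step scores of about 0.111 for DRL and about 0.444 for the pure LLM agent. The LLM figure is probably slightly higher than that, because the ratio is only "nearly" 10. These implied values are derived here and are not numbers reported by the team:

```python
aeroagent_per_step = 4.44  # reported AeroAgent per-step score (navigation task)
drl_multiple = 40          # "40 times" the DRL baseline
llm_multiple = 10          # "nearly 10 times" the pure LLM agent

implied_drl = aeroagent_per_step / drl_multiple
implied_llm = aeroagent_per_step / llm_multiple
print(f"DRL ~ {implied_drl:.3f}, pure LLM ~ {implied_llm:.3f}")
```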

△Figure 4-4 AirGen simulation experiment

The team also tested the UAV system in a real-world scene, using a simple experiment on guiding trapped people as a case study.

△Figure 5 Case study: guiding trapped people

Building on this work, the team is currently running experiments with intelligent drones for unmanned grazing at a yak ranch on a plateau, to explore their practical application. With embodied intelligence as its goal, the team will also explore applying intelligent agents in cooperation with other robots and in multi-robot systems.

Paper address: https://arxiv.org/abs/2311.15033


Statement: This article is reproduced from 51cto.com.