Beihang University team proposes a new embodied intelligence architecture that lets large models control drones


WBOY

Dec 15, 2023 am 10:49 AM

aiModel

Entering the multi-modal era, large models can also control drones!

When the vision module captures the start condition, the large model acting as the "brain" generates action instructions, and the drone then executes them quickly and accurately.


Researchers from the intelligent drone team at Beihang University (Beijing University of Aeronautics and Astronautics), led by Professor Zhou Yaoming, have proposed an embodied intelligence architecture based on multi-modal large models.

This architecture has already been used to control unmanned aerial vehicles. How does the new agent perform, and what are the technical details?

"Agent is the brain"

The research team uses large models to understand multi-modal data, integrating multi-source information from the real physical world such as photos, sounds, and sensor data, so that the agent can perceive its surroundings and act accordingly.

At the same time, the team proposed a control architecture called "Agent as Cerebrum, Controller as Cerebellum" (the agent is the brain, the controller is the cerebellum):

As the decision generator, the brain, the agent focuses on generating high-level behaviors.

As the motion controller, the cerebellum, the controller is mainly responsible for converting high-level behaviors (such as desired target points) into low-level system commands (such as rotor speeds).
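
The division of labor described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TargetPoint` type, the `cerebellum_altitude_step` function, the proportional control law, and all numeric constants are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TargetPoint:
    """High-level behavior produced by the agent: a desired position in metres."""
    x: float
    y: float
    z: float


def cerebellum_altitude_step(target: TargetPoint, current_z: float,
                             hover_rpm: float = 5000.0, gain: float = 400.0,
                             max_rpm: float = 8000.0) -> float:
    """Turn a high-level altitude target into a low-level rotor speed (RPM).

    A plain proportional law is used purely for illustration; the constants
    are made up and do not come from the paper.
    """
    error = target.z - current_z          # positive when the drone is too low
    rpm = hover_rpm + gain * error        # spin faster to climb, slower to descend
    return max(0.0, min(max_rpm, rpm))    # clamp to the actuator limits


# The "cerebrum" only decides where to go; it never touches rotor speeds.
goal = TargetPoint(x=10.0, y=0.0, z=5.0)
print(cerebellum_altitude_step(goal, current_z=3.0))  # 5800.0
```

The point of the split is that the agent's output stays at the level of `TargetPoint`, while everything that depends on the airframe lives in the controller.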

Specifically, the research team believes that this achievement has three main contributions.

New system architecture applied to actual situations

The research team proposed a new system architecture that can be applied to actual robots. In this architecture, the agent based on the multi-modal large model is embodied as the brain, while the robot's motion planner and controller are embodied as the cerebellum. The robot's perception system is analogous to human information-gathering organs such as the eyes and ears, and the robot's actuators are analogous to human effectors such as the hands.

△Figure 1 Hardware system architecture

These nodes are connected through ROS and communicate either by subscribing to and publishing ROS messages or by requesting and responding to ROS services. This differs from traditional end-to-end large-model robot control.

This architecture lets the Agent focus on generating high-level commands, making it more intelligent on high-level tasks and more robust and reliable in actual execution.
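
To make the message flow between nodes concrete, here is a toy sketch of topic-based publish and subscribe. It is an in-process stand-in written in plain Python, not the ROS API; the `ToyBus` class and the topic names `/perception/detections` and `/agent/target_point` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class ToyBus:
    """In-process stand-in for topic-based publish/subscribe (not the ROS API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every callback registered on this topic.
        for callback in self._subscribers[topic]:
            callback(message)


bus = ToyBus()
received_by_controller: List[dict] = []

# Controller node ("cerebellum") listens for high-level targets.
bus.subscribe("/agent/target_point", received_by_controller.append)


# Agent node ("cerebrum") reacts to a perception message by publishing a target.
def on_detection(msg: dict) -> None:
    if msg["label"] == "fire":
        bus.publish("/agent/target_point", {"x": msg["x"], "y": msg["y"], "z": 20.0})


bus.subscribe("/perception/detections", on_detection)

# Perception node publishes what the camera saw.
bus.publish("/perception/detections", {"label": "fire", "x": 12.0, "y": -3.0})
print(received_by_controller)  # [{'x': 12.0, 'y': -3.0, 'z': 20.0}]
```

Because the nodes only share topic names, the agent can be swapped or upgraded without changing the controller, which is the decoupling the architecture relies on.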

△Figure 2 Software system architecture

New Agent

Under this architecture, the authors built AeroAgent, an intelligent agent that serves as the brain.

The agent mainly consists of three parts:

An automatic plan generation module, which has multi-modal sensing and monitoring capabilities and is good at handling emergencies while in standby mode.

A multi-modal data memory module, which supports multi-modal memory retrieval and reflection and gives the agent few-shot learning ability.

The third part handles action execution: completing an action may take several rounds of interaction to obtain the required parameters from the sensors, so that the agent can stably output specific actions based on comprehensive situational awareness and the actuators available to it.
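
The multi-round interaction mentioned in the third part can be sketched as a loop that keeps querying sensors until every parameter of an action is known. This is only an illustration under stated assumptions: the `ACTION_PARAMS` schema, the `gather_parameters` function, the parameter names, and the simulated sensors are all hypothetical and do not describe AeroAgent's actual implementation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical action schema: the parameters an action needs before it can run.
ACTION_PARAMS: Dict[str, List[str]] = {"land": ["pad_x", "pad_y", "wind_speed"]}


def gather_parameters(action: str,
                      sensors: Dict[str, Callable[[], Optional[float]]],
                      max_rounds: int = 3) -> Dict[str, float]:
    """Query sensors over several rounds until every required parameter is known."""
    needed = ACTION_PARAMS[action]
    values: Dict[str, float] = {}
    for _ in range(max_rounds):
        for name in needed:
            if name not in values:
                reading = sensors[name]()   # one interaction with a sensor
                if reading is not None:     # None means "no reading yet"
                    values[name] = reading
        if len(values) == len(needed):
            return values
    missing = [name for name in needed if name not in values]
    raise RuntimeError(f"cannot run {action!r}: missing {missing}")


# Simulated sensors; the wind sensor only answers on its second query.
wind_readings = iter([None, 4.2])
sensors = {
    "pad_x": lambda: 1.5,
    "pad_y": lambda: -0.5,
    "wind_speed": lambda: next(wind_readings),
}
params = gather_parameters("land", sensors)
print(params)  # {'pad_x': 1.5, 'pad_y': -0.5, 'wind_speed': 4.2}
```

Refusing to act until the parameter set is complete is one simple way to keep the output of specific actions stable.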

△Figure 3 AeroAgent module architecture

Bridge connecting large models and ROS

The team wanted a bridge between the embodied agent and the ROS robot system, so that operations generated by the Agent are sent to ROS correctly and stably and executed successfully by other nodes, and so that information provided by other nodes can be read and understood by the LMM. To this end, the team designed ROSchain, a bridge connecting LLMs/LMMs and ROS.

Through a set of modules and application programming interfaces (APIs), ROSchain simplifies the integration of large models with robot sensing devices, execution units, and control mechanisms, providing stable middleware through which agents access the ROS system.
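
One job such a bridge has to do is make sure that free-form model output becomes a well-formed command before it reaches the robot side. The sketch below shows that idea only; it is not ROSchain's API. The `ALLOWED_ACTIONS` table, the `parse_model_action` function, and the JSON reply format are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Set

# Hypothetical whitelist of actions the downstream nodes can execute,
# with the parameters each one requires.
ALLOWED_ACTIONS: Dict[str, Set[str]] = {
    "goto": {"x", "y", "z"},
    "hover": {"seconds"},
}


def parse_model_action(raw_reply: str) -> Dict[str, Any]:
    """Validate a model reply before it is forwarded to the robot side."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")
    name = reply.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    params = reply.get("params")
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    missing = ALLOWED_ACTIONS[name] - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    # Keep only the expected parameters and coerce them to floats.
    return {"action": name,
            "params": {key: float(params[key]) for key in sorted(ALLOWED_ACTIONS[name])}}


command = parse_model_action('{"action": "goto", "params": {"x": 12, "y": -3, "z": 20}}')
print(command)
```

Rejecting malformed replies at the boundary means the other nodes only ever see commands they know how to execute.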

Why choose drones

The research team gave three reasons for choosing drones to test and simulate the system architecture.

First, most of the web-scale world knowledge contained in today's LMMs is from a third-person perspective, whereas embodied intelligence in fields such as humanoid robots resembles a first-person perspective with the human as the subject. The camera on a drone, especially a downward-looking camera, is closer to a third-person ("God's-eye") perspective for embodied intelligence.

In addition, LMMs at the current stage, whether deployed as models or accessed through API services, are usually limited by computing resources, which causes some delay in response.

Because a UAV can hover, its mission planning can tolerate such delays, whereas the same delays are an obstacle to applications in fields such as autonomous driving.

Given these two points, at the current level of technological development UAVs are well suited to serve as pioneers for verifying the relevant theories and applications.

Second, in industrial drone fields such as wildfire rescue, agriculture, forestry and plant protection, unmanned grazing, and power inspection, pilots and experts currently cooperate to carry out operations, so there is industrial demand for intelligent task execution.

Third, from the perspective of future development, multi-agent collaboration has clear demand in logistics, construction, factories, and other fields. Here drones, as embodied intelligence with a "God's-eye" perspective, are suited to act as the leader at the central node that allocates tasks, while other robots can be regarded as the drone's actuators, so this research also has prospects for future development.

The team ran simulation experiments in the Airgen simulator and selected DRL and other methods as a control group. The experimental results are as follows:

In the wildfire search and rescue scenario, AeroAgent achieved a standardized score of 100, averaging 2.04 points per step.

Agents that simply call an LLM and DRL-based agents scored only 29.4 points, averaging 0.2 points per step, less than one-tenth of AeroAgent's per-step average.
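
The "less than one-tenth" comparison can be checked directly from the two per-step averages quoted above. The variable names are illustrative only.

```python
# Per-step averages reported for the wildfire search and rescue scenario.
aeroagent_per_step = 2.04
baseline_per_step = 0.2

ratio = baseline_per_step / aeroagent_per_step
print(f"{ratio:.3f}")  # 0.098, i.e. just under one-tenth
```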

△Figure 4-1 Wildfire rescue scene

In the landing mission, AeroAgent likewise outperformed the other models, with an overall score of 97.4 and an average score per step of 48.7.

△Figure 4-2 Sea apron landing scene

And in the wind turbine inspection test, AeroAgent was the only model able to complete the task.

△Figure 4-3 Wind turbine inspection scenario

In the navigation task, AeroAgent's average score per step of 4.44 was 40 times that of DRL and nearly 10 times that of the pure LLM.

△Figure 4-4 Airgen simulation experiment

The team also tested the UAV system in a real scene, using a simple experiment on guiding trapped people as a case study.

△Figure 5 Case experiment of guiding trapped people

Building on this work, the team is currently running experiments with intelligent drones for unmanned grazing at a yak ranch on a plateau to explore the possibility of practical application. With embodied intelligence as the goal, it will also explore how the agent can cooperate with other robots and in multi-robot settings.

Paper address: https://arxiv.org/abs/2311.15033


This article is reproduced from 51CTO.COM.
