


The Beihang University team proposes a new embodied intelligence architecture that lets large models control drones
Entering the multi-modal era, large models can also control drones!
When the vision module captures the starting conditions, the large-model "brain" generates action instructions, which the drone then executes quickly and accurately.
Researchers from the intelligent drone team at Beihang University (Beijing University of Aeronautics and Astronautics), led by Professor Zhou Yaoming, have proposed an embodied intelligence architecture based on large multi-modal models (LMMs).
The architecture has already been used to control unmanned aerial vehicles (UAVs). How does this new agent perform, and what are the technical details?
"Agent is the brain"
The team describes this as an "agent is the brain, controller is the cerebellum" control architecture. As the brain's decision generator, the agent focuses on generating high-level behaviors. As the cerebellum's motion controller, the controller is mainly responsible for converting those high-level behaviors (such as expected target points) into low-level system commands (such as rotor speeds).
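The split between brain and cerebellum can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the team's implementation: `agent_decide` replaces the large-model agent with a trivial rule, and `controller_step` is a simple proportional controller that outputs a bounded velocity command instead of actual rotor speeds.

```python
import math
from dataclasses import dataclass


@dataclass
class TargetPoint:
    """High-level behavior emitted by the agent (the "brain")."""
    x: float
    y: float
    z: float


@dataclass
class VelocityCommand:
    """Low-level command produced by the controller (the "cerebellum")."""
    vx: float
    vy: float
    vz: float


def agent_decide(observation: str) -> TargetPoint:
    """Hypothetical stand-in for the large-model agent."""
    # A real agent would query a multi-modal model here; this rule is an assumption.
    if "fire" in observation:
        return TargetPoint(10.0, 0.0, 5.0)
    return TargetPoint(0.0, 0.0, 5.0)  # otherwise hold 5 m above the start point


def controller_step(position, target, gain=0.5, max_speed=2.0) -> VelocityCommand:
    """Proportional controller: turn a target point into a bounded velocity command."""
    error = (target.x - position[0], target.y - position[1], target.z - position[2])
    velocity = [gain * e for e in error]
    speed = math.sqrt(sum(c * c for c in velocity))
    if speed > max_speed:
        # Scale the vector down so its magnitude equals max_speed.
        velocity = [c * max_speed / speed for c in velocity]
    return VelocityCommand(*velocity)


target = agent_decide("fire detected to the east")
command = controller_step((0.0, 0.0, 5.0), target)
```

The agent only ever produces a target point, so a slow model call delays the choice of the next goal without interrupting the fast control loop that tracks the current one.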
Specifically, the research team believes that this achievement has three main contributions.
New system architecture applied to actual situations
The research team proposed a new system architecture that can be applied to actual robots. In this architecture, the agent based on a large multi-modal model serves as the brain, while the robot's motion planner and controller serve as the cerebellum. The robot's perception system is analogous to human information-gathering organs such as the eyes and ears, and the robot's actuators are analogous to human effectors such as the hands.
△Figure 1 Hardware system architecture
These nodes are connected through ROS and communicate either by publishing and subscribing to ROS messages or by requesting and responding to ROS services. This differs from traditional end-to-end large-model control of robots.
△Figure 2 Software system architecture
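The two communication patterns mentioned for the ROS nodes, publish/subscribe on topics and request/response on services, can be sketched in plain Python. The `ToyBus` class below is an in-process illustration only and is not the ROS API; the topic and service names are also assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class ToyBus:
    """In-process stand-in for ROS-style topics and services (not the ROS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # One-to-many and fire-and-forget: every subscriber gets the message.
        for callback in self._subscribers[topic]:
            callback(message)

    def advertise_service(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._services[name] = handler

    def call_service(self, name: str, request: Any) -> Any:
        # One-to-one: the caller waits for the handler's response.
        return self._services[name](request)


bus = ToyBus()
received: List[dict] = []

# A hypothetical controller node listens for target points from the agent.
bus.subscribe("/agent/target_point", received.append)
# A hypothetical perception node answers on-demand queries.
bus.advertise_service("/perception/get_altitude", lambda request: {"altitude_m": 5.0})

bus.publish("/agent/target_point", {"x": 10.0, "y": 0.0, "z": 5.0})
altitude = bus.call_service("/perception/get_altitude", {})
```

Topics suit continuous streams such as target points or camera frames, while services suit occasional questions that need an answer before the caller can proceed.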
New Agent
- An automatic plan generation module, which has multi-modal perception and monitoring capabilities and is good at handling emergencies while in standby mode.
- A multi-modal data memory module, which supports multi-modal memory retrieval and reflection and gives the agent the ability to learn from few samples.
- An embodied intelligent action module, which builds a bridge for stable control between the embodied agent and the other modules on ROS. Using actions as the bridge, this module gives the agent access to other nodes on ROS. Completing an action may require multiple interactions to obtain the necessary parameters from the sensors, so that the agent acts on comprehensive situational awareness and the actuators available to it, and outputs specific actions stably.
△Figure 3 AeroAgent module architecture design
Bridge connecting large models and ROS
Why choose drones
The research team gave three reasons for choosing drones to test and simulate the system architecture.
First, most of the web-scale world knowledge contained in today's LMMs is from a third-person perspective. Embodied intelligence in fields such as humanoid robots takes a human-like first-person perspective, whereas the camera on a drone, especially a downward-looking camera, gives the embodied agent something closer to a third-person ("God's-eye") perspective. In addition, LMMs at the current stage, whether deployed as models or accessed through API services, are usually limited by computing resources, which causes some delay in response. Because a drone can hover, its mission planning can tolerate this delay, whereas the same delay is an obstacle to applications in fields such as autonomous driving. Both points make UAVs suitable pioneers for verifying the relevant theories and applications at the current level of technological development.
Second, in industrial drone applications such as wildfire rescue, agricultural and forestry plant protection, unmanned grazing, and power inspection, pilots and experts currently work together in actual operations, so there is industrial demand for intelligent task execution.
Third, from the perspective of future development, multi-agent collaboration has obvious demand in logistics, construction, factories and other fields. There, drones, as embodied intelligence with a "God's-eye" perspective, are suited to serve as the central node that allocates tasks, and other robots can be regarded as the drones' actuators, so this research also has future development prospects.
The team ran simulation experiments in the AirGen simulator and selected DRL and other methods as control groups. The experimental results are as follows. In the wildfire search and rescue scenario, AeroAgent achieved a standardized score of 100 points, with an average of 2.04 points per step.
△Figure 4-1 Wildfire rescue scenario
△Figure 4-2 Sea apron landing scenario
△Figure 4-3 Wind turbine inspection scenario
△Figure 4-4 AirGen simulation experiment
△Figure 5 Case experiment of guiding trapped people
Paper address: https://arxiv.org/abs/2311.15033
