Sweeping 99 subtasks with MoE! Zhejiang University and others propose GeRM, a new general robot policy


王林

Apr 17, 2024 pm 11:40 PM


Multi-task robot learning is of great significance in dealing with diverse and complex scenarios. However, current methods are limited by performance issues and difficulties in collecting training datasets.

This paper proposes GeRM (General Robot Model). The researchers use offline reinforcement learning to optimize the data utilization strategy, learning from both demonstrations and sub-optimal data and thereby surpassing the limitations of human demonstrations.


Authors: Song Wenxuan, Zhao Han, Ding Pengxiang, Cui Can, Lu Shangke, Fan Yaning, Wang Donglin

Affiliations: Westlake University, Zhejiang University

Paper address: https://arxiv.org/abs/2403.13358

Project address: https://songwxuan.github.io/GeRM/

GeRM uses a Transformer-based vision-language-action model to process multimodal inputs and output actions.

By introducing a Mixture-of-Experts (MoE) structure, GeRM achieves faster inference and higher overall model capacity. This addresses the limited parameter count of reinforcement learning models and improves performance in multi-task learning while keeping computational cost under control.

A series of experiments shows that GeRM outperforms other methods on all tasks and confirms its efficiency in both training and inference.

In addition, the researchers provide the QUARD-Auto dataset to support training. The dataset was built following the new paradigm for automated data collection proposed in the paper. This approach reduces the cost of collecting robot data and is intended to drive progress in the multi-task learning community.

Main contributions:

1. The first Mixture-of-Experts model for quadruped reinforcement learning, trained on mixed-quality data with the potential to learn optimal policies.

2. Compared with existing methods, GeRM achieves a higher success rate while activating only 1/2 of its parameters, exhibits emergent capabilities, and demonstrates a better data utilization strategy during training.

3. A paradigm for fully automatic robot dataset collection, together with a large-scale open-source dataset collected with it.

Method

The GeRM network structure is shown in Figure 1. The vision-language input, which includes both demonstration data and failure data, passes through the encoder and the tokenizer respectively and is then fed into an 8-layer decoder with a Mixture-of-Experts structure. The decoder generates action tokens, which are finally converted into discrete robot action data and deployed on the robot through the low-level policy. In addition, we use reinforcement learning for training.
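To make the step from action tokens to robot actions concrete, here is a minimal sketch of uniform binning, a common way to map between continuous commands and discrete tokens. The article does not describe GeRM's actual tokenizer, so the bin count (`NUM_BINS = 256`), the value range, and both function names are assumptions chosen for illustration.

```python
NUM_BINS = 256  # assumption: number of discrete bins per action dimension


def tokenize_action(value, low, high, num_bins=NUM_BINS):
    """Map a continuous action value to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # The upper bound itself would land in bin `num_bins`, so clamp it.
    return min(int(fraction * num_bins), num_bins - 1)


def detokenize_action(token, low, high, num_bins=NUM_BINS):
    """Map a bin index back to the centre of its bin."""
    if not 0 <= token < num_bins:
        raise ValueError(f"token {token} outside [0, {num_bins})")
    width = (high - low) / num_bins
    return low + (token + 0.5) * width


# Hypothetical example: a forward-velocity command normalised to [-1, 1].
token = tokenize_action(0.3, low=-1.0, high=1.0)
recovered = detokenize_action(token, low=-1.0, high=1.0)
```

With this scheme the round-trip error is at most half a bin width, which is the price paid for letting the decoder predict actions as ordinary tokens.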


Figure 1 GeRM network structure diagram

The GeRM decoder is a Transformer decoder in which the feed-forward network (FFN) is selected from a set of 8 distinct expert networks.

At each layer, for each token, the gating network selects two experts to process the token and combines their outputs as a weighted sum.
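The routing described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not GeRM's implementation: the expert count (8) and top-2 selection come from the article, while the tiny dimensions, the random weights, the ReLU experts, and the choice to renormalise the gate scores with a softmax over the two selected experts are all assumptions.

```python
import math
import random

NUM_EXPERTS = 8  # from the article: 8 expert networks
TOP_K = 2        # from the article: two experts per token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_expert(dim, hidden, rng):
    """Hypothetical expert: a two-layer FFN with ReLU, randomly initialised."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(token):
        h = [max(0.0, x) for x in matvec(w1, token)]
        return matvec(w2, h)

    return expert


def moe_forward(token, gate_weights, experts, top_k=TOP_K):
    """Route one token to its top-k experts and mix their outputs."""
    logits = matvec(gate_weights, token)  # one gate logit per expert
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Assumption: weights are a softmax over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for w, idx in zip(weights, chosen):
        expert_out = experts[idx](token)  # only the chosen experts run
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, chosen, weights


rng = random.Random(0)
DIM, HIDDEN = 4, 16  # toy sizes for illustration
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
gate = [[rng.gauss(0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
out, chosen, weights = moe_forward(token, gate, experts)
```

The six unselected experts are never evaluated for this token, which is where the compute saving comes from.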

Different experts specialize in different tasks or action dimensions and handle different scenarios, so the network learns a single general model across multiple tasks. This architecture increases the number of network parameters while keeping the computational cost essentially unchanged.
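A quick back-of-the-envelope count shows why capacity can grow while per-token compute stays roughly flat. The layer sizes below (`d_model=512`, `d_hidden=2048`) are hypothetical and not taken from the paper; only the 8 experts and top-2 routing come from the article.

```python
def ffn_params(d_model, d_hidden):
    """Weights plus biases of a two-layer feed-forward block."""
    return (d_model * d_hidden + d_hidden) + (d_hidden * d_model + d_model)


def moe_layer_params(d_model, d_hidden, num_experts, top_k):
    """Return (total, active-per-token) parameter counts for one MoE FFN layer."""
    per_expert = ffn_params(d_model, d_hidden)
    gate = d_model * num_experts  # assumption: linear gating layer without bias
    total = num_experts * per_expert + gate
    active = top_k * per_expert + gate
    return total, active


# Hypothetical sizes, chosen only to illustrate the ratio.
total, active = moe_layer_params(d_model=512, d_hidden=2048, num_experts=8, top_k=2)
ratio = active / total
print(f"total={total:,} active={active:,} ratio={ratio:.3f}")
```

Under these assumptions each token touches about a quarter of the MoE FFN parameters. The activated share of a whole model is larger than that, because attention layers, embeddings, and other shared components run for every token.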


Figure 2 Decoder structure diagram

We propose an automatic paradigm for collecting multimodal robot data. Using it, we constructed QUARD-Auto, a large-scale robotics dataset that combines demonstration data with sub-optimal data. It covers 5 tasks and 99 subtasks, with a total of 257k trajectories. We will open-source it to promote the development of the robotics community.


Table 1 Introduction to the data set


Figure 3 Data volume statistics

Experiments

We conducted a comprehensive and robust series of experiments covering all 99 subtasks, each of which was carefully tested on 400 trajectories.

As shown in Table 2, GeRM achieves the highest success rate across all tasks. Compared with RT-1 and other GeRM variants, it learns effectively from mixed-quality data and shows superior capability across multiple tasks. At the same time, the MoE module balances computational cost and performance by activating only a subset of parameters during inference.


Table 2 Multi-task comparison experiment

GeRM also trains efficiently. Compared with other methods, it reaches a very low loss and a high success rate after only a few batches, highlighting its ability to optimize the data utilization strategy.


Figure 4 Success rate/Loss change curve

GeRM demonstrates an emergent ability for dynamic adaptive path planning. As shown in the video, the quadruped robot has a limited field of view at its initial position, which makes it difficult to determine the direction of movement. To avoid the obstacle, it randomly chooses to turn left.

Subsequently, after encountering erroneous visual input, the robot performs a substantial reorientation to align with the correct target, which lay outside its original field of view. It then continues toward its destination and ultimately completes the task.

It is worth noting that such trajectories do not belong to the distribution of our training data set. This demonstrates GeRM's emergent capabilities for dynamic adaptive path planning in the context of a scene, i.e., its ability to make decisions based on visual perception, plan future paths, and change next steps as needed.


Figure 5 Emergent Capability


Statement

This article is reproduced from 51CTO.COM.
