
Breaking the barrier between language and robots: MIT and others use GPT-4 to automatically generate simulation tasks and transfer them to the real world

In robotics, learning a general-purpose policy requires a large amount of data, and collecting that data in the real world is time-consuming and laborious. Simulation offers an economical way to generate data at the scene and instance levels, but increasing task diversity in simulated environments still demands substantial human effort, especially for complex tasks. As a result, typical human-curated simulation benchmarks contain only tens to hundreds of tasks.

How can this be solved? In recent years, large language models (LLMs) have made continued progress in natural language processing and code generation across a variety of tasks. LLMs have likewise been applied to many aspects of robotics, including user interfaces, task and motion planning, robot log summarization, and cost and reward design, revealing strong capabilities on both physically grounded and code-generation tasks.

In a recent study, researchers from MIT CSAIL, Shanghai Jiao Tong University, and other institutions further explored whether LLMs can be used to create diverse simulation tasks, and whether their capabilities can then be distilled into robot policies.

Specifically, the researchers proposed GenSim, an LLM-based framework that provides an automated mechanism for designing and verifying task asset arrangements and task progress. More importantly, the generated tasks exhibit great diversity, promoting task-level generalization of robot policies. Conceptually, with GenSim, the reasoning and coding capabilities of LLMs are distilled into language-vision-action policies through the intermediate synthesis of simulated data.


Paper address: https://arxiv.org/pdf/2310.01361.pdf

The GenSim framework consists of the following three parts:

  • First, a prompting mechanism that proposes new tasks via natural language instructions together with their corresponding code implementations;
  • Second, a task library that caches previously generated high-quality instruction code for verification and language-model fine-tuning, and returns it as a comprehensive task dataset;
  • Finally, a language-conditioned multi-task policy training procedure that uses the generated data to enhance task-level generalization.

The framework operates in two different modes. In the goal-directed setting, the user has a specific task in mind or wishes to design a task curriculum; here GenSim takes a top-down approach, taking the desired task as input and iteratively generating related tasks to reach the desired goal. In the exploratory setting, where prior knowledge of the target task is lacking, GenSim gradually explores beyond the existing tasks and builds a task-agnostic foundation policy.
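The two modes described above can be sketched as simple generation loops. This is an illustrative sketch only: the function names are assumptions, and a stub stands in for the actual GPT-4 call used by GenSim.

```python
def stub_llm(prompt: str) -> str:
    """Placeholder for a GPT-4 call; returns a canned task name."""
    return f"task-{abs(hash(prompt)) % 1000}"

def goal_directed(target_task: str, steps: int = 3) -> list[str]:
    """Top-down: iteratively propose tasks building toward a given target."""
    tasks = []
    current = target_task
    for _ in range(steps):
        current = stub_llm(f"Propose a task building toward: {current}")
        tasks.append(current)
    return tasks

def exploratory(task_library: list[str], steps: int = 3) -> list[str]:
    """Bottom-up: propose tasks beyond what the library already contains."""
    tasks = list(task_library)
    for _ in range(steps):
        tasks.append(stub_llm(f"Propose a task unlike any of: {tasks}"))
    return tasks
```

In the goal-directed loop each new task conditions the next prompt, while the exploratory loop conditions only on the growing library, matching the top-down vs. open-ended distinction drawn above.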

In Figure 1 below, the researchers initialized a task library containing 10 manually curated tasks and used GenSim to expand it to more than 100 tasks.


The researchers also proposed several customized metrics to progressively measure the quality of the generated simulation tasks, and evaluated several LLMs in both the goal-directed and exploratory settings. For the task library generated by GPT-4, they performed supervised fine-tuning on LLMs such as GPT-3.5 and Code Llama, further improving their task-generation performance. At the same time, task achievability was quantitatively measured through policy training, and the researchers provided task statistics across different attributes along with code comparisons between the models.

Beyond that, the researchers trained multi-task robot policies on all the generated tasks; compared with models trained only on human-designed tasks, these policies generalized better and achieved improved zero-shot generalization. Joint training with GPT-4-generated tasks improved generalization performance by 50% and enabled roughly 40% zero-shot transfer to new tasks in simulation.

Finally, the researchers also considered simulation-to-real transfer, showing that pre-training on diverse simulation tasks can improve real-world generalization by 25%.

In summary, policies trained on tasks generated by different LLMs achieve better task-level generalization to new tasks, demonstrating the potential of training foundation policies by scaling up simulation tasks with LLMs.

Shubham Saboo, Director of AI Product Management at Tenstorrent, praised this research highly, calling it breakthrough work combining GPT-4 with robots: using LLMs such as GPT-4 to generate a series of simulated robot tasks on autopilot makes zero-shot learning and real-world adaptation for robots a reality.


Method introduction

As shown in Figure 2 below, the GenSim framework generates simulation environments, tasks, and demonstrations through procedural synthesis. The GenSim pipeline starts from the task creator, and the prompt chain runs in two modes, goal-directed and exploratory, depending on the target task. The task library in GenSim is an in-memory component that stores previously generated high-quality tasks; the tasks it stores can be used for multi-task policy training or for fine-tuning LLMs.


Task Creator

As shown in Figure 3 below, the language chain first generates a task description and then the corresponding implementation. The task description includes the task name, assets, and a task summary. The study uses few-shot prompting in the pipeline to generate code.
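The two-stage chain can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `call_llm` is a stub, and the dictionary fields are assumed names rather than the paper's exact schema.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., GPT-4)."""
    return "stub-response"

def create_task(reference_tasks: list[dict]) -> dict:
    """Two-stage prompt chain: description first, then code."""
    # Stage 1: a few-shot prompt built from reference tasks yields a description.
    shots = "\n".join(f"{t['name']}: {t['summary']}" for t in reference_tasks)
    description = call_llm(
        f"Examples:\n{shots}\nPropose a new task description."
    )
    # Stage 2: the generated description conditions the code-generation prompt.
    code = call_llm(f"Implement this task in the simulator API:\n{description}")
    return {"description": description, "code": code}
```

The key point mirrored from the text is that code generation is conditioned on the already-generated description, not produced in a single shot.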


Task Library

In the GenSim framework, the task library stores tasks generated by the task creator, both to generate better new tasks and to train multi-task policies. The task library is initialized with tasks from a manually created benchmark.

The task library provides the task creator with previous task descriptions as conditioning for the description-generation stage, provides previous code for the code-generation stage, and prompts the task creator to select reference tasks from the library as examples when writing a new task. After a task implementation is complete and all tests pass, the LLM is prompted to "reflect" on the new task and the task library and to form a holistic decision on whether the newly generated task should be added to the library.
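The admission flow just described (tests must pass, then a reflection gate decides) can be sketched as below. The class name and the reflection criterion are assumptions for illustration; the real reflection step is an LLM judgment, stubbed here as a simple novelty check.

```python
class TaskLibrary:
    """In-memory store of generated tasks, keyed by task name."""

    def __init__(self, seed_tasks: dict):
        self.tasks = dict(seed_tasks)  # name -> code

    def reflect(self, name: str, code: str) -> bool:
        """Stub for the LLM 'reflection' step: accept only novel tasks."""
        return name not in self.tasks

    def maybe_add(self, name: str, code: str, tests_passed: bool) -> bool:
        """Admit a task only if its tests pass and reflection approves it."""
        if tests_passed and self.reflect(name, code):
            self.tasks[name] = code
            return True
        return False
```

In the actual system the reflection prompt weighs quality and diversity against the whole library, but the gating structure is the same: verification first, then a holistic accept/reject decision.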

As shown in Figure 4 below, the study also observed that GenSim exhibits interesting task-level composition and extrapolation behaviors:


LLM-Supervised Multi-Task Policy

After the tasks are generated, the study uses these task implementations to generate demonstration data and train manipulation policies, using a two-stream transporter network architecture similar to Shridhar et al. (2022).

As shown in Figure 5 below, the study treats the program as an effective representation of a task and its associated demonstration data. An embedding space can then be defined over tasks, with a distance metric that is more robust to perception-derived factors such as object pose and shape.
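As a hedged stand-in for such a program-based task embedding, the sketch below measures task similarity by Jaccard overlap of the programs' identifier tokens. This is purely illustrative; the paper's actual distance metric may differ, but it conveys why a code-level representation ignores perceptual factors like object pose.

```python
import re

def tokens(code: str) -> set[str]:
    """Extract identifier-like tokens from a task program."""
    return set(re.findall(r"[A-Za-z_]+", code))

def task_distance(code_a: str, code_b: str) -> float:
    """1 - Jaccard similarity of the two programs' token sets."""
    a, b = tokens(code_a), tokens(code_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

Two programs that differ only in object pose parameters share almost all tokens, so they land close together in this space, whereas programs implementing different manipulation logic are far apart.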


Experiments and results

The study verifies the GenSim framework through experiments targeting the following questions: (1) How effective are LLMs at designing and implementing simulation tasks, and can GenSim improve LLMs' task-generation performance? (2) Does training on LLM-generated tasks improve policy generalization, and does policy training benefit more when given more generated tasks? (3) Is pre-training on LLM-generated simulation tasks beneficial for deploying robot policies in the real world?

Evaluating the ability of LLMs to generate robot simulation tasks

As shown in Figure 6 below, for task generation in both the exploratory and goal-directed modes, the two-stage prompt chain with few-shot examples and the task library effectively improves the success rate of code generation.


Task-Level Generalization

Few-shot policy optimization on related tasks. As observed on the left side of Figure 7 below, jointly training on LLM-generated tasks improves policy performance on the original CLIPort tasks by more than 50%, especially in low-data regimes (such as 5 demos).

Zero-shot policy generalization to unseen tasks. As seen in Figure 7, by pre-training on more LLM-generated tasks, the model generalizes better to tasks in the original Ravens benchmark. In the middle and right of Figure 7, the researchers also pre-trained on 5 tasks from different sources, including manually written tasks, a closed-source LLM, and open-source fine-tuned LLMs, and observed similar zero-shot task-level generalization.


Adapting the pre-trained model to the real world

The researchers transferred the policies trained in simulation to a real environment; the results are shown in Table 1 below. The model pre-trained on 70 GPT-4-generated tasks was evaluated over 10 trials on each of 9 tasks and achieved an average success rate of 68.8%, an improvement of more than 25% over the baseline pre-trained only on CLIPort tasks and of 15% over a model pre-trained on only 50 tasks.


The researchers also observed that pre-training on diverse simulation tasks improves robustness on long-horizon complex tasks. For example, GPT-4 pre-trained models show more robust performance on the real-world build-wheel task.


Ablation experiments

Simulation training success rate. In Table 2 below, the researchers report the success rates of single-task and multi-task policy training on a subset of generated tasks with 200 demos. For policy training on GPT-4-generated tasks, the average task success rate is 75.8% for single-task training and 74.1% for multi-task training.


Generated task statistics. In Figure 9(a) below, the researchers show task statistics across different features of the 120 LLM-generated tasks. There is an interesting balance among the colors, assets, actions, and numbers of instances generated by the LLM. For example, the generated code contains many scenes with more than 7 object instances, as well as many pick-and-place primitive actions and assets such as blocks.

Code generation comparison. In Figure 9(b) below, the researchers qualitatively evaluate the failure cases of GPT-4 and Code Llama in the top-down experiments.


Please refer to the original paper for more technical details.
