Google robots achieve interactive language with an accuracy of 93.5%, and the amount of open source data increases tenfold.

PHPz

Apr 21, 2023 pm 07:34 PM

Google robot

Look closely: a person keeps giving a robot natural language instructions, such as "Push the green star between the red blocks" or "Move the blue block to the lower left corner", and the robot carries out each command in real time.

Since the 1960s, robotics experts have been trying to make robots understand people's "natural language instructions" and perform specific actions.

Ideally, future robots will react in real time to any relevant task that users can describe in natural language.

Especially in open human environments, users may need to customize the robot's behavior as it happens, offering quick corrections such as "stop, move the arm up a little" or specifying constraints such as "move slowly to the right".

In addition, real-time language can make it easier for people and robots to collaborate on complex long-horizon tasks: people can guide the robot's manipulation iteratively and interactively, with occasional verbal feedback.

Current work in this area can be roughly characterized by the following three requirements:

1. The robot must exist physically in the real world;

2. It must be able to respond to a large and rich set of natural language commands;

3. It must be able to execute interactive language commands, that is, the robot needs to accept new natural language instructions while a task is being executed.

On the third point, progress on interactivity in robotics is still very slow, which leaves robots lacking a sense of being "alive".

Google recently published a paper proposing a new framework for building real-world robots that follow natural language instructions interactively and in real time. The associated dataset, environment, benchmark, and policies have all been released.

Paper link: https://arxiv.org/pdf/2210.06407.pdf

Project homepage: https://interactive-language.github.io/

By training with behavior cloning on a dataset of hundreds of thousands of language-annotated trajectories, the resulting policy can proficiently execute an order of magnitude more commands than previous work. In the real world, the researchers estimate that the method achieves a 93.5% success rate on a set of 87,000 different natural language strings.

The same policy can also be guided by a human in real time through natural language to reach a wide range of precise long-horizon rearrangement goals, such as "make a smiling face with the building blocks".

The data set released with the paper includes nearly 600,000 language-tagged trajectories, which is an order of magnitude larger than previously available data sets.

Interactive Language: Real-time Conversation with the Robot

To integrate robots into the real world, the most important capability is handling open-ended natural language instructions. From a machine learning perspective, however, getting robots to learn open-vocabulary language is a huge challenge.

An open-vocabulary model needs to handle a very large number of tasks, including small corrective instructions. Existing multi-task learning setups rely on carefully curated imitation learning datasets or complex reinforcement learning reward functions to drive learning for each task, and a predefined task set designed with that much per-task effort is bound to stay small.

Therefore, a key question in the open-vocabulary setting is: how can robot data collection be scaled to cover thousands of behaviors in a real environment, and how can all of these behaviors be connected to the natural language instructions an end user might actually give?

In Interactive Language, the key to the large-scale imitation learning framework proposed by Google is a scalable way of creating large and diverse language-conditioned robot demonstration datasets.

Unlike previous setups, where all skills were defined up front and curated demonstrations were then collected for each skill, the researchers collected data continuously across multiple robots, without scene resets or low-level skill segmentation.

All data, including failures (such as knocking blocks off the table), goes through a hindsight language relabeling process in order to be paired with text.

In this process, annotators watch long robot videos, identify as many behaviors as possible, mark the start and end time of each behavior, and describe each segment in free-form natural language.

Most importantly, in contrast to earlier top-down setups, all skills used for training emerge bottom-up from the data itself rather than being predetermined by the researchers.
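The relabeling step described above can be pictured as cutting one long recording into many short, text-labeled demonstrations. The sketch below is only an illustration under stated assumptions: the `Annotation` class, the `relabel` function, and the string placeholders for frames and actions are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Annotation:
    """One annotator label (hypothetical structure): a time span plus free-form text."""
    start_step: int  # inclusive index into the long recording
    end_step: int    # exclusive index into the long recording
    text: str


def relabel(trajectory: Sequence[Tuple[str, str]],
            annotations: List[Annotation]) -> List[dict]:
    """Cut a long (observation, action) recording into language-labeled demos."""
    demos = []
    for ann in annotations:
        if not (0 <= ann.start_step < ann.end_step <= len(trajectory)):
            raise ValueError(f"bad span for {ann.text!r}")
        segment = trajectory[ann.start_step:ann.end_step]
        demos.append({
            "instruction": ann.text,
            "observations": [obs for obs, _ in segment],
            "actions": [act for _, act in segment],
        })
    return demos


# Hypothetical 6-step recording; strings stand in for images and motor commands.
long_recording = [(f"frame_{t}", f"action_{t}") for t in range(6)]
labels = [
    Annotation(0, 3, "push the green star between the red blocks"),
    Annotation(2, 6, "move the blue block to the lower left corner"),
]
demos = relabel(long_recording, labels)
```

Spans are allowed to overlap in this sketch, since an annotator may describe the same stretch of video in more than one way.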

The researchers intentionally kept the learning method and architecture as simple as possible. The robot policy network is a cross-attention Transformer that maps 5 Hz video and text to 5 Hz robot actions, trained with a standard supervised behavior cloning objective and no auxiliary losses.
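To make the behavior cloning objective concrete, here is a minimal sketch of a supervised loss that compares the action a policy predicts with the action the human demonstrated. It is written under assumptions the article does not confirm: the loss is taken to be a mean squared error, actions are taken to be 2D displacements, and `bc_loss` and `zero_policy` are hypothetical names. The Transformer itself is replaced by a trivial stand-in.

```python
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, float]  # assumed 2D displacement; not specified in the article
Example = Tuple[Sequence[str], str, Action]  # (video frames, instruction, demonstrated action)


def bc_loss(policy: Callable[[Sequence[str], str], Action],
            batch: List[Example]) -> float:
    """Mean squared error between predicted and demonstrated actions (assumed loss)."""
    total = 0.0
    for frames, instruction, target in batch:
        predicted = policy(frames, instruction)
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / len(batch)


def zero_policy(frames: Sequence[str], instruction: str) -> Action:
    """Stand-in for the real network: always predicts no motion."""
    return (0.0, 0.0)


batch = [
    (["frame_0"], "push the green star between the red blocks", (1.0, 0.0)),
    (["frame_1"], "move the blue block to the lower left corner", (0.0, 2.0)),
]
loss = bc_loss(zero_policy, batch)
```

Training would adjust the policy's parameters to reduce this value over the whole dataset; there are no extra loss terms, which matches the "no auxiliary losses" description.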

At test time, new natural language commands can be sent to the policy network via speech-to-text at up to 5 Hz.
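One way to picture real-time instruction updates is a fixed-rate control loop that checks for a newly spoken command on every tick and otherwise keeps following the current one. The following is a simplified sketch, not the authors' implementation: `run_control_loop`, `echo_policy`, and the dictionary of spoken commands keyed by tick are hypothetical, and ticks are simulated rather than timed.

```python
from typing import Callable, Dict, List, Sequence

CONTROL_HZ = 5  # control rate stated in the article; one tick lasts 1 / 5 = 0.2 s


def run_control_loop(policy: Callable[[str, str], str],
                     frames: Sequence[str],
                     spoken: Dict[int, str],
                     initial_instruction: str) -> List[str]:
    """Call the policy once per tick, swapping in a new instruction when one arrives."""
    instruction = initial_instruction
    actions = []
    for tick, frame in enumerate(frames):
        # A command transcribed at this tick replaces the current one immediately.
        instruction = spoken.get(tick, instruction)
        actions.append(policy(frame, instruction))
    return actions


def echo_policy(frame: str, instruction: str) -> str:
    """Stand-in for the real network: reports what it was conditioned on."""
    return f"{instruction} | {frame}"


frames = [f"frame_{t}" for t in range(5)]  # 5 ticks, i.e. one second at 5 Hz
actions = run_control_loop(
    echo_policy,
    frames,
    spoken={2: "stop, move the arm up a little"},
    initial_instruction="push the green star between the red blocks",
)
```

Because the instruction is re-read on every tick, a correction takes effect on the next action rather than after the current task finishes.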

Open Source Benchmark

During the annotation process, the researchers collected the Language-Table dataset, which contains more than 440,000 real and 180,000 simulated demonstrations of a robot executing natural language commands, along with the sequence of actions the robot took in each demonstration.

This is currently the largest language-conditioned robot demonstration dataset, larger than previous ones by an order of magnitude.

Language-Table also comes with a simulated imitation learning benchmark, which can be used for model selection or to evaluate how well robots trained with different methods follow instructions.
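Model selection on such a benchmark typically comes down to comparing success rates across evaluation episodes. The sketch below shows that bookkeeping only; the record format, the policy names, and the functions `success_rates` and `select_best` are hypothetical and do not reflect the benchmark's actual interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, bool]  # (policy name, instruction, episode succeeded)


def success_rates(results: Iterable[Record]) -> Dict[str, float]:
    """Fraction of successful episodes for each policy."""
    counts = defaultdict(lambda: [0, 0])  # name -> [successes, episodes]
    for name, _instruction, succeeded in results:
        counts[name][0] += int(succeeded)
        counts[name][1] += 1
    return {name: wins / total for name, (wins, total) in counts.items()}


def select_best(results: Iterable[Record]) -> str:
    """Name of the policy with the highest success rate."""
    rates = success_rates(results)
    return max(rates, key=rates.get)


# Hypothetical evaluation log for two candidate policies.
results = [
    ("policy_a", "push the green star between the red blocks", True),
    ("policy_a", "move the blue block to the lower left corner", True),
    ("policy_a", "place them all in a vertical line", False),
    ("policy_b", "push the green star between the red blocks", True),
    ("policy_b", "move the blue block to the lower left corner", False),
]
rates = success_rates(results)
best = select_best(results)
```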

Real-time language behavior learning

In experiments, the researchers found that a robot becomes especially capable when it can follow natural language instructions given in real time.

On the project website, the researchers show that users can guide the robot through complex long-horizon sequences using only natural language, reaching goals that require precise, coordinated control over many steps.

For example, with many blocks on the table, the command can be "Make a smiley face with green eyes" or "Place them all in a vertical line", and so on.

Because the robot is trained to follow open-vocabulary language, in experiments it responds to a range of different verbal corrections, such as "Move the red star gently to the right".

Finally, the researchers explored further advantages of real-time language, such as making robot data collection more efficient. A single human operator can control four robots at the same time using spoken language, which could make it possible to scale up robot data collection in the future without assigning an annotator to each robot.

Conclusion

Although the project is currently limited to a fixed set of objects on a tabletop, the Interactive Language results give initial evidence that large-scale imitation learning can indeed produce real-time interactive robots capable of following free-form end-user commands.

To advance real-time language control of physical robots, the researchers have open-sourced Language-Table, currently the largest language-conditioned real-world robot demonstration dataset, which can also serve as a related simulation benchmark.

The researchers believe the usefulness of this dataset may not be limited to robot control. It may also provide a new starting point for studying language- and action-conditioned video prediction, robot video-conditioned language modeling, and many other interesting and active problems in the broader machine learning context.

Statement

This article is reproduced from 51CTO.COM. In case of any infringement, please contact admin@php.cn to have it removed.
