


Google's robot follows interactive language commands with a 93.5% success rate, and the amount of open-source data grows tenfold
Watch closely: the man keeps giving a robot natural language instructions, such as "push the green star between the red blocks" or "move the blue block to the lower left corner", and the robot carries out each command in real time.
Since the 1960s, robotics experts have been trying to make robots understand people's "natural language instructions" and perform specific actions.
Ideally, future robots will react in real time to any relevant task that users can describe in natural language.
Especially in open human environments, users may need to customize the robot's behavior as it happens, offering quick corrections such as "stop, move the arm up a little" or specifying constraints such as "move slowly to the right".
In addition, real-time language can make it easier for people and robots to collaborate on complex, long-horizon tasks: people can guide the robot iteratively and interactively, with occasional verbal feedback.
Current work in this direction can be roughly summed up as three requirements:
1. The robot must exist in the real world;
2. It must be able to respond to a large, rich set of natural language commands;
3. It must be able to execute interactive language commands, that is, accept new natural language instructions while a task is under way.
On the third point, progress on interactivity in robotics is still very slow, which leaves robots lacking a "sense of life".
Recently, Google published a paper proposing a new framework for building real-world robots that follow natural language instructions interactively and in real time. The associated datasets, environments, benchmarks, and policies are all openly available.
Paper link: https://arxiv.org/pdf/2210.06407.pdf
Project homepage: https://interactive-language.github.io/
Trained with behavioral cloning on a dataset of hundreds of thousands of language-annotated trajectories, the resulting policy can proficiently execute an order of magnitude more commands than previous work. In the real world, the researchers estimate that the method has a 93.5% success rate on a set of 87,000 different natural language strings.
The same policy can also be guided by a human in real time via natural language to achieve a wide range of precise, long-horizon rearrangement goals, such as "make a smiley face out of blocks".
The dataset released with the paper includes nearly 600,000 language-labeled trajectories, an order of magnitude more than previously available datasets.
Interactive Language: Real-time Conversation with the Robot
To integrate robots into the real world, the most important thing is being able to handle open-ended natural language instructions, but from a machine learning perspective, getting robots to learn open-vocabulary language is a huge challenge.
An open-vocabulary model has to perform a very large number of tasks, including small corrective instructions. Existing multi-task learning setups rely on carefully designed imitation learning datasets or complex reinforcement learning reward functions to drive the learning of each task, and a predefined task set designed this way can never be very large.
A key question for open-vocabulary tasks is therefore: how can robot data collection be scaled to cover thousands of behaviors in real environments, and how can all of that behavior be connected to the natural language instructions an end user might actually give?
In Interactive Language, the key to Google's large-scale imitation learning framework is the scalable creation of large, diverse, language-conditioned robot demonstration datasets.
Unlike previous setups, in which all skills are defined up front and curated demonstrations are then collected for each skill, the researchers collect data continuously across multiple robots, without scene resets or low-level skill segmentation.
All data, including failures (such as knocking blocks off the table), goes through a hindsight language relabeling process to be paired with text.
In this process, annotators watch long robot videos to identify as many behaviors as possible, mark the start and end time of each behavior, and describe each segment in free-form natural language.
Most importantly, unlike previous instruction-following setups, all skills used for training emerge bottom-up from the data itself rather than being determined in advance by researchers.
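As a rough illustration of the relabeling step described above, the sketch below cuts one long, unsegmented recording into language-labeled training examples. It is a minimal Python sketch under stated assumptions, not the authors' pipeline: the `Annotation` and `LabeledTrajectory` types, the `relabel` function, the frame-index convention, and the example descriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Annotation:
    """One annotator-marked behavior (hypothetical format)."""
    start: int  # first frame index of the behavior (inclusive)
    end: int    # frame index just past the behavior (exclusive)
    text: str   # free-form natural language description


@dataclass
class LabeledTrajectory:
    """A training example: a description paired with the matching segment."""
    text: str
    frames: List[int]
    actions: List[Tuple[float, float]]


def relabel(
    frames: Sequence[int],
    actions: Sequence[Tuple[float, float]],
    annotations: Sequence[Annotation],
) -> List[LabeledTrajectory]:
    """Slice a long recording into one labeled trajectory per annotation."""
    if len(frames) != len(actions):
        raise ValueError("frames and actions must have the same length")
    examples = []
    for ann in annotations:
        if not 0 <= ann.start < ann.end <= len(frames):
            raise ValueError(f"invalid segment [{ann.start}, {ann.end})")
        examples.append(
            LabeledTrajectory(
                text=ann.text,
                frames=list(frames[ann.start:ann.end]),
                actions=list(actions[ann.start:ann.end]),
            )
        )
    return examples


# Stand-in data: integers play the role of video frames.
frames = list(range(10))
actions = [(0.01, 0.0)] * 10
annotations = [
    Annotation(0, 4, "push the red star to the left"),
    # Failures are kept too, and segments are allowed to overlap.
    Annotation(3, 9, "knock the blue block off the table"),
]
examples = relabel(frames, actions, annotations)
```

Because the segments come from whatever the annotators noticed, the set of skills is defined by the annotations rather than by a fixed list chosen beforehand.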
The researchers intentionally kept the learning method and architecture as simple as possible. The robot policy network is a cross-attention Transformer that maps 5 Hz video and text to 5 Hz robot actions, trained with a standard supervised behavioral cloning objective and no auxiliary losses.
At test time, new natural language commands can be sent to the policy network via speech-to-text at rates of up to 5 Hz.
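To make the term "cross-attention" concrete, here is a minimal single-head sketch in Python with NumPy. It only illustrates the generic mechanism by which one token sequence attends to another; it is not the paper's network. The dimensions, the random weights, and the choice to let text tokens act as queries over video feature tokens are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_model) tokens that ask the questions
    context: (n_ctx, d_model) tokens that are attended over
    Returns the fused tokens (n_q, d_head) and attention weights (n_q, n_ctx).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_head = 16, 8
text_tokens = rng.normal(size=(5, d_model))    # hypothetical instruction embedding
video_tokens = rng.normal(size=(12, d_model))  # hypothetical visual features
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

fused, weights = cross_attention(text_tokens, video_tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so every text token receives a weighted average of the video features. In a behavioral cloning setup, a network built from such layers would be trained to regress the demonstrated actions.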
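One way to picture this real-time interaction is a fixed-rate control loop that re-reads the latest instruction on every tick. The Python sketch below is a hedged illustration only: `LatestInstruction`, `run_control_loop`, `toy_policy`, and the injected `sleep` function are hypothetical names, and the toy policy stands in for the trained network.

```python
import threading
import time


class LatestInstruction:
    """Thread-safe holder for the most recent command (hypothetical helper)."""

    def __init__(self, text=""):
        self._lock = threading.Lock()
        self._text = text

    def set(self, text):
        with self._lock:
            self._text = text

    def get(self):
        with self._lock:
            return self._text


def run_control_loop(policy, get_frame, instruction, steps, hz=5.0, sleep=time.sleep):
    """Run `steps` ticks at `hz`, conditioning each action on the newest text."""
    period = 1.0 / hz
    log = []
    for step in range(steps):
        text = instruction.get()  # re-read every tick so corrections apply at once
        action = policy(get_frame(step), text)
        log.append((step, text, action))
        sleep(period)
    return log


def toy_policy(frame, text):
    """Stand-in for the learned policy: halt on 'stop', otherwise drift right."""
    return (0.0, 0.0) if text.startswith("stop") else (0.01, 0.0)


box = LatestInstruction("push the green star between the red blocks")
ticks = []


def fake_sleep(seconds):
    """Demo-only replacement for time.sleep: no waiting, and it simulates
    the user speaking a correction after the second tick."""
    ticks.append(seconds)
    if len(ticks) == 2:
        box.set("stop, move the arm up a little")


log = run_control_loop(toy_policy, lambda step: None, box, steps=4, sleep=fake_sleep)
```

In a real deployment, a speech-to-text thread would call `box.set(...)` while the loop runs with the default `time.sleep`, which is why the holder is guarded by a lock.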
Open Source Benchmark
During annotation, the researchers collected the Language-Table dataset, which contains more than 440,000 real and 180,000 simulated demonstrations of a robot executing natural language commands, together with the sequence of actions the robot took during each demonstration.
It is currently the largest language-conditioned robot demonstration dataset, an order of magnitude larger than previous ones.
Language-Table also comes with a simulated imitation learning benchmark, which can be used for model selection or to evaluate how well robots trained with different methods follow instructions.
Real-time language behavior learning
In experiments, the researchers found that a robot becomes especially capable when it can follow natural language instructions given in real time.
On the project website, the researchers show that users can guide the robot through complex long-horizon sequences using only natural language, achieving goals that require precise, coordinated control over many steps.
For example, with many blocks on the table, the command can be "make a smiley face with green eyes" or "put them all in a vertical line".
Because the robot was trained to follow open-vocabulary language, in experiments it responded to a range of different verbal corrections, such as "move the red star slightly to the right".
Finally, the researchers explored further advantages of real-time language, such as making robot data collection more efficient. A single human operator can control four robots at the same time using spoken language, which may make it possible to scale up robot data collection in the future without assigning an annotator to each robot.
Conclusion
Although the project is currently limited to a fixed set of objects on a tabletop, the Interactive Language results give early evidence that large-scale imitation learning can indeed produce real-time interactive robots capable of following free-form end-user commands.
To advance real-time language control of physical robots, the researchers have open-sourced Language-Table, currently the largest language-conditioned real-world robot demonstration dataset, which can also serve as an associated simulated benchmark.
The researchers believe the dataset may be useful beyond robot control: it could also be used to study language- and action-conditioned video prediction or robot-video-conditioned language modeling, and it offers a new starting point for many other interesting, active problems in the broader machine learning context.