Where has the "embodied intelligence" that Fei-Fei Li champions reached?
In 2009, Fei-Fei Li, then a computer scientist at Princeton University, led the construction of a dataset that changed the history of artificial intelligence: ImageNet. It contains millions of labeled images that can be used to train complex machine learning models to recognize objects in pictures.
In 2015, machines' recognition accuracy on ImageNet surpassed that of humans. Fei-Fei Li soon turned to a new goal: finding what she calls another "North Star" (here, a key scientific problem that focuses researchers' enthusiasm and drives breakthrough progress).
She found inspiration by looking back 530 million years to the Cambrian explosion, when many modern animal groups first appeared. One influential theory holds that the burst of new species was driven in part by the emergence of eyes, which let creatures see the world around them for the first time. Fei-Fei Li believes that animal vision does not arise in isolation: it is "deeply embedded in a whole that needs to move, navigate, survive, manipulate and change in a rapidly changing environment," she said, "so it was natural for me to turn to a more active field of AI."
Today, Fei-Fei Li's work focuses on AI agents that do more than accept static images as data: they can move around within simulated three-dimensional virtual worlds and interact with their surroundings.
This is the broad goal of a new field called "embodied AI." It overlaps with robotics, since robots can be viewed as embodied AI agents with physical bodies in the real world, and with reinforcement learning, in which agents learn through interaction. Fei-Fei Li and others believe that embodied AI could drive a major transformation, from simple machine-learning abilities such as recognizing images to learning how to perform complex, human-like tasks with multiple steps, such as making an omelet.
Today, embodied AI covers any agent that can sense and modify its own environment. In robotics, the AI agent always inhabits a robot body; in realistic simulations, an agent may have a virtual body, or it may perceive the world through a movable camera and interact with its surroundings that way. "The meaning of embodiment is not the body itself, but the overall need and functionality of interacting with the environment and doing things in it," Fei-Fei Li explained.
This interactivity gives agents a new, and in many cases better, way to understand the world. Where before you could only observe a possible relationship between two objects, now you can experiment and bring that relationship about yourself. With that kind of understanding comes greater insight. And with a new set of virtual worlds up and running, embodied AI agents have begun to realize this potential, making significant progress in their new environments.
"Right now, we don't have any evidence for the existence of intelligence that doesn't learn by interacting with the world," said Viviane Clay, an embodied AI researcher at the University of Osnebruck in Germany.
Although researchers have long wanted to create realistic virtual worlds for AI agents to explore, such worlds have only existed for about five years, made possible by advances in graphics from the film and video game industries. By 2017, AI agents could make themselves at home in realistically rendered interior spaces: a virtual, but literal, "home." Computer scientists at the Allen Institute for Artificial Intelligence built a simulator called AI2-THOR that lets agents roam naturalistic kitchens, bathrooms, living rooms, and bedrooms. Agents learn from three-dimensional views that change as they move, with the simulator showing new angles whenever they decide to take a closer look.
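To make the idea concrete, here is a minimal sketch of stepping an agent through such a simulator with the open-source ai2thor Python package; the scene name and printed fields are assumptions for illustration, so the project's documentation remains the authority on the exact API.

```python
# A minimal sketch of an agent wandering an AI2-THOR kitchen scene.
# Scene name and metadata fields are illustrative assumptions.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # one of the built-in kitchen scenes

# Each action returns an event with a fresh first-person frame,
# so the agent sees a new angle of the room after every step.
for action in ["MoveAhead", "RotateRight", "MoveAhead", "LookDown"]:
    event = controller.step(action=action)
    frame = event.frame  # RGB image from the agent's new viewpoint
    position = event.metadata["agent"]["position"]
    print(action, frame.shape, position)

controller.stop()
```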
These new worlds also let agents reason about change along a new dimension: time. "That's a big change," said Manolis Savva, a computer graphics researcher at Simon Fraser University. "In an embodied AI setting, you have these temporally coherent streams of information that you can control."
These simulated worlds are now good enough to train agents on entirely new tasks. Rather than just recognizing an object, they can interact with it, pick it up and navigate around it: seemingly small steps, but ones any agent needs in order to understand its environment. And in 2020, virtual agents gained the ability to go beyond vision and hear the sounds that virtual objects make, offering another perspective on objects and how they behave in the world.
Embodied AI agents that act in a virtual world (here, the ManipulaTHOR environment) learn in a different way, which may suit more complex, human-like tasks.
Simulators have their limits, however. "Even the best simulators are far less realistic than the real world," says Daniel Yamins, a computer scientist at Stanford University. With colleagues at MIT and IBM, Yamins co-developed ThreeDWorld, a project focused on simulating real-life physics in virtual worlds, such as the behavior of liquids and the way some objects are rigid in one region and flexible in another.
This is a very challenging task that requires AI to learn in new ways.
So far, a simple way to measure the progress of embodied AI has been to compare the performance of embodied agents with that of algorithms trained on simpler, static image tasks. Researchers note that these comparisons are not perfect, but early results do suggest that embodied AI agents learn differently from, and sometimes better than, their predecessors.
In a recent paper ("Interactron: Embodied Adaptive Object Detection"), researchers found that an embodied AI agent was more accurate at detecting specified objects, improving on traditional methods by nearly 12%. "It took the object detection field more than three years to achieve that level of improvement," said study co-author Roozbeh Mottaghi, a computer scientist at the Allen Institute for Artificial Intelligence. "And we made that much progress just by interacting with the world."
Other papers have shown that such algorithms improve when they are given an embodied form and allowed to explore a virtual space, or to move around an object and collect multiple views of it.
Researchers have also found that embodied and traditional algorithms learn in fundamentally different ways. Consider the neural network, the basic ingredient behind the learning ability of every embodied algorithm and many disembodied ones. A neural network is made up of many layers of connected artificial neurons, loosely modeled on networks in the human brain. In two separate papers, researchers found that in the neural networks of embodied agents, fewer neurons respond to any given visual input, meaning each individual neuron is more selective about what it responds to. Disembodied networks were much less efficient, requiring many more neurons to stay active most of the time. One research team (led by Grace Lindsay, an incoming professor at NYU) even compared embodied and disembodied neural networks with neuronal activity in a living brain, the visual cortex of mice, and found that the embodied networks were the closest match.
Lindsay is quick to point out that this doesn't necessarily mean the embodied versions are better; they are just different. Unlike the object-detection work, Lindsay's study compared the underlying activity of the same kind of neural network performing completely different tasks, and networks may need to work differently to accomplish different goals.
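As a rough illustration of the "fewer neurons respond" finding (and not the analysis used in those papers), one could compare how large a fraction of a network's hidden units respond strongly to the same batch of images; a smaller fraction suggests more selective units. The models, layer choice and threshold below are all hypothetical.

```python
# Toy metric: fraction of hidden units with a substantial average response.
# The threshold, the encoders and the choice of layer are illustrative assumptions.
import torch

def active_fraction(encoder, images, threshold=0.1):
    with torch.no_grad():
        activations = encoder(images)              # shape: (num_images, num_units)
    mean_response = activations.abs().mean(dim=0)  # average response strength per unit
    return (mean_response > threshold).float().mean().item()

# Hypothetical usage: a lower fraction means individual units are more selective.
# frac_embodied = active_fraction(embodied_encoder, image_batch)
# frac_static   = active_fraction(static_encoder, image_batch)
```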
Comparing embodied networks to disembodied ones is one way to measure improvement, but researchers are not really trying to improve embodied agents' performance on existing tasks; their real goal is to have agents learn more complex, more human-like tasks. That is what excites researchers most, and they are seeing impressive progress, especially on navigation tasks, in which an agent must remember the long-term goal of its destination while forming a plan to get there without getting lost or bumping into objects.
In just a few years, a team led by Dhruv Batra, a research director at Meta AI and a computer scientist at the Georgia Institute of Technology, has made rapid progress on a specific navigation task called point-goal navigation. Here the agent is dropped into a completely new environment and must travel to a given coordinate ("go to the point 5 meters north and 10 meters east") without a map.
By training the agent in a Meta virtual world called AI Habitat and giving it a GPS and a compass, Batra said, they reached better than 99.9% accuracy on a standard dataset. More recently, they extended the result to a harder, more realistic scenario without the compass or GPS: the agent achieved 94% accuracy by estimating its position using only the stream of pixels it sees while moving.
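For readers unfamiliar with the task, the sketch below shows the shape of a point-goal episode: the agent sees an image plus its displacement to the goal (what a GPS-and-compass sensor would report) and succeeds only if it chooses to stop within a small radius of the target. This is illustrative pseudocode, not the AI Habitat API; `env` and `policy` are hypothetical objects.

```python
# Illustrative point-goal navigation loop (not the AI Habitat API).
# The agent succeeds if it calls "stop" within a small radius of the goal.
import numpy as np

def run_pointgoal_episode(env, policy, success_radius=0.2, max_steps=500):
    obs = env.reset()  # e.g. {"rgb": image, "goal": (dx, dy) relative to the agent}
    for _ in range(max_steps):
        action = policy(obs)  # one of: "move_forward", "turn_left", "turn_right", "stop"
        if action == "stop":
            # Success is judged only when the agent decides it has arrived.
            return np.linalg.norm(obs["goal"]) < success_radius
        obs = env.step(action)
    return False  # ran out of steps without stopping near the goal
```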
The AI Habitat virtual world created by Dhruv Batra's team at Meta AI. They hope to speed up simulation until embodied AI agents can accumulate 20 years of simulated experience in just 20 minutes of wall-clock time.
"This is a great improvement, but it doesn't mean navigation is completely solved," Mottaghi said. Many other kinds of navigation tasks involve more complex language instructions, such as "go through the kitchen and fetch the glasses on the nightstand in the bedroom," and there the accuracy is still only about 30% to 40%.
But navigation remains one of the simplest tasks in embodied AI, since the agent does not need to manipulate anything as it moves through the environment. So far, embodied AI agents are far from mastering any object-related tasks. Part of the challenge is that when an agent interacts with new objects, it can make many errors, and the errors can pile up. Currently, most researchers address this problem by choosing tasks with only a few steps, but most human-like activities, such as baking or washing dishes, require long sequences of actions on multiple objects. To achieve this goal, AI agents will need to make even greater advances.
Here, Fei-Fei Li may again be at the forefront: her team has developed a simulated dataset called BEHAVIOR, which she hopes will do for embodied AI what ImageNet did for object recognition.
The dataset contains more than 100 human activities for agents to complete, and agents can be tested on them in any virtual environment. By defining metrics that compare agents performing these tasks against real videos of humans performing the same tasks, it will let the community better assess the progress of virtual AI agents.
Once agents succeed at these complex tasks, Fei-Fei Li believes, the purpose of simulation is to prepare them for the final space in which they will operate: the real world.
"In my opinion, simulation is one of the most important and exciting areas in robotics research." Li Feifei said.
Robots are essentially embodied intelligence. They inhabit some kind of physical body in the real world and represent the most extreme form of embodied AI agent. But many researchers have found that even such agents can benefit from training in virtual worlds.
The most advanced algorithms in robotics, such as reinforcement learning, often require millions of iterations to learn something meaningful, Mottaghi said. Therefore, training real robots to perform difficult tasks can take years.
Robots can navigate uncertain terrain in the real world. New research shows that training in virtual environments can help robots master these and other skills.
But training in a virtual world first is much faster: thousands of agents can be trained simultaneously in thousands of different rooms. And virtual training is safer for both robots and humans, as the sketch after this paragraph illustrates.
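A rough sketch of why that parallelism helps: one shared policy can gather experience from many simulated rooms stepped side by side, and a learner then updates the policy from all of it at once. The `env` and `policy` objects and their methods are illustrative, not a particular simulator's API.

```python
# A minimal sketch of scaled-up simulated training: many environment copies
# are rolled out in parallel for one shared policy. All names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def rollout(env, policy, steps=100):
    """Collect (observation, action, reward) transitions from one simulated room."""
    obs = env.reset()
    transitions = []
    for _ in range(steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        transitions.append((obs, action, reward))
        if done:
            obs = env.reset()
    return transitions

def parallel_rollouts(envs, policy, steps=100):
    # Thousands of rooms can be stepped at once; a learner then updates the
    # shared policy using experience gathered from all of them.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda env: rollout(env, policy, steps), envs))
```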
Many robotics experts began paying closer attention to simulators in 2018, when OpenAI researchers demonstrated that skills an agent learns in a virtual world can transfer to the real world: they trained a robotic hand to manipulate a cube it had only ever seen in simulation. More recent work has taught drones to avoid collisions in the air, deployed self-driving cars in urban environments on two different continents, and had a four-legged robot dog complete an hour-long hike in the Swiss Alps (taking about the same time a human would).
In the future, researchers may also bridge the gap between simulation and the real world by sending humans into virtual spaces through virtual-reality headsets. Dieter Fox, senior director of robotics research at Nvidia and a professor at the University of Washington, pointed out that a key goal of robotics is to build robots that are helpful to humans in the real world; to do that, they must first be exposed to humans and learn how to interact with them.
Using virtual reality to put humans into these simulated environments, where they can demonstrate tasks and interact with the robots, would be a very powerful approach, Fox said.
Whether in simulation or in the real world, embodied AI agents are learning to act more like humans and to complete increasingly human-like tasks, and the field is advancing on every front: new worlds, new tasks and new learning algorithms.
“I see the fusion of deep learning, robot learning, vision and even language,” Li Feifei said. “Now I think that through this ‘moonshot’ or ‘North Star’ for embodied AI, we will Learning the basic technologies of intelligence can truly lead to major breakthroughs."
Fei-Fei Li's essay on the "North Stars" of computer vision: https://www.amacad.org/publication/searching-computer-vision-north-stars