NVIDIA opens a new era: a "perpetual motion machine" for robot training data
AI's enormous appetite for data has nearly exhausted the available data resources, so companies have begun exploring a new way to obtain data: creating it themselves. However, most synthetic data so far has been used to train large AI models. This time, NVIDIA has built a "data granary" for robot training.
A recent research paper from NVIDIA and the University of Texas at Austin introduces a system called "MimicGen", which automatically generates large-scale robot training datasets from only a small number of human demonstrations. NVIDIA senior scientist Jim Fan said the company will open-source everything, including the generated datasets.
How large is the generated dataset? From 10 human demonstrations, MimicGen can generate 1,000 synthetic examples; from 200 human demonstrations, it can generate 50,000 training examples covering 18 tasks and multiple simulation environments.
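To put those figures in perspective, the short Python sketch below simply divides the number of generated demonstrations by the number of human source demonstrations. It uses only the two pairs of numbers quoted above; because the 50,000 examples are spread across 18 tasks, the result is an overall ratio, not a per-task one. The names `REPORTED_RUNS` and `amplification` are illustrative, not part of MimicGen.

```python
# Figures quoted in the article: (human source demos, generated demos).
REPORTED_RUNS = {
    "small": (10, 1_000),
    "large": (200, 50_000),
}


def amplification(source_demos: int, generated_demos: int) -> float:
    """Return generated demonstrations per human source demonstration."""
    if source_demos <= 0:
        raise ValueError("source_demos must be positive")
    return generated_demos / source_demos


if __name__ == "__main__":
    for name, (src, gen) in REPORTED_RUNS.items():
        print(f"{name}: {src} human -> {gen} synthetic ({amplification(src, gen):.0f}x)")
```

Running it prints a 100x ratio for the 10-demonstration case and a 250x ratio for the 200-demonstration case.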
What does the generated data look like?
Starting from existing demonstrations, MimicGen can generate variations of the same scene at different stages of a task.
It can also generate data across a wide range of task reset distributions (the ranges of initial object placements used when a task is reset), for tasks such as assembling parts, pouring coffee, and cleaning mugs.
It can generate new demonstrations for different robot arms.
It also produces data for long-horizon tasks, which involve many sequential steps.
And it works for real-world scenes, not just simulation.
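The article does not explain how MimicGen builds these new demonstrations. As described in the MimicGen paper, the core step is to split each source demonstration into object-centric segments and replay each segment so that the robot's end effector keeps the same pose *relative to the object* when the object sits somewhere new. The sketch below illustrates only that transform, assuming planar (x, y, heading) poses for brevity; the real system works with full 3D poses and also interpolates between segments, executes the result, and keeps only successful attempts. All function names here are hypothetical.

```python
import math
from typing import List, Tuple

# Planar pose (x, y, heading in radians) in the world frame; a simplifying assumption.
Pose2D = Tuple[float, float, float]


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Return a ∘ b: pose b, given in frame a, expressed in the world frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a: Pose2D) -> Pose2D:
    """Return the inverse transform of pose a."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, -at)


def retarget_segment(
    src_eef: List[Pose2D], src_obj: Pose2D, new_obj: Pose2D
) -> List[Pose2D]:
    """Hypothetical helper: map a source end-effector segment onto a moved object.

    Each new pose is new_obj ∘ inverse(src_obj) ∘ src_pose, so the end effector's
    pose relative to the object is the same as in the source demonstration.
    """
    delta = compose(new_obj, inverse(src_obj))
    return [compose(delta, pose) for pose in src_eef]


if __name__ == "__main__":
    # Source demo: approach and grasp an object at (1, 0) facing +x.
    source_segment = [(0.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
    # New scene: the same object now sits at (0, 2), rotated 90 degrees.
    for pose in retarget_segment(source_segment, (1.0, 0.0, 0.0), (0.0, 2.0, math.pi / 2)):
        print(tuple(round(v, 3) for v in pose))
```

In the example, the grasp pose lands exactly on the moved object and the approach pose stays 0.2 units "in front" of it along the object's rotated heading, which is the relative-pose invariance the transform is meant to preserve.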
Notably, the researchers compared data generated from two different source datasets and found the results comparable, suggesting that "(source) data quality may not be as important in the large-scale data regime."
The researchers also compared data generated from 10 human demonstrations with data generated from 200, and again found little difference. The paper therefore acknowledges that further research is needed into whether additional human demonstrations are redundant and simply add unnecessary annotation cost.
Why the intense focus on synthetic data? Besides the scarcity of source data mentioned at the start of this article, collecting data is extremely expensive and time-consuming. A system like MimicGen can automatically generate large, diverse datasets from only a small amount of data. These datasets span multiple scene configurations, object instances, and robot arms, and can also be used for long-horizon or high-precision tasks, which makes the approach "a powerful and economical way to scale up robot learning."
"Synthetic data will provide the next wave of terascale data for our 'hungry' models," NVIDIA senior scientist Jim Fan said when introducing MimicGen. "One of the key reasons robotics has developed so much more slowly than other AI fields is the lack of data: you cannot obtain (robot) control signals from the internet. We are rapidly running out of high-quality real data on the internet, and AI born from synthetic data will be the direction of future development."
Source: Science and Technology Innovation Board Daily