The world model shines! The realism of these 20+ autonomous driving scenarios is incredible...
Do you think this is an ordinary boring self-driving video?
Not a single frame is "real".
It can simulate different road conditions, varied weather, and more than 20 distinct scenarios, and the results look just like the real thing.
The world model once again demonstrates its power. This time, even LeCun retweeted it after seeing the results. The effect above comes from the latest version of GAIA-1.
At 9 billion parameters and trained on 4,700 hours of driving video, it takes video, text, or actions as input and generates autonomous driving videos.
The most direct benefit is better prediction of future events. By simulating more than 20 scenarios, it can further improve the safety of autonomous driving while reducing costs.
The team behind it says this will change the rules of the autonomous driving game!
How is GAIA-1 implemented? We have previously covered GAIA-1, the generative world model for autonomous driving developed by the Wayve team, in detail in Autonomous Driving Daily; if you are interested, you can read the relevant coverage on our official account.
GAIA-1 is a multi-modal generative world model that understands and generates the world by integrating multiple modalities of perception, such as vision, hearing, and language. The model uses deep learning to learn and reason about the structure and patterns of the world from large amounts of data. Its goal is to emulate human perception and cognition in order to better understand and interact with the world, with broad applications in fields including autonomous driving, robotics, and virtual reality. Through continued training and optimization, GAIA-1 is expected to keep evolving into a more intelligent and comprehensive world model.
It takes video, text, and action as input and generates realistic driving-scene videos, while allowing fine-grained control over the behavior of the self-driving vehicle and the characteristics of the scene. Videos can even be generated from text prompts alone.
The model's principle is similar to that of large language models: predict the next token.
The model uses vector quantization to discretize video frames into tokens, so that predicting future scenes is converted into predicting the next token in a sequence. A diffusion model is then used to generate high-quality videos from the world model's token space.
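To make the analogy concrete, here is a minimal sketch of the vector-quantization idea (illustrative only; the codebook size, feature dimension, and variable names are assumptions, not Wayve's actual implementation). Each frame's features are snapped to their nearest codebook entries, turning a video into a stream of integer tokens that a language-model-style predictor can consume:

```python
import torch

# Illustrative sketch of vector-quantizing frame features into discrete tokens.
# Codebook size (1024) and feature dim (64) are assumed values, not GAIA-1's.
torch.manual_seed(0)
codebook = torch.randn(1024, 64)        # 1024 learned code vectors
frame_features = torch.randn(16, 64)    # e.g. 16 patch features from one frame

# Quantize: replace each feature with the index of its nearest code vector.
dists = torch.cdist(frame_features, codebook)   # (16, 1024) pairwise distances
tokens = dists.argmin(dim=1)                    # (16,) integer token ids

print(tokens)  # the frame is now a short "sentence" of discrete tokens;
               # predicting the future = predicting the next tokens in this stream
```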
The specific steps are as follows:
The first step is easy to understand: encode the various inputs and arrange them into a sequence.
Specialized encoders project the different inputs into a shared representation: the text and video encoders tokenize and embed their inputs, while action representations are projected individually into the same shared space. These encoded representations are kept temporally consistent.
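A rough sketch of what "projecting into a shared representation" can look like (the dimensions, projection modules, and action parameterization below are made up for illustration and are not Wayve's code):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the real encoders and sizes are not specified here.
D_SHARED = 512
text_proj   = nn.Linear(768, D_SHARED)   # projects text-encoder outputs
video_proj  = nn.Linear(64,  D_SHARED)   # projects image-token embeddings
action_proj = nn.Linear(2,   D_SHARED)   # projects e.g. (speed, steering) actions

text_emb   = text_proj(torch.randn(1, 8, 768))   # 8 text tokens
video_emb  = video_proj(torch.randn(1, 16, 64))  # 16 image tokens for one frame
action_emb = action_proj(torch.randn(1, 1, 2))   # 1 action token per timestep

# Arrange everything as one temporally ordered sequence the world model can read.
shared_seq = torch.cat([text_emb, video_emb, action_emb], dim=1)
print(shared_seq.shape)  # torch.Size([1, 25, 512])
```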
After this arrangement comes the key component: the world model.
An autoregressive Transformer, it predicts the next set of image tokens in the sequence, conditioning not only on previous image tokens but also on the contextual information from text and actions. The generated content thus stays consistent not only with the images but also with the expected text and actions.
According to the team, the world model component of GAIA-1 has 6.5 billion parameters and was trained on 64 A100s for 15 days.
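Conceptually, the world model behaves like the toy causal Transformer below (a sketch under assumed sizes, not the 6.5-billion-parameter model itself): it attends over all earlier text, action, and image tokens and emits a distribution over the next image token.

```python
import torch
import torch.nn as nn

VOCAB, D = 1024, 256  # assumed token vocabulary and model width

embed = nn.Embedding(VOCAB, D)
layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
head  = nn.Linear(D, VOCAB)

context = torch.randint(0, VOCAB, (1, 32))  # interleaved text/action/image tokens
causal_mask = nn.Transformer.generate_square_subsequent_mask(32)

h = layer(embed(context), src_mask=causal_mask)  # causal self-attention over context
next_token = head(h[:, -1]).argmax(dim=-1)       # greedy pick of the next image token
```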
A video decoder and a video diffusion model then convert these tokens back into video. This step determines the semantic quality, image fidelity, and temporal consistency of the output.
GAIA-1's video decoder has 2.6 billion parameters and was trained for 15 days on 32 A100s.
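The decoding stage can be pictured as the standard diffusion loop sketched below. This is purely conceptual: `denoiser` is a dummy stand-in for the learned network, and the step count and update rule are arbitrary. Starting from noise, frames are iteratively refined while conditioning on the world model's predicted tokens.

```python
import torch

def denoiser(x, t, cond):
    # Stand-in for the learned denoising network: a real model predicts the
    # noise to subtract at step t, conditioned on the world-model tokens `cond`.
    return 0.1 * x  # dummy prediction for illustration only

frames = torch.randn(1, 3, 64, 64)       # start from pure noise
cond = torch.randint(0, 1024, (1, 16))   # image tokens predicted by the world model

for t in reversed(range(50)):            # iterative denoising steps
    frames = frames - denoiser(frames, t, cond)
# `frames` now stands in for one decoded, temporally consistent video frame
```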
It is worth noting that GAIA-1 is not only similar in principle to large language models; it also exhibits the same trait of generation quality improving as model scale grows.
The team compared the early version released in June with the latest one; the latter is 480 times larger than the former.
You can see at a glance that the videos have improved significantly in detail, resolution, and more.
GAIA-1 is also making an impact on practical applications; its team says it will change the rules of autonomous driving.
The reasons come from three aspects:
First, on safety: a world model can simulate the future, giving the AI the ability to anticipate the consequences of its own decisions, which is critical for safe autonomous driving.
Second, training data is critical for autonomous driving, and generated data is safer, cheaper, and infinitely scalable.
Generative AI can also address a major challenge facing autonomous driving: long-tail scenarios. It can cover more edge cases, such as pedestrians crossing the road in foggy weather, which will further improve autonomous driving performance.
GAIA-1 comes from British autonomous driving startup Wayve.
Wayve was founded in 2017; its investors include Microsoft, and its valuation has reached unicorn status.
The founders are Alex Kendall, the current CEO, and Amar Shah (the leadership page of the company's official website no longer lists them). Both graduated from Cambridge University with doctorates in machine learning.
On the technical roadmap, like Tesla, Wayve advocates a camera-only, purely visual approach, having abandoned high-precision maps early on and firmly committed to the "instant perception" route.
Not long ago, another large model released by the team, LINGO-1, also caused a sensation.
This self-driving model generates explanations in real time while driving, further improving the model's interpretability.
In March this year, Bill Gates also took a test ride in one of Wayve's self-driving cars.
Paper address: https://arxiv.org/abs/2309.17080
Original link: https://mp.weixin.qq.com/s/bwTDovx9-UArk5lx5pZPag