LeCun deeply impressed by self-driving unicorn's "fake" videos
Do you think this is an ordinary self-driving video?
Not a single frame of it is "real".
It can simulate different road conditions, various kinds of weather, and more than 20 types of scenarios, and the results look just like the real thing.
The world model has scored another big win! LeCun enthusiastically retweeted it after seeing the results.
The effects above come from the latest version of GAIA-1.
The model has reached 9 billion parameters and was trained on 4,700 hours of driving video; given video, text, or action inputs, it generates self-driving videos.
The most direct benefit is better prediction of future events: more than 20 kinds of scenarios can be simulated, further improving the safety of autonomous driving while lowering costs.
The team behind it bluntly stated that this will completely change the rules of the autonomous driving game!
So how is GAIA-1 implemented?
GAIA-1 is a multimodal generative world model.
Using video, text, and actions as input, it can generate realistic driving-scene videos with fine-grained control over the ego vehicle's behavior and scene characteristics.
Videos can even be generated from text prompts alone.
The model works on the same principle as a large language model: predicting the next token.
It uses vector quantization to discretize video frames into tokens, which turns predicting future scenes into predicting the next token in a sequence. A diffusion model is then used to generate high-quality videos from the world model's language space.
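To make the next-token idea concrete, here is a minimal PyTorch sketch of how frames can be vector-quantized into discrete tokens. The codebook size, layer shapes, and module names are illustrative assumptions, not Wayve's actual implementation.

```python
import torch
import torch.nn as nn

class FrameTokenizer(nn.Module):
    """Toy VQ tokenizer: maps an image to a grid of discrete codebook indices."""
    def __init__(self, codebook_size=1024, dim=256):
        super().__init__()
        # A small conv encoder downsamples each frame to a feature grid.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=4), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=4),
        )
        # Codebook of discrete embeddings -- the "vocabulary" of image tokens.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, frames):                  # frames: (B, 3, H, W)
        z = self.encoder(frames)                # (B, dim, h, w)
        z = z.flatten(2).transpose(1, 2)        # (B, h*w, dim)
        # Vector quantization = nearest-codebook-entry lookup.
        book = self.codebook.weight[None].expand(z.size(0), -1, -1)
        return torch.cdist(z, book).argmin(-1)  # (B, h*w) discrete token ids

tokenizer = FrameTokenizer()
frames = torch.randn(2, 3, 128, 128)            # two dummy video frames
print(tokenizer(frames).shape)                  # torch.Size([2, 64])
```

Once every frame is reduced to a short sequence of integers like this, "predicting the future" becomes the familiar language-modeling objective of predicting the next integer.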
The specific steps are as follows:
The first step is easy to understand: encode the various inputs and arrange them into a sequence.
Specialized encoders encode each type of input and project them into a shared representation: the text and video encoders tokenize and embed their inputs, while action representations are projected individually into the same shared space. These encoded representations are kept temporally consistent. A rough sketch of the idea follows.
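As an illustration of the shared representation, the sketch below embeds text tokens, image tokens, and an action vector into one common width and concatenates them into a single ordered sequence. The vocabulary sizes, the embedding width, and the two-dimensional (speed, steering) action format are assumptions for the example, not GAIA-1's real interface.

```python
import torch
import torch.nn as nn

D = 512                                   # shared embedding width (assumed)

text_encoder = nn.Embedding(32000, D)     # word-piece ids   -> embeddings
image_proj   = nn.Embedding(1024, D)      # VQ image tokens  -> embeddings
action_proj  = nn.Linear(2, D)            # (speed, steering) -> embedding

text_ids   = torch.randint(0, 32000, (1, 8))    # a short text prompt
img_tokens = torch.randint(0, 1024, (1, 64))    # one tokenized frame
actions    = torch.randn(1, 1, 2)               # one action vector

# Every modality lands in the same D-dim space, so the pieces can be
# interleaved into one ordered sequence for the world model to consume.
seq = torch.cat([text_encoder(text_ids),
                 image_proj(img_tokens),
                 action_proj(actions)], dim=1)
print(seq.shape)                                # torch.Size([1, 73, 512])
```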
With the sequence arranged, the key component, the world model, comes into play.
The world model is an autoregressive Transformer that predicts the next set of image tokens in the sequence, attending not only to the preceding image tokens but also to the contextual information from the text and actions.
The generated content therefore stays visually coherent while remaining consistent with the conditioning text and actions.
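A minimal sketch of such a causally masked, autoregressive Transformer is below. The hyperparameters are toy values chosen for illustration; the real world model is vastly larger, as the team's numbers below make clear.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Toy causal Transformer over the interleaved multimodal sequence."""
    def __init__(self, vocab=1024, d=512, heads=8, layers=4):
        super().__init__()
        block = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d, vocab)   # logits over the image-token vocabulary

    def forward(self, seq):               # seq: (B, T, d) embedded tokens
        T = seq.size(1)
        # Causal mask: each position may attend only to earlier positions.
        mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        return self.head(self.blocks(seq, mask=mask))

model = WorldModel()
seq = torch.randn(1, 73, 512)                 # e.g. 8 text + 64 image + 1 action
logits = model(seq)                           # (1, 73, 1024)
next_image_token = logits[:, -1].argmax(-1)   # greedy pick of the next token
print(next_image_token)
```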
According to the team, the world model inside GAIA-1 has 6.5 billion parameters and was trained for 15 days on 64 A100 GPUs.
Finally, a video decoder, implemented as a video diffusion model, converts these tokens back into video.
The importance of this step is that it ensures the semantic quality, image fidelity, and temporal consistency of the output.
GAIA-1's video decoder has 2.6 billion parameters and was trained on 32 A100s for 15 days.
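To give a feel for this decoding stage, the sketch below runs a toy conditional denoiser for a few reverse steps, turning noise plus token features into pixels. Real video diffusion decoders use learned noise schedules, temporal attention, and far larger networks, so everything here, from the conditioning scheme to the fixed step size, is a simplifying assumption.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy denoiser: predicts the noise in a frame given pooled token features."""
    def __init__(self, token_dim=256):
        super().__init__()
        self.cond = nn.Linear(token_dim, 16)       # token conditioning channels
        self.net = nn.Sequential(
            nn.Conv2d(3 + 16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x, tokens):                  # x: (B,3,H,W), tokens: (B,N,d)
        c = self.cond(tokens.mean(1))              # pool tokens -> (B, 16)
        c = c[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return self.net(torch.cat([x, c], dim=1))  # predicted noise

denoiser = Denoiser()
tokens = torch.randn(1, 64, 256)      # token features from the world model
x = torch.randn(1, 3, 32, 32)         # start from pure noise
for _ in range(4):                    # a few toy reverse-diffusion steps
    x = x - 0.25 * denoiser(x, tokens)
print(x.shape)                        # torch.Size([1, 3, 32, 32])
```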
It is worth mentioning that GAIA-1 is not only similar in principle to large language models, but also shows the same characteristic of generation quality improving as the model scale grows.
The team compared the early version released in June with the latest results; the latter is 480 times larger than the former.
The improvement in detail, resolution, and overall quality is plainly visible.
From the perspective of practical application, GAIA-1 also has real impact: the team behind it says it will change the rules of autonomous driving.
The reason can be explained from three aspects:
First, in terms of safety, a world model can simulate the future, giving the AI the ability to understand the consequences of its own decisions, which is critical to the safety of autonomous driving.
Second, training data is also vital for autonomous driving, and generated data is safer, cheaper, and infinitely scalable.
Third, generative AI can address the long-tail scenario challenge facing autonomous driving: it can cover more edge cases, such as pedestrians crossing the road in foggy weather, further improving the capabilities of autonomous driving.
GAIA-1 was developed by Wayve, a British self-driving startup.
Wayve was founded in 2017; its investors include Microsoft, and its valuation has reached unicorn status.
The founders, Alex Kendall and Amar Shah, both hold PhDs in machine learning from the University of Cambridge.
On the technical route, Wayve, like Tesla, advocates a pure-vision, camera-based approach; it abandoned high-definition maps early on and has firmly followed the "instant perception" route.
Not long ago, LINGO-1, another large model released by the team, also attracted widespread attention.
This autonomous driving model can generate commentary in real time while driving, further improving the model's explainability.
In March this year, Bill Gates also took a test drive in Wayve’s self-driving car.
Paper address: https://www.php.cn/link/1f8c4b6a0115a4617e285b4494126fbf