As the media frantically hypes Sora, OpenAI’s introductory material calls Sora a “world simulator.” The term world model has come into view again, but there are few articles introducing world models.
Here we review what a world model is and discuss whether Sora is a world simulator.
What are world models/world models
When the words world/world and environment/environment are mentioned in the field of AI, usually It is to distinguish it from the intelligent body/agent.
The fields where most agents are studied are reinforcement learning and robotics.
So we can see that world models and world modeling appear earliest and most often in papers in the field of robotics.
The word world models that has the greatest impact today may be this article named "world models" that Jurgen posted on arxiv in 2018. The article was eventually titled "Recurrent World Models" The title "Facilitate Policy Evolution" was published at NeurIPS'18.
The paper does not define what World models are, but instead makes an analogy to the mental model of the human brain in cognitive science, citing the 1971 of literature.
The mental model is the human brain’s mirror image of the surrounding world
The mental model introduced in Wikipedia, It is clearly pointed out that it may participate in cognition, reasoning, and decision-making processes. And when it comes to mental model, it mainly includes two parts: mental representations and mental simulation.
an internal representation of external reality, hypothesized to play a major role in cognition, reasoning and decision-making. The term was coined by Kenneth Craik in 1943 who suggested that the mind constructs " small-scale models" of reality that it uses to anticipate events.
It's still a bit confusing, but the structure diagram in the paper clearly explains what a world model is.
The vertical V->z in the figure is the low-dimensional representation of the observation, implemented with VAE, and the horizontal M->h-> M->h is the representation of the sequence that predicts the next moment, which is implemented using RNN. The two parts add up to the World Model.
In other words, the World model mainly includes state representation and transition model, which also corresponds to mental representations and mental simulation.
When you see the picture above, you may think, aren’t all sequence predictions world models?
In fact, students who are familiar with reinforcement learning can see at a glance that the structure of this picture is wrong (incomplete), and the real structure is the picture below. The input of RNN is not only It's z, and there's action. This is not the usual sequence prediction (will adding an action be very different? Yes, adding an action can allow the data distribution to change freely, which brings huge challenges).
#Jurgen’s paper belongs to the field of reinforcement learning.
So, aren’t there many model-based RL in reinforcement learning? What is the difference between the model and the world model? The answer is there is no difference, it is the same thing. Jurgen first said a paragraph
The basic meaning is that no matter how many model-based RL work, I am the RNN pioneer, RNN is the one who makes the model. Invented, I just want to do it.
In the early version of Jurgen's article, he also mentioned a lot of model-based RL. Although he learned the model, he did not fully train RL in the model.
The RL is not fully trained in the model. In fact, it is not the difference between the models of model-based RL, but the long-standing frustration of the model-based RL direction: the model is not accurate enough and the training is completely in the model. The RL effect is very poor. This problem has only been solved in recent years.
The smart Sutton realized the problem of inaccurate model a long time ago. In 1990, the paper Integrated Architectures for Learning, Planning and Reacting based on Dynamic Programming that proposed the Dyna framework (published on ICML, which was the first workshop to be a conference), called this model an action model, emphasizing predicting the results of action execution.
RL learns from real data (line 3) while learning from the model (line 5) to prevent inaccurate model learning from poor strategy.
#As you can see, the world model is very important for decision-making. If you can obtain an accurate world model, you can find the optimal decision in reality by trial and error in the world model.
This is the core function of the world model: counterfactual reasoning/Counterfactual reasoning, that is, even for decisions that have not been seen in the data, decisions can be inferred in the world model the result of.
Students who understand causal reasoning will be familiar with the term counterfactual reasoning. In the popular science book The book of why, Turing Award winner Judea Pearl draws a causal ladder, with the lowest level It is "association", which is what most prediction models today are mainly doing; the middle layer is "intervention", and exploration in reinforcement learning is a typical intervention; the top layer is counterfactual, answering the what if question through imagination. The schematic diagram Judea drew for counterfactual reasoning is what scientists imagine in their brains, which is similar to the schematic diagram Jurgen used in his paper.
Left: Schematic diagram of the world model in Jurgen’s paper. Right: The ladder of cause and effect in Judea’s book.
We can conclude here that AI researchers’ pursuit of world models is an attempt to transcend data, conduct counterfactual reasoning, and pursue the ability to answer what if questions. This is an ability that humans naturally have, but the current AI is still very poor at it. Once a breakthrough is made, AI decision-making capabilities will be greatly improved, enabling scenario applications such as fully autonomous driving.
Is Sora a world simulator
The word simulator appears more in the engineering field, and it works the same as a world model. Try those things that are difficult to High-cost, high-risk trial and error of real-world implementation. OpenAI seems to want to re-form a phrase, but the meaning remains the same.
The video generated by Sora can only be guided by vague prompt words, making it difficult to control accurately. Therefore, it is more of a video tool and is difficult to use as a counterfactual reasoning tool to accurately answer what if questions.
It is even difficult to evaluate how strong Sora’s generation ability is, because it is completely unclear how different the demo video is from the training data.
What’s even more disappointing is that these demos show that Sora has not accurately learned the laws of physics. I have seen someone point out the inconsistency with physical laws in the videos generated by Sora [OpenAI releases Vincent video model Sora, AI can understand the physical world in motion. Is this a world model? What does it mean? ]
I guess that OpenAI releases these demos based on very sufficient training data, even including data generated by CG. However, even so, the physical laws that can be described by equations with a few variables are still not grasped.
OpenAI believes that Sora proves a route to simulators of the physical world, but it seems that simply stacking data is not the path to more advanced intelligent technology.
The above is the detailed content of Nanda Yu Yang's in-depth interpretation: What is a 'world model”?. For more information, please follow other related articles on the PHP Chinese website!

Since 2008, I've championed the shared-ride van—initially dubbed the "robotjitney," later the "vansit"—as the future of urban transportation. I foresee these vehicles as the 21st century's next-generation transit solution, surpas

Revolutionizing the Checkout Experience Sam's Club's innovative "Just Go" system builds on its existing AI-powered "Scan & Go" technology, allowing members to scan purchases via the Sam's Club app during their shopping trip.

Nvidia's Enhanced Predictability and New Product Lineup at GTC 2025 Nvidia, a key player in AI infrastructure, is focusing on increased predictability for its clients. This involves consistent product delivery, meeting performance expectations, and

Google's Gemma 2: A Powerful, Efficient Language Model Google's Gemma family of language models, celebrated for efficiency and performance, has expanded with the arrival of Gemma 2. This latest release comprises two models: a 27-billion parameter ver

This Leading with Data episode features Dr. Kirk Borne, a leading data scientist, astrophysicist, and TEDx speaker. A renowned expert in big data, AI, and machine learning, Dr. Borne offers invaluable insights into the current state and future traje

There were some very insightful perspectives in this speech—background information about engineering that showed us why artificial intelligence is so good at supporting people’s physical exercise. I will outline a core idea from each contributor’s perspective to demonstrate three design aspects that are an important part of our exploration of the application of artificial intelligence in sports. Edge devices and raw personal data This idea about artificial intelligence actually contains two components—one related to where we place large language models and the other is related to the differences between our human language and the language that our vital signs “express” when measured in real time. Alexander Amini knows a lot about running and tennis, but he still

Caterpillar's Chief Information Officer and Senior Vice President of IT, Jamie Engstrom, leads a global team of over 2,200 IT professionals across 28 countries. With 26 years at Caterpillar, including four and a half years in her current role, Engst

Google Photos' New Ultra HDR Tool: A Quick Guide Enhance your photos with Google Photos' new Ultra HDR tool, transforming standard images into vibrant, high-dynamic-range masterpieces. Ideal for social media, this tool boosts the impact of any photo,


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Notepad++7.3.1
Easy-to-use and free code editor

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.