Nanda Yu Yang's in-depth interpretation: What is a "world model"?

WBOY

Feb 27, 2024 pm 07:50 PM

As the media frantically hypes Sora, OpenAI’s introductory material calls Sora a “world simulator.” The term world model has come into view again, but there are few articles introducing world models.

Here we review what a world model is and discuss whether Sora is a world simulator.

What is a world model?

When the words "world" and "environment" are used in AI, it is usually to distinguish them from the agent.

Agents are studied most in reinforcement learning and robotics.

So world models and world modeling appear earliest and most often in robotics papers.

The use of the term with the greatest influence today is probably the paper titled "World Models" that Jurgen posted on arXiv in 2018. It was eventually published at NeurIPS'18 under the title "Recurrent World Models Facilitate Policy Evolution".

The paper does not define what a world model is. Instead it draws an analogy to the mental model of the human brain in cognitive science, citing literature from 1971.

The mental model is the human brain's mirror image of the surrounding world.

The Wikipedia entry on mental models states clearly that a mental model may take part in cognition, reasoning, and decision-making. A mental model mainly consists of two parts: mental representations and mental simulation.

an internal representation of external reality, hypothesized to play a major role in cognition, reasoning and decision-making. The term was coined by Kenneth Craik in 1943 who suggested that the mind constructs "small-scale models" of reality that it uses to anticipate events.

It's still a bit confusing, but the structure diagram in the paper clearly explains what a world model is.

[Figure: structure diagram of the world model from the paper]

The vertical path V->z in the figure is the low-dimensional representation of the observation, implemented with a VAE. The horizontal path M->h->M->h is the sequence representation that predicts the next time step, implemented with an RNN. The two parts together make up the world model.

In other words, a world model mainly consists of a state representation and a transition model, which correspond to mental representations and mental simulation respectively.

When you see the picture above, you may think, aren’t all sequence predictions world models?

In fact, readers familiar with reinforcement learning can see at a glance that the structure in this figure is wrong (incomplete); the real structure is the one in the figure below. The input of the RNN is not only z but also the action. This is not ordinary sequence prediction. (Does adding an action make a big difference? Yes: adding an action lets the data distribution change freely, which brings huge challenges.)
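The structure described above (a representation V that compresses the observation into z, and a transition model M that takes the hidden state, z, and the action) can be sketched in a few lines. This is only a toy illustration, not the paper's implementation: the class name `TinyWorldModel`, its dimensions, and the fixed random linear maps that stand in for the VAE and the RNN are all assumptions made for the example.

```python
import math
import random


class TinyWorldModel:
    """Toy stand-in for the V (representation) + M (transition) structure.

    Hypothetical illustration: the VAE and the RNN are replaced by fixed
    random linear maps so that only the data flow is visible.
    """

    def __init__(self, obs_dim, latent_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.enc = mat(latent_dim, obs_dim)                       # V: observation -> z
        self.rec = mat(hidden_dim, hidden_dim + latent_dim + 1)   # M: (h, z, action) -> h'
        self.out = mat(latent_dim, hidden_dim)                    # h' -> predicted next z
        self.hidden_dim = hidden_dim

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def encode(self, obs):
        """State representation: compress the observation into z."""
        return self._matvec(self.enc, obs)

    def step(self, h, z, action):
        """Transition model: the input is (h, z, action), not z alone."""
        h_next = [math.tanh(x) for x in self._matvec(self.rec, h + z + [action])]
        z_pred = self._matvec(self.out, h_next)
        return h_next, z_pred


model = TinyWorldModel(obs_dim=4, latent_dim=2, hidden_dim=3)
obs = [1.0, 0.0, -1.0, 0.5]
z = model.encode(obs)
h0 = [0.0] * model.hidden_dim

# Same observation, two different actions: the predicted next state differs.
_, z_if_left = model.step(h0, z, action=-1.0)
_, z_if_right = model.step(h0, z, action=+1.0)
print(z_if_left, z_if_right)
```

Because the action is part of the input, the same observation history can lead to different predicted futures, which is what separates this from plain sequence prediction.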

[Figure: the complete structure, in which the RNN also takes the action as input]

Jurgen's paper belongs to the field of reinforcement learning.

So, isn't there already a lot of model-based RL in reinforcement learning? What is the difference between the model there and a world model? The answer is that there is no difference; they are the same thing. Jurgen addresses this first in a passage of the paper.

The gist is: however much model-based RL work there is, I am the RNN pioneer, using an RNN as the model is my invention, and this is how I want to do it.

An early version of Jurgen's paper also noted that much model-based RL work learns a model but does not train the RL policy entirely inside that model.

Not training RL entirely inside the model is not really a difference between a world model and the models of model-based RL. It reflects a long-standing frustration of the model-based RL direction: the model is not accurate enough, so RL trained entirely inside the model performs very poorly. This problem has only been solved in recent years.

Sutton recognized the problem of inaccurate models long ago. In 1990, his paper "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming", which proposed the Dyna framework (published at ICML, in the first year the event changed from a workshop to a conference), called this model an action model, emphasizing that it predicts the results of executing actions.

RL learns from real data (line 3 of the algorithm) while also learning from the model (line 5), so that an inaccurate model does not lead to a poor policy.
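The idea of learning from real data and from the model at the same time can be sketched as tabular Dyna-Q. This is a minimal sketch under assumptions, not a reproduction of the listing in Sutton's paper: the chain environment, the function name `dyna_q`, and all hyperparameters are made up for illustration, and the comments mark the two learning paths rather than the original line numbers.

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical chain: states 0..n-1, reward 1 at the right end."""
    rng = random.Random(seed)
    actions = (-1, +1)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned action model: (state, action) -> (reward, next_state)

    def env_step(s, a):
        """The real environment (deterministic for simplicity)."""
        s2 = min(max(s + a, 0), goal)
        return (1.0 if s2 == goal else 0.0), s2

    def update(s, a, r, s2):
        """One Q-learning update, used for both real and simulated experience."""
        target = r if s2 == goal else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    def greedy(s):
        best = max(q[(s, b)] for b in actions)
        return rng.choice([b for b in actions if q[(s, b)] == best])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = env_step(s, a)
            update(s, a, r, s2)               # learn from real data
            model[(s, a)] = (r, s2)           # refine the action model
            for _ in range(planning_steps):   # learn from the model
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q


q = dyna_q()
policy = [max((-1, +1), key=lambda a: q[(s, a)]) for s in range(5)]
print(policy)
```

Only state-action pairs that were actually visited enter the model, so the simulated updates never stray far from real experience; that is one way to limit the damage an inaccurate model can do.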

[Figure: the Dyna algorithm referenced above]

As you can see, the world model is very important for decision-making. If you can obtain an accurate world model, you can find the optimal decision for the real world by trial and error inside the world model.

This is the core function of the world model: counterfactual reasoning. Even for decisions that have never been seen in the data, the world model can infer what their results would be.
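A tiny worked example can make this concrete. The sketch below assumes, purely for illustration, that the true dynamics have the form s_next = s + b * a; the functions `fit_gain` and `rollout` and the logged transitions are hypothetical. The logged data only contains the actions 0.5 and 1.0, yet the fitted model can still be asked what would have happened under actions that never appear in the data.

```python
def fit_gain(transitions):
    """Least-squares fit of b in the assumed dynamics s_next = s + b * a."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den


def rollout(b, s0, actions):
    """Imagine a trajectory inside the learned model; nothing is executed for real."""
    s, path = s0, [s0]
    for a in actions:
        s = s + b * a
        path.append(s)
    return path


# Logged (s, a, s_next) tuples: only the actions 0.5 and 1.0 were ever tried.
logged = [(0.0, 1.0, 2.0), (2.0, 0.5, 3.0), (3.0, 1.0, 5.0)]
b = fit_gain(logged)

# Counterfactual question: what if we had applied -1.0 twice, starting from s = 5?
imagined = rollout(b, 5.0, [-1.0, -1.0])

# Trial and error inside the model: which single action brings s = 5 closest to 1?
candidates = [-2.0, -1.0, 0.0, 1.0]
best = min(candidates, key=lambda a: abs(rollout(b, 5.0, [a])[-1] - 1.0))
print(b, imagined, best)
```

The answer is only as trustworthy as the assumed model form: here the extrapolation works because the linear structure happens to match the data-generating process, which is exactly the part that is hard to guarantee in practice.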

Readers who know causal inference will be familiar with the term counterfactual reasoning. In the popular science book The Book of Why, Turing Award winner Judea Pearl draws a ladder of causation. The lowest rung is "association", which is what most prediction models today mainly do; the middle rung is "intervention", of which exploration in reinforcement learning is a typical example; the top rung is "counterfactuals", which answers what-if questions through imagination. The illustration Judea drew for counterfactual reasoning shows a scientist imagining things in his head, which resembles the schematic Jurgen used in his paper.

Left: Schematic diagram of the world model in Jurgen’s paper. Right: The ladder of cause and effect in Judea’s book.

We can conclude here that AI researchers' pursuit of world models is an attempt to go beyond the data, to reason counterfactually, and to answer what-if questions. Humans have this ability naturally, but current AI is still very poor at it. Once a breakthrough is made, AI decision-making will improve greatly, enabling applications such as fully autonomous driving.

Is Sora a world simulator?

The word simulator appears more often in engineering, and it serves the same purpose as a world model: trying out things whose trial and error would be too costly or too risky to carry out in the real world. OpenAI seems to want to coin a new phrase, but the meaning remains the same.

The video generated by Sora can only be steered by vague prompts, which makes it difficult to control precisely. It is therefore more of a video tool, and it is hard to use as a counterfactual reasoning tool that accurately answers what-if questions.

It is even difficult to evaluate how strong Sora’s generation ability is, because it is completely unclear how different the demo video is from the training data.

What is even more disappointing is that these demos show that Sora has not accurately learned the laws of physics. I have seen someone point out inconsistencies with physical laws in videos generated by Sora [OpenAI releases text-to-video model Sora; AI can understand the physical world in motion. Is this a world model? What does it mean?]

My guess is that OpenAI released these demos on the basis of very ample training data, possibly even including CG-generated data. Even so, physical laws that can be described by equations with just a few variables have still not been grasped.

OpenAI believes that Sora proves a route to simulators of the physical world, but it seems that simply stacking data is not the path to more advanced intelligent technology.

Statement

This article is reproduced from 51CTO.COM.
