


ACL 2024 paper concludes: large language model ≠ world simulator. Yann LeCun: That's so right
If GPT-4 is only about 60% accurate when simulating state changes in common-sense tasks, should we still consider using large language models as world simulators?
Address: https://x.com/peterjansen_ai/status/1801687501557665841
However, some people have expressed a different view: if current LLMs, without task-specific training, can already reach about 60% accuracy, doesn't that mean they are a "world model" at least to some extent? That accuracy should also keep improving as LLMs iterate. LeCun has also stated that the world model will not be an LLM.
Back to the paper: the researchers built a new benchmark called "ByteSized32-State-Prediction", which consists of a dataset of text game state transitions and the accompanying game tasks. They use this benchmark to directly quantify, for the first time, how well large language models (LLMs) perform as text-based world simulators.
By testing GPT-4 on this dataset, the researchers found that, although its performance is impressive, it remains an unreliable world simulator without further innovation.
The researchers therefore believe that their work provides both new insights into the capabilities and weaknesses of current LLMs and a new benchmark for tracking future progress as new models emerge.
The paper describes three simulators:
- Action-driven transition simulator: given the context $c$, $s_t$ and $a_t$, $F_{act}: C \times S \times A \to S$ predicts $s^{act}_{t+1}$, where $s^{act}_{t+1}$ is the direct state change caused by the action.
- Environment-driven transition simulator: given $c$ and $s^{act}_{t+1}$, $F_{env}: C \times S \to S$ predicts $s_{t+1}$, where $s_{t+1}$ is the state that results after any environment-driven transitions.
- Game progress simulator: given $c$, $s_{t+1}$ and $a_t$, $F_R: C \times S \times A \to R \times \{0,1\}$ predicts the reward $r_{t+1}$ and the game completion status $d_{t+1}$.
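The way the three simulators chain together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy stove-and-pot rules, the function names `f_act`, `f_env`, `f_r` and `simulate_step`, and the dictionary layout of the state are all hypothetical stand-ins, and in the paper's setting an LLM, not hand-written rules, would play the role of each function.

```python
from typing import Any, Dict, Tuple

# Hypothetical state layout: {object name: {property: value}}
State = Dict[str, Dict[str, Any]]


def _copy_state(state: State) -> State:
    """Copy the two-level state dict so inputs are never mutated."""
    return {name: dict(props) for name, props in state.items()}


def f_act(context: str, state: State, action: str) -> State:
    """Action-driven transition: apply only the direct effect of the action."""
    s_act = _copy_state(state)
    if action == "turn on stove":  # toy rule, assumed for illustration
        s_act["stove"]["is_on"] = True
    return s_act


def f_env(context: str, s_act: State) -> State:
    """Environment-driven transition: effects that follow without an action."""
    s_next = _copy_state(s_act)
    if s_next["stove"]["is_on"]:  # toy rule: a lit stove heats the pot
        s_next["pot"]["temperature"] += 10
    return s_next


def f_r(context: str, s_next: State, action: str) -> Tuple[float, bool]:
    """Game progress: reward r_{t+1} and completion flag d_{t+1}."""
    done = s_next["pot"]["temperature"] >= 100  # toy goal: boil the water
    return (1.0 if done else 0.0), done


def simulate_step(context: str, state: State, action: str):
    """Compose the three simulators into one full step."""
    s_act = f_act(context, state, action)
    s_next = f_env(context, s_act)
    reward, done = f_r(context, s_next, action)
    return s_next, reward, done
```

Keeping the action-driven and environment-driven steps separate makes it possible to score each one on its own, which is the point of the decomposition.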
Two output formats are considered:
- Full state prediction: the LLM outputs the complete state.
- State difference prediction: the LLM outputs only the difference between the input and output states.
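The contrast between the two output formats can be made concrete with a small sketch. It assumes a state is a nested dictionary of objects and their properties; the helper names `state_diff` and `apply_diff` are hypothetical, and this simplified version only tracks added or changed properties, not removed ones.

```python
from typing import Any, Dict

# Hypothetical state layout: {object name: {property: value}}
State = Dict[str, Dict[str, Any]]


def state_diff(before: State, after: State) -> State:
    """Return only the properties that were added or changed in `after`."""
    diff: State = {}
    for name, props in after.items():
        old = before.get(name, {})
        changed = {k: v for k, v in props.items() if k not in old or old[k] != v}
        if changed:
            diff[name] = changed
    return diff


def apply_diff(before: State, diff: State) -> State:
    """Rebuild the full next state from the previous state plus a diff."""
    result = {name: dict(props) for name, props in before.items()}
    for name, changed in diff.items():
        result.setdefault(name, {}).update(changed)
    return result


before = {"stove": {"is_on": True}, "pot": {"temperature": 20}}
after = {"stove": {"is_on": True}, "pot": {"temperature": 30}}

diff = state_diff(before, after)          # the "state difference" output
full = apply_diff(before, diff)           # equivalent "full state" output
```

Here the difference output contains only the pot's new temperature, while the full state output repeats the unchanged stove entry as well.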
Each state records two kinds of information:
- Object properties: all objects in the game, the properties of each object (such as temperature and size), and its relationships with other objects (such as being in or on another object).
- Game progress: the status of the agent relative to the overall goal, including the current accumulated reward, whether the game has terminated, and whether the overall goal has been achieved.
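One way to picture such a state is as a small structured record. The sketch below is an assumption-laden illustration, not the benchmark's actual schema: the class names `GameObject`, `GameProgress` and `GameState`, and every field name, are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class GameObject:
    name: str
    # Per-object properties, e.g. {"temperature": 20, "size": "small"}
    properties: Dict[str, Any] = field(default_factory=dict)
    # Names of objects that are in or on this object
    contains: List[str] = field(default_factory=list)


@dataclass
class GameProgress:
    score: float = 0.0       # accumulated reward so far
    game_over: bool = False  # whether the game has terminated
    game_won: bool = False   # whether the overall goal was achieved


@dataclass
class GameState:
    objects: List[GameObject]
    progress: GameProgress

    def to_json(self) -> str:
        """Serialize the whole state, e.g. to place it in an LLM prompt."""
        return json.dumps(asdict(self), sort_keys=True)


state = GameState(
    objects=[
        GameObject("stove", {"is_on": False}, contains=["pot"]),
        GameObject("pot", {"temperature": 20}),
    ],
    progress=GameProgress(),
)
serialized = state.to_json()
```

Separating object properties from game progress mirrors the two categories above and lets predictions for each be checked independently.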
