Learning like a baby, DeepMind's new model learns the rules of the physical world in 28 hours

DeepMind aims to build a model that can learn intuitive physics, and to analyze why the model acquires this ability.

From AlphaFold to mathematical reasoning, DeepMind has been trying to combine AI and basic science. Now, DeepMind has created a new model that can learn simple physical rules.

Developmental psychologists tested and analyzed how babies follow the movement of objects through their gaze. For example, children expressed surprise when a video was played in which a ball suddenly disappeared.

Computer scientist Luis Piloto of DeepMind and colleagues hope to develop similar tests for artificial intelligence (AI). The team trained a neural network on animated videos of simple objects such as cubes and balls, and the model learned by discovering patterns in large amounts of data. The research paper was published July 11 in Nature Human Behaviour.

  • Paper address: https://www.nature.com/articles/s41562-022-01394-8
  • Dataset address: https://github.com/deepmind/physical_concepts

The model learns physics by auto-encoding and tracking objects, hence the name PLATO (Physics Learning through Auto-encoding and Tracking Objects). PLATO receives the raw frames of a video together with a version of each frame in which a mask highlights every object in the scene. PLATO aims to develop internal representations of the physical properties of objects, such as their position and velocity.

The system was trained on approximately 30 hours of video showing simple mechanisms of motion (such as a ball rolling down a slope) and developed the ability to predict how these objects would behave in different situations. In particular, PLATO learned concepts such as continuity (an object's trajectory is uninterrupted), solidity, and the persistence of an object's shape. As a video plays, the model's predictions become more accurate.

When playing videos with "impossible" events, such as an object suddenly disappearing, PLATO can measure the difference between the video and its own predictions, thus providing a measure of "surprise."
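The idea of surprise as a gap between prediction and observation can be illustrated with a minimal sketch. It assumes frames are small 2D lists of pixel intensities and uses summed squared error as the surprise score; the function names `frame_error` and `video_surprise` are hypothetical and are not taken from the PLATO code.

```python
def frame_error(predicted, observed):
    """Sum of squared pixel differences between two equally sized frames."""
    return sum(
        (p - o) ** 2
        for pred_row, obs_row in zip(predicted, observed)
        for p, o in zip(pred_row, obs_row)
    )


def video_surprise(predicted_frames, observed_frames):
    """Total prediction error accumulated over all frames of a video."""
    return sum(
        frame_error(pred, obs)
        for pred, obs in zip(predicted_frames, observed_frames)
    )


# Toy 1x3 "frames": a bright pixel (the ball) moves one step right per frame.
predicted = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
possible = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]    # ball keeps moving
impossible = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 0]]]  # ball vanishes

print(video_surprise(predicted, possible))    # 0
print(video_surprise(predicted, impossible))  # 1
```

The video in which the ball vanishes departs from the prediction in its last frame, so it receives the higher surprise score.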

Piloto said: "PLATO was not designed as a model of infant behavior, but it can test hypotheses about how human infants learn. We hope that cognitive scientists will eventually use it to simulate infant behavior."

Jeff Clune, a computer scientist at the University of British Columbia, said, "Comparing AI with the learning methods of human infants is an important research direction. PLATO researchers hand-designed much of the prior knowledge that gives the artificial intelligence model advantages." Researchers like Clune are trying to let programs develop their own algorithms to understand the physical world.

Using knowledge from developmental psychology

In order to pursue richer physical intuition in AI systems, DeepMind’s research team draws inspiration from developmental psychology. The research team built a deep learning system that incorporates a core insight from developmental psychology, namely that physics is understood at the level of discrete objects and their interactions.

The core of intuitive physics relies on a discrete set of concepts (e.g., object persistence, solidity, continuity) that can be distinguished, manipulated, and probed individually. Standard approaches to learning intuitive physics in AI assess knowledge of the physical world through video or state prediction, binary outcome prediction, question-answering performance, or reinforcement learning tasks. These approaches appear to require an understanding of some aspects of intuitive physics, but they do not explicitly operationalize or systematically probe a clear set of concepts.

Developmental psychology, on the other hand, holds that a physical concept corresponds to a set of expectations about how the future will unfold. For example, people expect that objects will not magically teleport from one place to another but will trace a continuous path through time and space, which gives rise to the concept of continuity. This suggests a way to measure knowledge of a specific physical concept: the violation-of-expectation (VoE) paradigm.

When exploring a specific concept using the VoE paradigm, researchers show infants visually similar arrays (called probes) that are either consistent with the physical concept (physically possible) or inconsistent with it (physically impossible). In this paradigm, "surprise" is measured by gaze duration.
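The paired comparison at the heart of the VoE paradigm can be sketched in a few lines. This is an illustration under stated assumptions: each probe pair is reduced to two surprise scores (gaze duration for an infant, prediction error for a model), the scores are invented, and the helpers `voe_effects` and `voe_accuracy` are hypothetical names rather than the paper's exact metrics.

```python
from statistics import mean


def voe_effects(probe_pairs):
    """Per-pair VoE effect: surprise(impossible) - surprise(possible)."""
    return [impossible - possible for possible, impossible in probe_pairs]


def voe_accuracy(probe_pairs):
    """Fraction of pairs in which the impossible probe is more surprising."""
    hits = sum(1 for possible, impossible in probe_pairs if impossible > possible)
    return hits / len(probe_pairs)


# Hypothetical surprise scores (possible, impossible) for four probe pairs.
pairs = [(0.25, 1.0), (0.5, 0.75), (0.5, 0.25), (0.125, 0.625)]

effects = voe_effects(pairs)
print(effects)              # [0.75, 0.25, -0.25, 0.5]
print(mean(effects))        # 0.3125
print(voe_accuracy(pairs))  # 0.75
```

A positive average effect means the impossible probes were, on the whole, more surprising than their visually matched possible counterparts.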

Method Introduction

First, DeepMind introduced a rich video corpus, the Physical Concepts dataset. This dataset contains VoE probe videos targeting five important physical concepts that developmental psychology considers core elements. The first three are continuity, object persistence, and solidity. The fourth is unchangeableness, which captures the notion that certain object properties (such as shape) do not change. The fifth is directional inertia, which involves the expectation that a moving object changes direction only in ways consistent with the principle of inertia.

Importantly, the Physical Concepts dataset also includes a separate video corpus for use as training data. These videos show a variety of procedurally generated physical events.

Figure 2: Example of video dataset used to train the model

PLATO model architecture

DeepMind aims to build a model that learns intuitive physics, and to analyze why the model acquires this capability. The PLATO model instantiates several ideas from advanced AI systems, organized into the three components described below.

The first is object individuation. This process carves continuous visual sensory input into a set of discrete entities, each with a corresponding set of attributes. In PLATO, the perception module decomposes each segmented video frame into a set of object codes (Figure 3a-c), providing a mapping from visual input to individual objects. PLATO does not learn to segment the scene; instead, given a segmentation mask for an object, it learns a compressed representation of that object.
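The mapping from a segmented frame to per-object codes can be made concrete with a toy sketch. It is not PLATO's perception module, which learns its compressed representation; here a hand-written summary (centroid, area, mean intensity) stands in for the learned code, and the function name `object_codes` and the input layout are assumptions made for illustration.

```python
def object_codes(frame, masks):
    """Turn one frame plus per-object masks into a dict of object codes.

    frame: 2D list of pixel intensities.
    masks: dict mapping object id -> 2D list of 0/1 with the frame's shape.
    Each code is (centroid_row, centroid_col, area, mean_intensity), a
    hand-written stand-in for a learned compressed representation.
    """
    codes = {}
    for obj_id, mask in masks.items():
        pixels = [
            (r, c, frame[r][c])
            for r, row in enumerate(mask)
            for c, inside in enumerate(row)
            if inside
        ]
        if not pixels:  # object not visible in this frame
            continue
        area = len(pixels)
        codes[obj_id] = (
            sum(r for r, _, _ in pixels) / area,
            sum(c for _, c, _ in pixels) / area,
            area,
            sum(v for _, _, v in pixels) / area,
        )
    return codes


frame = [
    [0, 0, 0, 0],
    [0, 8, 8, 0],
    [0, 0, 0, 4],
]
masks = {
    "ball": [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    "cube": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
}
print(object_codes(frame, masks))
# {'ball': (1.0, 1.5, 2, 8.0), 'cube': (2.0, 3.0, 1, 4.0)}
```

The point of the sketch is the interface: the segmentation is supplied as input, and the output is one compact code per object rather than one code per frame.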

The second is object tracking (or object indexing), which assigns an index to each object so that object percepts can be matched across time and dynamic attributes can be computed (Figure 3b, c). In PLATO, object codes are accumulated and tracked across frames in the object buffer (Figure 3d).
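A buffer of this kind can be sketched as a small data structure. The class below is a hypothetical illustration, not PLATO's implementation: it assumes each object code is simply an (x, y) position and that object indices are already given, and it derives velocity, one example of a dynamic attribute, from the last two stored positions.

```python
from collections import defaultdict


class ObjectBuffer:
    """Accumulates per-object codes across frames, keyed by object index."""

    def __init__(self):
        self._history = defaultdict(list)

    def add_frame(self, frame_index, codes):
        """Store the codes for one frame; codes maps object index -> code."""
        for obj_index, code in codes.items():
            self._history[obj_index].append((frame_index, code))

    def track(self, obj_index):
        """Return the time-ordered list of (frame_index, code) for one object."""
        return list(self._history.get(obj_index, []))

    def velocity(self, obj_index):
        """Estimate velocity from the last two stored positions, if available."""
        history = self._history.get(obj_index, [])
        if len(history) < 2:
            return None
        (t0, p0), (t1, p1) = history[-2], history[-1]
        return tuple((b - a) / (t1 - t0) for a, b in zip(p0, p1))


buffer = ObjectBuffer()
buffer.add_frame(0, {0: (0.0, 0.0), 1: (5.0, 5.0)})
buffer.add_frame(1, {0: (1.0, 0.5), 1: (5.0, 5.0)})
buffer.add_frame(2, {0: (2.0, 1.0)})  # object 1 is hidden in this frame

print(buffer.track(0))     # [(0, (0.0, 0.0)), (1, (1.0, 0.5)), (2, (2.0, 1.0))]
print(buffer.velocity(0))  # (1.0, 0.5)
print(buffer.velocity(1))  # (0.0, 0.0)
```

Because the history is keyed by object index, an object that is hidden for a frame keeps its earlier entries, which is what makes attributes such as velocity computable across time.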

The last component is relational processing of the tracked objects. This process is inspired by the "physical reasoning system" proposed in developmental psychology, which dynamically processes object representations, generating new representations that are shaped by each object's relationships and interactions with other objects.

PLATO learns the interactions between object memory and the history of object percepts (Figure 3d) in order to predict the next video frame for each object and to update the object-based memory.

Figure 3: PLATO comprises two components: a perception module (left) and a dynamics predictor (right)

Experimental results

In testing, PLATO showed robust VoE effects in all five probe categories when trained with five different random seeds.

Figure 5: PLATO shows robust performance on the probes of the Physical Concepts dataset.

The training corpus in the Physical Concepts dataset contains a total of 300,000 videos. By a conservative calculation, that is approximately 52 days of continuous visual experience. From both an AI and a developmental perspective, this raises the question of how much training data is actually needed to produce a VoE effect at test time. To evaluate this, DeepMind trained three random seeds of PLATO's dynamics predictor on datasets of decreasing size (Figure 6), calculating a grand average of the VoE effects across all five probe categories.

Results show robust VoE effects in DeepMind's models after training on as few as 50,000 examples (equivalent to 28 hours of visual experience).
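The grand average described above can be computed as a mean over probe categories followed by a mean over seeds. The sketch below shows one way to do so; the order of averaging is an assumption, the helper `grand_average` is a hypothetical name, and the numbers are placeholders for illustration only, not results from the paper.

```python
from statistics import mean


def grand_average(effects_by_seed):
    """Average VoE effect over probe categories, then over seeds.

    effects_by_seed: list (one entry per seed) of dicts mapping
    probe category -> mean VoE effect for that seed.
    """
    per_seed = [mean(list(by_category.values())) for by_category in effects_by_seed]
    return mean(per_seed)


# Placeholder numbers for illustration only, not results from the paper.
results = {
    50_000: [
        {"continuity": 0.5, "solidity": 0.25},
        {"continuity": 0.25, "solidity": 0.5},
    ],
    300_000: [
        {"continuity": 0.75, "solidity": 0.5},
        {"continuity": 0.5, "solidity": 0.75},
    ],
}
for size, effects_by_seed in results.items():
    print(size, grand_average(effects_by_seed))
# 50000 0.375
# 300000 0.625
```

Plotting this single number against training-set size gives the kind of curve a data-efficiency analysis relies on.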

Figure 6: PLATO shows robust VoE effects after just 28 hours of visual experience.

Generalization testing: DeepMind used the ADEPT dataset, which is designed to probe intuitive physical knowledge. As shown in Figure 7, PLATO shows clear VoE effects in all three probe categories.

Figure 7: PLATO demonstrates robust effects on unseen objects and dynamics without any retraining.

For more information, please see the original paper.

This article is reproduced from 51CTO.COM.