Learning like a baby, DeepMind’s new model learns the rules of the physical world in 28 hours
DeepMind aims to build a model that can learn intuitive physics, and to analyze what enables the model to acquire this ability.
From AlphaFold to mathematical reasoning, DeepMind has been trying to combine AI and basic science. Now, DeepMind has created a new model that can learn simple physical rules.
Developmental psychologists study how babies track moving objects by following their gaze. For example, infants show surprise when they watch a video in which a ball suddenly disappears.
Computer scientist Luis Piloto of DeepMind and colleagues hope to develop similar tests for artificial intelligence (AI). The team trained a neural network on animated videos of simple objects such as cubes and balls, and the model learned by discovering patterns in large amounts of data. The research paper was published on July 11 in Nature Human Behaviour.
The model learns physics by auto-encoding and tracking objects, hence the name PLATO (Physics Learning through Auto-encoding and Tracking Objects). PLATO receives the raw frames of a video together with a version of each frame in which every object in the scene is highlighted separately (a segmentation mask per object). From these inputs, PLATO aims to develop internal representations of the physical properties of objects, such as their position and velocity.
The system was trained on approximately 30 hours of videos showing simple physical mechanisms (such as a ball rolling down a slope) and developed the ability to predict how these objects would behave in different situations. In particular, PLATO learns concepts such as continuity (an object's trajectory is uninterrupted) and solidity (solid objects do not pass through one another). As a video plays, the model's predictions become more accurate.
When shown videos containing "impossible" events, such as an object suddenly disappearing, PLATO compares what it observes with its own predictions; the size of the mismatch serves as a measure of "surprise."
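A minimal sketch of such a surprise measure, assuming each frame is a flattened list of pixel intensities and that surprise is the sum over frames of the mean squared error between predicted and observed pixels. The function names `frame_error` and `surprise` are hypothetical, and squared error is an illustrative choice rather than PLATO's published definition.

```python
# Hypothetical sketch: "surprise" as accumulated prediction error.
# Frames are flattened lists of pixel intensities (an assumption for brevity).

def frame_error(predicted, observed):
    """Mean squared error between one predicted and one observed frame."""
    assert len(predicted) == len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def surprise(predicted_frames, observed_frames):
    """Sum of per-frame prediction errors over a whole video."""
    return sum(frame_error(p, o) for p, o in zip(predicted_frames, observed_frames))


# A 4-pixel "video" in which a bright object sits at pixel 1.
predicted = [[0, 1, 0, 0]] * 3                             # model expects the object to stay
possible = [[0, 1, 0, 0]] * 3                              # object stays put
impossible = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]    # object vanishes

print(surprise(predicted, possible))    # 0.0
print(surprise(predicted, impossible))  # 0.5
```

The impossible video produces a larger error because the model keeps predicting an object that is no longer there.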
Piloto said: "PLATO was not designed as a model of infant behavior, but it can test hypotheses about how human infants learn. We hope that cognitive scientists will eventually use it to simulate infant behavior."
Jeff Clune, a computer scientist at the University of British Columbia, said: "Comparing AI with the way human infants learn is an important research direction. The PLATO researchers hand-designed much of the prior knowledge that gives the AI model its advantage." Researchers like Clune are instead trying to let programs develop their own algorithms for understanding the physical world.
To give AI systems richer physical intuition, DeepMind's research team drew inspiration from developmental psychology. The team built a deep learning system that incorporates a core insight from that field: physics is understood at the level of discrete objects and their interactions.
At its core, intuitive physics relies on a discrete set of concepts (e.g., object persistence, solidity, continuity) that can be distinguished from one another, manipulated, and probed individually. Standard AI approaches to learning intuitive physics rely on video or state prediction, binary outcome prediction, question answering, or reinforcement learning tasks. These tasks appear to require some understanding of intuitive physics, but they do not explicitly operationalize or systematically probe a well-defined set of concepts.
Developmental psychology, by contrast, holds that a physical concept corresponds to a set of expectations about how the future will unfold. For example, people expect that objects will not suddenly teleport from one place to another but will trace a continuous path through space and time; this expectation defines the concept of continuity. Framing concepts as expectations suggests a way to measure knowledge of a specific physical concept: the violation-of-expectation (VoE) paradigm.
When probing a specific concept with the VoE paradigm, researchers show infants visually similar scenes (called probes) that are either consistent with the concept (physically possible) or inconsistent with it (physically impossible). In this paradigm, surprise is measured by gaze duration: infants look longer at events that violate their expectations.
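The same logic can be applied to a model. The sketch below assumes 1-D object trajectories and replaces the learned model with a simple constant-velocity predictor; it builds a matched possible/impossible probe pair for continuity and reports the VoE effect as surprise at the impossible probe minus surprise at the possible one. All names are hypothetical, and this is an illustration of the paradigm, not DeepMind's evaluation code.

```python
# Hypothetical sketch: a VoE probe pair for the concept of continuity.
# Each probe is a 1-D trajectory (x position per frame). A constant-velocity
# predictor stands in for a learned model (an assumption); surprise is its
# accumulated squared prediction error.

def constant_velocity_surprise(xs):
    """Sum of squared errors when predicting x[t] as x[t-1] + (x[t-1] - x[t-2])."""
    total = 0.0
    for t in range(2, len(xs)):
        predicted = xs[t - 1] + (xs[t - 1] - xs[t - 2])
        total += (xs[t] - predicted) ** 2
    return total


def voe_effect(possible, impossible):
    """Surprise at the impossible probe minus surprise at its matched possible probe."""
    return constant_velocity_surprise(impossible) - constant_velocity_surprise(possible)


# Possible probe: the ball moves one unit per frame along a continuous path.
possible = [0, 1, 2, 3, 4, 5]
# Impossible probe: same start and speed, but the ball jumps from x=2 to x=6.
impossible = [0, 1, 2, 6, 7, 8]

print(voe_effect(possible, impossible))  # 18.0
```

A positive effect plays the role of longer looking time: the "observer" is more surprised by the discontinuous trajectory.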
First, DeepMind introduced a rich video corpus, the Physical Concepts dataset. It contains VoE probe videos targeting five physical concepts that developmental psychology treats as core: continuity, object persistence, and solidity; unchangeableness, the concept that certain object properties (such as shape) do not change; and directional inertia, the expectation that a moving object changes direction in a manner consistent with the principle of inertia.
Crucially, the Physical Concepts dataset also includes a separate video corpus for training. These videos show a variety of procedurally generated physical events.
Figure 2: Example videos from the dataset used to train the model
DeepMind's goal is a model that learns intuitive physics, together with an analysis of what gives the model this capability. The PLATO model brings together several components found in advanced AI systems.
The first is object individuation, which carves continuous visual sensory input into a set of discrete entities, each with its own set of attributes. In PLATO, a perception module decomposes each segmented video frame into a set of object codes (Fig. 3a-c), mapping visual input onto individual objects. PLATO does not learn to segment the scene; given the segmented objects, it learns a compressed representation of each one.
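To make the input/output of this step concrete, here is a minimal sketch that turns one frame plus its per-object segmentation masks into a list of object codes. In PLATO the codes are learned by an autoencoder; here a hand-computed summary (area, centroid, mean intensity) stands in for that encoder so the example stays self-contained. The function names and the code layout are hypothetical.

```python
# Hypothetical sketch of object individuation: one segmented frame -> per-object codes.
# The hand-computed summary stands in for PLATO's learned autoencoder (an assumption).

def object_code(frame, mask):
    """Summarize the pixels selected by a binary mask as (area, row, col, intensity)."""
    pixels = [
        (r, c, frame[r][c])
        for r in range(len(frame))
        for c in range(len(frame[0]))
        if mask[r][c]
    ]
    area = len(pixels)
    if area == 0:
        return (0, 0.0, 0.0, 0.0)  # object not visible in this frame
    mean_r = sum(r for r, _, _ in pixels) / area
    mean_c = sum(c for _, c, _ in pixels) / area
    mean_v = sum(v for _, _, v in pixels) / area
    return (area, mean_r, mean_c, mean_v)


def individuate(frame, masks):
    """Map one frame and its segmentation masks to a list of object codes."""
    return [object_code(frame, m) for m in masks]


frame = [
    [0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0],
    [0.5, 0.0, 0.0],
]
masks = [
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # object A: vertical bar
    [[0, 0, 0], [0, 0, 0], [1, 0, 0]],  # object B: single pixel
]
print(individuate(frame, masks))  # [(2, 0.5, 1.0, 0.9), (1, 2.0, 0.0, 0.5)]
```

Because the masks are given, the sketch, like PLATO, never has to decide where one object ends and another begins; it only compresses each object's pixels.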
Second, object tracking (or object indexing) assigns an index to each object, so that object percepts can be put into correspondence across time and their dynamic properties computed (Figure 3b, c). In PLATO, object codes are accumulated and tracked across frames in an object buffer (Figure 3d).
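A minimal sketch of indexing and buffering, assuming each detection is reduced to an (x, y) centroid and that detections are matched greedily to the nearest existing track. Greedy nearest-neighbour matching and the `max_dist` threshold are assumptions chosen for brevity, not PLATO's documented mechanism; `track` is a hypothetical name.

```python
# Hypothetical sketch of object tracking/indexing with an object buffer.
# Detections are (x, y) centroids; each is matched greedily to the nearest
# unclaimed track closer than max_dist (an assumption for illustration).
import math


def track(frames, max_dist=2.0):
    """Return {index: [centroid per frame]} built frame by frame."""
    buffer = {}
    next_index = 0
    for detections in frames:
        unmatched = set(buffer)  # track indices not yet claimed in this frame
        for det in detections:
            best, best_d = None, max_dist
            for idx in sorted(unmatched):  # sorted for deterministic tie-breaking
                d = math.dist(buffer[idx][-1], det)
                if d < best_d:
                    best, best_d = idx, d
            if best is None:  # nothing close enough: start a new index
                best = next_index
                next_index += 1
                buffer[best] = []
            else:
                unmatched.discard(best)
            buffer[best].append(det)
    return buffer


frames = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(5.0, 6.0), (1.0, 0.0)],  # detections arrive in a different order
    [(2.0, 0.0), (5.0, 7.0)],
]
print(track(frames))
# {0: [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 1: [(5.0, 5.0), (5.0, 6.0), (5.0, 7.0)]}
```

Each index keeps a per-object history even when detections are reordered, which is what later stages need to reason about velocity and interactions.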
The last component is relational processing of these tracked objects. It is inspired by the "physical reasoning system" proposed in developmental psychology, which operates dynamically on object representations and produces new representations shaped by each object's relationships and interactions with other objects.
PLATO learns interactions between the current object percepts and the object history held in memory (Figure 3d), and uses them to predict the objects in the next video frame and to update its object-based memory.
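The key idea of relational processing, that each object's new representation depends on the other objects, can be illustrated with one step of dot-product self-attention over object codes. This is a hypothetical, untrained sketch with no learned projections; PLATO's dynamics predictor is a learned network whose details are not reproduced here.

```python
# Hypothetical sketch of relational processing: one step of dot-product
# self-attention over object codes. Each object's updated code is a weighted
# mix of all objects' codes, so changing one object affects the others.
# Untrained and without learned projections; illustration only.
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relational_update(codes):
    """Return new codes in which object i attends to every object j."""
    dim = len(codes[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in codes:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in codes]
        weights = softmax(scores)
        updated.append([sum(w * k[d] for w, k in zip(weights, codes)) for d in range(dim)])
    return updated


codes = [[1.0, 0.0], [0.0, 1.0]]
moved = [[1.0, 0.0], [0.0, 3.0]]  # only object 1 changed
# Object 0's updated code changes too, because it depends on its relation to object 1.
print(relational_update(codes)[0] != relational_update(moved)[0])  # True
```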
Figure 3: PLATO consists of two components: a perception module (left) and a dynamics predictor (right)
When tested, PLATO showed strong VoE effects in all five probe categories across models trained with five different random seeds.
Figure 5: PLATO performs robustly on the probes of the Physical Concepts dataset.
The training corpus of the Physical Concepts dataset contains 300,000 videos in total. By a conservative estimate, that is approximately 52 days of continuous visual experience. From both an AI and a developmental perspective, this raises the question of how much training data is actually needed to produce VoE effects at test time. To evaluate this, DeepMind trained three random seeds of the PLATO dynamics predictor on datasets of decreasing size (Figure 6) and computed a grand average of the VoE effects across all five probe categories.
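The "grand average" is simply the mean VoE effect over every (seed, probe category) pair, computed separately for each training-set size. The sketch below assumes results are stored as nested dictionaries; the function names are hypothetical, and the numbers are illustrative placeholders, not values reported by DeepMind.

```python
# Hypothetical sketch of the dataset-size analysis: for each training-set size,
# average VoE effects over all seeds and probe categories ("grand average").
# The numbers below are illustrative placeholders, NOT results from DeepMind.
from statistics import fmean


def grand_average(effects):
    """effects: {seed: {category: voe_effect}} -> mean over all seeds and categories."""
    return fmean(v for per_seed in effects.values() for v in per_seed.values())


def by_dataset_size(results):
    """results: {num_training_videos: effects} -> {num_training_videos: grand average}."""
    return {size: grand_average(effects) for size, effects in sorted(results.items())}


# Placeholder data: 2 seeds x 2 categories per size
# (the analysis described above uses 3 seeds and 5 probe categories).
results = {
    300_000: {"seed_0": {"continuity": 4.0, "solidity": 4.0},
              "seed_1": {"continuity": 5.0, "solidity": 3.0}},
    50_000: {"seed_0": {"continuity": 1.0, "solidity": 2.0},
             "seed_1": {"continuity": 3.0, "solidity": 4.0}},
}
print(by_dataset_size(results))  # {50000: 2.5, 300000: 4.0}
```

Averaging over seeds as well as categories keeps a single lucky or unlucky training run from dominating the curve across dataset sizes.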
Results show robust VoE effects in DeepMind's model after training with as few as 50,000 examples (equivalent to 28 hours of visual experience).
Figure 6: PLATO shows strong VoE effects after just 28 hours of visual experience.
Generalization testing: DeepMind also evaluated PLATO on the ADEPT dataset, which was designed to probe intuitive physical knowledge. As shown in Figure 7, PLATO shows clear VoE effects in all three probe categories.
Figure 7: PLATO shows robust VoE effects on unseen objects and dynamics without any retraining.
For more information, please view the original paper.