


A hard-core fix for Sora's physics bugs! Four top US universities jointly release a way to give video generators a physics engine
After Sora's release, users online spotted a number of bugs showing that the model does not yet fully understand the physical world. For example, when a puppy walks, its two front legs can cross through each other, which breaks the viewer's immersion.
Object interaction is crucial to the realism of generated video, but synthesizing the dynamic behavior of real 3D objects under interaction remains very difficult.
Action-conditioned dynamics is a research area that requires perceiving an object's physical material properties (such as its stiffness) and predicting its 3D motion based on those properties.
Estimating physical material properties remains a thorny open problem for lack of data, because measuring the material properties of real objects is extremely difficult.
Recently, researchers from MIT, Stanford University, Columbia University, and Cornell University jointly proposed PhysDreamer, a physics-based method that distills the object-dynamics priors learned by video generation models to give static 3D objects interactive dynamics.
Paper link: https://arxiv.org/pdf/2404.13026.pdf
Project homepage: https://physdreamer.github.io/
By distilling these priors, PhysDreamer can synthesize an object's realistic response to novel interactions such as external forces or agent manipulations. The researchers demonstrate the method on a variety of elastic objects and use a user study to evaluate the realism of the synthesized interactions.
Formalization of the problem
Given a static object represented as a set of 3D Gaussians {x_p, α_p, Σ_p, c_p} (where x_p is the position, α_p the opacity, Σ_p the covariance matrix, and c_p the color of particle p), the ultimate goal is to estimate the object's physical material property field so that realistic interactive motion can be synthesized.
The properties in question are the mass m, Young's modulus E, and Poisson's ratio ν. Young's modulus measures the stiffness of the material and determines the object's motion trajectory in response to an external force: a higher Young's modulus results in smaller deformation and stiffer, higher-frequency motion.
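As a rough intuition for this relationship (an analogy only, not PhysDreamer's simulator), consider a mass attached to an elastic bar that is idealised as a linear spring with stiffness k = E·A/L. The bar geometry, mass, and force values below are illustrative assumptions.

```python
import math

def bar_response(young_modulus, force, area=1e-4, length=0.1, mass=0.01):
    """Static deflection and natural frequency of a mass on an elastic bar.

    The bar is idealised as a linear spring with stiffness k = E * A / L.
    All default values are illustrative assumptions, not values from the paper.
    """
    stiffness = young_modulus * area / length                 # N/m
    deflection = force / stiffness                            # m, shrinks as E grows
    frequency = math.sqrt(stiffness / mass) / (2 * math.pi)   # Hz, grows like sqrt(E)
    return deflection, frequency

soft = bar_response(young_modulus=1e5, force=1.0)
stiff = bar_response(young_modulus=4e5, force=1.0)
```

In this toy model, quadrupling E cuts the deflection to a quarter and doubles the oscillation frequency.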
Simulated motion of flowers under the same force but with different Young’s modulus
The researchers therefore formalize the problem as estimating a spatially varying Young's modulus field E(x) for the 3D object; during particle simulation, the field is queried at each particle's position to obtain that particle's Young's modulus.
As for the other physical properties, the mass m_p of a particle can be precomputed as the product of a constant density ρ and the particle volume V_p. The particle volume is estimated by dividing the volume of a background cell by the number of particles that cell contains. The Poisson's ratio ν_p has a negligible effect on the object's motion and is assumed to be constant.
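The volume and mass estimate described above can be sketched as follows. This is a minimal illustration that assumes a uniform background grid; the function name, `cell_size`, and `density` are hypothetical inputs, not values from the paper.

```python
from collections import Counter

def particle_masses(positions, cell_size, density):
    """Estimate per-particle volume and mass on a uniform background grid.

    positions: list of (x, y, z) tuples.
    cell_size and density are assumed inputs chosen by the user.
    """
    cell_volume = cell_size ** 3
    # Index of the background cell that contains each particle.
    cells = [tuple(int(c // cell_size) for c in p) for p in positions]
    counts = Counter(cells)                          # particles per cell
    volumes = [cell_volume / counts[c] for c in cells]
    masses = [density * v for v in volumes]          # m_p = rho * V_p
    return volumes, masses
```

Two particles sharing one cell each receive half of that cell's volume, while a particle alone in its cell receives the full cell volume.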
Model Architecture
PhysDreamer estimates the material field of a static 3D object. The key idea is to generate a plausible video of the object in motion and then optimize the material field E(x) so that the simulated motion matches the synthesized motion.
Given an object represented as 3D Gaussians, the method first renders it (with background) from some viewpoint, then uses an image-to-video generation model to produce a reference video of the object in motion. A differentiable Material Point Method (MPM) simulator and differentiable rendering are then used to optimize the spatially varying material field and the initial velocity field so as to minimize the difference between the rendered video and the reference video.
The dashed arrow represents the gradient flow
1. Basic knowledge
3D Gaussian Splatting represents the radiance field of a 3D scene with a set of anisotropic 3D Gaussian kernels. Although it was introduced mainly as a method for novel view synthesis, the Lagrangian nature of 3D Gaussians means they can be used directly in particle-based physics simulators.
Following PhysGaussian, the researchers use the Material Point Method (MPM) to simulate object dynamics directly on the Gaussian particles.
Since the 3D Gaussians lie mainly on the surface of the object, an optional internal filling step can be applied to improve the realism of the simulation.
Continuum mechanics and elastic materials
In continuum mechanics, the deformation of a material is modeled by a mapping function ϕ that maps a point X of the material in the undeformed state to its point x = ϕ(X, t) in the deformed state. To measure the local rotation and strain of the deformation, the deformation gradient is introduced, which is the Jacobian matrix F of the mapping ϕ, that is, F = ∂ϕ(X, t)/∂X.
The deformation gradient is key to describing a material's stress-strain relationship, since it captures the local deformation state of the material.
In hyperelastic materials, the Cauchy stress is computed from a strain energy density function ψ(F), which quantifies the degree of non-rigid deformation of the material. This function is generally designed by materials scientists based on principles of material symmetry and rotational invariance, and fitted to experimental data.
In the fixed corotated hyperelastic model, the energy density function can be expressed in terms of the singular values σ_i of the deformation gradient. The model parameters μ and λ are directly related to the material's Young's modulus E and Poisson's ratio ν, and they determine how the material behaves under stress.
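To make the relationship between (E, ν) and (μ, λ) concrete, the sketch below uses the standard conversion to Lamé parameters and assumes the usual fixed corotated form of the energy, ψ = μ Σ_i (σ_i − 1)² + (λ/2)(J − 1)² with J = Π_i σ_i. The article does not print the formula, so treat this form, and the function names, as assumptions. The energy function takes singular values directly rather than computing an SVD.

```python
def lame_parameters(young_modulus, poisson_ratio):
    """Convert Young's modulus E and Poisson's ratio nu to Lame parameters (mu, lambda)."""
    mu = young_modulus / (2.0 * (1.0 + poisson_ratio))
    lam = (young_modulus * poisson_ratio
           / ((1.0 + poisson_ratio) * (1.0 - 2.0 * poisson_ratio)))
    return mu, lam

def fixed_corotated_energy(singular_values, mu, lam):
    """Assumed form: psi = mu * sum_i (sigma_i - 1)^2 + lam / 2 * (J - 1)^2."""
    J = 1.0
    for s in singular_values:
        J *= s                                    # J = product of singular values
    stretch = sum((s - 1.0) ** 2 for s in singular_values)
    return mu * stretch + 0.5 * lam * (J - 1.0) ** 2
```

With E = 1 and ν = 0.25, both μ and λ come out to 0.4, and the energy of an undeformed particle (all singular values equal to 1) is zero.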
Material Point Method (MPM)
The researchers use the moving least squares material point method (MLS-MPM) to solve the governing equations of elastic material dynamics, namely conservation of momentum and mass: ρ Dv/Dt = ∇·σ + f and Dρ/Dt + ρ ∇·v = 0, where ρ is the density, v(x, t) is the velocity field in world space, σ is the Cauchy stress, and f is the external force.
MPM is a numerical method for simulating the dynamics of a wide range of materials that combines the advantages of Eulerian and Lagrangian methods. It is well suited to simulating solids, fluids, sand, and cloth, handles topological changes in materials effectively, and is easy to parallelize on a graphics processing unit (GPU).
Space is discretized by treating the object as a collection of Gaussian particles. Each particle p represents a small portion of the object's volume and carries properties such as volume, mass, position, velocity, deformation gradient, and local velocity field gradient.
The calculation process of MPM includes particle-to-grid (P2G) and grid-to-particle (G2P) transfer loops:
In the P2G stage, momentum is transferred from the particles to the grid and the grid velocities are updated. In the G2P stage, the updated velocities are transferred back to the particles to update their positions and velocities. At the same time, each particle's local velocity gradient and deformation gradient are updated to reflect the current state of the material.
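The transfer loop can be illustrated with a deliberately simplified 1D sketch. It assumes linear (hat) interpolation weights and a plain PIC-style transfer, and it omits the stress term and the affine velocity field that MLS-MPM uses, so it shows only the P2G/G2P data flow, not the researchers' simulator. All function names are hypothetical.

```python
def hat_weights(x, dx):
    """Linear interpolation weights of a particle at x onto its two nearest grid nodes.

    Particles are assumed to stay strictly inside the grid.
    """
    base = int(x // dx)
    frac = x / dx - base
    return [(base, 1.0 - frac), (base + 1, frac)]

def p2g(positions, velocities, masses, n_nodes, dx):
    """Particle-to-grid: scatter mass and momentum onto the grid nodes."""
    grid_mass = [0.0] * n_nodes
    grid_momentum = [0.0] * n_nodes
    for x, v, m in zip(positions, velocities, masses):
        for node, w in hat_weights(x, dx):
            grid_mass[node] += w * m
            grid_momentum[node] += w * m * v
    return grid_mass, grid_momentum

def grid_velocities(grid_mass, grid_momentum, acceleration, dt):
    """Grid update: convert momentum to velocity and apply an external acceleration."""
    return [p / m + dt * acceleration if m > 0.0 else 0.0
            for m, p in zip(grid_mass, grid_momentum)]

def g2p(positions, grid_vel, dx, dt):
    """Grid-to-particle: gather velocities back and advect the particles."""
    new_positions, new_velocities = [], []
    for x in positions:
        v = sum(w * grid_vel[node] for node, w in hat_weights(x, dx))
        new_velocities.append(v)
        new_positions.append(x + dt * v)
    return new_positions, new_velocities
```

Because the interpolation weights of each particle sum to one, total mass and momentum are preserved by both transfers when no external acceleration is applied.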
The MPM method can accurately simulate the complex dynamic behavior of materials, including material deformation, fracture and interaction.
2. Estimating physical properties
The researchers use the moving least squares material point method (MLS-MPM) as the physical simulator, together with a fixed corotated hyperelastic material model, to simulate the dynamics of 3D objects.
MLS-MPM simulation process
The simulator uses MLS-MPM to simulate the physical behavior of the object. The simulation function takes as input the particle positions x, velocities v, deformation gradients F, and local velocity field gradients C at the current time step t, together with the set of physical properties θ (the mass, Young's modulus, Poisson's ratio, and volume of all particles) and the time step size Δt (1×10^-4), and outputs the corresponding quantities at the next time step t+1.
To simulate the dynamics between adjacent video frames, it is often necessary to iterate hundreds of sub-steps.
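As a quick check of the "hundreds of sub-steps" figure, the number of sub-steps per frame is the frame interval divided by Δt. The article does not state the video frame rate, so the 30 fps used below is an assumption.

```python
def substeps_per_frame(fps, dt=1e-4):
    """Number of simulation sub-steps needed to advance by one video frame."""
    return round(1.0 / (fps * dt))

# Assumed frame rate of 30 fps with the stated step size of 1e-4.
n_substeps = substeps_per_frame(30)
```

At the assumed 30 fps this gives 333 sub-steps per frame, consistent with the order of magnitude quoted above.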
Simulation and Rendering
After simulation, the Gaussian particles of each frame are rendered with the differentiable rendering function F_render, which takes as input the simulated particle positions and the rotation matrices R_t of all particles obtained from the simulation step.
The generated video is then used as a reference to optimize the spatially varying Young's modulus E and the initial velocity v_0 through a per-frame loss function. The loss combines an L1 loss and a D-SSIM loss, with the weight parameter λ set to 0.1.
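A per-frame loss of this kind might look like the sketch below. Several things here are assumptions not stated in the article: that λ weights the L1 term and 1 − λ weights the D-SSIM term, that D-SSIM is defined as (1 − SSIM)/2, and that SSIM is computed once over the whole image rather than with a sliding window (a simplification to keep the example short). Images are flat lists of pixel values in [0, 1].

```python
def l1_loss(a, b):
    """Mean absolute difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole image (no sliding window)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

def frame_loss(rendered, reference, lam=0.1):
    """Assumed weighting: lam * L1 + (1 - lam) * D-SSIM."""
    d_ssim = (1.0 - global_ssim(rendered, reference)) / 2.0
    return lam * l1_loss(rendered, reference) + (1.0 - lam) * d_ssim
```

The loss vanishes when the rendered frame equals the reference frame and is positive otherwise.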
Parameterization and regularization
The material field and the velocity field are parameterized by two triplanes and three-layer multilayer perceptrons (MLPs). To improve spatial smoothness, total variation regularization is applied to all spatial planes of both fields.
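Total variation regularization penalises differences between neighbouring cells of a feature plane. The sketch below uses the absolute-difference (anisotropic) variant on a single 2D plane stored as nested lists; whether the researchers use absolute or squared differences is not stated in the article, so this choice is an assumption.

```python
def total_variation(plane):
    """Sum of absolute differences between neighbouring cells of a 2D plane."""
    rows, cols = len(plane), len(plane[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:                      # vertical neighbour
                tv += abs(plane[i + 1][j] - plane[i][j])
            if j + 1 < cols:                      # horizontal neighbour
                tv += abs(plane[i][j + 1] - plane[i][j])
    return tv
```

A constant plane has zero total variation, so adding this term to the loss pushes the learned fields toward spatially smooth solutions.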
Optimization process
The optimization is split into two stages to improve stability and speed up convergence:
1. In the first stage, the Young's modulus of each Gaussian particle is randomly initialized and frozen, and only the first three frames of the reference video are used to optimize the initial velocity of each particle.
2. In the second stage, the initial velocity is frozen and the spatially varying Young's modulus is optimized. To prevent exploding or vanishing gradients, the gradient signal only flows back to the previous frame.
In this way, the simulator is able to simulate the physical behavior of the object and optimize the material properties and initial conditions based on the reference video to generate realistic dynamic effects.
3. Accelerate simulation with subsampling
High-fidelity rendering with 3D Gaussians usually requires millions of particles to represent a scene, which imposes a huge computational burden on the simulation.
To improve efficiency, the model introduces a subsampling procedure that greatly reduces the amount of computation while preserving the fidelity of the rendered results: only a small number of driving particles are simulated, and the positions and rotations of the Gaussian particles are then obtained by interpolating the driving particles, balancing computational efficiency against rendering quality.
Specifically, the model uses the K-Means clustering algorithm to create a set of driving particles at time t=0, where each driving particle carries a set of physical attributes: position, velocity, deformation gradient, local velocity field gradient, Young's modulus, mass, Poisson's ratio, and volume.
The initial position of each driving particle is the mean of the positions of all its cluster members, and the number of driving particles is much smaller than the number of 3D Gaussian particles.
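The clustering step can be sketched with a plain Lloyd-style K-Means in pure Python. This is a minimal illustration, not the researchers' implementation: the random initialisation, fixed iteration count, and function name are all assumptions.

```python
import random

def kmeans_driving_particles(points, k, iters=20, seed=0):
    """Cluster Gaussian positions and return k driving-particle positions.

    points: list of (x, y, z) tuples. Each returned position is the mean
    of the points assigned to that cluster.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)               # initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:                           # keep old center if cluster is empty
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers
```

For two well-separated groups of points and k = 2, the returned positions are the two group means.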
During rendering, the position and rotation of each 3D Gaussian particle are computed by interpolating the positions and rotations of the driving particles: for each 3D Gaussian particle, first find its eight nearest driving particles at time t=0, then fit a rigid body transformation T to the motion of these eight driving particles between time t=0 and the current time step, and use it to determine the particle's current position and rotation.
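One common way to fit such a rigid transformation is the Kabsch algorithm, sketched below with NumPy. The article does not say which fitting method the researchers use, so Kabsch is an assumption here, as are the function names.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t."""
    src_center = src.mean(axis=0)
    dst_center = dst.mean(axis=0)
    H = (src - src_center).T @ (dst - dst_center)    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_center - R @ src_center
    return R, t

def interpolate_gaussian(x0, drivers0, drivers_t, k=8):
    """Move one Gaussian using the rigid motion of its k nearest driving particles.

    x0: (3,) Gaussian position at t=0.
    drivers0, drivers_t: (N, 3) driving-particle positions at t=0 and at time t.
    """
    dists = np.linalg.norm(drivers0 - x0, axis=1)
    nearest = np.argsort(dists)[:k]                  # neighbours chosen at t=0
    R, t = fit_rigid_transform(drivers0[nearest], drivers_t[nearest])
    return R @ x0 + t, R
```

If the driving particles undergo a pure rigid motion, the fitted transformation reproduces it exactly, and the returned rotation can be applied to the Gaussian's orientation.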
Experimental results
Dataset
By capturing multi-view images, the researchers collected eight real-world static scenes, each containing an object and a background. The objects include five flowers (a red rose, a carnation, an orange rose, a tulip, and a white rose), an alocasia plant, a telephone cord, and a beanie. They then captured four interaction videos showing the objects' natural motion after interactions such as poking or dragging, and used these real videos as an additional reference for comparison.
Results
Qualitative results for the spatially varying Young's modulus (a physical quantity that measures the stiffness of a material)
In the user study, PhysDreamer was compared against baseline methods and real captured videos. More than 80% of participants preferred PhysDreamer in the two-alternative forced choice (2AFC) experiment in terms of motion realism, and 65% of participants also preferred it in terms of visual quality.
It should be noted that since the compared static scenes themselves are consistent, the evaluation of visual quality also relies on the motion effect of the generated objects to a certain extent.
Slices of the motion patterns at different time points show that PhysGaussian, lacking a principled estimate of material properties, generates motion whose amplitude is too large and whose speed is too slow, which is inconsistent with reality.
Compared with DreamGaussian4D, 70% and 63.5% of the 2AFC samples prefer PhysDreamer in terms of visual quality and motion realism, respectively. As can be seen from the corresponding figure, the motion generated by DreamGaussian4D is periodic with a small, constant amplitude, whereas PhysDreamer reproduces the damping of the motion over time.