
A digital human lights the main torch of the Asian Games, and this ICCV paper reveals Ant’s cutting-edge generative AI technology

WBOY
2023-09-29 23:57:02

Look inside a digital human and you will find generative AI everywhere.

On the evening of September 23, at the opening ceremony of the Hangzhou Asian Games, the "little flames" of hundreds of millions of online digital torchbearers gathered over the Qiantang River and merged into the figure of a digital human. The digital human torchbearer then walked to the torch stage alongside the sixth on-site torchbearer, and together they lit the main torch.


As the core creative idea of the opening ceremony, the torch lighting that linked the digital and physical worlds became a trending search topic and drew wide attention.

The digital human torch lighting was an unprecedented initiative in which hundreds of millions of people took part, and it involved a large number of advanced and complex technologies. One of the most important questions was how to make a digital human "move". With the rapid development of generative AI and large models, digital human research is clearly changing as well.

We noticed that a study on generating 3D digital human motion has been accepted to ICCV 2023, the international computer vision conference taking place in early October. The paper, titled "Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models", is a joint work by Zhejiang University and Ant Group.


According to the authors, the research goes some way toward solving the problem of synthesizing complex, long-range motion for digital humans, and achieves results that earlier models or path planning cannot. Technology related to digital human driving was also used in the online torch relay of 100 million digital humans at the Asian Games.

Generative AI Drives Digital Humans into Motion

We often need to synthesize 3D human motion in a given 3D scene so that virtual humans can walk around the scene naturally and interact with objects. This capability has many applications in AR/VR, film production and video games.

Traditional character-control methods aim to generate short-term or repetitive motions guided by the user's control signals. The new research instead focuses on generating longer human-object interactions, given a starting position and a target object model.

This setting is more useful in practice, but it is clearly more challenging. First, human-object interactions should be coherent, which requires the ability to model long-range interactions between humans and objects. Second, in the context of content generation, a generative model should be able to synthesize diverse motions, since there are many ways for real people to approach and interact with a target object.

Figure 1. Generating human-object interactions. Given an object, the new method first predicts a set of milestones, where the rings represent positions and the pink figures represent local poses. The algorithm then fills in the motion between milestones. The figure shows that the method can generate different milestones and motions for the same object. The flow of time is shown with a color code, with darker blue representing later frames.

Existing methods for synthesizing digital human motion can be roughly divided into online generation and offline generation. Most online methods focus on real-time control of the character. Given a target object, they typically use autoregressive models that generate future motion step by step, feeding each prediction back in as input. Although this approach is widely used in interactive scenarios such as video games, its quality is still unsatisfactory for long-term generation.
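The feedback loop behind online generation can be illustrated with a minimal sketch. Here `step_toward` is a hypothetical stand-in for a learned one-step predictor and is not taken from the paper; the point is only that each new frame depends on the previously generated one.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def step_toward(current: Point, target: Point, step: float = 0.1) -> Point:
    """Hypothetical stand-in for a learned predictor: move a fixed
    fraction of the remaining distance toward the target."""
    return (
        current[0] + step * (target[0] - current[0]),
        current[1] + step * (target[1] - current[1]),
    )


def autoregressive_rollout(
    start: Point,
    target: Point,
    num_frames: int,
    predictor: Callable[[Point, Point], Point] = step_toward,
) -> List[Point]:
    """Generate frames one at a time, feeding each prediction back as input."""
    frames = [start]
    for _ in range(num_frames):
        frames.append(predictor(frames[-1], target))
    return frames


frames = autoregressive_rollout((0.0, 0.0), (1.0, 2.0), num_frames=30)
```

Because every frame is built on the previous prediction, any error made by a real predictor is carried forward, which is one intuition for why quality tends to degrade over long horizons.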


To improve motion quality, some recent offline methods have adopted a multi-level framework that first generates trajectories and then synthesizes motion. Although this strategy can produce reasonable paths, the diversity of those paths is limited.

In this new study, the authors propose a new offline method for synthesizing long-term and diverse human-object interactions. The innovation of the method lies in its hierarchical generation strategy: it first predicts a series of milestones and then generates the human motion between those milestones.

Specifically, given a starting position and a target object, the authors designed a milestone generation module to synthesize a set of milestones along the movement trajectory. Each milestone encodes a local pose and marks a transition point in the human motion. Based on these milestones, the algorithm uses a motion generation module to produce the complete motion sequence. Thanks to the milestones, the generation of a long sequence can be reduced to the synthesis of several short motion sequences.

The local pose at each milestone is generated by a transformer model that takes global dependencies into account, producing temporally consistent results that further promote coherent motion.

In addition to the hierarchical generation framework, the researchers also used diffusion models to synthesize human-object interactions. Some earlier diffusion models for motion synthesis combined transformers with denoising diffusion probabilistic models (DDPM).
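For readers unfamiliar with DDPM, the forward (noising) process has a well-known closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The sketch below applies it to a toy pose vector. The linear schedule and the 1,000 steps are common defaults assumed here for illustration, and the 3-value pose is hypothetical; none of these settings are confirmed by the article as the paper's configuration.

```python
import math
import random
from typing import List


def linear_beta_schedule(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    return [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(
    x0: List[float], t: int, alpha_bars: List[float], rng: random.Random
) -> List[float]:
    """Sample x_t from q(x_t | x_0) using the closed form."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * value + noise * rng.gauss(0.0, 1.0) for value in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
toy_pose = [0.5, -0.2, 1.0]  # hypothetical 3-value pose feature
rng = random.Random(0)
slightly_noisy = add_noise(toy_pose, 0, alpha_bars, rng)
almost_pure_noise = add_noise(toy_pose, 999, alpha_bars, rng)
```

A denoising network (a transformer in the models mentioned above) is then trained to reverse this process; that part is omitted here.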

Because the motion sequences are long, applying these models directly to the new setting is computationally expensive and can exhaust GPU memory. The new hierarchical framework converts long-term generation into the synthesis of multiple short sequences, so the GPU memory required drops to the same level as short-term motion generation.
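A rough back-of-the-envelope calculation shows why splitting helps. Standard self-attention builds a score matrix whose size grows with the square of the sequence length, so generating short segments one at a time keeps the peak small. The frame and segment counts below are illustrative numbers, not figures from the paper.

```python
def attention_matrix_entries(sequence_length: int) -> int:
    """Entries in one self-attention score matrix (length squared)."""
    return sequence_length * sequence_length


def peak_entries_when_segmented(total_frames: int, num_segments: int) -> int:
    """Peak attention entries if segments are generated one at a time."""
    frames_per_segment = -(-total_frames // num_segments)  # ceiling division
    return attention_matrix_entries(frames_per_segment)


total_frames = 600  # illustrative long sequence
num_segments = 10   # illustrative number of milestone-to-milestone segments

full = attention_matrix_entries(total_frames)
segmented = peak_entries_when_segmented(total_frames, num_segments)
print(full, segmented, full // segmented)  # prints: 360000 3600 100
```

Under these assumptions the peak attention matrix shrinks by a factor of 100, and it no longer grows with the total length of the motion.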

As a result, the researchers can make effective use of a Transformer DDPM to synthesize long-term motion sequences, which improves generation quality.

To this end, the researchers designed a hierarchical generation framework, which proceeds in the following steps.


First, they use GoalNet to predict the interaction goal on the object, and then generate a target pose to model the human-object interaction explicitly. Next, the milestone generation module estimates the number of milestones, generates a milestone trajectory from the starting point to the goal, and places a pose at each milestone.

In this way, long-range motion generation is decomposed into a combination of several short-range motion generation tasks. Finally, the authors designed a motion generation module to synthesize the trajectories between milestones and fill in the motion.

Artificial Intelligence (AI) Posture Generation

The researchers call the pose in which a person interacts with an object and stays still the target pose. Most previous methods used cVAE models to generate human poses, but the researchers found that this approach performed poorly in their experiments.

To address this, they adopted a VQ-VAE model to model the data distribution. This model uses a discrete representation to cluster the data into a finite set of points. They also observed that different human poses can share similar properties. For example, when a person is sitting, the hand movements may differ while the leg position stays the same. They therefore divided the joints into L (L = 5) non-overlapping groups.

As shown in Figure 3, the target pose is divided into independent joint groups.
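The quantization step at the heart of a VQ-VAE can be sketched as a nearest-neighbor lookup in a codebook, applied to each joint group separately. In the sketch below, the group names, the 2-D features, the two-entry codebooks and the choice of one codebook per group are all hypothetical; in a real VQ-VAE the encoder and the codebooks are learned.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature: Vector, codebook: List[Vector]) -> int:
    """Return the index of the nearest codebook entry (the VQ step)."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(feature, codebook[i]),
    )


def quantize_pose(
    group_features: Dict[str, Vector], codebooks: Dict[str, List[Vector]]
) -> Dict[str, int]:
    """Quantize each joint group independently with its own codebook."""
    return {
        name: quantize(feature, codebooks[name])
        for name, feature in group_features.items()
    }


# Hypothetical names for the L = 5 non-overlapping joint groups.
GROUPS = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]
codebooks = {name: [(0.0, 0.0), (1.0, 1.0)] for name in GROUPS}

# Two sitting poses that differ only in the arms.
sitting_hands_down = {name: (0.9, 1.1) for name in GROUPS}
sitting_hands_up = dict(
    sitting_hands_down, left_arm=(0.1, -0.1), right_arm=(0.0, 0.2)
)

codes_a = quantize_pose(sitting_hands_down, codebooks)
codes_b = quantize_pose(sitting_hands_up, codebooks)
```

The two poses end up sharing the same leg and torso codes while differing in the arm codes, which is the intuition behind grouping joints: a shared sub-pose can be reused across many full-body poses.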


Given the starting pose and the target pose, the algorithm can generate the milestone trajectory and synthesize the local pose at each milestone. Because the length of the motion is unknown and can be arbitrary (for example, a person may walk quickly to the chair and sit down, or walk slowly around the chair and then sit down), the number of milestones, denoted N, has to be predicted. N milestone points are then synthesized, and a local pose is placed at each point.
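The two steps described here, choosing N and then placing N milestone points, can be sketched as follows. Both functions are hypothetical stand-ins: the paper uses learned modules, whereas this sketch derives N from a fixed spacing and places the points on a straight line. Different spacings simply illustrate that the same start and goal can yield different milestone counts.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict_num_milestones(start: Point, goal: Point, spacing: float) -> int:
    """Hypothetical stand-in for the learned predictor of N:
    roughly one milestone per `spacing` units of distance."""
    return max(1, math.ceil(math.dist(start, goal) / spacing))


def place_milestones(start: Point, goal: Point, n: int) -> List[Point]:
    """Place n points on the straight line from start to goal.
    The last milestone coincides with the goal."""
    return [
        (
            start[0] + (goal[0] - start[0]) * k / n,
            start[1] + (goal[1] - start[1]) * k / n,
        )
        for k in range(1, n + 1)
    ]


start, goal = (0.0, 0.0), (3.0, 4.0)
few = place_milestones(start, goal, predict_num_milestones(start, goal, 2.0))
many = place_milestones(start, goal, predict_num_milestones(start, goal, 0.8))
```

In the actual method a local pose would then be attached to each of these points; the straight-line placement is only a placeholder for the generated milestone trajectory.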


The last step is motion generation. Rather than predicting motion frame by frame, the researchers synthesize the entire sequence hierarchically, based on the generated milestones. They first generate trajectories and then synthesize motion. Specifically, between two consecutive milestones, they first complete the trajectory and then fill in the motion, guided by the poses at the two milestones. These two steps are carried out by two separate Transformer DDPMs.
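The two-stage infilling between consecutive milestones can be sketched structurally as below. Linear interpolation is used as a hypothetical stand-in for both Transformer DDPMs, and a pose is represented as a flat list of numbers purely for illustration; only the control flow (trajectory first, then poses, then stitching) mirrors the description above.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pose = List[float]  # hypothetical flat vector of joint values
Milestone = Tuple[Point, Pose]


def lerp(a: float, b: float, w: float) -> float:
    return a + (b - a) * w


def complete_trajectory(p0: Point, p1: Point, num_frames: int) -> List[Point]:
    """Stage 1 stand-in: root positions for every frame of one segment."""
    last = num_frames - 1
    return [
        (lerp(p0[0], p1[0], i / last), lerp(p0[1], p1[1], i / last))
        for i in range(num_frames)
    ]


def infill_poses(pose0: Pose, pose1: Pose, num_frames: int) -> List[Pose]:
    """Stage 2 stand-in: per-frame poses guided by the two milestone poses."""
    last = num_frames - 1
    return [
        [lerp(a, b, i / last) for a, b in zip(pose0, pose1)]
        for i in range(num_frames)
    ]


def generate_motion(
    milestones: List[Milestone], frames_per_segment: int
) -> List[Tuple[Point, Pose]]:
    """Run both stages on each consecutive milestone pair, then stitch."""
    if frames_per_segment < 2:
        raise ValueError("each segment needs at least two frames")
    motion: List[Tuple[Point, Pose]] = []
    for (p0, pose0), (p1, pose1) in zip(milestones, milestones[1:]):
        trajectory = complete_trajectory(p0, p1, frames_per_segment)
        poses = infill_poses(pose0, pose1, frames_per_segment)
        segment = list(zip(trajectory, poses))
        # Skip the first frame of later segments: it repeats the milestone.
        motion.extend(segment if not motion else segment[1:])
    return motion


milestones = [
    ((0.0, 0.0), [0.0, 0.0]),
    ((1.0, 0.0), [0.5, 1.0]),
    ((1.0, 2.0), [1.0, 0.0]),
]
motion = generate_motion(milestones, frames_per_segment=5)
```

Each segment is short and is generated independently of the total sequence length, which is what lets the real models stay within the memory budget of short-term motion generation.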

The researchers carefully design the conditioning of the DDPM at each step so that it generates the desired output.

Results Ahead of Other Methods

The researchers compared the results of different methods on the SAMP dataset. The method proposed in the paper achieves a lower Fréchet distance (FD), a higher user study score and a higher average pairwise distance (APD). It also achieves higher trajectory diversity than SAMP.
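APD measures diversity: the more the generated samples differ from one another, the higher the score. A minimal sketch of one common formulation, the mean Euclidean distance over all pairs of samples, is shown below. The article does not specify the feature space or normalization used in the paper, so the 2-D samples here are purely illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence


def average_pairwise_distance(samples: List[Sequence[float]]) -> float:
    """Mean Euclidean distance over all unordered pairs of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    pairs = list(combinations(samples, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


identical = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
spread = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

apd_identical = average_pairwise_distance(identical)
apd_spread = average_pairwise_distance(spread)
```

A model that always produces the same motion scores zero, while a model that produces varied motions scores higher, which is why a higher APD is reported as better.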


The new method also produces satisfactory results in complex scenes: 3.8% of the frames it generates contain penetration, compared with 4.9% for SAMP.
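A percentage of this kind can be computed by flagging each frame and counting. The sketch below assumes a frame counts as penetrating when any sampled body point has a negative signed distance to the scene; that criterion, and the toy data, are assumptions for illustration, since the article does not describe how penetration is detected.

```python
from typing import List, Sequence


def frame_penetrates(signed_distances: Sequence[float]) -> bool:
    """Assumed criterion: any body point inside the scene geometry,
    i.e. with a negative signed distance."""
    return any(distance < 0.0 for distance in signed_distances)


def penetration_percentage(frames: List[Sequence[float]]) -> float:
    """Percentage of frames that contain at least one penetrating point."""
    if not frames:
        raise ValueError("need at least one frame")
    penetrating = sum(1 for frame in frames if frame_penetrates(frame))
    return 100.0 * penetrating / len(frames)


# Toy data: 1,000 frames of three signed distances each, 38 of them bad.
clean_frame = (0.05, 0.10, 0.02)
bad_frame = (0.05, -0.01, 0.02)
toy_frames = [bad_frame] * 38 + [clean_frame] * 962

percentage = penetration_percentage(toy_frames)
```

With 38 penetrating frames out of 1,000, the toy data yields 3.8%, matching the scale of the figures quoted above.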


On SAMP, COUCH and other datasets, the proposed method achieves better results than the baseline methods.


A Complete Full-Link Technology Layout

A digital human is a combination of multimodal technologies spanning speech, semantics, vision and more. With the recent breakthroughs in generative AI, the digital human field is developing by leaps and bounds. Modeling, generation, interaction, rendering and other stages that previously required manual work are now becoming fully AI-driven. As engineers continue to optimize it, the technology is also delivering a better and better experience on mobile devices. The just-concluded online torch relay of the Asian Games is a good example: to become a torchbearer, a user only needed to open a mini program in the Alipay app.

Reportedly, to ensure that the opening ceremony project went smoothly, Ant Group’s engineers ran more than 100,000 tests on hundreds of different mobile phone models and wrote more than 200,000 lines of code. By combining the self-developed Web3D interactive engine Galacean with AI digital humans, cloud services, blockchain and other technologies, they made sure that everyone could become a digital torchbearer and take part in the torch relay. The Asian Games Digital Torchbearer Platform can serve hundreds of millions of users and supports 97% of common smartphone models.

To make participation feel realistic, Ant’s technical team developed 58 face-customization controllers. Using facial recognition and AI algorithms, the system can generate a digital torchbearer's face from each person's facial features. Users can also freely adjust the face shape, hairstyle, nose, mouth, eyebrows and other features, and dress the avatar as they like. In total, the technology offers 2 trillion different digital avatar options.

In addition, after the lighting at the opening ceremony, each digital torchbearer can receive an exclusive digital torch-lighting certificate that depicts that torchbearer's unique avatar. The certificate is stored on the blockchain using distributed technology.

It is easy to see from the research paper and the Asian Games project that a complete digital human technology system stands behind them. Ant Group is understood to be actively exploring digital human technology and has completed a self-developed layout covering the core technologies of the full digital human pipeline.

Unlike most companies on the market, Ant Group develops its digital human technology in-house and has chosen a direction that combines it with generative AI. In terms of technical coverage, it spans the entire life cycle of a digital human: modeling, rendering, driving and interaction. Combining AIGC and large models significantly reduces the full-pipeline production cost of digital humans. The platform currently supports both 2D and 3D digital humans and provides a variety of solutions, such as broadcast and interactive types.

According to public information, the Ant Digital Human Platform currently has four main technical strengths:

  • Low-cost modeling: In cooperation with Tsinghua University, Ant launched a 3D parametric model of Asian faces that reconstructs a 3D face from photos and better matches the characteristics of Asian faces.
  • Generative driving: Combining generated motion with motion capture effectively reduces cost and enriches movement compared with the traditional motion production process.
  • Highly adaptable rendering: The self-developed Web3D rendering engine Galacean covers 97% of common mobile phones. For neural rendering, Ant has built a NeRF framework that decouples dynamic driving from static modeling and has applied it to dynamic digital human video scenes.
  • Intelligent interaction: Based on pre-trained voice cloning, the platform can generate a personalized digital human voice from only minutes of audio input, and digital human interaction based on large models is being developed.

Before the opening ceremony of the Asian Games, the China Academy of Information and Communications Technology released the latest compliance verification results for its digital human standards. Ant Group’s Lingjing Digital Human Platform became the first product in the industry to pass the financial digital human evaluation, and obtained the highest rating, "Excellent Level (L4)".

In addition to the Asian Games, the Ant Digital Human Platform also supports Ant Group’s Alipay, digital finance, government services, Wufu and other businesses. This year it began providing basic services to partners through short videos, live broadcasts, mini programs and other channels.

In the near future, as digital humans powered by generative AI continue to improve, we can expect better interactions in more scenarios, and a truly smart life that integrates the digital and the physical.


Statement:
This article is reproduced from jiqizhixin.com. If there is any infringement, please contact admin@php.cn to have it removed.