Champ is the first open source: human body video generates new SOTA, gained 1k stars in 5 days, and the demo is playable

WBOY
2024-03-30 13:31:35
Give it one photo and one video, and the photo comes alive!

Recently, Champ, a controllable human video generation work jointly released by Alibaba, Fudan University and Nanjing University, has gone viral online. Within 5 days of being open-sourced it gained 1k stars on GitHub. It has also taken off on Twitter, where many bloggers have built new projects on top of it, with total views reaching 300K.

Champ has already open-sourced its inference code and weights, which users can download from GitHub and use directly. An official Hugging Face demo is online, and a packaged ComfyUI version, Champ-ComfyUI, is being developed in parallel. According to the GitHub homepage, the team will open-source the training code and dataset soon, so interested readers can keep an eye on the project.

  • Project homepage: https://fudan-generative-vision.github.io/champ/

  • Paper link: https://arxiv.org/abs/2403.14781

  • Github link: https://github.com/fudan-generative-vision/champ

  • Hugging Face link: https://huggingface.co/fudan-generative-ai/champ

Champ's results on real-world portraits: different portraits "copy" the same motion. The driving motion video, shown in the upper-left corner, is the input.

Although Champ is only trained with real human body videos, it demonstrates strong generalization capabilities on different types of images:

Black-and-white photos, oil paintings, watercolors and other styles all come out well, not to mention realistic images and virtual characters produced by various text-to-image models:

Technical Overview

Champ uses an advanced human mesh recovery model to extract a sequence of SMPL (Skinned Multi-Person Linear Model) parameters, a parametric 3D human body model, from the input human video. From this sequence it renders the corresponding depth maps, normal maps, human pose skeletons and human body semantic maps. These serve as motion control conditions that guide video generation and transfer the motion onto the input reference portrait, which significantly improves the quality of the generated human motion video as well as its geometric and appearance consistency.
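To make the conditioning pipeline more concrete, here is a minimal Python sketch of how per-frame SMPL parameters might be turned into one image sequence per condition type. The parameter sizes follow the standard SMPL layout (10 shape coefficients, 72 pose values for 24 joints in axis-angle form, plus a 3D translation). The names `SMPLFrame`, `build_condition_sequences` and the `render` callback are hypothetical; the actual mesh rasterization Champ uses is not described in this article, so it is left as an injected function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np

# Standard SMPL sizes: 10 shape coefficients (betas) and 24 joints x 3
# axis-angle values = 72 pose values (including the global orientation).
NUM_BETAS = 10
NUM_POSE = 72

# The four condition types named in the article.
CONDITION_NAMES = ("depth", "normal", "semantic", "skeleton")


@dataclass
class SMPLFrame:
    """SMPL parameters for one video frame (hypothetical container)."""

    betas: np.ndarray   # (10,) body shape
    pose: np.ndarray    # (72,) joint rotations, axis-angle
    transl: np.ndarray  # (3,) global translation

    def __post_init__(self) -> None:
        # Reject arrays that do not match the standard SMPL layout.
        if self.betas.shape != (NUM_BETAS,):
            raise ValueError(f"betas must have shape ({NUM_BETAS},)")
        if self.pose.shape != (NUM_POSE,):
            raise ValueError(f"pose must have shape ({NUM_POSE},)")
        if self.transl.shape != (3,):
            raise ValueError("transl must have shape (3,)")


# Hypothetical renderer: one SMPL frame -> {condition name: H x W x C image}.
RenderFn = Callable[[SMPLFrame], Dict[str, np.ndarray]]


def build_condition_sequences(
    frames: List[SMPLFrame], render: RenderFn
) -> Dict[str, np.ndarray]:
    """Render every frame and stack each condition into a (T, H, W, C) array."""
    per_frame = [render(frame) for frame in frames]
    return {
        name: np.stack([maps[name] for maps in per_frame])
        for name in CONDITION_NAMES
    }
```

Passing the renderer in as a function keeps the sketch testable without a 3D rasterizer; in a real system it would draw the SMPL mesh once per frame and read out depth, normals, part labels and a skeleton image.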

To handle these different motion conditions, Champ adopts a multi-layer motion fusion (MLMF) module, which uses self-attention to integrate the features of the different conditions and achieve finer-grained motion control. The figure below visualizes this module's attention under each condition: the depth map focuses on the geometric outline of the human figure, the normal map indicates the orientation of the body, the semantic map governs the appearance correspondence between different body parts, and the pose skeleton attends only to key details of the face and hands.
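The fusion idea can be illustrated with a small NumPy sketch. It assumes one plausible reading of the description: each condition's feature map (flattened to N spatial tokens of C channels) is refined by its own single-head self-attention with a residual connection, and the refined maps are then summed into a single motion signal. The function names and weight layout are hypothetical, and the real MLMF module (number of layers and heads, where it sits in the network) may differ.

```python
from typing import Dict, Tuple

import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(
    x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray
) -> np.ndarray:
    """Single-head scaled dot-product self-attention over N tokens of C channels."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N) attention map
    return attn @ v


def fuse_conditions(
    features: Dict[str, np.ndarray],
    weights: Dict[str, Tuple[np.ndarray, np.ndarray, np.ndarray]],
) -> np.ndarray:
    """Refine each condition's (N, C) features with its own attention, then sum."""
    fused = np.zeros_like(next(iter(features.values())))
    for name, feat in features.items():
        # Residual connection keeps the original condition signal.
        fused += feat + self_attention(feat, *weights[name])
    return fused
```

The `(N, N)` matrix computed inside `self_attention` for each condition is the kind of per-condition attention map that visualizations like the one described above plot.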

Champ also identifies and addresses a problem that previous human video generation work has overlooked: body shape transfer. Earlier methods drove the human figure either with a skeleton model or with other geometric information extracted from the input video. However, they could not decouple motion from body shape, so the body shape in the generated results did not match that of the person in the reference image.

For example, with a heavyset person as the reference image, Figure 7 compares the results:

As the figure shows, Animate Anyone and MagicAnimate smooth out the person's belly in their generated results and even shrink the figure's frame somewhat. Champ instead uses the SMPL body shape parameters to align the reference body's parametric shape with the SMPL sequence that drives the video, achieving the best consistency of body shape and motion (the result labeled "with PST" in the figure).
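The core of this shape alignment can be sketched as a parameter swap: keep the per-frame pose parameters estimated from the driving video, but replace its body shape coefficients with those estimated from the reference image. This is a minimal sketch under that assumption; `transfer_body_shape` is a hypothetical name, and Champ's actual alignment may include further steps not described in this article.

```python
from typing import Tuple

import numpy as np


def transfer_body_shape(
    driving_poses: np.ndarray, reference_betas: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Keep the driving motion but use the reference person's body shape.

    driving_poses:   (T, 72) per-frame SMPL pose parameters from the driving video
    reference_betas: (10,)   SMPL shape coefficients estimated from the reference image
    Returns per-frame (betas, poses); the driving video's own betas are discarded.
    """
    if driving_poses.ndim != 2 or reference_betas.ndim != 1:
        raise ValueError("expected poses of shape (T, 72) and betas of shape (10,)")
    num_frames = driving_poses.shape[0]
    # The same reference shape is repeated for every frame of the sequence.
    aligned_betas = np.tile(reference_betas, (num_frames, 1))
    return aligned_betas, driving_poses.copy()
```

Because SMPL keeps shape (betas) separate from pose, re-posing the mesh with the reference betas changes the body's build without altering the motion, so the depth, normal and semantic maps rendered from it follow the reference person's figure.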

Experimental results

As shown in Table 4 below, compared with other SOTA work, Champ has better motion control and fewer artifacts:

Champ also shows strong generalization and stable appearance consistency:

On the TikTok dance dataset, the authors quantitatively evaluated Champ for both image generation and video generation. It improves substantially on multiple evaluation metrics, as shown in Table 1 below.
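Table 1's individual metrics are not named here. As one illustration of how a frame-level quality score is computed, the sketch below implements PSNR (peak signal-to-noise ratio), a widely used image-quality metric, and averages it over the frames of a clip. The function names are hypothetical, and the paper's exact metrics and evaluation protocol may differ.

```python
from typing import Sequence

import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mean_video_psnr(
    pred_frames: Sequence[np.ndarray],
    target_frames: Sequence[np.ndarray],
    max_val: float = 1.0,
) -> float:
    """Average per-frame PSNR between a generated clip and its ground truth."""
    if len(pred_frames) != len(target_frames):
        raise ValueError("clips must have the same number of frames")
    scores = [psnr(p, t, max_val) for p, t in zip(pred_frames, target_frames)]
    return float(np.mean(scores))
```

With pixel values in [0, 1], a uniform error of 0.1 gives an MSE of 0.01 and therefore 20 dB; higher PSNR means the generated frame is closer to the ground truth.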

For more technical details and experimental results, please refer to Champ's original paper and code. You can also try it yourself on Hugging Face or by downloading the official source code.

Statement: This article is reproduced from jiqizhixin.com.