ShanghaiTech University and others release DreamFace: generating a 'hyper-realistic 3D digital human' from text alone

王林

May 17, 2023 am 08:02 AM

With the development of large language models (LLMs), diffusion models, and related technologies, products such as ChatGPT and Midjourney have set off a new wave of enthusiasm for AI, and generative AI has become a topic of intense interest.

Unlike text and images, 3D generation is still in the technology exploration stage.

At the end of 2022, Google, NVIDIA, and Microsoft successively released their own 3D generation work, but most of it is based on implicit Neural Radiance Field (NeRF) representations and is incompatible with the rendering pipelines of industrial 3D software such as Unity, Unreal Engine, and Maya.

Even when such results are converted by traditional methods into mesh geometry and color maps, accuracy is insufficient and visual quality drops, so they cannot be used directly in film, television, or game production.

Project website: https://sites.google.com/view/dreamface

Paper address: https://arxiv.org/abs/2304.03117

Web Demo: https://hyperhuman.top

HuggingFace Space: https://huggingface.co/spaces/DEEMOSTECH/ChatAvatar

To solve these problems, the R&D team from Deemos Technology and ShanghaiTech University proposed a text-guided progressive 3D generation framework.

The framework introduces external datasets (including geometry and PBR materials) that comply with CG production standards, and can generate 3D assets that meet those standards directly from text. It is the first framework to support production-ready 3D asset generation.

To generate hyper-realistic 3D digital humans from text, the team combined this framework with a production-grade 3D digital human dataset. The work has been accepted by ACM Transactions on Graphics, a top international journal in computer graphics, and will be presented at SIGGRAPH 2023, the top international computer graphics conference.

DreamFace consists of three main modules: geometry generation, physically based material diffusion, and animation capability generation.

Compared with previous 3D generation work, the main contributions of this work include:

· It proposes DreamFace, a novel generative approach that combines recent vision-language models with animatable, physically based facial assets, using progressive learning to separate geometry, appearance, and animation capability.

· It introduces a dual-channel appearance generation design that combines a novel material diffusion model with a pre-trained model and performs two-stage optimization in latent space and image space simultaneously.

· Facial assets gain animation capability through either generic BlendShapes or generated personalized BlendShapes, and the work further demonstrates the use of DreamFace for natural character design.

Geometry generation

The geometry generation module aims to produce a geometric model that is consistent with the text prompt. For faces, however, generating geometry directly from text is difficult to supervise and slow to converge.

DreamFace therefore proposes a selection framework based on CLIP (Contrastive Language-Image Pre-Training). It first picks the best match from candidates randomly sampled in the face geometry parameter space to obtain a good coarse geometry model, and then sculpts geometric details so that the head model matches the text prompt more closely.
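The selection step can be sketched as a simple ranking by embedding similarity. This is only an illustration under stated assumptions: the embeddings below are toy 3-dimensional vectors standing in for CLIP text and image embeddings, and the helper names `cosine_similarity` and `select_best_candidate` are hypothetical, not taken from the DreamFace code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_best_candidate(text_embedding, candidate_embeddings):
    """Return (index, score) of the candidate that best matches the text."""
    scores = [cosine_similarity(text_embedding, e) for e in candidate_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy stand-ins (assumption): in practice these would be CLIP embeddings of the
# prompt and of renderings of each randomly sampled coarse geometry.
text_embedding = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # unrelated to the prompt
    [0.9, 0.1, 0.0],   # close to the prompt
    [-1.0, 0.0, 0.0],  # opposite of the prompt
]

best_index, best_score = select_best_candidate(text_embedding, candidates)
print(best_index, round(best_score, 3))
```

Ranking a batch of sampled candidates avoids optimizing the coarse shape from scratch, which is the supervision and convergence problem the text describes.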

Given the input prompt, DreamFace uses the CLIP model to select the coarse geometry candidate with the highest matching score. It then uses a latent diffusion model (LDM) to apply Score Distillation Sampling (SDS) to images rendered under random viewing angles and lighting conditions.
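For reference, the SDS gradient is commonly written in the form below. This is the general formulation introduced with DreamFusion, rewritten in latent notation as an assumption; the article does not give DreamFace's exact loss. Here $\theta$ denotes the geometry parameters being optimized, $z$ the latent encoding of the rendered image, $z_t$ its noised version at timestep $t$, $y$ the text prompt, $\hat{\epsilon}_\phi$ the pre-trained denoiser, $\epsilon$ the injected Gaussian noise, and $w(t)$ a timestep-dependent weight.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial z}{\partial \theta}
    \right]
```

Intuitively, the frozen diffusion model acts as a critic: the difference between predicted and injected noise indicates how the rendering should change to look more like an image matching the prompt, and that signal is backpropagated to the geometry.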

This allows DreamFace to add facial details to rough geometry models through vertex displacement and detailed normal maps, resulting in highly detailed geometry.
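Vertex displacement itself is a simple operation: each vertex moves along its normal by a scalar amount. The sketch below illustrates that idea only; the function name `displace_vertices` and the toy data are hypothetical, and in DreamFace the displacement values would come from the SDS-driven optimization rather than being set by hand.

```python
def displace_vertices(vertices, normals, displacements):
    """Move each vertex along its (unit) normal by a per-vertex scalar."""
    return [
        tuple(v_i + d * n_i for v_i, n_i in zip(v, n))
        for v, n, d in zip(vertices, normals, displacements)
    ]


# Toy mesh data (assumption): two vertices with unit normals.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
displacements = [0.5, -0.25]  # positive pushes outward, negative inward

detailed = displace_vertices(vertices, normals, displacements)
print(detailed)
```

Displacement changes the actual surface, while a detail normal map only changes how the surface is shaded; using both lets fine wrinkles appear without requiring an extremely dense mesh.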

Similar to the head model, DreamFace also makes hairstyle and color selections based on this framework.

Physically Based Material Diffusion Generation

The physically based material diffusion module is designed to predict facial textures that are consistent with both the predicted geometry and the text prompt.

First, DreamFace fine-tunes a pre-trained LDM on a large-scale UV material dataset collected by the team, obtaining two LDM diffusion models.

DreamFace uses a joint training scheme that coordinates two diffusion processes: one directly denoises the UV texture map, and the other supervises the rendered image. Together they ensure that the facial UV map is well formed and that the rendered image is consistent with the text prompt.

To reduce generation time, DreamFace adds a coarse texture latent diffusion stage that provides a latent prior for detailed texture generation.

To ensure that the generated texture maps contain no undesirable features or baked-in lighting while still maintaining diversity, the team designed a prompt learning strategy.

The team uses two methods to generate high-quality diffuse maps:

(1) Prompt tuning. Instead of relying on hand-crafted domain-specific text prompts, DreamFace combines two domain-specific continuous text prompts, Cd and Cu, with the corresponding text prompt. These are optimized during U-Net denoiser training, which avoids unstable and time-consuming manual prompt writing.

(2) Masking of non-face areas. The LDM denoising process will be additionally constrained by non-face area masks to ensure that the resulting diffuse map does not contain any unwanted elements.
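The article does not say how the mask constraint is applied inside the denoising loop. One common way to constrain a region, used here purely as an assumption, is an inpainting-style blend: keep the denoised values inside the face region and reset everything outside it to a reference value. The function name `apply_region_mask` and the 2x2 toy "images" are hypothetical.

```python
def apply_region_mask(denoised, reference, face_mask):
    """Keep denoised values where the mask is 1, reference values where it is 0."""
    return [
        [m * d + (1.0 - m) * r for d, r, m in zip(d_row, r_row, m_row)]
        for d_row, r_row, m_row in zip(denoised, reference, face_mask)
    ]


# Toy 2x2 single-channel example (assumption): the top-right pixel lies
# outside the face region in UV space.
denoised = [[0.2, 0.8], [0.5, 0.1]]
reference = [[0.0, 0.0], [0.0, 0.0]]
face_mask = [[1.0, 0.0], [1.0, 1.0]]

constrained = apply_region_mask(denoised, reference, face_mask)
print(constrained)
```

Applying such a blend at every denoising step, rather than once at the end, would keep content outside the face region from influencing later steps.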

As the final step, DreamFace applies the Super-Resolution module to generate 4K physically-based textures for high-quality rendering.

The DreamFace framework achieves very good results both when generating celebrities and when generating characters from descriptions. In the user study, it obtained results that far exceeded previous work, and it also has a clear advantage in running time.

DreamFace also supports texture editing using prompts and sketches. Global edits such as aging and makeup can be achieved by directly using the fine-tuned texture LDM with a prompt. By additionally combining masks or sketches, local effects such as tattoos, beards, and birthmarks can be created.

Animation capability generation

The models DreamFace generates can be animated. Unlike BlendShapes-based methods, DreamFace's neural facial animation method produces personalized animation by predicting unique deformations that animate the generated neutral model.

First, a geometric generator is trained to learn the latent space of expressions, where the decoder is extended to be conditioned on neutral geometric shapes. Then, the expression encoder is further trained to extract expression features from RGB images. Therefore, DreamFace is able to generate personalized animations conditioned on neutral geometric shapes using monocular RGB images.

Compared with DECA, which uses generic BlendShapes for expression control, DreamFace's framework provides finer expression detail and can capture subtle performances.
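For contrast, a generic linear blendshape model can be sketched as below: every face shares the same set of expression offsets, and an expression is a weighted sum of them added to the neutral mesh. This is a textbook illustration, not DECA's or DreamFace's implementation; the function name `apply_blendshapes` and the toy data are hypothetical.

```python
def apply_blendshapes(neutral, deltas, weights):
    """Add a weighted sum of per-vertex offsets to the neutral mesh."""
    result = [list(v) for v in neutral]
    for delta, weight in zip(deltas, weights):
        for vertex, offset in zip(result, delta):
            for axis in range(3):
                vertex[axis] += weight * offset[axis]
    return [tuple(v) for v in result]


# Toy data (assumption): two vertices and two expression shapes.
neutral = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
deltas = [
    [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],   # "smile" moves vertex 0 up
    [(0.0, 0.0, 0.0), (0.0, -2.0, 0.0)],  # "jaw open" moves vertex 1 down
]
weights = [0.5, 0.25]

posed = apply_blendshapes(neutral, deltas, weights)
print(posed)
```

Because the offsets are fixed and shared, every identity deforms in the same way. The approach described above instead predicts the deformation with a decoder conditioned on the neutral geometry, so the same expression can deform different faces differently.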

Conclusion

This article introduced DreamFace, a text-guided progressive 3D generation framework that combines recent vision-language models, latent diffusion models, and physically based material diffusion techniques.

DreamFace’s main innovations include geometry generation, physically based material diffusion generation and animation capability generation. Compared with traditional 3D generation methods, DreamFace has higher accuracy, faster running speed and better CG pipeline compatibility.

DreamFace’s progressive generation framework provides an effective solution for solving complex 3D generation tasks and is expected to promote more similar research and technology development.

In addition, physically based material diffusion generation and animation capability generation will promote the application of 3D generation technology in film and television production, game development and other related industries.

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to request removal.
