


With the development of large language models (LLMs), diffusion models, and related technologies, the arrival of products such as ChatGPT and Midjourney has set off a new wave of AI enthusiasm, and generative AI has become a topic of intense interest.
Unlike text and image generation, 3D generation is still at the technology-exploration stage.
At the end of 2022, Google, NVIDIA, and Microsoft successively released their own 3D generation work, but most of it is based on Neural Radiance Field (NeRF) implicit representations, which are incompatible with the rendering pipelines of industrial 3D software such as Unity, Unreal Engine, and Maya.
Even when such results are converted by traditional methods into mesh geometry and color maps, they suffer from insufficient accuracy and degraded visual quality, and cannot be applied directly to film, television, and game production.
Project website: https://sites.google.com/view/dreamface
Paper address: https://arxiv.org/abs/2304.03117
Web Demo: https://hyperhuman.top
HuggingFace Space: https://huggingface.co/spaces/DEEMOSTECH/ChatAvatar
To address these problems, an R&D team from Deemos Technology and ShanghaiTech University proposed a text-guided progressive 3D generation framework.
This framework introduces external datasets (including geometry and PBR materials) that conform to CG production standards, and can generate standard-compliant 3D assets directly from text. It is the first framework to support the generation of production-ready 3D assets.
To generate hyper-realistic 3D digital humans from text, the team combined this framework with a production-grade 3D digital human dataset. The work has been accepted by ACM Transactions on Graphics, the top international journal in computer graphics, and will be presented at SIGGRAPH 2023, the top international computer graphics conference.
DreamFace consists of three main modules: geometry generation, physically based material diffusion, and animation capability generation.
Compared with previous 3D generation work, the main contributions of this work include:
· Proposed DreamFace, a novel generative approach that combines recent vision-language models with animatable, physically based facial assets, using progressive learning to disentangle geometry, appearance, and animation capability.
· Introduced a dual-channel appearance generation design that combines a novel material diffusion model with a pre-trained model, performing two-stage optimization in latent space and image space simultaneously.
· Generated facial assets are animatable via BlendShapes or generated personalized BlendShapes, and the work further demonstrates the use of DreamFace for natural character design.
Geometry generation
The geometry generation module produces a geometric model consistent with the text prompt. For face generation, however, this objective is difficult to supervise and slow to converge.
DreamFace therefore proposes a selection framework based on CLIP (Contrastive Language-Image Pre-training): it first picks the best match from candidates randomly sampled in the face geometry parameter space to obtain a good coarse geometry model, then sculpts geometric detail so that the head model better matches the text prompt.
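The selection step amounts to scoring each candidate against the prompt and keeping the best one. A minimal sketch with plain vectors standing in for CLIP embeddings (the real system uses CLIP's text and image encoders, which are not reproduced here):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_best_candidate(text_embedding, candidate_embeddings):
    """Return the index of the candidate whose embedding best matches
    the prompt, plus all scores. In DreamFace the embeddings would come
    from CLIP; here they are toy vectors so the example is self-contained."""
    scores = [cosine_similarity(text_embedding, c) for c in candidate_embeddings]
    return int(np.argmax(scores)), scores

# Hypothetical prompt embedding and three candidate renders.
text = np.array([1.0, 0.0, 0.5])
candidates = [np.array([0.0, 1.0, 0.0]),   # poor match
              np.array([0.9, 0.1, 0.4]),   # close match
              np.array([0.5, 0.5, 0.5])]   # middling
best, scores = select_best_candidate(text, candidates)
print(best)  # → 1
```

The same best-of-N selection logic applies regardless of embedding dimensionality; only the encoders change.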
Given the input prompt, DreamFace uses the CLIP model to select the coarse geometry candidate with the highest matching score. DreamFace then uses a latent diffusion model (LDM) to apply Score Distillation Sampling (SDS) to images rendered under random viewpoints and lighting conditions.
This lets DreamFace add facial detail to the coarse geometry model via vertex displacements and a detail normal map, yielding highly detailed geometry.
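The core of SDS is to noise the rendered image, ask the diffusion model to predict that noise, and push the residual back through the renderer as a gradient on the geometry parameters. A toy numpy sketch, with a stand-in `predict_noise` function in place of the real LDM U-Net and a simplified noising schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(noisy_image, t, prompt_embedding):
    # Stand-in for the LDM U-Net; a real implementation would run the
    # pre-trained denoiser conditioned on the text prompt.
    return noisy_image * 0.1 + prompt_embedding.mean()

def sds_gradient(rendered, prompt_embedding, t=0.5, weight=1.0):
    """One SDS step: inject noise into the render, have the diffusion
    model predict it, and use the residual (predicted - true noise) as
    the gradient pushed back onto the rendered image (and, through the
    renderer, onto vertex displacements and the detail normal map)."""
    eps = rng.standard_normal(rendered.shape)            # true injected noise
    noisy = rendered + t * eps                           # simplified noising
    eps_hat = predict_noise(noisy, t, prompt_embedding)  # model's prediction
    return weight * (eps_hat - eps)                      # SDS gradient estimate

render = np.zeros((4, 4))
grad = sds_gradient(render, np.array([0.2, 0.4]))
render -= 0.01 * grad  # gradient step (proxy for updating geometry parameters)
```

In practice the gradient is averaged over many random viewpoints, lighting conditions, and noise levels rather than a single draw.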
Similar to the head model, DreamFace also performs hairstyle and color selection within this framework.
Physically Based Material Diffusion Generation
The physically based material diffusion module predicts facial textures that are consistent with the predicted geometry and the text prompt.
First, DreamFace fine-tunes a pre-trained LDM on a large collected UV material dataset, obtaining two LDM diffusion models.
DreamFace uses a joint training scheme that coordinates the two diffusion processes: one directly denoises the UV texture map, while the other supervises the rendered image, ensuring that the facial UV map is formed correctly and the rendered result stays consistent with the text prompt.
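The coordination of the two branches can be thought of as one shared objective: a UV-space denoising term plus a render-space consistency term. A hypothetical sketch with toy loss functions (the paper's actual networks and render-space supervision are not reproduced here):

```python
import numpy as np

def uv_denoise_loss(pred_noise, true_noise):
    # Branch 1: plain denoising objective directly on the UV texture map.
    return float(np.mean((pred_noise - true_noise) ** 2))

def render_consistency_loss(rendered, target):
    # Branch 2: supervises the *rendered* image so that the UV map stays
    # consistent with the text prompt (stand-in for the paper's
    # diffusion-prior supervision in image space).
    return float(np.mean((rendered - target) ** 2))

def joint_loss(pred_noise, true_noise, rendered, target, lam=0.5):
    # Both branches are optimized together; lam balances the two terms.
    return (uv_denoise_loss(pred_noise, true_noise)
            + lam * render_consistency_loss(rendered, target))

pred = np.ones((2, 2)); true = np.zeros((2, 2))
rend = np.full((2, 2), 0.5); targ = np.zeros((2, 2))
loss = joint_loss(pred, true, rend, targ)
# UV term = 1.0, render term = 0.25 → loss = 1.0 + 0.5 * 0.25 = 1.125
```

The weighting `lam` is an assumption for illustration; the point is that gradients from the render-space term flow back into the same UV texture being denoised.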
To reduce generation time, DreamFace adopts a coarse-texture latent diffusion stage that provides a latent prior for detailed texture generation.
To ensure that the generated texture maps contain no undesirable features or baked-in lighting while still preserving diversity, a prompt-learning strategy is designed.
The team uses two methods to generate high-quality diffuse maps:
(1) Prompt tuning. Instead of hand-crafted domain-specific text prompts, DreamFace combines two learnable domain-specific continuous prompts, Cd and Cu, with the corresponding text prompt; these are optimized during U-Net denoiser training, avoiding unstable and time-consuming manual prompt engineering.
(2) Masking of non-face regions. The LDM denoising process is additionally constrained by a non-face-region mask, ensuring the resulting diffuse map contains no unwanted elements.
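The mask constraint can be sketched as restricting the model's prediction to the face region and pinning everything outside it to a neutral background. A minimal illustration (a hypothetical simplification of the mask-constrained denoising step, applied here to a tiny 2×2 map):

```python
import numpy as np

def apply_face_mask(denoised, background, face_mask):
    """Keep the diffusion model's prediction inside the face region
    (mask = 1) and force a neutral background outside it (mask = 0),
    so no stray elements leak into the diffuse map."""
    return face_mask * denoised + (1.0 - face_mask) * background

denoised = np.full((2, 2), 0.8)               # model's raw prediction
background = np.zeros((2, 2))                  # neutral fill value
mask = np.array([[1.0, 0.0], [0.0, 1.0]])      # toy face-region mask
out = apply_face_mask(denoised, background, mask)
# out keeps 0.8 where the mask is 1 and is 0.0 elsewhere
```

In a real pipeline this blend would be applied at each denoising step, not once at the end.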
As the final step, DreamFace applies a super-resolution module to generate 4K physically based textures for high-quality rendering.
The DreamFace framework achieves strong results both in generating celebrities and in generating characters from descriptions, far exceeding previous work in a user study. It also holds a clear advantage over previous work in running time.
Beyond this, DreamFace supports texture editing with prompts and sketches. Global edits such as aging and makeup can be achieved directly with the fine-tuned texture LDMs and prompts; by further combining masks or sketches, effects such as tattoos, beards, and birthmarks can be created.
Animation ability generation
DreamFace generates models with animation capability. Unlike BlendShapes-based methods, DreamFace's neural facial animation approach produces personalized animations by predicting unique deformations that animate the generated neutral model.
First, a geometry generator is trained to learn the latent space of expressions, with the decoder extended to be conditioned on the neutral geometry. An expression encoder is then trained to extract expression features from RGB images. DreamFace can therefore generate personalized animations, conditioned on the neutral geometry, from monocular RGB images.
Compared with DECA, which uses generic BlendShapes for expression control, DreamFace's framework provides fine-grained expression detail and can faithfully capture performances.
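The contrast can be sketched numerically: generic blendshapes add a linear combination of deltas shared across all identities, while a personalized approach decodes identity-specific offsets conditioned on the neutral geometry. A toy example with made-up shapes and a stand-in decoder (not the paper's network):

```python
import numpy as np

def blendshape_animate(neutral, shared_deltas, weights):
    # Generic BlendShapes: the same delta basis for every identity,
    # combined linearly by expression weights.
    return neutral + np.tensordot(weights, shared_deltas, axes=1)

def personalized_animate(neutral, expression_code, deform_decoder):
    # DreamFace-style: a decoder conditioned on the neutral geometry
    # predicts identity-specific vertex offsets (decoder is a stand-in).
    return neutral + deform_decoder(expression_code, neutral)

neutral = np.zeros((3, 3))             # 3 vertices, xyz coordinates
deltas = np.ones((2, 3, 3)) * 0.1      # two shared blendshape targets
anim = blendshape_animate(neutral, deltas, np.array([0.5, 0.25]))

# Hypothetical decoder: scales the code by a per-vertex function of the neutral.
decoder = lambda code, neu: code * (neu + 0.2)
anim2 = personalized_animate(neutral, 0.3, decoder)
```

The key difference is in the second function's signature: the deformation depends on the neutral mesh itself, so two identities given the same expression code produce different offsets.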
Conclusion
This paper introduces DreamFace, a text-guided progressive 3D generation framework that combines recent vision-language models, latent diffusion models, and physically based material diffusion techniques.
DreamFace's main innovations are its geometry generation, physically based material diffusion generation, and animation capability generation. Compared with traditional 3D generation methods, DreamFace achieves higher accuracy, faster running speed, and better compatibility with CG pipelines.
DreamFace's progressive generation framework provides an effective solution to complex 3D generation tasks and is expected to spur further research and technical development along similar lines.
In addition, physically based material diffusion generation and animation capability generation will promote the application of 3D generation technology in film and television production, game development, and other related industries.
The above is the detailed content of "ShanghaiTech University and others release DreamFace: hyper-realistic 3D digital humans from text alone", published on the PHP Chinese website.



