Get a virtual 3D wife in 30 seconds with a single GPU! Text to 3D generates a high-precision digital human with clear pore details, seamlessly connecting with Maya, Unity and other production tools
ChatGPT has given the AI industry a shot in the arm. Things that were once unimaginable have become routine practice today.
As AI keeps advancing, Text-to-3D, regarded as the next frontier of AIGC after Diffusion (images) and GPT (text), has received unprecedented attention.
Case in point: a product called ChatAvatar has quietly entered public beta. It quickly drew more than 700,000 views and was listed among the hot Spaces of the week.
△ChatAvatar will also support Image-to-3D technology, which generates stylized 3D characters from single-view or multi-view AI-generated concept art and has attracted wide attention
The 3D models generated by the current beta version can be downloaded directly to a local machine together with their PBR materials. Not only do they look good, but more importantly the tool is free to try. Some netizens exclaimed:
It’s so cool, I feel like I can easily generate my own digital twin.
This has drawn many netizens to try it out and contribute ideas. Some combined the product with ControlNet and found the results unexpectedly delicate and realistic.
This Text-to-3D tool, which has almost no barrier to entry, is called ChatAvatar and was created by the Chinese AI startup Yingmo Technology.
It is understood to be the world's first production-ready Text-to-3D product. From simple text, such as the name of a celebrity or a description of the desired character's appearance, it can generate film-grade, hyper-realistic 3D digital human assets.
It is also efficient: on average it takes only 30 seconds to make a face that looks real, even your own.
In the future, the generation field will also be expanded to other three-dimensional assets.
The models come with regular topology, 4K-resolution PBR materials, and rigging. They can be plugged directly into the production pipelines of tools such as Unity, Unreal Engine, and Maya.
So, what kind of 3D generation tool is ChatAvatar? What technology is used behind it?
After trying ChatAvatar ourselves, we found that it really does have no barrier to entry.
Specifically, you only need to describe what you want to the ChatBot on the official website, in plain language and in the form of a conversation. A 3D face is then generated on demand and covered with realistic "human skin" fitted to the model.
Throughout the conversation, the ChatBot guides the user according to their needs, to learn in as much detail as possible what kind of model they have in mind.
During the experience, we described to ChatBot such a 3D image we want to generate:
Click the Generate button on the left, and in less than 10 seconds on average, 9 initial prototypes of different 3D faces generated from the description appear on the screen.
After you pick any one of them, the model and material are further optimized based on that choice. Finally, the rendered model with its skin applied appears and is shown under different lighting conditions. These renderings are produced in real time in the browser:
You can drag with the mouse to rotate the head, and zoom in to inspect local detail, where pores and acne are clearly visible:
It is worth mentioning that users who are prompt engineering experts can also complete the generation by entering a prompt directly in the box on the left.
Finally, with one click download, you can get a 3D digital head asset that can be directly connected to the production engine and driven:
Although the hairstyle function has not yet launched in the beta version, the final generated 3D digital human assets match the description closely overall.
The official website also displays many assets generated by ChatAvatar users, covering different ethnicities, skin tones, ages and expressions, and all kinds of looks, whether beautiful or ugly, heavy or thin.
Let’s summarize the highlights of the ChatAvatar product for generating 3D digital human assets:
First, it is easy to use. Second, its generation range is wide: facial features can be changed, and masks, tattoos and other elements that fit the face can also be generated, such as this:
According to the official promotional video, ChatAvatar can even generate characters beyond the human range, such as characters from film and television works like Avatar:
Most importantly, ChatAvatar solves the compatibility issues between 3D models and traditional rendering software. This means that the 3D assets generated by ChatAvatar can be integrated directly into game, film and television production processes.
Of course, even before being formally connected to industrial workflows, ChatAvatar has already attracted thousands of artists and professional art staff to its first round of public beta, and related topics on Twitter have received nearly one million views.
Even a random tweet can draw more than 50k views.
It is no accident that the product has built up so many unpaid fans. Look at this 3D face of Einstein. Who wouldn't say it really looks like him?
Combined with ControlNet, the generated result is no worse than a photo taken directly with an SLR camera:
After trying it, many users have already begun to imagine using this Text-to-3D tool at scale in industrial applications such as games, film and television.
It is understood that user feedback will become an important basis for the ChatAvatar team to quickly iterate and update, forming a data flywheel to provide more complete and demand-based functions in a timely manner.
In fact, for designers and companies in the 3D industry, most earlier AI text-to-3D applications did not work badly, but there were still many difficulties in actually bringing them into the industrial design process.
What are the technical reasons behind ChatAvatar being able to make such a big splash this time?
What is the difficulty in generating 3D assets that meet industry requirements?
The biggest difficulty is making AI-generated content meet the industry standard for 3D assets.
How should "industry standard" be understood here? From the perspective of professional 3D art design, there are at least three aspects: quality, controllability and generation speed.
The first is quality. Especially for the film, television and game industries, which emphasize visual effects, generating 3D assets that meet pipeline requirements means that unwritten industry rules such as topological regularity and texture-map accuracy are the first hurdle an AI product must clear.
Take the regularity of the topological structure as an example. This essentially refers to the reasonableness of the 3D asset wiring.
For 3D assets, the regularity of the topology often directly affects the animation effect, modification processing efficiency and texture drawing speed of the object:
According to 3D art designers in the industry, the time cost of manual retopology is often higher than that of producing the 3D model itself, sometimes several times higher. This means that no matter how impressive the 3D assets generated by an AI model look, if their topology is not regular enough, the cost cannot be fundamentally reduced. The same goes, even more so, for texture accuracy.
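To make "topological regularity" a little more concrete, here is a minimal sketch of two rough heuristics a pipeline might check: the share of quad faces, and the share of interior vertices that touch exactly four edges. Both measures and the function name `topology_stats` are assumptions chosen for illustration; the article does not say how ChatAvatar or any studio actually scores topology.

```python
from collections import Counter


def topology_stats(faces):
    """Return (quad_ratio, regular_vertex_ratio) for a polygon mesh.

    faces: list of tuples of vertex indices, one tuple per face.
    Both ratios are illustrative heuristics, not an industry-defined metric.
    """
    if not faces:
        raise ValueError("mesh has no faces")

    # Share of faces that are quads.
    quad_ratio = sum(1 for f in faces if len(f) == 4) / len(faces)

    # Count how often each undirected edge is used by a face.
    edge_use = Counter()
    for f in faces:
        for i in range(len(f)):
            a, b = f[i], f[(i + 1) % len(f)]
            edge_use[(min(a, b), max(a, b))] += 1

    # Valence = number of distinct edges meeting at a vertex.
    valence = Counter()
    boundary = set()
    for (a, b), uses in edge_use.items():
        valence[a] += 1
        valence[b] += 1
        if uses == 1:  # an edge used by a single face lies on the boundary
            boundary.update((a, b))

    interior = [v for v in valence if v not in boundary]
    if not interior:
        return quad_ratio, 1.0
    regular = sum(1 for v in interior if valence[v] == 4)
    return quad_ratio, regular / len(interior)


# A 2x2 grid of quads over a 3x3 vertex grid, and the same grid triangulated.
quad_grid = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
tri_grid = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4),
            (3, 4, 7), (3, 7, 6), (4, 5, 8), (4, 8, 7)]
print(topology_stats(quad_grid))
print(topology_stats(tri_grid))
```

The clean quad grid scores well on both measures, while the triangulated version of the same surface scores poorly, which is roughly the kind of gap that would force an artist to retopologize by hand.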
△Yingmo Technology’s ChatAvatar project has significantly improved compared to previous work in terms of generation quality, speed and standard compatibility
Take PBR textures, which are now commonly required by the game, film and television industries, as an example. They comprise a series of textures such as reflectivity maps and normal maps, which are comparable to the "layers" of a 2D PSD image file and are one of the essential conditions for pipeline production of 3D assets.
However, the 3D assets currently generated by AI are usually a single "whole", and it is rare for them to come with separately generated PBR textures that meet industrial requirements.
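The "layers" analogy can be sketched as a small data structure: a material is a set of named texture maps, and a pipeline can check whether the maps it needs are present. The class name, the file names and the required-channel list below are all assumptions for illustration (the article names only reflectivity and normal maps); real engines define their own channel sets.

```python
from dataclasses import dataclass, field

# Assumed channel names. "reflectivity" and "normal" come from the article;
# "roughness" is added here purely as an illustrative extra requirement.
REQUIRED_MAPS = ("reflectivity", "normal", "roughness")


@dataclass
class PBRMaterial:
    name: str
    maps: dict = field(default_factory=dict)  # channel name -> texture file path

    def missing_maps(self, required=REQUIRED_MAPS):
        """List required channels that this material does not provide."""
        return [channel for channel in required if channel not in self.maps]

    def is_pipeline_ready(self, required=REQUIRED_MAPS):
        """True when every required channel has its own texture."""
        return not self.missing_maps(required)


# A "baked whole" asset with a single color texture versus a layered one.
baked = PBRMaterial("baked_head", {"reflectivity": "head_color.png"})
layered = PBRMaterial("layered_head", {
    "reflectivity": "head_reflectivity.png",
    "normal": "head_normal.png",
    "roughness": "head_roughness.png",
})
print(baked.missing_maps())
print(layered.is_pipeline_ready())
```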
The second is controllability. For generative AI, how to make the generated content more "controllable" is another major requirement of the CG industry for this technology.
Take the more familiar 2D field as an example. Before ControlNet emerged, the 2D AIGC industry had been groping forward half in the dark.
In other words, AI could generate images of objects in a specified category, but not objects in a specified pose. The result depended entirely on prompt engineering and "metaphysics."
After ControlNet emerged, the controllability of 2D AI image generation improved by leaps and bounds. For 3D AI, however, generating assets with the intended look still relies heavily on professional prompt engineering.
The last is generation speed. Compared with manual 3D art design, the advantage of AI generation is speed. However, if the speed and quality of AI output cannot match those of manual work, the technology still brings no benefit to the industry.
Take NeRF, which is currently very popular in AI technology, as an example. Its industrialization is faced with compatibility problems of speed and quality.
When the generation quality is high, NeRF-based 3D generation often takes a long time; if speed is the priority, however, the 3D assets NeRF generates cannot be put to industrial use at all.
But even if this problem is solved, how to make NeRF compatible with mainstream engines in the traditional CG industry without losing accuracy is still a huge problem.
It is not difficult to find from the above industrial standardization process that there are two major bottlenecks in the implementation of most AI text to 3D applications:
One is that prompt engineering has to be done manually, which is unfriendly to non-AI professionals and to designers who do not understand AI; the other is that the generated 3D assets often fail to meet industry standards and cannot be put to use no matter how good they look.
In view of these two points, ChatAvatar has given two specific and effective solutions.
On the one hand, ChatAvatar offers a second path besides manually written prompts, a shortcut better suited to ordinary people: describing needs through direct dialogue in "Party A mode", the way a client briefs a designer.
The team's official Twitter account said that, to realize this feature, ChatAvatar developed a method of converting conversational descriptions into portrait features based on GPT's capabilities.
Designers only need to keep chatting with GPT and describe the "feeling" they want:
GPT then completes the prompt engineering automatically and delivers the result to the AI.
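The idea of turning a conversation into a prompt can be sketched with a toy stand-in. The code below replaces the GPT step with a simple keyword table, so it is not ChatAvatar's method; the table, the feature names and the `3d face` prompt prefix are all hypothetical and serve only to show the shape of the flow from chat messages to structured features to a prompt string.

```python
# Hypothetical keyword table standing in for the GPT step described above.
FEATURE_KEYWORDS = {
    "age": ("young", "middle-aged", "elderly"),
    "expression": ("smiling", "angry", "sad", "neutral"),
    "skin": ("freckles", "wrinkles", "scar"),
}


def extract_features(user_turns):
    """Keep the first matching keyword per feature across the user's messages."""
    features = {}
    for turn in user_turns:
        text = turn.lower()
        for feature, keywords in FEATURE_KEYWORDS.items():
            if feature in features:
                continue  # an earlier message already settled this feature
            for keyword in keywords:
                if keyword in text:
                    features[feature] = keyword
                    break
    return features


def build_prompt(features):
    """Join the extracted features into one comma-separated prompt string."""
    parts = [features[name] for name in sorted(features)]
    return ", ".join(["3d face"] + parts)


turns = [
    "I want an elderly man",
    "He should be smiling, with deep wrinkles",
]
features = extract_features(turns)
print(build_prompt(features))
```

A real system would let the language model ask follow-up questions and handle free-form wording; the fixed table here only makes the data flow visible.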
In other words, if ControlNet is the game changer for the 2D industry, then ChatAvatar, which turns text into 3D, is nothing short of a game changer for the 3D industry.
On the other hand, and more importantly, ChatAvatar is fully compatible with the CG pipeline: the generated assets meet industry requirements in terms of topology, controllability and speed.
This means not only that the downloaded 3D assets can be imported directly into post-production software for further editing, giving greater controllability,
but also that the generated models and high-precision material maps can achieve extremely realistic results in later rendering.
In order to achieve such an effect, the team developed a progressive 3D generation framework DreamFace for ChatAvatar.
The key lies in the underlying data used to train the model, which is the world's first large-scale, high-precision, multi-expression face data set.
Based on this data set, DreamFace can efficiently generate product-level three-dimensional assets, that is, assets with regular topology, materials, and rigging.
DreamFace consists of three main modules: geometry generation, physically based material diffusion, and animation capability generation. By introducing an external 3D database, DreamFace can directly output assets that fit the CG pipeline.
△Rendering driven by the generated assets
At its core, solving these two technical bottlenecks reflects a trend that the AIGC wave has further accelerated: "generation" will replace "search". The Yingmo Technology team believes that generation will become the way a new generation of digital assets is obtained.
Previously, when we needed a picture or asset that met our needs, we usually queried a search engine. The large "search box" and neat asset cards on the ChatAvatar project homepage look like a search engine, but this is in fact a completely different way of finding assets.
△ChatAvatar project homepage
Yingmo Technology CTO Zhang Qixuan put it this way:
In the past, if we needed an illustration, we might have to search repeatedly across multiple libraries, or resort to more complicated methods such as Photoshop compositing and hand painting, to get a result. But since technologies such as Stable Diffusion emerged, you only need to describe the desired image in text to directly generate results that meet your needs. This has had a huge impact on traditional asset libraries. The goal of ChatAvatar is to replace the traditional search-based 3D asset library with 3D generation.
The next frontier in the field of AIGC
ChatGPT was the stone that stirred up a thousand waves. With the arrival of the AI 2.0 era, attention has also turned to multimodal AI covering images, video, 3D and other information.
As far as 3D generation is concerned, the market for 3D content production and consumption, whether in film, television or games, is already large enough, but it is held back by technical difficulties at the production level.
For example, the Transformer, which is very popular in the text field, has seen relatively limited use in 3D generation. Last summer, when the text-to-image field achieved breakthroughs thanks to diffusion models, people began to expect Text-to-3D to deliver the same striking performance. Once generative AI for 3D creation matures, content creation in areas such as VR and video will take off.
△"Van Gogh-style photography" generated by the diffusion model Midjourney 5.1
In fact, both technology giants and startups have been quietly working on Text-to-3D.
In September last year, Google released DreamFusion, which generates 3D models from text prompts and, according to Google, requires neither 3D training data nor modifications to the image diffusion model.
Close behind, Meta launched Make-A-Video, a model that generates video from text in one click.
Later, Text-to-3D AI models such as Nvidia's Magic3D and OpenAI's latest open-source project Shap-E appeared one after another.
SIGGRAPH 2023, the top computer graphics conference, will be held in August this year, and it also features many Text-to-3D papers. Yingmo Technology's paper on DreamFace, a text-guided progressive 3D generation framework, is one of them.
ChatAvatar is also, so far, the generative model product most focused on 3D digital human assets.
The AI startup behind it
Yingmo Technology was incubated from the MARS Laboratory of ShanghaiTech University in 2020. After its founding, it received two rounds of investment from Qiji Chuangtan (MiraclePlus) and Sequoia's seed fund.
The company focuses on research and productization in computer graphics and generative AI. In 2021, before AIGC made huge waves, it had already launched Wand, China's first consumer-facing AIGC painting application, which once topped its App Store category.
The average age of this forward-looking team, already well known in the industry, is only 25.
Having anchored its first commercialization scenario on digital humans, the team regards ChatAvatar as its latest progress in this direction, riding the AIGC wave.
As a newly launched product, ChatAvatar has exceeded the expectations of the Yingmo team in terms of product effects such as compatibility, completion and accuracy. However, in Wu Di's words, the process of getting here was "very embarrassing."
The main reason is simply a shortage of people. Yingmo Technology has already made progress in multi-category 3D generation technology, and its next step is to launch a large 3D generation model.
△Yingmo Technology will launch its first multimodal, cross-platform 3D search engine, Rodin, in May
Rodin connects multiple 3D asset platforms such as Sketchfab and supports searching for 3D assets by text, by image, and even by 3D model. The search engine is only Rodin's initial form; Yingmo Technology will build Rodin into a large 3D generation model.
To keep moving forward, the team needs more engineers, technical artists and product talent who embrace generative AI. As a team whose background is mainly in R&D, it is still short of such people.
"People are the measure of all things," Wu Di said. "We need more like-minded people to join us and jointly promote innovation in the 3D field."
As you can see, building the technology behind ChatAvatar from scratch reveals the continuous innovation of an AI startup, while the company's hunger for talent as it grows from small to large reveals that, under the AIGC wave, every segment is eager to break the surface.
Are you willing to embrace generative AI and become a game changer in the Text-to-3D field?