Tencent open-sources the training code for its Hunyuan text-to-image large model and releases LoRA and ControlNet plug-ins
On June 21, Tencent announced the full open-sourcing of the training code for its Hunyuan text-to-image large model (hereinafter the Hunyuan DiT model), along with the Hunyuan DiT LoRA solution for training on small datasets and the controllable-generation plug-in ControlNet.
This means that enterprises, individual developers, and creators around the world can fine-tune on top of the Hunyuan DiT training code to build more personalized, dedicated models and create with greater freedom. They can also modify and optimize the Hunyuan DiT training code, build their own applications on it, and drive rapid iteration and innovation of the technology.
Because Hunyuan DiT is a Chinese-native model, users fine-tuning with its training code can use Chinese data and labels directly, without translating the data into English.
Previously, the Tencent Hunyuan text-to-image large model announced a comprehensive upgrade and was open-sourced. It has been released on Hugging Face and GitHub and is free for commercial use by enterprises and individual developers. It is the industry's first open-source text-to-image model with a Chinese-native DiT architecture, and it supports bilingual input and understanding in Chinese and English. Within one month of being open-sourced, the project reached 2.4k GitHub stars, making it one of the most popular DiT models in the open-source community.
Hunyuan DiT GitHub project page
Alongside the open-source training code, the release of the LoRA small-dataset training solution and the controllable-generation plug-in ControlNet opens up more possibilities for the Hunyuan DiT open-source ecosystem.
LoRA, short for Low-Rank Adaptation of Large Language Models, is a technique for fine-tuning large language models. In text-to-image models, LoRA is used as a plug-in that lets users train a model with a specific painting style, IP, or character traits from a small amount of data, without modifying the original model or increasing the model size.
LoRA is very popular in the open-source text-to-image community. Many creators use it to build a wide variety of models, for example using a few personal photos to build a high-fidelity portrait-studio model dedicated to one person, or creating style models such as blind-box figures and clay.
LoRA models on the AI image community LiblibAI
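The low-rank idea behind LoRA can be illustrated with a small sketch. This is not Hunyuan DiT's training code: the matrix sizes, the `alpha` scaling factor, and the helper names `matmul` and `lora_merge` are illustrative assumptions. It shows a frozen weight matrix W being adjusted by the product of two small trainable matrices B and A, so that far fewer parameters are trained and the merged matrix keeps the original shape.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A) as a new matrix; W itself is not modified."""
    r = len(a)                 # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update with the same shape as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank = 8, 8, 2    # hypothetical layer size and LoRA rank

# Frozen base weight (d_out x d_in): never updated during LoRA training.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A (rank x d_in) starts small and random, B (d_out x rank) starts at zero.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

full_params = d_out * d_in            # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters trained with LoRA
merged = lora_merge(w, a, b, alpha=4.0)

print(full_params, lora_params)
```

Because B starts at zero, the merged matrix initially equals the base weight, so training begins from the unmodified model. In real models the gap between the two parameter counts is much larger than in this toy example, since the rank is tiny compared with the layer dimensions.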
The dedicated LoRA plug-in released for Hunyuan DiT lets developers create a custom model from as little as one image. For example, importing four pictures of blue-and-white porcelain with their corresponding prompts is enough to complete training and produce a "blue-and-white porcelain" generation model: the user then enters a simple prompt to generate the desired blue-and-white porcelain image.
Part of the training data:
Example of inference results of the trained model:
Blue and white porcelain generation model trained using Hunyuan DiT LoRA
ControlNet, the other plug-in launched this time, is a controllable generation algorithm used in text-to-image generation. It lets users control image generation more precisely by adding extra conditions.
Tencent Hunyuan currently provides three initial ControlNet models, which extract and apply image conditions such as edges (canny), depth (depth), and human pose (pose), and which developers can use directly for inference. With these three plug-ins, users can generate full-color images from line drawings, generate images with the same depth structure, and generate people in the same pose. Hunyuan DiT has also open-sourced the ControlNet training solution, so developers and creators can train their own custom ControlNet models.
Demonstration of the effects of three ControlNet plug-ins launched by Tencent Hunyuan DiT
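To make the idea of an edge condition concrete, the sketch below builds a binary edge map from a tiny grayscale image. It is only an illustration: it uses a plain Sobel gradient threshold as a much simpler stand-in for the Canny detector, the function name `sobel_edges` and the toy image are assumptions, and it is not the preprocessing used by the Hunyuan DiT ControlNet models.

```python
def sobel_edges(gray, threshold):
    """Return a binary edge map (0 or 255) for a grayscale image given as a list of rows.

    Border pixels are left at 0 because the 3x3 Sobel window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges

# Toy 6x6 image: dark left half, bright right half, so the only edge is vertical.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = sobel_edges(image, threshold=128)

for row in edge_map:
    print(row)
```

An edge map like this is the kind of extra condition image that an edge-based ControlNet takes alongside the text prompt, so that the generated picture follows the outlines of the input.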
Since the Hunyuan DiT model was open-sourced, it has received support and feedback from many developers, and the Tencent Hunyuan team has continued to improve and optimize the open-source components built on Hunyuan DiT, working with the industry to build a next-generation open-source ecosystem for visual generation. At the beginning of this month, Hunyuan DiT released a dedicated acceleration library that further improves inference efficiency and shortens image generation time by 75%. The model has also become much easier to use: users can run Hunyuan DiT through the ComfyUI graphical interface, or call the Hunyuan DiT model from the general-purpose Hugging Face Diffusers library with only three lines of code, without downloading the original code repository.
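A call through Diffusers might look like the sketch below. It is not taken from the article: it assumes a recent `diffusers` release that ships `HunyuanDiTPipeline`, a CUDA-capable GPU, and the model ID `Tencent-Hunyuan/HunyuanDiT-Diffusers`, all of which should be checked against the official documentation. The wrapper function `generate_image` is a hypothetical helper added for illustration.

```python
def generate_image(prompt, model_id="Tencent-Hunyuan/HunyuanDiT-Diffusers", device="cuda"):
    """Generate one image from a text prompt (Chinese or English).

    The model ID and device are assumptions; adjust them to your setup.
    """
    # Imported here so that defining the function does not require the libraries.
    import torch
    from diffusers import HunyuanDiTPipeline

    # The core of the workflow is roughly three lines: load, move to device, generate.
    pipe = HunyuanDiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to(device)
    return pipe(prompt=prompt).images[0]
```

Calling `generate_image("一个宇航员在骑马").save("astronaut.png")` would download the weights on first use and write the result to disk, since the pipeline returns a PIL image.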
It is understood that Tencent's Hunyuan text-to-image capabilities are already widely used in many businesses and scenarios, such as material creation, product synthesis, and game graphics. Earlier this year, Tencent Advertising released Tencent Advertising Miaosi, a one-stop AI advertising creative platform based on Tencent's Hunyuan large model. More than 20 media outlets, including "CCTV News" and "Xinhua Daily", have also used Tencent Hunyuan text-to-image for news content production.
Tencent Hunyuan open-source text-to-image large model
Official website: https://dit.hunyuan.tencent.com/
Code: https://github.com/Tencent/HunyuanDiT
Model: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT
Technical report: https://tencent.github.io/HunyuanDiT/asset/Hunyuan_DiT_Tech_Report_05140553.pdf
Dataset preparation workflow: https://github.com/Tencent/HunyuanDiT/blob/main/IndexKits/mds/MakeDataset.