AltDiffusion-m18, a versatile tool for multilingual text-to-image generation

Currently, few text-to-image generation models support languages other than English, so users often have to translate their prompts into English before feeding them to the model. This adds an extra step, and linguistic and cultural errors introduced in translation reduce the accuracy of the generated images.

The FlagAI team at the Zhiyuan Research Institute (BAAI) pioneered an efficient training method that combines a multilingual pre-trained model with Stable Diffusion to train a multilingual text-to-image generation model, AltDiffusion-m18, which supports text-to-image generation in 18 languages.

The supported languages are Chinese, English, Japanese, Thai, Korean, Hindi, Ukrainian, Arabic, Turkish, Vietnamese, Polish, Dutch, Portuguese, Italian, Spanish, German, French, and Russian.

Huggingface: https://huggingface.co/BAAI/AltDiffusion-m18

GitHub: https://github.com/FlagAI-Open/FlagAI/blob/master/examples/AltDiffusion-m18

In objective evaluations of FID, IS, and CLIP score, AltDiffusion-m18 achieves 95~99% of Stable Diffusion's performance in English, reaches the best level in Chinese and Japanese, and fills the gap in text-to-image generation models for the remaining 15 languages, meeting the industry's strong demand for multilingual text-to-image generation. Special thanks go to the Stable Diffusion research team for providing advice on this work.

In addition, the technical report behind AltDiffusion-m18's core innovation, "AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities", has been accepted to Findings of ACL 2023.

Technical Highlights

1 New AltCLIP: building a multilingual T2I model efficiently and at low cost

AltDiffusion-m9, released last year, was based on Stable Diffusion v1.4. The Zhiyuan team replaced its language tower with the multilingual tower AltCLIP and fine-tuned on data in nine languages, extending the original English-only Stable Diffusion to support 9 languages.

AltCLIP: https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP-m18

AltDiffusion-m18, by contrast, is trained on top of Stable Diffusion v2.1. The language tower of Stable Diffusion v2.1 is the penultimate layer of OpenCLIP, so the new AltCLIP is retrained with the penultimate layer of OpenCLIP as the distillation target. In addition, where m9 fine-tuned only the K and V matrices of the cross-attention layers in the UNet, m18 expands this into a two-stage training method, as shown in the figure below:

[Figure: two-stage training method of AltDiffusion-m18]

- First stage: Earlier experiments on m9 showed that fine-tuning the K and V matrices mainly learns the conceptual alignment between text and images, so the first stage of m18 training continues to fine-tune the K and V matrices, now on data in 18 languages. Experiments also showed that reducing the image resolution from 512*512 to 256*256 does not lose the semantic information of the image. The first stage, which learns text-image concept alignment, therefore trains at 256*256 resolution, which speeds up training.

- Second stage: To further improve the quality of the generated images, all UNet parameters are trained at 512*512 resolution on data in 18 languages. In addition, 10% of the text is dropped for unconditional training, in support of classifier-free guidance at inference.

- In addition, classifier-free guidance training is adopted to further improve generation quality.
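The two stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the FlagAI training code: the parameter-name markers follow the Hugging Face diffusers UNet naming (`attn2.to_k`, `attn2.to_v`), the empty string stands in for the "dropped" text, and all function names are hypothetical.

```python
import random

# Assumption: parameter names follow the Hugging Face diffusers UNet layout,
# where "attn2" is a cross-attention module and "to_k" / "to_v" are its key
# and value projections. Other codebases name these layers differently.
CROSS_ATTN_KV_MARKERS = ("attn2.to_k", "attn2.to_v")


def trainable_in_stage(param_name, stage):
    """Return True if a UNet parameter is updated in the given stage."""
    if stage == 1:
        # Stage 1: only the cross-attention K and V matrices are fine-tuned.
        return any(marker in param_name for marker in CROSS_ATTN_KV_MARKERS)
    if stage == 2:
        # Stage 2: all UNet parameters are trained.
        return True
    raise ValueError(f"unknown stage: {stage}")


def stage_resolution(stage):
    """Training resolution in pixels per side: 256 in stage 1, 512 in stage 2."""
    if stage not in (1, 2):
        raise ValueError(f"unknown stage: {stage}")
    return 256 if stage == 1 else 512


def drop_caption(caption, rng, drop_prob=0.1):
    """With probability drop_prob, replace the caption with an empty prompt.

    The article applies a 10% drop rate in the second stage; using "" as the
    unconditional prompt is an assumption of this sketch.
    """
    return "" if rng.random() < drop_prob else caption


def guided_prediction(uncond, cond, scale):
    """Classifier-free guidance at inference: start from the unconditional
    prediction and move toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

In a real training loop, `trainable_in_stage` would decide which parameters have gradients enabled, and `drop_caption` would run once per training example before the text is encoded. `guided_prediction` shows why the unconditional training matters: inference needs both a conditional and an unconditional prediction from the same model.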

The latest evaluation results show that AltCLIP-m18 surpasses CLIP and reaches the best level on Chinese and English zero-shot retrieval tasks:

[Figure: zero-shot retrieval results of AltCLIP-m18 and CLIP in Chinese and English]

On multilingual image classification benchmarks, AltCLIP-m9 (the earlier version, which supports 9 languages) and AltCLIP-m18 reach the best level:

[Figure: multilingual image classification benchmark results for AltCLIP-m9 and AltCLIP-m18]

Likewise, thanks to AltCLIP's tower-swapping approach, AltDiffusion-m18 connects seamlessly to all Stable Diffusion models and ecosystem tools built on the original CLIP. Every tool that supports Stable Diffusion, such as Stable Diffusion WebUI and DreamBooth, can be applied to AltDiffusion-m18, so it is easy to get started with and offers plenty to experiment with.

2 Aligned generation quality across languages, with strong performance and accurate details

With the new AltCLIP, AltDiffusion-m18 achieves 95~99% of the original Stable Diffusion's performance on the English FID, IS, and CLIP score evaluations, and achieves state-of-the-art performance in 17 languages including Chinese and Japanese. The performance of AltDiffusion-m18 is shown in the following table:

[Table: FID, IS, and CLIP score results for AltDiffusion-m18]
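Of the three metrics named above, CLIP score is the simplest to illustrate: it measures how well a generated image matches its prompt through the cosine similarity of their embeddings. The sketch below assumes one common convention (cosine similarity clamped at zero and scaled by 100). The article does not state which variant its evaluation used, and the embedding vectors would in practice come from a CLIP image encoder and text encoder rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding, scale=100.0):
    """CLIP score under the assumed convention: scale * max(cos, 0).

    Higher is better; a score of 0 means the image and the prompt are
    unrelated (or opposed) in the embedding space.
    """
    return scale * max(cosine_similarity(image_embedding, text_embedding), 0.0)
```

Unlike CLIP score, FID is a distance between feature distributions of generated and real images, so lower values are better there.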

In English, Chinese, and Japanese, AltDiffusion-m18 produces better and more accurately detailed results than other models:

[Figure: generation results compared with other models, panels (a), (b), and (c)]

In (a) above, AltDiffusion-m18 generates results highly consistent with the original Stable Diffusion and understands prompts better than other domestic Chinese-English bilingual models. For example, concepts such as "A stuffed bear", "A black and white photo", and "cat", which other domestic Chinese-English bilingual models fail to generate, are generated successfully by AltDiffusion. The same holds in Chinese and Japanese.

In (b) above, only AltDiffusion-m18 correctly generates "black sofa, wooden floor".

In (c) above, for "bears", Japanese Stable Diffusion incorrectly generates a human, while AltDiffusion-m18 correctly generates a bear.

In addition, the Zhiyuan FlagEval team has developed ImageEval, an evaluation tool for text-to-image generation models. In its evaluation, AltDiffusion-m18 exceeds domestic peer models in accuracy by 11% on the entity object dimension and by 10% on the entity quantity dimension. (Note: the ImageEval evaluation method and results will be released publicly in the near future.)

3 A boon for text-to-image generation in less widely supported languages, providing a reference system for multilingual text-to-image generation models

AltDiffusion-m18 learns the biases of different languages from multilingual data. This lets users skip the translation step and avoid cultural translation, reducing the loss of the cultural information behind each language. As shown in the figure below, the little boy generated from Chinese and Japanese prompts has a more "Asian-style" facial outline, while the little boy generated from English and other European-language prompts looks more "European and American".

[Figure: images of a little boy generated from prompts in different languages]

More interestingly, images generated from animal prompts in different languages also differ in their details. As shown in the figure below, although the images generated in different languages are highly consistent overall, there are subtle differences in the background and in the details of the Corgi's facial features.

[Figure: images of a Corgi generated from prompts in different languages]

Overall, AltDiffusion-m18 provides a basic reference system for multilingual text-to-image generation models. Users whose native languages are Spanish, German, French, or any of the other supported languages can enjoy AIGC without mentally translating their prompts into English. AI practitioners can also build on AltDiffusion-m18 with DreamBooth, ControlNet, and LoRA, or fine-tune it on corpora in other languages, to obtain better text-to-image generation results.

FlagAI (github.com/FlagAI-Open/FlagAI), a one-stop open-source project for large-model algorithms, models, and tools, also provides training and inference tools and APIs so that everyone can quickly download and use AltDiffusion-m18.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.