


A new step towards high-quality image generation: Google's UFOGen ultra-fast sampling method
Over the past year, text-to-image diffusion models, led by Stable Diffusion, have transformed visual creation, and countless users have boosted their productivity with images generated by these models. Generation speed, however, remains a common pain point. Because a diffusion model turns initial Gaussian noise into an image through many successive denoising steps, it must evaluate the network many times, which makes generation slow. As a result, large text-to-image diffusion models are a poor fit for applications that depend on real-time response and interactivity. A series of techniques has since reduced the number of sampling steps from the original several hundred to a few dozen, and in some cases to only 4-8.
Recently, a research team at Google proposed UFOGen, a diffusion model variant that samples extremely quickly. By fine-tuning Stable Diffusion with the method proposed in the paper, UFOGen can generate high-quality images in a single step, while still supporting Stable Diffusion's downstream applications such as image-to-image generation and ControlNet.
Please click the following link to view the paper: https://arxiv.org/abs/2311.09257
As the figure below shows, UFOGen can generate high-quality, diverse images in just one step.
Speeding up diffusion model sampling is not a new research direction. Previous work has focused on two main approaches. The first designs more efficient numerical methods so that the sampling ODE of the diffusion model can be solved in fewer discrete steps. For example, the DPM-Solver family of numerical solvers proposed by Zhu Jun's team at Tsinghua University has proven very effective with Stable Diffusion, cutting the number of solver steps from DDIM's default of 50 to fewer than 20. The second uses knowledge distillation to compress the model's ODE-based sampling path into fewer steps. Examples include guided distillation, one of the best paper candidates at CVPR 2023, and the recently popular Latent Consistency Model (LCM). LCM in particular can cut sampling to just 4 steps by distilling with a consistency objective, which has spawned many real-time generation applications.
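To illustrate why a better numerical method can get away with fewer steps, here is a minimal sketch on a toy ODE, dx/dt = -x, rather than a real diffusion sampling ODE. It compares the first-order Euler method with a second-order midpoint method. Neither is DPM-Solver, and the function names are made up for this illustration.

```python
import math

def euler(f, x, t0, t1, n):
    """First-order Euler method with n equal steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        x = x + h * f(t, x)
        t += h
    return x

def midpoint(f, x, t0, t1, n):
    """Second-order explicit midpoint method with n equal steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        x_half = x + 0.5 * h * f(t, x)       # half step to estimate the slope
        x = x + h * f(t + 0.5 * h, x_half)   # full step using the midpoint slope
        t += h
    return x

def decay(t, x):
    """Toy ODE right-hand side: dx/dt = -x, exact solution x(t) = exp(-t)."""
    return -x

exact = math.exp(-1.0)
for n in (5, 20):
    e_err = abs(euler(decay, 1.0, 0.0, 1.0, n) - exact)
    m_err = abs(midpoint(decay, 1.0, 0.0, 1.0, n) - exact)
    print(f"steps={n:2d}  euler error={e_err:.5f}  midpoint error={m_err:.5f}")
```

In this toy case the midpoint method with 5 steps (10 function evaluations) is already more accurate than Euler with 20 steps, which is the same kind of trade that higher-order diffusion solvers aim for.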
However, Google's research team did not follow either of these directions with UFOGen. Instead, it took a different route, building on the idea of a hybrid of diffusion models and GANs proposed more than a year earlier. The team argues that ODE-based sampling and distillation have fundamental limitations that make it hard to push the number of sampling steps to the limit, so achieving one-step generation requires a new approach.
The hybrid model is an approach that combines a diffusion model with a generative adversarial network (GAN). It was first proposed by an NVIDIA research team at ICLR 2022 under the name DDGAN ("Tackling the Generative Learning Trilemma with Denoising Diffusion GANs"). DDGAN is motivated by a weakness of ordinary diffusion models: they make a Gaussian assumption about the denoising distribution. Put simply, a diffusion model assumes that its denoising distribution (the conditional distribution that, given a noisy sample, produces a less noisy one) is a simple Gaussian. However, the theory of stochastic differential equations shows that this assumption holds only as the denoising step size approaches 0. A diffusion model therefore needs many repeated denoising steps to keep each step small, which slows generation.
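A small worked example can make the step-size argument concrete. The sketch below assumes a toy one-dimensional dataset with only two points, x0 = -1 and x0 = +1, and a variance-preserving forward process with signal levels abar_s > abar_t (both names are chosen here for illustration). For each x0, the posterior q(x_s | x_t, x0) is Gaussian, so the true denoising distribution q(x_s | x_t) is a mixture of two Gaussians with a shared standard deviation. The function measures how far apart the two component means are, in units of that standard deviation.

```python
import math

def mode_separation(abar_s, abar_t):
    """Distance between the two posterior component means, divided by their shared std.

    Toy setting (an assumption for this sketch): x0 is -1 or +1, and
    x_s, x_t are noisy versions of x0 with signal levels abar_s > abar_t.
    Uses the standard Gaussian posterior q(x_s | x_t, x0) of a
    variance-preserving diffusion.
    """
    alpha = abar_t / abar_s                                   # signal kept by the step s -> t
    coef_x0 = math.sqrt(abar_s) * (1.0 - alpha) / (1.0 - abar_t)
    sigma = math.sqrt((1.0 - abar_s) * (1.0 - alpha) / (1.0 - abar_t))
    delta = 2.0 * coef_x0                                     # mean gap between x0 = +1 and x0 = -1
    return delta / sigma

print("small step:", mode_separation(0.50, 0.49))
print("large step:", mode_separation(0.99, 0.01))
```

At x_t = 0 the two components have equal weight, and an equal-weight mixture of two Gaussians with the same standard deviation is unimodal only when the means are at most two standard deviations apart. The small step stays well under that threshold, so a single Gaussian is a reasonable approximation. The large step is far above it, so the true denoising distribution is clearly bimodal and a Gaussian cannot represent it.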
DDGAN drops the Gaussian assumption and instead uses a conditional GAN to model the denoising distribution. Because a GAN is highly expressive and can model complex distributions, a larger denoising step size can be used, which reduces the number of steps. However, DDGAN replaces the stable reconstruction training objective of diffusion models with a GAN objective, which can easily make training unstable and makes the approach hard to extend to more complex tasks. At NeurIPS 2023, the same Google research team behind UFOGen proposed SIDDM ("Semi-Implicit Denoising Diffusion Models"), which reintroduces a reconstruction objective into DDGAN's training objective, making training more stable and greatly improving generation quality over DDGAN.
SIDDM, the predecessor of UFOGen, can generate high-quality images on research datasets such as CIFAR-10 and ImageNet in only 4 steps. But SIDDM leaves two problems unsolved: first, it cannot achieve the ideal case of one-step generation; second, extending it to the more widely followed field of text-to-image generation is not straightforward. Google's research team proposed UFOGen to address both problems.
Specifically, for the first problem, the team found through a simple mathematical analysis that by changing how the generator is parameterized and how the reconstruction loss is computed, the model can in theory generate in one step. For the second problem, the team proposed initializing from the existing Stable Diffusion model so that UFOGen can be extended to text-to-image tasks faster and more effectively. Notably, SIDDM had already proposed that both the generator and the discriminator use a UNet architecture. Building on this design, both the generator and the discriminator of UFOGen are initialized from the Stable Diffusion model. This makes the most of the knowledge inside Stable Diffusion, especially about the relationship between images and text, which is difficult to acquire through adversarial learning. The training algorithm and diagram are shown below.
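As a rough illustration of how an adversarial term and a reconstruction term can be combined in a generator loss, here is a toy scalar sketch. It is not the paper's training algorithm. It assumes that the generator predicts a clean sample directly, that the discriminator judges a re-noised version of that prediction, and that the two terms are mixed with a weight lam. The one-parameter generator and discriminator are hypothetical stand-ins for the UNets.

```python
import math
import random

def add_noise(x0, abar, rng):
    """Forward diffusion to signal level abar: sqrt(abar)*x0 + sqrt(1-abar)*eps."""
    return math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)

def generator(x_t, theta):
    """Hypothetical generator that predicts a clean sample directly from x_t."""
    return theta * x_t

def discriminator(x, phi):
    """Hypothetical discriminator: probability that the noisy sample x is real."""
    return 1.0 / (1.0 + math.exp(-phi * x))

def generator_loss(x0, theta, phi, abar_t, abar_prev, lam, rng):
    """Non-saturating adversarial term plus lam times a reconstruction term."""
    x_t = add_noise(x0, abar_t, rng)                   # noisy input to the generator
    x0_pred = generator(x_t, theta)                    # predicted clean sample
    x_prev_fake = add_noise(x0_pred, abar_prev, rng)   # re-noise the prediction
    adv = -math.log(max(discriminator(x_prev_fake, phi), 1e-12))
    rec = (x0 - x0_pred) ** 2                          # reconstruction on the clean sample
    return adv + lam * rec, adv, rec

def sample_one_step(theta, rng):
    """One-step generation: a single generator call on pure Gaussian noise."""
    return generator(rng.gauss(0.0, 1.0), theta)

rng = random.Random(0)
total, adv, rec = generator_loss(x0=1.0, theta=0.5, phi=1.0,
                                 abar_t=0.1, abar_prev=0.5, lam=1.0, rng=rng)
print("loss:", total, "adversarial:", adv, "reconstruction:", rec)
print("one-step sample:", sample_one_step(0.5, rng))
```

The reconstruction term gives the generator a stable regression signal, while the adversarial term only has to sharpen the result, which matches the supporting role the article ascribes to the GAN.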
Notably, there was earlier work that used GANs for text-to-image generation: NVIDIA's StyleGAN-T and Adobe's GigaGAN both scale up the basic StyleGAN architecture, which also lets them generate images in one step. The UFOGen authors point out that, beyond generation quality, UFOGen has several advantages over this earlier GAN-based work:
1. In text-to-image tasks, pure generative adversarial network (GAN) training is very unstable. The discriminator must judge not only the texture of an image but also how well the image matches the text, which is a very hard task, especially early in training. Previous GAN models such as GigaGAN therefore introduced many auxiliary losses to help training, which made training and hyperparameter tuning extremely difficult. UFOGen, by introducing a reconstruction loss, lets the GAN play a supporting role, which makes training very stable.
2. Training a GAN from scratch is not only unstable but also extremely expensive, especially for tasks like text-to-image generation that require large amounts of data and many training steps. Because two sets of parameters must be updated at the same time, GAN training consumes more time and memory than diffusion model training. UFOGen's design allows its parameters to be initialized from Stable Diffusion, which greatly reduces training time; convergence usually takes only tens of thousands of training steps.
3. One attraction of text-to-image diffusion models is that they can be applied to other tasks, including applications that need no fine-tuning, such as image-to-image generation, and applications that do need fine-tuning, such as controllable generation. Previous GAN models have been hard to extend to these downstream tasks because fine-tuning GANs is difficult. UFOGen, by contrast, retains the framework of a diffusion model and can therefore be applied to these tasks more easily. The figure below shows examples of UFOGen's image-to-image and controllable generation. Note that each of these needs only one sampling step.
Experiments show that UFOGen needs only one sampling step to generate high-quality images that match their text descriptions. Compared with recently proposed fast sampling methods for diffusion models (such as InstaFlow and LCM), UFOGen is highly competitive. Even compared with Stable Diffusion using 50 sampling steps, the samples generated by UFOGen are not visually inferior. Here are some comparison results:
Summary
The Google team proposed UFOGen, a powerful model built by improving an existing hybrid of diffusion models and GANs. The model is obtained by fine-tuning Stable Diffusion; it can generate images in one step while remaining suitable for various downstream applications. As one of the early works to achieve ultra-fast text-to-image synthesis, UFOGen opens a new path for efficient generative models.