Image generation based on Diffusion Model

王林
2023-04-14 14:58:20

Part 01

Development History

1.1 Origin

The 2015 paper Deep Unsupervised Learning using Nonequilibrium Thermodynamics pointed out a major difficulty with the generative models of the time, such as VAE. Such a model first defines a conditional distribution and then defines a variational posterior to fit it, so the conditional distribution and the variational posterior must ultimately be optimized together, which is very difficult. If we can instead define a simple process that maps the data distribution to a standard Gaussian, the task of the "generator" reduces to fitting each small step of the reverse of that process. This is the core idea of the diffusion model. However, the paper attracted little attention at the time.

1.2 Development

In 2020, building on this earlier work, the DDPM model (Denoising Diffusion Probabilistic Models) was proposed. Compared with the basic diffusion model, the authors combine the diffusion model with denoising score matching to guide training and sampling. This improves the quality of the generated image samples and makes training simpler and more stable, with final results comparable to those of GAN models.

Figure 2-Generation results of DDPM

However, DDPM is not perfect. Because its diffusion process is a Markov chain, it needs a relatively large number of diffusion steps to obtain good results, which makes sample generation very slow.

Therefore, in 2021, Song et al. proposed DDIM (Denoising Diffusion Implicit Models), which modifies the sampling method of DDPM's diffusion process. It extends the traditional Markov diffusion process to a non-Markovian one and can generate samples in far fewer sampling steps, greatly improving efficiency.
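To make the speed-up concrete, here is a minimal sketch of the deterministic DDIM update (the variant with no added noise), assuming scalar samples and writing $\bar\alpha_t$ for the cumulative product of $(1-\beta_s)$ up to step $t$. The names `ddim_step`, `ddim_sample`, and `oracle_eps` are illustrative, and `oracle_eps` is a hypothetical stand-in for a trained noise predictor; it is exact only because the toy "dataset" is a single known value.

```python
import math

def ddim_step(xt, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update for a scalar sample."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (xt - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the (possibly much earlier) target step.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(x_T, alpha_bars, timesteps, eps_model):
    """Run DDIM over a decreasing subsequence of 1-based timesteps."""
    x = x_T
    for i, t in enumerate(timesteps):
        alpha_bar_t = alpha_bars[t - 1]
        # After the last listed step we land on t = 0, where alpha_bar is 1 (no noise).
        if i + 1 < len(timesteps):
            alpha_bar_prev = alpha_bars[timesteps[i + 1] - 1]
        else:
            alpha_bar_prev = 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bar_t, alpha_bar_prev)
    return x

# Linear variance schedule over 1000 steps (an assumed, commonly used setting).
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
alpha_bars, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

target = 0.7  # toy data distribution: every sample equals this value

def oracle_eps(x, t):
    """Exact noise for the toy distribution; a real model would be a neural network."""
    a_bar = alpha_bars[t - 1]
    return (x - math.sqrt(a_bar) * target) / math.sqrt(1.0 - a_bar)

# Only 4 network evaluations instead of 1000.
sample = ddim_sample(1.3, alpha_bars, [1000, 750, 500, 250], oracle_eps)
print(sample)
```

Because each update first estimates the clean sample and then jumps to an arbitrary earlier noise level, the sampler can skip most of the 1000 steps. With a learned predictor the estimate is only approximate, so fewer steps trade some quality for speed.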

Follow-up work has also combined the diffusion model with traditional generative networks, for example VAE with diffusion models or GAN with diffusion models. These are not covered in detail here.

1.3 Outbreak

In 2022, Google launched a new AI system based on the diffusion model that can turn text descriptions into realistic images.

Figure 3

Figure 4

As the schematic diagram provided by Google shows, the input text is first encoded and then converted into a small 64×64 image by a text-to-image diffusion model. Super-resolution diffusion models then raise the resolution over further iterations until the final 1024×1024 image is produced. From the user's side the process feels like magic: you enter a piece of text, such as "a golden retriever dog wearing a red dotted turtleneck and a blue checkered hat", and the program automatically generates the picture of the dog shown above.

Another phenomenally popular application is NovelAI. Originally a website dedicated to AI-assisted writing, it rode the wave of interest in image generation and used image resources from the Internet to train an image generation model focused on anime-style (2D) illustration. Its results have begun to approach the level of human painters.

Figure 5


In addition to the usual text-to-image generation, it also accepts a picture as a reference, so that the AI generates new pictures based on a known one. To some extent, this addresses the problem of AI-generated results being uncontrollable.

Part 02

Principle Explanation

So how does such a powerful AI technology work? Here we take the classic DDPM model as an example and briefly walk through the process:

2.1 Forward process

The forward process adds noise to the image, step by step, in order to construct the ground-truth (GT) training samples.
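A minimal sketch of this noising step follows. It assumes the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$, where $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$ and $\beta_s$ is the noise variance at step $s$. The linear schedule from 1e-4 to 0.02 over 1000 steps is the setting reported in the DDPM paper. The function names are illustrative, and a short list of floats stands in for an image.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1..beta_T, linearly spaced."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t, for t = 1..T."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in one shot; t is 1-based. Returns (x_t, noise)."""
    a_bar = alpha_bars[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [0.8, -0.3, 0.5, 0.1]  # stand-in for normalized pixel values
x_mid, _ = q_sample(x0, 500, alpha_bars, rng)   # partly noised
x_end, _ = q_sample(x0, 1000, alpha_bars, rng)  # almost pure noise
print(alpha_bars[0], alpha_bars[-1])
print(x_mid, x_end)
```

The pair `(x_t, noise)` is exactly what training needs: the noisy input and the ground-truth noise the model should recover. Because `alpha_bars` shrinks toward zero, the last samples keep almost nothing of `x0`.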

Given initial data $x_0 \sim q(x)$, we gradually add Gaussian noise to it over $T$ steps, producing $x_1, x_2, \dots, x_T$. With the noise variance at step $t$ denoted $\beta_t \in (0, 1)$, each step is:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$

As mentioned above, this is a Markov chain. As $T$ grows, the data eventually tends toward an isotropic Gaussian distribution.

2.2 Reverse diffusion process

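The reverse process is a denoising process. If we knew $q(x_{t-1} \mid x_t)$, we could restore $x_0$ starting from pure standard Gaussian noise $x_T \sim \mathcal{N}(0, \mathbf{I})$. It has been proved that if $q(x_t \mid x_{t-1})$ is Gaussian and $\beta_t$ is small enough, then $q(x_{t-1} \mid x_t)$ is still Gaussian. However, $q(x_{t-1} \mid x_t)$ cannot be inferred directly, so we use a deep learning model $p_\theta$ with parameters $\theta$ to predict it:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$$

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

If $x_0$ is known, then by Bayes' formula, with $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t \mathbf{I}\right)$$

$$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t$$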
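The Bayes posterior above can be checked numerically. The sketch below assumes scalar samples, the usual definitions $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s \le t} \alpha_s$, and a short made-up variance schedule; `posterior_mean_variance` is an illustrative name.

```python
import math

def posterior_mean_variance(x0, xt, t, betas):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for scalar x; t is 1-based."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    # alpha_bar_{t-1}; the empty product gives 1 when t == 1.
    alpha_bar_prev = math.prod(1.0 - b for b in betas[: t - 1])
    alpha_bar_t = alpha_bar_prev * alpha_t
    # Weights on x_0 and x_t in the posterior mean.
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    mean = coef_x0 * x0 + coef_xt * xt
    variance = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    return mean, variance

toy_betas = [0.1, 0.2, 0.3]  # hypothetical schedule, chosen only for readability
for step in (3, 2, 1):
    print(step, posterior_mean_variance(0.5, -1.0, step, toy_betas))
```

Two properties are worth noting: the posterior variance never exceeds $\beta_t$, and at $t = 1$ the posterior collapses onto $x_0$ with zero variance, since knowing $x_0$ leaves nothing uncertain about the step before $x_1$.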


2.3 Training process

As readers familiar with machine learning will know, training a model means optimizing its parameters, here so that the predicted mean and variance are reliable. We do this by maximizing the log-likelihood of the model's predicted distribution, that is, by minimizing:

$$L = \mathbb{E}_{q(x_0)}\left[-\log p_\theta(x_0)\right]$$

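After a series of derivations, the DDPM model arrives at the final loss function:

$$L_{\mathrm{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\left[\left\|\epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right)\right\|^2\right]$$

where $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ is the noise added in the forward process, $\epsilon_\theta$ is the model's prediction of that noise, and $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$.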
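The "series of derivations" can be outlined as follows. This is a sketch of the standard argument from the DDPM paper rather than a full proof. It assumes $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$, and a fixed reverse-process covariance $\sigma_t^2 \mathbf{I}$; $C$ collects terms that do not depend on $\theta$.

```latex
\begin{aligned}
-\log p_\theta(x_0)
  &\le \mathbb{E}_{q}\!\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] =: L_{\mathrm{VLB}} \\
L_{\mathrm{VLB}}
  &= \mathbb{E}_{q}\Big[ D_{\mathrm{KL}}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big)
   + \sum_{t=2}^{T} D_{\mathrm{KL}}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big)
   - \log p_\theta(x_0\mid x_1) \Big] \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
   \qquad \epsilon\sim\mathcal{N}(0,\mathbf{I}) \\
\mu_\theta(x_t,t) &= \frac{1}{\sqrt{\alpha_t}}
   \left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t,t)\right) \\
L_{t-1} &= \mathbb{E}_{x_0,\epsilon}\!\left[
   \frac{\beta_t^2}{2\sigma_t^2\,\alpha_t\,(1-\bar\alpha_t)}\,
   \big\|\epsilon-\epsilon_\theta(x_t,t)\big\|^2\right] + C
\end{aligned}
```

Each KL term compares two Gaussians, so it reduces to a squared distance between their means. Parameterizing the mean through a noise predictor $\epsilon_\theta$ turns that distance into a noise-prediction error, and dropping the time-dependent weight in $L_{t-1}$ gives the simplified loss.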


To summarize the training process:

1. Take an input $x_0$ and randomly sample a step $t$ from 1...T.
2. Sample a noise $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ from the standard Gaussian distribution.
3. Calculate the loss and iteratively minimize the loss function.
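The three steps can be sketched as a toy training loop. Everything model-specific here is an assumption made for illustration: the data are scalars drawn from $\mathcal{N}(0, 1)$, the variance schedule is made up, and the noise predictor is a hypothetical per-step linear model `weights[t - 1] * x_t` trained by plain SGD. DDPM itself uses a neural network on images.

```python
import math
import random

T = 10
betas = [0.05 + 0.03 * i for i in range(T)]  # hypothetical schedule for a toy problem
alpha_bars, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

# Toy noise predictor: eps_theta(x_t, t) = weights[t - 1] * x_t, one weight per step.
weights = [0.0] * T

def training_step(x0, rng, lr=0.001):
    """One SGD step on the squared noise-prediction error; returns the loss."""
    # 1. Randomly sample a time step t from 1..T.
    t = rng.randint(1, T)
    # 2. Sample noise from the standard Gaussian and build the noisy input x_t.
    eps = rng.gauss(0.0, 1.0)
    a_bar = alpha_bars[t - 1]
    xt = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
    # 3. Compute the loss and move the weight against its gradient.
    residual = eps - weights[t - 1] * xt
    loss = residual ** 2
    weights[t - 1] += lr * 2.0 * residual * xt  # d(loss)/d(weight) = -2 * residual * xt
    return loss

rng = random.Random(0)
for _ in range(200000):
    x0 = rng.gauss(0.0, 1.0)  # toy "dataset": samples from N(0, 1)
    training_step(x0, rng)

for t in range(1, T + 1):
    print(t, weights[t - 1], math.sqrt(1.0 - alpha_bars[t - 1]))
```

For this toy data the best linear predictor is known in closed form, $\sqrt{1-\bar\alpha_t}$, so the printed pairs give a quick check that minimizing the loss recovers the noise that was added. The weights only approximate that value because SGD with a fixed learning rate keeps fluctuating around the optimum.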

Figure 6


Part 03

Summary

The diffusion model has shown great potential. Unlike VAE, it does not need to align a posterior distribution, and unlike GAN, it does not need to train an additional discriminator. It has applications in computer vision, bioinformatics, speech processing, and other fields. In image generation, it can help improve the efficiency of image creation: the AI may generate several candidate pictures from given conditions, and humans then filter and modify the results. This may become a new working mode in 2D painting and could greatly improve the production efficiency of 2D digital assets.

However, the development of AI technology always brings disputes, and image generation is no exception. Besides problems with the technology itself, such as generated images with wrong or implausible structure, there are also legal disputes, such as the copyright of AI-generated works. The technical problems can be solved as the technology develops. We have reason to believe that image generation will eventually reach a very high level, which will eliminate most low-end painting-related jobs and greatly liberate human productivity. Copyright issues, in contrast, may require government departments to pay close attention to the development of related industries and to improve the relevant policies and systems. This calls for more thought about emerging fields so that AI technology can better serve us.

References

https://www.php.cn/link/3799b2e805a7fa8b076fc020574a73b2

https://www.php.cn/link/6872937617af85db5a39a5243e858d1f

https://www.php.cn/link/831da40e5907987235ebe5616446e083

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for removal.