In 2015, the paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" pointed out a major difficulty with the generative models of the time, such as the VAE. Such a model first defines a conditional distribution and then defines a variational posterior to fit it, so the conditional distribution and the variational posterior have to be optimized at the same time, which is very hard. If we can instead define a simple process that maps the data distribution to a standard Gaussian, the task of the "generator" reduces to fitting each small step of the reverse of that process. This is the core idea of the diffusion model. However, the paper attracted little attention at the time.
In 2020, building on this earlier work, DDPM (Denoising Diffusion Probabilistic Models) was proposed. Compared with the basic diffusion model, the authors combined the diffusion model with denoising score matching to guide training and sampling. This improved the quality of the generated image samples and made training simpler and more stable, with final results comparable to those of GAN models.
Figure 2-Generation results of DDPM
However, DDPM is not perfect. Because the diffusion process is a Markov chain, it needs a relatively large number of diffusion steps to obtain good results, which makes sample generation very slow.
So in 2021, Song et al. proposed DDIM (Denoising Diffusion Implicit Models), which reworks the sampling method of DDPM's diffusion process. It extends the traditional Markov diffusion process to a non-Markovian one and can generate samples with fewer sampling steps, greatly improving efficiency.
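To make the "fewer sampling steps" idea concrete, here is a minimal sketch of the deterministic DDIM update (the eta = 0 case). The notation is an assumption, not taken from this article: `a_bar` stands for the cumulative product of (1 - beta) over the noising steps, i.e. how much of the original signal survives at a given noise level. The true noise is used in place of a trained network, purely for illustration.

```python
import math

def ddim_step(xt, eps_pred, a_bar_t, a_bar_prev):
    """Deterministic DDIM update (eta = 0) from noise level a_bar_t to a_bar_prev."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = [(x - math.sqrt(1.0 - a_bar_t) * e) / math.sqrt(a_bar_t)
               for x, e in zip(xt, eps_pred)]
    # Re-noise that estimate to the (less noisy) target level.
    return [math.sqrt(a_bar_prev) * p + math.sqrt(1.0 - a_bar_prev) * e
            for p, e in zip(x0_pred, eps_pred)]

# Toy check: a hypothetical "oracle" that always returns the true noise.
x0 = [0.3, -0.7]
eps = [1.2, -0.4]
a_bar_levels = [0.05, 0.4, 0.8, 1.0]  # only three jumps instead of many small steps

a0 = a_bar_levels[0]
x = [math.sqrt(a0) * v + math.sqrt(1.0 - a0) * e for v, e in zip(x0, eps)]
for a_t, a_prev in zip(a_bar_levels, a_bar_levels[1:]):
    x = ddim_step(x, eps, a_t, a_prev)
# x is now equal to x0 up to floating-point rounding.
```

Because the update only depends on the two noise levels, the sampler is free to skip intermediate steps. With a real, imperfect noise predictor, fewer jumps trade some quality for speed.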
Follow-up work has also combined the diffusion model with traditional generative networks, for example VAE with DM and GAN with DM. These are not covered in detail here.
In 2022, Google launched a new AI system based on the diffusion model that can turn text descriptions into realistic images.
Figure 3
Figure 4
As the schematic diagram provided by Google shows, the input text is first encoded and then converted into a small 64*64 image by a text-to-image diffusion model. The small image is then processed by super-resolution diffusion models, which raise the resolution over further iterations until the final 1024*1024 image is obtained. In use, the process feels like magic: you enter a piece of text, such as "a golden retriever dog wearing a red dotted turtleneck and a blue checkered hat", and the program automatically generates the picture of the dog shown above.
Another hugely popular application is NovelAI. Originally a website dedicated to AI-assisted writing, it rode the current wave of image generation and used image resources from the Internet to train an image generation model focused on anime-style (2D) art, whose results are beginning to approach the level of human painters.
Figure 5
In addition to the usual text-to-image generation, it also supports using a picture as a reference, so that the AI generates new pictures based on a known one. To some extent this addresses the problem of uncontrollable AI-generated results.
So, how does such a powerful AI technology work? Here we take the classic DDPM model as an example and give a brief walkthrough:
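2.1 Forward diffusion process
The forward process adds noise to the image step by step in order to construct the training samples (ground truth).
Given an initial data sample x0 ~ q(x), we gradually add Gaussian noise to it over T steps, producing x1, x2, ..., xT. With a per-step noise variance $\beta_t \in (0, 1)$, each step is:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$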
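The forward process can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's own code: the linear schedule from 1e-4 to 0.02 over 1000 steps is an assumed (commonly used) choice, and the function names are hypothetical. It relies on the standard closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, which lets us jump to any step without simulating the chain.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T spaced linearly (the endpoint values are an assumption)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s), one entry per step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed. Returns (x_t, noise)."""
    a_bar = alpha_bars[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)

x0 = [1.0, -1.0, 0.5, 0.0]            # a toy 4-"pixel" image
x_T, _ = q_sample(x0, 1000, alpha_bars, rng)
# alpha_bars[-1] is close to 0, so x_T is almost pure Gaussian noise.
```

The returned noise is kept because it is exactly the target the network is later trained to predict.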
As mentioned above, this is a Markov chain. Eventually, the data tends to an isotropic Gaussian distribution.
2.2 Reverse diffusion process
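The reverse process is a denoising process. If we knew the reverse conditional $q(x_{t-1} \mid x_t)$, we could run the diffusion backward from Gaussian noise. If $q(x_t \mid x_{t-1})$ is Gaussian and the per-step noise variance $\beta_t$ is small enough, then $q(x_{t-1} \mid x_t)$ is still Gaussian. However, $q(x_{t-1} \mid x_t)$ cannot be inferred directly, so we use a deep learning model with parameters $\theta$ to predict it:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$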
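One reverse step can be sketched as follows. The assumptions here go beyond the article: the mean uses the DDPM noise-prediction form $\mu_\theta = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right)$ with $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s \le t}\alpha_s$, the variance is fixed to $\beta_t$, and `zero_eps_model` is a hypothetical placeholder for a trained network.

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products alpha_bar (assumed values)."""
    betas = [beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
             for i in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def zero_eps_model(xt, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t)."""
    return [0.0 for _ in xt]

def p_sample_step(xt, t, betas, alpha_bars, eps_model, rng):
    """One ancestral sampling step x_t -> x_{t-1}; t is 1-indexed."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    a_bar_t = alpha_bars[t - 1]
    eps = eps_model(xt, t)
    coef = beta_t / math.sqrt(1.0 - a_bar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(xt, eps)]
    if t == 1:
        return mean                      # no noise is added on the last step
    sigma = math.sqrt(beta_t)            # fixed variance choice sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def sample(dim, betas, alpha_bars, eps_model, rng):
    """Start from Gaussian noise and apply the reverse step T times."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        x = p_sample_step(x, t, betas, alpha_bars, eps_model, rng)
    return x

rng = random.Random(0)
betas, alpha_bars = make_schedule(50)
x0_sample = sample(3, betas, alpha_bars, zero_eps_model, rng)
```

With the placeholder predictor the output is meaningless noise. The point of the sketch is the shape of the loop, which stays the same once a trained model is plugged in.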
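If x0 is known, Bayes' rule gives the reverse conditional in closed form:
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\right)$$
where, writing $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$,
$$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t$$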
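As a worked illustration of this Bayes step, the posterior $q(x_{t-1} \mid x_t, x_0)$ is Gaussian with mean $\frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}x_0 + \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}x_t$ and variance $\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$, where $\alpha_t = 1-\beta_t$ and $\bar\alpha_t$ is the cumulative product of the $\alpha$ values. The sketch below computes both. The schedule values and function names are assumptions made for the example.

```python
import math

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products alpha_bar (assumed values)."""
    betas = [beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
             for i in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def posterior_mean_variance(x0, xt, t, betas, alpha_bars):
    """Mean and scalar variance of q(x_{t-1} | x_t, x_0); t is 1-indexed."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0   # alpha_bar_0 = 1
    coef_x0 = math.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = [coef_x0 * a + coef_xt * b for a, b in zip(x0, xt)]
    variance = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return mean, variance

betas, alpha_bars = make_schedule(1000)
x0 = [0.5, -1.0]
xt = [0.3, 0.2]
mean_t, var_t = posterior_mean_variance(x0, xt, 10, betas, alpha_bars)
```

Two sanity checks follow from the formulas: at t = 1 the posterior collapses onto x0 with zero variance, and for t > 1 the posterior variance is strictly smaller than $\beta_t$.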
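2.3 Training process
Readers with some machine learning background will know that training a model means optimizing its parameters, here so that the model predicts a reliable mean and variance. We maximize the log-likelihood of the model's predicted distribution, that is, we minimize:
$$L = \mathbb{E}_{q(x_0)}\left[-\log p_\theta(x_0)\right]$$
After a series of derivations, DDPM arrives at the final loss function:
$$L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\left[\left\lVert \epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right)\right\rVert^2\right]$$
where $\epsilon \sim \mathcal{N}(0, I)$ is the noise added in the forward process, $\epsilon_\theta$ is the network's prediction of that noise, and $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$.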
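In practice, the simplified DDPM objective is a mean squared error between the noise that was actually added and the noise the network predicts from the noisy sample. The sketch below estimates that loss by Monte Carlo. It is an illustration under assumptions: the schedule values, the toy dataset, and `zero_eps_model` (a placeholder for a real network) are all hypothetical, and no parameters are actually updated.

```python
import math
import random

def make_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed values)."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def zero_eps_model(xt, t):
    """Hypothetical untrained predictor: always guesses zero noise."""
    return [0.0 for _ in xt]

def simple_loss(x0, t, alpha_bars, eps_model, rng):
    """Squared error between true and predicted noise for one (x0, t) pair."""
    a_bar = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    pred = eps_model(xt, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)

def estimate_loss(dataset, alpha_bars, eps_model, rng, n_draws=256):
    """Average the loss over random data points and random time steps."""
    total = 0.0
    for _ in range(n_draws):
        x0 = rng.choice(dataset)
        t = rng.randint(1, len(alpha_bars))   # uniform over 1..T
        total += simple_loss(x0, t, alpha_bars, eps_model, rng)
    return total / n_draws

rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
dataset = [[1.0, -1.0, 0.5, 0.0], [0.2, 0.4, -0.6, 0.8]]
baseline = estimate_loss(dataset, alpha_bars, zero_eps_model, rng)
```

The untrained placeholder gives a loss near 1, the variance of the unit Gaussian noise. A real training loop would take gradient steps on the network parameters to push this number down.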
Figure 6
However, as AI technology develops there will always be controversy, and image generation is no exception. Besides problems with the technology itself, such as generated images with wrong or implausible structure, there are also legal disputes, such as the copyright status of AI-generated works. The technical problems can be solved as the technology matures. We have reason to believe that image generation will eventually reach a very high level, which will eliminate most low-end painting-related jobs and greatly free up human productivity. Copyright issues may still require government departments to pay close attention to the development of the related industries and to improve the relevant policies and systems. This requires us to think more carefully about emerging fields so that AI technology can better serve us.
Part 03 Summary
The diffusion model has shown great potential. Compared with the VAE, it does not need to align a posterior distribution, and unlike the GAN it does not need to train an additional discriminator. It has applications in computer vision, bioinformatics, speech processing, and other fields. Its application to image generation will help improve the efficiency of image creation: AI may generate several pictures from given conditions, and humans can then filter and modify the results. This may become a new working mode in 2D painting and could greatly improve the production efficiency of 2D digital assets.