Image generation based on Diffusion Model

王林

Apr 14, 2023 02:58 PM

Part 01

● Development History ●

1.1 Origin

In 2015, the paper Deep Unsupervised Learning using Nonequilibrium Thermodynamics pointed out a major difficulty with the generative models of the time, such as VAE. This type of model first defines a conditional distribution and then defines a variational posterior to fit it, so in the end the conditional distribution and the variational posterior have to be optimized at the same time, which is very difficult. If we can instead define a simple process that maps the data distribution to a standard Gaussian, the task of the "generator" becomes simply fitting each small step of the inverse of that process. This is the core idea of the diffusion model. However, the paper did not attract much attention at the time.

1.2 Development

In 2020, building on this earlier work, the DDPM model (Denoising Diffusion Probabilistic Models) was proposed. Compared with the basic diffusion model, the authors combined the diffusion model with denoising score matching to guide the training and sampling process. This improved the generated image samples and made the model easier and more stable to train, with final results comparable to those of GAN models.

Figure 2: Generation results of DDPM

However, the DDPM model is not perfect. Since the diffusion process is a Markov chain, its disadvantage is that it requires a relatively large number of diffusion steps to obtain better results, which results in very slow sample generation.

So after DDPM, in 2021, Song et al. proposed DDIM (Denoising Diffusion Implicit Models), which changed the sampling method of DDPM by extending the traditional Markov diffusion process to a non-Markov process. It can use far fewer sampling steps to generate a sample, which greatly improves efficiency.

Follow-up work has also combined the diffusion model with traditional generative networks, for example VAE with DM and GAN with DM. We will not go into details here.

1.3 Outbreak

In 2022, Google launched a new AI system based on the diffusion model that can turn text descriptions into realistic images.

Figure 3

Figure 4

As the schematic diagram provided by Google shows, the input text is first encoded, and a text-to-image diffusion model then converts it into a small 64×64 image. A super-resolution diffusion model then processes the small image, raising its resolution over further iterations until the final 1024×1024 image is obtained. To the user, the process feels like magic: you enter a piece of text, such as "a golden retriever dog wearing a red dotted turtleneck and a blue checkered hat", and the program automatically generates the picture of the dog shown above.

Another phenomenally popular application is NovelAI. It was originally a website dedicated to AI writing. Riding the current wave of image generation, it used image resources from the Internet to train an image generation model focused on anime-style (two-dimensional) art, and its results have begun to approach the level of human painters.

Figure 5

In addition to the traditional text-to-image input, it also supports supplying a picture as a reference, so that the AI generates new pictures based on a known one. To a certain extent, this addresses the problem of uncontrollable AI-generated results.

Part 02

● Principle Explanation ●

So how does such a powerful AI technology work? Here we take the classic DDPM model as an example and give a brief walkthrough:

2.1 Forward process

The forward process adds noise to the image in order to construct the ground-truth (GT) training samples.

Given an initial sample x0 ~ q(x), we gradually add Gaussian noise to the data. This process runs for T steps, and the results of the successive steps are x1, x2, ..., xT.
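The step-by-step noising described here can be sketched in plain Python. This is a minimal illustration rather than the article's own code: the function names, the toy 500-pixel "image", and the linear schedule from 1e-4 to 0.02 over T = 1000 steps are all assumptions (the schedule values follow the DDPM paper).

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule, needs T >= 2)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def forward_step(x_prev, beta_t, rng):
    """One Markov step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise."""
    keep = math.sqrt(1.0 - beta_t)
    spread = math.sqrt(beta_t)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_process(x0, betas, rng):
    """Apply all T noising steps in order and return the final x_T."""
    x = list(x0)
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x

rng = random.Random(0)
betas = linear_beta_schedule(T=1000)
x0 = [1.0] * 500  # toy "image": 500 pixels, all equal to 1.0
xT = forward_process(x0, betas, rng)

# Sample statistics of x_T across the pixels.
mean_T = sum(xT) / len(xT)
var_T = sum((v - mean_T) ** 2 for v in xT) / len(xT)
```

Each step depends only on the previous one, and after enough steps the sample mean and variance of xT should be close to 0 and 1, whatever x0 was.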

As mentioned above, this is a Markov chain process. Eventually, the data tends toward an isotropic Gaussian distribution.

2.2 Inverse diffusion process

The reverse process is a denoising process. If we knew the reverse conditional $q(x_{t-1} \mid x_t)$, we could recover x0 starting from a sample of the standard Gaussian distribution. It has been proved that if $q(x_t \mid x_{t-1})$ is Gaussian and the per-step noise variance $\beta_t$ is small enough, then $q(x_{t-1} \mid x_t)$ is still Gaussian. However, $q(x_{t-1} \mid x_t)$ cannot be inferred directly, so we use a deep learning model with parameters $\theta$ to predict it.
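For reference, in the notation of the DDPM paper the learned reverse step is usually written as a Gaussian whose mean and covariance are produced by the network. The symbols $\mu_\theta$ and $\Sigma_\theta$ below follow that paper; they are an assumption about what the article's original equation showed.

```latex
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```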

If x0 is known, the reverse step can be obtained through Bayes' formula.
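In the DDPM paper this posterior has a closed form. The sketch below uses that paper's notation, which is an assumption here because the article's own equation did not survive: $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$.

```latex
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\right)

\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1 - \bar\alpha_t}\, x_0
                      + \frac{\sqrt{\alpha_t}\,(1 - \bar\alpha_{t-1})}{1 - \bar\alpha_t}\, x_t

\tilde\beta_t = \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}\, \beta_t
```

Both the mean and the variance depend only on the noise schedule, x0 and the current noisy sample, which is what makes this posterior usable as a training target.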

2.3 Training process

Readers familiar with machine learning will know that training a model means optimizing its parameters, here so that the model yields reliable means and variances. We do this by maximizing the log-likelihood of the model's predicted distribution.

After a series of derivations, the DDPM model arrives at its final loss function expression.
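The simplified objective reported in the DDPM paper is a mean squared error between the noise that was added and the noise the network predicts. The notation is that paper's and is assumed here: $\epsilon_\theta$ is the noise-prediction network, $\epsilon \sim \mathcal{N}(0, I)$, and $\bar\alpha_t$ is the cumulative product of $1 - \beta_s$ for $s \le t$.

```latex
L_{\text{simple}}(\theta) = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\,\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\ t\right) \right\|^2\,\right]
```

The argument of $\epsilon_\theta$ is the noisy sample $x_t$, written directly in terms of x0, so no step-by-step simulation of the forward chain is needed during training.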

To summarize the training process:

Take an input x0 and randomly sample a time step t
Compute the loss and iteratively minimize the loss function
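The training steps above can be sketched as a toy loop in plain Python. Everything named here is hypothetical: `predict_noise` is a two-parameter linear stand-in for the real denoising network (far too weak to be a useful denoiser), the noise-sampling step and the linear schedule follow the DDPM paper rather than this article, and the gradients are written out by hand only to keep the sketch free of dependencies.

```python
import math
import random

T = 1000
# Linear noise schedule (values assumed, following the DDPM paper).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

# alpha_bar[t] is the running product of (1 - beta) up to and including index t.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

def predict_noise(params, x_t, t):
    """Hypothetical stand-in for the network: one linear map shared by all pixels.
    It ignores t, which a real denoising network would not."""
    w, b = params
    return [w * v + b for v in x_t]

def training_step(params, x0, rng, lr=0.05):
    """One SGD step on the noise-prediction loss; returns (new_params, loss)."""
    t = rng.randrange(T)                     # randomly sample a time step
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # sample the noise to be predicted
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal * v + noise * e for v, e in zip(x0, eps)]  # noised input
    pred = predict_noise(params, x_t, t)
    n = len(x0)
    loss = sum((e - p) ** 2 for e, p in zip(eps, pred)) / n
    # Analytic gradients of the mean squared error for the linear stand-in.
    grad_w = sum(-2.0 * (e - p) * v for e, p, v in zip(eps, pred, x_t)) / n
    grad_b = sum(-2.0 * (e - p) for e, p in zip(eps, pred)) / n
    w, b = params
    return (w - lr * grad_w, b - lr * grad_b), loss

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # toy 64-pixel "image"
params = (0.0, 0.0)
losses = []
for _ in range(2000):
    params, loss = training_step(params, x0, rng)
    losses.append(loss)

early_loss = sum(losses[:5]) / 5        # average loss over the first steps
late_loss = sum(losses[-200:]) / 200    # average loss near the end
```

In a real setup the linear stand-in would be replaced by a neural network (DDPM uses a U-Net) and the hand-written gradients by an automatic-differentiation framework; the structure of the loop stays the same.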

Figure 6

Part 03

● Summary ●

The diffusion model has shown great potential. Compared with the VAE model, it does not need to align the posterior distribution, nor does it need to train an additional discriminator as a GAN does. It has applications in computer vision, bioinformatics, speech processing, and other fields. Its application to image generation will help improve the efficiency of image creation: AI may generate several pictures based on given conditions, and humans can then filter and modify the results. This may become a new working mode in the field of 2D painting, and it could greatly improve the production efficiency of 2D digital assets.

However, the development of AI technology always brings some disputes, and the field of image generation is no exception. Besides problems with the technology itself, such as generated images with wrong or implausible structure, there are also legal disputes, such as the copyright status of AI-generated works. The technical problems can be solved as the technology itself develops. We have reason to believe that image generation will eventually reach a very high level, which will eliminate most low-end painting-related jobs and greatly liberate human productivity. Copyright issues, however, may still require government departments to pay sufficient attention to the development of the related industries and to improve the relevant policies and systems. This requires us to think more carefully about emerging fields so that AI technology can better serve us.

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
