
UC Berkeley and Google's new model could end the diffusion era: IGN generates realistic images in a single step, with an American sitcom as its inspiration

WBOY
2023-11-14 08:30

Will the wildly popular diffusion model soon be obsolete?

Current generative AI models, such as GANs, diffusion models, and consistency models, generate images by mapping inputs to outputs that follow the target data distribution.

Normally, such a model must be trained on large numbers of real images before it can produce outputs with realistic features.

Recently, researchers from UC Berkeley and Google proposed a new generative model: the Idempotent Generative Network (IGN).


Paper address: https://arxiv.org/abs/2311.01462

IGN can take a variety of inputs, such as random noise or simple graphics, and generate realistic images in a single step, with no multi-step iteration required.

The model is designed to be a "global projector" that can map any input data onto the target data distribution.

In short, it aims to be a general-purpose image generation model.

Interestingly, a classic scene from the sitcom "Seinfeld" was actually the authors' source of inspiration.


This scene neatly captures the concept of an "idempotent operator": applying the same operation to an input repeatedly always yields the same result,

that is, f(f(z)) = f(z).

As Jerry Seinfeld humorously pointed out, some real-life actions can also be considered idempotent.
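As a toy illustration (my own example, not from the paper), many everyday functions are idempotent: applying them a second time changes nothing.

```python
def normalize_whitespace(s: str) -> str:
    """Collapse runs of whitespace: an idempotent operation."""
    return " ".join(s.split())

once = normalize_whitespace("hello   world ")
twice = normalize_whitespace(normalize_whitespace("hello   world "))
assert once == twice == "hello world"  # f(f(x)) == f(x)
```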

Idempotent Generative Networks

IGN differs from GANs and diffusion models in two important ways:

- Unlike a GAN, IGN does not need separate generator and discriminator networks. It is a "self-adversarial" model that performs generation and discrimination simultaneously.

- Unlike diffusion models, which refine the output over many incremental steps, IGN attempts to map the input to the data distribution in a single step.

So how does IGN (the idempotent generative network) actually work?

It is trained to map input samples from a source distribution P_z to a target distribution P_x. Given a dataset of examples {x_k}, each drawn from P_x, the researchers train the model f to map P_z to P_x.

The distributions P_z and P_x are assumed to lie in the same space, i.e., their instances have the same dimensions. This allows f to be applied to both kinds of instances, z and x.

The basic idea behind IGN is this: real examples (x) are invariant under the model f, i.e., f(x) = x, while other inputs (z) are mapped by f onto the manifold of instances that f maps to themselves, which is achieved by optimizing f.
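Concretely, this translates into three training objectives (notation paraphrased from the paper; the relative weighting of the terms is a hyperparameter):

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{rec}}   &= \lVert f(x) - x \rVert^2        && \text{real examples are fixed points} \\
\mathcal{L}_{\mathrm{idem}}  &= \lVert f(f(z)) - f(z) \rVert^2  && \text{generated outputs become fixed points} \\
\mathcal{L}_{\mathrm{tight}} &= -\lVert f(f(z)) - f(z) \rVert^2 && \text{keep the set of fixed points tight}
\end{aligned}
```

The idempotence and tightness terms share the same expression with opposite signs; during training, each term's gradient is routed through a different application of f, so they do not simply cancel.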


The paper presents the IGN training routine as PyTorch code.

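A runnable sketch of that routine, reconstructed from the paper's description (the optimizer, model architecture, and the 0.1 tightness weight are illustrative assumptions, not the paper's exact settings):

```python
import copy
import torch
import torch.nn as nn

def train_ign(f, opt, data_loader, n_epochs):
    """Idempotent Generative Network training loop (sketch).

    f maps images to images; real images should be fixed points of f,
    and f's outputs should themselves be fixed points (idempotence).
    """
    f_copy = copy.deepcopy(f)  # frozen copy used to split gradient paths
    for p in f_copy.parameters():
        p.requires_grad_(False)

    for _ in range(n_epochs):
        for x in data_loader:
            f_copy.load_state_dict(f.state_dict())
            z = torch.randn_like(x)      # sample from the source distribution

            fx = f(x)                    # f applied to real data
            fz = f(z)                    # f applied to noise
            f_fz = f_copy(fz)            # outer f frozen: grads flow to inner f
            ff_z = f(fz.detach())        # inner f detached: grads flow to outer f

            loss_rec = (fx - x).pow(2).mean()      # real images are fixed points
            loss_idem = (f_fz - fz).pow(2).mean()  # push f(z) toward a fixed point
            loss_tight = -(ff_z - fz.detach()).pow(2).mean()  # keep fixed-point set tight

            loss = loss_rec + loss_idem + 0.1 * loss_tight
            opt.zero_grad()
            loss.backward()
            opt.step()
    return f
```

The deep-copy/detach trick is what lets the idempotence and tightness losses, which are the same expression with opposite signs, train different applications of f instead of cancelling out.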

Experimental results

So, how well does IGN actually work?

The authors admit that, at this stage, IGN's generated results cannot yet compete with state-of-the-art models.

In the experiments, they used smaller models and lower-resolution datasets, focusing their exploration on a simplified approach.

Of course, foundational generative techniques such as GANs and diffusion models also took a long time to reach mature, large-scale performance.

Experimental settings

The researchers evaluated IGN on MNIST (a grayscale handwritten-digit dataset) and CelebA (a face-image dataset), at image resolutions of 28×28 and 64×64 respectively.

The authors use a simple autoencoder architecture, where the encoder is the simple five-layer discriminator backbone from DCGAN and the decoder is the DCGAN generator. Training and network hyperparameters are listed in Table 1.


Generated results

Figure 4 shows qualitative results on the two datasets after applying the model once and twice consecutively.

As shown, applying IGN once (f(z)) produces coherent generations. However, artifacts can occur, such as holes in MNIST digits, or distorted pixels at the top of the head and in the hair in face images.

Applying f again (f(f(z))) corrects many of these problems, filling holes and reducing noise around patchy facial regions.


Figure 7 shows additional results, as well as the result of applying f three times.


Comparing f(f(z)) with f(z) shows that when an image is close to the learned manifold, applying f again produces minimal change, since the image is then considered in-distribution.

Latent Space Manipulation

By performing latent-space arithmetic, the authors demonstrate that IGN has a consistent latent space, similar to what has been shown for GANs; Figure 6 shows examples of such latent-space arithmetic.


Out-of-distribution mapping

The authors also verified IGN's potential as a "global mapping" by feeding images from various distributions into the model to generate their "natural image" equivalents.

They demonstrate this in Figure 5 by denoising a noisy image x, colorizing a grayscale image, and translating a sketch into a realistic image.

Relative to the original image x, these inverse tasks are ill-posed; nevertheless, IGN creates natural mappings that conform to the structure of the original image.

As shown, applying f successively can further improve image quality, for example by removing dark and smoky artifacts from projected sketches.
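At inference time, this repeated projection is simply chained application of the model; a minimal sketch (here `f` stands for a trained IGN, which is an assumption of this example):

```python
import torch

@torch.no_grad()
def project(f, x, n_steps=3):
    """Map an (out-of-distribution) image x toward the data manifold
    by applying the idempotent model f repeatedly."""
    for _ in range(n_steps):
        x = f(x)
    return x
```

For a perfectly idempotent f, extra applications are no-ops; in practice, a few steps help clean up residual artifacts.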


What's next?

As the results above show, IGN is efficient at inference: after training, it generates results in a single step.

It also produces more consistent outputs, which may extend to more applications, such as medical image restoration.

The author of the paper stated:

We view this work as a first step toward models that learn to map arbitrary inputs to a target distribution, a new paradigm in generative modeling.

Next, the research team plans to scale IGN up with more data, hoping to realize the full potential of this new kind of generative AI model.

The code for this research will be released on GitHub in the future.



