


ICLR 2024 Spotlight | NoiseDiffusion: Correct diffusion model noise and improve interpolation image quality
Author | Pengfei Zheng
Affiliation | USTC, HKBU TMLR Group
In recent years, the rapid development of generative AI has driven high-profile fields such as text-to-image and video generation. At the core of these techniques is the diffusion model. A diffusion model defines a forward process that gradually adds noise until an image becomes Gaussian noise, and a reverse process that gradually denoises Gaussian noise back into a clear image, yielding realistic samples. Interpolating between images with the diffusion ordinary differential equation (ODE) has great application potential in areas such as video generation and advertising creatives. However, we noticed that when this approach is applied to natural images, the interpolated results are often unsatisfactory.
In general, a diffusion model samples Gaussian noise and gradually denoises it to generate a high-quality image. A low-quality interpolated image therefore indicates that its latent variable no longer follows the Gaussian distribution we expect. To improve the quality of interpolated images, we need the latent variables to look more like samples from a Gaussian distribution. However, directly scaling and shifting the latent variables severely damages the resulting image, and to preserve the information of the original images we cannot modify the latent variables too much. The difficulty is thus to improve the quality of interpolated images while changing the latent variables as little as possible.
We first vary the noise level of the latent variables to analyze which latent variables a diffusion model can restore into high-quality images. Building on SDEdit, we then introduce Gaussian noise to improve the quality of interpolated images, although the added noise also brings in additional information. Furthermore, we analyze the near-orthogonality of latent variables in high-dimensional spaces, which provides the basis for our approach. Finally, we combine spherical linear interpolation with direct noise injection to propose a new interpolation method: constrain the extreme values of the latent variables, add a small amount of Gaussian noise to bring them closer to the expected distribution, and introduce the original images to alleviate information loss. With this interpolation method, we significantly improve interpolation results on natural images while retaining the information of the original images.
Next, I will briefly share our research results with you.
Paper title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
Paper link: https://www.php.cn/link/68310dc294a1c38c7ba636380151daca
Code link: https://www.php.cn/link/fc9e5c39356354a60d33ca59499913ca
Introduction
Figure 1: Application of spherical linear interpolation method on face images
The most commonly used image interpolation method for diffusion models is spherical linear interpolation [1,2].
We apply this method to natural images. As Figure 2 shows, the interpolation quality on natural images drops significantly compared with generated images.
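For reference, spherical linear interpolation between two latent variables can be sketched as follows. This is a minimal NumPy illustration of the standard formula, not the authors' code; the function name `slerp`, the latent shape and the fallback for nearly parallel inputs are assumptions made for the example.

```python
import numpy as np

def slerp(z0, z1, lam):
    """Spherical linear interpolation between two latents of the same shape."""
    a, b = z0.ravel(), z1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.isclose(np.sin(theta), 0.0):
        # Nearly parallel (or antiparallel) latents: fall back to linear interpolation.
        return (1.0 - lam) * z0 + lam * z1
    w0 = np.sin((1.0 - lam) * theta) / np.sin(theta)
    w1 = np.sin(lam * theta) / np.sin(theta)
    return w0 * z0 + w1 * z1

rng = np.random.default_rng(0)
z0 = rng.standard_normal((3, 64, 64))  # stand-in for the latent of image A
z1 = rng.standard_normal((3, 64, 64))  # stand-in for the latent of image B
mid = slerp(z0, z1, 0.5)
print(np.linalg.norm(mid) / np.sqrt(mid.size))  # stays close to 1 for Gaussian latents
```

Unlike a plain linear blend, the sine weights keep the interpolated latent at roughly the same norm as its endpoints, which is why this scheme works well when both endpoints are genuine Gaussian samples.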
Figure 2: Comparison of interpolation effects between natural pictures and generated pictures
Analysis
Figure 3: Effect of Gaussian noise denoising with different noise levels
We first study the effect of the noise level on generated images. A high-quality image is obtained only when the level of the Gaussian noise matches the denoising level (middle image). If the noise level is lower than the denoising level (right image) or higher than it (left image), the quality of the generated image degrades. We use Theorem 1 to explain this phenomenon.
Theorem 1 characterizes how standard Gaussian noise is distributed in high-dimensional space: the samples concentrate near a hypersphere. Inside this hypersphere the probability density is relatively high, but the region occupies so little volume that its overall contribution is small. Outside the hypersphere the volume is larger, but the probability density decays rapidly with distance, so the contribution of those points is also negligible. As a result, the latent variables seen when training a diffusion model lie mainly near the hypersphere, and latent variables inside or outside it are difficult to denoise effectively.
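The concentration effect described above can be checked numerically. The sketch below is only an illustration under the assumption of a standard Gaussian N(0, I_n); the helper name `norm_ratio_stats` and the sample sizes are made up for the example. It measures how tightly the norm of a sample clusters around sqrt(n) as the dimension n grows.

```python
import numpy as np

def norm_ratio_stats(n, samples=100, seed=0):
    """Mean and std of ||z|| / sqrt(n) for z drawn from N(0, I_n)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, n))
    ratio = np.linalg.norm(z, axis=1) / np.sqrt(n)
    return float(ratio.mean()), float(ratio.std())

for n in (10, 1_000, 10_000):
    mean, std = norm_ratio_stats(n)
    print(f"n={n:>6}  mean ratio={mean:.3f}  std={std:.4f}")
```

The spread of the ratio shrinks as n increases, so in image-sized latent spaces practically every Gaussian sample sits in a thin shell around the hypersphere of radius sqrt(n).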
Figure 4: Reasons why natural picture interpolation fails
Natural images often contain complex features that the diffusion model has not seen during training, which makes it difficult for the model to convert them into standard Gaussian noise. Specifically, the latent variables of these images may contain Gaussian noise whose level is above or below what the model can denoise. The diffusion model, however, is mainly able to restore Gaussian noise on the hypersphere described in Theorem 1 and often cannot handle noise outside this range. Image interpolation therefore often produces low-quality results.
Introducing noise
Figure 5: Directly introducing noise interpolation
To improve image quality and bring the latent variables closer to the hypersphere, we adopt an approach based on SDEdit [3]. Specifically, we directly add standard Gaussian noise to the images, then interpolate, and finally denoise. Figure 5 shows that this significantly improves the quality of interpolated images. However, as the figure also shows, this approach introduces some additional information.
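The first two steps (add noise, then interpolate) can be sketched as below. This is a rough illustration, not the authors' implementation: it assumes the usual DDPM-style forward noising with a cumulative coefficient `alpha_bar_t`, uses random arrays as stand-ins for images scaled to [-1, 1], and the value 0.05 is hypothetical. The final denoising step needs a trained diffusion model and is left out.

```python
import numpy as np

def add_noise(x0, alpha_bar_t, rng):
    """DDPM-style forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
img_a = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for image A
img_b = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for image B
alpha_bar_t = 0.05                                 # hypothetical: heavily noised
x_a = add_noise(img_a, alpha_bar_t, rng)
x_b = add_noise(img_b, alpha_bar_t, rng)

# Spherical linear interpolation of the noised latents at position lam.
lam = 0.5
cos_theta = np.sum(x_a * x_b) / (np.linalg.norm(x_a) * np.linalg.norm(x_b))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
x_mid = (np.sin((1 - lam) * theta) * x_a + np.sin(lam * theta) * x_b) / np.sin(theta)

# x_mid would then be handed to the model's reverse (denoising) process.
print(np.linalg.norm(x_a) / np.sqrt(x_a.size), np.linalg.norm(x_mid) / np.sqrt(x_mid.size))
```

Because the noise here is freshly sampled rather than obtained by inverting the image, the noised latents are Gaussian by construction, at the price of carrying information that was never in the original images.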
Method
Figure 6: Overall design of NoiseDiffusion
To improve image quality while losing as little information as possible, we combine spherical linear interpolation with the interpolation method that directly introduces noise and propose a new method, NoiseDiffusion. As shown in Figure 6, the overall design of NoiseDiffusion both preserves information during interpolation and improves image quality by introducing noise, striking a balance between the two. Next, we elaborate on the design ideas behind NoiseDiffusion.
Design 1:
Figure 7: Constraining the extreme values of the latent variables
Statistically, noise components beyond a certain range can be regarded as outliers. Combined with Figure 3, we found that Gaussian noise above the denoising level produces obvious noise points that closely resemble the abnormal color patches in the interpolation results of natural images. We therefore have reason to believe that the extreme values of the latent variables are responsible for these abnormal color patches. Based on this analysis, we constrain the extreme values of the latent variables to limit the effect of the abnormal noise. As Figure 7 shows, constraining the extreme values greatly improves image quality.
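One simple way to impose such a constraint is elementwise clipping, sketched below. The function name `constrain_extremes` and the threshold of 2.0 are assumptions for illustration; the text above does not state which range is used.

```python
import numpy as np

def constrain_extremes(z, boundary=2.0):
    """Clip every component of the latent to [-boundary, boundary]."""
    return np.clip(z, -boundary, boundary)

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 64, 64))
z[0, :4, :4] = 6.0                       # inject an artificial patch of outliers
z_c = constrain_extremes(z, boundary=2.0)
changed = np.mean(z != z_c)              # share of components that were clipped
print(f"fraction of components changed: {changed:.3f}")
```

Clipping leaves components inside the range untouched, so only the tails are modified. Some legitimately large Gaussian components are clipped as well, which is the kind of collateral information loss that Design 2 addresses.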
Design 2:
Figure 8: Introducing original image information
When constraining the latent variables, we may inadvertently affect some normal components and lose information. To compensate for this potential information loss, we introduce the original image information as a supplement. As shown in Figure 8, the quality of the interpolated image improves significantly once the original image information is introduced, which shows that it plays an important role in compensating for information loss. By combining the constraint on the latent variables with the supplementary original image information, we reduce information loss while ensuring image quality and achieve a more accurate and natural interpolation.
Design 3:
Spherical linear interpolation is an interpolation method that relies on calculating the angle between two latent variables. However, in practical applications, we observe that these latent variables often exhibit a nearly orthogonal state. In order to explain this phenomenon, we introduce Theorem 2 as theoretical support.
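The near-orthogonality of high-dimensional latent variables is easy to observe for independent Gaussian vectors. The sketch below is an illustration only; it assumes independent N(0, I_n) samples rather than latents of real images, and the helper name `cosine` is made up for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two arrays, treated as flat vectors."""
    u, v = u.ravel(), v.ravel()
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    c = cosine(u, v)
    print(f"n={n:>6}  cos={c:+.4f}  angle={np.degrees(np.arccos(c)):.2f} deg")
```

As n grows the cosine shrinks toward zero, so the angle used by spherical linear interpolation is close to 90 degrees for typical latent pairs.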
Figure 9: Introducing Gaussian noise of different sizes
Figure 10: Combined with Design 1 to reduce the amount of introduced Gaussian noise
Figure 9 shows that as we gradually increase the amount of Gaussian noise introduced, the quality of the interpolated images improves significantly. This improvement comes at a cost: as the amount of noise increases, so does the amount of additional information introduced. In the actual interpolation process, to introduce as little additional information as possible while meeting quality requirements, we combine the strategies described above to reduce the amount of Gaussian noise needed (Figure 10), thereby better retaining the information of the original images.
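The trade-off between interpolation weights and injected noise can be illustrated with a variance-preserving blend. This is a hedged sketch and not the paper's exact update rule: the names `mix_weights` and `mix_with_noise`, the cosine/sine parameterization and the noise weight `gamma` are assumptions. It relies only on the fact that for independent, nearly orthogonal Gaussian latents the variances add, so the result keeps unit variance when a^2 + b^2 + gamma^2 = 1.

```python
import numpy as np

def mix_weights(lam, gamma):
    """Weights (a, b) for the two latents such that a^2 + b^2 + gamma^2 = 1."""
    scale = np.sqrt(1.0 - gamma ** 2)
    return scale * np.cos(lam * np.pi / 2), scale * np.sin(lam * np.pi / 2)

def mix_with_noise(z0, z1, lam, gamma, rng):
    """Blend two latents and add fresh Gaussian noise with weight gamma."""
    a, b = mix_weights(lam, gamma)
    eps = rng.standard_normal(z0.shape)
    return a * z0 + b * z1 + gamma * eps

rng = np.random.default_rng(0)
z0 = rng.standard_normal((3, 64, 64))
z1 = rng.standard_normal((3, 64, 64))
for gamma in (0.0, 0.2, 0.5):            # hypothetical noise weights
    out = mix_with_noise(z0, z1, lam=0.5, gamma=gamma, rng=rng)
    print(f"gamma={gamma:.1f}  std of blended latent={out.std():.3f}")
```

A larger `gamma` leaves less weight for the two source latents, which matches the observation above that more injected noise means more additional information and less of the original images.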
Experiment
Figure 11: Comparison with spherical linear interpolation method
We compare the proposed method with spherical linear interpolation (Figure 11). The interpolation results show that our method significantly improves the quality of interpolated images while losing almost no information, demonstrating that it maintains information integrity while improving image quality.
We also conducted experiments on Stable Diffusion [4]. Because the latent space of Stable Diffusion is highly unstructured, it is difficult to obtain smooth interpolation (Figure 12). We therefore interpolate at a smaller time step, which retains more features of the original images and makes the interpolation smoother but reduces image quality (Figure 13). To solve this problem, we apply NoiseDiffusion to correct the latent variables (Figure 14). The experimental results show that our method significantly improves image quality while changing little information.
Figure 12: Using spherical linear interpolation when
Figure 13: Using spherical linear interpolation when
Figure 14: Using NoiseDiffusion interpolation when
References
[1] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.
[2] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.
[3] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
[4] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, and Ming-Hsuan Yang. GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
Introduction to the research group
The Trustworthy Machine Learning and Reasoning Research Group (TMLR Group) of Hong Kong Baptist University consists of several young professors, postdoctoral researchers, doctoral students, visiting doctoral students and research assistants, and is affiliated with the Department of Computer Science, School of Science. The group specializes in trustworthy representation learning, trustworthy learning based on causal reasoning, trustworthy foundation models and related algorithms, theory and system design, as well as applications in the natural sciences. Specific research directions and results can be found on the group's GitHub (https://github.com/tmlr-group). The group is funded by government and industrial research funds, such as the Hong Kong Research Grants Council Outstanding Young Scholars Program, general and youth projects of the National Natural Science Foundation of China, and research funds from Microsoft, NVIDIA, Baidu, Alibaba, Tencent and other companies. Young professors and senior researchers work side by side, and GPU computing resources are ample. The group recruits postdoctoral researchers, doctoral students, research assistants and research interns on a long-term basis. It also welcomes applications from self-funded visiting postdoctoral fellows, doctoral students and research assistants for visits of at least 3-6 months, and remote access is supported. Those interested should send a resume and a preliminary research plan to bhanml@comp.hkbu.edu.hk.