
ICLR 2024 Spotlight | NoiseDiffusion: Correcting diffusion model noise to improve the quality of interpolated images


May 06, 2024 pm 02:01 PM


Author | Pengfei Zheng

Affiliation | USTC, HKBU TMLR Group

In recent years, the rapid development of generative AI has given strong impetus to high-profile fields such as text-to-image and video generation. At the core of these techniques is the diffusion model. A diffusion model defines a forward process that gradually adds noise until an image becomes Gaussian noise, and a reverse process that gradually denoises Gaussian noise back into a clear image, yielding realistic samples. Interpolating between images through the diffusion model's ordinary differential equation (ODE) formulation has great potential for applications such as video generation and advertising creatives. However, we noticed that when this method is applied to natural images, the interpolated results are often unsatisfactory.

In general, a diffusion model samples Gaussian noise and then gradually denoises it to generate a high-quality image. The low quality of an interpolated image means that its latent variable no longer follows the Gaussian distribution the model expects. To improve the quality of interpolated images, we need the latent variables to be closer to samples from a Gaussian distribution. However, directly scaling and shifting the latent variables severely damages the resulting image, and to preserve the information of the original images we cannot modify the latent variables too much. The difficulty, therefore, is to improve the quality of interpolated images while altering the latent variables as little as possible.

We first vary the noise level of the latent variables to analyze which latent variables the diffusion model can restore into high-quality images. We then draw on the SDEdit method and introduce Gaussian noise to improve the quality of the interpolated images, although the added Gaussian noise also brings in additional information. Furthermore, we analyze the near-orthogonality of latent variables in high-dimensional spaces, which provides the basis for our approach. Finally, we combine spherical linear interpolation with direct noise injection to propose a new interpolation method: it constrains the extreme values of the latent variables, adds a small amount of Gaussian noise to bring them closer to the expected distribution, and introduces the original images to alleviate information loss. With this interpolation method, we significantly improve the interpolation results on natural images while retaining the information of the original images.

Next, I will briefly share our research results with you.


Paper title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation

Paper link: https://www.php.cn/link/68310dc294a1c38c7ba636380151daca

Code link: https://www.php.cn/link/fc9e5c39356354a60d33ca59499913ca

Introduction


Figure 1: Application of spherical linear interpolation method on face images

The most commonly used image interpolation method for diffusion models is spherical linear interpolation [1,2].
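To make the technique concrete, here is a minimal NumPy sketch of the textbook spherical linear interpolation formula, in which the two latents are weighted by sin((1 - λ)θ)/sin θ and sin(λθ)/sin θ, with θ the angle between them. The function name `slerp`, the symbol names, and the 3x64x64 latent shape are assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np

def slerp(z0, z1, lam):
    """Spherical linear interpolation between two latent arrays of equal shape."""
    a, b = z0.ravel(), z1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # angle between the latents
    if np.isclose(np.sin(theta), 0.0):
        # Degenerate angle (sin(theta) is about 0): fall back to linear interpolation.
        return (1.0 - lam) * z0 + lam * z1
    w0 = np.sin((1.0 - lam) * theta) / np.sin(theta)
    w1 = np.sin(lam * theta) / np.sin(theta)
    return w0 * z0 + w1 * z1

# Hypothetical latents standing in for two encoded images.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((3, 64, 64))
z1 = rng.standard_normal((3, 64, 64))
z_mid = slerp(z0, z1, 0.5)
```

For latents of similar norm, the interpolated latent keeps roughly the same norm as its endpoints, which is why this formula is preferred over plain linear interpolation for Gaussian latents.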


We then apply this method to natural images. As Figure 2 shows, the interpolation quality drops significantly when spherical linear interpolation is applied to natural images.


Figure 2: Comparison of interpolation effects between natural pictures and generated pictures

Analysis


Figure 3: Effect of Gaussian noise denoising with different noise levels

We first study the impact of the noise level on the generated images. A high-quality image is obtained only when the level of the Gaussian noise matches the denoising level (middle image). If the noise level is lower than the denoising level (right image) or higher than it (left image), the quality of the generated image degrades. Theorem 1 explains this phenomenon.


Theorem 1 describes how standard Gaussian noise is distributed in high-dimensional space: it is concentrated mainly on a hypersphere. Inside this hypersphere the probability density is relatively high, but the region occupies so little volume that its overall contribution is insignificant. Outside the hypersphere the volume is larger, but the probability density decays rapidly with distance, so the contribution of those points is also negligible. Consequently, the latent variables a diffusion model sees during training are concentrated on the hypersphere, and the model struggles to denoise latent variables that lie inside or outside it.
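The concentration effect can be checked numerically. The sketch below relies on the standard fact that an n-dimensional standard Gaussian vector has norm close to sqrt(n); the exact statement of Theorem 1 is not reproduced here, and the 3x64x64 latent size and sample count are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3 * 64 * 64                          # dimensionality of a hypothetical 3x64x64 latent
samples = rng.standard_normal((200, n))  # 200 independent standard Gaussian latents
norms = np.linalg.norm(samples, axis=1)

radius = np.sqrt(n)                      # radius of the shell where the mass concentrates
relative_spread = norms.std() / radius   # how thin the shell is relative to its radius
print(f"sqrt(n) = {radius:.1f}, mean norm = {norms.mean():.1f}, "
      f"relative spread = {relative_spread:.4f}")

# Rescaled noise leaves the shell: its norm changes by the same factor.
too_weak = np.linalg.norm(0.8 * samples[0]) / radius
too_strong = np.linalg.norm(1.2 * samples[0]) / radius
```

The last two lines mirror the mismatched cases in Figure 3: noise scaled down or up no longer lies on the shell the model was trained on.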


Figure 4: Reasons why natural picture interpolation fails

Natural images often have complex features that the diffusion model has not seen during training, so the model has difficulty converting natural images into standard Gaussian noise. Specifically, the latent variables of these images may contain Gaussian noise above or below the level the model can denoise. The diffusion model, however, is mainly able to restore Gaussian noise on the hypersphere described in Theorem 1 and often cannot handle noise outside this range effectively. As a result, image interpolation often produces low-quality images.

Introducing noise


Figure 5: Directly introducing noise interpolation

To improve image quality and bring the latent variables closer to the hypersphere, we adopt an approach based on SDEdit [3]. Specifically, we add standard Gaussian noise directly to the images, then interpolate, and finally denoise. Figure 5 clearly shows that this method significantly improves the quality of the interpolated images. However, as the figure also shows, this approach introduces some additional information.
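The first two steps of this pipeline can be sketched as follows, assuming the usual DDPM-style forward process x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The helper names, the value `alpha_bar_t = 0.3`, and the random arrays standing in for images are all hypothetical; the final denoising step needs a trained diffusion model and is left out.

```python
import numpy as np

def add_noise(x0, alpha_bar_t, rng):
    """Forward noising: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def slerp(z0, z1, lam):
    """Spherical linear interpolation (assumes the latents are not parallel)."""
    cos_theta = np.dot(z0.ravel(), z1.ravel()) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return (np.sin((1.0 - lam) * theta) * z0 + np.sin(lam * theta) * z1) / np.sin(theta)

rng = np.random.default_rng(0)
img_a = rng.uniform(-1.0, 1.0, (3, 64, 64))   # placeholder for the first image
img_b = rng.uniform(-1.0, 1.0, (3, 64, 64))   # placeholder for the second image

alpha_bar_t = 0.3                              # hypothetical noise schedule value at step t
x_t_a = add_noise(img_a, alpha_bar_t, rng)     # step 1: add Gaussian noise to each image
x_t_b = add_noise(img_b, alpha_bar_t, rng)
z = slerp(x_t_a, x_t_b, 0.5)                   # step 2: interpolate the noised latents
# Step 3 (omitted): a trained diffusion model would denoise z starting from step t.
```

Because the noise is drawn fresh rather than obtained by inverting the images, part of the result is determined by the random draw, which is one way to read the "additional information" mentioned above.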

Method


Figure 6: Overall design of NoiseDiffusion

To improve image quality while minimizing information loss, we combine spherical linear interpolation with the interpolation method that directly introduces noise and propose a new method, NoiseDiffusion. As shown in Figure 6, the overall design of NoiseDiffusion both preserves information during interpolation and improves image quality by introducing noise, striking a balance between the two. Next, we elaborate on the design ideas behind NoiseDiffusion.

Design 1:


Figure 7: Constraining the extreme values of latent variables

Statistically, noise components beyond a certain range can be regarded as outliers. Referring back to Figure 3, we found that Gaussian noise above the denoising level produces obvious noise artifacts that closely resemble the abnormal color patches in the interpolation results of natural images. We therefore have reason to believe that the extreme values of the latent variables are responsible for these abnormal color patches. Based on this analysis, we constrain the extreme values of the latent variables to limit the impact of this abnormal noise. As Figure 7 shows, constraining the extreme values of the latent variables greatly improves image quality.
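One simple way to constrain extreme values is element-wise clipping, sketched below. The article does not state the threshold or the exact operation used, so the function name `clip_extremes`, the bound of 2.0, and the injected outliers are assumptions for illustration only.

```python
import numpy as np

def clip_extremes(z, bound=2.0):
    """Clamp every latent component to [-bound, bound]; `bound` is a hypothetical threshold."""
    return np.clip(z, -bound, bound)

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 64, 64))
z[0, 0, :5] = [6.0, -7.5, 5.2, -4.8, 9.1]   # inject a few artificial outliers

z_clipped = clip_extremes(z, bound=2.0)
changed = np.mean(z != z_clipped)            # fraction of components that were clamped
print(f"fraction of components clamped: {changed:.3f}")
```

Components already inside the bound are left untouched, so only the tails of the distribution are altered.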

Design 2:


Figure 8: Introducing original image information

When constraining the latent variables, we may inadvertently affect some normal components and thereby lose information. To compensate for this potential information loss, we introduce the original image information as a supplement. As shown in Figure 8, the quality of the interpolated images improves significantly after the original image information is introduced, which shows that it plays an important role in compensating for information loss. By combining the constraint on latent variables with the supplementary original image information, we reduce information loss while maintaining image quality and obtain more accurate and natural interpolation results.

Design 3:

Spherical linear interpolation is an interpolation method that relies on calculating the angle between two latent variables. However, in practical applications, we observe that these latent variables often exhibit a nearly orthogonal state. In order to explain this phenomenon, we introduce Theorem 2 as theoretical support.
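The near-orthogonality is easy to observe numerically for independent Gaussian vectors. The sketch below measures the angle between two random latents and compares the resulting slerp weights with the right-angle case, where they reduce to cos(λπ/2) and sin(λπ/2). The dimensionality and variable names are assumptions for illustration, and the statement of Theorem 2 itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3 * 64 * 64                      # dimensionality of a hypothetical 3x64x64 latent
z0 = rng.standard_normal(n)
z1 = rng.standard_normal(n)

cos_theta = z0 @ z1 / (np.linalg.norm(z0) * np.linalg.norm(z1))
theta = np.arccos(cos_theta)         # close to pi / 2 in high dimensions
print(f"cos(theta) = {cos_theta:.4f}, theta / (pi/2) = {theta / (np.pi / 2):.4f}")

lam = 0.3
w0 = np.sin((1.0 - lam) * theta) / np.sin(theta)   # slerp weight on z0
w1 = np.sin(lam * theta) / np.sin(theta)           # slerp weight on z1
w0_orth = np.cos(lam * np.pi / 2)                  # same weights if theta were exactly pi / 2
w1_orth = np.sin(lam * np.pi / 2)
```

When the angle is this close to a right angle, the slerp weights depend almost entirely on λ rather than on the particular pair of latents.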


Figure 9: Introducing Gaussian noise of different magnitudes


Figure 10: Combined with Design 1 to reduce the amount of introduced Gaussian noise

Figure 9 shows that as we gradually increase the amount of Gaussian noise introduced, the quality of the interpolated images improves significantly. This improvement comes at a cost, however: as the amount of noise increases, so does the additional information it introduces. In the actual interpolation process, to minimize the additional information while still meeting quality requirements, we combine the strategies described above to reduce the amount of Gaussian noise that needs to be introduced (Figure 10), thereby better retaining the information of the original images.
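The trade-off can be illustrated with a simple variance-preserving mixture. This is an illustrative assumption, not the authors' formula: if two latents and a fresh noise sample are independent standard Gaussians, any weights a, b, c with a² + b² + c² = 1 produce another standard Gaussian. The function name `mix_with_noise` and the parameter `gamma` (the weight on fresh noise) are hypothetical.

```python
import numpy as np

def mix_with_noise(z0, z1, lam, gamma, rng):
    """Blend two latents with fresh Gaussian noise while keeping a^2 + b^2 + gamma^2 = 1."""
    scale = np.sqrt(1.0 - gamma ** 2)          # budget left for the two original latents
    a = scale * np.cos(lam * np.pi / 2)
    b = scale * np.sin(lam * np.pi / 2)
    eps = rng.standard_normal(z0.shape)
    return a * z0 + b * z1 + gamma * eps

rng = np.random.default_rng(0)
z0 = rng.standard_normal((3, 64, 64))
z1 = rng.standard_normal((3, 64, 64))

z_small = mix_with_noise(z0, z1, 0.5, 0.1, rng)   # little fresh noise, more original content
z_large = mix_with_noise(z0, z1, 0.5, 0.8, rng)   # more fresh noise, less original content
print(f"std with gamma=0.1: {z_small.std():.3f}, std with gamma=0.8: {z_large.std():.3f}")
```

A larger `gamma` leaves less weight for the original latents, which matches the observation that more noise improves quality but carries less of the source images.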

Experiment


Figure 11: Comparison with spherical linear interpolation method

We compare the proposed method with spherical linear interpolation (Figure 11). Judging from the interpolation results, our method significantly improves the quality of the interpolated images while losing almost no information. This demonstrates the strength of our method in both preserving information and improving image quality.

We also conducted experiments on Stable Diffusion [4]. Because the latent space of Stable Diffusion is highly unstructured, it is difficult to obtain smooth interpolation (Figure 12). We therefore consider interpolating at a smaller time step, which retains more features of the original images and makes the interpolation smoother but reduces image quality (Figure 13). To solve this problem, we apply NoiseDiffusion to correct the latent variables (Figure 14). The experimental results show that our method significantly improves image quality while changing little information.


Figure 12: Using spherical linear interpolation when


Figure 13: Using spherical linear interpolation when

Figure 14: Using NoiseDiffusion interpolation when

References

[1] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[2] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.

[3] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.

[4] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, and Ming-Hsuan Yang. GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Introduction to the research group

The Trustworthy Machine Learning and Reasoning Research Group (TMLR Group) of Hong Kong Baptist University consists of young professors, postdoctoral researchers, doctoral students, visiting doctoral students, and research assistants, and is affiliated with the Department of Computer Science, School of Science. The group specializes in trustworthy representation learning, trustworthy learning based on causal reasoning, trustworthy foundation models, and the related algorithms, theory, and system design, as well as applications in the natural sciences. Specific research directions and related results can be found on the group's GitHub (https://github.com/tmlr-group). The group is funded by government and industrial research funds, such as the Hong Kong Research Grants Council Outstanding Young Scholars Program, general and youth projects of the National Natural Science Foundation of China, and research funds from Microsoft, NVIDIA, Baidu, Alibaba, Tencent, and other companies. Young professors and senior researchers work hand in hand, and GPU computing resources are ample. The group recruits postdoctoral researchers, doctoral students, research assistants, and research interns on a long-term basis. It also welcomes applications from self-funded visiting postdoctoral fellows, doctoral students, and research assistants for visits of at least 3-6 months, and remote visits are supported. Interested students are invited to send a resume and preliminary research plan to bhanml@comp.hkbu.edu.hk.


This article is reproduced from 机器之心 (Jiqizhixin).
