ICLR 2024 Spotlight | NoiseDiffusion: Correct diffusion model noise and improve interpolation image quality

Reposted by PHPz, 2024-05-06 14:01:24

Author | Pengfei Zheng

Affiliation | USTC, HKBU TMLR Group

In recent years, the rapid development of generative AI has given strong momentum to high-profile fields such as text-to-image generation and video generation. At the core of these techniques is the diffusion model. A diffusion model defines a forward process that keeps adding noise until an image gradually becomes Gaussian noise, and a reverse process that gradually denoises Gaussian noise back into a clear image, yielding realistic samples. Interpolating between generated images with the diffusion model's ordinary differential equation (ODE) formulation has great application potential in video generation and advertising creatives. However, we noticed that when this method is applied to natural images, the interpolated results are often unsatisfactory.

In general, a diffusion model samples Gaussian noise and then gradually denoises it to generate a high-quality image. A low-quality interpolated image therefore suggests that its latent variable no longer follows the Gaussian distribution the model expects. To improve the quality of interpolated images, we need to make the latent variables look more like samples from a Gaussian distribution. However, directly scaling and shifting the latent variables severely damages the resulting image, and to preserve the information of the original images we cannot modify the latent variables too much. The difficulty is therefore to improve the quality of interpolated images while altering the latent variables as little as possible.

We first vary the noise level of the latent variables to analyze which latent variables the diffusion model can restore into high-quality images. We then borrow from SDEdit and introduce Gaussian noise to improve the quality of the interpolated images, although the added Gaussian noise also brings in additional information. Furthermore, we analyze the near-orthogonality of latent variables in high-dimensional spaces, which provides the basis for our approach. Combining spherical linear interpolation with the direct introduction of noise, we propose a new interpolation method: it constrains the extreme values of the latent variables, adds a small amount of Gaussian noise to bring them closer to the expected distribution, and introduces the original images to alleviate information loss. With this interpolation method, we can significantly improve interpolation results on natural images while retaining the information of the original images.

Next, I will briefly share our research results with you.

Paper title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation

Paper link: https://www.php.cn/link/68310dc294a1c38c7ba636380151daca

Code link: https://www.php.cn/link/fc9e5c39356354a60d33ca59499913ca

Introduction

Figure 1: Application of spherical linear interpolation method on face images

The most commonly used image interpolation method for diffusion models is spherical linear interpolation [1,2], which blends two latent variables along the arc between them and weights each one according to the angle between the two.
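As a minimal sketch, the standard form of spherical linear interpolation can be written in plain Python as follows. It weights the two latent variables by sin((1 - alpha) * theta) / sin(theta) and sin(alpha * theta) / sin(theta), where theta is the angle between them. The function name slerp, the parameter name alpha, and the use of plain lists in place of tensors are assumptions made for illustration and do not reproduce the article's own notation.

```python
import math

def slerp(x0, x1, alpha):
    """Spherical linear interpolation between two latent vectors.

    x0, x1: equal-length lists of floats; alpha: position in [0, 1].
    """
    dot = sum(a * b for a, b in zip(x0, x1))
    norm0 = math.sqrt(sum(a * a for a in x0))
    norm1 = math.sqrt(sum(b * b for b in x1))
    # Angle between the two latents; clamp to guard against rounding error.
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    if math.sin(theta) < 1e-6:
        # Nearly parallel or antiparallel: fall back to linear interpolation.
        return [(1 - alpha) * a + alpha * b for a, b in zip(x0, x1)]
    w0 = math.sin((1 - alpha) * theta) / math.sin(theta)
    w1 = math.sin(alpha * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]
```

For two orthogonal unit vectors, slerp([1.0, 0.0], [0.0, 1.0], 0.5) returns a unit vector halfway along the arc, whereas ordinary linear interpolation would shrink the norm to about 0.71.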

We apply this method to natural images. As Figure 2 shows, the interpolation quality of spherical linear interpolation drops significantly on natural images.

Figure 2: Comparison of interpolation effects between natural pictures and generated pictures

Analysis

Figure 3: Effect of Gaussian noise denoising with different noise levels

We first study how the noise level affects the generated images. A high-quality image is obtained only when the level of the Gaussian noise matches the denoising level (middle image). If the noise level is lower than the denoising level (right image) or higher than it (left image), the quality of the generated image drops. We use Theorem 1 to explain this phenomenon.

Theorem 1 describes how standard Gaussian noise is distributed in high-dimensional space: it is concentrated mainly on a hypersphere. Inside this hypersphere the probability density is relatively high, but the region occupies so little volume that its overall contribution is insignificant. Outside the hypersphere the volume is larger, but the probability density decays rapidly with distance, so the contribution of outside points is also negligible. As a result, the latent variables seen when training a diffusion model are concentrated on the hypersphere, and latent variables that fall inside or outside it are difficult to denoise effectively.
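This concentration effect is easy to check numerically. The sketch below draws standard Gaussian vectors and compares their Euclidean norms with sqrt(n), the radius around which a standard Gaussian in n dimensions is known to concentrate. The dimension 4096, the sample count, and the seed are arbitrary choices for illustration and are not taken from the article.

```python
import math
import random

def gaussian_norm(dim, rng):
    """Euclidean norm of one standard Gaussian sample in `dim` dimensions."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))

rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 4096              # illustrative dimension (an assumption)
norms = [gaussian_norm(dim, rng) for _ in range(200)]

# Ratio of each norm to sqrt(dim); values near 1 mean the sample
# lies close to the hypersphere of radius sqrt(dim).
ratios = [n / math.sqrt(dim) for n in norms]
print(min(ratios), max(ratios))
```

Under this view, multiplying a latent variable by a constant other than 1 moves it off the shell, which is one way to picture the mismatched noise levels in Figure 3.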

Figure 4: Reasons why natural picture interpolation fails

Natural images often have complex features that the diffusion model has not seen during training, which makes it difficult for the model to convert natural images into standard Gaussian noise. Specifically, the latent variables of these images may contain Gaussian noise at a level above or below what the model can denoise. However, the diffusion model's ability is mainly limited to restoring Gaussian noise on the hypersphere described in Theorem 1, and it often cannot handle noise outside this range effectively. Image interpolation therefore tends to produce lower-quality results.

Introducing noise

Figure 5: Interpolation with directly introduced noise

To improve image quality and bring the latent variables closer to the hypersphere, we adopt an approach based on SDEdit [3]. Specifically, we directly add standard Gaussian noise to the images, then interpolate, and finally denoise. Figure 5 shows that this method significantly improves the quality of the interpolated images. Note, however, that it also introduces some additional information, as the figure shows.
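The three steps (add noise, interpolate, denoise) can be sketched as below. This is a sketch under stated assumptions, not the authors' implementation: it assumes the standard closed-form forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and the names add_noise, interpolate_with_noise, and alpha_bar_t are hypothetical. The interpolation rule and the denoiser are passed in as functions, because the article does not specify the noise schedule or the sampler.

```python
import math
import random

def add_noise(x0, alpha_bar_t, rng):
    """Forward-diffuse an image x0 to noise level t in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

def interpolate_with_noise(img_a, img_b, lam, alpha_bar_t,
                           interpolate, denoise, rng):
    """Add noise to both images, interpolate the noisy latents, then denoise."""
    xa = add_noise(img_a, alpha_bar_t, rng)
    xb = add_noise(img_b, alpha_bar_t, rng)
    return denoise(interpolate(xa, xb, lam))

# Stand-ins for illustration only: a linear blend and an identity "denoiser".
# A real pipeline would use spherical interpolation and a trained model.
def linear_blend(a, b, lam):
    return [(1 - lam) * p + lam * q for p, q in zip(a, b)]

def identity_denoiser(z):
    return z

rng = random.Random(0)
result = interpolate_with_noise([0.0, 2.0], [2.0, 4.0], 0.5, 0.5,
                                linear_blend, identity_denoiser, rng)
print(result)
```

A smaller alpha_bar_t means more injected noise: the latent moves closer to a pure Gaussian sample, and less of the original image survives, which matches the trade-off described above.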

Method

Figure 6: Overall design of NoiseDiffusion

To improve image quality while reducing information loss as much as possible, we combine spherical linear interpolation with the interpolation method that directly introduces noise, and propose a new method, NoiseDiffusion. As shown in Figure 6, the overall design of NoiseDiffusion both preserves information during interpolation and improves image quality by introducing noise, striking an effective balance between the two. Next, we elaborate on the design ideas behind NoiseDiffusion.

Design 1:

Figure 7: Constraining the extreme values of latent variables

Statistically, noise components beyond a certain range can be regarded as outliers. Combined with Figure 3, we found that Gaussian noise above the denoising level produces obvious noise points, which closely resemble the abnormal color patches in the interpolation results of natural images. We therefore have reason to believe that the extreme values of the latent variables are responsible for these abnormal color patches. Based on this analysis, we constrain the extreme values of the latent variables to control the impact of this abnormal noise. As Figure 7 shows, constraining the extreme values of the latent variables greatly improves image quality.
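One simple way to constrain extreme values is element-wise clipping, sketched below. The function name clip_latent and the threshold bound are hypothetical; the article does not state which range it uses, so the value 2.0 in the example is only a placeholder.

```python
def clip_latent(x, bound):
    """Clamp every component of the latent to the range [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in x]

latent = [0.3, -4.1, 1.7, 2.9, -0.2]
print(clip_latent(latent, 2.0))  # [0.3, -2.0, 1.7, 2.0, -0.2]
```

Only the two outlying components change; values already inside the range pass through untouched, which keeps the modification to the latent variable small.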

Design 2:

Figure 8: Introducing original image information

When constraining the latent variables, we may inadvertently affect some normal components, resulting in a loss of information. To compensate for this potential loss, we introduce the original image information as a supplement. As shown in Figure 8, the quality of the interpolated image improves significantly once the original image information is introduced, which shows that it plays an important role in compensating for information loss. By combining the constraint on the latent variables with the supplementary original image information, we reduce information loss while maintaining image quality, and achieve a more accurate and natural interpolation.
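A rough sketch of how original image information could be added back to an interpolated latent is shown below. It is an illustration under assumptions rather than the paper's exact formula: the weights w_a and w_b and the name add_image_information are hypothetical, and the images are scaled by sqrt(alpha_bar_t) on the assumption that this matches the signal scale of the standard forward process at noise level t.

```python
import math

def add_image_information(latent, img_a, img_b, w_a, w_b, alpha_bar_t):
    """Add a weighted copy of each original image to an interpolated latent.

    The images are scaled by sqrt(alpha_bar_t) so that they sit at the
    same signal scale as the latent at noise level t (an assumption).
    """
    scale = math.sqrt(alpha_bar_t)
    return [z + scale * (w_a * a + w_b * b)
            for z, a, b in zip(latent, img_a, img_b)]

corrected = add_image_information([0.0, 0.0], [1.0, 2.0], [3.0, 4.0],
                                  0.5, 0.5, 0.25)
print(corrected)  # [1.0, 1.5]
```

Setting both weights to zero leaves the latent unchanged, so the weights act as a dial for how much of each source image is reintroduced.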

Design 3:

Spherical linear interpolation relies on computing the angle between two latent variables. In practice, however, we observe that these latent variables are often nearly orthogonal. To explain this phenomenon, we introduce Theorem 2 as theoretical support.
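The near-orthogonality of independent high-dimensional Gaussian vectors can be checked with a short experiment. The sketch below measures the angle between two random vectors; the dimension 4096 and the seed are arbitrary illustrative choices, not values from the article.

```python
import math
import random

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

rng = random.Random(0)
dim = 4096  # illustrative dimension (an assumption)
u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
v = [rng.gauss(0.0, 1.0) for _ in range(dim)]

c = cosine(u, v)
angle = math.degrees(math.acos(c))
print(c, angle)  # cosine near 0, angle near 90 degrees
```

When the angle is close to 90 degrees, the spherical interpolation weights sin((1 - lam) * theta) / sin(theta) and sin(lam * theta) / sin(theta) reduce to approximately cos(lam * pi / 2) and sin(lam * pi / 2), so the angle no longer needs to be computed for each pair.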

Figure 9: Introducing Gaussian noise of different sizes

Figure 10: Combined with Design 1 to reduce the amount of introduced Gaussian noise

As Figure 9 shows, the quality of the interpolated images improves significantly as we gradually increase the amount of Gaussian noise introduced. This improvement comes at a cost, however: the more noise is added, the more additional information is introduced. In the actual interpolation process, to introduce as little additional information as possible while meeting quality requirements, we combine the strategies described above, which effectively reduces the amount of Gaussian noise that needs to be introduced (Figure 10) and thus better retains the information of the original images.
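The trade-off between added noise and retained information can be made concrete with a small sketch. It assumes the two latent variables are roughly orthogonal with unit-variance components, and it picks weights whose squares sum to 1 so that the blended result keeps roughly unit variance. The names blend_weights, mix_with_noise, lam, and gamma are hypothetical, and this weighting is one natural choice rather than a confirmed reproduction of the paper's formula.

```python
import math
import random

def blend_weights(lam, gamma):
    """Weights for latent A, latent B, and fresh noise; squares sum to 1."""
    keep = math.sqrt(1.0 - gamma ** 2)  # share left for the two latents
    w_a = keep * math.cos(lam * math.pi / 2)
    w_b = keep * math.sin(lam * math.pi / 2)
    return w_a, w_b, gamma

def mix_with_noise(xa, xb, lam, gamma, rng):
    """Blend two latents and add a small amount of fresh Gaussian noise."""
    w_a, w_b, w_n = blend_weights(lam, gamma)
    return [w_a * a + w_b * b + w_n * rng.gauss(0.0, 1.0)
            for a, b in zip(xa, xb)]

w_a, w_b, w_n = blend_weights(0.5, 0.3)
print(w_a ** 2 + w_b ** 2 + w_n ** 2)  # equals 1 up to rounding
```

A larger gamma pushes the result closer to a fresh Gaussian sample and further from both source latents, so keeping gamma small preserves more of the original images.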

Experiment

Figure 11: Comparison with spherical linear interpolation method

We compare the proposed method with spherical linear interpolation (Figure 11). Judging from the interpolation results, our method significantly improves the quality of the interpolated images while losing almost no information, which demonstrates its strength in both preserving information and improving image quality.

We also conducted experiments on Stable Diffusion [4]. Because the latent space of Stable Diffusion is highly unstructured, it is difficult to obtain smooth interpolation (Figure 12). We therefore interpolate at a smaller time step, which retains more features of the original images and makes the interpolation smoother, but reduces image quality (Figure 13). To solve this problem, we apply NoiseDiffusion to correct the latent variables (Figure 14). The experimental results show that our method significantly improves image quality while changing little of the original information.

Figure 12: Spherical linear interpolation at a larger time step

Figure 13: Spherical linear interpolation at a smaller time step

Figure 14: NoiseDiffusion interpolation at a smaller time step

References

[1] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[2] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.

[3] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.

[4] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, and Ming-Hsuan Yang. GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Introduction to the research group

The Trustworthy Machine Learning and Reasoning Research Group (TMLR Group) of Hong Kong Baptist University consists of a number of young professors, postdoctoral researchers, doctoral students, visiting doctoral students and research assistants, and is affiliated with the Department of Computer Science, School of Science. The group specializes in trustworthy representation learning, trustworthy learning based on causal reasoning, trustworthy foundation models and related algorithms, theory and system design, as well as applications in the natural sciences. Specific research directions and results can be found on the group's GitHub (https://github.com/tmlr-group). The group is funded by government and industrial research funds, such as the Hong Kong Research Grants Council Outstanding Young Scholars Program, general and youth projects of the National Natural Science Foundation of China, and research funds from Microsoft, NVIDIA, Baidu, Alibaba, Tencent and other companies. Young professors and senior researchers work side by side, and GPU computing resources are sufficient. The group recruits postdoctoral researchers, doctoral students, research assistants and research interns on a long-term basis. It also welcomes applications from self-funded visiting postdoctoral fellows, doctoral students and research assistants for visits of at least 3-6 months, and remote visits are supported. Interested applicants should send a resume and a preliminary research plan to bhanml@comp.hkbu.edu.hk.

Statement: This article is reproduced from jiqizhixin.com.