ICLR 2024 Spotlight | NoiseDiffusion: Correct diffusion model noise and improve interpolation image quality

Author | Pengfei Zheng

Affiliation | USTC, HKBU TMLR Group

In recent years, the rapid development of generative AI has driven high-profile fields such as text-to-image generation and video generation. At the core of these techniques is the diffusion model. A diffusion model first defines a forward process that keeps adding noise until the picture gradually becomes Gaussian noise, and then uses a reverse process that gradually denoises the Gaussian noise into a clear picture, yielding realistic samples. Interpolating between images through the diffusion model's ordinary differential equation (ODE) formulation has great application potential for generating videos and some advertising creatives. However, we noticed that when this method is applied to natural images, the interpolated results are often unsatisfactory.

In general, a diffusion model samples Gaussian noise and then gradually denoises it to generate a high-quality image. The low quality of an interpolated image therefore suggests that its latent variable no longer follows the Gaussian distribution the model expects. To improve the quality of the interpolated image, we need to bring the latent variable closer to a sample from a Gaussian distribution. Directly scaling and shifting the latent variable severely damages the resulting image, and to preserve the information of the original images we cannot modify the latent variable too much. The difficulty is thus to improve the quality of interpolated images while altering the latent variables as little as possible.

We first vary the noise level of the latent variables to analyze which latent variables the diffusion model can restore into high-quality pictures. We then combine interpolation with the SDEdit method, introducing Gaussian noise to improve the quality of the interpolated pictures, although the added Gaussian noise also brings in additional information. Furthermore, we analyze the near-orthogonality of latent variables in high-dimensional spaces, which provides the basis for our approach. Finally, we combine spherical linear interpolation with the direct introduction of noise and propose a new interpolation method: it constrains the extreme values of the latent variables, adds a small amount of Gaussian noise to bring them closer to the expected distribution, and introduces the original images to alleviate information loss. With this interpolation method, we can significantly improve the interpolation results for natural images while retaining the original image information.

Next, I will briefly share our research results with you.

Paper title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation

Paper link: https://www.php.cn/link/68310dc294a1c38c7ba636380151daca

Code link: https://www.php.cn/link/fc9e5c39356354a60d33ca59499913ca

Introduction

Figure 1: Application of spherical linear interpolation method on face images

The most commonly used image interpolation method for diffusion models is spherical linear interpolation [1,2]:

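$$
z_\lambda = \frac{\sin\big((1-\lambda)\theta\big)}{\sin\theta}\, z_0 + \frac{\sin(\lambda\theta)}{\sin\theta}\, z_1, \qquad \theta = \arccos\!\left(\frac{z_0 \cdot z_1}{\lVert z_0\rVert\,\lVert z_1\rVert}\right)
$$

Here $z_0$ and $z_1$ denote the latent variables of the two images and $\lambda \in [0, 1]$ is the interpolation coefficient (the standard form of spherical linear interpolation; the symbol names are chosen here for illustration).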
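Spherical linear interpolation can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `slerp`, the latent shape `(3, 64, 64)`, and the random latents are assumptions made for the example.

```python
import numpy as np

def slerp(z0, z1, lam):
    """Spherical linear interpolation between two latent arrays of equal shape."""
    a, b = z0.ravel(), z1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.isclose(np.sin(theta), 0.0):
        # Degenerate angle (0 or pi): fall back to plain linear interpolation
        return (1.0 - lam) * z0 + lam * z1
    w0 = np.sin((1.0 - lam) * theta) / np.sin(theta)
    w1 = np.sin(lam * theta) / np.sin(theta)
    return w0 * z0 + w1 * z1

rng = np.random.default_rng(0)
z0 = rng.standard_normal((3, 64, 64))  # stand-in latent of the first image
z1 = rng.standard_normal((3, 64, 64))  # stand-in latent of the second image
z_mid = slerp(z0, z1, 0.5)

# For Gaussian latents the interpolant stays close to the same radius sqrt(n)
radius_ratio = np.linalg.norm(z_mid) / np.sqrt(z_mid.size)
```

For latents that really are Gaussian samples, the interpolant keeps roughly the same norm as its endpoints, which is why this method works well on generated images.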

We then apply this method to natural pictures. As Figure 2 shows, the interpolation quality drops significantly when spherical linear interpolation is applied to natural pictures.

Figure 2: Comparison of interpolation effects between natural pictures and generated pictures

Analysis

Figure 3: Effect of Gaussian noise denoising with different noise levels

We first study the impact of the noise level on the generated images. A high-quality image is obtained only when the level of the Gaussian noise matches the denoising level (middle image). If the noise level is lower than the denoising level (right image) or higher than it (left image), the quality of the generated image degrades. This phenomenon can be explained by Theorem 1.

Theorem 1 describes how standard Gaussian noise is distributed in high-dimensional space: it is concentrated mainly on a hypersphere. Inside this hypersphere the probability density is relatively high, but the region occupies so little volume that its overall contribution is insignificant. Outside the hypersphere the volume is larger, but the probability density decays rapidly with distance, so the contribution from outside points is also negligible. Therefore, the latent variables seen during the training of a diffusion model are concentrated on the hypersphere, and latent variables inside or outside the hypersphere are difficult to denoise effectively.
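The concentration described here is easy to check numerically. The sketch below assumes a latent dimension of n = 3 × 64 × 64 (chosen only for illustration) and uses the known fact that the norm of a standard Gaussian vector in n dimensions concentrates around sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3 * 64 * 64  # illustrative latent dimension (assumption)

# Draw 200 standard Gaussian vectors and compare their norms with sqrt(n)
samples = rng.standard_normal((200, n))
ratio = np.linalg.norm(samples, axis=1) / np.sqrt(n)

print("mean of norm / sqrt(n):", ratio.mean())
print("min and max:", ratio.min(), ratio.max())

# Rescaling a sample moves it off the shell the model was trained on
too_noisy = 1.2 * samples[0]   # noise level above the expected one
too_clean = 0.8 * samples[0]   # noise level below the expected one
off_shell = (np.linalg.norm(too_noisy) / np.sqrt(n),
             np.linalg.norm(too_clean) / np.sqrt(n))
```

Every sample lands within a few percent of the radius sqrt(n), while a modest rescaling already places a latent far outside the range of norms the model has observed.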

Figure 4: Reasons why natural picture interpolation fails

Natural pictures often have complex features that the diffusion model has not seen during training, which makes it difficult for the model to convert natural images into standard Gaussian noise. Specifically, the latent variables of these images may contain Gaussian noise above or below the level the model is able to denoise. The ability of the diffusion model, however, is mainly limited to restoring Gaussian noise on the hypersphere described in Theorem 1, and it often cannot handle noise outside this range effectively. As a result, image interpolation on natural pictures often produces low-quality results.

Introducing noise

Figure 5: Directly introducing noise interpolation

To improve the quality of the picture and bring the latent variables closer to the hypersphere, we adopt an approach combined with SDEdit [3]. Specifically, we directly add standard Gaussian noise to the images, then perform interpolation, and finally perform denoising. Figure 5 clearly shows that this method significantly improves the quality of the interpolated images. However, as the figure also shows, this approach introduces some additional information.
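The add-noise-then-interpolate procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the variance-preserving mixing formula in `add_noise`, the value of `alpha_bar`, and the random stand-in images are all assumptions, and the final denoising step needs a trained diffusion model, so it is omitted.

```python
import numpy as np

def add_noise(x, alpha_bar, rng):
    """Mix an image with standard Gaussian noise (variance-preserving form, an assumption)."""
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps

def slerp(z0, z1, lam):
    """Spherical linear interpolation between two arrays of equal shape."""
    cos_theta = np.sum(z0 * z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return (np.sin((1.0 - lam) * theta) * z0 + np.sin(lam * theta) * z1) / np.sin(theta)

rng = np.random.default_rng(0)
# Stand-ins for two natural images scaled to [-1, 1]
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
x1 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))

alpha_bar = 0.05  # hypothetical value: a small alpha_bar means the result is mostly noise
z0 = add_noise(x0, alpha_bar, rng)
z1 = add_noise(x1, alpha_bar, rng)
z_mid = slerp(z0, z1, 0.5)

# A trained diffusion model would now denoise z_mid; that step is omitted here.
radius_ratio = np.linalg.norm(z_mid) / np.sqrt(z_mid.size)
```

Because the noisy latents are dominated by freshly sampled Gaussian noise, they sit close to the hypersphere, but the same fresh noise is the source of the additional information mentioned above.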

Method

Figure 6: Overall design of NoiseDiffusion

To improve picture quality while reducing information loss as much as possible, we combine spherical linear interpolation with the interpolation method that directly introduces noise and propose a new method, NoiseDiffusion. As shown in Figure 6, the overall design of NoiseDiffusion both retains information during interpolation and improves picture quality by introducing noise, striking an effective balance between the two. Next, we elaborate on the design ideas behind NoiseDiffusion.

Design 1:

Figure 7: Constraining the extreme values of latent variables

According to statistics, noise components beyond a certain range can be considered outliers. Combined with Figure 3, we found that Gaussian noise above the denoising level produces obvious noise points, which closely resemble the abnormal color patches in the interpolation results of natural pictures. We therefore have reason to believe that the extreme values of the latent variables are responsible for these abnormal color patches. Based on this analysis, we impose constraints on the extreme values of the latent variables to control the impact of the abnormal noise. As Figure 7 shows, constraining the extreme values of the latent variables greatly improves the quality of the image.
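One simple way to constrain extreme values is element-wise clamping. The sketch below is an assumption about how such a constraint might look: the helper name `constrain_extremes`, the threshold `bound = 2.5`, and the injected outliers are hypothetical, not values taken from the paper.

```python
import numpy as np

def constrain_extremes(z, bound):
    """Clamp every latent component into [-bound, bound]."""
    return np.clip(z, -bound, bound)

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 64, 64))
z[0, 0, :5] = 6.0  # inject a few artificial outliers for the demonstration

bound = 2.5  # hypothetical threshold
z_constrained = constrain_extremes(z, bound)

# Fraction of components that were actually changed by the clamp
changed = np.mean(z_constrained != z)
```

Only a small fraction of components is touched, so most of the latent, and with it most of the image information, is left as it was.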

Design 2:

Figure 8: Introducing original image information

When imposing constraints on the latent variables, we may inadvertently affect some normal components, which results in a loss of information. To compensate for this potential loss, we introduce the original image information as a supplement. As shown in Figure 8, the quality of the interpolated image improves significantly once the original image information is introduced, which shows that this information plays an important role in compensating for the loss. By combining the constraints on the latent variables with the supplementary original image information, we can reduce information loss while ensuring image quality, and achieve a more accurate and natural interpolation result.

Design 3:

Spherical linear interpolation relies on the angle between two latent variables. In practice, however, we observe that these latent variables are often nearly orthogonal. To explain this phenomenon, we introduce Theorem 2 as theoretical support.
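The near-orthogonality of high-dimensional Gaussian vectors can be checked numerically. The sketch below uses an illustrative dimension n = 3 × 64 × 64 (an assumption) and also confirms that, at an angle of exactly 90 degrees, the spherical interpolation weights reduce to a cosine and a sine of the interpolation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3 * 64 * 64  # illustrative latent dimension (assumption)

z0 = rng.standard_normal(n)
z1 = rng.standard_normal(n)

cos_theta = z0 @ z1 / (np.linalg.norm(z0) * np.linalg.norm(z1))
theta_deg = np.degrees(np.arccos(cos_theta))
print("angle between two random latents (degrees):", theta_deg)

# With theta = pi/2 the slerp weights become cos(lam*pi/2) and sin(lam*pi/2)
lams = np.linspace(0.0, 1.0, 11)
w0 = np.sin((1.0 - lams) * np.pi / 2.0) / np.sin(np.pi / 2.0)
w1 = np.sin(lams * np.pi / 2.0) / np.sin(np.pi / 2.0)
weights_match = np.allclose(w0, np.cos(lams * np.pi / 2.0))
squares_sum_to_one = np.allclose(w0 ** 2 + w1 ** 2, 1.0)
```

Since the squared weights sum to one, interpolating two orthogonal unit-variance latents keeps the variance at one. The same bookkeeping leaves room for a third, independent noise term as long as the squared coefficients still sum to one.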

Figure 9: Introducing Gaussian noise of different sizes

Figure 10: Combined with Design 1 to reduce the amount of introduced Gaussian noise

Figure 9 shows that as we gradually increase the amount of Gaussian noise introduced, the quality of the interpolated images improves significantly. This improvement comes at a cost, however: as the amount of noise increases, so does the amount of additional information introduced. In the actual interpolation process, to minimize the additional information while still meeting quality requirements, we combine the previously mentioned strategies to reduce the amount of Gaussian noise that needs to be introduced (Figure 10), thereby better retaining the information of the original image.
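The three design ideas can be combined in one rough sketch. Everything specific here is an assumption made for illustration and should not be read as the paper's exact formula: the function name, the choice of coefficients whose squares sum to one, the linear weighting of the original images through `image_weight`, and the default-free parameters `gamma` and `bound` are all hypothetical.

```python
import numpy as np

def noise_corrected_interpolation(z0, z1, x0, x1, lam, gamma, image_weight, bound, rng):
    """Illustrative combination of clamping, noise injection and image information."""
    # Design 1: constrain the extreme values of both latents
    z0 = np.clip(z0, -bound, bound)
    z1 = np.clip(z1, -bound, bound)

    # Design 3: treat the latents as orthogonal and reserve gamma for fresh noise,
    # so that alpha^2 + beta^2 + gamma^2 = 1
    scale = np.sqrt(1.0 - gamma ** 2)
    alpha = scale * np.cos(lam * np.pi / 2.0)
    beta = scale * np.sin(lam * np.pi / 2.0)
    eps = rng.standard_normal(z0.shape)
    z = alpha * z0 + beta * z1 + gamma * eps

    # Design 2: add back original image information (hypothetical weighting)
    z = z + image_weight * ((1.0 - lam) * x0 + lam * x1)

    # Keep the final latent inside the same bounds
    return np.clip(z, -bound, bound)

rng = np.random.default_rng(0)
shape = (3, 64, 64)
z0, z1 = rng.standard_normal(shape), rng.standard_normal(shape)   # stand-in latents
x0, x1 = rng.uniform(-1, 1, shape), rng.uniform(-1, 1, shape)     # stand-in images

z_mid = noise_corrected_interpolation(
    z0, z1, x0, x1, lam=0.5, gamma=0.1, image_weight=0.05, bound=2.5, rng=rng
)
```

A smaller `gamma` introduces less additional information, which matches the trade-off described above: clamping and the image term do part of the correction so that less fresh noise is needed.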

Experiment

Figure 11: Comparison with spherical linear interpolation method

We compare the results of the proposed method with those of spherical linear interpolation (Figure 11). Judging from the interpolation results, our method significantly improves the quality of the interpolated images while losing almost no information, which demonstrates its strength in both maintaining information integrity and improving image quality.

We also conducted experiments on Stable Diffusion [4]. Because the latent space of Stable Diffusion is highly unstructured, it is difficult to obtain smooth interpolation (Figure 12). We therefore consider interpolation () at a smaller time step, which retains more features of the original image and makes the interpolation result smoother, but reduces image quality (Figure 13). To solve this problem, we apply our NoiseDiffusion method to correct the latent variables (Figure 14). The experimental results show that our method significantly improves image quality while changing little of the original information.

Figure 12: Using spherical linear interpolation when

Figure 13: Using spherical linear interpolation when

Figure 14: Using NoiseDiffusion interpolation when

References

[1] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[2] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.

[3] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.

[4] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, and Ming-Hsuan Yang. GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Introduction to the research group

The Trustworthy Machine Learning and Reasoning Research Group (TMLR Group) of Hong Kong Baptist University consists of a number of young professors, postdoctoral researchers, doctoral students, visiting doctoral students and research assistants, and is affiliated with the Department of Computer Science, School of Science. The group specializes in trustworthy representation learning, trustworthy learning based on causal reasoning, trustworthy foundation models and related algorithms, theory and system design, as well as applications in the natural sciences. Specific research directions and related results can be found on the group's GitHub (https://github.com/tmlr-group). The group is funded by government and industrial research funds, such as the Hong Kong Research Grants Council Outstanding Young Scholars Program, National Natural Science Foundation of China general projects and youth projects, and research funds from Microsoft, NVIDIA, Baidu, Alibaba, Tencent and other companies. Young professors and senior researchers work hand in hand, and GPU computing resources are sufficient. The group recruits postdoctoral researchers, doctoral students, research assistants and research interns on a long-term basis. It also welcomes applications from self-funded visiting postdoctoral fellows, doctoral students and research assistants for visits of at least 3-6 months, and remote visits are supported. Interested students should send a resume and a preliminary research plan to bhanml@comp.hkbu.edu.hk.

Statement
This article is reproduced from 机器之心.