Soft Diffusion: Google's new framework correctly schedules, learns and samples from general diffusion processes

Score-based models and denoising diffusion probabilistic models (DDPMs) are two powerful classes of generative models that generate samples by reversing a diffusion process. Yang Song et al. unified the two into a single framework in the paper "Score-Based Generative Modeling through Stochastic Differential Equations", and both are now widely known as diffusion models.

Diffusion models have achieved great success in a range of applications, including image, audio, and video generation and solving inverse problems. In the paper "Elucidating the Design Space of Diffusion-Based Generative Models", Tero Karras et al. analyzed the design space of diffusion models and identified three stages: i) choosing the noise-level schedule, ii) choosing the network parameterization (each parameterization yields a different loss function), and iii) designing the sampling algorithm.

Recently, in the arXiv paper "Soft Diffusion: Score Matching for General Corruptions", researchers from Google Research and UT Austin argue that diffusion models have one more important design choice: the corruption process. Typically, corruption means adding noise of increasing magnitude and, for DDPMs, also rescaling the image. Although there have been attempts to use different distributions for diffusion, a general framework has been lacking. The researchers therefore propose a framework for designing diffusion models with more general corruption processes.

Specifically, they propose a new training objective called Soft Score Matching and a novel sampling method, the Momentum Sampler. Their theoretical results show that, for corruption processes satisfying certain regularity conditions, Soft Score Matching learns the score (i.e., the gradient of the log-likelihood), provided that the diffusion can transform any image into any other image with non-zero likelihood.

In their experiments, the researchers trained models on CelebA and CIFAR-10. The CelebA model achieved an FID score of 1.85, the state of the art among linear diffusion models. Their models are also significantly faster than models trained with vanilla Gaussian denoising diffusion.


Paper address: https://arxiv.org/pdf/2209.05442.pdf

Method Overview

Generally speaking, diffusion models generate images by reversing a corruption process that gradually adds noise. The researchers show how to learn to reverse diffusions that combine a linear, deterministic degradation with stochastic additive noise.


Specifically, the researchers present a framework for training diffusion models with a more general corruption model. It has three parts: a new training objective, Soft Score Matching; a novel sampling method, the Momentum Sampler; and a principled way to schedule the corruption.

Let's first look at the training objective, Soft Score Matching. The name is inspired by soft filtering, a photography term for a filter that removes fine details. The objective provably learns the score of general linear corruption processes. It incorporates the filtering process into the network and trains the model to predict a clean image that, after corruption, matches the diffused observation.
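To make the objective concrete, here is a minimal NumPy sketch of a Soft-Score-Matching-style loss. It assumes corruptions of the form x_t = C_t x_0 + s_t·η with η ~ N(0, I), uses a 1-D Gaussian blur matrix as a stand-in for C_t, and treats `model` as any callable mapping x_t to an estimate of x_0 (a real network would also be conditioned on t). The names `blur_matrix`, `corrupt`, and `soft_score_matching_loss` are hypothetical; the point is only that the prediction and the clean image are compared after both pass through the same degradation C_t, not in pixel space.

```python
import numpy as np


def blur_matrix(n: int, sigma: float) -> np.ndarray:
    """Hypothetical stand-in for C_t: a row-normalized 1-D Gaussian blur."""
    if sigma == 0:
        return np.eye(n)  # level 0: no degradation
    idx = np.arange(n)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return kernel / kernel.sum(axis=1, keepdims=True)  # each row sums to 1


def corrupt(x0: np.ndarray, C: np.ndarray, s: float, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t = C_t x_0 + s_t * eta, with eta ~ N(0, I)."""
    return C @ x0 + s * rng.standard_normal(x0.shape)


def soft_score_matching_loss(model, x0, C, s, rng) -> float:
    """Squared error measured after applying C_t to both the prediction and the target."""
    x_t = corrupt(x0, C, s, rng)
    x0_hat = model(x_t)              # the model's estimate of the clean image
    residual = C @ x0_hat - C @ x0   # compare in the corrupted (filtered) domain
    return float(np.mean(residual ** 2))


rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)
C = blur_matrix(16, sigma=2.0)
# An untrained "model" that simply returns its input gets a positive loss.
print(soft_score_matching_loss(lambda x_t: x_t, x0, C, s=0.1, rng=rng))
```

A predictor that returns the true x_0 gets exactly zero loss; errors in directions that C_t strongly attenuates (fine detail under heavy blur) are penalized only weakly, which is the "soft" part of the objective.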

The researchers prove that this objective learns the score as long as the diffusion assigns non-zero probability to every (clean, corrupted) image pair. This condition is always satisfied when the corruption includes additive noise.
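The role of additive noise in this condition can be seen directly from the transition density. A minimal sketch, assuming corruptions of the form x_t = C_t x_0 + s_t·η with Gaussian η: for any s_t > 0, log p(x_t | x_0) is finite for every pair of images, even when C_t is singular (here a hypothetical mask `C_mask` that zeroes half the pixels). With s_t = 0 the corruption is deterministic, only x_t = C_t x_0 is reachable, and there is no density at all. The function name is illustrative.

```python
import numpy as np


def log_transition_density(x_t: np.ndarray, x0: np.ndarray, C: np.ndarray, s: float) -> float:
    """log N(x_t; C x_0, s^2 I): the density of x_t = C x_0 + s * eta with eta ~ N(0, I)."""
    if s <= 0:
        # No additive noise: p(x_t | x_0) is a point mass at C x_0, not a density.
        raise ValueError("s must be positive for p(x_t | x_0) to have a density")
    r = x_t - C @ x0
    n = r.size
    return float(-0.5 * (r @ r) / s**2 - 0.5 * n * np.log(2 * np.pi * s**2))


# A singular degradation: mask out the second half of a 6-pixel "image".
C_mask = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)
x0, x_t = rng.standard_normal(6), rng.standard_normal(6)  # an arbitrary (clean, corrupted) pair
print(np.isfinite(log_transition_density(x_t, x0, C_mask, s=0.05)))  # True
```

The masked pixels carry no information about x_0, yet every corrupted image remains reachable from every clean image because the noise term covers the whole space.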

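Specifically, the researchers study corruption processes of the form

$$x_t = C_t x_0 + s_t \eta_t, \qquad \eta_t \sim \mathcal{N}(0, I),$$

where $C_t$ is a linear, deterministic degradation (such as blurring) and $s_t$ is the standard deviation of the additive Gaussian noise at diffusion level $t$.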

Along the way, the researchers found that adding noise is very important, with both empirical benefits (better results) and theoretical ones (it is needed to learn the score). This is also a key difference from Cold Diffusion, a concurrent work that reverses deterministic corruptions.

The second component is the sampling method, the Momentum Sampler. The researchers show that the choice of sampler has a significant impact on the quality of the generated samples. They propose the Momentum Sampler for reversing general linear corruption processes. The sampler uses convex combinations of corruptions at different diffusion levels and is inspired by momentum methods in optimization.

This sampling method is inspired by the continuous formulation of diffusion models proposed in the paper by Yang Song et al. mentioned above. The full Momentum Sampler algorithm is given in the paper.

The paper's comparison figure shows how strongly the sampling method affects sample quality: images produced by the Naive Sampler appear repetitive and lack detail, whereas the Momentum Sampler significantly improves both sample quality and the FID score.
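The comparison is between two sampling rules applied to the same trained model. As a point of reference, here is a minimal NumPy sketch of a naive sampler, assuming it works in the simplest way: predict a clean image with the network, then re-corrupt that estimate to the next lower level with x_{t-Δt} = C_{t-Δt} x̂_0 + s_{t-Δt}·η. This is not the paper's Momentum Sampler; the blur schedule `C`, noise schedule `s`, and the `oracle` model are hypothetical placeholders. Each step costs one network call, so the number of function evaluations (NFE) equals the number of steps.

```python
import numpy as np


def blur_matrix(n: int, sigma: float) -> np.ndarray:
    """Hypothetical degradation C_t: row-normalized 1-D Gaussian blur (identity at sigma = 0)."""
    if sigma == 0:
        return np.eye(n)
    idx = np.arange(n)
    k = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return k / k.sum(axis=1, keepdims=True)


def naive_sampler(model, C, s, levels, x_start, rng):
    """Baseline sampler: predict x_0, then re-corrupt it to the next (lower) level.

    levels: decreasing diffusion levels, ending at 0 (no corruption).
    C(t), s(t): hypothetical schedules for the degradation and the noise std.
    Returns the final sample and the number of network calls (NFE).
    """
    x = x_start
    for t_cur, t_next in zip(levels[:-1], levels[1:]):
        x0_hat = model(x, t_cur)  # one function evaluation per step
        x = C(t_next) @ x0_hat + s(t_next) * rng.standard_normal(x.shape)
    return x, len(levels) - 1


n = 16
C = lambda t: blur_matrix(n, sigma=3.0 * t)  # more blur at higher levels
s = lambda t: 0.05 * t                       # small noise, vanishing at t = 0
levels = np.linspace(1.0, 0.0, 11)
rng = np.random.default_rng(0)
target = rng.standard_normal(n)
oracle = lambda x, t: target                 # stand-in for a trained network
sample, nfe = naive_sampler(oracle, C, s, levels, x_start=rng.standard_normal(n), rng=rng)
print(nfe, np.allclose(sample, target))  # 10 True
```

Any improved sampler keeps the same loop structure and NFE budget but changes how the next iterate is formed from x_t and x̂_0, which is where the naive rule and the Momentum Sampler differ.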

The final component is scheduling. Even when the type of degradation is fixed (e.g., blurring), deciding how much to corrupt at each diffusion step is not trivial. The researchers propose a principled tool to guide the design of corruption schedules: they find the schedule by minimizing the Wasserstein distance between successive distributions along the diffusion path. Intuitively, the goal is a smooth transition from the fully corrupted distribution to the clean one.
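One way to make the "smooth transition" idea concrete is to space corruption levels so that successive distributions are equally far apart in Wasserstein distance. The sketch below is a hypothetical illustration under simplifying assumptions (1-D samples, the closed-form 2-Wasserstein distance between equal-size empirical distributions, and a toy corruption that shrinks the data); it is not the paper's exact procedure. It measures W2 between neighbouring levels on a fine candidate grid, then picks levels at equal cumulative distance.

```python
import numpy as np


def w2_empirical_1d(a: np.ndarray, b: np.ndarray) -> float:
    """2-Wasserstein distance between two 1-D empirical distributions of equal size.

    In 1-D the optimal coupling matches sorted samples, so W2 has a closed form.
    """
    return float(np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2)))


def equal_w2_schedule(samples_at_level, candidate_levels, n_steps: int):
    """Choose n_steps + 1 levels whose consecutive W2 gaps are roughly equal."""
    gaps = [
        w2_empirical_1d(samples_at_level(a), samples_at_level(b))
        for a, b in zip(candidate_levels[:-1], candidate_levels[1:])
    ]
    cum = np.concatenate([[0.0], np.cumsum(gaps)])    # W2 "arc length" along the path
    targets = np.linspace(0.0, cum[-1], n_steps + 1)  # equally spaced in W2
    idx = np.clip(np.searchsorted(cum, targets), 0, len(candidate_levels) - 1)
    return [float(candidate_levels[i]) for i in idx]


# Toy corruption (hypothetical): level t shrinks the data by a factor exp(-t).
rng = np.random.default_rng(0)
clean = rng.standard_normal(2000)
samples_at_level = lambda t: clean * np.exp(-t)
schedule = equal_w2_schedule(samples_at_level, np.linspace(0.0, 3.0, 301), n_steps=3)
print(schedule)
```

Because this toy corruption changes the distribution fastest at low levels, the resulting schedule is denser near t = 0 than a uniform grid would be.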

Experimental Results

The researchers evaluated the proposed method on CelebA-64 and CIFAR-10, both standard benchmarks for image generation. The main goal of the experiments is to understand the role of the corruption type.

The researchers first used blur combined with low-magnitude noise as the corruption. Their model achieves state-of-the-art results on CelebA, with an FID score of 1.85, outperforming all other methods that only add noise and possibly rescale the image. On CIFAR-10 it obtains an FID score of 4.64, which is competitive although not state of the art.


In addition, on both CIFAR-10 and CelebA the method also performs better on another metric, sampling time, which gives it a significant computational advantage. Deblurring (with almost no noise) appears to be a more efficient operation for image generation than denoising.

The paper plots how the FID score changes with the number of function evaluations (NFE). The results show that, on both CIFAR-10 and CelebA, the researchers' model matches or exceeds the quality of a standard Gaussian denoising diffusion model while using significantly fewer steps.


This article is reproduced from 51CTO.COM.