


Last year, image generation models exploded in popularity, and the mass art carnival that followed brought copyright disputes with it.
Deep learning models such as DALL-E 2, Imagen and Stable Diffusion are all trained on hundreds of millions of images, so there is no way to strip the influence of the training set from their outputs. But are some generated images derived wholesale from the training set? And if a generated image is very similar to an original, who owns the copyright?
Recently, researchers from Google, DeepMind, ETH Zurich and several other well-known universities and companies jointly published a paper showing that diffusion models can indeed memorize samples from the training set and reproduce them at generation time.
Paper link: https://arxiv.org/abs/2301.13188
In this work, the researchers show how a diffusion model can memorize an individual image from its training data and emit it nearly verbatim at generation time.
The article proposes a generate-and-filter pipeline that extracts more than a thousand training examples from state-of-the-art models, covering photos of individual people, trademarks, company logos, and more. The authors also trained hundreds of diffusion models in different settings to analyze how various modeling and data decisions affect privacy.
Overall, the experimental results show that diffusion models provide much weaker privacy protection for their training sets than earlier generative models such as GANs.
Memorized, but not much
The denoising diffusion model is a recently emerged class of generative neural network. It produces images from the training distribution through an iterative denoising process, yields better samples than the previously dominant GAN and VAE models, and is easier to scale and to control, so it has quickly become the mainstream approach for generating all kinds of high-resolution images.
Especially after OpenAI released DALL-E 2, the diffusion model quickly became popular in the entire field of AI generation.
The appeal of generative diffusion models stems from their ability to synthesize new images that are ostensibly different from anything in the training set. In fact, past large-scale training efforts "did not uncover overfitting problems", and researchers in privacy-sensitive domains have even proposed that diffusion models can "protect the privacy of real images" by releasing synthetic ones instead.
However, all of these works rely on one assumption: that the diffusion model does not memorize and regenerate its training data. If it does, the claimed privacy guarantees are violated, and many concerns about model generalization and digital forgery arise.
But is that really the case?
To determine whether a generated image comes from the training set, you first need to define what "memorization" means.
Previous related work focused mainly on text language models: if a model can recover a sequence verbatim from the training set, that sequence is said to be "extracted" and "memorized". Because this work deals with high-resolution images, a word-for-word definition of memorization is not suitable.
Instead, the researchers define memorization based on an image similarity measure.
If the distance between a generated image x and some sample in the training set is below a given threshold, that sample is considered to have been extracted from the training set, and the phenomenon is called eidetic memorization.
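To make this definition concrete, the following is a minimal sketch of such a distance-based memorization check. The tile-wise normalized L2 metric, the number of tiles, and the threshold value are illustrative assumptions for exposition, not the exact metric or calibration used in the paper.

```python
import numpy as np

def tile_l2_distance(a: np.ndarray, b: np.ndarray, tiles: int = 4) -> float:
    """Normalized L2 distance, taken as the max over image tiles.

    a and b are float arrays in [0, 1] with shape (H, W, C). Taking the max
    over tiles penalizes pairs that only agree on a flat background; the
    paper's exact metric may differ (illustrative assumption).
    """
    h, w = a.shape[0] // tiles, a.shape[1] // tiles
    dists = []
    for i in range(tiles):
        for j in range(tiles):
            pa = a[i * h:(i + 1) * h, j * w:(j + 1) * w]
            pb = b[i * h:(i + 1) * h, j * w:(j + 1) * w]
            dists.append(float(np.sqrt(np.mean((pa - pb) ** 2))))
    return max(dists)

def is_eidetic_memorization(generated: np.ndarray, train_images, threshold: float = 0.1) -> bool:
    """A generation counts as (eidetic) memorization if it lies within
    `threshold` of at least one training image under the distance above."""
    return any(tile_l2_distance(generated, x) <= threshold for x in train_images)
```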
The article then designs a two-stage data-extraction attack:
1. Generate a large number of images
The first step is simple but computationally expensive: query the model in a black-box manner, generating images with the selected prompts as input.
The researchers generated 500 candidate images for each text prompt to increase the odds of surfacing memorized content.
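As a rough illustration of this step, the sketch below queries a public text-to-image diffusion model in a black-box fashion via the Hugging Face diffusers library; the specific model checkpoint, batch size, and seeding scheme are assumptions made for the example, not details from the paper.

```python
import torch
from diffusers import StableDiffusionPipeline

# Black-box access to a pre-trained text-to-image diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

def generate_candidates(prompt: str, n: int = 500, batch: int = 10):
    """Generate n candidate images for one prompt, each batch from a fresh seed."""
    images = []
    for start in range(0, n, batch):
        g = torch.Generator("cuda").manual_seed(start)
        out = pipe(prompt, num_images_per_prompt=batch, generator=g)
        images.extend(out.images)  # list of PIL images
    return images
```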
2. Carry out membership inference
In the second step, generated images suspected of being memorized from the training set are flagged.
The membership-inference strategy the researchers designed rests on the following idea: if generations of the same prompt from different random seeds turn out to be very similar to one another under the distance metric, they were likely produced from memory.
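A minimal sketch of this membership-inference heuristic is shown below: it takes the candidates generated for one prompt, computes all pairwise distances with a caller-supplied image metric (for example the tile-wise L2 distance sketched earlier), and reports how tightly the generations cluster. The greedy clique approximation and the `eps` threshold are simplifying assumptions, not the paper's exact procedure.

```python
from itertools import combinations
import numpy as np

def clique_score(candidates, dist, eps: float = 0.1):
    """Score one prompt's candidate images for suspected memorization.

    If many generations from different seeds land within `eps` of each other,
    the prompt likely reproduces a memorized training image. Returns the size
    of the largest group of mutually close images (greedy approximation) and
    the mean pairwise distance, both usable for ranking prompts.
    """
    n = len(candidates)
    close = np.zeros((n, n), dtype=bool)
    pair_d = []
    for i, j in combinations(range(n), 2):
        d = dist(candidates[i], candidates[j])
        pair_d.append(d)
        close[i, j] = close[j, i] = d <= eps
    largest = int(close.sum(axis=1).max()) + 1 if n > 1 else n
    mean_d = float(np.mean(pair_d)) if pair_d else float("inf")
    return largest, mean_d
```

Prompts whose candidates form large, tight groups (low mean pairwise distance) are flagged first and only then compared against the training set, which matches the sorting step described below.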
Extraction results
To evaluate the effectiveness of the attack, the researchers selected the 350,000 most-duplicated examples from the training data set and generated 500 candidate images for each prompt (175 million generated images in total).
They first sorted all of these generations by the mean pairwise distance between images in each clique, to identify those likely produced by memorizing the training data.
These generations were then compared with the training images, and each was labeled "extracted" or "not extracted". In the end, 94 images were found to be suspected extractions from the training set.
Through visual analysis, the top 1,000 generations were additionally hand-labeled as "memorized" or "not memorized", and 13 images were found to have been generated by copying training samples.
The precision-recall curve shows that the attack is very accurate: out of 175 million generated images, it can identify 50 memorized images at a false-positive rate of zero, and all memory-based generations can be extracted with precision above 50%.
To better understand how and why memory occurs, the researchers also trained hundreds of smaller diffusion models on CIFAR10 to analyze the privacy impact of model accuracy, hyperparameters, augmentation, and deduplication.
Unlike diffusion models, GANs are not explicitly trained to reconstruct their training set.
GANs consist of two competing neural networks: a generator and a discriminator. The generator also takes random noise as input, but unlike a diffusion model it must convert this noise into a valid image in a single forward pass.
During GAN training, the discriminator must predict whether an image comes from the generator, while the generator improves itself to fool the discriminator.
The key difference, then, is that a GAN's generator is trained only on indirect information about the training data (namely the gradients flowing from the discriminator) and never receives training images as input.
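The contrast can be made explicit with a highly simplified training-step sketch: the GAN generator's loss depends on real data only through the discriminator, whereas the diffusion objective directly asks the network to reconstruct (denoise) actual training images. Function names, the noise schedule, and the loss forms are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def gan_generator_step(generator, discriminator, opt_g, z):
    """The generator never sees real training images; its only training
    signal is the gradient flowing back through the discriminator's score."""
    fake = generator(z)
    loss_g = -torch.log(discriminator(fake) + 1e-8).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item()

def diffusion_denoising_step(model, opt, x_real, alphas_cumprod):
    """A diffusion model is trained to denoise corrupted copies of real
    training images, which is exactly the behavior that can turn into
    memorization when a sample is seen often enough."""
    t = torch.randint(0, alphas_cumprod.numel(), (x_real.size(0),))
    noise = torch.randn_like(x_real)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_noisy = a.sqrt() * x_real + (1.0 - a).sqrt() * noise
    loss = F.mse_loss(model(x_noisy, t), noise)  # predict the added noise
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```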
The researchers extracted 1 million unconditionally generated images from various pre-trained generative models and then ranked the models by FID (lower is better), with the GAN models toward the top and the diffusion models toward the bottom.
The results show that diffusion models memorize more than GAN models, and that better generative models (lower FID) tend to memorize more data. In other words, diffusion models are the least private class of image generators, leaking more than twice as much training data as GANs.
These results also suggest that existing privacy-enhancing techniques do not provide an acceptable privacy-utility trade-off: to improve generation quality, the model ends up memorizing more of the training set.
Overall, this paper highlights the tension between increasingly powerful generative models and data privacy, and raises questions about how diffusion models work and how they can be deployed responsibly.
Copyright Issue
Technically speaking, this ability to reconstruct is a strength of the diffusion model; from a copyright perspective, it is a liability.
Artists have raised copyright objections of various kinds because the images generated by diffusion models are excessively similar to their training data.
For example, some prohibit AI from training on their works or cover published pieces with watermarks; the team behind Stable Diffusion has also announced plans to train only on datasets containing authorized content going forward and to provide an opt-out mechanism for artists.
The NLP field faces the same problem. Some netizens have pointed out that the millions of words of text they have published since 1993 are being used to train AI models, including ChatGPT, and argue that building generative models on such appropriated content is unethical.
Plenty of plagiarized articles already circulate; for ordinary readers, plagiarism may look like a harmless shortcut, but for creators, the plagiarized content represents their hard work.
Will the diffusion model still have advantages in the future?


