Worse than GANs! Google, DeepMind and others publish paper: Diffusion models "copy" directly from the training set
Image generation models took off last year, and after the initial wave of AI-art enthusiasm, copyright disputes followed.
Deep learning models such as DALL-E 2, Imagen, and Stable Diffusion are all trained on hundreds of millions of examples, and there is no way to escape the influence of the training set. But are some generated images derived entirely from the training set? And if a generated image is very similar to an original, who owns the copyright?
Recently, researchers from Google, DeepMind, ETH Zurich, and several other well-known universities and companies jointly published a paper showing that diffusion models can indeed memorize samples from the training set and reproduce them during generation.
Paper link: https://arxiv.org/abs/2301.13188
In this work, the researchers show how a diffusion model can memorize individual images from its training data and emit them again at generation time.
The paper proposes a generate-and-filter pipeline that extracts more than a thousand training examples from state-of-the-art models, covering photos of individual people, trademarks, company logos, and more. The authors also trained hundreds of diffusion models in different settings to analyze how various modeling and data decisions affect privacy.
Overall, the experimental results show that diffusion models offer far weaker privacy protection for their training sets than earlier generative models such as GANs.
Denoising diffusion models are a recently emerged class of generative neural networks that produce images from the training distribution through an iterative denoising process. They generate better samples than the previously dominant GANs and VAEs, and they are easier to scale and to steer during generation, so they have quickly become the mainstream approach for producing all kinds of high-resolution images.
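To make the iterative-denoising idea concrete, here is a minimal, self-contained sketch of DDPM-style ancestral sampling (illustrative only, not code from the paper); `predict_noise` is a placeholder standing in for a trained noise-prediction network:

```python
# A rough sketch of DDPM-style sampling: start from pure Gaussian noise and
# refine the image over T denoising steps. `predict_noise` is a stand-in for
# a trained U-Net noise predictor and simply returns zeros here.
import numpy as np

T = 1000                                # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    """Placeholder for the learned noise-prediction network."""
    return np.zeros_like(x)

def sample(shape=(32, 32, 3), rng=np.random.default_rng(0)):
    x = rng.standard_normal(shape)      # x_T: pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # mean of the reverse step p(x_{t-1} | x_t) under the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

image = sample()   # one synthetic image drawn from the (here untrained) model
```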
Especially after OpenAI released DALL-E 2, diffusion models quickly became popular across the entire field of generative AI.
The appeal of generative diffusion models stems from their ability to synthesize images that are ostensibly different from anything in the training set. In fact, past large-scale training efforts had "not found overfitting to be a problem," and researchers in privacy-sensitive domains have even proposed that diffusion models could "protect the privacy" of real images by releasing synthetic ones instead.
However, all of these works rely on one assumption: that the diffusion model does not memorize and regenerate its training data. Otherwise it would violate the privacy guarantees and raise many concerns about model generalization and digital forgery.
But is that really the case?
To determine whether a generated image comes from the training set, one first has to define what "memorization" means.
Previous related work focused mainly on text language models: if a model can recover a sequence verbatim from the training set, that sequence is said to be "extracted" or "memorized." Because this work deals with high-resolution images, however, verbatim-match definitions of memorization do not apply.
Instead, the researchers define memorization through an image-similarity measure.

If the distance between a generated image and a sample in the training set is below a given threshold, the generated image is considered to have been extracted from the training set, and the training sample is said to be eidetically memorized.
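In notation close to the paper's, the extraction part of the definition can be sketched roughly as follows, where Gen is the generative model, r a random seed, ℓ a chosen image distance, and δ a threshold (the exact choices are details of the paper's setup):

```latex
% Rough sketch of the definition, not verbatim from the paper.
x \text{ is } (\ell,\delta)\text{-extracted}
\;\iff\;
\exists\, r:\ \hat{x} = \mathrm{Gen}(r) \ \text{and}\ \ell(x,\hat{x}) \le \delta .
```

Eidetic memorization then adds, roughly, the requirement that x occur only a small number of times (at most k) in the training set, so that reproducing it cannot be explained away as generating a very common image.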
The paper then designs a two-stage data-extraction attack:
1. Generate a large number of images
The first step is simple but computationally expensive: generate images in a black-box manner, using the selected prompts as input.
The researchers generated 500 candidate images for each text prompt to increase the odds of uncovering memorized content.
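The paper does not ship this pipeline as code, but step 1 might look roughly like the sketch below, where `generate_image` is a hypothetical black-box wrapper around the deployed diffusion model (all names here are illustrative assumptions):

```python
# Illustrative sketch of step 1: black-box candidate generation.
# `generate_image(prompt, seed)` is a hypothetical wrapper around the deployed
# model's API and returns one image (e.g. a numpy array) per prompt and seed.
from typing import Callable, Dict, List
import numpy as np

NUM_CANDIDATES = 500  # candidate images per prompt, matching the paper's setup

def generate_candidates(prompts: List[str],
                        generate_image: Callable[[str, int], np.ndarray]
                        ) -> Dict[str, List[np.ndarray]]:
    candidates = {}
    for prompt in prompts:
        # Different seeds give independent samples from the model for the same prompt.
        candidates[prompt] = [generate_image(prompt, seed)
                              for seed in range(NUM_CANDIDATES)]
    return candidates
```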
2. Perform membership inference
In this step, generated images that appear to be reproduced from memorized training data are flagged.
The membership-inference strategy the researchers designed is based on the following intuition: if, for different random seeds, the images a diffusion model generates for the same prompt are highly similar to one another under the distance metric, those images are likely to have been generated from memorized training data.
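A minimal sketch of this scoring idea follows; a plain pixel-space Euclidean distance is used purely for illustration and stands in for the paper's actual distance metric:

```python
# Illustrative membership-inference scoring: if many generations for the same
# prompt are near-duplicates of one another, the prompt is flagged as likely
# to trigger memorization.
from itertools import combinations
from typing import List
import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain pixel-space Euclidean distance (a simplification)."""
    return float(np.linalg.norm(a.astype(np.float64) - b.astype(np.float64)))

def memorization_score(images: List[np.ndarray], threshold: float) -> float:
    """Fraction of generation pairs closer than `threshold`; higher = more suspicious."""
    pairs = list(combinations(images, 2))
    close = sum(1 for a, b in pairs if l2_distance(a, b) < threshold)
    return close / len(pairs)
```

Prompts whose generations form a tight clique under this kind of score are ranked first and then checked against the training images.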
To evaluate the attack's effectiveness, the researchers selected the 350,000 most-duplicated examples from the training dataset and generated 500 candidate images for each prompt (175 million generated images in total).
All of these generated images were first sorted by the mean pairwise distance within their clique, to identify those likely produced by memorizing training data.
These generations were then compared against the training images, and each was labeled either "extracted" or "not extracted." In the end, 94 images suspected of being extracted from the training set were found.
Through visual analysis, the top 1,000 images were additionally hand-labeled as "memorized" or "not memorized," and 13 of these images were found to be near-copies of training samples.
Judging from the precision-recall curve, the attack is highly precise: out of 175 million generated images, it can identify 50 memorized images with zero false positives, and all memorized images can be extracted with precision above 50%.
To better understand how and why memorization happens, the researchers also trained hundreds of smaller diffusion models on CIFAR-10 to analyze how model accuracy, hyperparameters, data augmentation, and deduplication affect privacy.
## Diffusion vs. GAN

Unlike diffusion models, GANs are not explicitly trained to memorize and reconstruct their training set.
A GAN consists of two competing neural networks: a generator and a discriminator. The generator also takes random noise as input, but unlike a diffusion model it must convert that noise into a valid image in a single forward pass.
During GAN training, the discriminator must predict whether an image came from the generator, while the generator improves itself to fool the discriminator.
The key difference, then, is that a GAN's generator is trained only with indirect information about the training data (namely, gradients flowing back from the discriminator) and never receives training images as input.
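As an illustration of this point (not code from the paper), a compact PyTorch training step might look like the sketch below: only the discriminator ever sees real training images, while the generator receives nothing but noise and learns from gradients flowing back through the discriminator.

```python
# Minimal GAN training step (illustrative): the generator never takes training
# images as input; its only learning signal is the gradient from the discriminator.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images: torch.Tensor):
    batch = real_images.size(0)
    z = torch.randn(batch, latent_dim)

    # Discriminator update: the only place real training data enters the model.
    fake = G(z).detach()
    d_loss = bce(D(real_images), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: sees only noise; supervision arrives indirectly through D.
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

train_step(torch.rand(16, image_dim) * 2 - 1)  # dummy batch standing in for real images
```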
The researchers drew 1 million unconditionally generated images from each of several pretrained generative models and ran the extraction attack, then sorted the models by FID (lower is better), with the GAN models listed at the top and the diffusion models at the bottom.
The results show that diffusion models memorize more than GANs, and that better generative models (those with lower FID) tend to memorize more data. In other words, diffusion models are the least private class of image generators, leaking more than twice as much training data as GANs.
The results above also suggest that existing privacy-enhancing techniques do not provide an acceptable privacy-utility trade-off: improving generation quality appears to require memorizing more of the training set.
Overall, this paper highlights the tension between increasingly powerful generative models and data privacy, and raises questions about how diffusion models work and how they can be deployed responsibly.
Technically speaking, reconstruction is the advantage of the diffusion model; but from a copyright perspective, reconstruction is its weakness.
Because of the excessive similarity between diffusion-generated images and training data, artists have raised a variety of copyright objections.
Some, for example, forbid the use of their works for AI training or cover published works with watermarks; the team behind Stable Diffusion has also announced plans to train future versions only on datasets containing authorized content and to provide an opt-out mechanism for artists.
The NLP field faces the same problem. One netizen remarked that they had published millions of words of text online since 1993, all of which was being used by AI systems, ChatGPT included, and that building generative AI models on stolen content is unethical.
Plagiarized writing is already widespread, and to ordinary readers a plagiarized article is merely a dispensable shortcut; to its creators, however, the plagiarized content represents their hard work.
Will the diffusion model still have advantages in the future?