


Last year, image generation models exploded in popularity, and the mass art carnival that followed brought copyright disputes with it.
Deep learning models such as DALL-E 2, Imagen and Stable Diffusion are all trained on hundreds of millions of images, so there is no way to strip the influence of the training set from their outputs. But are some generated images derived wholesale from the training set? And if a generated image is very similar to an original, who owns the copyright?
Recently, researchers from Google, DeepMind, ETH Zurich and several other well-known universities and companies jointly published a paper showing that diffusion models can indeed memorize samples from the training set and reproduce them at generation time.
Paper link: https://arxiv.org/abs/2301.13188
In this work, the researchers show how a diffusion model can memorize an individual image from its training data and emit it nearly verbatim at generation time.
The article proposes a generate-and-filter pipeline that extracts more than a thousand training examples from state-of-the-art models, covering photos of individual people, trademarks, company logos, and more. The authors also trained hundreds of diffusion models in different settings to analyze how various modeling and data decisions affect privacy.
Overall, the experimental results show that diffusion models provide much weaker privacy protection for their training sets than earlier generative models such as GANs.
Memorized, but not much
The denoising diffusion model is a recently emerged class of generative neural network. It produces images from the training distribution through an iterative denoising process, yields better samples than the previously dominant GAN and VAE models, and is easier to scale and to control, so it has quickly become the mainstream approach for generating all kinds of high-resolution images.
Especially after OpenAI released DALL-E 2, the diffusion model quickly became popular in the entire field of AI generation.
The appeal of generative diffusion models stems from their ability to synthesize new images that are ostensibly different from anything in the training set. In fact, past large-scale training efforts "did not uncover overfitting problems", and researchers in privacy-sensitive domains have even proposed that diffusion models can "protect the privacy of real images" by releasing synthetic ones instead.
However, all of these works rely on one assumption: that the diffusion model does not memorize and regenerate its training data. If it does, the claimed privacy guarantees are violated, and many concerns about model generalization and digital forgery arise.
But is that really the case?
To determine whether a generated image comes from the training set, you first need to define what "memorization" means.
Previous related work focused mainly on text language models: if a model can recover a sequence verbatim from the training set, that sequence is said to be "extracted" and "memorized". Because this work deals with high-resolution images, a word-for-word definition of memorization is not suitable.
Instead, the researchers define memorization based on an image similarity measure.
If the distance between a generated image x and some sample in the training set is below a given threshold, that sample is considered to have been extracted from the training set, and the phenomenon is called eidetic memorization.
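To make this definition concrete, the following is a minimal sketch of such a distance-based memorization check. The tile-wise normalized L2 metric, the number of tiles, and the threshold value are illustrative assumptions for exposition, not the exact metric or calibration used in the paper.

```python
import numpy as np

def tile_l2_distance(a: np.ndarray, b: np.ndarray, tiles: int = 4) -> float:
    """Normalized L2 distance, taken as the max over image tiles.

    a and b are float arrays in [0, 1] with shape (H, W, C). Taking the max
    over tiles penalizes pairs that only agree on a flat background; the
    paper's exact metric may differ (illustrative assumption).
    """
    h, w = a.shape[0] // tiles, a.shape[1] // tiles
    dists = []
    for i in range(tiles):
        for j in range(tiles):
            pa = a[i * h:(i + 1) * h, j * w:(j + 1) * w]
            pb = b[i * h:(i + 1) * h, j * w:(j + 1) * w]
            dists.append(float(np.sqrt(np.mean((pa - pb) ** 2))))
    return max(dists)

def is_eidetic_memorization(generated: np.ndarray, train_images, threshold: float = 0.1) -> bool:
    """A generation counts as (eidetic) memorization if it lies within
    `threshold` of at least one training image under the distance above."""
    return any(tile_l2_distance(generated, x) <= threshold for x in train_images)
```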
The article then designs a two-stage data-extraction attack:
1. Generate a large number of images
The first step is simple but computationally expensive: query the model in a black-box manner, generating images with the selected prompts as input.
The researchers generated 500 candidate images for each text prompt to increase the odds of surfacing memorized content.
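As a rough illustration of this step, the sketch below queries a public text-to-image diffusion model in a black-box fashion via the Hugging Face diffusers library; the specific model checkpoint, batch size, and seeding scheme are assumptions made for the example, not details from the paper.

```python
import torch
from diffusers import StableDiffusionPipeline

# Black-box access to a pre-trained text-to-image diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

def generate_candidates(prompt: str, n: int = 500, batch: int = 10):
    """Generate n candidate images for one prompt, each batch from a fresh seed."""
    images = []
    for start in range(0, n, batch):
        g = torch.Generator("cuda").manual_seed(start)
        out = pipe(prompt, num_images_per_prompt=batch, generator=g)
        images.extend(out.images)  # list of PIL images
    return images
```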
2. Carry out membership inference
In the second step, generated images suspected of being memorized from the training set are flagged.
The membership-inference strategy the researchers designed rests on the following idea: if generations of the same prompt from different random seeds turn out to be very similar to one another under the distance metric, they were likely produced from memory.
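A minimal sketch of this membership-inference heuristic is shown below: it takes the candidates generated for one prompt, computes all pairwise distances with a caller-supplied image metric (for example the tile-wise L2 distance sketched earlier), and reports how tightly the generations cluster. The greedy clique approximation and the `eps` threshold are simplifying assumptions, not the paper's exact procedure.

```python
from itertools import combinations
import numpy as np

def clique_score(candidates, dist, eps: float = 0.1):
    """Score one prompt's candidate images for suspected memorization.

    If many generations from different seeds land within `eps` of each other,
    the prompt likely reproduces a memorized training image. Returns the size
    of the largest group of mutually close images (greedy approximation) and
    the mean pairwise distance, both usable for ranking prompts.
    """
    n = len(candidates)
    close = np.zeros((n, n), dtype=bool)
    pair_d = []
    for i, j in combinations(range(n), 2):
        d = dist(candidates[i], candidates[j])
        pair_d.append(d)
        close[i, j] = close[j, i] = d <= eps
    largest = int(close.sum(axis=1).max()) + 1 if n > 1 else n
    mean_d = float(np.mean(pair_d)) if pair_d else float("inf")
    return largest, mean_d
```

Prompts whose candidates form large, tight groups (low mean pairwise distance) are flagged first and only then compared against the training set, which matches the sorting step described below.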
Extraction results
To evaluate the effectiveness of the attack, the researchers selected the 350,000 most-duplicated examples from the training data set and generated 500 candidate images for each prompt (175 million generated images in total).
They first sorted all of these generations by the mean pairwise distance between images in each clique, to identify those likely produced by memorizing the training data.
These generations were then compared with the training images, and each was labeled "extracted" or "not extracted". In the end, 94 images were found to be suspected extractions from the training set.
Through visual analysis, the top 1,000 generations were additionally hand-labeled as "memorized" or "not memorized", and 13 images were found to have been generated by copying training samples.
The precision-recall curve shows that the attack is very accurate: out of 175 million generated images, it can identify 50 memorized images at a false-positive rate of zero, and all memory-based generations can be extracted with precision above 50%.
To better understand how and why memory occurs, the researchers also trained hundreds of smaller diffusion models on CIFAR10 to analyze the privacy impact of model accuracy, hyperparameters, augmentation, and deduplication.
Unlike diffusion models, GANs are not explicitly trained to reconstruct their training set.
GANs consist of two competing neural networks: a generator and a discriminator. The generator also takes random noise as input, but unlike a diffusion model it must convert this noise into a valid image in a single forward pass.
During GAN training, the discriminator must predict whether an image comes from the generator, while the generator improves itself to fool the discriminator.
The key difference, then, is that a GAN's generator is trained only on indirect information about the training data (namely the gradients flowing from the discriminator) and never receives training images as input.
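The contrast can be made explicit with a highly simplified training-step sketch: the GAN generator's loss depends on real data only through the discriminator, whereas the diffusion objective directly asks the network to reconstruct (denoise) actual training images. Function names, the noise schedule, and the loss forms are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def gan_generator_step(generator, discriminator, opt_g, z):
    """The generator never sees real training images; its only training
    signal is the gradient flowing back through the discriminator's score."""
    fake = generator(z)
    loss_g = -torch.log(discriminator(fake) + 1e-8).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item()

def diffusion_denoising_step(model, opt, x_real, alphas_cumprod):
    """A diffusion model is trained to denoise corrupted copies of real
    training images, which is exactly the behavior that can turn into
    memorization when a sample is seen often enough."""
    t = torch.randint(0, alphas_cumprod.numel(), (x_real.size(0),))
    noise = torch.randn_like(x_real)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_noisy = a.sqrt() * x_real + (1.0 - a).sqrt() * noise
    loss = F.mse_loss(model(x_noisy, t), noise)  # predict the added noise
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```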
The researchers extracted 1 million unconditionally generated images from various pre-trained generative models and then ranked the models by FID (lower is better), with the GAN models toward the top and the diffusion models toward the bottom.
The results show that diffusion models memorize more than GAN models, and that better generative models (lower FID) tend to memorize more data. In other words, diffusion models are the least private class of image generators, leaking more than twice as much training data as GANs.
These results also suggest that existing privacy-enhancing techniques do not provide an acceptable privacy-utility trade-off: to improve generation quality, the model ends up memorizing more of the training set.
Overall, this paper highlights the tension between increasingly powerful generative models and data privacy, and raises questions about how diffusion models work and how they can be deployed responsibly.
Copyright Issue
Technically speaking, this ability to reconstruct is a strength of the diffusion model; from a copyright perspective, it is a liability.
Artists have raised copyright objections of various kinds because the images generated by diffusion models are excessively similar to their training data.
For example, some prohibit AI from training on their works or cover published pieces with watermarks; the team behind Stable Diffusion has also announced plans to train only on datasets containing authorized content going forward and to provide an opt-out mechanism for artists.
The NLP field faces the same problem. Some netizens have pointed out that the millions of words of text they have published since 1993 are being used to train AI models, including ChatGPT, and argue that building generative models on such appropriated content is unethical.
Plenty of plagiarized articles already circulate; for ordinary readers, plagiarism may look like a harmless shortcut, but for creators, the plagiarized content represents their hard work.
Will the diffusion model still have advantages in the future?


