


CMU joins forces with Adobe: GAN models usher in the era of pre-training, requiring only 1% of training samples
In the pre-training era, visual recognition models have advanced rapidly, but image generation models, such as generative adversarial networks (GANs), seem to have fallen behind.
GAN training is usually done from scratch in an unsupervised manner, which is time-consuming and labor-intensive, and it leaves unused the "knowledge" that large-scale pre-training has already distilled from big data. Isn't that a waste?
Moreover, image generation itself must capture and simulate the complex statistical structure of real-world visual phenomena; otherwise the generated images will violate the regularities of the physical world and be recognized as "fake" at a glance.
The pre-trained model provides knowledge, and the GAN provides generation capability; combining the two could be a beautiful thing.
The question is: which pre-trained models should be used, and how should they be combined, to improve the generative ability of a GAN?
Recently, researchers from CMU and Adobe published a paper at CVPR 2022 that incorporates pre-trained models into GAN training through model "selection".
Paper link: https://arxiv.org/abs/2112.09130
Project link: https://github.com/nupurkmr9/vision-aided-gan
Video link: https://www.youtube.com/watch?v=oHdyJNdQ9E4
The training process of a GAN consists of a discriminator and a generator: the discriminator learns the statistics that distinguish real samples from generated samples, while the generator aims to make the generated images match the real distribution as closely as possible.
Ideally, the discriminator should be able to measure the distribution gap between the generated image and the real image.
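The adversarial setup described here can be sketched with the standard non-saturating GAN losses. This is a minimal illustration on raw logits, not the paper's exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_loss(real_logits, fake_logits):
    # Discriminator: push real logits up, fake logits down.
    return -np.mean(np.log(sigmoid(real_logits)) + np.log(1.0 - sigmoid(fake_logits)))

def g_loss(fake_logits):
    # Non-saturating generator loss: make the fakes look real.
    return -np.mean(np.log(sigmoid(fake_logits)))
```

At the equilibrium where the discriminator cannot tell real from fake (all logits 0, i.e. probability 0.5), the discriminator loss equals 2·log 2 and the generator loss equals log 2.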
But when the amount of data is very limited, directly using a large-scale pre-trained model as the discriminator can easily "ruthlessly crush" the generator while the discriminator itself overfits.
Experiments on the FFHQ 1k dataset show that even with the latest differentiable data augmentation methods, the discriminator still overfits: it is very accurate on the training set but performs poorly on the validation set.
Additionally, the discriminator may latch onto cues that are indiscernible to humans but obvious to machines.
To balance the capabilities of the discriminator and generator, the researchers propose assembling the representations of a set of different pre-trained models into the discriminator.
This method has two advantages:
1. Training a shallow classifier on top of pre-trained features is a common way to adapt deep networks to small-scale datasets while reducing overfitting.
That is, as long as the parameters of the pre-trained model are frozen and a lightweight classification network is added on top, a stable training process can be obtained.
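The "frozen backbone plus lightweight head" recipe can be sketched as follows. The backbone here is a hypothetical stand-in (a fixed random projection with ReLU) for a real pre-trained extractor, and the data and labels are toy values; the point is that only the small head's parameters ever receive gradient updates:

```python
import numpy as np

rng = np.random.default_rng(0)

W_backbone = rng.normal(size=(64, 16))  # stands in for pre-trained weights
W_frozen_copy = W_backbone.copy()       # kept only to verify nothing updates it

def backbone(x):
    # Frozen feature extractor: its parameters are never updated.
    return np.maximum(x @ W_backbone, 0.0)

def bce(p, y, eps=1e-7):
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

x = rng.normal(size=(200, 64))
# Toy real/fake labels that are linearly decodable from the frozen features.
y = (backbone(x)[:, 0] > backbone(x)[:, 1]).astype(float)

w_head = np.zeros(16)                   # the only trainable parameters
losses = []
for _ in range(100):
    feats = backbone(x)
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head)))
    losses.append(bce(p, y))
    w_head -= 0.1 * feats.T @ (p - y) / len(y)   # update the head only
```

Because the classifier is shallow and the feature extractor is fixed, training is a convex logistic-regression problem, which is what makes the process stable on small datasets.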
For example, the Ours curve in the experiment above shows that validation-set accuracy improves considerably over StyleGAN2-ADA.
2. Recent studies have shown that deep networks can capture meaningful visual concepts, from low-level visual cues (edges and textures) to high-level concepts (objects and object parts).
The discriminator based on these features may be more in line with human perception.
And combining multiple pre-trained models can promote the generator to match the real distribution in different, complementary feature spaces.
To select the best pre-trained networks, the researchers first collected multiple state-of-the-art models to form a "model bank", including VGG-16 for classification, Swin-T for detection and segmentation, and so on.
Then, based on linear probing of real versus fake images in each feature space, they propose an automatic model search strategy, and use label smoothing and differentiable augmentation to further stabilize training and reduce overfitting.
Specifically, the union of the real training samples and the generated images is split into a training set and a validation set.
For each pre-trained model, a logistic linear discriminator is trained to classify whether a sample comes from the real or the generated distribution, the negative binary cross-entropy loss on the validation split is used to measure the distribution gap, and the model with the smallest error is returned.
A lower validation error corresponds to higher linear probing accuracy, indicating that those features are useful for distinguishing real samples from generated ones, and that using them can provide more useful feedback to the generator.
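The selection criterion can be sketched like this: for each candidate feature space, fit a logistic linear probe on a train split and score it by binary cross-entropy on a held-out split, then keep the model with the lowest validation error. The two feature extractors below are synthetic stand-ins (one informative, one pure noise), not real pre-trained models:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 400
labels = rng.integers(0, 2, n).astype(float)   # 1 = real, 0 = generated

# Hypothetical feature spaces from two candidate models:
# model_A's features separate real from fake, model_B's are pure noise.
feats = {
    "model_A": (2.0 * labels[:, None] - 1.0) * 2.0 + rng.normal(size=(n, 4)),
    "model_B": rng.normal(size=(n, 4)),
}

def probe_val_loss(X, y, steps=300, lr=0.1):
    """Train a logistic linear probe on half the data, return BCE on the rest."""
    tr, va = slice(0, n // 2), slice(n // 2, n)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X[tr] @ w)))
        w -= lr * X[tr].T @ (p - y[tr]) / (n // 2)
    p = 1.0 / (1.0 + np.exp(-(X[va] @ w)))
    eps = 1e-7
    return -np.mean(y[va] * np.log(p + eps) + (1.0 - y[va]) * np.log(1.0 - p + eps))

scores = {name: probe_val_loss(X, labels) for name, X in feats.items()}
best = min(scores, key=scores.get)   # lowest validation BCE wins
```

A noise-only feature space cannot do better than chance on the held-out split (a BCE near log 2 ≈ 0.693), so the informative model is selected.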
The researchers empirically validated GAN training with 1,000 training samples from the FFHQ and LSUN CAT datasets.
The results show that GANs trained with pre-trained models achieve higher linear probing accuracy and, in general, better FID scores.
To incorporate feedback from multiple off-the-shelf models, the article also explores two model selection and ensembling strategies:
1) the K-fixed strategy, which selects the K best off-the-shelf models at the start of training and trains with them until convergence;
2) the K-progressive strategy, which iteratively selects and adds the best-performing unused model after a fixed number of iterations.
Experiments show that, compared with K-fixed, the progressive strategy has lower computational complexity and also helps select pre-trained models that capture different aspects of the data distribution; for example, the first two models chosen by the progressive strategy are usually one self-supervised and one supervised model.
The experiments in the article mainly use the progressive strategy.
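The K-progressive loop can be sketched as below. The `probe` and `train_stage` callables are hypothetical stand-ins for the linear-probe scoring and the GAN training stage described above:

```python
def k_progressive(model_bank, k, stage_iters, probe, train_stage):
    """Iteratively grow the discriminator ensemble.

    probe(model)               -> validation error of that model's linear probe
    train_stage(models, iters) -> trains the GAN with the current ensemble
    """
    chosen = []
    remaining = list(model_bank)
    for _ in range(k):
        train_stage(chosen, stage_iters)   # train with the current ensemble
        best = min(remaining, key=probe)   # lowest validation error wins
        chosen.append(best)
        remaining.remove(best)
    train_stage(chosen, stage_iters)       # final stage with all K models
    return chosen
```

Each newly added model is probed against the current generator's outputs, which is why successive picks tend to cover complementary aspects of the distribution rather than duplicating each other.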
The final training algorithm first trains a GAN with a standard adversarial loss.
Given this baseline generator, linear probing is used to search for the best pre-trained model, whose loss is then added to the training objective.
In the K-progressive strategy, after training for a fixed number of iterations (proportional to the number of available real training samples), a new vision-aided discriminator is added to the snapshot with the best training-set FID from the previous stage.
During training, data augmentation is performed with horizontal flips, and differentiable augmentation and one-sided label smoothing are used as regularization.
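Two of these regularizers are easy to illustrate. Horizontal flipping just reverses the width axis of an image batch, and one-sided label smoothing softens only the real targets (to e.g. 0.9) so the discriminator cannot become arbitrarily confident on real samples. This is a sketch of the general techniques, not the paper's exact code:

```python
import numpy as np

def hflip(batch):
    # Horizontal flip augmentation; batch shape is (N, H, W, C).
    return batch[:, :, ::-1, :]

def d_loss_smoothed(real_logits, fake_logits, smooth=0.9):
    """Discriminator BCE with one-sided label smoothing:
    real targets are softened to `smooth`, fake targets stay at 0."""
    p_real = 1.0 / (1.0 + np.exp(-real_logits))
    p_fake = 1.0 / (1.0 + np.exp(-fake_logits))
    real_term = smooth * np.log(p_real) + (1.0 - smooth) * np.log(1.0 - p_real)
    fake_term = np.log(1.0 - p_fake)
    return -np.mean(real_term) - np.mean(fake_term)
```

With smoothing, the optimal prediction on real samples is 0.9 rather than 1.0, so pushing real logits ever higher actually increases the loss, which blunts discriminator overconfidence.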
It can also be observed that using only off-the-shelf models as the discriminator leads to divergence, while combining the original discriminator with the pre-trained models avoids this.
The final experiment shows the results when the training samples of the FFHQ, LSUN CAT and LSUN CHURCH data sets vary from 1k to 10k.
In all settings, FID can achieve significant improvements, proving the effectiveness of this method in limited data scenarios.
For a qualitative comparison with StyleGAN2-ADA, samples from the two methods were ranked by quality; the new method improves the quality of the worst samples, especially on FFHQ and LSUN CAT.
As each new discriminator is added, the linear probing accuracy on that pre-trained model's features gradually declines, which means the generator has become stronger.
Overall, with only 10,000 training samples, this method achieves an FID on LSUN CAT comparable to that of StyleGAN2 trained on 1.6 million images.
On the complete data set, this method improves FID by 1.5 to 2 times on the LSUN cat, church, and horse categories.
Author Richard Zhang received his PhD from the University of California, Berkeley, and his bachelor's and master's degrees from Cornell University. His main research interests include computer vision, machine learning, deep learning, graphics, and image processing, and he often collaborates with academic researchers through internships or university partnerships.
Author Jun-Yan Zhu is an assistant professor in the Robotics Institute of the School of Computer Science at Carnegie Mellon University, with affiliations in the Computer Science Department and the Machine Learning Department. His main research areas include computer vision, computer graphics, machine learning, and computational photography.
Before joining CMU, he was a research scientist at Adobe Research. He received his bachelor's degree from Tsinghua University and his PhD from the University of California, Berkeley, and then worked as a postdoctoral fellow at MIT CSAIL.
