Top 10 Self-Supervised Learning Models of 2022 Released! The United States and China Dominate the List with Eight Entries
Self-supervised learning enables computers to observe the world and understand it by learning the structure of images, speech, or text. This has driven many of the recent major advances in artificial intelligence.
Despite the considerable effort that researchers around the world have invested in this area, there are still large differences in how self-supervised learning algorithms learn from images, speech, text, and other modalities. For this reason, the AI publication Analytics India Magazine has compiled the top ten self-supervised learning models of 2022 for its readers.
data2vec
Paper link: https://arxiv.org/pdf/2202.03555.pdf
Open source code: https://t.co/3x8VCwGI2x
Meta AI released the data2vec algorithm in January, a single self-supervised framework covering speech, images, and text. According to the team, the model is highly competitive on NLP tasks.
It relies on neither contrastive learning nor reconstruction of the input examples. The Meta AI team describes data2vec's training as predicting model representations of the full input data given only a partial view of the input.
The team said: "We first encode a masked version of the training sample with a student model. We then encode the unmasked version of the input with the same model to construct the training targets. This model (the teacher) differs from the student only in its parameters."
In other words, the model predicts the representations of the unmasked training sample from its masked version, which removes the dependence on modality-specific objectives in the learning task.
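The description above maps onto a simple teacher-student loop. The sketch below (PyTorch) is a minimal illustration, not Meta AI's released code: `Encoder` is a hypothetical stand-in for the modality-specific Transformer encoder, and the teacher's parameters are assumed to track the student's via an exponential moving average, as in the paper.

```python
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for a modality-specific encoder (hypothetical, for illustration only)."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

def data2vec_style_step(student, teacher, x, mask, ema_decay=0.999):
    """One training step of masked latent prediction (sketch, not the official code).

    x:    (batch, seq, dim) input features
    mask: (batch, seq) boolean, True where the student's input is masked
    """
    # Teacher encodes the *unmasked* input to build regression targets (no gradients).
    with torch.no_grad():
        targets = teacher(x)

    # Student encodes a masked view of the same input.
    masked_x = x.masked_fill(mask.unsqueeze(-1), 0.0)
    preds = student(masked_x)

    # Regress the teacher's representations at the masked positions only.
    loss = nn.functional.smooth_l1_loss(preds[mask], targets[mask])
    loss.backward()

    # Teacher parameters track the student via an exponential moving average.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)
    return loss

student = Encoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 10, 256)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[:, ::2] = True
loss = data2vec_style_step(student, teacher, x, mask)
```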
ConvNeXt
Paper link: https://arxiv.org/pdf/2201.03545.pdf
Open source code: https://t.co/nWx2KFtl7X
ConvNeXt, introduced in the paper "A ConvNet for the 2020s", is a model released by the Meta AI team in March. It is constructed entirely from standard ConvNet modules and is therefore accurate, simple in design, and scalable.
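As a rough illustration of what "constructed entirely from standard ConvNet modules" means, here is a minimal sketch of a single ConvNeXt-style block in PyTorch, simplified from the paper's design (layer scale and stochastic depth from the official implementation are omitted): a 7x7 depthwise convolution, LayerNorm, and an inverted-bottleneck MLP with GELU, wrapped in a residual connection.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified sketch of one ConvNeXt block:
    7x7 depthwise conv -> LayerNorm -> 1x1 expand -> GELU -> 1x1 project -> residual."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # applied over the channel dimension
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # 1x1 conv expressed as a linear layer
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                       # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)               # to (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)               # back to (N, C, H, W)
        return residual + x

block = ConvNeXtBlock(dim=96)
out = block(torch.randn(1, 96, 56, 56))        # shape preserved: (1, 96, 56, 56)
```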
VICReg
Paper link: https://t.co/H7crDPHCHV
Open source code: https://t.co/oadSBT61P3
Variance-Invariance-Covariance Regularization (VICReg) combines a variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, which prevents the collapse in which the encoder produces constant or uninformative vectors.
VICReg requires no weight sharing between branches, batch normalization, feature normalization, output quantization, stop-gradients, memory banks, or similar techniques, yet on several downstream tasks it achieves results comparable to the state of the art. Furthermore, experiments show that the variance regularization term can stabilize the training of other methods and improve their performance.
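The three terms can be written down compactly. The following is a hedged sketch of the VICReg objective in PyTorch, assuming two batches of embeddings from the two branches; the loss coefficients shown are the values reported in the paper, not tuned here.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0, eps=1e-4):
    """Sketch of the VICReg objective for two batches of embeddings of shape (batch, dim)."""
    batch, dim = z_a.shape

    # Invariance term: two views of the same image should map to similar embeddings.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping each dimension's std above 1,
    # which prevents collapse to a constant vector.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    std_loss = torch.mean(F.relu(1 - std_a)) + torch.mean(F.relu(1 - std_b))

    # Covariance term: push off-diagonal covariances toward zero to decorrelate
    # dimensions (redundancy reduction).
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (batch - 1)
    cov_b = (z_b.T @ z_b) / (batch - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov_loss = off_diag(cov_a).pow(2).sum() / dim + off_diag(cov_b).pow(2).sum() / dim

    return sim_coeff * sim_loss + std_coeff * std_loss + cov_coeff * cov_loss

# Example: embeddings of two augmented views from the (non-weight-shared) branches.
loss = vicreg_loss(torch.randn(256, 128), torch.randn(256, 128))
```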
STEGO
Paper link: https://arxiv.org/abs/2203.08414
MIT's Computer Science and Artificial Intelligence Laboratory, in collaboration with Microsoft and Cornell University, developed STEGO (Self-supervised Transformer with Energy-based Graph Optimization) to tackle one of the most difficult tasks in computer vision: assigning a label to every pixel of an image without human supervision.
STEGO learns "semantic segmentation", which, simply put, means assigning a label to each pixel in an image.
Semantic segmentation is an important skill for today's computer vision systems because images can be cluttered with objects. To make matters more difficult, these objects don't always fit neatly inside a bounding box. Algorithms tend to work better for discrete "things" like people and cars than for hard-to-delimit "stuff" like vegetation, the sky, and mashed potatoes.
Take a scene of a dog playing in a park as an example. Previous systems might only be able to identify the dog, but by assigning a label to every pixel of the image, STEGO can decompose the scene into its main components: the dog, the sky, the grass, and the dog's owner.
Machines that can "see the world" are crucial to a variety of emerging technologies, such as self-driving cars and predictive models for medical diagnosis. Since STEGO can learn without labels, it can detect objects in different domains, even objects that humans do not yet fully understand.
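STEGO's actual training objective distills feature correspondences from a frozen self-supervised backbone into a segmentation head; the toy sketch below only illustrates the end goal described above, labelling every pixel without supervision by clustering dense features from a hypothetical frozen `backbone`. It is not the STEGO algorithm itself.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def pixelwise_pseudo_labels(image, backbone, n_classes=27, out_size=(224, 224)):
    """Toy illustration of unsupervised semantic segmentation: extract dense features
    with a frozen self-supervised backbone, cluster them, and assign every pixel the
    id of its cluster. (STEGO additionally trains a segmentation head that distills
    feature correspondences; that step is omitted here.)

    image:    (1, 3, H, W) tensor
    backbone: frozen module returning a (1, C, h, w) dense feature map (hypothetical)
    """
    with torch.no_grad():
        feats = backbone(image)                          # (1, C, h, w)
    _, C, h, w = feats.shape
    flat = feats.permute(0, 2, 3, 1).reshape(-1, C).cpu().numpy()

    # Cluster per-pixel features; each cluster id acts as an unsupervised "label".
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(flat)
    label_map = torch.from_numpy(labels).reshape(1, 1, h, w).float()

    # Upsample the low-resolution label map back to pixel resolution.
    return nn.functional.interpolate(label_map, size=out_size, mode="nearest").long()
```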
CoBERT
Paper link: https://arxiv.org/pdf/2210.04062.pdf
For self-supervised speech representation learning, researchers from the Chinese University of Hong Kong (Shenzhen) proposed Code BERT (CoBERT). Unlike other self-distillation approaches, their model predicts representations across different modalities: the speech is converted into a sequence of discrete codes for representation learning.
First, the research team pre-trained a HuBERT-style code model in the discrete code space. They then distilled the code model into a speech model, aiming for better learning across modalities.
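A minimal sketch of this cross-modal distillation idea follows, assuming hypothetical `quantizer`, `code_teacher`, and `speech_model` modules (this is not the official CoBERT code): the frozen code model provides targets that the speech model learns to predict directly from speech.

```python
import torch
import torch.nn as nn

def cobert_style_distillation_step(speech_model, code_teacher, quantizer, speech_feats):
    """Sketch of cross-modal distillation: the quantizer maps speech features to a
    sequence of discrete codes, the frozen code teacher encodes that code sequence,
    and the speech model is trained to predict the teacher's representations
    directly from speech.

    speech_feats: (batch, frames, dim) speech features
    """
    with torch.no_grad():
        codes = quantizer(speech_feats)          # (batch, frames) discrete code ids
        targets = code_teacher(codes)            # (batch, frames, hidden) code representations

    preds = speech_model(speech_feats)           # (batch, frames, hidden)

    # Regress the code teacher's representations from the speech modality.
    return nn.functional.mse_loss(preds, targets)
```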
CoBERT outperforms the best current algorithms on ASR tasks and brings significant improvements on the SUPERB speech translation (ST) task; the gain on ST suggests that CoBERT's representations may carry more linguistic information than previous work.
ColloSSL
Paper link: https://arxiv.org/pdf/2202.00758.pdf
Researchers at Nokia Bell Labs, in collaboration with Georgia Institute of Technology and the University of Cambridge, have developed ColloSSL, a collaborative self-supervised algorithm for human activity recognition.
Unlabeled sensor data captured simultaneously by multiple devices can be viewed as natural transformations of one another, and these transformations provide the supervisory signal for representation learning. The paper proposes three techniques: device selection, contrastive sampling, and a multi-view contrastive loss.
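A hedged sketch of what such a multi-view contrastive loss could look like in PyTorch, with time-aligned embeddings from other devices as positives and time-shifted embeddings as negatives; the paper's device-selection and sampling weights are not reproduced here.

```python
import torch
import torch.nn.functional as F

def multi_device_contrastive_loss(anchor_emb, positive_embs, negative_embs, temperature=0.1):
    """InfoNCE-style loss over device views (sketch, not the official ColloSSL code).

    anchor_emb:    (batch, dim)         embeddings from the anchor device
    positive_embs: (n_pos, batch, dim)  time-aligned embeddings from other devices
    negative_embs: (n_neg, batch, dim)  time-shifted embeddings used as negatives
    """
    anchor = F.normalize(anchor_emb, dim=-1)
    pos = F.normalize(positive_embs, dim=-1)
    neg = F.normalize(negative_embs, dim=-1)

    # Cosine similarities scaled by temperature: (batch, n_pos) and (batch, n_neg).
    pos_sim = torch.einsum("bd,pbd->bp", anchor, pos) / temperature
    neg_sim = torch.einsum("bd,nbd->bn", anchor, neg) / temperature

    # For each positive, contrast it against all negatives; the positive sits at index 0.
    logits = torch.cat(
        [pos_sim.unsqueeze(-1), neg_sim.unsqueeze(1).expand(-1, pos_sim.size(1), -1)], dim=-1
    )
    labels = torch.zeros(logits.shape[:2], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())

loss = multi_device_contrastive_loss(
    torch.randn(32, 64), torch.randn(2, 32, 64), torch.randn(8, 32, 64)
)
```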
LoRot
Paper link: https://arxiv.org/pdf/2207.10023.pdf
A research team at Sungkyunkwan University proposes a simple self-supervised auxiliary task, predicting localizable rotations (LoRot), with three desirable properties that assist the supervised objective.
The approach has three major characteristics. First, it guides the model to learn rich features. Second, the self-supervised transformation barely changes the training distribution. Third, it is lightweight and generic, with high adaptability to prior techniques.
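As a rough illustration only, the sketch below shows one plausible instantiation of a localizable-rotation auxiliary task: rotate a randomly chosen quadrant of each (square) image by a random multiple of 90 degrees and train an auxiliary head to classify the rotation. `backbone` and `rot_head` are hypothetical modules, and the details differ from the paper's exact formulation.

```python
import random
import torch
import torch.nn as nn

def apply_localizable_rotation(images):
    """Toy localizable-rotation transform (not the paper's exact formulation):
    rotate one randomly chosen quadrant of each image by a random multiple of
    90 degrees and return the rotation label.

    images: (batch, channels, S, S) square images with even side S
    """
    batch, _, S, _ = images.shape
    half = S // 2
    rotated = images.clone()
    labels = torch.zeros(batch, dtype=torch.long)
    for i in range(batch):
        k = random.randint(0, 3)                        # rotation: k * 90 degrees
        qy, qx = random.randint(0, 1), random.randint(0, 1)
        ys, xs = qy * half, qx * half
        patch = rotated[i, :, ys:ys + half, xs:xs + half]
        rotated[i, :, ys:ys + half, xs:xs + half] = torch.rot90(patch, k, dims=(1, 2))
        labels[i] = k
    return rotated, labels

def auxiliary_lorot_loss(backbone, rot_head, images):
    """Auxiliary rotation-classification loss added on top of the main supervised loss.
    (backbone and rot_head are hypothetical modules; the primary loss is unchanged.)"""
    rotated, labels = apply_localizable_rotation(images)
    logits = rot_head(backbone(rotated))                # (batch, 4) rotation logits
    return nn.functional.cross_entropy(logits, labels)
```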