Blowing past GPT-3 and Google PaLM! Retrieval-augmented model Atlas sets a new SOTA on knowledge-intensive few-shot tasks
Almost without anyone noticing, large models have become the mainstream approach to few-shot learning. On many tasks, the standard recipe is to label a small set of examples and then adapt a large pre-trained model to them. Large models have achieved impressive results across a wide range of few-shot tasks, but that success has also placed some of their inherent shortcomings under the spotlight of few-shot learning.
Few-shot learning expects a model to reason independently from a handful of examples. In other words, the ideal model masters problem-solving approaches by solving problems, so that it can generalize to new ones as they appear. In practice, however, the few-shot ability of large models seems to rest on the vast amount of information memorized during training: the model recalls how a problem was solved rather than working it out. It may look formidable on benchmark after benchmark, but it leaves one wondering: is a student who studies this way really a promising student?
The Meta AI paper introduced today takes a different route and applies retrieval augmentation to few-shot learning. With only 64 examples, the resulting model reaches 42% accuracy on the Natural Questions dataset. Compared with the large model PaLM, it cuts the parameter count by a factor of roughly 50 (540B -> 11B), while offering clear advantages in interpretability, controllability, and updateability that other large models lack.
Paper title: Few-shot Learning with Retrieval Augmented Language Models
Paper link: https://arxiv.org/pdf/2208.03299.pdf
The paper opens by posing a question to the field: in few-shot learning, is an enormous parameter count really necessary for storing information? Looking at how large models have developed, one reason each successive model keeps pushing the SOTA is precisely that its huge parameter store holds the information the task requires. Since the birth of the Transformer, large models have been the mainstream paradigm in NLP, and as they have grown, the problems of "big" have been steadily exposed, which makes questioning the necessity of "big" well worth doing. Starting from this question, the authors arrive at a negative answer, and their instrument is the retrieval-augmented model.
The roots of retrieval augmentation. Although the technique is today used mainly in tasks such as open-domain question answering, machine reading, and text generation, the idea of retrieval augmentation can be traced back to NLP's RNN era. The inability of RNN models to capture long-range dependencies in data pushed researchers to explore remedies widely. The Transformer we know so well used the attention mechanism to effectively solve the model's inability to remember, opening the era of large pre-trained models.
At the time there was another route: the Cached LM. Its core idea was that if the RNN forgets everything the moment it enters the exam room, then simply let it take an open-book exam: a cache mechanism stores the words predicted during training, and at prediction time the information from the query and the cache index is combined to complete the task, working around the RNN's weakness.
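To make the cache idea concrete, here is a minimal NumPy sketch in the spirit of the continuous-cache language model (Grave et al.); the shapes and the `lam` and `theta` values are illustrative assumptions, not the exact mechanism of any particular system:

```python
import numpy as np

def cache_lm_probs(p_vocab, h_t, cache_states, cache_words,
                   vocab_size, lam=0.2, theta=0.3):
    """Blend the base LM distribution with a cache distribution.

    p_vocab      : base model's next-word distribution, shape (vocab_size,)
    h_t          : current hidden state, shape (d,)
    cache_states : hidden states stored at previous steps, shape (n, d)
    cache_words  : word id that followed each cached state, shape (n,)
    lam, theta   : interpolation weight and similarity temperature
    """
    # Similarity between the current state and every cached state.
    scores = np.exp(theta * cache_states @ h_t)      # shape (n,)
    p_cache = np.zeros(vocab_size)
    np.add.at(p_cache, cache_words, scores)          # sum scores per word
    p_cache /= p_cache.sum() + 1e-12
    # Linear interpolation of the two distributions.
    return (1 - lam) * p_vocab + lam * p_cache

vocab_size, d = 8, 4
rng = np.random.default_rng(0)
p = cache_lm_probs(np.full(vocab_size, 1 / vocab_size),
                   rng.normal(size=d),
                   rng.normal(size=(5, d)),
                   np.array([1, 2, 2, 3, 1]),
                   vocab_size)
print(p.sum())  # ~1.0
```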
Retrieval augmentation thus set off down a path completely different from large models that rely on parameters to memorize information. A retrieval-augmented model can bring in external knowledge from different sources, including the training corpus, external data, unsupervised data, and more. Such models generally consist of a retriever and a generator: the retriever fetches relevant knowledge from the external sources given the query, and the generator combines the query with the retrieved knowledge to make the prediction, as the sketch below illustrates.
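A minimal sketch of this retriever/generator split, assuming a hash-based stand-in for the embedding model and a stub generator (a real system would use a dense encoder and a seq2seq language model):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding: hash tokens into a normalized bag-of-words vector."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-12)

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Retriever: score every document by dot product and keep the top-k."""
    q = embed(query)
    scores = [float(embed(d) @ q) for d in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def generate(query: str, docs: list[str]) -> str:
    """Generator stub: a real model conditions on query + retrieved docs."""
    return f"answer conditioned on {query!r} and {len(docs)} retrieved docs"

corpus = ["Paris is the capital of France.",
          "The Transformer uses attention.",
          "Atlas couples a retriever with a language model."]
docs = retrieve("What is the capital of France?", corpus, k=2)
print(generate("What is the capital of France?", docs))
```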
Ultimately, the goal of a retrieval-augmented model is that the model should not only learn to remember data but also learn to find data on its own. This property is a great advantage in many knowledge-intensive tasks, and retrieval-augmented models have indeed had great success in those areas; whether retrieval augmentation suits few-shot learning, however, was unknown. Returning to this Meta AI paper: it puts retrieval augmentation to the test in few-shot learning, and Atlas is the result.
Atlas has two sub-models: a retriever and a language model. Given a task, Atlas uses the retriever to select the top-k most relevant documents from a large corpus based on the input question, then feeds those documents together with the question into the language model to generate the required output.
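Concretely, Atlas reads the retrieved documents in the Fusion-in-Decoder style: each of the top-k documents is paired with the question, each pair is encoded independently, and the decoder attends over all the encodings at once. A sketch of the input-building step (the exact field template here is an illustrative assumption):

```python
def build_fid_inputs(query: str, docs: list[dict], k: int) -> list[str]:
    """Pair the query with each retrieved passage. A Fusion-in-Decoder
    style model encodes each pair independently, then the decoder
    attends over the concatenated encodings."""
    return [f"question: {query} title: {d['title']} context: {d['text']}"
            for d in docs[:k]]

passages = [{"title": "France", "text": "Paris is the capital of France."},
            {"title": "Attention", "text": "The Transformer uses attention."}]
for s in build_fid_inputs("What is the capital of France?", passages, k=2):
    print(s)
```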
Atlas' basic training strategy is to jointly train the retriever and the language model with the same loss function. Both are built on pre-trained Transformer networks: the retriever is based on Contriever, a dense retriever pre-trained with contrastive learning, and the language model is based on T5, reading the retrieved passages in the Fusion-in-Decoder style.
It is worth noting that the authors compared four loss functions, as well as the setting in which the retriever and the language model are not jointly trained.
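For reference, the four retriever objectives compared in the paper are Attention Distillation (ADist), EMDR², Perplexity Distillation (PDist), and leave-one-out perplexity distillation (LOOP). As a representative example, PDist trains the retriever to match the posterior over the top-K documents implied by the language model; schematically (a sketch, with temperature and normalization details as in the paper):

```latex
% Schematic form of perplexity distillation (PDist).
% s(d_k, q): retriever score for document d_k given query q;
% p_LM(a | d_k, q): language-model likelihood of answer a.
\mathcal{L}_{\mathrm{PDist}}
  = \mathrm{KL}\!\left(
      p_{\mathrm{post}}(d_k \mid q, a)
      \,\middle\|\,
      p_{\mathrm{retr}}(d_k \mid q)
    \right),
\qquad
p_{\mathrm{post}}(d_k \mid q, a)
  = \frac{p_{\mathrm{LM}}(a \mid d_k, q)}
         {\sum_{k'=1}^{K} p_{\mathrm{LM}}(a \mid d_{k'}, q)},
\qquad
p_{\mathrm{retr}}(d_k \mid q)
  = \frac{\exp\!\big(s(d_k, q)/\theta\big)}
         {\sum_{k'=1}^{K} \exp\!\big(s(d_{k'}, q)/\theta\big)}.
```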
In the few-shot setting, the accuracy obtained with joint training is markedly higher than without it, and the authors conclude that this joint training of retriever and language model is the key to Atlas' few-shot learning ability.
On the massive multi-task language understanding benchmark (MMLU), Atlas, with only 11B parameters, achieves better accuracy than GPT-3, which has 15 times as many parameters; after introducing multi-task training, its 5-shot accuracy even approaches that of Gopher, which has 25 times as many parameters.
On the two open-domain question answering datasets, NaturalQuestions and TriviaQA, Atlas and other models were compared both with 64 examples and on the full training set, as shown in the figure below. Atlas set a new 64-shot SOTA, reaching 84.7% accuracy on TriviaQA with only 64 examples.
On the fact-checking task (FEVER), Atlas again performed significantly better in the few-shot setting than Gopher and ProoFVer, models with dozens of times as many parameters, beating Gopher by 5.1% on the 15-shot task.
On KILT, Meta's own benchmark of knowledge-intensive natural language processing tasks, Atlas trained with only 64 examples comes close, on some tasks, to the accuracy other models obtain with the full training data; trained on the full data, Atlas set a new SOTA on five of the datasets.
This paper shows that the retrieval-augmented model is not just smaller and better; it also has clear interpretability advantages that other large models lack. The black-box nature of large models makes it hard for researchers to analyze how a model operates, but a retrieval-augmented model exposes the documents it retrieved, so analyzing what the retriever fetched gives a better picture of how Atlas works. For example, the paper found that in abstract algebra the model relied on Wikipedia for 73% of its retrieved documents, while in ethics-related fields only 3% of the retrieved documents came from Wikipedia, which matches human intuition. As the chart on the left of the figure below shows, although the model prefers CCNet data overall, in STEM fields that lean more heavily on formulas and reasoning, the share of Wikipedia articles rises markedly.
From the chart on the right of the same figure, the authors found that accuracy rises as the retrieved articles contain the correct answer more often: when the retrieved articles do not contain the answer at all, accuracy is only 55%; when the answer is mentioned more than 15 times, accuracy reaches 77%. In addition, manually inspecting 50 retrieved documents showed that 44% of them contained useful background information, material that can also give researchers valuable pointers for further reading.
Large models are generally thought to carry a risk of training-data "leakage": sometimes a large model answers a test question not through learned ability but through memory, because the answer appeared somewhere in the vast corpus it was trained on. In this paper, after the authors manually removed the corpus passages that might have leaked answers, the model's accuracy dropped from 56.4% to 55.8%, a fall of only 0.6%. Retrieval augmentation, then, is an effective hedge against this kind of model "cheating".
Finally, updateability is a further advantage unique to retrieval-augmented models. A retrieval-augmented model can be kept current without retraining, simply by updating or replacing the corpus it draws on. By constructing a time-stamped dataset, as shown in the figure below, the authors reached 53.1% accuracy with Atlas just by switching to the 2020 corpus, without updating any Atlas parameters. Interestingly, even T5 fine-tuned on 2020 data did not perform well here; the authors attribute this largely to the fact that T5's pre-training data predates 2020. The sketch below shows why such an update is cheap.
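A minimal sketch of why updating is cheap: the knowledge lives in a rebuildable index, not in the model weights. Here `toy_encode` stands in for the frozen retriever encoder, and the corpora are placeholder documents:

```python
import numpy as np

def toy_encode(text: str, dim: int = 32) -> np.ndarray:
    """Stand-in for the frozen retriever encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-12)

class RetrievalIndex:
    """Document index rebuilt from a corpus; no model weights involved."""
    def __init__(self, corpus, encode):
        self.corpus = corpus
        self.vectors = np.stack([encode(d) for d in corpus])

    def search(self, query, encode, k=2):
        scores = self.vectors @ encode(query)
        return [self.corpus[i] for i in np.argsort(scores)[::-1][:k]]

corpus_2017 = ["In 2017 the title holder was X."]   # placeholder documents
corpus_2020 = ["In 2020 the title holder was Y."]
index = RetrievalIndex(corpus_2017, toy_encode)
index = RetrievalIndex(corpus_2020, toy_encode)     # "update": rebuild only the index
print(index.search("who was the title holder", toy_encode, k=1))
```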
Imagine three students. The first solves problems purely by rote memorization: he can recite the answer to a math problem flawlessly. The second relies on looking things up: faced with a problem, he first searches the reference material for the most relevant passages and then answers point by point. The last is gifted: with only a little textbook knowledge, he can walk into the exam room full of confidence.
Clearly, the ideal of few-shot learning is to become the third student, yet the reality is likely to remain stuck at the first. Large models are convenient, but "big" is by no means the ultimate goal of a model. If we return to the original aim of few-shot learning, giving models something like human reasoning and the ability to generalize from one case to another, then this paper takes a welcome step forward from a different angle: at the very least it lets the student stop cramming so much potentially redundant knowledge into his head and travel light with a textbook in hand. Perhaps an open-book exam, with the textbook ready for constant consultation, is closer to intelligence than rote memorization!