Is Google's 10M-token context window killing RAG? Is Gemini underrated after Sora stole its limelight?

王林 | 2024-02-27 16:22:29

If any company has had a rough time lately, it must be Google: its Gemini 1.5 had barely been released before OpenAI's Sora stole its thunder, making Google the "Wang Feng" of the AI industry (a Chinese singer whose big announcements are famously always upstaged).

Specifically, what Google launched this time is the first version of Gemini 1.5 released for early testing: Gemini 1.5 Pro. It is a mid-size multimodal model (spanning text, video, and audio) whose performance is comparable to Google's largest model to date, 1.0 Ultra, and it introduces a groundbreaking experimental capability in long-context understanding. It can reliably handle up to 1 million tokens (roughly 1 hour of video, 11 hours of audio, more than 30,000 lines of code, or 700,000 words), with a tested limit of 10 million tokens (roughly the "Lord of the Rings" trilogy), setting a record for the longest context window.

In addition, given only a 500-page grammar book, 2,000 bilingual dictionary entries, and 400 additional parallel sentences, it can learn to translate a low-resource language (one with no relevant material on the internet) at a level approaching that of a human learner.

Many people who have used Gemini 1.5 Pro consider the model underrated. One user ran an experiment: they downloaded a complete code base and its related issues from GitHub and fed them into Gemini 1.5 Pro. The results were surprising: not only did it understand the entire code base, it was also able to identify the most urgent issues and fix them.

In another code-related test, Gemini 1.5 Pro demonstrated excellent search capabilities, quickly finding the most relevant examples in the code base. It also showed strong comprehension, accurately locating the code that controls animations and offering personalized code suggestions. Likewise, Gemini 1.5 Pro demonstrated excellent cross-modal capabilities, pinpointing demo content from screenshots and providing guidance for editing image code.

Such a model deserves everyone's attention. More notably, Gemini 1.5 Pro's ability to handle ultra-long contexts has led many researchers to ask: is the traditional RAG approach still necessary?

One X user said that in a test he ran, the ultra-long-context Gemini 1.5 Pro did indeed accomplish what RAG could not.

Will long-context models kill RAG?

"A model with a 10 million token context window makes most existing RAG frameworks unnecessary, that is, 10 million token context kills RAG ," Fu Yao, a doctoral student at the University of Edinburgh, wrote in a post evaluating Gemini 1.5 Pro.


RAG stands for "Retrieval-Augmented Generation". It typically consists of two stages: retrieving context-relevant information, and using the retrieved knowledge to guide the generation process. For example, as an employee you might ask a large model directly, "What are the penalties for lateness at our company?" Without having read the employee handbook, the model has no way to answer. With RAG, however, a retrieval model first searches the handbook for the most relevant passages, and then your question, together with those passages, is sent to the generation model, which produces the answer. This works around the fact that the context windows of many earlier large models were too small (too small to hold the employee handbook, for instance), but the RAG method falls short at capturing the subtle connections between different parts of the context.
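
To make the two stages concrete, here is a minimal sketch of such a pipeline in Python. Everything in it is illustrative: `embed`, the fixed-size chunking, and the `llm` callable are hypothetical stand-ins, not any particular library's API.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; in practice this would call a real
    embedding model (a sentence-transformer, an embeddings API, etc.)."""
    raise NotImplementedError

def build_index(document: str, chunk_size: int = 500) -> list[tuple[str, np.ndarray]]:
    # Offline stage: split the handbook into fixed-size chunks and embed each one.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, index: list[tuple[str, np.ndarray]], top_k: int = 3) -> list[str]:
    # Stage 1: rank chunks by cosine similarity to the question, keep the best few.
    q = embed(question)
    def score(item):
        _, v = item
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return [chunk for chunk, _ in sorted(index, key=score, reverse=True)[:top_k]]

def rag_answer(question: str, index, llm) -> str:
    # Stage 2: hand the question plus the retrieved passages to the generator.
    context = "\n\n".join(retrieve(question, index))
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # `llm` is any text-completion callable
```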

Fu Yao believes that if a model can directly process 10 million tokens of context, there is no need for a separate retrieval step to find and stitch together relevant information: users can simply put all the data they need into the context and interact with the model as usual. "The large language model itself is already a very powerful retriever, so why bother building a weak retriever and spending so much engineering effort on chunking, embedding, indexing, and so on?" he wrote.
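
By contrast with the RAG sketch above, the long-context approach Fu Yao describes collapses the whole pipeline into a single prompt. A hedged sketch (the `llm` callable is again a stand-in):

```python
def long_context_answer(question: str, documents: list[str], llm) -> str:
    # No chunking, embedding, or indexing: put all the data in the context
    # and let the model itself act as the retriever.
    context = "\n\n".join(documents)
    return llm(f"{context}\n\nQuestion: {question}")
```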

However, Fu Yao's view was pushed back on by many researchers. He conceded that many of the objections are reasonable, and systematically summarized them:

1. Cost: Critics point out that RAG is cheaper than long-context models. Fu Yao acknowledged this, but drew on the history of other technologies to argue that while low-cost models (such as BERT-small or n-grams) are indeed cheap, in the history of AI the cost of advanced techniques always falls eventually. His position is to pursue the performance of smart models first and cut costs through technological progress later, because it is much easier to make smart models cheap than to make cheap models smart.

2. Integration of retrieval and reasoning: Fu Yao emphasized that a long-context model can interleave retrieval and reasoning throughout the decoding process, whereas RAG retrieves only once, at the start. A long-context model can retrieve at every layer and every token, meaning it can dynamically decide what to look up based on its intermediate reasoning, integrating retrieval and inference much more tightly (a simplified illustration follows this list).

3. Number of supported tokens: RAG can index trillions of tokens, while long-context models currently handle millions. Fu Yao believes that in the natural distribution of input documents, most retrieval scenarios involve fewer than a million tokens. He cited legal document analysis and studying machine learning as examples where the input would not exceed the million-token scale.

4. Caching: Regarding the objection that a long-context model must re-ingest the entire document for every query, Fu Yao pointed out that the KV (key-value) cache mechanism addresses this: with a well-designed cache and memory hierarchy, the input only needs to be read once, and subsequent queries can reuse the KV cache (see the sketch after this list). He acknowledged that KV caches can be large, but is optimistic that efficient KV-cache compression algorithms will emerge.

5. The need to call search engines: He admitted that in the short term, calling search engines for retrieval remains necessary. But he floated a bold idea: letting the language model directly ingest Google's entire search index, which reflects how far he thinks the technology could go.

6. Performance: Fu Yao admitted that the current Gemini 1.5 is slow when processing a 1M-token context, but he is optimistic about speedups and believes long-context models will eventually approach RAG-level latency.
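
On point 2, one way to read "retrieval at every layer and every token" is that self-attention already performs a soft lookup over the entire context. A simplified NumPy illustration (single head, no masking):

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Each query token scores every cached context token (Q @ K.T), turns the
    # scores into a soft distribution, and pulls back a weighted mixture of
    # values: a retrieval step repeated at every layer, for every token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```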
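On point 4, the reuse idea can be sketched with Hugging Face Transformers' `past_key_values` interface (GPT-2 here is a small stand-in model, and `long_document` and `question` are assumed inputs): the document is encoded once, and each later query decodes on top of the saved cache instead of re-reading the whole input.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

long_document = "..."   # the text you would otherwise re-send on every query
question = "..."

# Pay the cost of reading the document once, keeping its key/value cache.
doc_ids = tok(long_document, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(doc_ids, use_cache=True)
cache = out.past_key_values

# Subsequent queries reuse the cache instead of re-reading the document.
q_ids = tok(question, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(q_ids, past_key_values=cache, use_cache=True)
```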

Beyond Fu Yao, many other researchers have weighed in on RAG's prospects on X, among them AI blogger @elvis.

Overall, @elvis does not think long-context models can replace RAG, for several reasons:

1. Challenging data types: @elvis described scenarios where the data has complex structure, changes regularly, and carries a significant time dimension (e.g. code edits and web logs). Such data may be linked to historical data points, and to more data points in the future. He believes that today's long-context language models alone cannot handle use cases built on such data, because it may be too complex for the LLM and may exceed what the current maximum context window can feasibly hold; working with it will likely still require some clever retrieval mechanism.

2. Handling dynamic information: Today's long-context LLMs perform well on static information (books, video recordings, PDFs, and so on), but they have yet to be tested in practice on highly dynamic information and knowledge. @elvis believes that while we will make progress on some of the challenges (such as "lost in the middle") and on more complex structured and dynamic data, we still have a long way to go.

3. Combining the two: @elvis proposed that to solve these kinds of problems, RAG and long-context LLMs can be combined into a single powerful system that retrieves and analyzes key historical information effectively and efficiently. He stressed that even this may not be enough in many cases, especially because large volumes of data can change rapidly, and AI-based agents add further complexity. For complex use cases, he believes the answer will most likely be a combination of these ideas, rather than a general-purpose or long-context LLM replacing everything.

4. The need for different types of LLMs: @elvis pointed out that not all data is static; much of it is dynamic. When considering these applications, keep in mind the three Vs of big data: velocity, volume, and variety. He learned this lesson working at a search company. Different types of LLMs will help solve different types of problems, and we need to move away from the idea that a single LLM will rule them all.

@elvis closed by quoting Oriol Vinyals (VP of Research at Google DeepMind): even now that we can handle contexts of a million or more tokens, the era of RAG is far from over. RAG has some very nice properties that long-context models can enhance, and the reverse is also true. RAG lets us find relevant information, but the way the model accesses that information may be overly restricted by data compression; long-context models can help bridge that gap, somewhat like the way L1/L2 caches and main memory cooperate in a modern CPU: cache and main memory play different but complementary roles, together increasing speed and efficiency. In the same way, using RAG and long context together enables more flexible and efficient retrieval and generation, drawing on the strengths of each to handle complex data and tasks.
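
In code, that cache-and-main-memory analogy might look something like the hedged sketch below, reusing the `retrieve` helper from the earlier RAG sketch: retrieval narrows a huge corpus down to a large candidate set, and a long-context model then reasons over all the candidates in one pass, preserving the cross-chunk connections a top-3 RAG pipeline would lose.

```python
def hybrid_answer(question: str, index, llm_long_context, top_k: int = 50) -> str:
    # RAG as "main memory": coarse retrieval over the full corpus.
    candidates = retrieve(question, index, top_k=top_k)
    # Long context as "cache": the model reads the whole candidate set at
    # once, so it can connect facts spread across many retrieved chunks.
    context = "\n\n".join(candidates)
    return llm_long_context(f"{context}\n\nQuestion: {question}")
```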

So whether "the era of RAG is over" remains undecided. But many people say that, as an ultra-long-context model, Gemini 1.5 Pro really is underrated, and @elvis shared his own test results.

Gemini 1.5 Pro: a preliminary evaluation

Long document analysis capabilities

To demonstrate Gemini 1.5 Pro's document processing and analysis abilities, @elvis started with a very basic question-answering task: he uploaded a PDF and asked a simple question, "What is this paper about?"

The model's response was accurate and concise, providing an acceptable summary of the Galactica paper. The example above uses a free-form prompt in Google AI Studio, but you can also interact with an uploaded PDF in chat format, which is very useful when you have many questions to ask of a document.

To take full advantage of the long context window, @elvis next uploaded two PDFs and asked a question spanning both.

Gemini 1.5 Pro's response was reasonable. Interestingly, the information extracted from the first paper (a survey of LLMs) came from a table, and the "architecture" information also looked correct. However, the "performance" section did not belong, because it was not in the first paper. In this task it matters that the prompt, "Please list the facts mentioned in the first paper about the large language model introduced in the second paper," is placed at the top, and that the papers are labeled, e.g. "Paper 1" and "Paper 2". A related follow-up task would be to write a related-work section by uploading a set of papers plus instructions on how to summarize them; another interesting task would be asking the model to incorporate newer LLM papers into a survey.
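
A plausible way to structure such a prompt, with the instruction first and the sources labeled (the exact wording is illustrative, not what @elvis used, and `paper_1_text` / `paper_2_text` are hypothetical variables holding the PDFs' text):

```python
prompt = (
    "Please list the facts mentioned in the first paper about the large "
    "language model introduced in the second paper.\n\n"   # instruction at the top
    f"Paper 1:\n{paper_1_text}\n\n"                        # sources explicitly labeled
    f"Paper 2:\n{paper_2_text}"
)
```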

Video Understanding

Gemini 1.5 Pro was trained on multimodal data from the start. @elvis tested several prompts against Andrej Karpathy's recent LLM lecture video:

The second task he gave the model was to produce a concise outline of the lecture (one page long). The answer (edited for brevity) was as follows:

The summary Gemini 1.5 Pro produced is very concise, capturing the lecture's content and key points well.

When specific details matter, be aware that the model can sometimes "hallucinate" or retrieve incorrect information. For example, when asked "What are the FLOPs reported for Llama 2 in the lecture?", it answered "The lecture reports that training Llama 2 70B requires approximately 1 trillion FLOPs," which is inaccurate; the correct answer is "~1e24 FLOPs." The technical report contains numerous examples of these long-context models stumbling on specific questions about videos.

The next task was extracting table information from the video. The model generated a table with some details correct and some incorrect: the columns were right, but one row label was wrong (Concept Resolution should have been Coref Resolution). Testers saw similar inconsistencies on other tables and on other elements such as text boxes.

One interesting example documented in the technical report is the model's ability to retrieve details from a video based on a specific scene or timestamp. In the first example, the tester asked the model where a certain section begins, and it answered correctly.

In the next example, @elvis asked the model to explain a diagram on a slide. The model appears to make good use of the information provided to explain the results in the graph.

[Figure: snapshot of the corresponding lecture slide]

@elvis said he has begun a second round of testing; interested readers can follow along on X.

Source: 51cto.com