Google's 10M context window is killing RAG? Is Gemini underrated after being stolen away from the limelight by Sora?

If any company deserves to be called the most unlucky one lately, it must be Google: its Gemini 1.5 had barely been released before OpenAI's Sora stole its thunder, making Google the "Wang Feng" of the AI industry (a Chinese singer famous for always having his headlines upstaged).

Specifically, what Google launched this time is the first version of Gemini 1.5 released for early testing: Gemini 1.5 Pro. It is a medium-sized multimodal model (spanning text, video, and audio) with performance comparable to Google's largest model to date, 1.0 Ultra, and it introduces a groundbreaking experimental capability in long-context understanding. It can stably handle up to 1 million tokens (equivalent to 1 hour of video, 11 hours of audio, more than 30,000 lines of code, or 700,000 words), with a tested limit of 10 million tokens (roughly the length of the "Lord of the Rings" trilogy), setting a record for the longest context window.

Moreover, given only a 500-page grammar book, 2,000 bilingual dictionary entries, and about 400 additional parallel sentences, it can learn to translate a low-resource language (one with essentially no material available on the internet) at a level close to that of a human learner.

Many people who have used Gemini 1.5 Pro consider the model underrated. One user ran an experiment: he fed a complete code base downloaded from GitHub, together with its open issues, into Gemini 1.5 Pro. The results were surprising: not only did the model understand the entire code base, it also identified the most urgent issues and fixed them.

In another code-related test, Gemini 1.5 Pro demonstrated excellent search capability, quickly locating the most relevant examples in the code base. It also showed strong comprehension, accurately finding the code that controls animations and offering personalized code suggestions. Likewise, Gemini 1.5 Pro demonstrated strong cross-modal ability: it could pinpoint demo content from a screenshot and provide guidance for editing the image code.

A model like this deserves everyone's attention. What is more, Gemini 1.5 Pro's ability to handle ultra-long contexts has led many researchers to ask: is the traditional RAG approach still necessary?

One X user said that in a test he ran, Gemini 1.5 Pro, with its ultra-long context support, indeed accomplished things that RAG could not.

Is RAG going to be killed by long-context models?

"A model with a 10 million token context window makes most existing RAG frameworks unnecessary, that is, 10 million token context kills RAG ," Fu Yao, a doctoral student at the University of Edinburgh, wrote in a post evaluating Gemini 1.5 Pro.

RAG is short for "Retrieval-Augmented Generation". RAG typically consists of two stages: retrieving context-relevant information, and using the retrieved knowledge to guide the generation process. For example, as an employee you might ask a large model directly, "What are the penalties for lateness at our company?" Without having read the employee handbook, the model has no way to answer. With RAG, however, a retrieval model first searches the employee handbook for the most relevant passages, and then your question and those passages are sent together to a generation model, which produces the answer. This solves the problem that the context windows of many earlier large models were too small (too small to hold the employee handbook, for example), but the RAG approach is weaker at capturing subtle connections across the context.
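To make the two stages concrete, here is a minimal sketch of that flow in Python. TF-IDF stands in for a neural embedding model, and the final prompt string stands in for the call to a generation model; a real system would use a vector index and an actual LLM API.

```python
# Stage 1 (retrieve) and stage 2 (generate) of the RAG flow described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

handbook_chunks = [
    "Employees arriving more than 15 minutes late receive a written warning.",
    "Three written warnings in one quarter trigger a salary deduction.",
    "Remote-work days must be approved by your direct manager.",
]

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Stage 1: rank handbook chunks by similarity to the question."""
    matrix = TfidfVectorizer().fit_transform(chunks + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Stage 2: pass the question plus retrieved passages to the generator."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What are the penalties for lateness?", handbook_chunks))
```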

Fu Yao believes that if a model can directly process 10 million tokens of contextual information, there is no need for a separate retrieval step to find and integrate relevant information: users can simply put all the data they need into the model's context and interact with it as usual. "The large language model itself is already a very powerful retriever, so why bother building a weak retriever and spending so much engineering effort on chunking, embedding, indexing, and so on?" he continued.
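By contrast with the pipeline above, the workflow Fu Yao describes collapses everything into a single prompt. A minimal sketch of that idea (the resulting prompt is assumed to be sent to a model with a sufficiently large window, such as Gemini 1.5 Pro):

```python
# The long-context alternative: no chunking, embedding, or indexing.
# The model itself is trusted to do the retrieval internally.
def long_context_prompt(question: str, documents: list[str]) -> str:
    corpus = "\n\n".join(documents)  # the entire corpus, unchunked
    return f"{corpus}\n\nQuestion: {question}"
```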

However, Fu Yao's view was challenged by many researchers. He acknowledged that many of the objections are reasonable, and systematically organized them:

1. Cost: Critics point out that RAG is far cheaper than long-context models. Fu Yao acknowledged this, but drew on the development history of other technologies: low-cost models (such as BERT-small or n-gram models) are indeed cheap, yet throughout the history of AI the cost of advanced technology has always eventually fallen. His position is to pursue the performance of smart models first and drive costs down later through technical progress, because it is much easier to make smart models cheap than to make cheap models smart.

2. Integration of retrieval and reasoning: Fu Yao emphasized that long-context models can interleave retrieval and reasoning throughout the entire decoding process, whereas RAG retrieves only once, at the beginning. A long-context model can retrieve at every layer and every token, which means it can dynamically decide what information to look up based on intermediate inferences, achieving a much tighter integration of retrieval and reasoning.

3. Number of supported tokens: RAG pipelines can index trillions of tokens, while long-context models currently support millions. Fu Yao argues, however, that among naturally distributed input documents, most cases that require retrieval fall below the million-token level. He cited legal document analysis and learning machine learning as examples, arguing that the inputs in such cases do not exceed the million-token scale.

4. Caching: As for the objection that a long-context model must re-ingest the entire document for every query, Fu Yao pointed to the KV (key-value) cache: sophisticated caching and memory hierarchies can be designed so that the input only needs to be read once, after which subsequent queries reuse the KV cache (see the sketch after this list). He conceded that KV caches can be large, but he is optimistic that efficient KV-cache compression algorithms will emerge.

5. The need to call search engines: He admitted that in the short term, calling a search engine for retrieval remains necessary. But he offered a bold idea: let the language model directly ingest the entire Google search index and absorb all of its information, a reflection of how far he thinks AI technology could go.

6. Performance: Fu Yao admitted that the current Gemini 1.5 is slow when processing a 1M-token context, but he is optimistic about speed-ups and believes the speed of long-context models will improve dramatically, perhaps eventually becoming comparable to RAG.
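The caching idea in point 4 can be illustrated with Hugging Face transformers, which expose the KV cache as `past_key_values`. Below is a minimal sketch using GPT-2 as a small stand-in for a long-context model; a production system would add the cache compression and memory hierarchy Fu Yao mentions.

```python
# Read a long document once, cache its keys/values, then answer a follow-up
# question by reusing the cache instead of re-encoding the document.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

doc_ids = tok("(imagine a very long document here)", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(doc_ids, use_cache=True)        # the document is read once
doc_cache = out.past_key_values                 # the reusable KV cache

q_ids = tok(" Q: what are the penalties?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(q_ids, past_key_values=doc_cache, use_cache=True)
next_token_id = out.logits[:, -1].argmax(-1)    # first token of the answer
print(tok.decode(next_token_id))
```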

Beyond Fu Yao, many other researchers have weighed in on RAG's prospects on X, among them the AI blogger @elvis.

Overall, he does not believe long-context models can replace RAG, for reasons that include:

1. The challenge of certain data types: @elvis described a scenario in which the data has a complex structure, changes regularly, and carries a significant time dimension (e.g., code edits and network logs). Such data may be linked to historical data points, and potentially to future ones. He argues that today's long-context language models alone cannot handle use cases that rely on such data: the data may be too complex for the LLM, and the current maximum context window is simply not feasible for it. Handling this kind of data may ultimately require some clever retrieval mechanism.

2. Handling dynamic information: today's long-context LLMs perform well on static information (books, video recordings, PDFs, and so on), but they have yet to be tested in practice on highly dynamic information and knowledge. @elvis believes that while we will make progress on some challenges (such as "lost in the middle") and on more complex structured and dynamic data, we still have a long way to go.

3. Combining the two: @elvis proposed that RAG and long-context LLMs can be combined into a powerful system that retrieves and analyzes critical historical information effectively and efficiently. He stressed that even this may not be enough in many cases, particularly because large amounts of data can change rapidly, and AI-based agents add still more complexity. For complex use cases, the answer will most likely be a combination of these ideas, rather than a general-purpose long-context LLM replacing everything.

4. The need for different types of LLMs: @elvis pointed out that not all data is static; a great deal of it is dynamic. When considering these applications, keep in mind the three Vs of big data: velocity, volume, and variety. He learned this lesson working at a search company. He believes different types of LLMs will help solve different types of problems, and that we need to move away from the idea that one LLM will rule them all.

@elvis closed by quoting Oriol Vinyals (VP of Research at Google DeepMind): even now that we can handle contexts of 1 million tokens or more, the era of RAG is far from over. RAG has some very nice properties; not only can they be enhanced by long-context models, long-context models can in turn be enhanced by RAG. RAG lets us find relevant information, but the way the model accesses that information may become too restricted by data compression. Long context can help bridge that gap, somewhat like the way L1/L2 caches and main memory cooperate in a modern CPU: cache and main memory play different roles but complement each other, increasing speed and efficiency. In the same way, using RAG and long context together enables more flexible and efficient information retrieval and generation, exploiting the strengths of each to handle complex data and tasks.
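A minimal sketch of such a hybrid, reusing the `retrieve` ranker from the RAG sketch above: a coarse retrieval stage narrows the corpus to a handful of relevant documents, and a long-context model then reads those documents whole rather than as compressed snippets. The function and parameter names here are illustrative, not any particular library's API.

```python
# Hybrid RAG + long context: retrieval plays the role of main memory
# (find the right documents), while the long context window plays the
# role of cache (hold them in full while the model reasons over them).
def hybrid_prompt(question: str, corpus: list[str], top_k: int = 5) -> str:
    docs = retrieve(question, corpus, top_k=top_k)  # coarse RAG stage
    context = "\n\n".join(docs)                     # whole documents, unchunked
    return f"{context}\n\nQuestion: {question}"     # sent to a 1M-token model
```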

It seems the question of whether "the era of RAG is coming to an end" remains undecided. But many people say that Gemini 1.5 Pro, as an ultra-long-context model, really is underrated, and @elvis shared his own test results.

Gemini 1.5 Pro preliminary evaluation report

Long document analysis capabilities

To demonstrate Gemini 1.5 Pro's ability to process and analyze documents, @elvis started with a very basic question-answering task: he uploaded a PDF file and asked a simple question, "What is this paper about?"

The model's response is accurate and concise, giving an acceptable summary of the Galactica paper. The example above uses a free-form prompt in Google AI Studio, but you can also interact with an uploaded PDF in chat format, which is very useful when you have many questions to ask about a provided document.
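For readers who want to reproduce this outside AI Studio, here is a minimal sketch using Google's `google-generativeai` Python SDK; the model name, upload flow, and chat call follow that SDK's public interface at the time of writing, and the file name is a placeholder.

```python
# Upload a PDF and ask questions about it with Gemini 1.5 Pro.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
pdf = genai.upload_file("galactica.pdf")            # placeholder file name
model = genai.GenerativeModel("gemini-1.5-pro")

# One-shot, free-form prompt:
print(model.generate_content([pdf, "What is this paper about?"]).text)

# Or the chat format, for a series of follow-up questions:
chat = model.start_chat()
print(chat.send_message([pdf, "What datasets does it use?"]).text)
```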

To take fuller advantage of the long context window, @elvis next uploaded two PDFs and asked a question that spanned both.

The response given by Gemini 1.5 Pro is reasonable. Interestingly, the information extracted from the first paper (a survey of LLMs) comes from a table, and the "architecture" information also looks correct. However, the "performance" part does not belong there, because it did not appear in the first paper. In this task it is important to put the prompt "Please list the facts mentioned in the first paper about the large language model introduced in the second paper" at the top, and to label the papers, e.g. "Paper 1" and "Paper 2". A related follow-up task would be to upload a set of papers along with instructions on how to summarize them and have the model write a related-work section; another interesting task would be asking the model to incorporate newer LLM papers into a survey.
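Extending the sketch above to the two-paper test might look like the following; it assumes the `genai` and `model` objects from the previous block, and the file names are again placeholders.

```python
# Cross-document question answering: label the papers explicitly and put
# the instruction first, as the test above found helpful.
paper1 = genai.upload_file("llm_survey.pdf")     # placeholder file names
paper2 = genai.upload_file("new_llm_paper.pdf")

prompt = ("Please list the facts mentioned in the first paper about the "
          "large language model introduced in the second paper.")
response = model.generate_content(
    [prompt, "Paper 1:", paper1, "Paper 2:", paper2]
)
print(response.text)
```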

Video Understanding

Gemini 1.5 Pro is trained on multimodal data from the start. @elvis tested some prompts using Andrej Karpathy's recent LLM lecture video.

The second task he gave the model was to produce a concise outline of the lecture, one page in length.

The summary Gemini 1.5 Pro produced is very concise, capturing the lecture's content and key points well.

When specific details matter, note that the model may sometimes "hallucinate" or retrieve incorrect information for various reasons. For example, when asked "What are the FLOPs reported for Llama 2 in the lecture?", it answered "The lecture reports that training Llama 2 70B requires approximately 1 trillion FLOPs," which is inaccurate; the correct answer is roughly 1e24 FLOPs. The technical report contains numerous examples of these long-context models stumbling when asked specific questions about videos.
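That corrected figure also follows from the common 6ND rule of thumb for training compute (FLOPs ≈ 6 × parameters × training tokens), given Llama 2's reported 2 trillion training tokens:

```python
# Sanity check with the 6*N*D training-compute approximation.
params = 70e9                 # Llama 2 70B parameter count
tokens = 2e12                 # ~2 trillion training tokens (Meta's report)
flops = 6 * params * tokens
print(f"{flops:.1e} FLOPs")   # 8.4e+23, i.e. on the order of 1e24
```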

The next task was to extract table information from the video. The model generated the table with some details correct and others wrong: the columns were right, but the label of one row was wrong ("Concept Resolution" should have been "Coref Resolution"). Testers ran similar extraction tasks on other tables and other elements (such as text boxes) and found similar inconsistencies.

One interesting example documented in the technical report is the model's ability to retrieve details from a video given a specific scene or timestamp. In the first example, the tester asked the model where a certain section begins, and it answered correctly.

In the next example, he asked the model to explain a diagram on a slide. The model appeared to make good use of the information provided to explain the results shown in the chart.

@elvis said he has begun a second round of testing; interested readers can follow along on X.
