Long texts can be read with just a 4k window: Chen Danqi's student teams up with Meta on a new method to enhance the memory of large models

王林

Oct 24, 2023 pm 08:13 PM

A large model with a context window of only 4k tokens can still read long passages of text!

The latest work by a Chinese doctoral student at Princeton has "broken through" the context-window limit of large models.

Not only can it answer a wide range of questions, but the entire method runs purely through prompting, without any additional training.

The research team created a tree-structured memory strategy called MemWalker, which lets the model get past the limit of its own context window.

In testing, the longest text the model read contained 12,000 tokens, and its results were significantly better than LongChat's.

Compared with the similar TreeIndex, MemWalker can reason about and answer any question, rather than only produce summaries.

The research and development of MemWalker utilized the idea of "divide and conquer". Some netizens commented:

Every time we make the thinking process of large models more human-like, their performance improves.

So, what exactly is the tree memory strategy, and how does it read long text with a limited context window?

If one window is not enough, open a few more

On the model side, MemWalker uses Stable Beluga 2 as its base model, an instruction-tuned version of Llama 2-70B.

Before settling on this model, the developers compared its performance with that of the original Llama 2.

As its name suggests, MemWalker works by walking through a stream of memory.

Specifically, the process is divided into two stages: memory tree construction and navigation retrieval.

When building the memory tree, the long text is divided into multiple small segments (seg1-6), and the large model summarizes each segment separately, producing the "leaf nodes" (summ1-6).
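The segmentation and leaf-summarization step can be sketched as follows. This is a minimal illustration, not the authors' code: `summarize` is a hypothetical stand-in for the LLM call (it just keeps the first few words), whitespace-separated words stand in for model tokens, and the 1,000-token default mirrors the segment length used in the experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    segment: str  # original text of one segment (seg1, seg2, ...)
    summary: str  # model-written summary of that segment (summ1, summ2, ...)


def summarize(text: str, max_words: int = 8) -> str:
    """Hypothetical stand-in for the LLM call; keeps the first few words so the sketch runs offline."""
    return " ".join(text.split()[:max_words])


def split_into_segments(text: str, segment_tokens: int = 1000) -> List[str]:
    """Cut text into consecutive segments of at most segment_tokens tokens.

    Whitespace-separated words stand in for model tokens (an assumption).
    """
    words = text.split()
    return [" ".join(words[i:i + segment_tokens]) for i in range(0, len(words), segment_tokens)]


def build_leaves(text: str, segment_tokens: int = 1000) -> List[Leaf]:
    # Each segment is summarized on its own, giving one leaf node per segment.
    return [Leaf(segment=s, summary=summarize(s)) for s in split_into_segments(text, segment_tokens)]


text = " ".join(f"w{i}" for i in range(2500))
leaves = build_leaves(text)
print(len(leaves), [len(leaf.segment.split()) for leaf in leaves])  # prints: 3 [1000, 1000, 500]
```

Keeping the original segment next to its summary matters later, because only leaf nodes give the model access to the original text.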

When segmenting, longer segments mean fewer levels in the tree, which benefits the subsequent retrieval. Segments that are too long, however, lower accuracy, so the segment length has to balance both considerations.

The authors consider 500-2,000 tokens a reasonable segment length; the experiments use 1,000 tokens.
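To make the trade-off concrete, the arithmetic below counts tree levels for a 12,000-token text (the longest length mentioned in the tests) at several segment lengths. It is a sketch under an assumed fan-out of 4, meaning each parent summarizes four children; the article does not give the actual grouping, and `tree_levels` is a hypothetical helper.

```python
import math


def tree_levels(total_tokens: int, segment_tokens: int, fan_out: int) -> int:
    """Count memory-tree levels from the leaf level up to and including the root.

    fan_out (how many nodes are summarized into one parent) is an assumption
    for illustration; the article does not specify it.
    """
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    nodes = math.ceil(total_tokens / segment_tokens)  # number of leaf nodes
    levels = 1
    while nodes > 1:
        nodes = math.ceil(nodes / fan_out)  # one summary per group of children
        levels += 1
    return levels


for seg in (500, 1000, 2000):
    print(seg, tree_levels(12_000, seg, fan_out=4))
# prints: 500 4, then 1000 3, then 2000 3
```

Halving the segment length doubles the number of leaves, which can add a level the model must walk through at query time.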

Next, the model recursively summarizes the contents of these leaf nodes to form "non-leaf nodes" (summ7-8).

Leaf and non-leaf nodes also differ in content: leaf nodes contain the original information, while non-leaf nodes hold only summarized, second-hand information.

Functionally, non-leaf nodes are used to navigate to and locate the leaf node that holds the answer, while leaf nodes are used to reason out the answer itself.

Non-leaf nodes can span multiple levels: the model keeps summarizing level by level until it obtains a single "root node", which completes the tree structure.
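A minimal sketch of the recursive construction, assuming a fixed fan-out (the number of child nodes summarized into one parent), which the article does not specify. `summarize` is again a hypothetical stub for the model call, and `Node` and `build_memory_tree` are illustrative names. With six segments and a fan-out of 3, it yields six leaves, two non-leaf nodes, and one root, mirroring the seg1-6 / summ1-6 / summ7-8 example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    summary: str
    children: List["Node"] = field(default_factory=list)
    segment: Optional[str] = None  # original text; only leaf nodes carry it


def summarize(texts: List[str], max_words: int = 8) -> str:
    """Hypothetical stand-in for the LLM summarization call (keeps the first few words)."""
    return " ".join(" ".join(texts).split()[:max_words])


def build_memory_tree(segments: List[str], fan_out: int = 3) -> Node:
    """Summarize segments into leaves, then summarize groups of nodes until one root remains."""
    if not segments:
        raise ValueError("need at least one segment")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    # Leaf nodes: one summary per segment, keeping the original text alongside.
    level = [Node(summary=summarize([seg]), segment=seg) for seg in segments]
    # Non-leaf nodes: summarize each group of fan_out nodes, level by level.
    while len(level) > 1:
        groups = [level[i:i + fan_out] for i in range(0, len(level), fan_out)]
        level = [Node(summary=summarize([n.summary for n in g]), children=g) for g in groups]
    return level[0]  # the root node


segments = [f"seg{i} original text" for i in range(1, 7)]
root = build_memory_tree(segments, fan_out=3)
print(len(root.children), [len(c.children) for c in root.children])  # prints: 2 [3, 3]
```

Non-leaf nodes summarize their children's summaries rather than the original text, which is why they carry only second-hand information.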

After the memory tree is established, you can enter the navigation retrieval stage to generate answers.

In this stage, the model starts from the root node, reads the contents of the next-level child nodes one by one, and then reasons about whether to enter one of these nodes or go back.

Once it decides to enter a node, it repeats the process until it reaches a leaf node. If the content of the leaf node is suitable, the model generates the answer; otherwise it goes back.

To ensure the answer is complete, the process does not end as soon as a suitable leaf node is found; it ends when the model believes it has obtained a complete answer, or when the maximum number of steps is reached.

During the navigation process, if the model finds that it has entered the wrong path, it can also navigate back.
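The navigation loop, including backtracking and the step limit, can be sketched as below. This is an illustrative sketch under assumptions: `decide` stands in for the prompted model, the action encoding is hypothetical, and the toy `keyword_decider` used in the demo simply tries children in order and accepts a leaf whose original text contains the question keyword, whereas MemWalker has the model reason over the child summaries.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Node:
    summary: str
    children: List["Node"] = field(default_factory=list)
    segment: Optional[str] = None  # original text, present only on leaf nodes


# Hypothetical action encoding: a child index to enter, "back", or ("answer", text).
Action = Union[int, str, Tuple[str, str]]


def navigate(root: Node, question: str,
             decide: Callable[[str, Node], Action],
             max_steps: int = 10) -> Optional[str]:
    """Walk the memory tree from the root until an answer is produced or max_steps is hit."""
    path = [root]  # stack of entered nodes; popping it means navigating back
    for _ in range(max_steps):
        node = path[-1]
        action = decide(question, node)
        if isinstance(action, tuple) and action[0] == "answer":
            return action[1]  # the decider judged the answer complete
        if action == "back":
            if len(path) > 1:
                path.pop()  # wrong path: return to the parent
        elif isinstance(action, int) and 0 <= action < len(node.children):
            path.append(node.children[action])  # enter the chosen child
    return None  # step budget exhausted without a complete answer


def keyword_decider() -> Callable[[str, Node], Action]:
    """Toy stand-in for the model: depth-first, answering at the first matching leaf."""
    tried = set()

    def decide(question: str, node: Node) -> Action:
        if not node.children:  # leaf: answer if the original text is suitable, else go back
            if node.segment is not None and question in node.segment:
                return ("answer", node.segment)
            return "back"
        for i in range(len(node.children)):
            if (id(node), i) not in tried:  # enter each child at most once
                tried.add((id(node), i))
                return i
        return "back"  # every child already explored

    return decide


cats = Node(summary="about cats", segment="cats purr")
dogs = Node(summary="about dogs", segment="dogs bark")
root = Node(summary="pets", children=[cats, dogs])
print(navigate(root, "dogs", keyword_decider()))  # prints: dogs bark
```

In the demo, the walker first enters the wrong leaf, goes back to the root, and then finds the answer in the second leaf, using four of its ten steps.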

MemWalker also introduces a working memory mechanism to improve accuracy.

This mechanism adds the content of visited nodes to the context of the current step.

Whenever the model enters a new node, the content of the current node is added to the memory.

This mechanism allows the model to utilize the content of visited nodes at each step to avoid the loss of important information.
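A minimal sketch of the working memory, assuming a simple text layout for the context: `WorkingMemory`, its methods, and the `Visited:`/`Current:`/`Question:` labels are hypothetical. Following the description, the content of the current node is stored each time the model moves into a new node, and every later step sees all stored content. The article does not say how the memory behaves when the model navigates back, so this sketch only appends.

```python
from typing import List


class WorkingMemory:
    """Keeps the content of visited nodes so each navigation step can reuse it."""

    def __init__(self) -> None:
        self.visited: List[str] = []

    def on_enter_child(self, current_content: str) -> None:
        # When the model moves into a new node, the node it is leaving is remembered.
        self.visited.append(current_content)

    def build_context(self, current_content: str, question: str) -> str:
        # Visited content first, then the node being read now, then the question.
        lines = [f"Visited: {c}" for c in self.visited]
        lines.append(f"Current: {current_content}")
        lines.append(f"Question: {question}")
        return "\n".join(lines)


memory = WorkingMemory()
memory.on_enter_child("root: summary of the whole text")
memory.on_enter_child("summ7: higher-level summary")
print(memory.build_context("summ2: summary of segment 2", "Who wrote the letter?"))
```

Because visited content accumulates in the prompt, a real implementation would also have to keep it within the model's 4k window.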

Experimental results show that the working memory mechanism can increase the accuracy of MemWalker by about 10%.

Moreover, the entire process above is carried out purely through prompting; no additional training is required.
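Because every step is driven purely by prompting, each navigation decision amounts to formatting a prompt and parsing the model's reply. The sketch below assumes a hypothetical prompt wording and an `Action: <number>` / `Action: back` reply format; it is not the prompt used in the paper, and the model call itself is omitted.

```python
import re
from typing import List, Optional, Union


def build_navigation_prompt(question: str, child_summaries: List[str]) -> str:
    """Format one navigation step: the child summaries plus the question (hypothetical wording)."""
    options = "\n".join(f"Summary {i}: {s}" for i, s in enumerate(child_summaries))
    return (
        "Below are summaries of parts of a long text.\n"
        f"{options}\n"
        f"Question: {question}\n"
        "Explain your reasoning, then end with 'Action: <summary number>' to read "
        "that part, or 'Action: back' if none of them is relevant."
    )


# Matches a line such as "Action: 2" or "Action: back" (assumed reply format).
_ACTION_LINE = re.compile(r"^Action:\s*(back|\d+)\s*$", re.IGNORECASE | re.MULTILINE)


def parse_action(reply: str) -> Optional[Union[int, str]]:
    """Return the chosen child index, "back", or None if the reply has no valid action line."""
    matches = _ACTION_LINE.findall(reply)
    if not matches:
        return None
    last = matches[-1].lower()  # if several action lines appear, trust the final one
    return "back" if last == "back" else int(last)


prompt = build_navigation_prompt("Why did the hero leave?", ["Childhood", "The journey"])
print(prompt)
print(parse_action("Reasoning: the journey part covers this.\nAction: 1"))  # prints: 1
```

A caller would still need to check that a returned index is within range and re-prompt when `parse_action` returns `None`.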

Theoretically, MemWalker can read infinitely long text as long as it has enough computing power.

However, the time and space complexity when constructing the memory tree becomes exponential as the length of the text increases.

About the author

The first author of the paper is Howard Chen, a Chinese doctoral student in the NLP Laboratory of Princeton University.

Chen Danqi, an alumna of Tsinghua's Yao Class, is Howard's advisor, and her talk at ACL this year was also related to retrieval.

Howard completed this work during his internship at Meta. Three researchers from Meta AI, Ramakanth Pasunuru, Jason Weston and Asli Celikyilmaz, also took part in the project.

Paper address: https://arxiv.org/abs/2310.05029

This article is reproduced from 51CTO.COM.
