


In natural language processing, much of the information in a prompt is redundant.
If prompts can be compressed effectively, that is roughly equivalent to expanding the context length the model supports.
Existing information-entropy-based methods reduce this redundancy by removing certain words or phrases.
However, information entropy is computed from only a unidirectional context and may therefore miss information that is critical for compression; moreover, the information entropy objective is not fully aligned with the actual goal of prompt compression.
To address these challenges, researchers from Tsinghua University and Microsoft jointly proposed a new data distillation pipeline called LLMLingua-2. It distills knowledge from large language models (LLMs) to compress prompts while ensuring that key information is not lost.
The project has gained 3.1k stars on GitHub.
The results show that LLMLingua-2 can reduce text to 20% of its original length, significantly cutting processing time and cost.
In addition, LLMLingua-2 is 3 to 6 times faster than the previous LLMLingua and other similar techniques.
Paper address: https://arxiv.org/abs/2403.12968
In this process, the raw text is first fed into the model.
The model evaluates the importance of each word and decides whether to retain or discard it, while also taking the relationships between words into account.
Finally, the model selects the words with the highest scores to form a shorter prompt.
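To make the selection step concrete, here is a minimal sketch, assuming per-token retention probabilities have already been produced by the classifier; the function name and the fixed retention rate are illustrative, not the authors' exact implementation:

```python
def select_tokens(tokens, retain_probs, rate=0.2):
    """Keep roughly `rate` of the tokens: pick those with the highest
    retention probability, then restore their original order."""
    k = max(1, int(len(tokens) * rate))
    top = sorted(range(len(tokens)), key=lambda i: retain_probs[i], reverse=True)[:k]
    return " ".join(tokens[i] for i in sorted(top))

# Toy usage: in practice the probabilities come from the trained model.
tokens = ["Please", "kindly", "summarize", "the", "attached", "meeting", "notes"]
probs = [0.30, 0.10, 0.95, 0.20, 0.40, 0.90, 0.85]
print(select_tokens(tokens, probs, rate=0.5))  # -> "summarize meeting notes"
```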
The team tested the LLMLingua-2 model on multiple datasets, including MeetingBank, LongBench, ZeroScrolls, GSM8K, and BBH.
Although the model is small, it achieves significant performance gains on these benchmarks and demonstrates excellent generalization across large language models (from GPT-3.5 to Mistral-7B) and across languages (from English to Chinese).
System prompt:
As an outstanding linguist, you are good at condensing long passages of text into brief expressions by removing unimportant words while retaining as much information as possible.
User prompt:
Please compress the given text into a short expression, such that you (GPT-4) can restore the original text as accurately as possible. Unlike regular text compression, I need you to follow five conditions:
1. Only remove unimportant words.
2. Keep the order of the original words unchanged.
3. Keep the original vocabulary unchanged.
4. Do not use any abbreviations or emoticons.
5. Do not add any new words or symbols.
Please compress the original text as much as possible while retaining as much information as possible. If you understand, please compress the following text: {Text to be compressed}
The compressed text is: [...]
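For illustration, here is a minimal sketch of how such a data-collection call might look with the OpenAI Python SDK; the model name, temperature, and exact prompt wiring are assumptions, not the authors' released collection script:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "As an outstanding linguist, you are good at condensing long passages "
    "of text into brief expressions by removing unimportant words while "
    "retaining as much information as possible."
)

def collect_compression(text: str) -> str:
    """Ask GPT-4 for an extractive compression of `text` under the
    five conditions listed above (delete-only, order preserved, etc.)."""
    user_prompt = (
        "Please compress the given text into a short expression, such that "
        "you (GPT-4) can restore the original text as accurately as possible. "
        "Only remove unimportant words; keep the word order and vocabulary "
        "unchanged; do not use abbreviations or add new words or symbols.\n\n"
        f"Text to compress: {text}\n\nThe compressed text is:"
    )
    response = client.chat.completions.create(
        model="gpt-4",    # assumed; the paper distills from GPT-4
        temperature=0,    # deterministic output for stable labels
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content.strip()
```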
The results show that LLMLingua-2 significantly outperforms the original LLMLingua model and other selective-context strategies across language tasks such as question answering, summarization, and logical reasoning.
Notably, this compression method is equally effective across large language models (from GPT-3.5 to Mistral-7B) and across languages (from English to Chinese).
Moreover, LLMLingua-2 can be deployed with just two lines of code.
Currently, the model has been integrated into the widely used RAG frameworks LangChain and LlamaIndex.
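In practice those two lines look like the following; the sketch uses the llmlingua package and the released LLMLingua-2 checkpoint as published in the project README, so treat the exact identifiers as assumptions if your installed version differs:

```python
from llmlingua import PromptCompressor

long_prompt = "..."  # the prompt you want to shrink

# The two core lines: load the compressor, then compress the prompt.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
result = compressor.compress_prompt(long_prompt, rate=0.33)  # keep ~33% of tokens

print(result["compressed_prompt"])
```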
Implementation Method
To overcome the problems faced by existing information-entropy-based text compression methods, LLMLingua-2 adopts an innovative data distillation strategy.
This strategy extracts essential information from large language models such as GPT-4, achieving efficient compression without losing key content or introducing erroneous information.
Prompt Design
To fully exploit GPT-4's potential for text compression, the key lies in setting precise compression instructions.
That is, when compressing text, GPT-4 is instructed to remove only the less important words from the original text, without introducing any new words in the process.
This ensures that the compressed text remains as faithful and complete as possible with respect to the original.
Annotation and Filtering
Using the knowledge distilled from GPT-4, the researchers developed a novel data annotation algorithm.
This algorithm labels each word in the original text, clearly indicating which words must be retained during compression.
To ensure the high quality of the constructed dataset, they also designed two quality-control mechanisms to identify and exclude low-quality samples.
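Because the compression is purely extractive (words are only deleted, never reordered or rewritten), per-word labels can be derived by greedily aligning the compressed text against the original. The sketch below is a simplification of this idea; the paper's actual annotation algorithm additionally handles casing, subword variants, and ambiguous matches:

```python
def annotate(original_words, compressed_words):
    """Label each original word 1 (retain) or 0 (discard) by scanning
    the compressed text against the original, left to right."""
    labels = [0] * len(original_words)
    j = 0  # cursor into compressed_words
    for i, word in enumerate(original_words):
        if j < len(compressed_words) and word.lower() == compressed_words[j].lower():
            labels[i] = 1
            j += 1
    return labels

orig = "Please kindly summarize the attached meeting notes".split()
comp = "summarize meeting notes".split()
print(annotate(orig, comp))  # [0, 0, 1, 0, 0, 1, 1]
```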
Compressor
Finally, the researchers reformulated text compression as a binary classification task over each token, using a powerful Transformer encoder as the feature extractor.
The encoder understands the text's bidirectional context, allowing it to accurately capture the information critical for compression.
By training on the carefully constructed dataset, the model learns to assign each word a probability that determines whether it should be retained in the final compressed text or discarded.
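A minimal inference sketch with Hugging Face transformers is shown below. The checkpoint name matches the released LLMLingua-2 model, but the label index for "retain" and the 0.5 threshold are illustrative assumptions:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)

text = "Please kindly summarize the attached meeting notes."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits          # shape: (1, seq_len, num_labels)
probs = logits.softmax(-1)[0, :, 1]       # p(retain) per token; index 1 assumed

# Keep tokens whose retention probability clears the threshold.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
kept = [t for t, p in zip(tokens, probs)
        if p > 0.5 and t not in tokenizer.all_special_tokens]
print("".join(kept).replace("▁", " ").strip())  # "▁" marks word starts (XLM-R)
```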
Performance Evaluation
The researchers evaluated LLMLingua-2 on a range of tasks, including in-context learning, text summarization, dialogue generation, multi- and single-document question answering, code generation, and synthetic tasks, covering both in-domain and out-of-domain datasets.
The results show that the method achieves high compression with minimal performance loss and stands out among task-agnostic text compression methods.
In-domain Test (MeetingBank)
The researchers compared LLMLingua-2's performance on the MeetingBank test set against other strong baseline methods.
Although their model is much smaller than the LLaMA-2-7B used in the baselines, it not only significantly improved performance on the question answering and summarization tasks, but also performed on par with the original, uncompressed prompts.
Out-of-domain Test (LongBench, GSM8K, and BBH)
Since the model was trained only on MeetingBank meeting transcripts, the researchers further explored its generalization to other scenarios such as long text, logical reasoning, and in-context learning.
Notably, although LLMLingua-2 was trained on only one dataset, its out-of-domain performance was not only comparable to current state-of-the-art task-agnostic compression methods, but in some cases even better.
Even the smaller model (BERT-base size) achieved performance comparable to the original prompts, in some cases even slightly exceeding them.
While the approach achieved promising results, it still falls short of task-aware compression methods, such as LongLLMLingua on LongBench.
The researchers attribute this gap to the extra information those methods derive from the questions. Their model, however, is task-agnostic, making it an efficient option with good generalizability across deployment scenarios.
Table 4 lists the results of different methods using Mistral-7B-v0.1 as the target LLM.
Compared with the other baselines, the method achieves a significant performance improvement, demonstrating good generalization to this target LLM.
Notably, LLMLingua-2 performs even better than the original prompt.
The researchers speculate that Mistral-7B may not manage long contexts as well as GPT-3.5-Turbo.
By providing short prompts with higher information density, the method effectively improves Mistral-7B's final inference performance.
Table 5 shows the latency of different systems on a V100-32G GPU at various compression ratios.
The results show that LLMLingua-2 has much lower computational overhead than other compression methods and achieves an end-to-end speedup of 1.6x to 2.9x.
In addition, the method can cut GPU memory cost by a factor of 8, reducing the demand for hardware resources.
Context-Aware Observations
The researchers observed that as the compression ratio increases, LLMLingua-2 effectively retains the words that are most informative with respect to the full context.
This is thanks to the bidirectional context-aware feature extractor and a training strategy explicitly optimized for the prompt compression objective.
Finally, the researchers had GPT-4 reconstruct the original prompts from the LLMLingua-2 compressed prompts.
The results show that GPT-4 can effectively reconstruct the original prompt, indicating that no essential information is lost during LLMLingua-2 compression.
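A reconstruction check of this kind could be scripted roughly as follows; the prompt wording is an assumed paraphrase, since the paper's exact reconstruction instruction is not reproduced here:

```python
def reconstruction_prompt(compressed: str) -> str:
    """Build a prompt asking GPT-4 to expand a compressed prompt back
    into the original text (wording assumed for illustration)."""
    return (
        "The following text was compressed by deleting unimportant words, "
        "with word order preserved. Please reconstruct the original text "
        "as accurately as possible:\n\n" + compressed
    )
```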
