PromptPG: When reinforcement learning meets large-scale language models

Mathematical reasoning is a core ability of human intelligence, but abstract thinking and logical reasoning remain a major challenge for machines. Large-scale pre-trained language models such as GPT-3 and GPT-4 have made significant progress on text-based mathematical reasoning, for example math word problems. It is still unclear, however, whether these models can handle more complex problems that involve heterogeneous information such as tabular data. To fill this gap, researchers from UCLA and the Allen Institute for Artificial Intelligence (AI2) introduced Tabular Math Word Problems (TabMWP), a dataset of 38,431 open-domain problems that require mathematical reasoning over both text and tabular data to reach the correct answer. Each question in TabMWP comes with a tabular context, which is provided as an image, as text, or in a structured format.

The researchers evaluated several pre-trained models on TabMWP, including Few-shot GPT-3. As earlier work has found, Few-shot GPT-3 depends heavily on the choice of in-context examples, so its performance is quite unstable when the examples are selected at random. This instability is even more severe on complex reasoning problems like those in TabMWP. To address it, the authors proposed PromptPG, which casts example selection as a contextual bandit problem in reinforcement learning and uses policy gradient to train a policy network that learns to select the best in-context examples from a small amount of training data. Experimental results show that PromptPG exceeds the best baseline (Few-shot-CoT GPT-3) by 5.31% in answer accuracy, and that it significantly reduces prediction variance compared with randomly selected in-context examples, which makes this type of method more stable.


  • Paper link: https://arxiv.org/abs/2209.14610
  • Code link: https://github.com/lupantech/PromptPG
  • Project homepage: https://promptpg.github.io
  • Data visualization: https://promptpg.github.io/explore

1. TabMWP data set

The paper illustrates TabMWP with two examples: a free-text question with a numerical answer and a multi-choice question with a text answer. Each question comes with a solution that spells out the reasoning step by step. To solve problems in TabMWP, a system must be capable of both table lookup and multi-step mathematical reasoning. In the first example, to answer "how much will she spend" (Tracy buys three kinds of bread), the system must first look up the prices of the three kinds of bread in the table, then calculate the cost of each kind, and finally sum these costs to get the total.
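
The lookup-then-compute pattern described above can be sketched in a few lines of Python. The table contents, item names, and quantities below are hypothetical and chosen only for illustration; they are not taken from TabMWP.

```python
# Hypothetical price table (dollars per loaf); values are illustrative only.
price_per_loaf = {"rye bread": 3.0, "sourdough": 4.5, "whole wheat": 2.5}

# Hypothetical shopping list: (item, number of loaves).
order = [("rye bread", 2), ("sourdough", 1), ("whole wheat", 3)]


def total_cost(table, items):
    """Look up each price, multiply by the quantity, then sum the per-item costs."""
    per_item = [table[name] * quantity for name, quantity in items]
    return sum(per_item)


print(total_cost(price_per_loaf, order))
```

Each step of the function mirrors one step of the reasoning chain: table lookup, per-item multiplication, and a final sum.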

According to the dataset statistics, TabMWP contains 38,431 tabular math problems: 74.7% are free-text questions and 25.3% are multi-choice questions. TabMWP has 28,876 unique questions, 6,153 unique answers, and 35,442 unique solutions, which indicates a diverse question distribution. The average question length is 22.1 words and the average solution length is 49.5 words, which reflects the lexical richness of TabMWP. A distinguishing feature of TabMWP is that every problem is accompanied by a table context, without which the problem cannot be solved. TabMWP has 37,644 different tables, with an average table size of 5.9 rows and 2.2 columns (12.9 cells) and a maximum of 54 cells. These statistics show that the tables in TabMWP are diverse as well.

The TabMWP dataset has two question types (free-text and multi-choice) and five answer types.

Every question in TabMWP has a tabular context, which is represented in three formats: an image, semi-structured text, and a structured table. This makes it possible to develop different types of reasoning models.
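
To make the difference between the structured and the semi-structured text formats concrete, here is a minimal sketch that flattens a column-oriented table into delimited text. The column-dictionary layout, the " | " separator, and the function name `table_to_text` are assumptions made for this illustration; the actual TabMWP file format may differ.

```python
def table_to_text(title, columns):
    """Flatten a structured table (column name -> list of cells) into row-wise text."""
    header = list(columns)
    # Transpose the columns so that each tuple is one table row.
    rows = zip(*(columns[name] for name in header))
    lines = [title, " | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


# Hypothetical table used only for illustration.
structured = {"Bread": ["rye", "sourdough"], "Price": ["$3.00", "$4.50"]}
print(table_to_text("Bread prices", structured))
```

A text-only language model can consume the flattened form directly in a prompt, while the structured form is easier to query programmatically.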

Compared with existing datasets, TabMWP requires both table understanding and mathematical reasoning to answer its questions. In addition, TabMWP provides a detailed multi-step solution for each question and has clear advantages in dataset size, table types, question types, and answer types. To the authors' knowledge, TabMWP is the first mathematical reasoning dataset in the open-domain tabular setting.

2. PromptPG method

Given the success of large pre-trained models such as GPT-3 on math word problems, the authors first established a baseline on TabMWP using few-shot GPT-3. They randomly select a few in-context examples from the training set and combine them with the test example to form the prompt from which GPT-3 predicts the answer. However, recent research shows that few-shot learning based on random selection can be very unstable across different choices of in-context examples. Random selection may be even less effective on complex reasoning problems like those in TabMWP, which involve tables of different types and formats.
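
As a rough illustration of the random few-shot baseline, the sketch below samples `k` training examples and concatenates them with the test question. The prompt template, the dictionary keys, and the function name `build_prompt` are hypothetical; the paper's actual prompt format is not reproduced here.

```python
import random


def build_prompt(train_pool, test_example, k=2, seed=None):
    """Build a few-shot prompt from k randomly chosen training examples."""
    rng = random.Random(seed)
    shots = rng.sample(train_pool, k)  # random in-context examples
    blocks = [
        f"Table: {ex['table']}\nQuestion: {ex['question']}\nAnswer: {ex['answer']}"
        for ex in shots
    ]
    # The test example goes last, with the answer left open for the model.
    blocks.append(
        f"Table: {test_example['table']}\nQuestion: {test_example['question']}\nAnswer:"
    )
    return "\n\n".join(blocks)
```

Because the sampled examples change with the seed, two runs on the same test question can produce different prompts, which is the source of the instability discussed above.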

To address this problem, the authors proposed an improved method called PromptPG (prompt learning through policy gradient), which learns to select in-context examples from a small amount of training data. As shown in Figure 2, the policy network learns to find the best in-context examples in a pool of candidate examples, and its optimization goal is to maximize the prediction reward for a given training example when interacting with the GPT-3 environment. The policy network consists of a BERT language model with fixed parameters followed by a single-layer neural network with learnable parameters. Once training is complete, PromptPG can dynamically select different examples from the candidate pool for different test questions, thereby maximizing the reasoning performance of GPT-3.
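
The structure of such a policy (a frozen encoder followed by one learnable linear layer) can be sketched as follows. This is a simplified illustration, not the authors' implementation: `frozen_embed` is a hypothetical stand-in for BERT that uses a fixed bag-of-words hash, and scoring candidates by a dot product followed by a softmax is an assumption of this sketch.

```python
import math


def frozen_embed(text, dim=8):
    """Stand-in for a frozen encoder such as BERT: a fixed bag-of-words hash."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec


def project(weights, vec):
    """The single learnable linear layer; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def selection_probs(weights, problem, candidates):
    """Softmax over dot-product scores between the problem and each candidate."""
    h_problem = project(weights, frozen_embed(problem))
    scores = []
    for cand in candidates:
        h_cand = project(weights, frozen_embed(cand))
        scores.append(sum(a * b for a, b in zip(h_cand, h_problem)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Identity weights: the untrained layer simply passes the embedding through.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
probs = selection_probs(
    identity, "total cost of bread", ["total cost of bread", "median of the leaves"]
)
print(probs)
```

Only `weights` would be updated during training; the encoder stays fixed, which keeps the number of learnable parameters small.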

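The learning algorithm of PromptPG follows the policy gradient recipe: sample in-context examples from the policy, query GPT-3 with the resulting prompt, score the prediction, and update the policy toward selections that earned a higher reward.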
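
A toy version of policy gradient training can make the idea concrete. The sketch below is a deliberately simplified REINFORCE loop, not the PromptPG algorithm itself: it learns one logit per candidate example instead of a contextual policy network, and it replaces GPT-3 with a simulated environment in which each candidate leads to a correct answer with a fixed, hypothetical probability. The reward of +1 for a correct answer and -1 otherwise is likewise an assumption of this sketch.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(success_prob, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit with one learnable logit per candidate example."""
    rng = random.Random(seed)
    logits = [0.0] * len(success_prob)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one candidate from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Simulated environment: +1 if the pretend model answers correctly, else -1.
        reward = 1.0 if rng.random() < success_prob[action] else -1.0
        # The gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


# Hypothetical success rates for three candidate examples.
final_probs = train_selector([0.2, 0.9, 0.4])
print(final_probs)
```

Candidates that tend to produce correct answers receive positive updates and gain probability mass, while the others are pushed down.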

3. Experiment and analysis

Pre-training and fine-tuning

Table 3 compares PromptPG with different baselines on the TabMWP data set. With a similar number of parameters, TAPEX outperforms UnifiedQA because it was pre-trained on tabular data. For both TAPEX and UnifiedQA, increasing the number of model parameters improves prediction accuracy. Fine-tuning the models on TabMWP also improves accuracy substantially.

Large-scale language model

Without any fine-tuning, GPT-3 (Zero-shot GPT-3) achieves accuracy similar to the fine-tuned UnifiedQA and TAPEX models. When Few-shot GPT-3 is prompted with two randomly selected in-context examples, accuracy improves by a further 0.17% over Zero-shot GPT-3. By having Few-shot GPT-3 generate intermediate reasoning steps before the final answer (Few-shot-CoT GPT-3), the researchers obtained the best baseline, with an accuracy of 62.92%.

PromptPG

Instead of selecting in-context examples at random, PromptPG trains a policy network with policy gradient to select more suitable in-context examples, and it achieves the highest accuracy on TabMWP (68.23%). Its average accuracy exceeds the best baseline (Few-shot-CoT GPT-3) by 5.31%. Notably, PromptPG is more accurate on almost all question types, answer types, and question difficulties. Even so, PromptPG still falls well short of the human performance of 90.22%.

Ablation experiment

Table 4 shows that all input elements of a TabMWP problem (question text, table information, and option information) are critical to answering the question correctly. Zero-shot GPT-3 achieves its highest average accuracy (59.50%) only when all of these elements are provided as input.

Different sample selection

As a comparative experiment, the researchers also evaluated other strategies for selecting in-context examples. As shown in Table 5, choosing examples with the same question type or answer type as the test question helps the model find more relevant examples and improves accuracy. Choosing the most complex examples does not consistently improve accuracy. Always using the two best examples among the candidates slightly improves accuracy and reduces variance. Selecting the examples that are semantically closest to the test problem comes closest to PromptPG in accuracy. Overall, PromptPG shows clear advantages in both improving prediction accuracy and reducing prediction variance.
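
The "semantically closest" strategy can be approximated with a nearest-neighbor search over text representations. The sketch below uses bag-of-words cosine similarity as a cheap stand-in for a learned sentence embedding; that substitution, along with the function names and the sample questions, is an assumption made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_examples(test_question, candidates, k=2):
    """Return the k candidate questions most similar to the test question."""
    query = Counter(test_question.lower().split())
    ranked = sorted(
        candidates,
        key=lambda cand: cosine(query, Counter(cand.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidate questions used only for illustration.
candidates = [
    "how much will she spend on bread",
    "what is the median of the numbers",
    "how much does the bread cost",
]
print(nearest_examples("how much will tracy spend on bread", candidates, k=1))
```

Unlike a learned policy, this strategy is fixed: it cannot adapt to which examples actually help the language model answer correctly.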

In an example of PromptPG's selections and the resulting prediction, the method improves the reasoning performance of Few-shot GPT-3 by selecting examples that require mathematical skills similar to those of the test question.

Example of successful prediction

The first example shows PromptPG answering a free-text question correctly. The question requires adding eight numbers in a table and dividing the sum to find the average.

In another example, the model is asked to understand a tax report and calculate the salary after tax deductions.

PromptPG also makes correct predictions on multi-choice questions. In one example, the given table has 9 rows and 6 columns. The model successfully locates the target cell in the table and performs multi-step reasoning to predict the correct answer.

In another example, the model needs to compare the budget with the total cost to verify whether Ariana has enough money.

Example of prediction failure

The first failure case shows PromptPG mispredicting a free-text question. The model retrieved the wrong price for rose quartz and therefore miscalculated the total cost of the three items.

In another example, the question provides an abstract stem-and-leaf table. The model failed to understand this domain-specific table and lacked the advanced logical reasoning it needed, so it produced a wrong answer.
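
For readers unfamiliar with the format, a stem-and-leaf table stores each number as a shared leading part (the stem) plus a final digit (the leaf). The sketch below decodes such a table into plain numbers, assuming two-digit values where the stem is the tens digit and each leaf is a ones digit; the sample data is hypothetical.

```python
def decode_stem_and_leaf(rows):
    """Expand (stem, leaves) rows into the numbers they encode."""
    values = []
    for stem, leaves in rows:
        # Each leaf is the ones digit of a number whose tens digit is the stem.
        values.extend(stem * 10 + leaf for leaf in leaves)
    return values


# Hypothetical stem-and-leaf table: stem 1 with leaves 2 and 5 encodes 12 and 15.
plot = [(1, [2, 5]), (2, [0, 3, 3]), (4, [7])]
print(decode_stem_and_leaf(plot))
```

A model has to perform this decoding implicitly before it can count, compare, or aggregate the values, which helps explain why such tables are difficult.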

Another example suggests that existing models do not seem to be able to sort numbers.

In another example, the table contains no time that exactly matches the current time mentioned in the question, so the model cannot accurately locate the departure time for the next stop.
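
The lookup that the model misses here is a "first entry later than the current time" search rather than an exact match. A minimal sketch with the standard `bisect` module follows; the 24-hour `H:MM` time format, the function names, and the sample schedule are assumptions made for illustration.

```python
from bisect import bisect_right


def to_minutes(hhmm):
    """Convert an 'H:MM' string (24-hour clock) to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def next_departure(schedule, now):
    """Return the first scheduled time strictly later than now, or None."""
    times = sorted(schedule, key=to_minutes)
    position = bisect_right([to_minutes(t) for t in times], to_minutes(now))
    return times[position] if position < len(times) else None


# Hypothetical timetable; 10:00 itself does not appear in it.
print(next_departure(["10:15", "9:30", "11:00"], "10:00"))
```

The binary search works even when the current time is absent from the table, which is exactly the case the model handled incorrectly.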

In another example, the model struggles to carry out arithmetic accurately on a long series of numbers.

4. Conclusion and outlook

The authors proposed TabMWP, the first large-scale data set for mathematical problem solving in a tabular context. TabMWP contains 38,431 open-domain questions covering two question types and five answer types, and each question is annotated with a multi-step solution. The authors ran comprehensive experiments on TabMWP with state-of-the-art QA and TableQA methods in pre-trained and fine-tuned settings, and they also evaluated the large pre-trained language model GPT-3. They further proposed PromptPG, a new reinforcement learning method that uses policy gradient to learn to select the best in-context examples from the training data for prompting GPT-3. Experimental results show that PromptPG significantly outperforms existing baselines and reduces the instability of predictions compared with random selection.
