Beyond PaLM! Peking University Master proposed DiVeRSe, completely refreshing the NLP reasoning rankings

WBOY

Apr 12, 2023 pm 03:37 PM

Large-scale language models, such as GPT-3 with 175 billion parameters and PaLM with 540 billion parameters, can be said to be the cornerstone of modern natural language processing. Pre-trained models provide very strong few-shot learning ability for downstream tasks.

But reasoning tasks remain difficult, especially questions that require multi-step reasoning to reach the correct answer.

Recently, researchers have discovered that a properly designed prompt can guide the model to perform multi-step reasoning before generating the final answer. This method is called chain-of-thought reasoning.

Chain-of-thought prompting increased accuracy on the arithmetic benchmark GSM8K from 17.9% to 58.1%. The self-consistency voting mechanism introduced later raised accuracy further, to 74.4%.

Simply put, a complex reasoning task usually has multiple reasoning paths that reach the correct answer. The self-consistency method samples a set of different reasoning paths from the language model through chain-of-thought prompting, and then returns the most consistent answer among them.
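The voting step of self-consistency can be illustrated with a minimal Python sketch. The function name `majority_vote` and the sample answers are hypothetical and used only for illustration; sampling the reasoning paths from a language model is assumed to have happened already.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths."""
    counts = Counter(answers)
    # most_common(1) returns a list holding the single (answer, count) pair
    # with the highest count.
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled reasoning paths.
sampled_answers = ["18", "18", "26", "18", "26"]
print(majority_vote(sampled_answers))  # prints 18
```

Here every path carries the same weight, which is exactly the limitation that a verifier is meant to address.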

Recently, researchers from Peking University and Microsoft proposed DiVeRSe, a new method built on self-consistency. It contains three main innovations that further improve the model's reasoning capabilities.

Paper link: https://arxiv.org/abs/2206.02336

Code link: https://github.com/microsoft/DiVerSe

First, inspired by the self-consistency idea of "different ideas, same answer", that is, sampling different reasoning paths from the language model, DiVeRSe goes a step further in diversity. Following the principle that "all roads lead to Rome", it uses multiple prompts to generate answers, which produces more complete and complementary reasoning paths.

The researchers first provide 5 different prompts for each question, then sample 20 reasoning paths for each prompt, which yields 100 reasoning paths for each question.

A key issue is how to obtain different prompts. Given a pool of samples, we can draw K samples from it to construct a prompt, and then repeat this 5 times.

If there are not enough samples, self-teaching is used to improve prompt diversity, that is, pseudo reasoning paths are generated from a part of the samples and question-answer pairs.
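The prompt construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: `build_prompts`, the exemplar strings, the pool size of 20 and K = 5 are all hypothetical, and the actual call that samples reasoning paths from a language model is left out.

```python
import random


def build_prompts(exemplar_pool, k, num_prompts=5, seed=0):
    """Build num_prompts prompts, each from k exemplars drawn from the pool."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(num_prompts):
        # Sample without replacement within one prompt; different prompts
        # may still share exemplars.
        exemplars = rng.sample(exemplar_pool, k)
        prompts.append("\n\n".join(exemplars))
    return prompts


# Hypothetical pool of worked exemplars (question plus reasoning).
pool = [f"Q: example question {i}\nA: example reasoning {i}" for i in range(20)]
prompts = build_prompts(pool, k=5)

paths_per_prompt = 20  # reasoning paths sampled per prompt, as in the article
total_paths = len(prompts) * paths_per_prompt
print(len(prompts), total_paths)  # prints 5 100
```

With 5 prompts and 20 sampled paths each, the total matches the 100 reasoning paths per question mentioned in the article.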

Second, when generating a reasoning path, the language model has no mechanism to correct errors made in previous steps, which may corrupt the final prediction. DiVeRSe borrows the idea of a verifier to check the correctness of each reasoning path and to guide the voting mechanism. That is, not all reasoning paths are equally important or good.

Suppose we have 100 reasoning paths for a question, 60 of which conclude "the answer is 110" and 40 of which conclude "the answer is 150". Without a verifier (i.e. the original self-consistency method), "the answer is 110" wins the majority vote, so 110 is treated as the final answer and the 40 reasoning paths that conclude 150 are discarded.

The verifier scores each reasoning path. The scoring function f is trained as a binary classifier. Its input is the question x, the path z and the answer y, and its output is the probability that the path is positive.

With a verifier, assume that the average score of the 60 reasoning paths for "the answer is 110" is 0.3 and the average score of the 40 reasoning paths for "the answer is 150" is 0.8. Then the final answer should be 150, because 40*0.8 = 32 is greater than 60*0.3 = 18.
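The weighted vote in this example can be reproduced with a short Python sketch. The function name `weighted_vote` is hypothetical, and each path is simply given its group's average score as a stand-in for a real verifier output; summing per-path scores is then equivalent to multiplying the path count by the average score.

```python
from collections import defaultdict


def weighted_vote(paths):
    """Pick the answer whose reasoning paths have the largest summed score.

    paths: iterable of (answer, verifier_score) pairs.
    Returns the winning answer and the per-answer score totals.
    """
    totals = defaultdict(float)
    for answer, score in paths:
        totals[answer] += score
    best = max(totals, key=totals.get)
    return best, dict(totals)


# 60 paths answering 110 with score 0.3, 40 paths answering 150 with score 0.8.
paths = [("110", 0.3)] * 60 + [("150", 0.8)] * 40
best, totals = weighted_vote(paths)
print(best)  # prints 150
```

A plain majority vote over the same paths would return 110, so the verifier scores change the outcome.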

Third, since the answer is generated through multiple steps of reasoning, when a path produces a correct answer, all of its steps can be considered to contribute to the final correctness. However, when a wrong answer is generated, it does not mean that all steps were wrong or contributed to the error.

In other words, although the result is wrong, some intermediate steps may still be correct, and it is some later deviating steps that lead to the wrong final answer. DiVeRSe designs a mechanism that assigns a fine-grained label to each step, and proposes a step-aware verifier that judges the correctness of each reasoning step instead of only looking at the final answer.

The main body is still a binary classifier, but the key question is how to obtain step-level negative labels. If the final answer is wrong, we do not know, without human involvement, which step went wrong, whereas if the final answer is correct, the process is assumed to be correct.

The researchers proposed the concept of supports. For example, in arithmetic tasks, a step is supported when another example contains an intermediate result that is the same as the result of that step.
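One plausible way to turn the idea of supports into step-level labels is sketched below. This is an illustration under explicit assumptions, not the authors' implementation: each path is represented as a list of numeric intermediate results, supporting examples are taken from paths that reached the correct answer, and every step from the first unsupported one onward is labeled negative. The name `label_steps` and the data are hypothetical.

```python
def label_steps(path_results, positive_paths):
    """Label each step of a path as 1 (positive) or 0 (negative).

    path_results: intermediate results of the path being labeled.
    positive_paths: intermediate results of paths with a correct final answer.
    Assumption: a step is supported if its result appears in any positive
    path, and all steps from the first unsupported step on are negative.
    """
    supported = set()
    for results in positive_paths:
        supported.update(results)

    labels = []
    still_ok = True
    for result in path_results:
        if still_ok and result not in supported:
            still_ok = False  # first deviation found
        labels.append(1 if still_ok else 0)
    return labels


# Hypothetical intermediate results.
positive_paths = [[5, 15, 30], [10, 15, 30]]
wrong_path = [5, 15, 45, 90]
print(label_steps(wrong_path, positive_paths))  # prints [1, 1, 0, 0]
```

The first two steps of the wrong path keep a positive label because their results also occur in correct paths, so only the deviating steps are treated as negative examples.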

Based on these three improvements, the researchers conducted experiments on 5 arithmetic reasoning data sets. The DiVeRSe method based on code-davinci-002 achieved new SOTA results, with an average improvement of 6.2%.

[...] It is speculated that the reason may be that the commonsense reasoning task is a multiple-choice task rather than an open-ended generation task, which results in more false-positive pseudo-examples.

On the inductive reasoning task, DiVeRSe achieved a score of 95.9% on the CLUTRR task, exceeding the previous SOTA fine-tuning result (28.9%).

In the ablation experiments, the performance improvement from the voting verifier mechanism is relatively clear.

In most experiments, extending the voting verifier to a step-aware version improves performance. For code-davinci-002 on GSM8K, however, the step-aware version of the verifier causes a slight decrease in performance.

The possible reason is that code-davinci-002 is more powerful and can produce higher-quality reasoning paths for GSM8K, which reduces the need for step-level information. That is, text-davinci is more likely to generate short or incomplete reasoning paths, while code-davinci is better suited to generating longer content.

The first author of the paper is Yifei Li. He graduated from Northeastern University with a bachelor's degree in software engineering in 2020 and is currently studying for a master's degree at Peking University. His main research direction is natural language processing, especially prompt-tuning and reasoning in large-scale language models.

The second author is Zeqi Lin, a DKI researcher at Microsoft Research Asia. He received his bachelor's degree and doctorate from Peking University in 2014 and 2019 respectively. His main research direction is machine learning and its applications in software analysis and data analysis.

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
