Google launches BIG-Bench Mistake data set to help AI improve error correction capabilities


By WBOY

Jan 16, 2024 06:57 PM

Google Research recently evaluated popular large language models using its own BIG-Bench benchmark and a newly built data set, BIG-Bench Mistake. The study focused on how often the models make reasoning errors and how well they can detect and correct those errors, providing data for understanding how commercially available language models perform.


Google researchers said they built a dedicated benchmark data set, BIG-Bench Mistake, to evaluate how likely large language models are to make errors ("error probability") and how well they can correct themselves ("self-correction ability"). They created it because no earlier data set could effectively measure these key metrics.

The researchers used the PaLM language model to run five tasks from their BIG-Bench benchmark, marked the logical errors in the chain-of-thought traces it generated, and then tested model accuracy on those traces again.

To make the data set more accurate, the researchers repeated this process several times, finally producing a dedicated evaluation benchmark, BIG-Bench Mistake, that contains 255 logical errors.

The researchers noted that the logical errors in BIG-Bench Mistake are clear-cut, which makes the data set a good yardstick for testing language models. It can help models learn from simple errors first and gradually improve their ability to identify mistakes.

When the researchers tested commercially available models on the data set, they found that although most language models can identify logical errors in their reasoning and correct themselves, they do so unreliably. Human intervention is often still needed to fix the model's output.


▲ Image source: Google Research press release

According to the report, Google says that even the large language models currently regarded as the most advanced have relatively limited self-correction ability. In its tests, the best-performing model found only 52.9% of the logical errors.
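The 52.9% figure is a detection rate over the errors in the data set. Purely as an illustration, a score of this kind could be computed as below. This sketch assumes each trace records the step index of its logical error and that the model is asked to name that step. The `Trace` record, its field names, and the exact-match rule are hypothetical, and the report does not describe Google's actual scoring method.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Trace:
    """One chain-of-thought trace (hypothetical record layout, not the real format)."""
    steps: List[str]
    # Index of the annotated logical error, or None if the trace has no error.
    mistake_step: Optional[int]


def mistake_finding_rate(traces: Sequence[Trace],
                         predictions: Sequence[Optional[int]]) -> float:
    """Fraction of annotated errors whose step the model named exactly.

    Only traces that contain an annotated error are counted, so the result
    reads as "share of logical errors found".
    """
    if len(traces) != len(predictions):
        raise ValueError("need one prediction per trace")
    found = total = 0
    for trace, predicted in zip(traces, predictions):
        if trace.mistake_step is None:
            continue  # error-free traces do not count toward this rate
        total += 1
        if predicted == trace.mistake_step:
            found += 1
    return found / total if total else 0.0


if __name__ == "__main__":
    # Toy data: three traces contain an error, and the model finds two of them.
    traces = [
        Trace(["s0", "s1", "s2"], 1),
        Trace(["s0", "s1"], None),
        Trace(["s0", "s1", "s2"], 2),
        Trace(["s0", "s1"], 0),
    ]
    predictions = [1, None, 0, 0]
    print(f"{mistake_finding_rate(traces, predictions):.1%}")  # 66.7%
```

Skipping error-free traces keeps the number comparable to a "share of errors found" claim. A benchmark could also penalize false alarms on clean traces, which would call for a separate metric.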


Google researchers also said that BIG-Bench Mistake can help improve models' self-correction ability. After fine-tuning on related test tasks, "even small models usually perform better than large models given zero-shot prompts."

On this basis, Google believes that for error correction, small dedicated models can be used to "supervise" large models. Rather than teaching a large language model to correct its own mistakes, deploying a small model dedicated to supervising it can improve efficiency, lower AI deployment costs, and make fine-tuning easier.
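The supervision arrangement described above can be sketched as a two-stage pipeline. This is a minimal sketch, not Google's implementation. It assumes the small model is exposed as a per-step scoring function that returns the probability that a step contains a logical error. The names `LargeModel`, `SmallVerifier`, and `supervised_generate` and the 0.5 threshold are all hypothetical. The large model's output is left unchanged, and the small model only reports where it thinks the first error is, so any correction can start from that step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces, not a real library API:
# - the large model maps a question to a list of reasoning steps;
# - the small supervising model returns the probability that step i is wrong.
LargeModel = Callable[[str], List[str]]
SmallVerifier = Callable[[Sequence[str], int], float]


def first_flagged_step(steps: Sequence[str], verifier: SmallVerifier,
                       threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step the verifier flags, or None."""
    for i in range(len(steps)):
        if verifier(steps, i) >= threshold:
            return i
    return None


def supervised_generate(question: str, large_model: LargeModel,
                        verifier: SmallVerifier,
                        threshold: float = 0.5) -> Tuple[List[str], Optional[int]]:
    """Run the large model once, then let the small model check its steps."""
    steps = large_model(question)
    return steps, first_flagged_step(steps, verifier, threshold)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    def toy_large_model(question: str) -> List[str]:
        return ["Start with 2 apples.", "Add 2 more: 2 + 2 = 5.", "Answer: 5."]

    def toy_verifier(steps: Sequence[str], i: int) -> float:
        return 0.9 if "2 + 2 = 5" in steps[i] else 0.1

    steps, flagged = supervised_generate("How many apples?", toy_large_model, toy_verifier)
    print(flagged)  # 1
```

Because the verifier is a separate, small component, it can be fine-tuned or replaced without retraining the large model. That separation is what the efficiency and cost argument above relies on.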


Statement: This article is reproduced from Sohu (搜狐).
