
Google releases BIG-Bench Mistake dataset to help AI language models improve self-correction capabilities

王林
2024-01-16 16:39:13


Google Research has used its own BIG-Bench benchmark to build a dataset called "BIG-Bench Mistake", which it uses to evaluate how often popular language models make errors and how well they can correct them. The initiative aims to improve the quality and accuracy of language models and to better support applications in intelligent search and natural language processing.


Google researchers say they created the dedicated "BIG-Bench Mistake" dataset to evaluate the error rates and self-correction abilities of large language models, filling a gap left by the previous lack of datasets for assessing these capabilities.

The researchers ran five tasks from the BIG-Bench benchmark using the PaLM language model. They then modified the generated "chain-of-thought" traces by inserting a "logical error" step, and asked the model again to identify the mistakes in the trace.
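The article does not show how this "mistake finding" step is phrased, so the following is only a minimal sketch of what such a setup could look like: a chain-of-thought trace is split into numbered steps and a model is asked to point at the first step containing a logical error. The prompt wording, the step delimiters, and the `query_model` helper are assumptions for illustration, not Google's actual code.

```python
# Hypothetical sketch of a "mistake finding" prompt over a chain-of-thought trace.
# Prompt format and model call are illustrative assumptions only.

def build_mistake_finding_prompt(question: str, cot_steps: list[str]) -> str:
    """Ask a model to locate the first logically incorrect step in a
    chain-of-thought trace, or report that there is none."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(cot_steps))
    return (
        f"Question: {question}\n"
        f"Reasoning trace:\n{numbered}\n\n"
        "Identify the number of the first step that contains a logical "
        "error. If every step is correct, answer 'No error'."
    )


def query_model(prompt: str) -> str:
    """Placeholder for a call to an LLM API (e.g. PaLM); replace with a
    real client in practice."""
    raise NotImplementedError


if __name__ == "__main__":
    trace = [
        "There are 3 boxes with 4 apples each, so 3 + 4 = 7 apples in total.",  # injected logical error
        "Therefore the answer is 7.",
    ]
    prompt = build_mistake_finding_prompt(
        "How many apples are there in 3 boxes of 4 apples?", trace
    )
    # answer = query_model(prompt)  # expected to point at Step 1
```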

To improve the accuracy of the dataset, the Google researchers repeated this process and assembled a dedicated benchmark dataset, "BIG-Bench Mistake", containing 255 logical errors.

The researchers point out that the logical errors in the "BIG-Bench Mistake" dataset are fairly obvious, so it can serve as a good testing standard: language models can start by practicing on simple logical errors and gradually improve their ability to identify mistakes.

Using this dataset to test models on the market, the researchers found that although most language models can identify logical errors made during reasoning and correct themselves, the process is "not ideal" and often requires human intervention to fix the model's output.

▲ Image source: Google Research press release

According to the report, Google claims that even "the most advanced large language models currently available" have relatively limited self-correction ability: the best-performing model in the tests found only 52.9% of the logical errors.
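The 52.9% figure is a simple accuracy: the fraction of traces for which a model's predicted mistake location matches the human label. A minimal sketch of that calculation follows; the record field names and the toy examples are assumptions, not the actual evaluation code.

```python
# Sketch of mistake-location accuracy: predicted step vs. labelled step.
# None means the trace was judged error-free. Field names are illustrative.

def mistake_location_accuracy(records: list[dict]) -> float:
    """Fraction of records where the predicted mistake location
    matches the human-labelled one."""
    if not records:
        return 0.0
    correct = sum(r["predicted_step"] == r["labelled_step"] for r in records)
    return correct / len(records)


example = [
    {"predicted_step": 3, "labelled_step": 3},     # found the mistake
    {"predicted_step": None, "labelled_step": 2},  # missed it
]
print(f"accuracy = {mistake_location_accuracy(example):.1%}")  # 50.0%
```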

Google researchers also claim that the BIG-Bench Mistake dataset can help improve models' self-correction ability: after fine-tuning on the relevant mistake-finding tasks, "even small models generally perform better than large models with zero-shot prompting."

Accordingly, Google believes that for model error correction, dedicated small models can be used to "supervise" large models: rather than having large language models learn to "correct their own errors", deploying small, specialized models dedicated to supervising large models helps improve efficiency, reduce AI deployment costs, and makes fine-tuning easier.
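One way to read this "small model supervises large model" idea is as a verifier loop: a fine-tuned small model checks the large model's reasoning and triggers a retry when it flags a mistake. The sketch below is a hedged illustration of that pattern under assumed interfaces; both model calls are placeholders, not real APIs or Google's method.

```python
# Hedged sketch: a small fine-tuned verifier supervising a large generator.
# Both model functions are placeholders for real LLM clients.

def generate_with_large_model(question: str) -> list[str]:
    """Placeholder: return a chain-of-thought trace from a large LLM."""
    raise NotImplementedError


def small_verifier_finds_mistake(question: str, trace: list[str]) -> bool:
    """Placeholder: a small model fine-tuned on BIG-Bench Mistake-style
    data returns True if it detects a logical error in the trace."""
    raise NotImplementedError


def answer(question: str, max_retries: int = 2) -> list[str]:
    """Generate with the large model, regenerate while the verifier objects."""
    trace = generate_with_large_model(question)
    for _ in range(max_retries):
        if not small_verifier_finds_mistake(question, trace):
            return trace                                  # verifier is satisfied
        trace = generate_with_large_model(question)      # try again
    return trace                                          # fall back to last attempt
```

The design choice here matches the article's argument: the expensive large model stays frozen, while only the small supervising model needs fine-tuning, which keeps deployment cost and retraining effort low.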

