GPT-4's ability greatly increased after 'self-reflection', and test performance increased by 30%

April 4 news: OpenAI's latest language model, GPT-4, can not only generate human-like text but also design and run tests to evaluate and improve its own output. This "reflection" technique has produced significant gains for GPT-4 on several difficult benchmarks, improving test performance by roughly 30%.

GPT-4 is the most advanced system OpenAI has released, following GPT, GPT-2 and GPT-3, and is a large multimodal model: it accepts image and text input and produces text output. It is built on deep learning, using artificial neural networks to imitate human writing.

Researchers Noah Shinn and Ashwin Gopinath wrote in the paper: "We have developed a novel technique that allows AI agents to simulate human self-reflection and evaluate their own performance. When completing various tests, GPT-4 adds some extra steps, designing its own tests to check its own answers and identify errors and deficiencies. It then modifies its solution based on its findings."
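The loop the researchers describe (generate an answer, test it, note what went wrong, try again) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `reflect_and_retry`, `toy_generate` and `toy_evaluate` are hypothetical names, and the toy functions stand in for the roles that GPT-4 itself would play.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: in a real system the language model would play both roles.
Generator = Callable[[str, List[str]], str]  # (task, reflections so far) -> candidate source code
Evaluator = Callable[[str], Optional[str]]   # candidate -> error message, or None if checks pass


def reflect_and_retry(task: str, generate: Generator, evaluate: Evaluator,
                      max_rounds: int = 3) -> Tuple[str, int, bool]:
    """Generate a candidate, test it, and retry with the failure notes as extra context."""
    reflections: List[str] = []
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, reflections)
        error = evaluate(candidate)
        if error is None:
            return candidate, round_no, True
        # Record what went wrong so the next attempt can take it into account.
        reflections.append(f"Attempt {round_no} failed: {error}")
    return candidate, max_rounds, False


def toy_generate(task: str, reflections: List[str]) -> str:
    # Toy model: the first attempt has a bug; any feedback leads to the fixed version.
    if not reflections:
        return "def add(a, b):\n    return a + b + 1\n"
    return "def add(a, b):\n    return a + b\n"


def toy_evaluate(candidate: str) -> Optional[str]:
    # Toy self-designed test: run the candidate and check a single known case.
    namespace: dict = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


solution, rounds, passed = reflect_and_retry("Write add(a, b).", toy_generate, toy_evaluate)
print(rounds, passed)  # prints: 2 True
```

In this toy run the first attempt fails its own test, the failure note is fed back, and the second attempt passes.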

In the HumanEval coding test, GPT-4 used a self-reflection loop, and the accuracy increased from 67% to 88%

GPT-4 can design and execute tests to critique its own performance, and as the AlfWorld test results show, this can greatly improve its performance

The research team used this technique to run several performance tests on GPT-4. In the HumanEval test, GPT-4 had to solve 164 Python programming problems it had never seen before; accuracy rose from 67% to 88% with reflection. In the AlfWorld test, the AI has to make decisions and solve multi-step tasks by performing a set of permitted actions in a variety of interactive environments; with reflection, GPT-4's accuracy rose from 73% to 97%, with only 4 tasks failed. In the HotPotQA test, GPT-4 had access to Wikipedia and answered 100 questions that required parsing content and reasoning over multiple supporting documents; accuracy rose from 34% to 54% with reflection.
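The article does not say which measure the headline figure of 30% refers to. As a quick check, the short Python sketch below computes both the absolute gain (in percentage points) and the relative gain for the three reported results. The variable and function names are illustrative only; the before and after accuracies are the ones quoted above.

```python
# Accuracy before and after reflection, in percent, as reported in the article.
results = {
    "HumanEval": (67, 88),
    "AlfWorld": (73, 97),
    "HotPotQA": (34, 54),
}


def gains(before: int, after: int) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = round((after - before) / before * 100, 1)
    return absolute, relative


summary = {name: gains(before, after) for name, (before, after) in results.items()}
for name, (abs_gain, rel_gain) in summary.items():
    print(f"{name}: +{abs_gain} points, {rel_gain}% relative")
# HumanEval: +21 points, 31.3% relative
# AlfWorld: +24 points, 32.9% relative
# HotPotQA: +20 points, 58.8% relative
```

On this reading, the relative gains on HumanEval and AlfWorld are close to 30%, which is one plausible source of the headline number; the absolute gains are 20 to 24 percentage points.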

This research shows that the solution to an AI problem sometimes lies in AI itself. IT House notes that this is somewhat like a generative adversarial network (GAN), a method in which two AIs improve each other's skills: for example, one AI tries to generate pictures that look real, while the other tries to tell which pictures are fake and which are real. In this case, however, GPT is both the writer and the editor, using self-reflection to improve the quality of its own output.

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.