


GPT-4's capabilities improve markedly with 'self-reflection', raising test performance by 30%
April 4 news: OpenAI's latest language model, GPT-4, can not only generate a wide range of human-like text but also design and execute tests to evaluate and improve its own performance. This "reflection" technique has enabled GPT-4 to achieve significant gains on many difficult tests, with test performance improving by 30%.
GPT-4 is the most advanced system OpenAI has launched, following GPT, GPT-2 and GPT-3, and is a large multimodal model (it accepts image and text input and outputs text). It relies on deep learning, using artificial neural networks to imitate human writing.
Researchers Noah Shinn and Ashwin Gopinath wrote in the paper: "We have developed a novel technique that allows AI agents to simulate human self-reflection and evaluate their own performance. When completing various tests, GPT-4 adds some extra steps that let it design its own tests to check its answers and identify errors and deficiencies. It then revises its solution based on what it finds."
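The loop the researchers describe (attempt, test, note the errors, revise) can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the names `reflect_loop`, `toy_generate` and `toy_evaluate` are hypothetical, and the toy functions stand in for real model calls, which the article does not specify.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures (assumptions; the article does not describe an API):
# a generator maps (task, reflections so far) to a candidate solution,
# an evaluator maps a candidate to an error message, or None if it passes.
Generator = Callable[[str, List[str]], str]
Evaluator = Callable[[str], Optional[str]]


def reflect_loop(task: str, generate: Generator, evaluate: Evaluator,
                 max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Generate a solution, test it, and retry with the failure notes attached."""
    reflections: List[str] = []
    candidate = generate(task, reflections)
    for _ in range(max_rounds):
        error = evaluate(candidate)
        if error is None:
            break  # the candidate passed its checks
        # Record what went wrong so the next attempt can take it into account.
        reflections.append(f"Previous attempt failed: {error}")
        candidate = generate(task, reflections)
    # Note: if max_rounds is exhausted, the last candidate is returned unchecked.
    return candidate, reflections


def toy_generate(task: str, reflections: List[str]) -> str:
    # Stand-in for a model call: buggy first draft, corrected once feedback exists.
    if not reflections:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_evaluate(candidate: str) -> Optional[str]:
    # Stand-in for a self-designed test: run the candidate on one known case.
    namespace: dict = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


solution, notes = reflect_loop("Write add(a, b).", toy_generate, toy_evaluate)
print(solution)
print(notes)
```

In this toy run the first draft fails its check, one reflection is recorded, and the second draft passes. In a real setting both the generator and the test designer would be calls to the model itself.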
In the HumanEval coding test, GPT-4 with a self-reflection loop raised its accuracy from 67% to 88%.
GPT-4 can design and execute tests to critique its own performance; as the AlfWorld test results show, this can greatly improve its performance.
The research team used this technique to run several different performance tests on GPT-4. In the HumanEval test, GPT-4 had to solve 164 never-before-seen Python programming problems. Its original accuracy was 67%; with reflection, accuracy rose to 88%. In the AlfWorld test, the AI has to make decisions and solve multi-step tasks by performing a set of permitted actions in a variety of interactive environments. With reflection, GPT-4's accuracy rose from 73% to 97%, with only 4 tasks failed. In the HotPotQA test, GPT-4 had access to Wikipedia and answered 100 questions that required parsing content and reasoning across multiple supporting documents. Its original accuracy was 34%; with reflection, accuracy rose to 54%.
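The article does not say how the headline "30%" figure was computed. As a rough check under that caveat, the short sketch below takes the before and after accuracies reported above and works out both the gain in percentage points and the relative gain; the helper name `gains` is just an illustration.

```python
# Accuracies (percent) before and after reflection, as reported in the article.
results = {
    "HumanEval": (67, 88),
    "AlfWorld": (73, 97),
    "HotPotQA": (34, 54),
}


def gains(before: float, after: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    relative = 100 * (after - before) / before
    return points, relative


for name, (before, after) in results.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, {relative:.1f}% relative")

# HumanEval: +21 points, 31.3% relative
# AlfWorld: +24 points, 32.9% relative
# HotPotQA: +20 points, 58.8% relative
```

On these numbers the absolute gains are 20 to 24 percentage points, while the relative gains on HumanEval and AlfWorld are a little over 30%, so the headline figure may refer to the relative improvement; that reading is an inference, not something the article states.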
This research shows that solutions to AI problems sometimes rely on AI itself. IT House noted that this is somewhat like a generative adversarial network, a method in which two AIs improve each other's skills: for example, one AI tries to generate pictures that look real, while the other tries to tell which are fake and which are real. In this case, however, GPT-4 is both writer and editor, using self-reflection to improve the quality of its own output.


