GPT-4 accused of "cheating"! LeCun warns against testing on the training set after a shuffled "Chihuahua or muffin" grid leads to errors

PHPz

Nov 13, 2023 pm 08:17 PM

bard, gpt-4v, llava

GPT-4 once amazed countless people by solving the famous Internet meme puzzle "Chihuahua or blueberry muffin".

However, now it is accused of "cheating"!

All the pictures from the original puzzle were reused, but their order and arrangement were shuffled.

The latest version of GPT-4 is famous for its all-in-one capabilities. Surprisingly, however, it miscounted the images in the grid and even misidentified Chihuahuas that it had previously recognized correctly.

Why does GPT-4 perform so well on the original image?

UCSC Assistant Professor Xin Eric Wang, who ran this test, speculates that the original image is simply too popular on the Internet. He believes that GPT-4 encountered the original answer many times during training and memorized it.

LeCun, one of the three Turing Award winners for deep learning, also took notice and commented:

Be careful about testing on the training set.
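
The shuffling test described above can be sketched in a few lines of Python. This is only an illustration, not the procedure Xin Eric Wang actually used: the 4x4 label grid, the `shuffle_grid` and `accuracy` helpers, and the `memorizing_model` stand-in are all hypothetical. The idea is that a model reciting a memorized answer scores perfectly on the original layout but has no such guarantee once the same pictures are rearranged.

```python
import random

# Hypothetical ground-truth labels for a 4x4 puzzle grid (not the real meme layout).
ORIGINAL = [
    ["chihuahua", "muffin", "chihuahua", "muffin"],
    ["muffin", "chihuahua", "muffin", "chihuahua"],
    ["chihuahua", "muffin", "chihuahua", "muffin"],
    ["muffin", "chihuahua", "muffin", "chihuahua"],
]

def shuffle_grid(grid, seed):
    """Return a grid of the same shape with the same cells in shuffled order."""
    cols = len(grid[0])
    flat = [cell for row in grid for cell in row]
    random.Random(seed).shuffle(flat)  # seeded so the test layout is reproducible
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]

def accuracy(predicted, truth):
    """Fraction of grid cells whose predicted label matches the truth."""
    pairs = [(p, t) for p_row, t_row in zip(predicted, truth)
             for p, t in zip(p_row, t_row)]
    return sum(p == t for p, t in pairs) / len(pairs)

def memorizing_model(_grid):
    """Stand-in for a model that recites the answer to the original layout."""
    return ORIGINAL

shuffled = shuffle_grid(ORIGINAL, seed=1)
print("original layout:", accuracy(memorizing_model(ORIGINAL), ORIGINAL))
print("shuffled layout:", accuracy(memorizing_model(shuffled), shuffled))
```

A gap between the two scores is a hint, not proof, that the original result came from memorization rather than recognition.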

Can’t tell the difference between Teddy and fried chicken

Just how popular is the original picture? It is not only a famous Internet meme; it has even become a classic problem in computer vision and has appeared many times in related research papers.

Setting aside the influence of the original image, many netizens proposed their own tests to find out exactly where GPT-4's abilities fall short.

To rule out the possibility that the layout was simply too complicated, someone switched to a simple 3x3 grid, and GPT-4 still made many mistakes.

Someone else cropped out some of the pictures and sent them to GPT-4 one at a time, and it got 5 out of 5 correct.

Xin Eric Wang believes that putting these easily confused images together is the heart of the challenge.
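
The "send the pictures one at a time" experiment amounts to cutting the grid into tiles before querying the model. The sketch below shows one way to compute the crop boxes for an evenly spaced grid; `tile_boxes` is a hypothetical helper, and the 300x300 image size is an assumption for illustration. The tuples follow the (left, upper, right, lower) convention used by Pillow's `Image.crop`, although no imaging library is needed to run the sketch.

```python
def tile_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) crop boxes for an evenly spaced grid.

    Boxes are listed row by row, left to right. Integer division keeps the
    boxes contiguous even when the image size is not an exact multiple.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes

# Assumed example: a 300x300 pixel image laid out as a 3x3 grid.
boxes = tile_boxes(300, 300, rows=3, cols=3)
for box in boxes:
    print(box)
```

Each box could then be cropped and submitted as its own image, so the model never sees the confusable neighbours side by side.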

In the end, someone got the correct result by combining two well-known prompting tricks: telling the AI to "take a deep breath" and to "think step by step".

However, GPT-4's answer included the phrase "This is an example of a visual pun or a famous meme", which also suggests that the original image may indeed be in the training data.

Finally, someone also tried the "Teddy or fried chicken" puzzle, which often appears alongside the original, and found that GPT-4 could not tell the two apart well either.

As for this "blueberry or chocolate chip" one, that is asking a bit too much...

Visual hallucination has become a hot research direction

The "nonsense" that large models produce is known in academia as the hallucination problem, and visual hallucination in multimodal large models has recently become a hot research direction.

A study at EMNLP 2023 created the GVIL dataset, which contains 1,600 data points, and systematically evaluated how models respond to visual illusions.

The study found that larger models are more susceptible to illusions and closer to human perception.

Another recent study focuses on assessing two types of hallucination: bias and interference.

Bias refers to a model's tendency to produce certain types of responses, which may be caused by imbalances in the training data.
Interference may arise from the way the text prompt is worded or the way the input image is presented.

The study pointed out that GPT-4V often gets confused when interpreting multiple images together and performs better when the images are sent separately, which is consistent with the observations from the "Chihuahua or muffin" test.

Popular mitigation measures, such as self-correction and chain-of-thought prompting, do not effectively solve these problems, and testing shows that other multimodal models such as LLaVA and Bard have similar problems.

In addition, the study found that GPT-4V is better at interpreting images with Western cultural backgrounds or images containing English text.

For example, GPT-4V correctly counts the seven dwarfs from Snow White, but it counts the seven Calabash Brothers as 10.

Reference links:
[1] https://twitter.com/xwang_lk/status/1723389615254774122
[2] https://arxiv.org/abs/2311.00047
[3] https://arxiv.org/abs/2311.03287

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
