Home  >  Article  >  Technology peripherals  >  GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errors

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errors

PHPz
PHPzforward
2023-11-13 20:17:23642browse

GPT-4 solved the famous Internet meme "Chihuahua or blueberry muffin", which once amazed countless people.

However, now it is accused of "cheating"!

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPictures

The pictures that appear in the original question are all used, but the order and arrangement are messed up.

The latest version of GPT-4 is famous for its all-in-one feature. Surprisingly, however, it made errors in the number of images it recognized, and even the Chihuahua, which was originally correctly recognized, also had recognition errors

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPictures

What is the reason why GPT-4 performs well on the original image?

According to UCSC Assistant Professor Xin Eric Wang’s speculation, the reason for conducting this test is because the original images on the Internet are too popular. He believes that GPT-4 has encountered the original answers many times during the training process and successfully memorized them

LeCun, one of the three Turing Award winners, also paid attention to this matter and said:

Be careful about testing on the training set.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Can’t tell the difference between Teddy and fried chicken

How popular is the original picture, not only on the Internet The famous problem has even become a classic problem in the field of computer vision, and has appeared many times in related paper research.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Many netizens have proposed their own test plans regarding the areas where GPT-4’s capabilities are limited, regardless of the impact of the original image

In order to rule out whether the arrangement is too complicated and has any impact, some people changed it to a simple 3x3 arrangement and made a lot of mistakes.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPictures

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPictures

Someone took out some of the pictures and sent them to GPT separately- 4, got a 5/5 accuracy rate.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Xin Eric Wang believes that putting these easily confused images together is at the heart of this challenge

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

In the end, someone successfully used the two key techniques of letting the artificial intelligence "take a deep breath" and "think step by step" at the same time, and got the correct results

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

GPT-4's wording in the answer "This is an example of a visual pun or a famous meme" also reveals that the original image may indeed exist in the training data. Rephrased as follows: However, GPT-4 used in its answer: "This is an example of a visual pun or a famous meme", which also reveals that the original image may indeed exist in the training data

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Finally, someone also tested the "Teddy or fried chicken" test that often appears together, and found that GPT-4 cannot distinguish well.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

This "blueberry or chocolate bean" is a bit too much...

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Visual illusion has become a popular direction

The "nonsense" of large models is called an illusion problem in academia, multi-modal large models The problem of visual hallucinations has become a hot research direction recently.

In a study at EMNLP 2023, we created the GVIL dataset, which contains 1,600 data points, and conducted a systematic evaluation of the problem of visual illusions

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Studies show that larger scale models are more susceptible to illusions and are closer to human perception

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Another recent study focuses on assessing two types of illusions: bias and interference

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

  • Bias refers to model tendencies Certain types of responses may be caused by imbalances in the training data.
  • Interference may occur due to the way the text prompt is worded or the way the input image is presented.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

The study pointed out that GPT-4V often gets confused when interpreting multiple images together, and performs better when sending images separately, consistent with Observations from the “Chihuahua or Waffle” test.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Popular mitigation measures, such as self-correction and thought chain prompts, do not effectively solve these problems, and testing shows that LLaVA and Bard, etc. Modal models also have similar problems

In addition, research also found that GPT-4V is better at interpreting images with Western cultural backgrounds or images with English text.

For example, GPT-4V can correctly count the seven dwarfs Snow White, but it counts the seven gourd dolls into 10.

GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errorsPicture

Reference link: [1]https://twitter.com/xwang_lk/status/1723389615254774122[2]https://arxiv. org/abs/2311.00047[3]https://arxiv.org/abs/2311.03287

The above is the detailed content of GPT-4 was exposed as cheating! LeCun calls for caution when testing on training set, chihuahua or muffin order confusion leads to errors. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete