GPT-4 was exposed as cheating! LeCun calls for caution about testing on the training set; shuffling the order of "Chihuahua or muffin" leads to errors

GPT-4 solved the famous Internet meme "Chihuahua or blueberry muffin", which once amazed countless people.

However, now it is accused of "cheating"!

The new test reuses every picture from the original question; only their order and arrangement are shuffled.

The latest version of GPT-4 is known for combining all of its modes in one. Surprisingly, however, it miscounted the pictures in the shuffled image, and it even misidentified Chihuahuas that it had originally recognized correctly.

Why does GPT-4 perform so well on the original image?

According to UCSC Assistant Professor Xin Eric Wang, the motivation for this test is that the original image is extremely popular on the Internet. He speculates that GPT-4 encountered the original answers many times during training and memorized them.
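
One way to probe this kind of memorization is to compare a model's answer on the shuffled grid against both the original answer key and the shuffled one. The sketch below is illustrative only: the layouts, the labels, and the "recited" answer are made-up placeholders, not data from the actual test.

```python
def match_rate(predicted, key):
    """Return the fraction of positions where the prediction agrees with the key."""
    if len(predicted) != len(key):
        raise ValueError("prediction and key must have the same length")
    return sum(p == k for p, k in zip(predicted, key)) / len(key)

# Hypothetical flattened answer keys (assumption: two labels, eight tiles).
original_key = ["chihuahua", "muffin"] * 4
shuffled_key = ["muffin", "muffin", "chihuahua", "muffin",
                "chihuahua", "chihuahua", "muffin", "chihuahua"]

# A made-up model answer that simply recites the original layout.
recited = list(original_key)

rate_vs_original = match_rate(recited, original_key)
rate_vs_shuffled = match_rate(recited, shuffled_key)
print(rate_vs_original, rate_vs_shuffled)
```

An answer that matches the original key far better than the key of the image actually shown is a hint that the model is recalling the meme rather than looking at the picture.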

Yann LeCun, one of the three Turing Award giants of deep learning, also took notice of the matter and said:

Be careful about testing on the training set.

Can’t tell the difference between Teddy and fried chicken

How popular is the original picture? It is not only a famous Internet meme; it has even become a classic problem in computer vision and has appeared many times in related research papers.

Setting aside the influence of the original image, where exactly do GPT-4's capabilities fall short? Many netizens have proposed their own tests.

To rule out the possibility that an overly complicated arrangement was the cause, someone switched to a simple 3x3 arrangement, and GPT-4 still made many mistakes.
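
A rearranged test like this can be generated programmatically. The following is a minimal sketch, assuming the individual tiles are already available with ground-truth labels; the file names, labels, and the `make_shuffled_grid` helper are hypothetical and only show how a fresh layout and its answer key might be produced.

```python
import random

def make_shuffled_grid(tiles, rows, cols, seed=None):
    """Pick rows*cols tiles at random and lay them out as a list of rows."""
    if rows * cols > len(tiles):
        raise ValueError("not enough tiles for the requested grid")
    rng = random.Random(seed)
    # sample() draws without replacement and returns the tiles in random order
    chosen = rng.sample(tiles, rows * cols)
    return [chosen[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical tiles: (file name, ground-truth label)
TILES = [(f"tile_{i:02d}.png", "chihuahua" if i % 2 == 0 else "muffin")
         for i in range(16)]

grid = make_shuffled_grid(TILES, rows=3, cols=3, seed=0)
# The answer key follows the new layout, not the original one.
answer_key = [[label for _, label in row] for row in grid]
```

Fixing the seed makes a given layout reproducible, while changing it yields an arrangement the model is unlikely to have seen before.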

Someone else cropped out some of the pictures and sent them to GPT-4 one at a time, and got 5/5 correct.
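
Cutting a grid image into separate pictures comes down to computing one crop box per cell. The sketch below assumes an evenly divided grid and a hypothetical 900x900 image; the boxes use the `(left, upper, right, lower)` convention that Pillow's `Image.crop` accepts, but the code itself is plain arithmetic and does not load any image.

```python
def tile_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) crop boxes, row by row, for an even grid."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes

# Hypothetical 900x900 image holding a 3x3 grid
boxes = tile_boxes(900, 900, rows=3, cols=3)
print(boxes[0], boxes[4], boxes[-1])
```

Each box can then be cropped and sent to the model as its own image, which is the setting in which the 5/5 result was reported.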

Xin Eric Wang, however, believes that putting these easily confused images together is the heart of this challenge.

In the end, someone got the correct result by using two well-known prompting tricks at the same time: telling the AI to "take a deep breath" and to "think step by step".
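
The exact prompt used is not given here, so the following is only a sketch of how the two phrases might be combined with a question. The `build_prompt` helper, its parameters, and the sample question are all assumptions for illustration.

```python
def build_prompt(question, deep_breath=True, step_by_step=True):
    """Wrap a question with the two optional prompting phrases."""
    parts = []
    if deep_breath:
        parts.append("Take a deep breath.")
    parts.append(question)
    if step_by_step:
        parts.append("Let's think step by step.")
    return " ".join(parts)

# Hypothetical question about the grid image
prompt = build_prompt("For each picture in the grid, is it a Chihuahua or a muffin?")
print(prompt)
```

Keeping the two phrases behind separate flags makes it easy to test each one on its own and see which, if either, changes the result.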

However, GPT-4's wording in its answer, "This is an example of a visual pun or a famous meme", also suggests that the original image may indeed exist in the training data.

Finally, someone also tried the "Teddy or fried chicken" test that often appears alongside it, and found that GPT-4 could not tell the two apart well either.

But this "blueberry or chocolate chip" answer goes a bit too far...

In academia, the "nonsense" produced by large models is called the hallucination problem, and visual hallucination in multimodal large models has recently become a hot research direction.

A study at EMNLP 2023 created the GVIL dataset, which contains 1,600 data points, and systematically evaluated the problem of visual illusions.

The study found that larger models are more susceptible to illusions, and in that respect are closer to human perception.

Another recent study focuses on evaluating two types of hallucination: bias and interference.

  • Bias refers to the model's tendency toward certain types of responses, which may be caused by imbalances in the training data.
  • Interference may occur due to the way the text prompt is worded or the way the input image is presented.

The study pointed out that GPT-4V often gets confused when interpreting multiple images together and performs better when the images are sent separately, which is consistent with the observations from the "Chihuahua or muffin" test.

Popular mitigation measures, such as self-correction and chain-of-thought prompting, do not effectively solve these problems, and testing shows that other multimodal models such as LLaVA and Bard have similar problems.

In addition, the study found that GPT-4V is better at interpreting images with Western cultural backgrounds or images containing English text.

For example, GPT-4V can correctly count the seven dwarfs in Snow White, but it counts the seven Calabash Brothers (Huluwa, literally "gourd dolls") as 10.

Reference links:
[1] https://twitter.com/xwang_lk/status/1723389615254774122
[2] https://arxiv.org/abs/2311.00047
[3] https://arxiv.org/abs/2311.03287

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to request deletion.