
Gary Marcus: Text-generated image systems cannot understand the world and are far from AGI

WBOY
2023-04-09 09:31:03

This article is reproduced from Lei Feng.com. If you need to reprint, please go to the official website of Lei Feng.com to apply for authorization.

Since the advent of DALL-E 2, many people have believed that AI capable of drawing realistic images is a big step towards artificial general intelligence (AGI). OpenAI CEO Sam Altman declared "AGI is going to be wild" when DALL-E 2 was released, and the media have likewise played up what these systems mean for progress toward general intelligence.

But is that really so? Gary Marcus, the well-known AI scholar who habitually pours cold water on AI hype, has expressed his reservations.

Recently, he argued that when evaluating progress toward AGI, the key is whether systems like DALL-E, Imagen, Midjourney, and Stable Diffusion truly understand the world and can reason and make decisions based on that knowledge.

When judging the significance of these systems for AI (both narrow and general), we can ask three questions:

Can these image synthesis systems generate high-quality images?

Can they relate language input to the images they produce?

Do they understand the world behind the images they present?

1 AI does not understand the relationship between language and images

On the first question, the answer is yes; the only caveat is that trained human artists can get better results out of these AI systems than the rest of us.

On the second question, the answer is: not necessarily. These systems perform well on some language inputs. For example, the picture below is the "astronaut on a horse" generated by DALL-E 2:

[Image: DALL-E 2's "astronaut on a horse"]

But on other language inputs, these AIs perform poorly and are easily fooled. For example, Marcus pointed out on Twitter a while ago that these systems have trouble generating an accurate image for "a horse riding an astronaut":

[Image: attempts at "a horse riding an astronaut"]

Deep learning advocates pushed back fiercely. AI researcher Joscha Bach suggested that "Imagen may just use the wrong training set," while machine learning professor Luca Ambrogioni countered that the failure shows "Imagen already has a certain degree of common sense" and therefore refuses to generate something so absurd.


Google scientist Behnam Neyshabur added that, if "asked in the right way," Imagen can draw "a horse riding an astronaut":

[Image: Imagen's "a horse riding an astronaut," obtained with an indirect prompt]

However, Marcus argues that the key question is not whether a system can be made to produce a particular image. Clever people can always find a way to coax a system into drawing a specific picture; the point is that these systems have no deep understanding of the connection between language and images.

2 If it doesn't know what a bicycle wheel is, how can it be called AGI?

The system's understanding of language is only one aspect. Marcus pointed out that judging the contribution of systems like DALL-E to AGI ultimately comes down to the third question: if all these systems can do is convert sentences into images in a hit-or-miss but occasionally stunning way, they may revolutionize human art, but they still would not be truly comparable to, or representative of, AGI.

What makes Marcus despair of these systems' ability to understand the world are recent examples such as graphic designer Irina Blok's "coffee cup with many holes," generated with Imagen:

[Image: Imagen's "coffee cup with many holes"]

Anyone looking at this picture can see that it defies common sense: coffee could not possibly stay in a cup full of holes. Similar examples include:

"Bicycle with square wheels"

[Image: "bicycle with square wheels"]

"Toilet paper covered with cactus spines"

[Image: "toilet paper covered with cactus spines"]

It is easy to say "yes" but hard to say "no": who can say what a thing that does not exist should look like? That is the difficulty of getting an AI to draw the impossible.

But perhaps the system simply "wanted" to draw a surreal image. As DeepMind research professor Michael Bronstein put it, he did not think this was a bad result; he might well have drawn it the same way himself.


So how can this question be settled? Gary Marcus found fresh inspiration in a recent conversation with philosopher Dave Chalmers.

To probe the systems' grasp of parts, wholes, and functions, Marcus devised a task whose success or failure is easy to judge: give the text prompts "Sketch a bicycle and label the parts that roll on the ground" and "Sketch a ladder and label one of the parts you stand on."

What makes this test special is that it does not prompt directly with "Draw a bicycle and mark the wheels" or "Draw a ladder and mark the rungs"; instead, the AI has to infer the right parts from functional descriptions such as "the parts that roll on the ground" and "the parts you stand on," which is a real test of its understanding of the world.
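For readers who want to try this kind of functional-description probe themselves, here is a minimal sketch assuming the Hugging Face diffusers library and a public Stable Diffusion checkpoint; the model name, prompt list, and output file names are illustrative assumptions, not the exact setup Marcus used.

```python
# Minimal sketch: run Marcus-style "functional description" prompts through a
# public Stable Diffusion checkpoint via Hugging Face diffusers.
# Assumptions: the "runwayml/stable-diffusion-v1-5" checkpoint, a CUDA GPU,
# and these particular prompts are for illustration only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "Sketch a bicycle and label the parts that roll on the ground",
    "Sketch a ladder and label one of the parts you stand on",
    "Sketch a person and make the parts that hold things purple",
    "Draw a white bicycle without wheels",  # negation probe
]

for i, prompt in enumerate(prompts):
    # Generate one image per prompt and save it for inspection.
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"probe_{i}.png")
```

Whether each generated image actually labels or colors the right part still has to be judged by eye, which is exactly what Marcus did in the tests below.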

But Marcus's test results show that Craiyon (formerly known as DALL-E mini) is terrible at this kind of task; it does not understand what bicycle wheels or ladder rungs are:


[Images: Craiyon's attempts at the bicycle and ladder prompts]

So is this a problem unique to DALL-E Mini?

Gary Marcus found that it is not. The same failures show up in Stable Diffusion, currently the most popular text-to-image system.

For example, ask Stable Diffusion to "Sketch a person and make the parts that hold things purple," and the result is:

[Image: Stable Diffusion's output for the "parts that hold things" prompt]

Obviously, Stable Diffusion does not understand what human hands are.

Of the next nine attempts, only one (upper right) came close, and even that one was not very accurate:

[Image: nine further Stable Diffusion attempts]

The next test was "Draw a white bicycle and turn the part pushed by the foot orange," and the resulting images were:

[Image: Stable Diffusion's output for the pedal prompt]

So it cannot understand what a bicycle pedal is.

In the test of sketching a bicycle and labeling the part that rolls on the ground, its performance was also poor:

[Image: Stable Diffusion's output for the rolling-part prompt]

If the text prompt contains a negation, such as "Draw a white bicycle without wheels," the result is as follows:

[Image: Stable Diffusion's output for "a white bicycle without wheels"]

This indicates that the system does not understand negation.

Even a prompt as simple as "draw a white bicycle with green wheels," which involves only the part-whole relationship and no complicated syntax or functional reasoning, still produces problematic results:

[Image: Stable Diffusion's output for "a white bicycle with green wheels"]

So, Marcus asks, can a system that does not understand what wheels are or what they are for really be counted as major progress in artificial intelligence?

Gary Marcus also ran a poll on this question, asking: "How much do systems such as Dall-E and Stable Diffusion know about the world they depict?"

86.1% of respondents said these systems do not understand the world well, while only 13.9% thought they understand it to a high degree.


In response, Emad Mostaque, CEO of Stability AI, said he had voted for the low option and admitted that such systems are "just a small piece of the puzzle."


Alexey Guzey of the scientific organization New Science made a similar finding: he asked DALL-E to draw a bicycle, and the result was just a pile of bicycle parts thrown together.

[Image: DALL-E's attempt at drawing a bicycle]

He therefore believes that no current model truly understands what a bicycle is or how it works, and that claims that today's ML models can nearly rival or replace humans are ridiculous.

What do you think?

