


Gary Marcus: Text-to-image systems do not understand the world and are far from AGI
This article is reproduced from Lei Feng.com. If you need to reprint, please go to the official website of Lei Feng.com to apply for authorization.
Since the advent of DALL-E 2, many people have believed that AI capable of drawing realistic images is a big step towards artificial general intelligence (AGI). OpenAI CEO Sam Altman once declared "AGI is going to be wild" when DALL-E 2 was released, and the media are also exaggerating the significance of these systems for the progress of general intelligence.
But is it really so? Gary Marcus, a well-known AI scholar who is fond of pouring cold water on AI hype, has expressed his reservations.
Recently, he argued that when evaluating progress toward AGI, the key is whether systems like DALL-E, Imagen, Midjourney and Stable Diffusion truly understand the world and can reason and make decisions based on that knowledge.
When judging the significance of these systems for AI (both narrow and general), we can ask the following three questions:
Can these image-synthesis systems generate high-quality images?
Can they relate language input to the images they produce?
Do they understand the world behind the images they present?
1 AI does not understand the relationship between language and images
On the first question, the answer is yes; the only caveat is that trained human artists can get better results out of these AI systems than novices can.
On the second question, the answer is: only sometimes. These systems handle certain language inputs well. For example, the following picture is the "astronaut on a horse" generated by DALL-E 2:
But on other language inputs, these AIs perform poorly and are easily fooled. For example, Marcus pointed out on Twitter some time ago that these systems have difficulty generating an accurate image when faced with "a horse riding an astronaut":
Deep learning advocates have pushed back fiercely. AI researcher Joscha Bach suggested that "Imagen may just use the wrong training set", while machine learning professor Luca Ambrogioni countered that this shows "Imagen already has a certain degree of common sense" and therefore refuses to generate something so absurd.
Google scientist Behnam Neyshabur also proposed that, if "asked in the right way", Imagen can draw "a horse riding an astronaut":
However, Marcus believes the key issue is not whether such an image can be coaxed out of the system. Clever people can always find prompts that make the system draw a specific image; the point is that these systems have no deep understanding of the connection between language and images.
2 Don’t know what a bicycle wheel is? How can it be called AGI?
The system's grasp of language is only one aspect. Marcus pointed out that judging the contribution of systems such as DALL-E to AGI ultimately hinges on the third question: if all these systems can do is convert sentences into images in a haphazard but stunning way, they may revolutionize human art, but they are still not comparable to, and do not represent, AGI.
What makes Marcus despair of these systems' ability to understand the world is a series of recent examples, such as graphic designer Irina Blok's image of a "coffee cup with many holes" generated with Imagen:
Anyone looking at this picture will see that it defies common sense: coffee could not help but leak out of the holes. Similar examples include:
"Bicycle with square wheels"
"Toilet paper covered with cactus spines"
It is easy to say "yes" but hard to say "no": who can know what a thing that does not exist should look like? This is what makes it difficult for AI to draw the impossible.
But perhaps the system simply "wanted" to draw a surreal image. DeepMind research professor Michael Bronstein said he did not think it was a bad result; he might well have drawn it the same way himself.
So how can this question be settled? Gary Marcus found new inspiration in a recent conversation with philosopher Dave Chalmers.
To probe the systems' understanding of parts, wholes and functions, Gary Marcus devised a task whose correctness is easier to judge: he gives text prompts such as "Sketch a bicycle and label the parts that roll on the ground" and "Sketch a ladder and label one of the parts you stand on".
What is special about this test is that it does not directly prompt "Draw a bicycle and mark the wheels" or "Draw a ladder and mark the rungs"; instead, the AI has to infer the right parts from functional descriptions such as "the parts that roll on the ground" and "the parts you stand on", which tests its understanding of the world.
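Marcus ran these probes by hand through the systems' public demos. Purely as an illustration (not his actual setup), the sketch below shows how the same kind of function-based prompts could be scripted against an open Stable Diffusion checkpoint via the Hugging Face diffusers library; the checkpoint id, prompt list and file names are illustrative assumptions.

```python
# Minimal sketch (not Marcus's actual setup): feed "part and function"
# probe prompts to Stable Diffusion via diffusers and save the images
# for manual inspection. Assumes torch and diffusers are installed and
# a CUDA GPU is available; the checkpoint id is a placeholder choice.
import torch
from diffusers import StableDiffusionPipeline

probe_prompts = [
    "Sketch a bicycle and label the parts that roll on the ground",
    "Sketch a ladder and label one of the parts you stand on",
]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for i, prompt in enumerate(probe_prompts):
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"probe_{i}.png")  # check by hand: is the right part labeled?
```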
But Marcus’ test results show that Craiyon (formerly known as DALL-E mini) is terrible at this kind of thing. It does not understand what bicycle wheels or ladder rungs are:
So is this a problem unique to DALL-E Mini?
Gary Marcus found that it is not. The same result appears in Stable Diffusion, currently the most popular text-to-image system.
For example, when Stable Diffusion is asked to "Sketch a person and make the parts that hold things purple", the result is:
Obviously, Stable Diffusion does not understand what human hands are.
Of the next nine attempts, only one (in the upper right corner) succeeded, and even that one is not very accurate:
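Repeating a prompt several times and counting the successes, as in the nine attempts above, can be scripted the same way. The sketch below (same hypothetical setup and placeholder checkpoint as before) fixes a different random seed for each sample so the run is reproducible; the success count still has to be judged by a human.

```python
# Minimal sketch (same placeholder checkpoint as above): draw nine
# samples of one probe prompt with fixed seeds, so a human can count
# how many of them actually color the hands purple.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Sketch a person and make the parts that hold things purple"
for seed in range(9):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
    image.save(f"hands_purple_seed{seed}.png")  # score correct / incorrect by hand
```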
The next test was "Draw a white bicycle and turn the part pushed by the foot orange", and the resulting image is:
So it cannot understand what a bicycle pedal is.
In the test of drawing "a sketch of the bicycle and marking the part rolling on the ground", its performance was also poor:
If the text prompt contains a negative word, such as "Draw a white bicycle without wheels", the result is as follows:
This indicates that the system does not understand negation.
Even a prompt as simple as "draw a white bicycle with green wheels", which involves only the part-whole relationship and no complex syntax or functional reasoning, still produces flawed results:
So, Marcus asks, can a system that does not understand what wheels are or what they are used for really be considered a major step in the progress of artificial intelligence?
Gary Marcus also ran a poll on this question, asking: "How much do systems such as Dall-E and Stable Diffusion know about the world they depict?"
86.1% of respondents answered that the systems do not understand the world well, and only 13.9% thought these systems understand it to a high degree.
In response, Emad Mostaque, CEO of Stability AI, said he had voted "not much" and admitted that they are "just a small piece of the puzzle."
Alexey Guzey of the scientific organization New Science made a similar finding. He asked DALL-E to draw a bicycle, but the result was just a pile of bicycle parts thrown together.
He therefore believes that no current model truly understands what a bicycle is or how it works, and that claims that today's ML models can nearly rival or replace humans are ridiculous.
What do you think?
