'Censored' during image generation: Failure cases of Stable Diffusion are affected by four major factors

Text-to-image diffusion models such as Stable Diffusion, DALL-E 2, and Midjourney are developing rapidly and have strong text-to-image generation capabilities, but failure cases still appear occasionally.

As shown in the figure below, given the text prompt "A photo of a warthog", the Stable Diffusion model generates a clear, realistic photo of a warthog. However, when the prompt is modified slightly to "A photo of a warthog and a traitor", where did the warthog go, and why did it turn into a car?


Let’s take a look at the next few examples. What new species are these?


What causes these strange phenomena? These generation failure cases all come from a recently published paper "Stable Diffusion is Unstable":



  • Paper address: https://arxiv.org/abs/2306.02583

This paper proposes, for the first time, a gradient-based adversarial algorithm for text-to-image models. The algorithm efficiently generates a large number of adversarial text prompts and uses them to probe the instability of the Stable Diffusion model. It achieves an attack success rate of 91.1% on short text prompts and 81.2% on long text prompts. It also provides a rich set of cases for studying the failure modes of text-to-image generation models, laying a foundation for research on controllable image generation.

Based on the large number of failure cases produced by this algorithm, the researchers summarized four causes of generation failure:

  • Difference in generation speed
  • Similarity of coarse-grained features
  • Ambiguity of words
  • The position of the word in the prompt

Difference in generation speed

When a prompt contains multiple generation targets, one of the targets often disappears during the generation process. In theory, all targets within the same prompt share the same initial noise. As shown in Figure 4 of the paper, the researchers generated images for the one thousand ImageNet categories from a fixed initial noise. For each target, they took the image produced at the final step as the reference image and computed the Structural Similarity Index (SSIM) score between the image generated at each time step and that reference, which shows that different targets are generated at different speeds.
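The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `global_ssim` is a simplified single-window SSIM (the standard metric averages over local windows), and `toy_trajectory` is a hypothetical stand-in for a real denoising run that simply blends shared noise toward a final image at a chosen rate.

```python
import random

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as one window.
    x and y are equally sized grayscale images given as flat lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # usual stabilising constants
    c2 = (0.03 * data_range) ** 2
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den

def ssim_curve(frames):
    """SSIM between each intermediate frame and the last frame."""
    final = frames[-1]
    return [global_ssim(f, final) for f in frames]

def toy_trajectory(final, noise, steps, rate):
    """Hypothetical denoising run: linear blend from noise to final.
    rate >= 1 controls how early the target is fully formed."""
    frames = []
    for t in range(steps):
        w = min(1.0, rate * t / (steps - 1))
        frames.append([(1 - w) * n + w * f for n, f in zip(noise, final)])
    return frames

rng = random.Random(0)
noise = [rng.random() for _ in range(256)]       # shared initial noise
target_a = [rng.random() for _ in range(256)]    # made-up final images
target_b = [rng.random() for _ in range(256)]

slow_curve = ssim_curve(toy_trajectory(target_a, noise, steps=11, rate=1.0))
fast_curve = ssim_curve(toy_trajectory(target_b, noise, steps=11, rate=3.0))

for step, (s, f) in enumerate(zip(slow_curve, fast_curve)):
    print(f"step {step:2d}  slow={s:.3f}  fast={f:.3f}")
```

In this toy setup the "fast" target reaches an SSIM near 1 several steps before the "slow" one, which is the kind of gap in generation speed the paragraph describes.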


Coarse-grained feature similarity

During the diffusion generation process, the researchers found that when two types of targets share global or local coarse-grained features, problems arise in computing the cross-attention weights. The two target nouns may attend to the same patch of the same image at the same time, which results in feature entanglement. For example, in Figure 6 of the paper, feather and silver salmon are similar in their coarse-grained features, so feather can still complete its generation task when it continues from the image that silver salmon has produced by the eighth step of the generation process. For two types of targets without entanglement, such as silver salmon and magician, magician cannot complete its generation task on the intermediate image produced by silver salmon.
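A toy calculation may make the entanglement argument more concrete. The sketch below assumes a single attention head with hand-picked 2-dimensional vectors; the patch queries and token keys are invented for illustration and are not taken from Stable Diffusion, which uses learned projections and many heads. It only shows that two tokens with similar keys end up with nearly the same attention map over image patches.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(patch_queries, token_keys):
    """Rows are image patches, columns are text tokens.
    Each row is softmax(q . k / sqrt(d)) over the tokens."""
    d = len(token_keys[0])
    rows = []
    for q in patch_queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in token_keys]
        rows.append(softmax(scores))
    return rows

def token_map(attn, j):
    """Attention that token j receives from every patch."""
    return [row[j] for row in attn]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical values: four patches, three tokens.
patches = [[4.0, 0.0], [3.0, 1.0], [0.0, 4.0], [1.0, 3.0]]
keys = {
    "silver salmon": [1.0, 0.0],
    "feather": [0.9, 0.1],      # close to "silver salmon" by construction
    "magician": [0.0, 1.0],     # far from "silver salmon" by construction
}
names = list(keys)
attn = cross_attention(patches, [keys[n] for n in names])

salmon = token_map(attn, names.index("silver salmon"))
sim_feather = cosine(salmon, token_map(attn, names.index("feather")))
sim_magician = cosine(salmon, token_map(attn, names.index("magician")))
print(f"overlap with feather:  {sim_feather:.3f}")
print(f"overlap with magician: {sim_magician:.3f}")
```

A high overlap means both nouns compete for the same patches, which is one plausible reading of the "feature entanglement" described above.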


Polysemy

In this section, the researchers explore in depth what happens during generation when a word has multiple meanings. They found that, without any outside perturbation, the generated image usually represents one specific meaning of the word. Take "warthog" as an example: the first row in Figure A4 of the paper is generated based on one meaning of the word "warthog".


However, the researchers also found that injecting other words into the original prompt may cause a semantic shift. For example, when the word "traitor" is introduced into a prompt describing "warthog", the generated image may deviate from the original meaning of "warthog" and show entirely new content.

The position of the word in the prompt

In Figure 10 of the paper, the researchers observed an interesting phenomenon. From a human perspective, prompts that list the same objects in different orders mean roughly the same thing: they all describe a picture of a cat, clogs, and a pistol. For the language model, that is, the CLIP text encoder, the order of the words affects how the text is understood, which in turn changes the content of the generated images. Although the descriptions are semantically consistent, the model may produce different results because the words appear in a different order. This shows that the way the model processes language and understands semantics differs from that of humans, and it is a reminder to pay more attention to word order when designing and using such models.
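To try this kind of comparison, one could enumerate every ordering of the same objects and feed each variant to the model with the same seed. The helper below is only a sketch: `reorder_prompt` and the "A photo of ..." template are assumptions made for illustration, not the prompts used in the paper.

```python
from itertools import permutations

def reorder_prompt(objects, template="A photo of {}"):
    """Build one prompt per ordering of the same objects."""
    variants = []
    for order in permutations(objects):
        *head, last = order
        joined = ", ".join(head) + " and " + last if head else last
        variants.append(template.format(joined))
    return variants

variants = reorder_prompt(["a cat", "clogs", "a pistol"])
for prompt in variants:
    print(prompt)
```

Each printed prompt names the same three objects, so any systematic difference between the resulting images can be attributed to word order rather than to content.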


Model structure

As shown in Figure 1 of the paper, without changing the original target noun in the prompt, the researchers relax the discrete process of word replacement or expansion into a continuous one by learning a Gumbel-Softmax distribution, which keeps the perturbation generation differentiable. After an image is generated, a CLIP classifier and a margin loss are used to optimize ω, with the goal of producing images that CLIP cannot classify correctly. To keep the adversarial prompts reasonably similar to the clean prompts, the researchers additionally apply a semantic similarity constraint and a text fluency constraint.
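The two ingredients named here can be sketched in a few lines. The following is a hedged illustration, not the paper's implementation: ω is assumed to be a vector of logits over candidate words, the margin loss is written in a common hinge form (true-class score minus best other score plus a margin `kappa`), and all numeric values are made up. No gradients are computed; the sketch only shows the forward computations.

```python
import math
import random

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau).
    Smaller tau gives samples closer to a hard one-hot vector."""
    noise = []
    for _ in logits:
        # Clamp to avoid log(0) at either end of the unit interval.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise.append(-math.log(-math.log(u)))
    return softmax([(l + g) / tau for l, g in zip(logits, noise)])

def margin_loss(class_scores, true_index, kappa=0.0):
    """Assumed hinge form: positive while the true class still wins
    by less than kappa is not yet reached, zero once it loses clearly."""
    best_other = max(s for i, s in enumerate(class_scores) if i != true_index)
    return max(0.0, class_scores[true_index] - best_other + kappa)

rng = random.Random(0)
omega = [0.2, 1.5, -0.3]                 # hypothetical logits over 3 words
relaxed = gumbel_softmax(omega, tau=0.5, rng=rng)
print("relaxed word weights:", [round(w, 3) for w in relaxed])

# Hypothetical classifier scores for the generated image; index 0 is the
# class of the original target noun.
print("loss while still correct:", margin_loss([5.0, 1.0, 2.0], 0, kappa=1.0))
print("loss once misclassified: ", margin_loss([1.0, 5.0, 2.0], 0, kappa=1.0))
```

Because the relaxed sample is a smooth function of ω, a loss computed from the generated image can in principle be backpropagated to ω, which is the point of the continuous relaxation.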

Once this distribution is learned, the algorithm can sample multiple adversarial text prompts for the same clean text prompt.
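Sampling from a learned distribution might look like the sketch below. Everything specific in it is an assumption: the candidate words other than "traitor", the logit values standing in for a learned ω, and the "and a ..." way of appending a word are all hypothetical and chosen only to mirror the warthog example earlier in the article.

```python
import math
import random

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sample_prompts(clean_prompt, candidates, logits, k, rng):
    """Draw k appended words from the categorical distribution given
    by softmax(logits) and attach each one to the clean prompt."""
    probs = softmax(logits)
    words = rng.choices(candidates, weights=probs, k=k)
    return [f"{clean_prompt} and a {w}" for w in words]

rng = random.Random(0)
candidates = ["traitor", "lantern", "harbor"]   # mostly made-up words
learned_logits = [2.0, 0.5, -1.0]               # stand-in for a learned omega
prompts = sample_prompts("A photo of a warthog", candidates,
                         learned_logits, k=5, rng=rng)
for p in prompts:
    print(p)
```

Each draw keeps the clean prompt and its target noun intact and varies only the added word, so one learned distribution yields many candidate prompts.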


See the original paper for more details.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.