Recently, the new image generation models released by Google and OpenAI have attracted widespread attention, and their core technology is completely different from that of previous models. Ethan Mollick's article in One Useful Thing explores how these new models work and what they mean for human users. This article summarizes Mollick's views.
The potential of multimodal image generation
Mollick points out that traditional image generation systems were the product of several models working together, rather than a single model handling every task.
"In the past, when a large language model (LLM) generated an image, the LLM was not doing the work directly. The AI would send a text prompt to a separate image generation tool and then display the result. The AI was responsible for creating the text prompt, while another, less capable system was responsible for generating the image."
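The handoff described in the quote can be sketched in a few lines of Python. This is only an illustration: `llm_write_prompt`, `image_tool_generate`, and `GeneratedImage` are hypothetical stand-ins invented here, not the API of any real product. The point it shows is that the image tool receives nothing but a text string.

```python
from dataclasses import dataclass


@dataclass
class GeneratedImage:
    prompt: str    # the only information the image tool ever received
    pixels: bytes  # placeholder for image data (empty in this sketch)


def llm_write_prompt(user_request: str) -> str:
    """Hypothetical stand-in for the LLM: turns a request into a text prompt."""
    return f"A detailed illustration of: {user_request.strip()}"


def image_tool_generate(prompt: str) -> GeneratedImage:
    """Hypothetical stand-in for the separate, less capable image generator."""
    return GeneratedImage(prompt=prompt, pixels=b"")


def old_pipeline(user_request: str) -> GeneratedImage:
    prompt = llm_write_prompt(user_request)  # step 1: the LLM writes text only
    return image_tool_generate(prompt)       # step 2: another system draws


result = old_pipeline("a room with no elephants")
print(result.prompt)
```

Because the second system never sees the conversation or the LLM's reasoning, anything the text prompt fails to capture is lost at the handoff.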
The diffusion model has become a thing of the past
Older models relied mainly on diffusion. A diffusion model is trained by adding noise to images and learning to reverse that process; to generate an image, it starts from random noise and removes the noise step by step until the result matches the prompt, drawing on patterns learned from its training images.
However, this method is limited: the generated images do not reflect any reasoning or judgment by the model itself. They largely recombine patterns from the training images and struggle to convey precise information.
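The noising and denoising idea behind diffusion can be illustrated with a toy Python sketch. Several things here are assumptions made for illustration: the "image" is a short list of numbers, the noise schedule `alpha_bar` is a simple linear one, and the denoiser is handed the true noise as an oracle. A real diffusion model instead predicts the noise with a trained neural network conditioned on the prompt, and removes it over many small steps.

```python
import math
import random


def alpha_bar(t: int, steps: int) -> float:
    """Share of the original signal kept after t noising steps (toy linear schedule)."""
    return 1.0 - t / steps


def add_noise(x0, noise, t, steps):
    """Forward process: blend the clean image with Gaussian noise."""
    a = alpha_bar(t, steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def remove_noise(xt, predicted_noise, t, steps):
    """Reverse estimate: subtract the predicted noise to recover the clean image."""
    if not 0 <= t < steps:
        raise ValueError("t must satisfy 0 <= t < steps")
    a = alpha_bar(t, steps)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(xt, predicted_noise)]


random.seed(0)
image = [0.0, 0.5, 1.0, 0.5]                     # stand-in for pixel values
noise = [random.gauss(0.0, 1.0) for _ in image]  # Gaussian noise sample

noisy = add_noise(image, noise, t=8, steps=10)   # mostly noise at this point
# Oracle assumption: we pass the true noise where a trained network
# would supply its prediction.
restored = remove_noise(noisy, noise, t=8, steps=10)

print("noisy:   ", [round(v, 3) for v in noisy])
print("restored:", [round(v, 3) for v in restored])
```

With a perfect noise prediction the clean image is recovered exactly, up to floating-point error. The quality of a real model depends on how well its network approximates that prediction from training data alone.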
Advantages of multimodal control
Today, the emergence of multimodal control technology has completely changed this situation.
Mollick gives an example: prompting the model to generate a room with no elephants in it and to annotate the image with the reason. Traditional models generate images that contain elephants because they cannot understand the context of the prompt. Any text they render may also be meaningless or contain made-up characters, because the model's grasp of letters comes only from shapes seen in its training data.
A multimodal model can generate an image that meets the requirements and add annotations, such as "the door is too small", explaining why there are no elephants in the room.
Prompt challenges with traditional models
A significant drawback of traditional models is that when they are asked to exclude an element, they tend to include it instead, because they cannot understand the instruction. In addition, each modification or adjustment changes the basic structure of the image. For example, changing a character's hat may completely change the character's appearance.
A multimodal image generation model can make subtle adjustments while preserving the rest of the original result.
Maintaining consistency across contexts
Mollick also shows another example: an otter holding a specific item in one hand, which then appears in different settings and in different styles. This demonstrates how finely multimodal image generators can integrate the same details into new images.
Complete presentation
Mollick also shows how to design a complete presentation with a multimodal model, such as a recommendation about guacamole. Given simple instructions, the model can search the Internet for relevant information, integrate it, and generate the final result.
As Mollick says, this will quickly lead to the replacement of much human work. We need to seriously consider establishing a framework to deal with it.
