The physical principles that inspire modern artificial intelligence art: exploring the possibilities of generative artificial intelligence has just begun

王林

Apr 12, 2023, 11:58 PM

AIArt

Ask DALL·E 2, the image generation system created by OpenAI, to draw a picture of "a goldfish sipping Coca-Cola on the beach," and it will spit out surreal images. The program encountered images of beaches, goldfish, and Coca-Cola during training, but it is highly unlikely that it ever saw one in which all three appeared together. Yet DALL·E 2 can combine these concepts into something that might have made Dalí proud.

DALL·E 2 is a generative model - a system that attempts to use training data to generate new things that rival the data in quality and diversity. This is one of the most difficult problems in machine learning, and getting to this point has been a tough journey.

The first important image generation models used an approach to artificial intelligence called a neural network, a program composed of many layers of computational units called artificial neurons. But even as the quality of their images improved, the models proved unreliable and difficult to train. Meanwhile, a powerful generative model, created by a postdoctoral researcher with a passion for physics, lay dormant until two graduate students made technical breakthroughs that brought the beast to life.

DALL·E 2 is such a beast. The key insights that make DALL·E 2’s images possible, as well as those of its competitors Stable Diffusion and Imagen, come from the world of physics. The systems that underpin them are called diffusion models and are heavily inspired by non-equilibrium thermodynamics, which governs phenomena such as fluid and gas diffusion. “There are a lot of techniques originally invented by physicists that are now very important in machine learning,” said Yang Song, a machine learning researcher at OpenAI.

The power of these models shocked the industry and users. “This is an exciting time for generative models,” said Anima Anandkumar, a computer scientist at the California Institute of Technology and senior director of machine learning research at Nvidia.

While the realistic images created by diffusion models sometimes perpetuate social and cultural biases, she said, "We have shown that generative models are useful for downstream tasks [that] improve the fairness of predictive artificial intelligence models."

High probability

To understand how creating data works for images, let's start with a simple image made of just two adjacent grayscale pixels. We can fully describe this image with two values, one for the shade of each pixel (from 0 for fully black to 255 for fully white). You can use these two values to plot the image as a point in 2D space.

If we plot multiple images as points, clustering may occur - some images and their corresponding pixel values appear more frequently than others. Now imagine that there is a curved surface above the plane, with the height of the surface corresponding to the density of the clusters. This surface plots a probability distribution. You are most likely to find a single data point below the highest part of the surface, and rarely below the lowest part of the surface.

DALL·E 2 created these images of "goldfish sipping Coca-Cola on the beach". This program, created by OpenAI, may have never encountered similar images, but can still generate them on its own.

Now you can use this probability distribution to generate new images. All you need to do is randomly generate new data points while respecting the constraint that more probable data is generated more often, a process called "sampling" the distribution. Each new point is a new image.
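As a rough illustration of the idea above, the sketch below builds such a distribution for two-pixel images and samples from it. The training set is synthetic, and the names `train`, `probs`, and `sample_images` are invented for this example; a real generative model learns a smooth distribution instead of counting pixel pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 5,000 two-pixel images clustered around
# (60, 200), i.e. a dark left pixel next to a bright right pixel.
train = rng.normal(loc=[60.0, 200.0], scale=15.0, size=(5000, 2))
train = np.clip(np.rint(train), 0, 255).astype(int)

# The "surface" over the plane: count how often each (pixel1, pixel2)
# pair occurs, then normalise the counts into probabilities.
counts = np.zeros((256, 256))
np.add.at(counts, (train[:, 0], train[:, 1]), 1)
probs = counts / counts.sum()

def sample_images(n):
    """Draw n new two-pixel images, with likelier pairs drawn more often."""
    flat = rng.choice(256 * 256, size=n, p=probs.ravel())
    return np.column_stack(np.unravel_index(flat, (256, 256)))

new_images = sample_images(1000)
print(new_images[:3])            # three sampled (pixel1, pixel2) pairs
print(new_images.mean(axis=0))   # close to the cluster centre
```

Counting works here only because there are just 256 × 256 possible images; with a million pixels the table would be astronomically large, which is why the distribution has to be learned.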

The same analysis applies to more realistic grayscale photos with, say, a million pixels each. Only now, plotting each image requires not two axes but a million. The probability distribution over such images will be some complicated million-plus-one-dimensional surface. If you sample this distribution, you will produce a million pixel values. Print these pixels on a piece of paper and the image will most likely look like a photo from the original dataset.

The challenge of generative modeling is to learn this complicated probability distribution for some set of images that make up the training data. The distribution is useful partly because it captures a broad range of information about the data, and partly because researchers can combine probability distributions from different types of data, such as text and images, to compose surreal outputs, such as a goldfish sipping Coca-Cola on a beach. "You can mix and match different concepts... to create completely new scenarios that have never been seen in the training data," Anandkumar said.

In 2014, a model called a generative adversarial network (GAN) became the first to generate realistic images. "It's so exciting," Anandkumar said. But GANs are difficult to train: they may not learn the full probability distribution, and may only generate images from a subset of the distribution. For example, a GAN trained on images of various animals might only generate images of dogs.

Machine learning needed a more robust model. Jascha Sohl-Dickstein, whose work was inspired by physics, would provide one.

Jascha Sohl-Dickstein.

Excited Spot

Around the time GANs were invented, Sohl-Dickstein was a postdoc at Stanford University working on generative models, with a side interest in non-equilibrium thermodynamics. This branch of physics studies systems that are not in thermal equilibrium: those that exchange matter and energy internally and with their environment.

An illustrative example is a drop of blue ink spreading through a container of water. At first, it forms a dark blob in one spot. At this point, if you want to calculate the probability of finding a molecule of ink in some small volume of the container, you need a probability distribution that cleanly models the initial state, before the ink begins to spread. But this distribution is complex, which makes it hard to sample from.

Eventually, however, the ink spreads throughout the water, turning it a light blue. This leads to a much simpler, more uniform probability distribution of molecules that can be described with a simple mathematical expression. Non-equilibrium thermodynamics describes the probability distribution at each step in the diffusion process. Crucially, each step is reversible: with small enough steps, you can go from a simple distribution back to a complex one.

Jascha Sohl-Dickstein created a new generative modeling approach based on diffusion principles. (Credit: Asako Miyakawa)

Sohl-Dickstein used the principles of diffusion to develop an algorithm for generative modeling. The idea is simple: the algorithm first turns the complex images in the training dataset into simple noise, akin to going from a blob of ink to diffuse light-blue water, and then teaches the system how to reverse the process, turning noise into images.

Here's how it works. First, the algorithm obtains images from the training set. As before, assuming that each of the million pixels has some value, we can plot the image as a point in a million-dimensional space. The algorithm adds some noise to each pixel at each time step, equivalent to the spread of ink after a small time step. As this process continues, the pixel values become less and less related to their values in the original image, and the pixels look more like a simple noise distribution. (The algorithm also nudges each pixel value every time step a little towards the origin, which is the zero value on all these axes. This nudge prevents the pixel values from becoming too large for the computer to handle easily.)

Doing this for all images in the dataset, the initial complex distribution of points in a million-dimensional space (which cannot be easily described and sampled) becomes a simple, normal distribution around the origin point.
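The forward process described above can be sketched in a few lines. This is a minimal illustration, not Sohl-Dickstein's code: the two-cluster data, the noise level `beta`, the step count, and the particular update (shrink by `sqrt(1 - beta)`, then add noise of variance `beta`, a form commonly used for Gaussian diffusion) are all assumptions chosen so that the points settle into a unit-variance normal distribution around the origin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10,000 two-pixel "images" rescaled to roughly [-1, 1],
# drawn from two tight clusters (a deliberately non-Gaussian distribution).
centers = np.array([[-0.8, 0.6], [0.7, -0.5]])
x = centers[rng.integers(0, 2, size=10_000)] + 0.05 * rng.normal(size=(10_000, 2))

beta = 0.02      # assumed per-step noise level (not from the article)
num_steps = 500  # assumed number of time steps

for _ in range(num_steps):
    noise = rng.normal(size=x.shape)
    # Nudge every pixel slightly towards the origin, then add a little noise.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

print(x.mean(axis=0))  # close to 0 on both axes
print(x.var(axis=0))   # close to 1 on both axes
```

Without the shrink factor the variance would grow with every step; with it, the cloud of points converges to the same simple distribution no matter where the data started.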

"The sequence of transformations very slowly turns your data distribution into a big ball of noise," Sohl-Dickstein said. This "forward process" leaves you with a distribution that is easy to sample from.

Next comes the machine learning part: feed the neural network the noisy images obtained from the forward process and train it to predict the less noisy images that came one step earlier. It makes mistakes at first, so you adjust the parameters of the network to make it do better. Eventually, the neural network can reliably turn a noisy image, which represents a sample from the simple distribution, all the way into an image that represents a sample from the complex distribution.

The trained network is a mature generative model. Now you don't even need the original image to do the forward pass: you have a complete mathematical description of the simple distribution, so you can sample directly from it. The neural network can turn this sample—which is essentially just static—into a final image that resembles the images in the training data set.
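The training and generation steps above can be imitated on a toy problem. In the sketch below, a straight-line fit per time step stands in for the neural network, which is adequate only because the made-up data (single "pixel" values drawn from a normal distribution) is so simple; the settings `beta`, `num_steps`, and `n` are likewise assumptions for illustration, not values from any paper.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, num_steps, n = 0.05, 200, 20_000   # assumed toy settings

# Hypothetical 1-D "training images": one pixel value each, around 3.0.
x0 = rng.normal(loc=3.0, scale=0.5, size=n)

# Forward process: keep every noisy version so (x_t, x_{t-1}) pairs exist.
xs = [x0]
for _ in range(num_steps):
    xs.append(np.sqrt(1 - beta) * xs[-1] + np.sqrt(beta) * rng.normal(size=n))

# "Training": for each step fit x_{t-1} ~ a * x_t + b by least squares and
# record the leftover spread, which is the noise to re-inject when sampling.
params = []
for t in range(1, num_steps + 1):
    a, b = np.polyfit(xs[t], xs[t - 1], deg=1)
    resid_std = np.std(xs[t - 1] - (a * xs[t] + b))
    params.append((a, b, resid_std))

# Generation: start from pure noise and apply the learned steps in reverse.
x = rng.normal(size=n)
for a, b, resid_std in reversed(params):
    x = a * x + b + resid_std * rng.normal(size=n)

print(x.mean(), x.std())  # should land near the training mean and spread
```

Note that generation never touches `x0`: it starts from fresh noise and relies only on the fitted per-step parameters, mirroring the point that the original images are no longer needed once training is done.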

Sohl-Dickstein recalls the first outputs of his diffusion model. "You'd squint and say, 'I think that colored blob looks like a truck,'" he said. "I'd spent many months staring at different patterns of pixels, trying to see structure that I liked, [and this was more organized than anything] I'd ever gotten before. I was super excited."

Looking ahead

Sohl-Dickstein published his diffusion model algorithm in 2015, but it still lagged far behind what GANs could do. While diffusion models could sample from the entire distribution and never got stuck spitting out only a subset of images, the images looked worse and the process was much too slow. "I don't think it was exciting at the time," Sohl-Dickstein said.

Paper address: https://doi.org/10.48550/arXiv.1503.03585

It took two students, who knew neither Sohl-Dickstein nor each other, to connect the dots from this original work to modern diffusion models such as DALL·E 2. The first was Song, then a doctoral student at Stanford University. In 2019, he and his adviser published a new method for building generative models that does not estimate the probability distribution of the data (the high-dimensional surface). Instead, it estimates the gradient of the distribution (think of it as the slope of the high-dimensional surface).
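To see why a gradient alone is enough to generate samples, consider the sketch below. In Song's papers the quantity estimated is the gradient of the log of the density, usually called the score, and it is produced by a trained neural network; here, as a stated simplification, the score of a hypothetical one-dimensional normal distribution is written down exactly, and points are moved with Langevin dynamics (small steps uphill along the score plus fresh noise). The step size and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 0.5          # hypothetical 1-D data distribution N(mu, sigma^2)

def score(x):
    """Gradient of the log-density of N(mu, sigma^2) with respect to x."""
    return -(x - mu) / sigma**2

step = 0.01                    # assumed step size
x = rng.normal(size=10_000)    # start from simple noise, N(0, 1)
for _ in range(1_000):
    # Move uphill along the slope of the log-density, plus fresh noise so the
    # points spread over the whole distribution instead of piling on the peak.
    x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal(size=x.shape)

print(x.mean(), x.std())       # roughly mu and sigma
```

The sampler never evaluates the density itself, only its slope, which is the practical appeal of estimating gradients instead of the full surface.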

Yang Song helped propose a new technique for generating images by training a network to efficiently interpret noisy images.

Song found that his technique worked best if he first perturbed each image in the training dataset with increasing levels of noise and then had his neural network predict the original image using the gradient of the distribution, effectively denoising it. Once trained, his neural network could take a noisy image sampled from a simple distribution and gradually turn it back into an image representative of the training dataset. The image quality was great, but his machine learning model was very slow to sample from. And he did it without knowing anything about Sohl-Dickstein's work. "I didn't know anything about diffusion models," Song said. "After our 2019 paper was published, I received an email from Jascha. He pointed out to me that [our models] were very closely related."

In 2020, a second student saw these connections and realized that Song's work could improve Sohl-Dickstein's diffusion models. Jonathan Ho had recently completed his doctoral research on generative modeling at the University of California, Berkeley, but he continued working on it. "I think this is the most mathematically beautiful subdiscipline of machine learning," he said.

Ho redesigned and updated Sohl-Dickstein's diffusion model using some of Song's ideas and other advances in the field of neural networks. “I knew that in order to get the community’s attention, I needed the model to generate beautiful samples,” he said. "I was convinced it was the most important thing I could do at that time."

His intuition was correct. Ho and colleagues announced this new and improved diffusion model in a 2020 paper titled "Denoising Diffusion Probabilistic Models." It quickly became such a landmark that researchers now refer to it simply as DDPM. On an image quality benchmark that compares the distribution of generated images to the distribution of training images, these models matched or exceeded all competing generative models, including GANs. It didn't take long for big companies to take notice. Today, DALL·E 2, Stable Diffusion, Imagen, and other commercial models all use some variation of DDPM.

Jonathan Ho and colleagues combined the methods of Sohl-Dickstein and Song to enable modern diffusion models such as DALL·E 2.

Modern diffusion models have one more key ingredient: large language models (LLMs), such as GPT-3. These are generative models trained on text from the internet to learn probability distributions over words rather than images. In 2021, Ho (now a research scientist at a stealth company), his colleague Tim Salimans at Google Research, and other groups elsewhere showed how to combine information from an LLM with an image-generating diffusion model so that text (e.g., "goldfish sipping Coca-Cola on the beach") guides the diffusion process and thus the image generation. This "guided diffusion" process is behind the success of text-to-image models such as DALL·E 2.
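The article does not spell out the arithmetic of guidance. One widely used formulation, known as classifier-free guidance, runs the denoising network twice at each step, once with the caption and once without, and extrapolates from the unconditional prediction towards the conditional one. The sketch below shows only that blending step; the function name `guided_noise`, the weight `w`, and the two-element predictions are made up for illustration.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, w):
    """Blend two noise predictions; w = 0 returns the text-conditioned one."""
    return (1.0 + w) * eps_cond - w * eps_uncond

# Hypothetical predictions for a 2-pixel image at one denoising step.
eps_uncond = np.array([0.10, -0.30])   # network run without the caption
eps_cond = np.array([0.40, -0.10])     # network run with the caption

print(guided_noise(eps_uncond, eps_cond, w=0.0))  # same as eps_cond
print(guided_noise(eps_uncond, eps_cond, w=2.0))  # pushed further towards the caption
```

Larger values of `w` make the output follow the text more closely, typically at the cost of less variety among the generated images.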

"They far exceeded my wildest expectations," Ho said. "I'm not going to pretend I saw all this coming."

Images from DALL·E 2 and its peers are still far from perfect. Large language models can reflect cultural and social biases, such as racism and sexism, in the text they generate. That's because they are trained on text taken from the internet, which often contains racist and sexist language. LLMs that learn probability distributions over such text are imbued with the same biases. Diffusion models are also trained on uncurated images taken from the internet, which may contain similarly biased data. It's no wonder that combining an LLM with today's diffusion models sometimes produces images that reflect society's ills.

Anandkumar has firsthand experience. She was shocked when she tried generating stylized avatars of herself using an application based on diffusion models. "So [many] of the images were highly sexualized," she said, "whereas the ones it presented to men weren't." She's not alone.

These biases can be reduced by curating and filtering the data (an extremely difficult task given the sheer size of the datasets) or by checking the input prompts and outputs of these models. "Of course, there's no substitute for careful and extensive safety testing" of a model, Ho said. "This is an important challenge for the field."

Despite these concerns, Anandkumar still believes in the power of generative modeling. “I really like Richard Feynman’s quote: ‘What I can’t create, I don’t understand,’” she says. The increased understanding allows her team to develop generative models that, for example, generate synthetic training data for underrepresented classes for prediction tasks, such as darker skin tones for facial recognition, helping to improve fairness. Generative models can also give us insights into how our brains process noisy inputs, or how they evoke mental images and consider future actions. Building more complex models could give AI similar capabilities.

Anandkumar said: "I think we are just beginning to explore the possibilities of generative artificial intelligence."

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
