


Ask DALL·E 2, an image generation system created by OpenAI, to draw a picture of "a goldfish sipping Coca-Cola on the beach," and it will spit out surreal images of exactly that. The program encountered images of beaches, goldfish, and Coca-Cola during training, but it is highly unlikely to have seen an image in which all three appear together. Yet DALL·E 2 can combine these concepts into something that might have made Dalí proud.
DALL·E 2 is a generative model - a system that attempts to use training data to generate new things that rival the data in quality and diversity. This is one of the most difficult problems in machine learning, and getting to this point has been a tough journey.
The first important generative models for images used an approach to artificial intelligence called a neural network, a program composed of many layers of computational units called artificial neurons. But even as their image quality improved, the models proved unreliable and difficult to train. Meanwhile, a powerful generative model, created by a postdoctoral researcher with a passion for physics, lay dormant until two graduate students made technical breakthroughs that brought the beast to life.
DALL·E 2 is such a beast. The key insights that make DALL·E 2’s images possible, as well as those of its competitors Stable Diffusion and Imagen, come from the world of physics. The systems that underpin them are called diffusion models and are heavily inspired by non-equilibrium thermodynamics, which governs phenomena such as fluid and gas diffusion. “There are a lot of techniques originally invented by physicists that are now very important in machine learning,” said Yang Song, a machine learning researcher at OpenAI.
The power of these models shocked the industry and users. “This is an exciting time for generative models,” said Anima Anandkumar, a computer scientist at the California Institute of Technology and senior director of machine learning research at Nvidia.
While the realistic images created by diffusion models can sometimes perpetuate social and cultural biases, she said, “we have shown that generative models are useful for downstream tasks [that] improve the fairness of predictive artificial intelligence models.”
High probability
To understand how creating data works for images, let's start with a simple image made of just two adjacent grayscale pixels. We can fully describe this image with two values, one for the shade of each pixel (from 0 for completely black to 255 for completely white). You can use these two values to plot the image as a point in 2D space.
If we plot multiple images as points, clustering may occur - some images and their corresponding pixel values appear more frequently than others. Now imagine that there is a curved surface above the plane, with the height of the surface corresponding to the density of the clusters. This surface plots a probability distribution. You are most likely to find a single data point below the highest part of the surface, and rarely below the lowest part of the surface.
DALL·E 2 created these images of "goldfish sipping Coca-Cola on the beach". This program, created by OpenAI, may have never encountered similar images, but can still generate them on its own.
Now you can use this probability distribution to generate new images. All you need to do is randomly generate new data points while adhering to the restriction that more probable data is generated more often, a process called "sampling" the distribution. Each new point is a new image.
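To make the two-pixel picture concrete, here is a minimal sketch of sampling in Python. The dataset, the helper names `fit_histogram` and `sample_images`, and the choice of a plain histogram as the "probability distribution" are all assumptions made for illustration; a histogram can only reproduce images it has already seen, which is exactly the limitation that real generative models have to overcome.

```python
import random
from collections import Counter

def fit_histogram(images):
    """Count how often each (pixel_a, pixel_b) pair occurs in the data."""
    return Counter(images)

def sample_images(histogram, n, rng):
    """Draw n images, choosing each pair in proportion to its count."""
    pairs = list(histogram.keys())
    weights = list(histogram.values())
    return rng.choices(pairs, weights=weights, k=n)

# Hypothetical toy dataset of two-pixel images (values from 0 to 255):
# dark pairs are common, bright and mid-gray pairs are rare.
dataset = [(10, 20)] * 80 + [(200, 220)] * 15 + [(128, 128)] * 5

histogram = fit_histogram(dataset)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_images(histogram, 1000, rng)
sample_counts = Counter(samples)

# More probable images should show up more often among the samples.
print(sample_counts.most_common())
```

Each sampled pair is a "new image" in the article's sense, and the counts printed at the end should roughly follow the 80/15/5 split of the toy dataset.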
The same analysis applies to more realistic grayscale photos with, say, a million pixels each. Only now, instead of two axes, plotting each image requires a million. The probability distribution over such images is some complex million-plus-one-dimensional surface. If you sample this distribution, you will produce a million pixel values. Print those pixels on a sheet of paper and the image will most likely look like a photo from the original dataset.
The challenge of generative modeling is to learn this complex probability distribution for some set of images that make up the training data. The distribution is useful partly because it captures a broad range of information about the data, and partly because researchers can combine probability distributions over different types of data, such as text and images, to compose surreal outputs, such as a goldfish sipping Coca-Cola on a beach. "You can mix and match different concepts... to create completely new scenarios that have never been seen in the training data," Anandkumar said.
In 2014, a model called a generative adversarial network (GAN) became the first to generate realistic images. "It's so exciting," Anandkumar said. But GANs are difficult to train: they may not learn the full probability distribution, and may only generate images from a subset of the distribution. For example, a GAN trained on images of various animals might only generate images of dogs.
Machine learning needed a more robust model. Jascha Sohl-Dickstein, whose work was inspired by physics, would provide one.
Excited Spot
Around the time GANs were invented, Sohl-Dickstein was a postdoc at Stanford University working on generative models, with a side interest in non-equilibrium thermodynamics. This branch of physics studies systems that are not in thermal equilibrium: those that exchange matter and energy internally and with their environment.
An illustrative example is a drop of blue ink spreading through a container of water. At first, it forms a dark blob in one place. At this point, if you want to calculate the probability of finding a molecule of ink in some small volume of the container, you need a probability distribution that cleanly models the initial state, before the ink starts to spread. But this distribution is complex, which makes it hard to sample from.
Eventually, however, the ink spreads throughout the water, turning the water a light blue. This allows for a simpler, more uniform probability distribution of molecules described by simple mathematical expressions. Nonequilibrium thermodynamics describes the probability distribution at each step in the diffusion process. Crucially, each step is reversible - with small enough steps, you can go back from a simple distribution to a complex distribution.
Jascha Sohl-Dickstein created a new generative modeling approach based on diffusion principles. ——Asako Miyakawa
Sohl-Dickstein used the principles of diffusion to develop an algorithm for generative modeling. The idea is simple: The algorithm first turns the complex images in the training dataset into simple noise, much like going from a drop of ink to diffuse light-blue water, and then teaches the system how to reverse the process, turning noise back into images.
Here's how it works. First, the algorithm obtains images from the training set. As before, assuming that each of the million pixels has some value, we can plot the image as a point in a million-dimensional space. The algorithm adds some noise to each pixel at each time step, equivalent to the spread of ink after a small time step. As this process continues, the pixel values become less and less related to their values in the original image, and the pixels look more like a simple noise distribution. (The algorithm also nudges each pixel value every time step a little towards the origin, which is the zero value on all these axes. This nudge prevents the pixel values from becoming too large for the computer to handle easily.)
Doing this for all images in the dataset, the initial complex distribution of points in a million-dimensional space (which cannot be easily described and sampled) becomes a simple, normal distribution around the origin point.
Sohl-Dickstein said: "The sequence of transformations very slowly turns your data distribution into a big ball of noise." This "forward process" leaves you with a distribution that is easy to sample from.
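The forward process can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Sohl-Dickstein's actual code: the update rule (scale each pixel by `sqrt(1 - beta)`, then add Gaussian noise of variance `beta`) is the form commonly used in later diffusion papers, and the value of `beta`, the number of steps, and the rescaled two-pixel image are hypothetical choices.

```python
import math
import random

def forward_step(pixels, beta, rng):
    """One noising step: nudge each pixel toward the origin, then add noise."""
    keep = math.sqrt(1.0 - beta)   # shrink factor (the "nudge" toward zero)
    spread = math.sqrt(beta)       # standard deviation of the added noise
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

def forward_process(pixels, beta, steps, rng):
    """Apply many small noising steps to a single image."""
    current = list(pixels)
    for _ in range(steps):
        current = forward_step(current, beta, rng)
    return current

rng = random.Random(0)
image = [3.0, -3.0]  # hypothetical two-pixel image with rescaled pixel values

# Noise the same image 1000 times and look at where the first pixel ends up.
noised = [forward_process(image, beta=0.05, steps=200, rng=rng) for _ in range(1000)]
first_pixel = [pair[0] for pair in noised]
mean = sum(first_pixel) / len(first_pixel)
variance = sum((v - mean) ** 2 for v in first_pixel) / len(first_pixel)
print(round(mean, 2), round(variance, 2))
```

After enough steps the starting value is forgotten: the printed mean should sit near 0 and the variance near 1, which is the simple normal distribution around the origin that the text describes.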
Next comes the machine learning part: Feed a neural network the noisy images obtained from the forward process and train it to predict the less noisy images that came one step earlier. It makes mistakes at first, so you adjust the parameters of the network to make it do better. Eventually, the neural network can reliably turn a noisy image, which represents a sample from the simple distribution, all the way back into an image that represents a sample from the complex distribution.
The trained network is a full-blown generative model. Now you don't even need an original image on which to run the forward process: You have a complete mathematical description of the simple distribution, so you can sample from it directly. The neural network can turn this sample, which is essentially just static, into a final image that resembles the images in the training dataset.
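The train-then-generate loop can be sketched on a one-pixel toy problem. Everything here is an assumption for illustration: the neural network is swapped for a least-squares line fitted separately at each step, which is adequate only because the hypothetical training data is Gaussian, and the names `simulate_forward`, `fit_reverse_step`, `train`, and `generate` are invented for this example.

```python
import math
import random

def simulate_forward(x0_samples, betas, rng):
    """Return trajectory[t]: the sample values after t noising steps."""
    trajectory = [list(x0_samples)]
    for beta in betas:
        keep, spread = math.sqrt(1.0 - beta), math.sqrt(beta)
        prev = trajectory[-1]
        trajectory.append([keep * v + spread * rng.gauss(0.0, 1.0) for v in prev])
    return trajectory

def fit_reverse_step(noisier, cleaner):
    """Least-squares line predicting the cleaner value from the noisier one."""
    n = len(noisier)
    mean_x = sum(noisier) / n
    mean_y = sum(cleaner) / n
    var_x = sum((x - mean_x) ** 2 for x in noisier) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(noisier, cleaner)) / n
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # Leftover spread that the line cannot explain; re-injected when sampling.
    residual_var = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(noisier, cleaner)
    ) / n
    return slope, intercept, residual_var

def train(trajectory):
    """Fit one reverse step (t + 1 -> t) for every forward step."""
    return [
        fit_reverse_step(trajectory[t + 1], trajectory[t])
        for t in range(len(trajectory) - 1)
    ]

def generate(reverse_steps, rng):
    """Start from pure noise and apply the learned reverse steps in order."""
    x = rng.gauss(0.0, 1.0)
    for slope, intercept, residual_var in reversed(reverse_steps):
        x = slope * x + intercept + math.sqrt(residual_var) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
# Hypothetical training set: 2000 one-pixel "images" clustered around 2.0.
training_pixels = [rng.gauss(2.0, 0.5) for _ in range(2000)]
betas = [0.05] * 100  # assumed noise schedule

trajectory = simulate_forward(training_pixels, betas, rng)
reverse_steps = train(trajectory)

generated = [generate(reverse_steps, rng) for _ in range(2000)]
gen_mean = sum(generated) / len(generated)
gen_std = math.sqrt(sum((g - gen_mean) ** 2 for g in generated) / len(generated))
print(round(gen_mean, 2), round(gen_std, 2))
```

Note that `generate` never touches the training data: it begins with a draw from the simple distribution and walks backward, yet the generated values should cluster around 2.0 with a spread near 0.5, like the toy dataset. Real images are far from Gaussian, which is why a neural network takes the place of the fitted lines.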
Sohl-Dickstein recalls the first outputs of his diffusion model. "You squint and say, 'I think that colored blob looks like a truck,'" he said. "I had spent many months staring at different pixel patterns, trying to see structure, [and this was more structured than I had ever gotten before.] I was super excited."
Looking ahead
Sohl-Dickstein published his diffusion model algorithm in 2015, but it still lagged far behind what GANs could do. While diffusion models could sample over the entire distribution and never got stuck spitting out only a subset of images, the images looked worse and the process was much too slow. "I don't think it was exciting at the time," Sohl-Dickstein said.
Paper address: https://doi.org/10.48550/arXiv.1503.03585
It took two students who knew neither Sohl-Dickstein nor each other to connect the dots from the original work to modern diffusion models such as DALL·E 2. The first was Song, then a doctoral student at Stanford University. In 2019, he and his mentor published a new method for building generative models that does not estimate probability distributions of data (high-dimensional surfaces). Instead, it estimates the gradient of the distribution (think of it as the slope of a high-dimensional surface).
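The idea of working with the slope rather than the surface itself can be illustrated with a short sketch. Several things here are assumptions: the "data distribution" is a known one-pixel Gaussian rather than something learned by a network, the slope is taken of the logarithm of the density (the quantity usually called the score) and estimated by a finite difference, and the sampler is Langevin dynamics, a standard way of turning such slopes into samples that the article itself does not name.

```python
import math
import random

MEAN, STD = 3.0, 1.0  # hypothetical one-pixel data distribution

def log_density(x):
    """Log of a Gaussian density, standing in for the unknown data distribution."""
    return -0.5 * ((x - MEAN) / STD) ** 2 - math.log(STD * math.sqrt(2.0 * math.pi))

def score(x, h=1e-4):
    """Slope of the log-density at x, estimated with a central finite difference."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)

def langevin_sample(steps, step_size, rng):
    """Follow the slope uphill while adding noise; uses only score(), never the density."""
    x = rng.gauss(0.0, 1.0)  # start from simple noise
    for _ in range(steps):
        x += 0.5 * step_size * score(x) + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(steps=200, step_size=0.1, rng=rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
print(round(score(0.0), 3), round(sample_mean, 2))
```

The sampler only ever asks which way is uphill, yet the samples should settle around the high-probability region near 3.0. In a real system, a neural network trained on noisy images would supply the estimate that `score` computes analytically here.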
Yang Song helped propose a new technique for generating images by training a network to efficiently interpret noisy images.
Song found that his technique worked best if he first perturbed each image in the training dataset with increasing levels of noise and then asked his neural network to predict the original image using gradients of the distribution, effectively denoising it. Once trained, his neural network could take a noisy image sampled from a simple distribution and progressively turn it back into an image representative of the training dataset. The image quality was great, but his machine learning model was painfully slow to sample from. And he did this with no knowledge of Sohl-Dickstein's work. "I didn't know anything about diffusion models," Song said. "After our 2019 paper was published, I received an email from Jascha. He pointed out to me that [our models] were very closely related."
In 2020, the second student saw these connections and realized that Song's work could improve Sohl-Dickstein's diffusion models. Jonathan Ho had recently completed his doctoral research on generative modeling at the University of California, Berkeley, but he kept working on it. "I think this is the most mathematically beautiful subdiscipline of machine learning," he said.
Ho redesigned and updated Sohl-Dickstein's diffusion model using some of Song's ideas and other advances in the field of neural networks. “I knew that in order to get the community’s attention, I needed the model to generate beautiful samples,” he said. "I was convinced it was the most important thing I could do at that time."
His intuition was correct. Ho and colleagues announced this new and improved diffusion model in a 2020 paper titled "Denoising Diffusion Probabilistic Models." It quickly became such a landmark that researchers now refer to it simply as DDPM. On an image quality benchmark that compares the distribution of generated images to the distribution of training images, these models matched or exceeded all competing generative models, including GANs. It didn't take long for big companies to take notice. Today, DALL·E 2, Stable Diffusion, Imagen, and other commercial models all use some variation of DDPM.
Jonathan Ho and colleagues combined the methods of Sohl-Dickstein and Song to enable modern diffusion models such as DALL·E 2.
Modern diffusion models have one more key ingredient: large language models (LLMs), such as GPT-3. These are generative models trained on text from the internet to learn probability distributions over words instead of images. In 2021, Ho (now a research scientist at a stealth company) and his colleague Tim Salimans at Google Research, along with other teams elsewhere, showed how to combine information from an LLM and an image-generating diffusion model so that text (say, "goldfish sipping Coca-Cola on the beach") guides the diffusion process and hence the image generation. This "guided diffusion" process is behind the success of text-to-image models such as DALL·E 2.
"They far exceeded my wildest expectations," Ho said. "I'm not going to pretend I saw it all coming."
Images from DALL·E 2 and its peers are still far from perfect. Large language models can reflect cultural and social biases, such as racism and sexism, in the text they generate. That's because they are trained on text taken from the internet, which often contains racist and sexist language. LLMs that learn probability distributions over such text are fraught with the same biases. Diffusion models are also trained on uncurated images taken from the internet, which can contain similarly biased data. It's no wonder that combining LLMs with today's diffusion models sometimes produces images that reflect society's ills.
Anandkumar has firsthand experience. She was shocked when she tried to generate stylized avatars of herself using an app based on a diffusion model. "So [many] of the images were highly sexualized," she said, "whereas what it presented to men was not." She's not alone.
These biases can be reduced by curating and filtering the data (an extremely difficult task given the sheer size of the datasets) or by checking the input prompts and outputs of these models. "Of course, there's no substitute for careful and extensive safety testing" of a model, Ho said. “This is an important challenge for the field.”
Despite these concerns, Anandkumar still believes in the power of generative modeling. “I really like Richard Feynman’s quote: ‘What I can’t create, I don’t understand,’” she says. The increased understanding allows her team to develop generative models that, for example, generate synthetic training data for underrepresented classes for prediction tasks, such as darker skin tones for facial recognition, helping to improve fairness. Generative models can also give us insights into how our brains process noisy inputs, or how they evoke mental images and consider future actions. Building more complex models could give AI similar capabilities.
Anandkumar said: "I think we are just beginning to explore the possibilities of generative artificial intelligence."