
Variational Autoencoders: Theory and Implementation

PHPz
2024-01-24 11:36:07

How to implement a variational autoencoder: principles and implementation steps

Variational Autoencoder (VAE) is a generative model based on neural networks. Its goal is to learn low-dimensional latent variable representations of high-dimensional data and use these latent variables for data reconstruction and generation. Compared with traditional autoencoders, VAE can generate more realistic and diverse samples by learning the distribution of the latent space. The implementation method of VAE will be introduced in detail below.

1. The basic principle of VAE

The basic idea of VAE is to map high-dimensional data into a low-dimensional latent space for dimensionality reduction and reconstruction. It consists of two parts: an encoder and a decoder. The encoder maps the input data x to the mean μ and variance σ^2 of the latent space; VAE can then sample points in this latent space and reconstruct them into the original data space through the decoder. This encoder-decoder structure gives the latent space good continuity, so that similar samples lie close to each other in it. Therefore, VAE can be used not only for dimensionality reduction but also for generating new samples. Formally, the encoder computes:

\begin{aligned}
\mu &= f_{\mu}(x) \\
\sigma^2 &= f_{\sigma}(x)
\end{aligned}

where f_{\mu} and f_{\sigma} can be any neural network models. Typically, we use a multilayer perceptron (MLP) to implement the encoder.

The decoder maps the latent variable z back to the original data space, that is:

x'=g(z)

where g can also be any neural network model. Likewise, we usually use an MLP to implement the decoder.

In VAE, the latent variable z is sampled from a prior distribution (usually a standard Gaussian distribution), that is:

z\sim\mathcal{N}(0,I)

In this way, the VAE can be trained by minimizing the reconstruction error together with the KL divergence between the approximate posterior over the latent variables and the prior, thereby achieving both dimensionality reduction and data generation. Specifically, the loss function of VAE can be expressed as:

\mathcal{L}=-\mathbb{E}_{z\sim q(z|x)}[\log p(x|z)]+\beta\,\mathrm{KL}[q(z|x)||p(z)]

where q(z|x) is the posterior distribution, i.e. the conditional distribution of the latent variable z given the input x; p(x|z) is the generative distribution, i.e. the distribution of the data given the latent variable z; p(z) is the prior distribution, i.e. the marginal distribution of the latent variable z; and \beta is a hyperparameter that balances the reconstruction error against the KL divergence.
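Because the approximate posterior q(z|x) is a diagonal Gaussian \mathcal{N}(\mu,\mathrm{diag}(\sigma^2)) and the prior p(z) is a standard normal, the KL term has a simple closed form, which is exactly what the kl_loss in the implementation below computes:

\mathrm{KL}[q(z|x)||p(z)]=-\frac{1}{2}\sum_{j=1}^{d}\left(1+\log\sigma_j^2-\mu_j^2-\sigma_j^2\right)

where d is the dimension of the latent variable.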

By minimizing the above loss function, we learn an encoder that maps the input data x to a distribution q(z|x) over the latent space, from which the latent variable z can be sampled, thereby achieving dimensionality reduction and generation of data.
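One detail worth noting is that sampling z directly from q(z|x) is not differentiable with respect to \mu and \sigma. In practice, the sampling step is therefore rewritten with the reparameterization trick, which is what the sample_z function in the implementation below does:

z=\mu+\sigma\odot\epsilon,\qquad \epsilon\sim\mathcal{N}(0,I)

Here the randomness is confined to \epsilon, so gradients can flow through \mu and \sigma back to the encoder.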

2. VAE implementation steps

Below we introduce how to implement a basic VAE model, including the encoder, the decoder, and the loss function. We take the MNIST handwritten digits dataset as an example: it contains 60,000 training samples and 10,000 test samples, and each sample is a 28x28 grayscale image.

2.1 Data preprocessing

First, we need to preprocess the MNIST dataset: each sample is converted into a 784-dimensional vector with pixel values in the range [0, 1]. The code is as follows (the batch size used for the data loaders is an arbitrary choice):

import torch
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

# Data preprocessing.
# NOTE: ToTensor already scales pixel values to [0, 1], which matches the
# Sigmoid output of the decoder and the BCE loss used later, so no extra
# normalization is applied here.
transform = transforms.Compose([
    transforms.ToTensor()
])

# Load the training and test sets and wrap them in data loaders
trainset = MNIST(root='./data', train=True, download=True, transform=transform)
testset = MNIST(root='./data', train=False, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)   # batch size is an arbitrary choice
testloader = DataLoader(testset, batch_size=128, shuffle=False)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input_dim = 784  # each 28x28 image is flattened into a 784-dimensional vector

2.2 Define the model structure

Next, we need to define the structure of the VAE model, including the encoder, the decoder, and the sampling function for the latent variable. In this example, the encoder and decoder are MLPs with hidden layers of 256 and 128 units, and the dimension of the latent variable is 20. The code is as follows:

import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super(VAE, self).__init__()
        self.latent_dim = latent_dim

        # Encoder: maps the input to the mean and log-variance of the latent distribution
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, latent_dim*2)  # outputs mean and log-variance
        )

        # Decoder: maps a latent vector back to the original data space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()  # outputs probabilities in the range [0, 1]
        )

    # Reparameterized sampling of the latent variable: z = mu + std * eps
    def sample_z(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    # Forward pass
    def forward(self, x):
        # Encoder
        h = self.encoder(x)
        mu, logvar = h[:, :self.latent_dim], h[:, self.latent_dim:]
        z = self.sample_z(mu, logvar)

        # Decoder
        x_hat = self.decoder(z)
        return x_hat, mu, logvar

In the above code, we use MLPs for the encoder and decoder. The encoder maps the input data to the mean and log-variance of the latent distribution; both have dimension 20, so the latent variable also has dimension 20 (the encoder's final layer therefore outputs latent_dim*2 = 40 values). The decoder maps the latent variable back to the original data space, and its last layer uses the Sigmoid function to limit the output range to [0, 1].
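As a quick sanity check, we can push a random batch through the model and verify the output shapes. This is a minimal sketch that assumes the VAE class defined above; the batch size of 16 is an arbitrary choice.

import torch

model = VAE()
x = torch.rand(16, 784)  # dummy batch of 16 flattened images with values in [0, 1]
x_hat, mu, logvar = model(x)
print(x_hat.shape, mu.shape, logvar.shape)
# Expected: torch.Size([16, 784]) torch.Size([16, 20]) torch.Size([16, 20])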

When implementing the VAE model, we also need to define the loss function. In this example, it combines the reconstruction error, measured with the binary cross-entropy loss, and the KL divergence between the approximate posterior and a standard normal prior. The code is as follows:

# Loss function
def vae_loss(x_hat, x, mu, logvar, beta=1):
    # Reconstruction error (binary cross-entropy, summed over the batch)
    recon_loss = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')

    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    return recon_loss + beta*kl_loss

In the above code, the binary cross-entropy measures the reconstruction error, and the KL divergence measures the difference between the distribution of the latent variable and the prior distribution. Here \beta is a hyperparameter used to balance the reconstruction error and the KL divergence.
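For reference, the loss can be exercised on a dummy batch before wiring up the full training loop. This is a minimal sketch that assumes the VAE class and vae_loss defined above; the batch size of 8 is an arbitrary choice.

import torch

model = VAE()
x = torch.rand(8, 784)  # dummy batch of flattened images with values in [0, 1]
x_hat, mu, logvar = model(x)
loss = vae_loss(x_hat, x, mu, logvar, beta=1)
print(loss.item())  # total loss summed over the batch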

2.3 Training the model

Next, we need to define a training function and train the VAE model on the MNIST dataset. In each training step, we compute the model's loss and then use backpropagation to update the model parameters. The code is as follows:

# Training function
def train(model, dataloader, optimizer, device, beta):
    model.train()
    train_loss = 0

    for x, _ in dataloader:
        x = x.view(-1, input_dim).to(device)
        optimizer.zero_grad()
        x_hat, mu, logvar = model(x)
        loss = vae_loss(x_hat, x, mu, logvar, beta)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()

    return train_loss / len(dataloader.dataset)

Now, we can use the above training function to train the VAE model on the MNIST data set. The code is as follows:

# Define the model and the optimizer
model = VAE().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train the model
num_epochs = 50
for epoch in range(num_epochs):
    train_loss = train(model, trainloader, optimizer, device, beta=1)
    print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}')

# Evaluate the model on the test set
model.eval()
with torch.no_grad():
    test_loss = 0
    for x, _ in testloader:
        x = x.view(-1, input_dim).to(device)
        x_hat, mu, logvar = model(x)
        test_loss += vae_loss(x_hat, x, mu, logvar, beta=1).item()
    test_loss /= len(testloader.dataset)
    print(f'Test Loss: {test_loss:.4f}')

During training, we use the Adam optimizer with the hyperparameter \beta=1 to update the model parameters. After training is completed, we evaluate the loss on the test set. Since the loss combines the reconstruction error and the KL divergence, a smaller test loss indicates that the model has learned a better latent representation and generates more realistic samples.
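Besides the test loss, it is often helpful to inspect reconstructions directly. The following optional sketch, which assumes the trained model, testloader, device and input_dim from above plus matplotlib, plots a few test images next to their reconstructions:

import matplotlib.pyplot as plt

model.eval()
with torch.no_grad():
    x, _ = next(iter(testloader))
    x = x.view(-1, input_dim).to(device)
    x_hat, _, _ = model(x)

fig, axes = plt.subplots(2, 8, figsize=(16, 4))
for i in range(8):
    axes[0, i].imshow(x[i].cpu().view(28, 28), cmap='gray')      # original
    axes[1, i].imshow(x_hat[i].cpu().view(28, 28), cmap='gray')  # reconstruction
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()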

2.4 Generating samples

Finally, we can use the VAE model to generate new handwritten digit samples. Generating samples is very simple: sample randomly in the latent space and feed the samples into the decoder to produce new images. The code is as follows:

# Generate new samples
import matplotlib.pyplot as plt

n_samples = 10
model.eval()
with torch.no_grad():
    # Sample random points in the latent space
    z = torch.randn(n_samples, model.latent_dim).to(device)
    # Decode the latent vectors into new samples
    samples = model.decoder(z).cpu()
    # Reshape the flat vectors back into 28x28 images
    samples = samples.view(n_samples, 1, 28, 28)
    # Visualize the generated samples
    fig, axes = plt.subplots(1, n_samples, figsize=(20, 2))
    for i, ax in enumerate(axes):
        ax.imshow(samples[i][0], cmap='gray')
        ax.axis('off')
    plt.show()

In the above code, we randomly sample 10 points in the latent space and feed them into the decoder to generate new samples. Finally, we visualize the generated samples, which closely resemble the digits in the MNIST dataset.

In summary, we have introduced the principles, implementation, and application of the VAE model. As can be seen, VAE is a powerful generative model that can learn latent representations of high-dimensional data and use them to generate new samples.

