Variational Autoencoders: Theory and Implementation
A Variational Autoencoder (VAE) is a generative model based on neural networks. Its goal is to learn a low-dimensional latent representation of high-dimensional data and to use these latent variables for data reconstruction and generation. Compared with a traditional autoencoder, a VAE can generate more realistic and diverse samples because it learns a distribution over the latent space. The implementation of a VAE is described in detail below.
The basic idea of a VAE is to map high-dimensional data into a low-dimensional latent space for dimensionality reduction and reconstruction. It consists of two parts: an encoder and a decoder. The encoder maps the input data x to the mean μ and variance σ² of a distribution in the latent space. The VAE can then sample from this distribution and reconstruct the samples back into the original data space through the decoder. This encoder-decoder structure gives the latent space good continuity, so that similar samples lie close to each other in latent space, and it allows the VAE to be used not only for dimensionality reduction but also for generating new samples. Formally, the encoder computes:
\begin{aligned} \mu &= f_{\mu}(x) \\ \sigma^2 &= f_{\sigma}(x) \end{aligned}
where f_{\mu} and f_{\sigma} can be any neural network model. Typically, a Multilayer Perceptron (MLP) is used to implement the encoder. In practice, the network usually predicts \log\sigma^2 rather than \sigma^2, which guarantees that the variance is always positive.
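As an illustration, a minimal encoder sketch might look like the following (the class name Encoder and the two separate output heads are assumptions for this sketch; the full implementation later in this article instead uses two hidden layers and emits a single vector of size 2×latent_dim):

import torch.nn as nn

# Minimal encoder sketch: one hidden layer that predicts mu and log-variance
class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # f_mu(x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # predicts log(sigma^2)

    def forward(self, x):
        h = self.hidden(x)
        return self.fc_mu(h), self.fc_logvar(h)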
The decoder maps the latent variable z back to the original data space, that is:
x'=g(z)
where g can also be any neural network model. Likewise, an MLP is usually used to implement the decoder.
In a VAE, the latent variable z is sampled from a prior distribution (usually a standard Gaussian), that is:
z\sim\mathcal{N}(0,I)
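During training, however, z is drawn from the approximate posterior q(z|x) rather than from the prior. To keep sampling differentiable, the reparameterization trick is used: a standard Gaussian noise vector ε is sampled and transformed with the encoder outputs,

z=\mu+\sigma\odot\epsilon,\quad \epsilon\sim\mathcal{N}(0,I)

This is exactly what the sample_z function in the implementation below does, recovering σ from the predicted log-variance as \sigma=\exp(\tfrac{1}{2}\log\sigma^2).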
In this way, the VAE can be trained by minimizing the reconstruction error together with the KL divergence of the latent variables, thereby achieving both dimensionality reduction and data generation. Specifically, the loss function of the VAE can be expressed as:
\mathcal{L}=-\mathbb{E}_{z\sim q(z|x)}[\log p(x|z)]+\beta\,\mathrm{KL}[q(z|x)||p(z)]
where q(z|x) is the approximate posterior, that is, the conditional distribution of the latent variable z given the input x; p(x|z) is the generative distribution, that is, the distribution of the data given the latent variable z; p(z) is the prior distribution, that is, the marginal distribution of the latent variable z; and \beta is a hyperparameter used to balance the reconstruction error and the KL divergence.
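Because q(z|x) is a diagonal Gaussian \mathcal{N}(\mu,\sigma^2 I) and the prior p(z) is the standard Gaussian \mathcal{N}(0,I), the KL term has a closed form, which is what the loss code below computes:

\mathrm{KL}[q(z|x)||p(z)]=-\frac{1}{2}\sum_{j=1}^{d}\left(1+\log\sigma_j^2-\mu_j^2-\sigma_j^2\right)

where d is the dimension of the latent space.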
By minimizing the above loss function, we learn an encoding function that maps the input data x to the latent-space distribution q(z|x), from which the latent variable z can be sampled, thereby achieving both dimensionality reduction and data generation.
Below we will introduce how to implement a basic VAE model, including the encoder, the decoder and the loss function definition. We take the MNIST handwritten digits data set as an example. This data set contains 60,000 training samples and 10,000 test samples; each sample is a 28x28 grayscale image.
First, we need to preprocess the MNIST data set to convert each sample into a 784-dimensional vector and normalize it to the range [0,1]. The code is as follows:
import torch
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

# Data preprocessing: ToTensor already scales pixel values to the range [0, 1],
# which matches the Sigmoid output and binary cross-entropy loss used below
transform = transforms.Compose([
    transforms.ToTensor(),  # convert the image to a Tensor in [0, 1]
])

# Load the training and test sets and wrap them in data loaders
# (the batch size is not specified in the text; 128 is used here as an example)
trainset = MNIST(root='./data', train=True, download=True, transform=transform)
testset = MNIST(root='./data', train=False, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)
testloader = DataLoader(testset, batch_size=128, shuffle=False)
Next, we need to define the structure of the VAE model, including the encoder, the decoder and the sampling function for the latent variable. In this example, we use a two-layer MLP as both the encoder and the decoder, with 256 and 128 hidden units in the two layers respectively. The dimension of the latent variable is 20. The code is as follows:
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super(VAE, self).__init__()
        self.latent_dim = latent_dim
        # Encoder: maps x to the mean and log-variance of the latent distribution
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, latent_dim*2)  # outputs the mean and log-variance
        )
        # Decoder: maps z back to the data space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()  # outputs probabilities in the range [0, 1]
        )

    # Sampling function for the latent variable (reparameterization trick)
    def sample_z(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    # Forward pass
    def forward(self, x):
        # Encoder
        h = self.encoder(x)
        mu, logvar = h[:, :self.latent_dim], h[:, self.latent_dim:]
        z = self.sample_z(mu, logvar)
        # Decoder
        x_hat = self.decoder(z)
        return x_hat, mu, logvar
In the above code, we use a two-layer MLP for both the encoder and the decoder. The encoder maps the input data to the mean and log-variance of the latent distribution; each has dimension 20, matching the dimension of the latent variable. The decoder maps latent variables back to the original data space, and its last layer uses the Sigmoid function to limit the output range to [0, 1].
When implementing the VAE model, we also need to define the loss function. In this example, the loss combines the reconstruction error and the KL divergence: the reconstruction error uses the binary cross-entropy loss, and the KL divergence is computed against a standard normal prior. The code is as follows:
# Define the loss function
def vae_loss(x_hat, x, mu, logvar, beta=1):
    # Reconstruction error (binary cross-entropy, summed over the batch)
    recon_loss = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl_loss
In the above code, the binary cross-entropy term measures the reconstruction error, and the KL term measures how far the distribution of the latent variable is from the prior. Here, \beta is a hyperparameter used to balance the reconstruction error and the KL divergence.
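As a quick sanity check (a sketch only; the random batch below is hypothetical dummy data, not MNIST), the model and loss can be exercised to confirm the tensor shapes and that a single scalar loss comes out:

# Dummy batch of 16 "images" with values in [0, 1], for shape-checking only
x = torch.rand(16, 784)
model = VAE()
x_hat, mu, logvar = model(x)                   # x_hat: (16, 784), mu/logvar: (16, 20)
print(vae_loss(x_hat, x, mu, logvar, beta=1))  # one scalar, summed over the batch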
Finally, we need to define the training function and train the VAE model on the MNIST data set. In each training step we compute the loss of the model and then use backpropagation to update the model parameters. The code is as follows:
input_dim = 784  # flattened 28x28 images

# Define the training function
def train(model, dataloader, optimizer, device, beta):
    model.train()
    train_loss = 0
    for x, _ in dataloader:
        x = x.view(-1, input_dim).to(device)
        optimizer.zero_grad()
        x_hat, mu, logvar = model(x)
        loss = vae_loss(x_hat, x, mu, logvar, beta)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()
    return train_loss / len(dataloader.dataset)
Now, we can use the above training function to train the VAE model on the MNIST data set. The code is as follows:
# Define the device, model and optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VAE().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train the model
num_epochs = 50
for epoch in range(num_epochs):
    train_loss = train(model, trainloader, optimizer, device, beta=1)
    print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}')

# Evaluate the model on the test set
model.eval()
with torch.no_grad():
    test_loss = 0
    for x, _ in testloader:
        x = x.view(-1, input_dim).to(device)
        x_hat, mu, logvar = model(x)
        test_loss += vae_loss(x_hat, x, mu, logvar, beta=1).item()
    test_loss /= len(testloader.dataset)
    print(f'Test Loss: {test_loss:.4f}')
During training, we use the Adam optimizer with the hyperparameter \beta=1 to update the model parameters. After training is completed, we compute the loss on the test set. Since the loss combines the reconstruction error and the KL divergence, a smaller test loss indicates that the model has learned a better latent representation and generates more realistic samples.
Finally, we can use the trained VAE model to generate new handwritten digit samples. The generation process is very simple: we just sample randomly in the latent space and feed the samples into the decoder to produce new images. The code is as follows:
import matplotlib.pyplot as plt

# Generate new samples
n_samples = 10
latent_dim = 20  # must match the latent dimension of the trained model
with torch.no_grad():
    # Sample randomly in the latent space
    z = torch.randn(n_samples, latent_dim).to(device)
    # Decode the samples
    samples = model.decoder(z).cpu()
    # Reshape the samples back into images
    samples = samples.view(n_samples, 1, 28, 28)

# Visualize the generated samples
fig, axes = plt.subplots(1, n_samples, figsize=(20, 2))
for i, ax in enumerate(axes):
    ax.imshow(samples[i][0], cmap='gray')
    ax.axis('off')
plt.show()
In the above code, we randomly sample 10 points in the latent space and feed them into the decoder to generate new samples. Finally, we visualize the generated samples; they look very similar to the digits in the MNIST data set.
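As mentioned earlier, the latent space learned by a VAE has good continuity. A short sketch (reusing the trained model above; the variable names z_start, z_end and steps are introduced here purely for illustration) can decode points interpolated between two random latent vectors:

# Linear interpolation between two random points in the latent space
with torch.no_grad():
    z_start = torch.randn(1, latent_dim).to(device)
    z_end = torch.randn(1, latent_dim).to(device)
    steps = 10
    # Blend z_start into z_end in equal steps and decode each intermediate point
    zs = torch.cat([z_start + (z_end - z_start) * t / (steps - 1) for t in range(steps)])
    interpolated = model.decoder(zs).cpu().view(steps, 1, 28, 28)

fig, axes = plt.subplots(1, steps, figsize=(20, 2))
for i, ax in enumerate(axes):
    ax.imshow(interpolated[i][0], cmap='gray')
    ax.axis('off')
plt.show()

If the latent space is smooth, the intermediate images should change gradually from one digit shape to another.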
In summary, we have introduced the principle, implementation and application of the VAE model. As we have seen, the VAE is a powerful generative model that can learn latent representations of high-dimensional data and use those representations to generate new samples.