Improve PyTorch key points and improve the optimizer!
Hi, I’m Xiaozhuang!
Today we'll talk about optimizers in PyTorch.
The choice of optimizer directly affects both how well a deep learning model trains and how fast it trains. Different optimizers suit different problems: depending on the task, one may converge faster, another more stably, and a third may reach a better final result. Choosing the right optimizer is therefore an important part of tuning a model, and the decision should be based on the characteristics of the specific problem.
PyTorch provides a variety of optimizers for training neural networks and updating model weights, including the common SGD, Adam, and RMSprop. Each has its own characteristics and applicable scenarios, and an appropriate choice can accelerate convergence and improve training results. To use an optimizer, you pass it the model parameters, set hyperparameters such as the learning rate and weight decay, and define a loss function whose gradients drive the updates.
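As a minimal sketch of that setup, the snippet below builds an optimizer with a learning rate, momentum, and weight decay, then runs a single update step. The tiny linear model, the random data, and the hyperparameter values are placeholders chosen only for illustration, not recommendations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, used only to illustrate the setup.
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))

criterion = nn.CrossEntropyLoss()
# Hyperparameters are passed when the optimizer is constructed.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)

# Keep a copy of the weights so we can confirm that step() changed them.
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()                     # clear gradients from any previous step
loss = criterion(model(inputs), targets)  # forward pass
loss.backward()                           # compute gradients
optimizer.step()                          # update the weights in place

changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(f"loss={loss.item():.4f}, parameters changed: {changed}")
print("learning rate in use:", optimizer.param_groups[0]["lr"])
```

The same four calls (`zero_grad`, forward pass, `backward`, `step`) are the core of every training loop, whichever optimizer is chosen.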
Let us first list some commonly used optimizers in PyTorch and give a brief introduction to them:
(1) SGD (Stochastic Gradient Descent)
SGD is a commonly used optimization algorithm for minimizing the loss function of a machine learning model. It estimates the gradient of the loss with respect to the weights from a randomly selected mini-batch of samples, then updates the weights in the negative direction of that gradient, so the model improves gradually over many iterations. Its main advantage is high computational efficiency, and it is widely used in machine learning and deep learning.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
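To make the update rule concrete, here is a small sketch that applies `w <- w - lr * gradient` by hand and checks that `torch.optim.SGD` (without momentum) produces the same weights. The regression data, the batch size of 4, and the names `w_manual` and `w_optim` are assumptions made for this example.

```python
import torch

torch.manual_seed(0)
learning_rate = 0.1

# Hypothetical toy problem: fit w so that x @ w approximates y.
x = torch.randn(16, 3)
y = torch.randn(16)

w_manual = torch.zeros(3, requires_grad=True)
w_optim = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([w_optim], lr=learning_rate)


def mse(w, idx):
    """Mean squared error on the mini-batch selected by idx."""
    return ((x[idx] @ w - y[idx]) ** 2).mean()


for _ in range(5):
    idx = torch.randperm(16)[:4]  # random mini-batch of 4 samples

    # Manual update: move against the gradient of the mini-batch loss.
    (grad,) = torch.autograd.grad(mse(w_manual, idx), w_manual)
    with torch.no_grad():
        w_manual -= learning_rate * grad

    # The same update through the optimizer API.
    optimizer.zero_grad()
    mse(w_optim, idx).backward()
    optimizer.step()

print("manual:   ", w_manual.detach())
print("optimizer:", w_optim.detach())
```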
(2) Adam
Adam is an adaptive learning rate optimization algorithm that combines the ideas of AdaGrad and RMSProp. It keeps moving averages of both the gradient and the squared gradient, and uses them to compute a separate effective step size for each parameter. Compared with plain gradient descent, which applies one learning rate to all parameters, this per-parameter adaptation often improves convergence speed and model performance.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
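The sketch below writes out the Adam update by hand, with bias-corrected moving averages of the gradient and the squared gradient, and compares it with `torch.optim.Adam` over a few steps. The hyperparameter values are PyTorch's defaults for Adam; the toy regression problem and the variable names are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8  # PyTorch defaults for Adam

# Hypothetical toy problem in float64 so the comparison is tight.
x = torch.randn(16, 3, dtype=torch.float64)
y = torch.randn(16, dtype=torch.float64)


def mse(w):
    return ((x @ w - y) ** 2).mean()


w_manual = torch.zeros(3, dtype=torch.float64, requires_grad=True)
w_optim = torch.zeros(3, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.Adam([w_optim], lr=lr, betas=(beta1, beta2), eps=eps)

m = torch.zeros(3, dtype=torch.float64)  # moving average of gradients
v = torch.zeros(3, dtype=torch.float64)  # moving average of squared gradients

for t in range(1, 6):
    (g,) = torch.autograd.grad(mse(w_manual), w_manual)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    with torch.no_grad():
        # Each coordinate gets its own step size through v_hat.
        w_manual -= lr * m_hat / (v_hat.sqrt() + eps)

    optimizer.zero_grad()
    mse(w_optim).backward()
    optimizer.step()

print("manual:   ", w_manual.detach())
print("optimizer:", w_optim.detach())
```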
(3) Adagrad
Adagrad is an adaptive learning rate optimization algorithm that scales each parameter's learning rate by the accumulated sum of its past squared gradients. Because that sum only grows, the effective learning rate keeps shrinking, and training may stall before the model has converged.
optimizer = torch.optim.Adagrad(model.parameters(), lr=learning_rate)
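The shrinking step size is easy to see in plain Python. This sketch tracks Adagrad's effective learning rate, `lr / (sqrt(sum of squared gradients) + eps)`, for a single parameter that receives a constant gradient of 1.0. The function name and the constant-gradient input are made up for the illustration.

```python
import math


def adagrad_step_sizes(gradients, lr=0.1, eps=1e-10):
    """Return the effective learning rate Adagrad applies at each step."""
    accumulated = 0.0  # running sum of squared gradients, never decays
    sizes = []
    for g in gradients:
        accumulated += g * g
        sizes.append(lr / (math.sqrt(accumulated) + eps))
    return sizes


sizes = adagrad_step_sizes([1.0] * 100)
for step in (1, 10, 100):
    print(f"step {step:3d}: effective lr = {sizes[step - 1]:.4f}")
```

With a constant gradient the effective rate falls off like `lr / sqrt(t)`, so after 100 steps it is a tenth of its starting value.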
(4) RMSProp
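RMSProp is also an adaptive learning rate algorithm. It scales the learning rate by an exponential moving average of the squared gradient, so old gradients are gradually forgotten and the step size does not shrink toward zero the way it does in Adagrad.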
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
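A plain-Python sketch of that scaling, assuming a single parameter with a constant gradient of 1.0: the effective learning rate is `lr / (sqrt(moving average of squared gradients) + eps)`. The default values mirror those of `torch.optim.RMSprop` (`lr=0.01`, `alpha=0.99`, `eps=1e-8`); the function name is hypothetical.

```python
import math


def rmsprop_step_sizes(gradients, lr=0.01, alpha=0.99, eps=1e-8):
    """Return the effective learning rate RMSProp applies at each step."""
    square_avg = 0.0  # exponential moving average of squared gradients
    sizes = []
    for g in gradients:
        square_avg = alpha * square_avg + (1 - alpha) * g * g
        sizes.append(lr / (math.sqrt(square_avg) + eps))
    return sizes


rms_sizes = rmsprop_step_sizes([1.0] * 2000)
for step in (1, 100, 2000):
    print(f"step {step:4d}: effective lr = {rms_sizes[step - 1]:.4f}")
```

Because the average forgets old gradients, the effective rate settles near `lr` instead of decaying indefinitely. The large values in the first few steps come from the average starting at zero.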
(5) Adadelta
Adadelta is an adaptive learning rate optimization algorithm that extends Adagrad and is closely related to RMSProp. It dynamically adjusts the step size using two moving averages: one of the squared gradients and one of the squared parameter updates.
optimizer = torch.optim.Adadelta(model.parameters(), lr=learning_rate)
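As a rough sketch of how the two moving averages interact, the plain-Python function below applies an Adadelta-style update to one parameter of a simple quadratic loss. The default values follow those of `torch.optim.Adadelta` (`lr=1.0`, `rho=0.9`, `eps=1e-6`); the function names and the loss `(w - 3)^2` are assumptions for the example.

```python
import math


def adadelta_minimize(grad_fn, w, steps=100, lr=1.0, rho=0.9, eps=1e-6):
    """Run a few Adadelta-style updates on a single scalar parameter."""
    square_avg = 0.0  # moving average of squared gradients
    acc_delta = 0.0   # moving average of squared updates
    for _ in range(steps):
        g = grad_fn(w)
        square_avg = rho * square_avg + (1 - rho) * g * g
        # The update is the gradient rescaled by RMS(update) / RMS(gradient).
        delta = math.sqrt(acc_delta + eps) / math.sqrt(square_avg + eps) * g
        acc_delta = rho * acc_delta + (1 - rho) * delta * delta
        w -= lr * delta
    return w


def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    return 2.0 * (w - 3.0)


w_final = adadelta_minimize(grad, w=0.0)
print(f"w moved from 0.0 to {w_final:.4f}, loss {loss(0.0):.4f} -> {loss(w_final):.4f}")
```

The first updates are tiny because both averages start at zero; the step size grows as the average of past updates builds up.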
Here, let’s talk about how to use PyTorch to train a simple convolutional neural network (CNN) for handwritten digit recognition.
This case uses the MNIST data set, and uses the Matplotlib library to draw the loss curve and accuracy curve.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Set the random seed
torch.manual_seed(42)

# Define the data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

# Download and load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)


# Define a simple convolutional neural network model
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))  # 28x28 -> 14x14
        x = self.pool(self.relu(self.conv2(x)))  # 14x14 -> 7x7
        x = x.view(-1, 64 * 7 * 7)               # flatten for the linear layers
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x


# Create the model, loss function, and optimizer
model = CNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 5
train_losses = []
train_accuracies = []

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    accuracy = correct / total
    train_losses.append(total_loss / len(train_loader))
    train_accuracies.append(accuracy)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {train_losses[-1]:.4f}, Accuracy: {accuracy:.4f}")

# Plot the loss curve and the accuracy curve
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Training Loss')
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Training Accuracy')
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

# Evaluate the model on the test set
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f"Accuracy on test set: {accuracy * 100:.2f}%")
In the above code, we define a simple convolutional neural network (CNN) and train it with cross-entropy loss and the Adam optimizer.
During training, we record the loss and accuracy of each epoch and use the Matplotlib library to draw the loss curve and accuracy curve. Finally, we evaluate the trained model on the test set.
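To try a different optimizer in the MNIST example, only the line that constructs `optimizer` needs to change. As a quick way to see how the choices behave side by side without downloading any data, the sketch below trains the same tiny linear regression with each of the five optimizers discussed above. The synthetic data, the learning rates, and the helper `train_once` are assumptions for this illustration, and results on a toy problem say little about which optimizer suits a real task.

```python
import torch
import torch.nn as nn


def train_once(make_optimizer, steps=200):
    """Train a one-weight linear model and return (initial_loss, final_loss)."""
    torch.manual_seed(0)  # same data and initial weights for every optimizer
    x = torch.randn(64, 1)
    y = 2.0 * x + 1.0     # synthetic target: slope 2, intercept 1
    model = nn.Linear(1, 1)
    criterion = nn.MSELoss()
    optimizer = make_optimizer(model.parameters())

    initial = criterion(model(x), y).item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    final = criterion(model(x), y).item()
    return initial, final


# Learning rates here are illustrative guesses, not tuned values.
optimizers = {
    "SGD": lambda params: torch.optim.SGD(params, lr=0.1),
    "Adam": lambda params: torch.optim.Adam(params, lr=0.05),
    "Adagrad": lambda params: torch.optim.Adagrad(params, lr=0.1),
    "RMSprop": lambda params: torch.optim.RMSprop(params, lr=0.01),
    "Adadelta": lambda params: torch.optim.Adadelta(params, lr=1.0),
}

results = {name: train_once(make) for name, make in optimizers.items()}
for name, (initial, final) in results.items():
    print(f"{name:9s} loss: {initial:.4f} -> {final:.6f}")
```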
I’m Xiaozhuang, see you next time!