Convolution layer 1 has 96 convolution kernels; convolution layer 2 has 256 kernels; convolution layer 3 takes the output of the second layer as input and has 384 kernels; convolution layer 4 takes the output of the third layer as input and has 384 kernels; convolution layer 5 takes the output of the fourth layer as input and has 256 kernels.
The operating environment of this tutorial: Windows 7 system, Dell G3 computer.
The AlexNet network was designed by Hinton's student Alex Krizhevsky and won the 2012 ImageNet competition. In the years after that, more and deeper neural networks were proposed, such as the excellent VGG and GoogLeNet. The officially provided model reaches a top-1 accuracy of 57.1% and a top-5 accuracy of 80.2%, which is already quite outstanding compared with traditional machine learning classification algorithms.
Network structure analysis
The picture above shows the network structure of AlexNet in Caffe. Because two GPUs are used for training, the structure is drawn as two parallel flow paths. The AlexNet network model is interpreted layer by layer as follows:
Layer 1: convolution layer 1. The input is a 224 × 224 × 3 image and the number of convolution kernels is 96; in the paper, the two GPUs each compute 48 kernels. The kernel size is 11 × 11 × 3; stride = 4, where stride is the step size, and pad = 0, meaning the edges are not padded. What is the size of the feature map after convolution? wide = (224 + 2 * padding - kernel_size) / stride + 1 = 54, height = (224 + 2 * padding - kernel_size) / stride + 1 = 54, dimension = 96. Local Response Normalization (LRN) is then applied, followed by pooling with pool_size = (3, 3), stride = 2, pad = 0, which gives the feature map that is the final output of the first convolution layer.
Layer 2: convolution layer 2. The input is the feature map of the previous convolution layer and the number of kernels is 256; in the paper, the two GPUs each have 128 kernels. The kernel size is 5 × 5 × 48; pad = 2, stride = 1. LRN is applied next, followed by max pooling with pool_size = (3, 3), stride = 2.
Layer 3: convolution layer 3. The input is the output of the second layer, the number of kernels is 384, kernel_size = 3 × 3 × 256, padding = 1. The third layer has no LRN and no pooling.
Layer 4: convolution layer 4. The input is the output of the third layer, the number of kernels is 384, kernel_size = 3 × 3, padding = 1. Like the third layer, it has no LRN and no pooling.
Layer 5: convolution layer 5. The input is the output of the fourth layer, the number of kernels is 256, kernel_size = 3 × 3, padding = 1. It is followed directly by max pooling with pool_size = (3, 3), stride = 2.
Layers 6, 7 and 8 are fully connected layers; each layer has 4096 neurons, and the final softmax output is 1000 because, as mentioned above, the ImageNet competition has 1000 classes. ReLU and Dropout are used in the fully connected layers.
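The spatial sizes above all follow the same formula, with the result rounded down. As a quick sanity check, the following minimal Python sketch (not part of the original article) applies that formula to each convolution and pooling stage using the parameters listed above:

def out_size(in_size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling stage, rounded down."""
    return (in_size + 2 * pad - kernel) // stride + 1

size = 224                                            # input width/height used in the text
size = out_size(size, kernel=11, stride=4, pad=0)     # conv1 -> 54
size = out_size(size, kernel=3,  stride=2, pad=0)     # pool1 -> 26
size = out_size(size, kernel=5,  stride=1, pad=2)     # conv2 -> 26
size = out_size(size, kernel=3,  stride=2, pad=0)     # pool2 -> 12
size = out_size(size, kernel=3,  stride=1, pad=1)     # conv3 -> 12
size = out_size(size, kernel=3,  stride=1, pad=1)     # conv4 -> 12
size = out_size(size, kernel=3,  stride=1, pad=1)     # conv5 -> 12
size = out_size(size, kernel=3,  stride=2, pad=0)     # pool5 -> 5
print(size)                                           # 5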
Using the drawing tool that comes with Caffe (caffe/python/draw_net.py), the network structure diagram can be drawn from train_val.prototxt in the caffe/models/bvlc_alexnet/ directory as follows:
python3 draw_net.py --rankdir TB ../models/bvlc_alexnet/train_val.prototxt AlexNet_structure.jpg
Algorithm Innovation Points
(1) Successfully used ReLU as the activation function of the CNN, verified that it outperforms Sigmoid in deeper networks, and solved the vanishing-gradient problem that Sigmoid suffers from in deeper networks. Although the ReLU activation function had been proposed long before, it was not until AlexNet that it became widely adopted.
(2) Used Dropout to randomly ignore some neurons during training and avoid overfitting. Although Dropout had been discussed in a separate paper, AlexNet made it practical and confirmed its effect in practice. In AlexNet, Dropout is mainly used in the last few fully connected layers.
(3) Used overlapping max pooling in the CNN. Previously, average pooling was commonly used in CNNs; AlexNet used max pooling throughout to avoid the blurring effect of average pooling. In addition, AlexNet made the stride smaller than the pooling kernel size, so the outputs of the pooling layer overlap, which improves the richness of the features.
(4) Proposed the LRN layer, which creates a competition mechanism among the activities of local neurons: values with larger responses become relatively larger while neurons with smaller feedback are suppressed, enhancing the generalization ability of the model. The normalization formula from the AlexNet paper is shown below.
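In the paper, the response-normalized activity b^i_{x,y} is computed from the raw activity a^i_{x,y} of kernel i at position (x, y) as:

b^i_{x,y} = a^i_{x,y} / \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} (a^j_{x,y})^2 \right)^{\beta}

where N is the total number of kernels in the layer, and the paper uses k = 2, n = 5, \alpha = 10^{-4}, \beta = 0.75.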
(5) Multi-GPU training can increase the scale of network training.
(6) Data input from the million-scale ImageNet image data set. Three data augmentation methods are used in AlexNet:
Translation transformation (crop);
Reflection transformation (flip);
Illumination and color transformation (color jittering). During training, the picture is first randomly translated (cropped) and then flipped horizontally. At test time, five crops are taken at the upper left, upper right, lower left, lower right and center, each crop is also flipped, and the results are averaged. A minimal crop-and-flip sketch is given below.
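The following is a minimal NumPy sketch of the random-crop plus horizontal-flip augmentation described above (not part of the original article; the 227 crop size and 256 source size are assumptions matching the usual Caffe AlexNet setup):

import numpy as np

def random_crop_and_flip(img, crop=227):
    """Randomly crop a crop x crop patch and flip it horizontally with probability 0.5.

    img: H x W x C uint8 array (e.g. a 256 x 256 ImageNet image).
    """
    h, w, _ = img.shape
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]          # horizontal flip (mirror left-right)
    return patch

# usage: augment one 256 x 256 training image
img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
sample = random_crop_and_flip(img)
print(sample.shape)                     # (227, 227, 3)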
To summarize:
Use ReLU activation function;
Propose Dropout to prevent overfitting;
Use data augmentation to enhance the data set (Data augmentation);
Horizontal flipping of images, random cropping, translation transformation, color transformation, lighting transformation, etc.
Split the output of the previous layer into 2 parts along the channel dimension and send them to 2 GPUs respectively. For example, a 27×27×96 feature map output by the previous layer is divided into two groups of 27×27×48 and placed on two different GPUs for computation;
Use of LRN local normalization;
Use overlapping pooling (3×3 pooling kernel, stride 2).
A rough single-GPU sketch of this whole structure, written with Caffe's Python NetSpec interface, is given below.
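This sketch is only an illustrative approximation of models/bvlc_alexnet/train_val.prototxt (the layer names, the 227×227 dummy input and the exact LRN/Dropout hyperparameters are assumptions), not a verbatim reproduction:

import caffe
from caffe import layers as L, params as P

def alexnet_sketch():
    n = caffe.NetSpec()
    n.data = L.Input(shape=dict(dim=[1, 3, 227, 227]))           # dummy input instead of a Data layer

    # conv1: 96 kernels, 11x11, stride 4, followed by ReLU, LRN and overlapping max pooling
    n.conv1 = L.Convolution(n.data, num_output=96, kernel_size=11, stride=4)
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    n.norm1 = L.LRN(n.conv1, local_size=5, alpha=1e-4, beta=0.75)
    n.pool1 = L.Pooling(n.norm1, pool=P.Pooling.MAX, kernel_size=3, stride=2)

    # conv2: 256 kernels, 5x5, pad 2, then ReLU, LRN, max pooling
    n.conv2 = L.Convolution(n.pool1, num_output=256, kernel_size=5, pad=2)
    n.relu2 = L.ReLU(n.conv2, in_place=True)
    n.norm2 = L.LRN(n.conv2, local_size=5, alpha=1e-4, beta=0.75)
    n.pool2 = L.Pooling(n.norm2, pool=P.Pooling.MAX, kernel_size=3, stride=2)

    # conv3-conv5: 3x3 kernels, pad 1; no LRN, and pooling only after conv5
    n.conv3 = L.Convolution(n.pool2, num_output=384, kernel_size=3, pad=1)
    n.relu3 = L.ReLU(n.conv3, in_place=True)
    n.conv4 = L.Convolution(n.conv3, num_output=384, kernel_size=3, pad=1)
    n.relu4 = L.ReLU(n.conv4, in_place=True)
    n.conv5 = L.Convolution(n.conv4, num_output=256, kernel_size=3, pad=1)
    n.relu5 = L.ReLU(n.conv5, in_place=True)
    n.pool5 = L.Pooling(n.conv5, pool=P.Pooling.MAX, kernel_size=3, stride=2)

    # fc6-fc8: 4096 -> 4096 -> 1000, with ReLU and Dropout in the fully connected layers
    n.fc6 = L.InnerProduct(n.pool5, num_output=4096)
    n.relu6 = L.ReLU(n.fc6, in_place=True)
    n.drop6 = L.Dropout(n.fc6, dropout_ratio=0.5, in_place=True)
    n.fc7 = L.InnerProduct(n.fc6, num_output=4096)
    n.relu7 = L.ReLU(n.fc7, in_place=True)
    n.drop7 = L.Dropout(n.fc7, dropout_ratio=0.5, in_place=True)
    n.fc8 = L.InnerProduct(n.fc7, num_output=1000)                # 1000 ImageNet classes
    n.prob = L.Softmax(n.fc8)
    return n.to_proto()

if __name__ == "__main__":
    print(alexnet_sketch())                                       # prints the generated prototxt

Printing the generated prototxt and comparing it with the official train_val.prototxt is a quick way to check the layer parameters.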
Training under the Caffe framework
Prepare the data set, modify train.prototxt of the AlexNet network, configure the solver and deploy.prototxt files, and create a new train.sh script to start training; train.sh typically just wraps the caffe train command with the path to the solver file. Training can also be launched from pycaffe, as sketched below.
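A minimal sketch of launching the same training from pycaffe (the solver path below is only an example; point it at your own solver.prototxt):

import caffe

caffe.set_device(0)                 # GPU id to use
caffe.set_mode_gpu()                # use caffe.set_mode_cpu() if no GPU is available

# load the solver configuration; the path is an assumption for illustration
solver = caffe.SGDSolver('models/bvlc_alexnet/solver.prototxt')

# run the full training loop defined in the solver (max_iter, snapshots, ...)
solver.solve()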