The AlexNet network structure, proposed by Alex Krizhevsky in 2012, won the 2012 ImageNet image recognition competition and triggered a wave of neural-network applications, making the CNN the core algorithm model in image classification.
The AlexNet model has eight layers: 5 convolutional layers and 3 fully connected layers. Every convolutional layer uses the ReLU activation function; the early convolutional layers are additionally followed by local response normalization (LRN) and then downsampling (pooling).
The first layer: convolutional layer 1. The input is a 224×224×3 image; the number of convolution kernels is 96 (in the paper, the two GPUs each compute 48 kernels); the kernel size is 11×11×3; stride = 4, where stride is the step size, and pad = 0, meaning the edges are not padded.
What is the size of the feature map after this convolution?
width = (224 + 2 × padding − kernel_size) / stride + 1 = (224 − 11) / 4 + 1 = 54 (rounded down)
height = (224 + 2 × padding − kernel_size) / stride + 1 = 54
depth = 96

Note that (224 − 11) / 4 + 1 = 54.25 is not an integer; in practice the input is usually taken as 227×227, which gives (227 − 11) / 4 + 1 = 55 exactly.
Then local response normalization (LRN) is applied, followed by max pooling with pool_size = (3, 3), stride = 2, pad = 0, which yields the feature map of the first convolutional layer.
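The output-size formula above can be checked with a small helper. This is a minimal sketch, assuming the commonly used 227×227 input (the quoted 224 does not divide evenly by the stride); pooling uses the same formula as convolution.

```python
def conv_out(size, kernel, stride, padding):
    """Output spatial size of a convolution or pooling window
    (floor division, as in most frameworks)."""
    return (size + 2 * padding - kernel) // stride + 1

# Layer 1: 11x11 kernel, stride 4, no padding, on an assumed 227x227 input
c1 = conv_out(227, kernel=11, stride=4, padding=0)
# Pooling of layer 1: 3x3 window, stride 2, no padding
p1 = conv_out(c1, kernel=3, stride=2, padding=0)
print(c1, p1)  # 55 27
```

So layer 1 produces a 55×55×96 feature map, and pooling reduces it to 27×27×96.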
Second layer: convolutional layer 2. The input is the feature map from the previous convolutional layer; the number of kernels is 256 (the two GPUs in the paper each hold 128 kernels); the kernel size is 5×5×48; pad = 2, stride = 1. LRN is then applied, followed by max pooling with pool_size = (3, 3), stride = 2.
The third layer: convolutional layer 3. The input is the output of the second layer; the number of kernels is 384, kernel_size = 3×3×256, padding = 1. The third layer uses neither LRN nor pooling.
Fourth layer: convolutional layer 4. The input is the output of the third layer; the number of kernels is 384, kernel_size = 3×3, padding = 1. Like the third layer, it uses no LRN or pooling.
The fifth layer: convolutional layer 5. The input is the output of the fourth layer; the number of kernels is 256, kernel_size = 3×3, padding = 1. Max pooling follows directly, with pool_size = (3, 3), stride = 2.
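The spatial sizes of all five convolutional layers can be traced with the same formula. A minimal sketch, again assuming a 227×227 input:

```python
def conv_out(size, kernel, stride, padding=0):
    """Output spatial size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 227                       # assumed input side (see the 224 vs 227 note above)
size = conv_out(size, 11, 4)     # conv1: 11x11, stride 4      -> 55
size = conv_out(size, 3, 2)      # pool1: 3x3, stride 2        -> 27
size = conv_out(size, 5, 1, 2)   # conv2: 5x5, pad 2           -> 27
size = conv_out(size, 3, 2)      # pool2: 3x3, stride 2        -> 13
size = conv_out(size, 3, 1, 1)   # conv3: 3x3, pad 1           -> 13
size = conv_out(size, 3, 1, 1)   # conv4: 3x3, pad 1           -> 13
size = conv_out(size, 3, 1, 1)   # conv5: 3x3, pad 1           -> 13
size = conv_out(size, 3, 2)      # pool5: 3x3, stride 2        -> 6
print(size)  # 6
```

The final feature map is therefore 6×6×256, which is what feeds the first fully connected layer.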
The 6th, 7th, and 8th layers are fully connected. The 6th and 7th layers each have 4096 neurons, and the final softmax output has 1000 units, because, as mentioned above, the ImageNet competition has 1000 categories. ReLU and Dropout are used in the fully connected layers.
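A quick way to see why the fully connected layers dominate the parameter count: with a 6×6×256 feature map flattened to 9216 inputs, the weight matrices 9216→4096→4096→1000 (plus biases) already account for roughly 58.6 million of AlexNet's approximately 60 million parameters. A minimal sketch of that arithmetic:

```python
# Layer widths of the fully connected part: flattened conv output, fc6, fc7, fc8
dims = [6 * 6 * 256, 4096, 4096, 1000]

# Parameters per FC layer = weights (in * out) + biases (out)
params = [dims[i] * dims[i + 1] + dims[i + 1] for i in range(3)]
print(params, sum(params))
# [37752832, 16781312, 4097000] 58631144
```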
The above is the detailed content of this explanation of the AlexNet network structure. For more information, please follow other related articles on the PHP Chinese website!