Translator | Zhu Xianzhong

Reviewer | Sun Shujuan

Deep learning neural networks have received a lot of attention recently. The reason is that they are the technology behind today's speech recognition, face detection, voice control, self-driving cars, and brain tumor detection, technologies that were not part of our lives 20 years ago. Although these neural networks look complex, they learn just like humans do: by working through a variety of examples. The difference is that a neural network is trained on large data sets and optimized across multiple network layers and many iterations in order to obtain the best results.

Over the past 20 years, the exponential growth in computing power and data volume has created the perfect conditions for the development of deep learning neural networks. Although we stumble over fancy terms like machine learning and artificial intelligence, in reality these techniques are nothing more than linear algebra and calculus combined with computation.

Frameworks such as Keras, PyTorch, and TensorFlow simplify the difficult process of building, training, validating, and deploying custom deep neural networks. When it comes to creating real-life deep learning applications, these frameworks are clearly the first choice. Still, sometimes it is crucial to take a step back in order to move forward, and by that I mean really understanding what is going on behind the scenes of the framework. In this article, we will do this by creating a deep neural network and applying it to an image classification problem, building the network itself with nothing but NumPy. You may get lost somewhere during the calculations, especially during backpropagation, which relies on calculus, but don't worry. Intuition about the process is more important than the calculations themselves, because the frameworks take care of those.

In this article, we will build an image classification (cat or no cat) neural network, which will be trained using a total of 1652 images from two sets. Of these, 852 are cat images from the Dog and Cat Image Dataset, and the other 800 are from the Unsplash Random Image set. To start, we need to convert the images into arrays. We will speed up the calculation by reducing the original size to 128x128 pixels, because training the model on images at their original size would take a long time. All these 128x128 images have three color layers (red, green, and blue); when mixed, these layers reproduce the original colors of the image. Each of the 128x128 pixels in each image has a red, a green, and a blue value ranging from 0 to 255, and these are the values in our image vector. Therefore, in our calculations we will be dealing with 128x128x3 arrays for 1652 images. To run these arrays through a network, each one needs to be reshaped by stacking the three color layers into a single vector, as shown in the image below. We will then get an array of size (49152, 1652), which will be split: 1323 image vectors are used to train the model, and the rest are used to test it by predicting the image classification (cat or no cat) with the trained model. After comparing these predictions with the true classification labels of the images, it will be possible to estimate the accuracy of the model.

Figure 1: The process of converting an image to a vector
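The conversion described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the `images` array is a hypothetical stand-in filled with random values, not the real data set, and the reshape mirrors the one applied later in the preprocessing step.

```python
import numpy as np

# Hypothetical stand-in for a batch of loaded images: 4 random RGB images
# of 128x128 pixels, with integer values in the 0-255 range.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 128, 128, 3))

# Flatten each image into one row of 128*128*3 = 49152 values, then
# transpose so that each image becomes one column.
flattened = images.reshape(images.shape[0], -1).T

print(flattened.shape)  # (49152, 4)
```

Note that this reshape walks through each image pixel by pixel, so the red, green, and blue values of a pixel end up next to each other. The exact ordering does not matter to the network as long as it is the same for every image.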

With the training vectors explained, it is now time to discuss the network architecture, shown in Figure 2. Since each training vector holds 49152 values, the input layer of the model must have the same number of nodes (or neurons). Then come three hidden layers before the output layer, which outputs the probability that there is a cat in the picture. In real-life models, there are usually more than 3 hidden layers, because the network needs to be deeper to perform well in a big data environment.

In this article, we only use three hidden layers because they are good enough for a simple classification model. Although the architecture only has 4 layers (the input layer is not counted), the code can create deeper neural networks, because the layer dimensions are passed as a parameter to the training function.
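To get a feel for the size of this architecture, the number of trainable parameters can be counted from the layer dimensions. The sketch below assumes the layer sizes used later in the article, [49152, 20, 7, 5, 1]; the helper `count_parameters` is a hypothetical name introduced only for this illustration.

```python
layers_dimensions = [49152, 20, 7, 5, 1]

def count_parameters(dims):
    """Count weights and biases for a fully connected network."""
    total = 0
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        # One weight per connection plus one bias per node of the layer
        total += n_out * n_in + n_out
    return total

total_parameters = count_parameters(layers_dimensions)
print(total_parameters)  # 983253
```

Almost all of the parameters sit in the first layer, because every one of the 49152 input values connects to each of the 20 nodes of the first hidden layer.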

Figure 2: Network Architecture

So far, we have explained the image vectors and the network architecture; next, we describe the optimization algorithm, gradient descent, shown in Figure 3. Again, don't worry if you can't follow all the steps right away, as each step shown in the diagram is covered in detail later in the coding section.

Figure 3: Training Process

First, we initialize the network parameters. These parameters are the weights (w) and biases (b) of the connections between the nodes shown in Figure 2. How each weight and bias parameter works and how it is initialized is easier to understand in code. Once these parameters are initialized, it is time to run the forward propagation block and apply the sigmoid function to the last activation to obtain a probabilistic prediction.

In our example, this is the probability that a cat appears in the photo. We then compare our predictions to the true labels of the images (cat or not cat) via the cross-entropy cost, a loss function widely used to optimize classification models. Finally, with the cost calculated, we pass it back through the backpropagation module to calculate its gradient with respect to the parameters w and b. With the gradients of the loss function with respect to w and b known, the parameters can be updated by subtracting the respective gradients (scaled by a learning rate), because moving against the gradient moves w and b toward the values that minimize the loss function.

Since the goal is to minimize the loss function, the loop runs for a predefined number of iterations, taking a small step toward the minimum of the loss function each time. At some point, the parameters will almost stop changing, because the gradient tends to zero as the minimum approaches.
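The loop described above can be illustrated on a much simpler problem than a neural network. The following sketch is a toy example, not the article's model: it assumes a hypothetical one-parameter loss (w - 3)^2, whose minimum is at w = 3, and an arbitrarily chosen learning rate of 0.1.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = []

for i in range(50):
    history.append(loss(w))
    # Step against the gradient; steps shrink as the gradient tends to zero
    w = w - learning_rate * gradient(w)

print(w)            # close to 3
print(history[:3])  # the loss decreases from one iteration to the next
```

The same pattern of computing the loss, computing the gradient, and stepping against it is what the training function later in the article applies to every weight and bias of the network.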

1. Load data

import numpy as np
import pandas as pd
import os

from os.path import join
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from sklearn.model_selection import train_test_split

First, we need to load the libraries. Apart from keras.preprocessing.image, which is used to convert images into arrays, and sklearn.model_selection, which is used to split the image vectors into training and test sets, we only need three libraries: NumPy, pandas, and os.

cats_dir = join("data", "cats")
all_cats_path = [join(cats_dir, filename) for filename in os.listdir(cats_dir)]

# join() avoids backslash escapes: in "data\random_images", "\r" is a carriage return
images_dir = join("data", "random_images")
images_path = [join(images_dir, filename) for filename in os.listdir(images_dir)]

all_paths = all_cats_path + images_path

# Label each path: 1 for files from the cats folder, 0 otherwise
df = pd.DataFrame({
    'path': all_paths,
    'is_cat': [1 if path in all_cats_path else 0 for path in all_paths]})

Data must be loaded from two folders: cats and random_images. This can be done by getting all the filenames and building the path to each file. Then just merge all the file paths in the dataframe and create a conditional column "is_cat". If the path is in the cats folder, the value is 1; otherwise, the value is 0.

X = df.path
Y = df.is_cat

X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size=.2 , shuffle= True)

X_train = [load_img(img_path,target_size=(128,128)) for img_path in X_train]
X_train = np.array([img_to_array(img) for img in X_train])

X_test = [load_img(img_path,target_size=(128,128)) for img_path in X_test]
X_test = np.array([img_to_array(img) for img in X_test])


With the paths dataset in hand, it's time to build our training and test vectors by splitting the images: 80% of them are for training and 20% for testing. Y holds the true labels, while X holds the image paths, which load_img uses to load each image at 128x128 pixels. Finally, the img_to_array function converts each image to an array. The resulting X_train and X_test arrays have shape (number of images, 128, 128, 3).

2. Initialize parameters

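def initialize(layers_dimensions):
    """Create the weights and biases for every layer after the input layer."""
    parameters = {}
    L = len(layers_dimensions)

    for l in range(1, L):
        # Weights: standard normal values scaled by 1/sqrt(fan-in); biases: zeros
        parameters['w' + str(l)] = np.random.randn(layers_dimensions[l], layers_dimensions[l-1]) / np.sqrt(layers_dimensions[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dimensions[l], 1))

    return parameters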
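The division by the square root of the previous layer's size keeps the linear outputs of a layer at a reasonable scale. The sketch below tries to make that visible under stated assumptions: the inputs are hypothetical unit-variance random values (the real inputs are pixel values scaled to the 0-1 range), and the sizes 4096, 20, and 200 are chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, m = 4096, 20, 200   # hypothetical sizes for illustration

# Hypothetical inputs with unit variance, one column per example
activation = rng.standard_normal((n_in, m))

w_unscaled = rng.standard_normal((n_out, n_in))
w_scaled = w_unscaled / np.sqrt(n_in)   # same scaling as in initialize

z_unscaled = w_unscaled @ activation
z_scaled = w_scaled @ activation

print(z_unscaled.std())  # roughly sqrt(4096) = 64
print(z_scaled.std())    # roughly 1
```

Without the scaling, the values fed into the activation functions grow with the number of inputs, which would push the sigmoid toward 0 or 1 and make the gradients very small.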




3. Forward propagation

def linear_forward(activation, weight, bias):
    # Linear step Z = W.A + b; the inputs are cached for backpropagation
    Z = np.dot(weight, activation) + bias

    cache = (activation, weight, bias)

    return Z, cache




def sigmoid(Z):
    activation = 1 / (1 + np.exp(-Z))
    cache = Z
    return activation, cache

def relu(Z):
    activation = np.maximum(0, Z)
    cache = Z
    return activation, cache


def sigmoid_activation(previous_activation, weight, bias):

    Z, linear_cache = linear_forward(previous_activation, weight, bias)

    activation, activation_cache = sigmoid(Z)

    cache = (linear_cache, activation_cache)

    return activation, cache

def relu_activation(previous_activation, weight, bias):

    Z, linear_cache = linear_forward(previous_activation, weight, bias)

    activation, activation_cache = relu(Z)

    cache = (linear_cache, activation_cache)

    return activation, cache


def l_layer_model_forward(data, parameters):

    caches = []
    activation = data
    n_layers = len(parameters) // 2

    # Hidden layers use ReLU
    for layer in range(1, n_layers):
        previous_activation = activation

        activation, cache = relu_activation(previous_activation,
                                            weight=parameters['w' + str(layer)],
                                            bias=parameters['b' + str(layer)])
        caches.append(cache)  # stored for backpropagation

    # The output layer uses sigmoid to produce a probability
    last_activation, cache = sigmoid_activation(activation,
                                                weight=parameters['w' + str(n_layers)],
                                                bias=parameters['b' + str(n_layers)])
    caches.append(cache)

    return last_activation, caches
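To build intuition for what the forward pass does to the data, it can help to trace the shape of the activations layer by layer. The following self-contained sketch repeats the same steps (linear step, ReLU on hidden layers, sigmoid on the output) on a hypothetical small network with dimensions [12, 4, 3, 1] and 5 random examples; these sizes are assumptions chosen for readability, not the article's.

```python
import numpy as np

rng = np.random.default_rng(1)
layers_dimensions = [12, 4, 3, 1]   # hypothetical small network
m = 5                               # hypothetical number of examples

activation = rng.random((layers_dimensions[0], m))
shapes = [activation.shape]

n_layers = len(layers_dimensions) - 1
for layer in range(1, n_layers + 1):
    n_in, n_out = layers_dimensions[layer - 1], layers_dimensions[layer]
    weight = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
    bias = np.zeros((n_out, 1))
    Z = weight @ activation + bias           # bias broadcasts across the m columns
    if layer < n_layers:
        activation = np.maximum(0, Z)        # ReLU on hidden layers
    else:
        activation = 1 / (1 + np.exp(-Z))    # sigmoid on the output layer
    shapes.append(activation.shape)

print(shapes)  # [(12, 5), (4, 5), (3, 5), (1, 5)]
```

Each column stays tied to one example throughout, and the final row holds one probability per example.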

4. Cross-entropy loss function

def cross_entropy_cost(last_activation, true_label):

    m = true_label.shape[1]  # number of examples

    # Average over the examples of -(y*log(a) + (1-y)*log(1-a))
    cost = -1/m * np.sum(np.dot(true_label, np.log(last_activation).T) + np.dot(1-true_label, np.log(1-last_activation).T))

    cost = np.squeeze(cost)

    return cost
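The cost above is the binary cross-entropy, J = -(1/m) * sum(y * log(a) + (1 - y) * log(1 - a)), where a is the predicted probability and y the true label. A small worked example may make it more concrete; the three predictions and labels below are made-up values for illustration only.

```python
import numpy as np

# Made-up predictions and labels for three examples
predictions = np.array([[0.9, 0.2, 0.6]])
labels = np.array([[1, 0, 1]])
m = labels.shape[1]

# Element-wise form: one loss value per example, then the average
per_example = -(labels * np.log(predictions) + (1 - labels) * np.log(1 - predictions))
cost = per_example.sum() / m

# Dot-product form, as used in cross_entropy_cost
dot_form = -1 / m * np.sum(np.dot(labels, np.log(predictions).T)
                           + np.dot(1 - labels, np.log(1 - predictions).T))

print(round(float(cost), 4))  # 0.2798
```

Confident and correct predictions (0.9 for a cat) contribute little to the cost, while hesitant ones (0.6 for a cat) contribute more.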

5. Backpropagation

def linear_backward(dZ, cache):

    previous_activation, weight, bias = cache
    m = previous_activation.shape[1]  # number of examples

    # Gradients of the cost with respect to the weights, biases and previous activation
    dw = 1/m * np.dot(dZ, previous_activation.T)
    db = 1/m * np.sum(dZ, keepdims=True, axis=1)
    dpreviousactivation = np.dot(weight.T, dZ)

    return dpreviousactivation, dw, db


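To run backpropagation through the activation functions, the derivatives of the sigmoid and ReLU functions must be computed first. For ReLU, the derivative is 1 if the value is positive and 0 if it is negative (it is undefined only at exactly 0). To compute dZ in the ReLU backward function, it is therefore enough to copy the dactivation vector (since dactivation * 1 = dactivation) and set dZ to 0 wherever Z is not positive. For the sigmoid function s, the derivative is s*(1-s); multiplying this derivative by dactivation gives the vector dZ, as implemented in the sigmoid backward function.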

def relu_backward(dactivation, cache):

    Z = cache
    dZ = np.array(dactivation, copy=True)
    dZ[Z <= 0] = 0

    return dZ

def sigmoid_backward(dactivation, cache):
    Z = cache

    s = 1 / (1 + np.exp(-Z))
    dZ = dactivation * s * (1 - s)

    return dZ
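One way to convince ourselves that these derivatives are right is to compare them with a numerical estimate. The sketch below is a sanity check rather than part of the training code: it uses a central finite difference on a few hand-picked values of z (none of them exactly 0, where ReLU has no derivative). The step size `eps` and the helper names are assumptions made for this illustration.

```python
import numpy as np

def sigmoid_fn(z):
    return 1 / (1 + np.exp(-z))

def relu_fn(z):
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.5, 2.0])
eps = 1e-6  # small step for the finite difference

# Sigmoid: analytic derivative s*(1-s) versus numerical estimate
s = sigmoid_fn(z)
analytic_sigmoid = s * (1 - s)
numeric_sigmoid = (sigmoid_fn(z + eps) - sigmoid_fn(z - eps)) / (2 * eps)

# ReLU: derivative is 1 for positive z and 0 for negative z
analytic_relu = (z > 0).astype(float)
numeric_relu = (relu_fn(z + eps) - relu_fn(z - eps)) / (2 * eps)

print(np.allclose(analytic_sigmoid, numeric_sigmoid))  # True
print(np.allclose(analytic_relu, numeric_relu))        # True
```

The same finite-difference idea can be extended to check the gradients of a whole network, although that is noticeably slower than backpropagation.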


def linear_activation_backward(dactivation, cache, activation):

    linear_cache, activation_cache = cache

    if activation == 'relu':

        dZ = relu_backward(dactivation, activation_cache)
        dprevious_activation, dw, db = linear_backward(dZ, linear_cache)

    elif activation == 'sigmoid':

        dZ = sigmoid_backward(dactivation, activation_cache)
        dprevious_activation, dw, db = linear_backward(dZ, linear_cache)

    return dprevious_activation, dw, db



def l_layer_model_backward(last_activation, true_labels, caches):

    gradients = {}
    n_layers = len(caches)
    true_labels = true_labels.reshape(last_activation.shape)
    # Derivative of the cross-entropy cost with respect to the last activation
    dlast_activation = -(np.divide(true_labels, last_activation) - np.divide(1 - true_labels, 1 - last_activation))

    # Output layer (sigmoid)
    current_cache = caches[n_layers-1]
    dprevious_activation, dw_temp, db_temp = linear_activation_backward(dlast_activation, current_cache, 'sigmoid')
    gradients["da" + str(n_layers-1)] = dprevious_activation
    gradients["dw" + str(n_layers)] = dw_temp
    gradients["db" + str(n_layers)] = db_temp

    # Hidden layers (ReLU), from the last one back to the first
    for layer in reversed(range(n_layers-1)):
        current_cache = caches[layer]
        dprevious_activation, dw_temp, db_temp = linear_activation_backward(gradients["da" + str(layer + 1)], current_cache, 'relu')
        gradients["da" + str(layer)] = dprevious_activation
        gradients["dw" + str(layer+1)] = dw_temp
        gradients["db" + str(layer+1)] = db_temp

    return gradients




6. Parameter update

def update_parameters(parameters, gradients, learning_rate):

    parameters = parameters.copy()
    n_layers = len(parameters) // 2

    for layer in range(n_layers):
        # Gradient descent step: move each parameter against its gradient
        parameters["w" + str(layer+1)] = parameters["w" + str(layer+1)] - learning_rate * gradients["dw" + str(layer+1)]
        parameters["b" + str(layer+1)] = parameters["b" + str(layer+1)] - learning_rate * gradients["db" + str(layer+1)]

    return parameters


7. Preprocess the vectors

layers_dimensions = [49152, 20, 7, 5, 1]

X_train_flatten = X_train.reshape(X_train.shape[0], -1).T
X_test_flatten = X_test.reshape(X_test.shape[0], -1).T

X_train = X_train_flatten/255.
X_test = X_test_flatten/255.

y_train = np.array(y_train)
y_test = np.array(y_test)

Y_train = y_train.reshape(-1,1).T
Y_test = y_test.reshape(-1,1).T 

print(f'X train shape: {X_train.shape}')
print(f'Y train shape: {Y_train.shape}')
print(f'X test shape: {X_test.shape}')
print(f'Y test shape: {Y_test.shape}')


8. Training

def l_layer_model(X, Y, layers_dimensions, learning_rate=0.0075, iterations=3000, print_cost=False):

    costs = []

    parameters = initialize(layers_dimensions)

    for i in range(0, iterations):

        last_activation, caches = l_layer_model_forward(X, parameters)

        cost = cross_entropy_cost(last_activation, Y)

        gradients = l_layer_model_backward(last_activation, Y, caches)

        parameters = update_parameters(parameters, gradients, learning_rate)

        # Parentheses make sure nothing is printed when print_cost is False
        if print_cost and (i % 50 == 0 or i == iterations - 1):
            print(f"Cost after iteration {i}: {np.squeeze(cost)}")
        # Record the cost every 100 iterations and at the last one
        if i % 100 == 0 or i == iterations - 1:
            costs.append(cost)

    return parameters, costs




parameters, costs = l_layer_model(X_train, Y_train, layers_dimensions, iterations=2500, print_cost=True)



9. Prediction

def predict(X, y, parameters):

    m = X.shape[1]
    p = np.zeros((1, m))

    probs, _ = l_layer_model_forward(X, parameters)

    # Classify as cat (1) when the predicted probability exceeds 0.5
    for i in range(0, probs.shape[1]):
        if probs[0, i] > 0.5:
            p[0, i] = 1
        else:
            p[0, i] = 0

    print("Accuracy: " + str(np.sum((p == y) / m)))
    return p

pred_test = predict(X_test, Y_test , parameters)
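The thresholding loop inside predict can also be written without an explicit loop. The sketch below shows that alternative on made-up probabilities and labels; it is a stylistic variation for illustration, not the article's original code.

```python
import numpy as np

# Made-up model outputs and true labels for four examples
probs = np.array([[0.91, 0.12, 0.55, 0.49]])
true_labels = np.array([[1, 0, 0, 0]])

# Probabilities above 0.5 become class 1 (cat), the rest class 0
predictions = (probs > 0.5).astype(int)

# Accuracy is the fraction of predictions that match the labels
accuracy = np.mean(predictions == true_labels)

print(predictions)  # [[1 0 1 0]]
print(accuracy)     # 0.75
```

Here the third example is misclassified (0.55 is above the threshold, but the label is 0), so three out of four predictions are correct.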

Original title: Behind the Scenes of a Deep Learning Neural Network for Image Classification, by Bruno Caraffa

