search
HomeBackend DevelopmentPython TutorialOptimizing Your Neural Networks

Optimizing Your Neural Networks

Oct 13, 2024 am 06:15 AM

Last week I posted an article about how to build simple neural networks, specifically multi-layer perceptrons. This article will dive deeper into the specifics of neural networks to discuss how we can maximize the performance of a neural network by tweaking its configurations.

How Long to Train Your Model For

When training a model, you might think that if you train your model enough, the model will become flawless. This may be true, but that only holds for the dataset it was trained on. In fact, if you give it another set of data where the values are different, the model could output completely incorrect predictions.

To understand this further, let's say you were practicing every single day for your driver's exam by driving in a straight line without moving the wheel. (Please don't do this.) While you would probably perform very well on the drag strip, if you were told to make a left turn on the actual exam, you might end up turning into a STOP sign instead.

This phenomenon is called overfitting. Your model can learn all the aspects and patterns of the data it's trained on but if it learns a pattern that adheres to the training dataset too closely, then when given a new dataset, your model will perform poorly. At the same time, if you don't train your model enough, then your model won't be able to recognize patterns in other datasets properly. In this case, you would be underfitting.

Optimizing Your Neural Networks


An example of overfitting. The validation loss, represented by the orange line is gradually increasing while the training loss, represented by the blue line is decreasing.

In the example above, a great position to stop training your model would be right when the validation loss reaches its minimum. It's possible to do this with early stopping, which stops training once there is no improvement in validation loss after an arbitrary number of training cycles (epochs).

Training your model is all about finding a balance between overfitting and underfitting while utilizing early stopping if necessary. That's why your training dataset should be as representative as possible of your overall population so that your model can more accurately make predictions on data it hasn't seen.

Loss Functions

Perhaps one of the most important training configurations that can be tweaked is the loss function, which is the "inaccuracy" between your model's predictions and their actual values. The "inaccuracy" can be represented mathematically in many different ways, one of the most common being mean squared error (MSE):

MSE=i=1n(yiˉyi)2ntext{MSE} = frac{sum_{i=1}^n (bar{y_i} - y_i)^2}{n} MSE=ni=1n(yiˉyi)2

where yiˉbar{y_i} yiˉ is the model's prediction and yiy_i yi is the true value. There's a similar variant called mean absolute error (MAE)

MAE=i=1nyiˉyintext{MAE} = frac{sum_{i=1}^n |bar{y_i} - y_i|}{n} MAE=ni=1nyiˉyi

What's the difference between these two and which one is better? The real answer is that it depends on a variety of factors. Let's consider a simple 2-dimensional linear regression example.

In many cases, there can be data points that act outliers, points that are far away from other data points. In terms of linear regression, this means that there are a few points on the xyxy xy -plane that are far away from the rest of them. If you remember from your statistics classes, it's points like these that can significantly affect the linear regression line that's calculated.

Optimizing Your Neural Networks
A simple graph with points on (1, 1), (2, 2), (3, 3), and (4, 4)

If you wanted to think of a line that could cross all four points, then y=xy = x y=x would be a great choice because this line would go through all the points.

Optimizing Your Neural Networks
A simple graph with points on (1, 1), (2, 2), (3, 3), and (4, 4) and the line y=xy = x y=x going through it

However, let's say I decide to add another point at (5,1)(5, 1) (5,1) . Now what should the regression line be? Well, it turns out that it's completely different: y=0.2x 1.6y = 0.2x 1.6 y=0.2x 1.6

Optimizing Your Neural Networks


A simple graph with points on (1, 1), (2, 2), (3, 3), (4, 4), and (5,1) with a linear regression line going through it.

Given the previous data points, the line would expect that the value of yy y when x=5x = 5 x=5 is 5, but because of the outlier and its MSE, the regression line is "pulled downwards" significantly.

This is just a simple example, but this poses a question that you, as a machine learning developer, need to stop and think about: How sensitive should my model be to outliers? If you want your model to be more sensitive to outliers, then you would choose a metric like MSE, because in that case, errors involving outliers are more pronounced due to the squaring and your model will adjust itself to minimize that. Otherwise, you would choose a metric like MAE, which doesn't care as much about outliers.

Optimizers

In my previous post, I also discussed the concept of backpropagation, gradient descent, and how they work to minimize the loss of the model. The gradient is a vector that points towards the direction of greatest change. A gradient descent algorithm will calculate this vector and move in the exact opposite direction so that it eventually reaches a minimum.

Most optimizers have a specific learning rate, commonly denoted as αalpha α that they adhere to. Essentially, this represents how much the algorithm will move towards the minimum each time it calculates the gradient. Be careful of setting your learning rate to be too large! Your algorithm may never reach the minimum due to the large steps it takes that could repeatedly skip over the minimum.

Optimizing Your Neural Networks
[Tensorflow's neural network playground](https://playground.tensorflow.org) showing what can happen if you set the learning rate to be too large. Notice how the testing and training loss are both `NaN`.

Going back to gradient descent, while it is effective in minimizing loss, this might significantly slow down the training process as the loss function is calculated on the entire dataset. There are several alternatives to gradient descent that are more efficient but have their respective downsides.

Stochastic Gradient Descent

One of the most popular alternatives to standard gradient descent is a variant called stochastic gradient descent (SGD). As with gradient descent, SGD has a fixed learning rate. But rather than running through the entire dataset like gradient descent, SGD takes a small sample is randomly selected and the weights of your neural network are updated based on the sample instead. Eventually, the parameter values converge to a point that approximately (but not exactly) minimizes the loss function. This is one of the downsides of SGD, as it doesn't always reach the exact minimum. Additionally, similar to gradient descent, it remains sensitive to the learning rate that you set.

The Adam Optimizer

The name, Adam, is derived from adaptive moment estimation. It essentially combines two variants of SGD to adjust the learning rate for each input parameter based on how often it gets updated during each training iteration (adaptive learning rate). At the same time, it also keeps track of past gradient calculations as a moving average to smooth out updates (momentum). However, because of its momentum characteristic, it can sometimes take longer to converge than other algorithms.

Putting it All Together

Now for an example!

I've created an example walkthrough on Google Colab that uses PyTorch to create a neural network that learns a simple linear relationship.

If you're a bit new to Python, don't worry! I've included some explanations that discuss what's going on in each section.

Reflection

While this obviously doesn't cover everything about optimizing neural networks, I wanted to at least cover a few of the most important concepts that you can take advantage of while training your own models. Hopefully you've learned something this week and thanks for reading!

The above is the detailed content of Optimizing Your Neural Networks. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Python and Time: Making the Most of Your Study TimePython and Time: Making the Most of Your Study TimeApr 14, 2025 am 12:02 AM

To maximize the efficiency of learning Python in a limited time, you can use Python's datetime, time, and schedule modules. 1. The datetime module is used to record and plan learning time. 2. The time module helps to set study and rest time. 3. The schedule module automatically arranges weekly learning tasks.

Python: Games, GUIs, and MorePython: Games, GUIs, and MoreApr 13, 2025 am 12:14 AM

Python excels in gaming and GUI development. 1) Game development uses Pygame, providing drawing, audio and other functions, which are suitable for creating 2D games. 2) GUI development can choose Tkinter or PyQt. Tkinter is simple and easy to use, PyQt has rich functions and is suitable for professional development.

Python vs. C  : Applications and Use Cases ComparedPython vs. C : Applications and Use Cases ComparedApr 12, 2025 am 12:01 AM

Python is suitable for data science, web development and automation tasks, while C is suitable for system programming, game development and embedded systems. Python is known for its simplicity and powerful ecosystem, while C is known for its high performance and underlying control capabilities.

The 2-Hour Python Plan: A Realistic ApproachThe 2-Hour Python Plan: A Realistic ApproachApr 11, 2025 am 12:04 AM

You can learn basic programming concepts and skills of Python within 2 hours. 1. Learn variables and data types, 2. Master control flow (conditional statements and loops), 3. Understand the definition and use of functions, 4. Quickly get started with Python programming through simple examples and code snippets.

Python: Exploring Its Primary ApplicationsPython: Exploring Its Primary ApplicationsApr 10, 2025 am 09:41 AM

Python is widely used in the fields of web development, data science, machine learning, automation and scripting. 1) In web development, Django and Flask frameworks simplify the development process. 2) In the fields of data science and machine learning, NumPy, Pandas, Scikit-learn and TensorFlow libraries provide strong support. 3) In terms of automation and scripting, Python is suitable for tasks such as automated testing and system management.

How Much Python Can You Learn in 2 Hours?How Much Python Can You Learn in 2 Hours?Apr 09, 2025 pm 04:33 PM

You can learn the basics of Python within two hours. 1. Learn variables and data types, 2. Master control structures such as if statements and loops, 3. Understand the definition and use of functions. These will help you start writing simple Python programs.

How to teach computer novice programming basics in project and problem-driven methods within 10 hours?How to teach computer novice programming basics in project and problem-driven methods within 10 hours?Apr 02, 2025 am 07:18 AM

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading?How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading?Apr 02, 2025 am 07:15 AM

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.