Gradient descent principle
The gradient method rests on three elements: the starting point, the descent direction, and the descent step size.
The weight update expression commonly used in machine learning is
$$W := W - \lambda \frac{\partial J}{\partial W}$$
where J is the loss (objective) function and λ is the learning rate. Starting from this formula, this article explains the various "gradient" descent methods in machine learning.
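As a rough illustration of the update rule, here is a minimal Python sketch. The loss J(w) = (w - 3)**2, the function names `grad_J` and `gradient_descent`, and the values chosen for the learning rate and step count are all assumptions made for this example, not part of the original article.

```python
def grad_J(w):
    """Gradient of the illustrative loss J(w) = (w - 3)**2 (an assumed example)."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, lam=0.1, steps=100):
    """Repeatedly apply W := W - lam * dJ/dW, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - lam * grad_J(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final)  # ends up very close to 3, the minimiser of this J
```

Here `lam` plays the role of λ: a larger value moves W further per update, a smaller one moves it more cautiously.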
Many classical machine learning objective functions, such as the least squares and logistic regression losses, are convex, and that is the case considered here. What is a convex function?
Due to space limitations we will not go into depth; instead, here is a vivid metaphor for minimizing a convex function. Imagine the objective (loss) function as a pot whose bottom we want to find. The intuitive idea is to start at some initial point and move downhill along the negative gradient of the function (that is, gradient descent). To extend the analogy: if we compare each move to a force, its three complete elements are the step length (how far to move), the direction, and the starting point. With this picture the gradient problem suddenly becomes clear. The starting point matters and is the main consideration during initialization, while the direction and the step size are the key to the method itself. In fact, the differences between the various gradient methods lie in these two points!
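The pot metaphor can be tried out numerically. The sketch below assumes a hypothetical bowl-shaped loss J(x, y) = x**2 + 2 * y**2 (chosen only for illustration) and runs the same descent from several starting points; the names `grad` and `descend` and the parameter values are likewise assumptions.

```python
def grad(x, y):
    """Gradient of the illustrative convex bowl J(x, y) = x**2 + 2 * y**2."""
    return 2.0 * x, 4.0 * y


def descend(start, lam=0.1, steps=200):
    """Walk downhill from `start`, moving against the gradient at each step."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lam * gx, y - lam * gy
    return x, y


# Three different starting points on the "pot"
for start in [(5.0, 5.0), (-8.0, 1.0), (0.5, -9.0)]:
    print(start, "->", descend(start))
```

For a convex bowl like this one, every starting point should end up near the same bottom at (0, 0); the starting point mainly affects how long the walk takes.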
The gradient direction is $\frac{\partial J / \partial W}{\left|\partial J / \partial W\right|}$, a unit vector along the gradient of the loss J with respect to W, and the step size is set to a constant Δ. With this choice, when the gradient is large, W is far from the optimal solution and moving by Δ at each step makes fast progress; but when the gradient is small, that is, when W is close to the optimal solution, W is still updated at the same rate as before. W is then easily over-updated, moves away from the optimal solution, and oscillates back and forth around it. Since the gradient is large far from the optimal solution and small close to it, we let the step length follow this rhythm and replace Δ with $\lambda \left|\partial J / \partial W\right|$. Finally we get the familiar formula:
$$W := W - \lambda \frac{\partial J}{\partial W}$$
So the actual step length now changes with how steep or gentle the slope is, even though λ itself is a constant.
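The difference between a constant step and a gradient-scaled step can be seen in a small experiment. This is only a sketch under assumptions: the loss J(w) = w**2, the function names `fixed_step` and `scaled_step`, and the values Δ = 0.3 and λ = 0.3 are all made up for illustration.

```python
def grad_J(w):
    """Gradient of the illustrative loss J(w) = w**2, whose minimum is at w = 0."""
    return 2.0 * w


def fixed_step(w0, delta=0.3, steps=50):
    """Move a constant distance delta along the unit descent direction."""
    w = w0
    for _ in range(steps):
        g = grad_J(w)
        if g == 0.0:  # already exactly at the minimum
            break
        w = w - delta * (g / abs(g))
    return w


def scaled_step(w0, lam=0.3, steps=50):
    """Move lam * |gradient| along the descent direction: W := W - lam * dJ/dW."""
    w = w0
    for _ in range(steps):
        w = w - lam * grad_J(w)
    return w


print(fixed_step(1.0))   # keeps jumping across the minimum and does not settle at 0
print(scaled_step(1.0))  # steps shrink with the gradient, so W approaches 0
```

With the constant step, W gets within one step of the minimum and then overshoots back and forth; with the scaled step, the moves shrink automatically as the slope flattens.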