data:image/s3,"s3://crabby-images/80cf9/80cf918fb74374e1c23c8aa1d7c9ed6908970cf8" alt="Mastering Machine Learning Algorithms"
Ridge
Ridge regularization (also known as Tikhonov regularization) is based on the squared L2-norm of the parameter vector:
This penalty avoids an infinite growth of the parameters (for this reason, it's also known as weight shrinkage), and it's particularly useful when the model is ill-conditioned, or there is multicollinearity, due to the fact that the samples are completely independent (a relatively common condition).
In the following diagram, we see a schematic representation of the Ridge regularization in a bidimensional scenario:
Ridge (L2) regularization
The zero-centered circle represents the Ridge boundary, while the shaded surface is the original cost function. Without regularization, the minimum (w1, w2) has a magnitude (for example, the distance from the origin) which is about double the one obtained by applying a Ridge constraint, confirming the expected shrinkage. When applied to regressions solved with the Ordinary Least Squares (OLS) algorithm, it's possible to prove that there always exists a Ridge coefficient, so that the weights are shrunk with respect the OLS ones. The same result, with some restrictions, can be extended to other cost functions.