Lasso and Ridge Regularization

Understanding Lasso and Ridge Regularization in Machine Learning

Aviral Bhardwaj
5 min read · Jul 12, 2022

When using supervised learning algorithms in machine learning, there are cases where a model performs extremely well on the training data but generalizes poorly, producing a high error rate on new data. Several factors can cause this, including collinearity among features, a poor bias-variance trade-off, and a model that fits the training data too closely.

In this article, we will look at two regularization techniques: Lasso and Ridge regularization. We will also discuss bias, variance, underfitting, and overfitting.

What is Regularization?

Regularization is a technique for improving a model's performance on unseen data by de-emphasizing the less significant features. It aims to increase the model's accuracy while keeping the validation loss low. It prevents overfitting by penalizing models with large variance and shrinking the beta coefficients toward zero (or, in the case of Lasso, all the way to zero).
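To make this concrete, here is a minimal sketch (my own illustration, not from the original post) using scikit-learn and synthetic data: an ordinary least-squares model and a Ridge-regularized model are fit on the same noisy data, and their train/test scores are compared.

```python
# Minimal sketch (illustrative only): regularization trades a little training
# accuracy for better performance on unseen data. Assumes scikit-learn and NumPy.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data: 50 samples, 30 features, only the first two actually matter.
X = rng.normal(size=(50, 30))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=2.0, size=50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

# The unregularized model fits the training set almost perfectly but generalizes worse;
# the penalized model keeps its coefficients small and scores better on the test set.
print("OLS   R^2 train/test:", ols.score(X_train, y_train), ols.score(X_test, y_test))
print("Ridge R^2 train/test:", ridge.score(X_train, y_train), ridge.score(X_test, y_test))
```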

Bias and Variance

Bias

Bias refers to the simplifying assumptions a model makes about the target function. Bias makes the model more generalizable and less sensitive to individual data points, and because the assumed function is simpler, it also cuts down on training time. High bias means the model makes strong assumptions about the target function, which can cause it to underfit.
Linear regression and logistic regression are examples of high-bias algorithms.

Variance

Variance is a type of error that arises from a model's sensitivity to small fluctuations in the dataset. With high variance, an algorithm models the noise and outliers in the training set; this is usually described as overfitting. When evaluated on a new dataset, such a model does not predict accurately because it has essentially memorized every training point.

A well-balanced model has low bias and low variance, whereas high bias leads to underfitting and high variance leads to overfitting.

Low Bias - The average prediction is very close to the actual value.

High Bias - The predictions differ greatly from the actual values.

Low Variance - The predictions are tightly clustered and deviate little from their mean value.

High Variance - The predictions are widely dispersed and deviate significantly from their mean value and from one another.

We need a proper balance between bias and variance to produce a good fit.

Underfitting and Overfitting

Underfitting

Underfitting happens when a model cannot generalize to new data because it has not properly learned the patterns in the training data. An underfit model performs poorly even on the training data and makes bad predictions. Underfitting occurs when there is high bias and low variance.

Overfitting

When a model performs remarkably well on training data but poorly on test (new) data, it is said to be overfit. In this case, the machine learning model picks up the noise and fine details in the training data, which hurts its performance on test data. Overfitting occurs when there is low bias and high variance.
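To see underfitting and overfitting side by side, here is a small illustrative sketch (again my own example): polynomials of increasing degree are fit to noisy data, and training versus test error is compared.

```python
# Illustrative sketch: a degree-1 polynomial underfits (high bias), a degree-15
# polynomial overfits (high variance), and a moderate degree fits well. Assumes scikit-learn.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)  # noisy sine curve

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # Expect: high train AND test error at degree 1, low train but high test error at degree 15.
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```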

You can find a link to my full article on underfitting and overfitting at the end of this article.

Two types of Regularization

  1. Lasso Regularization
  2. Ridge Regularization

Lasso Regularization (L1)

This method performs L1 regularization. Unlike Ridge regression, which penalizes the squares of the coefficients, it modifies the RSS by adding a penalty (the amount of shrinkage) equal to the sum of the absolute values of the coefficients.
As shown in the equation below, Lasso (Least Absolute Shrinkage and Selection Operator) penalizes the absolute magnitude of the regression coefficients, but otherwise works in a manner similar to Ridge regression. It also has a good track record of lowering variance and improving the accuracy of linear regression models.
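The equation image from the original post does not carry over here, but the standard Lasso cost function has the form (my notation, for p coefficients):

Lasso cost = RSS + λ · Σ |βj| = Σ (yi − ŷi)² + λ · (|β1| + |β2| + … + |βp|)

where λ ≥ 0 controls the strength of the penalty. As a minimal scikit-learn sketch (synthetic data, illustrative parameter values), the L1 penalty drives many coefficients to exactly zero:

```python
# Minimal sketch of Lasso (L1) in scikit-learn; alpha plays the role of lambda above.
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Synthetic data: 100 samples, 20 features, only 5 of which are informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# The L1 penalty sets many coefficients to exactly zero, which acts as feature selection.
print("non-zero coefficients:", (lasso.coef_ != 0).sum(), "out of", X.shape[1])
```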

Limitation of Lasso Regression

  • Lasso sometimes struggles with certain kinds of data. If the number of predictors (p) is greater than the number of observations (n), Lasso will select at most n predictors as non-zero, even if all predictors are relevant.
  • When two or more variables are highly collinear, LASSO selects one of them essentially at random, which is bad for interpretation.

Ridge Regularization (L2)

This method performs L2 regularization. It modifies the RSS by adding a penalty equal to the sum of the squares of the coefficient magnitudes. It is typically used when the data exhibits multicollinearity (independent variables are highly correlated). Although the ordinary least squares (OLS) estimates are unbiased under multicollinearity, their large variances cause the predicted values to diverge far from the actual values. Ridge regression reduces these errors by introducing some bias into the regression estimates, using a shrinkage parameter (lambda) to tackle the multicollinearity problem. Let's look at the equation below.

The equation has two parts. The first is the least squares term, and the second is the penalty: lambda times the sum of β² (beta squared), where β are the coefficients. This penalty is added to the least squares term to shrink the parameters and give the model very low variance.
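Since the equation image does not appear here either, the Ridge cost function can be written as (my notation):

Ridge cost = RSS + λ · Σ βj² = Σ (yi − ŷi)² + λ · (β1² + β2² + … + βp²)

The sketch below (illustrative values, assuming scikit-learn, where λ is called alpha) shows how increasing the penalty shrinks the coefficients without ever making them exactly zero:

```python
# Sketch: larger alpha (lambda) shrinks Ridge coefficients toward zero, lowering variance.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

for alpha in (0.1, 10.0, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    # The coefficients shrink as the penalty grows, but none are set exactly to zero.
    print(f"alpha = {alpha:>7}: mean |coef| = {np.mean(np.abs(ridge.coef_)):.3f}")
```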

Limitation of Ridge Regression

Since ridge regression never sets a coefficient exactly to zero but only shrinks it, it reduces the complexity of a model without reducing the number of variables. It is therefore not suitable for feature selection.
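A quick way to see this limitation is to fit Lasso and Ridge on the same synthetic data (an illustrative sketch, assuming scikit-learn) and count the coefficients that end up exactly zero:

```python
# Sketch: Lasso zeroes out uninformative coefficients; Ridge only shrinks them.
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients equal to zero:", (lasso.coef_ == 0).sum())  # typically several
print("Ridge coefficients equal to zero:", (ridge.coef_ == 0).sum())  # typically none
```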

If you like my article and my efforts toward the community, you can support and encourage me by simply buying me a coffee.

Conclusion

Well, I have good news for you: I will be bringing more articles explaining machine learning concepts and models, with code. Leave a comment and tell me how excited you are about this.


Aviral Bhardwaj

One of the youngest writers and mentors in AI-ML & Technology.