# Regularization

Regularization is a technique that helps avoid overfitting and can also improve model interpretability.

### Recap overfitting

![](/files/-Lj1WlnbJPyE9w3PcIHe)

![](/files/-Lj1YAf_tafhd6nzW1IV)

Overfitting: If we have too many features, the learned hypothesis may fit the training set very well, so the squared error on the training set will be close to 0.

![](/files/-Lj1XLZMSB72L47ZVngf)

But this produces a function or curve that tries too hard to pass through every training point. As a result, it fails to generalize to new examples (for instance, predicting prices on new examples).
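The effect can be reproduced with a small experiment. The sketch below is only an illustration: the data is synthetic (a noisy straight line, which is an assumption made here and not the dataset shown in the figures), and polynomial degree stands in for "too many features".

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Degree 1 matches the true relationship; degree 9 has enough parameters
# to pass through all 10 training points.
simple = np.polyfit(x_train, y_train, deg=1)
flexible = np.polyfit(x_train, y_train, deg=9)

print("degree 1: train", mse(simple, x_train, y_train), "test", mse(simple, x_test, y_test))
print("degree 9: train", mse(flexible, x_train, y_train), "test", mse(flexible, x_test, y_test))
```

The degree 9 fit should report a training error very close to 0 and a noticeably larger error on the held-out points, which is the gap that regularization tries to close.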

**So what can we do to address this? One thing we can do is regularization. Here we look at L1 (Lasso) and L2 (Ridge) regularization.**

**Regularization:** Keep all the features, but reduce the magnitude/values of the parameters θ\_j. This works well when we have a lot of features, each of which contributes a bit to predicting y.

For L1, or Lasso, we add a regularization term to our squared error function (the cost function, i.e. the sum of all the errors) that affects every parameter. The sum starts at j = 1 because we are not penalizing the intercept.

![](/files/-Lj1Uwf8RR3VtQRwRn5y)

We take the absolute value of each coefficient, sum them, and multiply the sum by a parameter lambda (λ). This penalty is added to the overall cost.

This function has two objectives. The term on the left is the cost function, where we want to fit the data well. The term on the right is the penalty, where we want to keep the parameters small so that the function stays simpler. Lambda controls the trade-off between overfitting and underfitting.

The larger the penalty term, the larger the overall cost, so the worse the model scores. That is what we want: it discourages large parameters and so helps prevent overfitting. Lambda sets how strong this effect is.
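As a rough sketch, the L1-regularized cost described above can be written in plain Python. The function name `lasso_cost` and the toy data are hypothetical, and the exact scaling in the figure may differ (some formulations divide the error by a constant such as 2m); here the error is simply summed.

```python
def lasso_cost(theta, X, y, lam):
    """Sum of squared errors plus lam * sum(|theta_j|) for j >= 1.

    theta[0] is the intercept and is left out of the penalty.
    Each row of X holds the features of one example (no leading 1).
    """
    squared_error = 0.0
    for features, target in zip(X, y):
        prediction = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
        squared_error += (prediction - target) ** 2
    l1_penalty = lam * sum(abs(t) for t in theta[1:])
    return squared_error + l1_penalty


# Toy data that theta = [1, 2] fits exactly, so only the penalty remains.
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 5.0]
theta = [1.0, 2.0]

print(lasso_cost(theta, X, y, lam=0.0))  # 0.0
print(lasso_cost(theta, X, y, lam=0.5))  # 1.0
```

With `lam=0.0` the perfect fit costs nothing, while with `lam=0.5` the same parameters cost 0.5 × |2| = 1.0, so a larger lambda makes large parameters more expensive even when they fit the data.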

L2, or Ridge, regularization: we take the square of the parameters instead of their absolute value. This means that large parameters are penalized more heavily.

![](/files/-Lj1VxB0v1T_aaPZpeyc)

If a model wants to learn large parameters, it will be penalized.
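To see why squaring penalizes large parameters more, the two penalty terms can be compared on their own. This is a minimal sketch: the helper names and the two parameter vectors are made up for illustration, and index 0 is treated as the unpenalized intercept.

```python
def l1_penalty(theta, lam):
    """lam * sum of absolute values, skipping the intercept theta[0]."""
    return lam * sum(abs(t) for t in theta[1:])


def l2_penalty(theta, lam):
    """lam * sum of squares, skipping the intercept theta[0]."""
    return lam * sum(t ** 2 for t in theta[1:])


# Two parameter vectors with the same total absolute size:
spread_out = [0.0, 2.0, 2.0]    # two moderate parameters
concentrated = [0.0, 4.0, 0.0]  # one large parameter

print(l1_penalty(spread_out, 1.0), l1_penalty(concentrated, 1.0))  # 4.0 4.0
print(l2_penalty(spread_out, 1.0), l2_penalty(concentrated, 1.0))  # 8.0 16.0
```

L1 charges both vectors the same amount, while L2 charges the vector with one large parameter twice as much.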

**L1 can be used to completely remove features, but it is computationally more expensive.**

**L2 cannot be used for feature selection, but it is computationally more efficient. It penalizes large values.**

**High Bias vs High variance**

**This sheds light on the obvious disadvantage of ridge regression, which is model interpretability.** It shrinks the coefficients of the least important predictors very close to zero, but it never makes them exactly zero. In other words, the final model includes all predictors. In the case of the lasso, however, the L1 penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large. **Therefore, the lasso method also performs variable selection and is said to yield sparse models.**
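The difference can be observed with sklearn's `Lasso` and `Ridge` classes. The sketch below uses synthetic data in which only the first two of six features influence the outcome (an assumption made for this example); in sklearn the regularization strength λ is passed as the `alpha` parameter, and the value 0.5 is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption): only the first two features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("lasso coefficients:", lasso.coef_)
print("ridge coefficients:", ridge.coef_)

# Count coefficients that are exactly zero in each model.
print("lasso zeros:", int(np.sum(lasso.coef_ == 0)))
print("ridge zeros:", int(np.sum(ridge.coef_ == 0)))
```

On data like this, the lasso should set the four irrelevant coefficients to exactly zero, while ridge leaves them small but nonzero.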

### Regularization Exercise <a href="#regularization-exercise" id="regularization-exercise"></a>

Perhaps it's not too surprising at this point, but there are classes in sklearn that will help you perform regularization with your linear regression. You'll get practice with implementing that in this exercise. In this assignment's data.csv, you'll find data for a bunch of points including six predictor variables and one outcome variable. Use sklearn's [`Lasso`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) class to fit a linear regression model to the data, while also using L1 regularization to control for model complexity.

**Perform the following steps:**

**1. Load in the data**

* The data is in the file called 'data.csv'. Note that there's no header row on this file.
* Split the data so that the six predictor features (first six columns) are stored in `X`, and the outcome feature (last column) is stored in `y`.

**2. Fit data using linear regression with Lasso regularization**

* Create an instance of sklearn's [`Lasso`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) class and assign it to the variable `lasso_reg`. You don't need to set any parameter values: use the default values for the quiz.
* Use the `Lasso` object's `.fit()` method to fit the regression model onto the data.

**3. Inspect the coefficients of the regression model**

* Obtain the coefficients of the fitted regression model using the `.coef_` attribute of the `Lasso` object. Store this in the `reg_coef` variable: the coefficients will be printed out, and you will use your observations to answer the question at the bottom of the page.

```python
# TODO: Add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Assign the data to predictor and outcome variables
# TODO: Load the data
train_data = pd.read_csv('data.csv', header = None)
X = train_data.iloc[:,:-1]
y = train_data.iloc[:,-1]


# TODO: Create the linear regression model with lasso regularization.
lasso_reg = Lasso()

# TODO: Fit the model.
lasso_reg.fit(X, y)

# TODO: Retrieve and print out the coefficients from the regression model.
reg_coef = lasso_reg.coef_
print(reg_coef)
```
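The exercise uses the default regularization strength (`alpha=1.0` in sklearn's `Lasso`). As an optional follow-up, the sketch below shows how the number of zeroed coefficients might change with `alpha`. It runs on a synthetic stand-in with six predictors, not on `data.csv`, and the true coefficients and `alpha` values are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in for the exercise data (an assumption): six predictors,
# the last two of which have no effect on the outcome.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 6))
true_coef = np.array([5.0, 3.0, 1.0, 0.5, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

zero_counts = {}
for alpha in [0.01, 0.3, 2.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    # Record how many coefficients the L1 penalty pushed to exactly zero.
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {zero_counts[alpha]} of 6 coefficients are zero")
```

A weak penalty keeps most predictors, and a strong enough penalty removes all of them, which is the overfitting/underfitting trade-off that lambda controls.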

