Regularization

Regularization is a technique that helps avoid overfitting and can also improve model interpretability.

Recap overfitting

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well, which means our squared error on the training data will be close to 0.

But this produces a function or curve that tries too hard to pass through every training point, which means it will fail to generalize to new examples (e.g., predicting prices for examples it has never seen).
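As a minimal, hypothetical illustration (synthetic data, NumPy only), fitting a polynomial with as many parameters as training points drives the training error to essentially zero while behaving badly away from those points:

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = x_train + 0.1 * rng.standard_normal(8)   # roughly linear data with a little noise

# A degree-7 polynomial has enough parameters to pass through all 8 points
coeffs = np.polyfit(x_train, y_train, deg=7)
train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
print(train_error)            # essentially 0: the curve memorizes the training points

# But predictions away from the training points are unreliable (poor generalization);
# extrapolating slightly beyond the data often gives values far from the linear trend
print(np.polyval(coeffs, 1.2))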

So what can we do to address this? One thing we can do is regularization. Here we look at L1 (lasso) and L2 (ridge) regularization.

Regularization: keep all the features, but reduce the magnitude of the parameters θ_j. This works well when we have a lot of features, each of which contributes a bit to predicting y.

For L1, or lasso, we add a regularization term to our squared error function (the cost function, i.e., the sum of all the errors) that affects every single parameter. (We start at j = 1 because we're not penalizing the intercept.)

We take a parameter lambda (λ), multiply it by the sum of the absolute values of the coefficients, and add the result to the overall cost.
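Written out (a sketch assuming a squared-error base cost and a linear hypothesis h_θ, with m training examples and n features), the lasso cost is:

J(\theta) \;=\; \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^{2} \;+\; \lambda \sum_{j=1}^{n} \lvert \theta_j \rvert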

This function has two objectives: the term on the left is the cost, where we want to fit the data well; the term on the right keeps the parameters small so the function stays simpler. Lambda controls the trade-off between overfitting and underfitting.

The more the penalty adds to this overall value, the worse the model appears to perform. That is exactly what we want, since we want to discourage overfitting. Lambda tells us how strong this effect should be.

L2, or ridge, regularization: we take the square of the parameters instead. This means we penalize large parameters more heavily.

If a model wants to learn large parameters it will be penalized.
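Under the same assumptions as above, the ridge (L2) cost squares the parameters instead of taking their absolute values:

J(\theta) \;=\; \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^{2} \;+\; \lambda \sum_{j=1}^{n} \theta_j^{2}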

L1 can be used to completely remove features (their coefficients are driven to exactly zero), but it is computationally more expensive.

L2 cannot be used for feature selection, but it is computationally more efficient; it only shrinks large values rather than eliminating them.

(Figure: high bias vs. high variance)

This sheds light on the main disadvantage of ridge regression, which is model interpretability. It will shrink the coefficients of the least important predictors very close to zero, but it will never make them exactly zero. In other words, the final model will include all predictors. In the case of the lasso, however, the L1 penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large. Therefore, the lasso also performs variable selection and is said to yield sparse models.
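A minimal sketch of this difference (synthetic data and an arbitrary regularization strength; sklearn calls λ alpha): lasso drives the coefficients of irrelevant features exactly to zero, while ridge only shrinks them.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 6))
# Only the first two features actually matter; the other four are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso:", lasso.coef_)   # noise features come out exactly 0.0; useful ones are shrunk
print("Ridge:", ridge.coef_)   # noise features are small but nonzero; shrinkage is mild here

Increasing alpha zeroes out more coefficients in the lasso fit and shrinks both fits harder; decreasing it brings both closer to ordinary least squares.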

Regularization Exercise

Perform the following steps:

1. Load in the data

  • The data is in the file called 'data.csv'. Note that there's no header row on this file.

  • Split the data so that the six predictor features (first six columns) are stored in X, and the outcome feature (last column) is stored in y.

2. Fit data using linear regression with Lasso regularization

  • Use the Lasso object's .fit() method to fit the regression model onto the data.

3. Inspect the coefficients of the regression model

  • Obtain the coefficients of the fit regression model using the .coef_ attribute of the Lasso object. Store this in the reg_coef variable: the coefficients will be printed out, and you will use your observations to answer the question at the bottom of the page.

# TODO: Add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Assign the data to predictor and outcome variables
# TODO: Load the data
train_data = pd.read_csv('data.csv', header=None)
X = train_data.iloc[:,:-1]
y = train_data.iloc[:,-1]


# TODO: Create the linear regression model with lasso regularization.
lasso_reg = Lasso()

# TODO: Fit the model.
lasso_reg.fit(X, y)

# TODO: Retrieve and print out the coefficients from the regression model.
reg_coef = lasso_reg.coef_
print(reg_coef)

Perhaps it's not too surprising at this point, but there are classes in sklearn that will help you perform regularization with your linear regression, and this exercise gives you practice with one of them. In this assignment's data.csv, you'll find data for a set of points including six predictor variables and one outcome variable. Use sklearn's Lasso class to fit a linear regression model to the data, while also using L1 regularization to control for model complexity.

Create an instance of sklearn's Lasso class and assign it to the variable lasso_reg. You don't need to set any parameter values: use the default values for the quiz.