Multiple Linear Regression

Interpretation

The intercept coefficient means that if our home is a Victorian home, we predict its price to be 1.046e+06 (about 1 million).

A lodge is predicted to be 7.411e+05 less than a Victorian.

Each of the lodge and ranch coefficients is a comparison to the baseline category, and the intercept is our prediction for the baseline.
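
As a rough sketch of how such a model might be fit (the file name house_prices.csv and the columns price and house_style are assumptions for illustration, not taken from the original data):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical housing data with a 'price' column and a 'house_style' column
# taking the values 'victorian', 'lodge', and 'ranch'
df = pd.read_csv('house_prices.csv')

# Dummy-encode the style, keeping 'victorian' as the baseline category
dummies = pd.get_dummies(df['house_style']).astype(int)
df[['lodge', 'ranch']] = dummies[['lodge', 'ranch']]

# Intercept = predicted price for the baseline (victorian);
# the lodge/ranch coefficients are differences relative to that baseline
model = sm.OLS(df['price'], sm.add_constant(df[['lodge', 'ranch']])).fit()
print(model.summary())
```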

Problems in Multiple Linear Regression

See more details here:

Collinearity

Multicollinearity

Multicollinearity is when we have three or more predictor variables that are correlated with one another. One of the main concerns with multicollinearity is that it can lead to coefficients being flipped from the direction we expect based on simple linear regression. Multicollinearity can emerge even when isolated pairs of variables are not collinear.

We would like x-variables to be related to the response, but not to be related to one another.

There are two consequences of multicollinearity:

  1. The expected relationships between your x-variables and the response may not hold when multicollinearity is present. That is, you may expect a positive relationship between an explanatory variable and the response (based on the bivariate relationships), but in the multiple linear regression case, it turns out the relationship is negative.

  2. Our hypothesis testing results may not be reliable. It turns out that having correlated explanatory variables means that our coefficient estimates are less stable. That is, standard deviations (often called standard errors) associated with your regression coefficients are quite large. Therefore, a particular variable might be useful for predicting the response, but because of the relationship it has with other x-variables, you will no longer see this association.

Two different ways of identifying multicollinearity:

  1. We can look at the correlation of each explanatory variable with each other explanatory variable (with a plot or the correlation coefficient).

Let's use a pairplot to see the relationship between some variables. The three variables have a strong relationship:
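
A minimal sketch of such a pairplot, reusing the hypothetical df from above and assuming it also has area, bedrooms, and bathrooms columns:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise scatterplots make strong relationships between predictors easy to spot
sns.pairplot(df[['price', 'area', 'bedrooms', 'bathrooms']])
plt.show()
```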

Also, we can specifically see that price and bedrooms have a positive relationship with one another:

However, the bedroom coefficient is negative in our multiple linear regression:

When x-variables are related to one another, we can have flipped relationships in our multiple linear regression models compared to what we would expect from the bivariate linear regression relationships.
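
To see the sign flip concretely, one could compare the bedrooms coefficient from a bivariate model against the one from the full model (again a sketch using the hypothetical columns above):

```python
import statsmodels.api as sm

# Bivariate model: bedrooms on its own
simple = sm.OLS(df['price'], sm.add_constant(df[['bedrooms']])).fit()

# Full model: correlated predictors included
full = sm.OLS(df['price'], sm.add_constant(df[['area', 'bedrooms', 'bathrooms']])).fit()

# With multicollinearity at work, the sign of 'bedrooms' can flip between the two
print(simple.params['bedrooms'], full.params['bedrooms'])
```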

2. We can look at Variance Inflation Factors (VIFs) for each variable.

Variance Inflation Factor

The Variance Inflation Factor (VIF) is a measure of collinearity among predictor variables within a multiple regression. It is calculated by taking the ratio of the variance of a coefficient estimate in a model with multiple terms, divided by the variance of that coefficient in a model fit with that term alone.
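
The standard way to compute this (not spelled out in the original notes) uses the $R^2$ from regressing one predictor on all the others: for predictor $x_i$,

$$VIF_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ comes from regressing $x_i$ on the remaining x-variables. The better the other x-variables explain $x_i$, the larger its VIF.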

  1. Run a multiple regression.

  2. Calculate the VIF factors.

  3. Inspect the factors for each predictor variable; if the VIF is between 5 and 10, multicollinearity is likely present and you should consider dropping the variable.

dmatrices is imported from patsy, and we input the dependent variable, followed by our x-variables right after. Then we use the variance inflation factor method.
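
A sketch of that workflow (the formula string and column names are assumptions; the function is statsmodels' variance_inflation_factor):

```python
import pandas as pd
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Build the design matrices: response on the left of '~', predictors on the right
y, X = dmatrices('price ~ area + bedrooms + bathrooms', df, return_type='dataframe')

# One VIF per column of the design matrix (the first entry is the intercept)
vif = pd.DataFrame()
vif['feature'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif)
```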

We would want to remove at least one of the last two variables from our model because both of their VIFs are larger than 10. It is common to remove the one which is of least interest.

Higher Order Terms

Higher order terms in linear models are created when multiplying two or more x-variables by one another. Common higher order terms include quadratics ($x_1^2$) and cubics ($x_1^3$), where an x-variable is multiplied by itself, as well as interactions ($x_1 x_2$), where two or more x-variables are multiplied by one another.

In a model with no higher order terms, you might have an equation like:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

Then we might decide the linear model can be improved with higher order terms. The equation might change to:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_2 + b_4 x_1 x_2$$

Here, we have introduced a quadratic ($b_2 x_1^2$) and an interaction ($b_4 x_1 x_2$) term into the model.

In general, these terms can help you fit more complex relationships in your data. However, they also take away from the ease of interpreting coefficients, as we have seen so far. You might be wondering: "How do I identify if I need one of these higher order terms?"

When creating models with quadratic, cubic, or even higher orders of a variable, we are essentially looking at how many curves there are in the relationship between the explanatory and response variables.

If there is one curve, like in the plot below, then you will want to add a quadratic. Clearly, we can see a line isn't the best fit for this relationship.

Then, if we want to add a cubic relationship, it is because we see two curves in the relationship between the explanatory and response variable. An example of this is shown in the plot below.

In Python:
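
A minimal sketch of the idea, adding quadratic and cubic terms of the hypothetical area column:

```python
import statsmodels.api as sm

# Create the higher order terms as extra columns
df['area_squared'] = df['area'] ** 2  # use when there is one curve in the relationship
df['area_cubed'] = df['area'] ** 3    # use when there are two curves

X = sm.add_constant(df[['area', 'area_squared', 'area_cubed']])
model = sm.OLS(df['price'], X).fit()
print(model.summary())
```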

How do you know if you should add an interaction term?

Interaction definition: the way that variable $x_1$ is related to your response is dependent on the value of $x_2$.

Mathematically, an interaction is created by multiplying two variables by one another and adding this term to our linear regression model.

Say you have two neighborhoods and want to use the area ($x_1$) and the neighborhood ($x_2$) of a home (either A or B) to predict its price ($y$):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

where $b_1$ is the way we estimate the relationship between area and price, which in this model we believe to be the same regardless of the neighborhood.

Then $b_2$ is the difference in price depending on which neighborhood you are in, which is the vertical distance between the two lines here:

Notice here that:

  • The way that area is related to price is the same regardless of neighborhood.

AND

  • The difference in price for the different neighborhoods is the same regardless of the area.

When these statements are true, we do not need an interaction term in our model. However, we need an interaction when the way that area is related to price is different depending on the neighborhood.

Mathematically, when the way area relates to price depends on the neighborhood, this suggests we should add an interaction. By adding the interaction, we allow the slopes of the line for each neighborhood to be different, as shown in the plot below. Here we have added the interaction, and you can see this allows for a difference in these two slopes.

The slopes are different. To account for this, we would want to add an interaction term between neighborhood and square footage.

Here we can see that the way square footage is related to the home price is dependent on the neighborhood we are in. This is exactly the interaction definition: the way that variable $x_1$ is related to your response is dependent on the value of $x_2$.

Conclusion: if the slopes are close to equal, then we do NOT add an interaction term. Otherwise, we do.
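
A sketch of adding such an interaction term, assuming a hypothetical neighborhood column with values A and B alongside the area and price columns used above:

```python
import statsmodels.api as sm

# 0/1 dummy for neighborhood B (neighborhood A is the baseline)
df['neighborhood_B'] = (df['neighborhood'] == 'B').astype(int)

# The interaction term is simply the product of the two predictors
df['area_x_neighborhood_B'] = df['area'] * df['neighborhood_B']

X = sm.add_constant(df[['area', 'neighborhood_B', 'area_x_neighborhood_B']])
model = sm.OLS(df['price'], X).fit()

# The interaction coefficient lets the slope of price vs. area differ by neighborhood
print(model.params)
```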

Interpretation:

With the higher order term, the coefficients associated with area and area squared are not easily interpretable. However, coefficients that are not associated with the higher order terms can still be interpreted in the same way as before.

Collinearity is the state where two variables are highly correlated and contain similar information about the variance within a given dataset. To detect collinearity among variables, simply create a correlation matrix and look for variables with large absolute values. In R use the cor function, and in Python this can be accomplished by using numpy's corrcoef function.
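
In code, that check might look like this (reusing the hypothetical df and columns from above):

```python
import numpy as np

# Correlation matrix with pandas
print(df[['price', 'area', 'bedrooms', 'bathrooms']].corr())

# Or with numpy; corrcoef expects each row to be a variable, hence the transpose
print(np.corrcoef(df[['area', 'bedrooms', 'bathrooms']].values.T))
```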

For more on VIFs and multicollinearity, here is the referenced post from the video on VIFs.
