Projections

Let's project b onto a.

e is the error, e = b - p. We know that p is some multiple x of a, so p = xa, and we'd like to find x. The key is that a is perpendicular to e. In other words:

a^T (b - xa) = 0

(This is the same move as with Ax = b: multiply both sides by A^T to get A^T A x = A^T b, then bring everything to one side so the equation equals 0.)

If we simplify, we get x = (a^T b) / (a^T a).

The projection is then, if we take p = xa:

p = a (a^T b) / (a^T a)

We can write this as p = Pb, where P is the projection matrix (when you multiply a vector b by a matrix, you always land in the column space of that matrix):

P = (a a^T) / (a^T a)

Note that P (capital P) really is a matrix: a a^T is a column times a row, so it is n x n, while a^T a in the denominator is just a number. We also know that P^T = P, so it's symmetric.

Also, if I project twice, it's the same result as projecting once, so P^2 = P.
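
Here is a quick numerical sketch of this one-dimensional case (the vectors a and b below are made-up examples, not taken from the text):

```python
import numpy as np

# Made-up example vectors: any nonzero a and any b of the same length work.
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 4.0])

# x = (a^T b) / (a^T a): the multiple of a that gets closest to b
x = (a @ b) / (a @ a)

# p = xa is the projection of b onto the line through a
p = x * a

# e = b - p is the error; it should be perpendicular to a
e = b - p
print(a @ e)                     # ~0: e is orthogonal to a

# Projection matrix P = a a^T / (a^T a)
P = np.outer(a, a) / (a @ a)
print(np.allclose(P, P.T))       # symmetric: P^T = P
print(np.allclose(P @ P, P))     # projecting twice = projecting once: P^2 = P
print(np.allclose(P @ b, p))     # Pb reproduces the projection
```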

Why project? Because Ax = b may have no solution. So I'll solve the closest problem that I can solve: I change b to the closest vector that is in the column space. That means I solve A x-hat = p instead, where p is the projection of b onto C(A), the closest possible vector to b. We write x-hat because it is no longer our original x.
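
Numerically, this is exactly what a least-squares routine does. A minimal sketch, using a made-up overdetermined system (NumPy's np.linalg.lstsq returns the x-hat that minimizes the length of the error):

```python
import numpy as np

# Made-up system: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# x_hat minimizes ||A x - b||, i.e. it solves A x_hat = p,
# where p is the projection of b onto C(A).
x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

p = A @ x_hat      # the projection of b onto the column space
e = b - p          # the error, perpendicular to C(A)
print(x_hat, p)
print(A.T @ e)     # ~0: e is orthogonal to every column of A
```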

In three dimensions, this is the problem:

We have a matrix A whose columns a1 and a2 are independent, so they span a plane, and a vector b which is probably not in that plane. (If it were, we'd just take b.) We want the projection p of b onto that plane, where e = b - p is perpendicular to the plane.

So we're looking for the x-hat with p = A x-hat such that the error vector e = b - A x-hat is perpendicular to the plane. (In the line case, p = xa was a multiple of one vector; here p = x-hat_1 a1 + x-hat_2 a2 is a combination of the two columns.)

Perpendicular to the plane means perpendicular to both a1 and a2, so we have 2 equations (2 x-hats and 2 a's):

a1^T (b - A x-hat) = 0 and a2^T (b - A x-hat) = 0

Now let's put this into matrix form:

A^T (b - A x-hat) = 0

Here (b - A x-hat) = e. We're learning that e is in the nullspace of A^T, because A^T e = 0, so e is orthogonal to the column space of A.

Our final equation to solve is:

A^T A x-hat = A^T b

Let's solve for x-hat, then remember the definition of p and combine both equations:

x-hat = (A^T A)^-1 A^T b

p = A x-hat = A (A^T A)^-1 A^T b

so the projection matrix is P = A (A^T A)^-1 A^T.
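
As a numerical sketch (same made-up A and b as above; in practice you'd solve the normal equations rather than form the inverse explicitly):

```python
import numpy as np

# Made-up example: two independent columns in R^3, b not in their plane.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Projection of b onto the column space, and the projection matrix
p = A @ x_hat
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(p)
print(np.allclose(P @ b, p))     # P b gives the same projection
print(A.T @ (b - p))             # ~0: the error is perpendicular to C(A)
```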

So actually, P is the identity matrix if and only if A is square and invertible. In that case the column space of A is all of R^n, and the formula collapses: P = A (A^T A)^-1 A^T = A A^-1 (A^T)^-1 A^T = I. Projecting b onto the column space changes nothing, because b is already in the column space.

P is supposed to project that vector b to the nearest point in the column space.

If b is in the column space of A, then Pb = b. Indeed, if b is in the column space, then b = Ax for some x. If we replace b by Ax in the formula Pb = A (A^T A)^-1 A^T b, the (A^T A)^-1 cancels against A^T A and everything collapses to Ax, which is b.

Pb = 0 if b is perpendicular to the column space. If b is perpendicular to the column space, it belongs to the nullspace of A transpose, so A^T b = 0 and therefore Pb = A (A^T A)^-1 A^T b = 0.
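
Continuing the made-up example, we can check both extreme cases numerically:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# A vector already in the column space: any combination of the columns.
b_in = A @ np.array([2.0, -1.0])
print(np.allclose(P @ b_in, b_in))   # True: Pb = b

# A vector perpendicular to the column space, i.e. in the nullspace of A^T.
b_perp = np.array([1.0, -2.0, 1.0])
print(np.allclose(A.T @ b_perp, 0))  # confirms b_perp is in N(A^T)
print(np.allclose(P @ b_perp, 0))    # True: Pb = 0
```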

To sum up:

We have the column space of A and the nullspace of A transpose, and they are orthogonal. Every b splits into a piece in each: b = p + e, where p is the projection of b onto the column space and e is the error. This is what our matrix does: Pb = p, and (I - P)b = e.

For other A's (rectangular, not invertible), we can't simplify the formula by splitting (A^T A)^-1 into separate inverses, because A^-1 doesn't exist. Now let's check the properties of P that we identified earlier.

Is it symmetric? Yes: P^T = (A (A^T A)^-1 A^T)^T = A ((A^T A)^-1)^T A^T = A (A^T A)^-1 A^T = P, since A^T A is symmetric. How about P^2 = P? Yes: P^2 = A (A^T A)^-1 (A^T A) (A^T A)^-1 A^T = A (A^T A)^-1 A^T = P.

Now let's look at the application to least squares.