# Simple Linear Regression

Simple linear regression models the linear relationship between exactly two quantitative variables.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVSrpy0CtXmw_vPkxq%2Fimage.png?alt=media\&token=5fec161f-4fa7-4535-b2e1-9833776881c1)

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVT-7aQ5aVgdUrqo5X%2Fimage.png?alt=media\&token=fa57fc85-f4b4-44eb-a0d1-93f67f2e224b)

### Correlation Coefficient r

The correlation coefficient r measures the strength and direction of a linear relationship.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVU5ITOEf5zCgY4fwU%2Fimage.png?alt=media\&token=ad5950bc-b8ee-4c3a-98be-1f61e52cdd96)

It is always between -1 and 1, inclusive.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVUGvc4RVx6-f3krox%2Fimage.png?alt=media\&token=595723d5-6c91-42d0-a8f8-6f3c34f199e0)
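As a rough illustration of how r can be computed, here is a minimal sketch using only the standard library. It assumes the usual sample (Pearson) definition of r; the function name `pearson_r` and the data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # close to +1: strong positive linear relationship
```

Dividing by the spread of both variables is what keeps the result within the -1 to 1 range regardless of the units of x and y.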

r is a very field-dependent measure. The boundaries for what counts as a weak or strong correlation differ between fields: they are not the same in the social sciences, where human behavior is unpredictable, as in the environmental sciences.

But in general:

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVUuKfir06vd9Eq4mq%2Fimage.png?alt=media\&token=af078832-638b-4063-8070-56fdf36ab024)

### Linear regression line

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVVZxb-o-tWnaAzry8%2Fimage.png?alt=media\&token=cf68acda-fb4d-4f4d-ad70-93dc0385c0d9)

b0 denotes the statistic, which is estimated from the sample, while β0 (beta0) denotes the parameter, which is the true value in the population.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVVh48YyxmP_Mk3rKb%2Fimage.png?alt=media\&token=15b25fa1-e495-4d30-8b11-a237e3999a12)

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVVmIIuYC743R3xGHL%2Fimage.png?alt=media\&token=2527e7db-e651-4d3e-90c1-9be4c0f49482)

y-hat is the value the line predicts at a given x, while y is the value that was actually measured.
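The difference between y-hat and y can be sketched in a few lines. This assumes the line has the usual form y-hat = b0 + b1 * x; the coefficient values and the data below are hypothetical, chosen only for illustration.

```python
# Hypothetical sample coefficients (not taken from any real fit)
b0 = 47.0  # intercept
b1 = 4.5   # slope

def predict(x):
    """y-hat: the height of the regression line at x."""
    return b0 + b1 * x

# Made-up observations
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 70]

# Predicted values on the line, and residuals (observed minus predicted)
y_hats = [predict(x) for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]

for x, y, y_hat, e in zip(xs, ys, y_hats, residuals):
    print(f"x={x}  y={y}  y_hat={y_hat:.1f}  residual={e:+.1f}")
```

A positive residual means the observed point lies above the line, and a negative one means it lies below.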

### Least-Squares algorithm

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVXUWssXmuHyghkZyv%2Fimage.png?alt=media\&token=b25cbfc2-d2de-403b-be2e-89ece3535bef)

The main algorithm used to find the best-fit line is called the **least-squares** algorithm, which finds the line that minimizes the sum of squared differences between the observed and predicted values:

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVXQRfPG8lVnnTcwEp%2Fimage.png?alt=media\&token=a7ced7ab-6bf5-42e1-ab7b-bacf1695dc24)

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVXtV4gzwYyvpzpRYZ%2Fimage.png?alt=media\&token=e979d9e0-9f6e-4ed0-92a9-e814afe2462d)
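For a single predictor, the least-squares line has a well-known closed-form solution, which can be sketched as follows. The function names `least_squares_fit` and `sse` and the data are illustrative assumptions; libraries typically provide an equivalent routine.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: line passes through the means
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 70]

b0, b1 = least_squares_fit(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse(xs, ys, b0, b1):.2f}")

# Moving the line away from the fitted coefficients cannot lower the SSE
print(sse(xs, ys, b0 + 0.5, b1) >= sse(xs, ys, b0, b1))
```

The closed form comes from setting the derivatives of the sum of squared residuals with respect to b0 and b1 to zero.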

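Other functions can be minimized instead, like the one below, but least squares is the standard choice in common Python libraries, tends to work well for most data sets, and has calculus properties that make it convenient to use: it is differentiable everywhere, so the minimum can be found exactly.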

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgPe-qwl_UDkT87KBC9%2F-LgVYG0aYUzibUs-JYHY%2Fimage.png?alt=media\&token=ad56774c-ecb2-49f9-a046-2061a4deb616)
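To make the idea of an alternative function concrete, the sketch below scores the same line in two ways. It assumes the alternative is the sum of absolute residuals, which is one commonly used option and may not be the exact function pictured above. The coefficients and data are hypothetical.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Least-squares criterion: large misses are penalized quadratically."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def sum_absolute_errors(xs, ys, b0, b1):
    """Assumed alternative criterion: every miss is penalized linearly."""
    return sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))

# Made-up data and a hypothetical candidate line
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 70]
b0, b1 = 47.0, 4.5

squared = sum_squared_errors(xs, ys, b0, b1)
absolute = sum_absolute_errors(xs, ys, b0, b1)
print(f"sum of squared errors:  {squared}")
print(f"sum of absolute errors: {absolute}")
```

The absolute-value version has a kink at zero, so it lacks the smooth derivative that makes the squared version easy to minimize with calculus.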
