# Logistic Regression

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-LgmjuhnY-BZeHY0oOIT%2Fimage.png?alt=media\&token=7e90e32d-aac3-45a2-9279-8246dc1885dc)

These are the odds: the probability of an event occurring divided by the probability of the event not occurring.


![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-LgmkRhGrC9a9adIE0Ke%2Fimage.png?alt=media\&token=aebfe84e-4462-4443-9277-faaabd18c4b6)

This is called the log odds, or logit. The log odds can take any real value, so modeling them as a linear function of the predictors guarantees that the implied probabilities stay between 0 and 1.

With some algebra, the equation now looks like this:

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-LgmkdnR4GmrecC6E8oo%2Fimage.png?alt=media\&token=fecea867-dca2-4596-986b-634fe75904bd)

This form solves for the probability directly.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-Lgmkthy3Bzarb0A1NH6%2Fimage.png?alt=media\&token=797e8626-f93f-49d9-916d-de5d0f56a542)
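The relationship between probability, odds, and log odds can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and do not come from any particular package.

```python
import math

def odds(p):
    """Odds of an event: P(event) / P(no event)."""
    return p / (1.0 - p)

def log_odds(p):
    """Log odds (logit) of a probability strictly between 0 and 1."""
    return math.log(odds(p))

def probability_from_log_odds(z):
    """Invert the logit: map a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
z = log_odds(p)  # log(0.8 / 0.2), i.e. log of odds of about 4

print("odds:", odds(p))
print("log odds:", z)
print("recovered probability:", probability_from_log_odds(z))

# Any real-valued log odds maps back to a probability between 0 and 1.
for value in (-5.0, 0.0, 5.0):
    print(value, "->", probability_from_log_odds(value))
```

Converting a probability to log odds and back recovers the original value (up to floating-point error), which is the algebra described above.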

### Interpretation

To interpret the coefficients, we need to exponentiate each of them. For a quantitative variable we would say: for each 1 unit increase in the explanatory variable x1, we expect the odds of being in the 1 category to change by a multiplicative factor of e^b1, holding all other variables constant.

For a categorical variable: when in category x1, we expect the odds of a 1 to change by a multiplicative factor of e^b1 compared to the baseline, holding all other variables constant.

So if we have:

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-Lgn2Ahjwf05cU-23nvF%2Fimage.png?alt=media\&token=a0d2a269-2631-4b6f-a350-e7de1530b6a3)

For the weekday dummy variable we would say: on weekdays, the odds of fraud are 12.76 times the odds on weekends, holding all else constant.

For duration: for each 1 unit increase in duration, the odds of fraud are multiplied by 0.23, holding all else constant.

When an exponentiated coefficient is less than 1, it is often easier to interpret its reciprocal. Taking the reciprocal flips the direction of the statement from a unit increase to a unit decrease.

Therefore: for each 1 unit decrease in duration, the odds of fraud are 4.32 times as high, holding all else constant.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-Lgn96kFZk8qCIa5wFc2%2Fimage.png?alt=media\&token=dc05a376-62a3-496b-a579-d77d9e654d35)
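The exponentiate-then-interpret step can be sketched in a few lines of Python. The coefficients below are not from an actual model fit: they are backed out from the rounded odds ratios quoted above (12.76 and 0.23) purely for illustration, and `describe` is a hypothetical helper. Because 0.23 is rounded, its reciprocal comes out near 4.35 here; the 4.32 quoted above presumably reflects the unrounded coefficient.

```python
import math

# Odds ratios quoted in the text. The log-odds coefficients are backed out
# from them for illustration; they are not taken from an actual model fit.
quoted_odds_ratios = {"weekday": 12.76, "duration": 0.23}
coefficients = {name: math.log(r) for name, r in quoted_odds_ratios.items()}

def describe(name, coef):
    """Turn a log-odds coefficient into a readable statement about the odds."""
    ratio = math.exp(coef)
    if ratio >= 1:
        return f"{name}: a 1 unit increase multiplies the odds by {ratio:.2f}"
    # Below 1, report the reciprocal and flip the direction to a decrease.
    return f"{name}: a 1 unit decrease multiplies the odds by {1 / ratio:.2f}"

for name, coef in coefficients.items():
    print(describe(name, coef))
```

For a dummy variable such as weekday, a "1 unit increase" simply means moving from the baseline category (weekend) to the indicated one.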

### Accuracy

Accuracy measures how well your logistic regression model predicts the correct labels: it is the number of correct predictions divided by the total number of predictions.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-LgnA1X29fMm1KA9SB98%2Fimage.png?alt=media\&token=d40f53bd-bfb5-4336-a18e-f0ce8d0b86e2)
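As a small worked example, accuracy can be computed by counting matching labels. This is a sketch with made-up labels, and `accuracy` is an illustrative helper rather than a library function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration: 6 of the 8 predictions are correct.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

print(accuracy(y_true, y_pred))
```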

There are some cases where accuracy won't work well, particularly when your data set has a large class imbalance.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LgmjlQUsF7qEJOk2zoM%2F-LgnAB4v1hqph1D26nsW%2Fimage.png?alt=media\&token=e94cef6b-3609-40d1-a68b-8a72849e6666)
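To see why imbalance is a problem, consider a hypothetical data set of 1,000 transactions in which only 10 are fraudulent. The sketch below (made-up data, standard library only) scores a "model" that always predicts the majority class.

```python
# Hypothetical data: 1,000 transactions, only 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990

# A useless "model" that always predicts the majority class (not fraud).
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# How many of the actual fraud cases did the model flag?
fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy: {accuracy:.2%}")
print(f"fraud cases caught: {fraud_caught} of {sum(y_true)}")
```

The accuracy is 990 out of 1,000 even though not a single fraud case is detected, so a high accuracy alone says little about the class you care about.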

So we'll go over some other metrics for determining whether your model is performing well.

### Interpreting interactions in logistic regression

<http://www.cantab.net/users/filimon/cursoFCDEF/will/logistic_interact.pdf>
