Logistic Regression
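The odds are the probability of an event occurring divided by the probability of the event not occurring.
Taking the log of the odds gives the log odds (also called the logit). Because the log odds can take any real value, modeling them as a linear function of the explanatory variables keeps the predicted probabilities between 0 and 1.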
Rearranging the log odds equation to isolate the probability gives a form that returns the probability directly.
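Written out, the steps described above look as follows. The notation is an assumption, since the original equations are not shown here: p is the probability of the event, x_1 ... x_n are the explanatory variables, b_0 is the intercept and b_1 ... b_n are the coefficients.

```latex
\begin{align}
\text{odds} &= \frac{p}{1 - p} \\
\log\left(\frac{p}{1 - p}\right) &= b_0 + b_1 x_1 + \dots + b_n x_n \\
p &= \frac{e^{b_0 + b_1 x_1 + \dots + b_n x_n}}{1 + e^{b_0 + b_1 x_1 + \dots + b_n x_n}}
\end{align}
```

The last line follows from exponentiating both sides of the second line and solving for p. Whatever value the linear part takes, the resulting p lies strictly between 0 and 1.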
To interpret the coefficients, we first exponentiate each of them. Then, for a quantitative variable we would say: for a 1 unit increase in the explanatory variable x1, we expect a multiplicative change of e^b1 in the odds of being in the 1 category, holding all other variables constant.
For a categorical variable: when in category x1, we expect a multiplicative change of e^b1 in the odds of being in the 1 category compared to the baseline, holding all other variables constant.
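A minimal sketch of the exponentiation step, assuming the fitted coefficients are available as plain numbers on the log odds scale. The variable names and coefficient values are hypothetical and chosen only for illustration; they do not come from a real model.

```python
import math

# Hypothetical fitted coefficients on the log odds scale (illustration only).
coefficients = {"intercept": -2.0, "x1": 0.5, "x2": -1.2}


def odds_ratios(coefs):
    """Exponentiate each non-intercept coefficient to get its multiplicative effect on the odds."""
    return {name: math.exp(b) for name, b in coefs.items() if name != "intercept"}


ratios = odds_ratios(coefficients)
for name, ratio in ratios.items():
    print(f"{name}: odds are multiplied by {ratio:.2f} per 1 unit increase")
```

A positive coefficient gives a ratio above 1 (the odds go up), and a negative coefficient gives a ratio below 1 (the odds go down).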
So if we have:
For the weekday dummy variable we would say: on weekdays, the odds of fraud are 12.76 times the odds on weekends, holding all else constant.
For duration: for each 1 unit increase in duration, the odds of fraud are multiplied by 0.23, holding all else constant.
With values less than 1, it is often easier to interpret the reciprocal. This changes the direction of the statement from a unit increase to a unit decrease.
Therefore: for each 1 unit decrease in duration, the odds of fraud are multiplied by 4.32, holding all else constant.
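The reciprocal step can be sketched as below, using the two exponentiated values quoted above. The function name `flip_direction` is a hypothetical helper introduced only for this illustration.

```python
# Exponentiated coefficients quoted in the text above.
weekday_ratio = 12.76
duration_ratio = 0.23


def flip_direction(odds_ratio):
    """Convert the ratio for a 1 unit increase into the ratio for a 1 unit decrease."""
    return 1.0 / odds_ratio


duration_decrease_ratio = flip_direction(duration_ratio)
print(f"{duration_decrease_ratio:.2f}")
```

Starting from the rounded value 0.23 this gives about 4.35. The 4.32 quoted above presumably comes from the unrounded coefficient, which is not shown here.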
One way to judge how well your logistic regression model predicts the correct labels is accuracy.
There are some cases where accuracy won't work well, particularly when you have large class imbalances in your data set.
So we'll go over some of the other metrics to determine whether your model is performing well or not.
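To see why accuracy can mislead under class imbalance, here is a small sketch with made-up labels: 990 legitimate transactions and 10 fraudulent ones, scored against a "model" that always predicts the majority class. The data and the choice of recall as the comparison metric are assumptions for illustration.

```python
# Hypothetical, heavily imbalanced labels: 990 legitimate (0) and 10 fraud (1).
y_true = [0] * 990 + [1] * 10
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 1000


def accuracy(actual, predicted):
    """Fraction of labels predicted correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recall(actual, predicted):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    actual_pos = sum(a == 1 for a in actual)
    return true_pos / actual_pos


print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The accuracy is 99% even though not a single fraud case is caught, which is the kind of situation where another metric is more informative.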
http://www.cantab.net/users/filimon/cursoFCDEF/will/logistic_interact.pdf