Business Applications
Framework:
Lever: things an organization can do to have an impact on the objective, i.e. how we are going to achieve the objective. Building a model is not a lever.
Ex: Churn - the company can call the clients or give them a promo.
Data: what data does the org have that can help the lever achieve the objective?
Ex: Fraud - who's not going to pay? What's the data that they have available and what could they collect?
Model - first you have a simulation model: it's about understanding what happens if... I give this promo or show this ad.
You could build this simulation model in Excel with all the variables, costs, revenue, etc., and find where some things are stochastic. For those parts we need a predictive model: how likely is an employee to stay if we change their salary, how likely are they to leave with their current salary, and how about next year? Then you can combine the two and optimize: if I increase the salary by x, you check what happens.
Do this instead of measuring your model with AUC, or RMSE, or whatever.
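A minimal sketch of the simulation idea for the salary example, assuming a made-up predictive model and made-up costs. The function `stay_probability`, the salary, and the replacement cost are all hypothetical stand-ins; in practice the probability would come from a trained model and the costs from the business.

```python
import numpy as np

def stay_probability(raise_pct):
    """Hypothetical stand-in for a predictive model: P(employee stays | raise in %)."""
    return 1.0 / (1.0 + np.exp(-(0.5 + 0.3 * raise_pct)))

def expected_cost(raise_pct, salary=100_000.0, replacement_cost=50_000.0):
    """Expected cost of a raise plus the expected cost of losing the employee.

    All figures are invented for illustration.
    """
    raise_cost = salary * raise_pct / 100.0
    p_leave = 1.0 - stay_probability(raise_pct)
    return raise_cost + p_leave * replacement_cost

# The lever: candidate raises from 0% to 10%.
raises = np.arange(0, 11)
costs = expected_cost(raises)

# Optimize the lever against the business objective (lowest expected cost).
best_raise = raises[np.argmin(costs)]
print(f"best raise: {best_raise}%, expected cost: {costs.min():.0f}")
```

The point of the sketch is that the model is judged by the decision it leads to (which raise to offer), not by its score on a metric.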
Then you can list all the levers you can pull, look at your random forest to find the features that are strong drivers of the outcome, and take the intersection: the levers we can pull that actually matter.
Then we can build the partial dependence plots to check what would happen if we did change one of them.
Ask yourself: what would the person who has the data (the CEO, the CMO, whoever your client is) want to know?
Another example: lead scoring - if person X has the highest prob of buying, would you spend money reaching out to this person? Maybe not, and this is why you need a simulation. What's the likely change in person X's behavior if I send out my best salesperson to encourage them to sign?
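A small sketch of the lead scoring point, with invented probabilities and costs. It assumes we somehow have, for each lead, the probability of buying with and without contact; `leads`, `deal_value` and `contact_cost` are hypothetical.

```python
# name: (P(buy) without contact, P(buy) if contacted) -- invented numbers
leads = {
    "X": (0.90, 0.92),
    "Y": (0.30, 0.55),
    "Z": (0.05, 0.06),
}
deal_value = 10_000.0   # hypothetical revenue per sale
contact_cost = 500.0    # hypothetical cost of sending a salesperson

def expected_gain(p_without, p_with):
    """Expected extra revenue from contacting a lead, minus the cost of contact."""
    return (p_with - p_without) * deal_value - contact_cost

gains = {name: expected_gain(*probs) for name, probs in leads.items()}

# Ranking by raw probability and ranking by expected gain can disagree.
best_by_probability = max(leads, key=lambda name: leads[name][0])
best_by_gain = max(gains, key=gains.get)
print(best_by_probability, best_by_gain, gains)
```

With these made-up numbers, X is the most likely buyer but contacting X loses money, because X would probably buy anyway; the lead worth the salesperson's time is Y.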
This is an opportunity for data scientists to bring it all together and move beyond predictive modeling. They can be more involved in strategy and use ML to really help a business with all of its objectives.
To recap:
What's the business objective we're trying to solve?
What levers can we pull to make it happen?
What data do we have, or can we collect, to help us make a better decision on how to pull the lever?
How can we use model interpretation to find the features that have the most impact on our outcome, use partial dependence to understand what will happen, and build a simulation and optimize it to make the best decisions?
Outcome of ML:
It's important to make the output of the model readily available to the decision makers in an org - whether they are at the top strategic level, all the way to the operational level - maybe at the level of speaking to a customer.
Readmission risk: prob that someone will come back to the hospital.
The predictive model is good in itself - maybe we shouldn't send this person home yet. But wouldn't it be nice if we had the tree interpreter and knew that the reason they are high risk is that we don't have a recent EKG for them, and without it we can't have high confidence about their cardiac health? In which case the solution isn't to keep them in the hospital for a few weeks, it's to give them an EKG. This is the interaction between interpretation and predictive accuracy.
Confidence based on tree variance: what does it tell us? Why are we interested in this?
It's basically interpretation for a single observation. We take one row, one observation, and check how confident we are about its prediction: how much variance is there across the trees for that row? Are there any groups (category 1, 2, 3, 4 of feature x) where we are very unconfident?
Ex: Is Julien a good risk or a bad risk? Should we loan him $1 million? The random forest might say: I think he's a good risk, but I'm not at all confident. In which case we'd say, maybe we won't give him $1 million.
So if we're putting out an algo that makes big decisions that could cost a lot of money, you probably care less about the average prediction of the RF and more about the average minus a few std deviations. What's the worst-case prediction? Maybe there's a whole group we're kind of unconfident about.
How is it calculated? Take the variance (or std deviation) of the predictions of the individual trees (normally the prediction is just the average of the trees).
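A minimal sketch of this calculation with scikit-learn, assuming a fitted `RandomForestRegressor`. The data is synthetic, and the choice of 2 standard deviations for the pessimistic estimate is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, only to have something to fit.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# One prediction per tree and per row: shape (n_trees, n_rows).
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])

mean_pred = per_tree.mean(axis=0)   # the usual RF prediction
std_pred = per_tree.std(axis=0)     # how much the trees disagree on each row

# A pessimistic prediction: the average minus a few std deviations.
pessimistic = mean_pred - 2 * std_pred

# Rows where the forest is least confident.
least_confident_rows = np.argsort(std_pred)[-5:]
print(least_confident_rows, std_pred[least_confident_rows])
```

To look for whole groups the model is unsure about, the same `std_pred` column can be averaged per category of a feature.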
Feature importance: what does it tell us, and why are we interested in this?
How is it calculated (for a particular feature)? You take a feature, randomly shuffle it, and then take the validation score (R^2) after doing this. Another way to put it - instead of retraining our RF, we use the existing RF and check how important a feature is by:
1. Shuffling the feature, so now it's useless.
2. Looking at the model score (RMSE or R^2) on the shuffled data.
Then you can make a table of all the features and their scores. For the importances, you take the difference between the score of the real model and the score with each feature shuffled.
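The shuffling procedure above can be sketched as follows. The data is synthetic (feature 0 is built to matter most, feature 3 not at all), and the use of R^2 via `rf.score` is one of the two scores mentioned in the notes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: y depends strongly on column 0, weakly on column 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.1, size=500)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Score of the real model on the validation set (R^2).
baseline = rf.score(X_valid, y_valid)

importances = []
for col in range(X_valid.shape[1]):
    X_shuffled = X_valid.copy()
    # 1. Shuffle one feature so it carries no information.
    X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
    # 2. Re-score the existing model (no retraining) and record the drop.
    importances.append(baseline - rf.score(X_shuffled, y_valid))

importances = np.array(importances)
print(importances)
```

scikit-learn also ships this technique as `sklearn.inspection.permutation_importance`, which repeats the shuffle several times and averages the result.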
It's important to look at the relative importance of the features.
Once you know which variables best predict something, you can focus on better collecting those variables as well.
The technique of getting feature importance can be used for any model. RF has it by default, but you could easily implement something similar for Neural Nets.
The vast majority of the time, when someone shows us a chart, it'll be a univariate chart: plot x against y, and managers make a decision. But real-life problems have a lot of interactions going on. Maybe there was a recession at some point, or people were buying a different type of equipment.
We want to know, all other things being equal, what the relationship is between x and y, e.g. between YearMade and sale price.
How do we calculate it? We leave all other features the same, but take one feature, YearMade, and create a partial dependence plot. Instead of randomly shuffling YearMade, we replace every single value with exactly the same thing, e.g. 1960. Then, like before, we pass this through our RF to get back a set of predictions for YearMade = 1960, and plot them on a chart. We can do this for all of our dates and plot them. We would get this:
Each one of the blue lines is a single row. Then we can take the median and see, on average, what the relationship is between price and YM all other things being equal.
Why does this work? A simplified approach: build a single row for the average auction, with the typical location, model type, etc. Then we run this row through the RF with YM set to 1960, again with 1961, 1962, etc., and plot all of them. But this doesn't work if we are selling lots of different things. So instead we use our data, which has a bunch of different examples of what we sell and who we sell to. The blue lines are actual examples of these relationships.
Then what we can do is instead of just plotting the median, we can do a cluster analysis.
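A hand-rolled sketch of the partial dependence calculation described above. The data is synthetic (a hypothetical `year_made` and `hours` feature with an invented price formula), and the grid of one value per decade is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic auction data: price rises with year made and falls with usage hours.
rng = np.random.default_rng(0)
n = 400
year_made = rng.integers(1960, 2011, size=n)
hours = rng.uniform(0, 10_000, size=n)
price = 500.0 * (year_made - 1960) - 1.0 * hours + rng.normal(0, 500, size=n)

X = np.column_stack([year_made, hours]).astype(float)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, price)

sample = X[:100]                      # the rows we draw one line for
grid = np.arange(1960, 2011, 10)      # values to force YearMade to

# ice[i, j] = prediction for row i when YearMade is replaced by grid[j].
ice = np.empty((len(sample), len(grid)))
for j, year in enumerate(grid):
    X_mod = sample.copy()
    X_mod[:, 0] = year                # every row gets exactly the same YearMade
    ice[:, j] = rf.predict(X_mod)     # all other features are left untouched

# The median across rows is the "all other things being equal" relationship.
pdp = np.median(ice, axis=0)
print(dict(zip(grid.tolist(), pdp.round(0).tolist())))
```

Each row of `ice` corresponds to one of the per-row lines in the plot, and `pdp` to the median line. Clustering the rows of `ice` would be one way to do the cluster analysis mentioned above.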
The tree interpreter is kind of like feature importance, but for a particular observation. Ex: hospital readmission - which features for that particular patient are going to impact the y? How can we change that? It's calculated by starting from the mean prediction, then seeing how each feature changes the prediction for that particular patient.
Answers the question - why is this patient likely to be readmitted?
Prediction of the r
Bias = the root of the tree, so the average for everyone.
Contributions: how important is each of the features.
How do we calculate?
1 obs = 1 path in the tree.
This tells us for each feature, how much the price changed.
If we do this for every row, it would be another way to do feature importance.
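A sketch of how these contributions can be computed by hand from a scikit-learn forest, by following one row down each tree and crediting every change in node value to the feature that was split on. This is an illustration on synthetic data under my own assumptions, not necessarily how any particular tree interpreter library is implemented; `tree_contributions` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_contributions(tree, x):
    """Follow one row down one tree; return (bias, per-feature contributions)."""
    t = tree.tree_
    x = np.asarray(x, dtype=np.float32)   # scikit-learn trees compare float32 inputs
    node = 0
    bias = t.value[0, 0, 0]               # value at the root: the average for everyone
    contributions = np.zeros(len(x))
    while t.children_left[node] != -1:    # -1 marks a leaf
        feature = t.feature[node]
        if x[feature] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        # The change in node value is credited to the feature used for the split.
        contributions[feature] += t.value[child, 0, 0] - t.value[node, 0, 0]
        node = child
    return bias, contributions

# Synthetic data: column 0 matters most, column 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 4.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=300)
rf = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

row = X[0]
results = [tree_contributions(tree, row) for tree in rf.estimators_]
bias = np.mean([b for b, _ in results])
contributions = np.mean([c for _, c in results], axis=0)

# prediction = bias + sum of contributions
prediction = bias + contributions.sum()
print(bias, contributions, prediction)
```

Averaging the absolute contributions over every row would give the alternative feature importance mentioned above.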
RF is not able to extrapolate to data it hasn't seen before, such as future time periods. This happens because the only thing an RF does is average stuff it has already seen.
Ways to deal with this: use a neural net. Or use time series techniques to fit some kind of trend and de-trend the data, then use the RF to predict the de-trended values. Or use a gradient boosting machine, which handles this nicely. They can't extrapolate to the future, but they can deal with time-dependent data more conveniently.
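A sketch of the de-trending option on a synthetic series. The linear trend fitted with `LinearRegression` is an assumption made for the example; any time series technique that captures the trend could take its place.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic monthly series: upward trend plus a yearly cycle.
t = np.arange(144)
y = 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)
features = np.column_stack([t, t % 12]).astype(float)   # time index, month of year
train, future = t < 120, t >= 120

# Plain RF: it can only average target values it has already seen.
rf_raw = RandomForestRegressor(n_estimators=50, random_state=0)
rf_raw.fit(features[train], y[train])
raw_pred = rf_raw.predict(features[future])

# De-trend: fit the trend separately, let the RF predict what is left over.
trend = LinearRegression().fit(t[train].reshape(-1, 1), y[train])
residual = y[train] - trend.predict(t[train].reshape(-1, 1))
rf_resid = RandomForestRegressor(n_estimators=50, random_state=0)
rf_resid.fit(features[train], residual)
detrended_pred = (trend.predict(t[future].reshape(-1, 1))
                  + rf_resid.predict(features[future]))

mae_raw = np.abs(raw_pred - y[future]).mean()
mae_detrended = np.abs(detrended_pred - y[future]).mean()
print(f"max seen in training: {y[train].max():.1f}, max raw RF prediction: {raw_pred.max():.1f}")
print(f"MAE raw RF: {mae_raw:.1f}, MAE trend + RF: {mae_detrended:.1f}")
```

The raw forest never predicts above the largest target it saw in training, while the trend model carries the extrapolation and the forest only has to handle the part that repeats.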