Confidence Intervals

We can use bootstrapping and sampling distributions to build confidence intervals for our parameters of interest.

By finding the statistic that best estimates our parameter(s) of interest (say the sample mean to estimate the population mean or the difference in sample means to estimate the difference in population means), we can easily build confidence intervals for the parameter of interest.

Exercise

Bootstrap approach to creating confidence intervals.

For 10,000 iterations, bootstrap your sample data and compute the difference between the average height of non-coffee drinkers and the average height of coffee drinkers, for individuals under 21 years old. Using the resulting sampling distribution, build a 95% confidence interval. Use your interval to start answering question 2 below.

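import numpy as np

# sample_data is assumed to be a pandas DataFrame defined earlier,
# with age, drinks_coffee, and height columns
diffs_coff_under21 = []
for _ in range(10000):
    # Resample rows with replacement to form one bootstrap sample
    bootsamp = sample_data.sample(200, replace=True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    # Difference: non-coffee drinkers minus coffee drinkers
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)

# The 2.5th and 97.5th percentiles bound the middle 95% of the bootstrap differences
np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# The interval lies entirely above 0, so for the under-21 group we have evidence
# that non-coffee drinkers are taller on average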

Interval:

(1.0593651244624267, 2.5931557940679042)

Confidence intervals are useful, but they can cause problems when we use them to make decisions.

Practical vs. Statistical Significance

Example: even if an A/B test shows that one ad converts better, that ad may be far more expensive to run while the other version performs well enough, so you might decide not to adopt the better-performing one.
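One way to make this distinction concrete is to compare a confidence interval against both zero and a minimum effect that would justify the cost of acting. The sketch below is only an illustration: the function name, the threshold, and the interval values are hypothetical and do not come from a real test.

```python
def assess(ci_lower, ci_upper, min_worthwhile_effect):
    """Classify an effect from its confidence interval.

    Statistically significant: the interval excludes 0.
    Practically significant: even the lower bound reaches the
    smallest effect that would be worth acting on (an assumption
    you have to supply from costs and business context).
    """
    statistically_significant = ci_lower > 0 or ci_upper < 0
    practically_significant = ci_lower >= min_worthwhile_effect
    return statistically_significant, practically_significant


# Hypothetical 95% interval for the lift in conversion rate of a new ad,
# and a hypothetical minimum lift needed to cover its extra cost
result = assess(ci_lower=0.002, ci_upper=0.010, min_worthwhile_effect=0.02)
print(result)  # (True, False): a real effect, but too small to pay for itself
```

Here the interval excludes zero, so the new ad does appear to convert better, yet the whole interval sits below the lift needed to justify it.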

Traditional vs. Bootstrapping for Confidence Intervals

If you are truly confident that your data is representative of the population of interest, then the bootstrapping method will give you a better representation of what the population parameters might be.

However, with large enough sample sizes, the traditional formulas seen below should provide similar results.
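To see how close the two approaches can be, the sketch below builds a 95% interval for a mean both ways on a synthetic sample of 1,000 values. The data, seed, and variable names are assumptions made for illustration, and the standard library is used so the example runs on its own.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic "heights" standing in for a real sample (assumption)
heights = [random.gauss(67.0, 3.0) for _ in range(1000)]
n = len(heights)

# Traditional interval: mean +/- z * s / sqrt(n)
mean = statistics.fmean(heights)
std_err = statistics.stdev(heights) / n ** 0.5
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
traditional_ci = (mean - z * std_err, mean + z * std_err)

# Bootstrap interval: resample with replacement, then take the
# 2.5th and 97.5th percentiles of the resampled means
boot_means = [
    statistics.fmean(random.choices(heights, k=n)) for _ in range(2000)
]
cuts = statistics.quantiles(boot_means, n=40)  # cut points every 2.5%
bootstrap_ci = (cuts[0], cuts[-1])

print("traditional:", traditional_ci)
print("bootstrap:  ", bootstrap_ci)
```

With a sample this large, the two intervals should differ only by a small amount of resampling noise.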

One educated, but potentially biased opinion on the traditional methods is that these methods are no longer necessary with what is possible with statistics in modern computing, and these methods will become even less important with the future of computing. Therefore, memorizing these formulas to throw at a particular situation will be a glazed-over component of this class. However, there are resources below should you want to dive into a few of the hundreds if not thousands of hypothesis tests that are possible with traditional techniques.

To learn more about the traditional methods, see the Stat Trek documentation on the corresponding hypothesis tests. The left margin of the Stat Trek page has a drop-down list of the available hypothesis tests.

Each of these hypothesis tests is linked to a corresponding confidence interval, but again, the bootstrapping approach can be used in place of any of them. You only need to understand what you would like to estimate, and then simulate the sampling distribution of the statistic that best estimates that value.
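Because the procedure only depends on which statistic you choose, it can be written once and reused. The following is a minimal sketch of a percentile bootstrap, not a definitive implementation: the function name bootstrap_ci, its parameters, and the synthetic data are all assumptions for illustration.

```python
import random
import statistics


def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Simulate the sampling distribution of the chosen statistic
    estimates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # Approximate positions of the lower and upper percentiles
    tail = (1 - level) / 2
    lower = estimates[round(tail * n_boot)]
    upper = estimates[round((1 - tail) * n_boot) - 1]
    return lower, upper


# Synthetic, right-skewed data (assumption, for illustration only)
random.seed(0)
values = [random.expovariate(1 / 50.0) for _ in range(300)]

# The same function handles different statistics without a new formula
median_ci = bootstrap_ci(values, statistics.median, seed=1)
mean_ci = bootstrap_ci(values, statistics.fmean, seed=1)
print("median:", median_ci)
print("mean:  ", mean_ci)
```

Swapping statistics.median for any other function of the sample is the only change needed to estimate a different parameter.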

Traditional approach

There are many hypothesis tests, each with a linked confidence interval.

The bootstrapping approach can be used in place of any of these. The traditional tests are built into Python libraries, and they give nearly identical confidence intervals.

Some confidence intervals in the traditional way:

  1. Mean, proportion, difference in means (mean 1 minus mean 2), and difference in proportions (proportion 1 minus proportion 2).
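As a rough sketch of two of the intervals listed above (a single proportion and a difference in proportions), the code below uses the standard normal-approximation (Wald) formulas. The function names and the example counts are made up for illustration. The intervals for means follow the same pattern, with s / sqrt(n) as the standard error.

```python
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    margin = z * (p * (1 - p) / n) ** 0.5
    return p - margin, p + margin


def diff_proportions_ci(s1, n1, s2, n2, level=0.95):
    """Normal-approximation interval for proportion 1 minus proportion 2."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = s1 / n1, s2 / n2
    std_err = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z * std_err, diff + z * std_err


# Hypothetical counts: 120 of 200 in group 1, 90 of 200 in group 2
single = proportion_ci(120, 200)               # approximately (0.532, 0.668)
diff = diff_proportions_ci(120, 200, 90, 200)  # approximately (0.053, 0.247)
print(single, diff)
```

These approximations work best with reasonably large samples and proportions that are not too close to 0 or 1.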

Other confidence interval terms

Margin of error

The confidence interval is the sample statistic plus and minus the margin of error, so the margin of error is half the width of the interval.
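As a quick check of this relationship, the sketch below recovers the margin of error from the interval reported in the exercise above. The helper name margin_of_error is made up for illustration. Note that a percentile bootstrap interval is not guaranteed to be exactly centered on the sample statistic, so the midpoint here is only an approximation of it.

```python
def margin_of_error(lower, upper):
    """Half the width of a confidence interval."""
    return (upper - lower) / 2


# Interval reported in the bootstrap exercise above
lower, upper = 1.0593651244624267, 2.5931557940679042

moe = margin_of_error(lower, upper)  # roughly 0.77
center = (lower + upper) / 2         # roughly 1.83

# Center minus and plus the margin of error reproduces the interval
print(center - moe, center + moe)
```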

Confidence Intervals (& Hypothesis Testing) vs. Machine Learning

Confidence intervals take an aggregate approach towards the conclusions made based on data, as these tests are aimed at understanding population parameters (which are aggregate population values).

Alternatively, machine learning techniques take an individual approach towards making conclusions, as they attempt to predict an outcome for each specific data point.
