Hypothesis Testing
Hypothesis testing and confidence intervals allow us to use only sample data to draw conclusions about an entire population.
H0: the condition we believe to be true before we collect any data. Mathematically, the null hypothesis is a statement that two groups are equal or that an effect is equal to zero; it always holds an =, ≤, or ≥ sign.
The null and the alternative hypotheses should be competing and non-overlapping.
The alternative hypothesis is usually associated with what you want to prove to be true. Mathematically, the alternative holds a ≠, >, or < sign.
Ex: A new version of a page: H0: the new page drives no more traffic on average than the existing page; H1: the new page drives more traffic on average than the existing page.
There are two types of errors we can make: Type I and Type II.
Type I errors have the following features:
You should set up your null and alternative hypotheses so that the worst of your errors is the type I error.
They are denoted by the symbol α.
The definition of a type I error is: deciding the alternative (H1) is true, when actually (H0) is true.
Type I errors are often called false positives.
Type II errors have the following features:
They are denoted by the symbol β.
The definition of a type II error is: deciding the null (H0) is true, when actually (H1) is true.
Type II errors are often called false negatives.
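To see the type I error rate concretely, here is a minimal simulation sketch (numpy and scipy are assumptions here, not something the notes prescribe): when H0 is actually true and we reject whenever p < α, we should commit a type I error at a rate of roughly α.

```python
import numpy as np
from scipy import stats

np.random.seed(42)
alpha = 0.05
n_tests = 10_000

# Under H0 the samples really do come from a population with mean 0,
# so every rejection below is a type I error (a false positive).
false_positives = 0
for _ in range(n_tests):
    sample = np.random.normal(loc=0, scale=1, size=30)
    _, p_value = stats.ttest_1samp(sample, popmean=0)  # tests H0: mu = 0
    if p_value < alpha:
        false_positives += 1

print(f"Observed type I error rate: {false_positives / n_tests:.3f}")  # roughly 0.05
```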
Example: hypotheses about two means can be rewritten in terms of their difference:
H0: μ1 ≤ μ2 is the same as H0: μ1 − μ2 ≤ 0.
Keep in mind: hypothesis tests and confidence intervals tell us about parameters, not statistics.
The definition of a p-value is the conditional probability of observing your statistic (or one more extreme in favor of the alternative) if the null hypothesis is true.
The "more extreme in favor of the alternative" portion of this definition determines the shading associated with your p-value. The p-value therefore depends on your alternative hypothesis, which determines what is considered more extreme.
Therefore, you have the following cases:
If your alternative hypothesis says the parameter is greater than some value, you shade the area greater than your observed statistic to obtain your p-value. Here we found that our sample mean was 5, so we shade greater than 5.
If instead we reverse the direction of our hypothesis, the p-value would be much higher if, again, the sample mean we found was 5. That is because we would now shade in the opposite direction, and most of the sampling distribution would fall in the shaded region.
If your alternative hypothesis says the parameter is less than some value, you shade the area less than your observed statistic to obtain your p-value.
If your alternative hypothesis says the parameter is not equal to some value (a ≠ sign in the alternative), you shade both tails.
In these cases we care about statistics that are far from the null in either direction.
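As a rough sketch of how each alternative shades the distribution, assuming a standard-normal sampling distribution and a hypothetical observed z-statistic (scipy is an assumption here):

```python
from scipy import stats

z = 1.8  # hypothetical observed test statistic

# H1: parameter > value  -> shade the right tail beyond the statistic
p_greater = stats.norm.sf(z)
# H1: parameter < value  -> shade the left tail below the statistic
p_less = stats.norm.cdf(z)
# H1: parameter != value -> shade both tails, far from the null in either direction
p_two_sided = 2 * stats.norm.sf(abs(z))

print(p_greater, p_less, p_two_sided)
```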
How to choose H0 or H1 based on the p-value:
If you're willing to accept a 5% chance of choosing a hypothesis incorrectly (α = 0.05): when the p-value is less than α, reject H0 in favor of H1; when the p-value is greater than or equal to α, fail to reject H0. A sketch of this rule follows below.
This is the accepted way to draw conclusions from a hypothesis test.
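A minimal sketch of that decision rule, with a made-up p-value standing in for whatever your test produced:

```python
alpha = 0.05     # the type I error rate you are willing to accept
p_value = 0.03   # hypothetical p-value from your test

if p_value < alpha:
    print("Reject H0 in favor of H1 (statistically significant).")
else:
    print("Fail to reject H0 (insufficient evidence for H1).")
```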
When conducting a hypothesis test, we should ask ourselves: is my sample representative of my population of interest? Are there ways to ensure that everyone in my population is accurately represented in my sample? If our sample isn't representative, our conclusions won't be good.
We should know the impact of sample size and the role it plays in our results. As sample size increases, everything (even the smallest differences) becomes statistically significant, so we end up always choosing the alternative hypothesis.
As sample sizes grow in today's data world, we are moving away from hypothesis tests toward other techniques for exactly this reason.
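A quick simulation can illustrate this point; the tiny effect size (0.02) and the sample sizes below are made up for illustration:

```python
import numpy as np
from scipy import stats

np.random.seed(0)
true_mean = 0.02  # a tiny, practically meaningless difference from H0: mu = 0

for n in [100, 10_000, 1_000_000]:
    sample = np.random.normal(loc=true_mean, scale=1, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0)
    print(f"n = {n:>9,}  p-value = {p:.4f}")
# The p-value shrinks as n grows, even though the effect itself never changed.
```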
Ex: deciding which of 2 coffee types will sell more on average. Say type 1 sells better on average than type 2. However, there are still millions of individuals who prefer type 2, and discussing only averages leaves that entire part of the population out. With large sample sizes, we should be able to do better than this.
With machine learning we can individualize the approach: maybe we sell 20 types of coffee, and we learn which type every member of the population wants. So large sample sizes can be detrimental to hypothesis testing, but they are great for machine learning.
One of the most important aspects of interpreting any statistical results (and one that is frequently overlooked) is ensuring that your sample is truly representative of your population of interest.
Particularly given the way data is collected today in the age of computers, response bias is important to keep in mind. In the 2016 U.S. election, polls conducted by many news outlets suggested a staggering difference from the actual results. You can read about how response bias played a role here.
Hypothesis testing takes an aggregate approach towards the conclusions made based on data, as these tests are aimed at understanding population parameters (which are aggregate population values).
When performing more than one hypothesis test, your type I error compounds. In order to correct for this, a common technique is called the Bonferroni correction. This correction is very conservative, but says that your new type I error rate should be the error rate you actually want divided by the number of tests you are performing.
Therefore, if you would like to hold a type I error rate of 1% for each of 20 hypothesis tests, the Bonferroni corrected rate would be 0.01/20 = 0.0005. This would be the new rate you should use as your comparison to the p-value for each of the 20 tests to make your decision.
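A minimal sketch of this correction (the p-values below are made-up illustrations):

```python
desired_alpha = 0.01
n_tests = 20
bonferroni_alpha = desired_alpha / n_tests  # 0.01 / 20 = 0.0005

# Compare each test's p-value against the corrected rate.
p_values = [0.0001, 0.0004, 0.03]  # hypothetical p-values for illustration
for p in p_values:
    decision = "reject H0" if p < bonferroni_alpha else "fail to reject H0"
    print(f"p = {p}: {decision}")
```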
Additional techniques to protect against compounding type I errors include:
Q-values, which are commonly used in the medical field (they control the false discovery rate).
A two-sided hypothesis test (that is, a test involving a ≠ in the alternative) is the same in terms of the conclusions made as a confidence interval as long as:
1 − CI = α
For example, a 95% confidence interval will draw the same conclusions as a hypothesis test with a type I error rate of 0.05 in terms of which hypothesis to choose, because:
1 − 0.95 = 0.05
assuming that the alternative hypothesis is a two-sided test.
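A sketch of this equivalence, assuming a one-sample t-test of H0: μ = 0 at α = 0.05 with a matching 95% t-interval (the data is simulated for illustration; scipy is an assumption here):

```python
import numpy as np
from scipy import stats

np.random.seed(1)
sample = np.random.normal(loc=0.5, scale=1, size=50)  # simulated data

# Two-sided one-sample t-test of H0: mu = 0.
t_stat, p_two_sided = stats.ttest_1samp(sample, popmean=0)

# Matching 95% t-interval for the mean.
ci_low, ci_high = stats.t.interval(
    0.95, df=len(sample) - 1, loc=sample.mean(), scale=stats.sem(sample)
)

# The test rejects H0 at alpha = 0.05 exactly when 0 falls outside the interval.
print(f"p-value = {p_two_sided:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("Reject H0:", p_two_sided < 0.05, "| 0 outside CI:", not (ci_low <= 0 <= ci_high))
```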