Source: Graphs created by Jonathan Osters
In this tutorial, you're going to learn about the power of a hypothesis test. Now, you might wonder: what is power? Well, power is the ability of a hypothesis test to detect a difference that is actually present.
So this is the standard null hypothesis curve. The mean from the null hypothesis is here in the middle, and there are two tails, two rejection regions. If our statistic falls in one of those regions, we reject the null hypothesis; otherwise, we fail to reject it. So these are our lines in the sand for a two-sided test.
So suppose that the mean is actually all the way out here. Because the true mean is different from the one in the null hypothesis, we should reject the null hypothesis. What we end up with is a second curve, identical in shape to the original normal curve. The first curve is the way we thought the data should behave based on the null hypothesis, but this second curve is the way the data is actually going to behave.
This line in the sand still exists. Because we should reject the null hypothesis, this area over here is a mistake: failing to reject the null hypothesis, being to the left of the line in the sand, is wrong if this is actually the mean, which is different from the null hypothesis' mean. So this is a Type II error.
This area on the other side, where we are correctly rejecting the null hypothesis when a difference is present, is called power. So power is the probability of rejecting the null hypothesis correctly, so rejecting when, in fact, the null hypothesis is false. It's a correct decision.
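That definition of power can be sketched directly. The function below is a minimal illustration for a two-sided one-sample z-test: it finds the "lines in the sand" under the null curve, then measures how much of the *actual* curve falls beyond them. The specific parameter names (`mu0`, `mu_a`, and so on) are just illustrative choices, not anything fixed by the lesson.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_a, sigma, n, alpha=0.05):
    """Probability of correctly rejecting H0: mu = mu0 in a two-sided
    z-test, when the true mean is actually mu_a."""
    se = sigma / n ** 0.5                         # standard error of x-bar
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two tails
    lower = mu0 - z_crit * se                     # left "line in the sand"
    upper = mu0 + z_crit * se                     # right "line in the sand"
    actual = NormalDist(mu_a, se)                 # how the data actually behaves
    # Power = area of the actual curve beyond either critical value
    return actual.cdf(lower) + (1 - actual.cdf(upper))
```

Notice that if the true mean equals the null mean (`mu_a == mu0`), this "power" collapses to alpha itself, which is just the probability of a Type I error.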
As for calculating power, we don't need to worry about doing it by hand, although we could. It's almost always done using technology, so we're not going to be responsible for calculating power ourselves. We can have technology do it for us.
What we should understand, though, is that there are two different ways to increase the power of a hypothesis test. One way: imagine if these normal curves were skinnier. That would move the line in the sand, the critical value, over to the left. If both of these normal curves were skinnier, they would look like this.
So they are now identical, but they're skinnier than they were before. And notice, there's a lot less orange space, and a lot more yellow space. How do we do that? How do we make these curves skinnier?
Well, we decrease their standard deviation. The standard error is the standard deviation of, in this case, x bar, and it equals sigma over the square root of n. Because n is in the denominator, making n bigger makes the standard error go down, which means these curves will have less spread, and there will be less overlap of this curve with that curve.
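You can see that denominator effect with a quick sketch. Using a hypothetical population standard deviation of 15, each quadrupling of the sample size cuts the standard error in half:

```python
# Standard error shrinks as sample size grows: se = sigma / sqrt(n)
sigma = 15                       # hypothetical population standard deviation
for n in (25, 100, 400):
    se = sigma / n ** 0.5
    print(f"n = {n:>3}: standard error = {se}")
# n =  25: standard error = 3.0
# n = 100: standard error = 1.5
# n = 400: standard error = 0.75
```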
Now, the problem with increasing sample size is that maybe you have logistical constraints, like time or money. So you have to make those decisions, if you are the person actually doing the sampling. But if you increase the sample size, and it's worth it to you, then go ahead and do it, because that will increase the power of the test.
Now, suppose we didn't change it, and kept the sample size the same. What else is there to do to increase the amount of yellow space and decrease the amount of orange space? Well, we could actually just literally move the critical value in closer to mu from the null hypothesis. We could move these critical values, these gates, in. So take a look, that's what we're going to do right here.
If you notice, the amount of blue space increased, and the amount of yellow space is also bigger than it was before. I'm going to transition back and forth. That's what it was before, that's what it is now. More yellow space, less orange space.
Now, what did we do by moving these in? We actually increased the amount of blue area, which means that we increased the significance level. We learned in a different tutorial that the significance level is the probability of a type I error. So, in essence, we're actually just trading out one error for the other. By decreasing the amount of orange space, we're decreasing the probability of a type II error, but we're increasing the probability of a type I error. And in certain situations, we have to make that decision, as to whether or not that is actually worth it to us.
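This trade-off shows up clearly if we compute power at a few different significance levels. This is a minimal sketch with hypothetical numbers (null mean 100, true mean 105, sigma 15, n of 30): as alpha goes up, the critical values move in toward the null mean, and power goes up with it.

```python
from statistics import NormalDist

# Hypothetical setup: null mean 100, true mean 105, sigma 15, n = 30
mu0, mu_a, sigma, n = 100, 105, 15, 30
se = sigma / n ** 0.5
for alpha in (0.01, 0.05, 0.10):
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value moves in as alpha grows
    actual = NormalDist(mu_a, se)             # the curve the data actually follows
    # Power = area of the actual curve beyond either critical value
    power = actual.cdf(mu0 - z * se) + (1 - actual.cdf(mu0 + z * se))
    print(f"alpha = {alpha:.2f} -> power = {power:.3f}")
```

The bigger alpha is, the bigger the power, but remember that alpha is itself the probability of a Type I error, so you're buying one kind of protection by giving up another.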
So to recap, the power of a significance test is the probability that the null hypothesis is rejected, given that it is really false. So we're using the alternative normal curve, as opposed to the one from the null hypothesis. There are two main ways to increase the power of a significance test. One is by increasing the sample size, which decreases the standard error of the distribution; the other is increasing the significance level, alpha. Both of these increase the power, but both have negative trade-offs.
So we talked about power, and all the different things that we can do to increase it. Good luck, and we'll see you next time.
Power: The probability that we reject the null hypothesis (correctly) when a difference truly does exist.