Source: Graphs and tables created by Jonathan Osters
In this tutorial, you're going to learn about test statistics. A test statistic is the value that we calculate, using the statistics that we already have, when we're running a hypothesis test. So let's take a look. A test statistic measures the relative distance between the statistic that we got and the hypothesized value of the parameter.
So when we have a hypothesized value for the parameter from the null hypothesis, we might get a statistic that's different from that number. The test statistic tells us how far the statistic is from that parameter. It's measured in terms of how many standard deviations from the mean of the sampling distribution that statistic happens to be. So all that's saying is it's a z-score.
So those words are the same as this formula. Start with the statistic minus the parameter. That's how far away it is in absolute distance. And then how many standard deviations is it? The test statistic is that difference divided by the standard deviation of the statistic.
So for instance, if we're dealing with means, the statistic that we obtain is a sample mean, x bar. The parameter from the null hypothesis is the hypothesized population mean, mu. And the standard deviation of the statistic that we have is sigma over the square root of n. Therefore, the z-statistic that we can calculate is our test statistic. And it's x bar minus mu, divided by sigma over the square root of n.
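As a rough sketch of the formula for means, here is how the z-statistic could be computed in Python. The function name `z_statistic_mean` and the sample numbers (a sample mean of 52, a hypothesized mean of 50, sigma of 8, and n of 64) are made up for illustration and do not come from the tutorial.

```python
import math

def z_statistic_mean(x_bar, mu_0, sigma, n):
    """Return z = (x_bar - mu_0) / (sigma / sqrt(n)).

    x_bar: sample mean, mu_0: hypothesized population mean,
    sigma: population standard deviation, n: sample size.
    """
    # Standard deviation of the sampling distribution of x bar
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu_0) / standard_error

# Hypothetical numbers, chosen only so the arithmetic is easy to follow:
# the standard error is 8 / sqrt(64) = 1, so z = (52 - 50) / 1
z_mean = z_statistic_mean(x_bar=52.0, mu_0=50.0, sigma=8.0, n=64)
print(z_mean)
```

With these made-up values the sample mean sits two standard deviations above the hypothesized mean.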
Meanwhile, for proportions, the statistic that we obtain from a sample is a sample proportion, p hat. The parameter from our null hypothesis is a value, p. And the standard deviation of the p hat statistic is going to be the square root of p times q, where q is 1 minus p, over n, with all of that inside the square root. And so what we obtain is another z-statistic. The z-statistic is p hat minus p from the null hypothesis, divided by that square root.
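The same idea for proportions can be sketched like this. Again, the function name `z_statistic_proportion` and the numbers (a sample proportion of 0.6, a hypothesized proportion of 0.5, and n of 100) are illustrative assumptions, not values from the tutorial.

```python
import math

def z_statistic_proportion(p_hat, p_0, n):
    """Return z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n).

    p_hat: sample proportion, p_0: proportion from the null hypothesis,
    n: sample size.
    """
    q_0 = 1.0 - p_0
    # Standard deviation of p hat, computed under the null hypothesis
    standard_error = math.sqrt(p_0 * q_0 / n)
    return (p_hat - p_0) / standard_error

# Hypothetical numbers: the standard error is sqrt(0.25 / 100) = 0.05
z_prop = z_statistic_proportion(p_hat=0.6, p_0=0.5, n=100)
print(round(z_prop, 3))
```

Note that the standard deviation uses p from the null hypothesis, not p hat, because the whole calculation assumes the null hypothesis is true.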
So what does this look like? Both of these situations have conditions under which the sampling distribution is approximately normal. So we can use the normal distribution to analyze and make a decision about the null hypothesis. So this normal curve operates under the assumption that the null hypothesis is in fact true.
And so what do we see? If we're dealing with means, the mean is here. And the standard deviation of the sampling distribution is this, sigma over square root of n. And if our x bar maybe is over here, the test statistic will become a z-score. And what we are going to find is what's called a p-value, the probability that we would get an x bar at least as high-- in this particular case, it's one sided-- as we got if the mean really is over here, mu.
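To make the p-value idea concrete, here is a minimal sketch that turns a z-statistic into a p-value using the standard normal curve from Python's standard library. The function name `p_value`, its `tail` argument, and the example z of 2.0 are assumptions made for illustration.

```python
from statistics import NormalDist

def p_value(z, tail="right"):
    """Probability of a z at least as extreme as the one observed,
    assuming the null hypothesis is true."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1.0 - std_normal.cdf(z)   # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)         # area to the left of z
    if tail == "two":
        # both tails: twice the area beyond |z|
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Hypothetical z-statistic of 2.0
right_p = p_value(2.0, tail="right")
two_p = p_value(2.0, tail="two")
print(round(right_p, 4), round(two_p, 4))
```

For the same z, the two-sided p-value is double the one-sided value, because it counts the area in both tails.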
We could do that, or if it was a two-sided test, it would look like this.
Another way to determine statistical significance, without using a p-value, is with what's called a critical value. This corresponds to the number of standard deviations away from the mean that we're willing to attribute to chance. So we might say that anything within this green area here is a typical value for x bar. And we're willing to attribute any deviations from mu to chance if it's in this region. This is the most typical 95% of values. If it's outside that region, it would be within the most unusual 5%.
And so we would be more willing to reject the null hypothesis in that case. A test statistic far from 0, that is, a z-statistic that's far from 0, provides evidence against the null hypothesis. So one way would be to say all right, well, if it's farther than two standard deviations, and it's in the outermost 5%, we're going to reject the null hypothesis. And if it's in the innermost 95%, we will fail to reject the null hypothesis. And with two-tailed tests like we have here, the critical values are actually symmetric around the mean. That means that if we use positive 2 here, we would be using negative 2 here.
There are some very common critical values that we use. The most common cutoff points are at 5%, 1%, and 10%. So if it's two-tailed at 5%, the critical value is 1.96. I know we were saying two standard deviations on either side, and that is close to the actual value, which is 1.96 standard deviations away.
Or if you were doing a one-tailed test with 0.05 as your significance level, or a two-tailed test where you reject the null hypothesis if the statistic is among the most extreme 10% of values, you'd use a z-statistic critical value of 1.645. If you were doing a one-tailed test and you wanted to reject the most extreme 10% of values on one side, you'd use 1.282 for your critical value. And if you wanted to use 1%, two-tailed, it would be 2.576, or one-tailed, the 1% value would be 2.326.
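These common critical values can be recovered from the standard normal curve rather than memorized. The sketch below uses `NormalDist().inv_cdf` from Python's standard library. The helper name `critical_value` and its arguments are assumptions made for illustration.

```python
from statistics import NormalDist

def critical_value(alpha, two_tailed):
    """Positive z cutoff for significance level alpha.

    For a two-tailed test, alpha is split evenly between both tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    # z with `tail_area` of the standard normal curve to its right
    return NormalDist().inv_cdf(1.0 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    one = critical_value(alpha, two_tailed=False)
    two = critical_value(alpha, two_tailed=True)
    print(f"alpha={alpha:.2f}  one-tailed={one:.3f}  two-tailed={two:.3f}")
```

Rounded to three decimals, this should reproduce 1.282, 1.645, and 2.326 for the one-tailed cutoffs and 1.645, 1.960, and 2.576 for the two-tailed cutoffs. A left-tailed test would use the negative of the one-tailed value.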
So when we actually run a hypothesis test with the critical value, we should state it as a decision rule. So for instance, we'll say something like, we will reject the null hypothesis if the test statistic z is greater than 2.33. First of all, this is implying that the test is one-tailed, because we're saying that the rejection region is on the high end of the normal curve.
And so this is a right-tailed test saying reject the null hypothesis if the sample mean is among the highest 1% of all sample means that would occur by chance. So this is what we're not willing to attribute to chance. This is what we are willing to attribute to chance. So the decision rule is our line in the sand. For anything less than that, we'll fail to reject the null hypothesis and attribute whatever differences exist from mu to chance. For anything higher than 2.33 for a test statistic, we will reject the null hypothesis and not attribute the difference from mu to chance. And so that's a decision rule.
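A decision rule like this one is easy to write down as code. The sketch below is one possible way to express it. The function name `decide`, its arguments, and the example z-statistics are hypothetical, and the cutoff is computed from the significance level rather than typed in as 2.33.

```python
from statistics import NormalDist

def decide(z, alpha=0.01, tail="right"):
    """Apply a critical-value decision rule to a z test statistic."""
    std_normal = NormalDist()
    if tail == "right":
        # reject only for unusually high z
        reject = z > std_normal.inv_cdf(1.0 - alpha)
    elif tail == "left":
        # reject only for unusually low z
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "two":
        # symmetric cutoffs: reject for z far from 0 in either direction
        reject = abs(z) > std_normal.inv_cdf(1.0 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return "reject H0" if reject else "fail to reject H0"

# Hypothetical test statistics against the right-tailed 1% rule (cutoff near 2.33)
print(decide(2.5))   # beyond the cutoff
print(decide(2.0))   # inside the cutoff
```

Here a z of 2.5 falls past the line in the sand and a z of 2.0 does not, so only the first leads to rejecting the null hypothesis.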
So to recap, when we actually go through a hypothesis test, we convert our sample statistic obtained, which is either x bar or p hat, into a test statistic, both of which are z's. We can then use the sampling distribution, which is approximately normal, to determine if our sample statistic is unusual or not, unusually high or unusually low or just unusually different, given that the null hypothesis is in fact true.
We can decide on different critical values based on how sure we want to be that the difference actually exists-- so different levels of what we would consider unusual. Do we need it to be among the highest 1% or the highest 5%? We can make that decision. And if our test statistic exceeds the critical value, we'll reject the null hypothesis. And that's our decision rule.
So we talked about test statistics, both of which were z's; p-values, which are the probabilities that you would get a statistic as extreme as what you got by chance; and critical values, which are our lines in the sand whereby if we exceed that number with our test statistic, we'll reject the null hypothesis. Good luck and we'll see you next time.