This tutorial will teach you about statistical significance, which is a key term in hypothesis testing. Specifically, you will focus on what statistical significance means and how it differs from practical significance.
When you run a significance test, you need to decide how large a departure from what you expected counts as a significant departure.
IN CONTEXT
You work in research at Liter O'Cola company. They've developed a new diet cola that they believe is indistinguishable from the Classic. So you recruit 120 individuals to do a taste test. If the claim is true, what percent of people should select the correct cola just by freak chance, just by guessing?
Well, if Liter O'Cola's claim is correct, about 50% of people would just guess correctly. And 50% of people would guess incorrectly if presented with the two options. And so now the question is, at what point are we going to stop believing Liter O'Cola's claim?
Suppose 61 people were able to pick the diet cola. Is this evidence against the claim? Now, 61's not that different from 60, so you're going to say no. That's not that much different. This is not significantly different from what you would expect.
Conversely, suppose 102 people were able to pick the diet cola correctly. Would that be evidence against the company's claim?
In this case, you would probably say so. 102 is way over 60, and 60 is what you would expect had they been randomly guessing. It would be pretty unusual to see 102 people out of 120 get it right by randomly guessing. So this is evidence that some people can taste the difference.
This is the whole idea of statistical significance. 61 out of 120 is not a significant result, meaning that's not evidence against the claim. It's not evidence against the null hypothesis. Conversely, take a look at the 102. That would be evidence against the null hypothesis, because it's so much higher than what we would have expected.
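To put rough numbers on the cola example, here is a minimal sketch that computes how likely it is to see at least 61, or at least 102, correct picks out of 120 if everyone is purely guessing. The tutorial does not name a specific test, so the exact one-sided binomial tail used here, and the helper name `upper_tail_probability`, are assumptions made for illustration.

```python
from math import comb


def upper_tail_probability(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 120  # taste testers in the example

# Chance of at least 61 correct picks if everyone guesses (p = 0.5)
p_61 = upper_tail_probability(n, 61)

# Chance of at least 102 correct picks if everyone guesses
p_102 = upper_tail_probability(n, 102)

print(f"P(at least 61 of {n} correct by guessing)  = {p_61:.4f}")
print(f"P(at least 102 of {n} correct by guessing) = {p_102:.3e}")
```

A result of 61 or more happens almost half the time under pure guessing, so it is no surprise at all. A result of 102 or more is so improbable under guessing that chance stops being a believable explanation.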
Statistical significance means that you doubt that the results you obtained are due to chance.
Instead, you believe that it's part of some larger trend. Like in the cola example, you don't believe the null hypothesis that people can't distinguish. You believe that the trend is that people in fact can distinguish.
So if 61 people correctly identify it, you're not convinced that over half can identify the diet. The difference might be only due to chance. In fact, it probably is. On the other hand, the difference of 42 from what you expect is probably not due to chance. That would be called statistically significant.
It's important to make the distinction between practical significance and statistical significance.
They're not necessarily the same thing. Suppose you had a very large sample. With a large enough sample size, even a result as close to 50% as 50.1% correct guessing could be considered statistically significant, even though 50.1% is not that different from 50%.
So the statistical significance argument is based largely on sample size and how far off from this 50% claim you are. If the sample size is big, you don't need to be very far off. If the sample size is small, you need to be further off in order to claim significance. But if the sample size is big, you might get a statistically significant result that isn't practically significant. You wouldn't shout this 50.1% mark from the rooftops.
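The effect of sample size can be sketched with the same 50.1% figure. The snippet below uses a one-sided normal approximation for a sample proportion, which is an assumption on our part since the tutorial does not specify a method. The three sample sizes and the helper name `one_sided_p_value` are hypothetical, chosen only to show the pattern.

```python
from math import erfc, sqrt


def one_sided_p_value(p_hat, n, p0=0.5):
    """Approximate P(sample proportion >= p_hat) when the true proportion is p0."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Upper-tail area of the standard normal distribution beyond z
    return 0.5 * erfc(z / sqrt(2))


# Same observed proportion, very different (hypothetical) sample sizes
p_small = one_sided_p_value(0.501, 120)
p_medium = one_sided_p_value(0.501, 10_000)
p_large = one_sided_p_value(0.501, 10_000_000)

for n, p in ((120, p_small), (10_000, p_medium), (10_000_000, p_large)):
    print(f"n = {n:>10,}: chance of 50.1% or more by guessing = {p:.3g}")
```

With 120 or even 10,000 people, 50.1% is easily explained by chance. With ten million people, the same 50.1% becomes very hard to attribute to chance, even though the practical difference from 50% has not changed.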
Statistical significance is the extent to which a sample measurement is evidence of a real trend, like being able to taste the difference between regular cola and diet cola, rather than a difference you can write off or attribute to chance. It's not the same as practical significance, although sometimes the two coincide. And sometimes very small differences can be statistically significant, though not have a whole lot of real-life meaning.
You learned about statistical significance and how you're going to measure it versus practical significance and how those two are not necessarily the same.
Good luck!
Source: This work adapted from Sophia Author Jonathan Osters.
Practical Significance: A large difference from what we would expect. For large samples, small differences will not have practical significance, though they may be statistically significant.
Statistical Significance: The statistic obtained is so different from the hypothesized value that we are unable to attribute the difference to chance variation.