This tutorial covers the chi-square statistic and how it is calculated.
You will not run any significance tests in this tutorial: chi-square tests come in several versions, and each has its own tutorial. The focus here is on how the statistic is calculated, which is the same regardless of the test you are running.
What is the Chi-Square Statistic?
Suppose you have a tin of colored beads, and you claim that the tin contains the beads in these proportions: 35% blue, 35% green, 15% yellow, and 15% red.
You draw 10 beads from the tin: 4 red, 3 blue, 1 green, and 2 yellow. These are called the observed counts.
The two yellow beads seem fairly consistent with the 15% claim, but the four red beads do not seem consistent with the 15% claim for red.
If the claim were true, you would expect that out of 10 beads, 3 1/2 would be blue, 3 1/2 green, 1 1/2 yellow, and 1 1/2 red. These are called the expected counts. Each one is the sample size multiplied by the claimed proportion; for blue, 10 × 0.35 = 3.5.
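The expected counts can be reproduced in a few lines of Python. This is only a minimal sketch: the names `claimed`, `sample_size`, and `expected_counts` are made up for illustration and are not part of the original tutorial.

```python
# Claimed proportions from the bead example.
claimed = {"blue": 0.35, "green": 0.35, "yellow": 0.15, "red": 0.15}
sample_size = 10


def expected_counts(proportions, n):
    """Return the expected count per category: sample size times claimed proportion."""
    return {category: n * p for category, p in proportions.items()}


expected = expected_counts(claimed, sample_size)

# Print each category with its expected count, to one decimal place.
for category, count in expected.items():
    print(f"{category}: {count:.1f}")
```

The expected counts always add up to the sample size, which is a quick way to check them.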
You can't actually pull 3 1/2 blue beads, because you can't have half of a bead. The expected counts describe an idealized scenario: what you would see on average, in the long term, across many samples of 10.
In your one sample of 10 beads, what you actually got was: 3 blue, 1 green, 2 yellow, and 4 red.
How can you measure the discrepancy between what you observed and what you expected?
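Blue and yellow were pretty close to what you expected, whereas green and red were pretty far off.
The statistic used to measure discrepancy from what you expect is called chi-square. For each category, square the difference between the observed count O and the expected count E, divide by the expected count, and then add up the results across all categories:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$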
The expected counts were 3 1/2, 3 1/2, 1 1/2, and 1 1/2. The observed counts were 3, 1, 2, and 4. So the chi-square statistic value is approximately 6.1905.
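The arithmetic behind that value can be written out term by term, in the order blue, green, yellow, red, using the counts from the example (the decimals are rounded to four places):

```latex
\chi^2 = \frac{(3 - 3.5)^2}{3.5} + \frac{(1 - 3.5)^2}{3.5}
       + \frac{(2 - 1.5)^2}{1.5} + \frac{(4 - 1.5)^2}{1.5}
\approx 0.0714 + 1.7857 + 0.1667 + 4.1667
\approx 6.1905
```

Green and red contribute almost all of the total, which matches the earlier observation that those two colors were far from their expected counts.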
You can use a table to calculate the chi-square statistic or you can use technology.
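As one way of using technology, the calculation could be sketched in plain Python as follows. The function name `chi_square_statistic` and the variable names are assumptions chosen for this illustration. The per-category contributions are printed first, mirroring the table approach.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


categories = ["blue", "green", "yellow", "red"]
observed = [3, 1, 2, 4]          # counts from the sample of 10 beads
expected = [3.5, 3.5, 1.5, 1.5]  # counts implied by the claimed proportions

# Table-style view: one contribution per category.
for name, o, e in zip(categories, observed, expected):
    print(f"{name}: {(o - e) ** 2 / e:.4f}")

statistic = chi_square_statistic(observed, expected)
print(f"chi-square: {statistic:.4f}")
```

Statistical libraries offer the same calculation; for example, SciPy's `scipy.stats.chisquare` returns this statistic together with a p-value, which this tutorial does not use.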
Note that in this case, the conditions for inference with a chi-square test are not met, because the sample size isn't large enough. The example is only meant to illustrate how a chi-square statistic is calculated; you can't do any real chi-square inference with it.
The chi-square statistic measures, across categories, how far the observed counts are from what you would have expected. You can use it only for categorical (qualitative) data. The expected values do not have to be whole numbers, since they are long-term average values.
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS
Expected counts: The frequencies we would have expected within each of the categories in a qualitative distribution if the null hypothesis were true.
Observed counts: The frequencies within each of the categories in a qualitative distribution.
Chi-square statistic: The sum of the ratios of the squared differences between the expected and observed counts to the expected counts.