Table of Contents |
A chi-square statistic is a particular test used for categorical data. It measures how expected frequency differs from observed frequency.
The observed frequency is the number of observations we actually see for a value, or what actually happened. The expected frequency is what we would expect to happen. It is the number of observations we would see for a value if the null hypothesis was true.
To measure the discrepancy between what you observed and what you expected, we need to calculate the chi-square statistic, which is calculated this way:
EXAMPLE
Suppose you have a tin of colored beads, and you claim that the tin contains the colored beads in these proportions: 35% blue, 35% green, 15% yellow, and 15% red. These will be used to find the expected frequencies.Color | Percentage |
Expected out of 10 |
---|---|---|
Blue | 35% | 3.5 |
Green | 35% | 3.5 |
Yellow | 15% | 1.5 |
Red | 15% | 1.5 |
Color | Expected | Observed | |
---|---|---|---|
Blue | 3.5 | 3 | 0.0714 |
Green | 3.5 | 1 | 1.7857 |
Yellow | 1.5 | 2 | 0.1667 |
Red | 1.5 | 4 | 4.1667 |
Sum | 6.1905 |
IN CONTEXT
Suppose there are four flavors of candy in a bag: cherry, lemon, orange, and strawberry. The company claims the flavors are equally distributed in each bag.
After opening a bag of candy and sorting the flavors, the following counts were produced:
Flavor Observed Cherry 11 Lemon 15 Orange 12 Strawberry 12 Total 50
In equal distribution, it is helpful to think of the proportions of each flavor and then make a hypothesis based on those proportions. For the null hypothesis, we can assume that the proportions for the four flavors are the same. The alternate hypothesis would state that is is not true; that the proportions are not the same.
- H0: pC = pL = pO = pS
- Ha: The proportions of the flavors are not the same.
Next, we need to compare the observed frequency with the expected frequency. The observed frequencies are the same as the above counts.
To find the expected frequency, we need to find the number of occurrences if the null hypothesis is true, which in this case, was that the flavor proportions are equal, or if the four flavor categories were all evenly distributed. Counting up all the flavors in that bag of candy gives us a total of 50 candies. If the flavor categories were evenly distributed among the 50 candies, we would need to divide the total candies evenly between the four flavors, so 50 divided by 4, or 12.5 candies. This means we would expect 12.5 candies in each flavor.
Flavor Observed Expected Cherry 11 12.5 Lemon 15 12.5 Orange 12 12.5 Strawberry 12 12.5
We can then use the chi-squared formula to calculate the chi-square statistic to compare the discrepency between the expected and observed frequencies.
A middle school is gathering information on its after-school clubs because it was assumed that the distribution of students in each grade was evenly distributed across the clubs, meaning there were the same amount of 6th graders in each club, the same amount of 7th graders in each club, and the same amount of 8th graders in each club.
This table lists the number of students from each grade participating in each club.
6th graders 7th graders 8th graders Coding Club 12 14 8 Photography Club 7 11 15 Debate Club 9 5 13
Suppose we want to find the observed frequency for 7th graders participating in the photography club. Using the chart, we can directly see the observed frequency for 7th graders participating in the photography club is 11.
To find the expected frequency for 7th graders participating in the photography club, we need to find the number of occurrences if the null hypothesis is true, which in this case, was that the three options are equally likely, or if the students in each grade were all evenly distributed across the clubs.
First, add up all the students in the 7th-grade column:
If each of these three clubs were evenly distributed among the 30 7th graders, we would need to divide the total evenly between the three options:
This means we would expect 10 7th graders to participate in the coding club, 10 7th graders to participate in the photography club, and 10 7th graders to participate in the debate club.
In summary, the observed and expected frequencies for 7th graders participating in photography club is:
- Observed: 11
- Expected: 10
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.