Source: Tables created by Jonathan Osters
In this tutorial, you're going to learn about a chi-square test of independence. This is sometimes called a chi-square test of association, but the null hypothesis in this test says that the two variables are independent. So let's take a look.
335 students from different backgrounds-- that is, rural, suburban, and urban schools-- were asked: if they had to pick the one thing about school that was most important to them, would it be getting good grades, being popular, or being good at sports? And here's the distribution of responses.
This is to say, five urban students said that being good at sports was the most important thing. 87 suburban students said that grades were the most important thing. So the question is, does there appear to be an association, in fact, between geographic location-- school location-- and the answer choice to the question, the goal?
So we're going to run a chi-square test of independence. And what we're checking to see is if the distribution of answer choices-- grades, popular, and sports-- differs significantly from one school location to another. So are they associated or are they independent?
In the null hypothesis, we're going to say that school location and goal are independent. That is, they don't have an association with each other. The alternative hypothesis is that they do. That is, the distribution of grades, popular, and sports for at least one of the locations-- suburban, urban, or rural-- is different from the others. At least one is different.
The nice thing about the test of independence is that all of the conditions and the mechanics-- that is, how the chi-square statistic and p-value are calculated-- are the same as in a test of homogeneity. So a lot of the same procedures are done. In this case, just as in the test of homogeneity, the expected value for each cell is equal to that particular cell's row total, times its column total, divided by the grand total for all the cells.
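Written as a formula, the expected-count rule just described might look like the following. The symbols here are notation introduced for illustration and do not come from the tutorial: $E_{ij}$ is the expected count for the cell in row $i$ and column $j$, $R_i$ is that row's total, $C_j$ is that column's total, and $N$ is the grand total.

```latex
E_{ij} = \frac{R_i \times C_j}{N}
```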
Meaning, when you have this table, this is the expected table that results. So this 74.72 was calculated by multiplying the row total for grades, times the column total for rural, divided by the 335 grand total for all students. And you can repeat that process for all the cells.
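Here is a minimal sketch of that calculation in Python. The tables are not reproduced in this transcript, so the observed counts below are an assumption: they were chosen to agree with every figure the tutorial quotes (335 students, 5 urban students for sports, 87 suburban students for grades, an expected count of 74.72 for rural and grades). The function name `expected_counts` is also just illustrative.

```python
# Observed counts: ASSUMED values, because the tutorial's table is not shown
# in the transcript. They are consistent with the figures the tutorial quotes.
goals = ["grades", "popular", "sports"]      # rows
locations = ["rural", "suburban", "urban"]   # columns
observed = [
    [57, 87, 24],  # grades
    [50, 42, 6],   # popular
    [42, 22, 5],   # sports
]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for goal, row in zip(goals, expected):
    print(goal, [round(e, 2) for e in row])
```

With these assumed counts, the first entry printed for grades is the rural cell, and the smallest value anywhere in `expected` is the one to compare against 5 when checking the conditions.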
What we are interested in is whether or not all the expected counts are at least 5. The smallest one is here, at 7.21. So that's OK. So the conditions are met. We'll choose an alpha level of 0.05 in this problem. We'll use technology to calculate the chi-square statistic and the p-value.
In this case, chi-square is equal to 18.564. That is big. That causes the p-value to be very small-- 0.001. So, now we need to link our p-value to a decision about the null hypothesis. And so since the p-value is low, that is, smaller than 0.05, we reject the null hypothesis in favor of the alternative and conclude that there is an association between the two categorical variables-- school location and goal.
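The tutorial leaves this step to technology, so here is one hedged sketch of what that technology might be doing. The observed counts are again an assumption (the table is not in the transcript; the values are chosen to match the quoted figures), and the function names are illustrative. The statistic adds up (observed - expected)^2 / expected over all cells, and the degrees of freedom are (rows - 1) times (columns - 1), which is 4 for a 3-by-3 table. For the p-value, this sketch uses a closed-form shortcut that only works when the degrees of freedom are even, rather than a statistics library.

```python
import math

# Observed counts: ASSUMED values, because the tutorial's table is not shown
# in the transcript. Rows: grades, popular, sports. Columns: rural, suburban, urban.
observed = [
    [57, 87, 24],
    [50, 42, 6],
    [42, 22, 5],
]


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def upper_tail_even_df(stat, df):
    """Chi-square upper-tail probability; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even df")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


alpha = 0.05
stat, df = chi_square_statistic(observed)
p_value = upper_tail_even_df(stat, df)
reject_null = p_value < alpha

print(round(stat, 3), df, round(p_value, 3), reject_null)
```

For a table whose degrees of freedom are odd, a statistics package would be the safer route, since the shortcut above does not apply.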
So to recap. A chi-square test of independence allows us to see whether or not two categorical variables are related. If the variables had no effect on each other, then we would expect the distributions across the rows or the columns to be about the same. And the expected count for each cell is the product of its row total and column total, divided by the grand total.
So we talked about chi-square tests for independence. Good luck. And we'll see you next time.
Chi-square test of independence: A hypothesis test that tests whether two qualitative variables have an association or not.