Use Sophia to knock out your gen-ed requirements quickly and affordably. Learn more
×

Chi-Square Test for Homogeneity

Author: Sophia

what's covered
This tutorial is going to run through a chi-square test of homogeneity. Our discussion breaks down as follows:

Table of Contents

1. Chi-Square Test of Homogeneity

A chi-square test of homogeneity is a test that uses multiple populations and tests to see if these populations are the same across categorical, or qualitative, variables. In other words, you are trying to determine if the distributions of categorical data differ across different populations.

Instead of comparing the distributions to some hypothesized distribution, you compare whether or not two sample distributions are significantly different from each other.

As with any chi-square test, you must follow these steps:

step by step
Step 1: State the null and alternative hypotheses.
Step 2: Check the conditions.
Step 3: Calculate the test-statistic and p-value.
Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen significance level. Based on how they compare, state a decision about the null hypothesis and conclusion in the context of the problem.

EXAMPLE

Suppose that two colleges, the U and State, are worried about the student drinking behaviors, so they both independently choose random samples of their students. The results of the drinking behaviors are given in the table here:
Drinking Level The U State
None 140 186 326
Low 478 661 1,139
Moderate 300 173 473
High 63 16 79
981 1036 2017

The question is, does there appear to be a difference with drinking behaviors between the two colleges? Obviously, those who drink a lot represent the lowest category in both schools, and those who drink a little represent the highest in both schools. Perhaps the schools are not that different. You can run a test, though, to make sure whether that's the case or to dispute whether that's the case.

Step 1: State the null and alternative hypotheses.
In the test for homogeneity, the null hypothesis is that they are the same distribution, or that the two sample distributions are not significantly different; the distribution of drinking levels is the same at the U as it is for State. The alternative hypothesis is that the two distributions are not the same.
  • H0: The distribution of drinking levels is the same for The U as it is for State.
  • Ha: The distribution of drinking levels is not the same for The U as it is for State.
  • α: 0.05
Choose a significance level of 0.05.

Step 2: Check the conditions.
One of the conditions is going to be that the expected values are all greater than five. But the question is, how do you calculate expected values? You can't do the same thing you did in a goodness-of-fit test. Instead, you have to think about it a different way. Of the 2,017 students, 326 of them don't drink at all, which is equal to 16.2%.

Drinking Level The U State
None 140 186 326
Low 478 661 1,139
Moderate 300 173 473
High 63 16 79
981 1036 2017

The idea here is that if the two distributions were homogeneous, then it would be 16.2% at The U that don't drink at all and 16.2% at State that don't drink it all.

(0.162)(981) = 158.56 The U students expected to not drink at all (0.162)(1036) = 167.44 State students expected to not drink at all
So we would expect 158.56 students from The U and 167.44 students from State that participated in this survey to be in the "None" row.
Take a look at how this was calculated:

E x p e c t e d space
When you calculated the expected value for "None" and The U, you divided 326 by 2017 to get the 16.2%, and then multiplied by 981. In other words, we multiplied the total of "None" by the total of "The U", and divide all that by the grand total.

In general, what we can say is that the expected values for each cell are going to be the row total times the column total over the grand total.

formula to know
Expected Value for Cell in Chi-Square Test for Homogeneity
E x p e c t e d space V a l u e space f o r space C e l l equals fraction numerator open parentheses R o w space T o t a l close parentheses open parentheses C o l u m n space T o t a l close parentheses over denominator G r a n d space T o t a l end fraction

From that, it's not too hard to create an entire table of expected values.
Observed Table Expected Table
Drinking Level The U State Drinking Level The U State
None 140 186 326 None 158.56 167.44 326
Low 478 661 1139 Low 553.97 585.03 1139
Moderate 300 173 473 Moderate 212.54 224.46 473
High 63 16 79 High 38.42 40.58 79
981 1036 2017 981 1036 2017

The table on the left is what you observed; the table on the right is what you expected. Again, these values don't have to be integers.

The conditions for this hypothesis test are met: you have two independent random samples and all cell counts in the expected table are at least five, the smallest one being 38.42.

Step 3: Calculate the test-statistic and p-value.
At this point, you can calculate the chi-square statistic using the observed and expected. Recall that the formula for a chi-square statistic the observed minus expected, squared, over expected. Add all of them up.

formula to know
Chi-Square Test
X squared equals begin inline style stack sum begin display style fraction numerator left parenthesis O minus E right parenthesis squared over denominator E end fraction end style with blank below and blank on top end style

hint
You can also use technology to calculate the chi-square test statistic and the p-value.

The chi-square test statistic that you would obtain is 96.6.

The degrees of freedom, in this case, can be found by multiplying the number of rows minus one times the value of the number of columns minus one. This is technically the general rule and can be applied to the previous chi-square tests.

formula to know
Chi-Square Test Degrees of Freedom
D e g r e e s space o f space f r e e d o m equals left parenthesis r o w space t o t a l minus 1 right parenthesis left parenthesis c o l u m n space t o t a l minus 1 right parenthesis

Let's take another look at our data:
Drinking Level The U State
None 140 186
Low 478 661
Moderate 300 173
High 63 16

In this case, there were four rows (none, low, moderate, and high) and two columns (The U and State):

D e g r e e s space o f space f r e e d o m equals left parenthesis 4 minus 1 right parenthesis left parenthesis 2 minus 1 right parenthesis equals left parenthesis 3 right parenthesis left parenthesis 1 right parenthesis equals 3

So, the degrees of freedom is going to be equal to three. The chi-square statistic and p-value can all be obtained using technology and we get a corresponding p-value of 0.001. This is a very low value, less than 0.05.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen significance level. Based on how they compare, state a decision about the null hypothesis and conclusion in the context of the problem.

Since the p-value is lower than the significance level, you reject the null hypothesis and conclude that there is a difference in drinking behavior between the students at the U and the students at State.

term to know
Chi-Square Test of Homogeneity
A test used to determine if there is no difference in a categorical variable across several populations or treatments.

summary
The chi-square test of homogeneity allows you to test whether two populations have significantly different distributions across the categories. The expected counts for each cell is the product of the row total and the column total divided by the grand total. The conditions are the same as they are for a goodness-of-fit test, in that all the expected values have to be greater than five.

Good luck!

Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Terms to Know
Chi-square test for homogeneity

A test used to determine if there is no difference in a categorical variable across several populations or treatments

Formulas to Know
Chi-square Degrees of Freedom

d e g r e e s space o f space f r e e d o m space equals space left parenthesis r o w space t o t a l minus 1 right parenthesis left parenthesis c o l u m n space t o t a l minus 1 right parenthesis

Expected Value for Cell in Chi-Square Test for Homogeneity

fraction numerator left parenthesis R o w space T o t a l right parenthesis left parenthesis C o l u m n space T o t a l right parenthesis over denominator G r a n d space T o t a l end fraction