Hi. This tutorial covers the chi-squared test for homogeneity. So let's start with the definition. The chi-squared test for homogeneity is a type of hypothesis test used to see whether the distribution of a categorical variable is the same across several populations or treatments. So let's consider the four-step procedure for this type of test.
Step 1, formulate the null and alternative hypotheses and choose a significance level. So what your null hypothesis is generally going to be is that there is no difference in a categorical variable across several populations. Your alternative hypothesis would be that there is a difference.
Step 2, check the conditions of the test. These are the same conditions as for a chi-squared test for goodness-of-fit: the data come from a random sample, all expected counts are at least 5, and individual observations are independent. Step 3, you are going to calculate a test statistic, chi-squared, and find the p-value. And step 4, decide whether or not to reject the null hypothesis and draw a conclusion.
So let's kind of see how the mechanics of this type of test work. So here's an example. So below is the data that displays the political affiliations of 50 men and 50 women. Let's assume that these were two separate random samples of 50 men and 50 women. So now we have DFL, GOP, and other for the political affiliations, and then men and women, and then we have the observed counts in the table here.
So now is there a difference in the political category proportions for men and women? So we want to know, are these two populations of men and women homogeneous or does it seem like they have differences in the political party that they support?
So let's go ahead and run through the four steps for this example. So let's start back with step one, formulate the null and alternative hypotheses. So just like always, null alternative. And what we just do for the null hypothesis is we just write a sentence. So instead of saying that a parameter equals something, we're just going to write a sentence here.
And the null hypothesis is that there's no difference in a categorical variable across several populations. So we would say that for the null hypothesis, the political category proportions are the same for men and women. So that's going to be my null hypothesis. The political category proportions are the same for men and women.
And then my alternative hypothesis-- all I need to do simply is just say, the null is false. So that's what you would write for your null and alternative hypotheses. And then let's pick a significance level. Again, generally, we stick to 0.05. This is giving me the probability of a type I error being 5%.
Step number 2 is check the conditions of the test. So again, we have three-- comes from a random sample, expected counts are at least 5, the individual observations are independent. So let's start with the first one, data comes from a random sample. So we said when I presented the data that we did have data from a random sample. Let's come back to the second one, expected counts are at least 5.
But now the third one, individual observations are independent-- I think we can assume that the political affiliation of one person did not affect the political affiliation of the next person, so we're going to go ahead and say that our observations are independent there.
So now let's go ahead and test to see if the expected counts are at least 5. So I'm going to pull back the table here, and I'm just going to put the expected counts in parentheses right next to the observed counts. So you kind of have to think about how these are calculated, but we know that we have 50 men and 50 women. So I'm going to put the column totals at the bottom here.
Now, what you need to do next is think about how many people in general fell into each of the political categories here. So all we need to do is calculate some row totals. So for DFL, I have 19 and 24, which gives me a total of 43. GOP actually also is 43. And then for other, we ended up with 14 here. So if we add those three row totals up, they should give us 100. And if we add the two column totals up, that should also give us 100.
So now what we want to do is assume that our null hypothesis is true, because that's what we base our expected values on. Each expected count is the row total times the column total, divided by the grand total. So if these two populations are, in fact, homogeneous, these 43 DFLers would be split between men and women in proportion to the sample sizes. Since half of the 100 people are men and half are women, we split 43 in half, and our expected counts here would be 21.5 and 21.5. So that's how many DFL men we would expect and how many DFL women we would expect.
Now, since GOP is the same-- also had 43-- we're going to expect the same number of men and women in this case, also. And then for our independents or our other category, we had 14. Now, if we assume, again, that our populations are homogeneous, we should have 7 and 7 fall within that category. And we can see now that all of our expected counts are greater than 5, so that condition is met, so we'll go ahead and just move right to step 3.
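The expected counts just described can be sketched in a few lines of Python. This is only an illustration using the row and column totals stated above; the variable names are made up for the sketch, and the rule applied is the usual one of row total times column total divided by the grand total.

```python
# Row totals (political categories) and column totals (samples) from the example.
row_totals = {"DFL": 43, "GOP": 43, "Other": 14}
col_totals = {"Men": 50, "Women": 50}
grand_total = sum(row_totals.values())  # 100 people in all

# Expected count under the null: row total * column total / grand total.
expected = {
    (party, group): row_total * col_total / grand_total
    for party, row_total in row_totals.items()
    for group, col_total in col_totals.items()
}

# Condition check: every expected count should be at least 5.
all_at_least_5 = all(count >= 5 for count in expected.values())

for cell, count in expected.items():
    print(cell, count)
print("All expected counts at least 5:", all_at_least_5)
```

Because the two samples are the same size here, each row total is simply split in half, which matches the 21.5, 21.5, and 7 worked out by hand.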
We want to calculate a test statistic, chi-squared, and then find the p-value. Remember, chi-squared is the sum, over every cell in the table, of (O minus E) squared over E, where O is the observed count and E is the expected count. So I'm going to do this on my calculator. And I'm going to go back to my table, and I'm going to do all of this in lists. So I'm going to put my observed values in L1, and I'm just going to put them in no particular order.
And then in L2 I'm going to put the expected counts, and I'm just going to make sure that I match up each expected count with its corresponding observed count. So then in L3, what I'm going to do is (O minus E) squared over E. So I'm going to type in a formula. So it's going to be L1 minus L2, which is O minus E, squared, over L2, which is my expected, E. So these are all of the values that will contribute to chi-squared.
So my next step is to take the sum, so I'm going to go ahead and do that, so the sum of L3. And this is going to be my value of chi-squared, so I'm going to write that down. So in my situation here, chi-squared was equal to 1.076.
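The list arithmetic done on the calculator can be sketched in Python as follows. The function name is hypothetical, and only the two DFL cells (19 and 24 observed, 21.5 expected each) are used, because those are the only observed counts stated in this transcript. The full statistic of 1.076 also includes the four GOP and other cells from the table.

```python
def chi_squared_contributions(observed, expected):
    """Return (O - E)^2 / E for each matched pair, like list L3 on the calculator."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Only the DFL row is spelled out in the transcript.
dfl_observed = [19, 24]      # plays the role of L1 (DFL cells only)
dfl_expected = [21.5, 21.5]  # plays the role of L2 (DFL cells only)

contributions = chi_squared_contributions(dfl_observed, dfl_expected)
dfl_part = sum(contributions)  # partial sum, not the full chi-squared value

print(contributions)
print(dfl_part)
```

With all six observed and expected counts in the two lists, the same sum gives the chi-squared statistic itself.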
Now what I'm going to do is calculate my p-value. So my p-value, again using my calculator: I'm going to go down to the chi-squared CDF and type in my chi-squared value as the first argument, which is the lower bound. Then my second argument, the upper bound, is going to be my value that represents positive infinity.
My third argument is degrees of freedom. If I have a table like this, I calculate the degrees of freedom by taking the number of rows minus 1 times the number of columns minus 1. So I had 3 rows, so minus 1 is 2. And then I had 2 columns, so 2 minus 1 is 1. So I'm going to end up with 2 degrees of freedom here. Now if I hit Enter, this will be my p-value. So my p-value is 0.584.
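As a rough check on the calculator result, here is a small Python sketch. It leans on a known special case: with exactly 2 degrees of freedom, the area to the right of x under the chi-squared curve is exp(-x / 2). The function names are made up for this sketch, and the shortcut does not apply to other degrees of freedom.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) * (columns - 1) for a two-way table of counts."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail_df2(x):
    """P(X >= x) for a chi-squared variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)

df = degrees_of_freedom(3, 2)          # 3 political categories, 2 samples
p_value = chi2_upper_tail_df2(1.076)   # chi-squared value from the example

print(df, round(p_value, 3))  # prints: 2 0.584
```

For other degrees of freedom, a statistics library or the calculator's chi-squared CDF would be the practical choice.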
And then remember, I'm going to compare that to my value of alpha when I get to step 4. So in step 4, my p-value was 0.584 and my alpha value was 0.05. So what I'm going to say here is that since the p-value is greater than alpha, we fail to reject the null. So there is not enough evidence to conclude that males and females are not homogeneous. And that's referring to political party affiliation.
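The decision rule in step 4 can be written as a tiny Python sketch. The function name and the wording of the returned strings are just illustrative choices.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value to alpha and return the step 4 decision."""
    if p_value < alpha:
        return "reject the null"
    return "fail to reject the null"

decision = decide(0.584, alpha=0.05)
print(decision)  # prints: fail to reject the null
```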
That's been your tutorial on the chi-squared test for homogeneity. Thanks for watching.