
Chi-Square Test for Association and Independence

Author: Ryan Backman

Video Chapters

( 00:00 - 00:25 ) Definition of Chi-Square Test for Association/Independence

( 00:26 - 01:16 ) Discussion of Four Steps of Test

( 01:17 - 02:08 ) Introduction to Example Context

( 02:09 - 03:13 ) Step 1 of Test Example (Hypotheses)

( 03:14 - 07:17 ) Step 2 of Test Example (Conditions)

( 07:18 - 08:24 ) Step 3 of Test Example (Test Statistic and p-value)

( 08:25 - 09:28 ) Step 4 of Test Example (Decision and Conclusion)

Video Transcription


Hi. This tutorial covers the chi-squared test for independence, also known as the chi-squared test for association. Let's start with a definition. The chi-squared test for independence/association is a type of hypothesis test used to determine whether there is an association between two categorical variables in a population of interest.

Let's consider the four-step procedure for this type of test. In step 1, just like always, we formulate the null and alternative hypotheses and choose a significance level. For this type of test, the null hypothesis is that there is no association between the two variables, and the alternative hypothesis is that there is an association.

In step 2, similar to the other chi-squared tests, we check three conditions: the data come from a random sample, all expected counts are at least 5, and individual observations are independent. In step 3, we calculate the test statistic, which is chi-squared, and find the p-value. Finally, in step 4, we decide whether or not to reject the null hypothesis and draw a conclusion.

Let's look at an example and apply the four steps to it. Below are data that display the political affiliations and residence types of 176 people. We'll assume that these people were sampled randomly and that the observations are independent. The people were classified by two categorical variables: residence type (urban, suburban, or rural) and political affiliation (DFL, GOP, or other). Is there an association between political affiliation and residence type?

To answer that, we run this type of hypothesis test. Let's start with step 1: formulate the null and alternative hypotheses. Remember that the null hypothesis is that there is no association between the two variables, so my null hypothesis is: there is no association between political affiliation and residence type.

For my alternative hypothesis, I can say either that there is an association or simply that the null hypothesis is false. This is also where we set our value of alpha. I'm going to set it at 0.05, so that I have a relatively small probability of making a Type I error.

Now let's move to step 2 and check the conditions: the data come from a random sample, all expected counts are at least 5, and the individual observations are independent. We already commented on the first and third conditions: we're assuming that the data came from a random sample and that the individual observations, from person to person, are independent.

What we still need to check is that all expected counts are at least 5, so we actually need to calculate the expected counts. Let's go back to the data; remember, these are all observed counts. To calculate the expected counts, I need to think about what would be true if there were no association between these two variables, that is, if the people in each political affiliation were spread across the residence types in the same proportions as the sample as a whole.

I'll start by calculating the expected count for one cell, DFL and urban. First, I calculate the row total for DFL: adding up those three values gives 69. Then I calculate the column total, that is, how many people had an urban residence type: adding those up gives 67.

The grand total is 176, so I'll write that down here. If there were no association, the 69 people who identified as affiliated with the DFL would be spread across the urban, suburban, and rural categories in the appropriate proportions. To calculate this expected count, we first figure out what proportion of all the people were affiliated with the DFL: 69 divided by 176 is about 0.392, so about 39.2% of everybody is affiliated with the DFL party.

Since 67 people are urban, under no association we would expect about 39.2% of those 67 urban people to be in the DFL category. So I multiply this proportion by 67, which gives an expected count of about 26.3. We can see that the observed count was larger than the expected count there. I'm not going to do that by hand for every cell; instead, I'll show a calculator function that does all of these calculations for you.
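The hand calculation above can be written as a small Python sketch: the expected count for a cell is (row total / grand total) × column total. The function name `expected_cell` is hypothetical, introduced only for illustration, and the example uses only the totals quoted in this tutorial (69 DFL, 67 urban, 176 overall), since the full table is not reproduced here.

```python
def expected_cell(row_total, col_total, grand_total):
    """Expected count for one cell if the two variables were not associated."""
    row_proportion = row_total / grand_total  # share of everyone in this row
    return row_proportion * col_total         # the same share of this column


# DFL/urban cell: 69 DFL people, 67 urban people, 176 people in total
print(round(69 / 176, 3))                    # 0.392
print(round(expected_cell(69, 67, 176), 1))  # 26.3
```

Multiplying the row proportion by the column total is the same as (row total × column total) / grand total, which is the form most textbooks quote.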

Ahead of time, I typed my observed values into matrix A on the calculator. You can see 35, 17, 17, 18, and so on; these match my observed values. Now I can go to the test menu and select the chi-squared test function.

This function asks for the observed values in matrix A, which is where they are, and the calculator then populates matrix B with the expected values, so it does all of those calculations for you. If you select Calculate, you get a results screen; we'll come back to it in a moment. First, let's look at matrix B. Remember, our goal here is to confirm that all of the expected counts are greater than or equal to 5.

The first number looks familiar: 26.3 is what we calculated by hand. The rest of the values are also clearly greater than or equal to 5, so the second condition is met: all expected counts are at least 5. We're also assuming that the data came from a random sample and that the individual observations were independent.
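As a rough stand-in for the calculator's matrix B, the Python sketch below computes every expected count from a table of observed counts and checks the "at least 5" condition. The function names are hypothetical, and the example tables are made up for illustration; they are not the political-affiliation data.

```python
def expected_counts(observed):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def expected_counts_condition_met(observed, minimum=5):
    """True if every expected count is at least `minimum` (5 in this tutorial)."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Hypothetical 2x2 tables, not the tutorial's data:
print(expected_counts_condition_met([[10, 20], [30, 40]]))  # True (12, 18, 28, 42)
print(expected_counts_condition_met([[2, 3], [8, 37]]))     # False (1, 4, 9, 36)
```

Note that the condition is checked on the expected counts, not the observed ones; the second table fails because its first row total is small.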

So we can proceed to step 3: calculate the test statistic (chi-squared in this case) and find the p-value. We can get both from the results screen we saw earlier, which came from running the chi-squared test on the calculator. The chi-squared value is about 12.8, and the p-value is about 0.0123. I'll write down those two values, 12.8 and 0.0123.
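The calculator's chi-squared value is the sum of (observed − expected)² / expected over all cells, and the p-value is the area to the right of that value under a chi-square curve. The Python sketch below is an assumption-laden illustration, not the calculator's algorithm: the function names are hypothetical, and the p-value helper only handles an even number of degrees of freedom, where the closed form e^(−x/2) · Σ (x/2)^i / i! applies. A 3×3 table like this one has (3 − 1)(3 − 1) = 4 degrees of freedom, so plugging in the reported statistic of about 12.8 reproduces the p-value of about 0.0123. For general tables, SciPy's `scipy.stats.chi2_contingency` performs the whole test.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area P(X >= statistic) for a chi-square distribution,
    valid only for positive even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half_x = statistic / 2
    return math.exp(-half_x) * sum(
        half_x ** i / math.factorial(i) for i in range(df // 2)
    )


# Reported statistic from the calculator, with df = (3 - 1)(3 - 1) = 4
print(round(chi_square_p_value_even_df(12.8, 4), 4))  # 0.0123
```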

Remember, we need to compare the p-value to alpha to see whether the result is significant. I set alpha at 0.05 in step 1, and the p-value of 0.0123 is clearly less than 0.05.

Now I'll use the p-value and alpha to move on to step 4. Because the p-value is less than alpha, I reject the null hypothesis. My conclusion is that there is significant evidence of an association between political affiliation and residence type.
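The step 4 decision rule used here can be summarized in a few lines of Python. The function name `decide` and the wording of its messages are hypothetical; the rule itself (reject the null hypothesis when the p-value is less than alpha) is the one applied in this tutorial.

```python
def decide(p_value, alpha=0.05):
    """Step 4 decision rule as applied in this tutorial: reject H0 when p < alpha."""
    if p_value < alpha:
        return "Reject H0: significant evidence of an association."
    return "Fail to reject H0: not enough evidence of an association."


print(decide(0.0123))  # Reject H0: significant evidence of an association.
```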

Since our p-value is so small, we reject the null hypothesis, which gives us evidence for the alternative hypothesis and allows us to conclude that there is an association between political affiliation and residence type. That's been your tutorial on the chi-squared test for independence/association. Thanks for watching.

Terms to Know

Chi-Square Test of Independence/Association: A hypothesis test of whether two qualitative (categorical) variables are associated.

Formulas to Know

Chi-square Degrees of Freedom: $\text{degrees of freedom} = (\text{number of rows} - 1)(\text{number of columns} - 1)$
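As a worked instance of this formula, the example in this tutorial has three rows (DFL, GOP, other) and three columns (urban, suburban, rural):

$$\text{degrees of freedom} = (3 - 1)(3 - 1) = 2 \times 2 = 4$$

This agrees with the reported results: under a chi-square distribution with 4 degrees of freedom, a test statistic of about 12.8 has a right-tail p-value of about 0.0123.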