Source: Tables created by Katherine Williams
This tutorial covers the chi-square test for goodness of fit. With the chi-square test for goodness of fit, the null hypothesis is that the population distribution matches a specified distribution. And that specified distribution can be whatever the problem calls for. And the alternative hypothesis is that the population distribution does not match that specified distribution.
When you're doing the chi-square test for goodness of fit, there's a set of conditions that we need to check. We need to check that the data comes from a random sample, that all of the expected counts are at least 5, and that the individual observations are independent. Now, if these conditions are not met, the conclusions that we draw from the chi-square test for goodness of fit might not be accurate.
Here's an example. In this example, our null hypothesis is that defects are evenly distributed across all the days of the work week-- so Monday, Tuesday, Wednesday, Thursday, Friday. And then, the alternative hypothesis is that defects are not evenly distributed across all five work days.
So here, in our chart, we have our information. We have what's expected. We expect an even distribution. And, in fact, we expect 4 defects per day. And in reality, this is what we observed: 4, 6, 3, 0, and 2. Now, what our chi-square test for goodness of fit is doing is determining whether the variation that we see here is just random chance, or whether the defects in fact follow something other than an even distribution.
Now, we've set up our null and our alternative hypotheses. Now, we need to pick a significance level. We're going to pick a significance level of 5%, so we set our alpha to be 0.05.
Next, we calculate the chi-square statistic. So I'm going to start by doing the observed minus the expected for each day. And then, we'll square those values and divide each one by the expected. And then, we'll sum everything we find.
So first, for observed minus expected, we have 4 minus 4 for Monday, and that gets me 0. For Tuesday, 6 minus 4 gets me 2. For Wednesday, 3 minus 4 gets me negative 1. For Thursday, 0 minus 4 gets me negative 4. And for Friday, 2 minus 4 gets me negative 2.
Then, we're going to square all those values. So we'll have 0, 4, 1, 16, and 4. With those squared values, we're then going to divide by the expected. In this case, we expected it to be 4 across the days. So every value is going to get divided by 4 in this case.
Now I'm just going to simplify those down to get 0 plus 1 plus 1/4 plus 4 plus 1. And then I'm going to add all of those pieces together to get a chi-squared statistic of 6.25. Now, I'm going to use a table to find out the p-value for that chi-squared statistic of 6.25. And on this table, we need to look at the degrees of freedom, which is the number of categories minus 1.
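Here's a minimal sketch of that arithmetic in Python, in case it helps to see the steps laid out. It uses the counts exactly as stated in the example, and the variable names are just illustrative.

```python
# Observed and expected defect counts, Monday through Friday,
# exactly as stated in the example.
observed = [4, 6, 3, 0, 2]
expected = [4, 4, 4, 4, 4]

# (observed - expected)^2 / expected, one value per day.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-squared statistic is the sum of those pieces.
chi_squared = sum(components)

# Degrees of freedom: number of categories (days) minus 1.
degrees_of_freedom = len(observed) - 1

print(components)          # [0.0, 1.0, 0.25, 4.0, 1.0]
print(chi_squared)         # 6.25
print(degrees_of_freedom)  # 4
```

A library routine such as `scipy.stats.chisquare` would be the usual choice in practice. As far as I know, recent SciPy versions require the observed and expected totals to agree, and the counts as stated here total 15 and 20, so this sketch does the sum by hand instead.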
So we're looking at 5 days, which means 5 categories. So 5 minus 1 gets us 4 degrees of freedom. So this is the line we're looking in. And then the chi-squared value of 6.25 means we're looking in between these two values. Now, these two values translate to p-values of 0.20 and 0.10, so our p-value is somewhere between 0.10 and 0.20.
My significance level was 0.05. So in this case, our p-value is greater than our significance level. So we fail to reject the null hypothesis. In other words, we don't have enough evidence to say that the defects are not evenly distributed across all five workdays.
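If you'd rather compute the p-value than bracket it from a table, here's a rough sketch using only the Python standard library. It relies on a closed form for the chi-square upper tail that holds only when the degrees of freedom are even, which is the case here with 4. The helper name `chi2_sf_even_df` is made up for this illustration; in practice something like `scipy.stats.chi2.sf` would do the same job for any degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Hypothetical helper for illustration: this closed form is only
    valid when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

chi_squared = 6.25        # statistic from the worked example
degrees_of_freedom = 4    # 5 days minus 1
alpha = 0.05              # significance level chosen earlier

p_value = chi2_sf_even_df(chi_squared, degrees_of_freedom)
reject_null = p_value < alpha

print(round(p_value, 3))  # 0.181
print(reject_null)        # False
```

That value of about 0.18 sits between the 0.10 and 0.20 we read from the table, and it is larger than 0.05, so the decision is the same.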
So this has been your tutorial on the chi-square test for goodness of fit.