Source: Table created by Katherine Williams
This tutorial talks about the chi-square statistic. Now a chi-square statistic is a particular test statistic, and it's used for categorical data. It measures how the observed frequency differs from the expected frequency. Now chi-squared: chi is a Greek letter, written χ, and squared just means it's raised to the second power, so the statistic is written χ². So it's kind of like an X with curls on it, but it's chi.
Now in chi-squared, we need to know about the observed frequency. The observed frequency is the number of observations we actually see for a value, so what actually happens when we collect the data. On the other hand, it also uses expected frequency. The expected frequency is the number of observations we would see for a value if the null hypothesis were true. So it's what we think would happen, what is expected.
Now in this example here, we start off with the formula, so chi-squared equals the sum of observed minus expected, squared, divided by expected. So for each category, for each level, we take the observed minus the expected, square that difference, and divide by the expected. And then once we have those for each part of our table, we're going to sum those values together. And that's going to give us our chi-squared statistic.
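Written out in symbols, the formula just described looks like the following. The names O_i, E_i, and k are notation introduced here for illustration (observed count, expected count, and number of categories); they are not labels from the original table.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term in the sum is one category's contribution, so a category where observed and expected match contributes 0.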
So let's start with that. We are looking at rolling a die. When you roll a die, you can get 1 dot, 2 dots, 3, 4, 5, or 6. I put the dots in there just to keep it clear that those are our categories, instead of our expected and observed counts. So if I rolled a die 60 times, I would expect to get 1 dot 10 times, 2 dots 10 times, and likewise 3 dots, 4 dots, 5 dots, and 6 dots 10 times each. I would expect that 1/6 of the time I would get 1 dot. Because 1 dot is 1 of the 6 equally likely options, 1/6 is our theoretical probability. So 1/6 times 60 gets me the 10.
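As a quick check of that arithmetic, here is a minimal Python sketch of the expected counts for a fair die. The variable names are chosen here for illustration, and exact fractions are used so that 1/6 times 60 comes out as exactly 10.

```python
from fractions import Fraction

rolls = 60
faces = [1, 2, 3, 4, 5, 6]

# Fair die: each face is 1 of 6 equally likely options.
probability = Fraction(1, len(faces))

# Expected frequency = theoretical probability * number of rolls.
expected = {face: probability * rolls for face in faces}

print(expected[1])  # 10
```

The same count of 10 holds for every face, and the six expected counts add back up to the 60 rolls.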
Now the observed comes from-- I actually rolled the die 60 times and recorded the results. So now let's start to use our chart to compute our chi-squared statistic. First, we're going to start with this observed minus expected. So 11 minus 10, 1; 8 minus 10, negative 2; negative 1; 0; 2; and 0. Then we need to do observed minus expected squared. So 1, 4, 1, 0, 4, and 0.
And then we need to divide each of those by the expected value. So divide by 10, divide by 10, divide by 10. Now the final step is to sum everything in that row. So 1/10, 4/10, 1/10, and 4/10 gets us a total of 10/10, which equals 1. So in this case, our chi-squared statistic is 1.
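The row-by-row computation above can be sketched in Python. The observed counts are inferred from the differences read out in the walkthrough (11, 8, then differences of negative 1, 0, 2, and 0 against an expected 10). The function name `chi_squared` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction

# Observed counts inferred from the differences in the walkthrough;
# expected counts are 10 per face for 60 rolls of a fair die.
observed = [11, 8, 9, 10, 12, 10]
expected = [10, 10, 10, 10, 10, 10]


def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum(Fraction((o - e) ** 2, e) for o, e in zip(observed, expected))


differences = [o - e for o, e in zip(observed, expected)]
squared = [d ** 2 for d in differences]
statistic = chi_squared(observed, expected)

print(differences)  # [1, -2, -1, 0, 2, 0]
print(squared)      # [1, 4, 1, 0, 4, 0]
print(statistic)    # 1
```

Exact fractions keep 1/10 + 4/10 + 1/10 + 4/10 equal to exactly 1, with no floating-point rounding.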
What do we do with that? How can we tell whether we should reject or fail to reject the null hypothesis? Is this a fair die? Is the variation in the observed values about what we would expect from a fair die, or is it too large? In order to answer that question, first we need to set a significance level. So let's say we set a 5% significance level, so our alpha is 5%. Once we have that, we would use it, combined with something called the degrees of freedom, to look up a value in a chi-squared distribution table. Comparing our statistic with that value helps us determine whether, based on this sample, we should reject or fail to reject our null hypothesis.
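To make that last step concrete, here is a hedged sketch of the comparison. It relies on details the tutorial does not state: that the degrees of freedom for this kind of test are the number of categories minus 1, and that a standard chi-squared table gives a critical value of about 11.070 for 5 degrees of freedom at alpha = 0.05. The function names are hypothetical.

```python
def degrees_of_freedom(num_categories):
    # Assumption: goodness-of-fit test, so df = categories - 1.
    return num_categories - 1


def reject_null(statistic, critical_value):
    # Reject only when the statistic exceeds the table value.
    return statistic > critical_value


# Assumption: value read from a standard chi-squared table
# for df = 5 and a 5% significance level.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.070

df = degrees_of_freedom(6)   # six faces on the die
statistic = 1                # chi-squared statistic from the die example
decision = reject_null(statistic, CRITICAL_VALUE_DF5_ALPHA_05)

print(df)        # 5
print(decision)  # False
```

Under these assumptions the statistic of 1 is well below the table value, so we would fail to reject the null hypothesis that the die is fair.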
This has been your tutorial on the chi-squared statistic, how to find it, and then a little bit about what to do with it once you have it.