Two-Way Tables/Contingency Tables

Source: Table created by Katherine Williams

This tutorial discusses two-way tables. Two-way tables are a way of organizing categorical data that involves two variables. In this example, the two variables are fries and shakes: for each one, a person says either yes, they would like it, or no, they wouldn't. So this table combines the information about two different questions into one table. That's really useful, because it helps us organize our information and, at the same time, calculate very specific probabilities.
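
As a minimal sketch, the fries-and-shakes table can be stored in Python as a nested dictionary, with the answer about fries as the row and the answer about shakes as the column. The names `table`, `fries` and `row` are illustrative choices, not part of the tutorial; the four counts are the ones used throughout this example.

```python
# Counts from the fries/shakes example, stored as table[fries_answer][shakes_answer]
# (the dictionary layout is an illustrative assumption)
table = {
    "yes": {"yes": 110, "no": 175},  # said yes to fries
    "no": {"yes": 54, "no": 369},    # said no to fries
}

# Print the table with fries answers as rows and shakes answers as columns
print("Fries/Shakes".ljust(14) + "Yes".rjust(6) + "No".rjust(6))
for fries in ("yes", "no"):
    row = table[fries]
    print(fries.capitalize().ljust(14) + str(row["yes"]).rjust(6) + str(row["no"]).rjust(6))
```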

Now, a two-way table might already come with totals calculated along its edges. If it doesn't, computing them is your first step: you're finding the marginal distributions.

So here, for the people who said yes to fries, the total is 110 plus 175, so 285 people said yes to fries. For the people who said no to fries, we add 54 and 369 to get 423.
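
The row totals for fries can be sketched by adding across each row of the table. This assumes the same illustrative nested-dictionary layout, `table[fries_answer][shakes_answer]`; `fries_totals` is a hypothetical name.

```python
# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},
    "no": {"yes": 54, "no": 369},
}

# Marginal total for each fries answer: add across that row
fries_totals = {fries: sum(row.values()) for fries, row in table.items()}
print(fries_totals)  # {'yes': 285, 'no': 423}
```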

Next, we find the totals for shakes. How many people said yes to shakes? 110 plus 54 gives us 164. And how many said no to shakes? 175 plus 369 gives us 544.
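
The shakes totals come from adding down each column instead. A short sketch, again assuming the illustrative layout `table[fries_answer][shakes_answer]`, with `shakes_totals` as a hypothetical name:

```python
# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},
    "no": {"yes": 54, "no": 369},
}

# Marginal total for each shakes answer: add down that column across both fries rows
shakes_totals = {
    shakes: sum(table[fries][shakes] for fries in table)
    for shakes in ("yes", "no")
}
print(shakes_totals)  # {'yes': 164, 'no': 544}
```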

Now, this box here is the grand total, the total of the totals. It counts everybody who was interviewed, or everyone data was collected about, so it should include every number in the table.

There are a few ways of finding it, and they should all give us the same number. If they don't, we've made a mistake somewhere.
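
In symbols (this notation is ours, not the tutorial's): write $n_{ij}$ for the count in row $i$ (the fries answer) and column $j$ (the shakes answer), $n_{i+} = \sum_j n_{ij}$ for a row total, and $n_{+j} = \sum_i n_{ij}$ for a column total. The three ways of finding the grand total $n$ must agree:

```latex
\begin{aligned}
n   &= \sum_{i} n_{i+} = \sum_{j} n_{+j} = \sum_{i}\sum_{j} n_{ij} \\
708 &= 285 + 423 = 164 + 544 = 110 + 175 + 54 + 369
\end{aligned}
```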

If we add the fries totals vertically, we should get that number: 285 plus 423 gives 708.

Now, if we add going across, 164 plus 544, we should get that same number, and we do: 708. Finally, adding up the four inner cells should also give the grand total: 110 plus 175 plus 54 plus 369 is 708.
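
The triple check can be sketched as code that computes the grand total all three ways and refuses to continue if they disagree. The layout `table[fries_answer][shakes_answer]` and the names `down`, `across` and `cells` are illustrative assumptions.

```python
# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},  # yes to fries
    "no": {"yes": 54, "no": 369},    # no to fries
}

fries_totals = [sum(row.values()) for row in table.values()]                    # 285, 423
shakes_totals = [sum(row[s] for row in table.values()) for s in ("yes", "no")]  # 164, 544

down = sum(fries_totals)                                        # adding vertically
across = sum(shakes_totals)                                     # adding across
cells = sum(c for row in table.values() for c in row.values())  # the four inner cells

# All three must agree; a mismatch means a cell or total was copied wrong
if not (down == across == cells):
    raise ValueError(f"totals disagree: {down}, {across}, {cells}")
print(down)  # 708
```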

It might seem a bit redundant to do that three times. But since all of your probabilities are going to be based on this table, it's really important that every box is accurate, and double- or triple-checking the total is a good way to make sure you haven't accidentally left out a cell.

Contingency table is another name for a two-way table. Sometimes these tables break the information down even further, with many more than four cells, so it's easy to miss something.
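
For larger tables, the same bookkeeping can be sketched as one function that works for any number of rows and columns. This version assumes the counts are given as a list of rows (a layout we chose for illustration), and the function name `margins` is hypothetical. It rejects rows of unequal length, since that usually means a cell was missed.

```python
def margins(counts):
    """Return (row totals, column totals, grand total) for a table of any size."""
    width = len(counts[0])
    # Every row must have the same number of cells; otherwise a cell is missing
    for i, row in enumerate(counts):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} cells, expected {width}")
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    return row_totals, col_totals, grand


# Rows: yes/no to fries; columns: yes/no to shakes
rows, cols, n = margins([[110, 175], [54, 369]])
print(rows, cols, n)  # [285, 423] [164, 544] 708
```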

Once the table is complete, we can use it to calculate probabilities. For example, what's the probability that someone likes both fries and shakes? We find the people who said yes to fries and yes to shakes: that's these 110 people. Out of all 708 people, the probability of yes to both fries and shakes is 110 out of 708.
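
A minimal sketch of that calculation: divide the "yes to both" cell by the grand total. Python's standard `fractions.Fraction` keeps the exact ratio; the dictionary layout and variable names are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},
    "no": {"yes": 54, "no": 369},
}

grand_total = sum(c for row in table.values() for c in row.values())  # 708
p_both = Fraction(table["yes"]["yes"], grand_total)  # yes to fries and yes to shakes
print(p_both, round(float(p_both), 4))  # 55/354 0.1554
```

So about 15.5% of the people surveyed said yes to both.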

We could also find what percent of the people liked neither fries nor shakes. No to fries and no to shakes gives us this box here, so it's 369 out of 708.
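
The same pattern gives the "neither" probability: the no/no cell over the grand total. As before, this sketch assumes the illustrative nested-dictionary layout.

```python
from fractions import Fraction

# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},
    "no": {"yes": 54, "no": 369},
}

grand_total = sum(c for row in table.values() for c in row.values())  # 708
p_neither = Fraction(table["no"]["no"], grand_total)  # no to fries and no to shakes
print(p_neither, round(float(p_neither), 4))  # 123/236 0.5212
```

That works out to roughly 52.1%.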

We can also calculate what are called conditional probabilities. For example, of the people who like fries, what percent don't like shakes? A separate tutorial covers that in more depth. It's a really interesting and useful application of two-way tables.
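
As a brief preview of that question, here is a sketch under the same illustrative layout. Conditioning on "yes to fries" means only that row counts, so the denominator is the fries row total (285) rather than the grand total (708).

```python
from fractions import Fraction

# Counts from the example: table[fries_answer][shakes_answer]
table = {
    "yes": {"yes": 110, "no": 175},
    "no": {"yes": 54, "no": 369},
}

# Restrict to the "yes to fries" row; its total becomes the denominator
fries_yes_row = table["yes"]
p_no_shakes_given_fries = Fraction(fries_yes_row["no"], sum(fries_yes_row.values()))
print(p_no_shakes_given_fries, round(float(p_no_shakes_given_fries), 4))  # 35/57 0.614
```

So about 61.4% of the people who like fries don't like shakes.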

This has been your introduction to two-way tables, a way of organizing information about categorical data. Two-way tables both present information and let us calculate probabilities; other tutorials cover those calculations in more depth.