Source: USA Map, Creative Commons: http://commons.wikimedia.org/w/index.php?title=File:Blank_US_Map.svg&page=1 Top Hat, Creative Commons: http://commons.wikimedia.org/wiki/File:Chapeauclaque.png, pool balls and figures created by Jonathan Osters
In this tutorial, you're going to learn all about sampling, what it is, and what it's intended to do. Now, we always start with a population. A population is the complete set of all the things that are being studied.
Typically, we use the population of the United States, or the population of the world, or the population of a state to be the population that we wish to generalize our findings to. Now, we're not going to sample everyone from a state, or everyone from the country, or everyone from the world. We're going to take hopefully a representative group of people.
Now, this group of people from the United States seems like too big of an example. So we'll start with a smaller example of billiard balls. So the complete set of things in this particular example are the 15 billiard balls that are on a pool table.
Now with a group so small, it's possible to take all of them and define some attribute of them, like color, or weight, or whether they're striped or solid. There are lots of different things that you could describe about each pool ball. And it's easy enough to just take the entire population and examine all of them. When you examine all the members of your population, that's called a census.
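If it helps to see a census concretely, here's a minimal sketch in Python. The way the balls are modeled is an assumption for illustration, not something from the tutorial: balls 1 through 8 are treated as solid (counting the black 8 ball) and balls 9 through 15 as striped, and the name census_proportion_striped is hypothetical.

```python
# Hypothetical model of the population: the 15 billiard balls, numbered 1-15.
# Assumption: balls 1-8 are solid (counting the black 8 ball), 9-15 are striped.
population = [{"number": n, "striped": n >= 9} for n in range(1, 16)]


def census_proportion_striped(balls):
    """Examine every ball in the population and return the fraction that are striped."""
    striped_count = sum(1 for ball in balls if ball["striped"])
    return striped_count / len(balls)


# A census looks at all 15 balls, so this describes the population exactly.
print(census_proportion_striped(population))
```

Because every member is examined, there's no guessing involved: the number that comes back is the population value itself.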
Now, think back to the United States example. That's not really always feasible. Suppose your population is a large group of people. And this isn't even quite that large, but it's larger than 15. It's kind of a big group, and it might be hard to get answers from everybody.
So what we might choose to do instead is take a small subset of those individuals and make a sample. In this case, I've chosen to sample seven of these many individuals in the population. A sample is a subset of the population. We're going to obtain our data from that subset and leave everyone else out.
From that sample, we're going to obtain our data. And we're going to calculate our statistics. The idea is that we would like the sample to be a small version of the population. We would like it to be a microcosm of the population, such that when we calculate our statistics from the data we obtain from the sample, it's about the same as what we would have gotten if we had measured the population directly. That's what we mean when we say that we want the sample to be representative of the population.
There are certain ways that you can make it likely that a sample will be representative. One way is to take the entire population and put them in a hat. Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls into the hat, shaking it up, and taking out, say, five. That would be a sample of five.
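Pulling balls out of a hat can be sketched with Python's standard random module. This is just one way to picture it. The function name draw_from_hat and the seed value are made up for the example, and the seed is only there so the draw can be repeated.

```python
import random

# The population: billiard balls numbered 1 through 15.
balls = list(range(1, 16))


def draw_from_hat(items, k, seed=None):
    """Draw k distinct items at random, like shaking a hat and pulling k out.

    Every item has the same chance of being chosen, and no item is drawn twice.
    """
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    return rng.sample(items, k)


sample = draw_from_hat(balls, 5, seed=0)
print(sample)
```

Run it without a seed and you'll generally get a different five balls each time, which is exactly what shaking the hat does.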
And there are certain ways to guarantee that you won't get a representative sample. Suppose I specifically cherry picked only solid colored billiard balls. Well, that wouldn't be very representative of the population of 15.
Now thinking back to this example, is it possible that when I take that hat and pull out five billiard balls that all five of them are solid? Yeah, that's possible, it's just not all that likely. If you cherry pick, that's not a good idea. You're getting something that's specifically not representative.
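To put a rough number on "not all that likely," here's a small sketch that counts the possibilities. It assumes 8 of the 15 balls are solid (counting the black 8 ball), which the tutorial doesn't actually spell out, so the count of solids is a parameter you can change. The name prob_all_solid is hypothetical.

```python
from math import comb


def prob_all_solid(total=15, solid=8, drawn=5):
    """Chance that a random draw of `drawn` balls from the hat is all solid.

    Counts the ways to choose `drawn` balls from the solid ones and divides
    by the ways to choose `drawn` balls from the whole population.
    """
    return comb(solid, drawn) / comb(total, drawn)


# Assumption: 8 solid balls (1-7 plus the black 8 ball).
print(prob_all_solid())          # 56 / 3003, about 2 percent

# If the 8 ball is not counted as solid, the chance is smaller still.
print(prob_all_solid(solid=7))   # 21 / 3003, under 1 percent
```

Either way, an all-solid draw can happen by chance, but only rarely. Cherry picking makes it happen every time.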
So to recap, a census is a way of collecting data that uses everybody. And a sample only uses some. So in order to generalize the findings from the sample to the population at large, it has to be representative of your population at large. Once again, the terms that we've described in this tutorial are population, census, the noun sample, and the verb sampling, and the idea that a sample should be representative. Good luck. And we'll see you next time.