In this tutorial, you're going to learn about sampling, focusing on populations, censuses, samples, and what makes a sample representative.
Sampling always starts with a population. A population is the complete set of all the things being studied.
Typically, the population we wish to generalize our findings to is something large, such as the population of the United States, of a state, or of the world. Examining every member of a population is called a census, and for populations this large a census may not be feasible. Instead, we hope that a smaller group of people can represent the population.
Since the population of the United States is too big for an example, a smaller one will be demonstrated using billiard balls. As you see in the image below, the complete set of things in this particular example is the 15 billiard balls on a pool table.
With a group so small, it's possible to take all of them and record some attribute of each, such as color, weight, or whether it's striped or solid. There are lots of different things that you could describe about each pool ball, and it's easy enough to take the entire population and examine all of them.
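A census of the billiard balls can be sketched in a few lines of Python. The ball numbering and the rule that balls 1 through 8 are solid and 9 through 15 are striped are assumptions made for illustration (the black 8 ball is counted as solid here); the tutorial itself does not specify them.

```python
# Hypothetical population: 15 billiard balls, numbered 1-15.
# Assumption: balls 1-8 are solid-colored, balls 9-15 are striped.
population = [
    {"number": n, "pattern": "solid" if n <= 8 else "striped"}
    for n in range(1, 16)
]

# A census examines every member of the population, not just some of them.
striped_count = sum(1 for ball in population if ball["pattern"] == "striped")
striped_proportion = striped_count / len(population)

print(f"Population size: {len(population)}")
print(f"Striped balls: {striped_count} ({striped_proportion:.1%})")
```

Because every ball is examined, the proportion computed here describes the population exactly; there is nothing left to estimate.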
When you think about the United States example, you can see that a census is not always feasible. Suppose your population is a large group of people. The group in the image below isn't even especially large, but it's larger than 15. It's a fairly big group, and it might be hard to get answers from everybody.
What you might choose to do is take a small subset of those individuals and make a sample. In this case, perhaps seven of the many individuals in the population are chosen. A sample is a subset of the population: you would obtain data from that subset and leave everyone else out.
From that sample, you would obtain your data and calculate your statistics. Ideally, the sample is a small version of the population, a microcosm of it, such that the statistics you calculate from the sample data are about the same as what you would have gotten if you had measured the population directly. That's what we mean when we say that we want the sample to be representative of the population.
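The comparison between a sample statistic and the corresponding population value can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 200 people, the simulated heights, and the seed are assumptions chosen only to make the example repeatable, not data from the tutorial.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: simulated heights in cm for 200 people (not real data).
population = [rng.gauss(170, 10) for _ in range(200)]

# Take a sample of seven people, as in the example above.
sample = rng.sample(population, 7)

# Compare the statistic from the sample with the value for the whole population.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.1f} cm")
print(f"Sample mean:     {sample_mean:.1f} cm")
```

With only seven people, the two means can differ noticeably from one run to the next; larger random samples tend to land closer to the population value.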
There are certain ways to make it likely that a sample will be representative. One way is to take the entire population and put them in a hat.
Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls into the hat.
Let’s say you shake up the hat, and take out five. That would be a sample of five.
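Drawing from the hat can be mimicked with Python's standard `random` module. This is a minimal sketch: labeling the balls 1 through 15 and the seed value are assumptions made so the example is repeatable.

```python
import random

# Hypothetical labels for the 15 billiard balls in the hat.
hat = list(range(1, 16))

rng = random.Random(42)  # seed chosen arbitrarily for a repeatable example

# Shake the hat and draw five balls without putting any back.
random_sample = rng.sample(hat, 5)

print(random_sample)
```

`sample` draws without replacement, so no ball can appear twice, and every ball has the same chance of being picked.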
There are also certain ways to guarantee that you won't get a representative sample. Suppose you specifically cherry-picked only solid-colored billiard balls. That wouldn't be very representative of the population of 15.
Is it possible that when you take that hat and pull out five billiard balls, all five of them are solid? Sure, that's possible; it's just not all that likely. Cherry-picking is not a good idea because you're deliberately getting a sample that is not representative.
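How unlikely an all-solid draw is can be worked out by counting: the number of ways to choose five solid balls, divided by the number of ways to choose any five balls. The sketch below assumes that 8 of the 15 balls are solid (balls 1 through 8, counting the black 8 ball); that split is an assumption, not something the tutorial states.

```python
from math import comb

total_balls = 15
solid_balls = 8   # assumption: balls 1-8 are counted as solid
sample_size = 5

# Ways to pick five solids, divided by ways to pick any five balls.
p_all_solid = comb(solid_balls, sample_size) / comb(total_balls, sample_size)

print(f"P(all five are solid) = {p_all_solid:.4f}")
```

Under that assumption there are 56 all-solid draws out of 3003 possible draws, a chance of roughly 1.9%, so a random draw rarely produces what cherry-picking produces every time.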
A census is a way of collecting data that uses everybody, while a sample uses only some. In order to generalize the findings from the sample to the population at large, the sample has to be representative of that population. Once again, the terms described in this tutorial are population, census, the noun sample, the verb sampling, and the idea that a sample should be representative.
Source: This work is adapted from Sophia author Jonathan Osters.
Population: The entire set of individuals from which to sample.
Census: Using the entire population to obtain data.
Sample: A subset of the population. There are many ways to select a sample.
Representative Sample: A sample that accurately reflects the population.