Stratified Random and Cluster Sampling

Source: Top Hat, Creative Commons: http://commons.wikimedia.org/wiki/File:Chapeauclaque.png; Pool balls created by the author

This tutorial covers stratified random sampling. It's a random sampling procedure, but it's not a plain, simple random sample, and we'll talk about the differences as we go.

So first, let me pose a scenario. A high school has just adopted a new, healthy lunch provider, and it would like to solicit student opinion on the healthy lunch options. The school has 100 freshmen, 110 sophomores, 120 juniors, and 90 seniors, for 420 students in all.

The first part of the question is to explain how the school could select a simple random sample of 42 students. Pause the video and write down, off to the side, how the school might implement that.

All right. Hopefully what you came up with after you paused the video was that the school could assign each student a unique number, 1 to 420, then use a random number generator to select 42 numbers, ignoring repeats. The students corresponding to those numbers will be surveyed about the school's new, healthy options. Another way to do it would be to put the 420 student names in a hat and draw out 42.
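The numbering approach can be sketched in a few lines of Python. This is an illustrative sketch, not the school's actual procedure: `random.sample` draws without replacement, which plays the role of "ignoring repeats," and the helper name `simple_random_sample` and the seed value are arbitrary choices made here for reproducibility.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the students 1..population_size and draw sample_size distinct IDs."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no student is picked twice
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# 420 students in the school, 42 to survey
ids = simple_random_sample(420, 42, seed=1)
print(len(ids), ids[:5])
```

Drawing names from a hat is the same procedure done by hand: every student, and every possible group of 42 students, has the same chance of being chosen.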

If you had either of those as your solution, that would be fine. Now, how might the study be improved to guarantee an accurate cross-section of students across the grades? After all, freshmen might feel differently about the healthy options than seniors do. Pause the video again and decide how the school might obtain a more accurate cross-section.

Hopefully what you came up with after hitting pause is something like this. Since 42 is 10% of the school's population, survey 10% of each grade. From the freshman, sophomore, junior, and senior classes, randomly select 10, 11, 12, and 9 students, respectively, using the same simple random sampling methods as before, such as putting names in a hat or assigning everyone a unique number and selecting those numbers.
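Here is a minimal sketch of that plan, assuming students are numbered separately within each grade (an illustrative choice, not part of the scenario). Each grade gets 10% of its size, and a simple random sample is drawn within that grade only; the function name `stratified_sample` and the seed are hypothetical.

```python
import random

# Grade sizes from the scenario above
grades = {"freshmen": 100, "sophomores": 110, "juniors": 120, "seniors": 90}
SAMPLING_FRACTION = 0.10  # 42 / 420

def stratified_sample(strata_sizes, fraction, seed=None):
    rng = random.Random(seed)
    sample = {}
    for name, size in strata_sizes.items():
        # Proportional allocation: each grade contributes 10% of its members
        n = round(size * fraction)
        # Simple random sample of student numbers within this grade only
        sample[name] = sorted(rng.sample(range(1, size + 1), n))
    return sample

sample = stratified_sample(grades, SAMPLING_FRACTION, seed=1)
for grade, ids in sample.items():
    print(grade, len(ids))
```

Unlike a single simple random sample of 42, this design guarantees that each grade appears in exact proportion to its share of the school.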

What you've just described, and what we see here on the screen, is a sampling method called a stratified random sample. In this method the population is subdivided into groups called strata. Each stratum is homogeneous with respect to some characteristic that we think might affect the results (here, grade level).

Basically, we don't want the sample to overrepresent or underrepresent any group with that characteristic. So a simple random sample is carried out within each stratum. You can have as many strata as you like, but each one has to be roughly homogeneous.

For instance, take a look at these billiard balls from a pool table. I've subdivided them into low, middle, and high. This is common when three people want to play a game of pool: people often split the balls into lows, middles, and highs.

To take a stratified random sample of these 15 balls, put all the low-valued balls in one hat, all the middle-valued balls in a second hat, and all the high-valued balls in a third hat, then randomly select, say, two from each. That gives you a stratified random sample of six. You're guaranteed to have exactly two low, exactly two middle, and exactly two high.
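The three-hat procedure can be sketched as follows. The exact split is an assumption: it uses 1 to 5, 6 to 10, and 11 to 15, the grouping used in the three-player game often called "cutthroat," since the transcript only says "low, middle, and high."

```python
import random

# Assumed grouping: 1-5 low, 6-10 middle, 11-15 high
strata = {
    "low": list(range(1, 6)),
    "middle": list(range(6, 11)),
    "high": list(range(11, 16)),
}

rng = random.Random(7)  # arbitrary seed for reproducibility
# One "hat" per stratum: draw two balls from each, without replacement
sample = {name: sorted(rng.sample(balls, 2)) for name, balls in strata.items()}
print(sample)
```

Whatever the seed, the result always contains exactly two balls from each group, which a simple random sample of six from all 15 balls cannot guarantee.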

To recap, in a stratified random sample, the population is broken down into homogeneous groups called "strata." We think that if we don't break it down into strata, some characteristic might cause the sample to misrepresent the population.

So we force the population into groups and then take a simple random sample within each stratum. The terms we've used are "stratified random sample" and "strata" for the groups (singular, "stratum"). Good luck, and I'll see you next time.

This tutorial is about cluster sampling. Cluster sampling is a little different from the other sampling procedures we've talked about. In cluster sampling, the population is divided into groups called clusters. These are natural groupings: their members don't necessarily have anything in common other than, typically, geography.

So we take a random sample of clusters instead of a random sample of individuals, and if a cluster is selected, every individual in that cluster becomes part of the sample. Unlike the groups in a stratified random sample, the groups in a cluster sample aren't based on some variable; the individuals in a cluster just happen to be near each other. We'll look at some examples as we move through the tutorial.

Here's one example. A potato chip company wants to implement some quality control on its manufacturing. It could go near the beginning of the assembly line and take a simple random sample of individual chips. That would work just fine. But it might be easier to randomly sample some bags of chips at the end of the assembly line.

So the bags themselves are the clusters. And if we take a bag of chips off the assembly line, we're going to sample every chip in that bag for quality control.
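A minimal sketch of the bag-sampling idea, using made-up numbers for illustration only (50 bags of 20 chips each, with each chip labeled by its bag and position). The helper name `cluster_sample` is hypothetical: it randomly picks whole clusters and then keeps every member of each picked cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n_clusters)
    return [item for i in chosen for item in clusters[i]]

# Hypothetical line output: 50 bags, each holding 20 chips labeled (bag, chip)
bags = [[(bag, chip) for chip in range(20)] for bag in range(50)]
inspected = cluster_sample(bags, 3, seed=2)
print(len(inspected))  # 3 bags x 20 chips = 60
```

Note that the randomness is applied only to which bags are chosen; no further selection happens inside a bag.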

Here's another example. The landlord of an apartment complex wants to know whether a new carpet he's considering is appropriate for all the apartments in the building. The building has four floors with eight apartments each. He could randomly select eight apartments from the building, which would be a simple random sample. Or he could randomly select two apartments per floor, which would be a stratified random sample. A third option would be a cluster sample: he could take a spinner like this one, numbered for the four floors, and spin it.

Suppose it landed on three. That means every apartment on the third floor would receive the new carpet. He doesn't have to send the carpet installers to different rooms on different floors. He can just say, "OK, everyone go up to the third floor and install carpet in every apartment on that floor." That would be easier for him and likely cheaper. So that would be a cluster sample, as opposed to the other types of sample.
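The three options for the landlord can be compared side by side in a short sketch. Apartments are labeled (floor, unit), which is an illustrative choice; the spinner is modeled as a uniformly random floor number from 1 to 4. Each method selects eight apartments, but they differ in how the eight are chosen.

```python
import random

FLOORS, PER_FLOOR = 4, 8
# Label each apartment (floor, unit); 4 x 8 = 32 apartments
building = {f: [(f, u) for u in range(1, PER_FLOOR + 1)] for f in range(1, FLOORS + 1)}
all_apts = [apt for units in building.values() for apt in units]

rng = random.Random(3)  # arbitrary seed for reproducibility

# Simple random sample: any 8 of the 32 apartments
srs = rng.sample(all_apts, 8)

# Stratified sample: each floor is a stratum; 2 apartments per floor
stratified = [apt for units in building.values() for apt in rng.sample(units, 2)]

# Cluster sample: each floor is a cluster; "spin" for one floor and take all of it
floor = rng.randint(1, FLOORS)
cluster = building[floor]

print("SRS:", sorted(srs))
print("Stratified:", sorted(stratified))
print("Cluster (floor %d):" % floor, cluster)
```

The simple random sample may land on any mix of floors, the stratified sample always has two apartments per floor, and the cluster sample is always one entire floor.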

Like every sampling method, cluster sampling has pros and cons. The biggest advantage is that it's easier than a simple random sample and often costs less. It also typically gives similar results, because each cluster is usually already fairly heterogeneous, much like the population as a whole.

The disadvantage is that the clusters might not be heterogeneous. They may share some characteristic, other than just their location, that could affect the sample's findings. For instance, in the carpet example, the residents of the third floor might happen to have more pets than residents of the other floors, which could affect how the carpet holds up. But typically, the clusters are fairly representative, and the results are fairly similar to those of a simple random sample.

To recap, cluster sampling is done by taking naturally occurring groups, typically groups of individuals who are geographically close, and taking a simple random sample of those clusters. Each member of a selected cluster then becomes part of the sample. Cluster samples are typically more cost effective and faster to carry out, and that's a big reason many polling organizations use them.

So the terms that we talked about were cluster sample, which is the process, and then the clusters, which were the groups that formed the sample. Good luck, and we'll see you next time.