This tutorial is going to talk about cluster sampling.
Cluster sampling is a little bit different from some of the other sampling procedures that we've talked about. In cluster sampling, the population is divided into groups, and these groups are called clusters. They're natural groupings. The individuals in a cluster don't necessarily have anything in common other than, say, geography. So we're going to take a random sample of clusters instead of a random sample of individuals.
If we select a cluster, each individual in that cluster is going to be part of the sample. So unlike the groups in a stratified random sample, the groups in a cluster sample aren't based on some variable. The individuals in the cluster just happen to be near each other. And we'll give some examples of that as we move forward in the tutorial.
So, here's one example. A potato chip company wants to implement some quality control in its manufacturing. One thing that it could do is go nearer to the beginning of the assembly line and take a simple random sample of individual chips. That would work just fine. But it might be a little bit easier for them to randomly sample some bags of chips at the end of the assembly line. So the bags themselves are the clusters. And if we take a bag of chips off the assembly line, we're going to sample every chip in that bag for quality control.
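To make the bag-of-chips idea concrete, here's a minimal sketch in Python. The number of bags, the chips per bag, and the 95% pass rate are all made-up assumptions for illustration, not figures from the example. The point is just the shape of the procedure: randomly pick whole bags, then inspect every chip in each bag you picked.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical production run: 50 bags, 20 chips per bag.
# Each chip is True if it passes inspection (made-up 95% pass rate).
bags = {bag_id: [rng.random() < 0.95 for _ in range(20)] for bag_id in range(50)}

# Cluster sample, step 1: pick 5 whole bags at random.
chosen_bags = rng.sample(sorted(bags), 5)

# Step 2: inspect every chip in each chosen bag.
inspected = [chip for bag_id in chosen_bags for chip in bags[bag_id]]

pass_rate = sum(inspected) / len(inspected)
print(f"Inspected {len(inspected)} chips from bags {chosen_bags}")
print(f"Observed pass rate: {pass_rate:.2%}")
```

Notice that the randomness is only in which bags get chosen. Once a bag is in, nothing about its chips is left to chance.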
Here's another example. A landlord of an apartment complex wants to know whether a new carpet he's considering is appropriate for all the apartments in the building. Each of the four floors has eight apartments. So he could randomly select eight apartments from the building, and that would be a simple random sample. Or he could randomly select two apartments from each floor, and that would be a stratified random sample.
Or, a third option would be a cluster sample. He could take a spinner like this and spin it. Suppose it landed on three here. That means that every apartment on the third floor would receive carpeting. He doesn't have to have the carpet installers going to all these different apartments on all these different floors. He can just say, OK, everyone go up to the third floor and install carpet in every apartment on that floor. It would be easier for him and just as cost effective. So that would be a cluster sample, as opposed to some other type of sample.
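Here's a rough sketch of the landlord's three options side by side in Python. The (floor, unit) labels and the seed are assumptions made for the illustration. The four floors and eight apartments per floor come from the example.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

FLOORS = [1, 2, 3, 4]
UNITS_PER_FLOOR = 8
# Apartments are labeled (floor, unit), e.g. (3, 5) is unit 5 on the third floor.
apartments = [(floor, unit) for floor in FLOORS for unit in range(1, UNITS_PER_FLOOR + 1)]

# Simple random sample: any 8 of the 32 apartments.
srs = rng.sample(apartments, 8)

# Stratified random sample: 2 apartments drawn from each floor.
stratified = []
for floor in FLOORS:
    on_floor = [apt for apt in apartments if apt[0] == floor]
    stratified.extend(rng.sample(on_floor, 2))

# Cluster sample: spin for one floor, then take every apartment on it.
spun_floor = rng.choice(FLOORS)
cluster = [apt for apt in apartments if apt[0] == spun_floor]

print("Simple random:", sorted(srs))
print("Stratified:   ", sorted(stratified))
print("Cluster:      ", cluster)
```

All three give eight apartments. The difference is where they end up: scattered anywhere, two per floor, or all on one floor.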
Now just like every sampling method, cluster sampling has pros and cons. The biggest advantage is that it's easier than a simple random sample, and often it doesn't cost as much. And typically it gives similar results, because each cluster is pretty heterogeneous already, a mix of individuals much like the population as a whole. The disadvantage is that maybe the clusters aren't heterogeneous. Maybe a cluster does have some characteristic, beyond just where it's located, that sets it apart from the others and might affect the sample's findings.
For instance, in the previous problem with the carpet, it may be that people on the third floor happen to have more pets than people on the other floors do, and so maybe that would affect how the carpet holds up. But typically, the clusters are pretty representative, and typically the result is pretty similar to a simple random sample.
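To see how much that can matter, here's a small sketch with invented pet counts for each apartment. None of these numbers come from the example; they're only there to show what happens when one floor is different from the rest. With a single floor as the cluster, there are only four possible samples, so we can just list the estimate each spin would give.

```python
# Hypothetical pet counts per apartment, by floor (8 apartments each).
pets_by_floor = {
    1: [0, 1, 0, 0, 1, 0, 0, 0],
    2: [0, 0, 1, 0, 0, 1, 0, 0],
    3: [2, 1, 2, 1, 3, 1, 2, 2],  # the third floor happens to have more pets
    4: [0, 0, 0, 1, 0, 0, 1, 0],
}

# True average for the whole building.
all_counts = [n for counts in pets_by_floor.values() for n in counts]
building_mean = sum(all_counts) / len(all_counts)

# A one-floor cluster sample is that whole floor,
# so each floor gives exactly one possible estimate.
floor_means = {floor: sum(counts) / len(counts) for floor, counts in pets_by_floor.items()}

print(f"Whole building: {building_mean} pets per apartment")
for floor, mean in floor_means.items():
    print(f"Spinner lands on floor {floor}: estimate {mean}")
```

With these made-up numbers, the building averages 0.625 pets per apartment, but a spin that lands on the third floor says 1.75 and any other floor says 0.25. When the floors really are alike, the four estimates would sit much closer together.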
So to recap. Cluster sampling is done by taking naturally occurring groups, typically ones based on geography, and taking a simple random sample of those clusters. And then each member of a selected cluster becomes part of the sample. Typically cluster samples are more cost effective and faster to do, and a lot of the time that's exactly what most polling organizations I know do.
So the terms that we talked about were a cluster sample, which is the process, and the clusters, which are the groups that the sample is built from. Good luck and we'll see you next time.