Suppose a high school has just adopted a new, healthy lunch provider, and they would like to solicit student opinion on the healthy lunch options. The school has a total of 420 students: 100 freshmen, 110 sophomores, 120 juniors, and 90 seniors.
How would a simple random sample look?
For a simple random sample of 42 students, think of ways that 42 students could be chosen, each having an equal chance of being selected. First, assign each student a unique number from 1 to 420 (the total number of students). Once this is done, you could, for example, put the numbers in a hat and draw 42 of them, or use a random number generator to select 42 of them.
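A random number generator is one option. As a minimal sketch in Python, assuming students are identified only by the numbers 1 to 420, the standard library's `random.sample` draws 42 distinct numbers so that every student has the same chance of being chosen. The seed and the variable names are illustrative choices, not part of the original example.

```python
import random

TOTAL_STUDENTS = 420
SAMPLE_SIZE = 42

# Fixed seed only so the sketch is repeatable; a real survey would not fix it.
rng = random.Random(2024)

# Student IDs 1 through 420, as assigned in the text.
student_ids = range(1, TOTAL_STUDENTS + 1)

# Draw 42 distinct IDs without replacement.
srs = rng.sample(student_ids, SAMPLE_SIZE)

print(sorted(srs))
```

Because the draw ignores grade level, nothing prevents one grade from being over- or underrepresented in a given sample.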
Now, is there a way to improve the study and guarantee a representative cross-section of students across the grades? After all, freshmen might feel differently about the healthy options than seniors, so it is important to have individuals from each grade weigh in on the lunch options.
This can be done with a stratified random sample. Stratified random sampling is a method in which the population is subdivided into groups called strata. Each stratum is a group whose members share a characteristic, chosen because we think it might affect the responses. Sampling from every stratum keeps any one group from being over- or underrepresented in the sample.
In the example above, 42 is 10% of the school's population, so the sample should include 10% of each grade: 10 freshmen, 11 sophomores, 12 juniors, and 9 seniors.
Once the groups are in place, a simple random sample is carried out within each stratum, for example by putting names in a hat or by assigning everyone a unique number and randomly selecting numbers. You can have as many strata as you please, but the members of each stratum must be roughly homogeneous.
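The stratified procedure for the school can be sketched in Python as follows. This is an illustration under stated assumptions: the roster is hypothetical (IDs are handed out grade by grade), and the seed and names are arbitrary. The only figures taken from the example are the grade sizes and the overall sample size of 42.

```python
import random

# Grade sizes from the example; together they make up the 420 students.
grade_sizes = {"freshmen": 100, "sophomores": 110, "juniors": 120, "seniors": 90}
TOTAL_STUDENTS = sum(grade_sizes.values())
SAMPLE_SIZE = 42

rng = random.Random(7)  # fixed seed only for repeatability

# Hypothetical roster: consecutive ID numbers assigned grade by grade.
rosters = {}
next_id = 1
for grade, size in grade_sizes.items():
    rosters[grade] = list(range(next_id, next_id + size))
    next_id += size

# Proportional allocation, then a simple random sample within each stratum.
stratified_sample = {}
for grade, roster in rosters.items():
    n_from_grade = len(roster) * SAMPLE_SIZE // TOTAL_STUDENTS
    stratified_sample[grade] = rng.sample(roster, n_from_grade)

for grade, chosen in stratified_sample.items():
    print(grade, len(chosen))
```

Integer arithmetic is used for the allocation because the grade sizes in this example divide evenly; other populations may need a rounding rule.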
Pretend you've subdivided billiard balls into low, middle, and high numbers. To take a stratified random sample of the 15, this is what you do. Put all the low-numbered balls in hat one. Put all the middle-numbered balls in hat two.
And finally, put all the high-numbered balls in hat three. At that point, you'd randomly select two from each hat. The result would give you a stratified random sample of six billiard balls. You're guaranteed to have exactly two low numbers, exactly two middle numbers, and exactly two high numbers.
When using a cluster sample, the population is divided into groups called clusters. It's important to note that these are natural groupings: their members don't necessarily have anything in common other than, typically, geography. We then take a random sample of clusters instead of a random sample of individuals.
If a cluster is selected, every individual in it becomes part of the sample. So unlike the groups in a stratified random sample, the groups in a cluster sample aren't based on a characteristic or variable. The individuals in the cluster just happen to be near each other.
IN CONTEXT
Suppose you work at a potato chip company and it’s your job to implement some quality control in the manufacturing department. Maybe you stand at the start of the assembly line and take a simple random sample of individual chips. That would work just fine.
However, it might be easier for you to sample some bags of chips. The bags of chips are clusters. You would then take a bag of chips off the assembly line and sample every chip in that bag for quality control. That’s cluster sampling.
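As a rough sketch of the bag-based approach in Python: the number of bags, the chips per bag, and the number of bags pulled are all hypothetical values chosen for illustration, since the example does not specify them. The point is that the random draw happens at the level of bags, and every chip in a chosen bag is inspected.

```python
import random

rng = random.Random(11)  # fixed seed only for repeatability

# Hypothetical production run; none of these figures come from the example.
NUM_BAGS = 50
CHIPS_PER_BAG = 20
BAGS_TO_SAMPLE = 3

# Each bag is a cluster; a chip is labeled (bag number, chip number).
bags = {
    bag: [(bag, chip) for chip in range(1, CHIPS_PER_BAG + 1)]
    for bag in range(1, NUM_BAGS + 1)
}

# Randomly select whole bags, not individual chips.
chosen_bags = rng.sample(sorted(bags), BAGS_TO_SAMPLE)

# Every chip in a selected bag enters the sample.
cluster_sample = [chip for bag in chosen_bags for chip in bags[bag]]

print(chosen_bags, len(cluster_sample))
```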
Like every sampling method, cluster sampling has pros and cons.
Advantages and Disadvantages of Cluster Sampling | |
---|---|
Advantages | Easier than a simple random sample, and often it doesn't cost as much. Typically gives similar results because the clusters are fairly heterogeneous. |
Disadvantages | Risk that clusters are NOT heterogeneous: perhaps they do have some characteristic, other than just being geographically different from each other, that might affect the sample's findings. |
Suppose a landlord of an apartment complex wants to know whether a new carpet he's considering is appropriate for all the apartments in the building. Each of the four floors has eight apartments.
What would a simple random sample look like? How might a cluster sample be different from a stratified random sample?
Well, he could randomly select eight apartments from the building, and that would be a simple random sample.
He could randomly select two apartments per floor, and that would be a stratified random sample.
Or, a third option would be a cluster sample. He could take a spinner numbered one through four, one number per floor, and spin it.
Suppose it landed on three. That means every apartment on the third floor would receive the carpet. He doesn't have to send the carpet installers to different apartments on different floors. He can simply instruct them to go up to the third floor and install carpet in every apartment on that floor, which would be far easier for him and just as cost effective. This would be a cluster sample, as opposed to some other type of sample.
But what if the floors were NOT heterogeneous? What if only the apartments on the third floor allowed pets? The carpet might not hold up as well there, and the results would not reflect the rest of the building. That's one of the disadvantages of cluster sampling in action. Typically, though, the clusters are fairly representative, and the results are very similar to those of a simple random sample.
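The three designs for the apartment building can be compared side by side in a short Python sketch. The apartment labels (such as "301" for the first apartment on the third floor) and the seed are assumptions made for illustration; `rng.choice` stands in for the spinner.

```python
import random

rng = random.Random(3)  # fixed seed only for repeatability

FLOORS = [1, 2, 3, 4]
APARTMENTS_PER_FLOOR = 8

# Hypothetical labels: floor number followed by a two-digit unit number.
building = {
    floor: [f"{floor}{unit:02d}" for unit in range(1, APARTMENTS_PER_FLOOR + 1)]
    for floor in FLOORS
}
all_apartments = [apt for floor in FLOORS for apt in building[floor]]

# Simple random sample: any 8 of the 32 apartments.
simple_random = rng.sample(all_apartments, 8)

# Stratified random sample: 2 apartments drawn from each floor.
stratified = [apt for floor in FLOORS for apt in rng.sample(building[floor], 2)]

# Cluster sample: pick one floor at random and take every apartment on it.
chosen_floor = rng.choice(FLOORS)
cluster = list(building[chosen_floor])

print(simple_random)
print(stratified)
print(chosen_floor, cluster)
```

Each design yields eight apartments, but only the cluster sample confines the installers to a single floor, and only the stratified sample guarantees that every floor is represented.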
Source: This work is adapted from Sophia author Jonathan Osters.