Stratified Random and Cluster Sampling

Hi. This tutorial covers stratified random samples. To choose a stratified random sample, there are really three steps. First step is to divide the population into strata. So strata is the plural of the word stratum, and a stratum is a group of individuals that are similar, or homogeneous, in some way that is important to the response.

So once you have your population split up into strata, step two is to choose a separate simple random sample from each stratum. Then, step three, combine these simple random samples to form a stratified random sample. So divide your population into strata. Choose a simple random sample from each stratum. Combine the simple random samples to form your big stratified random sample.
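The three steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `stratified_sample`, the labels "A" and "B", and the made-up population are assumptions for illustration, not part of the tutorial.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Draw a stratified random sample.

    population: list of individuals
    stratum_of: function mapping an individual to its stratum label
    sizes: dict mapping stratum label -> number to draw from that stratum
    """
    rng = random.Random(seed)

    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)

    # Step 2: choose a separate simple random sample from each stratum.
    # Step 3: combine those samples into one stratified random sample.
    sample = []
    for label, n in sizes.items():
        sample.extend(rng.sample(strata[label], n))
    return sample

# Hypothetical population: 40 individuals in stratum "A", 60 in stratum "B".
population = [("A", i) for i in range(40)] + [("B", i) for i in range(60)]
sample = stratified_sample(population, lambda ind: ind[0], {"A": 4, "B": 6}, seed=1)
```

Here `random.sample` draws without replacement, so each per-stratum draw is a simple random sample of that stratum.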

So let's take a look at an example where stratified random sampling would be useful. So a school official wants to estimate the average number of hours per week that students devote to homework. How might we choose our strata to get a more representative stratified random sample?

So let's take a look at three ways of stratifying and see which one would give us the best stratification. All right, method one would be to stratify corresponding to class or grade level-- so freshman, sophomore, junior, senior. So remember that when we're stratifying, when we pick our strata, they need to be similar in some way that is important to the response.

Does the class or the grade level affect how much homework they do, since homework is our response? And I would say yes. I would say that a freshman probably doesn't do as much homework as a junior or senior. So I would say that this is a good way of stratifying.

Method two would be to stratify corresponding to the type of class, so either regular level classes or honors level classes. And I would say this would certainly be a good way to do it. Students in honors level classes are probably going to get more homework than students in regular level classes. So I would say that this is also a good way of stratifying.

Method three would be to stratify corresponding to the first letter of last name, so folks whose last name begins with A through E, and then last names F through K, et cetera. So, again, we want to think, does the first letter of someone's last name affect how much homework they do? And I would say no. Last name does not seem to have any effect on how much homework students are going to do. I would say that this would be a bad way of stratifying, since last name has no effect on the response.

So now suppose that the official decides to stratify by type of class, either honors level or regular level. The official knows that 30% of students are enrolled in primarily honors classes, and 70% of students are enrolled in primarily regular level classes. If the official wants to take a sample of 50 students, he should take a simple random sample of 15 honors students and a simple random sample of 35 regular level students.

So it's important that the proportions within your stratified sample match the proportions in your population. So 30% of the population are in honors classes. If we look at 15 out of the 50, that's also 30%. And then 35 out of 50 would be 70%. So our proportions would match that way.
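The arithmetic behind the 15 and 35 can be written out as a short Python sketch. The helper name `proportional_allocation` is an assumption for illustration; the 30%, 70%, and 50 come from the example.

```python
def proportional_allocation(shares, total):
    """Split a total sample size across strata in proportion to population share."""
    # Rounding is an assumption of this sketch; for shares that do not divide
    # evenly, the rounded counts may not add up to exactly `total`.
    return {label: round(share * total) for label, share in shares.items()}

allocation = proportional_allocation({"honors": 0.30, "regular": 0.70}, 50)
print(allocation)  # {'honors': 15, 'regular': 35}
```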

So what's important about a stratified random sample is to get a more representative sample. Sometimes when you just take a simple random sample, maybe just by chance you randomly get a bunch of students that just do a lot of homework. Maybe you take another sample and you get a bunch of students that don't do a lot of homework. That might throw off your data.
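To see that chance variation, here is a small simulation sketch in Python. The roster of 300 honors and 700 regular students is hypothetical, chosen only to match the 30% and 70% split; the tutorial gives no actual enrollment numbers.

```python
import random

# Hypothetical roster matching the 30% / 70% split: 300 honors, 700 regular.
roster = ["honors"] * 300 + ["regular"] * 700

def honors_in_srs(rng, n=50):
    """Count honors students in one simple random sample of n students."""
    return rng.sample(roster, n).count("honors")

rng = random.Random(0)
srs_counts = [honors_in_srs(rng) for _ in range(1000)]

# A stratified sample fixes the honors count at 15 by design.
# A simple random sample only averages near 15 and varies from draw to draw.
print("fewest honors students in one sample:", min(srs_counts))
print("most honors students in one sample:", max(srs_counts))
print("average across samples:", sum(srs_counts) / len(srs_counts))
```

The spread between the smallest and largest counts is the "just by chance" imbalance that stratifying removes.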

So to make sure you have a representative sample, sometimes it's important to stratify, so you know for a fact you're going to get some that do a lot of homework and some that don't do very much homework. So, again, it's all about getting that representative sample. And sometimes a stratified random sample is the best way to do that. So that is the tutorial on stratified random samples. Thanks for watching.

This tutorial covers cluster samples. Cluster sampling is a sampling technique where the entire population is divided into groups called clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample. The key thing there being all. So everything within the cluster will become part of your cluster sample.

So let's take a look at an example where cluster sampling would be useful. Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in Minnesota. A cluster sample could be taken by identifying the different counties in Minnesota as clusters. So I'd take a county map of Minnesota, and each county is going to represent a cluster.

So in order to take a random sample of the clusters, it would be helpful if they were numbered. So what I would do first is I'd go through and I'd number each of the counties. I'm not going to number all of them. There are 87 counties in Minnesota, so I'd number them 1 to 87. And then what I would need to do is use a table of random digits or a random number generator to randomly select however many clusters I want to be part of my sample.

So let's say I want to include five clusters in my sample. So what I'd need is five random numbers. So I generated some random numbers earlier. So then what I would need to do is if I had a fully numbered map, I'd go through and select where my counties are. So let's say that this was 22. Let's say this was 35. 39, we'll say, is here. 51 is here, and 83, let's say 83 is here.

So then once I have my clusters identified, I would need to go to each county, and I would observe all of the farms in that county and see what their pesticide use is. So then each of those farms would then become part of my sample.
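The procedure just described can be sketched in Python. The 87 counties and the choice of five clusters come from the example; the function name `cluster_sample` and the farm labels are hypothetical, and the random numbers it picks will not be the same ones used in the video.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Pick k clusters at random and include every unit inside them.

    clusters: dict mapping cluster id -> list of units in that cluster
    """
    rng = random.Random(seed)
    # Randomly select k of the numbered clusters, without replacement.
    chosen = rng.sample(sorted(clusters), k)
    # All observations in the selected clusters become part of the sample.
    sample = [unit for cid in chosen for unit in clusters[cid]]
    return chosen, sample

# Hypothetical data: counties numbered 1 to 87, each with three made-up farms.
counties = {c: [f"county{c}-farm{j}" for j in range(1, 4)] for c in range(1, 88)}
chosen, farms = cluster_sample(counties, 5, seed=0)
```

Note the contrast with stratifying: here only the clusters are chosen at random, and nothing inside a chosen cluster is left out.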

So it's easier to visit several farmers in the same county than it is to travel to each farm in a random sample to observe the use of pesticides. So let's say I was taking a random sample instead of a cluster sample. I might get a farm over here, and then a farm here, and maybe a couple farms over here, and then a couple farms in here, and then a farm over here, and a farm here, and a farm here. That's going to take a lot of money and a lot of time to go around the state to all of these different farms observing pesticide use. So it's going to be a lot more cost effective and timely just to study everybody within each of these clusters.

So, again, a cluster sample is useful when geography would make a simple random sample costly and time consuming. And just the last note here, when you're dealing with a cluster sample, a cluster sample done carefully can be treated as a random sample. So if you do perform your sampling technique carefully and thoughtfully, generally that will suffice as a random sample. So that is the tutorial on cluster samples. Thanks for watching.