Source: Blue Hand; Public Domain http://www.clker.com/clipart-darkblue-hand-print.html Pink Hand; Public Domain http://www.clker.com/clipart-right-hand-8.html
There are many ways of drawing a random sample. This tutorial talks about stratified random samples. With a stratified random sample, you start off by breaking the population into homogeneous strata. Strata is another word for groups. A single group is called a stratum.
So here, once you've broken the population into homogeneous groups, homogeneous strata, individuals are then selected from each stratum in proportion to that stratum's share of the population. An advantage to doing this is that it helps to guard against unrepresentative samples, and it can be more precise or accurate. Because you're selecting in proportion, it's likely that your eventual sample is going to look similar to the population in terms of whatever characteristic you used to define the strata.
Now, a disadvantage is it involves creating and sorting people into groups. So if that's going to take you a lot of time or money, that's going to add on to your costs. So, for example, we have Stephanie, and Stephanie is designing a new T-shirt for her company. And she wants people to vote on the color. She thinks that males and females are going to have different opinions.
So males and females, she thinks they're going to be different. And she knows her company is 10% male, and she wants to survey 20 people. So those are important pieces of information in our information prompt here. So first, we're going to start by establishing what the groups are. Then we're going to use proportions to determine how many people to select. And finally, we're going to randomly select the correct number from each stratum.
So first, establish the strata. Establish the groups. Stephanie thinks men and women are going to be different, so those are two groups-- women and men. Next, we're going to use proportions to determine how many people to select from each stratum. So Stephanie's company had 10% male, and she's surveying 20 people. So we start off with 10% male, and we're surveying 20.
Now, in our survey, we would want to retain that 10%. We would want 10% of the people we surveyed to also be male. So we're going to do that. I like to turn the percent into a fraction or a decimal. I'm going to go with the decimal, so 0.10. And then you multiply by how many people are in your survey total. So 0.10 times 20 is going to give me 2.
So when I do the survey, or when Stephanie does her survey, she'd want two people to be male. Now, to find out how many should be female, it would be either the remaining 90%, so 0.90 times 20, which is 18. Or we could have said 20 minus the 2. That's going to be everybody else. That's going to be 18.
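The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `proportional_allocation` and the dictionary layout are assumptions made for the example, while the 10% and 90% shares and the survey size of 20 come from Stephanie's example. With other numbers the rounded counts might not add up exactly to the survey size, so they would need a quick check.

```python
def proportional_allocation(sample_size, proportions):
    """Return how many individuals to draw from each stratum.

    proportions maps each stratum name to its share of the population
    (written as a decimal, e.g. 0.10 for 10%).
    """
    # Multiply each share by the total survey size, then round to a
    # whole number of people.
    return {
        stratum: round(share * sample_size)
        for stratum, share in proportions.items()
    }


allocation = proportional_allocation(20, {"male": 0.10, "female": 0.90})
print(allocation)  # {'male': 2, 'female': 18}
```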
So we've used proportions to determine how many people to select from each stratum. Now we're going to randomly select the correct number from each stratum. So you would want to use a random number generator or something like that to randomly select 2 men from the company to interview and then 18 women to interview.
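Here is one way that last step might look in Python, using the standard library's `random.sample` to draw without replacement from each stratum. The roster is made up for the sketch: a company of 100 people is an assumption (the example only gives the 10% male share), and the ID labels and the helper name `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(strata, counts, rng):
    """Randomly draw counts[name] members, without replacement, from each stratum."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, counts[name]))
    return sample


# Hypothetical roster: 100 employees is an assumed company size;
# 10 men and 90 women matches the 10% male share from the example.
strata = {
    "male": [f"M{i}" for i in range(1, 11)],
    "female": [f"F{i}" for i in range(1, 91)],
}
counts = {"male": 2, "female": 18}  # 0.10 * 20 and 0.90 * 20

rng = random.Random(42)  # fixed seed only so the draw can be repeated
survey = stratified_sample(strata, counts, rng)
print(survey)
```

Whatever the seed, the result always holds exactly 2 men and 18 women. Only which individuals are picked changes from draw to draw.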
When we talk about ways of random selection, there are a couple of things you could do. You could use a random number table. So a random number table is a table with the digits 0 through 9 arranged randomly. Each digit has no predictable relationship to the digit before or after it, and each digit appears about equally often. So here, we have 99982, 27691. Across this whole chunk of table, the digits are pretty evenly distributed, and there's no real pattern as to what comes before or after a particular number.
Another method is a random number generator. Random number generators are devices that are going to create a set of random numbers. Often, we think of them as being programs that are computing those numbers for you, so computer programs that are available online or in statistical software. Dice are another potential way of getting a random number. This has been your tutorial on stratified random sampling.
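To get a feel for what such a table looks like, here is a small Python sketch that builds one with a random number generator and then counts how often each digit shows up. The layout (10 rows of 8 five-digit blocks) and the seed are arbitrary choices for the illustration, not something taken from the table shown in the tutorial.

```python
import random
from collections import Counter

rng = random.Random(12345)  # arbitrary seed, chosen only so the table is repeatable

# Build 10 rows, each holding 8 blocks of 5 random digits (400 digits in all).
rows = []
for _ in range(10):
    blocks = ["".join(str(rng.randrange(10)) for _ in range(5)) for _ in range(8)]
    rows.append(" ".join(blocks))

print("\n".join(rows))

# Tally how often each digit appears; each count should be near 400 / 10 = 40.
digit_counts = Counter(ch for row in rows for ch in row if ch.isdigit())
print(sorted(digit_counts.items()))
```

The counts will not be exactly 40 each, which is the point of "roughly" equally distributed. They even out more as the table gets larger.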
There are many ways of collecting your random sample. The method that this tutorial will look at is a cluster sample. With a cluster sample, you start by dividing the population into roughly equal heterogeneous groups. The groups are mixed up. There are all kinds of people, or units, or objects, whatever it is you're studying, contained within each group. And then, you randomly select a couple of the groups. And then, everyone in those groups is selected for the sample.
So suppose, for example, we broke everyone up into six groups, and each group has roughly the same mix as the others. We would then randomly select a couple of the groups. So let's say we select two of the groups randomly and then interview everyone within those groups or observe every subject within those groups.
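The six-group example can be sketched in Python as follows. The population here is invented for the illustration: 30 hypothetical people split into six groups of five. The helper name `cluster_sample` is also an assumption. The key difference from the stratified approach is that the randomness is applied to whole groups, and then every member of a chosen group is taken.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters whole groups, then keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical population: six groups of five people each.
clusters = {
    f"group_{g}": [f"person_{g}_{i}" for i in range(1, 6)]
    for g in range(1, 7)
}

rng = random.Random(7)  # fixed seed only so the draw can be repeated
chosen, sample = cluster_sample(clusters, 2, rng)
print(chosen)
print(sample)
```

With two of six groups selected, the sample always has 10 people, but nobody from the other four groups has any chance of appearing once the groups are drawn.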
An advantage to doing this is that it's less costly. You don't have to travel all around to talk to every group. You don't have to interview every single person in the population. But a disadvantage is that it could be less precise or less accurate. Here's an example.
Example 1 says that each nurse is assigned primarily to one hospital. Randomly select several hospitals, then interview all the nurses at those hospitals. This is a cluster sample, because each hospital has a roughly heterogeneous mix of nurses by perhaps age or race or gender or length of time worked in a hospital. The nurses are heterogeneously grouped across the hospitals.
And then, we're randomly selecting a couple of those groups, randomly selecting a couple of hospitals. Here, the advantage is that the interviewer can stay at one hospital and interview all of the nurses there instead of having to bounce all around town to get to nurse A at hospital 1 and then the next nurse at hospital 2 and then the next nurse at hospital 3. So this is a real savings in time and in cost as well.
Second example says all houses are randomly assigned to one zip code. Randomly pick several zip codes and survey all the houses in those locations. Again, same set of advantages. The interviewer doesn't have to bounce around to different towns or even different states. He's staying within that one zip code rather than going from place A to place B to place C. He's only making a couple of jumps to visit each of the groups.
But again, the disadvantage is that he might be losing precision or accuracy in not visiting everyone. If the zip code that he first goes to is really, really different from the one next to it that he doesn't end up going to because it wasn't selected, then the opinions of the people in that second place aren't being represented in the sample, and the sample isn't as accurate or as precise. This has been your tutorial on cluster samples.