Source: Graphs created by Jonathan Ostrens
We're going to talk about sampling error and how sample size relates to it. Now, sampling error simply relates to the variability within the sampling distribution. Let's take a look. So suppose that we have built sampling distributions of the sample mean, for samples of several sizes, from this parent distribution. This is a distribution where the number one occurs about three-eighths of the time, two occurs about one-eighth of the time, three occurs about two-eighths of the time, and four also occurs about two-eighths of the time.
These are the different sampling distributions. You can see that their means are all the same. But you can also notice that their standard deviations, which are the lengths of these arrows, decrease as the sample size increases. That means that the larger the sample, the closer on average the sample statistic will be to the right answer, which is in fact the population mean, shown by this blue line.
What you'll notice is that some of the sample means from samples of size 4 are way out here, up near four or down near one, when the true population mean is two and three-eighths. Meanwhile, when you look at samples of size 20, the vast majority of these samples are between two and three-- very close to the population mean of two and three-eighths. So the distribution of sample means has a smaller standard deviation with a larger sample size.
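To see this without the graphs, here is a minimal simulation sketch. It assumes the parent distribution described earlier (one, two, three, and four with probabilities three-eighths, one-eighth, two-eighths, and two-eighths). The function names, the number of trials, and the seed are illustrative choices, not part of the lesson. It draws many samples of size 4 and size 20, and compares the spread of the sample means with the theoretical value, the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

# Parent distribution from the lesson: 1, 2, 3, 4 occur 3/8, 1/8, 2/8, 2/8 of the time.
VALUES = [1, 2, 3, 4]
WEIGHTS = [3, 1, 2, 2]  # relative weights, in eighths


def population_mean():
    """Exact mean of the parent distribution (19/8 = 2.375)."""
    return sum(v * w for v, w in zip(VALUES, WEIGHTS)) / sum(WEIGHTS)


def population_sd():
    """Exact standard deviation of the parent distribution."""
    mu = population_mean()
    variance = sum(w * (v - mu) ** 2 for v, w in zip(VALUES, WEIGHTS)) / sum(WEIGHTS)
    return variance ** 0.5


def simulate_sample_means(n, trials=20000, seed=0):
    """Draw `trials` samples of size n and return the mean of each one.

    `trials` and `seed` are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(VALUES, weights=WEIGHTS, k=n))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    for n in (4, 20):
        means = simulate_sample_means(n)
        print(
            f"n={n}: mean of sample means = {statistics.fmean(means):.3f}, "
            f"sd of sample means = {statistics.pstdev(means):.3f}, "
            f"theory = {population_sd() / n ** 0.5:.3f}"
        )
```

Both sample sizes should give a mean of sample means close to two and three-eighths, while the standard deviation for samples of size 20 should come out noticeably smaller than for samples of size 4.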
So when we take larger samples, the sampling error decreases on average. Sampling error is the amount by which a sample statistic, like a sample mean, is off from the population parameter, which is a fixed value that we're trying to estimate. So when we calculate margin of error, we're approximating the sampling error. We're saying that our sample statistic is probably within this margin of error of the right answer, which we don't know.
So think of a poll where you found that 60% of people are going to vote for a particular candidate for office, and you report it with a margin of error. You're saying that our sample gave us 60%, but we think that the real population proportion of people who are going to vote for that candidate is 60% plus or minus some amount.
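The lesson doesn't give a formula for that "some amount," so the following is only a sketch under stated assumptions: a simple random sample, a 95% confidence level (z = 1.96), and the usual normal-approximation formula for a proportion. The sample sizes are hypothetical.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample sizes for a poll that came back at 60%.
for n in (100, 400, 1600):
    moe = margin_of_error(0.60, n)
    print(f"n={n}: 60% plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions, each time the sample size is quadrupled the margin of error is cut in half, which matches the idea that larger samples have less sampling error.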
Sampling error occurs-- and it's not our fault-- whenever we use a statistic, like a sample mean or sample proportion, to estimate a parameter, like a population mean or population proportion. And what we notice from the previous sampling distributions is that, since the sampling error decreases as the sample size increases, we would like as large a sample as we can get.
Now sometimes getting a large sample is precluded by practical concerns like money or time. Maybe you just don't have the money or time to take a large sample. Maybe you're confined to a small sample. And that's fine. But you would like a larger sample if you can get one. Now that's the ideal.
But an increased sample size has to be coupled with well-collected data. An increased sample size does not rescue poorly collected data. If your data are biased, you can't say, they're biased, but we'll just double the sample size and everything will be OK, because we'll have less sampling error. It doesn't work that way.
If the questions are poorly worded, or there's nonresponse or other biases like response bias, then the data are not going to become any more accurate. The estimates are not going to approach the population parameters that you're trying to estimate just because you took a larger sample. It's just not going to work. Once you've collected your data poorly, you might as well throw it out. An increased sample size does not rescue it.
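Here is a small sketch of why a bigger sample doesn't fix bias, using a made-up nonresponse scenario. Everything in it is hypothetical: the true support of 60%, the response rates, and the function name are assumptions chosen only for illustration. Supporters are assumed to answer the poll less often than everyone else, so the people who respond are not representative.

```python
import random

TRUE_SUPPORT = 0.60             # hypothetical true population proportion
RESPONSE_RATE_SUPPORTER = 0.30  # hypothetical: supporters respond less often
RESPONSE_RATE_OTHER = 0.60      # hypothetical: everyone else responds more often


def biased_poll(n_contacted, seed=0):
    """Contact n people, keep only those who respond, return the estimated support."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_contacted):
        supports = rng.random() < TRUE_SUPPORT
        rate = RESPONSE_RATE_SUPPORTER if supports else RESPONSE_RATE_OTHER
        if rng.random() < rate:  # this person actually answers the poll
            responses.append(supports)
    return sum(responses) / len(responses)


for n in (1000, 10000, 100000):
    print(f"contacted {n}: estimated support = {biased_poll(n):.3f}")
```

With these made-up response rates, the estimate settles near 0.18 / 0.42, or about 43%, no matter how many people are contacted. The larger sample only makes the poll more consistently wrong, it doesn't move it toward the true 60%.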
And so to recap, sample statistics estimate population parameters. And they do it more accurately when the sample size is large. Now oftentimes, we don't know what the population parameter is. And that's why we've taken the sample in the first place-- to try and estimate the parameter. But if the data were properly collected-- and hopefully they were-- and the sample size is large-- which is ideal-- we can be pretty sure that the statistic that we get is close to the right answer, the parameter for the population. So we talked about sampling error and how it decreases when sample size goes up. Good luck and we'll see you next time.