This tutorial focuses on sampling error and how sample size relates to sampling error.
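Sampling error is tied to the variability within the sampling distribution: the more spread out the sampling distribution is, the farther a typical sample statistic falls from the population parameter.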
Take a look. Suppose that you have built sampling distributions for samples of several different sizes drawn from this parent distribution.
This is a distribution in which the number one occurs about three-eighths of the time, two occurs about one-eighth of the time, three occurs about two-eighths of the time, and four also occurs about two-eighths of the time.
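As a quick check of the population mean of "two and three-eighths" quoted later in this tutorial, here is a minimal Python sketch that computes the parent distribution's mean and standard deviation from the probabilities listed above. The dictionary and function names are illustrative choices, not part of the original lesson.

```python
from fractions import Fraction
import math

# Parent distribution described above: value -> probability (hypothetical name)
PARENT = {
    1: Fraction(3, 8),
    2: Fraction(1, 8),
    3: Fraction(2, 8),
    4: Fraction(2, 8),
}

def population_mean(dist):
    """Expected value: sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

def population_sd(dist):
    """Population standard deviation: square root of E[(X - mu)^2]."""
    mu = population_mean(dist)
    variance = sum(p * (x - mu) ** 2 for x, p in dist.items())
    return math.sqrt(variance)

assert sum(PARENT.values()) == 1  # probabilities must add up to 1
mu = population_mean(PARENT)
print(mu, float(mu))  # 19/8 2.375
print(round(population_sd(PARENT), 4))
```

Using `Fraction` keeps the mean exact, so it prints as 19/8, which is two and three-eighths.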
These are the different sampling distributions.
You can see that their means are all the same. But you can also notice that their standard deviations, which are the lengths of the arrows, decrease as the sample size increases. That means that the larger the sample, the closer, on average, the sample statistic will be to the right answer: the population mean, represented by the blue line down the middle of the graphs above.
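The shrinking arrows match a standard result: for independent draws, the standard deviation of the sampling distribution of the mean is σ/√n. The Python sketch below evaluates that formula for this parent distribution. It assumes independent draws (for example, sampling with replacement), uses σ = √95/8 ≈ 1.2183 computed from the probabilities above, and treats the sample sizes other than 4 and 20 as illustrative, since the figure's exact sizes aren't recoverable.

```python
import math

# Population SD of the parent distribution: sqrt(95)/8, about 1.2183
SIGMA = math.sqrt(95) / 8

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean
    for n independent draws: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)

for n in (1, 4, 20, 100):  # illustrative sample sizes
    print(n, round(sd_of_sample_mean(SIGMA, n), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the spread, which is why each extra observation buys a little less precision than the one before.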
Sampling error is the amount by which a sample statistic, like a sample mean, is off from the population parameter. The population parameter, not the sampling error, is the fixed value that you're trying to estimate.
What you'll notice is that some of the means from samples of size 4 are way up near four or down near one, even though the true population mean is two and three-eighths. Meanwhile, the vast majority of the means from samples of size 20 fall between two and three, very close to the population mean of two and three-eighths. So the distribution of sample means has a smaller standard deviation when the sample size is larger.
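To reproduce the comparison between samples of size 4 and size 20, here is a minimal Python simulation sketch. It assumes each sample is drawn independently (with replacement) from the parent distribution; the number of repetitions, the seed, and the helper name `sample_means` are arbitrary choices for illustration, so the exact printed numbers depend on them.

```python
import random
import statistics

VALUES = [1, 2, 3, 4]
WEIGHTS = [3, 1, 2, 2]  # in eighths: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8
TRUE_MEAN = 19 / 8      # two and three-eighths

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from the parent distribution
    and return the mean of each sample."""
    return [statistics.fmean(rng.choices(VALUES, weights=WEIGHTS, k=n))
            for _ in range(reps)]

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (4, 20):
    means = sample_means(n, reps=10_000, rng=rng)
    share_2_to_3 = sum(2 <= m <= 3 for m in means) / len(means)
    print(f"n={n:2d}  mean of means={statistics.fmean(means):.3f}  "
          f"SD={statistics.pstdev(means):.3f}  share in [2, 3]={share_2_to_3:.2f}")
```

Both sample sizes center on roughly 2.375, but the size-20 means have a much smaller standard deviation and a much larger share falling between two and three.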
With larger samples, the sampling error decreases on average: the sample statistic tends to land closer to the population parameter, the fixed value you're trying to estimate. When you calculate a margin of error, you're approximating the sampling error. In effect, you're saying that the sample statistic is probably within this margin of error of the right answer, which you don't know.
Think of a poll that found 60% of people are going to vote for a particular candidate for office. You report that result with a margin of error. You're saying that your sample gave you 60%, but that the real population proportion of people who are going to vote for that candidate is probably 60% plus or minus some amount.
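One common way to put a number on that "plus or minus some amount" for a proportion is the normal-approximation formula z·√(p̂(1 − p̂)/n). The Python sketch below applies it to the 60% poll. It assumes a simple random sample and about 95% confidence (z = 1.96); the poll sizes are hypothetical, since the tutorial doesn't give one, and the function name is illustrative.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0 <= p_hat <= 1 or n < 1:
        raise ValueError("p_hat must be in [0, 1] and n must be positive")
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll sizes; only the 60% figure comes from the example above.
for n in (100, 400, 1600):
    moe = margin_of_error(0.60, n)
    print(f"n={n:4d}: 60% plus or minus {moe * 100:.1f} percentage points")
```

The margin drops from about 9.6 to 4.8 to 2.4 percentage points: each fourfold increase in sample size halves it. Note that this formula only estimates sampling error; it says nothing about bias in how the data were collected.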
Sampling error occurs whenever you use a statistic, like a sample mean or sample proportion, to estimate a parameter, like a population mean or population proportion, and it's not your fault. Because the previous sampling distributions show that sampling error decreases as the sample size increases, you would like as large a sample as possible.
Sometimes getting a large sample is precluded by practical concerns like money or time. Maybe you just don't have the money or time to take a large sample. Maybe you're confined to a small sample. And that's fine. But you would like a larger sample if you can get one. That's the ideal.
An increased sample size has to be coupled with well-collected data. An increased sample size does not rescue poorly collected data. If your data are biased, you can't say, "They're biased, but we'll just double the sample size and everything will be OK, because we'll have less sampling error." It doesn't work that way.
If the questions are poorly worded, or there's nonresponse or some other bias such as response bias, then taking a larger sample won't make the data any more accurate. The statistics won't approach the population parameters that you're trying to estimate; they'll just settle more precisely on the wrong value. It's just not going to work.
Once you've collected your data poorly, you might as well throw it out. An increased sample size does not rescue it.
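To see why a bigger sample can't fix biased data, here is a minimal Python simulation sketch of the poll example under a simple, made-up nonresponse model: supporters answer the poll more often than non-supporters. The true proportion of 0.60 echoes the earlier poll, but the response rates, seed, and function name are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical numbers for illustration only
TRUE_P = 0.60          # true share of voters who support the candidate
RESPOND_IF_YES = 0.8   # chance a supporter answers the poll
RESPOND_IF_NO = 0.5    # chance a non-supporter answers the poll

def poll(n, rng, biased):
    """Contact n people and return the share of respondents who say yes.

    With biased=True, supporters respond more often than non-supporters
    (a simple nonresponse-bias model); with biased=False, everyone responds.
    """
    yes = responded = 0
    for _ in range(n):
        supports = rng.random() < TRUE_P
        if biased:
            rate = RESPOND_IF_YES if supports else RESPOND_IF_NO
            if rng.random() >= rate:
                continue  # this person did not respond
        responded += 1
        yes += supports
    if responded == 0:
        raise ValueError("nobody responded; try a larger n")
    return yes / responded

# Value the biased poll converges to, no matter how large n gets
BIASED_LIMIT = (TRUE_P * RESPOND_IF_YES) / (
    TRUE_P * RESPOND_IF_YES + (1 - TRUE_P) * RESPOND_IF_NO)

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    fair = poll(n, rng, biased=False)
    skewed = poll(n, rng, biased=True)
    print(f"n={n:>7,}: fair poll {fair:.3f}, biased poll {skewed:.3f}")
print(f"true proportion {TRUE_P:.3f}, biased poll settles near {BIASED_LIMIT:.3f}")
```

As n grows, the fair poll closes in on 0.600, while the biased poll closes in on about 0.706: the larger sample shrinks the sampling error but leaves the bias untouched.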
Sample statistics estimate population parameters, and they do so more accurately when the sample size is large. Oftentimes, we don't know what the population parameter is; that's why we took the sample in the first place, to try to estimate the parameter. But if the data were properly collected and the sample size is large, you can be fairly confident that the statistic you get is close to the right answer, the population parameter. When you calculate a margin of error, you're approximating the sampling error.
Good luck!
Source: This work adapted from Sophia Author Jonathan Osters.
Sample Size: The size of a sample of a population of interest.
Sampling Error: The amount by which the sample statistic differs from the population parameter.