Source: Graphs and table created by Jonathan Osters
In this tutorial you're going to learn about the distribution of sample means.
So, a real quick review: a sampling distribution is the distribution of all possible values a statistic might take across all possible samples of a given size. So let's take a look at how we might create one of those.
So here's our handy dandy spinner with three 1s, a 2, two 3s, and two 4s, so eight equally sized sectors. If you spun it four times to obtain a sample mean, the sample mean might be 2.5, the result of a sample where the first spin was a 2, the second spin was a 4, the third spin was a 3, and the fourth spin was a 1.
Or the sample might look like this, in which case its mean would be 2.25. Or it might look like this, in which case the sample mean is 3.5. Or the mean might be 2, or 1.5, or 1.25. There are lots and lots of possible samples of size 4 that could be taken from this spinner, and lots and lots of possible means that could arise from those samples.
What we're going to do to create a sampling distribution is take those sample means and place them here on the x-axis of a graph, one at a time, over and over. And if we went to the extreme and took every possible sample of size 4 from that spinner, the graph would look something like this.
So consider every possible set of four outcomes, and every possible mean that could arise. If we took every possible scenario and plotted its mean, we would create a sampling distribution of sample means. You might notice that a sample mean of 4 happens occasionally; that requires spinning a 4 all four times. A sample mean of 1 occurs sometimes too. But it seems like the most common values for the sample means are between 2 and 3.
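The process just described can be sketched with a quick simulation. This is an illustrative approximation, not the tutorial's actual graphs: rather than enumerating every possible sample, it draws a large number of random samples of size 4 from the spinner and collects their means.

```python
import random
from statistics import mean

# The spinner from the tutorial: three 1s, a 2, two 3s, and two 4s,
# eight equally sized (and therefore equally likely) sectors.
SPINNER = [1, 1, 1, 2, 3, 3, 4, 4]

random.seed(0)  # for reproducibility

# Spin four times, record the sample mean, and repeat many times.
sample_means = [mean(random.choices(SPINNER, k=4)) for _ in range(100_000)]

# The population mean is (3*1 + 1*2 + 2*3 + 2*4) / 8 = 19/8 = 2.375,
# and the average of the simulated sample means lands close to it.
print(round(mean(sample_means), 2))
```

Plotting `sample_means` as a histogram would reproduce the bell-ish shape described above, with means of exactly 1 or 4 rare and values between 2 and 3 most common.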
Now, this is that same graph converted into a histogram, so you can notice it's sort of bell-shaped. Compare it with the sampling distribution when the sample size is 1: there, 1 occurs about 3/8 of the time, 2 occurs about 1/8 of the time, 3 occurs about 1/4 of the time, and 4 occurs about 1/4 of the time. You'll notice that the shape of the distribution of sample means when the sample size is 4 is significantly different from when the sample size was 1.
And then look at what happens at 9 and 20: these are the averages from samples of size 9, and these are the averages from samples of size 20. There are some things you should be able to recognize about all four of these distributions, some similarities and some differences.
The similarity is their centers: all of them are centered at 2 and 3/8. You'll notice that some of these are more tightly packed around that number than others; the samples of size 20, for instance, are more tightly packed around it than the samples of size 1. But they are all centered at that very same number. What we can see here is that the mean of the sampling distribution of sample means is the same as the mean of the population, in this case 2 and 3/8.
How about spread? The arrows on each of these indicate the standard deviation of each distribution. Notice the arrows on the first distribution are very wide, and they seem to diminish in size with each distribution graphed below it. By the time we get to the lowest distribution, where the sample size was 20, the spread is much, much less.
So the rule being followed is that the standard deviation of the sampling distribution of sample means is the standard deviation of the population divided by the square root of the sample size. That indicates that when the sample size is 4, the standard deviation of the sampling distribution of sample means is half as large as it was when the sample size was 1. When the sample size is 9, it's one third the size of the original standard deviation. And when n is 20, it's the original standard deviation divided by the square root of 20.
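The σ/√n rule can be checked directly for the spinner population. This short sketch computes the population standard deviation from the eight sector values and then the standard deviation of the sampling distribution for each of the sample sizes discussed above.

```python
import math

# The spinner population: three 1s, a 2, two 3s, and two 4s.
values = [1, 1, 1, 2, 3, 3, 4, 4]

# Population mean and (population) standard deviation.
mu = sum(values) / len(values)                                   # 2.375
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

# Standard deviation of the sampling distribution: sigma / sqrt(n).
for n in (1, 4, 9, 20):
    print(n, round(sigma / math.sqrt(n), 3))
```

Note that n = 4 yields exactly half the original standard deviation and n = 9 exactly one third, matching the rule stated above.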
And then finally, having measured center and spread, let's describe the shape of these distributions. You'll notice that the shape becomes more and more like the normal distribution as the sample size increases. There's a theorem that describes this, and it's called the Central Limit Theorem. It says that when the sample size is large (at least 30 for most distributions with a finite standard deviation), the sampling distribution of the sample means is approximately normal. That means we can use the normal distribution to calculate probabilities about sample means, which is nice because normal calculations are easy to do.
So it's going to be normal, or approximately normal, with a mean the same as that of the population and a standard deviation equal to the standard deviation of the population divided by the square root of the sample size. Once again, the mean of the population is the mean of the sampling distribution, and the standard deviation of the population divided by the square root of the sample size is the standard deviation of the sampling distribution. The standard deviation of the sampling distribution is also called the standard error.
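Here is one way such a normal calculation might look. The sample size of 36 and the probability question are hypothetical, not from the tutorial; the population mean (2.375) and standard deviation (about 1.218) come from the spinner.

```python
from statistics import NormalDist

# Spinner population parameters; n = 36 is a hypothetical large sample
# (comfortably past the Central Limit Theorem's rule of thumb of 30).
mu, sigma, n = 2.375, 1.218, 36

# Standard error: population standard deviation over sqrt(n).
standard_error = sigma / n ** 0.5   # 1.218 / 6 = 0.203

# The sampling distribution is approximately normal, so we can use
# normal calculations, e.g. P(sample mean > 2.5).
sampling_dist = NormalDist(mu=mu, sigma=standard_error)
p = 1 - sampling_dist.cdf(2.5)
print(round(p, 3))
```

The same calculation done by hand would standardize first: z = (2.5 − 2.375) / 0.203 ≈ 0.62, then look up the tail area for that z-score.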
So to recap: the sampling distribution of sample means is approximately normal when the sample size is large. That's the Central Limit Theorem. Its mean is the mean of the population, and its standard deviation, which is also called the standard error, is the standard deviation of the population divided by the square root of the sample size.
So we talked about the distribution of sample means, which is called the sampling distribution of sample means, and about the standard deviation of that distribution, which is also called the standard error. Good luck, and we'll see you next time.