In this tutorial, you're going to learn about the distribution of sample means, focusing specifically on the sampling distribution of sample means: its center, its spread, and its shape.
A sampling distribution is the distribution of all possible values a statistic might take in all possible samples of a given size. Let's take a look at how we might create one of those.
Here's our handy dandy spinner with three 1s, a 2, two 3s, and two 4s, so 8 equally sized sectors.
If you spun it four times to obtain a sample mean, the sample mean might be 2.5, the result of a sample where the first spin was a 2, the second was a 4, the third was a 3, and the fourth was a 1. Another sample might have a mean of 2.25, or 3.5, or 2, or 1.5, or 1.25. There are lots, and lots, and lots of possible samples of size 4 that could be taken with this spinner. And there are lots, and lots of possible means that could arise from those samples.
To create a sampling distribution, take those sample means and place them on the x-axis of a graph, one at a time, over and over.
If you went to the extreme and took every possible sample of size 4 for that spinner, the graph would look something like this:
Consider every possible set of four outcomes, and every possible mean that could arise. If we took every possible scenario and plotted its mean we would create a sampling distribution of sample means.
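Before enumerating every case, one way to get a feel for this distribution is to simulate it. The sketch below (a minimal illustration, not part of the original tutorial) encodes the spinner's eight sectors described above, spins it in software many times, and tallies the resulting sample means:

```python
import random
from collections import Counter

# The spinner's eight equally sized sectors: three 1s, one 2, two 3s, two 4s.
SECTORS = [1, 1, 1, 2, 3, 3, 4, 4]

def sample_mean(n, rng):
    """Spin the spinner n times and return the mean of the spins."""
    return sum(rng.choice(SECTORS) for _ in range(n)) / n

rng = random.Random(0)
# Approximate the sampling distribution with 10,000 simulated samples of size 4.
means = [sample_mean(4, rng) for _ in range(10_000)]
tally = Counter(means)
# Means between 2 and 3 dominate; a mean of 4 (all four spins landing on 4) is rare.
```

Plotting `tally` as a histogram would approximate the exhaustive graph described above.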
You might notice that a sample mean of 4 happens occasionally; that requires getting a 4 on every spin. A sample mean of 1 occurs sometimes as well. But the most common values for the sample means are between 2 and 3.
Now this is that same graph converted into a histogram.
You notice it’s sort of bell shaped.
This would be the sampling distribution if the sample size was 1.
Since each sample consists of a single spin, this is just the population distribution: 1 occurs 3/8 of the time, 2 occurs 1/8 of the time, 3 occurs 1/4 of the time, and 4 occurs 1/4 of the time.
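Those proportions are exact, since each of the eight sectors is equally likely. A small sketch (assuming the same sector layout as the spinner above) computes them, along with the population mean, using exact fractions:

```python
from fractions import Fraction
from collections import Counter

# The spinner's eight equally sized sectors.
SECTORS = [1, 1, 1, 2, 3, 3, 4, 4]

counts = Counter(SECTORS)
# Each sector has probability 1/8, so P(value) = count / 8.
probs = {v: Fraction(c, len(SECTORS)) for v, c in counts.items()}
# probs == {1: 3/8, 2: 1/8, 3: 1/4, 4: 1/4}

# Population mean: sum of each value times its probability.
pop_mean = sum(v * p for v, p in probs.items())  # 19/8, i.e. 2 and 3/8
```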
Notice that when the sample size is 4, the shape of the distribution of sample means is significantly different from when the sample size was 1.
Then look at what happens at 9 and 20.
These are the averages from samples of size 9.
And these are the averages from samples of size 20.
There should be some things that you recognize here about all four of these. There are some similarities and some differences.
Similarities are their centers.
All of them are centered at 2 and 3/8, the same place. You'll notice that some of these are more tightly packed around that number than others; the samples of size 20, for instance, are more tightly packed than the samples of size 1. But they are all centered at that very same number. What we can see here is that the mean of the sampling distribution of sample means is the same as the mean of the population.
In this case it was 2 and 3/8.
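For samples as small as size 4, this claim can be checked exhaustively. The sketch below (an illustration, assuming the same spinner) enumerates all 8^4 = 4,096 equally likely sequences of four spins and confirms that the mean of all the sample means equals the population mean:

```python
from fractions import Fraction
from itertools import product

SECTORS = [1, 1, 1, 2, 3, 3, 4, 4]
pop_mean = Fraction(sum(SECTORS), len(SECTORS))  # 19/8

# Every possible ordered sequence of four spins, each equally likely.
all_means = [Fraction(sum(spins), 4) for spins in product(SECTORS, repeat=4)]

mean_of_means = sum(all_means, Fraction(0)) / len(all_means)
assert mean_of_means == pop_mean  # both are exactly 2 and 3/8
```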
Compare the spreads below for each n.
The arrows on each of the graphs indicate the standard deviation of each distribution.
Notice the arrows on the first distribution are very wide. Also, they seem to diminish in size as each distribution is graphed. When you get to the lowest distribution, where the sample size was 20, its spread is much, much less.
The rule that's being followed is that the standard deviation for the sampling distribution of sample means is the standard deviation of the population divided by the square root of sample size.
What that indicates is that when the sample size is 4, the standard deviation of the sampling distribution of sample means is going to be half as large as it was when the sample size was 1. When the sample size is 9, it's going to be 1/3 the size of the original standard deviation. And when n is 20, it's going to be the original standard deviation divided by the square root of 20. Having measured the center and the spread, let's describe the shape of those distributions.
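In symbols, the rule is that the standard deviation of the sampling distribution equals sigma divided by the square root of n. A quick sketch (using the spinner's population distribution from above) shows the shrinking spread for each of the four sample sizes:

```python
import math

# Population distribution of the spinner.
probs = {1: 3 / 8, 2: 1 / 8, 3: 1 / 4, 4: 1 / 4}

mu = sum(v * p for v, p in probs.items())                            # 2.375
sigma = math.sqrt(sum(p * (v - mu) ** 2 for v, p in probs.items()))  # ~1.218

for n in (1, 4, 9, 20):
    se = sigma / math.sqrt(n)  # standard deviation of the sampling distribution
    print(f"n = {n:2d}: standard error = {se:.3f}")
```

For n = 4 the result is exactly half of sigma, and for n = 9 exactly one third, matching the description above.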
You'll notice that the shape is becoming more and more like the normal distribution as the sample size increases. There's a theorem that we have that describes that. And it's called the Central Limit Theorem. It says that when the sample size is large, large being at least 30 for most distributions with a finite standard deviation, the sampling distribution of the sample means is approximately normal.
This means we can use the normal distribution to calculate probabilities on them which is nice because normal calculations are easy to do. It's going to be normal, or approximately normal, with a mean of the same as that of the population and a standard deviation equal to the standard deviation of the population divided by the square root of sample size.
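As a sketch of such a calculation (the sample size of 36 and the cutoff of 2.5 are made-up values for illustration), the normal CDF can be evaluated with the error function from the standard library:

```python
import math

def normal_cdf(x, mu, sd):
    """P(X <= x) for a normal distribution with mean mu and std dev sd."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# Spinner population: mean 2.375, standard deviation sqrt(95)/8 (about 1.218).
mu = 2.375
sigma = math.sqrt(95) / 8

# Hypothetical question: with n = 36 spins, what is P(sample mean > 2.5)?
n = 36
se = sigma / math.sqrt(n)        # standard error, about 0.203
p = 1 - normal_cdf(2.5, mu, se)  # about 0.27
```

Since n = 36 is at least 30, the Central Limit Theorem justifies treating the sample mean as approximately normal here.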
Once again the mean of the population is going to be the mean of the sampling distribution. The standard deviation of the population divided by the square root of sample size is going to be the standard deviation of the sampling distribution. The standard deviation of the sampling distribution is also called the standard error.
To summarize: the distribution of sample means is called the sampling distribution of sample means. By the Central Limit Theorem, it is approximately normal when the sample size is large. Its mean is the mean of the population, and its standard deviation, also called the standard error, is the standard deviation of the population divided by the square root of the sample size.
Source: This work adapted from Sophia Author Jonathan Osters.
Sampling Distribution of Sample Means: A distribution where each data point consists of a mean of a collected sample. For a given sample size, every possible sample mean will be plotted in the distribution.
Standard Deviation of a Sampling Distribution of Sample Means: The standard deviation of the population, divided by the square root of the sample size.
Standard Error: The standard deviation of the sampling distribution of sample means.