This tutorial covers the shape of a sampling distribution. You’ll learn about:
In a sampling distribution, the center is the same as the center of the original distribution. That is to say, the mean of all the x-bar averages is the same as the mean of the original distribution. Shape is also a characteristic of a sampling distribution; it is what the Central Limit Theorem describes, and it will be discussed in the next section.
The spread, or standard deviation, is the original standard deviation divided by the square root of the sample size. That is to say, the standard deviation of all the x-bar averages is equal to the original standard deviation divided by the square root of n, n being the sample size.
Those two characteristics are notated like this:
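The original notation is not reproduced here, so the block below is a reconstruction in the conventional symbols, which are an assumption: μ and σ for the mean and standard deviation of the original distribution, x̄ for a sample mean, and n for the sample size.

```latex
\mu_{\bar{x}} = \mu
\qquad\text{and}\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```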
So consider a spinner that can land on 1, 2, 3, or 4, with 1 taking up the largest share of the dial and 2 the smallest.
Consider the sampling distributions produced by averaging different numbers of spins. Well, it's pretty obvious that if you spun it once, you would get a 1 about three out of eight times, a 2 about one out of eight times, a 3 about two out of eight times, and a 4 about two out of eight times, making the distribution look something like this:
You can see here that 1 is the most common, 3 and 4 are equally common and next most common, and 2 is the least common.
What about the sampling distribution if you averaged four spins? Well, you wouldn't just have options of 1, 2, 3, and 4 anymore. You'd have options of 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, and so on up to 4.
Having more options would necessarily decrease the likelihood of getting all 1s or all 4s. The distribution would look something like this:
Getting four 1s or four 4s would be extremely unlikely. The most likely scenario is averaging 2.25. There are a few more 1s on the spinner than anything else, which pulls the mean down a little from the 2.5 you might have expected to 2.375. This distribution is slightly skewed to the right.
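The four-spin case can be checked exactly rather than read off a graph. The sketch below assumes the spinner probabilities given earlier (3/8, 1/8, 2/8, 2/8) and independent spins; the function name `mean_distribution` is made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Spinner from the text: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8.
SPINNER = {1: Fraction(3, 8), 2: Fraction(1, 8), 3: Fraction(2, 8), 4: Fraction(2, 8)}

def mean_distribution(spinner, n):
    """Exact distribution of the average of n independent spins."""
    sums = {0: Fraction(1)}  # distribution of the running total
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for total, p in sums.items():
            for value, q in spinner.items():
                nxt[total + value] += p * q
        sums = dict(nxt)
    # Turn each total into an average by dividing by n.
    return {Fraction(total, n): p for total, p in sums.items()}

dist4 = mean_distribution(SPINNER, 4)
mode4 = max(dist4, key=dist4.get)                 # most likely average
mean4 = sum(x * p for x, p in dist4.items())      # mean of all the averages

print("possible averages:", [float(x) for x in sorted(dist4)])
print("most likely average:", float(mode4))
print("mean of the averages:", float(mean4))
print("P(four 1s):", float(dist4[Fraction(1)]))
print("P(four 4s):", float(dist4[Fraction(4)]))
```

Exact fractions are used instead of simulation so the result does not depend on a random seed.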
What about if you averaged nine spins? Well, the probability that you get all 1s, and therefore average a 1, goes down even further. And the probability that you get all 4s goes down to almost zero:
As this graph shows, it's possible to get all 1s, but it’s not very likely. It's a lot more common to get something between 2 and 3. The spread of the sampling distribution is decreasing as n gets bigger.
As the previous graphs show, the shape of the distribution changes as n changes.
Say you are averaging 20 spins:
It's almost impossible to average a 1, a 4, or even something close to 3. You're almost guaranteed to average something between 2 and 3. The spread, again, is decreasing.
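The shrinking spread can be put into numbers. This is a minimal sketch, assuming the same spinner probabilities as above and independent spins; `mean_distribution` and `mean_and_sd` are illustrative helper names. It compares the standard deviation of the averages with the original standard deviation divided by the square root of n.

```python
import math
from fractions import Fraction

# Spinner from the text: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8.
SPINNER = {1: Fraction(3, 8), 2: Fraction(1, 8), 3: Fraction(2, 8), 4: Fraction(2, 8)}

def mean_distribution(spinner, n):
    """Exact distribution of the average of n independent spins."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in sums.items():
            for value, q in spinner.items():
                nxt[total + value] = nxt.get(total + value, 0) + p * q
        sums = nxt
    return {Fraction(total, n): p for total, p in sums.items()}

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return float(mu), math.sqrt(var)

pop_mean, pop_sd = mean_and_sd(SPINNER)
results = {}
for n in (1, 4, 9, 20):
    results[n] = mean_and_sd(mean_distribution(SPINNER, n))
    m, s = results[n]
    print(f"n={n:2d}  mean={m:.4f}  sd={s:.4f}  sigma/sqrt(n)={pop_sd / math.sqrt(n):.4f}")
```

The mean column stays fixed while the last two columns agree and shrink together, which is the center and spread behavior described at the start of the tutorial.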
What would you say is happening to the shape?
As n increases, the shape is becoming more normal.
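One hedged way to quantify "more normal" is with skewness and excess kurtosis, both of which are 0 for a normal distribution. These two measures are a choice made for this sketch, not something the tutorial itself uses, and the helper names are illustrative. The spinner probabilities and independent spins are assumed as before.

```python
from fractions import Fraction

# Spinner from the text: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8.
SPINNER = {1: Fraction(3, 8), 2: Fraction(1, 8), 3: Fraction(2, 8), 4: Fraction(2, 8)}

def mean_distribution(spinner, n):
    """Exact distribution of the average of n independent spins."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in sums.items():
            for value, q in spinner.items():
                nxt[total + value] = nxt.get(total + value, 0) + p * q
        sums = nxt
    return {Fraction(total, n): p for total, p in sums.items()}

def shape_measures(dist):
    """Return (skewness, excess kurtosis); both are 0 for a normal curve."""
    mu = sum(x * p for x, p in dist.items())

    def central(k):
        # k-th central moment of the distribution
        return float(sum((x - mu) ** k * p for x, p in dist.items()))

    m2, m3, m4 = central(2), central(3), central(4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

shape = {n: shape_measures(mean_distribution(SPINNER, n)) for n in (1, 4, 9, 20)}
for n, (skew, kurt) in shape.items():
    print(f"n={n:2d}  skewness={skew:+.4f}  excess kurtosis={kurt:+.4f}")
```

Both columns move toward 0 as n grows. The skewness stays slightly positive, matching the slight right skew noted for four spins.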
The changes in shape that we saw above are what the Central Limit Theorem deals with. The Central Limit Theorem discusses the shape of a sampling distribution.
Central Limit Theorem
A theorem that explains the shape of a sampling distribution of sample means. It states that if the sample size is large (generally n ≥ 30), and the standard deviation of the population is finite, then the distribution of sample means will be approximately normal.
So the Central Limit Theorem says that when the sample size is large, the shape of the sampling distribution of means becomes very nearly normal.
How large is large enough for the distribution to be considered approximately normal? In our distributions so far in this tutorial, is it approximately normal when n is nine? Is it approximately normal when n is 20? What constitutes a large enough sample so that when you show the distribution of all the averages, you get a normal distribution?
The definition of “large” will be different depending on what the original distribution looked like. Our original distribution was almost uniform, so it didn't take very many trials. If the distribution had been heavily skewed, it would have taken more trials to average out some of those high numbers with some of those low numbers.
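To see why a heavily skewed parent needs more trials, the sketch below compares the spinner with a made-up skewed distribution (mostly 1s with an occasional 10; this `SKEWED` example is hypothetical and not from the tutorial). It relies on a standard fact for independent draws: the skewness of the average of n draws is the parent's skewness divided by the square root of n.

```python
import math
from fractions import Fraction

# Spinner from the text: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8.
SPINNER = {1: Fraction(3, 8), 2: Fraction(1, 8), 3: Fraction(2, 8), 4: Fraction(2, 8)}

# Hypothetical heavily skewed parent (an assumption for illustration only).
SKEWED = {1: Fraction(9, 10), 10: Fraction(1, 10)}

def parent_skewness(dist):
    """Skewness of a single draw from a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    third = sum((x - mu) ** 3 * p for x, p in dist.items())
    return float(third) / float(var) ** 1.5

def skewness_of_mean(dist, n):
    """Skewness of the average of n independent draws: parent skewness / sqrt(n)."""
    return parent_skewness(dist) / math.sqrt(n)

for n in (1, 9, 30, 100, 1000):
    print(f"n={n:4d}  spinner={skewness_of_mean(SPINNER, n):.4f}  "
          f"skewed={skewness_of_mean(SKEWED, n):.4f}")
```

At every n the skewed parent's averages remain far more lopsided than the spinner's, so it takes a much larger n before their distribution looks symmetric.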
To be on the safe side, 30 is going to be a good sample size here: when you average 30 observations, you are going to get something close to what you expect.
With a sample size of 30, the sample means that are off from what you expect will tail off in a normal shape.
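A rough check of that claim for the spinner: a normal distribution puts about 95% of its area within 1.96 standard deviations of the mean, so the exact distribution of the average of 30 spins should give a similar figure. This sketch assumes the spinner probabilities from earlier and independent spins; the 1.96 benchmark is a choice made for this illustration.

```python
import math
from fractions import Fraction

# Spinner from the text: P(1)=3/8, P(2)=1/8, P(3)=2/8, P(4)=2/8.
SPINNER = {1: Fraction(3, 8), 2: Fraction(1, 8), 3: Fraction(2, 8), 4: Fraction(2, 8)}

def mean_distribution(spinner, n):
    """Exact distribution of the average of n independent spins."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in sums.items():
            for value, q in spinner.items():
                nxt[total + value] = nxt.get(total + value, 0) + p * q
        sums = nxt
    return {Fraction(total, n): p for total, p in sums.items()}

n = 30
dist30 = mean_distribution(SPINNER, n)
mu = sum(x * p for x, p in dist30.items())
sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist30.items()))

# Probability that the average of 30 spins lands within 1.96 sd of the mean.
within = float(sum(p for x, p in dist30.items() if abs(float(x - mu)) <= 1.96 * sd))

print(f"mean of averages = {float(mu):.4f}, sd of averages = {sd:.4f}")
print(f"P(within 1.96 sd) = {within:.4f}  (a normal curve gives about 0.95)")
```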
For almost all distributions, a sample size of 30 is large enough. It's the Central Limit Theorem that explains why so many real-world processes are approximately normally distributed.
A sampling distribution of sample means is the distribution of the means of all possible samples of a given size. Several characteristics of a distribution are important, and for the Central Limit Theorem, the important characteristic is shape.
The Central Limit Theorem states that when the sample size is large (for most distributions, meaning 30 or larger), the distribution of sample means will be approximately normal. Occasionally, you need a sample size even larger than 30 if the parent distribution is very skewed or has outliers. The Central Limit Theorem is probably the most important idea in statistics.
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS