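In this tutorial, you're going to learn about the distribution of sample proportions. This is going to be called a sampling distribution. Specifically, you will focus on the distribution of sample proportions, the standard deviation of that distribution, and standard error.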
Many different situations can get you proportions.
Suppose that you were taking a poll during political season, and you calculated the proportion of people that were going to vote for a particular candidate. One thing that's worth noting is that those are typically sample proportions. The only way to obtain the true population proportion, which is the parameter you're trying to estimate, is by taking a census. If you asked a binomial type of question, such as whether someone is going to vote for one candidate or the other, and you took a census, you would know the parameter.
In most cases you only deal with samples. You will want to figure out what the distribution of sample proportions actually looks like.
Suppose you have a fair coin and flip it ten times. Obviously, you would expect 50% heads and 50% tails. But as you know, it doesn't always work out exactly that way.
Suppose the first time you flipped ten coins, you got 60% heads.
The next time you flipped ten coins, you got 70% heads. It seems like it might change from trial to trial, or sample to sample rather, that proportion of heads. First time, you got 60% heads in your sample. The second time you got 70% heads in your sample. Suppose you do this a lot of times, and obtain sample proportions of heads every time.
Maybe the next time you do it, you get 60% heads. The fourth time, 40% heads. The fifth time, 50% heads, finally. And the sixth time, 60%.
What you can then do is start to graph those sample proportions on a dot plot.
Take the 0.6 and graph it, and then the 0.7, then the 0.6 again, stacking the second dot on top of the first dot. Then the 0.4, the 0.5, and the third 0.6. If you did this over and over and over again, for every possible sample of size ten, you would obtain a full distribution of sample proportions.
This is the distribution of the sample proportions of heads. This is what's called a sampling distribution of proportions. Notice it peaks at 0.5, exactly where expected. Notice it falls off to each side in almost a normal-looking shape. Very rarely do you get all of them being heads, a sample proportion of one, and very rarely do you get none of them being heads, a sample proportion of zero.
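The repeated coin-flipping described above can be imitated with a short simulation. This is only a sketch: the function name `simulate`, the choice of 10,000 samples, and the fixed seed are assumptions made for illustration, and a simulation only approximates the distribution you would get from every possible sample.

```python
import random
from collections import Counter

def simulate(num_samples=10_000, n=10, p=0.5, seed=0):
    """Flip a coin n times, num_samples times over.

    Returns a Counter mapping number of heads -> how many samples had it.
    All default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        counts[heads] += 1
    return counts

counts = simulate()

# Text version of the dot plot: one star per 100 samples.
for heads in range(11):
    p_hat = heads / 10
    print(f"{p_hat:.1f} {'*' * (counts[heads] // 100)}")
```

The printed rows should pile up around 0.5 and thin out toward 0.0 and 1.0, matching the shape described in the text.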
Notice the average here, the mean, is the value of p, the actual probability of getting heads, which was 0.5. The distribution centers around the probability of heads for a single trial. The numerator of every sample proportion was a number of successes: six out of ten, five out of ten, and so on. It was always the number of successes out of ten.
That number of successes is actually a binomial variable. Each trial is either a success or it isn't, each trial is independent, and all of the other requirements for it being binomial are there. The proportion of successes is the number of successes, which is binomial, divided by a constant, which is n. So its standard deviation will be the standard deviation of the binomial divided by n. That is, the standard deviation of p-hat is the square root of n times p times q, all divided by n.
Standard Deviation of a Distribution of Sample Proportions
After some algebra, the n inside the square root and the n in the denominator simplify, leaving the square root of p times q over n. This is the standard deviation of the sampling distribution of sample proportions.
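Written out in symbols, the algebra looks like this. The notation is an addition for illustration: X stands for the binomial count of successes, p-hat is X divided by n, and q is 1 minus p.

```latex
\sigma_{\hat{p}}
  = \frac{\sigma_X}{n}
  = \frac{\sqrt{npq}}{n}
  = \sqrt{\frac{npq}{n^{2}}}
  = \sqrt{\frac{pq}{n}}
```

For the ten coin flips, with p and q both 0.5, this gives the square root of 0.25 over 10, or about 0.158.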
Finally, the shape. The center was already discussed: the mean is the probability of success, p. So was the standard deviation, the square root of p times q over n.
You got the value above from the binomial numerator.
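As a numerical check on the center and spread, the sketch below builds the exact distribution of p-hat from binomial probabilities and compares its standard deviation with the square root of p times q over n. The function name `p_hat_distribution` is hypothetical, and the values n = 10 and p = 0.5 come from the coin example.

```python
from math import comb, sqrt

def p_hat_distribution(n, p):
    """Exact distribution of p-hat = X / n, where X is Binomial(n, p).

    Returns a dict mapping each possible sample proportion to its probability.
    """
    q = 1 - p
    return {k / n: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

n, p = 10, 0.5
dist = p_hat_distribution(n, p)

# Mean and standard deviation computed directly from the distribution.
mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())
sd = sqrt(variance)

# Standard deviation from the formula in the text.
formula_sd = sqrt(p * (1 - p) / n)

print(mean, sd, formula_sd)
```

The two standard deviations should agree up to floating-point rounding, and the mean should come out at p.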
You're going to use the binomial numerator again to determine the shape. Since the sampling distribution of sample proportions is a binomial variable divided by a constant, that is, it's some number of successes divided by n, the rules for its shape are going to follow those of the binomial distribution.
That is, it's going to be skewed to the left when the value of p is high and the sample size is low. It's going to be skewed to the right when the probability of success is low and the sample size is low. But when the sample size is large, it will be approximately normal.
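These shape rules can be checked with a skewness calculation, where a negative value means skewed to the left and a positive value means skewed to the right. This is a sketch under assumptions: the function name `skewness_of_p_hat` and the example values of n and p are chosen only for illustration, and the skewness is computed directly from the exact binomial probabilities.

```python
from math import comb

def skewness_of_p_hat(n, p):
    """Skewness of p-hat = X / n for X ~ Binomial(n, p), from the exact pmf."""
    q = 1 - p
    values = [k / n for k in range(n + 1)]
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    var = sum((v - mean) ** 2 * w for v, w in zip(values, probs))
    third = sum((v - mean) ** 3 * w for v, w in zip(values, probs))
    return third / var ** 1.5

# High p with small n, low p with small n, and high p with large n.
for n, p in [(10, 0.9), (10, 0.1), (400, 0.9)]:
    print(n, p, round(skewness_of_p_hat(n, p), 3))
```

The first case should come out negative, the second positive, and the third much closer to zero, in line with the three rules above.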
Again, how large is large? When n times p is at least ten and n times q is at least ten, the distribution of sample proportions will be approximately normal, with a mean of p and a standard deviation of the square root of p times q over n.
This is going to be one of the conditions for inference if you're going to use normal calculations, which you're going to want to do because they're easy to deal with. You're going to require that n times p is at least ten and that n times q is also at least ten.
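The condition is simple enough to express as a small helper. The function name `normal_approximation_ok` and its `threshold` parameter are hypothetical, and the sketch just restates the rule that both n times p and n times q must be at least ten.

```python
def normal_approximation_ok(n, p, threshold=10):
    """Return True when both n*p and n*q reach the threshold (q = 1 - p).

    Values that land exactly on the boundary can be affected by
    floating-point rounding, so treat borderline results with care.
    """
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

print(normal_approximation_ok(10, 0.5))    # ten coin flips
print(normal_approximation_ok(20, 0.5))    # twenty coin flips
print(normal_approximation_ok(100, 0.05))  # rare success, moderate sample
```

Note that the ten-flip coin example gives n times p equal to five, so it falls short of the condition, while twenty flips just meets it.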
Once again, p is the true proportion of successes. In this case, it was 0.5. The square root of p times q over n is the standard deviation of the distribution of sample proportions. It's also called the standard error.
The sampling distribution of sample proportions is approximately normal when the number of trials is large. That's the shape. Its mean is the proportion of successes in the population. That's the center. And the standard deviation of the sampling distribution, which is also called the standard error, is the square root of the product of the probabilities of success and failure, divided by the number of trials. That's the spread.
You've learned about the distribution of sample proportions, the standard deviation of a distribution of sample proportions, and standard error, which is the same thing as the standard deviation of the sampling distribution.
Source: This work adapted from Sophia Author Jonathan Osters.
Distribution of Sample Proportions: A distribution where each data point consists of a proportion of successes of a collected sample. For a given sample size, every possible sample proportion will be plotted in the distribution.
Standard Deviation of a Distribution of Sample Proportions: The square root of the product of the probabilities of success and failure (p and q, respectively) divided by the sample size.
Standard Error: The standard deviation of the sampling distribution of sample proportions.