Source: Images and Charts created by Author
In this tutorial, you're going to learn about the distribution of sample proportions. This is going to be called a sampling distribution.
Now, many different situations can give you proportions. Suppose that you were taking a poll during political season, and you calculated the proportion of people who were going to vote for a particular candidate. One thing that's worth noting is that those are typically sample proportions. The only way to obtain the true population proportion, which is the parameter we're trying to estimate, is by taking a census. If you had a binomial type of question, like whether you're going to vote for one candidate or the other, and you took a census, you would know the parameter.
In most cases, though, we only deal with samples. And so we're going to want to figure out what the distribution of sample proportions actually looks like. So suppose we have a fair coin and we flip it ten times. Obviously, we would expect 50% heads and 50% tails. But as you know, when you flip coins, it doesn't always work out exactly that way.
Suppose the first time I flipped ten coins, I got 60% heads. And maybe the next time I flipped ten coins, I got 70% heads. So it seems like that proportion of heads might change from trial to trial, or rather from sample to sample. The first time, I got 60% heads in my sample. The second time, I got 70% heads in my sample. Now suppose that we do this a lot of times, and obtain the sample proportion of heads every time. Maybe the next time I do it, I get 60% heads. The fourth time, 40% heads. The fifth time, 50% heads, finally. And the sixth time, 60%.
What we can then start to do is graph those sample proportions on a dot plot. So we'll take the 0.6 and graph it, and then the 0.7, then the 0.6 again, stacking the second dot on top of the first dot. And then the 0.4, the 0.5, and the third 0.6. If we did this over and over and over again, for every possible sample of size ten, we would obtain a distribution that looks like this.
This is the distribution of the sample proportions of heads. This is what's called a sampling distribution of proportions. Notice it peaks here at 0.5, exactly where we thought it would. And notice it sort of falls off to each side in almost a normal-looking shape. Very rarely did we get all of them being heads, a sample proportion of one, and very rarely did we get none of them being heads, a sample proportion of zero.
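If you'd like to see a picture like this build up yourself, here is a minimal sketch in Python. It's a simulation, so it draws a lot of random samples of ten flips rather than listing every possible sample, and the function name, the number of samples, and the seed are all just choices made for this illustration.

```python
import random
from collections import Counter

def simulate_sample_proportions(n_flips=10, n_samples=10_000, p=0.5, seed=0):
    """Return one sample proportion of heads per simulated sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(n_samples):
        # Count heads in one sample of n_flips flips.
        heads = sum(rng.random() < p for _ in range(n_flips))
        proportions.append(heads / n_flips)
    return proportions

proportions = simulate_sample_proportions()
counts = Counter(proportions)

# Crude text dot plot: one '*' for every 50 samples at each proportion.
for k in range(11):
    value = k / 10
    print(f"{value:.1f} {'*' * (counts[value] // 50)}")
```

The row for 0.5 should come out the longest, with the rows getting shorter as you move toward 0.0 and 1.0.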
Notice the average here, the mean, is the value of p, the actual probability of getting heads, which was 0.5. So it centers around what the probability of heads is for a single trial. Now look at the numerator of every sample proportion. It's the number of successes, like in the previous examples, where it was six out of ten or five out of ten. It was the number of successes out of ten.
That number of successes is actually a binomial variable. Either you get a success or you don't, each trial is independent, and all of the requirements for it being binomial are there. So when we graph the proportion of successes, which is the number of successes, which is binomial, divided by a constant, which is n, the standard deviation will be the standard deviation of the binomial divided by n. So the standard deviation of p-hat is the square root of n times p times q, all divided by n, where q is the probability of failure, one minus p.
So when we do some algebra, the n inside that square root and the n down here simplify, and we're left with the square root of p times q over n. And this is the standard deviation of the sampling distribution of sample proportions.
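Written out, that algebra looks like this. The symbol X for the number of successes is notation assumed here for illustration. The sample proportion is p-hat = X/n, and q = 1 - p.

```latex
\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}
```

For the coin example, with p = 0.5, q = 0.5, and n = 10, that works out to the square root of 0.025, which is about 0.158.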
And then finally, the shape. We talked about the center, being at the probability of success, that was the mean. And the standard deviation, being this value here, which we got from the binomial numerator. And we're going to use the binomial numerator again to determine the shape. Since a sample proportion is a binomial variable divided by a constant, that is, it's some number of successes divided by n, the shape of its sampling distribution is going to follow that of the binomial distribution.
That is, it's going to be skewed to the left when the value of p is high and the sample size is low. It's going to be skewed to the right when the proportion, or the probability of success is low and the sample size is low. But then when the sample size is large, it will be approximately normal.
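One way to check those shape rules is to compute the skewness of the distribution directly. The sketch below is an illustration, not part of the original lesson: the function name is made up, and it uses the standard third-moment definition of skewness, where a negative value means skewed left, a positive value means skewed right, and a value near zero means roughly symmetric.

```python
from math import comb, sqrt

def p_hat_skewness(n, p):
    """Skewness of the sample proportion, from the exact binomial probabilities."""
    q = 1 - p
    mean = p
    sd = sqrt(p * q / n)
    third_moment = 0.0
    for k in range(n + 1):
        # Probability of exactly k successes in n trials.
        prob = comb(n, k) * p**k * q**(n - k)
        third_moment += prob * (k / n - mean) ** 3
    return third_moment / sd**3

print(p_hat_skewness(10, 0.9))    # high p, small n: negative (skewed left)
print(p_hat_skewness(10, 0.1))    # low p, small n: positive (skewed right)
print(p_hat_skewness(1000, 0.9))  # large n: close to zero
```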
And so again, how large is large? When n times p is at least ten and when n times q is at least ten, the distribution of sample proportions will be approximately normal, with the mean of p and the standard deviation of the square root of p times q over n.
So this is going to be one of our conditions for inference, if we're going to use normal calculations, which we're going to want to do because they're easy to deal with. We're going to require that n times p is at least ten and that n times q is also at least ten.
And once again, this is the true proportion of success. In our case, it was that 0.5. And this is the standard deviation of the distribution of sample proportions. We also call that standard error.
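To pull the center, the spread, and the large-sample check together, here is a small sketch. The function name and the dictionary keys are hypothetical, chosen just for this example, and the tiny tolerance is only there so that floating-point rounding doesn't reject a product that is exactly ten on paper.

```python
from math import sqrt

def describe_p_hat(n, p):
    """Mean, standard error, and normal-condition check for a sample proportion."""
    q = 1 - p
    tol = 1e-9  # guards against rounding, e.g. 100 * (1 - 0.9) landing just under 10
    return {
        "mean": p,
        "standard_error": sqrt(p * q / n),
        "approximately_normal": n * p >= 10 - tol and n * q >= 10 - tol,
    }

print(describe_p_hat(10, 0.5))   # the ten-flip coin example
print(describe_p_hat(100, 0.5))  # a larger sample of one hundred flips
```

Notice that with only ten flips, n times p is just 5, so the coin example itself falls short of the condition even though its picture looked roughly bell-shaped. With one hundred flips, both products are 50 and the condition is met.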
And so to recap. The sampling distribution of sample proportions is approximately normal when the number of trials is large. That's the shape. Its mean is the proportion of successes in the population. That's the center. And the standard deviation of the sampling distribution, which is also called the standard error, is the square root of p times q over n, that is, the product of the probabilities of success and failure divided by the number of trials, all under the square root. That's the spread.
So we've talked about the distribution of sample proportions, the standard deviation of a distribution of sample proportions, and standard error, which was the same thing as the standard deviation of the sampling distribution. Good luck and we'll see you next time.
Standard Error: The standard deviation of the sampling distribution of sample proportions.
Standard Deviation of a Distribution of Sample Proportions: The square root of the product of the probabilities of success and failure (p and q, respectively) divided by the sample size.
Distribution of Sample Proportions: A distribution where each data point consists of a proportion of successes of a collected sample. For a given sample size, every possible sample proportion will be plotted in the distribution.