Source: Tables created by the author; t-test, Creative Commons: maths.dur.ac.uk/stats/courses/tables/t.ps
In this tutorial you're going to learn about confidence intervals for a population proportion.
So let's start with an example. Suppose we have this drug called Obecalp, and it's a popular prescription drug. But it's thought to cause headaches as a side effect. So to test this, researchers took a simple random sample of 206 patients who are taking Obecalp, and 23 got headaches. Construct a 95% confidence interval for the proportion of all Obecalp users that would experience headaches.
So this is-- if you gave this drug to all the people who are using it, what percent of all of them would be getting headaches? In our sample 23 of the 206 experienced headaches.
So let's run back through what the process for constructing confidence intervals is. So, first we're going to verify the conditions necessary for inference. Stating the conditions isn't enough, and it's not just a formality. We have to verify. Second, we have to calculate the confidence interval. And then finally we have to interpret what it actually means.
So let's state what the conditions are here. The requirements are randomness, independence, and normality. In this case randomness means that the sample that we got of Obecalp users was a random sample. And it was. So that one's verified.
What about independence? We have to show that the sample of Obecalp users that we took is a small fraction of the population of Obecalp users, meaning the population is at least 10 times the sample size. There's no way to verify that empirically unless we had the whole list of people taking the drug. So we're going to have to assume there are at least 2,060 people taking this drug, which is 10 times our sample of 206.
And then finally, this np is greater than or equal to 10 thing is a little harder to figure out. We don't know p, the true proportion of people who will get headaches, and we don't have a best guess for it from a null hypothesis either. There is no null hypothesis in this problem.
What we do have, as a point estimate for p, is p hat. And so we're going to verify normality by using p hat instead of p. So we're going to say n times p hat has to be at least 10. And in this case 206 times p hat, 23 out of 206, is 23, which is bigger than 10. And n times q hat is 183, which is also bigger than 10.
And again we need to use p hat to verify the normality condition because we don't know p.
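The condition checks above can be sketched in a few lines of Python. This is only an illustration, not part of the original tutorial: the function name check_proportion_conditions and its parameters are made up for this example, and it assumes the usual rules of thumb described above (at least 10 successes and 10 failures, and a population at least 10 times the sample size).

```python
def check_proportion_conditions(successes, n, population_size=None, threshold=10):
    """Check the normality and independence conditions for a proportion interval.

    Hypothetical helper for illustration; names are not from the tutorial.
    """
    # n * p_hat is just the number of successes, and n * q_hat is the
    # number of failures, so we can work with whole counts directly.
    failures = n - successes
    result = {
        "n_p_hat": successes,
        "n_q_hat": failures,
        "normality_ok": successes >= threshold and failures >= threshold,
        # Independence: the population should be at least 10 times the sample.
        "min_population_for_independence": 10 * n,
    }
    # Only judge independence if a population size is actually known or assumed.
    if population_size is not None:
        result["independence_ok"] = population_size >= 10 * n
    return result


# Obecalp example: 23 of 206 sampled patients got headaches.
conditions = check_proportion_conditions(successes=23, n=206)
print(conditions)
```

For the Obecalp sample this reports 23 and 183 for the two normality checks and 2,060 as the smallest population that satisfies the independence assumption.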
Second, we're going to calculate the actual interval. We're going to do the point estimate, p hat, plus or minus the z star critical value times the standard error of p hat, which is the square root of p hat times q hat over n. Again we're using the p hat and the q hat here, because we don't know what p and q are.
Again the population proportion is not known, so we're again using p hat for the standard error. P hat is 23 out of 206. We knew that from the problem. The sample size is 206.
And as you look at this page you might be saying, hey, this is the t distribution. We don't use the t distribution for proportions. And I would agree with you. Where we have to look on this sheet, though, is down here in the confidence levels. We need a confidence level for our confidence interval. And we're going to find it down here.
The z critical values for each confidence level are found in the last row of this t table, under the infinity value. Essentially the normal distribution is the t distribution with infinite degrees of freedom. So we're going to look in this row to find that the z critical value that we should use for 95% confidence is 1.96.
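If you don't have the table handy, the same critical value can be looked up with software. Here is a minimal sketch using Python's standard library; the helper name z_critical is made up for this example, and it assumes a two-sided interval, so half of the leftover probability goes in each tail.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return the two-sided z critical value for a confidence level like 0.95.

    Hypothetical helper for illustration; not part of the tutorial.
    """
    # Split the leftover probability evenly between the two tails.
    tail = (1 - confidence_level) / 2
    # inv_cdf gives the z value with the requested area to its left
    # under the standard normal curve.
    return NormalDist().inv_cdf(1 - tail)


z_star = z_critical(0.95)
print(round(z_star, 2))  # 1.96, matching the last row of the t table
```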
So we're going to take all of that, and put it into the formula here, and obtain 0.112, which was our p hat, plus or minus 0.043. This is our margin of error. It's 4.3%. So, when we actually evaluate the interval it's going to be 0.069 all the way up to 0.155.
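To double-check that arithmetic, the calculation can be sketched in Python. The function name proportion_confidence_interval is made up for this example, and it takes the tutorial's z critical value of 1.96 as a default.

```python
from math import sqrt


def proportion_confidence_interval(successes, n, z_star=1.96):
    """Return p_hat, the margin of error, and the (lower, upper) interval.

    Hypothetical helper for illustration; z_star defaults to the 95% value.
    """
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Standard error uses p_hat and q_hat because p and q are unknown.
    standard_error = sqrt(p_hat * q_hat / n)
    margin = z_star * standard_error
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Obecalp example: 23 of 206 sampled patients got headaches.
p_hat, margin, (lower, upper) = proportion_confidence_interval(23, 206)
print(f"{p_hat:.3f} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
# 0.112 +/- 0.043 -> (0.069, 0.155)
```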
Now we need to interpret this interval here. This is about 7%. This is about 15 and 1/2%. So we're 95% confident that the true proportion-- So again, that's if everyone did it, the population proportion. If everyone who was taking Obecalp was in the study, the true proportion of all Obecalp users who would experience headaches is somewhere between 6.9% and 15.5%.
It's likely somewhere in that range. We don't know exactly where in that range, but the true proportion is probably somewhere in this range.
And so to recap: we can create point estimates for population proportions, which is our sample proportion, and then use that sample proportion to determine the margin of error for a confidence interval. First we verify the conditions for inference are met, then we can construct and interpret a confidence interval based on the data that we've gathered and the statistics that we've calculated.
So we talked about confidence intervals for a population proportion. Good luck, and we'll see you next time.
A confidence interval that gives a likely range for the value of a population proportion. It is the sample proportion, plus and minus the margin of error from the normal distribution.