In this tutorial, you're going to learn the basics of confidence intervals. Specifically, you will focus on:
1. Confidence Intervals
2. Confidence Intervals for Population Proportions
Some background: sampling error is the inherent variability in the process of sampling. It occurs when we use a statistic from a random sample, like a sample mean, to estimate a parameter, like a population mean. The sample mean won't always land exactly on the population mean, but we can use it as an estimate. The idea is that it will usually be close.
When we take a larger sample, we're going to be, on average, closer. The sampling error, which is the amount by which the sample statistic is off from the population parameter, decreases. We more consistently get values close to the parameter when we take larger samples. When we calculate a margin of error in a study, we are approximating the sampling error.
When we take a sample, we try to obtain values that accurately represent what's going on in the population.
An example would be if we took a simple random sample of 500 people in a town of 10,000 before an upcoming election, and found that 285 of those 500 plan to vote for a particular candidate. Our best guess for the true proportion of the town's population that will vote for Candidate Y is the proportion we got in our sample: 285 out of 500, or 57%. That's our best guess, but we might be off by a little bit.
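The point estimate is simple division; as a quick sketch in Python (using the numbers from the example above):

```python
# Point estimate for the population proportion: sample count / sample size
votes_for_candidate = 285  # sampled voters planning to vote for Candidate Y
sample_size = 500

p_hat = votes_for_candidate / sample_size
print(p_hat)  # 0.57
```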
We don't know if the true proportion of people who will vote for that candidate is 57%, and that's why we report a margin of error in our poll. From the margin of error, we can create what's called a confidence interval.
A confidence interval is an interval that contains likely values for a parameter. We base our confidence interval on our point estimate, and the width of the interval is affected by the confidence level and the sample size.
The confidence interval is our point estimate, which is our best guess from our simple random sample. In this case, it was 57%, plus or minus the margin of error. We believe we are within a certain amount of the right answer with our point estimate.
The margin of error depends on two things.
1. The sample size. We knew this from before: a larger sample size results in less sampling error, and therefore a smaller margin of error.
2. The confidence level. We're going to discuss this more later, but a higher confidence level results in a larger margin of error. For instance, if we want to be very confident that we're accurately describing what percent of people will vote for that particular candidate, we have to go out a little bit further on each side: maybe plus or minus 5%, as opposed to plus or minus 3%.
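Both effects can be seen in a short sketch. This assumes the familiar z* values 1.96 (95%) and 2.58 (99%) and a hypothetical sample proportion of 0.57:

```python
import math

def margin_of_error(p_hat, n, z_star):
    """Margin of error for a proportion: z* times the standard error."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.57  # hypothetical sample proportion

# Larger sample size -> smaller margin of error (same 95% confidence, z* = 1.96)
print(round(margin_of_error(p_hat, 500, 1.96), 3))   # 0.043
print(round(margin_of_error(p_hat, 2000, 1.96), 3))  # 0.022

# Higher confidence level -> larger margin of error (same n = 500)
print(round(margin_of_error(p_hat, 500, 2.58), 3))   # 0.057
```

Note that quadrupling the sample size (500 to 2,000) cuts the margin of error in half, while raising the confidence level widens it.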
Have a look:
If the sampling distribution of p-hat is approximately normal, it will be centered at p, the population parameter, and 95% of all sample proportions will be within two standard deviations of p. So p plus or minus two standard deviations will contain 95% of all p-hats. This is called 95% confidence: in the long run, approximately 19 out of every 20 samples we take will produce a p-hat within two standard deviations of the right answer.
If we want to be more confident, we can go out even further.
For instance, 99% of all p-hats will be within 2.58 standard deviations of p. This means that when we take a sample proportion, 99% of sample proportions will be within 2.58 standard deviations of the right answer, the value of p. Take our p-hat value, go plus or minus 2.58 standard deviations, and we're 99% likely to capture the value of p.
These bracketed placeholders are filled in with numbers in typical interpretations:
In [confidence level] percent of samples, the sample statistic will be within [the corresponding z critical value] standard errors of the parameter.
In 95% of samples, the sample statistic will be within two (more precisely, 1.96) standard errors of p.
In 99% of samples, the sample statistic will be within 2.58 standard errors of p.
What does this look like if we're using means?
Confidence Interval of Samples
This means that mu, the parameter, will be contained in the interval given by the statistic, x-bar, plus or minus z* times the standard error of the statistic, some percent of the time, such as 95% or 99%. The confidence level determines the value of z*.
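A minimal sketch of a confidence interval for a mean. The data values are made up for illustration, and using z* rather than t* with a small sample is a simplifying assumption, not part of the tutorial:

```python
import math
import statistics

# Hypothetical sample data (illustrative, not from the tutorial)
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]

x_bar = statistics.mean(data)
s = statistics.stdev(data)            # sample standard deviation
std_error = s / math.sqrt(len(data))  # standard error of x-bar

z_star = 1.96  # 95% confidence
lower = x_bar - z_star * std_error
upper = x_bar + z_star * std_error
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

With a small sample and unknown population standard deviation, a t* critical value would normally replace z*; the t distribution is covered in a later tutorial.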
What does this look like if we're using proportions?
If we're using proportions, that means that the sample proportion, plus or minus z* standard errors, will contain the value of p some percent of the time, such as 95% or 99% of the time.
Confidence Interval of Proportions
Suppose Obecalp is a popular prescription drug that is thought to cause headaches as a side effect. To test this, researchers took a random sample of 206 patients taking Obecalp, and 23 of them got headaches.
Construct a 95% confidence interval for the proportion of all Obecalp users that would experience headaches.
If we gave this drug to everyone who is using it, what percent of them would get headaches? In our sample, 23 of the 206 experienced headaches.
1. Verify the conditions necessary for inference. Stating the conditions isn't enough, and it's not just a formality; we have to verify.
2. Calculate the confidence interval.
3. Interpret what it actually means.
Step 1. State what the conditions are.
The requirements are randomness, independence, and normality. The sample was stated to be random. Independence is reasonable as long as the 206 patients are less than 10% of all Obecalp users. For normality, we need to use p-hat because we don't know p: the sample contains 23 successes (headaches) and 183 failures, and both counts are at least 10, so the normality condition is met.
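That normality check can be written out directly from the sample counts (a sketch using the numbers from the Obecalp example):

```python
n = 206
successes = 23            # patients who got headaches
failures = n - successes  # 183 patients who did not

p_hat = successes / n  # about 0.112

# Normality condition: at least 10 successes and at least 10 failures
print(successes >= 10)  # True
print(failures >= 10)   # True
```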
Step 2. Calculate the confidence interval.
Confidence Interval of Population Proportion
To do this, we will take the point estimate, p-hat, plus or minus the z* critical value times the standard error of p-hat, which is the square root of p-hat times q-hat, divided by n. Again, we're using p-hat and q-hat here because we don't know what p and q are.
The population proportion is not known, so we'll use p-hat for the standard error, or 23 out of 206. The sample size is 206.
To find the z* critical value, we can use a z-table. For a confidence interval, we can follow the same steps as a two-sided test. If we have a 95% confidence interval, this actually is the same as a 5% significance level. However, this is split between two tails, the lower and upper part of the distribution. Each tail will have 2.5%.
We can use the upper tail to find the critical z-score. Remember, the total area under the distribution is 1 (100%), so to find the cumulative area up to the upper tail, we subtract 0.025 from 1, which gives us 0.975. Now we can use a z-table.
In a z-table, the value 0.975 corresponds with 1.9 in the left column and 0.06 in the top row. This tells us that the z-score is 1.96.
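Instead of reading a printed z-table, the same critical value can be computed with Python's standard library (a sketch using `NormalDist` from the `statistics` module):

```python
from statistics import NormalDist

confidence = 0.95
tail_area = (1 - confidence) / 2  # 0.025 in each tail

# Inverse CDF of the standard normal at 0.975 gives the critical value
z_star = NormalDist().inv_cdf(1 - tail_area)
print(round(z_star, 2))  # 1.96
```

The same calculation with `confidence = 0.99` gives about 2.58.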
Another way is to use a t-table, which you will learn more about in a later tutorial. We don't use the t distribution for proportions; however, we can use the last row of this table to find critical values for common confidence levels.
The z critical values are found in the last row of the t-table, the row labeled with infinity (or ">1000"). Essentially, the normal distribution is the t distribution with infinite degrees of freedom. Looking in this row, we find the z critical value we should use: 1.96, the same as we got before.
Take all of that and put it in the formula:
From this formula, we obtain 0.112, which was our p-hat, plus or minus 0.043, which is the margin of error. When we evaluate the interval, it runs from 0.069 all the way up to 0.155.
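The whole calculation, put together in one sketch using the numbers from the example:

```python
import math

n = 206
successes = 23

p_hat = successes / n                     # point estimate, about 0.112
q_hat = 1 - p_hat
std_error = math.sqrt(p_hat * q_hat / n)  # standard error of p-hat

z_star = 1.96                             # 95% confidence
moe = z_star * std_error                  # margin of error, about 0.043

lower, upper = p_hat - moe, p_hat + moe
print(f"({lower:.3f}, {upper:.3f})")  # (0.069, 0.155)
```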
3. Now we need to interpret this interval. We're 95% confident that the true proportion of all Obecalp users who would experience headaches is somewhere between 6.9% and 15.5%. We don't know exactly where in that range, but we are confident the true proportion lies somewhere within it.
The value of z* depends on the confidence level we choose.
When we take a sample, we obtain a sample statistic that is a point estimate for the population parameters. When we create a confidence interval, we are saying that we are a certain percent confident, like 90% confident, or 95% confident (depending on how many standard deviations or standard errors we go out), that the parameter lies within an interval.
This means that that percent of sample statistics in the sampling distribution will fall within the margin of error of the parameter. For example, 95% of all the x-bars in the sampling distribution of x-bar will be within the margin of error of the true parameter mu. That same percent of confidence intervals will contain the parameter.
If we took samples over and over again and constructed a confidence interval each time, 90% or 95% of those confidence intervals would contain mu, p, or whatever parameter we're trying to estimate.
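This long-run interpretation can be checked with a small simulation, assuming a hypothetical true proportion (p = 0.57 here is just an illustrative choice):

```python
import math
import random

random.seed(1)

p_true = 0.57  # hypothetical true population proportion
n = 500
z_star = 1.96  # 95% confidence
trials = 2000

captured = 0
for _ in range(trials):
    # Draw one simple random sample of size n and compute its interval
    successes = sum(random.random() < p_true for _ in range(n))
    p_hat = successes / n
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - moe <= p_true <= p_hat + moe:
        captured += 1

print(captured / trials)  # close to 0.95 in the long run
```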
We can create a point estimate for a population proportion, which is our sample proportion, and then use that sample proportion to determine the margin of error for a confidence interval. First, we verify that the conditions for inference are met; then we construct and interpret a confidence interval based on the data we've gathered and the statistics we've calculated.
Source: THIS WORK ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS.