Source: Graphs created by Jonathan Osters
In this tutorial, you're going to learn about the basics of confidence intervals. As background, sampling error is the inherent variability in the process of sampling. It occurs when we use a statistic from a random sample, like a sample mean, to estimate a parameter, like a population mean. We use the sample mean to estimate the population mean, but it won't always land exactly on it.
But hopefully we're close. The idea is that we can be close. And when we take larger samples, we're going to be closer on average. So the sampling error, which is the amount by which the sample statistic is off from the population parameter, decreases. We get more consistently close values to the parameter when we take larger samples.
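To see the effect of sample size, here is a minimal simulation sketch. It is not part of the original tutorial: the "true" proportion of 0.57, the function name, and the number of trials are all made up for illustration. It draws many random samples and measures how far the sample proportion lands from the true value on average.

```python
import random
import statistics

TRUE_P = 0.57  # hypothetical population proportion, chosen only for illustration


def average_sampling_error(p, n, trials=2000, seed=0):
    """Average absolute gap between the sample proportion and p over many samples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Draw n yes/no responses; each is "yes" with probability p.
        yes = sum(rng.random() < p for _ in range(n))
        errors.append(abs(yes / n - p))
    return statistics.mean(errors)


small = average_sampling_error(TRUE_P, n=50)
large = average_sampling_error(TRUE_P, n=500)
print(f"n=50:  average error {small:.3f}")
print(f"n=500: average error {large:.3f}")
```

Under these assumptions, the larger sample should show the smaller average error, which is the pattern described above.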
When we calculate a margin of error in a study, we are approximating the sampling error.
So let's take a look. When we take a sample, we try to obtain values that accurately represent what's going on in the population. So an example would be if you took a simple random sample of 500 people getting ready for an upcoming election, in a town of 10,000, and found that 285 of those 500 plan to vote for a particular candidate. Then our best guess for the true proportion of the town that will vote for that candidate is the proportion that we got in our sample: 285 out of 500, which is 57%. That's our best guess. But we might be off by a little bit.
We don't know if the true proportion of people who will vote for that candidate is 57%, and that's why we report a margin of error in our poll. And from the margin of error we can create what's called a confidence interval. And it looks like this. The confidence interval is our point estimate, which is our best guess from our simple random sample (in this case it was that 57% number), plus or minus the margin of error. We believe our point estimate is within such and such amount of the right answer.
And the margin of error depends on two things. One is the sample size. We knew this from before, when we said that a larger sample size results in less sampling error, and therefore a lower margin of error. The other is our confidence level. We're going to discuss this a little bit further, but a higher confidence level results in a larger margin of error.
For instance, if we want to be very confident that we're going to accurately describe what percent of people are going to vote for that particular candidate, we have to go out a little bit further on each side. Maybe we have to go out plus or minus 5%, as opposed to plus or minus 3%.
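As a rough sketch of both effects, the snippet below computes a margin of error for a sample proportion. It borrows the standard error formula mentioned later in this tutorial (the square root of pq over n, with the sample proportion standing in for p) and assumes the sampling distribution is approximately normal. The function name and the particular sample sizes are illustrative assumptions, not something from the video.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Margin of error for a sample proportion under a normal approximation."""
    # z_star: number of standard errors needed to cover the central
    # `confidence` share of a standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error


# Same confidence level, growing sample size: the margin shrinks.
for n in (500, 2000):
    print(f"n={n}, 95%: +/- {margin_of_error(0.57, n, 0.95):.3f}")

# Same sample size, higher confidence level: the margin grows.
for level in (0.95, 0.99):
    print(f"n=500, {level:.0%}: +/- {margin_of_error(0.57, 500, level):.3f}")
```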
So let's give an example. If the sampling distribution of p hat is approximately normal, it will be centered here at p, the population parameter. And 95% of all sample proportions will be within two standard deviations of p. So p plus or minus two standard deviations will contain 95% of all p hats.
This is what's called 95% confidence. Approximately 19 out of every 20 samples that we take, in the long term, will be within two standard deviations of the right answer. 95% of all p hats are within two standard deviations of p.
But if we want to be more confident, we can go out even further. For instance, 99% of all p hats will be within 2.58 standard deviations of p. Which means that when we take a sample proportion, 99% of sample proportions will be within 2.58 standard deviations of the right answer, the value of p.
And so we'll take our p hat value, and plus or minus 2.58 standard deviations, and we're 99% likely to capture the value of p.
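The 2 and 2.58 figures can be checked against the standard normal distribution. This is a small sketch using Python's standard library; the helper names are made up for illustration. Note that "two standard deviations" is a rounded figure: the exact multiplier for 95% is a bit under 2.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(z):
    """Share of a normal distribution within z standard deviations of its center."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def z_star(confidence):
    """Number of standard deviations that captures the central `confidence` share."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)


print(f"within 2 SDs:    {share_within(2):.4f}")
print(f"within 2.58 SDs: {share_within(2.58):.4f}")
print(f"z* for 95%: {z_star(0.95):.3f}")
print(f"z* for 99%: {z_star(0.99):.3f}")
```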
So this is the interpretation here. These bold words are all going to be replaced with numbers in typical interpretations. In c% of samples, where c is our confidence level (in the previous examples we used 95% or 99%), the sample statistic, which was our p hat, will be within some number of standard errors (we used 2, and we used 2.58) of the parameter, which was our value of p. The standard error was the square root of pq over n, where q is 1 minus p.
So what does this look like if we're using means? If we're using means, Mu, the parameter, will be contained in the interval x bar, the statistic, plus or minus z star times the standard error of the statistic, 99% or 95% or what have you of the time. The confidence level determines the value of z star.
If we're using proportions, that means that the sample proportion, plus or minus z star standard errors, will contain the value of p some percent of the time. Like 95% or 99% of the time. Depending on what we choose for our confidence level, z star will be affected that way.
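Applied to the election poll from earlier (285 of 500), the proportion version can be sketched as follows. This is an illustrative calculation rather than the tutorial's own worked answer: it assumes the normal approximation holds and plugs the sample proportion into the standard error, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for a proportion: p_hat +/- z_star * standard error."""
    p_hat = successes / n                       # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error            # margin of error
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(285, 500, confidence=0.95)
print(f"95% interval: {low:.3f} to {high:.3f}")

low99, high99 = proportion_interval(285, 500, confidence=0.99)
print(f"99% interval: {low99:.3f} to {high99:.3f}")
```

With these inputs the 95% interval comes out to roughly 57% plus or minus a little over 4 percentage points, and the 99% interval is wider.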
And so to recap. When we take a sample, the sample statistic that we get is a point estimate for the population parameter. And when we create a confidence interval, we are a certain percent confident, like 90% confident or 95% confident, depending on how many standard deviations or standard errors we go out, that the parameter lies within an interval.
So this means that that percent of sample statistics in the sampling distribution are within the margin of error of the parameter. So maybe we'll say 95% of all the x bars in the sampling distribution of x bar will be within the margin of error of the true parameter Mu, for example. And that same percent of confidence intervals will contain the parameter.
So if we did samples over and over and over again, and took confidence intervals each time, 90% or 95% of confidence intervals would contain the answer of Mu or p, or whatever parameters we're trying to estimate.
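The "over and over again" idea can be tried out with a short simulation. This is a sketch under stated assumptions: the true proportion of 0.57, the sample size, the trial count, and the function name are all invented for illustration, and the interval uses the normal approximation with the sample proportion in the standard error.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p, n, confidence, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Take one random sample of size n and build its interval.
        yes = sum(rng.random() < p for _ in range(n))
        p_hat = yes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


rate = coverage(p=0.57, n=500, confidence=0.95)
print(f"Share of 95% intervals containing p: {rate:.3f}")
```

The printed share should land near 0.95, though it will wobble a little from run to run if the seed is changed.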
So we talked about the basics of confidence intervals. We'll talk a little bit more in subsequent tutorials about how to figure margin of error. Good luck, and I'll see you next time.
Overview
(0:00-1:14) Margin of Error
(1:15-2:03) Example: Polling for an election
(2:04-2:48) The general formula for a confidence interval
(2:49-3:40) Margin of Error is affected by Confidence Level and Sample Size
(3:41-5:27) Discussion around Confidence Levels
(5:28-7:19) Formulas and interpretations of Confidence Intervals
(7:20-8:57) Recap
Confidence Interval
An interval that contains likely values for a parameter. We base our confidence interval on our point estimate, and the width of the interval is affected by confidence level and sample size.