In this tutorial, you're going to learn about the basics of confidence intervals.
For background, sampling error is the inherent variability in the process of sampling. In a random sample, it occurs when you use a statistic, like a sample mean, to estimate a parameter, like a population mean. When you use the sample mean to estimate the population mean, you won't always land exactly on it. The idea is that you can be close.
When you take larger samples, you're going to be, on average, closer. The sampling error, which is the amount by which the sample statistic is off from the population parameter, decreases. You get more consistently close values to the parameter when you take larger samples. When you calculate a margin of error in a study, you are approximating the sampling error.
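To see the effect of sample size on sampling error, here is a minimal simulation sketch. The population proportion of 0.4, the sample sizes, and the function name `mean_abs_sampling_error` are all made up for illustration, and each sampled person is drawn independently, which is only an approximation to a simple random sample from a large population.

```python
import random

def mean_abs_sampling_error(p, n, trials, rng):
    """Average |p_hat - p| over many simulated samples of size n."""
    total = 0.0
    for _ in range(trials):
        # Count how many of the n sampled individuals have the trait.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        total += abs(successes / n - p)
    return total / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
true_p = 0.4            # hypothetical population proportion
for n in (50, 500, 5000):
    err = mean_abs_sampling_error(true_p, n, trials=500, rng=rng)
    print(f"n={n:>5}: average sampling error = {err:.4f}")
```

The printed averages should shrink as n grows, which is the "more consistently close" behavior described above.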
When you take a sample, you try to obtain values that accurately represent what's going on in the population.
An example would be if you took a simple random sample of 500 people in a town of 10,000 ahead of an upcoming election, and found that 285 of those 500 plan to vote for a particular candidate. Your best guess for the true proportion of the town's population that will vote for that candidate is the proportion that you got in your sample: 285 out of 500, which is 57%. That's your best guess, but you might be off by a little bit.
You don't know if the true proportion of people who will vote for that candidate is 57%, and that's why you report a margin of error in your poll. From the margin of error you can create what's called a confidence interval. And it looks like this.
Confidence Interval
An interval that contains likely values for a parameter. We base our confidence interval on our point estimate, and the width of the interval is affected by confidence level and sample size.
The confidence interval is your point estimate, which is your best guess from your simple random sample (in this case, 57%), plus or minus the margin of error. You believe your point estimate is within a certain amount of the right answer.
The margin of error depends on two things: the confidence level and the sample size.
For instance, if you want to be very confident that you're going to accurately describe what percent of people are going to vote for that particular candidate, you have to go out a little bit further on each side. Maybe you have to go out plus or minus 5%, as opposed to plus or minus 3%.
Have a look:
If the sampling distribution of p hat is approximately normal, it will be centered at p, the population parameter. About 95% of all sample proportions will be within two standard deviations of p. In other words, the interval p plus or minus two standard deviations will contain about 95% of all p hats. This is called 95% confidence. Approximately 19 out of every 20 samples that you take, in the long run, will give a p hat within two standard deviations of the right answer.
If you want to be more confident you can go out even further.
For instance, 99% of all p hats will be within 2.58 standard deviations of p. This means that when you take samples, 99% of sample proportions will be within 2.58 standard deviations of the right answer, the value of p. Take your p hat value, plus or minus 2.58 standard deviations, and you're 99% likely to capture the value of p.
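The multipliers 2 and 2.58 come from the standard normal distribution. As a quick check, the sketch below uses Python's standard-library `statistics.NormalDist` to look up the multiplier for a given confidence level. The helper name `z_star` is just an illustrative choice.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Number of standard deviations that captures the central
    `confidence_level` share of a standard normal distribution."""
    tail = (1 - confidence_level) / 2      # area left in each tail
    return NormalDist().inv_cdf(1 - tail)  # cut-off for the upper tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The 95% value comes out just under 2 (about 1.96), which is why "two standard deviations" is the usual rule of thumb, and the 99% value rounds to 2.58.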
This is the interpretation here. These bold words are all going to be replaced with numbers, in typical interpretations.
In c% of samples, where c is your confidence level (in the previous examples you used 95% or 99%), the sample statistic, which was your p hat, will be within some number of standard errors (you used 2 and 2.58) of the parameter, which was your value of p. The standard error was the square root of pq over n.
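Applied to the election poll from earlier (285 of 500), the calculation might look like the sketch below. Two assumptions are worth flagging: q is taken to be 1 minus p, and because the true p is unknown, the sample proportion p hat stands in for p inside the standard error. The function name `proportion_interval` is hypothetical.

```python
import math

def proportion_interval(successes, n, z):
    """Point estimate plus or minus z standard errors for a proportion."""
    p_hat = successes / n                      # point estimate
    q_hat = 1 - p_hat                          # assumption: q = 1 - p
    std_error = math.sqrt(p_hat * q_hat / n)   # sqrt(pq / n), using p_hat for p
    margin = z * std_error                     # margin of error
    return p_hat - margin, p_hat + margin

for label, z in (("about 95%", 2), ("about 99%", 2.58)):
    low, high = proportion_interval(285, 500, z)
    print(f"{label} confidence: {low:.3f} to {high:.3f}")
```

The interval built with 2.58 is wider than the one built with 2, which matches the idea that more confidence means going out further on each side.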
What does this look like if you're using means?
If you're using means, the parameter mu will be contained in the interval given by the statistic, which is x bar, plus or minus z star times the standard error of the statistic, some percent of the time, such as 95% or 99% of the time. The confidence level determines the value of z star.
If you're using proportions, that means that the sample proportion, plus or minus z star standard errors, will contain the value of p some percent of the time, such as 95% or 99% of the time.
The value of z star depends on the confidence level you choose.
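For the means version, a rough sketch follows. It assumes the population standard deviation (called `sigma` here) is known, so that the standard error of x bar is sigma divided by the square root of n. That formula is not spelled out in this tutorial, and the sample values and the function name `mean_interval` are invented for illustration.

```python
import math
from statistics import NormalDist, mean

def mean_interval(sample, sigma, confidence_level):
    """x bar plus or minus z star times sigma / sqrt(n).
    Assumes the population standard deviation sigma is known."""
    n = len(sample)
    x_bar = mean(sample)                                       # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)   # z star
    margin = z * sigma / math.sqrt(n)                          # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, with an assumed known sigma of 2.0.
sample = [12.1, 9.8, 11.4, 10.3, 10.9, 11.7, 9.5, 10.6, 11.2]
for level in (0.95, 0.99):
    low, high = mean_interval(sample, sigma=2.0, confidence_level=level)
    print(f"{level:.0%} confidence: {low:.2f} to {high:.2f}")
```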
When you take a sample, the sample statistic that you get is a point estimate for the population parameter. When you create a confidence interval, you are a certain percent confident, like 90% confident or 95% confident, depending on how many standard deviations or standard errors you go out, that the parameter lies within the interval.
This means that that percent of sample statistics in the sampling distribution are within the margin of error of the parameter. For example, 95% of all the x bars in the sampling distribution of x bar will be within the margin of error of the true parameter mu. Equivalently, that percent of confidence intervals will contain the parameter.
If you took samples over and over again, and constructed a confidence interval each time, 90% or 95% of those confidence intervals would contain the true value of mu or p, or whatever parameter you're trying to estimate.
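This repeated-sampling interpretation can be checked with a small simulation. The sketch below pretends the true proportion is 0.57 (something you would never know in practice), draws many samples of 500, builds an interval from each, and counts how often the interval contains the true value. The function name `coverage_rate` and all settings are illustrative assumptions, and each person is sampled independently as an approximation to simple random sampling.

```python
import math
import random

def coverage_rate(p, n, z, trials, rng):
    """Share of simulated confidence intervals that contain the true p."""
    hits = 0
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1  # this sample's interval captured the parameter
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
rate = coverage_rate(p=0.57, n=500, z=1.96, trials=2000, rng=rng)
print(f"Share of intervals containing p: {rate:.3f}")
```

The share should land close to 0.95, though it will vary a little from run to run with a different seed.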
Good luck.
Source: This work adapted from Sophia Author Jonathan Osters.