
Calculating Confidence Intervals for Z-Tests

Author: Sophia


What's Covered
In this tutorial, you're going to learn about the basics of confidence intervals. Specifically, you will focus on:

1. Confidence Intervals

2. Confidence Intervals for Population Proportions


1. CONFIDENCE INTERVALS 

As background, sampling error is the inherent variability in the process of sampling. In a random sample, it arises when we use a statistic, like a sample mean, to estimate a parameter, like a population mean. The sample mean won't always match the population mean exactly, but we can still use it as an estimate. The idea is that it will be close.

When we take a larger sample, we're going to be, on average, closer. The sampling error, which is the amount by which the sample statistic is off from the population parameter, decreases. We get more consistently close values to the parameter when we take larger samples. When we calculate a margin of error in a study, we are approximating the sampling error.

When we take a sample, we try to obtain values that accurately represent what's going on in the population.

Example  Suppose we took a simple random sample of 500 people in a town of 10,000 ahead of an upcoming election, and found that 285 of those 500 plan to vote for Candidate Y. Our best guess for the true proportion of the town's population that will vote for Candidate Y is the proportion that we got in our sample: 285 out of 500, which is 57%. That's our best guess, but we might be off by a little bit.

We don't know if the true proportion of people who will vote for that candidate is 57%, and that's why we report a margin of error in our poll. From the margin of error, we can create what's called a confidence interval. 

Term to Know
Confidence Interval
An interval that contains likely values for a parameter. We base our confidence interval on our point estimate, and the width of the interval is affected by confidence level and sample size.

Formula

Confidence Interval

$$CI = \text{Point Estimate} \pm \text{margin of error}$$

The confidence interval is our point estimate, which is our best guess from our simple random sample (57% in this case), plus or minus the margin of error. We believe our point estimate is within a certain amount of the right answer.

The margin of error depends on two things.
1. The sample size. As noted earlier, a larger sample size results in less sampling error, and therefore a smaller margin of error.

2. Confidence level. We're going to discuss this more later, but a higher confidence level results in a larger margin of error. For instance, if we want to be very confident that we accurately describe what percent of people are going to vote for that particular candidate, we have to go out a little bit further on each side. Maybe we have to go out plus or minus 5%, as opposed to plus or minus 3%.
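Both effects can be seen numerically. The sketch below is only an illustration: it uses the proportion formula that appears later in this tutorial, the function name `margin_of_error` is our own, and the sample size of 2000 is a hypothetical value chosen for comparison.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Margin of error for a sample proportion: z* times the standard error."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error


p_hat = 285 / 500  # sample proportion from the election poll

for n in (500, 2000):  # 2000 is a hypothetical larger sample
    for confidence in (0.95, 0.99):
        me = margin_of_error(p_hat, n, confidence)
        print(f"n={n}, confidence={confidence:.0%}: +/- {me:.3f}")
```

For the poll of 500, this gives roughly plus or minus 0.043 at 95% confidence and 0.057 at 99% confidence. The larger hypothetical sample gives smaller margins at both levels.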

Have a look:

If the sampling distribution of p-hat is approximately normal, it will be centered at p, the population parameter. About 95% of all sample proportions will be within two standard deviations of p (more precisely, 1.96 standard deviations), so p plus or minus two standard deviations will contain about 95% of all p-hats. This is called 95% confidence: in the long run, approximately 19 out of every 20 samples that we take will produce a p-hat within two standard deviations of the right answer.
If we want to be more confident, we can go out even further.

For instance, 99% of all p-hats will be within 2.58 standard deviations of p. This means that when we take a sample, there is a 99% chance that its sample proportion lands within 2.58 standard deviations of the right answer, the value of p. So if we take our p-hat value and go out plus or minus 2.58 standard deviations, we're 99% likely to capture the value of p.
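These percentages can be checked against the standard normal curve. Here is a minimal sketch using Python's standard library; the helper name `share_within` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its center."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.96, 2, 2.58):
    print(f"within {k} standard deviations: {share_within(k):.4f}")
```

The shares come out to about 0.9500, 0.9545, and 0.9901, which is why "two standard deviations" is a convenient rounding of 1.96 for 95% confidence.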

The bracketed words are all going to be replaced with numbers in typical interpretations:

In [confidence level] percent of samples, the sample statistic will be within [z-critical value] standard errors of the parameter.

Examples:

In 95% of samples, $\hat{p}$ will be within $\pm 2\sqrt{\frac{pq}{n}}$ of $p$.

In 99% of samples, $\hat{p}$ will be within $\pm 2.58\sqrt{\frac{pq}{n}}$ of $p$.

What does this look like if we're using means? 

Formula
Confidence Interval of Samples
$$CI = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$
This means that mu, the parameter, will be contained in the interval formed by the statistic, x-bar, plus or minus z* times the standard error of the statistic, some percent of the time, such as 95% or 99%. The confidence level determines the value of z*.
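As a rough illustration of the formula for means, the sketch below builds a 95% interval. The numbers (a sample mean of 50, a known population standard deviation of 10, and a sample of 100) are hypothetical and do not come from this tutorial, and the function name is our own.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_for_mean(x_bar: float, sigma: float, n: int, confidence: float):
    """Return (lower, upper) for x_bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)  # z* times the standard error of x-bar
    return x_bar - margin, x_bar + margin


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
lower, upper = z_interval_for_mean(x_bar=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is 1, so the interval is about 50 plus or minus 1.96.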

What does this look like if we're using proportions? 


2. CONFIDENCE INTERVAL OF PROPORTIONS 

If we're using proportions, that means that the sample proportion, plus or minus z* standard errors, will contain the value of p some percent of the time, such as 95% or 99% of the time.

Formula

Confidence Interval of Proportions
$$CI = \hat{p} \pm z^* \cdot \sqrt{\frac{pq}{n}}$$

Example

Suppose Obecalp is a popular prescription drug that is thought to cause headaches as a side effect. To test this, researchers took a random sample of 206 patients who are taking Obecalp, and 23 of them got headaches.

Construct a 95% confidence interval for the proportion of all Obecalp users that would experience headaches. 

If we gave this drug to all the people who are using it, what percent of all of them would be getting headaches? In our sample 23 of the 206 experienced headaches.

1. Verify the conditions necessary for inference. Stating the conditions isn't enough, and it's not just a formality; we have to verify.
2. Calculate the confidence interval.
3. Interpret what it actually means.

Step 1. State and verify the conditions.

The requirements are randomness, independence, and normality. 

  • Randomness: The sample that we got of Obecalp users was a random sample, so that’s verified.
  • Independence: The sample of Obecalp users must be a small fraction (no more than 10%) of the population of Obecalp users. There's no way to verify that empirically unless we had the whole list of people taking the drug, so we're going to have to assume there are at least 2,060 people taking this drug (10 times our sample of 206).
  • Normality: The condition that np and nq are both at least 10 is a little harder to check, because we don't know p, the true proportion of people who will get headaches. We don't have a best guess for it from a null hypothesis either, since there is no null hypothesis in this problem. What we do have is p-hat, a point estimate for p, so we verify normality using p-hat instead of p: n times p-hat has to be at least 10, and so does n times q-hat. In this case, 206 times p-hat (23 out of 206) is 23, which is bigger than 10, and 206 times q-hat (183 out of 206) is 183, which is also bigger than 10.

$$np \geq 10 \quad \text{and} \quad nq \geq 10$$
$$(206)\left(\tfrac{23}{206}\right) = 23, \quad 23 > 10$$
$$(206)\left(\tfrac{183}{206}\right) = 183, \quad 183 > 10$$

Again, we need to use p-hat to verify the normality condition because we don't know p.
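The checks in Step 1 reduce to simple counts, since n times p-hat is just the number of patients with headaches and n times q-hat is the number without. A minimal sketch follows; the function name and the dictionary keys are our own, and the thresholds of 10 and 10 times the sample size are the rules of thumb used above.

```python
def check_conditions(successes: int, n: int, threshold: int = 10) -> dict:
    """Summarize the normality check and the population size assumed for independence."""
    failures = n - successes
    return {
        "n_p_hat": successes,   # n * p-hat equals the count of successes
        "n_q_hat": failures,    # n * q-hat equals the count of failures
        "normality_ok": successes >= threshold and failures >= threshold,
        # Population size we must assume so the sample is at most 10% of it.
        "min_population_for_independence": 10 * n,
    }


result = check_conditions(successes=23, n=206)
print(result)
```

Randomness is not something code can confirm; it depends on how the sample was actually collected.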

Step 2. Calculate the confidence interval. 

Formula
Confidence Interval of Population Proportion
$$CI = \hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
To do this, we take the point estimate, p-hat, plus or minus the z* critical value times the standard error of p-hat, which is the square root of p-hat times q-hat over n. Again, we're using p-hat and q-hat here because we don't know what p and q are.

The population proportion is not known, so we'll use p-hat, 23 out of 206, in the standard error. The sample size is 206.

To find the z* critical value, we can use a z-table.  For a confidence interval, we can follow the same steps as a two-sided test.  If we have a 95% confidence interval, this actually is the same as a 5% significance level.  However, this is split between two tails, the lower and upper part of the distribution.  Each tail will have 2.5%.  

We can use the upper limit to find the critical z-score. Remember, the total area under a distribution is 100%, so to find the upper limit, we can subtract 0.025 from 1, which gives us 0.975. Now, we can use a z-table.

In a z-table, the value 0.975 sits at the intersection of 1.9 in the left column and 0.06 in the top row. This tells us that the z-score is 1.96.
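If a z-table isn't handy, the same lookup can be sketched in code. The example below assumes Python's standard library `statistics.NormalDist`; the helper name `z_critical` is our own.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z* for a two-sided interval: split (1 - confidence) evenly between the tails."""
    tail = (1 - confidence) / 2               # e.g. 0.025 for 95% confidence
    return NormalDist().inv_cdf(1 - tail)     # e.g. the z-score with 0.975 below it


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

This gives about 1.645, 1.960, and 2.576 for 90%, 95%, and 99% confidence.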

Another way is to use a t-table, which you will learn more about in a later tutorial. We don't use the t distribution for proportions; however, we can use the last row of the table to find z critical values for common confidence levels.

The z critical values for each confidence level are found in the last row of the t-table, under the infinity value, or ">1000". Essentially, the normal distribution is the t distribution with infinite degrees of freedom. Looking in this row, we find the z critical value that we should use, which is the same 1.96 that we got before.

Take all of that and put it in the formula:

$$CI = \hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$\hat{p} = \frac{23}{206} = 0.112$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.112 = 0.888$$
$$n = 206$$
$$z^* = 1.96$$

$$CI = \hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.112 \pm 1.96\sqrt{\frac{(0.112)(0.888)}{206}}$$

$$= 0.112 \pm 1.96\sqrt{\frac{0.099456}{206}} = 0.112 \pm 1.96\sqrt{0.00048}$$

$$= 0.112 \pm 1.96(0.0219) = 0.112 \pm 0.043$$

From this formula, we obtain 0.112, which was our p-hat, plus or minus 0.043, which is the margin of error. When we evaluate the interval, it runs from 0.069 all the way up to 0.155.

$$\text{lower: } 0.112 - 0.043 = 0.069$$
$$\text{upper: } 0.112 + 0.043 = 0.155$$
$$(0.069,\ 0.155)$$
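The arithmetic in Step 2 can be replayed in a few lines as a check. This is a sketch rather than the only way to do it: the variable names are our own, and it carries p-hat at full precision instead of rounding to 0.112 first, which still gives the same interval to three decimal places.

```python
from math import sqrt

successes, n = 23, 206   # patients with headaches, sample size
z_star = 1.96            # critical value for 95% confidence

p_hat = successes / n
q_hat = 1 - p_hat
standard_error = sqrt(p_hat * q_hat / n)
margin = z_star * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"p-hat = {p_hat:.3f}, margin of error = {margin:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```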

Step 3. Interpret what the interval means. We're 95% confident that, if everyone who was taking Obecalp was in the study, the true proportion of all Obecalp users who would experience headaches is somewhere between 6.9% and 15.5%. We don't know exactly where in that range, but the true proportion is probably somewhere in it.

The value of z* depends on the confidence level we choose: a higher confidence level gives a larger z*, and therefore a wider interval.

Summary

When we take a sample, we obtain a sample statistic that is a point estimate for the population parameter. When we create a confidence interval, we are saying that we are a certain percent confident, like 90% confident, or 95% confident (depending on how many standard deviations or standard errors we go out), that the parameter lies within an interval.

This means that that percent of sample statistics in the sampling distribution are within the margin of error of the parameter. For example, 95% of all the x-bars in the sampling distribution of x-bar will be within the margin of error of the true parameter, mu. Equivalently, that percent of confidence intervals will contain the parameter.

If we took samples over and over again, and built a confidence interval each time, 90% or 95% of those confidence intervals would contain the true value of mu, p, or whatever parameter we're trying to estimate.
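This long-run idea can be tried out with a small simulation. Everything in the sketch below is an assumption made for illustration: the "true" proportion of 0.11, the 2000 repeated samples, the random seed, and the function name `coverage_rate`. It only shows the idea of repeated sampling, not a result from this tutorial.

```python
import random
from math import sqrt


def coverage_rate(p: float, n: int, z_star: float, trials: int, seed: int = 0) -> float:
    """Share of simulated confidence intervals that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical true proportion, with the same sample size as the Obecalp study.
rate = coverage_rate(p=0.11, n=206, z_star=1.96, trials=2000)
print(f"share of intervals containing p: {rate:.3f}")
```

The share should land close to 95%, though not exactly, because the normal model is only an approximation and the simulation itself is random.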

We can create a point estimate for a population proportion, which is our sample proportion, and then use that sample proportion to determine the margin of error for a confidence interval. First, we verify that the conditions for inference are met, then we construct and interpret a confidence interval based on the data that we've gathered and the statistics that we've calculated.

Good luck.

Source: THIS WORK ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS.

Terms to Know
Confidence Interval

An interval we are some percent certain (e.g., 90%, 95%, or 99%) will contain the population parameter, given the value of our sample statistic. We base our confidence interval on our point estimate, and the width of the interval is affected by confidence level and sample size.

Critical Value

A value that can be compared to the test statistic to decide the outcome of a hypothesis test.

Margin of Error

An amount by which we believe our sample's mean may deviate from the true mean of the population.

Formulas to Know
Confidence Interval
$$CI = \text{Point Estimate} \pm \text{margin of error}$$
Confidence Interval of Proportions
$$CI = \hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Confidence Interval of Samples

$$CI = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$