Source: TABLES CREATED BY JONATHAN OSTERS; T-TEST; CREATIVE COMMONS: MATHS.DUR.AC.UK/STATS/COURSES/TABLES/T.PS
In this tutorial, you're going to learn about confidence intervals, specifically ones that use the t-distribution. That means these are confidence intervals for population means. So let's take a look at an example.
A lot of the time, consumers pay attention to the nutritional content listed on packaged food, so it's important that the label accurately reflects what the package actually contains. A random sample of 12 frozen dinners was selected, and the calorie content of each one was determined. The stated calorie content was 240. One of the boxes actually contained 255 calories' worth of food, whereas another contained only 225 calories' worth.
Now what I want to do is construct a 90% confidence interval for the true mean number of calories. That is, I want an interval such that I'm 90% confident the true mean calorie content of all of these packaged frozen dinners lies within it. Constructing a confidence interval is a lot like doing a hypothesis test, and many of the same requirements apply.
First, we verify that the conditions for inference are, in fact, met. Then we calculate the confidence interval and, finally, interpret it in the context of the problem. So let's look at the conditions for this problem. First, the randomness condition: how were the data collected? The problem says it was a random sample. Next, the independence condition: is the population of all frozen dinners at least 10 times the size of our sample? That's reasonable to believe; we just need to assume there are at least 120 dinners in this company's entire frozen dinner line.
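The three checks above can be written down as a small screening function. This is a minimal sketch: `check_conditions` and its parameters are hypothetical names, and `population_size` is the assumed lower bound of 120 dinners, not a known count. Normality for a small sample still has to be judged from a graph, so the function only reports whether the n ≥ 30 shortcut applies.

```python
def check_conditions(random_sample: bool, n: int, population_size: int) -> dict:
    """Rough screen of the conditions for a one-sample t-interval.

    population_size is an assumed lower bound on the population, not a count.
    """
    return {
        # Randomness: the data must come from a random sample.
        "random": random_sample,
        # Independence (10% condition): population at least 10 times the sample.
        "independent": population_size >= 10 * n,
        # Normality: n >= 30 lets the central limit theorem carry the argument;
        # otherwise the data must be graphed to judge whether a normal parent
        # distribution is plausible, so this is only a flag, not a verdict.
        "clt_applies": n >= 30,
    }


# Frozen-dinner example: random sample of 12, assume at least 120 dinners.
print(check_conditions(random_sample=True, n=12, population_size=120))
```

Here `clt_applies` comes out `False`, which is why the normality condition needs the extra graphical check.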
And finally, the normality condition. This one's a little tricky. Our sample size isn't 30 or larger, so we can't rely on the central limit theorem here. Is the parent distribution normal? We don't know that either, so we need to determine whether it's plausible. We can do that by graphing the data we have. When we do, the sample data are single-peaked and approximately symmetric, so it's plausible that the parent population distribution is normal, and we can proceed under the assumption of normality. We weren't able to verify it 100%, but it's reasonable for the purposes of this problem.
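A rough, text-only stand-in for that graph is a sideways histogram, which is enough to eyeball whether the data are single-peaked and roughly symmetric. This is a hypothetical helper, not a formal normality test. The transcript doesn't list the 12 calorie values, so the sketch only defines the tool.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return a sideways histogram (one row per bin) for eyeballing shape."""
    # Map each value to the lower edge of the bin it falls into.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lo, hi = min(bins), max(bins)
    lines = []
    edge = lo
    # Walk every bin edge, including empty bins, so gaps stay visible.
    while edge <= hi:
        lines.append(f"{edge:>6} | {'*' * bins.get(edge, 0)}")
        edge += bin_width
    return "\n".join(lines)
```

With your own list of calorie values, `print(text_histogram(calories, 10))` shows one row per 10-calorie bin. A single, roughly symmetric cluster of stars is consistent with a normal parent distribution, though it can't prove one.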
Next, we calculate the confidence interval. Normally, we would take the sample mean plus or minus a critical value, z star, times the standard error, sigma over the square root of n, because we're working with a sampling distribution. The only problem is that this formula contains sigma, and we don't know the population standard deviation. So we replace it with a formula that uses the sample standard deviation, s. But because we're using s as a stand-in for sigma, we need to use the t-distribution instead of the normal distribution.
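Side by side, the two interval formulas described here are as follows. The first applies when sigma is known. The second swaps in s and a t critical value with n − 1 degrees of freedom:

```latex
% sigma known: z-interval
\bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}}

% sigma unknown, estimated by s: t-interval with n - 1 degrees of freedom
\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}
```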
We have this information from our sample: there were 12 dinners, so n is 12; the sample mean, x bar, is the mean calorie content of those 12 dinners, 244.33 calories; and s is the sample standard deviation of those same 12 values.
What we still need to figure out is the value of t star. The model shown at the top of the t-table is a t-distribution, and t star marks the cutoffs that capture the middle 90% of that distribution. One way to find the right column is through the upper-tail probability: if the upper tail has probability 0.05, then by symmetry the lower tail also has probability 0.05, which leaves 90% in the middle. Alternatively, the bottom row of the table is labeled confidence level C and lists 50%, 60%, 70%, 80%, 90%, and so on. Either justification is reason enough to use the column with upper-tail probability 0.05, which corresponds to 90% confidence.
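The table lookup can also be reproduced numerically. This is a minimal sketch, not how the table was built: it integrates the t density with Simpson's rule and then bisects for the t star whose upper-tail area is (1 − C)/2. It uses only the standard library. The helper names are hypothetical. If SciPy is available, `scipy.stats.t.ppf(0.95, df=11)` gives the same value directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence: float, df: int) -> float:
    """t* leaving (1 - confidence) / 2 in the upper tail, found by bisection."""
    upper_tail = (1 - confidence) / 2  # e.g. 0.05 for 90% confidence
    target = 1 - upper_tail            # cumulative area to the left of t*
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.90, 11), 3))  # 1.796
```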
Next, we need to know which number in this column to use, and that depends on the degrees of freedom. The degrees of freedom are n minus 1, and we had 12 dinners in our sample, so this problem has 11 degrees of freedom. Reading across the 11 degrees of freedom row to the 90% confidence column, we obtain a t star of 1.796.
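As a worked equation, the lookup amounts to the following. The subscript 0.05 denotes the upper-tail probability of the column used:

```latex
\mathrm{df} = n - 1 = 12 - 1 = 11, \qquad
t^{*} = t_{11,\;0.05} = 1.796
```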
Now we have all the information we need to construct the confidence interval: x bar plus or minus the t critical value times the sample standard deviation divided by the square root of the sample size. Doing all of that multiplying and dividing gives 244.33 plus or minus 6.65. Subtracting and then adding the margin of error gives 237.68 for the lower bound and 250.98 for the upper bound.
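The construction step is a one-line formula. The sketch below is a hypothetical helper that applies it. The transcript never states s, so the example backs s out of the stated margin of error (about 12.83) purely to check the subtraction and addition. It is not an independent estimate from the data.

```python
import math


def t_interval(xbar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """Return (lower, upper) for xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)  # margin of error
    return xbar - margin, xbar + margin


# s is not given in the transcript; back it out of the stated margin of
# error (6.65) only to check the arithmetic: s = ME * sqrt(n) / t*.
n, xbar, t_star = 12, 244.33, 1.796
s = 6.65 * math.sqrt(n) / t_star  # roughly 12.83
lower, upper = t_interval(xbar, s, n, t_star)
print(f"({lower:.2f}, {upper:.2f})")  # (237.68, 250.98)
print(lower <= 240 <= upper)          # True: the stated 240 lies inside
```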
So what does this confidence interval actually mean? How can we interpret it? We're 90% confident that the true mean calorie content of all of these frozen dinners is between about 237.7 and 251.0 calories. The 240 calories stated on the label at the beginning of the problem falls inside this interval, so that stated value is, in fact, plausible.
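"90% confident" describes the method rather than any single interval: in repeated random samples, about 90% of intervals built this way contain the true mean. The simulation below illustrates that. It assumes a hypothetical normal population with mean 240 and standard deviation 12, which are made-up parameters, not estimates from the dinner data. It draws many samples of size 12 and counts how often the t-interval with t star = 1.796 captures the true mean.

```python
import math
import random
import statistics


def coverage(mu: float, sigma: float, n: int, t_star: float,
             reps: int, seed: int = 1) -> float:
    """Fraction of simulated t-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample SD (n - 1 divisor)
        margin = t_star * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Hypothetical population: mean 240, SD 12 (assumed, not from the tutorial).
print(coverage(mu=240, sigma=12, n=12, t_star=1.796, reps=2000))
```

The printed fraction should land close to 0.90. Replacing t star with the normal value 1.645 would give noticeably lower coverage, which is why s calls for the t-distribution.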
And so to recap, we can create a point estimate for the population mean using x bar and then determine the margin of error, which is the t star times s over the square root of n piece of the confidence interval. First, we verify that the conditions are met. Then we construct and interpret the confidence interval. So we talked about confidence intervals specifically for means, using the t-distribution. Good luck, and we'll see you next time.
Confidence Interval: An interval that we are some percent certain (e.g., 90%, 95%, or 99%) will contain the population parameter, given the value of our sample statistic.
t-Distribution: A family of distributions similar to the standard normal distribution, except that they have fatter tails, due to the increased variability that comes from using the sample standard deviation instead of the population standard deviation in the formula for the test statistic.