Source: Parmanand Jagnandan (Images from MS ClipArt)
So you go and drink a nice, cool glass of water. There's nothing really like it, right? But unfortunately, in some places closer to home than you might realize, there are people who are not able to access clean water to drink. So in a home in a small town in the US, the chemical atrazine, a herbicide used in growing corn, was found in the drinking water in larger concentrations than expected.
Now worried, the scientists testing the water decided to test the water of 20 random homes in the town to see if it was just an issue with that one home or if they should warn the town about the water quality. Now, the scientists knew that the safe concentration should be at most three parts per billion, which is what the national average is and what the EPA recommends.
Now, to give you an idea of what a part per billion is, imagine you had a semi truck that had 13,000 gallons of fuel. And you dropped just one drop of ink into all that fuel. That's about one part per billion. So it might seem like it's completely insignificant, but it actually has drastic effects. Now, scientists wanted to determine if the concentration of atrazine in the town's drinking water is higher than the national average of three parts per billion.
So how can we go about performing this study? Here we can use hypothesis testing. And we can also recognize that we're dealing with quantitative data, because we're measuring concentrations and we have a value for the population's mean. In this case, mu is equal to three parts per billion. And so the next thing we want to do when we're doing a hypothesis test is state our null hypothesis. In this case, we're saying H-naught is that mu is equal to three parts per billion.
Now, after stating the null hypothesis, we need to state the alternate hypothesis. And in this case, we're going to say that the alternate hypothesis, or at least I am going to say, is that the actual average is greater than three parts per billion. So notice here, we're just testing if the population mean is greater than three parts per billion. Therefore, we're performing a one-tailed test.
For this test, we're going to use an alpha value of 0.01. You can choose that value as you like, and 0.05 is another common choice. The reason I'm choosing 0.01 in this case is that with a smaller alpha, I'll only end up rejecting the null hypothesis if I have really strong evidence to support my alternative hypothesis.
Now, remember that when performing these types of tests, you need to confirm that a few things are met. So you need to make sure that your sample is being done at random. You can't just sample that house and the neighbor's house and then just a couple houses in the one small little cluster. So that's not a completely random sampling. You need to sample homes all over the town.
Next, we need to make sure that our sampling satisfies the 10% condition, which says the sample should be no more than 10% of the population. Now, here we're sampling 20 homes. And we know that there are at least 200 homes in the town, so 20 homes is at most 10% of them. So that's met. And finally, we can make a histogram of the data to confirm that our data is approximately normal. And so once we can confirm our conditions are met, we can then model our sample using a t-distribution.
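As a quick check of the 10% condition, here is a minimal Python sketch. The function name is made up for illustration, and the town size of 200 is just the lower bound mentioned in the lesson, not a measured figure.

```python
def meets_ten_percent_condition(sample_size: int, population_size: int) -> bool:
    """Return True when the sample is at most 10% of the population."""
    # Multiply instead of dividing so the comparison stays in whole numbers.
    return sample_size * 10 <= population_size


# 20 sampled homes, with the town assumed to have at least 200 homes.
print(meets_ten_percent_condition(20, 200))  # True: 20 is exactly 10% of 200
```

If the town has more than 200 homes, the sample is an even smaller share of the population, so the condition still holds.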
So once I've gathered some data from 20 different households, I plot them on the histogram and I get this. So what does this tell me? Well, if I look at it, I notice it's taking on somewhat of a bell shaped curve, which is indicating normal behavior. Now again, we're going to use a t-distribution to help us approximate this normal behavior.
So why do we use a t-distribution? Well, in this case, again we're dealing with quantitative data. And t-distributions can help us approximate normal distributions when we don't have very many data points. Remember, we only have 20 data points here, or 19 degrees of freedom. And the other thing is we're dealing with one population mean, and that is the concentration of atrazine. So that's another reason why we're using a t-distribution. And finally, because we don't know the standard deviation of the population, we're going to be performing a one-sample t-test.
So now it's time to actually make some calculations. But what exactly are we going to calculate? Well, we need to calculate the t-value to compare with the critical t-value at our chosen alpha level. And so to do that, we'll use the formula for t. The subscript df on the t represents the degrees of freedom, which is equal to the number of samples minus 1. This is a reminder for yourself. And t is equal to the sample mean minus the population mean, divided by s, which is the sample standard deviation, over the square root of the number of samples that you're taking.
So notice here we need to list or calculate a number of statistics and parameters. And our parameter, remember, is related to the population. And the only thing we know about the population is that the population mean is equal to three parts per billion. Now, the statistics, those are related to the sample. Now, from the data that I showed you earlier, and you can go back and calculate them if you like, the sample mean was 5.3 parts per billion. The sample standard deviation was 2.557. The number of samples was 20. And the degrees of freedom, df here, was 19, because it's the number of samples minus one.
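Written out as an equation, the formula described here looks like the following. The symbols are the usual textbook ones and are an assumption about what was on the slide: x-bar for the sample mean, mu for the population mean, s for the sample standard deviation, and n for the number of samples.

```latex
t_{df} = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad df = n - 1
```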
Now, one thing to note here in this formula is that the sample standard deviation divided by the square root of the number of samples is our standard error for our sample mean. And we're using s instead of sigma, because again, we don't know the population standard deviation. So making our calculations now, we find-- I'm just rewriting these values here-- that this is equal to 5.3 minus 3, all over 2.557 over the square root of 20.
Now simplifying this, 5.3 minus 3 is 2.3. That's going to be times the square root of 20. I'm just moving the square root on the top, because that's what you do when you're dividing by a fraction. You just flip the fraction and multiply by the top. And then that's divided by 2.557. Now simplifying this, I get 10.286 divided by 2.557, which results in a value of 4.023. And so that's my t-value then.
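The same arithmetic can be checked with a few lines of Python. This is only a sketch using the summary statistics quoted in the lesson, and the function name is made up for illustration.

```python
import math


def one_sample_t(sample_mean: float, pop_mean: float, sample_sd: float, n: int) -> float:
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)  # standard error of the sample mean
    return (sample_mean - pop_mean) / standard_error


# Values from the lesson: mean 5.3, mu 3, s 2.557, n 20.
t_value = one_sample_t(5.3, 3.0, 2.557, 20)
print(round(t_value, 3))  # 4.023
```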
So what do I do with this value? Well, using my t-distribution table, I know that my alpha value was 0.01. And I specifically said that mu was greater than three parts per billion, which means I'm doing a one-sided test. So I'm going to look at the column for a cumulative probability of 99%, which leaves 0.01 in the upper tail, because that's what my 0.01 relates to for a one-sided test. If I was doing a two-sided test, I would use 99.5%, which leaves 0.005 in each tail, and the value there would be 2.861. And I'm looking at 19 degrees of freedom. So that's here. So I just move over, and then I get my value. So my t critical value that I'm comparing to is going to be 2.539.
Now, my value that I calculated was 4.023. And we see that 4.023 is clearly greater than 2.539. Now what does that mean? Well, this means that we reject the null hypothesis. We have strong evidence that the actual concentration of atrazine in the town's drinking water is higher than the national average. So then I would be able to warn the town that you shouldn't be drinking this water because it's bad for you. And that's basically how you perform a test for the population mean when you don't know the population standard deviation. So I hope this lesson was helpful.
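The lesson makes the decision by comparing the t-value against a table. An equivalent check, not shown in the lesson, is to compute the one-tailed p-value and compare it with alpha. The Python sketch below does that with only the standard library, by numerically integrating the t-distribution's density with Simpson's rule. The function names and the step count are assumptions made for illustration.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t_value: float, df: int, steps: int = 2000) -> float:
    """One-tailed p-value P(T >= t_value) for t_value >= 0 (steps must be even)."""
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        # Simpson's rule weights: 4 on odd points, 2 on even interior points.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric, so the area above 0 is 0.5.
    return 0.5 - area_0_to_t


alpha = 0.01
p_value = upper_tail_p(4.023, 19)
reject_null = p_value < alpha
print(reject_null)  # True
```

A p-value below alpha leads to the same decision as a t-value above the one-tailed critical value for that alpha.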