This lesson will explain the conditions that need to be met for a z-test or t-test.
This tutorial will show you how to address and check the conditions for both z-tests and t-tests.
For both z-tests and t-tests, the conditions are the same. However, you may recall that for z-tests, the population standard deviation has to be known, and for t-tests, the population standard deviation is unknown.
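To make that distinction concrete, here is a minimal Python sketch that is not part of the original lesson. It assumes the standard one-sample statistics: z = (x̄ − μ0)/(σ/√n) when the population standard deviation σ is known, and t = (x̄ − μ0)/(s/√n) when σ is unknown and the sample standard deviation s stands in for it. The function name `mean_test_statistic` and the sample values are hypothetical.

```python
import math
import statistics


def mean_test_statistic(sample, mu0, sigma=None):
    """Return (name, value) of the test statistic for H0: population mean == mu0.

    sigma is the population standard deviation, or None if it is unknown.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    if sigma is not None:
        # Population standard deviation known: use the z statistic.
        return "z", (xbar - mu0) / (sigma / math.sqrt(n))
    # Population standard deviation unknown: estimate it with the
    # sample standard deviation s and use the t statistic.
    s = statistics.stdev(sample)
    return "t", (xbar - mu0) / (s / math.sqrt(n))


# Hypothetical values, for illustration only.
example = [2, 4, 6, 8]
print(mean_test_statistic(example, mu0=4, sigma=2))  # z statistic
print(mean_test_statistic(example, mu0=4))           # t statistic
```

The conditions discussed next apply in either branch; only the knowledge of σ decides which statistic is computed.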
When performing a hypothesis test for a population mean, there are three conditions.
1. First, are the data collected in some random way? The purpose is to make sure there's not any bias in the sample. Ideally, you want a simple random sample from the population, or to be able to treat your data as being a simple random sample. Cluster samples are typically okay, as are stratified random samples. The randomness is what matters most.
2. Second, the independence condition. You want to make sure that each observation doesn't affect any other observation. One way to check this is to confirm that the population is large in comparison to the sample: at least 10 times the sample size.
3. Finally, is the sampling distribution approximately normal? The distribution of sample means (the sampling distribution) will be nearly normal in two cases: when the parent distribution is normal, or when the sample has at least 30 observations, so that the central limit theorem applies.
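The three checks can be sketched as a small Python helper. This is an illustration under stated assumptions, not a definitive procedure: it uses the thresholds this lesson gives in its summary (a population at least 10 times the sample size for independence, and at least 30 observations for the central limit theorem), and the function name `check_conditions` and its parameters are hypothetical. Randomness cannot be computed from the numbers themselves, so it is passed in as a judgment about how the data were collected.

```python
def check_conditions(collected_randomly, sample_size, population_size,
                     parent_is_normal=False):
    """Return a dict mapping each condition name to True or False.

    collected_randomly: your judgment of the sampling method (True/False).
    parent_is_normal: True if the problem states, or a graph suggests,
        that the parent distribution is normal.
    """
    return {
        # Condition 1: the data come from some random sampling method.
        "randomness": bool(collected_randomly),
        # Condition 2: population at least 10 times the sample size.
        "independence": population_size >= 10 * sample_size,
        # Condition 3: normal parent distribution, or n >= 30 (CLT).
        "normality": parent_is_normal or sample_size >= 30,
    }


# Hypothetical numbers: a random sample of 20 from a population of 1000.
result = check_conditions(True, sample_size=20, population_size=1000)
print(result)
print("All conditions met:", all(result.values()))
```

With only 20 observations and no information about the parent distribution, the normality entry comes back False, which signals that you would need to graph the data before proceeding.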
Many customers pay attention to the nutritional contents on packaged foods, so it's important that the information be accurate. Here is a list of the calorie contents of some frozen dinners:
The reported calorie content was 240. Before asking whether the data support or refute the idea that the calorie content is in fact 240, check to see if the conditions for inference are met.
A graph of the data is approximately symmetric and mound-shaped, so it is reasonable to assume that the data could have come from a normal distribution.
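If you want to look at the shape of a small data set yourself, one option is a quick text histogram. The Python sketch below is only an illustration: the function name `text_histogram` is hypothetical, it assumes whole-number data, and the values in `placeholder` are made up for the example. They are not the frozen dinner data from this lesson.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text row per bin, such as '230-239 | ***'.

    Assumes integer values and an integer bin width.
    """
    # Map each value to the lower edge of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Walk every bin from the lowest to the highest, including empty ones.
    lows = range(min(counts), max(counts) + bin_width, bin_width)
    return [f"{low}-{low + bin_width - 1} | {'*' * counts[low]}"
            for low in lows]


# Made-up placeholder values, NOT the lesson's data.
placeholder = [222, 231, 234, 238, 241, 243, 245, 247, 252, 256, 263]
for row in text_histogram(placeholder, bin_width=10):
    print(row)
```

You would then judge by eye whether the rows form a single peak that is roughly symmetric, with no outliers.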
So the three conditions have been checked and verified.
Renee wants to know the average weight of women at her health club. She stands at the door and asks the first 20 women who enter if they'll step on the scale. Here are the weights of the women who said yes:
Are the conditions for inference met here?
The data were not collected randomly. Renee stood at the door and asked the first 20 women who entered if they'd step on the scale. Maybe not all 20 actually did; the weights listed may be from the first 20 women who said yes. Ultimately, the sample was a convenience sample, not a random sample, and it probably suffers from voluntary response bias. Women who are heavier might be more self-conscious about stepping on the scale to give Renee their weight, so the sample may be biased toward underestimating the average weight of women in the health club.
You can't do a test of significance. You can't do a confidence interval. There's no rescuing poorly collected data. So you don't need to check the remaining conditions, because inference will not be appropriate for the data, even if the other two conditions were met.
The conditions for running a hypothesis test (a z-test or a t-test) are as follows:
The randomness condition. How were the data collected? They should be collected in some kind of random way. If they were not, you can't proceed.
Independence. Is the population large in comparison to your sample, at least 10 times your sample size?
And normality. Remember, there are three ways to do this. You can reference the central limit theorem if there are at least 30 observations in your sample. Or, there are two ways to verify that the parent distribution is normal. Either the problem will say so, if you get lucky, or you have to actually graph the data and look for a mound-shaped, approximately symmetric, single-peaked distribution of data with no outliers.
And so those are the z-test conditions and t-test conditions. Remember, there is one additional condition for the z-test: you must know the population standard deviation.
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS
Conditions for a t-test: The data must be collected in a random way, each observation must be independent of the others, and the sampling distribution must be normal or approximately normal.
Conditions for a z-test: The data must be collected in a random way, each observation must be independent of the others, the sampling distribution must be normal or approximately normal, and the population standard deviation must be known.