Source: Parmanand Jagnandan (Images from MS ClipArt)
So here we'll take a look at how to determine what type of hypothesis testing or inference test you should perform on a given data set. And so first we need to ask ourselves if we're dealing with qualitative or quantitative data. So that's the first thing here, the first level that you need to think about. And when dealing with qualitative data, this stuff over here, we'll perform a test related to population proportions. In this case, we need to consider if we're dealing with one or two or more population proportions.
So when dealing with one population proportion, we'll perform a one-proportion z-test to model our data using a normal distribution. So that's with one population. And when dealing with two or more populations, we're going to use a chi-squared test. And in this case, we will need to determine if we are testing for goodness of fit, homogeneity, or association and independence.
So when dealing with quantitative data, we will perform a test related to population means. And when dealing with one population mean, we'll perform a one sample z-test or a one sample t-test. And that depends on whether or not we know the population standard deviation. If we do, we do the z-test. If we don't, we do the t-test. Now, if we have three or more population means, we're going to use an ANOVA F-test. And if our data has one characteristic, we're going to use a one way ANOVA test. And if it has two or more characteristics, we're going to use a two way ANOVA test.
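The decision steps just described can be sketched as a small Python function. This is only an illustration of the flowchart, not part of the original lesson; the function name `choose_test` and its parameter names are made up for this sketch, and the two-mean branch uses the two-sample t-test that the lesson mentions only in passing.

```python
def choose_test(data_type, groups, sigma_known=False, characteristics=1):
    """Return the name of the test suggested by the lesson's flowchart.

    data_type: "qualitative" or "quantitative"
    groups: number of population proportions or means involved
    sigma_known: True if the population standard deviation is known
    characteristics: number of characteristics studied (used for ANOVA only)
    """
    if data_type == "qualitative":
        if groups == 1:
            return "one-proportion z-test"
        # Two or more: goodness of fit, homogeneity, or independence
        return "chi-squared test"
    if data_type == "quantitative":
        if groups == 1:
            return "one-sample z-test" if sigma_known else "one-sample t-test"
        if groups == 2:
            # Mentioned in the lesson as outside the course
            return "two-sample t-test"
        if characteristics == 1:
            return "one-way ANOVA F-test"
        return "two-way ANOVA F-test"
    raise ValueError("data_type must be 'qualitative' or 'quantitative'")


print(choose_test("qualitative", 1))
print(choose_test("quantitative", 50, characteristics=2))
```

The first question, qualitative or quantitative, is the outermost branch, which mirrors the order in which the lesson asks the questions.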
So let's look at some examples to give you some idea of how this can be used to do actual problem solving. So in this problem, we're going to look at if a claim is true. So suppose you hear that four out of five dentists recommend a certain type of toothpaste. And after taking a sample of 100 dentists, you found that 75 dentists would recommend the toothpaste. So was the claim accurate? And so here, what kind of tests are you going to use to try and figure this out?
Well, the first thing we need to note is that we're dealing with categorical data here. We're looking at dentists and if they recommend something or they don't recommend something. We're not really dealing with calculating means. And we also need to think about, well, how many proportions do we have? And here we only have one proportion that's 75 out of 100 dentists. And therefore, we're going to perform a one proportion z-test.
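To make the dentist example concrete, here is a minimal sketch of the one-proportion z-test arithmetic using only Python's standard library. The claimed proportion of 0.80 and the sample of 75 out of 100 come from the example; the two-sided alternative and the helper name `one_proportion_z_test` are assumptions made for this sketch.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Return the z statistic and two-sided p-value for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, as H0 assumes it is true
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = one_proportion_z_test(successes=75, n=100, p0=0.80)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

With these numbers the z statistic is about -1.25 and the p-value is about 0.21, so at a conventional 5% significance level (an assumption, since the lesson does not state one) this sample would not give strong evidence against the four-out-of-five claim.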
Let's look at another example. So suppose you flip a coin 100 times and recorded the number of heads and tails. In this case, we would expect that there would be 50 heads and 50 tails. But our data showed 30 heads and 70 tails. So how can you tell if the coin that you're flipping is fair? And what tests should we use? So first, we need to consider the type of data that we're dealing with. And so notice here, we have heads and tails to record, which are categorical data because the data just falls into two categories, heads or tails.
So we're also dealing with proportions in regards to heads and tails. So we're dealing with two categories, heads and tails. And therefore, we're going to use a chi-squared test. But what kind of chi-squared test should we be using? Well, since we're comparing observed data to expected data, we're going to be using a chi-squared test for goodness of fit.
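The coin example can be worked through in a few lines of Python. This is a sketch of the goodness-of-fit arithmetic using the observed counts (30 heads, 70 tails) and expected counts (50 and 50) from the example; the variable names are made up here, and the p-value shortcut shown only applies when there is one degree of freedom.

```python
import math

observed = [30, 70]  # heads, tails from the example
expected = [50, 50]  # what a fair coin would give in 100 flips

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one
df = len(observed) - 1

# With df = 1 the chi-squared tail area equals a two-sided normal tail area
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-squared = {chi2:.1f}, df = {df}, p-value = {p_value:.6f}")
```

The statistic works out to 16 with one degree of freedom, and the p-value is well below 0.001, which would be strong evidence that the coin is not fair.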
So let's look at another example. So suppose you want to determine the effectiveness of the flu vaccine in preventing the chance of someone getting the flu. And so you gather data on 500 people where 250 had the flu vaccine and 250 didn't get the flu vaccine. And you also record who got the flu and who did not get it. So what type of tests would you use to determine if the flu vaccine was effective or not?
So in this case, we need to ask ourselves again what kind of data are we dealing with? And we're looking at those that got the flu vaccine and those who did not, as well as the number of people that caught the flu, which is categorical data. So notice here that we're dealing with two populations, those that got the flu vaccine and those who didn't. So therefore, we're going to use a chi-squared test again. So notice we're trying to determine if the flu vaccine was effective or not across the two populations we're considering. And because we're doing it across two populations, we're going to use a chi-squared test for homogeneity.
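Here is a rough sketch of how the homogeneity statistic could be computed for the flu example. The lesson gives only the group sizes (250 vaccinated, 250 unvaccinated), so the flu counts in the table below are hypothetical numbers invented for illustration, and the function name `chi_squared_statistic` is also made up.

```python
def chi_squared_statistic(table):
    """Return the chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both populations share the same proportions
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# HYPOTHETICAL counts: rows are vaccinated / unvaccinated,
# columns are got the flu / did not get the flu
flu_table = [
    [25, 225],
    [50, 200],
]
flu_stat, flu_df = chi_squared_statistic(flu_table)
print(f"chi-squared = {flu_stat:.3f}, df = {flu_df}")
```

Each row is one population, and the expected counts are what we would see if the flu rate were the same in both rows.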
Let's look at another example. So suppose we want to determine if gender affects whether or not someone likes an apple, orange, or banana. So how are we going to test this? Well, we need to ask ourselves what kind of data are we dealing with? In this case, we're dealing with data that can be categorized by names-- apples, oranges, and bananas, which are categorical data. And we also notice that we're dealing with two populations, men and women. Therefore, we're going to use a chi-squared test. And because we're trying to determine how apples, oranges, and bananas are related to each population, we're going to use a chi-squared test for association or independence.
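As an illustration of the independence test, the sketch below builds the expected counts for a two-by-three table of gender against fruit preference. The lesson gives no data for this example, so every count below is hypothetical, and the variable names are made up. The p-value formula used here is a closed form that holds only for two degrees of freedom.

```python
import math

# HYPOTHETICAL counts: columns are apple, orange, banana
fruit_table = {
    "men": [20, 30, 50],
    "women": [30, 30, 40],
}

rows = list(fruit_table.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
total = sum(row_totals)

fruit_stat = 0.0
for i, row in enumerate(rows):
    for j, observed in enumerate(row):
        # Expected count if fruit preference is independent of gender
        expected = row_totals[i] * col_totals[j] / total
        fruit_stat += (observed - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1) = 1 * 2 = 2
fruit_df = (len(rows) - 1) * (len(col_totals) - 1)

# For df = 2 the chi-squared tail area is exp(-statistic / 2)
fruit_p = math.exp(-fruit_stat / 2)

print(f"chi-squared = {fruit_stat:.3f}, df = {fruit_df}, p-value = {fruit_p:.3f}")
```

With these made-up counts the statistic is about 3.11 and the p-value is about 0.21, so this particular table would not show a clear association.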
Now, suppose you're trying to determine if the overall standardized test scores on a given test across different states are equal for high school students trying to enter college. So what kind of test should we use? So notice that we are dealing here with quantitative data, because we're looking at mean test scores. Remember, that's the first thing you should always ask yourself. What kind of data am I dealing with?
And we're also dealing with several population means. In this case, 50 population means, one for each state. So that means we're going to be using an ANOVA F-test. And so when you're dealing with three or more populations, we perform an ANOVA F-test. But if you're just dealing with two population means, we use a special type of Student's t-test, which really isn't part of this course. And so in this case, we're looking at one characteristic of the data, that is, overall test scores. And because we're just looking at one characteristic, we're going to use a one-way ANOVA F-test.
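To show what the one-way ANOVA F statistic actually measures, here is a minimal sketch with three groups instead of fifty. The scores are hypothetical values chosen so the arithmetic is easy to follow, and the function name `one_way_anova_f` is made up for this sketch. It computes only the F statistic and its degrees of freedom, not a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic with its between and within degrees of freedom."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL scores for three states
state_scores = [
    [510, 520, 530],
    [520, 530, 540],
    [550, 560, 570],
]
f_stat, df_between, df_within = one_way_anova_f(state_scores)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the state means differ by more than the spread within each state would suggest. Here F comes out to 13 with 2 and 6 degrees of freedom.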
So now suppose you want to determine how students in different states are performing on the math and English sections of the exam. So how are we going to test this? Well, you need to think about what kind of data am I dealing with? And here we're dealing with mean test scores, which are quantitative data. And we're dealing with multiple populations, in this case, up to 50 population means, because we have one for each state. So again, because we have so many population means, we're going to use an ANOVA F-test. And in this case, we're looking at two characteristics of the data, test scores on the math section and on the English section. And so here, we're going to use a two-way ANOVA F-test.
Now, suppose we're concerned with the test scores of students in a particular state taking a given standardized test. So how are we going to test this? Well, again, what kind of data are we dealing with? So here we're dealing with mean test scores, which are quantitative data. And we're dealing with one population mean-- in this case, Minnesota's population mean. So we're going to use a one sample test. And in this case, we're looking at one characteristic of the data, the overall test score. Therefore, if we don't know the standard deviation of the entire population that took the test, we would use a one sample t-test.
But if we did know the standard deviation of the population that took the test, then we would use a one sample z-test. And so I hope this lesson has helped your understanding of how to choose among the different types of hypothesis or inference tests you're likely to encounter when you're in a statistics course and to better understand when to apply one over the other.
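As a companion to the last example, the sketch below shows that the one sample z and t statistics have the same form and differ only in which standard deviation goes in the denominator. The sample scores, the hypothesized mean of 21, and the population standard deviation of 5 are all hypothetical values, and the function name `one_sample_statistic` is made up here.

```python
import math
from statistics import mean, stdev


def one_sample_statistic(sample, mu0, sigma=None):
    """Return ("z", statistic) if sigma is known, otherwise ("t", statistic)."""
    n = len(sample)
    x_bar = mean(sample)
    if sigma is not None:
        # Population standard deviation known: z statistic
        return "z", (x_bar - mu0) / (sigma / math.sqrt(n))
    # Population standard deviation unknown: estimate it from the sample
    s = stdev(sample)
    return "t", (x_bar - mu0) / (s / math.sqrt(n))


# HYPOTHETICAL sample of scores and hypothesized mean
scores = [20, 22, 24, 26, 28]
t_name, t_stat = one_sample_statistic(scores, mu0=21)
z_name, z_stat = one_sample_statistic(scores, mu0=21, sigma=5)
print(f"{t_name} = {t_stat:.3f}")
print(f"{z_name} = {z_stat:.3f}")
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom, and the z statistic against the standard normal distribution.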