Z-Test for Population Means

Author: Sophia

Video Chapters

( 00:00 - 00:44 ) Introduction

( 00:45 - 01:39 ) Null Hypothesis

( 01:40 - 03:56 ) Alternate Hypothesis

( 03:57 - 05:14 ) Check for Constraints

( 05:15 - 09:16 ) Calculations

( 09:17 - 13:51 ) Drawing a Conclusion

Video Transcription


So let's look at how to use a z-test to determine the validity of a claimed population mean. Here we'll test the claim that the mean amount of sleep adults in Minneapolis get is more than the national average of seven hours per night, assuming a population standard deviation of 1.250 hours.

As you might imagine, this technique isn't limited to sleep. It can be applied in many fields where you want to test the validity of a claimed population mean. In every case, though, you first need to state the null and alternative hypotheses.

So why do we do that? Stating the null hypothesis tells us what we're testing against. Remember that the null hypothesis always assumes the status quo: that things are working as intended, or that there is no difference between the actual population mean and the claimed mean, so any difference between the sample mean and the claimed mean is due to random chance.

So what is our null hypothesis for this problem? I'm going to write H naught (H₀), the symbol we use for the null hypothesis: the actual and claimed population means are the same, so adults in Minneapolis get the same amount of sleep as adults around the country. In other words, the population mean is equal to seven hours, H₀: μ = 7.

Once I've stated the null hypothesis, I need to state the alternative hypothesis. Remember that the alternative hypothesis says that something is going on and the null hypothesis isn't right: the claimed and actual population means are not the same.

In this case, my alternative hypothesis is more specific than just "not the same": it says that adults in Minneapolis actually get more than seven hours of sleep. In other words, the population mean is greater than seven hours, which we write mathematically as Hₐ: μ > 7.

Keep in mind that how you state the alternative hypothesis matters. Because I'm saying the actual population mean is higher than the claimed population mean, I'm performing a one-tailed (right-tailed) test. If I had simply said the population mean was not equal to seven hours, Hₐ: μ ≠ 7, this would be a two-tailed test. You'll see why this matters when we find the critical value.
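To see why the wording of the alternative hypothesis matters, here is a minimal Python sketch (not part of the original lesson) that uses the standard library's `statistics.NormalDist` in place of a printed z-table to find the critical z-values for each kind of test at a significance level alpha. The function name `critical_values` and the labels "greater", "less", and "two-sided" are illustrative choices.

```python
from statistics import NormalDist


def critical_values(alpha: float) -> dict[str, float]:
    """Critical z-values for one-tailed and two-tailed tests at significance level alpha."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return {
        # Right-tailed (H_a: mu > mu_0): reject if z is above this value.
        "greater": std_normal.inv_cdf(1 - alpha),
        # Left-tailed (H_a: mu < mu_0): reject if z is below this value.
        "less": std_normal.inv_cdf(alpha),
        # Two-tailed (H_a: mu != mu_0): alpha is split between both tails,
        # so reject if |z| is above this value.
        "two-sided": std_normal.inv_cdf(1 - alpha / 2),
    }


for tail, z_crit in critical_values(0.05).items():
    print(f"{tail:>9}: {z_crit:.3f}")
```

At alpha = 0.05 the one-tailed cutoffs are about ±1.645, while the two-tailed cutoff is about 1.960, because the same 5% has to be split across two tails.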

A few more things to keep in mind for this problem. First, we are dealing with one population mean: the average amount of sleep adults in Minneapolis get. Second, I'm assuming that the population standard deviation is known, which allows me to use a z-test. If I did not know the population standard deviation, I would have to use a t-test instead.

So where do we go from here? First I need to collect my sample: I'm going to sample 500 people, chosen at random. I also need to make sure the 10% condition is met, meaning the sample is no more than 10% of the population. So there must be at least 5,000 adults in Minneapolis, ten times the number I sampled.
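The 10% condition reduces to a one-line check. In this hypothetical Python sketch the helper names are illustrative, and no actual population figure for Minneapolis is assumed; the example only confirms the 5,000-person threshold from the lesson. Integer arithmetic is used so the comparison is exact.

```python
def ten_percent_condition(sample_size: int, population_size: int) -> bool:
    """True when the sample is at most 10% of the population (n <= 0.10 * N)."""
    # Equivalent to n <= 0.10 * N, written with integers to avoid rounding issues.
    return 10 * sample_size <= population_size


def min_population_for(sample_size: int) -> int:
    """Smallest population size for which the 10% condition holds."""
    return 10 * sample_size


print(min_population_for(500))           # prints 5000
print(ten_percent_condition(500, 5000))  # prints True
```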

I also need to check that the data can be modeled using a normal distribution. Only when these conditions are met can we model the sample data with a normal distribution. So I gathered the data on how much sleep people get, plotted it on a histogram, and noticed that it takes on a roughly bell-shaped curve, which suggests normality.

Remember, too, that this data is quantitative and we know the population standard deviation, so we perform a z-test for a population mean.

So what exactly do we calculate? We need to calculate a z-score for our sample and compare it with the critical z-score for a chosen significance level, alpha. The fact that the data is quantitative and can be modeled with a normal distribution is what lets us do this.

To do that, we use the formula for the z-statistic: the z-score equals the sample mean, x̄ (x bar), minus the population mean, μ, divided by the population standard deviation, σ, over the square root of the sample size, n.
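As a minimal sketch, the formula translates directly into a Python function. The name `z_statistic` and its parameter names are illustrative; the call below plugs in the lesson's sample mean (6.954 hours), hypothesized population mean (7 hours), population standard deviation (1.250 hours), and sample size (500).

```python
import math


def z_statistic(sample_mean: float, population_mean: float,
                population_sd: float, n: int) -> float:
    """One-sample z-statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    # Standard deviation of the distribution of sample means.
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


print(round(z_statistic(6.954, 7, 1.250, 500), 2))  # prints -0.82
```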

To use it, we need some statistics and parameters. The statistics come from our sample: the sample mean, which I computed as a weighted average, is 6.954 hours, and the sample size is n = 500.

The parameters relate to the population: μ is assumed to be 7 hours (the value stated in the null hypothesis), and σ is assumed to be 1.250 hours.

Once we have those values, we calculate the standard deviation of the distribution of sample means. This comes from the central limit theorem, which says that it equals the population standard deviation divided by the square root of the sample size.

If we do that calculation, we get sigma over the square root of n, which is 1.250 over the square root of 500. I'm going to leave that as is for now, because you'll see it makes the final calculation easier.
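For reference, here is the numerical value of that standard deviation of sample means, using the lesson's values (the lesson itself keeps it unsimplified until the last step):

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.250}{\sqrt{500}} \approx \frac{1.250}{22.361} \approx 0.0559
```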

Next we calculate the z-score. The formula was: the z-score equals the sample mean minus the population mean, divided by the population standard deviation over the square root of n.

Notice that the denominator is exactly the standard deviation of sample means we just found, so I can plug the values straight in: the sample mean of 6.954 hours, minus the population mean of 7, all over 1.250 over the square root of 500.

Doing the subtraction on top gives negative 0.046. Because I'm dividing by a fraction, I can multiply the numerator by the reciprocal of that fraction (flip it and multiply), which gives negative 0.046 times the square root of 500, all over 1.250.

Solving this, I get a z-score of approximately negative 0.82.
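The same arithmetic, written out as a worked equation with the lesson's values:

```latex
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{6.954 - 7}{1.250 / \sqrt{500}} \\
  &= \frac{-0.046 \cdot \sqrt{500}}{1.250} \approx \frac{-0.046 \times 22.361}{1.250} \approx \frac{-1.0286}{1.250} \approx -0.82
\end{aligned}
```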

Now that we have our z-score, what do we do with it? First we need to choose a significance level, alpha, which determines the critical value. In this case, I'm going to choose 0.05. You can choose the alpha value as you like, but 0.05 and 0.01 are the usual choices.

So what does this mean? Remember that we're doing a one-tailed test, so the entire 0.05 goes into one tail of the bell-shaped curve. Because our alternative hypothesis says the mean is greater than 7, that is the right tail. The z-table I'm using lists left-tail areas for negative z-values, so I'll look up 0.05 there and then use the symmetry of the curve.

Looking for 0.05 in the body of the table, we find it between two entries in the negative 1.6 row, between the 0.04 and 0.05 columns, that is, between negative 1.64 and negative 1.65. So we split the difference and say negative 1.645. By symmetry, an area of 0.05 in the right tail corresponds to positive 1.645, and that is our critical value: we would reject the null hypothesis only if our z-score were greater than 1.645.

Now, one thing to keep in mind here. Our alternative hypothesis said the population mean was greater than 7, so if it were true we would expect a positive z-score on the right side of the curve. But our sample mean of 6.954 hours was actually less than 7, so our z-score landed on the left side.

That tells us the data don't support our alternative hypothesis, but it doesn't replace the formal test. We still use the data we gathered to decide whether to reject or fail to reject the null hypothesis.

So what do we do? We compare our z-score with the critical z-value. If our z-score were greater than 1.645, we would reject the null hypothesis. But our z-score is negative 0.82, which is well below the critical value; it isn't even on the same side of the curve. We can also see this by finding the p-value that goes with this z-score.

If we look up negative 0.82, we go to the negative 0.8 row and the 0.02 column, which gives 0.2061. That is the area to the left of negative 0.82. For our right-tailed test, the p-value is the area to the right of negative 0.82, which is 1 minus 0.2061, or 0.7939 (79.39%), compared with our alpha of 0.05 (5%). Because the p-value is larger than alpha, we fail to reject the null hypothesis: the data we have don't show that adults in Minneapolis get significantly more than seven hours of sleep.

So notice that there are two equivalent ways to reach this decision. Our z-score did not exceed the critical z-value, so we fail to reject. Our p-value is greater than the alpha value, so we fail to reject. Try to keep those two rules straight.
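The table lookups and the decision rule can be sketched with Python's standard library, using `statistics.NormalDist` as a stand-in for the printed z-table. This is a minimal illustration rather than the lesson's own method; the function name `z_test_decision`, its "greater"/"less"/"two-sided" labels, and the convention of rejecting when the p-value is at most alpha are assumptions of this sketch. It prints the left-tail area that the table gives for z = -0.82, then the p-value for the "greater than" alternative stated in the lesson.

```python
from statistics import NormalDist


def z_test_decision(z: float, alpha: float,
                    alternative: str = "greater") -> tuple[float, bool]:
    """Return (p_value, reject_null) for a one-sample z-statistic."""
    std_normal = NormalDist()  # standard normal, used in place of a z-table
    if alternative == "greater":      # H_a: mu > mu_0, area in the right tail
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H_a: mu < mu_0, area in the left tail
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H_a: mu != mu_0, area in both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Assumed convention: reject H_0 when the p-value is at most alpha.
    return p_value, p_value <= alpha


z = -0.82
print(f"left-tail area: {NormalDist().cdf(z):.4f}")  # prints left-tail area: 0.2061
p, reject = z_test_decision(z, alpha=0.05, alternative="greater")
print(f"p-value: {p:.4f}, reject H0: {reject}")      # prints p-value: 0.7939, reject H0: False
```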

I hope this lesson has helped your understanding of how to use a z-test to determine the validity of a claimed population mean.


Z-Tables

Terms to Know

Z-Test for Population Means: A hypothesis test that compares a hypothesized mean from the null hypothesis to a sample mean, when the population standard deviation is known.

Formulas to Know

Z-Statistic for Population Means: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$