Source: All graphs created by Dan Laub. Image of z-tables, PD, https://upload.wikimedia.org/wikipedia/commons/2/25/The_Normal_Distribution.svg; Image of sports car, PD, http://bit.ly/1J27jRx; Image of house, PD, http://bit.ly/1NT9Km8
[MUSIC PLAYING] Hi, Dan Laub here. And in this lesson, we're going to discuss using z-scores to find a probability. But before we get started, let's discuss the objective for this lesson. By the end of this lesson, we want to be able to use a z-score probability table to determine the likelihood of a range of values of a normally distributed variable occurring. So let's get started.
The graph you see here is that of a standard, normally distributed variable, otherwise known as a z-distribution. z-scores are important because they allow us to know the position of a value in a normal distribution. When we know the z-score associated with the value of a variable, we are able to determine the likelihood that an event will take place. Simply having a solid understanding of z-scores and how they are associated with probability allows one to easily determine the likelihood of an event occurring.
In fact, approximately 68% of values in a normally distributed variable fall between a z-score of negative 1 and 1. 95% of values fall between a z-score of negative 2 and 2. And 99.7% of values fall between a z-score of negative 3 and 3. If we were to think of likely values in a normal distribution, they would fall in the 95% between a z-score of negative 2 and 2.
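As a quick check on those three percentages, here is a minimal Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the helper name `share_within` is just an illustrative choice, not something from the lesson.

```python
from statistics import NormalDist

# The z-distribution: a normal distribution with mean 0 and standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values whose z-score lies between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"between z = -{k} and z = {k}: {share_within(k):.4f}")
```

The printed proportions should round to roughly 68%, 95%, and 99.7%.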
In fact, if a value has a large positive z-score, we would consider it to be unusually large, and not very likely to occur. Along the same line of thought, if a value has a large negative z-score, we would consider it to be unusually small, and not very likely to occur as well.
Suppose we were to take a look at the mean area of all homes in the United States in terms of square feet. The z-score of a value is the value minus the mean, divided by the standard deviation. If the mean area of a home in the United States is 2,400 square feet, with a standard deviation of 400 square feet, then a home that has 1,450 square feet would have a z-score equal to negative 2.375.
If there was a home with an area of 2,300 square feet, the z-score would be equal to negative 0.25 using the same z-score formula. If there was a home with an area 4,080 square feet, the z-score would be equal to 4.2 based upon the same calculations.
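The three home calculations can be reproduced with a short Python sketch. The function name `z_score` and the constant names are illustrative assumptions; the numbers are the ones used in the lesson.

```python
def z_score(value, mean, std_dev):
    """z-score: how many standard deviations a value lies from the mean."""
    return (value - mean) / std_dev

MEAN_AREA = 2400      # mean home area in square feet (from the lesson)
STD_DEV_AREA = 400    # standard deviation in square feet (from the lesson)

for area in (1450, 2300, 4080):
    print(f"{area} sq ft -> z = {z_score(area, MEAN_AREA, STD_DEV_AREA)}")
```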
Based on these z-score values, we can determine whether or not a value is likely to occur. For instance, the 2,300 square foot home is likely to occur in a sample of homes drawn from a normally distributed population. But the 1,450 square foot home would not be very likely. And the 4,080 square foot home would be very unlikely, as you can see here on the distribution in front of you.
Since the z-distribution is a standard normal distribution, z-scores and the probabilities that are associated with a specific z-score can be represented on standard tables. This being the case, one can use a table to find the areas under a z-distribution curve. When viewing a normal distribution graph, the area under the curve is associated with the probability of a value occurring. So knowing the area corresponding to a z-score tells us about the probability of an event taking place.
As you can see, the positive z-table begins at the center of the z-distribution, where z is equal to 0 and represents the mean of the distribution. The negative z-table, on the other hand, starts out at the left of the distribution, and shows increasing z values up to the point of z being equal to 0. The values for both z-tables correspond with the same information that is illustrated in a z-distribution graph, but actually tell us more about the specific likelihood of an event occurring.
If we are provided with a negative z-score, we use the negative z-table to find the area under the curve to the left of the given z-score. This area under the curve is equal to the proportion of all z-scores in this range. When we have a positive z-score, the positive z-table lets us find the area under the curve to the left of that given z-score. This area under the curve is equal to the proportion of all z-scores in this range.
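To make the table lookup concrete, here is a rough Python sketch that rebuilds a few rows of a z-table holding the area to the left of each z-score. The layout (row label to one decimal place, column for the second decimal place) and the name `table_entry` are assumptions for illustration; a printed table may be arranged differently.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def table_entry(z):
    """Area under the curve to the left of z, rounded to four places like a printed table."""
    return round(std_normal.cdf(z), 4)

# Column headers are the second decimal place of the z-score.
second_decimals = [c / 100 for c in range(10)]
print("z     " + "  ".join(f"{c:.2f}  " for c in second_decimals))

# A few rows around z = 1.8, chosen only as an example.
for row in (1.6, 1.7, 1.8):
    entries = "  ".join(f"{table_entry(row + c):.4f}" for c in second_decimals)
    print(f"{row:.1f}   {entries}")
```

Calling `table_entry` with a negative z-score gives the matching entry from the negative z-table in the same way.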
So for example, let's use ACT scores. So the American College Test is taken by high school students all across the United States as a means of determining their aptitude for attending college. And let's suppose in this instance that the mean score for the population is 21, and the standard deviation in this case is 5. Well, how will we determine the probability that a score would fall within a particular range?
Let's consider the probability that the score would be higher than 30. How do we determine what that value is? Well, the first thing we do is use the z-score formula, as you see illustrated here. And we figure out what the z-score is. In this case, it is the difference between 30 and 21, which is 9, divided by the standard deviation of 5, which gives us a z-score of 1.8.
And as you can see here on the z-table, that gives us a probability value of 0.9641. Now, what does that tell us? It tells us that the area to the left of that point is 0.9641, or roughly 96% of the area under the curve falls to the left of z equal to 1.8. In order to determine the probability that the score is greater than 30, we're interested in the difference between 1 and 0.9641, which gives us a probability of 0.0359.
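The same steps for a score above 30 can be sketched in Python, with `NormalDist().cdf` standing in for the z-table lookup. The variable names are illustrative assumptions; the mean of 21 and standard deviation of 5 come from the lesson.

```python
from statistics import NormalDist

ACT_MEAN = 21
ACT_STD_DEV = 5

# Step 1: convert the score of 30 to a z-score.
z = (30 - ACT_MEAN) / ACT_STD_DEV

# Step 2: area to the left of that z-score (what the z-table reports).
area_left = NormalDist().cdf(z)

# Step 3: the area to the right is whatever is left over out of 1.
p_above_30 = 1 - area_left

print(f"z = {z:.2f}")
print(f"area to the left = {area_left:.4f}")
print(f"P(score > 30) = {p_above_30:.4f}")
```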
What if we're interested in the probability that a score falls between 23 and 27? In this instance, we need to calculate two different z-scores, one for 23 and one for 27. Now we locate both of these z-scores on the z-table, as you see done here. And we simply subtract the lower value from the greater value. That difference is the probability that a randomly drawn score would fall between 23 and 27.
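The lesson does not read out the answer for this range, so the sketch below computes it under the same assumptions (mean 21, standard deviation 5). The helper `probability_between` is a hypothetical name, and `cdf` again plays the role of the z-table.

```python
from statistics import NormalDist

std_normal = NormalDist()

def probability_between(low, high, mean, std_dev):
    """P(low < X < high) for a normal variable, via two left-tail areas."""
    z_low = (low - mean) / std_dev
    z_high = (high - mean) / std_dev
    # Larger left-tail area minus smaller left-tail area.
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

p = probability_between(23, 27, mean=21, std_dev=5)
print(f"P(23 < score < 27) = {p:.4f}")
```

Here the two z-scores work out to 0.4 and 1.2, and the result should be close to 0.23.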
What if we're looking at some areas to the left of the mean? So suppose we're interested in the probability that a score falls between 15 and 20. We run the same type of calculations. And for this case, the z-score for 20 would be equal to negative 0.2. The z-score for 15 would be equal to negative 1.2. And once again, we figure out the probabilities associated with each value, and we simply subtract the lower value from the greater value, which in this case works out to be a probability of 0.3056.
And for the fourth example, let's look at the probability that a score will be less than 20. Here we simply need to find one z-score, the z-score for 20, which you see here, and which we've used in previous calculations, is equal to negative 0.20. The probability for this value is 0.4207, which is the probability that a score would be below 20 based upon a normal distribution.
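The two left-of-mean examples (between 15 and 20, and below 20) can be sketched together in Python. One caveat: a printed z-table rounds each area to four places before subtracting, so an unrounded calculation can differ from the table answer in the last digit. The sketch shows both; variable names are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()
MEAN, STD_DEV = 21, 5

z_20 = (20 - MEAN) / STD_DEV   # negative 0.2
z_15 = (15 - MEAN) / STD_DEV   # negative 1.2

area_20 = std_normal.cdf(z_20)  # area to the left of a score of 20
area_15 = std_normal.cdf(z_15)  # area to the left of a score of 15

# Below 20: the left-tail area is already the answer.
p_below_20 = area_20

# Between 15 and 20: unrounded, and table-style (round each area first).
p_between = area_20 - area_15
p_between_table = round(area_20, 4) - round(area_15, 4)

print(f"P(score < 20) = {p_below_20:.4f}")
print(f"P(15 < score < 20), unrounded areas = {p_between:.4f}")
print(f"P(15 < score < 20), table-style = {p_between_table:.4f}")
```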
What if we're interested in determining the value of a variable based upon a pre-chosen probability? So hypothetically speaking, let's look at the miles driven per year of the American driver. Let's say in the population that we're looking at, the mean miles driven per year is 16,550, with a population standard deviation of 2,100.
Suppose we were interested in finding the number of miles driven per year in which there were only 1.5% of all observations below that value. How do we determine that figure? Well, we establish our probability of 1.5%. And we divide that by 100 to arrive at a probability equal to 0.015, which is the value we will look up in our z-table.
So as you can see here, in order to find that out, we scan through the numbers and we find the value that's equal to 0.015, which in this case, is going to be in the negative side of the z-table. And as you see here, we look at the columns and the rows, and we find the value of 0.015 happens to correspond to a z-value of negative 2.17. And so if we solve for that question mark, we're going to wind up with a value of 11,993 miles per year, which means that only 1.5% of drivers drive fewer than that number of miles per year.
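Working backward from a probability to a value can be sketched in Python as well. In this sketch `NormalDist().inv_cdf` stands in for scanning the body of the z-table, and the z-score is rounded to two places to mimic the table's resolution; that rounding step and the variable names are assumptions for illustration.

```python
from statistics import NormalDist

MEAN_MILES = 16_550
STD_DEV_MILES = 2_100

# Reverse lookup: which z-score has 1.5% of the area to its left?
z = round(NormalDist().inv_cdf(0.015), 2)

# Rearranging z = (value - mean) / std_dev gives value = mean + z * std_dev.
miles = MEAN_MILES + z * STD_DEV_MILES

print(f"z = {z}")
print(f"value = {round(miles)} miles per year")
```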
So suppose that we have a different probability. In this case, we're looking for the value that 69.5% of all observations fall below. And so once again, we divide this value by 100 to arrive at a probability of 0.695. And when we scan the z-table to find that 0.695 value, it turns out that the z-value is equal to 0.51.
And so to solve for our value, we basically look at whatever the value is minus 16,550. We take the difference and divide it by 2,100 to get 0.51. And so if we do a little cross multiplication, we arrive at a value of 17,621 miles per year, in which case 69.5% of drivers would drive less than that.
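As an alternative to going through the z-score by hand, a normal distribution with the lesson's mean and standard deviation can be asked for the value directly. This is a sketch, not the method shown in the lesson, and `miles_at_percentile` is a hypothetical helper name. Because it does not round the z-score to two places, its answers can differ from the table-based ones by a fraction of a mile.

```python
from statistics import NormalDist

# Miles driven per year, using the mean and standard deviation from the lesson.
miles_per_year = NormalDist(mu=16_550, sigma=2_100)

def miles_at_percentile(percent):
    """Value below which the given percentage of observations fall."""
    return miles_per_year.inv_cdf(percent / 100)

for percent in (69.5, 1.5):
    print(f"{percent}% of drivers drive fewer than "
          f"{miles_at_percentile(percent):.0f} miles per year")
```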
So let's go back to our objective just to make sure we covered what we said we would. We were going to learn how to use a z-score probability table to determine the likelihood of a range of values of a normally distributed variable occurring. Which we did. We used the example of the ACT exam. And we used four different examples to determine how we would calculate the likelihood of a range of values occurring from a normally distributed population.
We also reversed it. We looked at the probability first, and then found the z-score for that probability and used it to determine the corresponding value of the variable. And we did that with the cars in terms of how many miles per year Americans drove. So again, my name is Dan Laub. And hopefully, you got some value from this lesson.
Table for looking up the area starting at z=0 for a positive z-score.