Source: All graphs created by Dan Laub, Image of fish, PD, http://bit.ly/1TmENdB; Image of bus, PD, http://bit.ly/1MDcQsq
Hi, Dan Laub here. And in this lesson, we want to discuss calculating z-scores. But before we do so, let's discuss the key objective for this lesson.
We want to understand what a z-score actually is, and be able to determine one if we're given a mean and standard deviation of a normally distributed variable. So let's get started. Remember from previous lessons that normal distributions have bell-shaped curves. This curve represents the values of a variable and how they are distributed across a range.
In a normal distribution, the mean is the center of the distribution, while the standard deviation represents how spread out the observations are across the distribution. The key reason that we use a standard normal distribution is that doing so enables us to easily determine a probability associated with a specific value from any normal distribution.
In any standard normal distribution, the mean is always equal to 0, while the standard deviation is always equal to 1. A z-score is what we use to represent a value on a standard normal distribution, with a z-score of 0 being equal to the mean and any other z-score telling us how many standard deviations away from the mean a value lies.
The nice thing about a standard normal distribution is that it can be applied to any normal distribution, so long as we know the mean and standard deviation specific to that distribution. We do this by using the formula: the z-score is equal to the difference between the value and the mean, divided by the standard deviation.
With a normal distribution, as long as we know the mean and the standard deviation, whenever we are interested in a specific value of the variable, we could determine a z-score. When using the z-score formula, one must subtract the mean from the specific value of interest prior to dividing by the standard deviation.
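Written out in symbols, the formula just described might look like the sketch below. The symbol names are assumptions chosen for illustration, since the lesson only states the formula in words: x is the value of interest, \bar{x} is the mean, and s is the standard deviation.

```latex
z = \frac{x - \bar{x}}{s}
```

The subtraction in the numerator happens first, and only then is the result divided by the standard deviation, which matches the order of operations described above.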
Remember that population data comes from the larger realm of all values of a specific variable, while sample data comes from a smaller number of observations that are chosen by the researcher. While the mean and the standard deviation can be from either a sample or a population when applied to the z-score formula, in this specific instance we will only look at data that comes from a sample.
And so there are a variety of different situations in which we might want to convert data into a z-score, one of which could be test scores. Maybe it's a standardized test, like the ACT or the SAT.
Or maybe it's just an exam that a particular class has taken. Maybe we're talking about something regarding biology and we're interested in measuring the size of a particular specimen of a species, and in whether it happens to be a relatively large specimen or a relatively small one. Z-scores would come in handy there as well.
And we could also look at something regarding, say, economics, such as household income. We could get a sense for how the distribution of household income looks and the probabilities associated with whether a particular household falls in a relatively high or relatively low range of household income. By converting values of a variable into z-scores, we are able to make certain that we adhere to a standard process for analyzing data.
Now, the graph you see in front of you represents what a z-distribution looks like. And you'll notice that you have individual numbers that represent how many standard deviations away from the mean a particular point lies. So when you look at the horizontal axis, one unit in this case is equal to one standard deviation of the z-distribution.
And you'll see here that we label the graph with the number of standard deviations we are away from the mean. When we look at this graph of a normal distribution, it's helpful to think of the z-score as illustrating how many standard deviations a value lies above the mean, if the z-score happens to be positive, or how many standard deviations a value lies below the mean, if the z-score happens to be negative.
So let's look at an example. Suppose that we were to look at the morning commute times for Americans, and that they are normally distributed with a mean of 25 minutes and a standard deviation of four minutes. The graph you see here illustrates this distribution, as well as where the mean is located, and the points that are one standard deviation to the left and right of the mean.
The z-score for the mean in this case is 0, while the z-score for 21 minutes is negative 1, and the z-score for 29 minutes is equal to 1. Each of these values would be either one standard deviation to the left of the mean or one standard deviation to the right of the mean. If we add additional values to the graph corresponding to other quantities of standard deviations away from the mean, we can see the related z-scores.
13 minutes has a z-score of negative 3. 17 minutes has a z-score of negative 2. 33 minutes has a z-score of 2. And 37 minutes has a z-score of 3.
However, there are often values of a variable that don't directly correspond to a whole number of standard deviations away from the mean. For example, someone with a commute time of 16 minutes would have a z-score of negative 2.25, and someone with a commute time of 39 minutes would have a z-score of 3.5.
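To check the commute-time numbers above, here is a minimal Python sketch. The function name z_score and the variable names are hypothetical, chosen just for illustration; the mean of 25 minutes and standard deviation of 4 minutes come from the example in this lesson.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    # Subtract the mean first, then divide by the standard deviation.
    return (value - mean) / std_dev


# Commute-time example from the lesson.
MEAN_MINUTES = 25
STD_DEV_MINUTES = 4

commute_times = [13, 17, 21, 25, 29, 33, 37, 16, 39]
z_scores = {t: z_score(t, MEAN_MINUTES, STD_DEV_MINUTES) for t in commute_times}

for minutes, z in z_scores.items():
    # A negative z-score is below the mean, a positive one is above it.
    print(f"{minutes} minutes -> z = {z}")
```

Running this should reproduce the values from the graph: whole-number z-scores for 13 through 37 minutes, negative 2.25 for 16 minutes, and 3.5 for 39 minutes.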
In an example such as this, knowing a z-score can help us determine how likely it is that an event will occur, a topic that we will continue to elaborate on in another lesson. So let's revisit the objective for this lesson just to make sure we covered it. We wanted to understand what a z-score is and be able to determine one given a mean and standard deviation of a normally distributed variable. And we did that. So again, my name is Dan Laub. And hopefully you got some value from this lesson.
(0:00 - 0:33) Introduction
(0:34 - 3:22) Normal Distributions and Z-Scores
(3:23 - 4:05) Properties of a Z-Distribution Graph
(4:06 - 5:26) Calculating a Z-Score
(5:27 - 5:45) Conclusion
Z-Score: Indicates how many standard deviations away from the mean a value lies