Source: All tables and graphs created by Dan Laub. Image of car, PD, http://bit.ly/1J27jRx; Image of doctor, PD, http://bit.ly/1P5WBdM; Image of stethoscope, PD, http://bit.ly/1RB0TeE; Image of books, PD, http://bit.ly/1O5M3GF
[MUSIC PLAYING] Hi, Dan Laub here. And in this lesson, we're going to discuss representing how data can vary. But before we get started, let's discuss the objectives for this lesson.
The first objective is to be able to identify the symbols for sample variance and sample standard deviation. The second objective is to know the formula for the sum of squares of a sample and be able to compute it. So let's get started.
In addition to finding the center of a data set, we may also be interested in finding a number that tells us how far the data is spread out from the mean. Measuring how spread out the data is from the mean gives us another measure of variability, alongside the range and the interquartile range.
In the event that the variability in the data is small, then the mean will generally serve as a good estimate of a typical value in the data set. In the case that the variability in the data is large, the mean is generally not a good estimate of what a typical value in the data set is.
Two measures of variability we can use are variance and standard deviation. For example, consider a sample of high school grade point averages drawn from a group of recent graduates. Knowing both the sample mean, 2.9, and the standard deviation, 0.26, for this specific sample gives us a good sense of how the data is distributed.
Knowing the variability of a data set in terms of variance and standard deviation generally lets us estimate what percentage of the data falls within a certain range of values. For data that is roughly normally distributed, for instance, about 68% of values lie within one standard deviation of the mean. With that information at our disposal, we are better able to draw conclusions about our data.
If you recall what a normal distribution is, you can see in the graph shown here how the standard deviation illustrates the variability in the data set.
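As a rough illustration of that percentage idea, the sketch below applies the empirical (68-95-99.7) rule to the GPA sample described earlier (mean 2.9, standard deviation 0.26). It assumes, which the lesson does not state, that the GPAs are approximately normally distributed; the function name `empirical_rule_intervals` is hypothetical.

```python
# Sketch of the empirical (68-95-99.7) rule.
# Assumption: the data are roughly normally distributed; the shares below
# do not apply to strongly skewed or otherwise non-normal data.

# Approximate share of normally distributed values within k standard deviations of the mean.
EMPIRICAL_SHARES = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high, approx_share)} for k = 1, 2 and 3 standard deviations."""
    return {
        k: (round(mean - k * sd, 2), round(mean + k * sd, 2), share)
        for k, share in EMPIRICAL_SHARES.items()
    }


# GPA sample from the lesson: mean 2.9, standard deviation 0.26.
for k, (low, high, share) in empirical_rule_intervals(2.9, 0.26).items():
    print(f"within {k} SD: {low} to {high} (about {share:.1%} of GPAs)")
```

Under that assumption, roughly 68% of the GPAs would fall between 2.64 and 3.16, and about 95% between 2.38 and 3.42.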
Looking at the example of the miles per gallon of new cars, we can see that the standard deviation indicates how closely the data is distributed around the mean. In a case like this, the data could have a large standard deviation if we were to include large vehicles and trucks in our data set.
On the other hand, if our data consisted only of small sedans, the standard deviation would likely be quite small. With a large standard deviation, the data is spread relatively far from the mean, whereas with a small standard deviation, the data is much closer to the mean. Much like standard deviation, variance also measures how spread out the data is from the mean.
So let's use the average waiting time at a doctor's office as an example. Suppose we were to conduct an experiment to measure how long patients had to wait to see two different doctors. Both doctors had a mean wait time of 18 minutes, but the variation in the data was significantly different.
The standard deviation of the wait time was 6 minutes for Dr. Smith but only 2 minutes for Dr. Jones. If we looked only at the mean, we would risk drawing poor conclusions from the experiment, because the mean ignores how widespread the data is, whereas variance takes it into account.
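To see how much the mean alone hides, the short sketch below uses only the figures given for the two doctors (a shared mean of 18 minutes, standard deviations of 6 and 2 minutes) and the fact that variance is the square of the standard deviation. The function name `spread_summary` is a hypothetical helper.

```python
def spread_summary(mean, sd):
    """Return (variance, low, high): the variance is sd squared, and low..high
    is the range from one standard deviation below to one above the mean."""
    return sd ** 2, mean - sd, mean + sd


MEAN_WAIT = 18  # minutes; the same mean for both doctors in the lesson
std_devs = {"Dr. Smith": 6, "Dr. Jones": 2}  # standard deviations, in minutes

for doctor, sd in std_devs.items():
    variance, low, high = spread_summary(MEAN_WAIT, sd)
    print(f"{doctor}: variance {variance} square minutes, mean +/- 1 SD: {low} to {high} minutes")
```

Dr. Smith's variance (36 square minutes) is nine times Dr. Jones's (4 square minutes), even though the two means are identical.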
Generally speaking, when comparing the results of two tests designed to learn more about a topic, one would prefer a lower variance to a higher one, because a lower variance suggests that the tests may validate one another.
Remember that the population is the entire group of people a researcher is interested in, and it is typically very large. It is therefore much simpler to work with a sample and use it to draw conclusions about the population.
The variance and standard deviation of a sample can be found with the following steps. First, compute the mean, x̄ (x bar), of the data set. Given a list of numbers, the mean is found by adding up all of the numbers and then dividing by how many numbers there are. This is expressed by the equation x̄ = Σx / n, where Σx is the sum of all the numbers and n is how many numbers there are in the data set.
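The first step, x̄ = Σx / n, can be sketched in a few lines of Python. The function name `sample_mean` and the toy data are illustrative assumptions, not part of the lesson; Python's standard `statistics.mean` does the same job.

```python
def sample_mean(values):
    """Compute x-bar: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


# Hypothetical toy data, used only to illustrate the formula.
print(sample_mean([2, 4, 6, 8]))  # 20 / 4 = 5.0
```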
Next, subtract the mean from each number in the data set and square the difference. Then add up all of the squared differences from the previous step. This total is called the sum of squares, Σ(x - x̄)².
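The sum-of-squares step can be sketched as follows: compute the mean, square each value's difference from it, and add the squares. The function name `sum_of_squares` and the toy data are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def sum_of_squares(values):
    """Sum of the squared differences between each value and the sample mean."""
    if not values:
        raise ValueError("need at least one value")
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)


# Hypothetical toy data: the mean is 5, so the differences are -3, -1, 1 and 3.
print(sum_of_squares([2, 4, 6, 8]))  # 9 + 1 + 1 + 9 = 20.0
```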
Now we divide the sum of squares by n - 1, the total number of values in the data set minus 1. The quantity we obtain is what we call the sample variance.
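Dividing the sum of squares by n - 1 can be sketched as below, with a cross-check against Python's standard-library `statistics.variance`, which also computes the sample variance with an n - 1 divisor. The function name `sample_variance` and the toy data are hypothetical.

```python
import statistics


def sample_variance(values):
    """Sample variance s^2: the sum of squares divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)  # sum of squares
    return ss / (n - 1)


data = [2, 4, 6, 8]  # hypothetical toy data; its sum of squares is 20
print(sample_variance(data))      # 20 / 3, about 6.67
print(statistics.variance(data))  # standard-library sample variance, also divides by n - 1
```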
If you work through this on a calculator, make sure to finish adding all of the numbers before you divide. When the variance comes from a sample, it is denoted s² (s squared). The standard deviation is then the square root of the variance, denoted s.
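Taking the square root of the sample variance gives s. The sketch below does this with `math.sqrt` and compares the result to Python's `statistics.stdev`, the standard-library sample standard deviation. The function name `sample_std_dev` and the toy data are hypothetical.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation s: the square root of the sample variance s^2."""
    return math.sqrt(statistics.variance(values))


data = [2, 4, 6, 8]  # hypothetical toy data; s^2 = 20 / 3
print(sample_std_dev(data))    # about 2.58
print(statistics.stdev(data))  # standard-library sample standard deviation
```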
Let's go through an example using the price of textbooks. What you see here is a sample of eight observations, and the left-hand column lists the price of each individual textbook.
The prices start at 145, then 150, 165, and so on down to 200. The first step is to find the sum of these values. If we add them all together, we get 1,376. If we divide that by 8, we get a mean of 172.
The next step after that would be to subtract x bar, which is 172, from the actual values of x. And we see that here in the second column.
The next step is to square each difference. As you see in the third column, squaring negative 27 (145 minus 172) gives 729, and squaring negative 22 (150 minus 172) gives 484. When we do this for all 8 observations and add the results together, we get a sum of squares of 2,340.
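The arithmetic for the rows named in the lesson can be checked with a short sketch. Only four of the eight prices (145, 150, 165 and 200) are stated, so this does not reproduce the full sum of squares of 2,340; `squared_deviations` is a hypothetical helper name.

```python
def squared_deviations(prices, x_bar):
    """Return (price, deviation, squared deviation) for each price."""
    return [(p, p - x_bar, (p - x_bar) ** 2) for p in prices]


x_bar = 1376 / 8  # total of the 8 prices divided by n = 8, i.e. 172.0

# Only these four of the eight prices are named in the lesson.
for price, dev, sq in squared_deviations([145, 150, 165, 200], x_bar):
    print(f"{price}: deviation {dev:+.0f}, squared {sq:.0f}")
```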
We can calculate the variance by dividing that sum of squares by n minus 1, which in this case is 7. That gives a sample variance of about 334.29. In the last step, to determine the standard deviation, we take the square root of the sample variance, which in this case is about 18.28.
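The last two steps of the textbook example can be checked from the summary figures alone (a sum of squares of 2,340 over n = 8 observations). This is a minimal sketch of that arithmetic, not a reconstruction of the full data table.

```python
import math

# Summary figures from the textbook example.
sum_of_squares = 2340
n = 8

variance = sum_of_squares / (n - 1)  # s^2, dividing by n - 1 = 7
std_dev = math.sqrt(variance)        # s, the square root of s^2

print(f"s^2 = {variance:.2f}")  # s^2 = 334.29
print(f"s   = {std_dev:.2f}")   # s   = 18.28
```

Rounded to two decimal places, the sample variance is 334.29 and the sample standard deviation is 18.28.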
So let's go back to our objectives to make sure we covered what we said we would. The first objective was to be able to identify the symbols for sample variance and sample standard deviation, which we did: s squared (s²) for the sample variance and s for the sample standard deviation.
The second objective was to know the formula for the sum of squares of a sample and be able to compute it, which we did. Using the example of 8 textbook prices, we went through the steps needed to calculate the sum of squares for that sample.
So again, my name is Dan Laub. And hopefully, you got some value from this lesson.
Standard deviation: provides an indication of how closely the data is distributed around the mean.
Variance: takes into account how widespread the data may be.