Hi. This tutorial covers the Standard Deviation. So it is important to be able to measure variation, or spread, of a data set. The four common measures of variation are range, interquartile range, variance, and standard deviation. Today, we're going to concentrate on variance and standard deviation.
Let's take a look at a data set. I really love drinking coffee, and I drink a lot of it. So I was interested in knowing if my colleagues drink about the same amount of coffee as I do. So I asked five of my colleagues how many cups of coffee they've had today. Here's the data I collected: 2, 1, 5, 2, and 0 cups.
So let's do a couple of things with this data set first, right off the bat. So let's start by calculating the sample mean. OK? Since this is only a sample, we're going to use our sample statistics.
So let's start by adding up all of the values, taking the sum, so that ends up being 10, and then I'm going to divide by the sample size, which is 5. So my sample mean here is 2. So on average, these five people have drunk two cups of coffee today.
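If you'd like to check that arithmetic on a computer, here is a minimal sketch in Python. The five values 2, 1, 5, 2, and 0 are the ones from this data set; the variable names `cups`, `total`, and `sample_mean` are just made up for this illustration.

```python
# Cups of coffee reported by the five colleagues (values from the tutorial).
cups = [2, 1, 5, 2, 0]

# Sample mean: the sum of the values divided by the sample size.
total = sum(cups)          # 10
n = len(cups)              # 5
sample_mean = total / n    # 2.0

print(f"sum = {total}, n = {n}, sample mean = {sample_mean}")
```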
All right. The next thing we're going to do is make a dot plot of this data. OK? So I'm going to start with a number line, and then I'm just going to go ahead mark my data values, so 2, 1, 5, 2, and 0. OK. So what I want to do is quantify, or measure, what the spread of this data set is.
So what I'm going to do is draw in what are called deviations. Deviations are always how far a data value is from the mean, and the mean here is 2. So what I'm going to do is just mark in a dashed line here where the mean is. So the mean is at 2.
So let's start with this value. This is the furthest one away. So let's draw in a deviation. So this deviation, the length of that deviation is 3 units, and since that's above the mean, that's going to be a deviation of positive 3 units. OK?
If we go over to 1, this number now is below the mean. So this person drank less than average. That has a deviation of negative 1. OK?
The deviation for 0, so let's draw our deviation here. This deviation is negative 2. OK? And then the deviations for these two people are obviously 0, because they drank the same as the mean.
All right. So one thing you might think about is, well, what happens if I just wanted to find the average deviation? So what I'd do is add up all these deviations: 3 plus 0 plus 0 plus negative 1 plus negative 2. Well, if you add up those five numbers, you're going to get 0.
And actually, any time you try to find the sum of the deviations from the mean, you're going to get 0. OK? So that doesn't really provide us anything meaningful, because you'll always get 0. So we need another way, and we're actually going to use these deviations in calculating what's called the standard deviation.
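To see the deviations cancel out, here is a short Python sketch under the same assumptions as the data set above (values 2, 1, 5, 2, 0; the names `cups` and `deviations` are only illustrative).

```python
cups = [2, 1, 5, 2, 0]
sample_mean = sum(cups) / len(cups)   # 2.0

# A deviation is a data value minus the mean.
deviations = [x - sample_mean for x in cups]

print(deviations)        # [0.0, -1.0, 3.0, 0.0, -2.0]
print(sum(deviations))   # 0.0
```

With other data the mean may not be a round number, so on a computer the sum can come out as a tiny value very close to 0 rather than exactly 0, because of rounding.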
All right. So to measure the spread of data, we can calculate the variance and the standard deviation. So the standard deviation is a measure of spread of the data about the mean, and then the variance is just simply the standard deviation squared. I will note that standard deviation is the preferred measure of variation, when you have a symmetric distribution or a distribution that's approximately symmetric.
All right. So let's take a look at formulas for both the sample variance and the sample standard deviation. All right. So for sample variance, again, this would be a formula for the variance of some sample data. So that gets the symbol s squared.
If we are looking for a population variance, it gets the Greek letter sigma. So it ends up being sigma squared, if we're talking about the variance of a population. OK? And then if we look at sample standard deviation, notice that this is just s, instead of s squared. So we can see that it's the variance formula but square rooted. In the same way, the population standard deviation is just sigma. OK?
So if we break down the variance formula, we can see that, in parentheses, we have x minus x bar. That's just your deviation, how far a data value is from the mean, and then we're squaring the deviation.
Then, we're taking the sum of all of the squared deviations in the data set. So that's how far each of those data values is from the mean, squared. We're summing them up, and then we're dividing by n minus 1. OK?
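Written out, the two sample formulas described here look roughly like this. It's a reconstruction from the spoken description, with x bar for the sample mean and n for the sample size; the subscript i on each data value is notation assumed here for clarity.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```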
Variance is an easier calculation to do, because we don't have to do that last step that we have to do for standard deviation, that square rooting step. So s squared, your sample variance, is an easier thing to calculate than s, your sample standard deviation, but standard deviation is the more commonly used measure. The other thing that we can see here is that s is going to have the same units as x.
So if I think back to my coffee example, the units on the standard deviation are going to be cups of coffee, just like the data set was. If we look at variance, since we're squaring all of those deviations, the units on your variance are going to be cups of coffee squared. OK? So again, the units are different when we're dealing with variance versus standard deviation.
All right. So let's take a look at an application of both the sample variance formula and the sample standard deviation formula. So what I'm going to do is start with my data set. So I'm going to make this as a table. I think it's really helpful to have your data values in a table, and now off to the side here, or down here, I'm going to write down my variance formula.
OK. So the first thing that we want to do is in our table, let's write down what our deviations were. So x minus x bar, those again are our deviations, and remember that x bar was equal to 2. OK? So 2 minus 2 is 0. 1 minus 2 is negative 1.
These are the same values we had on our dot plot. 5 minus 2, positive 3. 2 minus 2 is 0. 0 minus 2 is negative 2.
OK. So that's basically all of our deviations, but we don't just want our deviations, because if we added all those up, that gives me 0. We want our squared deviations, and now, why would we square deviations? Because whenever you square a number, the result is never negative. That'll make all of the nonzero deviations positive, so that they can't cancel each other out when we take the sum.
So 0 squared is 0. Negative 1 squared is negative 1 times negative 1, which is positive 1. 3 times 3, 3 squared, is 9. 0 times 0 is 0. Negative 2 times negative 2 is positive 4.
OK. So that basically gives me x minus x bar squared for all of my values. Now, what I want is the sum of x minus x bar squared. So I need to add up all these values, and that's pretty easy. 0 plus 1 plus 9 plus 0 plus 4 ends up being 14. OK? So that's the number that I'm going to use on the top of my variance formula.
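The table built up so far can be sketched in Python like this. It assumes the same five values in the order used in the table (2, 1, 5, 2, 0); the names `rows` and `sum_sq` are only for illustration.

```python
cups = [2, 1, 5, 2, 0]
sample_mean = sum(cups) / len(cups)   # 2.0

# One row per data value: x, its deviation, and its squared deviation.
rows = [(x, x - sample_mean, (x - sample_mean) ** 2) for x in cups]

print(f"{'x':>3} {'x - mean':>9} {'squared':>8}")
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>9.0f} {sq:>8.0f}")

# Sum of the squared deviations: the top of the variance formula.
sum_sq = sum(sq for _, _, sq in rows)
print("sum of squared deviations =", sum_sq)   # 14.0
```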
So s squared now is going to equal 14 over n minus 1. Well, n is 5, because we had 5 data values, minus 1, so that's 14 divided by 4. OK? So as a decimal, 14 divided by 4 ends up being 3.5. So this is your variance. Your variance is 3.5. OK? So that's your sample variance. OK?
And now to get our sample standard deviation, s, you simply take the square root of 3.5. OK? And if we do that on the calculator, the square root of 3.5 ends up being about 1.871. So s here is 1.871. So this value is our sample standard deviation. OK?
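Here is a minimal Python sketch of those last two steps, again assuming the five values 2, 1, 5, 2, 0. The variable names are illustrative; `statistics.variance` and `statistics.stdev` from Python's standard library are used only as a cross-check, since both divide by n minus 1.

```python
import math
import statistics

cups = [2, 1, 5, 2, 0]
n = len(cups)
sample_mean = sum(cups) / n

# Sum of squared deviations, then divide by n - 1 for the sample variance.
sum_sq = sum((x - sample_mean) ** 2 for x in cups)   # 14.0
sample_variance = sum_sq / (n - 1)                   # 3.5

# The sample standard deviation is the square root of the sample variance.
sample_sd = math.sqrt(sample_variance)               # about 1.871

print(sample_variance, round(sample_sd, 3))

# Cross-check against the standard library's sample statistics.
assert math.isclose(sample_variance, statistics.variance(cups))
assert math.isclose(sample_sd, statistics.stdev(cups))
```

For a whole population rather than a sample, the same module has `statistics.pvariance` and `statistics.pstdev`, which divide by the number of values instead of n minus 1.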
So once you find your variance, you can always just take the square root of your variance to get your standard deviation. OK. So that is your process in finding a standard deviation. Usually, you'd use a calculator or a computer to calculate the sample standard deviation rather than doing it by hand, because this does take a little bit of work.
OK. The last thing we need to do is just interpret that standard deviation. So again, the mean was 2. Sample standard deviation was about 1.871. So I would expect the number of cups of coffee a colleague drinks to be within one standard deviation, about 1.871 cups, of the mean of 2 cups.
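As a rough illustration of that interpretation, the sketch below works out the range one standard deviation either side of the mean and checks which of the five values land inside it. It uses the rounded value 1.871, and the names `low`, `high`, and `typical` are made up for this example.

```python
cups = [2, 1, 5, 2, 0]
sample_mean = 2
sample_sd = 1.871   # rounded sample standard deviation from the tutorial

# One standard deviation below and above the mean.
low = sample_mean - sample_sd
high = sample_mean + sample_sd

# Data values that fall inside that range.
typical = [x for x in cups if low <= x <= high]

print(f"typical range: {low:.3f} to {high:.3f} cups")
print("values inside the range:", typical)   # [2, 1, 2]
```

In this small sample, the colleagues who had 5 cups and 0 cups fall just outside that range.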
So we would expect them to be within about 1.8, 1.9 cups of the mean of two cups. So anything within about 1.87 cups above or below 2, that is, from about 0.13 to 3.87 cups, is going to be considered typical. All right. Well, that's been your tutorial on the Standard Deviation. Thanks for watching.