Source: Creative Commons: http://commons.wikimedia.org/wiki/File:Fourboxplots.png
One measure of variation is range. Range is one of the simplest measures of variation to calculate. You subtract the minimum from the maximum, so it's maximum minus minimum.
Now, even though range is easily calculated, there is a downside. Because you're using the maximum and the minimum, you're including any outliers, the values that are way spread out from everything else. So your range is influenced by these values.
We'll go through a couple of examples. So here we have the height in inches of some third graders. So let's find the maximum and the minimum. The maximum is 42. That's the max. And the minimum is the lowest number, this 34. So in order to find the range, we're going to do the maximum minus the minimum, so 42 minus 34. And 42 minus 34 gets us 8. So here, our range is 8.
In our second example, we're going to use the same set, except we're going to add in a couple of really tall kids, some outliers. So now our maximum-- it's 50. And our minimum is still 34. Now we're going to do the same thing, the maximum minus the minimum. So 50 minus 34. And this time, it's going to be 16.
So in our second example, our range is 16, double the 8 we had before. We added two outliers, but it's just the biggest one that changes our answer. The range got a lot more spread out because it's including that outlier in the calculation.
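As a rough sketch of those two examples in Python: the tutorial only states the minimum (34) and the two maximums (42, then 50), so every other height in the list below is made up for illustration, and `value_range` is just a hypothetical helper name.

```python
# Hypothetical heights in inches. Only the minimum (34) and the maximums
# (42, then 50) come from the tutorial; the other values are invented.
heights = [34, 36, 37, 38, 39, 39, 40, 40, 41, 42]
heights_with_outliers = heights + [48, 50]  # two really tall kids


def value_range(values):
    """Return the range: maximum minus minimum."""
    return max(values) - min(values)


range_without = value_range(heights)             # 42 - 34
range_with = value_range(heights_with_outliers)  # 50 - 34

print(range_without)  # 8
print(range_with)     # 16
```

Dropping the 48 from the second list leaves the range at 16, which is the point about only the biggest outlier mattering.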
Now, a variation on range is the interquartile range. With the interquartile range, we're still doing a measure of variation. Instead of doing the maximum minus the minimum, we're doing the third quartile minus the first quartile. By doing this, we're essentially measuring how spread out the middle 50% of the data is. Because we're looking at the middle 50%, we're not going to be as influenced by outliers, which is a good thing. Additionally, this is the preferred choice when we have a really skewed data set.
First, we're going to go through an example. We need to make sure that our numbers are in order from smallest to largest so that we can accurately find the median, first quartile, and third quartile. The numbers are in order. So I'm going to go about finding the median. Crossing off until I get to that middle value. I get left with two numbers, but they're the same. It's 39 and 39. So our median is right here, and our median is 39.
Now I'm going to cross off the first half of the data in order to find the first quartile. So I'm using this chunk of numbers in here, everything below the median. And I have 37. So 37 is the first quartile. I'm going to repeat over here with the upper half of the data, everything above the median, to find the third quartile. So crossing off, we have a 40. So this is the third quartile.
So my first quartile is 37, my third quartile is 40. In order to find the interquartile range, I'm doing third quartile minus first quartile, so I'm doing 40 minus 37, to get 3. So our interquartile range is 3 in this example.
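The crossing-off steps above can be sketched in Python. This assumes the "median of each half" way of finding quartiles that the walkthrough uses. Other conventions exist and can give slightly different quartiles. The data set is hypothetical, picked so that it lands on the same median (39) and quartiles (37 and 40) as the example, and `quartiles_by_halves` is an illustrative name.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    With an odd number of values, the middle value is left out of both
    halves. Needs at least two values.
    """
    data = sorted(values)            # quartiles only make sense on ordered data
    half = len(data) // 2
    lower = data[:half]              # everything below the median
    upper = data[len(data) - half:]  # everything above the median
    return median(lower), median(data), median(upper)


# Hypothetical data chosen to match the quartiles in the walkthrough.
heights = [34, 36, 37, 38, 39, 39, 40, 40, 41, 42]

q1, med, q3 = quartiles_by_halves(heights)
iqr = q3 - q1  # third quartile minus first quartile

print(q1, med, q3)  # 37 39.0 40
print(iqr)          # 3
```

The median prints as 39.0 because `statistics.median` averages the two middle values when the count is even.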
For skewed distributions, the IQR is preferred to the standard deviation as a measure of variation.
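To see why, here is a small comparison, again on made-up heights with two tall outliers tacked on to skew the data. It assumes the same median-of-halves quartiles as above and uses the sample standard deviation from Python's `statistics` module. The `iqr` helper is a hypothetical name, not a library function.

```python
from statistics import median, stdev


def iqr(values):
    """Interquartile range by the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[len(data) - half:]) - median(data[:half])


# Hypothetical data: the same invented heights, then two tall outliers added.
base = [34, 36, 37, 38, 39, 39, 40, 40, 41, 42]
skewed = base + [48, 50]

for label, data in (("base", base), ("with outliers", skewed)):
    spread = max(data) - min(data)
    print(f"{label}: range={spread}, stdev={stdev(data):.2f}, IQR={iqr(data)}")
```

On this data the range doubles and the standard deviation roughly doubles once the outliers are added, while the IQR only moves from 3 to 4.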
One interesting thing is you can see both the interquartile range and the range on a box plot. Here we have our maximum and our minimum. So this right here is our range. The same is true on the other box plot. Our maximum and our minimum. So this chunk in here shows us our range, how spread out that data is. Now, for interquartile range, it's third quartile to first quartile. Third quartile to first quartile, so this boxed area is showing us the interquartile range, showing us how far apart that middle 50% is spread out.
This has been your tutorial on range. You can find the range by doing the maximum minus the minimum. There's also interquartile range, which is the third quartile minus the first quartile. And that one is more resistant to outliers than just the regular range. Both of these can be seen on a box plot. Thank you.