Five Number Summary and Boxplots

Hi. This tutorial covers the five number summary. Let's take a look at a situation. A local police department is interested in the speeds on a busy highway around the time of 9:00 PM. An officer measures the speed in miles per hour of 30 autos on this stretch of road. The data follows.

OK, so the 72 means that the officer measured somebody going 72 miles per hour, the 74 means 74 miles per hour, and so on. And there are 30 total values there. So we'll come back to that data in a minute.

So this data can be organized and summarized in many different ways. One useful way is to report the five number summary. So the five number summary is a set of numbers that characterizes both the center and the spread of a data set. So it's a pretty useful group of numbers.

So the five number summary consists of the minimum, the first quartile or Q1, the median, the third quartile or Q3, and the maximum of the data set. So we should be familiar with the minimum, which is just the smallest value, and the maximum, which is the largest value. The median is the middle number or the average of the two middle numbers.

Now, the only thing we maybe don't know is the first quartile and the third quartile. So the first quartile, sometimes known as the lower quartile, is the value that approximately separates the lower 25% from the upper 75%. It's abbreviated as Q1. Another way to think about this is that the median will split the data into two halves. The first quartile splits the first half of the data in half, so it separates the lower quarter from the upper 3/4.

All right, now, the third quartile, or the upper quartile, is the value that approximately separates the lower 75% from the upper 25% of the data, abbreviated as Q3. So this is really the same as the median of the second half of the data. And then the median is known as the second quartile or the middle quartile, or Q2. So Q1 is first quartile. Q2 is the median. Q3 is the third quartile.

OK, so now let's actually find the five number summary for the speed data. OK, notice that I have put the data in order. That's always the first thing you need to do when you're finding the five number summary. Then what I like to do is list out the five number summary, so I have it here: minimum, Q1, median, Q3, maximum.

And the minimum and maximum are the easiest. So the minimum was 55. Maximum was 87, so very fast. And then what I like to do next is to find the median. So we know that there are 30 values here, so n is 30. So the median will be between the 15th and 16th values, because we have an even number of data values.

OK, so let's count in 15. So 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. So that means the median is between the 15th and 16th value there, OK? So our median is going to end up being the average of those two numbers, which is 66.5, so 66 plus 67 divided by 2.

OK, now, what the median does is it splits your data. What I like to do is to split it into the first half and the second half. So the first half goes from 55 to 66. Second half goes from 67 to 87. Now, the other thing to mention here is that if you only have one number in the middle, so if we had an odd number of data values, the median would not be included in the first half or the second half. So we would start halves on either side of that median.

OK, now, Q1-- Q1 is the median of the first half of the data. So we have 15 values in the first half of the data. So the median is going to be the eighth data value. So that'll give me seven values, then the eighth value, then another seven values for 15. So I'm going to count in and find the eighth value. So 1, 2, 3, 4, 5, 6, 7, 8. So the median of the first half, or Q1, is 60.

Now, to find Q3, we do the same thing, but for the second half of the data. So, again, I just count in and find the eighth value. So 1, 2, 3, 4, 5, 6, 7, 8. That ends up being 71. So 71 is Q3. So Q1 is 60, and Q3 is 71.
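If you'd like to check this counting method with a short program, here is a minimal Python sketch of the steps above: sort the data, find the median, split into halves, then take the median of each half. The function names are made up for this illustration, and since the 30 speed values aren't listed here, the example uses the integers 1 through 30 as stand-in data so that each value equals its position in the ordered list.

```python
def median(sorted_values):
    """Middle value, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)            # always put the data in order first
    n = len(data)
    half = n // 2
    lower = data[:half]              # first half of the data
    upper = data[half + (n % 2):]    # second half; leaves out the median when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Stand-in data (an assumption, not the officer's speeds): the integers 1..30,
# so the summary reports which positions the method picks.
positions = list(range(1, 31))
print(five_number_summary(positions))
```

With 30 values, this picks the 8th value for Q1, the average of the 15th and 16th values for the median, and the 23rd value overall (the 8th value of the second half) for Q3, which matches the counting we just did. Keep in mind that calculators and statistical software often use other quartile conventions, so their Q1 and Q3 can differ slightly from this method.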

So, again, like we said before, the five number summary gives me both a way of measuring the center as well as the spread of the data. So we can see the median value is 66.5. So that's the speed that was right in the middle. We can also measure the spread. So we now know that the speeds ranged from 55 all the way to 87. The minimum and maximum differ by 32 miles per hour, which is a pretty large range.

We can also see that the interval between Q1 and Q3 contains about half of your values. So it's basically going to be the second fourth of the data and the third fourth of the data, so that adds up to 2/4, or 1/2. So about half of the speeds were between 60 and 71. So that also gives us a good way to measure the spread of the data.
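The two spread measurements here are just subtractions, and a short Python sketch makes them explicit. The numbers are the five number summary found above. The name "interquartile range" (IQR) for the distance from Q1 to Q3 is the usual term for it, although this tutorial doesn't use that name.

```python
# Five number summary of the speed data, as found in this tutorial (mph)
minimum, q1, median, q3, maximum = 55, 60, 66.5, 71, 87

data_range = maximum - minimum   # spread of all the speeds
iqr = q3 - q1                    # spread of the middle half of the speeds

print(f"range = {data_range} mph")
print(f"distance from Q1 to Q3 = {iqr} mph")
```

So all of the speeds fall within a 32 mile per hour span, while about the middle half of them fall within an 11 mile per hour span.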

So this has been your tutorial on the five number summary, which are the numbers that I have listed right here. Thanks for watching.

Source: AUTOMOTIVE.COM

Hi. This tutorial covers a specific type of graph called the box and whisker plot. Let's start off with a situation. My wife and I are planning for our upcoming trip to Alaska and are interested in the gas prices up there. I was able to find current gas prices from 23 gas stations in Anchorage, Alaska. We're going to be spending some of our trip in Anchorage, Alaska, and we're going to be doing a lot of driving, so I was interested in these prices. And all these prices came off of a website called automotive.com.

So we can see they're all in order here, and these are all in dollars per gallon. The cheapest gas was $3.90, and the most expensive gas was $4.12. So what I did next was find the five number summary for the gas data. So the minimum was $3.90. Maximum was $4.12. Median was $3.96. Q1 was $3.94. Q3 was $4.

So the five number summary, then, can be used to make a common type of graph called the box and whisker plot. So before you make a box and whisker plot, you always need to find the five number summary. So the box and whisker plot, or just box plot, is a type of graph where a box is drawn from the first to the third quartile, and whiskers extend from the quartiles to the maximum and the minimum. So one whisker extends from the third quartile to the maximum, and then another whisker goes from the first quartile down to the minimum.

So an axis should be always drawn prior to making a box plot. OK, so I'm going to make a box plot for the gas data. So I've reproduced the five number summary here. So what I want to do is, like I said, start by making an axis. So I'm going to draw my axis here. We're going to be making the box plot above the axis. So when you're drawing this on your own, make sure you leave some space above the axis.

And I need to scale my axis, so I know that my smallest value is $3.90, and I know my largest value is $4.12. So I would say a good scale would be by maybe nickel, so a five-cent scale. And then maybe I'd start at $3.85, and then I'd end at $4.15.
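Here is one way that scaling choice might be sketched in Python. It works in whole cents to avoid rounding trouble with decimals. The rule it uses, starting at the nearest nickel strictly below the minimum and ending at the nearest nickel strictly above the maximum, is an assumption chosen to reproduce the $3.85 to $4.15 axis described above, and the function name is made up for this illustration.

```python
def axis_ticks(minimum_cents, maximum_cents, step_cents=5):
    """Tick marks in cents, from just below the minimum to just above the maximum."""
    # Largest multiple of the step that is strictly less than the minimum
    start = ((minimum_cents - 1) // step_cents) * step_cents
    # Smallest multiple of the step that is strictly greater than the maximum
    end = (maximum_cents // step_cents + 1) * step_cents
    return list(range(start, end + 1, step_cents))


# Gas prices from this tutorial: minimum $3.90, maximum $4.12
ticks = axis_ticks(390, 412)
labels = [f"${cents // 100}.{cents % 100:02d}" for cents in ticks]
print(labels)
```

Any scale that covers the minimum and the maximum with a little room on each side works just as well. The nickel step is simply a convenient choice for this data.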

So I'm going to just start by making my tick marks. I'm making them a half inch apart. I like to use a ruler anytime I'm making an axis. I don't know if I'll need all these tick marks, but that's what I'll start with. So this will be $3.85, $3.90. This would be $4. I'll just-- $4.10, $4.20. And I could go out here to $4.30, but I don't really need to.

OK, so this represents gas prices. And the units here are dollars per gallon. So now I'm going to go about making my box and whisker plot. So I said that the box portion of the box and whisker plot goes from Q1 to Q3. OK, so Q1 is at $3.94. So that's going to be slightly to the left of $3.95. So I'm going to make a vertical bar there. And Q3 is at $4, so I'm going to make a vertical bar here. And then I'm going to connect them to make a box or a rectangle.

Now, the median always goes somewhere inside the box, wherever it matches up on the axis. So the median is $3.96. So I'm going to draw a vertical line that represents the median right through this box. So that's my median. If your median happens to be the same as Q1 or Q3, then you don't draw this line. It'll just be assumed that the median is right on top of either Q1 or Q3.

OK, now I need two whiskers. So I need a whisker going from Q1 down to the minimum. And I need a whisker going from Q3 out to the maximum. OK, so let's do the lower whisker first. So that's going to go out to $3.90. So I like to draw it coming right out of the middle of the box, kind of like so. And then I'll put a little hash mark here at the end, at the minimum of $3.90.

And now my upper whisker needs to go out all the way to the maximum of $4.12. So that'll be a little past $4.10. And again, I'll make a little hash mark here to represent the maximum. OK, so that is my box and whisker plot for the gas data. Just to recap, the box goes from Q1 to Q3. The median goes on the inside of the box. Then the lower whisker goes from Q1 to the minimum, and the upper whisker goes from Q3 to the maximum.

If it happens that your minimum and your Q1 are the same, you're not going to have a lower whisker, so it would just end up looking like that. If Q3 and your maximum were the same, then you're not going to have an upper whisker. So you would just leave that off. OK, so that is just how to make a box plot.
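To tie the drawing steps together, here is a rough Python sketch that prints a box plot as a single line of text, one character per cent along the axis. It follows the same rules as the hand drawing: a box from Q1 to Q3, a median line inside the box unless it sits on Q1 or Q3, and a whisker with a hash mark on each side only when there is room for one. The function name and the text symbols are made up for this illustration, and it assumes all values are whole numbers (cents here) with Q1 less than Q3.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, axis_start, axis_end):
    """Return a one-line text box plot; every argument is a whole number."""
    row = [" "] * (axis_end - axis_start + 1)

    def col(value):
        return value - axis_start    # column of a value on the axis

    # Box from Q1 to Q3
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="
    row[col(q1)] = "["
    row[col(q3)] = "]"

    # Median line, left off when it coincides with Q1 or Q3
    if q1 < median < q3:
        row[col(median)] = "|"

    # Lower whisker from the minimum up to Q1, with a hash mark at the minimum
    if minimum < q1:
        for c in range(col(minimum), col(q1)):
            row[c] = "-"
        row[col(minimum)] = "|"

    # Upper whisker from Q3 out to the maximum, with a hash mark at the maximum
    if maximum > q3:
        for c in range(col(q3) + 1, col(maximum) + 1):
            row[c] = "-"
        row[col(maximum)] = "|"

    return "".join(row)


# Gas data five number summary from this tutorial, in cents, on a $3.85 to $4.15 axis
plot = ascii_boxplot(390, 394, 396, 400, 412, 385, 415)
print(plot)
```

Notice in the printed line that the upper whisker is much longer than the lower one, just like in the hand-drawn plot.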

Now let's just interpret another box plot here. So we will also be visiting Fairbanks, Alaska, on the trip, another town in Alaska. A comparative box plot can be used to compare the prices of gas in Anchorage and Fairbanks. OK, so what I did is I produced a comparative box plot.

Notice that the one for Anchorage looks pretty similar to the one I constructed. It's a little thinner here. And on this one, the values of the five number summary are placed above the box plot. Now, if you look at the Fairbanks box plot, you'll see that a median isn't represented, and the reason for that is because the median was the same as Q1. So there are really two lines right on top of each other there.

What's nice about a comparative box plot is that I can see right off the bat that it seems like there's a little more spread in the prices for Fairbanks gas. It goes out a little further. I might have to pay a little bit more. But the big thing that this tells me comes from the length of this box. I can see that there's much more variability in the middle 50% of prices in Fairbanks than in Anchorage.

When I go to Anchorage, about 75% of the stations have gas at or under $4 per gallon, so there's roughly a 75% chance that a station I pick will be at or under $4. For Fairbanks, about 75% of the stations have gas at or under $4.10. That's really the big thing that it tells me, is that in Fairbanks, I might have a tougher time finding cheaper gas than I will in Anchorage. So that's just one thing I can tell from the comparative box plots.

So if you have one data set, you can always use the five number summary to make a box plot. You can also compare multiple box plots using a comparative box plot. You could also add in a third population if you wanted, a fourth population. Sometimes you'll also see where this axis is turned, so it's a vertical axis, and then the box plots go vertical. So they'll go top to bottom instead of left to right. So that has been the tutorial on box plots. Thanks for watching.