Five Number Summary and Boxplots

This tutorial talks about the five number summary. The five number summary describes the data's center and spread. Now, as the name would indicate, there are five numbers in this summary. And you have, in order, the minimum, the first quartile, the second quartile, the third quartile, and the maximum.

Now, the second quartile is often known as the median. It's the point where 50% of the data is below and 50% of the data is above. Now, the first quartile is the median, or the middle, of the data between the minimum and the second quartile. So that's the point that 25% of the data falls below. And on the other end, the third quartile is in the middle of the data between the second quartile and the maximum, so 75% of the data falls below it.

Now, these names, quartiles, are indicating those percents. So the first quartile is the point that 25% of the data lies below, the median is the point that 50% of the data lies below, and the third quartile is the point that 75% of the data lies below. And knowing those percentages can help us to interpret the five number summary and, in turn, interpret the data. So let's look at an example.

Here, we have the five number summary. I like to write out headers for those five numbers before I start any problem. It helps me to remember what it is I'm looking for. Here's our example. It talks about the height, in inches, of third graders. So we have our data values listed out in order from smallest to largest. If your data has not already been organized in this way, then you need to organize it from smallest to largest before starting to find your five number summary. Otherwise, you're going to get incorrect values.

Now, the minimum and the maximum are really easy to pick out. So here, the minimum, 34. And the maximum, 50. And we can rewrite those numbers up above as well so that we remember we already found those values. Now, the median, there's a couple of methods of finding it. Again, I prefer the cross-off method. But since I want to be able to still see my numbers, I am just going to make little dots above. So 34, 50.

And then I'm just going to make sure that all my values are appearing before I keep going. And here, we're just finding the median. So if you prefer to count how many values there are and then divide by 2 to locate the middle, you certainly can. So here, we get left with two middle numbers. We get left with the 39 and the 40, so I need to find what's in the middle of them, and I know that is 39.5. So that number there is going to be our median. So our median is 39.5.
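The median step can be sketched in Python. This is only a minimal sketch: the `heights` list is a hypothetical data set, chosen so that its two middle values are 39 and 40 as in the example. It is not the data shown in the tutorial, and `median_of_sorted` is an illustrative name.

```python
def median_of_sorted(values):
    """Return the median of a list already sorted from smallest to largest."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value is left after crossing off
        return values[mid]
    # Even count: two middle values are left, so average them
    return (values[mid - 1] + values[mid]) / 2

# Hypothetical heights in inches (NOT the tutorial's actual data)
heights = [34, 37, 38, 39, 40, 41, 42, 50]
median = median_of_sorted(heights)
print(median)  # 39.5
```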

So now that we know what the median is, we can go on to find the first quartile and the third quartile. So the first quartile is going to be the middle of all the data from the minimum up to the median. So everything from this 34 up to the 39. This 39.5 was not part of our data set, so we should not include it when we're trying to find the first quartile. And again, we're just going to do the cross-offs to find that middle number.

And again, I get left with two numbers-- 37 and 38. And in the middle of 37 and 38 is 37.5. So if I didn't know that, I would've done 37 plus 38 and divided by 2 to get 37.5. So that is our first quartile, 37.5. We're going to do the same thing to find the third quartile. We're going to find the middle of all the values from 40 up to 50-- and again, not including the median.

So I use the cross-off method, and we get left with two numbers-- the 41 and the 42. So in the middle of those two numbers is 41.5. And that is our third quartile, 41.5. So now we have our five number summary. The minimum is 34, the first quartile is 37.5, the median is 39.5, the third quartile is 41.5, and the maximum is 50. So for the median, 50% of the data is below 39.5. 50% of the data is above 39.5.
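The whole procedure, with the median left out of both halves, might look like the sketch below. The `heights` list is again a hypothetical data set (not the tutorial's actual values), picked because it reproduces the five numbers found above, and the function names are illustrative. Other conventions for computing quartiles exist, so library functions may give slightly different values.

```python
def median_of_sorted(values):
    """Median of a list already sorted from smallest to largest."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    values = sorted(data)  # the data must be ordered first
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]            # everything below the median
    upper = values[half + (n % 2):]  # everything above it (the median itself is skipped when n is odd)
    return (
        values[0],
        median_of_sorted(lower),
        median_of_sorted(values),
        median_of_sorted(upper),
        values[-1],
    )

# Hypothetical heights in inches (NOT the tutorial's actual data)
heights = [34, 37, 38, 39, 40, 41, 42, 50]
print(five_number_summary(heights))  # (34, 37.5, 39.5, 41.5, 50)
```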

For the first quartile, 25% of the data is below, 75% of the data is above. The third quartile is reversed-- 75% of the data is below, and 25% of the data is above. This has been your tutorial on calculating the five number summary.

This tutorial talks about box-and-whisker plots. A box-and-whisker plot is also called a box plot. It refers to the same thing, just two different possible names. Now, a box-and-whisker plot is a display of the five-number summary. And as we learned before, the five-number summary includes the minimum, the first quartile, the median, the third quartile, and the maximum.

Now, when you have a five-number summary, one of the things you need to do is to mark a scale. Once you have the scale, you can plot the five values above it. So one line for the minimum, one for Q1, the median, Q3, and the maximum. And then you're going to connect the first and third quartiles with a box.

So if we just imagine that these are our five lines, then we're going to connect the first quartile and the third quartile with a box. And ideally it would be drawn neater and straighter. Then connect the minimum to the first quartile with a line, and the maximum to the third quartile with a line. So here is the general shape of your box-and-whisker plot.

Let's look at an example with numbers. Here, we have a five-number summary that we've previously calculated. I have my evenly spaced scale set up already on the bottom. So now I'm just going to come and draw in my box plot.

Now, the first thing I'm going to do is place a line at each value that I can see. So 34, we're going to have a line just before 35. Q1 is 37.5, right around there. 39.5 for the median is just before [INAUDIBLE]. Q3 is 41.5, and the maximum is on 50. Now I'm going to connect the first and third quartiles into a box, connect the minimum to the box with a line, and connect the maximum to the box with a line. So this is my box plot.
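The drawing steps can be imitated with a rough text-only sketch. Everything here is an assumption made for illustration: the function name `ascii_boxplot`, the scale running from 30 to 55, and the choice of two characters per unit. It marks the five values with `|`, fills the box with `=`, and draws the whiskers with `-`.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, lo, hi, cells_per_unit=2):
    """Draw a one-line text box plot on an evenly spaced scale from lo to hi."""
    def col(x):
        # Column index of value x on the scale
        return round((x - lo) * cells_per_unit)

    row = [" "] * (col(hi) + 1)
    # Whiskers: minimum to Q1, and Q3 to maximum
    for c in range(col(minimum), col(q1)):
        row[c] = "-"
    for c in range(col(q3), col(maximum) + 1):
        row[c] = "-"
    # Box: Q1 to Q3
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="
    # A line at each of the five values
    for x in (minimum, q1, median, q3, maximum):
        row[col(x)] = "|"
    return "".join(row)

# Five number summary from the example; the 30-to-55 scale is an assumed range
plot = ascii_boxplot(34, 37.5, 39.5, 41.5, 50, lo=30, hi=55)
print(plot)
```

Because the scale is evenly spaced, the long right whisker in the output reflects how far 50 sits from 41.5.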

Now, you'll notice in here that the sections aren't evenly spaced. Each section, so from the minimum to the first quartile, contains 25% of the data. And then, again, from the first quartile to the median is 25%, from the median to the third quartile is 25%, and the third quartile to the maximum is 25%. So this is just showing us that this part here from the third quartile to the maximum, it still holds 25% of the data. That data is just a little bit more spread out.
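As a quick check of that idea, the sketch below compares how wide each section is with how much of the data it holds. The summary values come from the example. The `heights` list is a hypothetical data set (not the tutorial's actual values) that happens to match that summary.

```python
# Five number summary from the example: min, Q1, median, Q3, max
summary = [34, 37.5, 39.5, 41.5, 50]
labels = ["min to Q1", "Q1 to median", "median to Q3", "Q3 to max"]

# Hypothetical heights in inches (NOT the tutorial's actual data)
heights = [34, 37, 38, 39, 40, 41, 42, 50]

sections = list(zip(summary, summary[1:]))

# Width of each section on the scale
widths = [hi - lo for lo, hi in sections]

# Share of the data falling inside each section
shares = [sum(lo <= x <= hi for x in heights) / len(heights) for lo, hi in sections]

for label, width, share in zip(labels, widths, shares):
    print(f"{label}: width {width}, share of data {share:.0%}")
```

With this data, every section holds 25% of the values, while the widths are 3.5, 2, 2, and 8.5 inches, so the last quarter is simply more spread out.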

Now, one thing that box-and-whisker plots are nice for is comparing two sets of data. So if we look here and we look at our traditional box plot and we're comparing the test scores for two groups, we can use our knowledge about the percentages to compare the test scores for Group 1 to the test scores for Group 2.

Here, we can see that 75% of Group 1 scored higher than about 50% of Group 2. So it's not every single person in Group 1, just about 75% of that group. So that's one example of how you can use two box plots to draw comparisons, and that's part of why it's important to have a consistent scale to be able to do those comparisons.
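That kind of comparison comes down to checking one number against another. In the sketch below, the two summaries are hypothetical test scores (the tutorial's actual values are not given), and `Summary` and `top_three_quarters_beat_bottom_half` are illustrative names. The check is whether the first quartile of one group sits at or above the median of the other.

```python
from collections import namedtuple

Summary = namedtuple("Summary", "minimum q1 median q3 maximum")

# Hypothetical test-score summaries (NOT the values from the tutorial's plot)
group1 = Summary(60, 75, 82, 90, 98)
group2 = Summary(40, 62, 75, 84, 95)

def top_three_quarters_beat_bottom_half(a, b):
    """True when Q1 of a is at or above the median of b.

    About 75% of a lies above a.q1 and about 50% of b lies below b.median,
    so in that case roughly 75% of a scored at least as high as the lower half of b.
    """
    return a.q1 >= b.median

print(top_three_quarters_beat_bottom_half(group1, group2))  # True
print(top_three_quarters_beat_bottom_half(group2, group1))  # False
```

On the plots, this corresponds to the left edge of one group's box lining up with, or sitting to the right of, the median line in the other group's box.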

This has been your tutorial on box-and-whisker plots.