The five number summary makes large data sets more manageable and easier to understand. By reducing a data set from many numbers to just five, this method summarizes both the center and the variability of the data.
The five number summary consists of five parts: the minimum, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum.
Two of these numbers are the smallest and largest values in the data set: the minimum and the maximum.
EXAMPLE
Suppose you have a list of the heights, in inches, of the Chicago Bulls basketball team:
Player | Height (inches) |
---|---|
Omer Asik | 84 |
Carlos Boozer | 81 |
Ronnie Brewer | 79 |
Jimmy Butler | 79 |
Luol Deng | 81 |
Taj Gibson | 81 |
Richard Hamilton | 79 |
Mike James | 74 |
Kyle Korver | 79 |
John Lucas III | 71 |
Joakim Noah | 83 |
Derrick Rose | 75 |
Brian Scalabrine | 81 |
Marquis Teague | 74 |
C.J. Watson | 74 |
The median measures the center of a data set; it's the middle value of an ordered set of data. The table is currently in alphabetical order by last name, so the heights need to be rearranged from least to greatest. With 15 values, the middle number is the 8th one, so the median is 79.
71 | 74 | 74 | 74 | 75 | 79 | 79 | 79 | 79 | 81 | 81 | 81 | 81 | 83 | 84 |
| | | | | | | Median | | | | | | | |
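The sorting step can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the heights are typed in the same alphabetical order as the roster table.

```python
from statistics import median

# Heights in inches, in the alphabetical order of the roster table.
heights = [84, 81, 79, 79, 81, 81, 79, 74, 79, 71, 83, 75, 81, 74, 74]

ordered = sorted(heights)          # rearrange from least to greatest
middle_index = len(ordered) // 2   # 15 values, so index 7 is the 8th value

print(ordered)
print(ordered[middle_index])       # the middle value of the ordered list
print(median(heights))             # statistics.median sorts internally
```

Picking the middle index directly only works for an odd number of values; `statistics.median` also handles an even count by averaging the two middle values.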
Dividing at that point, you are left with two groups of seven values each: a low group (71 through the second 79) and a high group (the last 79 through 84). Next, take the median of each group, which is its 4th value. Now you have 74 in the low group, 81 in the high group, and 79 in the middle.
71 | 74 | 74 | 74 | 75 | 79 | 79 | 79 | 79 | 81 | 81 | 81 | 81 | 83 | 84 |
| | | Q1 | | | | Median (Q2) | | | | Q3 | | | |
In this data set, 74 is the first quartile, 79 is the second quartile or the median, and 81 is the third quartile.
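The median-of-halves procedure described here could be written roughly as follows. The function name `quartiles_excluding_median` is hypothetical, and the sketch assumes that, for an odd number of values, the overall median is left out of both halves, which matches the two groups of seven in this example. Other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def quartiles_excluding_median(values):
    """Return (Q1, Q2, Q3) by taking the median of each half.

    Assumption: with an odd count, the overall median belongs to
    neither half. Needs at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    low = ordered[:half]                                     # low group
    high = ordered[half + 1:] if n % 2 else ordered[half:]   # high group
    return median(low), median(ordered), median(high)

heights = [84, 81, 79, 79, 81, 81, 79, 74, 79, 71, 83, 75, 81, 74, 74]
q1, q2, q3 = quartiles_excluding_median(heights)
print(q1, q2, q3)  # 74 79 81
```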
Now, the five number summary consists of the following five numbers.
Minimum | ~25% | Q1 | ~25% | Median | ~25% | Q3 | ~25% | Maximum |
---|---|---|---|---|---|---|---|---|
71 | | 74 | | 79 | | 81 | | 84 |
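You'll notice that about 25% of the data values fall in each of the four intervals: between the minimum and Q1, between Q1 and the median, between the median and Q3, and between Q3 and the maximum.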
Also, you can see where data values are concentrated within the data set: the narrower an interval, the more tightly its values are packed. There are about as many data values between 79 and 81 as there are between 74 and 79, but the 79 to 81 band is narrower than the 74 to 79 band. Therefore, the data are more clustered in the 79 to 81 band than in the 74 to 79 band.
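To make the clustering argument concrete, the width of each band can be computed from the five numbers. This is a small sketch with made-up variable names; since each band holds roughly a quarter of the values, a smaller width means the values are packed more tightly.

```python
# Minimum, Q1, median, Q3, maximum from the Bulls example (inches).
summary = [71, 74, 79, 81, 84]
labels = ["min to Q1", "Q1 to median", "median to Q3", "Q3 to max"]

# Width of each band: the difference between consecutive summary numbers.
widths = [high - low for low, high in zip(summary, summary[1:])]

for label, width in zip(labels, widths):
    print(f"{label}: {width} inches wide")
```

Here the median to Q3 band (79 to 81) is 2 inches wide, while the Q1 to median band (74 to 79) is 5 inches wide.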
Boxplots are also sometimes called box-and-whisker plots. A boxplot is a way to graphically display the five number summary for a data set. It is composed of a box, which contains the middle 50% of the values, and whiskers, which extend out to the maximum and minimum values.
To create a boxplot, follow the simple steps below:
Step 1: Draw an axis. It can be horizontal or vertical.
Step 2: Scale the axis with equal increments.
Step 3: Make a mark to identify the five numbers from the five number summary.
Step 4: Draw a box from the first quartile to the third quartile. Draw a whisker from Q1 to the minimum and from Q3 to the maximum.
EXAMPLE
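Refer to the table above of the heights of the Chicago Bulls basketball team. Recall that the five number summary consists of a minimum of 71, a first quartile of 74, a median of 79, a third quartile of 81, and a maximum of 84. So, how do you put this information into a boxplot?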
Step 1: Draw an axis. It can be horizontal or vertical.
Step 2: Scale the axis with equal increments. Here, the axis covers the shortest height, 71, through the tallest height, 84.
Step 3: Make a mark to identify the five numbers from the five number summary: 71, 74, 79, 81, and 84.
Step 4: Draw a box from the first quartile, 74, to the third quartile, 81, and draw whiskers from Q1 to the minimum and from Q3 to the maximum. The box shows where the middle 50% of the data lies. About 25% of the data falls in the whisker on the low side, and about 25% falls in the whisker on the high side. This is why it's sometimes called a box-and-whisker plot.
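The four steps can be mimicked with a rough text drawing. The function `ascii_boxplot` below is a hypothetical helper, not a standard library feature; it assumes whole-number values and uses one character per inch along a horizontal axis.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum):
    """Return a one-line horizontal boxplot, one character per unit.

    Assumes integer inputs with minimum <= q1 <= med <= q3 <= maximum.
    """
    row = []
    for x in range(minimum, maximum + 1):   # steps 1 and 2: equal increments
        inside_box = q1 <= x <= q3
        row.append("=" if inside_box else "-")   # box body or whisker
    # Step 3: mark the five numbers (the median is drawn last).
    row[0] = "|"
    row[-1] = "|"
    row[q1 - minimum] = "["
    row[q3 - minimum] = "]"
    row[med - minimum] = "M"
    return "".join(row)

print(ascii_boxplot(71, 74, 79, 81, 84))  # |--[====M=]--|
```

The brackets are the ends of the box, `M` is the median, and the dashes are the whiskers reaching out to the minimum and maximum.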
You can use boxplots to compare two distributions. For instance, when comparing the heights of girls and boys, you might find that the spread, or variation, among the girls is much less than the variation among the boys.
You can see this variation not only in the width of the boxes but also in the total width from the minimum to the maximum in each of the two data sets. Therefore, you can use boxplots as a kind of summary of the distribution for the boys and for the girls.
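The two widths mentioned here, the width of the box and the total width from minimum to maximum, can be computed directly from each five number summary. In the sketch below, the function name `spread` is hypothetical and the two summaries are invented placeholder numbers used only to show the comparison; they are not real measurements.

```python
def spread(summary):
    """Return the total width and box width of a five number summary."""
    minimum, q1, _median, q3, maximum = summary
    return {"total_width": maximum - minimum, "box_width": q3 - q1}

# Hypothetical five number summaries in inches, invented for illustration only.
girls = (58, 62, 64, 66, 69)
boys = (56, 62, 66, 70, 76)

print("girls:", spread(girls))
print("boys: ", spread(boys))
```

With these placeholder values, both widths are smaller for the girls, which is what "less variation" looks like when the two boxplots are drawn on the same axis.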
Source: Adapted from Sophia tutorial by Jonathan Osters.