Source: Tables created by the author
In this tutorial, you're going to learn how to calculate and interpret a median of a data set. So suppose you have the players from the Chicago Bulls basketball team. And you wonder what would be a typical height for someone on this team. It looks like there's a lot of 81's, but would you call that typical?
What we can do is calculate a median. Now, a median is simply a measure of center for the data set: it's the middle value in a sorted list. So it's the middle number when the data set is arranged from least to greatest, or greatest to least. It doesn't really matter which.
Now, these numbers are listed with the players sorted alphabetically. But we need them ordered from least to greatest. So we can reorder those numbers to look like this. And then all we do is simply cross off the lowest and highest numbers and continue working our way in until we have just one number left. That number is the median: 79.
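The cross-off procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_crossing_off` and the list of heights are made up for this example (the heights are not the actual roster shown in the video), and the list is assumed to be non-empty.

```python
def median_by_crossing_off(values):
    """Find the median by repeatedly crossing off the lowest and highest value."""
    remaining = sorted(values)        # order from least to greatest
    while len(remaining) > 2:
        remaining = remaining[1:-1]   # cross off the lowest and the highest
    # One value is left for an odd count; two are left for an even count,
    # in which case they are averaged.
    return sum(remaining) / len(remaining)


# Hypothetical heights in inches, chosen only for illustration
heights = [81, 75, 79, 81, 72, 84, 77, 81, 78, 80, 75, 83, 79]
print(median_by_crossing_off(heights))  # 79.0
```

With 13 values, six pairs get crossed off and the seventh value in the sorted list, 79, is the one left standing.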
What you notice is that half the values in the list are at or below 79, and half the values in the list are at or above 79. We can also use technology to figure out what the median of a data set is. These are the same heights, not ordered. And we can find the median, anyway, using technology.
All we have to do is type into a cell, "equals median." It will give you some suggestions. And then it asks you to enter the numbers. You can select the numbers that you want to find the median of and simply hit Enter. And sure enough, it returns to us 79, just like the last time.
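The same idea carries over to other tools. As a rough equivalent of the spreadsheet step, here is a minimal sketch using `statistics.median` from Python's standard library, which is a different tool from the one used in the video. The heights are hypothetical values made up for illustration, and they are deliberately left unsorted.

```python
import statistics

# Hypothetical heights in inches, not in any particular order
heights = [81, 75, 79, 81, 72, 84, 77, 81, 78, 80, 75, 83, 79]

# statistics.median sorts a copy of the data internally,
# so the input does not need to be ordered first.
median_height = statistics.median(heights)
print(median_height)  # 79
```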
Now suppose you had a class of 10 students and you had a 10-point quiz. And maybe these were also listed alphabetically. So the first student alphabetically scored a 10, and the last student alphabetically scored an 8. Again, we can't work with the scores in alphabetical order. To find the median, we need them ordered from least to greatest.
So we can reorder to make it look like this. And we do exactly the same thing. The only difference in this example is, oh no, we have two middle values. In that case, all we do is average them. We're going to take the mean of 8 and 9. 8 plus 9 is 17, and 17 divided by 2 is 8 and 1/2. So 8 and 1/2 is going to be our median.
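Here is a short sketch of the two-middle-values case. The function name `median` and the quiz scores are assumptions made for illustration; only the first score (10), the last score (8), and the two middle values (8 and 9) come from the example above.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


# Hypothetical scores for 10 students, in alphabetical (not numeric) order
scores = [10, 7, 9, 8, 6, 9, 10, 5, 9, 8]
print(sorted(scores))  # [5, 6, 7, 8, 8, 9, 9, 9, 10, 10]
print(median(scores))  # 8.5
```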
Let's take a look and see how the median is affected by extreme values. Suppose that we had another 10-point quiz for a different class. And these were the scores, in order. Well, one of these seems completely out of range. This was just a typo. Maybe our finger slipped and we hit 90 instead of 9.
What you'll notice is that the median of this data set is actually 7 because that's the middle number. There are five scores below it in the list and five above it. Suppose we fix our mistake and change the 90 back to a 9 like it should have been. Well, you know what, the median is still 7. The median is not all that affected by outliers or extreme values.
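To see this with numbers, here is a small sketch. The eleven scores are hypothetical, made up so that 7 sits in the middle with five scores on each side; the only details taken from the example are the mistyped 90, its correction to 9, and the median of 7.

```python
import statistics

# Hypothetical quiz scores; the 90 is the typo that should have been a 9
with_typo = [3, 4, 5, 6, 6, 7, 8, 8, 9, 10, 90]

# Replace the mistyped 90 with the intended 9
corrected = [9 if score == 90 else score for score in with_typo]

print(statistics.median(with_typo))  # 7
print(statistics.median(corrected))  # 7
```

The extreme value only changes which number sits at the far end of the sorted list, so the middle position still holds a 7.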
Another way to figure out a median would be if you have data summarized in a frequency table. This is the information about the number of days that the temperature was in a particular range in Chanhassen, Minnesota in 2009. For instance, there were 8 days that had a temperature of between 0 and 9 degrees Fahrenheit.
What we see here is that there are a couple of different ways to figure out, not exactly what the median temperature is, but which bin it's in. With 365 days in the year, the middle day is the 183rd when the days are ordered by temperature, and we can see that the 183rd day falls in this category here. Now what importance does that have? Well, that means that there are 182 days that were as cold or colder than that particular day, and there are 182 days that were at least as warm as that particular day. Because the bins are ordered by temperature, the data are semi-ordered. The 182 below and 182 above form the two halves. And the median is somewhere here in the 50's.
We're not 100% sure exactly where in the 50's it is, but we can be sure that it's in the 50's. Notice the number 183, when you look at the cumulative frequency, falls between the 156 and the 202. By the time we've gotten to 156, at the end of the 40's, we haven't yet accounted for half the days in terms of ordered temperatures. But we've accounted for more than half the days by the time we finish the 50's, at 202, which means that the median is somewhere in the 50's.
If you look at the relative cumulative frequency column, you can see the same thing. By the time we have finished the 40's, we've only accounted for less than 43% of the data. By the time we finish the 50's, however, we will have accounted for over 55% of the data.
So where's the 50th percentile? Where is the value such that 50% of the data falls at or below it? We don't know what the number is, but again, we know it's somewhere in the 50's, because the 50% mark is reached within the 50's. So we're going to call the 50's the median class. We know the median is somewhere in there. But because of the way this data is presented, we don't know exactly what the median is. We can only tell that it's in this group of values, the 50's, and not anywhere else.
So, to recap: the median identifies the middle number in a set of ordered data. If there's an even number of data values, we take the mean of the two middle numbers. And if the data are in a frequency table, you can find the median class, but you can't find the median directly.
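Finding the median class can be sketched as a running total over the bins. In the table below, only the 8 days in the 0 to 9 bin, the cumulative counts of 156 (through the 40's) and 202 (through the 50's), and the 365-day total are taken from the discussion above. Every other bin count is a placeholder invented so that the totals work out, and the function name `median_class` is likewise an assumption.

```python
# (bin label, number of days), ordered from coldest to warmest.
# Placeholder counts except where noted in the lead-in.
table = [
    ("0-9", 8), ("10-19", 25), ("20-29", 40), ("30-39", 43), ("40-49", 40),
    ("50-59", 46), ("60-69", 60), ("70-79", 68), ("80-89", 35),
]


def median_class(table):
    """Return (label, cumulative count, relative cumulative frequency) for the
    first bin whose cumulative count reaches half of the total."""
    total = sum(count for _, count in table)
    cumulative = 0
    for label, count in table:
        cumulative += count
        if cumulative >= total / 2:
            return label, cumulative, cumulative / total


label, cumulative, relative = median_class(table)
print(label, cumulative, round(relative, 3))  # 50-59 202 0.553
```

The result names a bin, not a temperature, which matches the point above: grouped data can only tell us which class holds the median.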
So we talked about medians and median class. Good luck. And we'll see you next time.
Median: The value that is in the "middle" of a data set when the set is arranged from least to greatest.
Median Class: The bin that contains the median value. This is the most precise measurement we can obtain when we are looking at data that have already been categorized.