On October 21st, 2010, a newspaper no less venerable than the New York Times published an article with the headline "Average College Debt Rose to $24,000 in 2009" (see here for the article). Wow! Students are really in for it after graduation, aren't they? Gee, I can't believe it: the average debt is $24,000. But wait. Just what does "the average college debt" mean? Is the average debt the amount owed by most students? Or is it something else? This lesson intends to make sense of what people mean when they talk about "the average".
You should have had at least some prior exposure to averages in high school algebra when you learned about the mean, the median, and the mode. If you have not seen these concepts before, then fear not! We will review them now.
The mean is calculated by adding up all the values in our set and then dividing by the number of values we added.
Example
If we want to find the mean of the numbers 3, 1, 2, 1, 8 we first add them up.
3+1+2+1+8 = 15
Then, we divide by 5 since there are 5 numbers.
15/5 = 3.
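The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch; the variable names are chosen here for illustration.

```python
values = [3, 1, 2, 1, 8]

# Mean: add up all the values, then divide by how many there are.
total = sum(values)
count = len(values)
mean = total / count

print(total, count, mean)  # 15 5 3.0
```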
To find the median we first list the numbers in ascending order, and then choose the middle item on the list.
Example
Using the values above, we first write them in order:
1 1 2 3 8
Now we pick the value in the middle of the list:
2
The mode is found by counting the number of times each value appears in the list, and then picking the value that appears most often.
Example
Looking at our old list
1 1 2 3 8
we pick the number that occurs most often, so
1
because 1 appears twice and the rest only once.
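Counting by hand gets tedious for longer lists. As a rough sketch, Python's standard `collections.Counter` can do the tallying; the variable names are illustrative only.

```python
from collections import Counter

values = [1, 1, 2, 3, 8]

# Count how many times each value appears in the list.
counts = Counter(values)

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count.
mode, occurrences = counts.most_common(1)[0]

print(mode, occurrences)  # 1 2
```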
Source: www.nytimes.com October 21, 2010 "Average College Debt Rose to $24,000 in 2009"
So, to find the median of a collection of numbers we just list the numbers in order and pick the middle one. But what do we do in a situation like this:
5 3 2 6 1 8 7 4
Listing these in order gives us
1 2 3 4 5 6 7 8
But there is no middle number here because the list has even length.
When this happens we choose the two middle numbers, in this case 4 and 5, and calculate their mean. So (4 + 5)/2 = 9/2 = 4.5.
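Both cases, odd length and even length, can be written as one small function. The function below is a sketch written for this lesson (its name and layout are our own choice); Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # list the numbers in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd length: there is a single middle item.
        return ordered[middle]
    # Even length: take the mean of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([3, 1, 2, 1, 8]))           # 2
print(median([5, 3, 2, 6, 1, 8, 7, 4]))  # 4.5
```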
Sometimes, when there are many data points and only a few values that they might take, we find that no single value is most common. Often there is a tie between two or more values. For example, consider the following list.
4 3 2 3 4 3 2 1 1 2 3 3 4 2 4 4 2
Each of the values 2, 3, and 4 appears five times, while 1 appears only twice.
In cases like these, where there really isn't a clear choice for the mode, either any of the most common values will suffice, or none will, depending on the context. If such ties occur in statistical research they are most likely to be pointed out and might even be significant.
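A tie like this is easy to detect by tallying every value and keeping all of those that reach the highest count. The following is a sketch under that assumption; it reports every tied value instead of choosing one.

```python
from collections import Counter

values = [4, 3, 2, 3, 4, 3, 2, 1, 1, 2, 3, 3, 4, 2, 4, 4, 2]

counts = Counter(values)
highest = max(counts.values())

# Keep every value whose count equals the highest count.
modes = sorted(v for v, c in counts.items() if c == highest)

print(dict(sorted(counts.items())))  # {1: 2, 2: 5, 3: 5, 4: 5}
print(modes)                         # [2, 3, 4]
```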
Returning to the article about student debt, we now see that the term "average" could take on one of several meanings. When printed in the news like this, "average" usually means "mean", and the mean is a non-resistant measure of central tendency. That is, the mean can be influenced by the presence of outliers in the data set, so if a couple of schools reported abnormally high or abnormally low student debt, the whole statistic about student debt would have been influenced. Let's see just how this might have happened.
The following listing provides fictitious data about ten different students' debts. The students are simply numbered from 1 to 10.
1) 9,243
2) 9,343
3) 10,554
4) 91,234
5) 96,115
6) 9,544
7) 8,332
8) 7,214
9) 7,895
10) 10,908
The mean of these numbers is found by adding them all up and then dividing by 10, and it equals about $26,038. But when we look at the list we see no figures anywhere near $26,000. Instead we see that most debts range between $7,000 and $11,000 and that two extremely unlucky students have debts approaching $100,000.
What can we learn from this? Even though the mean of these numbers really is about $26,000 (and would be technically correct if reported in a newspaper article), we see that 80% of the students on the list actually owe less than half of this figure.
A better measure of central tendency in this case might be the median. Arranging the numbers from smallest to largest gives us
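These figures can be reproduced with a short sketch. The list below is the fictitious data from the lesson, and the variable names are illustrative.

```python
# Fictitious debts for students 1 through 10, in dollars.
debts = [9243, 9343, 10554, 91234, 96115, 9544, 8332, 7214, 7895, 10908]

mean_debt = sum(debts) / len(debts)

# Students who owe less than half of the mean.
below_half = [d for d in debts if d < mean_debt / 2]
share_below_half = len(below_half) / len(debts)

print(mean_debt)         # 26038.2
print(share_below_half)  # 0.8
```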
1) 7,214
2) 7,895
3) 8,332
4) 9,243
5) 9,343
6) 9,544
7) 10,554
8) 10,908
9) 91,234
10) 96,115
Since the number of values is even, we pick the middle two (the fifth and sixth on the list) and take their mean: ($9,343 + $9,544)/2 = $9,443.50. This value seems like a much better characterization of the "average" student debt.
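The median can be checked with Python's standard `statistics` module. For a list of ten values, the middle two are the fifth and sixth once the list is sorted. As a rough illustration of how differently the two measures react to outliers, the sketch also drops the two largest debts and recomputes both the mean and the median.

```python
from statistics import mean, median

# Fictitious debts for students 1 through 10, in dollars.
debts = [9243, 9343, 10554, 91234, 96115, 9544, 8332, 7214, 7895, 10908]

ordered = sorted(debts)
middle_pair = (ordered[4], ordered[5])  # 5th and 6th values (Python counts from 0)
median_debt = median(debts)             # mean of the two middle values

# Remove the two largest debts and see how far each measure moves.
without_outliers = ordered[:-2]

print("middle pair:", middle_pair)
print("median with / without outliers:", median_debt, median(without_outliers))
print("mean with / without outliers:", mean(debts), mean(without_outliers))
```

Running this shows the mean changing by many thousands of dollars when the two large debts are removed, while the median barely moves.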