Source: Graph created by the author
This tutorial is going to answer the question: which measure of center should I use? There are multiple measures of center. We've talked about the mean, the median, and the mode. Now the good news is that there's a default that you should use if there's no real reason to use anything else, and it's the mean. The mean is the one we would like to use if we can. It's the most versatile measure of center, and it's the most appropriate one in the vast majority of cases.
However, there are certain situations where the mean is not an appropriate gauge of center, and we should use the median in those cases. We will hardly ever use the mode.
Here's an example of when the mean is a poor representation of where the center really is. Suppose that we have 12 employees: 8 of them are shift workers, 3 of them are managers, and there's 1 head honcho. The head honcho makes about $200,000, and the other workers make quite a bit less. If I take the mean of the 8 shift workers, the 3 managers, and the head honcho all together, the average is over $58,000.
Now take a look. How many of the employees make more than $58,000, and how many make less? 11 of our 12 people make less than $58,000, and only one makes more than that. And he makes substantially more. So $58,000 doesn't make a whole lot of sense as a measure of center. The head honcho's $200,000 salary is an outlier in this data set.
In this case, a better measure of center would be the median. If you took all the salaries and wrote them out from least to greatest, the value in the middle would be $42,000. And that more accurately describes what a typical worker makes.
To reemphasize, $200,000 was an outlier. In the presence of outliers, which are a few very high or very low values, the mean won't give an accurate representation of center, and you should use the median in cases like those.
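To see the effect of the outlier in numbers, here is a minimal sketch in Python. The individual salaries are hypothetical: the tutorial only gives the head honcho's salary of about $200,000, a mean over $58,000, and a median of $42,000, so the values below were made up to match those three facts.

```python
import statistics

# Hypothetical salaries, chosen only to match the figures in the tutorial.
shift_workers = [40_000] * 4 + [42_000] * 4   # 8 shift workers
managers = [57_000] * 3                       # 3 managers
head_honcho = [200_000]                       # 1 head honcho (the outlier)

salaries = shift_workers + managers + head_honcho

mean_salary = statistics.mean(salaries)       # 58,250: pulled up by the outlier
median_salary = statistics.median(salaries)   # 42,000: the middle of the sorted list

# Count how many employees earn less than the mean.
below_mean = sum(1 for s in salaries if s < mean_salary)

print("mean:", mean_salary)
print("median:", median_salary)
print("employees below the mean:", below_mean, "of", len(salaries))
```

With these made-up numbers, 11 of the 12 employees fall below the mean, while the median sits right where the typical worker's pay is.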
All right, so when do you use the mode? We've talked about when to use the mean and when to use the median. Well, we don't use the mode all that often, to be quite honest. It's used mainly for qualitative data sets, to determine the category that has the most values in it. So in this case the mode is biology. And that's really all we use it for, unless we're describing the peak of a distribution, like in a histogram.
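Finding the mode of a qualitative data set is just counting categories, which can be sketched in Python as follows. The category list here is hypothetical, since the counts behind the graph aren't given in the tutorial; it is set up only so that biology comes out as the most common category.

```python
from collections import Counter

# Hypothetical qualitative data: one category label per observation.
subjects = (
    ["biology"] * 5
    + ["chemistry"] * 3
    + ["physics"] * 2
)

# Tally how many times each category appears.
counts = Counter(subjects)

# The mode is the category with the highest count.
mode_category, mode_count = counts.most_common(1)[0]

print("mode:", mode_category, "with", mode_count, "values")
```

Note that a mean or a median wouldn't make sense here, because category labels can't be added up or put in numerical order.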
And so to recap, the mean is our default measure of center. It's the preferred one, and it's the most versatile. However, if we have outliers, a few values that can pull the mean towards them on either the high side or the low side, the mean won't accurately represent center anymore, and the median should be used. Typically we reserve the mode for qualitative distributions. Good luck. And we'll see you next time.
Mean: The average value in a quantitative data set; the sum of all the values divided by the number of values.