Source: Female Symbol; Public Domain: http://www.clker.com/clipart-6436.html
In this tutorial, you're going to learn how to calculate and interpret the mean of a data set. The word "mean" is often used interchangeably with the word "average," and I'll do that, too, from time to time. But there are several different things that we could call an average. The mean is the most common of them, so that's the one I'll treat as interchangeable with "average," as opposed to something else, like the median.
For instance, we're going to start with a data set that shows how tall the players on the Chicago Bulls basketball team are. These are the players, and these are their heights. My question is: what would be considered a typical height for a player on this team?
One way to find out is to compute the mean. The mean is what's typically referred to as the average, and it's found by adding up all the values and dividing by how many there are. Notationally, it looks like this: x1 plus x2 plus x3, and so on, up to xn, all divided by n.
The subscripts 1, 2, and 3 just mean that these are the first, second, and third numbers in the list, continuing all the way up to the last number in the list. The two n's are the same number: xn is the nth value, which is the last value, so n is also the count of values.
Now we're going to divide by however many values there were. So in this case, we're going to add up all of the players' heights and divide by 15 because there were 15 players, the result being 78.33 inches. That's the average height of a player on the Bulls.
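As a minimal sketch of that formula (in Python, which the tutorial itself doesn't use), the function below adds x1 through xn one at a time and then divides by n. The function name `mean` is our own choice, and the heights are hypothetical placeholders, not the actual Bulls roster.

```python
def mean(values):
    """Arithmetic mean: (x1 + x2 + ... + xn) / n."""
    if not values:
        raise ValueError("mean requires at least one value")
    total = 0
    for x in values:      # add x1, x2, ..., xn one at a time
        total += x
    n = len(values)       # n = how many values there are
    return total / n

# Hypothetical heights in inches (NOT the Bulls data from the video)
heights = [75, 78, 80, 82]
print(round(mean(heights), 2))  # 78.75
```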
To do something like this using technology, you can use a spreadsheet. Create your list of values like I have, then type =AVERAGE, and the spreadsheet will suggest some formulas for you.
Type the opening parenthesis, and it prompts you for number1, number2, number3, and so on. You can just highlight the whole list, close the parenthesis, and hit Enter. Sure enough, the spreadsheet returns 78.33, the same number we got by hand.
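Outside a spreadsheet, Python's standard-library `statistics.mean` plays a role similar to AVERAGE. This sketch uses a hypothetical list of heights (not the video's data) and checks that the library result agrees with adding the values and dividing by how many there are.

```python
import statistics

# Hypothetical heights in inches; substitute your own list
heights = [74, 77, 79, 81, 84]

by_library = statistics.mean(heights)    # comparable to =AVERAGE(...) in a spreadsheet
by_hand = sum(heights) / len(heights)    # add them all up, divide by how many

print(f"{by_library:.2f} {by_hand:.2f}")  # 79.00 79.00
```

Either route works; the library function mainly saves you from writing the sum and count yourself.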
Here's an example of when the mean is a poor representation of where the center really is. So suppose that we have 12 employees. Eight of them are shift workers. Three of them are managers. And there's one Head Honcho. And apparently, the Head Honcho makes about $200,000. And the other workers make quite a bit less.
If I take the mean of the eight shift workers, the three managers, and the Head Honcho, all together, the average is over $58,000. Now, take a look. How many of the employees make more than $58,000? And how many make less than $58,000?
11 of our 12 people make less than $58,000. Only one makes more than that, and he makes substantially more. So the mean doesn't make much sense as a measure of center here. The Head Honcho's $200,000 salary is an outlier in this data set.
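To see the effect numerically, here is a small sketch with hypothetical salaries. They are not the figures from the video; they are chosen only to mimic its shape (eight shift workers, three managers, and one $200,000 outlier). The code counts how many salaries fall below and above the mean, then recomputes the mean with the outlier removed.

```python
# Hypothetical salaries (NOT the video's figures), mimicking its shape:
# 8 shift workers, 3 managers, and one $200,000 outlier.
salaries = [44_000] * 8 + [55_000] * 3 + [200_000]

mean_salary = sum(salaries) / len(salaries)
below = sum(1 for s in salaries if s < mean_salary)   # how many earn less than the mean
above = sum(1 for s in salaries if s > mean_salary)   # how many earn more than the mean

# Drop the outlier and recompute, to see how far it pulled the mean up
without_outlier = [s for s in salaries if s != 200_000]
mean_without = sum(without_outlier) / len(without_outlier)

print(f"mean with outlier:    {mean_salary:,.2f}")    # 59,750.00
print(f"below / above mean:   {below} / {above}")     # 11 / 1
print(f"mean without outlier: {mean_without:,.2f}")   # 47,000.00
```

A single large value moves the mean by more than $12,000 here, even though it belongs to just one of twelve people.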
In the presence of outliers, meaning a very few values that are very high or very low, the mean can give a misleading picture of where the center is.
One note about the average: there are a couple of accepted notations for it. One uses the Greek letter mu (μ), pronounced "mew," and we're going to see it quite a lot. So the Greek letter mu is a notation for the mean. The other is called "x-bar," and it's simply an x with a bar over it (or y-bar, or whatever letter you're using for the variable). Here, we were using heights; if we called that variable "h," we could write "h-bar."
One other thing to note is that sometimes we use a special notation that shortens this whole long sum. It's called "summation notation" or "sigma notation," after the Greek letter sigma (Σ) that it uses. I'll walk you through each of its parts.
The x with subscript i plays the same role as x1, x2, and x3. What this says is: add up all of the x's, starting at the first one, where i equals 1 (so x1), and finishing at the nth one, where the subscript is n (the last one). When that's all done, you divide by n. So this compact formula is exactly the same as the much lengthier formula written out term by term.
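For reference, here are the long form and the sigma form of the mean written side by side, using x-bar for the mean. The equality between them is exactly the point made above.

```latex
\bar{x} \;=\; \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
```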
The last thing to remember is that sometimes not every value is weighted the same. In a course, for example, some exams may count for more than others; often the final counts the most. So suppose you have a course where the first three tests are weighted equally, but the final is weighted three times as much as each of the others. How do you find the average?
What we're going to do is count each of the first three tests as one test, but count the final, the 94, as three tests, because it's weighted three times as much. So we multiply each score by its weight.
And because we're counting these not as four tests but essentially as six tests, since the final counts for three, we divide by 6 at the end. So this weighted average, or weighted mean, is 87 and 1/2 (87.5).
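The same procedure can be sketched in Python: multiply each score by its weight, add the products, and divide by the total weight (here 1 + 1 + 1 + 3 = 6). The function name `weighted_mean` is our own. Only the 94 on the final and its weight of 3 come from the example; the three earlier scores are hypothetical, picked to sum to 243, which is what the stated result implies (87.5 × 6 = 525, and 525 − 3 × 94 = 243).

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add the products,
    then divide by the total of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Three regular tests (weight 1 each) and the final, 94, counted three times.
# The first three scores are hypothetical; only their sum (243) is implied by the example.
scores = [78, 85, 80, 94]
weights = [1, 1, 1, 3]
print(weighted_mean(scores, weights))  # 87.5
```

With all weights equal to 1, this reduces to the ordinary mean.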
To recap: the mean is one measure of center that we can use, and it's what we mean by the word "average." Sometimes we use summation notation as a shortcut instead of writing out the whole long string of added values. And finally, a weighted average can be found by multiplying each value by its weight, essentially counting it that many times, and then dividing by the total of the weights.
And so the terms we used were "mean," which is the same as average, "summation" or "sigma notation," and "weighted average." Good luck. And I'll see you next time.
Mean: The "average" value of a data set. It is obtained by dividing the sum of the values by the number of values in the set.
Summation (sigma) notation: A notation that uses the Greek letter sigma to indicate that values should be added together.
Weighted average: A way of calculating a mean when not all the values count for the same amount. Multiply each value by its weight, add the products together, then divide that sum by the sum of the weights.