Source: Female Symbol; Public Domain: http://www.clker.com/clipart-6436.html
In this tutorial, you're going to learn about how to calculate and interpret the Mean of a data set. Now, the word "mean" is often used interchangeably with the word "average." And I'll do that, too, from time to time. But there are several different things that we could call an average. And so the mean is the most common of those. And that's what we're going to use interchangeably with the word "average," as opposed to something else, like a median.
So for instance, we're going to start with a data set that shows how tall the players on the Chicago Bulls basketball team are. These are the players. And these are how tall they are. And my question is, what height would be considered a typical height for basketball players on this basketball team?
One way to find it out is the mean. And the mean is essentially the same thing as an average. It's what's typically referred to as the average. And it's found by adding up all the values together and dividing by how many there are. Notationally, it looks like this-- x1 plus x2 plus x3.
Now, the 1s, 2s, and 3s just means that they're the first number in the list and then the second number in the list, all the way up until the last number in the list. And then these two ends are the same thing. This is the nth value. It's the last value.
Now we're going to divide by however many values there were. So in this case, we're going to add up all of the players' heights and divide by 15 because there were 15 players, the result being 78.33 inches. That's the average height of a player on the Bulls.
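To make the "add them all up and divide by how many there are" rule concrete, here is a minimal Python sketch. The function name `mean` and the five heights are assumptions made up for illustration; they are not the Bulls roster from the video.

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical heights in inches (made up, not the actual team data).
heights = [75, 77, 79, 81, 83]
print(mean(heights))  # 79.0
```

With the real 15 heights in the list, the same function would carry out the calculation described above.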
To do something like this using technology, you can use a spreadsheet. And you just create your list of values like I have and say, equals average. And it will suggest some formulas for you.
And you say, average, parentheses. And it says, number 1, number 2, number 3. You can just grab all of it, highlight it, close the parentheses, and hit Enter. And sure enough, 78.33, the same number as we got last time is the number that the spreadsheet returns to us.
Here's an example of when the mean is a poor representation of where the center really is. So suppose that we have 12 employees. Eight of them are shift workers. Three of them are managers. And there's one Head Honcho. And apparently, the Head Honcho makes about $200,000. And the other workers make quite a bit less.
If I take the mean of the eight shift workers, the three managers, and the Head Honcho, all together, the average is over $58,000. Now, take a look. How many of the employees make more than $58,000? And how many make less than $58,000?
11 of our 12 people make less than $58,000. And only one makes more than that. And he makes substantially more. So $58,000 doesn't really make a whole lot of sense as a measure of center. The Head Honcho's salary of $200,000 is an outlier in this data set.
In the presence of outliers, which are a very few values that are much higher or much lower than the rest, the mean won't give an accurate representation of center. One note about an average: notationally, there are a couple of accepted notations.
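Here is a rough sketch of the salary example in Python. Only the headcounts (eight shift workers, three managers, one Head Honcho) and the $200,000 figure come from the example; the $42,000 and $55,000 salaries are hypothetical values picked so that the mean lands a little over $58,000.

```python
from statistics import mean

# Hypothetical salaries: only the $200,000 and the headcounts are from the example.
shift_workers = [42_000] * 8
managers = [55_000] * 3
head_honcho = [200_000]
salaries = shift_workers + managers + head_honcho

average = mean(salaries)
below = sum(1 for s in salaries if s < average)  # people earning less than the mean
above = sum(1 for s in salaries if s > average)  # people earning more than the mean

print(round(average, 2), below, above)
```

Under these assumed numbers, one very large value pulls the mean above what 11 of the 12 people actually earn.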
One uses the Greek letter, and it's pronounced "mew." And we're going to see this quite a lot. So the Greek letter, mu, is a notation for mean. The other is called "x-bar." And it's just simply an x with a bar over it-- or y-bar or whatever value you're using. Here, we were using heights. If we were calling that "h," we could make "h-bar."
One other thing to note is that sometimes we use a special notation that shortens up all of this summation. It's called "summation notation" or "sigma notation." This is the Greek letter, sigma. And I'll walk you through all of the parts to it.
This x subscripted i is just like the x1 and x2 and x3. What I'm saying here is add up all of the x's starting at the first one, where the i value is 1, so x1, and finishing at the nth one, where the subscript is n. So it's the last one. When that's all done, you divide by n. And so this compact formula is in fact the same as this fairly large, "lengthy to write out" formula.
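Written out, the long formula and the compact sigma-notation formula described above are the same thing. The symbol x-bar is used here for the mean, and n is the number of values:

```latex
\bar{x} \;=\; \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
```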
Last thing to remember is that sometimes not everything is weighted the same. Sometimes, like in a course, the exams are weighted a certain amount, but then the final is weighted more. So for instance, suppose you have a course where the first three tests are weighted the same. But then the final is weighted three times as much as the others. So how do you do this?
What we're going to do is we're essentially going to count each of the tests as one test. Except the 94, the final, we're going to count it as three tests because it's weighted three times as much. So we multiply each of them by their weights.
And because we're counting these not as four tests, but essentially more like six tests-- because this one counts for three-- we divide by 6 in the end. So this weighted average, or the weighted mean, is 87 and 1/2.
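The weighted average can be sketched in Python like this. The final score of 94, the weights, and the result of 87 and 1/2 come from the example; the three earlier test scores (78, 80, 85) are not stated here, so they are hypothetical values chosen only so that the arithmetic reproduces 87.5. The function name `weighted_mean` is also an assumption.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add up, and divide by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must not be zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [78, 80, 85, 94]  # first three are hypothetical; 94 is the final
weights = [1, 1, 1, 3]     # the final counts as three tests
print(weighted_mean(scores, weights))  # 87.5
```

Dividing by the total weight, 6, is the same as treating the final as if it were three separate tests.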
And so to recap, the mean is one measure of center that we can use. And it's what we mean by the word "average." Also, sometimes we use summation notation as a shortcut instead of writing the whole long string of added values. And then finally, weighted averages can be found by multiplying each value times its weight and counting it, essentially, that many times.
And so the terms we used were "mean," which is the same as average, "summation" or "sigma notation," and "weighted average." Good luck. And I'll see you next time.
Source: TABLES CREATED BY THE AUTHOR
In this tutorial, you're going to learn how to calculate and interpret a median of a data set. So suppose you have the players from the Chicago Bulls basketball team, and you wonder what would be a typical height for someone on this team. It looks like there's a lot of 81's, but would you call that typical?
What we can do is we can calculate a median. Now, a median is simply a measure of center for the data set that actually finds the middle value in a sorted list. So it's the middle number when the data set is arranged from least to greatest or greatest to least. It doesn't really matter.
Now, these numbers were when the players were sorted alphabetically, but we need it ordered from least to greatest. So we can reorder those numbers to look like this. And then all we do is simply cross off the lowest and highest number and continue working our way in until we get just one number left. That number is the median, 79. What you notice is that half the values in the list are at or below 79, and half the values in the list are at or above 79.
We can also use technology to figure out what the median of a data set is. These are the same heights, not ordered, and we can find the median anyway using technology. All we have to do is type into a cell equals median. It will give you some suggestions, and then it asks you to enter the numbers. You can select the numbers that you want to find the median of and simply hit Enter. And sure enough, it returns to us 79, just like the last time.
Now, suppose you had a class of 10 students and you had a 10-point quiz. And maybe these were also listed alphabetically. So the first student alphabetically scored a 10, and the last student alphabetically scored an 8. Again, we can't deal with this alphabetically. To find the median, we need to deal with it least to greatest. So we can reorder it to make it look like this.
And we do exactly the same thing. The only difference in this example is, oh, no, we have two middle values. In that case, all we do is average them. We're going to take the mean of 8 and 9. 8 plus 9 divided by 2, 8 and 1/2 is going to be our median.
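Both cases, one middle value or two, can be handled by a short Python sketch. The function name `median` and the quiz scores are assumptions; the scores are hypothetical, made up so that the first is a 10, the last is an 8, and the two middle values come out as 8 and 9.

```python
def median(values):
    """Sort the values and return the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


# Hypothetical quiz scores in alphabetical (unsorted) order.
quiz = [10, 7, 9, 8, 6, 9, 10, 5, 9, 8]
print(median(quiz))          # 8.5
print(median([81, 75, 79]))  # 79
```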
Let's take a look and see how the median is affected by extreme values. Suppose that we had another 10-point quiz for a different class. And these were the scores in order. Well, one of these seems completely out of range. This was just a typo. Maybe our finger slipped and we hit 90 instead of 9. What you'll notice is that the median of this data set is actually seven, because that's the middle number. There are five less than that and five more than that.
Suppose we correct our mistake and change it back to a 9 like it should have been. Well, you know what? The median is still seven. So the median is not all that affected by outliers or extreme values.
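To see this in numbers, here is a small sketch using Python's `statistics` module. The eleven scores are hypothetical; only the median of 7 and the 90-instead-of-9 typo come from the example.

```python
from statistics import mean, median

# Hypothetical quiz scores; the 90 is the typo that should have been a 9.
with_typo = [3, 4, 5, 6, 6, 7, 8, 8, 9, 10, 90]
corrected = [9 if score == 90 else score for score in with_typo]

print(median(with_typo), median(corrected))                    # 7 7
print(round(mean(with_typo), 2), round(mean(corrected), 2))    # 14.18 6.82
```

With these made-up scores the median stays at 7 either way, while the mean moves a lot because of the single extreme value.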
Another way to figure out a median would be if you have data summarized in a frequency table. This is the information about the number of days that the temperature was in a particular range in Chanhassen, Minnesota, in 2009. For instance, there were eight days that had a temperature of between zero and nine degrees Fahrenheit.
What we see here is there's actually a couple of different ways to find not exactly what the median temperature is, but we can figure out which bin it's in. So we can see that the 183rd day of the year falls in this category here. Now, what importance does that have?
Well, that means that there are 182 days that were as cold or colder than that particular day, and there are 182 days that were at least as warm as that particular day, because the bins are ordered by temperature. The 182 below and 182 above form the two halves, and the median is somewhere here in the 50s. We're not 100% sure where in the 50s it is, but we can be sure that it's in the 50s. Notice, the number 183, when you look at the cumulative frequency, falls between the 156 and the 202. By the time we've finished the 40s, we haven't accounted for half the days in terms of ordered temperatures, but we've accounted for more than half the days by the time we finish the 50s, which means that the median is somewhere in the 50s.
If you look at the relative cumulative frequency column, you can see the same thing. By the time we have finished the 40s, we've only accounted for less than 43% of the data. By the time we finish the 50s, however, we will have accounted for over 55% of the data. So where's the 50th percentile? Where is it where 50% of the data falls at or below that value?
We don't know what the number is, but again, we know it's somewhere in the 50s. The 50% mark is reached somewhere within the 50s. So we're going to call the 50s the median class. Because of the way this data is presented, we don't know exactly what the median is, but we can tell that it's in this group of values, the 50s, and not anywhere else.
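Finding the median class from cumulative frequencies can be sketched as follows. The function name `median_class` and most of the bin counts are assumptions: only the 8 days in the 0s and the cumulative counts of 156 (through the 40s) and 202 (through the 50s) come from the table discussed above. The other frequencies are hypothetical values that add up to 365 days.

```python
def median_class(bins):
    """Return the label of the first bin where the running total reaches half the data.

    `bins` is a list of (label, frequency) pairs in ascending order.
    If the running total hits exactly half at a bin boundary, the two middle
    values straddle two bins; this sketch simply returns the lower bin.
    """
    total = sum(freq for _, freq in bins)
    if total == 0:
        raise ValueError("the table has no observations")
    running = 0
    for label, freq in bins:
        running += freq
        if running * 2 >= total:  # cumulative relative frequency >= 50%
            return label


# Mostly hypothetical frequencies; see the note above for which figures are real.
temperature_bins = [
    ("0-9", 8), ("10-19", 25), ("20-29", 38), ("30-39", 45), ("40-49", 40),
    ("50-59", 46), ("60-69", 60), ("70-79", 70), ("80-89", 30), ("90-99", 3),
]
print(median_class(temperature_bins))  # 50-59
```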
So to recap, the median identifies the middle number in a set of ordered data. If there's an even number of data values, we're going to take the mean of those two middle numbers. And if the data are in a frequency table, you can find the median class, but you can't find the median directly. So we talked about medians and median class. Good luck, and we'll see you next time.
Source: SOURCE: BIMODAL DISTRIBUTION GRAPH, CREATIVE COMMONS: HTTP://EN.WIKIPEDIA.ORG/WIKI/FILE:BIMODALANTS.PNG OTHER CHARTS AND GRAPHS CREATED BY THE AUTHOR
In this tutorial, you're going to learn how to interpret and calculate the mode of a data set. So suppose that we have the heights of the Chicago Bulls basketball team listed in this list right here. I want to know what height would be considered typical for this team. There's a couple of different ways we can go about it.
We could find the mean, which would be to add all these numbers up and divide by 15. Or we could order them least to greatest and find the number that's in the middle. That would be the median. Another way to do it would be to look at the values that appear most frequently. The mode is the most frequently occurring value in a data set. In a quantitative data set like we have here, it's the most frequently occurring number or numbers, assuming that they do, in fact, occur more than once.
So 81 occurs here, here, here, and also here, but that's not the only number that occurs four times. In closer examination, we can see that 81 occurs four times and 79 occurs four times. Unlike with means and medians, a distribution can have more than one mode. In this case, 79 and 81 are both modes.
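Counting how often each value appears can be sketched in Python with `collections.Counter`. The function name `modes` and the list of 15 heights are assumptions; the heights are hypothetical, made up so that 79 and 81 each appear four times as in the example.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the most appearances, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest < 2:
        return []  # nothing appears more than once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical heights in inches: 79 and 81 each appear four times.
heights = [79, 81, 75, 81, 79, 84, 81, 79, 77, 81, 79, 73, 83, 76, 77]
print(modes(heights))    # [79, 81]
print(modes([1, 2, 3]))  # []
```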
Let's do a practice problem. A class has 12 students, and the grades from a 10-point quiz are listed here. Determine the mode. It shouldn't take you long to realize that the mode is 9. It appears four times, and nothing else appears more than three times. 7 appears three times, but that's not any more than 9.
We can also have the mode of a qualitative data set. In a qualitative data set, the mode is the most frequently occurring category or the largest category. It looks like, in this example, that there are several large categories here and here. But the largest category is biology, so biology would be the mode of this data set.
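The same counting idea works for categories. In this sketch the majors and their counts are hypothetical; only the fact that biology is the largest category comes from the example.

```python
from collections import Counter

# Hypothetical survey of majors; the counts are made up for illustration.
majors = ["biology"] * 6 + ["chemistry"] * 5 + ["math"] * 5 + ["physics"] * 2

tally = Counter(majors)
largest_category, size = tally.most_common(1)[0]  # the most frequent category
print(largest_category, size)  # biology 6
```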
And finally, one last thing to remember is that, in a distribution, when you have it actually graphed out, you might have something that's multi-peaked. So there's a gap in between here, not a full-on gap, but it does decrease very precipitously after this and rises again over here. In a distribution, we would call both of these areas modes.
This value over here at 5 and again over here near 8 would be considered modes because they're the different peaks in the distribution. Strictly speaking, there's only the one mode here at 8, because there's only one highest bar, but we might still call this distribution bimodal.
So to recap, the mode is the most common value in a data set. By value, we might mean category if it's qualitative or number if it's quantitative. There can be no mode if nothing appears more than one time, exactly one mode like we had in the quizzes example, or many modes like on the Chicago Bulls basketball team.
If no value appears more than once, again, there is no mode. And if several values are tied for the most appearances, each appearing more than once, then they are all considered modes. So you can have more than one. And modes may also refer to the peak or peaks of a distribution even if they're not the tallest point in the distribution. If a distribution has two peaks, we might call it bimodal, and if it has more, multimodal. Good luck and we'll see you next time.