Source: All graphs created by Dan Laub; Images of tree, PD, http://bit.ly/1Yrp8jQ; watering can, PD, http://bit.ly/1OiLhfy; target, PD, http://bit.ly/1OGdEyC; SUV, PD, http://bit.ly/1PgcrAO; exam, PD, http://bit.ly/1Ja9ux7; calculator, PD, http://bit.ly/1O3yBHn; football, PD, http://bit.ly/1NHblvi; tape measure, PD, http://bit.ly/1UA84CF; men/women sign, PD, http://bit.ly/1mw6PsR; ribbon, PD, http://bit.ly/1Nye8sH; compass, PD, http://bit.ly/1MrhoCl; ice cream (1), PD, http://bit.ly/1TZLHFQ; ice cream (2), PD, http://bit.ly/1TZLUsx
[MUSIC PLAYING] [SCRIBBLING SOUND]
[MUSIC PLAYING]
Dan Laub here. And in this lesson, we're going to discuss determining the center of data. And before we get started with that, let's talk about a few of the objectives.
The first objective is to understand how the mean, median, and mode provide useful information about data. The second objective is to know how to determine what the mean, median, and mode are for a variety of different data types. So let's get started.
So just as a quick refresher, let's talk about the steps involved with the experimental method. The fifth step, the one I want to discuss here, is to analyze the results of the test to determine what they tell you about the cause-and-effect relationship between the two variables. And so let's use a simple example: watering a tree.
So you go and you buy a new tree, and you plant it in the yard. And you water it on a regular basis. So maybe you were thinking that the water's actually causing it to grow, and maybe you have a second tree where you decide not to water it as often to see if there's a difference between the two. Can you give a rough idea of what that step's actually about? Well, we're looking at the results to determine what they tell us about cause-and-effect relationships.
What I want to do in this lesson is discuss what are called "measures of center." When analyzing data, it's important to know what the center of a data set is, as it actually helps us to summarize and organize the data set so that we can convey something meaningful with that data. So there are three measures of center that we're going to look at-- mean, median, and mode. The mode is simply the value that occurs most often, the median is the middle value where half of the values are larger and half are smaller, and the mean is the average value for a set of data. And as we continue to move through this lesson, we'll cover all three measures in much more detail.
So let's talk about the mode. If one is provided a list of values, the mode is the value that occurs most often on that list. In order to find the mode, one must count how often each value occurs on that list. Now, it is possible that two or more different values occur the most often in which case all of them are modes. In the event that no values are repeated, well, then we simply don't have a mode.
So let's look at a simple example here. Let's say we decide to take a survey in a neighborhood, and we want to look at how many cars each household in the neighborhood has. And so we count up. In this case, there are going to be 19 different households. And we simply go to the door and say, "How many cars do you have?" and record the number.
And you will clearly see that the mode in this case is two cars. Nine of the households have two cars whereas two of them don't have any. Four of them only have one car. Three of them actually have three, and then one household has four cars. The mode is clearly the value that stands out the most.
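To make the counting concrete, here is a minimal Python sketch of the mode as described in this lesson. The function name `modes` and the way the survey is rebuilt as a list are assumptions made for illustration; only the household counts come from the example above.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


# Car counts from the survey: 2 households with 0 cars, 4 with 1,
# 9 with 2, 3 with 3, and 1 with 4.
cars_per_household = [0] * 2 + [1] * 4 + [2] * 9 + [3] * 3 + [4] * 1

print(len(cars_per_household))    # 19
print(modes(cars_per_household))  # [2]
```

Returning a list rather than a single number lets the same function handle the two special cases mentioned earlier: several values tied for most frequent, and no repeated values at all.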
Now, let's discuss the median. If we were to be provided with a list of numbers, the median value is found by first arranging the numbers from smallest to largest and then finding the middle number in the list. If we happen to have an even number of values, well, then the median is simply found by adding the two middle numbers in the list and then dividing the sum by 2.
So to give you an example here, I have provided you with eight different numbers. And what we need to do here is simply go through and look at the eight, and so we could simply cross off the first three and cross off the last three, which leaves the two middle numbers. And it turns out, in this particular case, that, well, the median is obviously between 16 and 20.
And so what we're doing in a case like this is we are simply figuring out what the average is between 16 and 20. And to do that, we add the two to come up with 36, divide it by 2. And in this situation, the median is going to be 18.
Now, let's move on to an example where we have an odd number of observations; in this case, we have seven. And so what we simply do is cross off the bottom, cross off the top. Continue to strike those down. We wind up with the middle number of 11. And in this case, since we have an odd number-- three of the numbers are smaller, three are larger-- then our median is going to actually be 11 in this particular case.
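The cross-off procedure for both the even and the odd case can be sketched in a few lines of Python. The two lists below are hypothetical: the transcript does not list the actual numbers shown on screen, so these values were made up only so that the middle values match the ones mentioned (16 and 20, and 11).

```python
def median(values):
    """Sort the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data, NOT the numbers from the lesson's slides.
even_example = [20, 5, 31, 16, 9, 27, 12, 24]  # eight values, middle pair 16 and 20
odd_example = [14, 3, 11, 19, 8, 6, 22]        # seven values, middle value 11

print(median(even_example))  # 18.0
print(median(odd_example))   # 11
```

Sorting inside the function means the input can arrive in any order, which is the situation in the exam-score example that follows.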
Now, let's move on to an example that might be a little more complicated than just simply taking some numbers listed in order. Suppose that we have a classroom where the teacher has, in this case, 15 students. The teacher gave an exam, and the scores were listed as follows. So you clearly see the numbers in the red box here are not in any particular order.
In order to figure out what the median is, we need to sort the numbers so that they start with the smallest and go on to the largest. And when we do that, you'll see the list over here. So of those 15, how do we determine what the median is?
Well, we simply look at crossing the first one off, crossing the last, crossing the second one off, crossing the second-to-last, and so on and so forth, until we wind up with our median, which, in this case, is going to be 81. So the median score is 81-- meaning out of the 15 students that took the exam, seven finished worse than 81%, seven finished better than 81%.
Now, what if the teacher actually forgot that there was a 16th student? And that 16th student happened to get an 85. So we throw that in the stack of papers to grade. And now, we wind up with an even number of students. How do we determine the median in this case?
Well, we simply just reorder the numbers. And now, we have an 85 in there, and it turns out that we have eight observations above the median and eight below. And so what we need to do is figure out what's the average between that eighth and ninth number?
And in this case, the eighth number, as you see here, is 81. The ninth number is 84. And what we need to do is simply take the average of the two. And so if we add them together and divide by 2, we wind up with a median, in this case, of 82.5%-- meaning half the class did better than 82.5%, half the class did worse.
And now, let's discuss the third measure of center and that would be the mean. The mean has the same meaning as average-- meaning that, given a list of numbers, the mean is found by adding up all the numbers and then dividing the sum by how many numbers there actually are.
The mean could be determined using a formula as follows. We add up all the x's-- so the sum of x-- divided by n, which is the number of observations or the number of numbers in the list. And this is how we wind up with the mean.
Now, if you wanted to determine the mean on a calculator, what you'd need to do first is add up all the numbers together. Record that number, and then divide by how many numbers there were in total. If the mean comes from a sample, well, it's going to be denoted by a symbol called "x bar," which is nothing more than an x with a straight line right above it. If the mean comes from a population, however, we use a Greek letter, mu. In general, Greek letters are used as symbols for quantities that come from a population whereas Latin letters are generally used for quantities that come from a sample.
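Written out in symbols, the spoken formula looks roughly like this. The capital N for the size of a population is an assumption taken from common textbook notation; the lesson itself only uses n.

```latex
\bar{x} = \frac{\sum x}{n}
\qquad\text{and}\qquad
\mu = \frac{\sum x}{N}
```

The arithmetic is the same in both cases; only the symbol changes depending on whether the numbers come from a sample or from the whole population.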
So let's work through an example here about how we'd actually figure out the mean in a particular instance. And so in a case like this, let's say we have a football team and we want to figure out the average height-- or the mean height-- of players on the football team. And so we have a list here of all the football players in terms of their height in inches. So 72 inches would be 6 foot, 73 would be 6 foot 1, and so on and so forth. And we have here 11 different football players.
How would we figure out what that mean's going to be? Simple. We add up all the numbers. And if we do that, we simply work down the list and go 76 plus 77 plus 74, and so on until we get all 11.
Well, it turns out, in this case, the sum of x is 823. And then we would take that, divide it by the 11 observations, and wind up with a mean height of approximately 74.82 inches. So the average player on this team is just under 6 foot 3 inches tall.
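The same arithmetic can be checked with a short Python sketch. Only the total (823) and the count (11) are taken from the example, because the transcript does not list all 11 heights; the helper name `mean` is an assumption for illustration.

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


# Figures stated in the lesson: the sum of the heights and the number of players.
total_height_inches = 823
number_of_players = 11

mean_height = total_height_inches / number_of_players
print(round(mean_height, 2))  # 74.82

# Convert to feet and inches to confirm "just under 6 foot 3".
feet, inches = divmod(mean_height, 12)
print(int(feet), round(inches, 2))  # 6 2.82
```

With the full list of heights in hand, calling `mean` on that list would give the same result as dividing the total by the count.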
Now, if this was simply a sample of a larger football team-- let's say, for instance, they had 45 players and we only decided to take a sample of 11-- this would be denoted by x bar. In other words, x bar would be equal to 74.82. If this was the entire team and this was the population mean, then we would denote that by mu.
The last thing I want to cover in this lesson is-- what happens when we're dealing with different types of data? So measures of center will provide us with important information about a data set since they do tell us where the middle is located or what a typical value in a data set looks like. However, with nominal data, such as gender, the mode is the correct measure of center because, from nominal data, the only information we can use is how often each value occurs.
With regard to ordinal data-- such as first, second, third, and so on-- the median would be the correct measure of center. This is because ordinal data can be put in order, and so one can find the middle value by arranging the data from smallest to largest. However, with ordinal data, one must be careful not to simply select the middle value of the numbers.
For interval or ratio data-- such as the direction on a compass, which would be an example of interval data, or stating that something is twice as large as something else, which would be ratio data-- either the median or the mean can be used as measures of center. The mean could be used because the data consists of numbers while the median could be used because we can arrange the numbers from smallest to largest.
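As a quick reference, the pairing of data types and measures of center described in this lesson can be sketched as a small lookup table. The names `MEASURES_BY_LEVEL` and `suitable_measures` are hypothetical, and the table only restates what the lesson says; it is not a complete statistical rule.

```python
# Measures of center recommended in this lesson for each type of data.
MEASURES_BY_LEVEL = {
    "nominal": ("mode",),             # only how often each value occurs is usable
    "ordinal": ("median",),           # values can be ordered, so a middle exists
    "interval": ("median", "mean"),   # numeric and orderable
    "ratio": ("median", "mean"),      # numeric and orderable
}


def suitable_measures(level):
    """Look up the measures of center the lesson suggests for a data type."""
    key = level.strip().lower()
    if key not in MEASURES_BY_LEVEL:
        raise ValueError(f"unknown type of data: {level!r}")
    return MEASURES_BY_LEVEL[key]


print(suitable_measures("nominal"))  # ('mode',)
print(suitable_measures("Ratio"))    # ('median', 'mean')
```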
And so in summary, let's talk about the objectives. Did we actually cover them? The first objective was to understand how mean, median, and mode provide useful information about data. And we went over all three, how to determine each one of them, and also, how to work through examples to show how actual numbers can provide us with useful information.
And the second was to know how to determine the mean, median, and mode for a variety of different data types. And we worked through four different data types-- nominal, ordinal, interval, and ratio. And so, again, my name is Dan Laub. And hopefully, you got some value from this lesson.
(00:00 - 00:35) Introduction
(00:36 - 01:20) The Experimental Method
(01:21 - 01:50) Measures of Center I
(01:51 - 02:47) Mode
(02:48 - 05:48) Median
(05:49 - 07:57) Mean
(07:58 - 09:04) Measures of Center II
(09:05 - 09:42) Conclusion
Mean: (Sum of x)/n
Median: The middle value of a data set.
Mode: The most common value in a data set.