Source: All graphs created by Dan Laub. Image of monitor, PD, http://bit.ly/1lN2Urn; Image of woman, PD, http://bit.ly/1I9lLH4; Image of graduate, PD, http://bit.ly/1NTA7Mi; Image of student, PD, http://bit.ly/1NvFJbY; Image of tape measure, PD, http://bit.ly/1UA84CF; Image of monitor, PD, http://bit.ly/1P7kJLe; Image of money bag, PD, http://bit.ly/1RSaTQ3; Image of golf ball, PD, http://bit.ly/1JceXmR; Image of fuel pump, PD, http://bit.ly/1O8GL30; Image of scale, PD, http://bit.ly/1ISw0Q8; Image of ice cream, PD, http://bit.ly/1YkTbEu
[MUSIC PLAYING] Hi. Dan Laub here. In this lesson, I want to discuss using data to identify a relationship between variables. And before we do so, let's discuss the objectives for this particular lesson.
The first objective is to be able to match a scatterplot with a value of a correlation coefficient. The second objective is to identify the direction and strength of the relationship of variables by looking at a scatterplot. So let's get started.
Sometimes, when researchers are presented with data, they are interested in whether the data shows a cause and effect relationship between two variables. Often, when representing how two variables may be related to one another, a scatterplot is used. Now, recall that scatterplots are used for interval or ratio variables. And if you remember, interval variables are variables measured on a numerical scale, so that the difference between any two values can be measured and is always determined the same way. Ratio variables, on the other hand, are interval variables with one difference: a value of zero means that the thing being measured does not exist.
So for each observation, two numbers are recorded. The first number is for the first variable and the second number is for the second variable, as you can see in the scatterplot pictured here. Related variables can be found in every aspect of nature, such as the amount of exercise a person engages in and their resting heart rate.
In the event that two variables are related, they are said to be correlated. The two simplest ways that two variables can be correlated are that, as one of them increases, the other increases, or that, as one of them increases, the other decreases. If the second variable increases when the first one increases, the scatterplot shows an upward trend-- meaning that the points in the scatterplot rise from left to right. An upward trend like this is referred to as a "positive association between variables."
So as an example, let's look at the level of education that a person has and their income. Intuitively, we would expect this to be a positive association. And as you can see, based upon the scatterplot and the trend line on this graph in front of you, it is.
Now, on the other hand, if the second variable decreases as the first one increases, this can be seen in the scatterplot as a downward trend, where the points of the scatterplot fall from left to right. This downward trend is called a "negative association between the variables," and we see that here in our second example: the number of absences a student has and their grade point average.
And you can clearly see, as we would expect, that's a negative relationship. Meaning what? Meaning the less often a student is in class or the more absences they have, the more likely their grades are to suffer.
Generally speaking, it is important to identify trends in data because they can help us establish a relationship between variables, especially when looking at them visually using a scatterplot. For example, if we were to look at a scatterplot that showed a person's age and their annual health care expenditures, we would likely get a sense that variables such as these have a positive association. Meaning what? Well, older people would typically have more health issues. And as a result, they will probably have greater health care expenditures.
By recognizing trends in data, we might also be able to better predict what could happen in a specific scenario. For example, if an independent variable increases and we know that the independent and dependent variables are negatively associated, we could predict that the dependent variable would decrease.
So as an example here, let's look at the height of girls based upon their age. So suppose we took a sample and we had 40 girls, and we were interested in determining what their height was relative to their age. Now, intuitively, we would expect that to be a positive association.
And as you can see in the scatterplot pictured here, clearly, there's an upward trend. Each data point you see here represents two different values-- one is their age in years and the second one is their height in inches. And you notice that the upward trend here follows this particular line. It's a relatively strong trend.
As another example, let's find something that might have a negative association. And let's say we're looking at the number of hours of television that a student watches per week and their grade point average here. And notice that there's a downward trend, as we might expect.
And each data point here represents two measurements-- one is the hours of television they watch; the other one is their grade point average. And you can clearly see what this means: the more television a student watches, the less they're probably studying. As a result, their grades tend to go down, and the data tends to follow the line you see illustrated right here.
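The lesson doesn't say how the line on these graphs is drawn. As a minimal sketch, assuming the line is an ordinary least-squares fit (one common choice, not necessarily the one used for these graphs), here is how a trend line could be fitted and used for a rough prediction. The function name `fit_trend_line` and the television and GPA numbers below are made up for illustration.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample: weekly hours of television and grade point average.
tv_hours = [0, 5, 10, 15, 20]
gpas = [3.8, 3.4, 3.1, 2.7, 2.5]

slope, intercept = fit_trend_line(tv_hours, gpas)

# A negative slope means the line falls from left to right (downward trend).
trend = "downward" if slope < 0 else "upward" if slope > 0 else "flat"

# Rough prediction for a student who watches 12 hours per week.
predicted_gpa = intercept + slope * 12
print(trend, round(predicted_gpa, 2))
```

The sign of the slope tells you the direction of the trend, which is the same thing you read off the scatterplot when the points fall or rise from left to right.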
A numerical value is used to indicate if there is an upward or downward trend in a scatterplot, and this value is called a "correlation coefficient." A correlation coefficient also indicates how well the data on a scatterplot follows a straight line. The correlation coefficient is denoted by the symbol r and is a number that always lies between negative 1 and 1. If the correlation coefficient is 0, then there is no upward or downward trend in the scatterplot: either the trend line is flat and horizontal, or the data is so scattered that it does not follow any noticeable pattern.
When the correlation coefficient is positive, there is an upward trend in the scatterplot. And when the correlation coefficient is negative, there is a downward trend in the scatterplot. If the correlation coefficient is positive and near 1, there is an upward trend in the scatterplot that closely follows a straight line. If the correlation coefficient is negative and near negative 1, there is a downward trend in the scatterplot that closely follows a straight line as well.
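The lesson doesn't give a formula for r. As a sketch, assuming r here is the usual Pearson correlation coefficient, it could be computed like this. The function name `pearson_r` and both small data sets are made up for illustration and are not the data behind the graphs in this lesson.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the mean drive both the sign and the size of r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with an upward trend: age in years and height in inches.
ages = [2, 4, 6, 8, 10]
heights = [34, 40, 45, 50, 55]
r_up = pearson_r(ages, heights)

# Hypothetical data with a downward trend: absences and grade point average.
absences = [0, 2, 4, 6, 8]
gpas = [3.8, 3.5, 3.1, 2.6, 2.4]
r_down = pearson_r(absences, gpas)

print(round(r_up, 3), round(r_down, 3))
```

With these made-up numbers, the first value comes out positive and close to 1 and the second comes out negative and close to negative 1, because both sets of points sit very close to a straight line.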
The sign of the correlation coefficient-- in other words, whether it is positive or negative-- illustrates the direction of the trend or association between the two variables. The proximity of this correlation coefficient to 1 or negative 1 reveals the strength of the trend or association between these two variables. When the correlation coefficient is positive and close to 1, there is a strong positive association between the two variables.
When the correlation coefficient is negative and nearer to negative 1, there is a strong negative association between the variables. If the correlation coefficient happens to be positive but closer to 0, there would be a weak positive association between the variables. And if the correlation coefficient is negative but closer to 0, there would be a weak negative association between the two variables.
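These rules can be sketched as a small function that turns a value of r into a description. The lesson doesn't give exact cutoffs for "strong" and "weak," so the 0.7 and 0.4 used below are assumptions chosen only for illustration, as is the function name `describe_correlation`.

```python
def describe_correlation(r, strong_cutoff=0.7, weak_cutoff=0.4):
    """Describe direction and strength of an association from its r value.

    The cutoffs are illustrative assumptions, not values from the lesson.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear association"
    # The sign of r gives the direction of the trend.
    direction = "positive" if r > 0 else "negative"
    # The distance of r from 0 gives the strength of the trend.
    size = abs(r)
    if size >= strong_cutoff:
        strength = "strong"
    elif size < weak_cutoff:
        strength = "weak"
    else:
        strength = "moderate"
    return f"{strength} {direction} association"


for value in (0.9, -0.9, 0.2, -0.2):
    print(value, "->", describe_correlation(value))
```

Splitting the description into sign and absolute value mirrors the two questions you ask of a scatterplot: which way does the trend go, and how closely do the points follow a straight line?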
So let's use a few examples here just to illustrate these concepts. The first scatterplot you see here illustrates the relationship between a person's income and how much money they have saved for retirement. And notice that the correlation coefficient is listed here. The correlation coefficient, in this case, is equal to 0.945, which is positive, as we would expect, and very strong. The more money somebody earns, in all likelihood, the more they're able to save for retirement-- meaning there's a very, very strong association between those two variables.
Now, let's look at another scatterplot. This one illustrates something completely different. Say we're dealing with the number of hours per month that a golfer practices their swing and their average score.
And in this case, as you see with the scatterplot, there's not much of a downward association here, although there is a slight one. And the correlation coefficient, in this case, is equal to negative 0.169. Which tells us what?
Well, we would expect that the more somebody practices, the better their golf score is. And a better golf score would be a lower score. However, it's not a very strong association.
The third example I want to show you would be the weight of cars and the miles per gallon that they get. And we would expect this one intuitively to be a very strong negative association. Which means what?
A heavier vehicle would probably get worse gas mileage, which would be a lower miles per gallon figure. And as you see by the scatterplot in front of you, there's a downward sloping line and the correlation coefficient, in this case, is equal to negative 0.845. And since that value is close to negative 1, we can tell that it's a negative association that is relatively strong.
And the fourth example I want to show you would be how much somebody weighs relative to how much ice cream they eat in a month. So let's say that we have a group of people here, and we ask them how many ice cream cones they consume in a month and we also ask them what their weight is. Now, maybe we would expect there to be a positive correlation here-- meaning that, if you eat high-calorie foods like ice cream, you might be more prone to gain weight.
And as you see from the scatterplot here, there is a positive association, but it's a very weak one. As a matter of fact, the correlation coefficient, in this case, is equal to 0.30, which tells us it's relatively weak simply because it's closer to 0 than it would be to 1.
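To compare the four examples side by side, here is a short sketch that ranks them from strongest to weakest using the distance of each correlation coefficient from 0. The r values are the ones quoted in this lesson. The labels and variable names are just for illustration.

```python
# Correlation coefficients quoted in the four examples of this lesson.
examples = {
    "income vs. retirement savings": 0.945,
    "golf practice hours vs. average score": -0.169,
    "car weight vs. miles per gallon": -0.845,
    "ice cream cones vs. body weight": 0.30,
}

# Strength depends on how far r is from 0, so rank by absolute value.
ranked = sorted(examples.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    direction = "upward" if r > 0 else "downward"
    print(f"{name}: r = {r:+.3f}, {direction} trend")

strongest = ranked[0][0]
weakest = ranked[-1][0]
```

Notice that the car weight example ranks second even though its coefficient is negative. The sign only tells you the direction, not the strength.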
And so let's discuss the objectives for this lesson again, just to make sure we cover what we said we were going to. The first was to be able to match a scatterplot with the value of a correlation coefficient, and so we did this. The second was to identify the direction and strength of the relationship between variables by looking at a scatterplot. We could tell whether we have a negative value or a positive value based upon which direction the trend goes, and we could tell whether it's a relatively strong relationship by how closely the points in the scatterplot follow a straight line. And we did so by looking at correlation coefficients.
So again, my name is Dan Laub. And hopefully, you got some value from this lesson.
(0:00 – 0:39) Introduction
(0:40 – 2:07) Scatterplots
(2:08 – 4:49) Scatterplot Examples
(4:50 – 6:36) Correlation Coefficients
(6:37 – 9:01) Correlation Coefficients on Scatterplots
(9:02 – 9:34) Conclusion
Correlation Coefficient: A number that indicates whether there is an upward trend or a downward trend in the scatterplot, and also how well the trend follows a straight line.