Source: Graphs and tables created by Jonathan Osters
In this tutorial, we're going to explore positive correlation and negative correlation.
So let's take a look. Correlation is going to allow us to observe the strength and direction of a linear association between two quantitative variables. It's a number between negative 1 and positive 1, and any correlation coefficient between negative 0.5 and positive 0.5 is considered a weak association between the two quantitative variables.
Anything with an absolute value between 0.5 and 0.8, so that's positive 0.5 to positive 0.8, or negative 0.5 to negative 0.8, is considered a moderately strong correlation.
And a very strong correlation would be anything nearer to 1 in absolute value. So positive 0.8 to positive 1, or negative 0.8 to negative 1. A positive correlation is going to be a tendency of the response variable to increase as the explanatory variable increases. And a negative correlation is going to be the tendency of the response variable to decrease as the explanatory variable increases.
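To make those cutoffs concrete, here is a minimal sketch in Python that computes the correlation coefficient and labels it with the bands just described. The function names, the small data set, and the choice to put exactly 0.5 in the moderate band and exactly 0.8 in the strong band are assumptions made for illustration; the tutorial doesn't say which side the boundary values fall on.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_correlation(r):
    """Label r using the tutorial's bands (boundary handling is an assumption)."""
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"

# Hypothetical data: explanatory variable (hours) and response (scores).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 68, 71]

r = pearson_r(hours, scores)
print(round(r, 2), describe_correlation(r))
```

Because the scores tend to rise as the hours rise, r comes out positive here, and it lands in the strong band.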
So let's look visually at what this might look like. This is a correlation coefficient r of negative 0.99, which means it's almost a perfectly straight line. And you can see that as the explanatory variable on the x-axis increases, the response variable on the y-axis has a tendency to fall.
So here's a negative trend. You can see that it's negative, but it's not terribly strong, so this is an r of negative 0.5.
This is a relatively zero correlation, meaning r is close to zero. With a correlation near zero, the points will appear to be a cloud. There's no discernible association between the explanatory and the response. You may also see it said that the correlation is zero when all the response values are the same, so that the points line up in a straight horizontal line. Strictly speaking, the coefficient is undefined in that case, because the response has no variation, but either way there's no linear trend up or down.
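One caution on the horizontal-line case: in the usual formula for r, a response with no spread makes the calculation zero divided by zero. The sketch below shows both situations side by side. The function name, the tiny data sets, and the choice to return None when r can't be computed are all assumptions made for illustration.

```python
import math

def pearson_r_or_none(xs, ys):
    """Return r, or None when either variable has no spread (r is undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # the formula would divide zero by zero
    return sxy / math.sqrt(sxx * syy)

# Four corners of a square: no upward or downward tendency at all.
square_x = [1, 1, 3, 3]
square_y = [2, 4, 2, 4]
r_cloud = pearson_r_or_none(square_x, square_y)

# Every response value identical: a straight horizontal line.
flat_x = [1, 2, 3, 4, 5]
flat_y = [7, 7, 7, 7, 7]
r_flat = pearson_r_or_none(flat_x, flat_y)

print(r_cloud, r_flat)
```

The square gives an r of exactly zero, while the flat line gives no value at all. In practice many people still report the flat case as zero correlation, since there's no linear trend to measure.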
Here's a moderately strong positive association, with an r of 0.7. And here's a strong positive association, with a correlation coefficient of 0.9. You'll notice that there's a huge difference in strength between 0.99 and 0.9, but this is still a strong positive correlation that we're seeing. And here's a weak positive association. You can sort of see the points rising as you go to the right, but it's not very strong.
Now, one thing that's worth noting is that numbers like correlation very rarely tell the entire story. So take a look at these two tables. I've calculated the correlation coefficient for each of them, and it's 0.82 in both cases. Based on that, you might think that they look similar when they're graphed.
And the first one looks like this. So you can see it's a fairly strong positive association. Just like you would expect. But what about the other one? The other one looks like this. It's a strong association, but it's not linear. This follows a non-linear form.
So x and y have a nonlinear relationship here, and a line isn't going to model this accurately at all. So even though the two data sets have the same correlation coefficient, a line is a correct model for one of them and not for the other.
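A standard textbook example of this situation is the first two data sets of Anscombe's quartet, and the sketch below uses them as a stand-in. Whether the tutorial's own tables are the same data is an assumption, since the tables aren't reproduced in this transcript. The function and variable names are also just for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet, sets I and II (stand-in data, see the note above).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)

# Both round to 0.82, yet only the first set looks like a line when plotted.
print(round(r_linear, 2), round(r_curved, 2))
```

The two coefficients agree to two decimal places, so the number alone can't tell you that the second set bends over in an arc. Only a scatter plot shows that.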
So real quick one more example. Consider this data set. And if you see that the correlation is a number that is very, very low, near zero, you might assume, hey, there's no relationship between x and y in this case. And you would be wrong. Because if you graph the data you can see there's a clear non-linear trend in the data set.
The correlation coefficient only measures the strength of a linear relationship between x and y. Which is why it's important to always, always graph your data.
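Here is a small sketch of how that can happen, using made-up data where y is exactly x squared. The data set and names are assumptions chosen for illustration, not the data shown in the tutorial.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the pattern is a U shape, not a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

The falling left half of the U cancels the rising right half, so r comes out as zero even though knowing x tells you y exactly.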
And so to recap: correlation is a way to quantify the strength and the direction of a linear association between two quantitative variables, the kind you'd show on a scatter plot. A strong linear association will have a correlation coefficient near positive 1 or negative 1. And there are also moderate correlation coefficients and weak correlation coefficients. Weak linear associations will have a correlation coefficient near zero.
Also, a set of data might have low correlation but a strong nonlinear association. Always plot your data, and then you'll see the association firsthand. So we talked about positive correlation versus negative correlation. We also talked about relatively zero correlation, and about nonlinear relationships, where correlation doesn't really tell you the story. Good luck, and we'll see you next time.