Source: Image of graph created by Ryan Backman
Hi. This tutorial covers the coefficient of determination. So let's start by just defining it. So the coefficient of determination, now that is abbreviated as r squared. And remember that r is the correlation coefficient, so the coefficient of determination is the correlation coefficient squared. So you take whatever r is, you square it to get the coefficient of determination.
And what it is, it's a measurement of how much of the variation in the response variable can be explained by the variation in the explanatory variable. And we'll make some sense of that definition in a minute. It's usually measured as a proportion, so it's always a number between 0 and 1.
OK, so let's take a look at a scatter plot. And what this is, is a scatter plot of the arm spans and heights of 25 different people, and they're both measured in centimeters. A lot of times you'll see, when you make a scatter plot, there'll be like a little squiggle here to show that this point here is not actually the origin. This graph actually starts at 150 and 150.
But what this graph also gives us is a value of r and a value of r squared. Since this looks to be a pretty linear association, this r of 0.922 says that there is a strong positive association between arm span and height.
Now to interpret r squared, the 0.849: OK, and really all you need to do is square r to get r squared. But to interpret r squared, we can say that 84.9% of the variation in height can be explained by the linear relationship between arm span and height. So if your linear model fits the data very well, it's generally going to give you a high value of r squared. If your linear model does not fit the data, it will generally give you a lower value of r squared.
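To make the "variation explained" reading concrete, here is a minimal sketch in Python. The arm span and height numbers are made up for illustration (they are not the 25 people from the graph), and the helper names `correlation` and `explained_fraction` are hypothetical. It computes r, squares it, and then separately measures what share of the variation in height the least-squares line accounts for. For a straight-line fit with an intercept, those two numbers come out the same.

```python
from math import sqrt

# Made-up arm span / height pairs in centimeters (illustrative only,
# not the data behind the tutorial's scatter plot).
arm_span = [156.0, 160.0, 165.0, 170.0, 172.0, 178.0, 183.0, 188.0]
height = [160.0, 158.0, 167.0, 169.0, 175.0, 176.0, 185.0, 186.0]


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def explained_fraction(x, y):
    """Share of the variation in y accounted for by the least-squares line."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    # Variation left over after the line has made its predictions
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    # Total variation of y around its own mean
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


r = correlation(arm_span, height)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3),
      round(explained_fraction(arm_span, height), 3))
```

The last two printed values should agree, which is the sense in which r squared is the proportion of variation in the response explained by the linear model.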
So now a couple of things about r squared, or the coefficient of determination. It can be found by squaring the correlation coefficient. Now you can also get the correlation coefficient by taking the square root of r squared. The only issue is that the correlation coefficient can be either positive or negative, and the square root only gives you the positive value, so you'll have to look at the scatter plot to see if it's going to be a positive or negative value of r.
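Going back from r squared to r can be sketched like this. The function name `signed_r` and the two tiny data sets are made up for illustration; the idea is just to let the direction of the trend in the data stand in for "looking at the scatter plot".

```python
from math import sqrt


def signed_r(r_squared, x, y):
    """Recover r from r squared, taking the sign from the direction of the trend."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # This sum has the same sign as the slope of the least-squares line:
    # positive for a rising cloud of points, negative for a falling one.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = -1.0 if sxy < 0 else 1.0
    return sign * sqrt(r_squared)


# Made-up points: one set trends upward, the other downward.
rising = signed_r(0.849, [1, 2, 3], [2, 4, 7])
falling = signed_r(0.849, [1, 2, 3], [7, 4, 2])
print(round(rising, 3), round(falling, 3))
```

With the tutorial's 0.849, the square root comes out near 0.921 rather than exactly 0.922, which is the kind of small gap you get when working from rounded values.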
OK, so r squared: because r is always between negative 1 and 1, and you're squaring it, r squared can only be between 0 and 1. And the larger r squared, the stronger the linear relationship. So this is really why you calculate r squared. If r squared is very high, you know that a linear model is going to fit the data very well.
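One thing to keep in mind is that r squared only speaks to the linear relationship. As a small sketch with made-up numbers, here the response is exactly the square of the explanatory variable, so the points sit perfectly on a curve, and yet r squared for a straight line comes out as 0. The helper name `correlation` is hypothetical.

```python
from math import sqrt


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data: y depends on x perfectly, but along a U-shaped curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_squared = correlation(x, y) ** 2
print(r_squared)
```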
If you have a low value of r squared, maybe a curve is a better fit or maybe there's just not a very good association at all. But the higher that value of r squared, so the closer it is to 1, the stronger the linear relationship. So this has been the tutorial on the Coefficient of Determination. Thanks for watching.