Predictions from Best-Fit Lines

Author: Katherine Williams

Source: Image of graph created by Katherine Williams; Regression Line; Public Domain http://wikimediafoundation.org/wiki/File:World_record_10.000_m_graph.png

Video Chapters

( 00:00 - 00:35 ) Introduction to Context

( 00:36 - 01:36 ) Example: Calculation of a Predicted y-Value

( 01:37 - 02:04 ) Comparing Predicted and Actual Value

( 02:05 - 03:18 ) Example: Further Calculation of a Predicted y-Value

( 03:19 - 03:43 ) Definition of Extrapolation

( 03:44 - 05:06 ) Cautions about Extrapolation

Video Transcription

This tutorial covers how to make predictions from a line of best fit.

So we have our linear equation for a line of best fit-- in this case, it's y equals 0.5563x plus 60.082-- and we have a set of x values. We've decided what these values are because x is the input, so we can choose what the values are. So I'm curious what the glucose level-- the y value-- would be for someone with an x value-- an age-- of 43, 21, 25, 42, or 57.

Now, how can I turn those x values into y values? I would do that by using our equation. Now, this equation says we're going to take the slope, the 0.5563. We're going to multiply that by x, and then add on the intercept-- add on 60.082. And I'm going to do that for each x value to find out what its corresponding y value is. So for each age, I'm going to find out what the glucose level is.
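The recipe just described (multiply x by the slope, then add the intercept) can be sketched in a few lines of Python. This is only an illustration: the slope and intercept come from the equation above, while the function name `predict_glucose` is a made-up name for this sketch, not something from the tutorial.

```python
# Slope and intercept are taken from the tutorial's equation:
# y = 0.5563x + 60.082 (x = age, y = predicted glucose level).
SLOPE = 0.5563
INTERCEPT = 60.082


def predict_glucose(age):
    """Return the glucose level the best-fit line predicts for an age.

    The function name is hypothetical; it only wraps the equation.
    """
    # Multiply the slope by the x value, then add the intercept.
    return SLOPE * age + INTERCEPT


# Apply the equation to one x value (an age of 43).
prediction_43 = predict_glucose(43)
print(round(prediction_43, 1))
```

The same function can be called once per age to turn every x value into its corresponding y value.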

So first, I would say y equals 0.5563 times 43, and then add on 60.082. When I do this in my calculator, I get a value of about 84. Now, in this case, I have some observed measures. I know, from when I actually went out and found the 43-year-old, what their glucose level was.

And in this particular case, this 43-year-old had a glucose level of 99. So even though I'm predicting that the glucose level would be 84, it's still possible to find some people with something different-- to find someone whose level is 99. Now, if I wanted to do it for the person who's 21, I would follow the same pattern: y equals 0.5563 times x plus 60.082. This time, the x is 21, so it's 0.5563 times 21 plus 60.082.

And then I would find that the value is about 71.8. Now, let's think about whether or not that makes sense. In the first calculation, we had an x of 43; in this one, an x of 21. So in the first calculation, we are inputting a larger number when we multiply by the slope, so it makes sense that we'd get a higher y value there. Because 43 is larger than 21 and the slope is positive, the prediction for 43 should be larger, because of the way our equation is set up.

Now, we could go through and do predicted values for every subject's age. I am not going to calculate those all out, but I am going to write them up here. So if you want to do the practice, you can check your answers. So for the predicted values, we found 84, 71.8, 74, 83.4, and 91.8.
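For anyone who wants to check the practice answers, here is a minimal Python sketch that applies the equation to all five ages and rounds to one decimal place. It also compares the prediction for age 43 with the one observed value mentioned in the transcript (99); the difference, observed minus predicted, is commonly called a residual, a term the video itself does not use. The variable names are assumptions made for this sketch.

```python
SLOPE = 0.5563
INTERCEPT = 60.082

# The five ages (x values) used in the tutorial.
ages = [43, 21, 25, 42, 57]

# Predicted glucose level for each age, rounded to one decimal place.
predictions = {age: round(SLOPE * age + INTERCEPT, 1) for age in ages}

for age, predicted in predictions.items():
    print(f"age {age}: predicted glucose {predicted}")

# The only observed value quoted in the tutorial is 99 for the 43-year-old.
observed_43 = 99
residual_43 = observed_43 - predictions[43]  # observed minus predicted
print(f"age 43: observed {observed_43}, residual {residual_43:.1f}")
```

The gap of about 15 for the 43-year-old illustrates the point made earlier: the line gives a prediction, and an individual's actual value can differ from it.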

Now, suppose we use our equation to make predictions about values outside the range of our data-- about someone beyond the upper end or the lower end. So in our past example, if I wanted to find out about someone who had an age of 0 or an age of 99, what I would be doing there is called an extrapolation.

And we need to be a little bit careful when we extrapolate. Because if this model here-- this linear equation, this line of best fit-- really only applies to people between, say, 20 and 60, then if I'm extrapolating way outside of that range and doing it for someone who's 0 years old or who is 100 years old, then it might not make sense anymore.
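One way to stay careful is to flag any prediction whose x value falls outside the range the model is meant for. The sketch below assumes the range is ages 20 to 60, which the transcript offers only as a "say" example, and the function name `predict_with_flag` is hypothetical.

```python
SLOPE = 0.5563
INTERCEPT = 60.082

# Assumed range: the transcript only says "between, say, 20 and 60".
MIN_AGE = 20
MAX_AGE = 60


def predict_with_flag(age):
    """Return (prediction, is_extrapolation) for a given age."""
    prediction = SLOPE * age + INTERCEPT
    # An age outside the assumed range means we are extrapolating.
    is_extrapolation = age < MIN_AGE or age > MAX_AGE
    return prediction, is_extrapolation


for age in (43, 0, 100):
    prediction, is_extrapolation = predict_with_flag(age)
    note = "extrapolation, treat with caution" if is_extrapolation else "within range"
    print(f"age {age}: {prediction:.1f} ({note})")
```

The equation still produces a number for ages 0 and 100; the flag is just a reminder that the data may not support those numbers.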

Another example: this graph shows marathon times over time. So as history has gone on, people have gotten faster. If I keep extending these decreases, and I keep extrapolating out to find out how fast someone could run in 2099, I might find that perhaps someone could run a mile in 30 seconds or something like that. Depending on the way the equation was written, I could get a value that's kind of crazy for the data set I'm looking at.

So sometimes, when the data has a leveling-off point, like running times, where it's never going to make sense that we would run a mile in less than 30 seconds or so, you have to be really careful and not do that extrapolation, because your data doesn't support it.

So this has been your tutorial on making predictions from best-fit lines. You can use your linear equation to make predictions within your data range. And when you're doing an extrapolation and thinking outside of the range of data you have, you need to be very careful that your results still make sense.

Terms to Know

Extrapolation: Using the regression line to make predictions outside the window for which the model was intended.