Predictions from Best-Fit Lines

Hi. This tutorial covers predictions from best-fit lines. So let's start with some data. The data below shows the airfare and distance to several destinations from the Minneapolis-St. Paul (MSP) airport. We can see that there are 13 destinations. The distances are measured in miles from the MSP airport, and all the airfare prices are in dollars.

So we can see the scatter plot here, and then there's also a best fit line on the data. We can see that here is the equation for the best fit line y hat equals 0.1412x plus 229.58. So we're going to kind of investigate that equation a little bit today.

So I've reproduced the best-fit line equation right here. This is a least-squares equation, and I've put it in the form b0 plus b1x, so it reads y hat equals 229.58 plus 0.1412x. Let's use the equation to predict the airfare to Philadelphia. Philadelphia was not one of the destinations in the data set, so we're going to use this equation to make a prediction for the airfare there. Philadelphia is 1,172 miles from MSP.

So what we're going to do is take this 1,172 and substitute it in for x. The calculation I end up having to do is 229.58 plus 0.1412 times 1,172.

So let's go to the calculator to come up with the predicted airfare. So 229.58 plus 0.1412 times 1,172. Hit Enter there. And we end up with a predicted airfare of about $395.07. Put units on there: that's in dollars.
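The calculator step can also be written as a short Python sketch. This is only an illustration: the coefficients come from the best-fit equation in the tutorial, while the names `B0`, `B1`, and `predict_airfare` are made up for this example.

```python
# Coefficients from the tutorial's best-fit line: y-hat = 229.58 + 0.1412x
B0 = 229.58   # intercept, in dollars
B1 = 0.1412   # slope, in dollars per mile


def predict_airfare(distance_miles: float) -> float:
    """Return the predicted airfare (dollars) for a distance (miles) from MSP."""
    return B0 + B1 * distance_miles


# Philadelphia is 1,172 miles from MSP.
philadelphia = predict_airfare(1172)
print(f"Predicted airfare to Philadelphia: ${philadelphia:.2f}")
# prints: Predicted airfare to Philadelphia: $395.07
```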

So whenever you make a prediction like that using the line, it's always important to make sure that the prediction is reasonable. And what I'm going to say is, yes, making this prediction is reasonable because 1,172 miles is well within the range of distance data that the regression equation was calculated from.

So if we look at that 1,172 in terms of the distance data, yeah, that's well within our range here. We have some distances just in the hundreds, and then a couple in the thousands. The data runs anywhere from 395 miles, which was the lowest, all the way up to over 3,000. So 1,172 is right in that acceptable range. If you're working within the data, it is reasonable to make a prediction like that.

So let's make another prediction now. Again, I have the same equation. Let's use it to predict the airfare to Beijing, China, which is 6,691 miles from the Minneapolis-St. Paul airport. We're going to do the same thing: substitute the distance in for x. This time, it's 6,691 miles.

Now, to type that in, we have 229.58 plus 0.1412 times 6,691. Hit Enter there. And we end up with a predicted airfare of about $1,174.35. And that's going to be in terms of dollars as units there.

So what I'm going to suggest now is that making this prediction is unreasonable because 6,691 miles was well outside the range of distance data that the regression equation was calculated from. If we're doing that, if we're working now outside of that data range-- so again, 6,000 is well outside the distance data that we collected-- because we're doing that, we're doing something called extrapolation.

So this is called extrapolation. And then just to provide a definition of extrapolation, it's the act of using a regression equation to make predictions outside the range of data used to find the line.
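As a rough sketch, the extrapolation check can be automated by comparing the input distance against the range of the data. The lower bound of 395 miles is the value mentioned in the tutorial; the upper bound of 3,000 is an assumption standing in for "over 3,000", since the exact maximum is not given. The name `predict_with_check` is hypothetical.

```python
# Coefficients from the tutorial's best-fit line: y-hat = 229.58 + 0.1412x
B0, B1 = 229.58, 0.1412

MIN_MILES = 395    # lowest distance in the data, per the tutorial
MAX_MILES = 3000   # ASSUMPTION: the tutorial only says "over 3,000"


def predict_with_check(distance_miles, lo=MIN_MILES, hi=MAX_MILES):
    """Return (predicted fare, True if the prediction is an extrapolation)."""
    fare = B0 + B1 * distance_miles
    extrapolating = not (lo <= distance_miles <= hi)
    return fare, extrapolating


for city, miles in [("Philadelphia", 1172), ("Beijing", 6691)]:
    fare, extrapolating = predict_with_check(miles)
    note = "extrapolation, treat with caution" if extrapolating else "within data range"
    print(f"{city}: ${fare:,.2f} ({note})")
```

The line itself will happily return a number for any distance; the flag is what tells you whether that number is backed by the data the line was fit to.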

So results gathered through extrapolation should be interpreted with great caution or avoided altogether. I would say there are a lot of different things that contribute to the airfare besides distance, especially when you're traveling outside of the United States.

So anytime you're going well beyond 3,000 miles in terms of the distance to a destination, it's probably not a good idea to use the equation that we came up with earlier. The line was calculated from distances only between about 400 and 3,000 miles, so when you're looking at something like 6,600 miles away, that's well outside the data that the line was built from.

We want to make sure that we're being really careful if we're doing any extrapolation. And if possible, we want to just try to avoid it altogether. So this has been the tutorial on predictions from best fit lines. Thanks for watching.