In this tutorial, you're going to learn some cautions about using best-fit lines to make predictions. Specifically, you will focus on extrapolation.
The data here are the airfares from Minneapolis-Saint Paul Airport to different destination cities, along with how many miles each city is from Minneapolis-Saint Paul.
You figure miles is the explanatory variable because you think that places that are farther away should cost more to get to, based on gasoline, etc.
The regression equation is: predicted airfare = 113.11 + 0.137 × miles.
It's a pretty straightforward question to ask what the predicted airfare would be for a flight to Atlanta from Minneapolis. The distance is 1,064 miles. Simply put this into the regression equation and you get about $259. The bigger question here, though, is how confident you are in that prediction. How confident are you that the prediction is close to what it actually costs to get to Atlanta?
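As a quick check of that arithmetic, here is a minimal Python sketch that plugs the Atlanta distance into the regression equation. The function name predict_airfare is just an illustrative choice, not something from the original tutorial.

```python
# Coefficients from the regression equation in this tutorial.
INTERCEPT = 113.11   # dollars
SLOPE = 0.137        # dollars per mile


def predict_airfare(miles):
    """Return the predicted airfare in dollars (hypothetical helper name)."""
    return INTERCEPT + SLOPE * miles


atlanta = predict_airfare(1064)
print(f"Predicted airfare to Atlanta: ${atlanta:.2f}")  # about $259
```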
Look back at the data.
Your linear model is based on data that included distances both shorter and longer than the distance to Atlanta. So it seems to make sense to use that model to predict what the cost would be for a place that is 1,064 miles from Minneapolis.
What about the predicted airfare for a flight to Anchorage at a distance of 3,163 miles from Minneapolis?
In this case, the prediction differs greatly from what the actual airfare ends up being. You can't really use the prediction equation to predict the airfare to Anchorage because 3,163 miles is so far outside the bounds of the data that you used to create the model.
The range of miles that was used to create your model was from about 400 miles away from Minneapolis to about 1,300 miles away from Minneapolis.
Charleston was the longest distance away, and Chicago was the shortest distance away. What you're saying is that within the window of 400 to about 1,300, the line gives reasonable predictions for airfare.
Outside of that window, though, it might not. So you have to be very cautious about using this prediction line to predict the airfare to Milwaukee, which is closer than 400 miles to Minneapolis, or to a place like Anchorage. It might not give accurate predictions outside of this particular window.
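One way to build this caution into a calculation is to flag any distance that falls outside the window before trusting the prediction. The sketch below assumes the window is roughly 400 to 1,300 miles, as described above. The names MIN_MILES, MAX_MILES, and predict_with_warning are made up for illustration.

```python
INTERCEPT = 113.11   # dollars, from the regression equation
SLOPE = 0.137        # dollars per mile, from the regression equation

# Approximate range of distances used to build the model (assumed bounds).
MIN_MILES = 400
MAX_MILES = 1300


def predict_with_warning(miles):
    """Return (predicted airfare, True if the prediction is an extrapolation)."""
    predicted = INTERCEPT + SLOPE * miles
    is_extrapolation = miles < MIN_MILES or miles > MAX_MILES
    return predicted, is_extrapolation


for city, miles in [("Atlanta", 1064), ("Anchorage", 3163)]:
    fare, risky = predict_with_warning(miles)
    note = "extrapolation, use with caution" if risky else "inside the data window"
    print(f"{city}: ${fare:.2f} ({note})")
```

Atlanta falls inside the window, while Anchorage gets flagged. The flag doesn't say the prediction is wrong, only that the data used to build the line gives no support for it.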
Making predictions outside that window is called extrapolation. It means using the linear model to make predictions outside the range of values for which the model was intended. It's not always bad to extrapolate. Sometimes linear trends do continue beyond the range of the data that produced them, just not always.
Extrapolation
Using the regression line to make predictions outside the window for which the model was intended.
Proceed with caution if you do end up extrapolating, because it is risky. You're trusting the linear trend to continue outside of the bounds of the data that created the model. Using the linear model to try to predict outside those bounds might be an unwise decision.
Men's Olympic gold medal times in the 100-meter dash have decreased at a rate of about one hundredth of a second per year for the last 60 years. Someone who extrapolates might say: if this trend continues, then in about 1,000 years, there will be a person whose gold medal sprint time is zero seconds.
You know that that's nonsense. You can't use this line to predict what might happen even 100 years down the road, much less 1,000 years down the road. Extrapolation might not be a good idea, especially with this particular data set. Extrapolation can lead to nonsense results.
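To see how the linear trend produces that nonsense, here is a small sketch. The starting time of 10 seconds is an assumed round figure for illustration, not a value from real data. The rate of 0.01 seconds per year comes from the example above.

```python
START_TIME = 10.0      # seconds; assumed round starting value, not real data
RATE_PER_YEAR = 0.01   # seconds of improvement per year, from the example


def predicted_time(years_from_now):
    """Extend the linear trend years_from_now years into the future."""
    return START_TIME - RATE_PER_YEAR * years_from_now


for years in (0, 100, 1000, 2000):
    print(f"{years:>4} years out: {predicted_time(years):.2f} seconds")
```

Under these assumptions the line reaches zero at about 1,000 years, and beyond that it predicts negative times, which are impossible.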
A linear model is a reasonable predictor of response values within the range of explanatory-variable values that created it. In your example, that meant airfares for distances in the 400 to 1,300 mile range. Using the model to predict responses for values outside that range is called extrapolation. It's not always a bad idea, but a lot of the time it's risky. You should be aware of that.
Sometimes it gives you values that don't make any practical sense. Extrapolation is the reason why sometimes the y-intercept of a least-squares line doesn't have a meaningful interpretation. In your example, the y-intercept was about $113, which means that if you go nowhere from Minneapolis, you pay $113, or at least you're predicted to. That makes no sense because zero isn't near the values that created the line. So using zero miles to predict an airfare would be extrapolation. It doesn't have any meaningful interpretation in that particular context.
Good luck.
Source: This work adapted from Sophia Author Jonathan Osters.