This tutorial is going to teach you how to find the least-squares line of a data set.
Look at airfare prices for certain destinations from Minneapolis/St. Paul Airport.
Boston is 1,266 miles from St. Paul and has an airfare of $263, and so on for the rest of the destinations.
The scatter plot looks like this.
The mean of the miles in this data set is 882.4 and the average airfare is $234 per ticket. You can also find the standard deviation for the miles and for the airfare.
There are two key facts that are needed as we go through and find the equation of the least-squares line:
1. The point x-bar, y-bar is a point on the line. X-bar is the mean of the explanatory variable, the number of miles. And y-bar is the mean of the response variable, the airfare.
X-bar
The average x value for a sample
Y-bar
The average y value for a sample
Least-Squares Line
A best-fit line that is found through a process of minimizing the sum of the squared residuals
2. The slope of the line is equal to the correlation times the standard deviation of the response variable, divided by the standard deviation of the explanatory variable. The standard deviation of the response variable is 68, because airfare is the response variable. The standard deviation of the explanatory variable, miles, is 393.
The only thing missing is the correlation coefficient. The correlation coefficient is easy enough to find. It's 0.794.
Once you have these 3 values, all you have to do is plug them in.
The slope is going to be 0.794 times 68 over 393. The result of that is 0.137. So what is that 0.137? That's the change in y, airfare in dollars, for a change of one mile. It's about 13.7 cents per mile.
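The slope arithmetic can be checked with a few lines of Python. This is only a sketch using the three values quoted above; the variable names are chosen here for illustration.

```python
# Values quoted in the tutorial
r = 0.794         # correlation between miles and airfare
sd_airfare = 68   # standard deviation of the response variable (dollars)
sd_miles = 393    # standard deviation of the explanatory variable (miles)

# Slope of the least-squares line: correlation times s_y over s_x
b1 = r * sd_airfare / sd_miles

print(round(b1, 3))  # 0.137
```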
Once the line is graphed, it does appear to go right through the pack of points, like it's supposed to. You now have the slope, and a point on the line was known to begin with. That's all the information needed to algebraically determine the equation of the line.
The equation of the line is airfare hat equals b0 plus b1 times miles.
We just found that b1, the slope, is $0.137 per mile. We need to find b0, the only other constant. We don't know it yet, but we do know one pair of values for miles and airfare hat: the point x-bar, y-bar, the average number of miles and the average airfare, is on the line.
You know the airfare is predicted to be 234 when the miles is 882.4. Substitute those numbers in temporarily for miles and airfare hat, solve for what is left, and get b0 equals 113.11. Put that all together.
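Written out as a worked equation, the substitution looks like this. The intermediate product is rounded to the cent, and the symbols b_0 and b_1 stand for the "b0" and "b1" used in the text.

```latex
\begin{align*}
\widehat{\text{airfare}} &= b_0 + b_1 \cdot \text{miles} \\
234 &= b_0 + 0.137\,(882.4) \\
234 &\approx b_0 + 120.89 \\
b_0 &\approx 234 - 120.89 = 113.11
\end{align*}
```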
You know the slope and the y-intercept.
Airfare hat equals 113.11 plus 0.137 times the number of miles traveled.
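As a quick illustration of how the finished equation is used, the sketch below predicts the airfare for Boston from its distance and compares it with the actual fare quoted earlier. The function name `predicted_airfare` is made up for this example.

```python
B0 = 113.11   # y-intercept, in dollars
B1 = 0.137    # slope, in dollars per mile

def predicted_airfare(miles):
    """Airfare hat = b0 + b1 * miles."""
    return B0 + B1 * miles

boston_miles = 1266    # distance quoted in the tutorial
boston_actual = 263    # actual airfare quoted in the tutorial

boston_predicted = predicted_airfare(boston_miles)
residual = boston_actual - boston_predicted  # actual minus predicted

print(round(boston_predicted, 2))  # 286.55
print(round(residual, 2))          # -23.55
```

The negative residual says the actual Boston fare sits about $23.55 below the line's prediction.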
You can also use a spreadsheet.
In Excel, the easiest way to do this is to highlight your data and create a chart that is a scatter plot. When you do this, you have to actually right-click or control-click on the data points themselves so they get highlighted. Click Add Trendline. Under Options, click Display Equation. Essentially it's the same idea. So don't get too frustrated, because technology can rescue you here. Especially for larger data sets, finding this by hand can be a pain.
Calculation of the least-squares line involves two key facts. First, the point x-bar, y-bar (the mean of the explanatory variable, the mean of the response variable) is a point on the line. Second, the slope can be calculated from the correlation and the two standard deviations.
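Both facts can be checked numerically. The sketch below uses a small made-up data set (these five pairs are invented for illustration and are not the tutorial's airfare data). It builds the line from the correlation and standard deviations, then compares the slope with the one obtained directly from the least-squares formula.

```python
import math
from statistics import mean, stdev

# Hypothetical data, invented for this example only
miles = [200, 500, 800, 1100, 1400]
fares = [130, 180, 220, 250, 310]

n = len(miles)
x_bar, y_bar = mean(miles), mean(fares)
s_x, s_y = stdev(miles), stdev(fares)   # sample standard deviations

# Sum of products of deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, fares))

# Sample correlation coefficient
r = s_xy / ((n - 1) * s_x * s_y)

# Fact 2: slope = r * s_y / s_x
b1 = r * s_y / s_x
# Fact 1: the line passes through (x_bar, y_bar), which fixes the intercept
b0 = y_bar - b1 * x_bar

# Slope straight from the least-squares formula, for comparison
b1_direct = s_xy / sum((x - x_bar) ** 2 for x in miles)

print(math.isclose(b1, b1_direct))           # True
print(math.isclose(b0 + b1 * x_bar, y_bar))  # True
```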
You learned about the least-squares line and how to calculate it. And you used the means and standard deviations of both variables, plus the correlation, in order to find it.
Good luck.
Source: This work adapted from Sophia Author Jonathan Osters.
Y-bar
The mean of the response variable.
X-bar
The mean of the explanatory variable.
Least-Squares Line
The line of best fit, according to the method of least squares.