Finding the Least-squares Line
Common Core: S.ID.6a S.ID.6c


This lesson will demonstrate finding the least-squares line.



What's Covered

This tutorial is going to teach you how to find the least-squares line of a data set. This tutorial will specifically focus on:

  1. Finding the Least-Squares Line


Look at airfare prices for certain destinations from Minneapolis/St. Paul Airport.

For example, Boston is 1,266 miles from St. Paul, and the airfare is $263. The other destinations are recorded the same way.

The scatter plot looks like this.

The mean of the miles in this data set is 882.4, and the mean airfare is $234 per ticket. You can also find the standard deviation for the miles and for the airfare.

There are two key facts that are needed as we go through and find the equation of the least-squares line:

1. The point x-bar, y-bar is a point on the line. X-bar is the mean of the explanatory variable, the number of miles. And y-bar is the mean of the response variable, the airfare.

Terms to Know


x-bar

The average x value for a sample

y-bar

The average y value for a sample

Least-Squares Line

A best-fit line that is found through a process of minimizing the sum of the squared residuals

2. The slope of the line is equal to the correlation times the standard deviation of the response variable, divided by the standard deviation of the explanatory variable. Airfare is the response variable, and its standard deviation is 68. Miles is the explanatory variable, and its standard deviation is 393.

The only thing missing is the correlation coefficient. For this data set, it is 0.794.

Once you have these three values, all you have to do is plug them in.


The slope is 0.794 times 68 over 393, which is about 0.137. So what is that 0.137? It is the change in y, airfare in dollars, for each additional mile traveled. That's about 13.7 cents per mile.
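As a quick check of this arithmetic, here is a minimal Python sketch. It uses only the three summary values quoted above; the variable names are illustrative and not part of the original lesson.

```python
# Summary statistics quoted in the tutorial.
r = 0.794          # correlation between miles and airfare
s_airfare = 68.0   # standard deviation of the response variable (dollars)
s_miles = 393.0    # standard deviation of the explanatory variable (miles)

# Slope of the least-squares line: correlation times s_y over s_x.
b1 = r * s_airfare / s_miles

print(f"slope: {b1:.3f} dollars per mile")
print(f"about {b1 * 100:.1f} cents per mile")
```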

Once the line is graphed, it does appear to go right through the pack of points, like it's supposed to. You now know the slope, and a point on the line was known to begin with. That's all the information needed to algebraically determine the equation of the line.

The equation of the line has the form airfare hat equals b0 plus b1 times miles.


We just found that b1, the slope, is $0.137 per mile. We need to find b0, the only other constant. We don't know it yet, but we do know one pair of values for miles and airfare hat. The point x-bar, y-bar, the average number of miles and the average airfare, is on the line.


You know the airfare is predicted to be 234 when the miles value is 882.4. Substitute those numbers in for miles and airfare hat: 234 equals b0 plus 0.137 times 882.4. Solving for b0 gives 113.11.

You now know both the slope and the y-intercept, so put it all together.

Airfare hat equals 113.11 plus 0.137 times the number of miles traveled.
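The intercept step and the finished equation can be sketched in Python as follows. The means and the rounded slope come from the tutorial; the function name `predict_airfare` is a hypothetical helper introduced only for illustration.

```python
# Values taken from the tutorial.
mean_miles = 882.4    # x-bar, mean of the explanatory variable
mean_airfare = 234.0  # y-bar, mean of the response variable
b1 = 0.137            # slope in dollars per mile, rounded as in the text

# The line passes through (x-bar, y-bar), so y-bar = b0 + b1 * x-bar.
# Rearranging gives the intercept.
b0 = mean_airfare - b1 * mean_miles


def predict_airfare(miles: float) -> float:
    """Predicted airfare in dollars for a given distance (hypothetical helper)."""
    return b0 + b1 * miles


print(f"intercept b0: {b0:.2f}")
print(f"prediction at the mean distance: {predict_airfare(mean_miles):.2f}")
print(f"prediction for Boston (1,266 miles): {predict_airfare(1266):.2f}")
```

The prediction at the mean distance reproduces the mean airfare, which is exactly the first key fact. The Boston prediction will not match the observed $263 exactly, since the line summarizes the trend rather than passing through every point.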

You can also use a spreadsheet.

Try It

In Excel, the easiest way to do this is to highlight your data and create a scatter plot chart. Then right-click or control-click on the data points themselves so they get highlighted. Click Add Trendline. Under Options, click Display Equation. The equation it displays is the same least-squares line. So don't get too frustrated, because technology can rescue you here. Especially for larger data sets, finding this by hand can be a pain.
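If you would rather script it than use a spreadsheet, the same calculation can be sketched in Python starting from raw data. This is one possible approach, not the lesson's own method. The function names are illustrative, and the toy numbers are made up for demonstration. They are not the airfare table.

```python
from math import sqrt
from statistics import mean, stdev


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (b0, b1) using the two key facts from this tutorial."""
    b1 = correlation(x, y) * stdev(y) / stdev(x)  # slope = r * s_y / s_x
    b0 = mean(y) - b1 * mean(x)                   # line passes through (x-bar, y-bar)
    return b0, b1


# Made-up toy numbers for illustration only; NOT the tutorial's airfare data.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(toy_x, toy_y)
print(f"y-hat = {b0:.2f} + {b1:.2f} * x")
```

To use it on the airfare example, you would pass the list of miles as `x` and the list of airfares as `y`.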


Calculation of the least-squares line involves two key facts. First, the point x-bar, y-bar (the mean of the explanatory variable and the mean of the response variable) is a point on the line. Second, the slope can be calculated from the correlation and the two standard deviations.

You learned about the least-squares line and how to calculate it, using the two means, the two standard deviations, and the correlation.
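In symbols, the two facts can be written as below. The notation r for the correlation and s_x, s_y for the standard deviations is an assumption added here for compactness; b0 and b1 are the intercept and slope used earlier in the lesson.

```latex
\bar{y} = b_0 + b_1\,\bar{x}
\qquad\text{and}\qquad
b_1 = r \cdot \frac{s_y}{s_x}
\quad\Longrightarrow\quad
b_0 = \bar{y} - b_1\,\bar{x}
```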

Good luck.

Source: This work adapted from Sophia Author Jonathan Osters.

  • Least-Squares Regression Line

    The line of best fit, according to the method of Least-Squares.

  • x-bar

    The mean of the explanatory variable.

  • y-bar

    The mean of the response variable.