Outliers and Influential Points
Common Core: 8.SP.1

Description:

This lesson will explain outliers and influential points.

Tutorial

What's Covered

This tutorial is going to teach you about outliers and influential points by discussing:

  1. Outliers
  2. Influential Points

1. OUTLIERS

You may have encountered the term "outlier" when working with univariate data. In bivariate data, an outlier means something a little different.

Term to Know

Outlier

A point that deviates substantially from the overall form of the remaining data points.


Example: Let's take a look at these two data sets. One thing you might notice is that the x-values on the left are spread all over the place, whereas on the right every x-value except one is 8, which might be a dead giveaway to something.

But the two data sets have the same mean for the x's (9) and the same mean for the y's (around 750). Their standard deviations for the x's are the same, and their standard deviations for the y's are the same at 203. Their correlations are also the same, at 0.816 in the positive direction.

Based on that information, one would think that these two graphs look pretty similar.


2. INFLUENTIAL POINTS

Both of the graphs above contain what are called influential points, which change many of the summary values.

Term to Know

Influential Points

An observation that, if removed, significantly changes a statistical measure.


Usually the measure in question is the correlation, but an influential point can affect other measures as well, such as the mean of x or y and the standard deviation of x or y.

For instance, the scatterplot above on the left has a correlation coefficient of 0.816 with this point. Without it, the correlation coefficient is 1, because the remaining points line up exactly on a line.

The point in the scatterplot above on the right is also influential, and it changes several values substantially. With it, the mean of x is 9, the standard deviation of x is 3.3, and the correlation coefficient is 0.816. Without it, the mean of x becomes 8 because all the remaining x-values are 8. The standard deviation of x is 0 because the x-values never deviate from 8, and the correlation drops to 0 (strictly speaking, it is undefined when x has no variation). So the point changes all of these measures very substantially by being there. That point is certainly influential.

Without this point, the form of the data is essentially a vertical line, and including the point very much obscures that form.
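The effect of removing one point can be checked directly. The sketch below is a minimal illustration, not the data from the scatterplots above: `line_points` and `stacked_points` are hypothetical data sets made up to mimic the two situations (a perfect line plus one stray point, and a stack of identical x-values plus one point far to the right), and the helper names are assumptions introduced here.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # Pearson correlation; returns None when x or y has no variation,
    # because the coefficient is undefined in that case.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def summary(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {"mean_x": mean(xs), "sd_x": sample_sd(xs), "r": correlation(xs, ys)}

def without(points, index):
    # Copy of the data with the point at `index` removed.
    return points[:index] + points[index + 1:]

# Hypothetical data: the first five points lie on y = 2x + 1, the last does not.
line_points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 25)]

# Hypothetical data: every x is 8 except the last point.
stacked_points = [(8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (19, 13)]

for name, data in [("line", line_points), ("stacked", stacked_points)]:
    print(name, "with point:   ", summary(data))
    print(name, "without point:", summary(without(data, len(data) - 1)))
```

In the first data set, removing the last point raises the correlation to 1. In the second, removing it leaves a standard deviation of 0 for x, so the sketch reports the correlation as `None` rather than a number.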

Example: A point might be an outlier in one direction but not the other. Both of these circled points are, in fact, outliers on the scatterplot because they don't fit the overall trend. The first one is an outlier in the y-direction, because it is so much higher than the other y-values, but not in the x-direction.



The second one is an outlier in the x-direction, because it is so much farther to the right than the rest of the pack, but not in the y-direction. If you look horizontally, it sits in the lower-middle part of the y-values. It's an outlier in one direction but not the other.

In this case, because neither of them fits the overall trend provided by the other points, both of them are outliers.
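One way to make "outlier in one direction but not the other" concrete is to check each coordinate separately. The sketch below assumes the common 1.5 × IQR fence rule from univariate data, which is a technique swapped in for illustration and not a rule stated in this lesson. The data and the names `fences` and `axis_flags` are hypothetical.

```python
from statistics import quantiles

def fences(values):
    # Lower and upper fences using the 1.5 * IQR rule (an assumed convention).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def axis_flags(points):
    # For each point, return (is_x_outlier, is_y_outlier).
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = fences(xs)
    y_lo, y_hi = fences(ys)
    return [(not (x_lo <= x <= x_hi), not (y_lo <= y <= y_hi))
            for x, y in points]

# Hypothetical data: a pack of eight points, one point far above the pack,
# and one point far to the right of it.
points = [(1, 2), (2, 3), (3, 3), (4, 4), (5, 5), (6, 5), (7, 6), (8, 7),
          (4, 30), (25, 4)]

for point, (x_out, y_out) in zip(points, axis_flags(points)):
    if x_out or y_out:
        print(point, "x-outlier" if x_out else "", "y-outlier" if y_out else "")
```

With this data, (4, 30) is flagged only in the y-direction and (25, 4) only in the x-direction.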


A point can also be an outlier without being extreme in either the x- or the y-direction, so long as it doesn't fit the overall trend established by the rest of the data. Here, that trend is a curve.

The point in the middle that doesn't fit that curve will be an outlier.
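A point like this can be found by looking at its distance from the trend instead of its x- or y-value. The sketch below assumes the trend is already known to be y = x² and flags points whose residual is more than three times the median absolute residual. Both the curve and the threshold are assumptions chosen for this illustration, and the data are hypothetical.

```python
from statistics import median

def trend(x):
    # Assumed trend for this hypothetical data; in practice it would be
    # estimated from the data rather than known in advance.
    return x ** 2

def trend_outliers(points, factor=3.0):
    # Indices of points whose residual from the trend is unusually large.
    residuals = [abs(y - trend(x)) for x, y in points]
    threshold = factor * median(residuals)
    return [i for i, r in enumerate(residuals) if r > threshold]

def inside_range(point, others):
    # True when the point is not extreme in x or in y relative to the others.
    xs = [x for x, _ in others]
    ys = [y for _, y in others]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)

# Hypothetical data: seven points close to y = x**2 and one point in the middle.
points = [(-3, 9.2), (-2, 3.9), (-1, 1.1), (0, 0.1),
          (1, 0.8), (2, 4.2), (3, 8.9), (0.5, 6.0)]

for i in trend_outliers(points):
    others = points[:i] + points[i + 1:]
    print(points[i], "is off the trend; inside x/y range:",
          inside_range(points[i], others))
```

The point (0.5, 6.0) is flagged even though its x-value and y-value both fall inside the range of the other points.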

Some of those outliers are influential, and some are not. Two of them will not have a great effect on the correlation or the least-squares regression line for their data sets. In those two cases a line is an inappropriate model, but if you did fit a line, keeping or removing the point wouldn't change the line or the correlation very much, because the correlation is already very near zero. The point on the right, on the other hand, is influential: the correlation increases from nearly zero without it to a positive value with it.
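The difference between an outlier that is influential and one that is not can be seen with a small made-up example. In the sketch below, the four `base` points form a square with a correlation of 0. Adding a point far above the square at the mean x-value leaves the correlation near 0, while adding a point far up and to the right makes it strongly positive. The data and names are hypothetical and are not taken from the scatterplots in this lesson.

```python
from math import sqrt

def correlation(points):
    # Pearson correlation coefficient for a list of (x, y) pairs.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four corners of a square, so there is no linear trend.
base = [(1, 1), (1, 3), (3, 1), (3, 3)]

high_point = (2, 10)   # outlier in y, but at the mean of x
far_point = (10, 10)   # outlier in both x and y

print("base only:      ", round(correlation(base), 3))
print("with high point:", round(correlation(base + [high_point]), 3))
print("with far point: ", round(correlation(base + [far_point]), 3))
```

Both added points are outliers, but only the far point is influential for the correlation.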



Summary

Two important kinds of points on a scatterplot are influential points and outliers. Influential points substantially change at least one statistical measure. Outliers are simply points that deviate from the overall form of the rest of the points. They may be outliers in the x- or y-direction, but they don't have to be, according to this definition. Be aware that different people use different definitions of outliers for scatterplots, so there is no one hard-and-fast definition.

Good luck!

Source: This work adapted from Sophia Author Jonathan Osters.

TERMS TO KNOW
  • Outlier

    In a scatter plot, an outlier is an observation that has an extreme x value, an extreme y value, both an extreme x and y, or is well away from the main trend of points.

  • Influential Point

    An observation that, if removed, significantly changes a statistical measure.