Data Analysis

Description:

This lesson will introduce data analysis.

Tutorial

What's Covered

This tutorial is going to cover data analysis. You will learn about:

  1. Data Analysis
  2. Shape
  3. Center
  4. Spread
  5. Outliers

1. Data Analysis

Data analysis is what we do once we've collected our data.

Term to Know

Data Analysis

The understanding of the key features of a set of data - shape, center, spread, and outliers.

In this lesson, we will use data analysis to identify the trends or key features of a data set. Four components of data analysis are key:

  1. The shape that a distribution will have
  2. The center of that distribution
  3. The spread of the data
  4. Any outliers in the data

2. Shape

Shape is a qualitative notion that tells us where most of the points lie in the distribution.

Term to Know

Shape

The qualitative description of the clustering of data points in a certain location when the data are graphed.

Example: For this shape, you would say that most of the data points are in the hump, where the line is highest on the y-axis:

There are not a whole lot of data points on the far right side, in what we'd call the tail of the graph.

Shapes can be either skewed to the left or the right:

  • The distribution in the previous example is called skewed to the right. It has a hump on the left and a tail on the right.
  • In contrast, the distribution below is called skewed to the left. It has a tail on the left and a hump on the right.
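If you have the raw numbers rather than a graph, one rough way to check the direction of skew is to compare the mean to the median: a long tail on the right tends to pull the mean above the median, and a long tail on the left tends to pull it below. The sketch below illustrates that rule of thumb. The function name `skew_direction` and both data sets are made up for illustration, and the heuristic can fail on unusual distributions.

```python
from statistics import mean, median

def skew_direction(values):
    """Guess the skew direction by comparing the mean to the median.

    This is a rule of thumb, not a formal test of skewness.
    """
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "roughly symmetric"

# Hypothetical data: most values sit in a hump of small numbers,
# with a few large values forming a tail on the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]

# Mirror image of the data above: hump on the right, tail on the left.
left_skewed = [15 - x for x in right_skewed]

print(skew_direction(right_skewed))  # right
print(skew_direction(left_skewed))   # left
```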


3. Center

Center is essentially what it sounds like: it's wherever the middle is.

Term to Know

Center

The “middle” of the data set. There are many measures of center.

There are several different ways to measure center.

In this graph, there are a few arrows pointing to the different measurements of the middle.

  1. The first arrow (the arrow furthest to the left on the x-axis) falls directly below the peak.
  2. The second arrow (the one in the middle) is a little further off to the right. It appears that if you drew a line directly through this arrow, about half the area of the graph would be to the left of it and about half the area would be to the right of it.
  3. The third arrow is farthest to the right on the x-axis.

Which one is the correct measure of center? They're all different measures and they can all be correct in different situations.
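To see how measures of center can disagree, here is a small sketch using three common ones: the mode (the most frequent value, which sits under the peak), the median (the value that splits the data in half), and the mean (the arithmetic average). Matching these to the three arrows is an assumption, since the lesson does not name them, and the data set is hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data set (not taken from the lesson).
data = [2, 3, 3, 3, 4, 5, 6, 8, 11]

peak = mode(data)       # most frequent value
middle = median(data)   # half the values lie on each side
average = mean(data)    # sum of the values divided by the count

print(peak, middle, average)  # 3 4 5
```

All three numbers describe the "middle" of the same data, yet they differ because the tail on the right pulls the mean, and to a lesser extent the median, away from the peak.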


4. Spread

Spread is a numerical value that describes how spread out the data points are.

Term to Know

Spread

The numerical description of how close the numbers are to the center.

Just as with center, there are several different measures for spread:

  • Maybe you are just interested in where most of the data points lie, which would be below the hump.

  • Maybe you are interested in the full range of data points, from the lowest all the way to the highest.

These would both be different, and correct, measurements of the spread.
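As a sketch of these two ideas, the code below computes the full range (highest value minus lowest) and the interquartile range, or IQR, which is the width of the middle half of the data. Using the IQR to stand in for "where most of the data points lie" is an assumption made for illustration, and the data set is hypothetical.

```python
from statistics import quantiles

# Hypothetical data set (not taken from the lesson).
data = [2, 3, 3, 3, 4, 5, 6, 8, 11]

# Full range: distance from the lowest value to the highest.
full_range = max(data) - min(data)

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: width of the middle half of the data.
iqr = q3 - q1

print("range:", full_range)
print("IQR:", iqr)
```

The range reacts to a single extreme value, while the IQR ignores the tails, so the two can give quite different pictures of the same data.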


5. Outliers

Outliers are important to look for.

Term to Know

Outliers

Points in a data set that are so high or so low as to be unusual, given the rest of the values.

Outliers are not just the highest or lowest numbers in a data set. They lie very far above the next highest number or very far below the next lowest number.

Example: Suppose that a small class took an exam, and the scores were as follows:

Some students did very well on this test. In fact, most students scored in the 80s or the 90s. However, one person scored only 46. That 46 would be considered an outlier because it's so much lower than the rest of the pack.
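The lesson judges outliers by eye, but one common convention for flagging them is the 1.5 × IQR rule: a value counts as an outlier if it falls more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below applies that rule. The rule is not part of the original lesson, the function name `find_outliers` is made up, and the scores are hypothetical ones chosen to resemble the example (mostly 80s and 90s, plus a 46).

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical exam scores, not the actual scores from the lesson.
scores = [46, 81, 83, 85, 86, 88, 90, 92, 94, 97]

print(find_outliers(scores))  # [46]
```

Notice that 97 is the highest score but is not flagged, because it is close to the scores just below it.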

Big Idea

Outliers are important data points because they are so high or low that they would be considered unusual.


Summary

Data analysis consists of clearly describing the four key elements: shape, center, spread, and outliers, if there are any. There are some standard descriptions that are used to describe shape, such as skewed to the left and skewed to the right, and there are also several different measures for center and spread. Measures of center and spread are typically numbers. Outliers are values that are so far above or below the rest of the data set that they would be considered unusual.

Thank you and good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS

TERMS TO KNOW
  • Data Analysis

    The understanding of the key features of a set of data - shape, center, spread, and outliers.

  • Shape

    The qualitative description of the clustering of data points in a certain location when the data are graphed.

  • Center

    The "middle" of the data set. There are many measures of center.

  • Spread

    The numerical description of how close the numbers are to the center.

  • Outliers

    Points in a data set that are so high or so low as to be unusual, given the rest of the values.