In this tutorial, you're going to learn how to describe scatterplots. Specifically, you will focus on form, direction, and strength.
When describing univariate (one-variable) data, for example with histograms and dot plots, you talk about the shape, center, and spread of the distribution.
Here, there are two variables: quarterback salary and total team payroll.
On a scatterplot, shape, center, and spread are hard to talk about. Quarterback salary might be very spread out while total team payroll is not, so it is unclear what a single "spread" would even mean.
Instead, to describe a scatterplot you look at form, direction, and strength.
1. Form
For form, we look for a pattern. Is the pattern linear, or do the data show a curve? If they curve, how? Do they start low, peak, and then end low? Do they start low and end high? Do they rise quickly and then tail off? There's a lot to look at.
In this example, with quarterback salary and total team payroll for National Football League (NFL) teams, the pattern appears fairly linear: it starts low and ends high. Are there clusters or gaps we should be aware of? In this case, the form is fairly linear, with a possible outlier.
Form
The overall shape of the data points. The form may be linear or nonlinear, or there may not be any form at all to the points, if they form a "cloud."
2. Direction
Direction describes how the y-axis variable (total team payroll) responds as you move to the right along the x-axis variable (quarterback salary). Does total payroll increase as quarterback salary increases, or does it decrease? In this case, total team payroll tends to increase as quarterback salary increases.
Direction
The way one variable responds to an increase in the other. With a negative association, an increase in one variable is associated with a decrease in the other, whereas with a positive association, an increase in one variable is associated with an increase in the other.
That is not a hard-and-fast rule. For instance, one team had a very high team payroll even though its quarterback salary was much lower. Direction describes what tends to happen, not what happens for every single team.
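As a small sketch that goes beyond the original tutorial, direction can be checked numerically from the sign of the sample covariance: it is positive when above-average x values tend to pair with above-average y values, and negative when they tend to pair with below-average y values. The function name `direction` is hypothetical, and the data are made up for illustration; they are not real NFL salaries.

```python
def direction(xs, ys):
    """Classify the direction of association from the sign of the sample covariance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive products: x and y on the same side of their means; negative: opposite sides
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    # Exactly zero is rare with real data; a shapeless cloud gives a value near zero
    return "none"

# Made-up data in arbitrary units (hypothetical, not real NFL figures)
qb_salary = [1, 2, 3, 4, 5, 6]
team_payroll = [90, 95, 93, 104, 110, 108]
print(direction(qb_salary, team_payroll))  # positive
```

The made-up payroll dips from 95 to 93 between the second and third points, yet the overall direction is still positive, which mirrors the point that direction is a tendency rather than a rule for every point.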
3. Strength
How well do the points follow the indicated form? How closely do they stack up along a line? They rise as you move to the right, and they don't sweep upward or rise quickly and then tail off; they climb fairly steadily. However, there is quite a bit of variation in the y-coordinates, and the points are far from sitting exactly on a line. You would call this a moderate linear association. Imagine an oval drawn around the points: for a stronger association, the oval would be thinner, and for a weaker association, it would be wider.
Strength
The closeness of the points to the indicated form. Points that are strongly linear will all fall on or near a straight line.
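This tutorial judges strength by eye. One common numeric companion, not covered here, is the Pearson correlation coefficient r, which ranges from -1 to 1: its sign matches the direction, and values of |r| near 1 mean the points lie close to a straight line. The minimal sketch below computes r from scratch. The function names are hypothetical, the data are made up, and the cutoffs of 0.8 and 0.5 used to label strength are assumptions chosen for illustration, not a universal standard. Note that r measures only linear strength, so it says nothing useful about a curved form.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r, strong=0.8, moderate=0.5):
    """Label strength from |r| using illustrative (assumed) cutoffs."""
    size = abs(r)
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    return "weak"

# Made-up data: points hugging a line versus loosely scattered points
tight_x, tight_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
loose_x, loose_y = [1, 2, 3, 4, 5, 6], [3, 1, 4, 2, 6, 3]
print(describe_strength(pearson_r(tight_x, tight_y)))  # strong
print(describe_strength(pearson_r(loose_x, loose_y)))  # weak
```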
Now consider another example: the 1970 and 1980 prices of different seafoods, in cents per pound. Let's describe the form, direction, and strength.
The form looks fairly linear. One point sits a little below the line suggested by the rest of the data points, and there also appears to be an outlier on the high side.
Scallops were expensive in 1970 and expensive in 1980 as well. This point appears to be an outlier.
The direction is positive: as the 1970 price increases, so does the 1980 price.
That's not surprising, because you would expect seafood that was cheaper in 1970 to still be cheaper in 1980, and seafood that was more expensive in 1970 to still be more expensive in 1980. The strength is very strong. Because the points lie very close to a line, the 1980 price is quite predictable from the 1970 price.
You can also compare different forms. The first shows a strongly nonlinear form.
It looks something like exponential growth. The second shows a weak association, but its form is still linear.
You could also have something like a cloud of points, which shows no association.
Next, compare different strengths.
For each, imagine the oval you could draw around the points. A strong association fits inside a long, thin oval. A moderate association needs a wider oval, but it is still longer than it is wide. For a weak association, the oval is almost a circle. So if you enclose the points in an oval, the stronger the association, the longer and thinner the oval.
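The oval picture can be made a little more precise under one stated assumption: take the oval to be the spread ellipse of the standardized (z-scored) points. For standardized data the covariance matrix is the correlation matrix [[1, r], [r, 1]], where r is the Pearson correlation coefficient; its eigenvalues are 1 + r and 1 - r, so the oval's length-to-width ratio is sqrt((1 + |r|) / (1 - |r|)). The sketch below uses a hypothetical function name, `oval_elongation`, and illustrative r values that are not taken from the plots in this tutorial.

```python
import math

def oval_elongation(r):
    """Length-to-width ratio of the spread ellipse of z-scored data.

    Assumes the 'oval' follows the principal axes of the standardized points,
    whose variances are the eigenvalues 1 + |r| and 1 - |r| of [[1, r], [r, 1]].
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)
    if size == 1.0:
        return math.inf  # all points on a line: the oval collapses to a segment
    return math.sqrt((1 + size) / (1 - size))

# Illustrative values: no association, moderate, strong
for r in (0.0, 0.5, 0.9):
    print(f"r = {r}: length/width = {oval_elongation(r):.2f}")
# r = 0.0: length/width = 1.00
# r = 0.5: length/width = 1.73
# r = 0.9: length/width = 4.36
```

A ratio of 1 is a circle (no linear association), and the ratio grows without bound as |r| approaches 1, matching the long, thin oval of a strong association.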
To describe scatterplots, then, instead of the shape, center, and spread used for one-variable data, we analyze form, direction, and strength for two-variable data. Is the pattern linear or nonlinear? Are there unusual features such as gaps, clusters, or outliers? All of that goes into form. How closely do the points follow that form? That's strength. And what happens as the x-axis variable increases? Does the y-axis variable tend to go up, go down, or stay the same, or is there no association at all? That's the direction of the association.
Good luck.
Source: This work is adapted from Sophia author Jonathan Osters.