This tutorial will discuss scatterplots.
Scatterplots let you show more than one quantitative attribute at a time for a particular data set. In the past, you’ve been using something like dot plots, where you have a single quantitative attribute about a data set, and you stack up dots at each value and look at the data that way.
But scatterplots allow you to see not only how those values compare along this attribute, but also how they compare along a different attribute. It can be very valuable to compare multiple data sets this way. You might have values that are low on Attribute 1 but high on Attribute 2.
Data sets that you might put in a scatterplot are things like cigarette consumption and cancer deaths. Maybe there are certain states or countries that have low cigarette consumption and, correspondingly, low cancer deaths. Each dot would correspond to one single state or one single country.
Or, if you were looking at sports teams, you might ask: does spending a lot of money on team payroll cause a team to win more? Each dot, in that case, would correspond to a single team.
These are the 1992 payrolls in the National Football League for each team's quarterback, who's usually the most expensive player, and for the entire team. The figures are in thousands of dollars.
Now, put this on a scatterplot. Put the variable that you think helps to explain the other on the x-axis, the horizontal axis. Here, it's probably the quarterback salary that contributes to a high or low team payroll. Start with the first team, the 49ers. Find $900,000 for the quarterback and $17.2 million for the team payroll, and put a dot there. That's one of the many dots that you're going to end up with.
The next team, the Bears, had a quarterback salary of $3 million and a total payroll of about $23 million. As you continue on, you’re going to end up with one dot for each team. The final version looks like this.
It seems that as the quarterback salary increases, moving to the right, the total payroll tends to increase as well.
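As a rough sketch of the plotting steps above, the Python below turns each team into one (x, y) pair, with the quarterback salary on the x-axis and the total payroll on the y-axis. It includes only the two teams quoted in the text (the Bears total is approximate), since the rest of the table is not reproduced here. The names payrolls, to_points, and draw_scatterplot are illustrative, and the drawing function assumes the matplotlib library is installed.

```python
# Payroll figures in thousands of dollars, as in the tutorial.
# Only the two teams quoted in the text are listed; the Bears total is approximate.
payrolls = [
    ("49ers", 900, 17200),
    ("Bears", 3000, 23000),
]


def to_points(rows):
    """Return one (x, y) pair per team: x = quarterback salary, y = team payroll."""
    return [(qb, total) for _team, qb, total in rows]


def draw_scatterplot(rows):
    """Draw one dot per team with labeled axes (assumes matplotlib is installed)."""
    # Imported here so the data preparation above runs without matplotlib.
    import matplotlib.pyplot as plt

    points = to_points(rows)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Quarterback salary (thousands of dollars)")
    ax.set_ylabel("Total team payroll (thousands of dollars)")
    ax.set_title("1992 NFL payrolls")
    return fig


points = to_points(payrolls)
print(points)  # [(900, 17200), (3000, 23000)]
```

Calling draw_scatterplot(payrolls) and then saving or showing the returned figure would give a plot with one dot per listed team. With the full table, you would add one row per team and the same code would apply.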
Now, suppose that you wanted to add an additional categorical variable. For instance, you wanted to know if the payrolls were different depending on conference. There are two conferences in the National Football League, the NFC and the AFC. So, what you can do is look at that same scatterplot using different symbols: a gray circle for AFC teams and a blue square for NFC teams.
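One way to sketch the different-symbols idea in Python is to split the points by conference first and then draw each group with its own marker and color. The 49ers and Bears figures come from the text, and both teams played in the NFC. The AFC row is a made-up placeholder, included only so that both groups appear. The names STYLES, group_by_conference, and draw_by_conference are illustrative, and the drawing function assumes matplotlib is installed.

```python
from collections import defaultdict

# Marker and color per conference, following the tutorial's choice:
# AFC as gray circles, NFC as blue squares.
STYLES = {
    "AFC": {"marker": "o", "color": "gray"},
    "NFC": {"marker": "s", "color": "blue"},
}

# (team, conference, quarterback salary, total payroll), in thousands of dollars.
# The last row is a HYPOTHETICAL placeholder, not real payroll data.
rows = [
    ("49ers", "NFC", 900, 17200),
    ("Bears", "NFC", 3000, 23000),
    ("AFC team (placeholder)", "AFC", 1500, 20000),
]


def group_by_conference(rows):
    """Map each conference to its list of (quarterback salary, payroll) points."""
    groups = defaultdict(list)
    for _team, conference, qb, total in rows:
        groups[conference].append((qb, total))
    return dict(groups)


def draw_by_conference(rows):
    """Draw each conference with its own symbol (assumes matplotlib is installed)."""
    # Imported here so the grouping above runs without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for conference, pts in sorted(group_by_conference(rows).items()):
        xs, ys = zip(*pts)
        ax.scatter(xs, ys, label=conference, **STYLES[conference])
    ax.set_xlabel("Quarterback salary (thousands of dollars)")
    ax.set_ylabel("Total team payroll (thousands of dollars)")
    ax.legend(title="Conference")
    return fig


groups = group_by_conference(rows)
print(groups)
```

Grouping before plotting keeps one scatter call per conference, so each group gets a single legend entry and a consistent symbol.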
You can also see this using technology. If you want to use Excel, all you have to do is select the data that you want to plot and choose the scatter chart type.
Sometimes, the default chart includes a little extraneous material that you can get rid of. Notice, too, that the axes aren't labeled, which is something that you might want to add. But sure enough, the chart shows that same set of data.
A scatterplot is a way to show more than one quantitative attribute at a time for a particular data set. It shows the relationship between two quantitative variables, which are paired data sets.
These are two attributes for the same individuals in the data set. One variable, typically the one that we think might cause the other to happen, is assigned to the x-axis, and the other is assigned to the y-axis. We can also plot multiple data sets on the same scatterplot, using different symbols or different colors to denote the different sets.
Source: This work adapted from Sophia Author Jonathan Osters.
A graphical display that allows us to see the relationship between two quantitative variables.
Plotting more than one data set on a scatterplot requires that we use different colors or symbols for the different data sets so we can see the relationships separately.