Source: Image of tables and graphs created by Jonathan Osters
In this tutorial, you're going to learn about scatterplots. Scatterplots are ways that we can show more than one quantitative attribute at a time for a particular data set. So in the past, we've been using something like dot plots, where we have some quantitative attribute about a data set. And we've been making dot plots where we stack up dots at a particular value and we look at it that way. But scatterplots allow us to not only see how those values compare along this attribute, but also along a different attribute. So we might have values that are low for Attribute 1 but high on Attribute 2.
Examples of data sets that we might put in scatterplots are something like cigarette consumption and cancer deaths. So, maybe there are certain states or countries that have low cigarette consumption and maybe, correspondingly, low cancer deaths. Each dot would correspond to one single state or one single country. Or, if we were going with sports teams, we might ask: does spending a lot of money on your team payroll, paying your players a lot, cause the team to win more? So, maybe we look at teams that have high payroll, and ask whether they win a lot of games or very few games. Each dot, in that case, would correspond to a single team.
Let's take a look at an example. So, these are the 1992 payrolls for the National Football League, for each team's quarterback, who's usually their most expensive player, and for the entire team. These are in thousands of dollars. So, this is $900,000 and this is $17.2 million, because it's 17 thousand thousand, for the San Francisco 49ers.
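The "thousands of dollars" reading can be checked with a couple of lines of Python. This is only a minimal sketch: the helper name `thousands_to_dollars` is made up for illustration, and the table entries 900 and 17,200 are inferred from the dollar amounts read out above rather than copied from the table itself.

```python
def thousands_to_dollars(value_in_thousands):
    """Convert a table entry recorded in thousands of dollars to plain dollars."""
    return value_in_thousands * 1_000

# Assumed table entries for the San Francisco 49ers (inferred from the transcript).
qb_49ers = thousands_to_dollars(900)       # quarterback salary
team_49ers = thousands_to_dollars(17_200)  # whole-team payroll

print(f"${qb_49ers:,}")        # $900,000
print(f"${team_49ers:,}")      # $17,200,000
print(team_49ers / 1_000_000)  # 17.2 (millions of dollars)
```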
What we're going to do is put this on a scatterplot. We'll put the one that we think helps to explain the other on the x-axis, on the horizontal axis. Now, it's probably the quarterback salary that helps to contribute to a high or low team payroll, so the quarterback salary goes on the x-axis and the team payroll goes on the y-axis. We're going to do this in thousands, here. And I chose to do this in millions, here.
So, what we're going to do is we're going to start with the first team, the 49ers. And we're going to find that $900,000 for the quarterback and $17.2 million the team payroll puts a dot right here. Now, that's one of the many dots that we're going to end up with. The next team, the Bears, had a quarterback salary of $3 million and a total payroll of about $23 million.
And we go on to the next team, the Bengals. And as we continue on, we're going to end up with one dot for each team. The final version looks like this. Each dot corresponds to a single team. And we can see that there's a trend here, though it's not a hard and fast rule. It seems like as the quarterback salary increases, as the dots move to the right, the total payroll tends to increase as well.
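The dot-by-dot process above can be sketched in Python. This is an illustrative sketch, not the method used in the video: it includes only the two teams whose numbers are read out (49ers and Bears, with the Bears' payroll being "about" $23 million), it assumes the x-axis is in thousands of dollars and the y-axis in millions, and the names `to_points` and `plot_points` are made up for this example. The plotting part assumes matplotlib is installed.

```python
# Only the two teams read out in the transcript; the rest of the table is
# not reproduced here. The Bears' team payroll is approximate.
teams = [
    {"team": "49ers", "qb_thousands": 900, "team_thousands": 17_200},
    {"team": "Bears", "qb_thousands": 3_000, "team_thousands": 23_000},
]

def to_points(rows):
    """One (x, y) pair per team: x = QB salary in thousands, y = team payroll in millions."""
    return [(r["qb_thousands"], r["team_thousands"] / 1_000) for r in rows]

def plot_points(points):
    """Draw one dot per team. matplotlib is imported here so the rest runs without it."""
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Quarterback salary (thousands of dollars)")
    ax.set_ylabel("Team payroll (millions of dollars)")
    ax.set_title("1992 NFL payrolls")
    plt.show()

points = to_points(teams)
print(points)  # [(900, 17.2), (3000, 23.0)]
```

Calling `plot_points(points)` would then draw the dots, with the explanatory variable on the horizontal axis and both axes labeled.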
Now, suppose that we wanted to add an additional categorical variable to this. For instance, I wanted to know if the payrolls were different depending on conference. So, there are two conferences in the National Football League, the NFC and the AFC. So, what I can do with that is I can look at that same scatterplot, just using different symbols for AFC, the gray circle, or NFC, the blue square. And what I find is there's not that much of a difference. There's still that overall trend of as the quarterback salary increases, so does the team payroll.
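Splitting the dots by conference can be sketched the same way. Again, this is a hedged illustration rather than how the chart in the video was made: the names `STYLES`, `group_by_conference`, and `plot_by_conference` are made up, the axis units (thousands for x, millions for y) are an assumption, and only the two teams with numbers read out in the transcript are included. Both of those teams play in the NFC, so the AFC group is empty in this small sample.

```python
from collections import defaultdict

# Marker styles follow the description above: gray circle for AFC, blue square for NFC.
STYLES = {
    "AFC": {"marker": "o", "color": "gray"},
    "NFC": {"marker": "s", "color": "blue"},
}

# Only the teams whose numbers appear in the transcript (Bears payroll is approximate).
teams = [
    {"team": "49ers", "conference": "NFC", "qb_thousands": 900, "team_millions": 17.2},
    {"team": "Bears", "conference": "NFC", "qb_thousands": 3_000, "team_millions": 23.0},
]

def group_by_conference(rows):
    """Collect the (x, y) points for each value of the categorical variable."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["conference"]].append((r["qb_thousands"], r["team_millions"]))
    return dict(groups)

def plot_by_conference(groups):
    """Draw each conference with its own symbol; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for conference, pts in groups.items():
        xs, ys = zip(*pts)
        ax.scatter(xs, ys, label=conference, **STYLES[conference])
    ax.set_xlabel("Quarterback salary (thousands of dollars)")
    ax.set_ylabel("Team payroll (millions of dollars)")
    ax.legend()
    plt.show()

groups = group_by_conference(teams)
print(groups)  # {'NFC': [(900, 17.2), (3000, 23.0)]}
```

Each conference gets its own call to `scatter`, which is what lets the two groups carry different symbols and appear separately in the legend.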
And we can also see this using technology. If we want to use Excel, all we have to do is select the area that we want and select the correct graph. We're going to use a scatterplot. And sometimes, there's a little bit of extraneous stuff that we can get rid of. And notice, the axes aren't labeled, either. This is something that you might want to add in. But you can see that same set of data, sure enough, shown right in there, too.
And so, to recap, scatterplots are a way to show the relationship between two quantitative variables. And these are paired data sets. These are two attributes for the same individuals in the data set. One variable, typically the one that we think might cause the other to happen, is assigned to the x-axis. And the other is assigned to the y-axis. And we can also put in multiple data sets, just using different symbols or different colors, to denote the different sets. Good luck and we'll see you next time.