Scatterplot

Hi. This tutorial covers a type of graph called the scatter plot. So much of the study of statistics involves univariate data sets. But oftentimes, more than one variable is collected on each individual. So univariate, remember, just means one variable. But sometimes you can collect data on more than one variable for each individual.

So here are a couple examples. So in large health studies of populations, it's common to obtain variables such as age, sex, weight, height, blood pressure, and total cholesterol on each individual. So what you might want to do is think about, well, how does height affect total cholesterol? Or how does weight affect blood pressure? So those, we could pair up.

Economic studies may be interested in, among other things, personal income and years of education. And most university admissions committees ask for an applicant's high school grade point average and a standardized admission test score, such as the ACT or SAT.

So bivariate data consists of two paired quantitative variables. So in that last example, if we had an ACT score and a grade point average, those two would be considered bivariate data, if we were to collect both of those pieces of data for a number of individuals.

Now, a scatter plot is a graph with an x-axis and a y-axis, where a coordinate point represents the values of two quantitative variables for one individual. A scatter plot is the primary graphical display of bivariate data. So let's actually look at some bivariate data and make a scatter plot.

So what we have here are six countries. And the two variables that we are measuring are sugar consumption, which is measured as calories per person per day from sugar, and depression rate. The depression rate numbers are the number of cases per 100 people per year.

So in Korea, on average, people consume 150 calories from sugar each day. And then, Korea's depression rate is 2.3. That represents 2.3 cases of depression per 100 people per year. So what we're going to do is make a scatter plot for this data. So we need an x-axis and a y-axis.

Because we don't have any negative values, my scatter plot only needs to include the first quadrant, which represents all positive values. And what we're going to do is, on the x-axis, we're going to put our sugar consumption. And on the y-axis, we're going to put our depression rate.

Now, on my x-axis, I need to be able to graph values from 150 all the way up to 480. So those are the minimum and the maximum. So I think I can start at 0. I think it's reasonable to do that. And I need to get out to 480.

So I think what I'll do is count by hundreds and then label every 200. So we have 200, 400, 600. So that represents my sugar consumption. That's the x-axis. Now, on the y-axis, I'm going to represent my depression rates. And I think what I'll do here is just count by ones. And maybe I'll label every two. So 2, 4, 6.
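The axis choices just described can be sketched in a few lines of Python. This is only an illustration: the function name `axis_ticks` and its parameters are made up for this example, and ending the axis at the first labeled value at or above the maximum is an assumption chosen to match the 200, 400, 600 and 2, 4, 6 labels above.

```python
import math

def axis_ticks(data_max, tick_step, label_every):
    """Return (ticks, labels) for an axis that starts at 0.

    Assumption: the axis ends at the first labeled value that is
    at or above data_max.
    """
    label_step = tick_step * label_every
    axis_max = math.ceil(data_max / label_step) * label_step
    ticks = list(range(0, axis_max + tick_step, tick_step))
    # Label only every `label_every`-th tick, skipping the origin.
    labels = [t for t in ticks if t != 0 and t % label_step == 0]
    return ticks, labels

# Sugar consumption: count by hundreds, label every 200 (maximum is 480).
x_ticks, x_labels = axis_ticks(480, tick_step=100, label_every=2)

# Depression rate: count by ones, label every two (maximum is 5.7).
y_ticks, y_labels = axis_ticks(5.7, tick_step=1, label_every=2)

print(x_labels)
print(y_labels)
```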

Now, in a scatter plot, a dot represents each individual. So in this case, an individual would be a country. So I'm going to make a dot for Korea. And that dot will match up with the values of its sugar consumption and depression rate. So we've got 150 and 2.3. So this is 100. So 150 would be halfway between 100 and 200. And then, 2.3 is going to be a little bit above 2. So that's going to be about here. That point represents Korea.

United States, 303 and 3.0. So 300 and 3 is right about here. So we can see that those match up, 300 and 3. Next, 350 and 4.4. So 350 and 4.4 is going to be about here. 375 and 5.0. That's going to be about here. 390 and 5.2. That's really close. It's going to be about here. And New Zealand, 480 and 5.7. So 480 would be about here. 5.7 would be about here.

So that would represent the scatter plot of this bivariate data set, again, where each dot represents a country. And for each country, their sugar consumption and depression rate can be identified from the scatter plot.
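As a rough companion to the hand-drawn plot, here is a minimal text-only sketch in Python that places the same six points on a character grid. The function `text_scatter` and its grid size are assumptions made for this example. Only Korea, the United States, and New Zealand are named in the walkthrough, so the other three countries are left unnamed rather than guessed.

```python
# (country, sugar calories per person per day, depression cases per 100 per year)
# Values are the ones read out above; None marks a country not named there.
data = [
    ("Korea", 150, 2.3),
    ("United States", 303, 3.0),
    (None, 350, 4.4),
    (None, 375, 5.0),
    (None, 390, 5.2),
    ("New Zealand", 480, 5.7),
]

def text_scatter(points, x_max=600, y_max=6, cols=60, rows=12):
    """Return text lines in which each '*' marks one (x, y) point."""
    grid = [[" "] * (cols + 1) for _ in range(rows + 1)]
    for x, y in points:
        col = round(x / x_max * cols)   # scale x onto the grid columns
        row = round(y / y_max * rows)   # scale y onto the grid rows
        grid[rows - row][col] = "*"     # grid row 0 is the top of the plot
    lines = ["|" + "".join(r).rstrip() for r in grid]
    lines.append("+" + "-" * (cols + 1))  # x-axis
    return lines

plot_lines = text_scatter([(sugar, rate) for _, sugar, rate in data])
print("\n".join(plot_lines))
```

Each star is one country, just like each dot in the drawing. With a plotting library, the same two lists of values would simply be passed to its scatter function instead.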

Now, a scatter plot can be used to graph multiple related bivariate data sets. By multiple data sets, we're just referring to two or more data sets. So the following scatter plot shows the heights of boys and girls measured at age 9 and age 18. A red mark represents the height of a specific girl at age 9, which is the x, and at age 18, which is the y. A blue mark represents the height of a specific boy at age 9 and at age 18.

So if we take a look at this scatter plot, we can see right away that boys are blue, girls are red. The shapes are also different. You'll see that height in centimeters at age 9 is on the x-axis. And the height in centimeters at age 18 is on the y-axis. So we can see that the girls seem to be almost consistently lower than the boys.

We can also see that it seems like there are a few girls over here that were shorter at age 9. But this is a way to use a scatter plot to graph multiple data sets. If you're not able to do this in color, a lot of times you'll just use different symbols for the two different populations.

So I might just have a square representing the girls and a diamond representing the boys or maybe the boys as a dot and the girls maybe as a star, just so you can distinguish between the two sets. It is important though to have a key so that the reader is able to see which mark represents which population or which sample. So this is just a good way of using a scatter plot to graph multiple data sets. That has been the tutorial on the scatter plot. Thanks for watching.
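The idea of one symbol per population plus a key can be sketched in Python as follows. The helper names `assign_markers` and `key_lines` are made up for this example, and the heights are invented placeholders, not the values from the figure in the tutorial.

```python
def assign_markers(group_names, markers=("o", "*", "s", "d")):
    """Give each group its own symbol so the groups can be told apart."""
    if len(group_names) > len(markers):
        raise ValueError("not enough distinct markers for the groups")
    return dict(zip(group_names, markers))

def key_lines(marker_for):
    """Build the key that tells the reader which symbol is which group."""
    return [f"{marker} = {name}" for name, marker in marker_for.items()]

# Placeholder heights in cm as (age 9, age 18); invented for illustration only.
groups = {
    "boys": [(135, 178), (130, 172)],
    "girls": [(128, 162), (133, 166)],
}

marker_for = assign_markers(list(groups))

# Each point carries its group's symbol, ready to be drawn on shared axes.
labeled_points = [
    (x, y, marker_for[name])
    for name, points in groups.items()
    for x, y in points
]

for line in key_lines(marker_for):
    print(line)
```

Here the boys get a dot and the girls get a star, and the printed key plays the role of the legend on the graph.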