Common Core: S.ID.9

Cautions about Correlation

Author: Katherine Williams

Identify non-linearity, influential points, or inappropriate grouping in a data set.

Source: Graphs created by Katherine Williams

Video Transcription

This tutorial talks about different cautions that you need to be aware of when you're using correlations. Now, you have to use correlation really carefully because r doesn't tell us everything. And, in particular, you're going to need to look at the plot. One major thing is if you get an r of zero, your instinct is to say, oh, those variables don't have any association. But be careful, because all an r of zero tells you is that those variables don't have any linear association.

They might make a perfect curve or a perfect quadratic, but if you don't look at the plot, you would never know that. So no matter what, even if you have your r value, and you're really excited about a perfectly strong correlation, or you're kind of disappointed because it's a zero correlation, you need to look at the plot to see what's actually happening with your data points.
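To make the point about an r of zero concrete, here is a minimal sketch in Python. The data is made up for illustration and the helper name `pearson_r` is our own; it is not from the tutorial. It computes r for points that lie exactly on a parabola and, for comparison, for points that lie exactly on a line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Illustrative data: x values placed symmetrically around zero.
xs = list(range(-5, 6))
ys_quadratic = [x ** 2 for x in xs]   # a perfect curve, not a line
ys_linear = [2 * x + 1 for x in xs]   # a perfect line

r_quadratic = pearson_r(xs, ys_quadratic)
r_linear = pearson_r(xs, ys_linear)

print("r for the perfect quadratic:", r_quadratic)
print("r for the perfect line:", r_linear)
```

In this sketch the quadratic data has a perfect relationship between x and y, yet its r comes out at essentially zero, while the line gives an r of essentially one. Only a plot of the points would reveal the curve.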

One other thing that can happen is you can have inappropriate grouping. So if you've combined multiple data sets together, whether intentionally or unintentionally, and the two subgroups are different, but they're graphed as one, then sometimes the correlation can be weakened or reversed. We'll see an example here.

Here these graphs are showing different measurements from adult monkeys and relating them to the monkeys' age. And if you have this group in here, so these points (and I shouldn't use black), this first chunk was combined with these older monkeys. And you can see by the lines and by the trends of the data points that the young and the old monkeys have really different responses. So this would be a case when I would say you should separate out the ages. Even though both trends are negative, the younger monkeys have a much different response than the older monkeys.

Similarly, over here, the two groups were combined together. But the younger monkeys seem to have a positive correlation between age and their total body BMD. And the older monkeys have a slightly negative correlation. So in putting those two groups together and inappropriately grouping the younger and the older monkeys, we wouldn't know what the correlations truly are because it's all combined into one scatter plot that would have a pretty flat correlation.

But when we separate them out and look at the two groups of monkeys separately, instead of as one, we can see the differences in the correlation between the two groups. So it's important to think about your data and whether or not you've combined pieces of information together, and to double check things like whether males and females are responding differently, or whether different ages are responding differently, so that you can make sure that you're not inappropriately grouping and either weakening your correlation or accidentally reversing it.
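The effect of inappropriate grouping can be sketched in a few lines of Python. The numbers below are invented for illustration and are not the monkey measurements from the graphs; the helper `pearson_r` and the variable names are also our own assumptions. The younger group is given a positive trend and the older group a negative one, and r is computed for each group and for the combined data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data, NOT the tutorial's: (age, measurement) per subgroup.
young_age = [1, 2, 3, 4, 5]
young_value = [2.1, 3.9, 6.2, 7.8, 10.0]   # rises with age
old_age = [6, 7, 8, 9, 10]
old_value = [9.9, 8.2, 5.8, 4.1, 2.0]      # falls with age

# Correlation within each subgroup.
r_young = pearson_r(young_age, young_value)
r_old = pearson_r(old_age, old_value)

# Correlation after lumping both subgroups into one data set.
r_combined = pearson_r(young_age + old_age, young_value + old_value)

print("r, younger group:", r_young)
print("r, older group:", r_old)
print("r, combined:", r_combined)
```

With this made-up data, each subgroup shows a strong correlation on its own (positive for the younger group, negative for the older one), while the combined r is close to zero. Splitting the data by subgroup before computing r is one way to check whether grouping is hiding the real pattern.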

This has been your tutorial on cautions about correlation.

Terms to Know
Inappropriate Grouping

Combining together subgroups that should not be combined, resulting in a weakened, or even reversed, association.