Correlation and Causation
Common Core: S.ID.9

This lesson will introduce the connection between correlation and causality.

What's Covered

This tutorial is going to talk about correlation versus causation. You will learn about:

  1. Correlation and causation
  2. Lurking variables
  3. Coincidence

1. Correlation and causation

Correlation and causation are not the same thing.

Terms to Know


  • Causation/Cause-and-Effect

    A phenomenon whereby an increase in one variable directly leads to an increase or decrease in another variable.

  • Correlation

    A statistic which measures the strength and direction of the linear association between two quantitative variables.
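To make the definition of correlation concrete, here is a minimal sketch of how the statistic is usually computed (the Pearson correlation coefficient). The function name `pearson_r` and the study-hours data are made up for illustration; they are not part of this lesson.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two variables vary together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]        # made-up explanatory values
scores = [52, 55, 61, 64, 70]  # made-up response values
r = pearson_r(hours, scores)   # near +1: strong positive linear association
print(round(r, 3))
```

The sign of the result gives the direction of the association and its distance from zero gives the strength. Nothing in the calculation says which variable, if either, is the cause.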

However, it's often tempting to say that two well-correlated variables have what we call a "causal" link. In fact, a strong correlation can have a variety of other explanations:

  • It could be that the explanatory variable simply does not cause the response, even though the two are well-correlated.
  • There could be something called a "lurking variable" behind the scenes that causes an increase or decrease in one or both of them.
  • It could just be that you got the association reversed.

So let's look at some examples:

In many families where parents left the light on in their infant's room as the infant slept, the infant developed nearsightedness. This scenario was actually studied: researchers noticed a positive relationship between sleeping with the light on and having nearsightedness, and concluded that sleeping with the light on might cause nearsightedness. Is this conclusion correct?

Upon follow-up studies, this conclusion was shown to be incorrect. The nearsightedness of the children was genetic, and was therefore caused by their parents' nearsightedness, not by sleeping in a room with the light on.

In fact, the parents' nearsightedness caused them to leave the light on in the child's room so that the parents could see. So the nearsightedness of the child and the light being left on were both due to the lurking variable of their parents' nearsightedness. It wasn't the light that caused the child's nearsightedness.

2. Lurking Variables

Consider this example: as ice cream sales increase, so does the number of drowning deaths. Conclusion: eating ice cream causes drowning. If you compare monthly ice cream sales with the number of drowning deaths in the same months, the two rise and fall together. So was your mother right? Should you not go swimming after eating ice cream because it's dangerous for you? Well, not really.

Both ice cream sales and drownings happen to increase with higher temperatures:

  • As the summer months go on, more people consume ice cream because it's warmer and they want to cool off.
  • They also want to cool off by going to the beach and the pools in the summer.
  • A higher volume of people attending those beaches and pools will sadly cause the number of people that drown to go up, as well.

Just as in the case of the nearsightedness and sleeping with the light on, there's a lurking variable behind the scenes causing the increase in both ice cream sales and drowning. It's not the increase in ice cream that causes the drowning, nor does the drowning cause an increase in ice cream sales. They're both increased by the higher temperatures.
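The ice cream example can be sketched as a small simulation. Everything here is an assumption made for illustration: the temperature range, the coefficients, and the noise levels are invented, and `pearson_r` and `residuals` are hypothetical helper names. In the simulated data, temperature drives both variables and neither one affects the other.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """What is left of y after subtracting its least-squares line on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Lurking variable: made-up daily high temperatures in degrees Celsius.
temperature = [rng.uniform(0, 35) for _ in range(500)]
# Both quantities depend on temperature plus independent noise,
# but NOT on each other (all numbers are invented).
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [1 + 0.2 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)
# Correlation after removing the part of each variable explained by temperature.
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))
print(round(raw_r, 2), round(adjusted_r, 2))
```

The raw correlation comes out strongly positive, while the correlation after accounting for temperature is close to zero. That is the pattern you would expect when a lurking variable explains the association.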

Try It

As the number of firefighters at a fire increases, so does the damage the fire causes. Which conclusion is correct?

    1. Sending firefighters is counterproductive because they only increase the size of the fire.
    2. As the size of the fire increases, more firefighters need to be sent.

The first option is pretty obviously a ludicrous conclusion to draw. In fact, the true association runs the other way around. There is a cause-and-effect relationship, but it's reversed: it's a severe fire that causes more firefighters to be sent, not the other way around.
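One reason the direction is easy to get wrong is that the correlation statistic is symmetric: swapping the two variables gives the same value. A minimal sketch with made-up fire data (the numbers and the helper name `pearson_r` are assumptions for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

fire_size = [1, 2, 4, 7, 9]         # made-up severity scores
firefighters = [3, 5, 10, 16, 21]   # made-up crew counts

r_size_then_crew = pearson_r(fire_size, firefighters)
r_crew_then_size = pearson_r(firefighters, fire_size)
# The two values match: the statistic carries no information about
# which variable is the cause and which is the effect.
print(r_size_then_crew, r_crew_then_size)
```

Deciding which way the cause runs takes knowledge from outside the data, such as knowing that crews are dispatched after the fire has been sized up.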

3. Coincidence

There doesn't always have to be an explanation for the relationship between two events. It's possible that two variables might be very well-correlated but the correlation is simply a coincidence. Therefore, the best way to prove cause-and-effect is with a controlled experiment where the explanatory variable is administered to one group and withheld from the other.
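The point about coincidence can be illustrated with a rough sketch. It generates many short series of pure random noise and looks for the most strongly correlated pair. The number of series, their length, and the seed are arbitrary choices made for this illustration.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable

# 100 unrelated "variables", each measured only 8 times.
series = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]

# Compare every pair and keep the strongest correlation found.
strongest = max(abs(pearson_r(a, b))
                for a, b in itertools.combinations(series, 2))
print(round(strongest, 2))
```

None of these series has anything to do with any other, yet with thousands of pairs and only a few observations each, some pair will look very well-correlated purely by chance.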

If the experiment follows the basic experimental design principles of control, randomization, and replication, the experiment can, in fact, prove a cause-and-effect relationship. It can give the best evidence for causation.
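The sketch below shows why random assignment matters, under invented assumptions: a treatment with a true effect of 2.0, a hidden trait that also raises the outcome, and 10,000 subjects per study. The names `outcome` and `difference_in_means` are hypothetical helpers, not part of the lesson.

```python
import random

TRUE_EFFECT = 2.0        # assumed causal effect of the treatment
rng = random.Random(42)  # fixed seed so the run is repeatable

def mean(values):
    return sum(values) / len(values)

def outcome(treated, hidden):
    # The response depends on the treatment AND on a hidden trait, plus noise.
    return TRUE_EFFECT * treated + 3.0 * hidden + rng.gauss(0, 1)

def difference_in_means(assign, n_subjects=10_000):
    """Treated-group mean minus control-group mean for one simulated study."""
    treated_group, control_group = [], []
    for _ in range(n_subjects):          # replication: many subjects
        hidden = rng.gauss(0, 1)         # lurking trait of this subject
        treated = assign(hidden)
        group = treated_group if treated else control_group
        group.append(outcome(treated, hidden))
    return mean(treated_group) - mean(control_group)  # control: a comparison group

# Observational study: subjects with a high hidden trait choose the treatment.
observational = difference_in_means(lambda hidden: hidden > 0)
# Randomized experiment: a coin flip decides, regardless of the hidden trait.
randomized = difference_in_means(lambda hidden: rng.random() < 0.5)
print(round(observational, 2), round(randomized, 2))
```

In the self-selected study, the hidden trait inflates the apparent effect well beyond 2.0. With random assignment, the trait is balanced across the two groups on average, so the difference in means lands close to the true effect.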


Summary

Sometimes two variables are related because one causes the other. Other times they are very well-correlated, but the association isn't what we call "causal"; this is the difference between correlation and causation. In many cases, there's a lurking variable, something behind the scenes, that's causing an increase or decrease in both variables, or maybe a decrease in one and an increase in the other. In other cases, the cause-and-effect relationship is real but runs in the opposite direction from the one assumed. Finally, sometimes there appears to be a relationship between two variables, but it is only a coincidence. Thus, the most valid way to prove causation is with a controlled, randomized experiment. However, an observational study can still provide strong evidence for causation, even though it cannot rule out lurking variables the way a randomized experiment can.

Thank you and good luck!


  • Correlation

    A statistic which measures the strength and direction of the linear association between two quantitative variables.

  • Causation/Cause-and-Effect

    A phenomenon whereby an increase in one variable directly leads to an increase or decrease in another variable.