Source: Graphs created by Author; Baby, public domain, http://www.clker.com/clipart-sleeping-baby.html; Ice cream, public domain, http://openclipart.org/detail/8315/fast-food-desserts-ice-cream-cones-waffle-triple-by-gerald_g; Firetruck, public domain, http://openclipart.org/detail/27554/iso-fire-engine-by-secretlondon
This tutorial is going to talk about correlation versus causation. They're not the same thing. And it's tempting to say that two well-correlated variables have what we call a "causal" link between the two. So let's take a look.
It could be that two variables are well-correlated, but the explanatory variable doesn't actually cause the response. There could be a variety of reasons for that. There could be something called a "lurking variable" behind the scenes that causes an increase or decrease in one or both of them. Or it could just be that we got the association reversed. So let's look at some examples.
So one scenario might be that in families where parents left the light on in their infant's room while the infant slept, the infant developed nearsightedness. This is a scenario that was actually studied, and the researchers concluded, right off the bat, that sleeping with the light on might cause nearsightedness.
And they might have looked at it this way: the percent of time that the infant slept with the light on versus the percent of these parents' children who were nearsighted. Maybe all the children were nearsighted, so it was 100%. Or maybe only half their children were nearsighted.
And so what they saw was a positive relationship here. And so they concluded that sleeping with the light on might cause nearsightedness. In follow-up studies, they realized that wasn't the case. The nearsightedness was genetic, caused by their parents' nearsightedness.
The parents' nearsightedness caused them to leave the light on in the child's room so that the parents could see. So the nearsightedness of the child and the light being left on were both due to the lurking variable of the parents' nearsightedness. It wasn't the light that caused the child's nearsightedness.
Second example: as ice cream sales increase, so does the number of drowning deaths. Conclusion: eating ice cream causes drowning. So if you look at ice cream sales month by month, and at the number of drowning deaths in those same months, the two go up together. So was your mother right? Should you not go swimming after eating ice cream because it's dangerous for you? Well, not really.
Both of those happen to increase with higher temperatures. As the summer months go on, more people consume ice cream because it's warmer and they want to cool off. They also want to cool off by going to the beach and the pools in the summer. And with so many more people at those beaches and pools, sadly, the number of people who drown will go up as well.
And so again, there's a lurking variable behind the scenes causing the increase in both ice cream sales and drowning. And it's not that the ice cream causes the drowning or even the other way around. They're both increased by the higher temperatures.
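If you'd like to see a lurking variable at work, here is a minimal simulation sketch. Everything in it is made up for illustration: the temperature range, the sales and drowning formulas, and the noise levels are assumptions, not real data. Temperature drives both quantities, and neither one affects the other. The two still come out well-correlated over all days, and the correlation mostly disappears once we only look at days with nearly the same temperature.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the lurking variable.
# Ice cream sales and drownings each depend on temperature, not on each other.
temps = [rng.uniform(0, 35) for _ in range(5000)]              # degrees C
ice_cream = [20 + 3.0 * t + rng.gauss(0, 15) for t in temps]   # toy sales figure
drownings = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temps]   # toy drowning index

r_all = pearson_r(ice_cream, drownings)

# Hold the lurking variable roughly constant: keep only days between 24 and 26 C.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 24 <= t <= 26]
r_band = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:          r = {r_all:.2f}")
print(f"24-26 C days only: r = {r_band:.2f}")
```

Restricting to a narrow temperature band is just one simple way to "control for" the lurking variable; it is used here because it needs nothing beyond the correlation itself.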
Last example: as the number of firefighters at a fire increases, so does the damage the fire causes. So as you increase the number of firefighters, the damage from the fire increases as well. And so the conclusion is that sending firefighters is counterproductive because they only increase the size of the fire.
Well, that's pretty obviously a ludicrous conclusion to draw. In fact, the true association is just the other way around. So the association is reversed. It is cause-and-effect. But it's reversed. It's a severe fire that causes the firefighters to arrive, not the other way around.
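One reason a reversed association is so easy to miss is that the correlation itself has no direction. The sketch below uses invented numbers (the fire sizes and the dispatch rule are assumptions for illustration) in which fire size determines how many firefighters show up. The correlation comes out the same whichever variable we list first, so the statistic alone can't tell us which one is the cause.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical model: the size of the fire decides how many firefighters are sent.
fire_size = [rng.uniform(1, 10) for _ in range(1000)]
firefighters = [max(1, round(4 * s + rng.gauss(0, 3))) for s in fire_size]

r_forward = pearson_r(firefighters, fire_size)   # "firefighters vs. fire size"
r_backward = pearson_r(fire_size, firefighters)  # "fire size vs. firefighters"

print(f"r(firefighters, fire size) = {r_forward:.3f}")
print(f"r(fire size, firefighters) = {r_backward:.3f}")
```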
Now, there doesn't always have to be an explanation. It's possible that two variables might be very well-correlated but the correlation is simply a coincidence. Now, one thing that's worth mentioning here is that the best way to prove a cause-and-effect relationship between two variables is with a controlled experiment.
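On the point that a strong correlation can be nothing more than coincidence, here is a small sketch. It generates 40 made-up "variables" that are pure random noise, with only 10 observations each (both numbers are arbitrary choices for illustration), and then searches every pair for the strongest correlation. With that many pairs and so few observations, some pair will usually look well-correlated even though nothing connects them.

```python
import random
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(3)  # fixed seed so the run is repeatable

# 40 unrelated series, each just 10 independent random draws.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# Look at all 780 pairs and keep the one with the largest |r|.
best_abs_r, best_pair = max(
    (abs(pearson_r(series[i], series[j])), (i, j))
    for i, j in combinations(range(40), 2)
)

print(f"strongest |r| among unrelated series: {best_abs_r:.2f} (pair {best_pair})")
```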
We've looked at different situations where two variables were well-correlated and tried to decide on whether or not one caused the other. But the best way to prove cause-and-effect is with a controlled experiment where the explanatory variable is administered to one group and withheld from the other.
And if the experiment follows the basic experimental design principles of control, randomization, and replication, the experiment can, in fact, prove a cause-and-effect relationship. It can give the best evidence for causation.
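To see why randomization matters, here is a minimal simulation sketch under made-up assumptions: a treatment with a true effect of 2.0, and a lurking variable that also raises the response. In the observational version, units with a high lurking value are more likely to end up treated. In the experimental version, a coin flip decides who gets the treatment and who is the control. The names and numbers (`TRUE_EFFECT`, the group size, the noise) are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(2)   # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0        # assumed true effect of the treatment
N = 20000                # many units, standing in for replication

def response(treated, lurking):
    """Toy response: treatment effect plus a strong lurking-variable effect."""
    return TRUE_EFFECT * treated + 5.0 * lurking + rng.gauss(0, 1)

def diff_in_means(rows):
    """Mean response of the treated group minus mean response of the control group."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return mean(treated) - mean(control)

lurking = [rng.gauss(0, 1) for _ in range(N)]

# Observational study: a high lurking value makes treatment more likely.
observational = []
for z in lurking:
    t = z + rng.gauss(0, 1) > 0
    observational.append((t, response(t, z)))

# Controlled experiment: a coin flip assigns treatment, ignoring the lurking variable.
experiment = []
for z in lurking:
    t = rng.random() < 0.5
    experiment.append((t, response(t, z)))

observational_diff = diff_in_means(observational)
randomized_diff = diff_in_means(experiment)

print(f"true effect:             {TRUE_EFFECT:.2f}")
print(f"observational estimate:  {observational_diff:.2f}")
print(f"randomized estimate:     {randomized_diff:.2f}")
```

In this toy setup the observational comparison mixes the treatment effect with the lurking variable's effect, while the randomized comparison lands close to the assumed true effect, because the coin flip spreads the lurking variable evenly across both groups.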
And so to recap, sometimes two variables will be very well-correlated. But the association isn't what we call "causal." In many cases, there's a lurking variable, something behind the scenes, that's causing an increase or decrease in both variables, or maybe a decrease in one and an increase in the other.
The most valid way to prove causation is with a controlled, randomized experiment, although it's worth noting that strong evidence for causation can, in fact, also come from an observational study. And so we talked about correlation, the whole idea that correlation does not imply causation, and the idea that there might be a lurking variable working behind the scenes to cause increases in both variables. Good luck. And we'll see you next time.
Causation: A phenomenon whereby an increase in one variable directly leads to an increase or decrease in another variable.
Correlation: A statistic which measures the strength and direction of the linear association between two quantitative variables.
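As a small worked example of a statistic that measures the strength and direction of a linear association, here is a sketch of the usual Pearson correlation coefficient. The tutorial does not name a specific formula, so treating "correlation" as Pearson's r is an assumption, and the data points are invented. A perfect upward line gives r = 1, a perfect downward line gives r = -1, and a symmetric curve gives r = 0 even though the two variables are clearly related, because r only picks up linear association.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # y rises in a straight line with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # y falls in a straight line with x
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared, not linear

print(f"perfect upward line:   r = {r_up:.2f}")
print(f"perfect downward line: r = {r_down:.2f}")
print(f"symmetric curve:       r = {r_curve:.2f}")
```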