This tutorial covers correlation and causation. It is really important not to make the big mistake of saying that just because two things are correlated, one of them is causing the other to happen. Correlation does not imply causation.
So if you've picked two variables, and you fit a line of best fit to them, and you find out that r is 1, there's a perfect linear correlation between them. They are strongly associated. But you still cannot say that one variable causes the other variable to happen without doing some other tests and making some other assertions.
Correlation is just saying that the two variables or events have a linear association, that they go together: as one increases, the other increases too, or as one increases, the other decreases in a predictable way. Causation is saying that one event or variable causes the other one to happen. And we can only say causation when we've demonstrated it. And we demonstrate it by doing a randomized controlled experiment with a large sample. Absent that randomized controlled experiment with a large sample, you cannot say that there is causation.
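To make the idea of r concrete, here is a minimal sketch of computing the Pearson correlation coefficient by hand. The helper name `pearson_r` and the small made-up data lists are assumptions for illustration only; they are not part of the tutorial.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when the variables rise together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative data: one variable rises with x, the other falls.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # perfectly linear, increasing
falling = [10 - 3 * v for v in x]  # perfectly linear, decreasing

r_up = pearson_r(x, rising)
r_down = pearson_r(x, falling)
print(r_up, r_down)
```

An r near 1 or near -1 only describes how tightly the points follow a line. Nothing in the calculation knows which variable, if either, is the cause.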
Now if you find a strong correlation, one reason we can't say there's causation is that there could be a confounding variable. There could be something kind of lurking off to the side and confusing the relationship between the explanatory variable and the response variable.
So here, for example, suppose you're looking at the relationship between uniforms and test scores, and you find a strong relationship: students who wear uniforms, or who wear uniforms more often, have higher test scores. Now there could be a lurking variable. There could be a confounding variable, like parental income. Perhaps it's true that parents with more money are more likely to send their children to schools that require uniforms, and the extra money is also affecting the students' test scores. The parents are able to provide tutors and things like that. That confounding variable puts a question mark on the relationship between the explanatory variable and the response variable.
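The uniform example can be sketched as a small simulation. Everything here is an assumption made for illustration: the numbers are simulated, not real data, and the model deliberately gives uniforms no effect at all on scores. Income drives both uniform wearing and test scores, and a second version assigns uniforms by coin flip, the way a randomized experiment would.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed model: standardized parental income for each student.
income = [rng.gauss(0, 1) for _ in range(n)]

# Test score depends on income and noise only. Uniforms play no role.
score = [70 + 5 * inc + rng.gauss(0, 5) for inc in income]

# Observational setting: higher-income families are more likely
# to end up at a school with uniforms (1 = wears a uniform).
uniform_observed = [1 if inc + rng.gauss(0, 1) > 0 else 0 for inc in income]

# Randomized setting: a coin flip decides, independent of income.
uniform_randomized = [rng.randint(0, 1) for _ in range(n)]

r_observed = pearson_r(uniform_observed, score)
r_randomized = pearson_r(uniform_randomized, score)
print("observed:", round(r_observed, 3))
print("randomized:", round(r_randomized, 3))
```

In this sketch the observed correlation comes out clearly positive even though uniforms do nothing, because income sits behind both variables. With random assignment the link to income is broken and the correlation falls to roughly zero.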
So we know we still have correlation. They are associated with each other. But we don't know that we have causation. It could also be that the causation is reversed. Perhaps you studied whether or not people own minivans and how many children they have, and you find that if you own a minivan, you're more likely to have a baby.
So is it the minivan that is causing the babies to be born? Or is it the other way around, that once you have a lot of babies, you're more likely to buy a minivan in order to drive them around? Because we don't know what the direction of the cause and effect is, we can't say for sure that it's a causal relationship. We can only say that the two variables are correlated.
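One way to see why the correlation cannot settle the direction: r is symmetric, so swapping the two variables gives the same number. The sketch below uses a tiny made-up data set (hypothetical households, not real survey data) and an assumed helper named `pearson_r`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up households: 1 = owns a minivan, 0 = does not.
owns_minivan = [0, 0, 0, 1, 0, 1, 1, 1]
num_children = [0, 1, 1, 2, 2, 3, 3, 4]

r_minivan_then_children = pearson_r(owns_minivan, num_children)
r_children_then_minivan = pearson_r(num_children, owns_minivan)

# Both orderings give the same value, so r carries no information
# about which variable is the cause and which is the effect.
print(r_minivan_then_children, r_children_then_minivan)
```

The two printed values match, which is the point: the same correlation fits "minivans lead to children" and "children lead to minivans" equally well.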
This has been your tutorial on correlation and causation.