This tutorial is going to teach you about a specific statistical paradox called Simpson's Paradox.
There are many kinds of paradoxes, and Simpson's Paradox is just one of them.
When two sets of data are each split into the same subgroups, the mean for the first data set can be higher than the second in every subgroup, and yet when the data are looked at as a whole, the mean of the second set is higher than the first.
Simpson’s Paradox is a relationship that's present in groups, but reversed when the groups are combined.
A very famous example of Simpson’s Paradox took place in 1973. That year, UC Berkeley had a sex discrimination lawsuit filed against it, asserting that UCB was substantially favoring men over women in the admissions process for its graduate programs. The overall figures were as follows.
Overall, 977 men applied and 492 were accepted, which is a little over half. In contrast, of the 400 women who applied, only 148 were accepted, which is well under half. In fact, the proportions are about 50.3% versus 37%.
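The two overall percentages can be recomputed directly from the counts quoted above. This is only a small sketch: the dictionary and function names are made up for illustration, and the only data used are the four counts from the text. Note that the men's figure works out to roughly 50.36%, which the tutorial quotes as 50.3%.

```python
# Applicant and acceptance counts quoted in the tutorial text.
applied = {"men": 977, "women": 400}
accepted = {"men": 492, "women": 148}


def acceptance_rate(n_accepted, n_applied):
    """Return the share of applicants who were accepted, as a percentage."""
    return 100 * n_accepted / n_applied


# Overall acceptance rate for each group, ignoring departments entirely.
rates = {group: acceptance_rate(accepted[group], applied[group]) for group in applied}

for group, rate in rates.items():
    print(f"{group}: {rate:.2f}% accepted")
```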
The difference between 37% and 50.3% is huge, which is why the lawsuit was filed. In an effort to see exactly where the women were being discriminated against, the lawyers looked into the admissions by department.
You would expect to find a large discrepancy in certain departments. For this tutorial, we will look at the data for two departments, which we are calling Engineering and English (though the true numbers within these departments may have been different in the real case).
For the Engineering department, you can see that about 63% of men were accepted versus 68% of women. Women were accepted at a higher rate, so the discrepancy was not present in the Engineering department. You might then assume that the discrepancy occurs in the English department. However, women were accepted at a higher rate to the English department as well: 34.9% versus 33.3%.
So women were accepted at higher rates to the Engineering department and the English department, but much lower overall.
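The reversal described here can be checked mechanically. The sketch below uses only the percentages quoted in this tutorial; the variable names are hypothetical and chosen just for this illustration. It asks two questions: do women lead in every department, and do women lead overall? The paradox is present when the answers disagree.

```python
# Acceptance rates (in percent) quoted in the tutorial, by department.
dept_rates = {
    "Engineering": {"men": 63.0, "women": 68.0},
    "English": {"men": 33.3, "women": 34.9},
}
# Overall acceptance rates (in percent) quoted in the tutorial.
overall_rates = {"men": 50.3, "women": 37.0}

# Do women have the higher rate in every single department?
women_lead_every_dept = all(r["women"] > r["men"] for r in dept_rates.values())

# Do women have the higher rate when departments are combined?
women_lead_overall = overall_rates["women"] > overall_rates["men"]

# The association reverses when the two answers disagree.
is_reversal = women_lead_every_dept != women_lead_overall

print("Women lead in every department:", women_lead_every_dept)
print("Women lead overall:", women_lead_overall)
print("Association reverses on combining:", is_reversal)
```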
The explanation lies in how the applications were distributed across the two departments. A much larger share of the men applied to Engineering, so their 63% counts for far more in the men's overall (weighted average) admissions rate than the 68% does in the women's.
Only 25 of the 400 women who applied chose the Engineering department. That's not very many. And so that 68%, even though it's a high percentage, doesn't count nearly as much in the women's weighted average as the 34.9% from English does. So the 63% is weighted heavily for the men, while the 68% is weighted hardly at all for the women. And that's why you see that reversal of association.
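The weighting argument can be sketched numerically. Two assumptions are made here. First, the 25 is read as the number of women who applied to Engineering, leaving 375 of the 400 for English. Second, the tutorial does not give the men's split between departments, so the code backs out the share implied by the quoted rates (63%, 33.3%, and 50.3% overall). That share is an estimate from rounded percentages, not a reported figure. All names are hypothetical.

```python
def weighted_rate(rates, weights):
    """Weighted average of department rates; weights are applicant shares."""
    return sum(rate * weight for rate, weight in zip(rates, weights))


# Women: (Engineering, English) acceptance rates and applicant shares.
women_rates = (0.68, 0.349)
women_weights = (25 / 400, 375 / 400)  # assumes 25 women applied to Engineering
women_overall = weighted_rate(women_rates, women_weights)

# Men: the department split is not given, so solve
#   overall = eng_rate * share + english_rate * (1 - share)
# for the implied Engineering share.
men_rates = (0.63, 0.333)
men_overall = 0.503
men_eng_share = (men_overall - men_rates[1]) / (men_rates[0] - men_rates[1])

print(f"Women's overall rate from the weighted average: {women_overall:.1%}")
print(f"Share of women applying to Engineering: {women_weights[0]:.1%}")
print(f"Implied share of men applying to Engineering: {men_eng_share:.1%}")
```

Under these assumptions, the women's weighted average lands close to the 37% quoted earlier, and the implied Engineering share for men is many times larger than the women's, which is the imbalance that produces the reversal.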
Simpson's paradox is an association that the data show within each group when you group the data in specific ways, but that is reversed when the groups are combined. There are several paradoxes like this, of which Simpson's paradox is just one.
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS