Source: Tables created by the author
This tutorial is going to teach you about a specific statistical paradox called Simpson's Paradox. We'll start with an example, a very famous example. In 1973, UC Berkeley had a sex discrimination lawsuit filed against it, and this is why: the claim was that the university was substantially favoring men over women in the admissions process for its grad schools. It looks like 977 men applied to two of the departments, and 492 were accepted, which is a little over half, whereas of the 400 women who applied, only 148, well under half, were accepted. In fact, the proportions are 50.4% versus 37%. That is a huge difference. And that's why the lawsuit was filed.
So in an effort to see exactly where the women were being discriminated against, the lawyers looked into the admissions by department. You would expect there to be some large discrepancy in one or both departments. I'll call the two departments Engineering and English. I don't know that these are in fact the Engineering and English departments; the case is real, but I just made these names up.
So look at the Engineering department, and you can see that about 63% of the men were accepted, versus 17 of the 25 women, which is 68%. Women were accepted at a higher rate to the Engineering department. All right, in that case, you would assume that the discrepancy occurs in the English department. Well, when you look at the English department, women were accepted at a higher rate there as well: 34.9% versus 33.3%. So women were accepted at higher rates to both the Engineering department and the English department, but at a lower rate, way lower, overall.
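Here is a minimal Python sketch of that comparison, using only the figures quoted above. The women's English counts (131 of 375) are not stated directly; they are obtained by subtracting the Engineering figures from the totals. The men's department rates are entered as the rounded percentages, since their per-department counts are not given. All variable names are just illustrative.

```python
# Women's counts: 17 of 25 in Engineering is quoted directly; the English
# figures follow by subtraction from the totals (148 accepted of 400).
women = {
    "Engineering": (17, 25),
    "English": (148 - 17, 400 - 25),  # 131 of 375
}

# Men's department rates are quoted only as percentages, so they are
# entered as rounded rates rather than as counts.
men_rate = {"Engineering": 0.63, "English": 0.333}

women_rate = {
    dept: accepted / applied for dept, (accepted, applied) in women.items()
}

# Overall rates come from the quoted totals.
men_overall = 492 / 977
women_overall = 148 / 400

for dept in women:
    print(f"{dept}: men {men_rate[dept]:.1%}, women {women_rate[dept]:.1%}")
print(f"Overall: men {men_overall:.1%}, women {women_overall:.1%}")

# The paradox: women ahead in every department, yet behind overall.
women_ahead_in_each_dept = all(women_rate[d] > men_rate[d] for d in women)
women_behind_overall = women_overall < men_overall
print("Reversal:", women_ahead_in_each_dept and women_behind_overall)
```

Both conditions hold for these numbers, which is exactly the reversal being described.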
And this is what Simpson's Paradox is. It's a relationship that's present in each of the groups, but reversed when the groups are combined. The reasoning behind it is that each overall admission rate is a weighted average of the department rates, weighted by how many people applied to each department. If you take a look at how the men's applications were distributed, their 63% counted for a lot more in that weighted average than the 68% did for the women.
Look, only 25 of the 400 women applied to the Engineering department. That's not very many. And so that 68%, even though it's a high percentage, doesn't count nearly as much in the weighted average as the 34.9% does. So the 63% is weighted heavily for the men, whereas the 68% is weighted hardly at all for the women. And that's why you see that reversal of association.
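The weighting argument can be checked with a short Python sketch. For the women, the weights are just each department's share of the 400 applications. For the men, the per-department counts are not quoted, so the sketch backs out the Engineering weight implied by the rounded rates (63% and 33.3%) and the overall rate of 492 out of 977. Treat that implied weight as an approximation, not as a figure from the tables.

```python
# Women: weights are each department's share of the 400 applications.
women_applied = {"Engineering": 25, "English": 400 - 25}
women_rate = {"Engineering": 17 / 25, "English": (148 - 17) / (400 - 25)}

total_women = sum(women_applied.values())
women_weight = {d: n / total_women for d, n in women_applied.items()}

# The overall rate is the weighted average of the department rates.
women_overall = sum(women_weight[d] * women_rate[d] for d in women_applied)

# Men: solve  w * r_eng + (1 - w) * r_english = overall  for w,
# using the rounded percentages (an approximation).
men_rate = {"Engineering": 0.63, "English": 0.333}
men_overall = 492 / 977
men_eng_weight = (men_overall - men_rate["English"]) / (
    men_rate["Engineering"] - men_rate["English"]
)

print(f"Women's Engineering weight: {women_weight['Engineering']:.1%}")
print(f"Men's implied Engineering weight: {men_eng_weight:.1%}")
print(f"Women's overall rate from the weighted average: {women_overall:.1%}")
```

The women's 68% gets a weight of only 25 out of 400, while more than half of the men's weight sits on their 63%, so the men's overall rate lands much closer to their high Engineering rate.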
And so to recap, Simpson's Paradox is an association that the data show within each group when you split them up in a specific way, and that association gets reversed when the groups are combined. You'll see several paradoxes like this; Simpson's Paradox is one of them. As we learn more about paradoxes, we can hone our statistical thinking. Good luck and we'll see you next time.
In Simpson's Paradox, when two sets of data are each subdivided into the same groups, the mean of the first data set can be higher than that of the second in every group, yet when the data are looked at as a whole, the mean of the second set is higher than that of the first.