Simpson's Paradox

Description:

This lesson will explain Simpson's Paradox.

Tutorial

What's Covered

This tutorial is going to teach you about a specific statistical paradox called Simpson's Paradox. You will learn about:

  1. Simpson’s Paradox

1. Simpson's Paradox

There are many kinds of paradoxes, and Simpson's Paradox is just one of them.

Term to Know

Simpson's Paradox

A situation in which two sets of data are subdivided and the first set has the higher mean (or rate) within every subgroup, yet the second set has the higher mean when the subgroups are combined.

Simpson’s Paradox is a relationship that's present in groups, but reversed when the groups are combined.

Events to Know

A very famous example of Simpson’s Paradox took place in 1973. That year, UC Berkeley had a sex discrimination lawsuit filed against it, asserting that UCB was substantially favoring men over women in the admissions process for its grad schools. Here is the data:

          Applied   Accepted   Acceptance rate
Men         977       492          50.4%
Women       400       148          37.0%

As you can see, 977 men applied and 492 were accepted, which is a little over half. In contrast, of the 400 women who applied, only 148, well under half, were accepted. The proportions are about 50.4% (492/977) versus 37% (148/400).
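The overall rates can be checked directly from the four counts quoted above. This is a minimal Python sketch; the helper name `acceptance_rate` is only an illustrative choice.

```python
def acceptance_rate(accepted: int, applied: int) -> float:
    """Return the share of applicants who were accepted, as a percentage."""
    return 100 * accepted / applied

# Counts quoted in the tutorial text.
men_rate = acceptance_rate(accepted=492, applied=977)
women_rate = acceptance_rate(accepted=148, applied=400)

print(f"Men:   {men_rate:.2f}%")    # Men:   50.36%
print(f"Women: {women_rate:.2f}%")  # Women: 37.00%
print(f"Gap:   {men_rate - women_rate:.2f} percentage points")
```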

The gap between these two acceptance rates is large, which is why the lawsuit was filed. In an effort to see exactly where the women were being discriminated against, the lawyers looked into the admissions by department.

Think About It

You would expect that there would be a large discrepancy in certain departments. For this tutorial, we will look at the data for two departments, which we are calling Engineering and English (though the true numbers within these departments may have been different in the real case).

Start with the Engineering department. About 63% of the men who applied were accepted, versus 68% of the women. Women were accepted at a higher rate, so the discrepancy is not present in the Engineering department. You might then assume that the discrepancy occurs in the English department. However, women were accepted at a higher rate there as well: 34.9% versus 33.3% for men.

So women were accepted at higher rates in both the Engineering department and the English department, but at a much lower rate overall.

The explanation lies in how each group's applications were distributed across the two departments. Each overall rate is a weighted average of the two department rates, where the weights are the shares of that group's applicants in each department. The men's 63% counts for much more in their overall rate than the 68% does in the women's.

Only 25 of the 400 women applied to the Engineering department. That's not very many: 68% of 25 is 17 acceptances, while the other 375 women applied to English, where 131 (34.9%) were accepted. So that 68%, even though it's a high percentage, counts for far less in the women's weighted average than the 34.9% does. The 63% is weighted heavily for the men, while the 68% is weighted hardly at all for the women. And that's why you see the reversal of association.
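The weighted-average argument can be made concrete with a short Python sketch. It reads the "25 of the 400" figure as 25 of the 400 women applicants, which gives women's counts that follow from the numbers in the text. The men's per-department counts are not stated anywhere in the tutorial: the values below are a reconstruction chosen only because they reproduce the quoted 63%, 33.3%, and the totals of 977 applicants and 492 acceptances. The name `summarize` is likewise hypothetical.

```python
# (applied, accepted) per department.
# Women: 25 applied to Engineering, 68% of 25 = 17 accepted; the remaining
# 375 applied to English, with 148 - 17 = 131 accepted.
# Men: ASSUMED counts, reconstructed to match the quoted rates and totals.
data = {
    "men":   {"Engineering": (560, 353), "English": (417, 139)},
    "women": {"Engineering": (25, 17),   "English": (375, 131)},
}

def summarize(departments):
    """Return (department rates, application weights, overall rate)."""
    total_applied = sum(applied for applied, _ in departments.values())
    rates = {d: acc / app for d, (app, acc) in departments.items()}
    weights = {d: app / total_applied for d, (app, _) in departments.items()}
    # The overall rate is the weighted average of the department rates.
    overall = sum(weights[d] * rates[d] for d in departments)
    return rates, weights, overall

for group, departments in data.items():
    rates, weights, overall = summarize(departments)
    for dept in departments:
        print(f"{group:5s} {dept:11s} rate={rates[dept]:.1%} weight={weights[dept]:.1%}")
    print(f"{group:5s} overall     rate={overall:.1%}")
```

Under these assumed counts, Engineering carries a weight of more than half for the men but only 6.25% for the women, so the women's overall rate sits close to their English rate even though they lead in both departments.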


Summary

Simpson's Paradox occurs when an association that appears within each group of the data is reversed when the groups are combined. It can arise when the groups carry very different weights in the combined figures, as when most men applied to one department and most women to the other. There are several paradoxes like this, of which Simpson's Paradox is just one.

Thank you and good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS
