This tutorial covers paradoxes. A paradox is a situation where a phenomenon can be looked at in two contradictory ways. So depending on what you're looking at and how you're thinking of the situation, you can come up with one of two conclusions or one of two visions. And those visions contradict each other.
Now when you're interpreting data, you're going to be using intuition, and human intuition can be misleading. So it's important to analyze and resolve paradoxes. Now this isn't always possible. But whenever it is, we should do it.
This tutorial looks at Benford's Law. Other tutorials will look at different paradoxes. Benford's Law is a paradox about the first digit of the numbers in a data set. Random chance says that every digit has about an 11% chance of showing up as the first digit. The digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. But we wouldn't consider 0 as a first digit, so each of the other 9 digits has a 1 out of 9 chance, or about 11%. But in reality, we end up seeing a totally different pattern.
Now this pattern was studied by a man named Benford. He analyzed a variety of different data sets and found that a first digit of 1 actually appears about 30% of the time. A first digit of 2 appears about 17.61% of the time. And so on and so forth down to the digit 9, which appears in the first place only about 4.58% of the time.
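The percentages quoted here match the usual logarithmic statement of Benford's Law, where the chance of a first digit d is log10(1 + 1/d). The tutorial itself does not spell out this formula, so the following is a minimal sketch rather than the speaker's own method. The function name `benford_probability` is made up for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Chance that `digit` is the first digit, under the usual log10 form of Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("a first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# What intuition expects: each of the 9 possible first digits is equally likely.
uniform = 1 / 9

for d in range(1, 10):
    print(f"first digit {d}: Benford {benford_probability(d):.2%}, uniform {uniform:.2%}")
```

Run as written, the line for digit 1 comes out at about 30%, digit 2 at about 17.61%, and digit 9 at about 4.58%, against a flat 11.11% for the uniform guess.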
So this is very different from what our intuition expected. But a deeper understanding of logarithms would help us to see why Benford's discoveries and his realizations are, in fact, true. So depending on how you look at the situation, you come up with two contradictory views, but it is possible to resolve it. So this has been your tutorial on paradoxes.
This tutorial talks about Simpson's Paradox. Simpson's Paradox is a paradox that involves the means of samples. Now, when you have the data divided out, one result shows. But with the data aggregated, with it all combined together, the opposite result shows.
Another way of saying this is that when two sets of data are subdivided, the means for the first data set can be consistently higher than for the second in every subdivision, but when each set is looked at as a whole, the mean of the second set is higher than the first. This image here kind of gives a picture for that example. Here, we have the data for the blue separated out from the data for the red. And we can see that the blue has a positive trend, and the red has a positive trend.
But when you aggregate the data together and treat these eight points as a single set, then this black line is showing the trend for just this one set. And in that case, it's negative. So depending on whether you look at the data separated out by whatever category is dividing it or aggregate it together, the trend is going to reverse, and we're going to see different results. Let's look at another example.
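The reversal described for the blue and red points can be reproduced with a small sketch. The eight points below are invented for illustration and are not the ones in the tutorial's image. The `slope` helper is a plain least-squares slope written out by hand, and it is an assumption that the trend lines in the image were fitted this way.

```python
def slope(points):
    """Least-squares slope of the trend line through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Hypothetical data: each group rises on its own,
# but the blue group sits up and to the left of the red group.
blue = [(1, 6), (2, 7), (3, 8), (4, 9)]
red = [(6, 1), (7, 2), (8, 3), (9, 4)]

print("blue trend:", slope(blue))            # positive
print("red trend:", slope(red))              # positive
print("combined trend:", slope(blue + red))  # negative
```

Each group has a positive slope by itself, while the slope through all eight points together is negative, which is the same reversal the black line shows.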
In this example, we're talking about batting averages. And if we look at this, Tom is better against both lefties and righties than Joe is. Tom hits a 0.314 against righties, whereas Joe only hits 0.259. And Tom hits a 0.213 against lefties, whereas Joe hits a 0.200.
Now, if we go to look at their combined averages, somehow Joe's combined average is better. He's at 0.247, whereas Tom only has a 0.243. So this is showing Simpson's Paradox again because when we separate out the data, Tom is doing better. But when we combine the data, Joe is doing better. And let's look at how that occurs.
If we think about it, suppose Tom faces righties 30% of the time but lefties 70% of the time, and Joe faces righties 80% of the time but lefties only 20% of the time. Then this difference in how often they're seeing each type of pitcher is accounting for the differences. Now, both players are worse against the lefties, but Tom faces more lefties than Joe: he faces lefties 70% of the time, whereas Joe faces them 20% of the time. So Tom's average is dragged down further by the lefties than Joe's is. And this is why Tom's combined average is lower than Joe's combined average. So this is showing us exactly why this paradox is occurring. So this has been your tutorial on Simpson's Paradox.
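The arithmetic behind the batting example can be checked with a short sketch. It assumes the 30%/70% and 80%/20% figures are each player's share of at-bats, so that the combined average is a weighted mean of the two split averages. The function name `combined_average` is made up for illustration.

```python
def combined_average(avg_vs_righties, avg_vs_lefties, share_vs_righties):
    """Weighted mean of the two split averages, weighted by how often each pitcher type is faced."""
    share_vs_lefties = 1 - share_vs_righties
    return avg_vs_righties * share_vs_righties + avg_vs_lefties * share_vs_lefties


# Figures from the tutorial: split averages and the share of at-bats against righties.
tom = combined_average(0.314, 0.213, share_vs_righties=0.30)
joe = combined_average(0.259, 0.200, share_vs_righties=0.80)

print(f"Tom combined: {tom:.3f}")  # 0.243
print(f"Joe combined: {joe:.3f}")  # 0.247
```

Tom's better splits are weighted mostly toward his weaker 0.213, while Joe's are weighted mostly toward his stronger 0.259, so Joe comes out ahead once everything is combined.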