This tutorial is about statistical paradoxes.
Paradoxes are apparent contradictions between what you see and what you expected to see. Specifically, the one that we're going to talk about in this tutorial is called Benford's Law.
Paradox
An apparent contradiction between what our intuition tells us, and what is true in reality.
Statistics allows us to draw conclusions about things that we see. But sometimes, the phenomena that we see are counter to what we thought would happen. So these seeming contradictions are called paradoxes. If we understand them better, we can improve as statistical thinkers.
Suppose that you were going to create a phony checking account so that you could steal some money from people. To do so, you will need to create a checking account number for this phony account. For your fake account number, would it matter which digit you selected as the first digit?
You probably thought that, no, it really wouldn't matter which number you chose as the first in your fake account number. All the numbers one through nine are equally likely to be selected for the first number, so if the account number is randomly selected, it wouldn’t really matter which number you selected first. That's your intuition.
However, your intuition that all the numbers 1-9 are equally likely to be selected for the first number is actually wrong. What's really the case is that checking account numbers are most likely to start with 1.
In published data, numbers of many kinds, including checking account numbers, are most likely to start with 1.
Benford's Law, also called the First Digit Law, says that the first digit of almost any real-life data, including financial reports, follows a pattern, with the number 1 being the most likely, 2 being the next most likely, and so on in a specific order.
Benford's Law
A law that shows that most of the numbers that are published, regardless of topic, begin with smaller numbers, and very few of them lead with larger numbers. The most common first digit is 1, the least common is 9.
This law shows that only about 10% of account numbers will start with a four, whereas about 30% will start with a one.
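The percentages quoted above can be reproduced with the formula usually given for Benford's Law, P(d) = log10(1 + 1/d), for a lead digit d from 1 to 9. This tutorial doesn't state that formula itself, so treat the Python below as a minimal sketch under that assumption. The function names are made up for illustration.

```python
import math


def benford_first_digit_probability(digit: int) -> float:
    """Chance that the lead digit equals `digit` (1-9), assuming
    the standard Benford formula P(d) = log10(1 + 1/d)."""
    if not 1 <= digit <= 9:
        raise ValueError("lead digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def benford_first_digit_table() -> dict:
    """Map each possible lead digit 1-9 to its Benford probability."""
    return {d: benford_first_digit_probability(d) for d in range(1, 10)}


if __name__ == "__main__":
    # Print one line per lead digit, formatted as a percentage.
    for digit, probability in benford_first_digit_table().items():
        print(f"{digit}: {probability:.1%}")
```

Under this formula, a lead digit of 1 comes out at about 30.1% and a lead digit of 4 at about 9.7%, which matches the rounded figures above.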
People who try to steal identities are likely to use more 4s, 5s, and 6s, because they think those are the middle. But really, it's the number 1 that's the most likely as a lead number.
Benford examined many different sets of data, including:
Benford looked at these different values and saw that almost across the board, 1 is the most likely lead number, 2 is the next most likely lead number, and 9 is the least likely lead number.
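If you want to check a set of numbers yourself, you can tally the lead digits and compare the shares. Here is a minimal sketch in Python. The function names are hypothetical, and the sample data (the first 100 powers of 2) is only a stand-in chosen for illustration, not one of the data sets Benford examined.

```python
from collections import Counter


def lead_digit(value: float) -> int:
    """Return the first significant digit of a nonzero number."""
    value = abs(value)
    if value == 0:
        raise ValueError("zero has no lead digit")
    # Shift the decimal point until the value sits between 1 and 10.
    while value >= 10:
        value /= 10
    while value < 1:
        value *= 10
    return int(value)


def lead_digit_shares(values) -> dict:
    """Share of the nonzero values that start with each digit 1-9."""
    digits = [lead_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    # Stand-in data for illustration only: 2**0 through 2**99.
    sample = [2 ** n for n in range(100)]
    for digit, share in lead_digit_shares(sample).items():
        print(f"{digit}: {share:.0%}")
```

With this stand-in data, 30 of the 100 values start with 1, and far fewer start with 9.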
Ask yourself, if the lead number follows that law, then does that mean that the second digit must also follow that law?
If you thought, “yes,” then your intuition again led you astray. While the first digit follows Benford's Law, the second digit does not.
As you can see in the image above, the second digit is approximately equally likely to be any of the numbers 0-9. There is a slight favoring of the lower numbers, but all are about 10%. The second digit has roughly equal frequency.
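The "about 10%" figure for the second digit can be checked with numbers too. A commonly used extension of Benford's Law gives the chance that the second digit is d as the sum, over lead digits k from 1 to 9, of log10(1 + 1/(10k + d)). This tutorial doesn't state that formula, so the Python below is a sketch under that assumption, with a hypothetical function name.

```python
import math


def benford_second_digit_probability(digit: int) -> float:
    """Chance that the second digit equals `digit` (0-9), assuming the
    usual two-digit extension of Benford's Law."""
    if not 0 <= digit <= 9:
        raise ValueError("second digit must be between 0 and 9")
    # Add up the chance of every two-digit start "k d" for lead digits k = 1..9.
    return sum(math.log10(1 + 1 / (10 * lead + digit)) for lead in range(1, 10))


if __name__ == "__main__":
    for digit in range(10):
        print(f"{digit}: {benford_second_digit_probability(digit):.1%}")
```

Under this formula, a second digit of 0 comes out near 12% and a second digit of 9 near 8.5%. That is the slight favoring of lower numbers, with every digit still close to 10%.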
The reason for a phenomenon like what you saw in the above examples has to do with exponential growth, which looks like this:
As you can see, the ranges 100-200, 200-300, and 300-400 are equally spaced. However, more values on the x-axis produce a value between 100 and 200 than produce a value between 200 and 300. That amount keeps diminishing as you move to the right along the x-axis.
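You can see the same effect with numbers instead of a picture. The sketch below assumes a quantity that starts at 100 and grows by a made-up 1% per step, then asks how many steps it spends in each range. The starting value, the growth rate, the step count, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter


def steps_between(low: float, high: float, growth_rate: float) -> float:
    """Steps needed to grow from `low` to `high` at a fixed rate per step."""
    return math.log(high / low) / math.log(1 + growth_rate)


def lead_digit(value: float) -> int:
    """Return the first significant digit of a positive number."""
    while value >= 10:
        value /= 10
    while value < 1:
        value *= 10
    return int(value)


def lead_digit_shares_under_growth(start=100.0, growth_rate=0.01, steps=5000) -> dict:
    """Let a value grow step by step and record how often each lead digit shows up."""
    counts = Counter()
    value = start
    for _ in range(steps):
        counts[lead_digit(value)] += 1
        value *= 1 + growth_rate
    return {d: counts[d] / steps for d in range(1, 10)}


if __name__ == "__main__":
    for low, high in [(100, 200), (200, 300), (300, 400)]:
        print(f"{low}-{high}: {steps_between(low, high, 0.01):.1f} steps")
    for digit, share in lead_digit_shares_under_growth().items():
        print(f"lead digit {digit}: {share:.1%}")
```

At 1% growth, getting from 100 to 200 takes about 70 steps, from 200 to 300 about 41, and from 300 to 400 about 29. The quantity spends more steps with a lead digit of 1 than with any other digit, which is the pattern Benford's Law describes.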
A paradox is a seeming contradiction between what you think should happen versus what's actually happening. The First Digit Law, which is Benford's Law, is one of these paradoxes. We thought that we would find a uniform distribution among first digits of certain numbers that we see, but 1 is a much more common lead number than any other. Not all numbers occur with equal frequency as the lead digit, which is related to exponential growth.
Once you understand paradoxes better, you’ll be able to hone your statistical thinking and be more precise.
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS