This tutorial covers the normal approximation to the binomial distribution. Specifically, you will review the binomial distribution's center, spread, and shape, and then see when a normal distribution can stand in for it.
Here is a review about the binomial distribution itself.
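The formula itself does not appear in this text. Assuming it is the standard binomial probability formula, with n trials, probability of success p, and k successes, it would read:

```latex
P(X = k) = \binom{n}{k}\, p^{k}\, (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
```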
Using that formula, you can create a probability distribution for all the values of k, zero successes, one success, two successes, all the way up to n successes.
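As a rough sketch of building that distribution, the snippet below evaluates the standard binomial formula for every k from 0 to n. The function names `binomial_pmf` and `binomial_distribution` are made up for this illustration, and the die example (six rolls, counting threes) is borrowed from later in the tutorial.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_distribution(n, p):
    """List of probabilities for k = 0, 1, ..., n successes."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

# Illustrative case: six rolls of a fair die, success = rolling a three.
dist = binomial_distribution(6, 1 / 6)
for k, prob in enumerate(dist):
    print(k, round(prob, 4))
```

The probabilities in the list add up to 1, which is a quick way to check that nothing was left out.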
That can be made into a histogram, where the x-axis shows the values of k, the number of successes, and the y-axis shows the relative frequency of each number of successes. Each bar, starting with the one at zero, rises to the height corresponding to its probability.
Just like all distributions, that histogram is going to have a mean and a standard deviation. The mean is pretty obvious to calculate. Suppose you rolled a fair die six times. How many threes would you expect? What if you rolled it 60 times or 600 times? How many threes would you expect?
What you should be thinking is if you rolled it six times, you'd expect one of them to be a three. If you rolled it 60 times you'd expect about 10 threes. If you rolled it 600 times, you'd expect about 100 threes.
In each case you multiplied the number of rolls by 1/6, because 1/6 is the probability of rolling a three. For example, 1/6 of six is one.
So the average, or the expected value, is going to be the number of trials times the probability of success. That is how 60 trials times 1/6, the probability of a three, gives you 10 as the expected value.
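The formula for the standard deviation is fairly compact: it is the square root of n times p times q, where q = 1 − p is the probability of failure.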
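As a quick numerical check, here is a minimal sketch that computes the mean (n times p) and the standard binomial standard deviation (the square root of n times p times q) for the die-rolling cases above. The helper name `binomial_mean_sd` is invented for this example.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial distribution."""
    q = 1 - p                    # probability of failure
    return n * p, sqrt(n * p * q)

# Rolling a fair die 6, 60, and 600 times, counting threes.
for n in (6, 60, 600):
    mean, sd = binomial_mean_sd(n, 1 / 6)
    print(f"n={n}: mean={mean:.1f}, sd={sd:.3f}")
```

The means come out to the 1, 10, and 100 threes described above, while the standard deviation grows more slowly because it depends on the square root of n.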
Every distribution has three key features: center, spread, and shape.
Center and spread you just dealt with by finding the mean and the standard deviation. But what about the shape?
The shape of this distribution is affected by two things: n and p.
Look at this distribution here, where there were 10 trials and the probability of success was 0.925. Notice that when the probability of success is very high, the distribution is heavily skewed to the left.
When the probability of success is very low, the distribution is heavily skewed to the right.
When the probability of success is near 0.5, the distribution becomes nearly symmetric.
That's what you should see when you look at the binomial distribution.
Look at how n affects the shape of the distribution. When you had 10 trials, and a probability of success of 0.4, it was fairly symmetric.
With 100 trials, it's still fairly symmetric:
When the probability of success was very high, the shape was skewed. But if you take a look, it is heavily skewed only at 10 trials.
At 100 trials, it's nearly symmetric.
It's a little skewed to the left, but not heavily.
How about when p is very low? At 10 trials, this was heavily skewed to the right.
At 100 trials, this distribution is only slightly skewed to the right:
This is an interesting fact: when n is low, the skew, if any, is more prominent, and when n is high, the distribution is approximately normal. The only exceptions are when the value of p is very low or very high.
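One way to put numbers on this pattern is to compute a skewness value (the third central moment divided by the cubed standard deviation) directly from the binomial probabilities. This is only a sketch: the function name `binomial_skewness` is invented here, and p = 0.075 is an assumed stand-in for "very low", since the text does not state the value used. Negative results mean skewed left, positive mean skewed right, and values near zero mean nearly symmetric.

```python
from math import comb

def binomial_skewness(n, p):
    """Skewness computed from the binomial probabilities themselves."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return third / var ** 1.5

# p = 0.925 and p = 0.4 come from the text; p = 0.075 is an assumed low value.
for p in (0.925, 0.4, 0.075):
    for n in (10, 100):
        print(f"n={n:3d}, p={p}: skewness={binomial_skewness(n, p):+.3f}")
```

For each p, the skewness at 100 trials is closer to zero than at 10 trials, which matches the histograms described above.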
This is a big deal. It means that when you have a large number of trials, the distribution of binomial probabilities is nearly normal, with the mean and the standard deviation that you found earlier. Ultimately, the binomial distribution with parameters n and p (the two values that make the binomial look the way it does) looks a lot like the normal distribution with that same mean and standard deviation.
For this to work, n has to be large enough to satisfy two conditions: np and nq, where q = 1 − p, both have to be at least 10.
This means that the distribution has to sit far enough away from both the left-hand end (zero successes) and the right-hand end (n successes). The distributions you looked at appeared normal when they were safely in the middle of the range of possible values, and not crowded against either end. Both conditions have to be satisfied.
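A minimal sketch of that check follows, assuming the two conditions are the usual rule of thumb that np and nq are each at least 10 (the worked example later in this tutorial describes them as "both bigger than 10"). The function name `normal_approx_ok` and the `threshold` parameter are hypothetical.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both n*p and n*q reach the threshold (q = 1 - p)."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

# Cases taken from the tutorial's examples.
print(normal_approx_ok(10, 0.4))     # False: np = 4
print(normal_approx_ok(100, 0.4))    # True: np = 40, nq = 60
print(normal_approx_ok(100, 0.925))  # False: nq = 7.5
```

The last case lines up with the earlier observation that at 100 trials and p = 0.925 the distribution is still a little skewed to the left.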
This makes looking at a lot of these problems a whole lot easier.
Suppose a baseball player gets a hit 28% of the time when he comes to bat. What’s the probability that he gets over 30 hits in his next 95 at bats?
The old way, you would have to find the probability that he gets exactly 31 hits, plus the probability that he gets exactly 32, all the way up to the probability that he gets exactly 95 hits. That's 65 individual calculations to do.
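For comparison, the "old way" is tedious by hand but easy for a computer. The sketch below adds up the 65 exact binomial probabilities, assuming the standard binomial formula; the variable names are chosen just for this example.

```python
from math import comb

n, p = 95, 0.28            # at bats and probability of a hit
ks = range(31, n + 1)      # "over 30 hits" means 31, 32, ..., 95

# Sum P(X = k) over every qualifying number of hits.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks)

print(len(ks))             # 65 individual terms
print(round(exact, 4))
```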
The new way uses the normal approximation.
Use the binomial mean, 95 × 0.28 = 26.6, and the binomial standard deviation, √(95 × 0.28 × 0.72) ≈ 4.376, as the mean and standard deviation of a normal distribution. Both conditions are satisfied, since np = 26.6 and nq = 68.4 are both bigger than 10, and so the binomial distribution is going to look very much like this normal distribution.
Use the normal distribution, calculate out a z-score, and find the probability that way.
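The z-score step could be sketched as follows, using `NormalDist` from Python's standard `statistics` module. The cutoff of 30 hits is used directly, as the text describes; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 95, 0.28
mean = n * p                       # 26.6
sd = sqrt(n * p * (1 - p))         # about 4.376

z = (30 - mean) / sd               # z-score for 30 hits
prob = 1 - NormalDist().cdf(z)     # area to the right of z

print(round(z, 3), round(prob, 4))
```

The z-score comes out a little under 0.8, which puts the probability of more than 30 hits at roughly 0.22.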
This is almost the same as what you would get using the binomial calculations.
The normal distribution is a good approximation for the binomial under certain conditions: n has to be large, and p must not be too extreme, neither too high nor too low.
You can use the mean and standard deviation of the binomial as the mean and standard deviation for the normal, and use z-scores to find the probabilities. This simplifies the problem.
Source: This work adapted from Sophia Author Jonathan Osters.
If a random variable has a binomial distribution, the number of trials is sufficiently large, and the probability of success is not too close to 0 or 1, the variable's distribution can be approximately modeled using a normal distribution.