Source: Image of graph created by Ryan Backman
Hi, this tutorial covers the normal distribution. Let's start by looking at the following dot plots. There are three of them, each showing measurements of the same variable but with a different sample size: 100, 1,000, and 10,000. So what are some similarities, and what are some differences?
Well, all of the distributions seem relatively symmetric. We could draw a fairly symmetric shape over the top of each of them, which suggests that the mean and median are in similar locations. They also have roughly a bell shape, like a church bell. The major difference is that as the sample size got bigger, the distributions became more and more bell-shaped.
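The original dot plots aren't reproduced here, but the pattern can be imitated with a small simulation. This is a minimal sketch, assuming the variable is drawn from a normal distribution with mean 270 and standard deviation 15 (the pregnancy parameters used later in the tutorial; the data behind the original plots is unknown). The function name and seed are illustrative choices. For each sample size it reports the sample mean and median, which for symmetric data should land close together.

```python
import random
import statistics

def summarize_samples(sizes=(100, 1_000, 10_000), mu=270.0, sigma=15.0, seed=1):
    """Draw a normal sample of each size and return {n: (mean, median)}."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = {}
    for n in sizes:
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        results[n] = (statistics.mean(sample), statistics.median(sample))
    return results

# Print the two center measures side by side for each sample size.
for n, (mean, median) in summarize_samples().items():
    print(f"n={n:>6}: mean={mean:.2f}, median={median:.2f}")
```

Larger samples tend to give a mean and median that agree more closely, mirroring how the larger dot plots look more cleanly bell-shaped.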
What all of those distributions look like is what's called a normal, or Gaussian, distribution. The name Gaussian comes from Carl Friedrich Gauss, a famous mathematician who did significant work with the normal distribution. A normal distribution is a single-peaked, bell-shaped, symmetric distribution in which the mean, median, and mode are all in the same place. It's a very important distribution in statistics, and it shows up in many different types of data.
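Python's standard library can illustrate the "mean, median, and mode in the same place" property directly. The sketch below uses `statistics.NormalDist` with the pregnancy parameters from the example that follows (mean 270, standard deviation 15); the variable name `dist` and the distances checked are arbitrary choices.

```python
from statistics import NormalDist

# Normal model with mean 270 and standard deviation 15.
dist = NormalDist(mu=270, sigma=15)

# For a normal distribution the three measures of center coincide.
print(dist.mean, dist.median, dist.mode)  # 270.0 270.0 270.0

# Symmetry: the density is equal at equal distances on either side of the mean.
for d in (5, 15, 30):
    print(d, dist.pdf(270 - d), dist.pdf(270 + d))
```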
Now, if the mean and standard deviation of a normal variable are known, the variable's distribution is completely characterized. Let's look at an example and see what the distribution looks like. The duration of human pregnancies is approximately normal, with a mean of mu = 270 days and a standard deviation of sigma = 15 days. Notice we're using mu and sigma because these are population parameters: we're now describing the whole population of human pregnancies.
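One way to see why the mean and standard deviation are enough is the normal density formula: mu and sigma are its only parameters. Plugging in the pregnancy values (mu = 270 days, sigma = 15 days, so 2 sigma squared = 450) gives the specific curve for this example:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
f(x) = \frac{1}{15\sqrt{2\pi}} \exp\!\left(-\frac{(x-270)^2}{450}\right).
```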
Since the distribution is approximately normal, I'll start with a number line and draw a normal curve over it. Then I usually draw a dashed line down from the peak to mark the mean, which is always right in the middle. I'll let x be the length of one pregnancy.
Now, I know that one standard deviation is 15 days. On a normal curve, the inflection points, where the curve switches from bending downward to bending upward, sit exactly one standard deviation from the mean. So one standard deviation above the mean is at 270 + 15 = 285. Going the same distance below the mean, one standard deviation below, gives 270 - 15 = 255.
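The claim that the inflection points sit at mu plus or minus sigma can be checked numerically. This sketch, written from scratch with only the standard library, uses the standard calculus result f''(x) = f(x) ((x - mu)^2 - sigma^2) / sigma^4 for the normal density, then scans half-day points between 230.5 and 310.5 for places where the sign of f'' flips. The helper names and the scan range are illustrative choices.

```python
import math

MU, SIGMA = 270.0, 15.0  # pregnancy example: mean and SD in days

def pdf(x, mu=MU, sigma=SIGMA):
    """Normal probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def pdf_second_derivative(x, mu=MU, sigma=SIGMA):
    """f''(x) = f(x) * ((x - mu)^2 - sigma^2) / sigma^4 for the normal density."""
    return pdf(x, mu, sigma) * ((x - mu) ** 2 - sigma ** 2) / sigma ** 4

def inflection_points(lo=230.5, hi=310.5, step=1.0):
    """Return midpoints of grid intervals where f'' changes sign."""
    points = []
    x = lo
    while x + step <= hi:
        a, b = pdf_second_derivative(x), pdf_second_derivative(x + step)
        if (a > 0) != (b > 0):  # concavity flips inside [x, x + step]
            points.append((x + x + step) / 2)
        x += step
    return points

print(inflection_points())  # [255.0, 285.0]
```

Between 255 and 285 the second derivative is negative (the curve bends downward); outside that range it is positive.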
Remember, one way to think of the standard deviation is as the typical distance from the mean. So it's typical for a pregnancy to last anywhere between 255 and 285 days. We can keep going: another standard deviation above the mean puts us at 300, two standard deviations above. Another standard deviation below 255 gives 240, two standard deviations below the mean.
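To put a number on "typical", the sketch below uses Python's `statistics.NormalDist` to compute the share of pregnancies between the marked values under the stated normal model. The helper name `share_between` is illustrative. The roughly 68% and 95% figures it produces are the familiar empirical-rule values that hold for any normal distribution; they are not stated in the original tutorial.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=270, sigma=15)

def share_between(lo, hi, dist=pregnancy):
    """Probability that the normal variable falls between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)

within_one_sd = share_between(255, 285)  # mean plus or minus 1 SD
within_two_sd = share_between(240, 300)  # mean plus or minus 2 SD
print(f"255-285 days: {within_one_sd:.1%}")  # 68.3%
print(f"240-300 days: {within_two_sd:.1%}")  # 95.4%
```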
If you imagine dots underneath this curve, one for each person we sampled, the biggest cluster would be right around the mean. As the pregnancy length gets larger and larger, the stacks of dots get smaller and smaller, and the same happens as you move below the mean. Out at 315, three standard deviations above the mean, the stacks of dots would be very small. Likewise, below 240 the stacks of dots would again be very small.
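The "stacks of dots" picture can be imitated with a simulation. This is a rough sketch, assuming 10,000 draws from a normal model with mean 270 and standard deviation 15 via `random.gauss`; the function name, the 15-day bin width (one standard deviation), the seed, and the one-star-per-100-observations scale are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter

def dot_plot(n=10_000, mu=270.0, sigma=15.0, width=15.0, seed=7, per_dot=100):
    """Simulate n pregnancy lengths and bin them into width-day bins.

    Returns a Counter mapping each bin's left edge to its count, and prints
    one '*' per `per_dot` observations so the bell shape is visible.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        # Bins are aligned on the mean: ..., [255, 270), [270, 285), ...
        left = mu + width * math.floor((x - mu) / width)
        counts[left] += 1
    for left in sorted(counts):
        print(f"{left:6.0f}-{left + width:<4.0f} {'*' * (counts[left] // per_dot)}")
    return counts

counts = dot_plot()
```

The two bins next to the mean hold the most dots, and bins beyond three standard deviations (above 315 or below 225) hold only a handful.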
So this is a good picture of, and a good way to characterize, the distribution of the lengths of human pregnancies using a normal distribution. That has been the tutorial on normal distributions. Thanks for watching.