Source: Image of graphs and pool balls created by Jonathan Osters; Dot Plot, Public Domain, http://commons.wikimedia.org/wiki/File:Dotplot_of_random_values_2.png; Normal Distribution Graph created from public domain file, http://commons.wikimedia.org/wiki/File:Dotplot_of_random_values_2.png; Poisson Distribution, Creative Commons, by Skbkekas, http://commons.wikimedia.org/wiki/File:Poisson_pmf.svg
This tutorial is going to talk to you about distributions. Now, a distribution is a way to visually show how many times a variable takes a certain value, so it's the values the variable takes and how often. To that end, one way of showing a distribution is in a frequency table. If we're talking about these pool balls here, two are yellow, two are blue, two are red, et cetera, all the way down to one that is black. This visually shows how often the variable color takes each value, such as yellow. So this is a legitimate distribution by that definition.
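The frequency table just described can be sketched in a few lines of Python. This is only an illustration: the list below includes just the colors named above (yellow, blue, red, black) and leaves out the "et cetera" colors rather than guessing them, and the names `ball_colors` and `frequency_table` are made up for this example.

```python
from collections import Counter

# Hypothetical observations: only the colors named in the tutorial.
# The "et cetera" colors are left out rather than guessed.
ball_colors = ["yellow", "yellow", "blue", "blue", "red", "red", "black"]

def frequency_table(values):
    """Return a mapping from each value to how often it occurs."""
    return Counter(values)

table = frequency_table(ball_colors)

# Print one row per value: the value the variable takes, and how often.
for color, count in table.items():
    print(f"{color:<8}{count}")
```

Each row pairs a value of the variable color with its count, which is exactly the "values and how often" idea.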
Some distributions, like pie charts and bar graphs, are for qualitative data, where the variable values are categories. Both of the distributions here, the pie chart and the bar graph, are showing the same data set.
There are also distributions that we can use for quantitative data. This right here deals with just stacking dots on top of each other. It's a very simple plot to make, and it's called a dot plot. This one here is called a histogram. And don't worry about the terminology. All of these will be talked about in later tutorials. This one is called a stem and leaf plot, and this one is called a time series.
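To get a feel for how simple a dot plot is, here is a rough text-only sketch in Python. It is not the plot from the slide: the `scores` data are invented, the function name `dot_plot` is hypothetical, and the plot is turned on its side, so dots stack along rows instead of columns.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one row per distinct value, one dot per observation."""
    counts = Counter(values)
    lines = []
    for value in sorted(counts):
        # Right-align the value, then draw one dot for each time it occurs.
        lines.append(f"{value:>3} | " + "." * counts[value])
    return "\n".join(lines)

# Hypothetical quantitative data, made up for illustration.
scores = [3, 5, 4, 5, 6, 5, 4, 7, 5, 6]
print(dot_plot(scores))
```

Every observation gets its own dot, so nothing about the data set is lost in this kind of display.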
And finally, a distribution might be described by some mathematical rule. For instance, the heights of people might be described by this single-peaked distribution. It follows a mathematical rule, and it's called the normal distribution. Or you might have something that follows the Poisson distribution. Both of these are distributions that we'll learn about in different tutorials, but it's good to know that there are distributions that do, in fact, follow mathematical rules, and are not strictly data-driven.
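As a small preview of what "described by a mathematical rule" means, the standard formulas for these two distributions can be written as short Python functions. The formulas themselves are the usual textbook ones; the function names and the example numbers (a mean height of 170 with a standard deviation of 10, and a rate of 3) are assumptions chosen only for illustration.

```python
import math

def normal_pdf(x, mean, std_dev):
    """Density of the normal distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))

def poisson_pmf(k, rate):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical parameters: the curve has a single peak at the mean.
for height in (150, 160, 170, 180, 190):
    print(height, round(normal_pdf(height, mean=170, std_dev=10), 4))

# Hypothetical rate of 3 events per interval.
for k in range(6):
    print(k, round(poisson_pmf(k, rate=3), 4))
```

No data set is needed here: once the parameters are chosen, the rule alone says how often each value should occur.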
So, why are there so many different kinds of distributions? We just looked through pie charts, bar graphs, dot plots, histograms, stem and leaf plots, and time series. I just glossed over them, but we saw that there were lots of different kinds.
Why are there so many different kinds? The point of a distribution is to make the data, which can sometimes be a large and unwieldy data set, simpler to understand. We want to make it easy on ourselves. And there are many different kinds because different distributions lend themselves better to different data sets.
For instance, a dot plot, which was that very first one on the last page, is better for small data sets whose values are close together, whereas certain other distributions are better for larger data sets. A histogram is better than a dot plot when the data are very spread out. Each distribution has its own situation for which it's ideal. The data will tell us which distribution we would like to use.
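To see why spread-out data suit a histogram, here is a rough sketch of the grouping step behind one. The data, the bin width of 20, and the function name `histogram_counts` are all assumptions made up for this example; real histograms can choose their bins differently.

```python
def histogram_counts(values, bin_width):
    """Count observations per bin, where each bin covers [start, start + bin_width)."""
    counts = {}
    for v in values:
        # Floor the value down to the start of the bin it falls in.
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical spread-out data: no value repeats, so a dot plot
# would just be a long row of single dots.
measurements = [12, 47, 53, 58, 71, 75, 88, 93, 104, 119]

for start, count in histogram_counts(measurements, bin_width=20).items():
    print(f"{start:>3}-{start + 19:<3} | " + "#" * count)
```

Grouping nearby values into bins trades away the individual observations for a shorter summary, which is what makes the spread-out data readable. Note that this sketch only lists bins that contain at least one observation.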
And so, to recap: there are a lot of distributions, and the point of all of them is to visually display your data so the reader can take a large data set and succinctly understand what's going on with it. It's a quick summary. Some distributions contain every observation, every data point, and some only contain summaries. All those distributions that we talked about have their own tutorial, so good luck. Watch those. We'll see you next time.
Distribution: A way to visually display the values a variable takes and how often it takes each value.