A data set is more than a list of numbers or values; it comes with context, such as the units, the type of measurement used, or some other descriptor.
A variable is any characteristic of the individual members of the population that can be measured. A variable of interest can take on different values for each member of the population.
EXAMPLE: Suppose we are interested in the variable of height for a group of people. Its value varies from person to person because people have different heights.
A distribution is a way to visually show how many times a variable takes a certain value; it is the values the variable takes and how often they show up. There are many kinds of distributions:
|Types of Distributions||Description||Examples|
|Frequency tables||Can visually show how often a variable takes on a certain value.||
|Qualitative Data||The variables in these distributions are categories.||
|Quantitative Data||The variables in these distributions are measures of values or counts.||
|Mathematical Rules||Can visually show variables through a certain pattern and are not strictly data-driven.||
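As a rough illustration of "the values the variable takes and how often they show up," the sketch below builds a frequency table for one qualitative and one quantitative variable. The data (`eye_colors`, `siblings`) and the helper name `frequency_table` are made up for this example and do not come from the tutorial.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value.

    Hypothetical helper: it pairs each distinct value of a variable
    with the number of times that value appears in the data.
    """
    counts = Counter(values)
    return sorted(counts.items())

# Made-up qualitative data: the values are categories.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Made-up quantitative data: the values are counts.
siblings = [0, 1, 1, 2, 0, 1, 3, 1]

for label, data in [("Eye color", eye_colors), ("Siblings", siblings)]:
    print(label)
    for value, count in frequency_table(data):
        print(f"  {value!s:<6} {count}")
```

The same function handles both kinds of data because a frequency table only needs to count repeated values; what changes between qualitative and quantitative data is how the result is best displayed.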
Why are there so many different kinds of distributions? The point of a distribution is to make the data, possibly a large and unwieldy data set, simpler to understand, both for yourself and for your readers. Different kinds of distributions therefore lend themselves better to different types of data sets.
EXAMPLE: A dot plot is better for data that are close together and have few distinct values, whereas certain other distributions are better for larger data sets. A histogram is better than a dot plot when the data are very spread out.
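To make the contrast concrete, here is a minimal text-based sketch, assuming whole-number data. The data sets `close` and `spread`, the function names, and the bin width of 20 are all illustrative choices, not part of the original tutorial.

```python
from collections import Counter

def dot_plot(values):
    """One row per distinct value, one dot per occurrence."""
    counts = Counter(values)
    return [f"{v:>4} | " + "." * counts[v] for v in sorted(counts)]

def bin_counts(values, bin_width):
    """Count how many values fall in each equal-width bin.

    Bins are labeled by their starting value. Empty bins between the
    smallest and largest value are kept with a count of 0.
    Assumes a non-empty list of integers and an integer bin_width.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start: counts[start]
            for start in range(first, last + bin_width, bin_width)}

def histogram(values, bin_width):
    """One row per bin, one '#' per value that falls in the bin."""
    rows = []
    for start, count in bin_counts(values, bin_width).items():
        label = f"{start}-{start + bin_width - 1}"
        rows.append(f"{label:>7} | " + "#" * count)
    return rows

# Made-up data that are close together with few distinct values.
close = [2, 3, 3, 4, 4, 4, 5]

# Made-up data that are very spread out.
spread = [3, 18, 25, 41, 47, 62, 88, 95]

print("\n".join(dot_plot(close)))
print("\n".join(histogram(spread, 20)))
```

A dot plot of `spread` would need one row for each of its eight distinct values, each holding a single dot, which shows very little. Grouping the values into bins is what lets the histogram summarize spread-out data.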
You can determine which kind of distribution to use based on the kind of data you have.
Source: Adapted from Sophia tutorial by Jonathan Osters.