Histograms


Hi, this tutorial covers histograms. So let's take a look at some data to begin with. So the observed daily high temperatures in degrees Fahrenheit of St. Paul, Minnesota from June 14, 2012 to July 8, 2012, a span of 25 days, are listed below. So each of these data values represents the high temperature for that day.

So on the 14th of June, the high temperature was 72 degrees. On the 15th of June, the high temperature was 81 degrees. OK, if we go to the eighth of July, that would be the last data value, and that was 89 degrees as a high. So right now, we just have raw data. This data was collected from weather.com. And what might be nice to better summarize this data is to make some sort of graph. A good way to display this quantitative data set is with a histogram.

OK, so let's, first of all, look at a distinction between a histogram and a bar graph. These are two types of graphs that are often confused. So I want to make sure that we have a good understanding of what type of graph is used when. So a histogram is used for quantitative data, OK, so your numerical data, whereas a bar graph is used for qualitative data or categorical data.

So what a histogram is, is it's a graph where quantitative data is divided into equal intervals. And the frequency of the interval gives the height of the bar. All right, so let's make a graph of the weather data using a histogram. But before we can make the histogram, the data must be split into intervals. OK? And what we need to do is do a process called binning. And binning is the process of picking the interval size.

So how big do we want our intervals? Do we want them of width 10 degrees, 20 degrees, 5 degrees, 2 degrees? So that's what this process of binning is. So changing the interval size can significantly change the histogram. When binning, make sure you don't have too many or too few intervals. OK, so in this case, intervals of width 5 degrees would create good-sized bins.

So if we look back at the data, our smallest data value seems to be 72. The largest data value is 102. So that spans 30 degrees. So I think it would make sense to have bins of width 5. And that would give us a pretty good way of displaying that data.
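To see how much the interval width matters, here is a rough Python sketch that counts how many intervals each of the candidate widths would need to cover 72 through 102. The `bin_count` helper is just an illustration, and it assumes the first interval starts at the multiple of the width at or just below the smallest value.

```python
# Smallest and largest daily highs quoted in the tutorial.
LOW, HIGH = 72, 102

def bin_count(width, low=LOW, high=HIGH):
    """Number of equal-width intervals needed to cover low..high.

    Assumption: the first interval starts at the multiple of `width`
    at or just below `low`, and each interval includes its left edge
    but not its right edge.
    """
    start = (low // width) * width
    return (high - start) // width + 1

# Candidate widths mentioned in the tutorial: 2, 5, 10 and 20 degrees.
counts = {width: bin_count(width) for width in (2, 5, 10, 20)}
for width, n in counts.items():
    print(f"width {width:>2}: {n} intervals")
```

Under these assumptions, width 2 gives 16 intervals for only 25 days of data, width 20 gives just 3, and width 5 gives 7, which sits comfortably in between.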

OK, so now let's actually go and categorize the data. So we're going to do that using a frequency table. Again, I have the data reproduced up here. This is the same data set as before. So when we make our frequency table, we're going to set this up with two columns. So one column is going to be the temperature intervals. And the second column will be the frequency.

OK, so the first thing we need to look at when we're setting up our intervals is we look for the very smallest value. OK, and the smallest value is the 72. And we also want to look for the largest value which we said was 102. Now, if we're using interval widths of size 5, OK, probably wouldn't make sense to go from 72 to 77. I think it would make more sense to go from 70 to 75.

So my first interval is going to go from 70. Now, what I'm going to write is 70 to less than 75. So what that means is that I'm going to include 70 in this interval but not 75. OK, so 70, 71, 72, 73, 74 would go in this interval. 75's are going to go in the next interval, which would be 75 to less than 80.

OK, and then I'm just going to go and complete this column of the table. So 80 to less than 85, 85 to less than 90, 90 to less than 95, 95 to less than 100, and 100 to less than 105. OK, and I didn't have any temperatures 105 or greater, so I can stop there. OK, now what I'm going to do is go ahead and fill out the frequency side of my table here.
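The interval column can also be written out in a few lines of Python. This is only a sketch: the names `edges`, `labels`, and `interval_label` are made up for illustration, and the rule it encodes is the one described above, where each interval includes its left edge but not its right edge.

```python
WIDTH = 5
START = 70    # rounded down from the smallest high, 72
STOP = 105    # first edge above the largest high, 102

# Interval edges: 70, 75, 80, ..., 105.
edges = list(range(START, STOP + 1, WIDTH))

# One label per interval, e.g. "70 to <75".
labels = [f"{left} to <{left + WIDTH}" for left in edges[:-1]]

def interval_label(temp):
    """Return the label of the interval [left, left + WIDTH) holding temp."""
    if not START <= temp < STOP:
        raise ValueError(f"{temp} is outside {START} to <{STOP}")
    left = START + ((temp - START) // WIDTH) * WIDTH
    return f"{left} to <{left + WIDTH}"

print(labels)
print(interval_label(72))   # lands in the first interval
print(interval_label(75))   # lands in the second interval, not the first
```

A value that sits exactly on an edge, like 75, always goes into the interval that starts there.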

So what I'm going to look for is all the data values first between 70 and less than 75. So I know the 72 for sure. And then if I scan this, OK, I don't see any more. So that's going to have a frequency of 1. OK, now if I go to 75 to 80, again, I'm going to scan my data. We have a 76, so that's 1, 2, 3. 4 is 79. And that seems to be everything. So this would get a frequency of 4.
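The scanning and counting can be sketched in Python as well. The list below holds only the six highs that are actually read out in this tutorial (72, 81, 76, 79, 89, and 102), not the full 25-day data set, so the tallies are partial on purpose. The `left_edge` helper is a hypothetical name for rounding a temperature down to the start of its interval.

```python
from collections import Counter

# Only the highs spoken aloud in the tutorial, NOT the full 25 values.
mentioned_highs = [72, 81, 76, 79, 89, 102]

def left_edge(temp, width=5):
    """Round temp down to the left edge of its width-5 interval."""
    return (temp // width) * width

# Tally how many of the mentioned values fall in each interval.
tally = Counter(left_edge(t) for t in mentioned_highs)

for left in sorted(tally):
    print(f"{left} to <{left + 5}: {tally[left]}")
```

With the full data set in place of `mentioned_highs`, the same tally would produce the complete frequency column.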

OK, and so I would continue doing that until I had all of my frequencies completed here. OK, I did this ahead of time and produced this table. OK, so to save time, I won't do everything else. But these are all of the frequencies, so 1, 4, 5, 4, 6, 3, 2.

This interval, 90 to less than 95, had a frequency of 6, so the most frequent temperature interval was from 90 to less than 95. There were six days when it was that temperature. OK, there were three days when it was 95 to less than 100 and two from 100 to less than 105.
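A quick way to sanity-check a finished frequency table is to add up the frequencies and confirm they match the number of data values. Here is a small Python sketch using the frequencies 1, 4, 5, 4, 6, 3, 2 for the seven width-5 intervals starting at 70; the variable names are just for illustration.

```python
# Left edge of each interval and the frequency from the table.
left_edges = [70, 75, 80, 85, 90, 95, 100]
frequencies = [1, 4, 5, 4, 6, 3, 2]

table = dict(zip(left_edges, frequencies))

# The frequencies should add up to the number of days observed.
total_days = sum(table.values())

# The interval with the largest frequency.
busiest = max(table, key=table.get)

print(f"total days: {total_days}")
print(f"most frequent interval: {busiest} to <{busiest + 5}")
```

The total comes out to 25, matching the 25 days of data, and the most frequent interval is 90 to less than 95.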

OK, now that we have our frequency table, now we can go ahead and start making our histogram. OK, so our histogram, I'm going to make on this page. First thing we need are two axes. I like to use a ruler. We're going to display frequency on the y-axis. And we're going to display your temperature intervals on the x-axis.

OK, so the first thing I want to do is label my x-axis. And, again, I'm going to use intervals of width 5 starting at 70 and counting up to 100. OK, so what I like to do is just start 70 a little bit past 0. Sometimes what you'll see is a little mark like that, which means that you're skipping from zero to another value. It's optional though.

So let's start here. So that'll be 70, 75, 80, 85, 90, 95, 100, 105. And we'll just go one more out just for a little more space. OK, and then, so 70. And then I'm just going to mark every other one. OK, so that represents my temperature. OK, and then I'm going to mark my y-axis as frequency. So let's label that as frequency first.

OK, and now we need to look for my highest frequency. So my highest frequency ended up being 6. So I need to get up to 6. And I don't have to go any higher than 6. So let's go ahead and do that now. So 1, 2, 3, 4, 5, 6. I'll go up to 7, but I won't need it. OK, and then, again, I'll just mark every other.

OK, now what I need to do is start drawing in my bars. So from 70 to less than 75, it had a frequency of 1. So I'm going to draw my bar going up to a frequency of 1. OK, so let's go ahead and do that. So I'm going to start by going up to 1, up to 1, and drawing across like so. OK, so that would represent one day being between 70 and less than 75.

So even though 75 isn't included in this bar, I'm going to still go to 75 there. OK, from 75 to less than 80, that had a frequency of 4. So I'm going to draw that up to 4. OK, and like so. Notice the bars touch. The only time you're going to have a gap when you're making a histogram is when a certain interval did not have a frequency or had a frequency of 0.
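For anyone who wants to check the shape without a ruler, here is a minimal text-only sketch in Python. It draws the bars sideways, one row per interval with one `#` per day, rather than upright as on the page, and the `text_histogram` function is a made-up helper, not part of any plotting library.

```python
# Frequencies from the table: left edge of each interval -> number of days.
frequencies = {70: 1, 75: 4, 80: 5, 85: 4, 90: 6, 95: 3, 100: 2}
WIDTH = 5

def text_histogram(freqs, width=WIDTH):
    """Return one text row per interval, with one '#' per observation."""
    rows = []
    for left in sorted(freqs):
        label = f"{left} to <{left + width}"
        rows.append(f"{label:<11} | {'#' * freqs[left]}")
    return rows

rows = text_histogram(frequencies)
for row in rows:
    print(row)
```

Because the rows sit directly on top of one another, the bars touch. An interval with a frequency of 0 would print a row with no `#` marks, which is the only kind of gap a histogram should have.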

OK, so the next one now is going to go up to 5. So I'm going to mark it here, go up to 5 here. OK, do the top. It's about here. OK, like so. So this gives me a pretty good picture of what was going on in terms of temperature in this 25-day period. OK, again, we can see that between 70 and 75, that wasn't very frequent.

There's nothing over 105. Again, we can see 90 to 95 was the most frequent. And, really, most of these middle intervals were pretty similar in terms of frequencies. So that's a good way of displaying, again, your temperature data. So that has been the tutorial on histograms. Thanks for watching.