Source: Image of table created by Jonathan Osters
In this tutorial, you're going to learn about cumulative frequency. Now, you should already know about frequency, which means how often a data value occurs. But cumulative, that's a word you've probably heard before, but maybe not in this context.
Cumulative means everything that's come before. So if a teacher says that a test is cumulative, that means that it's going to cover everything that you've learned that year. In this context, cumulative frequencies involve separating the data into bins, just like we have before, and determining how many observations fall within or below that bin.
So let's take a look at an example. This is the distribution of high temperatures, grouped by 10s, for Chanhassen, Minnesota in the year 2009. There were three days that were between 10 below zero and one below zero. There were eight days that were between zero and nine degrees for the high temperature.
We can look at cumulative frequencies by saying, OK, how many days were at or below nine degrees for the high temperature? Well, there were eight days that fell within the zero to nine bin, and three that fell below it. And so that's a total of 11.
Obviously, for the first bin, there are three that fell within or below this category. For the second category, like we just said, there were 11. For the third category, how many days were at or below this bin? Well, 25 were in that bin and 11 were below it, which means it's a total of 36.
We can continue this throughout the entire chart. Unsurprisingly, we get 365 total days. All 365 days of the year were at or below 99 degrees in Chanhassen that year.
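The running totals just described can be sketched in a few lines of Python. This is only an illustration: the three frequencies (3, 8, and 25) are the ones read out above, the rest of the table is not included, and the variable names are made up for the example.

```python
from itertools import accumulate

# Frequencies for the first three bins mentioned in the tutorial.
# The remaining bins of the table are not listed here.
bin_labels = ["-10 to -1", "0 to 9", "10 to 19"]
frequencies = [3, 8, 25]

# Each cumulative frequency is the bin's own count plus every count below it.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(bin_labels, frequencies, cumulative):
    print(f"{label:>10}: frequency {freq:>2}, cumulative {cum:>2}")
```

The cumulative values come out as 3, 11, and 36, matching the totals worked out by hand.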
Oftentimes, it's a good thing to consider relative cumulative frequency, which is the percent of observations that fall in or below a certain bin. We've talked about relative frequency before, but not relative cumulative frequency. Fortunately, it's calculated exactly the same way as relative frequency was.
You divide each cumulative frequency by 365, the total number of days. Dividing 3 by 365 means that about 0.008 of the data, a little under 1/100 of the data, fell in or below the first bin. Dividing 11 by 365 gives you about 0.03. And continuing on the rest of the chart, we get these values.
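As a quick check of that division, here is a minimal Python sketch. It assumes only the three cumulative frequencies worked out so far (3, 11, and 36) and the total of 365 days. The name `TOTAL_DAYS` is just an illustrative label.

```python
TOTAL_DAYS = 365  # total number of observations, as stated in the tutorial

# Cumulative frequencies for the first three bins.
cumulative = [3, 11, 36]

# Relative cumulative frequency = cumulative frequency / total observations.
relative_cumulative = [round(c / TOTAL_DAYS, 3) for c in cumulative]

print(relative_cumulative)  # [0.008, 0.03, 0.099]
```

So roughly 10% of the days in the year had a high temperature below 20 degrees.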
Notice the final value of 1.000 means that all of the values, 100%, fell at or below this bin, which matches the cumulative frequency of 365 that we found for the last bin. Graphically, this information can be presented in something that's called an ogive. It's also called a relative cumulative frequency graph.
It sounds like a big name, but it's just a graph of the relative cumulative frequency: relative, which in this case means divided by 365, and cumulative, which means at or below each bin. Sometimes it's called a percentile graph. It's a line chart that uses these bins and the relative cumulative frequencies to show what proportion of the values were at or below each point.
We plot each point at the left-hand edge of a bin. We do this because by the time we've gotten to negative 10 degrees, going left to right on this number line, we haven't encountered any of the days of the year yet. However, once we get to zero degrees, we've encountered three of the days, or this much relative cumulative frequency.
By the time we get to 100 degrees, we will have encountered every single day. Every day will have been at some point in or below that bin. Ogives are increasing from left to right. If there's no data in a particular bucket, you get a flat line, no increase.
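The left-edge convention can be sketched as a small Python function that produces the points of an ogive. This is a sketch under stated assumptions, not the method used to draw the graph in the tutorial: the function name `ogive_points` is hypothetical, and the example uses only the three bins whose counts were read out earlier, with 365 as the total.

```python
def ogive_points(bin_edges, frequencies, total):
    """Return (x, y) points for an ogive.

    bin_edges holds the left edge of every bin plus the left edge of the
    bin that would follow the last one, so it has one more entry than
    frequencies.
    """
    # At the left edge of the first bin, no data has been encountered yet.
    points = [(bin_edges[0], 0.0)]
    running = 0
    for next_edge, freq in zip(bin_edges[1:], frequencies):
        running += freq  # everything in or below the bin just passed
        points.append((next_edge, running / total))
    return points


# First three bins from the tutorial: -10 to -1, 0 to 9, 10 to 19.
edges = [-10, 0, 10, 20]
freqs = [3, 8, 25]
points = ogive_points(edges, freqs, 365)

for x, y in points:
    print(f"{x:>4} degrees: {y:.3f}")
```

Because the running total never decreases, the heights never go down from left to right, and a bin with a frequency of zero repeats the previous height, which is the flat segment described above.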
And so to recap, cumulative frequency and relative cumulative frequency show the number, or if it's relative cumulative frequency, the percent of the data that fall in or below a certain bin of data. This is a nice way to show how certain values relate to other values. How do they relate to the whole?
Is a day, for instance, that is 70 degrees in Chanhassen considered a very hot day? How does it compare to the rest of the days of the year? The relative cumulative frequency can tell us that. And so the terms that we've used are cumulative frequency and relative cumulative frequency. Good luck, and we'll see you next time.
Relative cumulative frequency: The percent of data points that fall within or below a given bin of data.
Cumulative frequency: The number of data points that fall within or below a given bin of data.