In this tutorial, you're going to learn about cumulative frequency.
You probably already know about frequency, which is how often a data value occurs. Cumulative means everything that has come before, so cumulative frequency is a running total of the frequencies of the data points.
Cumulative Frequency
The number of data points that fall within or below a given bin of data.
When a teacher says that a test is cumulative, it means that the test will cover everything you've learned that year.
In this context, finding cumulative frequencies involves separating the data into bins, just as we have done before, and determining how many observations fall within or below each bin.
This is the distribution of daily high temperatures, in bins of 10 degrees, for Chanhassen, Minnesota in 2009. There were three days with a high temperature between -10℉ and -1℉, and eight days with a high temperature between 0℉ and 9℉.
With this distribution, you can determine cumulative frequencies by asking, “How many days had a high temperature at or below 9℉?” Eight days fell within the 0 to 9 bin, and three fell below it, for a total of 11.
For the first bin, three days fell within or below it, because no bin lies below it. For the second bin, the total was 11.
For the third category, how many days were at or below this bin?
Well, 25 were in that bin and 11 were below it, which means it's a total of 36.
You can continue this through the entire chart. Unsurprisingly, the final bin has a cumulative frequency of 365: all 365 days of the year had a high temperature at or below 99℉ in Chanhassen that year.
It is often useful to consider relative cumulative frequencies, which are the percent of observations that fall in or below a certain bin.
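The running total described above can be sketched in a few lines of Python. This is a minimal illustration, assuming only the three bins whose counts are given in the text (3, 8, and 25 days); the bin labels "-10 to -1", "0 to 9", and "10 to 19" follow from the bins being 10 degrees wide, and the variable names are hypothetical. The rest of the chart is not reproduced here.

```python
from itertools import accumulate

# Daily-high counts for the first three bins described in the text.
# Only these three bins are given, so this covers part of the chart only.
bins = ["-10 to -1", "0 to 9", "10 to 19"]
frequencies = [3, 8, 25]

# Cumulative frequency is a running total: each bin's count plus every count below it.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(bins, frequencies, cumulative):
    print(f"{label:>10}  frequency={freq:3d}  cumulative={cum:3d}")
```

The cumulative values come out as 3, 11, and 36, matching the hand calculation. Extending `frequencies` with the remaining bins would end at 365.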
Relative Cumulative Frequency
The percent of data points that fall within or below a given bin of data.
You may have encountered relative frequency before, but not relative cumulative frequency. Fortunately, it's calculated the same way as relative frequency, except that you divide the cumulative frequency, rather than the frequency, by the total.
To determine relative cumulative frequency, divide each cumulative frequency by 365. For the first bin, 3 ÷ 365 ≈ 0.008, which means that about 1/100 of the data fell in or below that bin. Dividing 11 by 365 gives about 0.03. Continuing through the rest of the chart gives the remaining values.
The main point is to divide each cumulative frequency by the total number of observations.
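As a sketch of that division, the snippet below takes the cumulative frequencies worked out in the text for the first three bins (3, 11, and 36) and divides each by the 365 days in the year. The names `TOTAL_DAYS`, `cumulative`, and `relative_cumulative` are illustrative assumptions, not part of the original lesson.

```python
TOTAL_DAYS = 365  # total number of observations: every day of 2009

# Cumulative frequencies for the first three bins, as computed in the text.
cumulative = [3, 11, 36]

# Relative cumulative frequency = cumulative frequency / total number of observations.
relative_cumulative = [c / TOTAL_DAYS for c in cumulative]

for c, r in zip(cumulative, relative_cumulative):
    print(f"{c:3d} / {TOTAL_DAYS} = {r:.3f}")
```

Rounded to three decimals, this prints 0.008, 0.030, and 0.099. The last bin of the full chart would give 365 / 365 = 1.000.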
In the previous chart, you may notice that the final value of 1.000 means that all of the values, 100%, fell at or below the last bin, which matches the cumulative frequency of 365. Graphically, this information can be presented in an ogive, which is also called a relative cumulative frequency graph or sometimes a percentile graph. It's a line chart that plots the relative cumulative frequency at each bin to show what share of the values fall at or below it.
Plot each point at the left-hand edge of a bin. By the time you've reached -10 degrees, moving left to right on the number line, you haven't encountered any of the days of the year yet, so the graph starts at 0. Once you reach 0 degrees, you've encountered the three days in the first bin, so the height there is 3/365, or about 0.008. By the time you reach 100 degrees, you will have encountered every day of the year, so the graph reaches 1.
Ogives never decrease from left to right. If a bin contains no data, the line stays flat across it.
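The plotting rule above can be sketched by computing the ogive's points directly. This minimal example assumes only the first three bins from the text, with edges at -10, 0, 10, and 20 degrees (inferred from the 10-degree bin width); the variable names are hypothetical. It starts the line at 0 at the first edge and adds each bin's count by the time the next edge is reached, then checks that the heights never decrease.

```python
TOTAL_DAYS = 365

# Edges (degrees F) bounding the first three bins; only these bins are given in the text.
edges = [-10, 0, 10, 20]
frequencies = [3, 8, 25]

# The ogive starts at height 0 at the first edge: no days have been encountered yet.
points = [(edges[0], 0.0)]
running = 0
for edge, freq in zip(edges[1:], frequencies):
    running += freq
    # Reaching this edge means every day in the bin just passed has been counted.
    points.append((edge, running / TOTAL_DAYS))

# An ogive never decreases; an empty bin would simply repeat the previous height.
assert all(y1 <= y2 for (_, y1), (_, y2) in zip(points, points[1:]))

for x, y in points:
    print(f"x={x:4d} F  y={y:.3f}")
```

These (x, y) pairs are what a line chart would connect. With the full chart, the last point would sit at 100 degrees with a height of 1.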
Cumulative frequency shows the number, and relative cumulative frequency shows the percent, of the data that fall in or below a certain bin. A graph of relative cumulative frequency is called an ogive. This is a nice way to show how certain values relate to other values.
How do they relate to the whole? For instance, is a 70-degree day in Chanhassen a very hot day? How does it compare to the rest of the days of the year? Relative cumulative frequency can answer those questions.
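To illustrate that kind of question, here is a small sketch of a hypothetical helper, `relative_cumulative_at`, that finds the bin containing a temperature and returns the share of days at or below that bin. It assumes only the first three bins given in the text, so it cannot answer the 70-degree question itself; it rejects temperatures outside those bins rather than guessing.

```python
from bisect import bisect_right

TOTAL_DAYS = 365
edges = [-10, 0, 10, 20]   # bin edges in degrees F for the first three bins only
cumulative = [3, 11, 36]   # cumulative frequency of each of those bins

def relative_cumulative_at(temp):
    """Share of days at or below the bin containing `temp` (hypothetical helper)."""
    if not edges[0] <= temp < edges[-1]:
        raise ValueError("temperature is outside the bins reproduced here")
    # bisect_right returns the position of the first edge greater than temp,
    # so the bin containing temp starts one edge earlier.
    index = bisect_right(edges, temp) - 1
    return cumulative[index] / TOTAL_DAYS

print(f"{relative_cumulative_at(15):.3f}")  # a 15-degree day falls in the 10 to 19 bin
```

With every bin of the chart filled in, calling the same function with 70 would give the share of days at or below the 70s bin, which is how the ogive answers the question.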
Thank you and good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS