Source: All images created by Dan Laub
Hi. Dan Laub here, and in this lesson we're going to talk about graphing data. Before we get started, the objective for this lesson is to be able to identify and interpret different types of graphical representations. As we've covered in previous lessons, there are eight different steps to the experimental method. Step eight is largely about sharing our information with others, and that's where graphing really comes in handy.
Some of the most commonly used graphs in statistics are bar graphs, histograms, line graphs, and scatter plots, and we're going to cover all of them today. To start with a brief example, let's say we were interested in seeing how well the stock market has performed over the last year.
Why might we find it easier to look at a graph instead of the actual data? It's one thing to see numerical values in front of us, and almost anyone could calculate how well the S&P 500, for instance, performed if they put their mind to it. However, seeing the data presented visually, like the graph you see here, gives us a better sense of trends than the raw numbers might.
If data is represented on the wrong graph, it may convey information in a way that's unhelpful, confusing, or even misleading to those reading it. The S&P 500 is a major stock index, and it fluctuates every single day the market is open. The values on the graph here start on December 1st, 2014, and continue to November 30th, 2015, so that's one full year.
You can clearly see by looking at the line how the trend changed: it was up for a substantial part of the year, then it dropped suddenly around late August, and then it bounced back a little. If you only had the numbers in front of you, would you necessarily be able to tell that? Probably not. However, not every type of graph is suitable for every type of variable, and that's what we're going to cover in this lesson.
This lesson will explore the properties of each type of graph and explain which types of data fit each one. Remember the difference between nominal and ordinal data and what each actually represents. Bar graphs are used for nominal or ordinal variables. With bar graphs, the different values appear on the horizontal axis.
This axis is labeled with the name of the variable, and there is one bar for each value. The height of each bar indicates how often that value appears in the data, and the vertical axis is labeled with that frequency. Looking at a bar graph, you can see that the columns do not touch one another. The example I want to use here has to do with blood type.
There are several common blood types found among human beings, and to keep it simple, I'm going to look at four key types: A, B, AB, and O. Now, what if we were to survey people and count how many have each blood type? You can see in this bar graph how these values are arranged on the horizontal axis.
I've got them in this order here: A, B, AB, and O. The axis is labeled down here as blood type, and you can see how high the bar is for each one. It's pretty obvious that the most common blood types are A and O. The number of people is shown on the vertical axis over here, and in this instance we surveyed 43 people.
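Behind every bar graph is a simple frequency count: how many times each category appears in the data. A minimal Python sketch, assuming the survey responses are stored as a list of blood-type labels. The list here is rebuilt from the counts reported for this survey, not taken from any original raw data:

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the survey counts in this lesson
responses = ["A"] * 17 + ["B"] * 5 + ["AB"] * 3 + ["O"] * 18

# Tally how often each blood type appears; these tallies are the bar heights
counts = Counter(responses)

# Keep the same category order as the horizontal axis of the graph
categories = ["A", "B", "AB", "O"]
bar_heights = [counts[c] for c in categories]

print(bar_heights)            # [17, 5, 3, 18]
print(sum(bar_heights))       # 43 people surveyed
print(counts.most_common(2))  # [('O', 18), ('A', 17)]
```

The category order is a presentation choice; for nominal data like blood type, any order is valid, and the bar heights stay the same.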
17 have type A, 5 have type B, 3 have type AB, and 18 have type O, as indicated by the bars. Recall from previous lessons two more scales of measurement: interval and ratio. A variable has an interval scale if its values are numbers for which the difference between any two values can be measured, and that difference is determined the same way everywhere on the scale.
A variable has a ratio scale if it is an interval variable with one additional property: a value of 0 means that none of the quantity exists. Much as bar graphs show data from nominal and ordinal variables, histograms show data from interval or ratio variables. With histograms, the different values appear on the horizontal axis, which is labeled with the units of the variable.
The values are divided into ranges: we split the data into groups, each with a value where it starts and a value where it ends. For each range, the height of the bar indicates how many values fall in that range, and the vertical axis is labeled with those frequencies. Unlike the bars on a bar graph, the bars on a histogram touch one another.
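To make the idea of ranges concrete, here is a minimal sketch of how raw values are sorted into histogram ranges. The ages and range boundaries below are hypothetical, and the sketch assumes a common convention the lesson does not specify: each range includes its starting value but not its ending value, except the last range, which includes both ends.

```python
def histogram_counts(values, edges):
    """Count how many values fall in each range [edges[i], edges[i+1]).

    The last range also includes its upper edge, so the maximum value is
    counted. Values outside every range are not counted at all.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            start, end = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if start <= v < end or (is_last and v == end):
                counts[i] += 1
                break
    return counts

# Hypothetical ages, grouped into ranges 4-6, 6-8, and 8-10
ages = [4, 5, 6, 7, 7, 8, 9, 9, 10]
print(histogram_counts(ages, [4, 6, 8, 10]))  # [2, 3, 4]
```

Because neighboring ranges share a boundary, there are no gaps between them, which mirrors the way histogram bars touch.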
For an example of a histogram, take a look at the graph in front of you. This graph shows youth soccer players sorted by age group, starting at the low end with age 4 and working up to the high end at age 14. Each bar represents the number of players within a particular age range.
So for those between the ages of 4 and 6, we see there are five players. Between the ages of 6 and 8 we see there are 10. Between the ages of 8 and 10, we see that there are 22 players. Between the ages of 10 and 12, we notice that there are 19 players. And between the ages of 12 and 14 we notice that there are 11 players. But what does this tell us?
The height of each bar tells us how many players fall into each age range, which we read off the vertical axis. So the vertical axis shows the number of youths on soccer teams in each age group, whereas the horizontal axis shows the ages we're considering.
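Reading a histogram mostly comes down to simple arithmetic on the bar heights. A short sketch using the counts read off the soccer histogram, with each age range written as a (start, end) pair:

```python
# Age ranges (start, end) and the number of players in each, from the soccer histogram
ranges = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
players = [5, 10, 22, 19, 11]

total = sum(players)                   # total players across all ranges
tallest = players.index(max(players))  # position of the tallest bar
share = players[tallest] / total       # fraction of players in that range

print(total)            # 67
print(ranges[tallest])  # (8, 10)
print(round(share, 2))  # 0.33
```

So the 8-to-10 age group holds roughly a third of all the players shown.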
A third type of graph is the line graph, which is used to track an interval or ratio variable over a period of time. With line graphs, much like the S&P 500 example from earlier, the time when each value was observed appears on the horizontal axis, which is labeled with units of time, such as days or weeks.
The different values of the variable appear on the vertical axis, which is labeled with the units of the variable. Each value is plotted as a point, and the points are connected by lines. For example, suppose we're looking at a particular ski resort, and we want to know how much snow was on the ground during the first 15 days of February.
What you see on the screen here is what such a graph might look like. The horizontal axis is labeled with the dates, running left to right from February 1st through the 15th. The vertical axis is labeled with how many inches of snow are on the ground.
Each point in the graph represents the snow depth on a particular day. Notice a few of the points here. On February 1st, we start out just under 20 inches, and on February 3rd, the value is 20 inches. Moving forward, the line continues to rise, and by February 10th there were roughly 56 inches of snow on the ground.
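Because the points on a line graph are ordered in time, the slope of the line between two points gives the average rate of change over that stretch. A minimal sketch using the two values read off the snow graph (the 56-inch reading is approximate):

```python
# (day of February, inches of snow on the ground), read off the line graph
feb_3 = (3, 20)
feb_10 = (10, 56)  # approximate reading

def average_rate_of_change(p1, p2):
    """Slope of the straight line between two points: change in y per unit of x."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

rate = average_rate_of_change(feb_3, feb_10)
print(round(rate, 2))  # 5.14 inches per day, on average
```

This is only an average over the week; the shape of the line itself shows whether the snow piled up steadily or in bursts.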
One of the goals of statistics is to determine whether a cause-and-effect relationship exists between two variables. A scatter plot shows a possible relationship between two variables, but beware: a trend in a scatter plot does not necessarily mean that one variable causes the other. This is a common misconception, and drawing such a conclusion can lead to faulty analysis.
For example, it's often thought that the amount of work experience an employee has contributes to their income. While the scatter plot here does show a positive relationship between years of experience and annual income, it does not prove that one causes the other. Scatter plots like this are used for two related interval or ratio variables.
Because each observation involves two related variables, two numbers are recorded for it: the first for the first variable and the second for the second variable. With scatter plots, the values of the first variable appear on the horizontal axis, which is labeled with that variable's units, and the values of the other variable appear on the vertical axis.
The vertical axis is labeled with the units of the second variable, and one point is plotted for each pair of related numbers. To use the example of experience and income again, look at the scatter plot on the screen in front of you. It shows 22 randomly selected employees, with their years of work experience, ranging from 0 to 18, on the horizontal axis.
Their income, in thousands of dollars per year, is on the vertical axis, ranging from a low of $19,000 a year to a high of $122,000 a year. Looking at these points, you can see the general trend: there is a positive relationship between how much work experience an individual has and their income.
In this next example, consider a hypothetical business that tracks the number of customers who walk in its door on each day of the month. In a case like this, we would expect no correlation between these two variables, and the scatter plot you see here indicates exactly that.
Starting at the first of the month on the left-hand side and working toward the 30th on the right, you'll notice the points are all over the place. To point out a couple of examples, on the 15th of the month seven customers walked in the door, whereas on the 22nd, 43 customers did.
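The difference between the two scatter plots can also be summarized with a single number, Pearson's correlation coefficient r. The lesson does not cover it, but it is a standard companion to scatter plots: r is near +1 for a strong positive linear trend and near 0 when there is no linear trend. As with the scatter plot itself, a large r says nothing about cause and effect. A minimal sketch with hypothetical data (not the lesson's data), one set shaped like the experience/income plot and one built to show no linear trend:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical: years of experience vs. income (thousands of dollars per year)
years = [1, 3, 5, 8, 12, 15]
income = [25, 31, 40, 52, 70, 88]

# Hypothetical: day of month vs. customers, built to have no linear trend
days = [1, 2, 3, 4, 5]
customers = [10, 30, 20, 30, 10]

print(round(pearson_r(years, income), 3))  # close to +1
print(pearson_r(days, customers))          # 0.0
```

Note that r only measures linear trends, so a plot can have r = 0 and still show some other kind of pattern.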
The objective for this lesson was to be able to identify and interpret different types of graphical representations, and we did. We covered four different types of graphs: the bar graph, the histogram, the line graph, and the scatter plot. So again, my name is Dan Laub, and hopefully you got some value from this lesson.