Source: Soccer Ball, Creative Commons: http://en.wikipedia.org/wiki/File:Soccer_ball.svg Basketball Ball: http://www.clker.com/clipart-basketball-8.html Graphs created by the author
This tutorial is going to show you some misleading graphics. Sometimes people create graphics designed to make you think a certain way. They're trying to sway you to believe something, and they'll use distorted or misleading graphics to get you there.
Let's take a look. We'll start with one that's a pretty obvious visual distortion. Suppose three buddies, Paul, Hector, and Juan, were comparing the number of baseball cards they had. This graph was probably drawn by Juan, and we'll get to why we think so in a second.
But you'll notice that the y-axis has unequal scaling. This apparently represents 50, whereas this apparently represents 70. But they're the same size gap. And even more ludicrous, this apparently represents 5. But it's the same gap. So some graphs are drawn with unequal scales so as to represent things disproportionately.
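A quick way to catch unequal scaling is to check whether equally spaced gridlines are labeled with equally spaced values. The sketch below assumes you can read the tick labels off the axis in order; `has_equal_spacing` is a hypothetical helper, and the tick values are made up for illustration, not read from the baseball-card graph.

```python
def has_equal_spacing(tick_values):
    """Return True if consecutive tick labels differ by the same amount.

    Assumes the ticks are drawn at equally spaced positions on the axis,
    as gridlines usually are.
    """
    steps = [b - a for a, b in zip(tick_values, tick_values[1:])]
    return all(step == steps[0] for step in steps)

# Hypothetical tick labels, not taken from the original figure.
honest_axis = [0, 10, 20, 30, 40]
uneven_axis = [0, 5, 55, 125]   # gaps of 5, 50, and 70 drawn the same size

print(has_equal_spacing(honest_axis))  # True
print(has_equal_spacing(uneven_axis))  # False
```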
Some are trickier. Let's look at people's preferred brand of dish soap. Say 15 people selected Brand A, 8 people selected Brand B, and 20 people selected Brand C.
Now, both of these graphs can show us that. But which one is more accurate? As it turns out, the one on the left is more accurate.
The one on the right was clearly drawn to exaggerate the difference between B and C. Notice how much taller C is than B in the graph on the right compared with the graph on the left. On the left, C is only about two and a half times as tall as B, whereas on the right it's about five times as tall.
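The exaggeration comes from where the vertical axis starts: a bar's visible height is its value minus the axis baseline. The sketch below uses the dish-soap counts for B and C; the baseline of 5 for the right-hand graph is an assumption (it is simply a starting value that would produce the five-to-one look), and `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(taller, shorter, baseline=0):
    """How many times taller one bar looks than another when the
    vertical axis starts at `baseline` instead of zero."""
    return (taller - baseline) / (shorter - baseline)

brand_b, brand_c = 8, 20

print(apparent_ratio(brand_c, brand_b))              # 2.5  (axis starts at 0)
print(apparent_ratio(brand_c, brand_b, baseline=5))  # 5.0  (assumed axis start of 5)
```

Only the zero-based version reflects the true ratio of the counts, 20 / 8.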
If you use a bar graph or a histogram, it's a good idea to start the vertical axis at zero, unless there's a good reason not to. For instance, if you were tracking home prices, an axis that starts at zero and goes all the way up to, say, $300,000 won't show much difference between $300,000 and $280,000. To the homeowner, however, that $20,000 drop is significant.
Graphs beginning anywhere besides zero have a tendency to exaggerate differences. But graphs starting at zero sometimes can minimize very real differences. So it's important to know what you're trying to emphasize.
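The same $20,000 drop can look tiny or dramatic depending on the baseline. This sketch measures what fraction of the original bar's visible height disappears; the $250,000 baseline is a hypothetical choice for illustration, and `visible_drop` is a hypothetical helper.

```python
def visible_drop(old, new, baseline=0):
    """Fraction of the original bar's visible height that disappears
    when the value falls from `old` to `new` on an axis starting at `baseline`."""
    return (old - new) / (old - baseline)

old_price, new_price = 300_000, 280_000

print(round(visible_drop(old_price, new_price), 3))                    # 0.067
print(round(visible_drop(old_price, new_price, baseline=250_000), 3))  # 0.4
```

With a zero baseline the bar shrinks by about 7%, which matches the real percentage change; starting the axis at $250,000 makes the bar lose 40% of its height.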
Let's take another look. Suppose a class of 18 students was asked to name their favorite sport. Three said soccer, five said baseball, and the remaining 10 said basketball.
Suppose a student drew this graph. Now, what's wrong with it? Well, twice as many students chose basketball as baseball. And look: the basketball goes up to 10. Five students chose baseball, and the baseball goes up to 5.
So what's the problem? While the basketball is twice as tall as the baseball, it's also twice as wide. Something that's twice the height and twice the width takes up four times the area, so the basketball looks about four times as big as the baseball. You can see this even more clearly by putting the box around the baseball inside the box around the basketball: the baseball's box is clearly only about 1/4 the size.
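The arithmetic behind this perceptual distortion is short: scaling a picture by the same factor in both height and width multiplies its area by that factor squared. The sketch below uses the basketball and baseball counts from the example; `area_ratio` is a hypothetical helper name.

```python
def area_ratio(height_ratio):
    """Scaling a picture by the same factor in height and width
    multiplies its area by that factor squared."""
    return height_ratio ** 2

basketball, baseball = 10, 5
scale = basketball / baseball        # 2.0: the honest ratio of the counts
print(scale, area_ratio(scale))      # 2.0 4.0
```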
To make matters worse, technology has introduced lots of new kinds of misleading graphs. This one is kind of ridiculous. It's meant to show something across four different cities, but frankly, I don't know what it's supposed to be measuring. There's no label.
It's supposed to show how these different markets are doing over time. Maybe each one is a store. I can't even tell what it's measuring or what the numbers 0 through 100 actually mean.
An additional problem is: what are we comparing here? The heights of these shapes, or their volumes? Take the cone and the cylinder: a cone has only 1/3 the volume of a cylinder with the same base and height. So it doesn't make sense to compare cones and pyramids to cylinders and boxes. It just makes no sense.
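To put numbers on the volume point, the sketch below compares shapes drawn to the same height. The radius, side length, and height are hypothetical values; the 1/3 ratio holds for any base and height.

```python
import math

def cylinder_volume(radius, height):
    return math.pi * radius ** 2 * height

def cone_volume(radius, height):
    # A cone has 1/3 the volume of the cylinder with the same base and height.
    return math.pi * radius ** 2 * height / 3

def box_volume(side, height):
    return side ** 2 * height

def pyramid_volume(side, height):
    # A pyramid has 1/3 the volume of the box with the same base and height.
    return side ** 2 * height / 3

# Hypothetical shapes: every "bar" is drawn to the same height of 50 units.
r, s, h = 1.0, 2.0, 50.0
print(round(cone_volume(r, h) / cylinder_volume(r, h), 6))  # 0.333333
print(round(pyramid_volume(s, h) / box_volume(s, h), 6))    # 0.333333
```

So two "bars" of equal height can differ threefold in volume, depending only on which shape was picked.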
Also, because this graph is three-dimensional, there's no way to easily compare heights. Is this cone supposed to be taller, shorter, or the same height as the cylinder behind it? I can't tell. And it's going to be very hard for anyone to tell. This is an incredibly misleading graphic.
Technology, like certain spreadsheet programs, has made all of these different chart types easy to produce. But because cones, cylinders, and the rest are three-dimensional, all they do is distort your data. If you're going to use bar graphs, the best choice is the simple two-dimensional ones here at the top.
So this is not a good choice. Since you're trying to compare across time, a better choice would be something like a time series plot: January, February, March, April, May, June, July. Again, I'm not even 100% sure these points are where they're supposed to be, because I couldn't compare the heights easily. I had to guess at what the values were.
But this is a lot more useful to anyone reading it than the 3D version would be. That one is flashier, but the information gets hidden.
And so to recap, graphical displays can be manipulated in many different ways. If you use an inappropriate scale, you can exaggerate the differences. Or you can use areas to make differences seem larger than they actually are. Or you can use three-dimensional displays that aren't really clear at all.
As statisticians, our goal is to make the complicated simple, to make the data easy to understand. We're trying to clean up a messy world. Our goal is clarity. And all these misleading graphics don't do that.
So the terms that we used were misleading graphics; perceptual distortion, which was the area issue that we had with the baseball and basketball example; and scales. Where do you start your graph? Do you start it at zero? Or do you start it somewhere else?
Good luck. And we'll see you next time.
Source: SOURCE: SOCCERBALL, PUBLIC DOMAIN HTTP://OPENCLIPART.ORG/DETAIL/167872/SOCCERBALL-NOSHADOW-BY-RDURIS, BASEBALL, PUBLIC DOMAIN HTTP://OPENCLIPART.ORG/DETAIL/167870/BASEBALL-NOSHADOW-BY-RDURIS BASKETBALL, PUBLIC DOMAIN HTTP://OPENCLIPART.ORG/DETAIL/167867/BASKETBALL-NOSHADOW-BY-RDURIS GRAPHS CREATED BY JONATHAN OSTERS
This tutorial is going to teach you about pictographs. Pictographs are plots that show up in newspapers a lot, because they're very visually appealing. What they will do is they'll use pictures instead of dots or bins.
So suppose that a class of 17 students was asked their favorite sport. One student might have drawn this graph to illustrate the results. The three soccer balls mean that three students said soccer was their favorite sport, the five baseballs mean that five students said baseball, and the nine basketballs mean that nine students said basketball. This is a completely valid graph. It's very similar to a dot plot, except it uses pictures instead of dots.
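Since a one-icon-per-student pictograph is just a dot plot with pictures, it can be sketched as text. In this sketch the letter "o" stands in for the ball pictures, and `pictograph_rows` is a hypothetical helper.

```python
def pictograph_rows(counts, icon="o"):
    """One icon per observation, like one dot per observation in a dot plot."""
    return [f"{label:<10} {icon * n}" for label, n in counts.items()]

favorite_sport = {"Soccer": 3, "Baseball": 5, "Basketball": 9}
assert sum(favorite_sport.values()) == 17  # the whole class is accounted for

for row in pictograph_rows(favorite_sport):
    print(row)
```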
That's what one student might have done. Another student might have created this pictograph. Notice it looks a little funny, because there's half of a soccer ball, half of a baseball, and half of a basketball. But this student noted that every ball counts as two students. So for soccer there's one whole ball, which is two students, plus half a ball, which is one more student. That makes three students, matching the first student's pictograph. Likewise, the baseballs show five students saying baseball, and the basketballs show nine students saying basketball. This is the same data the first student presented.
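When each icon stands for more than one observation, the number of icons is the count divided by the icon's value, which is where the half balls come from. A minimal sketch with the same class data; `icons_needed` is a hypothetical helper.

```python
def icons_needed(count, per_icon):
    """Number of icons (possibly fractional) for `count` observations
    when each icon stands for `per_icon` observations."""
    return count / per_icon

favorite_sport = {"Soccer": 3, "Baseball": 5, "Basketball": 9}
for sport, n in favorite_sport.items():
    print(sport, icons_needed(n, per_icon=2))
# Soccer 1.5
# Baseball 2.5
# Basketball 4.5
```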
And a pictograph is going to use pictures instead of a scale or dots. And they'll often appear in the newspaper, because they are so pleasing to the eye. The only problem is sometimes they can be misleading. Let's look at an example.
So in this figure, the USA had the most, and Russia had the next most. But if you take a look a little closer, this is far and away higher than 1,000. This is nearly 2,000. And so it's not really clear what one medal icon actually means in terms of relative size.
What we see is that if you divide the USA's 1,975 medals by its 6 medal icons, one icon counts for about 329 medals for the USA, but only about 200 medals for Russia. In fact, none of these are very consistent. What we should have done is choose a fixed number of medals for each icon to represent, and then extend each row of icons out as far as needed.
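Checking a pictograph for consistency means dividing each total by its number of icons and seeing whether the results agree. The USA figures (1,975 medals, 6 icons) come from the text, and Russia's 999 medals appear in the corrected chart; Russia's 5 icons is an assumption inferred from the "about 200 per icon" figure, not read from the original graph. `medals_per_icon` is a hypothetical helper.

```python
def medals_per_icon(medals, icons):
    """How many medals a single icon actually stands for in one row."""
    return medals / icons

usa = medals_per_icon(1975, 6)
russia = medals_per_icon(999, 5)   # assumed icon count, see lead-in

print(round(usa))     # 329
print(round(russia))  # 200
```

If the chart were consistent, every row would give the same value.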
A better pictograph would look something like this. I've chosen each medal icon to represent 100 medals, with every total rounded to the nearest 100. So I've lined up 20 medal icons for the USA, because 1,975 rounds to 2,000. Russia, then, gets half as many icons, 10, to represent its 999 medals. This shows much more accurately how many more Olympic medals the USA has than the other countries.
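The fixed-scale rule above can be sketched in a few lines: divide each total by the icon value and round to the nearest whole icon. `icon_count` is a hypothetical helper; the totals are the ones quoted in the text.

```python
def icon_count(value, per_icon=100):
    """Icons to draw when every icon is worth `per_icon`,
    rounded to the nearest whole icon."""
    return round(value / per_icon)

print(icon_count(1975))  # 20
print(icon_count(999))   # 10
```

One caution if you reuse this: Python's `round` sends exact halves to the nearest even number (for example, `round(2.5)` is 2), which doesn't matter for these totals but could for others.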
And so to recap, pictographs use pictures instead of scales or dots to display differences in a distribution. They are legitimate graphs. They aren't used much outside of newspapers and magazines, but because they're so visually appealing, they show up there a lot.
You do need to take care, though, to make sure that they're not misleading. You want your pictures to actually represent the same amount in each category, and you don't want to visually distort the picture either. So we talked about pictographs in this tutorial. Good luck, and we'll see you next time.