In this tutorial, you're going to learn about stem-and-leaf plots. Sounds like a funny name, but they actually serve a really good purpose, and they're very versatile, too. Many quantitative data sets can be displayed in stem-and-leaf plots, and this is one of them. These are the 50 states in the United States, and these numbers are the percent of college students that are enrolled in public colleges. So in this state, 95% of its college students are in public schools. Whereas in this state, only 52% are.
To create a stem-and-leaf plot, we're going to first decide on a natural classification. Here, 10 seems like an obvious choice. Those are going to be our bins. We should also make a choice for our bins based on some digit. In this case, we'll go by the tens digit. If they were hundreds, maybe we'd go by the hundreds digit, but we could still go by the tens if we wanted to.
So what we're going to do is we're going to create quote, unquote stems. These are going to be the stems based on the bins we selected. The 9 means that this is going to be a state with 90 or something in the 90s percent of their students at public school, as opposed to this one, where it's some number in the 80s. We'll write them in order, least to greatest or greatest to least, up or down the page-- it doesn't really matter-- to the left of some vertical line.
Next, we're going to list the values by their ones digit ascending away from those stems. And those are going to be considered the leaves. When it's done, it will look like this. This means that 43% of the students in one state were at a public school. This means that in a state, there was 52. There was also 55. And 55. And 56. All of these numbers are from the 50s. All of these are from the 60s. All of these are from the 70s.
And notice, if a value appears more than once, we list it more than once. 80 appears three times, and so we list it, three 0s here. Notice, these numbers are ascending away from the stem.
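If you wanted to build this kind of plot in code, here is a minimal Python sketch of the steps so far. The sample list is a small made-up set that only echoes a few of the values read off the plot above. It is not the full 50-state data set, and the function name `stem_and_leaf` is just an illustrative choice.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return plot lines using tens-digit stems and ones-digit leaves."""
    leaves_by_stem = defaultdict(list)
    # Sorting first makes the leaves ascend away from each stem.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)  # repeated values are listed each time

    lines = []
    # Walk every stem from lowest to highest, keeping empty stems so gaps show.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Illustrative sample only, not the real data set.
sample = [43, 52, 55, 55, 56, 62, 80, 80, 80, 81, 82, 95]
for line in stem_and_leaf(sample):
    print(line)
```

With this sample, the 80 shows up as three 0s on the 8 stem, and the 7 stem prints with no leaves because the sample happens to have no values in the 70s.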
Also notice that the 80s have more values than any other grouping. In fact, they have more than twice as many as any other single grouping. This just sort of looks a little strange. And I wonder if there's anything we can do about that. As it turns out, there is, and we'll get to it in a minute.
But there's one more important feature of a stem-and-leaf plot. You need to be able to tell someone who's looking at this what they're looking at. So we know that this 6 bar 2 means there's a state that has 62% of its students going to public colleges. We'll tell the reader that by saying, in a key, 4 bar 3 means 43%. We're telling our reader how these numbers should be interpreted.
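To see why the key matters, here is a small Python sketch showing that the same stem and leaf can stand for different values depending on the key. The function name `read_entry` and the `leaf_unit` parameter are assumptions made up for this illustration. `Decimal` is used only so the decimals come out exact.

```python
from decimal import Decimal


def read_entry(stem, leaf, leaf_unit):
    """Turn a stem and a one-digit leaf back into a data value.

    leaf_unit is the size of one leaf step, given as a string
    such as "1" or "0.1" (a hypothetical way of encoding the key).
    """
    return (stem * 10 + leaf) * Decimal(leaf_unit)


# Key: 4 | 3 means 43 (each leaf is a ones digit).
print(read_entry(4, 3, "1"))
# A different key: 4 | 3 means 4.3 (each leaf is a tenths digit).
print(read_entry(4, 3, "0.1"))
```

Without the key, a reader has no way to tell which of these two readings is intended.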
Now suppose that we decided that tens was too wide of a bin. Notice, there's a lot of 80s here. Four 81s, three 82s. There's a lot of 80s. What we could do is break it down by fives, and then write two 8s, a low 8 and a high 8. 85 to 89 for the high, 80 to 84 for the low.
But if we're going to split one bucket, we need to split them all. It would look like this if you split the stems into lows and highs. Low 40s, high 40s. Low 50s, high 50s. Low 60s, high 60s. Et cetera. This, to me, is a little bit more of an appropriate visual than the first one.
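Splitting the stems can be sketched in Python like this. Every stem gets a low row (leaves 0 to 4) and a high row (leaves 5 to 9), even when a row ends up empty. The sample values are made up for illustration and only echo a few numbers mentioned earlier. The function name `split_stem_plot` is hypothetical.

```python
def split_stem_plot(values):
    """Tens-digit stems, each split into a low row (0-4) and a high row (5-9)."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # 0 = low half, 1 = high half
        rows.setdefault((stem, half), []).append(leaf)

    lowest_stem = min(stem for stem, _ in rows)
    highest_stem = max(stem for stem, _ in rows)

    lines = []
    for stem in range(lowest_stem, highest_stem + 1):
        # If one stem is split, all of them are split.
        for half in (0, 1):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Illustrative sample only, not the real data set.
sample = [43, 52, 55, 55, 56, 62, 80, 80, 80, 81, 81, 82, 85, 88, 95]
for line in split_stem_plot(sample):
    print(line)
```

Each stem now appears twice in a row, first for the low half and then for the high half, so the crowded 80s are spread over two lines.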
Take a look at this set of GPAs, high school GPAs for these students. Pause the video and make a stem-and-leaf plot of these GPAs. What you should have come up with is something like this.
Now if you didn't come up with this, that's fine. I'll get to another one that you may have come up with in a minute. In my case, I say 2 bar 0 means the GPA rounds to 2.0. That's Jim. Jim's GPA rounds to 2.0. Even though it starts with a 1, he's not the 1.9. Isaiah at 1.94 is the 1.9. Jim is the 2.0. Amy at 2.95 is the 3.0. So I rounded these. And that's a legitimate thing to do.
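The rounding step can be sketched in Python as well. This is only one way to do it, and the name `rounded_stem_leaf` is made up for the example. Isaiah's 1.94 and Amy's 2.95 come from the list above. Jim's exact GPA isn't stated, so 1.96 is a stand-in assumption for a value that starts with a 1 and rounds to 2.0. The sketch uses `Decimal` with half-up rounding because Python's built-in `round` works on binary floats and rounds exact ties to even, so it may not match rounding done by hand.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_stem_leaf(gpa_text):
    """Round a GPA (given as a string like "2.95") to one decimal place,
    then split it into a whole-number stem and a tenths-digit leaf."""
    rounded = Decimal(gpa_text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    stem_text, leaf_text = str(rounded).split(".")
    return int(stem_text), int(leaf_text)


print(rounded_stem_leaf("1.94"))  # Isaiah
print(rounded_stem_leaf("2.95"))  # Amy
print(rounded_stem_leaf("1.96"))  # hypothetical value standing in for Jim
```

Here 1.94 lands on the 1 stem with a leaf of 9, while 2.95 and the stand-in 1.96 both move up to the next stem with a leaf of 0.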
If yours doesn't look like mine, you may have created something that looks like this, with two-digit leaves. And that's fine. This is 4.00. And 3.12 is a real GPA from the list. In fact, it's Tyler's GPA. In this case, you need a new key. 2 bar 23 means a GPA of 2.23. So this is a completely legitimate way to do it.
However, you need to visually separate these numbers. We didn't necessarily have to in some of our previous examples, like here. We didn't have to visually separate those numbers. We could put them all right next to each other. Whereas here, we should visually separate the 12 from the 24 from the 41. But this is still legitimate.
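Here is a minimal Python sketch of the two-digit-leaf version, with a space between leaves so they can be told apart. It uses only GPAs mentioned in this tutorial, not the full list, and assumes each GPA is written as a string with exactly two decimal places. The function name `two_digit_leaf_plot` is hypothetical.

```python
def two_digit_leaf_plot(gpa_texts):
    """Whole-number stems with two-digit leaves, separated by spaces.

    Assumes every GPA is a string with exactly two decimals, e.g. "3.12".
    """
    rows = {}
    for gpa_text in gpa_texts:
        whole, hundredths = gpa_text.split(".")
        rows.setdefault(int(whole), []).append(hundredths)

    lines = []
    for stem in sorted(rows):
        # Two-character digit strings sort in the same order as the numbers.
        leaves = " ".join(sorted(rows[stem]))
        lines.append(f"{stem} | {leaves}")
    return lines


# A few GPAs mentioned in the tutorial, not the whole list.
for line in two_digit_leaf_plot(["4.00", "3.12", "2.23", "1.94", "2.95"]):
    print(line)
```

Without the spaces, the 2 stem would read 2395, and a reader couldn't tell where one leaf ends and the next begins.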
Or suppose I was interested in the differences between the girls' GPAs, like Amy, Holly, Jenny, Katherine, and so on, and the boys' GPAs. I could compare those by saying, well, I'll have one group of leaves go to the right of the stem and another group of leaves go to the left of the stem. It would look like this.
Again, I'm going back to rounding. And I say 3 bar 1 means the GPA rounds to 3.1. Here, the girls' GPAs are on the left. The boys' GPAs are on the right. This allows us to compare the distributions of boys' GPAs to girls' GPAs. And what we see is that girls' GPAs are typically a little bit higher.
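A back-to-back plot can be sketched in Python like this. The values are rounded GPAs written in tenths as whole numbers, so 31 stands for 3.1. Only a handful come from the tutorial (30 for Amy, 19 for Isaiah, 20 for Jim, 31 for Tyler). The rest are made up so the example has something to draw, so don't read any conclusion into this particular output. The function name `back_to_back` is hypothetical.

```python
def back_to_back(left_values, right_values):
    """Two groups sharing one column of stems. Values are in tenths (31 = 3.1).
    Both groups must contain at least one value."""

    def leaves_by_stem(values):
        rows = {}
        for value in sorted(values):
            stem, leaf = divmod(value, 10)
            rows.setdefault(stem, []).append(str(leaf))
        return rows

    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    width = max(len(leaves) for leaves in left.values())

    lines = []
    for stem in stems:
        # Left leaves are reversed so they still ascend away from the stem.
        left_text = "".join(reversed(left.get(stem, [])))
        right_text = "".join(right.get(stem, []))
        lines.append(f"{left_text:>{width}} | {stem} | {right_text}".rstrip())
    return lines


girls = [30, 31, 33, 35, 38, 40]  # mostly made-up values
boys = [19, 20, 25, 28, 31, 34]   # mostly made-up values
for line in back_to_back(girls, boys):
    print(line)
```

On both sides the smallest leaf sits closest to the stem, which is why the left-hand leaves read largest to smallest from left to right.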
So ultimately, the question is why would I use a stem-and-leaf plot when there are other graphical displays, like histograms or dot plots, that I might be able to use? Well, there are a couple of advantages. One, it's like a dot plot in that all the data points can be seen. But it's better than a dot plot because it works over a larger range. And it's convenient if the data set's fairly small. You'll notice, the one that I did at the beginning had 50 data values, and we were really pushing it in terms of being able to see all the data values all at once. The drawback is it gets difficult to create if the data set is too big. Like I said, the data set that we had at the beginning with 50 data values was really on the bubble in terms of it being a useful display for that data set.
And so to recap, stem-and-leaf plots are very useful displays of quantitative data. We saw lots and lots of different ways that we can utilize them. They're very versatile. We start by creating bins from natural numerical breaks. And we make a key so that the reader can identify the numbers. To make the plot clearer, you can do lots of things. You can split stems, like we did with the college percent data. You can round the values, like I did with the GPAs. Or you can create leaves with double digits. Or you can compare across categories using a back-to-back plot. And so the terms that we used were stem-and-leaf plot, sometimes called a stem plot. And then we talked about stems and leaves. Good luck, and we'll see you next time.
Stem-and-leaf plot (stem plot): A display of quantitative data that shows natural numerical breaks in the data as categories called "stems" and individual values as "leaves."
Back-to-back stem-and-leaf plot: Two stem-and-leaf plots on the same set of stems. This allows us to compare the distributions of two different categories.