Source: Girl, Public Domain http://tinyurl.com/c8d6kkk Charts and Graphs created by Joseph Gearin
This tutorial covers practical concerns about categories when you're dealing with qualitative data. So let's take a look at a fairly straightforward example, something like hair color. Assuming people are using their natural color, we can typically break it up into about these six categories.
So qualitative data is split into categories. But sometimes it's not so obvious what category people belong in. So what about this cutie right here? Does she belong in the blonde category because she has blonde hair? But she also pretty clearly has some brown hair. So do we put her in the brown category because she has some brown hair, the blonde category because she has some blonde hair, or do we add a new category for like dirty blonde or brown blonde?
The problem is, if we start going down that road, we can end up with something like this, where we have the original six categories. But now we have black brown and brown blonde and brown gray and all of these other things. And even then, it's still going to be hard to categorize people. Because what's the difference between brown and black brown? So it's going to be very difficult to try and figure this out.
So the category in which people will be placed is subjective. But also the number of categories itself is a concern because they can start to proliferate out of control if we don't put a cap on them. And so the idea that we're going to go with is we're going to try and take the Goldilocks approach, not too many, not too few categories.
Too many categories, and pie charts and bar graphs are going to be overwhelming. Also, what's going to happen is a lot of the time these pie slices are going to get really, really thin. And these bars are going to get really, really small. You're going to have lots of options but not a lot of data points within each bin.
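As a rough illustration of the "lots of options but not a lot of data points" problem, here is a minimal Python sketch that counts labels under a fine-grained scheme. The twelve labels are made up for illustration, and the `summarize` helper is a hypothetical name, not something from the tutorial.

```python
from collections import Counter

# Made-up labels for 12 people under a fine-grained scheme (illustrative only).
fine_labels = [
    "black", "black brown", "brown", "brown", "brown blonde", "blonde",
    "blonde", "red", "brown gray", "gray", "white", "brown red",
]


def summarize(labels):
    """Return (number of categories, smallest category count, all counts)."""
    counts = Counter(labels)
    return len(counts), min(counts.values()), counts


n_categories, smallest, counts = summarize(fine_labels)
print(f"{n_categories} categories for {len(fine_labels)} people")
print(f"smallest category holds {smallest} data point(s)")
```

With this sample, most categories hold a single person, which is what a chart full of very thin slices or very short bars looks like in the underlying counts.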
Conversely, we can have the opposite problem, where we have too few categories and they aren't that informative anymore. I have a good amount in this category and a little bit more than half in this category. But maybe this isn't as informative as it would have been with more categories. So if we can, we're going to go with not too many, not too few categories. We want to go with something that's just right.
And so to recap, sometimes we don't have an objective basis for assigning categories with our qualitative data. And we can get confused that way. So we have to try and put the brakes on the overproliferation of categories and just say that we're going to put people into a category even if they don't necessarily fit neatly into it.
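One way to picture putting people into a category even when they don't fit neatly is a fixed list of categories plus a tie-breaking rule. The Python sketch below is only an assumption-laden example: the six names in `BASE_CATEGORIES` are a guess (the tutorial does not list them), `assign_category` is a hypothetical helper, and the "use the last color named" rule is an arbitrary choice made here, which is exactly the kind of subjective decision the recap describes.

```python
from collections import Counter

# Assumed cap of six base categories; the names are a guess for illustration.
BASE_CATEGORIES = ["black", "brown", "blonde", "red", "gray", "white"]


def assign_category(label):
    """Force a possibly mixed label into exactly one base category.

    Assumed rule: keep the last base color named in the label,
    so "brown blonde" becomes "blonde". A different rule would be
    just as defensible; the point is to pick one and apply it consistently.
    """
    matches = [word for word in label.lower().split() if word in BASE_CATEGORIES]
    if not matches:
        raise ValueError(f"no base category found in {label!r}")
    return matches[-1]


# Made-up mixed labels (illustrative only).
labels = ["black brown", "brown", "brown blonde", "blonde", "brown gray", "red"]
counts = Counter(assign_category(label) for label in labels)
print(counts)
```

However many mixed labels show up, the number of categories in `counts` can never exceed the cap set by `BASE_CATEGORIES`.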
So sometimes we have to make those tough decisions. So we talked about categories, and they're also called classes. Good luck, and we'll see you next time.
(0:00-1:50) Categories can proliferate out of control - Hair Color
(1:51-3:00) The "Goldilocks" Approach to Categories
Categories (Classes): The ways we choose to separate the data by differentiating characteristics. Too many or too few categories can be problematic.