Representing How Skewed Data is Distributed

Source: All images created by Dan Laub

[MUSIC PLAYING] Hi. Dan Laub here. And in this lesson, we're going to discuss representing how skewed data is distributed. So before we get started, let's discuss the objective for this lesson. By the end of the lesson, you should be able to identify the mean, median, and mode on a skewed distribution.

So let's get started. If you recall from a previous lesson, normal distributions have density curves that are symmetric and bell shaped, and the mean, median, and mode of a normal distribution are all the same and equal to the center value of the density curve. However, there are situations where, unlike normal distributions, a distribution may not be symmetrical. We call these distributions "skewed distributions."

The first distribution that you see here is skewed to the right and reflects a situation in which a lot of values are concentrated toward the lower end of the distribution relative to the higher end, which is typically how home values are distributed. The second distribution is skewed to the left and reflects a situation in which a lot of values are concentrated toward the higher end of the distribution relative to the lower end. A good example of this distribution would be the mileage on the odometer of used cars.

So for example, let's look at home values. We see the graph here that shows the distribution of home values. On the bottom horizontal axis, we have the price listed in thousands of dollars. And you see how, on the lower end, we start at around $100,000.

And on the higher end, we work our way up almost to $800,000. We see the majority of the values are concentrated right around the $300,000 mark. And this is going to be a right skewed or a positively skewed distribution curve.

Why is it right skewed? Because we see quite a few observations right there around that $300,000 point, but we also see some homes that are priced much higher. And those would pull the mean up while leaving the median, or the middle value, relatively low.

So we see most of the values here are going to fall in that middle range, but there are some rather large values that are going to pull that average up. In this case, the mode would be the smallest of the three measures, followed by the median, and then the mean. And we'd read the graph along the horizontal axis from left to right.

When we look at a right skewed distribution, it's pretty evident that the far right-hand side is pointing to the right. And so if we think of the tail as an arrow pointing to the right, we could say it's a right skewed distribution. So the direction in which that tail points gives us a sense of what kind of distribution we're dealing with.

The mean, in this case, would be $400,000, which would be higher than the median at $325,000 and higher than the mode of $300,000. And this is a key indicator of a right skewed or a positively skewed distribution.
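To make those three numbers concrete, here is a minimal Python sketch. The nine home prices in it are made up for illustration (they are not the data behind the graph) and were chosen only so that they reproduce the mean, median, and mode quoted above.

```python
import statistics

# Hypothetical home prices in thousands of dollars (not the lesson's data),
# picked so the summary values match the ones quoted in the lesson.
home_prices = [200, 300, 300, 300, 325, 350, 400, 625, 800]

mean_price = statistics.mean(home_prices)      # pulled up by the 625 and 800
median_price = statistics.median(home_prices)  # middle (5th) value when sorted
mode_price = statistics.mode(home_prices)      # most frequent value

print(mean_price, median_price, mode_price)  # 400 325 300

# Right skewed ordering: mode < median < mean
is_right_skewed_order = mode_price < median_price < mean_price
```

The two expensive homes at the top of the list are what lift the mean above the median, which is the pattern the right-pointing tail suggests.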

An example of a left skewed distribution, on the other hand, would be the mileage we see on used cars. If we look at the distribution of the mileage of used cars, we notice that there are similarities to the other graph. However, it's not going to look exactly the same. This type of distribution is called a "left skewed distribution" or a "negatively skewed distribution."

In this case, the mean would be less than the median and less than the mode. In this particular instance, we're going to see a mean of 101,000 miles that would be on the odometer of a used car, a median of 108,000 miles, and a mode of 118,000 miles. And notice that the mode is greater than the median and the median is greater than the mean.

And we see relatively few cars on the lower end in terms of miles, simply because people might wait a lot longer to trade in their vehicles, so most used cars have higher mileage on them. And so we'd see a left skewed distribution. We can also look at this in terms of the tail being on the left side, where it points to the left.
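The same check can be run for the used car example. This is a sketch with nine invented odometer readings, in thousands of miles, chosen only to reproduce the mean of 101, median of 108, and mode of 118 mentioned in the lesson.

```python
import statistics

# Hypothetical odometer readings in thousands of miles (not the lesson's data).
# A couple of low-mileage cars form the tail on the left.
mileage = [30, 80, 100, 106, 108, 115, 118, 118, 134]

mean_miles = statistics.mean(mileage)      # pulled down by the 30 and 80
median_miles = statistics.median(mileage)  # middle (5th) value when sorted
mode_miles = statistics.mode(mileage)      # most frequent value

print(mean_miles, median_miles, mode_miles)  # 101 108 118

# Left skewed ordering: mean < median < mode
is_left_skewed_order = mean_miles < median_miles < mode_miles
```

Here the ordering is the reverse of the home price case, because the unusual values sit at the low end instead of the high end.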

So how do we identify differences between distributions? Well, if you look at the graph, you see here what's called a "normal distribution," which is symmetrical. If we were to draw a line going straight down from the middle point, the left half would be a reflection of the right half. And the mean, the median, and the mode would be identical, and they'd be the value right there in the middle.

An example of a right skewed distribution would be bank account balances. Now, obviously, there are going to be some people that have quite a bit of money in their bank account. And in a case like this, we're going to see those observations out on the right. We see a situation here where the mode and the median are going to be relatively low, while the mean is going to be higher. And that's simply because there are some large balances that would pull the mean up.
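One way to see how a large balance pulls the mean up is to add a single big account to a small sample and compare the mean and median before and after. The balances below are hypothetical and in dollars; this is only an illustration of the idea, not data from the lesson.

```python
import statistics

# Hypothetical bank balances in dollars (invented for illustration).
balances = [500, 800, 1000, 1000, 1200, 1500, 2000]

mean_before = statistics.mean(balances)
median_before = statistics.median(balances)

# Add one very large account, which creates a tail on the right.
balances_with_large = balances + [50000]

mean_after = statistics.mean(balances_with_large)
median_after = statistics.median(balances_with_large)

print(round(mean_before, 2), median_before)  # mean and median are close
print(mean_after, median_after)              # mean jumps, median barely moves
```

The median only shifts from one middle value to the average of two neighboring ones, while the mean moves by thousands of dollars, which is why the mean ends up well above the median and the mode in a right skewed distribution.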

For an example of a left skewed distribution, let's look at the heart rate of a sample of people. In this case, we're going to see the heart rate measured in beats per minute. And obviously, your heart has to be beating or else you'd have some significant health problems.

However, we're going to see the mean beats per minute is probably somewhere between the 70 and 80 mark, and you're going to see some heart rates that are higher than that and some that are lower than that. And you're going to see a distribution here that's left skewed, in that the tail points to the left. And so you're going to see very few people with a low heart rate, and you're going to see some people with a much higher heart rate. And that's going to bring you to a situation that's going to reflect a left skewed distribution.

And so what is it that we can look at to get a sense for whether or not we have a distribution that's going to be skewed one way or the other? It's going to depend on the values of the mean, the median, and the mode. As this table will illustrate, if the mean happens to be greater than the median and the mode, we are dealing with a right skewed distribution. If the mean happens to be equal to the median and the mode, it's going to be a symmetric distribution, such as a normal distribution. And if the mean is less than the median and the mode, it is going to be a left skewed distribution.
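The rule in the table can be written as a short function. This is a rough sketch, and the function name `classify_skew`, the tolerance `tol`, and the "unclear" label for mixed cases are assumptions added here for illustration, not part of the lesson.

```python
def classify_skew(mean, median, mode, tol=1e-9):
    """Apply the lesson's rule of thumb to three summary values."""
    # Mean equal to both the median and the mode: symmetric (e.g. normal).
    if abs(mean - median) <= tol and abs(mean - mode) <= tol:
        return "symmetric"
    # Mean greater than both: the tail points to the right.
    if mean > median and mean > mode:
        return "right skewed"
    # Mean less than both: the tail points to the left.
    if mean < median and mean < mode:
        return "left skewed"
    # Any other ordering is not covered by the table.
    return "unclear"


# Values quoted in the lesson (home prices and mileage, in thousands).
print(classify_skew(400, 325, 300))  # right skewed
print(classify_skew(101, 108, 118))  # left skewed
print(classify_skew(50, 50, 50))     # symmetric
```

With real data the three values are rarely exactly equal, so this comparison is best treated as a quick guide alongside a look at the graph itself.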

And so let's revisit the objective for this lesson. By the end of the lesson, we wanted to be able to identify the mean, median, and mode on a skewed distribution, which we did. We went through examples of positively skewed distributions, negatively skewed distributions, and illustrated how the mean, the median, and mode were different values depending on the skewness of that distribution. So again, my name is Dan Laub. And hopefully, you got some value from this lesson.

(0:00 – 0:30) Introduction

(0:31 – 1:19) Skewed Distributions

(1:20 – 2:58) Right Skewed Distributions

(2:59 – 3:57) Left Skewed Distributions

(3:58 – 5:20) Differences Between Right and Left Skewed Distributions

(5:21 – 5:50) How To Determine The Skewness of a Distribution

(5:51 – 6:16) Conclusion