Standard Deviation Tutorial

Welcome to the world of Standard Deviation...

What pops into your head when someone talks about "standard deviation"? Intense mathematical formulas? Excel calculators? Vast piles of data waiting to be analyzed and presented? If you answered yes to any of those questions, move on to more advanced topics in statistics - you are ready.

If what popped into your head was more along the lines of: Serious boredom? Serial killers and other deviants? Screaming and running from the room in revulsion? Then the packet to follow is just for you!

We are going to talk about standard deviation and show you what it means, how people in the real world of work actually use it, and a few other useful tips along the way.

What is Standard Deviation?

Standard Deviation is a measure of dispersion used in interpreting data sets: it describes how far the values in a set typically fall from the mean.

So plugging a bunch of numbers from your population into the standard deviation formula will get you your standard deviation. But it still doesn't tell you what the result means or what to do with it.

(At the end of the packet, watch the Khan Academy video on computing the standard deviation to review. We aren't going to talk about computing it, just about using the end result.)
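The formula itself is not reproduced in this packet. As a rough sketch only, the usual textbook definition of the population standard deviation is the square root of the average squared distance from the mean. The function name `population_sd` and the numbers below are made up for illustration and do not come from the packet.

```python
import math

def population_sd(values):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Made-up numbers, chosen only so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # prints 2.0
```

Here the mean is 5, the squared distances add up to 32, and 32 / 8 = 4, whose square root is 2.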

Source: http://geographyfieldwork.com/StandardDeviation1.htm, retrieved July 7, 2010

How far and how wide?

Standard deviation helps you look at a set of data and judge how spread out it is. The bigger the standard deviation, the more your data is spread out around the mean. The smaller the standard deviation, the less variation in your data set.

Ahh, ya - so still wondering what it all means?

Example time:

So let's say you are looking at a bunch of senior citizens. You are looking at the relationship between height and age (did you know you shrink as you get older? All of us. Seriously.). So you survey a bunch of 80-year-olds from various locations and record their heights to get a starting point for your study.

What? They don't look like they are 80 years old? Hey - it pays to exercise!

Anyway, you record heights from 10 seniors, all male, so that height differences between men and women don't muddy the results. The table below shows the values. Notice how closely the results are grouped.

Heights of seniors - Table 1

Name      Height (ft.)
Fred      6.0
George    5.8
Harry     5.9
Melvin    5.6
Vern      6.5
Dan       5.6
Andrew    5.8
Craig     5.8
Nate      5.9
Jeff      5.8

Mean = 5.87

Standard deviation (or σ) = 0.254

Using Excel, I computed the standard deviation. The result, .254, agrees with my first observation: the heights are grouped closely, hence the small standard deviation. Let's see what happens when the data is more widespread, as shown by Table 2.
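If you don't have Excel handy, the same number can be checked with Python's standard `statistics` module. This is only a sketch of one way to do it. Note the assumption: the .254 appears to come from the sample formula (divide by n - 1, like Excel's STDEV), since the population formula (divide by n, like Excel's STDEVP) gives a slightly smaller value for these ten heights.

```python
import statistics

# Heights in feet, copied from Table 1.
table_1 = [6.0, 5.8, 5.9, 5.6, 6.5, 5.6, 5.8, 5.8, 5.9, 5.8]

sample_sd = statistics.stdev(table_1)       # divides by n - 1
population_sd = statistics.pstdev(table_1)  # divides by n

print(round(sample_sd, 3))      # prints 0.254
print(round(population_sd, 3))  # prints 0.241
```

Either way the value is small compared with the heights themselves, which is the point of the example.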

Heights of Seniors - Table 2

Name      Height (ft.)
Mark      6.8
Matt      5.6
Lloyd     5.2
Jim       4.6
Cooper    7.1
Kirk      5.8
Charlie   5.7
Cleavon   5.9
Bob       5.6
Kenneth   6.1

Mean = 5.84

Standard deviation (or σ) = 0.719

Notice the change in standard deviation? It jumped from .254 to .719. Now compare the means (remember the mean from previous packets?). Computed in Excel, they come out to 5.87 and 5.84, so the average height of the two groups is very close, even though the second group's values are much more widely dispersed.
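To see the comparison side by side, here is a small sketch that summarizes both tables. The helper name `summarize` is made up for this example, and it assumes the sample formula (`statistics.stdev`), which is the one that matches the figures quoted for these tables.

```python
import statistics

# Heights in feet, copied from Table 1 and Table 2.
table_1 = [6.0, 5.8, 5.9, 5.6, 6.5, 5.6, 5.8, 5.8, 5.9, 5.8]
table_2 = [6.8, 5.6, 5.2, 4.6, 7.1, 5.8, 5.7, 5.9, 5.6, 6.1]

def summarize(heights):
    """Return (mean, sample standard deviation) for a list of heights."""
    return statistics.mean(heights), statistics.stdev(heights)

for label, heights in [("Table 1", table_1), ("Table 2", table_2)]:
    mean, sd = summarize(heights)
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.3f}")
```

The means differ by only a few hundredths of a foot, while the second standard deviation comes out at roughly 0.72, nearly three times the first. (The .719 quoted for Table 2 looks like the same value cut off after three decimals rather than rounded.)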

So the question must be asked - "Who cares?"

Depending on who ordered the study, lots of people. If a drug company wanted to market a new drug to prevent bone loss in aging males, the above information and how it is presented to the Food and Drug Administration could be the difference between a raging success and a monumental flop. The Census Bureau might use the data to push local governments to change the height of doors in all community buildings.

Anyone else use this stuff?

Climate

Consider the average daily maximum temperatures for two cities, one inland and one on the coast. The range of daily maximum temperatures for cities near the coast is smaller than for cities inland. So while two cities may have the same average maximum temperature, the standard deviation of the daily maximum temperature will be smaller for the coastal city than for the inland one: on any particular day, the actual maximum is more likely to be far from the average in the inland city. If you were planning to move to a city with a temperate climate, you would be interested in the outcome (think San Diego vs. Minneapolis - the average might be close but the spread would make a big difference).
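As a quick illustration of the same idea, the sketch below uses two invented weeks of daily maximum temperatures. The numbers are hypothetical, not real readings from any city, and were picked so that both lists have the same average.

```python
import statistics

# Hypothetical daily maximum temperatures (degrees F) for one week; not real data.
coastal = [68, 70, 72, 70, 69, 71, 70]
inland = [40, 55, 70, 85, 100, 62, 78]

for city, temps in [("coastal", coastal), ("inland", inland)]:
    average = statistics.mean(temps)
    spread = statistics.stdev(temps)
    print(f"{city}: average = {average}, standard deviation = {spread:.1f}")
```

Both made-up cities average 70 degrees, but the inland standard deviation is many times larger, which is what you would feel day to day.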

Sports

In any sport, there will be teams that rate highly at some things and poorly at others. Chances are, the teams leading in the standings will not show such disparity, but will perform well in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent the team. Teams with a higher standard deviation will likely be more unpredictable. For example, a team that is consistently bad in most categories will have a low standard deviation. A team that is consistently good in most categories will also have a low standard deviation. However, a team with a high standard deviation might be the type of team that scores a lot (strong offense) but also concedes a lot (weak defense), or, vice versa, that might have a poor offense but compensates by being difficult to score on.

In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is more consistent than a driver with a higher standard deviation. This information can be used to help understand where opportunities might be found to reduce lap times.

Finance

In finance, standard deviation is a representation of the risk associated with a given security (stocks, bonds, property, etc.), or the risk of a portfolio of securities (actively managed mutual funds, index mutual funds, or ETFs). Risk is an important factor in determining how to efficiently manage a portfolio of investments because it determines the variation in returns on the asset and/or portfolio and gives investors a mathematical basis for investment decisions (known as mean-variance optimization). The overall concept of risk is that as it increases, the expected return on the asset will increase as a result of the risk premium earned – in other words, investors should expect a higher return on an investment when said investment carries a higher level of risk, or uncertainty of that return. When evaluating investments, investors should estimate both the expected return and the uncertainty of future returns. Standard deviation provides a quantified estimate of the uncertainty of future returns.
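A minimal sketch of that idea: turn a price history into period-by-period returns, then take the standard deviation of those returns. The two securities, their prices, and the helper name `simple_returns` are all invented for this example. Real analyses typically use far longer histories and often annualize the result.

```python
import statistics

def simple_returns(prices):
    """Period-over-period returns: (next price - current price) / current price."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

# Hypothetical month-end prices for two made-up securities; not real data.
steady_fund = [100.0, 101.0, 102.0, 101.5, 103.0, 104.0]
volatile_stock = [100.0, 112.0, 95.0, 108.0, 90.0, 104.0]

for name, prices in [("steady_fund", steady_fund), ("volatile_stock", volatile_stock)]:
    returns = simple_returns(prices)
    print(name,
          "average return:", round(statistics.mean(returns), 4),
          "standard deviation:", round(statistics.stdev(returns), 4))
```

Both invented price series start at 100 and end at 104, so an investor ends up in the same place either way. The standard deviation of the returns is what captures how bumpy the ride was.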

Source: http://en.wikipedia.org/wiki/Standard_deviation, retrieved July 8, 2010

Khan Academy - review of Standard Deviation computation

Source: Khan Academy, retrieved July 8, 2010
