Designing Samples to Minimize Bias

Author: Marble Happy

An example of a biased opinion poll...

Why do I need to know about bias when learning about sampling?

How do you get your daily dose of current events?

Do you watch TV?
Listen to the radio?
Have your iPod tuned to a talk-radio show?
Watch "The Daily Show with Jon Stewart"?
Live in a box and not pay any attention to current events?

No matter the medium, you will likely be getting a daily dose of statistics along with your current events: "98% of texting respondents polled agree..." or "4 out of 5 doctors surveyed said..." or, to use a current example, "Beasley dealt to Wolves after LeBron picks Miami - Good for Wolves?".

Learning about unbiased sampling helps you to separate the good information from the junk most of us get on a regular basis.

For instance, take the examples I listed above:

"98% of texting respondents polled agree..." Who was polled? What about those who do not text? Do they have a way to participate in the survey?
"4 out of 5 doctors surveyed said..." How many doctors were asked the question? 5? 50? 5,000?
"Beasley dealt to Wolves after LeBron picks Miami - Good for Wolves?" How likely are you to respond if you don't read the Minneapolis Star and Tribune? How about if you don't follow the NBA?

See what I mean about knowing the details?

Source: http://www.startribune.com/?elr=KArks8c77iUec77iUiD3aPc:_Yyc:aUQ7c4E7ME5U, retrieved July 9, 2010
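To see why "5? 50? 5,000?" matters so much for the doctors example, here is a rough sketch in Python. It assumes the doctors were a simple random sample (which the ad never tells us) and uses the usual normal-approximation formula for a 95% margin of error. That approximation is shaky for a sample as small as 5, so treat the first line of output as a ballpark only. The function name margin_of_error is my own, not something from a statistics package.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Uses the normal approximation and assumes a simple random sample.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# "4 out of 5 doctors" is a sample proportion of 0.8.
for n in (5, 50, 5000):
    moe = margin_of_error(0.8, n)
    print(f"n={n:>4}: 80% plus or minus {moe * 100:.1f} percentage points")
```

With 5 doctors the uncertainty is tens of percentage points wide; with 5,000 it shrinks to about one point. Same "4 out of 5", very different amounts of information.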

Definition of sample bias according to Wikipedia...

Sampling bias

From Wikipedia, the free encyclopedia

In statistics sampling bias is causing some members of the population to be less likely to be included than others. It results in a biased sample, a non-random sample of a population (or non-human factors) in which all participants are not equally balanced or objectively represented. If the bias makes estimation of population parameters impossible, the sample is a non-probability sample. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.

Huh? Let me translate. Using the NBA Timberwolves example from the previous slide, sampling bias is happening big-time. Think of people who will be unlikely/unable to respond to their survey:

people without computers or access to the internet
people who do not follow the NBA
people who do not follow the Timberwolves professional basketball team

Or to say it another way, the only people likely to respond are those who feel very strongly about the Timberwolves and their trading activities, have access to the internet, and read the Minneapolis Star and Tribune. Kind of puts lots of these surveys in perspective, doesn't it?

Source: http://en.wikipedia.org/wiki/Sampling_bias, retrieved July 9, 2010
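If you like to see things in numbers, here is a small simulation sketch of the Timberwolves poll. Every number in it is made up for illustration: I am assuming a town of 100,000 people, 10% of whom are fans, that fans like the trade more than everyone else, and that fans are far more likely to answer the newspaper's poll. None of these figures come from the Star Tribune.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of (is_fan, approves_trade) pairs.
population = []
for _ in range(100_000):
    is_fan = random.random() < 0.10                            # assumed: 10% are fans
    approves = random.random() < (0.70 if is_fan else 0.30)    # assumed opinions
    population.append((is_fan, approves))

def share_approving(people):
    """Fraction of the given people who approve of the trade."""
    return sum(approves for _, approves in people) / len(people)

# Voluntary-response poll: fans are assumed much more likely to answer.
online_poll = [p for p in population
               if random.random() < (0.20 if p[0] else 0.005)]

# Simple random sample: everyone has the same chance of being picked.
random_sample = random.sample(population, 1000)

print(f"Whole population: {share_approving(population):.0%}")
print(f"Online poll:      {share_approving(online_poll):.0%}")
print(f"Random sample:    {share_approving(random_sample):.0%}")
```

Under these assumptions the online poll lands far above the true figure, while the random sample of 1,000 stays close to it, even though the online poll collects more responses. More responses do not fix a biased way of collecting them.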

Definitions

Source: Bock, Velleman, DeVeaux, Stats: Modeling the World, Pearson Publishers, 2010

More on definitions if you want additional explanation...

I found the following definitions on a blog put out by a math teacher from California. He doesn't identify himself, but I would love to take a class from him. I will include the link to his blog, as he has some topical (and hilarious) clips from "The Daily Show with Jon Stewart" at the end of his narrative. (His reward for reading through the terms!) Be sure to read both Parts 1 & 2, as there is a second clip at the end of Part 2.

http://mrho.net/blog/?p=782

Source: http://mrho.net/blog/?p=782, retrieved July 9, 2010

Confounding and Lurking Variables

A discussion of designing samples would not be complete without a section about confounding and lurking variables. Both are insidious little devils reaching out to mess with your results.

Confounding Variables are variables you failed to control or eliminate, so their effects get mixed up with the effect of the factor you are actually studying. They may damage the validity of your experiment by causing you to jump to a false conclusion, such as reading a misleading correlation as cause and effect. Example - you are looking at gas mileage and fail to include tire pressure as a factor.
Lurking Variables are variables that have an effect on your response and should have been included in your analysis but just weren't for whatever reason (because they were LURKING in the background...). Example - you are researching risk factors for heart attacks and fail to include gender (did you know 223,000 women die annually from cardiovascular disease? www.womensheart.org).

Source: http://answers.yahoo.com/question/index?qid=1006022514061, retrieved July 9, 2010
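Here is a sketch of how an ignored variable can fool you, built on the gas mileage example. The setup is invented for illustration: a hypothetical fuel additive that does nothing at all, car owners who keep their tires properly inflated and are also more likely to buy the additive, and mileage that depends only on tire pressure. All the percentages and mpg figures are assumptions of mine.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Each car is (good_tires, uses_additive, mpg).
cars = []
for _ in range(10_000):
    good_tires = random.random() < 0.5
    # Assumed: careful owners (good tire pressure) buy the additive more often.
    uses_additive = random.random() < (0.8 if good_tires else 0.2)
    # Assumed: mileage depends on tire pressure only; the additive has no effect.
    mpg = random.gauss(30 if good_tires else 27, 2)
    cars.append((good_tires, uses_additive, mpg))

def mpg_gap(rows):
    """Mean mpg of additive users minus mean mpg of non-users."""
    with_additive = [m for _, a, m in rows if a]
    without_additive = [m for _, a, m in rows if not a]
    return mean(with_additive) - mean(without_additive)

naive_gap = mpg_gap(cars)                                # ignores tire pressure
gap_good_tires = mpg_gap([c for c in cars if c[0]])      # good tires only
gap_bad_tires = mpg_gap([c for c in cars if not c[0]])   # bad tires only

print(f"Ignoring tire pressure: {naive_gap:+.2f} mpg")
print(f"Good tires only:        {gap_good_tires:+.2f} mpg")
print(f"Bad tires only:         {gap_bad_tires:+.2f} mpg")
```

Ignore tire pressure and the additive appears to add well over 1 mpg. Compare cars with the same tire pressure and the gap all but disappears, because tire pressure was doing the work all along.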