Selection bias arises when the sample does not include the right group of people, so it matters a great deal when you try to generalize findings. This tutorial explains what selection bias is, how it affects the conclusions you can draw, and how random digit dialing can improve coverage.
Sampling is like tasting a pot of soup. Getting a little bit of each ingredient in the spoonful is like obtaining a representative sample for an experiment. But things can go wrong with the taste test, and that limits your ability to draw conclusions about the pot of soup as a whole.
Selection bias is also called undercoverage bias. It occurs when a significant subset of the population is left out of the sample. This is not necessarily intentional; rather, that subset was systematically missed by whoever was taking the sample.
Leading up to the 2008 New Hampshire presidential primary, almost every poll showed Barack Obama leading by at least 5 percentage points. All of these polls were based on random digit dialers calling a random sample of New Hampshire households. By all accounts, they were well-done surveys.
However, Clinton gained some support in the last few days. In particular, a lot of college students came out in support of Hillary Clinton in those final days, when people were expecting college students to come out in support of Obama.
Because a lot of the college students are from out of state, they aren't actually New Hampshire residents. For that reason, they were not counted in the polls. As a result, the polls predicted the wrong outcome, and Clinton ended up winning.
The New Hampshire primary polls used random digit dialers. Random digit dialing uses a machine to pick random phone numbers within selected area codes. The area code itself is not necessarily chosen at random, but once an area code is selected, the machine randomly selects the remaining digits and dials that phone number, after which the poll can be conducted.
The biggest advantage of random digit dialers is that they can reach mobile phones and unlisted numbers that you wouldn't be able to find in a phone book. This levels the playing field a bit, since anyone can be selected for the sample as long as their phone number is within that area code.
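To make the idea concrete, here is a minimal Python sketch of random digit dialing. It is only an illustration, not how any real polling firm's dialer works: the function name `random_digit_dial`, its parameters, and the output format are all made up for this example, and it ignores real-world rules about which digit combinations are valid phone numbers. The area code 603 is used simply because it covers New Hampshire.

```python
import random


def random_digit_dial(area_codes, n_numbers, seed=None):
    """Return n_numbers phone numbers built from chosen area codes plus random digits.

    Hypothetical helper for illustration only; real dialers apply
    extra rules about which numbers are valid.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(n_numbers):
        # The pollster supplies the area codes; one of them is used for each call.
        area = rng.choice(area_codes)
        # The remaining seven digits are drawn at random, so unlisted and
        # mobile numbers are just as likely to come up as listed ones.
        local = "".join(str(rng.randint(0, 9)) for _ in range(7))
        numbers.append(f"({area}) {local[:3]}-{local[3:]}")
    return numbers


sample = random_digit_dial(["603"], n_numbers=5, seed=0)
for number in sample:
    print(number)
```

Notice that no phone book is involved at any point. A number only has to fall within the chosen area code to have a chance of being dialed.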
Now how does selection bias affect what we think is in the soup?
Imagine that certain ingredients were located only in certain parts of the pot. Maybe the noodles sank to the bottom. If you took a taste only from the top, it wouldn't matter how big that taste was. Having missed the noodles, you wouldn't even know they were there.
Selection bias works the same way. Because you didn't select a representative group of ingredients from the population, you don't get the right idea of what's going on. That limits your ability to generalize your findings to the whole population.
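The soup example can be tried out with a short simulation. The sketch below is a toy model under made-up assumptions: a pot of 1,000 spoonfuls in which the noodles fill exactly the bottom 30%, and a taster who can only reach the top half. The names `pot`, `top_half`, and `noodle_share` are invented for this illustration.

```python
import random

rng = random.Random(0)

# Hypothetical pot: 1,000 spoonfuls ordered from top (index 0) to bottom.
# The noodles have sunk, so only the bottom 30% of the pot contains them.
pot = ["broth"] * 700 + ["noodle"] * 300


def noodle_share(spoonfuls):
    """Fraction of the tasted spoonfuls that are noodles."""
    return sum(s == "noodle" for s in spoonfuls) / len(spoonfuls)


top_half = pot[:500]  # the only part a top-only taster can reach

results = {}
for size in (10, 100, 500):
    top_only = noodle_share(rng.sample(top_half, size))   # biased taste
    whole_pot = noodle_share(rng.sample(pot, size))       # random taste
    results[size] = (top_only, whole_pot)
    print(f"taste size {size:>3}: top only = {top_only:.2f}, whole pot = {whole_pot:.2f}")
```

However large the taste from the top, the noodle share stays at zero, because the noodles are never within reach. A taste drawn at random from the whole pot tends toward the true share of 30% as the taste gets bigger. A larger sample does not fix undercoverage; only reaching the missing part of the population does.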
Selection bias occurs when some subset of the population is left out of the sample, whether intentionally or unintentionally. Because that part of the population is left out, coverage is lacking, which is why selection bias is also known as “undercoverage.” Random digit dialing is a useful tool because it helps extend coverage to mobile phones and unlisted numbers.
Source: This work is adapted from Sophia author Jonathan Osters.