Sampling With or Without Replacement

Source: Playing Cards; Public Domain: http://www.jfitz.com/cards/

This tutorial is going to cover sampling, both with and without replacement. With replacement means that you put everything back once you've selected it. And without replacement means that each observation is not put back once it's selected. And so once it's selected, it's out. It can't be selected again.

So typically, one big requirement for statistical inference is that the individuals, the values from the sample, are independent. That is, one doesn't affect any of the others. And ideally, this would mean sampling with replacement.

Let's go through an example real quick involving cards. So the probability that you select a spade, something from this bottom row, on the first draw is one fourth. Now suppose that you draw it and you don't put it back. So I took the 10 of spades, and I pulled it out.

Now what's the probability of a spade on the second draw? There are only 12 spades left out of 51 cards. That's 12/51, about 0.235, which is not one fourth. So the first draw and the second draw are dependent. The probability of a spade changed once we knew that we got a spade on the first draw.

Now consider if the card got replaced though. The probability of a spade on the first draw is one fourth. And then you pull the 10 of spades. And then you put it back.

Now what's the probability of a spade on the second draw? Well it's one fourth again. It's the same 52 cards. And so you have the same likelihood of selecting a spade.
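The two card scenarios above can be checked with exact fractions. This is just a minimal sketch of the arithmetic; the function name `p_spade_second_draw` and its parameter are made up for illustration and assume a standard 52-card deck with 13 spades, where the first card drawn was a spade.

```python
from fractions import Fraction

def p_spade_second_draw(replace_first_card: bool) -> Fraction:
    """P(spade on draw 2), given that draw 1 was a spade (hypothetical helper)."""
    spades, cards = 13, 52
    if not replace_first_card:
        # The spade we drew stays out of the deck.
        spades -= 1
        cards -= 1
    return Fraction(spades, cards)

p_first = Fraction(13, 52)                       # one fourth
p_without = p_spade_second_draw(replace_first_card=False)
p_with = p_spade_second_draw(replace_first_card=True)

print("first draw:", p_first)
print("second draw, without replacement:", p_without, "=", round(float(p_without), 3))
print("second draw, with replacement:", p_with)
```

Without replacement the second-draw probability is 12/51, which is no longer one fourth. With replacement it matches the first draw exactly.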

Now in real life, we often don't sample with replacement. And this is a huge deal because typically sampling with replacement will lead to independence, which is a requirement for a lot of statistical analysis. But you wouldn't call a person for their opinion in a poll twice. So we don't put someone back into the population and see if we can sample them again. It just doesn't make sense to do in real life.

So we need a little bit of a workaround. Even though the sampling done in real life doesn't technically fit the definition of independent observations, there's a big but here.

So suppose that our population was very large. So suppose we had, instead of the 52 cards, four decks of cards, so 208 different cards. Now suppose the worst case scenario happened in terms of independence. And maybe every card we picked was the same suit. So we'll take five diamonds from the group. So we've selected five cards, all of which were diamonds.

The probability of a diamond on the first draw is one fourth, since there are 52 diamonds total out of 208 cards. That's the same as if there were one deck. But the larger population actually has an effect now. Look at the probability of a diamond on the fifth and last draw. Four diamonds are already gone, so it's 48 remaining diamonds out of 204 remaining cards.

The probability is about 0.24, which is different than 0.25, but not hugely. And this is even after five draws. And so we're going to say that the probability of a diamond didn't change particularly that much from the first to the last draw.
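To see how little the probability drifts with four decks, here is a rough sketch that lists the chance of a diamond on each of the five draws, assuming every earlier draw was also a diamond (the worst case described above). The helper `diamond_probabilities` is a hypothetical name, and the single-deck comparison is an added illustration, not part of the original example.

```python
from fractions import Fraction

def diamond_probabilities(decks: int, draws: int) -> list[Fraction]:
    """P(diamond) on each draw when all earlier draws were diamonds,
    sampling without replacement (hypothetical helper)."""
    diamonds = 13 * decks
    cards = 52 * decks
    # Before draw k+1, k diamonds have already been removed.
    return [Fraction(diamonds - k, cards - k) for k in range(draws)]

four_decks = diamond_probabilities(decks=4, draws=5)
one_deck = diamond_probabilities(decks=1, draws=5)

for i, (p4, p1) in enumerate(zip(four_decks, one_deck), start=1):
    print(f"draw {i}: four decks {float(p4):.4f}, one deck {float(p1):.4f}")
```

With four decks the last value is 48/204, still close to one fourth. With a single deck the same five draws push the probability down to 9/48, a much bigger shift.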

So we're going to have sort of a catch for independence. So when we sample without replacement, if the population is large enough, then the probabilities don't shift very much as we sample. And so the sampling without replacement becomes almost independent because the probabilities don't change very much.

Now the question is, it says when the population is large enough, and that's not a very well-defined term. How large is considered a large population? What we're going to do is we're going to institute a rule. A large population is going to be at least 10 times larger than the sample. So the population size is greater than or equal to 10 times n, the sample size.

And if that's the case, then we're going to say that the probabilities don't shift very much when you sample several items, n items from the population. And so therefore, we can treat the sampling as being almost independent.
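The rule is simple enough to write as a one-line check. This is only a sketch; `is_large_enough` is an illustrative name, and the population and sample sizes in the examples are assumptions chosen to match the card decks discussed earlier.

```python
def is_large_enough(population_size: int, sample_size: int) -> bool:
    """Rule of thumb from the tutorial: population >= 10 * n (hypothetical helper)."""
    return population_size >= 10 * sample_size

# Five cards from four decks (208 cards): 208 >= 50, so the rule is met.
print(is_large_enough(208, 5))

# Ten cards from one deck (52 cards): 52 < 100, so the rule is not met.
print(is_large_enough(52, 10))
```

When the check passes, the draws can be treated as almost independent even though nothing is put back.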

So to recap, sampling with replacement is kind of the gold standard. It always creates independent trials. So the probabilities of particular events don't change at all from trial to trial. In real life, though, we usually sample without replacement, and then the probabilities necessarily do change. Our workaround is that if the population from which we're sampling is at least 10 times larger than the sample that we're drawing, the trials can be considered nearly independent.

And so we talked about sampling with replacement and sampling without replacement and how independence relates to those two. Sampling with replacement will guarantee independence. Sampling without replacement doesn't guarantee independence, but it gets close enough when the population is at least 10 times larger than the sample.

Good luck. And we'll see you next time.