Hi, this tutorial covers sampling with or without replacement. So let's start with some motivation. So suppose you're interested in the average amount of money an employee in a large office building spends on parking per month. So let's think about what our research goal would be.
So what I would propose is that our research goal is to estimate mu. OK, so now, let's make sure we define mu. So mu represents the mean parking cost for the employee population. So we're trying to estimate a population mean.
All right, so to estimate mu, a sample must be taken and a sample mean must be calculated. So when estimating a population parameter using a sample statistic, which we would do in this situation, that's called inference. So it's important that the sample measurements are independent. So when we're calculating that sample statistic, we need to make sure that the measurements we make in our sample are independent.
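To make the difference between the parameter and the statistic concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population of 5,000 employees, the cost range of 0 to 200 dollars, and the sample size of 100 are all made up, and in a real study mu would not be known.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly parking cost in dollars for 5,000 employees.
population = [round(random.uniform(0, 200), 2) for _ in range(5000)]

# mu is the population parameter. We can compute it here only because
# the population is simulated; in practice it is unknown.
mu = statistics.mean(population)

# Take a random sample of 100 employees and compute the sample mean,
# which is the statistic used to estimate mu.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"population mean mu = {mu:.2f}")
print(f"sample mean x-bar  = {x_bar:.2f}")
```

The sample mean will not match mu exactly, but it should land reasonably close to it, and that is the idea behind inference.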
So ideally, a member of a population would be randomly selected and measured. This process would then be repeated. So if we think about that process, we want to think about what issue may arise. And the issue that might arise is that a member of the population may be sampled more than once.
So thinking about our office example, if we were just to do a random sample, it's possible that the same person might be sampled more than once, and that might throw off our results a little bit. So really, what we're talking about here are the two different types of sampling. We're looking at sampling with replacement and sampling without replacement.
So sampling with replacement is a method of sampling where an item may be sampled more than once. Now, this is the important part. Sampling with replacement generally produces independent events, which remember, that's what we want. That's what we're dealing with. We want to have independent observations when we're doing an inference.
Now, sampling without replacement is a method of sampling where an item may not be sampled more than once. Sampling without replacement generally produces dependent events. So we have a little bit of an issue here. Sampling with replacement produces independent events, sampling without replacement produces dependent events.
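The two methods can be sketched with Python's standard `random` module: `random.choices` draws with replacement and `random.sample` draws without replacement. The list of 20 employee IDs is a made-up example, not data from the tutorial.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical employee IDs 1 through 20.
employees = list(range(1, 21))

# With replacement: each pick goes back into the pool,
# so the same ID may show up more than once.
with_replacement = random.choices(employees, k=10)

# Without replacement: a picked ID is removed from the pool,
# so every ID in the result is distinct.
without_replacement = random.sample(employees, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("distinct IDs without replacement:", len(set(without_replacement)))
```

The last line always reports 10 distinct IDs for the sample drawn without replacement, while the sample drawn with replacement may or may not contain repeats.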
So should the population of employees in question be sampled with or without replacement? Now, sampling without replacement produces dependent events. So the good thing about sampling without replacement is that you're not going to get those repeated measurements. The bad thing is that we're dealing with dependent events.
Now, sampling with replacement allows members to be sampled more than once. So when we're dealing with replacement, the issue here is that we might have repeated observations, but the good thing is that we're dealing with independent events. So we do have a little bit of an issue here.
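To get a feel for how often repeats happen when sampling with replacement, here is a small sketch that computes the probability that at least one member is picked more than once. The function name `prob_repeat` and the numbers used (500 employees, a sample of 50) are assumptions for illustration only.

```python
def prob_repeat(pop_size, sample_size):
    """Probability that at least one member is drawn more than once
    when sample_size draws are made with replacement from pop_size members."""
    p_no_repeat = 1.0
    for i in range(sample_size):
        # The (i + 1)-th draw must avoid the i members already drawn.
        p_no_repeat *= (pop_size - i) / pop_size
    return 1.0 - p_no_repeat

# Hypothetical office: 500 employees, sample of 50 drawn with replacement.
print(prob_repeat(500, 50))
```

This is the same calculation as the classic birthday problem, so repeats are more likely than intuition might suggest.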
Luckily, we have some good news. If a large enough sample is taken, and the sample size is no more than 10% of the population, sampling without replacement can be used. So this works because, when the sample is a small fraction of the population, removing one member barely changes the probabilities for the next draw, and the events can be treated as independent. So sampling without replacement alleviates the issue of repeated observations. And again, if we have a large enough sample, and sample no more than 10% of the population, we can get events that are almost independent.
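Here is a rough sketch of why the probabilities do not change much in a large population. It assumes, purely for illustration, that 30% of employees are "high payers", and it compares a population of 10 with a population of 1,000. The function name `second_draw_prob` is hypothetical.

```python
from fractions import Fraction

def second_draw_prob(pop_size, num_high):
    """P(second person drawn is a high payer, given the first one was),
    when sampling without replacement."""
    return Fraction(num_high - 1, pop_size - 1)

# Both hypothetical populations start with 30% high payers,
# so with replacement the probability would stay at 0.3 on every draw.
small = second_draw_prob(10, 3)       # 2/9
large = second_draw_prob(1000, 300)   # 299/999

print("small population:", small, "=", float(small))
print("large population:", large, "=", float(large))
```

In the small population the probability drops from 0.3 to 2/9, which is about 0.222, so the draws are clearly dependent. In the large population it only moves to 299/999, which is about 0.299, so treating the draws as independent costs very little.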
All right, so going back to our example, to estimate the average monthly parking cost for this population, the employees should be sampled without replacement as long as, again, your sample size is large, and the sample size is no more than 10% of the population size. So again, sampling without replacement can be used in this example as long as these two conditions are satisfied. All right, this has been your tutorial on sampling with or without replacement. Thanks for watching.