Dan Laub here. In this lesson, we're going to talk about setting up an experiment. Before we get started, let's go over a quick refresher on some material from previous lessons.
The first thing I want to remind you of is that there are two different types of hypotheses: a null hypothesis and an alternative hypothesis. The purpose of these hypotheses is to set up a test, an experiment, using the experimental method.
Once we develop the hypotheses and make a prediction based on the alternative hypothesis, the next step in the process is to test that prediction. Basically, we're going to determine whether the results of the experiment lead us to reject the null hypothesis or to fail to reject it.
To test the hypothesis, we need to gather data, and it needs to be accurate data. From this data, we can judge whether a cause-and-effect relationship exists between the two variables in question.
Let's use a simple example. We're going to conduct an experiment to see how long it takes to have a pizza delivered.
Let's say our null hypothesis states that the average pizza delivery takes 40 minutes or less, that is, equal to or less than 40 minutes. The alternative hypothesis tests the possibility that the average delivery actually takes more than 40 minutes. So how would we go about testing this hypothesis? How would we look at data in order to decide? That's what I want to get into for the remainder of this lesson.
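As a rough illustration that goes beyond what this lesson covers, the pizza hypotheses can be written as H0: μ ≤ 40 versus Ha: μ > 40, where μ is the mean delivery time in minutes, and checked with a one-sided, one-sample t-test, which is one common choice for this kind of question. This is a minimal Python sketch under stated assumptions: the ten delivery times are made-up numbers, and the cutoff 1.833 is the standard-table one-sided 5% critical value for 9 degrees of freedom.

```python
import math
import statistics


def one_sided_t_statistic(sample, mu0):
    """t statistic for H0: mean <= mu0 versus Ha: mean > mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical delivery times in minutes (made-up illustration data).
delivery_minutes = [38, 45, 42, 50, 39, 44, 47, 41, 43, 46]

t_stat = one_sided_t_statistic(delivery_minutes, mu0=40)

# One-sided 5% critical value for n - 1 = 9 degrees of freedom, from a standard t table.
T_CRITICAL_DF9 = 1.833
decision = "reject H0" if t_stat > T_CRITICAL_DF9 else "fail to reject H0"
print(f"t = {t_stat:.2f}: {decision}")  # prints "t = 3.00: reject H0"
```

With these hypothetical times the sample mean is 43.5 minutes and t works out to 3.00, above the cutoff, so we would reject the null hypothesis; a different sample could just as easily lead us to fail to reject it.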
As a quick review, remember explanatory variables and response variables. An explanatory variable is essentially what we alter in order to see how the response variable changes.
So suppose we were to do an experiment on pizza delivery time. In this case, the response variable might be how long it takes a customer to get their pizza. The explanatory variable could be any of a variety of factors, but let's choose one in particular: how many delivery drivers are working at a given point in time.
In a case like this, we're going to run an experiment in which we record how long it takes each customer to get their pizza and how many drivers are working at that time. We'll set up a null and an alternative hypothesis, run the experiment, and see whether we can determine if there's any causal relationship between the number of drivers working at a given point in time and how long it takes the customer to get their order.
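To make the two roles concrete, here is a small Python sketch, not taken from the lesson, that groups a response variable (delivery time) by an explanatory variable (number of drivers working). The observations and the function name mean_response_by_level are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
import statistics

# Hypothetical observations: (drivers working, delivery time in minutes). Made-up data.
observations = [(2, 52), (2, 48), (3, 44), (3, 41), (4, 37), (4, 39)]


def mean_response_by_level(records):
    """Average the response (delivery time) at each level of the explanatory variable (drivers)."""
    by_level = defaultdict(list)
    for drivers, minutes in records:
        by_level[drivers].append(minutes)
    return {level: statistics.mean(times) for level, times in sorted(by_level.items())}


print(mean_response_by_level(observations))  # {2: 50, 3: 42.5, 4: 38}
```

In an actual experiment, the number of drivers would be set by the experimenter rather than merely recorded, which is what supports a cause-and-effect conclusion.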
Let's say we have a simple experiment. We're going to randomly pick 100 people and ask each of them to perform one of two specific tasks. In the first, we take them to a basketball court, put them at the free throw line, hand them a basketball, and ask them to shoot 10 free throws. In the second, we sit them down at a desk, give them a quiz, and ask them to name all 50 state capitals in the United States.
Now, obviously, these are two very different tasks requiring different skill sets. Some people might be good at one, some might be good at the other, and some might be good at neither or both.
The idea here is that the assignment is random, so chance alone decides whether somebody gets one task or the other. If we assign people randomly, we're in all likelihood going to wind up with roughly a 50/50 split, meaning about 50 people doing one task and about 50 doing the other.
The people doing a given task won't necessarily be the ones who are good at it; each group is just a random set of people. That gives us a better sense of how well the larger group we're trying to learn about performs at one task or the other.
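Here is a minimal Python sketch of random assignment, assuming each of 100 hypothetical participants is assigned by an independent fair coin flip. That gives roughly, not exactly, a 50/50 split; the second function shows a common alternative (shuffle, then split in half) that guarantees exactly 50 per task. The function names and participant labels are illustrative, not from the lesson.

```python
import random

TASKS = ["free throws", "state capitals"]


def randomly_assign(people, seed=None):
    """Assign each person to one of two tasks by an independent fair coin flip."""
    rng = random.Random(seed)
    assignment = {task: [] for task in TASKS}
    for person in people:
        assignment[rng.choice(TASKS)].append(person)
    return assignment


def shuffle_and_split(people, seed=None):
    """Shuffle everyone, then give the first half one task and the second half the other."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {TASKS[0]: shuffled[:half], TASKS[1]: shuffled[half:]}


participants = [f"person_{i}" for i in range(1, 101)]  # 100 hypothetical participants

groups = randomly_assign(participants, seed=1)
for task, members in groups.items():
    print(task, len(members))  # roughly 50 each, varies with the seed

exact = shuffle_and_split(participants, seed=1)
print({task: len(members) for task, members in exact.items()})  # exactly 50 each
```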
We might have a different situation where we allow people to choose which task they want to do. Those who are athletic might say, well, I'm going to take a shot at the free throws and see how well I do. And those who are good at geography, or who might not be very athletic, might say, let's take our chances with naming the 50 state capitals.
What's going to happen is we're going to wind up with two distinct situations. If we assign people randomly, we're going to wind up with a split relatively close to 50/50. If we allow people to self-select, as it's called, we could wind up with a very different split, and each group will tend to be made up of people who expect to do well at the task they chose.
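The difference between the two situations can be simulated. In this Python sketch, which rests on our own assumptions rather than on the lesson, each of 100 hypothetical people gets an athletic score and a geography score drawn uniformly between 0 and 1, and self-selection is modeled as everyone picking the task they are better at.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skill scores between 0 and 1 (assumed independent and uniform).
people = [{"athletic": rng.random(), "geography": rng.random()} for _ in range(100)]


def mean_athletic(group):
    """Average athletic score of a group of people."""
    return statistics.mean(p["athletic"] for p in group)


# Random assignment: shuffle, then split in half.
shuffled = people[:]
rng.shuffle(shuffled)
random_ft, random_caps = shuffled[:50], shuffled[50:]

# Self-selection (assumed rule): each person picks the task they are better at.
self_ft = [p for p in people if p["athletic"] > p["geography"]]
self_caps = [p for p in people if p["athletic"] <= p["geography"]]

print("random:", len(random_ft), len(random_caps),
      round(mean_athletic(random_ft), 2), round(mean_athletic(random_caps), 2))
print("self-selected:", len(self_ft), len(self_caps),
      round(mean_athletic(self_ft), 2), round(mean_athletic(self_caps), 2))
```

Under random assignment the two groups should have similar average athletic scores; under self-selection the free-throw group's average is noticeably higher, so any comparison between the groups would partly reflect who chose each task rather than the tasks themselves.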
So the idea here is pretty clear. We use random assignment so that each group better mirrors the overall population we're drawing from, which in this case might simply be Americans. We also want to be able to replicate the experiment, that is, do it over again.
Next, let's talk a little bit about controlling variables. Once we have groups within an experiment that have been randomly chosen, we need to make sure that one group does not get manipulated.
What do we mean by manipulated? It means something alters that group in a way that could affect the outcome, in this case the response variable.
The idea of a control group is that it does not receive the treatment being tested, and we control everything else about it, making sure the conditions its members are exposed to don't change over the course of the experiment. The goal is to make sure we're not influencing the response variable.
When we control for other variables, the people conducting the experiment make sure that only the explanatory variable, and no other variable, is influencing the response variable. There might be more than one variable that could affect the response variable, and the idea of a control group and of controlling variables is to eliminate those other influences, so that we're considering only the one explanatory variable we're studying.
To give you a realistic example, pharmaceutical companies periodically run trials of new experimental medicines. They will often split participants into a control group and a test group, and the idea is pretty simple.
For the control group, everything stays the same except that its members are not getting the drug in question. They're getting what's called a placebo, whereas the test group actually receives the drug. Using the experimental method in a case like this lets researchers see what effect the drug actually has on, for instance, an underlying medical condition.
Now, for a different example, suppose a botanist is growing plants under different conditions and wants to try out a new type of fertilizer. How would they show that the fertilizer has any impact on plant growth whatsoever?
Well, they could take two groups of equal plants, same size and same species, and give one group fertilizer in its water, while the other group gets just water with no fertilizer added. Both groups get the same amount of water, at the same time of day, at the same rate. After a period of time, they could see whether the plants grew better with fertilizer.
Now, it's possible another variable influences the result. What if, for instance, some of the plants were exposed to more sunlight than others? Some might sit in the shade while others are in the sun 8 to 10 hours a day. That could obviously affect how fast the plants grow.
And so the idea here is we want to try to keep that third variable, in this case, sunlight, the same throughout the experiment. So we put all the plants together in the same location where they're going to be subject to the same amount of sunlight per day.
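The sunlight problem can be illustrated with an idealized, noise-free growth model. In this Python sketch, the formula and both effect sizes are invented assumptions: fertilizer adds 1.5 cm and each daily hour of sun adds 0.5 cm. Holding sunlight at 8 hours for every plant recovers the fertilizer effect; letting the fertilized plants get more sun than the others inflates it.

```python
import statistics

BASE_GROWTH_CM = 2.0           # hypothetical growth with no fertilizer and no sun
FERTILIZER_EFFECT_CM = 1.5     # hypothetical true effect of fertilizer
SUN_EFFECT_CM_PER_HOUR = 0.5   # hypothetical effect of each daily hour of sunlight


def growth_cm(fertilized: bool, sun_hours: float) -> float:
    """Idealized, noise-free growth model (an assumption for illustration)."""
    # A bool counts as 1 or 0 in arithmetic, switching the fertilizer term on or off.
    return BASE_GROWTH_CM + FERTILIZER_EFFECT_CM * fertilized + SUN_EFFECT_CM_PER_HOUR * sun_hours


def estimated_effect(fert_sun_hours, water_sun_hours):
    """Difference in mean growth: fertilized plants minus water-only plants."""
    fert = [growth_cm(True, h) for h in fert_sun_hours]
    water = [growth_cm(False, h) for h in water_sun_hours]
    return statistics.mean(fert) - statistics.mean(water)


# Sunlight controlled: every plant gets 8 hours a day.
controlled = estimated_effect([8] * 10, [8] * 10)
# Sunlight not controlled: fertilized plants get 10 hours, the others only 4.
confounded = estimated_effect([10] * 10, [4] * 10)
print(controlled, confounded)  # 1.5 4.5
```

Here the controlled comparison gives 1.5 cm, the true effect built into the model, while the confounded one gives 4.5 cm, because 3 of those centimeters come from the extra 6 hours of sun rather than from the fertilizer.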
Now that we've set up an experiment and discussed random assignment and the idea of a control group, let's talk about the concept of replication. The reason we conduct multiple tests, or replicate a test, is to make sure the results are repeatable: one experimenter could run the test with one group of people or things, and somebody else could replicate the same experiment using a completely different group of people or things.
We want to see whether the tests show similar results, meaning whether the results one experimenter gets match what somebody else gets when running the test on a different group. If the tests show very different results, something might be wrong with the experiment, and we may need to go back and re-evaluate how we designed it in the first place.
So the key reason behind replicating an experiment is to provide evidence that its results are valid rather than a one-off.
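Replication can also be sketched in simulation. This Python example assumes a hypothetical fertilizer experiment with a true effect of 1.5 cm and normally distributed natural variation between plants; each seed stands in for a separate run with a fresh set of plants. When the design is sound, the replicated estimates should land close to one another.

```python
import random
import statistics

TRUE_EFFECT_CM = 1.5  # hypothetical fertilizer effect built into the simulation


def run_experiment(seed, plants_per_group=20, noise_sd=0.5):
    """One hypothetical run: fresh plants, same design, random plant-to-plant variation."""
    rng = random.Random(seed)
    fert = [6.0 + TRUE_EFFECT_CM + rng.gauss(0, noise_sd) for _ in range(plants_per_group)]
    water = [6.0 + rng.gauss(0, noise_sd) for _ in range(plants_per_group)]
    return statistics.mean(fert) - statistics.mean(water)


# Five independent replications, e.g. by different experimenters with different plants.
estimates = [run_experiment(seed) for seed in range(5)]
print([round(e, 2) for e in estimates])

spread = max(estimates) - min(estimates)
print(f"spread between replications: {spread:.2f} cm")
```

If independent runs of the same design produced estimates that were far apart, that would be a signal to revisit the experiment's design, as described above.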
So again, my name is Dan Laub. And I hope you got some value from today's lesson.