Hi. This tutorial covers explanatory and response variables. So let's start with a data set. This is a bivariate data set: it has both an x and a y, and each x value is logically paired with a y value. What they represent is this: x is the age of a runner, and y is the average time in minutes for a runner of that age to complete a certain marathon.
So basically, what they did was take all of the marathon runners that were 15 and find the average time that they took to complete the marathon. That was 302.38 minutes. For all those that were 25, they averaged all of their times together, and it was 193.63 minutes.
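As a rough sketch of that averaging step, the snippet below groups finishing times by age and takes the mean of each group. The individual times in `raw_times` are invented purely for illustration (the tutorial only gives the averages), and the names `raw_times` and `average_time_by_age` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (age, finishing time in minutes).
# These individual times are made up for illustration only;
# they are NOT the runners behind the tutorial's averages.
raw_times = [
    (15, 310.0), (15, 295.0),
    (25, 190.0), (25, 200.0), (25, 195.0),
]

def average_time_by_age(records):
    """Group finishing times by age and return {age: mean time}."""
    grouped = defaultdict(list)
    for age, minutes in records:
        grouped[age].append(minutes)
    return {age: mean(times) for age, times in grouped.items()}

averages = average_time_by_age(raw_times)
print(averages)
```

Each key of the result is one x value (an age) and each value is the matching y value (an average time), which is the pairing the data set uses.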
So we're going to come back to this data set in a minute. Let's make sure we have a good working definition of what an explanatory variable is and what a response variable is. So an explanatory variable is the variable that causes an effect-- sometimes called the independent variable. And the response variable is the variable that reflects the effect-- sometimes called the dependent variable.
On a scatter plot, the explanatory variable labels the x-axis and the response variable labels the y-axis. So basically, whatever your x variable is, that's going to be your explanatory variable. Your y variable will be your response variable. So again, if we go back to our data set and we have to pick which of these two is the response and which is the explanatory variable, it seems that age has an effect on time.
We can definitely see that somebody that's 65 runs the marathon at a much slower rate than the 25 to 35-year-olds. We can also see the 15-year-olds, probably because they're not as experienced running, they don't have the same muscle mass, or whatever will affect their running ability-- but you can see that their time is a lot higher than somebody a little bit older.
So I would say that, if we had to pick which was the explanatory and which was the response, age of runner would be the explanatory variable and then the average time would be the response variable. So let's go ahead now and make a scatter plot of the data. So I've reproduced the data here, and then we're going to make a scatter plot right below it-- again, putting the explanatory variable on the x-axis and the response variable on the y-axis.
Let's start with the x-axis. So I'm going to start my x-axis down here. I'll start it at 0, and I need to get up to 65, so I think what I'll do is label it by 20's, with marks at the 10's in between. All right, so that'll be 0, 20, 40, 60, 80. Again, this is my x-axis, so this is where my explanatory variable goes. And let me just put in a couple more marks here.
OK, and now, for my y-axis, I need to get up to 302 at least. And I probably don't need to start until maybe about 180, so I don't need to start at 0. So what I'm going to do to reflect that is put in this little squiggle mark. Now, this squiggle there just represents a jump. It means it doesn't start at 0. It represents a jump in the numbers.
So let's go up. And I'll start here at 180, and then what I'll probably do is count by 40's-- so 180, 220, 260, 300, 340. And let me just mark between them also. So now let's go ahead and start plotting the points.
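The axis choices just described can be written down as a small sketch. The helper `axis_ticks` is a hypothetical name, and the start, stop, and step values are the ones spoken in the tutorial: 0 to 80 by 20 for age, 180 to 340 by 40 for time.

```python
def axis_ticks(start, stop, step):
    """Return evenly spaced tick values from start to stop, inclusive."""
    return list(range(start, stop + 1, step))

x_ticks = axis_ticks(0, 80, 20)     # age axis, starting at 0
y_ticks = axis_ticks(180, 340, 40)  # time axis, starting at 180 (after the squiggle)

# Largest values the axes must reach, as stated in the tutorial.
max_age, max_time = 65, 302.38
assert x_ticks[-1] >= max_age
assert y_ticks[-1] >= max_time

print(x_ticks)  # [0, 20, 40, 60, 80]
print(y_ticks)  # [180, 220, 260, 300, 340]
```

Starting the y-axis at 180 instead of 0 is what the squiggle mark signals on the hand-drawn plot. It keeps the points from being squeezed into the top of the picture.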
So let's start with 15 and 302. So 15 is going to be-- this is 10, 20, so 15 is here and 302 will be about here. 25 and 193.63-- that's going to be about here-- 25 about here. 35 and about 185-- that's going to be closer to here. 45 and 198-- be a little bit higher, about here. 55 and 224.3-- so it should be about here. And 65 and 288.71-- so 288 will be about here.
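The plotting just done by hand can be sketched as a plain-text scatter plot. This is only an illustration, not the method used in the video: the grid resolution (one column per 5 years, one row per 20 minutes) and the name `grid_position` are assumptions, and the times for ages 35 and 45 are the rounded figures spoken aloud.

```python
# (age, average time in minutes) as read out in the tutorial.
# The values for ages 35 and 45 are the rounded figures spoken aloud.
data = [(15, 302.38), (25, 193.63), (35, 185.0),
        (45, 198.0), (55, 224.3), (65, 288.71)]

X_MIN, X_MAX, X_STEP = 0, 80, 5      # assumed: one column per 5 years
Y_MIN, Y_MAX, Y_STEP = 180, 340, 20  # assumed: one row per 20 minutes

def grid_position(age, minutes):
    """Map a data point to (row, column) on the character grid."""
    col = round((age - X_MIN) / X_STEP)
    row = round((minutes - Y_MIN) / Y_STEP)
    return row, col

n_cols = (X_MAX - X_MIN) // X_STEP + 1
n_rows = (Y_MAX - Y_MIN) // Y_STEP + 1
grid = [[" "] * n_cols for _ in range(n_rows)]
for age, minutes in data:
    row, col = grid_position(age, minutes)
    grid[row][col] = "*"

# Print the highest time first so the y-axis increases upward.
for row in range(n_rows - 1, -1, -1):
    print(f"{Y_MIN + row * Y_STEP:>4} |" + "".join(grid[row]))
print("     +" + "-" * n_cols)
```

The explanatory variable (age) runs left to right and the response variable (time) runs bottom to top, matching the hand-drawn plot. A plotting library such as matplotlib would normally be used for a real figure; the text grid just keeps the sketch dependency-free.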
All right, so that gives us a pretty clear picture of the data-- again, age as the explanatory variable, time as the response variable-- so again, explanatory variable on the x-axis and response variable on the y-axis. And we can also see that there seems to be almost a parabola shape. So as your x is increasing, the y starts pretty high, goes lower, and then goes high again.
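One simple way to check that high-low-high pattern, sketched under the assumption that the spoken values are used as-is (the times for ages 35 and 45 are rounded), is to find the lowest average time and confirm that the times fall before it and rise after it. This checks only for a U shape; it does not fit an actual parabola.

```python
# (age, average time in minutes) as read out in the tutorial.
# The values for ages 35 and 45 are the rounded figures spoken aloud.
data = [(15, 302.38), (25, 193.63), (35, 185.0),
        (45, 198.0), (55, 224.3), (65, 288.71)]

times = [minutes for _, minutes in data]
lowest = times.index(min(times))  # position of the fastest average time

# U shape: times strictly decrease up to the minimum, then strictly increase.
falls = all(a > b for a, b in zip(times[:lowest + 1], times[1:lowest + 1]))
rises = all(a < b for a, b in zip(times[lowest:], times[lowest + 1:]))

print("Fastest age group:", data[lowest][0])
print("U-shaped:", falls and rises)
```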
So this picture gives us a pretty good indication of how y is changing in response to how x is changing. And the last thing we'll do here is examine the following pairs of variables and determine which is explanatory and which is response. So let's start with the first pair here, maximum daily temperature and cooling cost. So what causes the effect and what reflects the effect?
So I would say, in this case, cooling cost is going to-- or excuse me-- maximum daily temperature is going to cause a change in cooling cost. So the higher the temperature, the more it's going to cost to cool your apartment or your workspace. So what I would say here is that temperature is explanatory-- I'll use exp for explanatory-- and your cost here is going to be your response.
The next one-- rent and square footage of an apartment. So does the rent cause a change in the square footage of an apartment, or does the square footage cause a change in the rent? And I would say that your square footage is your explanatory variable. Generally, as the square footage goes up, the rent of the apartment will also go up. So the bigger the apartment, the more expensive it's going to be. So then the rent is what's going to reflect the effect, so the rent is the response variable.
And now, the last pair, SAT verbal score and SAT math score. Now, I would say that these are probably somewhat associated-- generally, somebody that does well on the verbal might also do well on the math-- but I don't know if one will cause a change in the other. So in this case, let's just say unknown.
And really, if we wanted to know the relationship between the two, we might need to do some sort of controlled experiment to figure out if one does seem to cause a change in the other. But right now, it's-- I wouldn't be comfortable assigning one as explanatory and one as response. So those are just a couple examples of thinking about which one causes a change in the other. All right, so that has been your tutorial on explanatory and response variables. Thanks for watching.