Hi. This tutorial covers multiple regression. So let's just start with a little bit of motivation first. So at a school district, the base salary for teachers is $35,000 per year. Additionally, teachers receive an extra $750 per year for each complete year they have taught, and an extra $50 per year for each college credit they have earned after receiving their bachelor's degree.
So if a new teacher came in, had never taught before, and just has their bachelor's degree with no extra college credit, they would get the base salary. But say another teacher comes in with two years of experience and 10 college credits. Then they would receive some of this additional money on top of the $35,000.
So let's define some variables here. So we're going to let y equal teacher salary, x sub 1 represent the number of years of teaching experience, and x sub 2 represent the number of college credits earned after receiving the bachelor's degree-- so three different variables here. So now, since teacher salary depends on two variables-- both x1 and x2-- they both need to be part of the equation for determining y.
So really, in this case, what we have is one response variable and two explanatory variables. So this equation ends up being y equals 35,000 plus 750 times x1 plus 50 times x2. So suppose a teacher has accumulated nine years of teaching experience and 55 post-degree credits. How much will the teacher's salary be?
So basically, what we do is start with the base pay of $35,000, and then add $750 times the nine years of experience. Then we add $50 times the 55 credits that they've earned since their bachelor's degree. OK, so then what we'll do is go to the calculator and type all of this in: 35,000 plus 750 times 9 plus 50 times 55.
This will end up giving us the teacher's salary, which is $44,500. And this number was dependent on the two different variables: again, x1, the number of years of experience, and x2, the number of college credits. So this is just one example of an equation for y that depended on x1 and x2, two separate variables. So one response variable depended on two explanatory variables.
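To double-check the arithmetic, here is a minimal Python sketch of the salary equation from the example. The function and constant names are just illustrative choices; only the dollar amounts come from the example itself.

```python
# Dollar amounts from the school district example.
BASE_SALARY = 35_000     # base pay per year
PER_YEAR_TAUGHT = 750    # extra per complete year taught (x1)
PER_CREDIT = 50          # extra per post-bachelor's college credit (x2)


def teacher_salary(years_taught, credits):
    """Return y = 35,000 + 750 * x1 + 50 * x2."""
    return BASE_SALARY + PER_YEAR_TAUGHT * years_taught + PER_CREDIT * credits


print(teacher_salary(0, 0))    # brand-new teacher: 35000
print(teacher_salary(9, 55))   # 35000 + 6750 + 2750 = 44500
```

Changing either input changes the salary, which is what it means for one response variable to depend on two explanatory variables.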
All right, so since equations can be written using multiple explanatory variables, regression can be done using multiple explanatory variables. So if we have a set of data where a y value depends on multiple x values, we can also do regression. Multiple regression will often fit the data better than regression with a single explanatory variable, but the model is more complicated.
So multiple regression is the process of developing a modeling equation for a response variable using two or more explanatory variables. The explanatory variables used in multiple regression must be independent of each other. So whatever those x variables are, they have to be independent. A change in one can't affect how another one will change.
Now, the general form of a multiple regression equation is y hat equals b0 plus b1 times x1 plus b2 times x2 plus b3 times x3 plus dot, dot, dot. This pattern could continue for as many variables as you have. So if we just had three variables, x1, x2, and x3 would be your three variables, and b0 would still be the y-intercept, the predicted value when x1, x2, and x3 are all 0. And then b1, b2, and b3 would be the coefficients, so whatever you're multiplying each of your explanatory variables by.
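As a rough illustration of that general form, here is a short Python sketch that evaluates b0 plus each coefficient times its variable, for however many variables there are. The name predict and its parameters are hypothetical, chosen only for this example.

```python
def predict(b0, coefficients, x_values):
    """Evaluate y hat = b0 + b1*x1 + b2*x2 + ... for any number of variables."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one coefficient per explanatory variable")
    # Multiply each coefficient by its variable, add them up, then add b0.
    return b0 + sum(b * x for b, x in zip(coefficients, x_values))


# The teacher salary equation is the two-variable case.
print(predict(35_000, [750, 50], [9, 55]))   # 44500

# When every x is 0, only the intercept b0 is left.
print(predict(35_000, [750, 50], [0, 0]))    # 35000
```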
So we'll just look at a situation where multiple regression could be used. We won't actually calculate anything, but let's just make sure we understand that multiple regression could be used in this situation. So suppose a farmer's interested in predicting crop yield using a regression equation. The farmer knows that crop yield depends on amount of rainfall, amount of fertilizer used, and amount of pesticide use.
So basically, we would be trying to come up with a regression equation for y hat, where y hat represents the predicted crop yield. Now, that y hat value depends on three different things: x1, x2, and x3. x1 will represent amount of rainfall, x2 will represent amount of fertilizer used, and x3 will represent amount of pesticide use.
So using some advanced techniques that we're not going to get into here, what we'll end up having is a y hat equation where you have some sort of constant, b0, plus b1 times x1, so you'll get a value of b1 that you'll multiply by the amount of rainfall, plus b2 times x2, so again, some coefficient you're going to multiply by the amount of fertilizer used, plus b3 times x3, the amount of pesticide use.
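For the curious, here is one possible sketch of how those numbers can be found. It uses ordinary least squares solved through the normal equations, which is a common technique but an assumption here, since the tutorial does not say which method it has in mind. No crop data is given, so the sketch reuses the teacher salary equation to generate its data and checks that the fit recovers 35,000, 750, and 50. All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b using Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so each row operation updates both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are perfectly related")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, working from the last row up.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Return [b0, b1, b2, ...] that minimize the sum of squared errors."""
    # A leading 1 in every row lets the first coefficient act as the intercept b0.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# (years taught, credits) pairs; salaries are computed from the salary equation,
# so this is illustrative data rather than real measurements.
teachers = [(0, 0), (2, 10), (9, 55), (5, 30), (12, 0), (1, 40)]
salaries = [35_000 + 750 * x1 + 50 * x2 for x1, x2 in teachers]

b0, b1, b2 = fit_multiple_regression(teachers, salaries)
print(round(b0), round(b1), round(b2))   # 35000 750 50
```

The error raised in solve_linear_system is the extreme case of explanatory variables that are not independent: if one variable is an exact multiple of another, there is no single best set of coefficients to find.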
If you were to actually do a multiple regression, you would end up with numbers in for b0, b1, b2, and b3. Then you would just put in the specific values of your variables to get a predicted crop yield under those conditions. I'm not going to work through the calculation here because that's pretty advanced, but note that you would get an equation that looks like that, with numbers plugged in for the coefficients. All right, that's been your tutorial on multiple regression. Thanks for watching.