Author:
Christine Farr

Every hypothesis test should state the hypotheses, the test statistic, the p-value or critical value, the decision, and the conclusion. Please keep listings of computer output and the use of appendices to a minimum when reporting your results. Summarize each regression model by displaying the regression equation, the coefficients and their standard errors, and the usual summary statistics such as the standard error of the regression (S), R-square and R-square(adj).
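For reference, the summary statistics requested above are conventionally defined as follows. The notation is an assumption introduced here, not taken from the assignment: n is the number of observations, k the number of predictors, SSE the error sum of squares, SST the total sum of squares, and b_j the estimated coefficient of the j-th predictor.

```latex
\begin{align*}
S &= \sqrt{\frac{\mathrm{SSE}}{n-k-1}}, \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \\
R^2_{\mathrm{adj}} &= 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)}, \\
t_j &= \frac{b_j}{\mathrm{SE}(b_j)} \quad \text{for } H_0\colon \beta_j = 0,
      \text{ with } n-k-1 \text{ degrees of freedom.}
\end{align*}
```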

A policy analyst for the Ontario Ministry of Education wanted to determine what relationships between income and the aggregate level of education might be used to encourage students to stay in school. Although there were potential problems with interpreting relationships based on aggregate data, she decided to begin with data from the 2011 National Household Survey.

She collected data for the 1075 census tracts in the Toronto area, took a random sample of 250 observations, and compiled a dataset with the following variables:

CensusT: identifying code for the census tract

P_hsgrad: the proportion of adults with high school graduation

P_trades: the proportion of adults with qualifications in a trade

P_collcert: the proportion of adults with a college certificate

P_univdipl: the proportion of adults with a university diploma (no degree)

P_univdegr: the proportion of adults with a university degree

MedInc: the median employment income for individuals over 15 years of age

AvgInc: the average employment income for individuals over 15 years of age

MedInc*: the median employment income, with missing values

Note that each proportion counts only those adults whose highest level of education is the one indicated, so the categories are mutually exclusive. The data are in the files toronto.mtw and toronto.xlsx.
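The sampling step described earlier (250 observations out of 1075 census tracts) can be sketched as below. This is only an illustration: it assumes simple random sampling without replacement, which the text does not spell out, and the function name, the seed, and the use of integer positions in place of real CensusT codes are all hypothetical.

```python
import random

N_TRACTS = 1075     # census tracts in the Toronto area (from the text)
SAMPLE_SIZE = 250   # observations in the random sample (from the text)

def draw_sample(n_tracts=N_TRACTS, k=SAMPLE_SIZE, seed=2011):
    """Return k distinct tract positions, drawn without replacement."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    return sorted(rng.sample(range(n_tracts), k))

sample_idx = draw_sample()
print(len(sample_idx), "of", N_TRACTS, "tracts sampled")
```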

(a) Plot the average incomes against the median incomes. What two words would best describe the shape of income distributions in general?

(b) Perform a multiple regression analysis using the five educational variables as predictor variables and the median income (MedInc) as the response variable.

(c) For the regression model in (b), graph the standardized residuals against the fitted values and comment on whether the linear regression model assumptions are warranted.
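For readers working outside Minitab, the quantities needed in (b) and (c) could be computed along the following lines. This is a minimal sketch, not the assignment's prescribed method: the data are synthetic stand-ins (made-up coefficients and noise) rather than the contents of toronto.xlsx, and the helper name fit_ols is hypothetical. Only the variable names and the sample size of 250 come from the text.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X, with an intercept added."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # coefficient estimates
    fitted = A @ beta
    resid = y - fitted
    df = n - k - 1                                  # error degrees of freedom
    sse = float(resid @ resid)                      # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())        # total sum of squares
    s = np.sqrt(sse / df)                           # standard error of the regression
    xtx_inv = np.linalg.inv(A.T @ A)
    se = s * np.sqrt(np.diag(xtx_inv))              # coefficient standard errors
    leverage = np.einsum("ij,jk,ik->i", A, xtx_inv, A)  # diagonal of the hat matrix
    std_resid = resid / (s * np.sqrt(1.0 - leverage))   # standardized residuals
    return {
        "coef": beta, "se": se, "t": beta / se, "s": s,
        "r2": 1 - sse / sst,
        "r2_adj": 1 - (sse / df) / (sst / (n - 1)),
        "fitted": fitted, "std_resid": std_resid, "leverage": leverage,
    }

# Synthetic stand-in data: NOT the Toronto sample.
rng = np.random.default_rng(0)
n = 250
names = ["P_hsgrad", "P_trades", "P_collcert", "P_univdipl", "P_univdegr"]
# Six mutually exclusive shares per tract; the sixth (none of the listed
# qualifications) is dropped, so the five predictors do not sum to one.
X = rng.dirichlet(np.ones(6), size=n)[:, :5]
made_up_beta = np.array([10000.0, 15000.0, 20000.0, 25000.0, 60000.0])
y = 20000.0 + X @ made_up_beta + rng.normal(0.0, 4000.0, n)

fit = fit_ols(X, y)
for name, b, se in zip(["Constant"] + names, fit["coef"], fit["se"]):
    print(f"{name:<12s} coef = {b:12.1f}   SE = {se:10.1f}")
print(f"S = {fit['s']:.1f}   R-sq = {fit['r2']:.3f}   R-sq(adj) = {fit['r2_adj']:.3f}")
```

A scatter plot of fit["std_resid"] against fit["fitted"] then gives the graph asked for in (c).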
