Author:
Christine Farr

Problem 1

In the dataset Smoker, there is information on 1196 males from the United States. Data from this sample includes the variables:

smoke = 1 for smokers, and 0 for nonsmokers
age = age in years
educ = number of years of schooling
income = family income
pcigs = price of cigarettes in the individual’s state

Part 1)

a) Generate a dummy variable “hi_ed” that equals 1 if a person has 16 or more years of education and 0 otherwise.

b) Estimate a linear regression (which in this context is called a linear probability model (LPM)) of the binary variable smoke on the independent variable hi_ed. Report the beta coefficient on the dummy variable and its p-value. In words, express what the beta coefficient means in this case.

c) Create a frequency table for the smoke and hi_ed variables. The command in Stata is tabulate smoke hi_ed.

d) Calculate the probability that a person with low education (hi_ed = 0) smokes. Calculate the probability that a person with high education (hi_ed = 1) smokes.

e) What is the relationship between the results for parts b and d?

f) Calculate the odds that a low education person smokes. Calculate the odds that a high education person smokes. Calculate the odds ratio.

g) Estimate the logistic regression (logistic command) of smoke on hi_ed. Confirm that the reported odds ratio equals the odds ratio calculated in part f. In words, express what the odds ratio means in this case.
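
The relationships that parts b, d, f, and g ask about can be checked numerically before running anything on the real data. The following is a minimal Python sketch, not Stata and not the Smoker dataset: the 2x2 counts are hypothetical numbers chosen only for illustration, and the names counts, prob_smoke, and fit_logit are assumptions of this example. It fits the LPM by the closed-form OLS slope and fits the logit by Newton's method, then compares both with the values computed by hand from the table.

```python
import math

# HYPOTHETICAL 2x2 counts (not taken from the Smoker dataset).
# counts[hi_ed][smoke] = number of people in that cell.
counts = {0: {0: 500, 1: 300},   # low education: 500 nonsmokers, 300 smokers
          1: {0: 320, 1: 80}}    # high education: 320 nonsmokers, 80 smokers


def prob_smoke(hi_ed):
    """Share of smokers within one education group (part d)."""
    row = counts[hi_ed]
    return row[1] / (row[0] + row[1])


p_low, p_high = prob_smoke(0), prob_smoke(1)

# Part b: LPM slope and intercept from the closed-form OLS formulas,
# computed on the table expanded to one observation per person.
x, y = [], []
for hi_ed, row in counts.items():
    for smoke, n_cell in row.items():
        x += [hi_ed] * n_cell
        y += [smoke] * n_cell
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
lpm_slope = sxy / sxx
lpm_intercept = mean_y - lpm_slope * mean_x

# Part f: odds in each group and the odds ratio (high vs. low education).
odds_low = p_low / (1 - p_low)
odds_high = p_high / (1 - p_high)
odds_ratio = odds_high / odds_low


def fit_logit(groups, iterations=25):
    """Newton's method for logit(p) = b0 + b1*x on grouped binary data.

    groups: list of (x value, group size, number of smokers).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xg, n_g, s_g in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * xg)))
            resid = s_g - n_g * p          # score contribution
            w = n_g * p * (1 - p)          # information weight
            g0 += resid
            g1 += resid * xg
            h00 += w
            h01 += w * xg
            h11 += w * xg * xg
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


groups = [(g, sum(row.values()), row[1]) for g, row in counts.items()]
b0, b1 = fit_logit(groups)

print("LPM slope:", lpm_slope, " p_high - p_low:", p_high - p_low)
print("exp(logit slope):", math.exp(b1), " odds ratio:", odds_ratio)
```

With a single dummy regressor, both models fit the two group proportions exactly. That is why the LPM intercept matches the low-education smoking probability, the LPM slope matches the difference between the two probabilities, and the exponentiated logit slope matches the odds ratio from the table.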

Tutorial