Econometrics with Stata
Download the data file “mus03data.dta” from the MEPS directory in the Datasets section of Blackboard. This dataset is an extract from the 2003 Medical Expenditure Panel Survey (MEPS), used in Cameron & Trivedi (2010), Microeconometrics Using Stata (Rev. Ed.). It contains individuals aged 65 and older, all of whom have health insurance through Medicare. However, Medicare does not cover all medical expenditures and, in particular, did not cover prescription drugs at that time. Some individuals therefore had supplemental health insurance, while others did not.
You will need the following variables:
_________________________________________________
totexp    Total medical expenditure
ltotexp   ln(totexp) if totexp > 0
posexp    =1 if total expenditure > 0
suppins   =1 if has supplemental private insurance
phylim    =1 if has functional limitation
actlim    =1 if has activity limitation
totchr    Number of chronic problems
age       Age
female    =1 if female
income    Annual household income/1000
famsze    Size of the family
_________________________________________________
Note that “total medical expenditure” as measured in the MEPS includes not only what the individual pays (“out-of-pocket”) but also what the insurance company or Medicare pays.
(a) Load the data. In the income variable, the code -1 is a special missing-value code, so to avoid complications later on, set it to missing (“.”). What percentage of individuals have supplemental insurance? What percentage of individuals have positive medical expenditure?
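The steps in (a) can be sketched in Stata as follows. This is a minimal sketch, assuming that mus03data.dta has been saved in the current working directory and that suppins and posexp are coded 0/1 as described in the variable list.

```stata
* Load the MEPS extract (assumes the file is in the working directory)
use "mus03data.dta", clear

* Recode the special code -1 in income to Stata's system missing value
replace income = . if income == -1

* One-way tables report the percentage in each category
tabulate suppins
tabulate posexp

* Alternatively: the mean of a 0/1 variable is the share of ones
summarize suppins posexp
```

Multiplying the means reported by summarize by 100 gives the same percentages as the tables.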
(b) Regress posexp on suppins. Also regress totexp on suppins. For both regressions, answer the following questions: (i) What is the interpretation of the constant and the coefficient? (ii) Is the coefficient significant? (iii) Give an explanation for the sign of the coefficient (i.e., explain why it is positive or negative).
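A possible way to run the two regressions in (b) is sketched below, assuming the data are already in memory. The final two commands are an optional check: with a single 0/1 regressor, the OLS constant equals the sample mean of the dependent variable in the group with suppins = 0, and the coefficient equals the difference between the two group means.

```stata
* Linear probability model: the dependent variable is 0/1
regress posexp suppins

* Level regression of total expenditure on the insurance dummy
regress totexp suppins

* Optional check: group means of each dependent variable by insurance status
tabulate suppins, summarize(posexp)
tabulate suppins, summarize(totexp)
```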
(c) The remaining questions use only the observations with no missing values on ltotexp and income, so drop any observations with missing values on these variables. Regress ltotexp on suppins. What is the interpretation of the coefficient? (For answering this question, you may pretend the coefficient is “small”.)
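The hint about a “small” coefficient refers to the usual approximation for a log-linear model with a dummy regressor. As a sketch, writing the regression in (c) with hypothetical parameter names β0 and β1 and holding the error term fixed, the relative difference in expenditure between the two insurance groups is:

```latex
\ln(\text{totexp}_i) = \beta_0 + \beta_1\,\text{suppins}_i + e_i
\quad\Longrightarrow\quad
\frac{\text{totexp}_i\big|_{\text{suppins}=1} - \text{totexp}_i\big|_{\text{suppins}=0}}
     {\text{totexp}_i\big|_{\text{suppins}=0}}
= e^{\beta_1} - 1 \;\approx\; \beta_1
\quad \text{for small } \beta_1 .
```

Multiplied by 100, the coefficient is therefore approximately a percentage difference. The approximation becomes less accurate as β1 moves away from zero.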
(d) “totchr” is the number of chronic conditions an individual has, and it seems obvious that medical expenditures are (partly/largely) spent on treating these conditions, so expenditures should be related to totchr. Therefore, add totchr as a control to the regression in (c). Comparing the results from this regression with the regression in (c), what is your estimate of the omitted variables bias in (c)?
(e) Estimate the two factors that make up the omitted variables bias and verify that their product corresponds to your estimate in (d). Do you find the evidence for omitted variables bias in the regression in (c) strong or weak?
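The comparison in (d) and the decomposition in (e) can be sketched as follows. In the sample, the coefficient on suppins in the short regression equals the coefficient in the long regression plus the product of (1) the coefficient on totchr in the long regression and (2) the coefficient on suppins in an auxiliary regression of totchr on suppins. The scalar and variable names (b_long, b_totchr, delta, b_short, insample) are hypothetical, and the sketch assumes the data are in memory with income already recoded.

```stata
* Restrict to the sample used from (c) onward
drop if missing(ltotexp, income)

* Long regression: includes the previously omitted variable
regress ltotexp suppins totchr
scalar b_long   = _b[suppins]
scalar b_totchr = _b[totchr]
generate byte insample = e(sample)

* Auxiliary regression of the omitted variable on the included regressor
regress totchr suppins if insample
scalar delta = _b[suppins]

* Short regression on the same observations
regress ltotexp suppins if insample
scalar b_short = _b[suppins]

* The two displayed numbers should agree up to rounding
display "short - long      = " scalar(b_short) - scalar(b_long)
display "b_totchr * delta  = " scalar(b_totchr) * scalar(delta)
```

All three regressions must use the same observations for the identity to hold exactly, which is why the estimation sample is marked after the long regression.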
(f) Investigate whether medical expenditures are related to family size by adding famsze to the regression in (d). Rerun this same regression, but now treating family size as a categorical variable (using dummies). Choose one of the estimated dummy coefficients and explain how this coefficient should be interpreted.
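One way to run both versions of the regression in (f) uses Stata's factor-variable notation, which creates the dummies automatically. This is a sketch, assuming the data are in memory with income already recoded. The i. prefix omits the lowest category of famsze as the base, so each dummy coefficient is measured relative to that category, holding the other regressors fixed.

```stata
* Restrict to the sample used from (c) onward
drop if missing(ltotexp, income)

* Family size entered linearly
regress ltotexp suppins totchr famsze

* Family size as a set of dummies (lowest category is the base)
regress ltotexp suppins totchr i.famsze

* Joint test that all family-size dummies are zero
testparm i.famsze
```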
(g) Add the remaining variables (phylim, actlim, age, female, income) to the regression in (f) that uses the dummies. Comment on the sign and significance of the age variable. How would you explain this result?
(h) Compute the fitted values and residuals from the regression in (g) and compute their sample means. Explain how you could find these sample means without actually computing these two variables. Make a histogram of the residuals. What do you conclude about the assumption that $e_i$ is normally distributed?
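The computations in (h) can be sketched as follows, assuming the data are in memory with income already recoded. The variable names yhat and ehat are hypothetical. Including ltotexp in the summarize command allows a comparison with the algebraic property that, in an OLS regression with a constant, the residuals average to zero and the fitted values have the same mean as the dependent variable.

```stata
* Restrict to the sample used from (c) onward
drop if missing(ltotexp, income)

* Regression from (g), with family size as dummies
regress ltotexp suppins totchr i.famsze phylim actlim age female income

* Fitted values and residuals
predict yhat, xb
predict ehat, residuals

* Sample means of the dependent variable, fitted values and residuals
summarize ltotexp yhat ehat

* Histogram of the residuals with a normal density overlaid
histogram ehat, normal
```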
(i) On slide 16 of lecture 7, it is stated that a possible model of heteroskedasticity is an exponential function of the regressors. A crude way to implement this is the following: (1) compute $\hat{v}_i = \ln(\hat{e}_i^2)$; (2) regress $\hat{v}_i$ on $\hat{Y}_i$. Does this provide evidence for heteroskedasticity?
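The two steps in (i) can be sketched as follows, assuming the data are in memory with income already recoded. The variable names yhat, ehat and vhat are hypothetical. The capture drop line only removes these variables if they already exist from an earlier step.

```stata
* Restrict to the sample used from (c) onward
drop if missing(ltotexp, income)

* Regression from (g), output suppressed
quietly regress ltotexp suppins totchr i.famsze phylim actlim age female income

* Remove earlier copies of the helper variables, if any
capture drop yhat ehat vhat

* Fitted values and residuals from (g)
predict yhat, xb
predict ehat, residuals

* Step 1: log of the squared residuals
generate vhat = ln(ehat^2)

* Step 2: regress the log squared residuals on the fitted values
regress vhat yhat
```

The t-test on the coefficient of yhat in the last regression is the statistic of interest here: under homoskedasticity, the squared residuals should not vary systematically with the fitted values.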
(j) Comment on how well the regressions from (c), (d), (f), and (g) fit the data. Compare the coefficient of supplemental insurance in these regressions. Does the coefficient vary a lot among these regressions? How about the standard error? How confident are you about the causal effect of supplemental insurance on medical expenditures? (For example, could there be selection bias? If so, give an example. If not, give an argument why not.)
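For the comparison in (j), the four regressions can be stored and shown side by side. This is a sketch, assuming the data are in memory with income already recoded. The names m_c, m_d, m_f and m_g are hypothetical, and the version of (f) with family-size dummies is used.

```stata
* Restrict to the sample used from (c) onward
drop if missing(ltotexp, income)

quietly regress ltotexp suppins
estimates store m_c

quietly regress ltotexp suppins totchr
estimates store m_d

quietly regress ltotexp suppins totchr i.famsze
estimates store m_f

quietly regress ltotexp suppins totchr i.famsze phylim actlim age female income
estimates store m_g

* Coefficient and standard error of suppins, plus N, R-squared and adjusted R-squared
estimates table m_c m_d m_f m_g, keep(suppins) b(%9.4f) se(%9.4f) stats(N r2 r2_a)
```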