Simulate a regression in R by manipulating the code below. The code below
will produce a multiple regression with normally distributed variables.
This simulation expects GDP to be a function of balance of payments
(pay balance), foreign direct investment (fdi), inflation, continent,
and debt.
(a) Run the code as is and save the results. Visually check the residual
plots of the first order model (no squared terms) for the following
assumptions:
• functional form (including leverage points)
• constant variance
• normality
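The visual checks in part (a) could be sketched as follows. The data frame `dat`, the seed, and the coefficients are hypothetical stand-ins rather than the assignment's simulation, which should be used instead; only the diagnostic calls are the point.

```r
# Hypothetical stand-in data (NOT the assignment's simulation or coefficients).
set.seed(1)
n <- 60
dat <- data.frame(pay_balance = rnorm(n), fdi = rnorm(n),
                  inflation = rnorm(n), debt = rnorm(n))
dat$gdp <- 2 + 1.5 * dat$fdi - 0.8 * dat$debt + 0.5 * dat$inflation + rnorm(n)

# First order model: no squared terms.
fit <- lm(gdp ~ pay_balance + fdi + inflation + debt, data = dat)

# Default lm diagnostics in a 2 x 2 grid:
#   Residuals vs Fitted    -> functional form
#   Normal Q-Q             -> normality
#   Scale-Location         -> constant variance
#   Residuals vs Leverage  -> leverage / influential points
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

The four panels map onto the three assumptions listed above, with the leverage panel supporting the functional form check.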
(b) Try a squared term for fdi and debt (one at a time). Use an F test
between the squared term model and the first order model. Do the F
test and the regular OLS t-tests agree? Which should you use to
decide whether the squared terms should be included for the rest of
the questions?
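One way to set up the comparison in part (b) is a nested-model F test with `anova()`. This is a sketch on hypothetical data (made-up seed and coefficients). It also prints the squared t statistic of the added term, since for a single added term the nested F statistic equals that t statistic squared.

```r
# Hypothetical stand-in data (NOT the assignment's simulation).
set.seed(2)
n <- 60
dat <- data.frame(fdi = rnorm(n), debt = rnorm(n))
dat$gdp <- 1 + 2 * dat$fdi - dat$debt + rnorm(n)

first_order <- lm(gdp ~ fdi + debt, data = dat)
with_square <- lm(gdp ~ fdi + debt + I(fdi^2), data = dat)  # add fdi^2 only

# Nested-model F test: does the squared term reduce the residual sum of squares?
f_test <- anova(first_order, with_square)
f_stat <- f_test$F[2]

# OLS t statistic for the same squared term.
t_stat <- coef(summary(with_square))["I(fdi^2)", "t value"]

c(F = f_stat, t_squared = t_stat^2)
```

The same pattern applies to debt by swapping `I(fdi^2)` for `I(debt^2)`.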
(c) Test the residuals for normality using a normal probability plot, the
Kolmogorov-Smirnov test, and the Shapiro-Wilk test. Do they
agree? Which do you think is more appropriate?
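The three checks in part (c) might look like the sketch below, run on residuals from a small hypothetical model (the variables `x` and `y` are illustrative only). Note that `ks.test()` against a normal with mean and standard deviation estimated from the same residuals gives only an approximate p-value, which is one reason the tests can disagree.

```r
# Hypothetical stand-in model (NOT the assignment's simulation).
set.seed(3)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
res <- residuals(lm(y ~ x))

# Normal probability (Q-Q) plot with a reference line.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality.
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against a normal distribution whose parameters
# are estimated from the residuals (p-value is approximate in that case).
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

c(shapiro_p = sw$p.value, ks_p = ks$p.value)
```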
(d) For each of the following, interpret the coefficient from the
regression in plain language. Compare the linear regression
estimate to the known parameter value from the regression
simulation. Why are there differences?
• intercept
• inflation
• debt
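For the comparison in part (d), it can help to put the known simulation parameters next to the estimates and their confidence intervals. The sketch below uses hypothetical parameter values in `true_beta`; the assignment's own values should be substituted.

```r
# Hypothetical "known" parameters (NOT the assignment's values).
set.seed(9)
n <- 60
true_beta <- c("(Intercept)" = 2, inflation = -0.5, debt = 0.8)

dat <- data.frame(inflation = rnorm(n), debt = rnorm(n))
dat$gdp <- true_beta[["(Intercept)"]] +
  true_beta[["inflation"]] * dat$inflation +
  true_beta[["debt"]] * dat$debt +
  rnorm(n)                      # random error: estimates will not match exactly

fit <- lm(gdp ~ inflation + debt, data = dat)

# True value, OLS estimate, and 95% confidence interval, row per coefficient.
comparison <- cbind(true = true_beta, estimate = coef(fit), confint(fit))
comparison
```

The estimates differ from the true values because each simulated data set contains a finite sample of random error; the confidence interval indicates how much sampling variation to expect.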
(e) Interpret the geography term.
• Interpret the coefficient for "East Asia".
• Check the number of observations in each factor level. Does this
influence the linear regression results?
• If some factor levels fail to achieve statistical significance, can
they be dropped?
• What is the interpretation when continent="East Asia and the
Pacific"?
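The mechanics behind part (e) can be sketched as follows. The level names and sampling probabilities are hypothetical (only "East Asia and the Pacific" appears in the text above). Under R's default treatment contrasts, the first factor level is the baseline and gets no coefficient of its own; every other level's coefficient is a difference from that baseline.

```r
# Hypothetical continent levels and frequencies (NOT the assignment's data).
set.seed(4)
n <- 60
lv <- c("Africa", "East Asia and the Pacific", "Europe")
continent <- factor(sample(lv, n, replace = TRUE, prob = c(0.5, 0.1, 0.4)),
                    levels = lv)
gdp <- rnorm(n)

# Number of observations per level; sparse levels give imprecise estimates.
counts <- table(continent)
counts

# The baseline (reference) level is the first one and is absorbed
# into the intercept.
baseline <- levels(continent)[1]
fit <- lm(gdp ~ continent)
names(coef(fit))

# Choosing a different baseline changes what each coefficient is compared to.
continent2 <- relevel(continent, ref = "Europe")
```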
(f) Change fdi and debt to non-normal distributions. You must choose 2
different distributions from the following list:
• Change FDI to a Cauchy distribution (rcauchy(n)).
• Change debt to any one of the following:
  – A beta distribution with parameters (0.4, 0.6)
  – A negative binomial distribution with size=3 and prob=0.3
  – A uniform distribution ranging over [-10, 20]
(Please note, balance of payments should remain a normal distribution
and the simulated coefficients should remain constant.)
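The draws listed in part (f) correspond to the base R generators below. The variable names and the seed are illustrative; the parameters are the ones stated above.

```r
set.seed(5)   # hypothetical seed, not the assignment's
n <- 60

fdi <- rcauchy(n)                                  # Cauchy (location 0, scale 1)

# Pick ONE of the following for debt:
debt_beta <- rbeta(n, shape1 = 0.4, shape2 = 0.6)  # beta(0.4, 0.6), values in [0, 1]
debt_nb   <- rnbinom(n, size = 3, prob = 0.3)      # negative binomial counts
debt_unif <- runif(n, min = -10, max = 20)         # uniform on [-10, 20]

pay_balance <- rnorm(n)                            # stays normal

summary(fdi)
```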
(g) Rerun the new model (with non-normal variables) and check the
residual diagnostics again.
• Check the residual plots for the same assumptions as in part (a).
• Run a normality test of your choosing (similar to part (c)).
• Check for leverage points using Cook's D. If any are found, what
should be done? Is the model still valid?
• Report any notable differences between coefficients for the
original (normal) model and the new findings. Explain why
you think differences occurred.
• If a leverage point exists, remove it and rerun the model (remove
only one if there are many). Is the new model an improvement?
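The Cook's D step in part (g) could be sketched like this. The data and coefficients are hypothetical, and the 4/n cutoff is a common rule of thumb assumed here for illustration, not something the assignment prescribes.

```r
# Hypothetical stand-in data (NOT the assignment's simulation).
set.seed(6)
n <- 60
dat <- data.frame(fdi = rcauchy(n), debt = runif(n, -10, 20))
dat$gdp <- 1 + 0.5 * dat$fdi + 0.2 * dat$debt + rnorm(n)

fit <- lm(gdp ~ fdi + debt, data = dat)

# Cook's distance for every observation.
d <- cooks.distance(fit)
cutoff <- 4 / n                 # rule-of-thumb threshold (an assumption)
flagged <- which(d > cutoff)    # observations worth a closer look
worst <- which.max(d)           # single most influential observation

# Remove only the most influential point and refit.
refit <- lm(gdp ~ fdi + debt, data = dat[-worst, ])

rbind(original = coef(fit), refit = coef(refit))
```

Comparing the two coefficient rows shows how much a single observation moves the fit.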
(h) Hopefully your model is working well, but just in case it is not, let's
run a bootstrap on it. Take the previous model (from (f)) and run a
bootstrap with at least 10,000 replications.
• Report the updated coefficients and standard errors and compare
them to the previous model.
• Would a bootstrap fix any issues you identified with the model
from (f)?
• Which model would you recommend and why?
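A case-resampling (pairs) bootstrap for part (h) can be written in base R as sketched below. The data are hypothetical, and resampling rows is only one of several bootstrap schemes (a residual bootstrap or the `boot` package would be alternatives); `lm.fit()` is used on the model matrix purely to keep 10,000 refits fast.

```r
# Hypothetical stand-in data (NOT the assignment's simulation).
set.seed(7)
n <- 60
dat <- data.frame(fdi = rcauchy(n), debt = runif(n, -10, 20))
dat$gdp <- 1 + 0.5 * dat$fdi + 0.2 * dat$debt + rnorm(n)

fit <- lm(gdp ~ fdi + debt, data = dat)
X <- model.matrix(fit)          # design matrix including the intercept column
y <- dat$gdp

B <- 10000                      # number of bootstrap replications
boot_coefs <- t(replicate(B, {
  idx <- sample.int(n, replace = TRUE)             # resample rows with replacement
  lm.fit(X[idx, , drop = FALSE], y[idx])$coefficients
}))

# Bootstrap means and standard errors next to the OLS results.
results <- cbind(
  ols_est   = coef(fit),
  ols_se    = coef(summary(fit))[, "Std. Error"],
  boot_mean = colMeans(boot_coefs),
  boot_se   = apply(boot_coefs, 2, sd)
)
results
```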
(i) With your new model (using non-normal distributions), decrease the
sample size by 50% and simulate a new data set. Report the coefficients
and check the residual plots. What effect does the decreased sample size
have on the model? Report any notable differences and explain why you
think they occurred.
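For part (i), wrapping the simulation in a function of the sample size makes the halved run easy to compare. The function `simulate_fit()` and its coefficients are hypothetical and stand in for the assignment's own simulation.

```r
# Hypothetical helper: simulate a data set of size n and fit the model.
simulate_fit <- function(n) {
  fdi  <- rcauchy(n)
  debt <- runif(n, -10, 20)
  gdp  <- 1 + 0.5 * fdi + 0.2 * debt + rnorm(n)   # made-up coefficients
  lm(gdp ~ fdi + debt)
}

set.seed(8)
full <- simulate_fit(60)        # original sample size
half <- simulate_fit(30)        # sample size decreased by 50%

# Estimates and standard errors side by side.
cbind(
  est_full = coef(full),
  se_full  = coef(summary(full))[, "Std. Error"],
  est_half = coef(half),
  se_half  = coef(summary(half))[, "Std. Error"]
)
```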
# R:
set.seed(298376)
n <- 60
fdi