DXE_EMTR 2021
First assignment (20% of total grade)
Please submit the assignment by 29 Oct in the IS MUNI system. You may work in groups of up to three
students.
1 Regression basics
Write a short essay (no more than 1000 words) discussing the following article:
Imbens, Guido W. "Statistical Significance, p-Values, and the Reporting of Uncertainty." Journal of Economic
Perspectives 35.3 (2021): 157-74.
The following questions may help you structure the discussion:
• What makes the (ab)use of p-values problematic in some contexts?
• Should p-values be banned?
• What are some possible strategies for addressing the problems of 'p-hacking' and publication bias?
Make sure to add your own perspective, which may be specific to your field of research.
2 Identification
Assume that
• Y, X and ε are random variables,
• Y = (X − θ)² + ε,
• E(ε) = 0,
• Data reveals φ which is the joint distribution of (Y, X),
• θ is the parameter of interest.
Here are your tasks:
• Define the model and the structure in the framework of Lewbel, Arthur. "The identification zoo:
Meanings of identification in econometrics." Journal of Economic Literature 57.4 (2019): 835-903.
• Show under what conditions the parameter θ is
– point identified,
– set identified (and find the identified set),
– not identified.
• Try to find an intuitive explanation for your answer to the previous question.
• Suppose we have an additional variable Z, so that the data reveals φ, which is now the joint distribution of
(Y, X, Z). Furthermore, we replace the assumption E(ε) = 0 with the following assumptions: E(Z) = 0,
E(Zε) = 0 and E(ZX) = 0. Is θ point identified? If so, how would you estimate it?
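As a possible starting point for the baseline case (with E(ε) = 0 and without Z), one can take expectations of the model equation, where ε denotes the error term. This is only a sketch of the first step, not the intended solution:

```latex
\[
E[Y] \;=\; E\big[(X-\theta)^2\big] + E[\varepsilon]
      \;=\; \theta^2 \;-\; 2\,\theta\,E[X] \;+\; E[X^2].
\]
```

Since E[Y], E[X] and E[X²] are all determined by φ, this is a quadratic equation in θ whose coefficients are known from the data. How many of its roots are compatible with the assumptions is what the tasks above ask you to work out.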
3 Maximum likelihood
In many situations the dependent variable counts events, so that Y ∈ {0, 1, 2, ...} without
a natural upper bound. Examples include the number of car crashes, the number of earthquakes or the
number of visitors. The goal may be to relate the variation in y to some
explanatory variables x_1, ..., x_p. Depending on what y measures, these may include weather conditions,
geographic location or time of day.
Consider a special type of regression, called Poisson regression, in which we make the following
assumptions:
• The data sample consists of n i.i.d. observations (y_i, x_{i1}, ..., x_{ip}),
• y_i ∼ Pois(λ_i),
• log(λ_i) = β_0 + β_1 x_{i1} + ··· + β_p x_{ip}.
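For reference, the last two assumptions together imply the following conditional distribution of each count. The symbol k for a generic count value is introduced here for illustration only:

```latex
\[
P\big(y_i = k \mid x_{i1},\dots,x_{ip}\big)
  \;=\; \frac{e^{-\lambda_i}\,\lambda_i^{\,k}}{k!},
  \qquad k = 0, 1, 2, \dots,
\]
\[
\lambda_i \;=\; \exp\big(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\big),
\qquad
E[y_i \mid x_i] \;=\; \operatorname{Var}(y_i \mid x_i) \;=\; \lambda_i .
\]
```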
You are asked to do the following:
• We are interested in estimating the vector of unknown parameters (β_0, β_1, ..., β_p). Derive the likelihood
function (conditional on the covariates x_1, ..., x_p), the score function and the Fisher information matrix
for this model.
• Using a small simulation study in R:
– Demonstrate that the maximum likelihood estimator of β_1 in this model is asymptotically
normally distributed. (You do not need to implement the optimization yourself; you may
use the glm function in R with the option family = poisson.)
– Explore how the sample size affects the variance of the estimator.
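The logic of such a simulation study can be sketched as follows. The assignment asks for R and glm. This standard-library Python version is only an illustration of the structure, with a hand-coded Newton-Raphson fit standing in for glm. All function names, the single covariate drawn from U(−1, 1), the true values β_0 = 0.5 and β_1 = 0.3, the seed and the sample sizes are assumptions made for this example.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(xs, ys, iters=50):
    """ML estimates (b0, b1) for log(lambda) = b0 + b1 * x via Newton-Raphson."""
    b0, b1 = math.log(max(statistics.fmean(ys), 1e-8)), 0.0  # intercept-only start
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            lam = math.exp(b0 + b1 * x)
            g0 += y - lam            # gradient of the log-likelihood w.r.t. b0
            g1 += (y - lam) * x      # gradient of the log-likelihood w.r.t. b1
            h00 += lam               # entries of the negative Hessian
            h01 += lam * x
            h11 += lam * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: solve H d = g (2 x 2 case)
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def sampling_summary(n, reps, rng, beta0=0.5, beta1=0.3):
    """Mean and sd of the b1 estimates over reps samples, plus the share within 1.96 sd."""
    estimates = []
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [rpois(math.exp(beta0 + beta1 * x), rng) for x in xs]
        estimates.append(fit_poisson(xs, ys)[1])
    mean, sd = statistics.fmean(estimates), statistics.stdev(estimates)
    share = sum(abs(b - mean) <= 1.96 * sd for b in estimates) / reps
    return mean, sd, share


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
results = {n: sampling_summary(n, 300, rng) for n in (50, 200, 800)}
for n, (mean, sd, share) in results.items():
    print(f"n={n:4d}  mean={mean:.3f}  sd={sd:.3f}  "
          f"sqrt(n)*sd={math.sqrt(n) * sd:.2f}  share within 1.96 sd={share:.3f}")
```

If the estimator behaves as asymptotic theory suggests, sqrt(n)*sd should be roughly stable across sample sizes and the share should be close to 0.95. A histogram or normal Q-Q plot of the estimates would be the natural visual check in the actual submission.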
4 Bootstrap
Consider the maximum likelihood estimator of the unknown parameter β_1 from the previous task.
• Construct a 95% confidence interval based on the non-parametric percentile bootstrap.
• Construct a 95% confidence interval based on the normal approximation, using the bootstrap to estimate
the standard error.
• Using a simulation study in R, compare the coverage properties of these two confidence intervals. That
is, show that each confidence interval covers the true value in approximately 95% of the simulated cases.¹
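The two intervals and the coverage check described above can be sketched as follows. As before, this is a standard-library Python illustration rather than the R code the assignment asks for. The helper names, the single-covariate design, the true parameter values and the deliberately small numbers of bootstrap samples and simulations are all assumptions chosen to keep the run time short.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def fit_beta1(xs, ys, iters=50):
    """ML estimate of b1 in log(lambda) = b0 + b1 * x via Newton-Raphson."""
    b0, b1 = math.log(max(statistics.fmean(ys), 1e-8)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            lam = math.exp(b0 + b1 * x)
            g0, g1 = g0 + (y - lam), g1 + (y - lam) * x
            h00, h01, h11 = h00 + lam, h01 + lam * x, h11 + lam * x * x
        det = h00 * h11 - h01 * h01
        d0, d1 = (h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b1


def bootstrap_cis(xs, ys, n_boot, rng):
    """Return the (percentile, normal-approximation) 95% intervals for b1."""
    n = len(xs)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample (y, x) pairs with replacement
        boot.append(fit_beta1([xs[i] for i in idx], [ys[i] for i in idx]))
    cuts = statistics.quantiles(boot, n=40)         # cut points at 2.5%, 5%, ..., 97.5%
    percentile = (cuts[0], cuts[-1])
    b1_hat, se = fit_beta1(xs, ys), statistics.stdev(boot)  # bootstrap standard error
    normal = (b1_hat - 1.96 * se, b1_hat + 1.96 * se)
    return percentile, normal


def coverage(n, n_boot, n_sim, beta0, beta1, rng):
    """Share of simulated samples in which each interval contains the true beta1."""
    hits = [0, 0]
    for _ in range(n_sim):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [rpois(math.exp(beta0 + beta1 * x), rng) for x in xs]
        for k, (lo, hi) in enumerate(bootstrap_cis(xs, ys, n_boot, rng)):
            hits[k] += lo <= beta1 <= hi
    return hits[0] / n_sim, hits[1] / n_sim


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
cov_percentile, cov_normal = coverage(n=60, n_boot=50, n_sim=40,
                                      beta0=0.5, beta1=0.3, rng=rng)
print(f"percentile: {cov_percentile:.3f}  normal approximation: {cov_normal:.3f}")
```

With so few bootstrap samples and simulations the coverage estimates are noisy, so they should only be read as a rough check. Larger values of n_boot and n_sim give more reliable numbers at a higher computing cost.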
Make sure to comment your code and do your best to follow reasonable coding standards.
Your code must be easy to read. Present your results coherently and use visualization
wherever possible.
¹ Note that this may require a lot of computing time: if a single bootstrap confidence interval is based on 100 bootstrap
samples and you run 500 simulations, you will need to estimate the Poisson regression model 100 · 500 = 50,000 times. You
may therefore need to keep the basic model specification simple.