DXE_EMTR 2021
First assignment (20% of total grade)

Please submit the assignment by 29 Oct in the IS MUNI system. You are allowed to work in groups of at most 3.

1 Regression basics

Write a short essay (no more than 1000 words) discussing the following article:

Imbens, Guido W. "Statistical Significance, p-Values, and the Reporting of Uncertainty." Journal of Economic Perspectives 35.3 (2021): 157-74.

These questions may help you structure the discussion:
• What makes the (ab)use of p-values problematic in some contexts?
• Should p-values be banned?
• What are some possible strategies for addressing the problems of 'p-hacking' and publication bias?

Make sure to add your own views, which may be specific to your field of research expertise.

2 Identification

Assume that
• Y, X, ε are random variables,
• Y = (X − θ)^2 + ε,
• E(ε) = 0,
• the data reveal φ, the joint distribution of (Y, X),
• θ is the parameter of interest.

Here are your tasks:
• Define the model and the structure in the framework of Lewbel, Arthur. "The identification zoo: Meanings of identification in econometrics." Journal of Economic Literature 57.4 (2019): 835-903.
• Show under what conditions the parameter θ is
  – point identified,
  – set identified (and find the identified set),
  – not identified.
• Try to find an intuitive explanation for your answer to the previous subquestion.
• Suppose we have an additional variable Z, so that the data reveal φ, the joint distribution of (Y, X, Z). Furthermore, we replace the assumption E(ε) = 0 with the following assumptions: E(Z) = 0, E(Zε) = 0 and E(ZX) = 0. Is θ point identified? If so, how would you estimate it?

3 Maximum likelihood

In many situations the dependent variable counts a number of events, so that Y ∈ {0, 1, 2, ...} without a natural upper bound. This may be, for instance, the number of car crashes, earthquakes or visitors.
Our aim may be to find an association between the variation in y and some explanatory variables x1, · · · , xp. Depending on what y measures, these may include weather conditions, geographic location or time of day.

Consider a special type of regression (called Poisson regression), in which we make the following assumptions:
• the data sample consists of n i.i.d. observations (yi, xi1, · · · , xip),
• yi ∼ Pois(λi),
• log(λi) = β0 + β1 xi1 + · · · + βp xip.

You are asked to do the following:
• We are interested in estimating the vector of unknown parameters (β0, β1, · · · , βp). Derive the likelihood function (conditional on the covariates x1, · · · , xp), the score function and the Fisher information matrix for this model.
• Using a small simulation study in R:
  – Demonstrate that the maximum likelihood estimator of β1 in this model is asymptotically normally distributed. (You don't need to implement the optimization yourself; it is fine to use the glm function in R with the option family = poisson.)
  – Explore how the sample size affects the variance of the estimator.

4 Bootstrap

Consider the maximum likelihood estimator of the unknown parameter β1 from the previous task.
• Construct a 95% confidence interval based on the non-parametric percentile bootstrap.
• Construct a 95% confidence interval based on the normal approximation, using the bootstrap to estimate the standard error.
• Using a simulation study in R, compare the coverage properties of these two confidence intervals. That is, show that the confidence intervals cover the true value in approximately 95% of the simulated cases.¹

Make sure to comment your code and do your best to adhere to reasonable coding standards. Your code must be easy to read. Present your results coherently and use visualization whenever possible.
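
For reference, the two intervals requested in Section 4 and the coverage estimate are commonly written as below. The notation is an assumption introduced here, not part of the assignment: \hat\beta_1 is the ML estimate from the original sample, \hat\beta_1^{*(b)} (b = 1, ..., B) are the estimates from B bootstrap resamples with mean \bar\beta_1^{*}, \hat q^{*}_{p} is their empirical p-quantile, z_{p} is the standard normal p-quantile, S is the number of simulation runs, and α = 0.05.

```latex
\begin{align*}
\text{Percentile interval:}\quad
  & \left[\, \hat q^{*}_{\alpha/2},\; \hat q^{*}_{1-\alpha/2} \,\right], \\
\text{Normal-approximation interval:}\quad
  & \hat\beta_1 \pm z_{1-\alpha/2}\, \widehat{\mathrm{se}}^{*},
  \qquad
  \widehat{\mathrm{se}}^{*}
    = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}
      \left(\hat\beta_1^{*(b)} - \bar\beta_1^{*}\right)^{2}}, \\
\text{Estimated coverage:}\quad
  & \frac{1}{S}\sum_{s=1}^{S} \mathbf{1}\{\beta_1 \in \mathrm{CI}_s\}.
\end{align*}
```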
¹ Notice that this may require a lot of computing time: if a single bootstrap confidence interval is based on 100 bootstrap samples and you run 500 simulations, you will need to estimate the Poisson regression model 100 · 500 = 50000 times. You may therefore need to keep the basic model specification simple.
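
To make the computing budget above concrete, here is a minimal sketch of how the coverage simulation might be organised in R. It is only one possible layout, not a reference solution. The single-covariate model, the true parameter values, the sample size and all function names (simulate_data, fit_beta1, boot_cis, covers) are assumptions made for illustration. The demonstration at the end runs a deliberately reduced budget so that it finishes quickly.

```r
# Sketch only: model, parameter values and function names are illustrative
# assumptions, not part of the assignment.
set.seed(1)

n      <- 100          # observations per simulated data set (assumed)
n_boot <- 100          # bootstrap resamples per interval (as in the footnote)
n_sim  <- 500          # simulation runs (as in the footnote)
beta   <- c(0.5, 0.3)  # assumed true (beta0, beta1)

# Number of bootstrap model fits for the full study.
n_fits <- n_boot * n_sim  # 50000

# Draw one data set from a Poisson regression with a single covariate.
simulate_data <- function(n, beta) {
  x <- rnorm(n)
  y <- rpois(n, lambda = exp(beta[1] + beta[2] * x))
  data.frame(y = y, x = x)
}

# Maximum likelihood estimate of beta1, obtained with glm().
fit_beta1 <- function(d) {
  unname(coef(glm(y ~ x, family = poisson, data = d))["x"])
}

# Percentile and normal-approximation bootstrap intervals for one data set.
boot_cis <- function(d, n_boot, level = 0.95) {
  alpha  <- 1 - level
  b_hat  <- fit_beta1(d)
  # Non-parametric bootstrap: resample rows with replacement and refit.
  b_star <- replicate(n_boot,
                      fit_beta1(d[sample(nrow(d), replace = TRUE), ]))
  list(
    percentile = unname(quantile(b_star, c(alpha / 2, 1 - alpha / 2))),
    normal     = b_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sd(b_star)
  )
}

# TRUE if the interval ci = c(lower, upper) contains the true value.
covers <- function(ci, truth) ci[1] <= truth && truth <= ci[2]

# Reduced-budget demonstration: 20 runs with 50 resamples each.
res <- replicate(20, {
  cis <- boot_cis(simulate_data(n, beta), n_boot = 50)
  c(percentile = covers(cis$percentile, beta[2]),
    normal     = covers(cis$normal, beta[2]))
})

# Share of runs in which each interval covered the true beta1.
coverage <- rowMeans(res)
print(coverage)
```

With only 20 runs the estimated coverage is very noisy. Replacing 20 and 50 by n_sim and n_boot gives the full study described in the footnote, at a correspondingly higher cost.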