Statistical Inference

Lukáš Lafférs
Matej Bel University, Dept. of Mathematics
MUNI Brno
1.10.2021

Outline
- Maximum likelihood
- Bootstrap

Maximum likelihood

Statistical inference deals with the problem of quantifying uncertainty. By uncertainty we mean statistical uncertainty, not model uncertainty. Given that our sample size is limited: how sure or unsure are we about our parameter estimate?

Example 1 - Tossing a coin

We observe the following 500 tosses:
0000010000100100000001000010010100···0001000010000
97 heads, 403 tails.
These are independent flips of a single coin with a fixed probability p of showing heads.
$\Pr(X = 97) = \binom{500}{97} p^{97} (1-p)^{403}$
Is the coin fair? If p = 0.5, we would see 97 heads with probability $9.31491 \cdot 10^{-46}$ (strictly mathematically speaking: not a whole lot).

Example 1 - Tossing a coin

What value of p is the most likely? Find the one that makes Pr(X = 97) as large as possible.

Example 2 - Challenger Disaster

For each flight i, each of its six O-rings either fails or does not:
$O_{ij} \sim \mathrm{Bern}(p_i)$, $O_{ij} \perp O_{ik}$
$F_i = \sum_{j=1}^{6} O_{ij} \sim \mathrm{Bin}(6, p_i)$
$g(p_i) = \beta_0 + \beta_1 \,\mathrm{temp}_i$

Example 3 - Waiting time

We observe inter-arrival times of insurance claims (in days):
2.07 5.06 6.51 1.75 13.95 2.55 ... 18.03 1.92 1.03
(100 observations)
These may be exponentially distributed. What parameter value would fit the data best?

Notation

X ... random variable
$X_1, \ldots, X_n$ ... i.i.d. from a parametric distribution $f(x|\theta)$
$\theta \in \Theta$ ... unknown parameter to be estimated. The true value is denoted $\theta_0$.

Example: $X \sim \mathrm{Exp}(\lambda)$ with $f(x|\lambda) = \exp(-x/\lambda)/\lambda$,
$\lambda \in [0,\infty)$ ... unknown parameter to be estimated. The true value is denoted $\lambda_0$.

Likelihood function: $L_n(\theta) \equiv f(X_1|\theta) \cdots f(X_n|\theta) = \prod_i f(X_i|\theta)$
- unlike the density f, it is a function of the parameter $\theta$ with the data kept fixed
- the i.i.d. assumption is crucial

Example: $L_n(\lambda) = \prod_i \frac{1}{\lambda} \exp\!\left(-\frac{X_i}{\lambda}\right) = \frac{1}{\lambda^n} \exp\!\left(-\frac{n \bar{X}_n}{\lambda}\right)$

Maximum likelihood estimator: $\hat{\theta} \equiv \arg\max_\theta L_n(\theta)$
- what parameter value can rationalise the given data best?
- the estimator is a random variable, because the data are random
- has some favourable statistical properties
- can be computed analytically or numerically

Example: We need to solve the F.O.C.:
$0 = \frac{\partial}{\partial\lambda} L_n(\lambda) = -n \frac{1}{\lambda^{n+1}} \exp\!\left(-\frac{n\bar{X}_n}{\lambda}\right) + \frac{1}{\lambda^n} \exp\!\left(-\frac{n\bar{X}_n}{\lambda}\right) \frac{n\bar{X}_n}{\lambda^2}$
which gives $\hat{\lambda} = \bar{X}_n$.

Log-likelihood function: $\ell_n(\theta) \equiv \log L_n(\theta) = \sum_i \log f(X_i|\theta)$
- numerically more stable
- $\arg\max_\theta \ell_n(\theta) = \arg\max_\theta L_n(\theta)$

Example: $\ell_n(\lambda) = \sum_i \log f(X_i|\lambda) = \sum_i \left(-\log\lambda - \frac{X_i}{\lambda}\right) = -n\log\lambda - \frac{n\bar{X}_n}{\lambda}$

Expected log density: $\ell(\theta) \equiv E[\log f(X|\theta)]$
- under correct specification we have the likelihood analog principle: $\theta_0 = \arg\max_\theta \ell(\theta)$

Example: $\ell(\lambda) = E[\log f(X|\lambda)] = E[-\log\lambda - X/\lambda] = -\log\lambda - \frac{E[X]}{\lambda} = -\log\lambda - \frac{\lambda_0}{\lambda}$
The F.O.C. gives $0 = -\frac{1}{\lambda} + \frac{\lambda_0}{\lambda^2}$, which has the unique solution $\lambda = \lambda_0$.

Score function: $S_n(\theta) \equiv \frac{\partial}{\partial\theta} \ell_n(\theta) = \sum_i \frac{\partial}{\partial\theta} \log f(X_i|\theta)$
- how sensitive the likelihood is to $\theta$
- for an interior solution we have $S_n(\hat{\theta}) = 0$

Example: $S_n(\lambda) = \frac{\partial}{\partial\lambda}\left(-n\log\lambda - \frac{n\bar{X}_n}{\lambda}\right) = -\frac{n}{\lambda} + \frac{n\bar{X}_n}{\lambda^2}$

Likelihood Hessian: $H_n(\theta) \equiv -\frac{\partial^2}{\partial\theta\,\partial\theta^T} \ell_n(\theta) = -\sum_i \frac{\partial^2}{\partial\theta\,\partial\theta^T} \log f(X_i|\theta)$
- tells us how curved the log-likelihood is

Example: $H_n(\lambda) = -\frac{\partial^2}{\partial\lambda^2} \ell_n(\lambda) = -\frac{\partial}{\partial\lambda} S_n(\lambda) = -\frac{n}{\lambda^2} + \frac{2n\bar{X}_n}{\lambda^3}$

Efficient score: $S \equiv \frac{\partial}{\partial\theta} \log f(X|\theta_0)$
- derivative of the log-likelihood of a single observation
- a mean-zero random vector:
$E[S] = E\left[\frac{\partial}{\partial\theta} \log f(X|\theta_0)\right] = \frac{\partial}{\partial\theta} E[\log f(X|\theta_0)] = \frac{\partial}{\partial\theta} \ell(\theta_0) = 0$

Example: $S = \frac{\partial}{\partial\lambda} \log f(X|\lambda_0) = -\frac{1}{\lambda_0} + \frac{X}{\lambda_0^2}$ and
$E[S] = -\frac{1}{\lambda_0} + \frac{E[X]}{\lambda_0^2} = -\frac{1}{\lambda_0} + \frac{\lambda_0}{\lambda_0^2} = 0.$
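Before moving on to the Fisher information, here is a quick numerical check of Example 1 (the coin tosses). This is an illustrative Python sketch, not part of the original slides; only the counts of 97 heads out of 500 tosses come from the example.

```python
# A quick numerical check of Example 1: evaluate the binomial likelihood of
# 97 heads in 500 tosses and find the value of p that maximizes it.
from math import comb, log, exp

n, heads = 500, 97
log_binom = log(comb(n, heads))   # log of the binomial coefficient C(500, 97)

def log_lik(p):
    # log Pr(X = 97 | p) = log C(500, 97) + 97*log(p) + 403*log(1 - p)
    return log_binom + heads * log(p) + (n - heads) * log(1 - p)

# Under a fair coin the chance of exactly 97 heads is about 9.3e-46.
print(exp(log_lik(0.5)))

# Crude grid search for the maximizer; the analytic answer is 97/500 = 0.194.
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_lik)
print(p_hat)
```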
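The running exponential example can be checked numerically in the same spirit: the log-likelihood $\ell_n(\lambda) = -n\log\lambda - n\bar{X}_n/\lambda$ is maximized at the sample mean, and the score vanishes there. A minimal sketch, with simulated data standing in for the 100 observed waiting times (the "true" value $\lambda_0 = 5$ and the random seed are assumptions of the demo, not from the slides):

```python
# Minimal sketch: numerical MLE for the exponential waiting-time example.
import numpy as np

rng = np.random.default_rng(0)
lam0 = 5.0                               # assumed "true" mean waiting time
x = rng.exponential(lam0, size=100)      # stand-in for the observed data

def loglik(lam):
    # l_n(lambda) = -n*log(lambda) - n*mean(x)/lambda
    return -len(x) * np.log(lam) - x.sum() / lam

def score(lam):
    # S_n(lambda) = -n/lambda + n*mean(x)/lambda**2
    return -len(x) / lam + x.sum() / lam**2

lam_grid = np.linspace(0.1, 30, 10000)          # crude grid search
lam_hat = lam_grid[np.argmax(loglik(lam_grid))]

print(x.mean())         # analytic MLE: the sample mean
print(lam_hat)          # numerical maximizer, essentially the same value
print(score(x.mean()))  # score evaluated at the MLE is (numerically) zero
```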
Fisher information: $J_\theta \equiv E[SS^T]$
- the variance of the efficient score S

Example: $J_\lambda = E[S^2] = \mathrm{Var}[S]$ (since $E[S]=0$) $= \mathrm{Var}\!\left[-\frac{1}{\lambda_0} + \frac{X}{\lambda_0^2}\right] = \frac{1}{\lambda_0^4}\mathrm{Var}[X] = \frac{1}{\lambda_0^2}$

Expected Hessian: $H_\theta \equiv -\frac{\partial^2}{\partial\theta\,\partial\theta^T} \ell(\theta_0)$
- under regularity conditions $H_\theta = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta^T} \log f(X|\theta_0)\right]$

Example: $H_\lambda = -\frac{\partial^2}{\partial\lambda^2}\left(-\log\lambda - \frac{\lambda_0}{\lambda}\right)\Big|_{\lambda=\lambda_0} = \frac{1}{\lambda_0^2}$

Under correct specification of $f(x|\theta)$ (there exists some $\theta_0 \in \Theta$ such that $f(x|\theta_0) = f(x)$), we have the Information Matrix Equality: $J_\theta = H_\theta$.

Example: $J_\lambda = \frac{1}{\lambda_0^2} = H_\lambda$

MLE has some interesting properties:
- invariant to transformations
- asymptotically efficient in the class of unbiased estimators (even for transformations)
- consistent
- asymptotically normal

MLE is invariant to transformations:
$\hat{\theta}$ is the MLE of $\theta$ $\implies$ $\hat{\beta} = h(\hat{\theta})$ is the MLE of $\beta = h(\theta)$.

MLE asymptotically achieves the Cramér-Rao Lower Bound.
Under (i) correct specification, (ii) the support of X not depending on $\theta$, and (iii) $\theta_0$ lying in the interior of $\Theta$:
for any unbiased $\tilde{\theta}$ we have $\mathrm{Var}[\tilde{\theta}] \ge (nJ_\theta)^{-1}$.
For a transformation $\beta = h(\theta)$ (under some more regularity conditions) we get that for any unbiased estimator $\tilde{\beta}$ of $\beta$:
$\mathrm{Var}[\tilde{\beta}] \ge \frac{1}{n} H^T J_\theta^{-1} H$, where $H = \frac{\partial}{\partial\theta} h(\theta_0)^T$.

Average log-likelihood: $\bar{\ell}_n(\theta) \equiv \frac{1}{n} \ell_n(\theta) = \frac{1}{n} \sum_i \log f(X_i|\theta)$

MLE is consistent, $\hat{\theta} \to_P \theta_0$, under these conditions:
- $X_i$ are i.i.d.
- $|\log f(X|\theta)| \le G(X)$ with $E[G(X)] < \infty$
- $\log f(X|\theta)$ is continuous in $\theta$ with probability one
- $\Theta$ is compact
- $\forall \theta \ne \theta_0: \ell(\theta) < \ell(\theta_0)$ (so that the parameter $\theta$ is identified)

MLE is asymptotically normally distributed. Why? Taylor expansion around $\theta_0$:
$0 = \frac{\partial}{\partial\theta}\bar{\ell}_n(\hat{\theta}) \approx \frac{\partial}{\partial\theta}\bar{\ell}_n(\theta_0) + \frac{\partial^2}{\partial\theta\,\partial\theta^T}\bar{\ell}_n(\theta_0)(\hat{\theta} - \theta_0)$
$\sqrt{n}(\hat{\theta} - \theta_0) \approx \left(-\frac{\partial^2}{\partial\theta\,\partial\theta^T}\bar{\ell}_n(\theta_0)\right)^{-1} \sqrt{n}\,\frac{\partial}{\partial\theta}\bar{\ell}_n(\theta_0) \to_D N(0, H_\theta^{-1} J_\theta H_\theta^{-1}) = N(0, J_\theta^{-1})$,
since the first factor $\to_P H_\theta^{-1}$ and the second $\to_D N(0, J_\theta)$.

OLS is MLE under normal errors:
$y = X\beta + \varepsilon$; if we assume that $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat{\beta}_{MLE} = (X^T X)^{-1} X^T y$ and $\hat{\sigma}^2 = \frac{1}{n} \hat{\varepsilon}^T \hat{\varepsilon}$.

Bootstrap

Example - Rolling a die (again)

Data is all we have:
- $\hat{F}_n \to F$
- we wish to understand the sampling variation, but we do not have F
- at least we have our data, i.e. $\hat{F}_n$
- use our $\hat{F}_n$ to simulate new "bootstrap" datasets

Bootstrap in understanding the sample variation

Suppose we are considering choosing between two different estimators $\tilde{\beta}$ and $\hat{\beta}$. These may possess different qualities. The question is: given that you have to pick only once, which one would you choose?

Assume we are in one of the following situations:
- small data sample $\implies$ asymptotic approximations are unreliable (Ex: n = 15 in linear regression)
- our estimator is complex and we cannot even derive an asymptotic approximation (Ex: the result of a numerical optimization)
- the asymptotic distribution depends on an unknown parameter (Ex: $X_1, X_2, \ldots, X_n \sim f(\cdot)$, sample median $\hat{m} \sim N\!\left(m, \frac{1}{4nf(m)^2}\right)$ asymptotically)
- the traditional estimator is based on dubious assumptions (Ex: stock returns may have fat tails)

*Example - Stamp thickness

Bootstrap - some remarks
- very general approach that makes few assumptions
- the bootstrapped distribution can be used to construct standard errors, confidence intervals, and bias corrections

*Bootstrap may fail

Paradox: we wish to use it in situations that are complex, but in those it may also be difficult to prove that it "works".
- It may fail if the parameter lies on the boundary of the parameter space (Ex: $X \sim N(\mu, 1)$ where $\mu \in [0,\infty)$; Andrews, 2000).
- It may fail if there is missing support information. Sample maximum: $F_0$ has support $[0,\theta_0]$, $\hat{\theta}_n = \max\{X_1,\ldots,X_n\}$, $\hat{T}_n = n(\hat{\theta}_n - \theta_0)$, $T^*_n = n(\hat{\theta}^*_n - \hat{\theta}_n)$. Then $P^*_n(T^*_n = 0) = 1 - (1 - 1/n)^n \to 1 - e^{-1}$, whereas $P(\hat{T}_n = 0) \to 0$.

*What if bootstrap fails?

Subsampling
- we draw smaller bootstrap samples without replacement
- intuition: we sample directly from the true distribution ($F_0$), not from the estimated one ($\hat{F}_n$)
- more general than the bootstrap
- less efficient than the regular bootstrap when the latter works
- practical problem: how to choose the subsample size?
Small code sketches of both the bootstrap and subsampling follow below.
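To make the resampling idea concrete, here is a minimal Python sketch of the nonparametric bootstrap, not taken from the slides: resample the data with replacement (i.e. draw from $\hat{F}_n$), recompute the statistic, and use the spread of the replications as an estimate of its sampling variation. The sample median is used because, as noted above, its asymptotic variance depends on the unknown density f(m); the simulated data, the seed and B = 2000 replications are illustrative assumptions.

```python
# Minimal sketch of the nonparametric bootstrap: standard error of the median.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(5.0, size=100)   # stand-in dataset
B = 2000                             # number of bootstrap replications

boot_medians = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)  # draw from F_hat_n
    boot_medians[b] = np.median(resample)

se_boot = boot_medians.std(ddof=1)             # bootstrap standard error
ci = np.percentile(boot_medians, [2.5, 97.5])  # simple percentile 95% CI
print(np.median(x), se_boot, ci)
```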
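And a matching sketch of subsampling, assuming the statistic converges at the usual $\sqrt{n}$ rate: draw many subsamples of size b < n without replacement, recompute the statistic, and rescale. The subsample size b = 30, the number of repetitions and the simulated data are again illustrative choices, not from the slides.

```python
# Minimal sketch of subsampling for a root-n consistent statistic (the mean).
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(5.0, size=200)   # stand-in dataset
n, b, S = len(x), 30, 2000           # sample size, subsample size, repetitions

theta_hat = np.mean(x)               # statistic of interest
reps = np.empty(S)
for s in range(S):
    sub = rng.choice(x, size=b, replace=False)     # subsample, no replacement
    reps[s] = np.sqrt(b) * (np.mean(sub) - theta_hat)

# Approximate standard error of theta_hat recovered from the subsampling draws,
# compared with the usual textbook standard error of the mean.
se_sub = reps.std(ddof=1) / np.sqrt(n)
print(theta_hat, se_sub, x.std(ddof=1) / np.sqrt(n))
```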
Thank you for your attention!

References

Very non-technical explanation of MLE in economics: Lanot, Gauthier. "Maximum likelihood and economic modeling." IZA World of Labor 326 (2017).

MLE is explained in chapter 10 of Hansen's Probability: https://www.ssc.wisc.edu/~bhansen/probability/

Appendix A in Faraway (2016) provides reasonable basics: Faraway, Julian J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. CRC Press, 2016.

A book-length treatment of the bootstrap by its inventors (47,000 Google Scholar citations): Efron, Bradley, and Robert J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.

Bootstrap animations: https://www.stat.auckland.ac.nz/~wild/BootAnim/

A very short and succinct explanation of the bootstrap and subsampling in blog posts by Larry Wasserman: https://normaldeviate.wordpress.com/2013/01/19/bootstrapping-and-subsampling-part-i/ and https://normaldeviate.wordpress.com/2013/01/27/bootstrapping-and-subsampling-part-ii/

*A rigorous theory of the bootstrap is in chapter 23 of Van der Vaart, Aad W. Asymptotic Statistics. Vol. 3. Cambridge University Press, 2000.

*Andrews, Donald W. K. "Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space." Econometrica (2000): 399-405.