Statistical Inference: Testing of Statistical Hypotheses

Stanislav Katina
Institute of Mathematics and Statistics, Masaryk University
Honorary Research Fellow, The University of Glasgow

December 11, 2018

Null and alternative hypothesis

- a 'hypothesis' is a theory which is assumed to be true unless evidence is obtained which indicates otherwise
- 'null' means 'nothing', and the 'null hypothesis' (H0) is a 'theory of no change', that is, of no change from what would be expected from past experience
- the 'alternative hypothesis' (H1) is a 'theory of change', that is, of a change from what would be expected from past experience
- the procedure used to decide between these two opposing theories is called a 'hypothesis test' or sometimes a 'significance test'
- one-tailed test – a test in which the alternative hypothesis proposes a change in the parameter in only one direction, an increase or a decrease
- two-tailed test – a test in which the alternative hypothesis allows a difference in the parameter in either direction

Test statistic, rejection and acceptance region, critical value and quantile

- the test statistic is calculated from the sample; its value is used to decide whether the null hypothesis should be rejected
- the rejection (or critical) region consists of the values of the test statistic for which the null hypothesis is rejected
- the acceptance region consists of the values of the test statistic for which the null hypothesis is not rejected
- the boundary value(s) of the rejection region is (are) called the critical value(s) or quantile(s)
- the significance level α of a test is the probability that the test statistic falls in the rejection region when the null hypothesis is true

Hypothesis testing procedure

- a hypothesis is a statement about a population parameter based on a sample from this population
- H0 and H1 are two complementary hypotheses in a hypothesis testing problem
- a hypothesis testing procedure, or hypothesis test, is a rule that specifies for which sample values the decision is made to accept H0 as true, and for which sample values H0 is rejected
- the subset of the sample space for which H0 is rejected is called the rejection region (critical region); the complement of the rejection region is called the acceptance region

Four possibilities

Four cases:
A: H0 is true – our decision is to reject H0
B: H0 is true – our decision is not to reject H0
C: H1 is true – our decision is not to reject H0
D: H1 is true – our decision is to reject H0

Decision–reality table:

decision \ reality   | H0 is true       | H0 is not true
reject H0            | Type I error     | correct decision
do not reject H0     | correct decision | Type II error

The four cases in terms of probabilities:
A: Pr(A) = Pr(Type I error) = Pr(reject H0 | H0 is true) ≤ α (significance level)
B: Pr(B) = Pr(do not reject H0 | H0 is true) ≥ 1 − α (coverage probability, confidence coefficient or confidence level)
C: Pr(C) = Pr(Type II error) = Pr(do not reject H0 | H0 is not true) = β
D: Pr(D) = Pr(reject H0 | H0 is not true) = 1 − β (power)
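These four probabilities can be estimated by simulation. Below is a minimal sketch in R, assuming a two-sided one-sample z-test of H0: μ = 0 with known σ = 1; the sample size n, the alternative mean 0.5 and the number of replications B are illustrative choices, not values from the text.

    set.seed(1)
    n <- 30; B <- 10000; alpha <- 0.05
    z.crit <- qnorm(1 - alpha/2)                  # two-sided critical value
    # proportion of rejections of H0: mu = 0 when the true mean is mu.true
    reject.rate <- function(mu.true) {
      z <- replicate(B, sqrt(n) * mean(rnorm(n, mu.true, 1)))  # z-statistics
      mean(abs(z) > z.crit)
    }
    reject.rate(0)    # close to alpha = Pr(Type I error)
    reject.rate(0.5)  # close to 1 - beta, the power at mu = 0.5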
Empirical 100(1 − α)% confidence intervals for parameter θ

Relationship between a confidence interval and a statistical test: an empirical 100(1 − α)% confidence interval (CI) for a parameter θ corresponds to an α-level hypothesis test about θ.

Three types of intervals:
- Pr(LB(X) < θ < UB(X)) = 1 − α (two-tailed CI)
- Pr(θ < UB*(X)) = 1 − α (one-tailed (right-tailed) CI)
- Pr(LB*(X) < θ) = 1 − α (one-tailed (left-tailed) CI)

Acceptance region

Definition (Acceptance region of H0)
Let X be a random variable whose distribution (probabilistic model) depends on a parameter θ ∈ Θ, and let g(θ) be a parametric function. We test the null hypothesis H01: g(θ) = g(θ0) against the two-sided alternative H11: g(θ) ≠ g(θ0). Let (LB, UB) be an interval estimate of g(θ) with coverage probability 1 − α. Then A_CI,1 = {LB, UB : g(θ0) ∈ (LB, UB)} is the acceptance region of the test of H01 against H11 at significance level α.
If we test H02: g(θ) ≤ g(θ0) against the one-sided (right) alternative H12: g(θ) > g(θ0), and LB* is a lower estimate of g(θ) with coverage probability 1 − α, then A_CI,2 = {LB* : LB* < g(θ0)} is the acceptance region of the test of H02 against H12 at significance level α.
If we test H03: g(θ) ≥ g(θ0) against the one-sided (left) alternative H13: g(θ) < g(θ0), and UB* is an upper estimate of g(θ) with coverage probability 1 − α, then A_CI,3 = {UB* : UB* > g(θ0)} is the acceptance region of the test of H03 against H13 at significance level α.

Rejection region

Definition (Rejection (critical) region of H0)
In the same setting, W_CI,1 = {LB, UB : g(θ0) ∉ (LB, UB)} is the critical region of the test of H01 against H11 at significance level α.
If we test H02: g(θ) ≤ g(θ0) against H12: g(θ) > g(θ0) and LB* is a lower estimate of g(θ) with coverage probability 1 − α, then W_CI,2 = {LB* : LB* ≥ g(θ0)} is the critical region of the test of H02 against H12 at significance level α.
If we test H03: g(θ) ≥ g(θ0) against H13: g(θ) < g(θ0) and UB* is an upper estimate of g(θ) with coverage probability 1 − α, then W_CI,3 = {UB* : UB* ≤ g(θ0)} is the critical region of the test of H03 against H13 at significance level α.

Test criterion

Definition (Test criterion)
A test criterion is a test statistic T0 = T0(X1, X2, ..., Xn) with known (asymptotic) distribution under H0. The set of possible values of T0 is divided into two subsets, the acceptance region of H0 (notation A) and the critical region of H0 (notation W). These two regions are separated by the critical values t_{α/2} and t_{1−α/2}, resp. t_α and t_{1−α} (for the particular H0 and H1), of the distribution of the test statistic T0 under H0.

Definition (Confidence interval)
A confidence interval (CI) is a type of interval estimate of a population parameter θ. It is an observed, often called empirical, interval (i.e., it is calculated from the observations) that includes the value of the unobservable parameter θ in a certain proportion of repetitions of the experiment. The frequency with which the observed interval contains the parameter is determined by the confidence coefficient 1 − α (i.e. the confidence level, or coverage probability).
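The CI-based acceptance and rejection regions above translate directly into code. A minimal sketch in R, assuming a two-sided t-test of H01: μ = μ0; the simulated data and the value μ0 are illustrative.

    set.seed(2)
    x <- rnorm(25, mean = 5.3, sd = 2)            # illustrative sample
    mu0 <- 5                                      # hypothesised value g(theta0)
    ci <- t.test(x, conf.level = 0.95)$conf.int   # empirical (LB, UB)
    # reject H01 at alpha = 0.05 iff g(theta0) is not inside (LB, UB)
    c(LB = ci[1], UB = ci[2], reject = mu0 <= ci[1] | mu0 >= ci[2])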
To carry out a hypothesis test

Step 1: define the null and alternative hypotheses (H0 and H1)
Step 2: decide on a significance level, e.g. α = 0.1, 0.05, 0.01
Step 3: calculate the test statistic (test criterion) T0
Step 4: determine the critical value(s)
Step 5: decide on the outcome of the test (reject / do not reject H0) in one of the following ways:
- based on the critical region W = W_T (the observed test statistic t0 = t_obs and the critical values t_{α/2} and t_{1−α/2}, resp. t_α and t_{1−α}),
- based on the critical region W_IS, i.e. the empirical confidence interval (and g(θ0)),
- based on the p-value.
Step 6: state the conclusion in words

To carry out a hypothesis test – based on the test statistic and critical value

Definition (Testing based on the critical region W)
Rejecting H0: if the observed test statistic (the realisation of the test statistic) t0 of T0 lies in the critical region W (equivalently, it does not lie in the acceptance region A), H0 is rejected at significance level α, i.e. we have sufficient evidence to reject H0.
Not rejecting H0: if the observed test statistic t0 of T0 lies in the acceptance region A (equivalently, it does not lie in the critical region W), H0 is not rejected at significance level α, i.e. we do not have sufficient evidence to reject H0.

Let t_min be the smallest and t_max the largest possible value of the test criterion T0, and let t_α denote the critical value exceeded with probability α under H0. Then:
1. two-sided alternative – critical region W1 = (t_min, t_{1−α/2}⟩ ∪ ⟨t_{α/2}, t_max),
2. one-sided (right) alternative – critical region W2 = ⟨t_α, t_max),
3. one-sided (left) alternative – critical region W3 = (t_min, t_{1−α}⟩.

To carry out a hypothesis test – based on the CI

Definition (Testing based on the CI)
Rejecting H0: if g(θ0) does not lie within the CI, H0 is rejected at significance level α, i.e. we have sufficient evidence to reject H0.
Not rejecting H0: if g(θ0) lies within the CI, H0 is not rejected at significance level α, i.e. we do not have sufficient evidence to reject H0.

Relationship between confidence intervals and statistical tests:
- hypothesis testing ≡ CIs
- α-level hypothesis test ≡ 100(1 − α)% CI
- one-tailed test ≡ one-sided CI (left-sided CI ≡ right-sided alternative, right-sided CI ≡ left-sided alternative)
- two-tailed test ≡ two-sided CI
- parameter(s) ∈ CI ≡ do not reject H0
- parameter(s) ∉ CI ≡ reject H0

To carry out a hypothesis test – based on the p-value (observed significance level)

Definition (Testing based on the p-value)
The minimal significance level α (for a given test statistic T0) at which H02: g(θ) ≤ g(θ0) is rejected (when tested against H12: g(θ) > g(θ0)) is called the observed significance level or p-value, i.e.

p-value = α_obs = sup_{θ∈Θ0} Pr(T(X1, X2, ..., Xn) ≥ T(x1, x2, ..., xn); θ).

This can be written less formally as p-value = Pr(a test statistic as extreme as or more extreme than the observed one | H0 is true). The closer α_obs is to zero, the smaller is the probability under H0 that the test statistic T(X1, X2, ..., Xn) takes a value as extreme as or more extreme than the one observed, while this probability is higher under H1. The p-value can therefore be understood as an indicator of the credibility of H0.
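Steps 3 to 5 can be carried out in all three equivalent ways at once. A minimal sketch in R for a two-sided one-sample t-test of H0: μ = μ0; the simulated data and μ0 are illustrative.

    set.seed(3)
    x <- rnorm(20, mean = 0.6); mu0 <- 0; alpha <- 0.05
    n <- length(x); df <- n - 1
    t0 <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # observed test statistic
    t.crit <- qt(1 - alpha/2, df)               # critical value
    ci <- mean(x) + c(-1, 1) * t.crit * sd(x) / sqrt(n)
    p.value <- 2 * pt(-abs(t0), df)             # two-sided p-value
    c(by.critical.region = abs(t0) > t.crit,    # is t0 in W?
      by.CI = mu0 < ci[1] | mu0 > ci[2],        # is mu0 outside the CI?
      by.p.value = p.value < alpha)             # the three decisions agree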
To carry out a hypothesis test – based on the p-value (observed significance level)

Usually, if α_obs < α = 0.05, there is sufficient evidence to reject H0 and the result of the test is statistically significant, while if α_obs > α = 0.1, there is not sufficient evidence to reject H0 and the result of the test is not statistically significant. The values between 0.05 and 0.1 should be taken as reference points in a broad sense: as α_obs gets closer to either boundary point of the interval ⟨0.05, 0.1⟩, this is taken as increasing evidence for one or the other conclusion. Situations with α_obs ∈ ⟨0.05, 0.1) are usually the most difficult to handle, and the result is then marginally statistically significant.

Wording of the results of a statistical test:

range for p-value | stars of significance | wording of the result
⟨0, 0.001)        | ***                   | extremely highly statistically significant
⟨0.001, 0.01)     | **                    | highly statistically significant
⟨0.01, 0.05)      | *                     | statistically significant
⟨0.05, 0.1)       | ∙                     | marginally statistically significant
⟨0.1, 1⟩          |                       | non-significant

Interpretation of p-values:
- p-value < 0.001: the prevalence of the estimated effect is smaller than one in one thousand (the odds of the estimated effect are smaller than 1 : 999) if the effect is not present in the population; the presence of such an effect is highly improbable if the effect is absent from the population, and highly probable if it is present.
- p-value < 0.01: the prevalence of the estimated effect is smaller than one in one hundred (the odds are smaller than 1 : 99) if the effect is not present in the population; the presence of such an effect is very improbable if the effect is absent, and very probable if it is present.
- p-value < 0.05: the prevalence of the estimated effect is smaller than five in one hundred (the odds are smaller than 5 : 95, i.e. 1 : 19) if the effect is not present in the population; the presence of such an effect is sufficiently improbable if the effect is absent, and sufficiently probable if it is present.
- p-value ≥ 0.05: the prevalence of the estimated effect is five in one hundred or greater (5% or more); for p-value = k, k ∈ ⟨0.05, 1⟩, the prevalence of the estimated effect is 100k in one hundred (100k% or more).

How is the p-value (mostly) calculated? (a sketch in R follows the list)
1. two-sided alternative – p-value = 2 min(Pr(T0 ≤ t0 | H0), Pr(T0 ≥ t0 | H0)), e.g. for the normal and Student t distributions of the test statistic (symmetric distributions) and for the χ²_df and F_{df1,df2} distributions of the test statistic (asymmetric distributions); or p-value = min(Pr(T0 ≤ t0 | H0), Pr(T0 ≥ t0 | H0)), e.g. for the χ²_df and F_{df1,df2} distributions of the test statistic (asymmetric distributions),
2. one-sided (right) alternative – p-value = Pr(T0 ≥ t0 | H0),
3. one-sided (left) alternative – p-value = Pr(T0 ≤ t0 | H0).
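A minimal sketch of these rules in R, assuming a test statistic with a χ² null distribution with 4 degrees of freedom; the observed value t0 and the degrees of freedom are illustrative choices.

    t0 <- 9.2; df <- 4                          # illustrative values
    p.lo <- pchisq(t0, df)                      # Pr(T0 <= t0 | H0)
    p.hi <- pchisq(t0, df, lower.tail = FALSE)  # Pr(T0 >= t0 | H0)
    c(two.sided = 2 * min(p.lo, p.hi),          # rule 1
      right = p.hi,                             # rule 2
      left = p.lo)                              # rule 3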
On a philosophical level

- distinction between 'rejecting H0' and 'accepting H1': 'rejecting H0' implies nothing about which state the experimenter is accepting, only that the state defined by H0 is being rejected
- distinction between 'accepting H0' and 'not rejecting H0': 'accepting H0' means the experimenter is willing to assert the state of nature specified by H0; 'not rejecting H0' means the experimenter does not really believe H0 but does not have the evidence to reject it

Conservative and liberal tests and CIs

Definition (Conservative and liberal test)
A test whose actual (observed) significance level is smaller than the nominal significance level α is called conservative (in reality it rejects H0 less readily than the nominal level suggests). A test whose actual (observed) significance level is greater than the nominal significance level α is called liberal (in reality it rejects H0 more readily than the nominal level suggests).

Definition (Conservative and liberal CI)
A CI whose actual (real) coverage probability is greater than the nominal coverage probability 1 − α is called conservative (i.e. the probability that θ0 lies within the CI is greater than expected). A CI whose actual (real) coverage probability is smaller than the nominal coverage probability 1 − α is called liberal (i.e. the probability that θ0 lies within the CI is smaller than expected).

Likelihood ratio – generalised relative likelihood

Two types of hypotheses:
1. simple hypothesis – H0: θ = θ0 against H1: θ ≠ θ0; the simple likelihood ratio is

λ(x) = λ = L(θ0|x) / sup_{θ∈Θ} L(θ|x) = L(θ0|x) / L(θ̂|x),

where λ(X) is the test statistic, θ̂ is the maximum likelihood estimate of θ, and L(θ|x) is continuous for all x;
2. composite hypothesis – H0: θ ∈ Θ0 against H1: θ ∈ Θ1; the generalised likelihood ratio is

λ(x) = sup_{θ∈Θ0} L(θ|x) / sup_{θ∈Θ} L(θ|x).

Likelihood ratio test statistic

The subsets Θ0 and Θ1 of Θ remain the same after a monotone transformation of λ(x), i.e. the statistical tests before and after the transformation are equivalent. The likelihood ratio test statistic is therefore

U_LR = −2 ln λ(X).

Its realisation, the observed likelihood ratio test statistic, is u_LR = −2 ln λ(x), where u_LR ∈ (0, ∞).

Three test statistics

Applying a Taylor expansion of l(θ0) about θ̂, and using S(θ̂) = 0,

U_LR = −2(l(θ0|X) − l(θ̂|X)) ≈ −2[(θ0 − θ̂) S(θ̂) − (1/2)(θ0 − θ̂)² I(θ̂)] = (θ0 − θ̂)² I(θ̂).

Under H0, the Wald test statistic U_W is obtained as

U_LR ≈ n(θ0 − θ̂)² I(θ̂)/n ≈ n(θ0 − θ̂)² i(θ0) ≈ n(θ0 − θ̂)² i(θ̂) = U_W (under H0),

where (1/n) I(θ) →P i(θ0); its realisation, the observed Wald test statistic, is u_W.

Under H0, the score test statistic U_S is obtained as

U_LR ≈ n(θ0 − θ̂)² i(θ0) ≈ (S(θ0))² / (n i(θ0)) = U_S (under H0),

where √n (θ̂ − θ0) ≈ S(θ0)/(√n i(θ0)) under H0; its realisation, the observed score test statistic, is u_S.
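As a concrete instance of U_LR = −2 ln λ(X) for a simple hypothesis, here is a minimal sketch in R for X ∼ Bin(N, p) and H0: p = p0; the values N, x and p0 are illustrative.

    N <- 10; x <- 8; p0 <- 0.5                  # illustrative data and null value
    p.hat <- x / N                              # maximum likelihood estimate of p
    loglik <- function(p) dbinom(x, N, p, log = TRUE)
    u.LR <- -2 * (loglik(p0) - loglik(p.hat))   # -2 ln lambda(x)
    # compare with the asymptotic chi-squared distribution (1 df) under H0
    c(u.LR = u.LR, p.value = pchisq(u.LR, df = 1, lower.tail = FALSE))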
Three test statistics – geometrical interpretation

1. U_LR measures the properly standardised difference between the log-likelihoods at θ̂ and θ0 (i.e. in the direction of the y axis),
2. U_W measures the properly standardised absolute difference between θ̂ and θ0 (in the direction of the x axis),
3. U_S measures the properly standardised slope of the (relative) log-likelihood at θ0.

Example (normal distribution)
Let X ∼ N(μ, σ²), where σ² is known, and test H0: μ = μ0 against H1: μ ≠ μ0, where θ0 = (μ0, σ²)ᵀ. Then
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) = [−Σᵢ(Xᵢ − X̄)² + Σᵢ(Xᵢ − μ0)²]/σ² = n(X̄ − μ0)²/σ²,
2. U_W = (X̄ − μ0)² I(μ̂) = n(X̄ − μ0)²/σ², since I(μ̂) = n/σ²,
3. U_S = (S(μ0))²/I(μ0) = (n(X̄ − μ0)/σ²)²/(n/σ²) = n(X̄ − μ0)²/σ².
All three test statistics are equal, i.e. U_LR = U_W = U_S.

Three test statistics – tests about one parameter

Let θ be a scalar. The null hypothesis is H0: θ = θ0 against the alternative H1: θ ≠ θ0, where θ0 is the scalar from H0. Let θ̂ be the maximum likelihood estimate of θ and Var[θ̂] the variance of θ̂. Writing ∼D for the asymptotic distribution under H0, the three test statistics are:
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) ∼D χ²₁,
2. U_W = (θ̂ − θ0)² I(θ̂) ∼D χ²₁, and equivalently U_W^(1/2) = Z_W ∼D N(0, 1),
3. U_S = (S(θ0))²/I(θ0) ∼D χ²₁, and equivalently U_S^(1/2) = Z_S ∼D N(0, 1).

Three test statistics – tests of all parameters

Let θ be the vector of all parameters, of length k. The null hypothesis is H0: θ = θ0 against H1: θ ≠ θ0, where θ0 is the parameter vector from H0. Let θ̂ be the maximum likelihood estimate of θ and Var[θ̂] the covariance matrix. Then:
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) ∼D χ²_k,
2. U_W = (θ̂ − θ0)ᵀ I(θ̂) (θ̂ − θ0) ∼D χ²_k,
3. U_S = (S(θ0))ᵀ (I(θ0))⁻¹ S(θ0) ∼D χ²_k.

Three test statistics – tests of a subset of parameters

Let θ = (θ1ᵀ, θ2ᵀ)ᵀ be the vector of all parameters, of length k, where θ1 and θ2 are subsets of parameters of lengths k1 and k2, with k1 + k2 = k. The null hypothesis is H0: θ1 = θ0 against H1: θ1 ≠ θ0, where θ0 is the parameter vector from H0. Let θ̂ be the maximum likelihood estimate of θ, and θ̂_{2|0} the maximum likelihood estimate of θ2 when H0 is true, i.e. when θ1 = θ0; then θ̂0 = (θ0ᵀ, θ̂ᵀ_{2|0})ᵀ. Let Var₁₁[θ̂] be the submatrix of the covariance matrix Var[θ̂] corresponding to θ1. Then:
1. U_LR = −2(l(θ̂0|X) − l(θ̂|X)) ∼D χ²_{k1},
2. U_W = (θ̂1 − θ0)ᵀ I₁₁(θ̂) (θ̂1 − θ0) ∼D χ²_{k1},
3. U_S = (S(θ̂0))ᵀ (I₁₁(θ̂0))⁻¹ S(θ̂0) ∼D χ²_{k1}.

There is a relationship between the likelihood ratio test statistic for a subset of parameters and the profile likelihood function:

L_P(θ1|x) = max_{θ2} L(θ|x) = L((θ1, θ̂_{2|0})ᵀ|x),

or, in terms of the logarithm of the profile likelihood, l_P(θ1|x) = l((θ1, θ̂_{2|0})ᵀ|x). The likelihood ratio test statistic is then

u_LR = −2 ln [L_P(θ1|x)/L_P(θ̂1|x)] = −2 (l_P(θ1|x) − l_P(θ̂1|x)),

where θ̂1 is the maximum likelihood estimate of θ1 with respect to L_P(θ1|x). U_LR is also called the generalised likelihood ratio statistic.
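A minimal numerical check of the normal-distribution example above in R, with σ = 1, μ0 = 0 and an illustrative simulated sample; all three statistics coincide exactly, as derived.

    set.seed(6)
    sigma <- 1; mu0 <- 0
    x <- rnorm(50, mean = 0.3, sd = sigma); n <- length(x)
    u.LR <- (sum((x - mu0)^2) - sum((x - mean(x))^2)) / sigma^2
    u.W  <- n * (mean(x) - mu0)^2 / sigma^2   # Wald, with I(mu.hat) = n/sigma^2
    S0   <- n * (mean(x) - mu0) / sigma^2     # score function at mu0
    u.S  <- S0^2 / (n / sigma^2)              # score statistic
    c(u.LR = u.LR, u.W = u.W, u.S = u.S)      # identical values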
Three test statistics – tests of a subset of parameters (continued)

Additionally,

L_P(θ̂1|x) = max_{θ1} max_{θ2} L(θ|x) = max_{θ1,θ2} L((θ1, θ2)ᵀ|x).

For H0: θ1 = θ0 and H1: θ1 ≠ θ0,

L_P(θ0|x) = max_{θ2} L((θ0, θ2)ᵀ|x) = max_{H0} L((θ1, θ2)ᵀ|x)

and

u_LR = −2 ln [max_{H0} L((θ1, θ2)ᵀ|x) / max_{θ1,θ2} L((θ1, θ2)ᵀ|x)] = −2 ln [L_P(θ0|x)/L_P(θ̂1|x)].

The quadratic approximation of the relative profile log-likelihood is

ln [L_P(θ1|x)/L_P(θ̂1|x)] ≈ −(1/2) (θ1 − θ̂1)ᵀ (I¹¹(θ̂))⁻¹ (θ1 − θ̂1),

and the quadratic approximation of the generalised likelihood ratio statistic −2 ln [L_P(θ1|x)/L_P(θ̂1|x)] is

u_LR ≈ u_W = (θ̂1 − θ0)ᵀ (I¹¹(θ̂))⁻¹ (θ̂1 − θ0),

where I¹¹(θ̂) denotes the submatrix of the inverse information matrix corresponding to θ1. The marginal distribution of θ̂1 when H0 is true is θ̂1 ∼ N_{k1}(θ0, I¹¹(θ̂)).

Three test statistics and related confidence intervals

If θ is a scalar, three confidence intervals are defined as follows:
1. the empirical likelihood ratio 100(1 − α)% CI for θ is CS_{1−α} = {θ : U_LR(θ) < χ²₁(α)}, where U_LR(θ) = −2 ln [L(θ|x)/L(θ̂|x)] and χ²₁(α) is the critical value of the χ²₁ distribution,
2. the empirical Wald 100(1 − α)% CI for θ is defined on the basis of the pivot (pivotal statistic) T_piv = U_W(θ),
3. the empirical score 100(1 − α)% CI for θ is defined on the basis of the pivot T_piv = U_S(θ).
If θ is a vector, the CIs generalise to a confidence set CS_{1−α}: if k = 2, CS_{1−α} is a confidence ellipse; if k > 2, a confidence ellipsoid; and if k = 1, CS_{1−α} reduces to a confidence interval.

Confidence intervals

The empirical Wald 100(1 − α)% CI for θ is

(l, u) = (θ̂_L, θ̂_U) = (θ̂ − t_{α/2} SD[θ̂], θ̂ + t_{α/2} SD[θ̂]),

where the critical value t_{α/2} depends on the choice of θ.
The empirical likelihood ratio 100(1 − α)% CI for θ is defined by its lower and upper bounds as cut-offs of the standardised relative likelihood:

Pr(L(θ|x)/L(θ̂|x) > c_α) = Pr(−2 ln [L(θ|x)/L(θ̂|x)] < −2 ln c_α) = 1 − α,

where c_α = e^(−χ²₁(α)/2). Then:
- if 1 − α = 0.95, then c_α = 0.1465001 ≐ 0.15 (15% cut-off),
- if 1 − α = 0.90, then c_α = 0.2585227 ≐ 0.26 (26% cut-off),
- if 1 − α = 0.99, then c_α = 0.0362452 ≐ 0.04 (4% cut-off).

Likelihood confidence intervals – bisection method

Let θ01, θ02 ∈ ⟨θ̂_L, θ̂_U⟩ with f(θ01) f(θ02) < 0, where f(·) is continuous with at least one root within the interval ⟨θ01, θ02⟩ and

f(θ) = −2 ln [L(θ|x)/L(θ̂|x)] − χ²₁(α) = 0.

If the first derivative of f(·) has constant sign, then exactly one root θ* ∈ ⟨θ01, θ02⟩ of f(θ) = 0 exists. The iterative process is as follows (a sketch in R follows the steps):
1. initialisation step – starting point θ^(0) = (θ01 + θ02)/2 and i = 1;
2. updating equations – the boundaries θ_{i1} and θ_{i2} are replaced according to (θ_{i1}, θ_{i2}) = (θ_{i−1,1}, θ^(i−1)) if f(θ_{i−1,1}) f(θ^(i−1)) < 0, and (θ_{i1}, θ_{i2}) = (θ^(i−1), θ_{i−1,2}) if f(θ_{i−1,1}) f(θ^(i−1)) > 0; if f(θ^(i−1)) = 0, the process ends, otherwise
3. calculate the mid-point θ^(i) = (θ_{i1} + θ_{i2})/2;
4. stopping rule (with a sufficiently small threshold ε) – based on the relative convergence criterion |θ^(i) − θ^(i−1)|/|θ^(i−1)| < ε, the absolute convergence criterion |θ^(i) − θ^(i−1)| < ε, or often also |f(θ^(i))| < ε.
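A minimal sketch of the bisection iteration in R, applied to f(θ) for the likelihood CI of a binomial proportion (the data N = 10, x = 8 anticipate the example below); the bracketing endpoints and the threshold eps are illustrative choices.

    N <- 10; x <- 8; alpha <- 0.05; p.hat <- x / N
    # f(p) = -2 ln( L(p|x) / L(p.hat|x) ) - chi-squared(1) critical value
    f <- function(p)
      -2 * (dbinom(x, N, p, log = TRUE) - dbinom(x, N, p.hat, log = TRUE)) -
        qchisq(1 - alpha, df = 1)
    bisect <- function(lo, hi, eps = 1e-8) {
      repeat {                                  # assumes f(lo) * f(hi) < 0
        mid <- (lo + hi) / 2
        if (abs(f(mid)) < eps || hi - lo < eps) return(mid)
        if (f(lo) * f(mid) < 0) hi <- mid else lo <- mid
      }
    }
    c(lower = bisect(0.001, p.hat), upper = bisect(p.hat, 0.999))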
Likelihood confidence intervals – other numerical methods

Modifications are based on bracketing methods, i.e. bounding the root within a sequence of intervals. The Brent method (Brent–Dekker method) is a combination of the bisection method with inverse interpolation. If the interpolation is linear, the result is the secant method, whose updating equation is

θ^(i) = θ^(i−1) − [(θ^(i−1) − θ^(i−2)) / (f(θ^(i−1)) − f(θ^(i−2)))] f(θ^(i−1)) if f(θ^(i−1)) ≠ f(θ^(i−2)), and θ^(i) = (θ_{i1} + θ_{i2})/2 otherwise,

where the first derivative is approximated by f′(θ^(i−1)) ≈ [f(θ^(i−1)) − f(θ^(i−2))] / [θ^(i−1) − θ^(i−2)]. If f(θ) is twice differentiable and f′(θ) ≠ 0 for all θ ∈ ⟨θ̂_L, θ̂_U⟩, then f(θ) has a single root.

Geometrical interpretation: θ^(i) is the point where the secant through the points [θ^(i−1), f(θ^(i−1))] and [θ^(i−2), f(θ^(i−2))] crosses the x axis.

In R: uniroot(f, interval, tol, ...). When searching for the lower and upper boundary of the 100(1 − α)% CI for θ, the R function uniroot() should be used twice, as follows:
1. for the lower bound – the starting interval is ⟨θ_L, θ̂⟩,
2. for the upper bound – the starting interval is ⟨θ̂, θ_U⟩.
The solutions (roots) are then θ̂_L and θ̂_U.

Likelihood confidence intervals – Brent–Dekker method

Example (Brent–Dekker method)
Let X ∼ Bin(N, p), where N = 10 and n = x = 8. Estimate the boundaries of the empirical 100(1 − α)% CI for (1) p and (2) the log odds ln[p/(1 − p)]. The empirical CIs are of two types: (A) likelihood and (B) Wald. Draw the log-likelihood function and its quadratic approximation with the lower and upper boundaries of the CIs.

Solution (partial)
Empirical Wald 100(1 − α)% CI for p: p̂ = 8/10 = 0.8; SD[p̂] = √(p̂(1 − p̂)/N) = 0.13. Then

(l, u) = (p̂_L, p̂_U) = (p̂ − u_{α/2} SD[p̂], p̂ + u_{α/2} SD[p̂]) = (0.55, 1.05).

Empirical likelihood 100(1 − α)% CI for p:

CS_{1−α} = {p : −2 ln [L(p|x)/L(p̂|x)] ≤ 3.84}, which gives (l, u) = (p̂_L, p̂_U) = (0.50, 0.96).

Empirical Wald 100(1 − α)% CI for g(p) = ln[p/(1 − p)]:

g(p̂) = ln(0.8/0.2) = 1.39; (∂/∂p) g(p) = 1/p + 1/(1 − p);
SD[g(p̂)] = SD[p̂] (1/p̂ + 1/(1 − p̂)) = √(p̂(1 − p̂)/N) (1/p̂ + 1/(1 − p̂)) = √(1/n + 1/(N − n)) = 0.79.

Then (l_g, u_g) = (g(p̂) − u_{α/2} SD[g(p̂)], g(p̂) + u_{α/2} SD[g(p̂)]) = (−0.16, 2.94), and back-transformed (l, u) = (p̂_L, p̂_U) = (0.46, 0.95).

In R:

    x <- 8; N <- 10
    probs <- seq(0.4, 0.99, length = 1000)          # grid of p values
    like <- dbinom(x, N, probs)                     # binomial likelihood
    rellike <- like / max(like)                     # relative likelihood
    relloglike <- -2 * log(rellike)                 # -2 ln relative likelihood
    cutoff <- exp(-1/2 * qchisq(0.95, df = 1))      # 0.1465001
    likeCI.p <- range(probs[rellike > cutoff])      # 0.5009910 0.9634234
    cutoff <- qchisq(0.95, df = 1)                  # 3.841459
    likeCI.p <- range(probs[relloglike < cutoff])   # the same interval
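A minimal sketch of the uniroot() recipe described above, applied to the same binomial data, searching once below and once above p̂; the outer endpoints 0.001 and 0.999 are illustrative stand-ins for θ_L and θ_U.

    x <- 8; N <- 10; p.hat <- x / N
    # f(p) = -2 ln relative likelihood minus the chi-squared(1) critical value
    f <- function(p)
      -2 * (dbinom(x, N, p, log = TRUE) - dbinom(x, N, p.hat, log = TRUE)) -
        qchisq(0.95, df = 1)
    lower <- uniroot(f, interval = c(0.001, p.hat), tol = 1e-8)$root
    upper <- uniroot(f, interval = c(p.hat, 0.999), tol = 1e-8)$root
    c(lower = lower, upper = upper)   # approx. 0.50 and 0.96, as in the solution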