Statistical Inference: Testing of Statistical Hypotheses

Stanislav Katina
Institute of Mathematics and Statistics, Masaryk University
Honorary Research Fellow, The University of Glasgow

December 11, 2018

Null and alternative hypothesis

- a 'hypothesis' is a theory which is assumed to be true unless evidence is obtained which indicates otherwise
- 'null' means 'nothing', and the 'null hypothesis' (H0) is a 'theory of no change', that is, of no change from what would be expected from past experience
- the 'alternative hypothesis' (H1) is a 'theory of change', that is, of a change from what would be expected from past experience
- the procedure used to decide between these two opposing theories is called a 'hypothesis test' or sometimes a 'significance test'
- one-tailed test – a test in which the alternative hypothesis proposes a change in the parameter in only one direction, an increase or a decrease
- two-tailed test – a test in which the alternative hypothesis allows a difference in the parameter in either direction

Test statistic, rejection and acceptance region, critical value and quantile

- the test statistic is calculated from the sample; its value is used to decide whether the null hypothesis should be rejected
- the rejection (or critical) region consists of the values of the test statistic for which the null hypothesis is rejected
- the acceptance region consists of the values of the test statistic for which the null hypothesis is not rejected
- the boundary value(s) of the rejection region is (are) called the critical value(s) or quantile(s)
- the significance level α of a test is the probability that the test statistic falls in the rejection region when the null hypothesis is true

Hypothesis testing procedure

- a hypothesis is a statement about a population parameter based on a sample from this population
- H0 and H1 are two complementary hypotheses in a hypothesis testing problem
- a hypothesis testing procedure, or hypothesis test, is a rule that specifies for which sample values the decision is made to accept H0 as true, and for which sample values H0 is rejected
- the subset of the sample space for which H0 is rejected is called the rejection region (critical region); the complement of the rejection region is called the acceptance region

Four possibilities

Four cases:
A: H0 is true – our decision is to reject H0
B: H0 is true – our decision is not to reject H0
C: H1 is true – our decision is not to reject H0
D: H1 is true – our decision is to reject H0

Decision–reality table:

decision \ reality   | H0 is true       | H0 is not true
reject H0            | Type I error     | correct decision
do not reject H0     | correct decision | Type II error

The four cases in terms of probabilities:
A: Pr(A) = Pr(Type I error) = Pr(reject H0 | H0 is true) ≤ α (significance level)
B: Pr(B) = Pr(do not reject H0 | H0 is true) ≥ 1 − α (coverage probability, confidence coefficient or confidence level)
C: Pr(C) = Pr(Type II error) = Pr(do not reject H0 | H0 is not true) = β
D: Pr(D) = Pr(reject H0 | H0 is not true) = 1 − β (power)
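These four probabilities can be estimated by simulation. Below is a minimal sketch in R, assuming a two-sided one-sample z-test of H0: μ = 0 with known σ = 1; the sample size n, the alternative mean 0.5 and the number of replications B are illustrative choices, not values from the text.

    set.seed(1)
    n <- 30; B <- 10000; alpha <- 0.05
    z.crit <- qnorm(1 - alpha/2)                  # two-sided critical value
    # proportion of rejections of H0: mu = 0 when the true mean is mu.true
    reject.rate <- function(mu.true) {
      z <- replicate(B, sqrt(n) * mean(rnorm(n, mu.true, 1)))  # z-statistics
      mean(abs(z) > z.crit)
    }
    reject.rate(0)    # close to alpha = Pr(Type I error)
    reject.rate(0.5)  # close to 1 - beta, the power at mu = 0.5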
Empirical 100(1 − α)% confidence intervals for parameter θ

Relationship between a confidence interval and a statistical test: an empirical 100(1 − α)% confidence interval (CI) for a parameter θ corresponds to an α-level hypothesis test about θ.

Three types of intervals:
- Pr(LB(X) < θ < UB(X)) = 1 − α (two-tailed CI)
- Pr(θ < UB*(X)) = 1 − α (one-tailed (right-tailed) CI)
- Pr(LB*(X) < θ) = 1 − α (one-tailed (left-tailed) CI)

Acceptance region

Definition (Acceptance region of H0)
Let X be a random variable whose distribution (probabilistic model) depends on a parameter θ ∈ Θ, and let g(θ) be a parametric function. We test the null hypothesis H01: g(θ) = g(θ0) against the two-sided alternative H11: g(θ) ≠ g(θ0). Let (LB, UB) be an interval estimate of g(θ) with coverage probability 1 − α. Then A_CI,1 = {LB, UB : g(θ0) ∈ (LB, UB)} is the acceptance region of the test of H01 against H11 at significance level α.
If we test H02: g(θ) ≤ g(θ0) against the one-sided (right) alternative H12: g(θ) > g(θ0), and LB* is a lower estimate of g(θ) with coverage probability 1 − α, then A_CI,2 = {LB* : LB* < g(θ0)} is the acceptance region of the test of H02 against H12 at significance level α.
If we test H03: g(θ) ≥ g(θ0) against the one-sided (left) alternative H13: g(θ) < g(θ0), and UB* is an upper estimate of g(θ) with coverage probability 1 − α, then A_CI,3 = {UB* : UB* > g(θ0)} is the acceptance region of the test of H03 against H13 at significance level α.

Rejection region

Definition (Rejection (critical) region of H0)
In the same setting, W_CI,1 = {LB, UB : g(θ0) ∉ (LB, UB)} is the critical region of the test of H01 against H11 at significance level α.
If we test H02: g(θ) ≤ g(θ0) against H12: g(θ) > g(θ0) and LB* is a lower estimate of g(θ) with coverage probability 1 − α, then W_CI,2 = {LB* : LB* ≥ g(θ0)} is the critical region of the test of H02 against H12 at significance level α.
If we test H03: g(θ) ≥ g(θ0) against H13: g(θ) < g(θ0) and UB* is an upper estimate of g(θ) with coverage probability 1 − α, then W_CI,3 = {UB* : UB* ≤ g(θ0)} is the critical region of the test of H03 against H13 at significance level α.

Test criterion

Definition (Test criterion)
A test criterion is a test statistic T0 = T0(X1, X2, ..., Xn) with known (asymptotic) distribution under H0. The set of possible values of T0 is divided into two subsets, the acceptance region of H0 (notation A) and the critical region of H0 (notation W). These two regions are separated by the critical values t_{α/2} and t_{1−α/2}, resp. t_α and t_{1−α} (for the particular H0 and H1), of the distribution of the test statistic T0 under H0.

Definition (Confidence interval)
A confidence interval (CI) is a type of interval estimate of a population parameter θ. It is an observed, often called empirical, interval (i.e., it is calculated from the observations) that includes the value of the unobservable parameter θ in a certain proportion of repetitions of the experiment. The frequency with which the observed interval contains the parameter is determined by the confidence coefficient 1 − α (i.e. the confidence level, or coverage probability).
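The CI-based acceptance and rejection regions above translate directly into code. A minimal sketch in R, assuming a two-sided t-test of H01: μ = μ0; the simulated data and the value μ0 are illustrative.

    set.seed(2)
    x <- rnorm(25, mean = 5.3, sd = 2)            # illustrative sample
    mu0 <- 5                                      # hypothesised value g(theta0)
    ci <- t.test(x, conf.level = 0.95)$conf.int   # empirical (LB, UB)
    # reject H01 at alpha = 0.05 iff g(theta0) is not inside (LB, UB)
    c(LB = ci[1], UB = ci[2], reject = mu0 <= ci[1] | mu0 >= ci[2])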
To carry out a hypothesis test

Step 1: define the null and alternative hypotheses (H0 and H1)
Step 2: decide on a significance level, e.g. α = 0.1, 0.05, 0.01
Step 3: calculate the test statistic (test criterion) T0
Step 4: determine the critical value(s)
Step 5: decide on the outcome of the test (reject / do not reject H0) in one of the following ways:
- based on the critical region W = W_T (the observed test statistic t0 = t_obs and the critical values t_{α/2} and t_{1−α/2}, resp. t_α and t_{1−α}),
- based on the critical region W_IS, i.e. the empirical confidence interval (and g(θ0)),
- based on the p-value.
Step 6: state the conclusion in words

To carry out a hypothesis test – based on the test statistic and critical value

Definition (Testing based on the critical region W)
Rejecting H0: if the observed test statistic (the realisation of the test statistic) t0 of T0 lies in the critical region W (equivalently, it does not lie in the acceptance region A), H0 is rejected at significance level α, i.e. we have sufficient evidence to reject H0.
Not rejecting H0: if the observed test statistic t0 of T0 lies in the acceptance region A (equivalently, it does not lie in the critical region W), H0 is not rejected at significance level α, i.e. we do not have sufficient evidence to reject H0.

Let t_min be the smallest and t_max the largest possible value of the test criterion T0, and let t_α denote the critical value exceeded with probability α under H0. Then:
1. two-sided alternative – critical region W1 = (t_min, t_{1−α/2}⟩ ∪ ⟨t_{α/2}, t_max),
2. one-sided (right) alternative – critical region W2 = ⟨t_α, t_max),
3. one-sided (left) alternative – critical region W3 = (t_min, t_{1−α}⟩.

To carry out a hypothesis test – based on the CI

Definition (Testing based on the CI)
Rejecting H0: if g(θ0) does not lie within the CI, H0 is rejected at significance level α, i.e. we have sufficient evidence to reject H0.
Not rejecting H0: if g(θ0) lies within the CI, H0 is not rejected at significance level α, i.e. we do not have sufficient evidence to reject H0.

Relationship between confidence intervals and statistical tests:
- hypothesis testing ≡ CIs
- α-level hypothesis test ≡ 100(1 − α)% CI
- one-tailed test ≡ one-sided CI (left-sided CI ≡ right-sided alternative, right-sided CI ≡ left-sided alternative)
- two-tailed test ≡ two-sided CI
- parameter(s) ∈ CI ≡ do not reject H0
- parameter(s) ∉ CI ≡ reject H0

To carry out a hypothesis test – based on the p-value (observed significance level)

Definition (Testing based on the p-value)
The minimal significance level α (for a given test statistic T0) at which H02: g(θ) ≤ g(θ0) is rejected (when tested against H12: g(θ) > g(θ0)) is called the observed significance level or p-value, i.e.

p-value = α_obs = sup_{θ∈Θ0} Pr(T(X1, X2, ..., Xn) ≥ T(x1, x2, ..., xn); θ).

This can be written less formally as p-value = Pr(a test statistic as extreme as or more extreme than the observed one | H0 is true). The closer α_obs is to zero, the smaller is the probability under H0 that the test statistic T(X1, X2, ..., Xn) takes a value as extreme as or more extreme than the one observed, while this probability is higher under H1. The p-value can therefore be understood as an indicator of the credibility of H0.
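Steps 3 to 5 can be carried out in all three equivalent ways at once. A minimal sketch in R for a two-sided one-sample t-test of H0: μ = μ0; the simulated data and μ0 are illustrative.

    set.seed(3)
    x <- rnorm(20, mean = 0.6); mu0 <- 0; alpha <- 0.05
    n <- length(x); df <- n - 1
    t0 <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # observed test statistic
    t.crit <- qt(1 - alpha/2, df)               # critical value
    ci <- mean(x) + c(-1, 1) * t.crit * sd(x) / sqrt(n)
    p.value <- 2 * pt(-abs(t0), df)             # two-sided p-value
    c(by.critical.region = abs(t0) > t.crit,    # is t0 in W?
      by.CI = mu0 < ci[1] | mu0 > ci[2],        # is mu0 outside the CI?
      by.p.value = p.value < alpha)             # the three decisions agree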
To carry out a hypothesis test – based on the p-value (observed significance level)

Usually, if α_obs < α = 0.05, there is sufficient evidence to reject H0 and the result of the test is statistically significant, while if α_obs > α = 0.1, there is not sufficient evidence to reject H0 and the result of the test is not statistically significant. The values between 0.05 and 0.1 should be taken as reference points in a broad sense: as α_obs gets closer to either boundary point of the interval ⟨0.05, 0.1⟩, this is taken as increasing evidence for one or the other conclusion. Situations with α_obs ∈ ⟨0.05, 0.1) are usually the most difficult to handle, and the result is then marginally statistically significant.

Wording of the results of a statistical test:

range for p-value | stars of significance | wording of the result
⟨0, 0.001)        | ***                   | extremely highly statistically significant
⟨0.001, 0.01)     | **                    | highly statistically significant
⟨0.01, 0.05)      | *                     | statistically significant
⟨0.05, 0.1)       | ∙                     | marginally statistically significant
⟨0.1, 1⟩          |                       | non-significant

Interpretation of p-values:
- p-value < 0.001: the prevalence of the estimated effect is smaller than one in one thousand (the odds of the estimated effect are smaller than 1 : 999) if the effect is not present in the population; the presence of such an effect is highly improbable if the effect is absent from the population, and highly probable if it is present.
- p-value < 0.01: the prevalence of the estimated effect is smaller than one in one hundred (the odds are smaller than 1 : 99) if the effect is not present in the population; the presence of such an effect is very improbable if the effect is absent, and very probable if it is present.
- p-value < 0.05: the prevalence of the estimated effect is smaller than five in one hundred (the odds are smaller than 5 : 95, i.e. 1 : 19) if the effect is not present in the population; the presence of such an effect is sufficiently improbable if the effect is absent, and sufficiently probable if it is present.
- p-value ≥ 0.05: the prevalence of the estimated effect is five in one hundred or greater (5% or more); for p-value = k, k ∈ ⟨0.05, 1⟩, the prevalence of the estimated effect is 100k in one hundred (100k% or more).

How is the p-value (mostly) calculated? (a sketch in R follows the list)
1. two-sided alternative – p-value = 2 min(Pr(T0 ≤ t0 | H0), Pr(T0 ≥ t0 | H0)), e.g. for the normal and Student t distributions of the test statistic (symmetric distributions) and for the χ²_df and F_{df1,df2} distributions of the test statistic (asymmetric distributions); or p-value = min(Pr(T0 ≤ t0 | H0), Pr(T0 ≥ t0 | H0)), e.g. for the χ²_df and F_{df1,df2} distributions of the test statistic (asymmetric distributions),
2. one-sided (right) alternative – p-value = Pr(T0 ≥ t0 | H0),
3. one-sided (left) alternative – p-value = Pr(T0 ≤ t0 | H0).
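A minimal sketch of these rules in R, assuming a test statistic with a χ² null distribution with 4 degrees of freedom; the observed value t0 and the degrees of freedom are illustrative choices.

    t0 <- 9.2; df <- 4                          # illustrative values
    p.lo <- pchisq(t0, df)                      # Pr(T0 <= t0 | H0)
    p.hi <- pchisq(t0, df, lower.tail = FALSE)  # Pr(T0 >= t0 | H0)
    c(two.sided = 2 * min(p.lo, p.hi),          # rule 1
      right = p.hi,                             # rule 2
      left = p.lo)                              # rule 3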
On a philosophical level

- distinction between 'rejecting H0' and 'accepting H1': 'rejecting H0' implies nothing about which state the experimenter is accepting, only that the state defined by H0 is being rejected
- distinction between 'accepting H0' and 'not rejecting H0': 'accepting H0' means the experimenter is willing to assert the state of nature specified by H0; 'not rejecting H0' means the experimenter does not really believe H0 but does not have the evidence to reject it

Conservative and liberal tests and CIs

Definition (Conservative and liberal test)
A test whose actual (observed) significance level is smaller than the nominal significance level α is called conservative (in reality it rejects H0 less readily than the nominal level suggests). A test whose actual (observed) significance level is greater than the nominal significance level α is called liberal (in reality it rejects H0 more readily than the nominal level suggests).

Definition (Conservative and liberal CI)
A CI whose actual (real) coverage probability is greater than the nominal coverage probability 1 − α is called conservative (i.e. the probability that θ0 lies within the CI is greater than expected). A CI whose actual (real) coverage probability is smaller than the nominal coverage probability 1 − α is called liberal (i.e. the probability that θ0 lies within the CI is smaller than expected).

Likelihood ratio – generalised relative likelihood

Two types of hypotheses:
1. simple hypothesis – H0: θ = θ0 against H1: θ ≠ θ0; the simple likelihood ratio is

λ(x) = λ = L(θ0|x) / sup_{θ∈Θ} L(θ|x) = L(θ0|x) / L(θ̂|x),

where λ(X) is the test statistic, θ̂ is the maximum likelihood estimate of θ, and L(θ|x) is continuous for all x;
2. composite hypothesis – H0: θ ∈ Θ0 against H1: θ ∈ Θ1; the generalised likelihood ratio is

λ(x) = sup_{θ∈Θ0} L(θ|x) / sup_{θ∈Θ} L(θ|x).

Likelihood ratio test statistic

The subsets Θ0 and Θ1 of Θ remain the same after a monotone transformation of λ(x), i.e. the statistical tests before and after the transformation are equivalent. The likelihood ratio test statistic is therefore

U_LR = −2 ln λ(X).

Its realisation, the observed likelihood ratio test statistic, is u_LR = −2 ln λ(x), where u_LR ∈ (0, ∞).

Three test statistics

Applying a Taylor expansion of l(θ0) about θ̂, and using S(θ̂) = 0,

U_LR = −2(l(θ0|X) − l(θ̂|X)) ≈ −2[(θ0 − θ̂) S(θ̂) − (1/2)(θ0 − θ̂)² I(θ̂)] = (θ0 − θ̂)² I(θ̂).

Under H0, the Wald test statistic U_W is obtained as

U_LR ≈ n(θ0 − θ̂)² I(θ̂)/n ≈ n(θ0 − θ̂)² i(θ0) ≈ n(θ0 − θ̂)² i(θ̂) = U_W (under H0),

where (1/n) I(θ) →P i(θ0); its realisation, the observed Wald test statistic, is u_W.

Under H0, the score test statistic U_S is obtained as

U_LR ≈ n(θ0 − θ̂)² i(θ0) ≈ (S(θ0))² / (n i(θ0)) = U_S (under H0),

where √n (θ̂ − θ0) ≈ S(θ0)/(√n i(θ0)) under H0; its realisation, the observed score test statistic, is u_S.
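As a concrete instance of U_LR = −2 ln λ(X) for a simple hypothesis, here is a minimal sketch in R for X ∼ Bin(N, p) and H0: p = p0; the values N, x and p0 are illustrative.

    N <- 10; x <- 8; p0 <- 0.5                  # illustrative data and null value
    p.hat <- x / N                              # maximum likelihood estimate of p
    loglik <- function(p) dbinom(x, N, p, log = TRUE)
    u.LR <- -2 * (loglik(p0) - loglik(p.hat))   # -2 ln lambda(x)
    # compare with the asymptotic chi-squared distribution (1 df) under H0
    c(u.LR = u.LR, p.value = pchisq(u.LR, df = 1, lower.tail = FALSE))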
Three test statistics – geometrical interpretation

1. U_LR measures the properly standardised difference between the log-likelihoods at θ̂ and θ0 (i.e. in the direction of the y axis),
2. U_W measures the properly standardised absolute difference between θ̂ and θ0 (in the direction of the x axis),
3. U_S measures the properly standardised slope of the (relative) log-likelihood at θ0.

Example (normal distribution)
Let X ∼ N(μ, σ²), where σ² is known, and test H0: μ = μ0 against H1: μ ≠ μ0, where θ0 = (μ0, σ²)ᵀ. Then
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) = [−Σᵢ(Xᵢ − X̄)² + Σᵢ(Xᵢ − μ0)²]/σ² = n(X̄ − μ0)²/σ²,
2. U_W = (X̄ − μ0)² I(μ̂) = n(X̄ − μ0)²/σ², since I(μ̂) = n/σ²,
3. U_S = (S(μ0))²/I(μ0) = (n(X̄ − μ0)/σ²)²/(n/σ²) = n(X̄ − μ0)²/σ².
All three test statistics are equal, i.e. U_LR = U_W = U_S.

Three test statistics – tests about one parameter

Let θ be a scalar. The null hypothesis is H0: θ = θ0 against the alternative H1: θ ≠ θ0, where θ0 is the scalar from H0. Let θ̂ be the maximum likelihood estimate of θ and Var[θ̂] the variance of θ̂. Writing ∼D for the asymptotic distribution under H0, the three test statistics are:
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) ∼D χ²₁,
2. U_W = (θ̂ − θ0)² I(θ̂) ∼D χ²₁, and equivalently U_W^(1/2) = Z_W ∼D N(0, 1),
3. U_S = (S(θ0))²/I(θ0) ∼D χ²₁, and equivalently U_S^(1/2) = Z_S ∼D N(0, 1).

Three test statistics – tests of all parameters

Let θ be the vector of all parameters, of length k. The null hypothesis is H0: θ = θ0 against H1: θ ≠ θ0, where θ0 is the parameter vector from H0. Let θ̂ be the maximum likelihood estimate of θ and Var[θ̂] the covariance matrix. Then:
1. U_LR = −2(l(θ0|X) − l(θ̂|X)) ∼D χ²_k,
2. U_W = (θ̂ − θ0)ᵀ I(θ̂) (θ̂ − θ0) ∼D χ²_k,
3. U_S = (S(θ0))ᵀ (I(θ0))⁻¹ S(θ0) ∼D χ²_k.

Three test statistics – tests of a subset of parameters

Let θ = (θ1ᵀ, θ2ᵀ)ᵀ be the vector of all parameters, of length k, where θ1 and θ2 are subsets of parameters of lengths k1 and k2, with k1 + k2 = k. The null hypothesis is H0: θ1 = θ0 against H1: θ1 ≠ θ0, where θ0 is the parameter vector from H0. Let θ̂ be the maximum likelihood estimate of θ, and θ̂_{2|0} the maximum likelihood estimate of θ2 when H0 is true, i.e. when θ1 = θ0; then θ̂0 = (θ0ᵀ, θ̂ᵀ_{2|0})ᵀ. Let Var₁₁[θ̂] be the submatrix of the covariance matrix Var[θ̂] corresponding to θ1. Then:
1. U_LR = −2(l(θ̂0|X) − l(θ̂|X)) ∼D χ²_{k1},
2. U_W = (θ̂1 − θ0)ᵀ I₁₁(θ̂) (θ̂1 − θ0) ∼D χ²_{k1},
3. U_S = (S(θ̂0))ᵀ (I₁₁(θ̂0))⁻¹ S(θ̂0) ∼D χ²_{k1}.

There is a relationship between the likelihood ratio test statistic for a subset of parameters and the profile likelihood function:

L_P(θ1|x) = max_{θ2} L(θ|x) = L((θ1, θ̂_{2|0})ᵀ|x),

or, in terms of the logarithm of the profile likelihood, l_P(θ1|x) = l((θ1, θ̂_{2|0})ᵀ|x). The likelihood ratio test statistic is then

u_LR = −2 ln [L_P(θ1|x)/L_P(θ̂1|x)] = −2 (l_P(θ1|x) − l_P(θ̂1|x)),

where θ̂1 is the maximum likelihood estimate of θ1 with respect to L_P(θ1|x). U_LR is also called the generalised likelihood ratio statistic.
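A minimal numerical check of the normal-distribution example above in R, with σ = 1, μ0 = 0 and an illustrative simulated sample; all three statistics coincide exactly, as derived.

    set.seed(6)
    sigma <- 1; mu0 <- 0
    x <- rnorm(50, mean = 0.3, sd = sigma); n <- length(x)
    u.LR <- (sum((x - mu0)^2) - sum((x - mean(x))^2)) / sigma^2
    u.W  <- n * (mean(x) - mu0)^2 / sigma^2   # Wald, with I(mu.hat) = n/sigma^2
    S0   <- n * (mean(x) - mu0) / sigma^2     # score function at mu0
    u.S  <- S0^2 / (n / sigma^2)              # score statistic
    c(u.LR = u.LR, u.W = u.W, u.S = u.S)      # identical values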
Three test statistics – tests of a subset of parameters (continued)

Additionally,

L_P(θ̂1|x) = max_{θ1} max_{θ2} L(θ|x) = max_{θ1,θ2} L((θ1, θ2)ᵀ|x).

For H0: θ1 = θ0 and H1: θ1 ≠ θ0,

L_P(θ0|x) = max_{θ2} L((θ0, θ2)ᵀ|x) = max_{H0} L((θ1, θ2)ᵀ|x)

and

u_LR = −2 ln [max_{H0} L((θ1, θ2)ᵀ|x) / max_{θ1,θ2} L((θ1, θ2)ᵀ|x)] = −2 ln [L_P(θ0|x)/L_P(θ̂1|x)].

The quadratic approximation of the relative profile log-likelihood is

ln [L_P(θ1|x)/L_P(θ̂1|x)] ≈ −(1/2) (θ1 − θ̂1)ᵀ (I¹¹(θ̂))⁻¹ (θ1 − θ̂1),

and the quadratic approximation of the generalised likelihood ratio statistic −2 ln [L_P(θ1|x)/L_P(θ̂1|x)] is

u_LR ≈ u_W = (θ̂1 − θ0)ᵀ (I¹¹(θ̂))⁻¹ (θ̂1 − θ0),

where I¹¹(θ̂) denotes the submatrix of the inverse information matrix corresponding to θ1. The marginal distribution of θ̂1 when H0 is true is θ̂1 ∼ N_{k1}(θ0, I¹¹(θ̂)).

Three test statistics and related confidence intervals

If θ is a scalar, three confidence intervals are defined as follows:
1. the empirical likelihood ratio 100(1 − α)% CI for θ is CS_{1−α} = {θ : U_LR(θ) < χ²₁(α)}, where U_LR(θ) = −2 ln [L(θ|x)/L(θ̂|x)] and χ²₁(α) is the critical value of the χ²₁ distribution,
2. the empirical Wald 100(1 − α)% CI for θ is defined on the basis of the pivot (pivotal statistic) T_piv = U_W(θ),
3. the empirical score 100(1 − α)% CI for θ is defined on the basis of the pivot T_piv = U_S(θ).
If θ is a vector, the CIs generalise to a confidence set CS_{1−α}: if k = 2, CS_{1−α} is a confidence ellipse; if k > 2, a confidence ellipsoid; and if k = 1, CS_{1−α} reduces to a confidence interval.

Confidence intervals

The empirical Wald 100(1 − α)% CI for θ is

(l, u) = (θ̂_L, θ̂_U) = (θ̂ − t_{α/2} SD[θ̂], θ̂ + t_{α/2} SD[θ̂]),

where the critical value t_{α/2} depends on the choice of θ.
The empirical likelihood ratio 100(1 − α)% CI for θ is defined by its lower and upper bounds as cut-offs of the standardised relative likelihood:

Pr(L(θ|x)/L(θ̂|x) > c_α) = Pr(−2 ln [L(θ|x)/L(θ̂|x)] < −2 ln c_α) = 1 − α,

where c_α = e^(−χ²₁(α)/2). Then:
- if 1 − α = 0.95, then c_α = 0.1465001 ≐ 0.15 (15% cut-off),
- if 1 − α = 0.90, then c_α = 0.2585227 ≐ 0.26 (26% cut-off),
- if 1 − α = 0.99, then c_α = 0.0362452 ≐ 0.04 (4% cut-off).

Likelihood confidence intervals – bisection method

Let θ01, θ02 ∈ ⟨θ̂_L, θ̂_U⟩ with f(θ01) f(θ02) < 0, where f(·) is continuous with at least one root within the interval ⟨θ01, θ02⟩ and

f(θ) = −2 ln [L(θ|x)/L(θ̂|x)] − χ²₁(α) = 0.

If the first derivative of f(·) has constant sign, then exactly one root θ* ∈ ⟨θ01, θ02⟩ of f(θ) = 0 exists. The iterative process is as follows (a sketch in R follows the steps):
1. initialisation step – starting point θ^(0) = (θ01 + θ02)/2 and i = 1;
2. updating equations – the boundaries θ_{i1} and θ_{i2} are replaced according to (θ_{i1}, θ_{i2}) = (θ_{i−1,1}, θ^(i−1)) if f(θ_{i−1,1}) f(θ^(i−1)) < 0, and (θ_{i1}, θ_{i2}) = (θ^(i−1), θ_{i−1,2}) if f(θ_{i−1,1}) f(θ^(i−1)) > 0; if f(θ^(i−1)) = 0, the process ends, otherwise
3. calculate the mid-point θ^(i) = (θ_{i1} + θ_{i2})/2;
4. stopping rule (with a sufficiently small threshold ε) – based on the relative convergence criterion |θ^(i) − θ^(i−1)|/|θ^(i−1)| < ε, the absolute convergence criterion |θ^(i) − θ^(i−1)| < ε, or often also |f(θ^(i))| < ε.
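A minimal sketch of the bisection iteration in R, applied to f(θ) for the likelihood CI of a binomial proportion (the data N = 10, x = 8 anticipate the example below); the bracketing endpoints and the threshold eps are illustrative choices.

    N <- 10; x <- 8; alpha <- 0.05; p.hat <- x / N
    # f(p) = -2 ln( L(p|x) / L(p.hat|x) ) - chi-squared(1) critical value
    f <- function(p)
      -2 * (dbinom(x, N, p, log = TRUE) - dbinom(x, N, p.hat, log = TRUE)) -
        qchisq(1 - alpha, df = 1)
    bisect <- function(lo, hi, eps = 1e-8) {
      repeat {                                  # assumes f(lo) * f(hi) < 0
        mid <- (lo + hi) / 2
        if (abs(f(mid)) < eps || hi - lo < eps) return(mid)
        if (f(lo) * f(mid) < 0) hi <- mid else lo <- mid
      }
    }
    c(lower = bisect(0.001, p.hat), upper = bisect(p.hat, 0.999))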
Likelihood confidence intervals – other numerical methods

Modifications are based on bracketing methods, i.e. bounding the root within a sequence of intervals. The Brent method (Brent–Dekker method) is a combination of the bisection method with inverse interpolation. If the interpolation is linear, the result is the secant method, whose updating equation is

θ^(i) = θ^(i−1) − [(θ^(i−1) − θ^(i−2)) / (f(θ^(i−1)) − f(θ^(i−2)))] f(θ^(i−1)) if f(θ^(i−1)) ≠ f(θ^(i−2)), and θ^(i) = (θ_{i1} + θ_{i2})/2 otherwise,

where the first derivative is approximated by f′(θ^(i−1)) ≈ [f(θ^(i−1)) − f(θ^(i−2))] / [θ^(i−1) − θ^(i−2)]. If f(θ) is twice differentiable and f′(θ) ≠ 0 for all θ ∈ ⟨θ̂_L, θ̂_U⟩, then f(θ) has a single root.

Geometrical interpretation: θ^(i) is the point where the secant through the points [θ^(i−1), f(θ^(i−1))] and [θ^(i−2), f(θ^(i−2))] crosses the x axis.

In R: uniroot(f, interval, tol, ...). When searching for the lower and upper boundary of the 100(1 − α)% CI for θ, the R function uniroot() should be used twice, as follows:
1. for the lower bound – the starting interval is ⟨θ_L, θ̂⟩,
2. for the upper bound – the starting interval is ⟨θ̂, θ_U⟩.
The solutions (roots) are then θ̂_L and θ̂_U.

Likelihood confidence intervals – Brent–Dekker method

Example (Brent–Dekker method)
Let X ∼ Bin(N, p), where N = 10 and n = x = 8. Estimate the boundaries of the empirical 100(1 − α)% CI for (1) p and (2) the log odds ln[p/(1 − p)]. The empirical CIs are of two types: (A) likelihood and (B) Wald. Draw the log-likelihood function and its quadratic approximation with the lower and upper boundaries of the CIs.

Solution (partial)
Empirical Wald 100(1 − α)% CI for p: p̂ = 8/10 = 0.8; SD[p̂] = √(p̂(1 − p̂)/N) = 0.13. Then

(l, u) = (p̂_L, p̂_U) = (p̂ − u_{α/2} SD[p̂], p̂ + u_{α/2} SD[p̂]) = (0.55, 1.05).

Empirical likelihood 100(1 − α)% CI for p:

CS_{1−α} = {p : −2 ln [L(p|x)/L(p̂|x)] ≤ 3.84}, which gives (l, u) = (p̂_L, p̂_U) = (0.50, 0.96).

Empirical Wald 100(1 − α)% CI for g(p) = ln[p/(1 − p)]:

g(p̂) = ln(0.8/0.2) = 1.39; (∂/∂p) g(p) = 1/p + 1/(1 − p);
SD[g(p̂)] = SD[p̂] (1/p̂ + 1/(1 − p̂)) = √(p̂(1 − p̂)/N) (1/p̂ + 1/(1 − p̂)) = √(1/n + 1/(N − n)) = 0.79.

Then (l_g, u_g) = (g(p̂) − u_{α/2} SD[g(p̂)], g(p̂) + u_{α/2} SD[g(p̂)]) = (−0.16, 2.94), and back-transformed (l, u) = (p̂_L, p̂_U) = (0.46, 0.95).

In R:

    x <- 8; N <- 10
    probs <- seq(0.4, 0.99, length = 1000)          # grid of p values
    like <- dbinom(x, N, probs)                     # binomial likelihood
    rellike <- like / max(like)                     # relative likelihood
    relloglike <- -2 * log(rellike)                 # -2 ln relative likelihood
    cutoff <- exp(-1/2 * qchisq(0.95, df = 1))      # 0.1465001
    likeCI.p <- range(probs[rellike > cutoff])      # 0.5009910 0.9634234
    cutoff <- qchisq(0.95, df = 1)                  # 3.841459
    likeCI.p <- range(probs[relloglike < cutoff])   # the same interval
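A minimal sketch of the uniroot() recipe described above, applied to the same binomial data, searching once below and once above p̂; the outer endpoints 0.001 and 0.999 are illustrative stand-ins for θ_L and θ_U.

    x <- 8; N <- 10; p.hat <- x / N
    # f(p) = -2 ln relative likelihood minus the chi-squared(1) critical value
    f <- function(p)
      -2 * (dbinom(x, N, p, log = TRUE) - dbinom(x, N, p.hat, log = TRUE)) -
        qchisq(0.95, df = 1)
    lower <- uniroot(f, interval = c(0.001, p.hat), tol = 1e-8)$root
    upper <- uniroot(f, interval = c(p.hat, 0.999), tol = 1e-8)$root
    c(lower = lower, upper = upper)   # approx. 0.50 and 0.96, as in the solution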