Statistical inference I and II
Stanislav Katina (ÚMS, MU)
17.04.2024

Book: Aplikovaná štatistická inferencia I
Figure 1: Book Aplikovaná štatistická inferencia I

Table of contents
1 Overview of testing of statistical hypotheses
2 Three test statistics
3 Confidence intervals
4 Overview of the tests
5 Generalised hypotheses
6 One-sample tests about μ
7 Paired tests about μ
8 One-sample tests about σ²
9 One-sample tests about skewness and kurtosis
10 One-sample tests about correlation coefficient
11 One-sample tests about probability
12 One-sample tests about rate
13 Two-sample tests about difference of two means

10.14 One-sample tests about correlation coefficient

"PowerRho11" <- function(r, r0, n, alpha) {
    z <- atanh(r)
    z0 <- atanh(r0)
    sila <- 1 - pnorm(qnorm(1 - alpha/2) - sqrt(n - 3) * abs(z0 - z))
    return(sila)
}
"PowerRho12" <- function(r, r0, n, alpha) {
    z <- atanh(r)
    z0 <- atanh(r0)
    sila <- 1 - pnorm(qnorm(1 - alpha) + sqrt(n - 3) * (z0 - z))
    return(sila)
}
"PowerRho13" <- function(r, r0, n, alpha) {
    z <- atanh(r)
    z0 <- atanh(r0)
    sila <- 1 - pnorm(qnorm(1 - alpha) - sqrt(n - 3) * (z0 - z))
    return(sila)
}

10.15 One-sample tests about correlation coefficient

[Figure: power functions plotted against ρ − ρ0 (curves for n = 10, 20, 30, 40, 50; ρ0 = 0, α = 0.05) for H11 : ρ − ρ0 ≠ 0 (left), H12 : ρ − ρ0 > 0 (middle) and H13 : ρ − ρ0 < 0 (right).]
Power functions of the one-sample Z-statistic ZW for ρ under H11 (left), H12 (middle) and H13 (right). If ρ0 ≠ 0, the power function under H11 is not symmetric around ρ0.

10.16 One-sample tests about correlation coefficient

For H11, using the Fisher Z-variable, the minimal sample size for a difference ξ − ξ0, α and β is given by

    n ≥ 3 + ((u_{α/2} + u_β)/(ξ − ξ0))².

For H12 and H13, u_{α/2} is changed to u_α.

Example (minimal sample size n)
Calculate the minimal sample size n for ρ = 0.1, 0.2, …, 0.9, ρ0 = 0, with α = 0.05, 1 − β = 0.8 and H11.

Table 1: Minimal sample size n for a difference of ρ and ρ0, where ρ0 = 0
ρ   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
n   783   194    85    47    30    20    14    10     7

10.17 One-sample tests about correlation coefficient

"NRho11a" <- function(r, r0, alpha = 0.05, sila = 0.8) {
    z <- atanh(r); z0 <- atanh(r0)
    ualpha <- qnorm(1 - alpha/2)
    ubeta <- qnorm(sila)
    n <- ceiling(3 + ((ualpha + ubeta)/(z - z0))^2)
    names(n) <- paste("r=", r, sep = "")
    return(n)
}
n80 <- NRho11a(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.8)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##   783   194    85    47    30    20    14    10     7
n85 <- NRho11a(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.85)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##   895   222    97    54    33    22    15    11     8
n90 <- NRho11a(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.9)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##  1047   259   113    62    38    25    17    12     8
# for r = r0 = 0.5 the difference z - z0 is zero, hence n = Inf
n80 <- NRho11a(seq(0.1, 0.9, by = 0.1), 0.5, 0.05, 0.8)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##    42    69   140   501   Inf   383    81    30    13
10.18 One-sample tests about correlation coefficient

Example (minimal sample size n)
Calculate the minimal sample size n for ρ = 0.1, 0.2, …, 0.9 and ρ − ρ0 = 0.1, with α = 0.05, 1 − β = 0.8 and H11.

Table 2: Minimal sample size n for selected differences of ρ and ρ0, together with ZR − ξ0, as a function of ρ and ρ0
ρ        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
ρ0       0       0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8
ZR − ξ0  0.1003  0.1024  0.1068  0.1141  0.1257  0.1438  0.1742  0.2313  0.3736
n        783     752     692     606     501     383     262     150     60

10.19 One-sample tests about correlation coefficient

"NRho11b" <- function(r, r0, alpha = 0.05, sila = 0.8) {
    z <- atanh(r); z0 <- atanh(r0)
    ualpha <- qnorm(1 - alpha/2)
    ubeta <- qnorm(sila)
    n <- ceiling(3 + ((ualpha + ubeta)/(z - z0))^2)
    return(n)
}
n80 <- c(
    NRho11b(0.1, 0, 0.05, 0.8),
    NRho11b(0.2, 0.1, 0.05, 0.8),
    NRho11b(0.3, 0.2, 0.05, 0.8),
    NRho11b(0.4, 0.3, 0.05, 0.8),
    NRho11b(0.5, 0.4, 0.05, 0.8),
    NRho11b(0.6, 0.5, 0.05, 0.8),
    NRho11b(0.7, 0.6, 0.05, 0.8),
    NRho11b(0.8, 0.7, 0.05, 0.8),
    NRho11b(0.9, 0.8, 0.05, 0.8)
)
## 783 752 692 606 501 383 262 150 60

10.20 One-sample tests about correlation coefficient

"FisherZ11" <- function(r, r0) {
    z <- atanh(r)
    z0 <- atanh(r0)
    z - z0
}

FisherZ <- FisherZ11(seq(0.1, 0.9, by = 0.1), seq(0, 0.8, by = 0.1))
## 0.1003 0.1024 0.1068 0.1141 0.1257 0.1438 0.1742 0.2313 0.3736

10.21 One-sample tests about correlation coefficient

# one-sided alternative
"NRho12" <- function(r, r0, alpha = 0.05, sila = 0.8) {
    z <- atanh(r)
    z0 <- atanh(r0)
    ualpha <- qnorm(1 - alpha)
    ubeta <- qnorm(sila)
    n <- ceiling(3 + ((ualpha + ubeta)/(z - z0))^2)
    names(n) <- paste("r=", r, sep = "")
    return(n)
}
n80 <- NRho12(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.8)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##   618   154    68    38    24    16    12     9     6
n85 <- NRho12(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.85)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##   718   178    79    44    27    18    13     9     7
n90 <- NRho12(seq(0.1, 0.9, by = 0.1), 0, 0.05, 0.9)
## r=0.1 r=0.2 r=0.3 r=0.4 r=0.5 r=0.6 r=0.7 r=0.8 r=0.9
##   854   212    93    51    32    21    15    11     7

10.22 One-sample tests about correlation coefficient

Example (convergence of ρ̂ and ξ̂ to the normal distribution)
In R, generate pseudorandom numbers from N2(μ, Σ), where μ1 = 0, μ2 = 0, σ1 = 1, σ2 = 1, n = 5, 10, 20, 50 and 100, M = 10000. Use (a) ρ = 0, (b) ρ = 0.50 and (c) ρ = 0.9. For each m = 1, 2, …, M, calculate the Pearson correlation coefficient r_m and the Fisher Z-variable z_{R,m}. Visualise histograms of the simulated r_m and z_{R,m} and superimpose them with the equivalent theoretical normal densities.

10.23 One-sample tests about correlation coefficient

Example (one-sample tests for testing hypotheses about ρ)
We have the data one-sample-correlation-skull-mf.txt with variable X, the biggest cranium height skull.pH, and variable Y, face height face.H, in millimeters (mm), of an ancient Egyptian male population, and we expect a bivariate normal distribution of these variables, i.e., N2(μ, Σ). A.
Test the null hypothesis (against the general alternative) that the correlation coefficient of the biggest cranium height and face height is equal to ρ0 = 0.251 at significance level α = 0.05. B. Calculate the 100 × (1 − α)% empirical confidence interval for the correlation coefficient, where the coverage probability (confidence coefficient) is equal to 1 − α = 0.95.
Notes: Use (1) the Fisher test statistic ZW, (2) the likelihood ratio test statistic ULR, and the equivalent confidence intervals. Can you find any equivalent of the Fisher Z-test in R?

10.24 One-sample tests about correlation coefficient

Figure 6: The biggest cranium height and face height in millimeters (mm)

10.25 One-sample tests about correlation coefficient

Mathematical formulation:
Null hypothesis – H0 : ρ = ρ0
Alternative hypothesis – H1 : ρ ≠ ρ0, where ρ0 = 0.251.
Verbal formulation:
Null hypothesis – the correlation coefficient of the biggest cranium height and face height in the ancient Egyptian male population is equal to ρ0 = 0.251.
Alternative hypothesis – the correlation coefficient of the biggest cranium height and face height in the ancient Egyptian male population is not equal to ρ0 = 0.251.

10.26 One-sample tests about correlation coefficient

The elements of the testing procedure:
Realisation of the test statistic – the correlation coefficient ρ is not known and therefore has to be estimated. The estimate of the correlation coefficient is r = ρ̂ = 0.331 and the (nearly unbiased) estimate is r_n = ρ̂_n = 0.329 (in this example, the difference between the two estimates is very small).
Then the realisation of the Wald test statistic is zW = √(n − 3) (zR − ξ0) ≐ 1.105, and the realisation of the likelihood ratio test statistic is uLR ≐ 18.987 (if H0 : ρ = 0) or uLR ≐ 1.242 (if H0 : ρ = ρ0).
Rejection region – Wald test: critical values u(1 − α/2) ≐ −1.960 and u(α/2) ≐ 1.960, rejection region W1 = (z_min, u(1 − α/2)) ∪ (u(α/2), z_max) ≐ (−∞, −1.960) ∪ (1.960, ∞); likelihood ratio test: critical value χ²₁(α) ≐ 3.841, rejection region W = (χ²₁(α), t_max) ≐ (3.841, ∞).

10.27 One-sample tests about correlation coefficient

The elements of the testing procedure (cont.):
95% empirical confidence interval for the correlation coefficient –
Wald 95% empirical confidence interval for ρ: (ρ̂_L, ρ̂_U) ≐ (0.187, 0.461);
likelihood 95% empirical confidence interval for ρ: CS_{0.95} = {ρ0 : uLR(ρ0) < χ²₁(α)} ≐ (0.188, 0.460).
p-values – Wald test p-value ≐ 2 Pr(ZW ≥ 1.105 | H0) ≐ 0.269; likelihood ratio test p-value ≐ Pr(ULR ≥ 1.242 | H0) ≐ 0.265.
Statistical conclusion – H0 is not rejected at significance level α = 0.05, because (1) the test statistic does not belong to the critical region, (2) ρ0 = 0.251 belongs to the confidence interval, and (3) the p-value is greater than 0.05.
Verbal conclusion – we do not reject the null hypothesis that the correlation coefficient ρ of the biggest cranium height and face height in the ancient Egyptian male population is equal to ρ0 = 0.251.
10.28 One-sample tests about correlation coefficient

wd <- "~/Documents/TEACHING/LECTURES/M8986-SI-II/DATA/"
fname <- paste(wd, "one-sample-correlation-skull-mf.txt", sep = "")
DATA <- read.table(fname, header = TRUE)
DATA <- na.omit(DATA)
skull.pH <- DATA$skull.pH[DATA$sex == "m"]
face.H <- DATA$face.H[DATA$sex == "m"]
n <- length(skull.pH) ## 164
rho_hat <- cor(skull.pH, face.H) ## 0.3306431

zR <- 1 / 2 * log((1 + rho_hat) / (1 - rho_hat)) ## 0.3435501
rho0 <- 0.251
xi0 <- 1 / 2 * log((1 + rho0) / (1 - rho0)) ## 0.2564798
zW_obs <- sqrt(n - 3) * (zR - xi0) ## 1.104799
tW_obs <- sqrt(n - 2) * rho_hat / sqrt(1 - rho_hat^2) ## 4.459204
uLR_obs <- -n * log(1 - tW_obs^2 / (tW_obs^2 + (n - 2))) ## 18.98718
# neither tW_obs nor this uLR_obs can be used here, because rho0 is not equal to 0
uLR_obs <- -n * (log((1 - rho0^2) * (1 - rho_hat^2) /
    (1 - rho0 * rho_hat)^2)) ## 1.241757

10.29 One-sample tests about correlation coefficient

z_critval_d <- qnorm(0.025) ## -1.959964
z_critval_h <- qnorm(0.975) ## 1.959964
lr_critval <- qchisq(0.95, df = 1) ## 3.841459
"CorrelationCI" <- function(x, y, conf.level = 0.95) {
    z <- atanh(cor(x, y))
    n <- length(x)
    a <- qnorm(1 - (1 - conf.level) / 2) / (n - 3)^0.5
    CI_xi <- c(z - a, z + a)
    CI <- tanh(CI_xi)
    return(CI)
}
CIz <- CorrelationCI(skull.pH, face.H) ## 0.1868617 0.4605561
rho0i <- seq(-0.999, 0.999, by = 0.0000001)
uLR_i <- -n * (log((1 - rho0i^2) * (1 - rho_hat^2) /
    (1 - rho0i * rho_hat)^2))
CI_LR <- range(rho0i[which(uLR_i < lr_critval)])
## 0.1879428 0.4596729
p_val_zW <- 2 * (1 - pnorm(abs(zW_obs))) ## 0.2692467
p_val_LR <- 1 - pchisq(uLR_obs, df = 1) ## 0.2651328
10.30 One-sample tests about correlation coefficient

One-sample t-test about the correlation ρ in R (function cor.test()).
Arguments (inputs):
1 data vectors x and y – x, y
2 alternative – default alternative = "two.sided", other choices "greater", "less"
3 method – default method = "pearson"
4 coverage probability 1 − α – conf.level, default 0.95
Outputs:
1 test statistic tW – statistic; it tests only the null hypothesis ρ = 0, so a confidence interval for a general ρ0 cannot be created from it
2 degrees of freedom (df) – parameter
3 p-value – p.value
4 alternative hypothesis – alternative
5 empirical confidence interval for ρ – conf.int (calculated from the Fisher Z-variable, and therefore not compatible with the t-statistic)
6 point estimate (correlation coefficient) – estimate
7 name of the test – method

11 One-sample tests about probability

11.1 One-sample tests about probability

Tests of probabilities (proportions) are used in anthropology to evaluate qualitative traits in living people and on human skeletons. Qualitative traits are traits that are not evaluated by measurement; instead, their presence or absence is assessed. Populations then differ from each other in the proportion of cases with and without the occurrence of the trait. Qualitative traits include, e.g., blood groups with one antigen, in which case it is the frequency of individuals with the presence or absence of the given antigen.
Also the number of individuals of each sex in a population (sex ratio), side asymmetry (greater value on the right or on the left) and side preference (right- or left-handedness) are traditionally described qualitatively. Other examples of qualitative traits are epigenetic signs on the skull (e.g., metopism – persistence of the sutura interfrontalis in adulthood), where the individual is usually classified into one of two categories (sign present or absent), and dermatoglyphic signs, where individuals are likewise classified into two categories (e.g., presence or absence of a digital triradius c). Yet another example of a qualitative trait is taste sensitivity to phenylthiocarbamide, where individuals are divided into two groups according to whether they perceive the bitter taste of substances with this component or not.

11.2 One-sample tests about probability

A large number of qualitative characteristics are distinguished in anthropology, and the results of their assessment are easily published in text or tables, which enables extensive comparisons with the published results of various studies (in contrast to continuous characteristics, where primary data/individual values are currently mostly not published, so comparisons have to rely on secondary data). In this way, we can very easily test whether a proportion found in our research (e.g., the ratio of individuals perceiving and not perceiving the taste of phenylthiourea in the Czech population) differs from the proportion found in a published comparative study.

11.3 One-sample tests about probability

Example (tests about probability)
Let X ∼ Bin(N, p). Test H0 : p = p0 against H1 : p ≠ p0, where θ0 = (p0, 1 − p0)ᵀ, with θ1 = p0 and θ̂2|0 = 1 − p0. Further, A = a = (1, 0)ᵀ and a0 = a0 = p0. Then
1 UW = (aᵀθ̂ − aᵀθ0)ᵀ (aᵀ I(θ̂) a) (aᵀθ̂ − aᵀθ0), so that uW = N(p̂ − p0)² / (p̂(1 − p̂)),
2 ULR = −2(ℓ(θ0|X) − ℓ(θ̂|X)), so that uLR = 2 Σ_{i=1}^{2} n_i ln(p̂_i/p_{0i}) = 2 Σ_i observed_i × ln(observed_i/expected_i),
3 US = (aᵀ S(θ0))ᵀ (aᵀ (I(θ0))⁻¹ a) aᵀ S(θ0), so that uS = N(p̂ − p0)² / (p0(1 − p0)).
Note: Z²W = UW and its realisation is z²W = uW.

    uW = (p̂ − p0)² I(p)|_{p = p̂, x/N = p̂} = (p̂ − p0)² N/(p(1 − p))|_{p = p̂} = (p̂ − p0)² / (p̂(1 − p̂)/N),

    uS = [∂/∂p ℓ(p|x)]² / I(p)|_{p = p0, x/N = p̂} = (x/p − (N − x)/(1 − p))² / (N/(p(1 − p)))|_{p = p0} = (p̂ − p0)² / (p0(1 − p0)/N).

11.4 One-sample tests about probability

Notes (for the example above): H01 vs H11. Let X ∼ Bin(N, p), where θ = p. Then the log-likelihood function is ℓ(p|x) = x ln p + (N − x) ln(1 − p). If H01 is true, θ0 = p0. If H01 is not true, the MLE of θ is p̂ = x/N.

    uLR = −2(ℓ(p0|x) − ℓ(p̂|x))
        = −2(x ln p0 + (N − x) ln(1 − p0) − x ln p̂ − (N − x) ln(1 − p̂))
        = 2(x ln(p̂/p0) + (N − x) ln((1 − p̂)/(1 − p0)))
        = 2(x ln(x/(N p0)) + (N − x) ln((N − x)/(N − N p0))).

Reparametrisation (notation change). Change p to p1 and q = 1 − p to p2, x to n1 and N − x to n2. Then θ = p = (p1, p2)ᵀ and θ̂ = p̂ = (p̂1, p̂2)ᵀ, where p̂1 = n1/N, p̂2 = n2/N, p1 + p2 = 1 and n1 + n2 = N; θ0 = p0 = (p01, p02)ᵀ, where p01 = p0 and p02 = 1 − p0. Then

    uLR = 2 Σ_{i=1}^{2} n_i ln(p̂_i/p_{0i}) = 2 Σ_i observed_i × ln(observed_i/expected_i).

11.5 One-sample tests about probability

This reparametrisation is important not only to obtain a simpler form of the likelihood ratio test statistic, but also for generalising this statistic to more dimensions or to two-sample and multi-sample cases.
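For a concrete binomial count, the three statistics above reduce to one-line computations. A minimal sketch in R (the values x = 12, N = 30 and p0 = 0.5 are illustrative, not taken from the lecture):

```r
# Wald, likelihood ratio and score statistics for H0: p = p0
# (illustrative values, not from the lecture)
x <- 12; N <- 30; p0 <- 0.5
p_hat <- x / N

uW  <- N * (p_hat - p0)^2 / (p_hat * (1 - p_hat))      # Wald: information at p_hat
uLR <- 2 * (x * log(p_hat / p0) +
            (N - x) * log((1 - p_hat) / (1 - p0)))     # likelihood ratio
uS  <- N * (p_hat - p0)^2 / (p0 * (1 - p0))            # score: information at p0

# all three are compared with the chi-squared quantile with 1 df
p_values <- 1 - pchisq(c(uW, uLR, uS), df = 1)
```

Under H0 all three statistics are asymptotically χ²₁, so for moderate N their realisations (here approximately 1.25, 1.21 and 1.20) and the corresponding p-values are close to each other.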
Wald 100 × (1 − α)% empirical confidence interval for p is defined as

    CS^(W)_{1−α} = {p0 : UW(p0) < χ²₁(α)}.

Likelihood 100 × (1 − α)% empirical confidence interval for p is defined as

    CS^(LR)_{1−α} = {p0 : ULR(p0) < χ²₁(α)}.

Score 100 × (1 − α)% empirical confidence interval for p is defined as

    CS^(S)_{1−α} = {p0 : US(p0) < χ²₁(α)}.

Example (score confidence interval)
Derive the score confidence interval.
Notes (for the example above): The score confidence interval defined above is equivalent to CS^(S)_{1−α} = {p0 : |ZS| < u_{α/2}}. This can be rewritten as the following quadratic inequality in p0:

11.6 One-sample tests about probability

    CS^(S)_{1−α} = {p0 : (1 + u²_{α/2}/N) p0² − (2p̂ + u²_{α/2}/N) p0 + p̂² ≤ 0}.

Based on Wilson (1927), the roots are

    (2p̂ + u²_{α/2}/N ± √((2p̂ + u²_{α/2}/N)² − 4p̂²(1 + u²_{α/2}/N))) / (2(1 + u²_{α/2}/N)).

Finally, we get

    p̂ (N/(N + u²_{α/2})) + (1/2)(u²_{α/2}/(N + u²_{α/2}))
    ± u_{α/2} √( (1/(N + u²_{α/2})) [ p̂(1 − p̂)(N/(N + u²_{α/2})) + (1/4)(u²_{α/2}/(N + u²_{α/2})) ] ).

11.7 One-sample tests about probability

Note: The mid-point of this interval is a weighted average of two elements and lies between p̂ and 0.5, with the weight of p̂ going asymptotically to one. The mid-point shifts p̂ towards 0.5, and with increasing N this shift gets smaller. The variance is a weighted average of the variances at p̂ and at 0.5, using sample size N + u²_{α/2} instead of N (Agresti and Coull 1998).

Minimal sample size: If c = (p − p0)/σ, then the minimal sample size N is defined as

    N ≥ ((u_{α/2} + u_β)/c)² = ((u_{α/2} + u_β)/(p − p0))² p(1 − p).

For H12 and H13, u_{α/2} has to be changed to u_α.
Minimal sample size for the normal approximation: Hald's rule N p(1 − p) > 9 (see table).
Table 3: Minimal sample sizes N for different p using Hald's rule
p      0.01  0.02  0.05  0.10  0.15  0.20  0.30  0.40  0.50
1 − p  0.99  0.98  0.95  0.90  0.85  0.80  0.70  0.60  0.50
N      910   460   190   100    71    57    43    38    36

11.8 One-sample tests about probability

Critical regions and power functions related to ZW:
H0 : p = p0 vs H1 : p ≠ p0, W1 = {ZW ; |ZW| ≥ u_{α/2}}, 1 − β(p) = 1 − Φ(u_{α/2} − |p0 − p| / √(p(1 − p)/N));
H0 : p ≤ p0 vs H1 : p > p0, W2 = {ZW ; ZW ≥ u_α}, 1 − β(p) = 1 − Φ(u_α + (p0 − p) / √(p(1 − p)/N));
H0 : p ≥ p0 vs H1 : p < p0, W3 = {ZW ; ZW ≤ −u_α}, 1 − β(p) = 1 − Φ(u_α − (p0 − p) / √(p(1 − p)/N)).

P-values related to ZW:
p-value = 2 Pr(ZW ≥ |zW| | H01), if H11 : p ≠ p0;
p-value = Pr(ZW ≥ zW | H02), if H12 : p > p0;
p-value = Pr(ZW ≤ zW | H03), if H13 : p < p0.

11.9 One-sample tests about probability

Wald 100 × (1 − α)% empirical CI for p:
for H1 : p ≠ p0, CS_{1−α} = {p0 : p0 ∈ (p̂ − u_{α/2} √(p̂(1 − p̂)/N), p̂ + u_{α/2} √(p̂(1 − p̂)/N))};
for H1 : p > p0, CS_{1−α} = {p0 : p0 ∈ (p̂ − u_α √(p̂(1 − p̂)/N), 1)};
for H1 : p < p0, CS_{1−α} = {p0 : p0 ∈ (0, p̂ + u_α √(p̂(1 − p̂)/N))}.

Different g(p), standard deviations and boundaries of confidence intervals:
1 if g(p) = p/(1 − p), then SD̂[g(p̂)] = √(p̂/(N(1 − p̂)³)), p_L = g_L(p)/(1 + g_L(p)) and p_U = g_U(p)/(1 + g_U(p));
2 if g(p) = ln(p/(1 − p)), then SD̂[g(p̂)] = √(1/(N p̂(1 − p̂))), p_L = exp(g_L(p))/(1 + exp(g_L(p))) and p_U = exp(g_U(p))/(1 + exp(g_U(p)));
3 if g(p) = arcsin(√p), then SD̂[g(p̂)] = 1/(2√N), p_L = sin²(g_L(p)) and p_U = sin²(g_U(p)).

11.10 One-sample tests about probability

For different g(p), the Wald test statistic is defined as Z^(alt)_W = (g(p̂) − g(p0)) / SD̂[g(p̂)].
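The three transformed intervals can be sketched in R directly from the formulas above; the function name TransformedWaldCI and the inputs x = 12, N = 40 are ours, chosen for illustration:

```r
# Back-transformed Wald CIs for p on the odds, logit and arcsine scales
# (a sketch following the g(p) list above; not lecture code)
"TransformedWaldCI" <- function(x, N, conf.level = 0.95) {
    p_hat <- x / N
    u <- qnorm(1 - (1 - conf.level) / 2)
    # g(p) = p/(1-p), SD = sqrt(p_hat / (N (1 - p_hat)^3))
    g1 <- p_hat / (1 - p_hat) + c(-1, 1) * u * sqrt(p_hat / (N * (1 - p_hat)^3))
    CI1 <- g1 / (1 + g1)
    # g(p) = ln(p/(1-p)), SD = sqrt(1 / (N p_hat (1 - p_hat)))
    g2 <- log(p_hat / (1 - p_hat)) + c(-1, 1) * u / sqrt(N * p_hat * (1 - p_hat))
    CI2 <- exp(g2) / (1 + exp(g2))
    # g(p) = arcsin(sqrt(p)), SD = 1 / (2 sqrt(N))
    g3 <- asin(sqrt(p_hat)) + c(-1, 1) * u / (2 * sqrt(N))
    CI3 <- sin(g3)^2
    rbind(odds = CI1, logit = CI2, arcsine = CI3)
}
CIs <- TransformedWaldCI(x = 12, N = 40)
```

Unlike the plain Wald interval, the logit-based interval always stays inside (0, 1), since the inverse logit maps the whole real line into (0, 1).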
Example (variance for different g(p))
Using the delta method, derive the variances V̂ar[g(p̂)] for:
1 g(p) = p/(1 − p),
2 g(p) = ln(p/(1 − p)),
3 g(p) = arcsin(√p).
Notes: If g(p) = p/(1 − p), then

    V̂ar[g(p̂)] = [∂/∂p (p/(1 − p))]² / (−∂²/∂p² ln L(p|x)) |_{p = p̂}
               = [((1 − p) + p)/(1 − p)²]² / (N/(p(1 − p))) |_{p = p̂}
               = p̂ / (N(1 − p̂)³).

11.11 One-sample tests about probability

Example (comparison of three CIs when p̂ = 0)
Let N = 25 and x = 0. Calculate the Wald empirical 95% CI for p, the score empirical 95% CI for p, and the likelihood empirical 95% CI for p.
Solution (also in R): If x = 0, then p̂ = 0/25 = 0. The Wald empirical 95% CI for p is equal to 0 ± 1.96 √((0 × 1)/25) = (0, 0). The score empirical 95% CI for p is equal to

    (1/2)(1.96²/(25 + 1.96²)) ± 1.96 √(1.96²/(4(25 + 1.96²)²)) = (0.000, 0.133).

The log-likelihood is ℓ(p|x) = 25 ln(1 − p). Since ℓ(p̂|x) = 0, we have uLR = −2(ℓ(p0|x) − ℓ(p̂|x)) = −50 ln(1 − p0), and the 95% likelihood CI is CS_{1−α} = {p0 : ULR ≤ χ²₁(α)}, where χ²₁(α) = 3.84. Then p̂_U = 1 − exp(−3.84/50) = 0.074, so the 95% likelihood CI is equal to (0.000, 0.074).

11.12 One-sample tests about probability

N <- 25; p_hat <- 0/N
var_phat <- p_hat * (1 - p_hat)/N
Wald_CI <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(var_phat) ## 0 0
library(Hmisc)
Wald_CI <- binconf(x = 0, n = 25, method = "asymptotic") ## 0 0

skore_CI <- prop.test(x = 0, n = 25, conf.level = 0.95,
    correct = FALSE)$conf.int ## 0 0.1331923
"LRbinom" <- function(pi_0, x, N, alpha) {
    -2 * (x * log(pi_0) + (N - x) * log(1 - pi_0)) -
        qchisq(1 - alpha, df = 1)
}
LR_CI <- uniroot(f = LRbinom, interval = c(0.000001, 0.999999),
    x = 0, N = 25, alpha = 0.05)$root ## 0.07395093

The CIs calculated using these three methods are different.
If p is close to zero, the distribution of X/N is extremely right-skewed for small N. In these situations, the score or likelihood CI should be used.

11.13 One-sample tests about probability

Example (comparison of several confidence intervals for p)
Compare the following empirical CIs for p: A. Wald CI, B. likelihood CI, C. score CI, D. back-transformed Wald CI for g(p) = p/(1 − p), E. back-transformed Wald CI for g(p) = ln(p/(1 − p)), F. back-transformed Wald CI for g(p) = arcsin(√p), where
1 N = 1400 and p = 0.10,
2 N = 40 and p = 0.10,
3 N = 1400 and p = 0.30,
4 N = 40 and p = 0.30,
5 N = 1400 and p = 0.50,
6 N = 40 and p = 0.50.
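As a starting point, the Wald (A) and score (C) intervals can be compared over the six (N, p) combinations above with a short R sketch (prop.test() with correct = FALSE returns the Wilson score interval; the remaining interval types are left as an exercise):

```r
# Wald vs Wilson score 95% CIs for p over the (N, p) grid of the example
grid <- expand.grid(N = c(1400, 40), p = c(0.10, 0.30, 0.50))
for (i in seq_len(nrow(grid))) {
    N <- grid$N[i]; p_hat <- grid$p[i]
    x <- round(N * p_hat)                      # observed count matching p_hat
    u <- qnorm(0.975)
    wald <- p_hat + c(-1, 1) * u * sqrt(p_hat * (1 - p_hat) / N)
    score <- prop.test(x, N, correct = FALSE)$conf.int
    cat(sprintf("N = %4d, p = %.2f: Wald (%.3f, %.3f), score (%.3f, %.3f)\n",
                N, p_hat, wald[1], wald[2], score[1], score[2]))
}
```

For N = 1400 the two intervals nearly coincide, while for N = 40 and p away from 0.5 the score interval is visibly shifted towards 0.5, in line with the Agresti–Coull interpretation of the Wilson mid-point given above.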