E7441: Scientific computing in biology and biomedicine Stochastic methods Vlad Popovici, Ph.D. RECETOX Outline 1 Introduction to Monte Carlo methods Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean 2 Bootstrapping Introduction Empirical distribution and the plug-in principle Improved bootstrap confidence intervals Bootstrapping for hypothesis test 3 Permutation tests Introduction Example/exercise Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 2 / 56 Introduction to Monte Carlo methods Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 3 / 56 Numerical experiments: simulations General approach: 1 identify the random variable of interest X 2 identify/postulate its distributional properties 3 generate one or several large samples identical and independely distributed X1, . . . , Xn from the distribution of X 4 estimate the quantity of interest (e.g. estimate EX using sample average) and assess its accuracy (e.g. via confidence intervals) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 4 / 56 Random number generators (RNGs) all random variables can be generated by transforming a uniformly distributed random variable X ∈ U(0, 1) there is no algorithmic (deterministic) way of generating infinitely long sequences of true random numbers computers generate pseudorandom numbers there exist devices to generate (believed to be) random sequences: e.g. radioactive decay: the time elapsed between emission of two consecutive particles (α, β, γ). See: http://www.fourmilab.ch/hotbits Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 5 / 56 RNGs, cont’d two aspects: 1 generate good pseudorandom numbers in U(0, 1): independent and uniformly distributed 2 find proper trasformations to the desired distribution you cannot prove that an RNG is truly random there are a batteries of tests that an RNG must pass to be acceptable for any RNG, one can find a statistical test that will reject it as a good generator Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 6 / 56 RNGs, cont’d Formalism: an RNG is a structure (S, µ, f, U, g) where ▶ S is a finite set of states ▶ µ is a probability distribution on S used to select the initial seed (state) s0 ▶ f : S → S is a transition function. The state of the RNG evolves according to the recurrence si = f(si−1) for i ≥ 1 ▶ U is the output space. Usually U = (0, 1) ▶ g : S → U is the output function. The numbers ui = g(si) are called random numbers produced by the RNG Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 7 / 56 RNGs, cont’d S is finite ⇒ ∃l ≥ 0, j > 0 finite such that sl+j = sl this implies that ∀i ≥ l, ui+j = ui since both f and g are deterministic the smallest positive j for which this happens is called period lenght of the RNG and is denoted by ρ obviously, ρ ≤ |S| ex.: if the state is represented on k bits, then ρ ≤ 2k Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 8 / 56 RNGs, cont’d Quality criteria: extremly long period ρ efficient implementation repeatability portability availability of jump-ahead property: quickly compute the si+v given si, so you can partition a long sequence in subsequences to be used in parallel randomness Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 9 / 56 RNGs, cont’d Coverage: let Ψt = {(u0, . . . , ut )|s0 ∈ S} is Ψt uniformly covering the hypercube (0, 1)t ? tests of discrepancy between the empirical distribution of Ψt and the uniform distribution figure of merit: a measure of the coverage quality Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 10 / 56 RNGs, cont’d Randomness and i.i.d: statistical tests: try to detect empirical evidence against H0: "ui are realizations of i.i.d U(0, 1)". Example: diehard tests (Marsaglia, 1995) passing more tests improves the confidence in RNG, but cannot prove the RNG is foolproof for all cases good RNG passes a set of simple tests polynomial time perfect RNG: there is no polynomial-time algorithm the can predict any given bit of ui with a probability of success ≥ 1/2 + 2−kϵ, for some ϵ > 0, after observing u0, . . . , ui−1 the usual RNGs are not polynomial time perfect Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 11 / 56 RNGs, cont’d Multiple Recursive Generator has a general recurrence xi = (a1xi−1 + · · · + ak xi−k ) mod m where m (modulus) and k (order) are integers carefully selected, and coefficients a1, . . . , ak ∈ Zm. The state is si = (xi−k+1, . . . , xi)T . When m is prime, it is possible to select ai such that the period length ρ = mk − 1. Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 12 / 56 RNGs, cont’d Example (historical, not in serious use anymore): MLCG (Lehmer, 1948): multiplicative linear congruential generator: si+1 = (a1si + a0) mod m This generates integers that are converted to (0, 1) by division with m. Weakness: (Marsaglia, 1968): if (si, . . . , si+d) represent some points in a d−dimensional space, they have a lattice structure: they lie in a number of specific hyperplanes. Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 13 / 56 RNGs, cont’d Famous multipliers (a0 = 0): a1 = 23, m = 108 + 1: original version, has higher order correlations a1 = 65539, m = 229 : infamous RANDU generator (IBM 360 series, in the 1970s): catastrophic higher order correlations a1 = 69069, m = 232 (Marsaglia, 1972): good properties and converage up to 6 dimensions (x, y, z) coordinates taken as consecutive values generated by RANDU (a1 = 65539, m = 229 ) from Wikipedia Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 14 / 56 RNGs, cont’d - Exercise write a function random_sample_mlcg(n, a0=0, a1=20, m=53, s0=21) which implements the procedure MLCG (with some default parameters), and returns a sequence of n numbers. generate a sequence and plot ui+1 vs ui u = random_sample_mlcg(200) plt.scatter(u[2:],u[:-1]) discuss! Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 15 / 56 RNGs, cont’d - Exercise let n = 20000 execute n = 20000 u = random_sample_mlcg(n, a0=0, a1=65539, m=2**31, seed=10) z = (u - 0.5) / (2**31-1) is the histogram reasonably uniform? _ = plt.hist(z, bins=20) what about the coverage of (0, 1) × (0, 1)? z1 = z[:-2]; z2 = z[1:-1]; z3 = z[2:]; plt.scatter(z1, z2) any structure? i = np.argwhere(z3 < 0.01); plt.scatter(z1[i], z2[i]) discuss! Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 16 / 56 RNGs, cont’d In general: don’t let the RNG to be "randomly" selected! for serious work, always set the seed, check the RNG, etc: they might be version-dependent; also you want other to be able to reproduce your results read the help for numpy.random using numpy one can specify the generator and a wide range of distributions using something like numpy.random..() like numpy.random.default_rng(seed=42).uniform(0, 1, 20) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 17 / 56 Non-uniform r.v. generation (NRNG) Requirements: correctness: a good approximation of the theoretical distribution robustness: RNG should work well on a large range of parameters efficiency Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 18 / 56 NRNG: inversion method best choice, when feasible to generate X with distribution function F, starting from a uniform variate U ∈ (0, 1), apply the inverse F−1 to U: X = F−1 (U) := min{x|F(x) ≥ U} easy to see that the distribution of X is as required: P[X ≤ x] = P[F−1 (U) ≤ x] = P[U ≤ F(x)] = F(x) for some distributions, F−1 can be obtained analytically. Ex.: Weibull distribution F(x) = 1 − exp(−(x/β)α), with α, β > 0; has the inverse F−1 (U) = β[− ln(1 − U)]1/α other distributions do not have a close form inverse: e.g. normal, χ2 ,... ⇒ approximations Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 19 / 56 NRNG: inversion method, cont’d Example (principle of inversion): # return X with cdf F , f o r a # uniform r . v . 0 < U < 1 # ( look −up table method ) X = 0 while (F(X) < U) X = X + 1 return (X) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 20 / 56 NRNG: Rejection method consider F with a compact support and bounded F(x) ≤ k consider a series of points (Xi, Yi) uniformly distributed under the density function the distribution of Xi is the same as the distribution of X (F): P[a < Xi < b] = probability of a point falling in the region = b a F(x)dx procedure: 1 generate X ∼ U[a, b] and Y ∼ U[0, 1] independently 2 if Y < F(X) return X, otherwise repeat Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 21 / 56 NRNG: Rejection method - Exercise Implement the rejection method for generating random variates from the pdf F(x) =    x if 0 < x < 1 2 − x if 1 ≤ x < 2 0 otherwise Generate n = 5000 r.v. in [0, 2] and plot their histogram. Histogram of z z Density 0.0 0.5 1.0 1.5 2.0 0.00.20.40.60.81.0 Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 22 / 56 Generating normally distributed r.v. you can use the rejection method alternative: Box-Muller algorithm: based on the observation that the coordinates of points in a 2D Cartesian system described by 2 independent normal distributions correspond to polar coordinates that are realizations of 2 independent uniform distributions Box-Muller transform: if U1, U2 are independent uniformly distributed on (0,1), then Z1 = r cos θ = −2 ln U1 cos(2πU2) Z2 = r sin θ = −2 ln U1 sin(2πU2) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 23 / 56 Improved Box-Muller algorithm, with rejection step: 1 generate U1, U2 ∼ U(−1, 1) 2 accept S2 = U2 1 + U2 2 if S2 < 1, else go to step 1 3 set W = −2ln S2 S2 4 return X = U1W and Y = U2W Exercise: Implement the procedure above in Python! Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 24 / 56 Other methods for NRNG kernel density estimation: approximate the inverse using a kernel for which efficient generators exist composition: consider F to be a convex combination of several distributions Fj: F(x) = j pjFj(x) To generate from F, one generates J with probability pj and then generates X from Fj convolution: if X = Y1 + · · · + Yn, with Yj independent with specified distributions, then generate the Yj’s and sum them etc etc numpy.random has efficient implementations for many standard distributions Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 25 / 56 MC methods for inference General approach: 1 identify the random variable of interest X 2 identify/postulate its distributional properties 3 generate one or several large samples identical and independently distributed X1, . . . , Xn from the distribution of X 4 estimate the quantity of interest (e.g. estimate EX using sample average) and assess its accuracy (e.g. via confidence intervals) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 26 / 56 MC inference about the mean Reminder: problem: compute z = EZ when z is not available analytically, but Z can be simulated consider n replicates Z1, . . . , Zn of Z and estimate z by the empirical mean ˆz = i Zi/n denote σ2 = Var{Z} < ∞ central limit theorem: √ n(ˆz − z) → N(0, σ2 ), as n → ∞ from this, an 1 − α confidence interval can be obtained as ˆz − z1−α/2 σ √ n , ˆz − zα/2 σ √ n where zα denotes the α−quantile of the normal distribution (Φ(zα) = α) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 27 / 56 MC for inference about the mean - Exercise Implement the following procedure: write the Python function pdf1(n) to generate n = 1000 r.v. drawn from f(X) = 0.2N1(X) + 0.3N2(X) + 0.5N3(X) where Ni are Gaussians with parameters µ1 = 0, σ1 = 0.5, µ2 = 6.5, σ2 = 1.25, µ3 = 14.5, σ3 = 0.75. Do not use for loops or any function from the various nonstandard packages! plot the histogram repeat the procedure for n = 10000 and n = 100000. what do you see? Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 28 / 56 generate p = 1000 samples of n = 1000 r.v.: X[p × n] compute ˆxi as the sample average for each of the p samples and the grand average ˆX what is the true mean of this mixture of Gaussians? test the normality of the distribution of ˆxi (find an appropriate test!) estimate the 95% empirical confidence interval (using quantiles of the distribution of ˆxi) and compare it with the theoretical one (using sample variance for σ2 ) obtained from a single sample (say, X[1,]) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 29 / 56 Introduction to bootstrapping Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 30 / 56 Introduction resampling technique for statistical inference: assess uncertainty especially useful when no assumptions can be made on the underlying model confidence intervals without distributional assumptions there are many versions of bootstrapping Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 31 / 56 Example (from Efron, Tibshirani, 1993): Group Heart attacks Subjects aspirin 104 11037 placebo 189 11034 The odds ratio: ˆθ = 104/11037 189/11034 = 0.55 so it seems that aspirin reduced the incidence of heart attacks. Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 32 / 56 Log-odds can be approximated by the normal distribution, so we use it to construct a 95% CI. Standard error is SE(log(OR)) = 1/104 + 1/189 + 1/11037 + 1/11034 = 0.1228 giving a 95% CI for log θ: log ˆθ ± 1.96 × SE(log(OR)) = (−0.839, −0.357) with a corresponding 95% for θ obtained by exponentiating: (0.432, 0.700). Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 33 / 56 At the same time, aspirin seems to have a detrimental effect on strokes Group Stroke Subjects aspirin 119 11037 placebo 98 11034 which leads to an odds ratio of ˆθ = 1.21 with a 95% CI of (0.925, 1.583). Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 34 / 56 ...and how bootstrap would proceed to infering the CI: create a sample for the treatment (s1) and one for the placebo (s2) group as vectors containing as many 1s as case there are draw with replacement a random sample from s1 and from s2, of the same size as the groups compute the odds ratios based on the drawn samples repeat the process a number of times and record all the odds ratios computed using their empirical distribution, estimate the CI of interest Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 35 / 56 A naive implementation n1 = 11037 n1_cases = 119 n2 = 11034 n2_cases = 98 s1 = np.ones((n1, ), dtype=np.int64); s1[n1_cases:] = 0 s2 = np.ones((n2, ), dtype=np.int64); s2[n2_cases:] = 0 B = 1000 # no. of bootstraps p = n2 / n1 theta = np.zeros((B,), dtype=np.float64) for i in np.arange(B): theta[i] = p * np.sum( np.random.choice(s1, n1, replace=True) ) / np.sum( np.random.choice(s2, n2, replace=True) ) _ = plt.hist(theta, 50) print("95% Confidence interval for theta: ", np.quantile(theta, q=(0.025, 0.975))) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 36 / 56 Histogramof theta theta Frequency 0.8 1.0 1.2 1.4 1.6 1.8 050100150200250 the CI estimate by the quantiles is not the most precise nor efficient that can be obtained by bootstrapping it works for symmetric, close to normal distributions of the bootstrap replicate Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 37 / 56 The empirical distribution the underlying probability distribution F generates the observed sample: F → (x1, . . . , xn) = x the empirical distribution ˆF is the discrete distribution that puts probability 1/n at each value xi, i = 1, . . . , n ˆF assigns to a set A in the sample space of x its empirical probability: Prob{A} = #{xi ∈ A} n = ProbˆF {A} Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 38 / 56 a parameter is a functional of the distribution function, θ = t(F). Example: the mean µ(F) = xdF(x) a statistic is a function of the sample x. Example: the sample average, ˆµ = 1 n n i=1 xi the plug-in estimate of a parameter θ = t(F) is defined to be ˆθ = t(ˆF) (sometimes called summary statistics, estimates or estimator) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 39 / 56 Bootstrap estimate of the standard error bootstrap sample: ˆF → (x∗ 1 , . . . , x∗ n) = x∗ (resampling with replacement) let ˆθ = s(x) be an estimate for the parameter of interest the question is: what is the standard error of the estimate? bootstrap replication of ˆθ is ˆθ∗ = s(x∗ ) ideal bootstrap estimate of SE: seˆF (ˆθ∗ ) i.e. the standard error of ˆθ for data sets of size n randomly sampled from ˆF unfortunately, close-form formulas exist only for some estimators Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 40 / 56 General form of the bootstrap method by resampling with replacement from x one samples from the empirical distribution ˆF x∗ b are the bootstrap samples of size n s(x∗ b ) = ˆθ∗ b are the bootstrap replications of the parameter of interest θ Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 41 / 56 Bootstrap estimation of standard errors 1 select B independent bootstrap samples x∗ 1 , . . . , x∗ B 2 evaluate the bootstrap replicate of each bootstrap sample ˆθ∗ b = s(x∗ b ), b = 1, 2, . . . , B 3 estimate the standard error seˆF (ˆθ) by the sample standard deviation of the B replications: seB = 1 B − 1 B b=1 ˆθ∗ b − ˆθ∗ 0 2 where ˆθ∗ 0 = 1 B B b=1 ˆθ∗ b Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 42 / 56 Homework Implement the previous procedure in Python: write a function bstrap_nonparam(x, B, s, ...) which will generate B bootstrap samples x∗ b and for each of them will compute the bootstrap replicate of the parameter: ˆθ∗ b = s(x∗ b , · · · ) write a function bstrap_theta0(T) which computes the bootstrap estimate of the parameter, given the bootstrap replicates in the vector T (ˆθ∗ 0 ) write a function bstrap_se(T) which computes the bootstrap estimate of the standard error of the parameter, given the bootstrap replicates in the vector T (seB) use the Rainfall data set to compute the bootstrap estimate of the mean, median and corresponding standard errors - see the Jupyter notebook for data. compare with textbook results! (discuss!) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 43 / 56 Bias–corrected and accelerated CI the quantile-based CI is not tight enough nor robust idea: better exploit the quantiles of the empirical distribution by: ▶ correcting the bias ▶ improving convergence simple bootstrap quantile-based CI: for an (1 − 2α) coverage, the bounds of the CI are given by (ˆθ∗(α), ˆθ∗(1−α)) where ˆθ∗(q) is the q−th quantile of the bootstrap replicates Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 44 / 56 The BCa CI is given by (ˆθ∗(α1), ˆθ∗(α2)) where α1 = Φ ˆz0 + ˆz0 + z(α) 1 − ˆa(ˆz0 + z(α)) α2 = Φ ˆz0 + ˆz0 + z(1−α) 1 − ˆa(ˆz0 + z(1−α)) where Φ(·) is the standard normal CDF z(q) is the q−th quantile of standard normal distribution ˆa and ˆz0 are cleverly chosen Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 45 / 56 The parameters of BCa CIs: ˆz0 = Φ−1   #{ˆθ∗ b < ˆθ} B   ˆa = n i=1 ˆθ(·) − ˆθ(i) 3 6 n i=1 ˆθ(·) − ˆθ(i) 2 3/2 where ˆθ(i) is the value of the parameter computed on the vector x with the i−th component removed (jackknife values of the parameter) ˆθ(·) = n i=1 ˆθ(i)/n Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 46 / 56 Exercise: implement the BCa procedure in R: (yes, not in Python) write a function bstrap.bca(x, B, s, ..., alpha=c(0.025, 0.05)) that returns the low and upper bounds of the CI computed by BCa method you can use (call) the previous function bstrap.nonparam compute the 90% and 95% BCa CIs for the mean of Rainfall data: bstrap.bca(Rainfall, 2000, mean) Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 47 / 56 Important properties of BCa CIs transformation respecting: the bounds of the CIs transform correctly if the parameter is changed by some function: e.g. the CIs for √ ·-transformed parameter are obtained by taking √ of the bounds of the parameter itself second order accurate: convergence rate of 1/n to true coverage Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 48 / 56 Bootstrapping for tests consider two possibly different distributions F and G, F → z = (z1, . . . , zn) G → y = (y1, . . . , ym) hypotheses: H0 : F = G H1 : F G F = G ⇔ ProbF {A} = ProbG{A} for all sets A observe a test statistic ˆθ (e.g. mean difference) achieved significance level (ASL): probability of observing that large a value under H0: ASL = ProbH0 {ˆθ∗ ≥ ˆθ} Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 49 / 56 Bootstrapping hypothesis testing procedure 1 choose a test statistic (not necessary a parameter): t(x) (for example: t(x) = ¯z − ¯y) 2 draw B samples of size n + m from x = (z, y) and call the first n observations z∗ and the remaining m y∗ 3 evaluate t(·) for each sample: t(x∗ b ) (for example t(x∗ b) = ¯z∗ b − ¯y∗ b ) for b = 1, 2, . . . , B 4 approximate ASLboot by ASLboot = #{t(x∗ b) ≥ t(x)}/B Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 50 / 56 Permutation tests Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 51 / 56 Permutation tests nonparametric testing procedure allow testing hypotheses when the properties of the test statistic under the null hypothesis are not known do not make assumptions on the data work on small data sets idea: generate the distribution of the test statistic under the null hypothesis from the data Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 52 / 56 exact permutation tests: for (very) small data sets, generate all permutations and compute the corresponding test statistics random test: for large data sets, generate a number of random permutations, for which compute the test statistic test procedure: count how many times the test statistic from the permutations is more extreme than the real test statistic and reject H0 if the proportion is below the predefined α−level Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 53 / 56 Example - two populations tests consider the data vectors mouse.c and mouse.t for the control and treatment arms of an experiment (some clinical variable) implement a permutation testing procedure for testing H0 : there is no significant difference in the clinical variable between control and treatment vs H1 : there is a significant difference in the clinical variable between control and treatment which test statistic? what to permute? how many permutations? what should be changed if the test was about superiority of treatment vs control? Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 54 / 56 Histogram of d d Frequency −50 0 50 050010001500 Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 55 / 56 Questions? Vlad Popovici, Ph.D. (RECETOX) E7441: Scientific computing in biology and biomedicine 56 / 56