Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják 2 Basic Idea Behind the Statistical Tests Generated random sequences ­ properties as sample drawn from uniform/rectangular distribution Particular tests are based on test statistic Expected value of some test statistic is known for the reference distribution Generated random stream is subjected to the same test Obtained value is compared against the expected value Boundless number of statistical test can be constructed Some of them are accepted as the de facto standard NIST battery consists of 15 such tests (e.g. frequency test) Generators that pass such tests are considered "good" Absolute majority of generated sequences must pass 3 Statistical Hypothesis Testing ­ Basics Null hypothesis (H0) ­ denotes test hypothesis H0 = the sequence being tested is random Alternative hypothesis (HA) ­ negates H0 HA = the sequence is not random Each test is based on some test statistic (TS) TS is quantity calculated from our sample data TS is random variable/vector obtained from transformation of random selection TS have mostly standard normal or chi-square (2) as reference distributions After each applied test must be derived conclusion that rejects or not rejects null hypothesis 4 Statistical Hypothesis Testing ­ Errors Conclusion generation procedure and errors Probability of type I error () = level of significance Set prior the test; typically between 0.0001 and 0.01 Nonrandom sequence, produced by "good" generator Probability of type II error () Random sequence, produced by "bad" generator and are related to each other and to sample size Conclusion Real situation H0 is not rejected H0 is rejected H0 is true good decision type I error H0 is not true type II error good decision 5 Statistical Hypothesis Testing ­ Core Critical value ­ threshold between rejection and non-rejection regions Two (quite similar) ways of testing => critical value; test statistic => value; compare Test statistic => P-value; ; compare NIST battery uses P-values P => reject H0 P > => do not reject H0 6 Frequency (Monobit) Test Basic idea Number of zeros and ones expected in the truly random sequence should be the same Description Length of the bit string: n Sequence of bits: = 1, 2, ..., n Test statistic: sobs = |Sn|/n Sn = X1 + X2 + ... + Xn, where Xi = 2i ­ 1 (conversion to 1) The absolute value => half normal distribution Example (for n = 10) = 1011010101; Sn = 1­1+1+1­1+1­1+1­1+1 = 2 sobs = |2|/ 10 = 0.632455532; P-value = 0.5271 For = 0.01: 0.5271 > 0.01 => "is random" 7 Frequency Test within a Block Basic idea Number of zeros and ones expected in a M-bit block of truly random sequence should be the same M = 1 => Frequency (Monobit) Test. Description Number of non-overlapping blocks: N = n/M Proportion of ones in each M-bit block: Test statistic: 2 obs = 4M(i ­ 1/2)2, where 1 i N Example (for n = 10 and M = 3) = 0110011010; N1 = 011, N2 = 001, N3 = 101 1 = 2/3, 2 = 1/3, 3 = 1/3; 2 obs = 1; P-value = 0.8012 For = 0.01: 0.8012 > 0.01 => "is random" 8 Runs Test Basic idea A run is the uninterrupted sequence of identical bits Number of runs determines the speed of oscillation Description Proportion of ones: = (i)/n Test statistic: 2 obs = r(k) + 1, where 1 k n­1 If k = k+1, then r(k) = 0, otherwise r(k) = 1 Example (for n = 10) = 1001101011; = 6/10 = 3/5 2 obs = (1+0+1+0+1+1+1+1+0)+1 = 7 P-value = 0.1472 For = 0.01: 0.1472 > 0.01 => "is random" 9 Cumulative Sums Test Basic idea A cumulative sums of the adjusted (­1, +1) digits in the sequence should be near zero Description Normalizing: Xi = 2i ­ 1 (conversion to 1) Partial sums of successively larger subsequences Forward: S1 = X1; S2 = X1 + X2; ... Sn = X1 + X2 + ... + Xn Backward: S1 = Xn; S2 = Xn + Xn­1; ... Sn = Xn + Xn­1 + ... + X1 Test statistic (normal distribution): sobs = max1kn|Sk| Example (for n = 10) = 1011010111; X = 1,­1,1,1,­1,1,­1,1,1,1 S(F) = 1,0,1,2,1,2,1,2,3,4; sobs = 4; P-value = 0.4116 For = 0.01: 0.4116 > 0.01 => "is random" 10 NIST Testing Strategy 1. Select (pseudo) random number generator 2. Generate sequences a) Generate set of sequences or one long sequence i. Divide the long sequence to set of subsequences 3. Execute statistical tests a) Select the statistical tests b) Select the relevant input parameters 4. Examine (and analyse) the P-values a) For fixed a certain percentage are expected to failure 5. Assign Pass/Fail 11 Interpretation of Empirical Results Three scenarios may occur when analysing P-values The analysis indicate a deviation from randomness The analysis indicate no deviation from randomness The analysis is inconclusive NIST has adopted two approaches Examination of the proportion of sequences that pass the statistical test Check for uniformity of the distribution of P-values If either of these approaches fails => new experiments with different sequences Statistical anomaly? Clear evidence of non-randomness? 12 Proportion of Sequences Passing a Test Example 1000 binary sequences; = 0.01 996 sequences with P-values > 0.01 Proportion is 996/1000 = 0.9960 The range of acceptable proportions Determined by confidence interval p'3(p'(1-p')/n), where p' = 1 ­ ; n is sample size If proportion falls outside => data are non-random Threshold is the lower bound For n=100 and = 0.01 it is 0.96015 For n=1000 and = 0.01 it is 0.98056 13 Uniform Distribution of P-values Interval [0,1] divided to 10 subintervals Visually may be illustrated by using histogram P-values within each subinterval are counted Chi-square (2) goodness-of-fit test Level of significance = 0.0001 Test statistic: 2 = (oi ­ ei)2/ei oi is observed number of P-values in ith subinterval ei is expected number of P-values in ith subinterval Sample size multiplied by probability of occurrence in each subinterval (i.e. for sample size n it is n/10) PT-value is calculated and compared to PT-value > 0.0001 => sequence is uniformly distributed 14 Conclusion Randomness testing is based on statistical hypothesis testing Each statistical test is based on some function of data (called the test statistic) There exists many statistical tests No set of such tests can be considered as complete New testable statistical anomaly can be ever found Correct interpretation of empirical results should be very tricky