2 Weak law of large numbers and central limit theorem

Usually, the data processed by statisticians are taken from a population. The portion drawn is called a sample. The value of any population parameter is always constant, whereas sample parameters are random variables varying from sample to sample. If the sample is large enough to embrace the entire population, there is no difference between the sample parameter and the population parameter. In general, the larger the sample size, the more probable it is that the sample parameter comes arbitrarily close to the population parameter. This fact is often called the law of large numbers. It can be made precise by the Chebyshev theorem or by Bernoulli's theorem.

The central limit theorem states that the random variable $X$, defined as the sum of a large number of mutually independent variables $X_1, X_2, \dots, X_n$, approaches (under general assumptions), after standardization, the standard normal distribution. The Moivre-Laplace theorem is the simplest special case of the CLT. Generalizing the Moivre-Laplace theorem we obtain the Lindeberg-Lévy theorem. The most general CLT was formulated by Ljapunov; it will not be shown here.

Before we state the mentioned theorems, we have to define the notion of convergence of a sequence of random variables. In probability theory there exist several notions of convergence of random variables; we provide three of them.

Definition 2.1 The random sequence $(X_1, X_2, \dots, X_n, \dots)$ is said to converge toward the random variable $X$

(i.) surely, if $\forall \omega: \lim_{n\to\infty} X_n(\omega) = X(\omega)$. [This is the "common" convergence of a sequence of numbers.]

(ii.) in probability, if $\forall \varepsilon > 0: \lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$. [As $n$ increases, larger differences between $X_n$ and $X$ become extremely unlikely. It is also often denoted $\operatorname{plim} X_n = X$.]

(iii.) in distribution, if for the distribution functions $F_1(x)$ of $X_1$, ..., $F_n(x)$ of $X_n$, ..., and $F(x)$ of $X$ it holds that $\lim_{n\to\infty} F_n(x) = F(x)$ for every $x$ at which $F$ is continuous. [This is the weakest form of convergence; it is defined by means of distribution functions.]

Remark 2.2 The random sequence may converge to a constant $c$, which is included in the previous definition: consider a random variable that attains the deterministic value $c$ with probability one.

Theorem 2.3
1. Sure convergence of a random sequence implies convergence in probability; convergence in probability implies convergence in distribution. The converse implications do not hold in general.
2. The random sequence $(X_1, X_2, \dots, X_n, \dots)$ converges in probability toward a constant $c$ if the following two conditions hold: $\lim_{n\to\infty} E(X_n) = c$ and $\lim_{n\to\infty} D(X_n) = 0$.

Theorem 2.4 Weak law of large numbers (Chebyshev theorem) Let us consider a random sequence $(X_1, X_2, \dots, X_n, \dots)$ of independent and identically distributed random variables with constant expected value $\mu$ and constant variance $\sigma^2$. Then the random sequence of sample means $(X_1, \frac{1}{2}\sum_{i=1}^{2} X_i, \dots, \frac{1}{n}\sum_{i=1}^{n} X_i, \dots)$ converges in probability towards the expected value $\mu$. Thus for any $\varepsilon > 0$ it holds:
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) \ge 1 - \frac{\sigma^2}{n\varepsilon^2} \quad\text{or}\quad \lim_{n\to\infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) = 1.$$
[Simply: $\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} X_i = \mu$. Interpreting this result, the weak law essentially states that for a sufficiently large sample there is an extremely high probability that the sample mean will be arbitrarily close to the expected value (population mean). Thus for sufficiently large $n$ the expected value may be estimated by the sample mean.]
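The following is a minimal simulation sketch (not part of the original notes) illustrating Theorem 2.4: for samples of increasing size $n$ it compares the empirical probability $P(|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu| < \varepsilon)$ with the Chebyshev lower bound $1 - \sigma^2/(n\varepsilon^2)$. The Uniform(0, 1) distribution, the values of $\varepsilon$ and $n$, and the NumPy dependency are illustrative assumptions only.

```python
import numpy as np

# Sketch of Theorem 2.4 (weak law of large numbers): for growing sample size n,
# compare the empirical probability P(|X_bar - mu| < eps) with the Chebyshev
# lower bound 1 - sigma^2 / (n * eps^2).
rng = np.random.default_rng(0)
mu, sigma2 = 0.5, 1.0 / 12.0        # mean and variance of Uniform(0, 1)
eps = 0.05
reps = 5_000                        # number of simulated samples for each n

for n in (100, 400, 1_600):
    samples = rng.uniform(0.0, 1.0, size=(reps, n))
    means = samples.mean(axis=1)
    empirical = np.mean(np.abs(means - mu) < eps)
    bound = 1.0 - sigma2 / (n * eps**2)
    print(f"n={n:5d}  empirical P(|X_bar - mu| < eps) = {empirical:.4f}  bound = {bound:.4f}")
```

As $n$ grows, the empirical probability approaches 1 and always stays above the (conservative) Chebyshev bound.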
Theorem 2.5 Bernoulli (corollary of the Chebyshev theorem) Let the random variable $Y_n$ give the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $\pi$, $0 < \pi < 1$. Then the random sequence of relative frequencies $(Y_1, \frac{Y_2}{2}, \dots, \frac{Y_n}{n}, \dots)$ converges in probability towards the probability of success $\pi$. Thus for any $\varepsilon > 0$ it holds:
$$P\left(\left|\frac{Y_n}{n} - \pi\right| < \varepsilon\right) \ge 1 - \frac{\pi(1-\pi)}{n\varepsilon^2} \quad\text{or}\quad \lim_{n\to\infty} P\left(\left|\frac{Y_n}{n} - \pi\right| < \varepsilon\right) = 1.$$

The connection between the Chebyshev theorem and the Bernoulli theorem will be clear from the following example.

Example 2.6 The probability that a manufactured product is a waster (defective) is $\frac{12}{3000}$. The quality inspection tested 3000 products. What is the probability that the difference between the relative frequency of wasters and the probability of a waster is at most 0.01?

Solution Let $Y_{3000}$ stand for the random variable giving the number of wasters (successes) in 3000 independent experiments. Then $Y_{3000} \sim Bi(3000, \frac{12}{3000})$. As $n$ increases, the relative frequency of successes should get close to the probability of success. We have to find the probability that for $n = 3000$ the relative frequency of successes deviates from the probability of success by not more than 0.01. Thus in the Bernoulli theorem we set $\varepsilon = 0.01$. For any $\varepsilon > 0$ it holds that $P(|\frac{Y_n}{n} - \pi| < \varepsilon) \ge 1 - \frac{\pi(1-\pi)}{n\varepsilon^2}$. Thus
$$P\left(\left|\frac{Y_{3000}}{3000} - \frac{12}{3000}\right| < 0.01\right) \ge 1 - \frac{\frac{12}{3000}\left(1-\frac{12}{3000}\right)}{3000 \cdot 0.01^2} \doteq 0.987.$$
Using the Chebyshev theorem, let $X_i$ stand for a random variable with the Bernoulli distribution (zero-one or alternative distribution) which takes the value 1 in case of a waster (success) and the value 0 in case of a good-quality product (failure). Thus $X_i \sim A(\frac{12}{3000})$, $i = 1, \dots, 3000$, with $E(X_i) = \frac{12}{3000}$, $D(X_i) = \frac{12}{3000}(1 - \frac{12}{3000})$, and $X_1, \dots, X_{3000}$ independent. Then set $\varepsilon = 0.01$ and use the Chebyshev theorem. (Consider that a binomial random variable is equal to the sum of independent, identically distributed random variables, all Bernoulli distributed with success probability $\pi$.)

Theorem 2.7 Central limit theorem (Lindeberg-Lévy) Let $(X_1, \dots, X_n, \dots)$ be a random sequence of independent and identically distributed random variables, each having constant expected value $\mu$ and constant variance $\sigma^2$. Let us consider $X = \sum_{i=1}^{n} X_i$ and derive the expected value and the variance of the random variable $X$:
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \mu = n\mu, \qquad D(X) = D\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} D(X_i) = \sum_{i=1}^{n} \sigma^2 = n\sigma^2.$$
Here let us consider the standardized sum
$$U_n = \frac{X - E(X)}{\sqrt{D(X)}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \quad [n \text{ can arbitrarily increase}].$$
Then the random sequence of standardized sums $(U_1, U_2, \dots, U_n, \dots)$ converges in distribution toward a random variable $U \sim N(0, 1)$. Thus
$$\forall u \in \mathbb{R}: \lim_{n\to\infty} P(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt = \Phi(u),$$
where $\Phi(u)$ is the cumulative distribution function of $N(0, 1)$. We write for short $U_n \approx N(0, 1)$ and $U_n$ is said to follow an asymptotic standard normal distribution. [Notice that $U_n = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}}$. So the central limit theorem states that as the sample size $n$ increases, the distribution of the sample average of the random variables $X_1, \dots, X_n$ approaches the normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, irrespective of the shape of the original distribution.]

Theorem 2.8 Moivre-Laplace (corollary of the Lindeberg-Lévy theorem) Let $Y_n \sim Bi(n, \pi)$, $n = 1, 2, \dots$. Then $E(Y_n) = n\pi$, $D(Y_n) = n\pi(1-\pi)$, and
$$U_n = \frac{Y_n - n\pi}{\sqrt{n\pi(1-\pi)}} \approx N(0, 1).$$
[Interpreting this result, the Moivre-Laplace theorem essentially states that if $n$ is large enough, then an excellent approximation to $Bi(n, \pi)$ is given by the normal distribution $N(n\pi, n\pi(1-\pi))$ (using convergence in distribution).]
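The following simulation sketch (not part of the original notes) illustrates Theorem 2.7: the standardized sum $U_n$ of i.i.d. exponential variables is compared with the standard normal distribution function $\Phi(u)$. The choice of the Exp(1) distribution (with $\mu = \sigma = 1$), the values of $n$ and the grid of $u$ values, and the NumPy dependency are illustrative assumptions only.

```python
import math
import numpy as np

# Sketch of Theorem 2.7 (Lindeberg-Levy CLT): simulate the standardized sum
# U_n = (sum X_i - n*mu) / (sigma * sqrt(n)) of iid Exp(1) variables and compare
# the empirical P(U_n <= u) with the standard normal cdf Phi(u).
rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                # mean and standard deviation of Exp(1)
n, reps = 200, 50_000               # sample size and number of simulated sums

x = rng.exponential(scale=1.0, size=(reps, n))
u_n = (x.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))

def phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

for u in (-1.0, 0.0, 0.626, 1.0, 2.0):
    print(f"u = {u:6.3f}   P(U_n <= u) ~ {np.mean(u_n <= u):.4f}   Phi(u) = {phi(u):.4f}")
```

Even though the summands are strongly skewed, the empirical probabilities already match $\Phi(u)$ closely for $n = 200$.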
Remark 2.9 The Moivre-Laplace theorem allows us to apply an approximate formula which replaces the onerous calculation of the binomial distribution function by a simple lookup in the tables of the standard normal distribution. Compare:

Exact calculation:
$$P(Y_n \le y) = \sum_{t=0}^{y} \binom{n}{t} \pi^t (1-\pi)^{n-t} \quad \dots \text{ time-consuming, onerous summation.}$$

Approximation by the normal distribution:
$$P(Y_n \le y) = P\left(\frac{Y_n - n\pi}{\sqrt{n\pi(1-\pi)}} \le \frac{y - n\pi}{\sqrt{n\pi(1-\pi)}}\right) \approx \Phi\left(\frac{y - n\pi}{\sqrt{n\pi(1-\pi)}}\right),$$
where $\Phi(u)$ is the tabulated standard normal distribution function. The approximation may be used if the following assumptions hold: $n\pi(1-\pi) > 9$ and $\frac{1}{n+1} < \pi < \frac{n}{n+1}$.

Example 2.10 Consider 100 independent tosses of a die. Find the probability that the number "six" appears at least 20 times.

Solution Let $Y_{100}$ stand for the random variable giving the number of obtained "sixes" in 100 tosses, $Y_{100} \sim Bi(100, \frac{1}{6})$. First we have to verify the assumptions for using the normal approximation: $n\pi(1-\pi) = 100 \cdot \frac{1}{6}(1 - \frac{1}{6}) = \frac{500}{36} > 9$ and $\frac{1}{101} < \frac{1}{6} < \frac{100}{101}$, hence both assumptions hold. So to estimate the wanted probability we can use the Moivre-Laplace theorem:
$$P(Y_{100} \ge 20) = 1 - P(Y_{100} < 20) = 1 - P(Y_{100} \le 19) = 1 - P\left(\frac{Y_{100} - \frac{100}{6}}{\sqrt{100 \cdot \frac{1}{6} \cdot \frac{5}{6}}} \le \frac{19 - \frac{100}{6}}{\sqrt{100 \cdot \frac{1}{6} \cdot \frac{5}{6}}}\right) \approx 1 - \Phi(0.626) = 1 - 0.73565 = 0.26435.$$
(The exact software calculation is 0.2198.)

There are some cases in which the approximation of the binomial distribution by the normal distribution is not convenient. Especially in the case of an extremely small probability of success, the approximation by the Poisson distribution fits better.

Theorem 2.11 Poisson Let $Y_1, Y_2, \dots$ be a random sequence of independent random variables, $Y_n \sim Bi(n, \pi_n)$, $n = 1, 2, \dots$, and $\lim_{n\to\infty} n\pi_n = \lambda$. Then the sequence $Y_1, Y_2, \dots, Y_n, \dots$ converges in distribution towards a random variable $Y \sim Po(\lambda)$, thus $Y_n \approx Po(\lambda)$. [The random variable $Y$ follows the Poisson distribution with parameter $\lambda = n\pi$, which can be used as an approximation to the binomial distribution.]

Remark 2.12 The Poisson theorem allows us to apply an approximate formula which replaces the onerous calculation of the binomial distribution (or probability) function by a simple lookup in the tables of the Poisson distribution (or probability) function:
$$P(Y_n \le y) = \sum_{t=0}^{y} \binom{n}{t} \pi^t (1-\pi)^{n-t} \approx F_n(y), \quad\text{where } F_n(y) \text{ is the Poisson distribution function with parameter } \lambda = n\pi,$$
$$P(Y_n = y) = \binom{n}{y} \pi^y (1-\pi)^{n-y} \approx \frac{(n\pi)^y}{y!}\, e^{-n\pi} \quad\text{(compare with the Poisson probability function).}$$
The approximation by the Poisson distribution may be used if the following assumptions hold: $n \ge 30$ and $\pi \le 0.1$.

Example 2.13 When testing the reliability of a particular device, it gets out of order with probability 0.05. Find the probability that when testing 100 identical devices there will be exactly 5 which get out of order.

Solution Let $Y_{100}$ stand for the random variable giving the number of defective devices in a sample of 100 independent tests, $Y_{100} \sim Bi(100, 0.05)$. First we have to verify the assumptions for using the Poisson approximation: $100 \ge 30$ and $0.05 \le 0.1$. Poisson approximation of the wanted probability:
$$P(Y_{100} = 5) \approx \frac{(100 \cdot 0.05)^5}{5!}\, e^{-100 \cdot 0.05},$$
which is the Poisson probability function with parameter $\lambda = 100 \cdot 0.05 = 5$ at the point 5, and it is tabulated. Thus $p_5(5) = 0.17547$. Exact calculation of the wanted probability:
$$P(Y_{100} = 5) = \binom{100}{5} \cdot 0.05^5 \cdot (1 - 0.05)^{95} = \dots = 0.18.$$
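The following short check (not part of the original notes) recomputes Examples 2.10 and 2.13 numerically, using only the Python standard library; the helper names binom_pmf and phi are introduced here for illustration.

```python
import math

def binom_pmf(n, p, k):
    """Exact binomial probability P(Y = k) for Y ~ Bi(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Example 2.10: P(Y >= 20) for Y ~ Bi(100, 1/6), exact vs. Moivre-Laplace approximation.
n, p = 100, 1 / 6
exact = sum(binom_pmf(n, p, k) for k in range(20, n + 1))
approx = 1.0 - phi((19 - n * p) / math.sqrt(n * p * (1 - p)))
print(f"P(Y100 >= 20): exact {exact:.4f}, normal approximation {approx:.4f}")

# Example 2.13: P(Y = 5) for Y ~ Bi(100, 0.05), exact vs. Poisson(lambda = n*p) approximation.
n, p = 100, 0.05
lam = n * p
exact = binom_pmf(n, p, 5)
poisson = lam**5 / math.factorial(5) * math.exp(-lam)
print(f"P(Y100 = 5): exact {exact:.5f}, Poisson approximation {poisson:.5f}")
```

The printed values reproduce the figures quoted in the examples: the exact binomial probabilities (about 0.2198 and 0.18) and their normal and Poisson approximations.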