2 Weak law of large numbers and central limit theorem

Usually, the data processed by statisticians are taken from a population. The portion drawn is called a sample. The value of any population parameter is always constant, whereas sample parameters are random variables varying from sample to sample. If the sample is large enough to embrace the entire population, there is no difference between the sample parameter and the population parameter. In general, the larger the sample size, the more probable it is that the sample parameter comes arbitrarily close to the population parameter. This fact is often called the law of large numbers. It can be made precise by the Chebyshev theorem or by Bernoulli's theorem.

The central limit theorem states that the random variable $X$, defined as the sum of a large number of mutually independent variables $X_1, X_2, \dots, X_n$, approaches (under general assumptions), after standardization, the standard normal distribution. The Moivre-Laplace theorem is the simplest special case of the CLT. Generalizing the Moivre-Laplace theorem we obtain the Lindeberg-Lévy theorem. The most general CLT was formulated by Ljapunov; it will not be shown here.

Before we state the mentioned theorems, we have to define the notion of convergence of a sequence of random variables. In probability theory there exist several notions of convergence of random variables; we provide three of them.

Definition 2.1 The random sequence $(X_1, X_2, \dots, X_n, \dots)$ is said to converge toward the random variable $X$

(i.) surely, if $\forall \omega: \lim_{n\to\infty} X_n(\omega) = X(\omega)$. [This is the "common" convergence of a sequence of numbers.]

(ii.) in probability, if $\forall \varepsilon > 0: \lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$. [As $n$ increases, larger differences between $X_n$ and $X$ become extremely unlikely. It is also often denoted $\operatorname{plim} X_n = X$.]

(iii.) in distribution, if for the distribution functions $F_1(x)$ of $X_1$, ..., $F_n(x)$ of $X_n$, ..., and $F(x)$ of $X$ it holds that $\lim_{n\to\infty} F_n(x) = F(x)$ for every $x$ at which $F$ is continuous. [This is the weakest form of convergence; it is defined by means of distribution functions.]

Remark 2.2 The random sequence may converge to a constant $c$, which is included in the previous definition: consider a random variable that attains the deterministic value $c$ with probability one.

Theorem 2.3
1. Sure convergence of a random sequence implies convergence in probability; convergence in probability implies convergence in distribution. The converse implications do not hold in general.
2. The random sequence $(X_1, X_2, \dots, X_n, \dots)$ converges in probability toward a constant $c$ if the following two conditions hold: $\lim_{n\to\infty} E(X_n) = c$ and $\lim_{n\to\infty} D(X_n) = 0$.

Theorem 2.4 Weak law of large numbers (Chebyshev theorem) Let us consider a random sequence $(X_1, X_2, \dots, X_n, \dots)$ of independent and identically distributed random variables with constant expected value $\mu$ and constant variance $\sigma^2$. Then the random sequence of sample means $(X_1, \frac{1}{2}\sum_{i=1}^{2} X_i, \dots, \frac{1}{n}\sum_{i=1}^{n} X_i, \dots)$ converges in probability towards the expected value $\mu$. Thus for any $\varepsilon > 0$ it holds:
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) \ge 1 - \frac{\sigma^2}{n\varepsilon^2} \quad\text{or}\quad \lim_{n\to\infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) = 1.$$
[Simply: $\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} X_i = \mu$. Interpreting this result, the weak law essentially states that for a sufficiently large sample there is an extremely high probability that the sample mean will be arbitrarily close to the expected value (population mean). Thus for sufficiently large $n$ the expected value may be estimated by the sample mean.]
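The following is a minimal simulation sketch (not part of the original notes) illustrating Theorem 2.4: for samples of increasing size $n$ it compares the empirical probability $P(|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu| < \varepsilon)$ with the Chebyshev lower bound $1 - \sigma^2/(n\varepsilon^2)$. The Uniform(0, 1) distribution, the values of $\varepsilon$ and $n$, and the NumPy dependency are illustrative assumptions only.

```python
import numpy as np

# Sketch of Theorem 2.4 (weak law of large numbers): for growing sample size n,
# compare the empirical probability P(|X_bar - mu| < eps) with the Chebyshev
# lower bound 1 - sigma^2 / (n * eps^2).
rng = np.random.default_rng(0)
mu, sigma2 = 0.5, 1.0 / 12.0        # mean and variance of Uniform(0, 1)
eps = 0.05
reps = 5_000                        # number of simulated samples for each n

for n in (100, 400, 1_600):
    samples = rng.uniform(0.0, 1.0, size=(reps, n))
    means = samples.mean(axis=1)
    empirical = np.mean(np.abs(means - mu) < eps)
    bound = 1.0 - sigma2 / (n * eps**2)
    print(f"n={n:5d}  empirical P(|X_bar - mu| < eps) = {empirical:.4f}  bound = {bound:.4f}")
```

As $n$ grows, the empirical probability approaches 1 and always stays above the (conservative) Chebyshev bound.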
Theorem 2.5 Bernoulli (corollary of the Chebyshev theorem) Let the random variable $Y_n$ give the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $\pi$, $0 < \pi < 1$. Then the random sequence of relative frequencies $(Y_1, \frac{Y_2}{2}, \dots, \frac{Y_n}{n}, \dots)$ converges in probability towards the probability of success $\pi$. Thus for any $\varepsilon > 0$ it holds:
$$P\left(\left|\frac{Y_n}{n} - \pi\right| < \varepsilon\right) \ge 1 - \frac{\pi(1-\pi)}{n\varepsilon^2} \quad\text{or}\quad \lim_{n\to\infty} P\left(\left|\frac{Y_n}{n} - \pi\right| < \varepsilon\right) = 1.$$

The connection between the Chebyshev theorem and the Bernoulli theorem will be clear from the following example.

Example 2.6 The probability that a manufactured product is a waster (defective) is $\frac{12}{3000}$. The quality inspection tested 3000 products. What is the probability that the difference between the relative frequency of wasters and the probability of a waster is at most 0.01?

Solution Let $Y_{3000}$ stand for the random variable giving the number of wasters (successes) in 3000 independent experiments. Then $Y_{3000} \sim Bi(3000, \frac{12}{3000})$. As $n$ increases, the relative frequency of successes should get close to the probability of success. We have to find the probability that for $n = 3000$ the relative frequency of successes deviates from the probability of success by not more than 0.01. Thus in the Bernoulli theorem we set $\varepsilon = 0.01$. For any $\varepsilon > 0$ it holds that $P(|\frac{Y_n}{n} - \pi| < \varepsilon) \ge 1 - \frac{\pi(1-\pi)}{n\varepsilon^2}$. Thus
$$P\left(\left|\frac{Y_{3000}}{3000} - \frac{12}{3000}\right| < 0.01\right) \ge 1 - \frac{\frac{12}{3000}\left(1-\frac{12}{3000}\right)}{3000 \cdot 0.01^2} \doteq 0.987.$$
Using the Chebyshev theorem, let $X_i$ stand for a random variable with the Bernoulli distribution (zero-one or alternative distribution) which takes the value 1 in case of a waster (success) and the value 0 in case of a good-quality product (failure). Thus $X_i \sim A(\frac{12}{3000})$, $i = 1, \dots, 3000$, with $E(X_i) = \frac{12}{3000}$, $D(X_i) = \frac{12}{3000}(1 - \frac{12}{3000})$, and $X_1, \dots, X_{3000}$ independent. Then set $\varepsilon = 0.01$ and use the Chebyshev theorem. (Consider that a binomial random variable is equal to the sum of independent, identically distributed random variables, all Bernoulli distributed with success probability $\pi$.)

Theorem 2.7 Central limit theorem (Lindeberg-Lévy) Let $(X_1, \dots, X_n, \dots)$ be a random sequence of independent and identically distributed random variables, each having constant expected value $\mu$ and constant variance $\sigma^2$. Let us consider $X = \sum_{i=1}^{n} X_i$ and derive the expected value and the variance of the random variable $X$:
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \mu = n\mu, \qquad D(X) = D\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} D(X_i) = \sum_{i=1}^{n} \sigma^2 = n\sigma^2.$$
Here let us consider the standardized sum
$$U_n = \frac{X - E(X)}{\sqrt{D(X)}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \quad [n \text{ can arbitrarily increase}].$$
Then the random sequence of standardized sums $(U_1, U_2, \dots, U_n, \dots)$ converges in distribution toward a random variable $U \sim N(0, 1)$. Thus
$$\forall u \in \mathbb{R}: \lim_{n\to\infty} P(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt = \Phi(u),$$
where $\Phi(u)$ is the cumulative distribution function of $N(0, 1)$. We write for short $U_n \approx N(0, 1)$ and $U_n$ is said to follow an asymptotic standard normal distribution. [Notice that $U_n = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}}$. So the central limit theorem states that as the sample size $n$ increases, the distribution of the sample average of the random variables $X_1, \dots, X_n$ approaches the normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, irrespective of the shape of the original distribution.]

Theorem 2.8 Moivre-Laplace (corollary of the Lindeberg-Lévy theorem) Let $Y_n \sim Bi(n, \pi)$, $n = 1, 2, \dots$. Then $E(Y_n) = n\pi$, $D(Y_n) = n\pi(1-\pi)$, and
$$U_n = \frac{Y_n - n\pi}{\sqrt{n\pi(1-\pi)}} \approx N(0, 1).$$
[Interpreting this result, the Moivre-Laplace theorem essentially states that if $n$ is large enough, then an excellent approximation to $Bi(n, \pi)$ is given by the normal distribution $N(n\pi, n\pi(1-\pi))$ (using convergence in distribution).]
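The following simulation sketch (not part of the original notes) illustrates Theorem 2.7: the standardized sum $U_n$ of i.i.d. exponential variables is compared with the standard normal distribution function $\Phi(u)$. The choice of the Exp(1) distribution (with $\mu = \sigma = 1$), the values of $n$ and the grid of $u$ values, and the NumPy dependency are illustrative assumptions only.

```python
import math
import numpy as np

# Sketch of Theorem 2.7 (Lindeberg-Levy CLT): simulate the standardized sum
# U_n = (sum X_i - n*mu) / (sigma * sqrt(n)) of iid Exp(1) variables and compare
# the empirical P(U_n <= u) with the standard normal cdf Phi(u).
rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                # mean and standard deviation of Exp(1)
n, reps = 200, 50_000               # sample size and number of simulated sums

x = rng.exponential(scale=1.0, size=(reps, n))
u_n = (x.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))

def phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

for u in (-1.0, 0.0, 0.626, 1.0, 2.0):
    print(f"u = {u:6.3f}   P(U_n <= u) ~ {np.mean(u_n <= u):.4f}   Phi(u) = {phi(u):.4f}")
```

Even though the summands are strongly skewed, the empirical probabilities already match $\Phi(u)$ closely for $n = 200$.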
Remark 2.9 The Moivre-Laplace theorem allows us to apply an approximate formula which replaces the onerous calculation of the binomial distribution function by a simple lookup in the tables of the standard normal distribution. Compare:

Exact calculation:
$$P(Y_n \le y) = \sum_{t=0}^{y} \binom{n}{t} \pi^t (1-\pi)^{n-t} \quad \dots \text{ time-consuming, onerous summation.}$$

Approximation by the normal distribution:
$$P(Y_n \le y) = P\left(\frac{Y_n - n\pi}{\sqrt{n\pi(1-\pi)}} \le \frac{y - n\pi}{\sqrt{n\pi(1-\pi)}}\right) \approx \Phi\left(\frac{y - n\pi}{\sqrt{n\pi(1-\pi)}}\right),$$
where $\Phi(u)$ is the tabulated standard normal distribution function. The approximation may be used if the following assumptions hold: $n\pi(1-\pi) > 9$ and $\frac{1}{n+1} < \pi < \frac{n}{n+1}$.

Example 2.10 Consider 100 independent tosses of a die. Find the probability that the number "six" appears at least 20 times.

Solution Let $Y_{100}$ stand for the random variable giving the number of obtained "sixes" in 100 tosses, $Y_{100} \sim Bi(100, \frac{1}{6})$. First we have to verify the assumptions for using the normal approximation: $n\pi(1-\pi) = 100 \cdot \frac{1}{6}(1 - \frac{1}{6}) = \frac{500}{36} > 9$ and $\frac{1}{101} < \frac{1}{6} < \frac{100}{101}$, hence both assumptions hold. So to estimate the wanted probability we can use the Moivre-Laplace theorem:
$$P(Y_{100} \ge 20) = 1 - P(Y_{100} < 20) = 1 - P(Y_{100} \le 19) = 1 - P\left(\frac{Y_{100} - \frac{100}{6}}{\sqrt{100 \cdot \frac{1}{6} \cdot \frac{5}{6}}} \le \frac{19 - \frac{100}{6}}{\sqrt{100 \cdot \frac{1}{6} \cdot \frac{5}{6}}}\right) \approx 1 - \Phi(0.626) = 1 - 0.73565 = 0.26435.$$
(The exact software calculation is 0.2198.)

There are some cases in which the approximation of the binomial distribution by the normal distribution is not convenient. Especially in the case of an extremely small probability of success, the approximation by the Poisson distribution fits better.

Theorem 2.11 Poisson Let $Y_1, Y_2, \dots$ be a random sequence of independent random variables, $Y_n \sim Bi(n, \pi_n)$, $n = 1, 2, \dots$, and $\lim_{n\to\infty} n\pi_n = \lambda$. Then the sequence $Y_1, Y_2, \dots, Y_n, \dots$ converges in distribution towards a random variable $Y \sim Po(\lambda)$, thus $Y_n \approx Po(\lambda)$. [The random variable $Y$ follows the Poisson distribution with parameter $\lambda = n\pi$, which can be used as an approximation to the binomial distribution.]

Remark 2.12 The Poisson theorem allows us to apply an approximate formula which replaces the onerous calculation of the binomial distribution (or probability) function by a simple lookup in the tables of the Poisson distribution (or probability) function:
$$P(Y_n \le y) = \sum_{t=0}^{y} \binom{n}{t} \pi^t (1-\pi)^{n-t} \approx F_n(y), \quad\text{where } F_n(y) \text{ is the Poisson distribution function with parameter } \lambda = n\pi,$$
$$P(Y_n = y) = \binom{n}{y} \pi^y (1-\pi)^{n-y} \approx \frac{(n\pi)^y}{y!}\, e^{-n\pi} \quad\text{(compare with the Poisson probability function).}$$
The approximation by the Poisson distribution may be used if the following assumptions hold: $n \ge 30$ and $\pi \le 0.1$.

Example 2.13 When testing the reliability of a particular device, it gets out of order with probability 0.05. Find the probability that when testing 100 identical devices there will be exactly 5 which get out of order.

Solution Let $Y_{100}$ stand for the random variable giving the number of defective devices in a sample of 100 independent tests, $Y_{100} \sim Bi(100, 0.05)$. First we have to verify the assumptions for using the Poisson approximation: $100 \ge 30$ and $0.05 \le 0.1$. Poisson approximation of the wanted probability:
$$P(Y_{100} = 5) \approx \frac{(100 \cdot 0.05)^5}{5!}\, e^{-100 \cdot 0.05},$$
which is the Poisson probability function with parameter $\lambda = 100 \cdot 0.05 = 5$ at the point 5, and it is tabulated. Thus $p_5(5) = 0.17547$. Exact calculation of the wanted probability:
$$P(Y_{100} = 5) = \binom{100}{5} \cdot 0.05^5 \cdot (1 - 0.05)^{95} = \dots = 0.18.$$
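The following short check (not part of the original notes) recomputes Examples 2.10 and 2.13 numerically, using only the Python standard library; the helper names binom_pmf and phi are introduced here for illustration.

```python
import math

def binom_pmf(n, p, k):
    """Exact binomial probability P(Y = k) for Y ~ Bi(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Example 2.10: P(Y >= 20) for Y ~ Bi(100, 1/6), exact vs. Moivre-Laplace approximation.
n, p = 100, 1 / 6
exact = sum(binom_pmf(n, p, k) for k in range(20, n + 1))
approx = 1.0 - phi((19 - n * p) / math.sqrt(n * p * (1 - p)))
print(f"P(Y100 >= 20): exact {exact:.4f}, normal approximation {approx:.4f}")

# Example 2.13: P(Y = 5) for Y ~ Bi(100, 0.05), exact vs. Poisson(lambda = n*p) approximation.
n, p = 100, 0.05
lam = n * p
exact = binom_pmf(n, p, 5)
poisson = lam**5 / math.factorial(5) * math.exp(-lam)
print(f"P(Y100 = 5): exact {exact:.5f}, Poisson approximation {poisson:.5f}")
```

The printed values reproduce the figures quoted in the examples: the exact binomial probabilities (about 0.2198 and 0.18) and their normal and Poisson approximations.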