Part VI. Basic techniques II: tail probability inequalities

Chapter 6. BASIC TECHNIQUES: CONCENTRATION BOUNDS

Some general, yet quite sharp, concentration bounds are derived in this chapter and their use is illustrated. In particular, we derive so-called tail probability bounds: bounds on the probability that the value of a random variable differs much from its mean.

At first we determine bounds for the random variable $X = \sum_{i=1}^{n} X_i$, where all $X_i$ are binary random variables with a Bernoulli distribution. That is, each $X_i$ can be seen as a coin toss with $\Pr[X_i = 1] = p_i$ and $\Pr[X_i = 0] = 1 - p_i$. Such coin tosses are also referred to as Poisson trials, and as Bernoulli trials if all $p_i$ are identical. (Observe that in the special case $p_1 = p_2 = \dots = p_n = p$ the variable $X$ has the binomial distribution.)

At the end we deal with special sequences of dependent random variables, called martingales, and with tail bounds for martingales, which will then also be applied to the occupancy problem.

BASIC PROBLEM and METHODS - I.

If we want to get tight bounds on how much the value of a random variable $X$ differs from its mean, a useful trick is to pick some non-negative function $f(X)$ such that (a) we can calculate $E[f(X)]$, and (b) $f$ grows slowly enough that only large values of $X$ produce huge values of $f(X)$. This way, by applying Markov's inequality to $f(X)$, we can get good probability bounds on huge deviations of $X$ from its mean.

BASIC PROBLEM and METHODS - II.

The above approach is often used to show that $X$ lies close to $E[X]$ with reasonably high probability. Of large importance is the case where $X$ is a sum of random variables. For the case that these random variables are independent we derive the so-called Chernoff bound. For the case that they are dependent, but form a so-called martingale, we get the so-called Azuma-Hoeffding bound.

BASIC PROBLEM of the ANALYSIS of RANDOMIZED ALGORITHMS

What is the probability that $X = \sum_{i=1}^{n} X_i$ deviates from its mean $E[X] = \mu = \sum_{i=1}^{n} p_i$ by a fixed factor? Namely, let $\delta > 0$:
(1) What is the probability that $X$ is larger than $(1 + \delta)\mu$?
(2) What is the probability that $X$ is smaller than $(1 - \delta)\mu$?

Notation: For a random variable $X$ and a fixed $t > 0$, $E[e^{tX}]$ is called the moment generating function of $X$:
\[ E[e^{tX}] = \sum_{k \ge 0} \frac{t^k E[X^k]}{k!}. \]
The very important Chernoff bounds on the sum of independent Poisson trials are obtained by considering the moment generating function of $X$.

CHERNOFF BOUNDS - I

Theorem: Let $X_1, X_2, \dots, X_n$ be independent Poisson trials such that, for $1 \le i \le n$, $\Pr[X_i = 1] = p_i$, where $0 < p_i < 1$. Then for $X = \sum_{i=1}^{n} X_i$, $\mu = E[X] = \sum_{i=1}^{n} p_i$, and any $\delta > 0$,
\[ \Pr[X > (1 + \delta)\mu] < \left( \frac{e^{\delta}}{(1 + \delta)^{1+\delta}} \right)^{\mu}. \quad (1) \]

Proof: For any $t \in \mathbb{R}_{>0}$,
\[ \Pr[X > (1 + \delta)\mu] = \Pr\big[e^{tX} > e^{t(1+\delta)\mu}\big]. \]
By applying Markov's inequality to the right-hand side we get
\[ \Pr[X > (1 + \delta)\mu] < \frac{E[e^{tX}]}{e^{t(1+\delta)\mu}} \]
(the inequality is strict). Observe that
\[ E[e^{tX}] = E\big[e^{t \sum_{i=1}^{n} X_i}\big] = E\left[\prod_{i=1}^{n} e^{tX_i}\right] = \prod_{i=1}^{n} E[e^{tX_i}], \]
and therefore
\[ \Pr[X > (1 + \delta)\mu] < \frac{\prod_{i=1}^{n} E[e^{tX_i}]}{e^{t(1+\delta)\mu}}. \]

CHERNOFF BOUNDS - II.

Since $E[e^{tX_i}] = p_i e^t + (1 - p_i)$, we have
\[ \Pr[X > (1 + \delta)\mu] < \frac{\prod_{i=1}^{n} \big(p_i e^t + 1 - p_i\big)}{e^{t(1+\delta)\mu}} = \frac{\prod_{i=1}^{n} \big(1 + p_i (e^t - 1)\big)}{e^{t(1+\delta)\mu}}. \]
By applying the inequality $1 + x < e^x$ with $x = p_i (e^t - 1)$,
\[ \Pr[X > (1 + \delta)\mu] < \frac{\prod_{i=1}^{n} e^{p_i (e^t - 1)}}{e^{t(1+\delta)\mu}} = \frac{e^{\sum_{i=1}^{n} p_i (e^t - 1)}}{e^{t(1+\delta)\mu}} = \frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}}. \]
Taking $t = \ln(1 + \delta)$ we get our Theorem (the basic Chernoff bound), that is,
\[ \Pr[X > (1 + \delta)\mu] < \left( \frac{e^{\delta}}{(1 + \delta)^{1+\delta}} \right)^{\mu}. \quad (2) \]
Observe the three tricks that have been used in the above proof!
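Before moving on, the bound (1) can be checked empirically. The following Python sketch (illustrative only; the parameters, sample sizes and function names are ours, not from the lecture) simulates sums of independent Poisson trials and compares the observed tail frequency with the bound:

```python
import math
import random

def chernoff_upper(mu, delta):
    # Right-hand side of bound (1): (e^delta / (1+delta)^(1+delta))^mu
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

def empirical_tail(ps, delta, runs=20_000):
    # Estimate Pr[X > (1+delta)*mu] for X = sum of independent Poisson trials
    mu = sum(ps)
    threshold = (1 + delta) * mu
    hits = sum(1 for _ in range(runs)
               if sum(1 for p in ps if random.random() < p) > threshold)
    return hits / runs

ps = [0.2] * 100                      # 100 trials, mu = 20
for delta in (0.25, 0.5, 1.0):
    print(delta, empirical_tail(ps, delta), chernoff_upper(sum(ps), delta))
```

The empirical frequency should stay below the bound; for small $\delta$ the bound is visibly loose, which is consistent with it being a worst-case estimate over all choices of the $p_i$.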
COROLLARIES

From the above Chernoff bound the following corollaries can be derived.

Corollary: Let $X_1, X_2, \dots, X_n$ be independent Poisson trials such that, for $1 \le i \le n$, $\Pr[X_i = 1] = p_i$, where $0 < p_i < 1$. Then for $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X] = \sum_{i=1}^{n} p_i$ it holds:
1. For $0 < \delta < 1.81$: $\Pr[X > (1 + \delta)\mu] \le e^{-\mu\delta^2/3}$.
2. For $0 \le \delta \le 4.11$: $\Pr[X \ge (1 + \delta)\mu] \le e^{-\mu\delta^2/4}$.
3. For $R \ge 6\mu$: $\Pr[X \ge R] \le 2^{-R}$. (3)

EXAMPLE I - SOCCER GAME OUTCOMES

Notation: $F^{+}(\mu, \delta) = \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}$ is the right-hand side of inequality (1) above.

Example: A soccer team STARS wins each game with probability 1/3. Assuming that the outcomes of different games are independent, we derive an upper bound on the probability that STARS win more than half out of $n$ games. Let
\[ X_i = \begin{cases} 1 & \text{if STARS win the } i\text{-th game,} \\ 0 & \text{otherwise,} \end{cases} \qquad Y_n = \sum_{i=1}^{n} X_i. \]
By applying the last theorem we get, for $\mu = n/3$ and $\delta = 1/2$,
\[ \Pr\left[Y_n > \frac{n}{2}\right] < F^{+}\left(\frac{n}{3}, \frac{1}{2}\right) < (0.965)^n, \]
which is exponentially small in $n$.

SECOND TYPE of CHERNOFF BOUNDS

The previous theorem bounds the probability that $X = \sum X_i$ deviates above its expectation $\mu$, i.e. it bounds $\Pr[X > (1 + \delta)\mu]$. The next theorem bounds the probability that $X$ deviates below its expectation $\mu$, i.e. it bounds $\Pr[X < (1 - \delta)\mu]$.

Theorem: Let $X_1, X_2, \dots, X_n$ be independent Poisson trials such that, for $1 \le i \le n$, $\Pr[X_i = 1] = p_i$, where $0 < p_i < 1$. Then for $X = \sum_{i=1}^{n} X_i$, $\mu = E[X] = \sum_{i=1}^{n} p_i$, and $0 < \delta \le 1$,
\[ \Pr[X < (1 - \delta)\mu] < e^{-\mu\delta^2/2}. \]

Proof: For any positive real $t$,
\[ \Pr[X < (1 - \delta)\mu] = \Pr[-X > -(1 - \delta)\mu] = \Pr\big[e^{-tX} > e^{-t(1-\delta)\mu}\big]. \]
By applying Markov's inequality,
\[ \Pr[X < (1 - \delta)\mu] < \frac{E[e^{-tX}]}{e^{-t(1-\delta)\mu}} = \frac{\prod_{i=1}^{n} E[e^{-tX_i}]}{e^{-t(1-\delta)\mu}} = \frac{\prod_{i=1}^{n} \big(p_i e^{-t} + 1 - p_i\big)}{e^{-t(1-\delta)\mu}} = \frac{\prod_{i=1}^{n} \big(1 + p_i (e^{-t} - 1)\big)}{e^{-t(1-\delta)\mu}}. \]
By applying the inequality $1 + x < e^x$ we get
\[ \Pr[X < (1 - \delta)\mu] < \frac{e^{\sum_{i=1}^{n} p_i (e^{-t} - 1)}}{e^{-t(1-\delta)\mu}} = \frac{e^{(e^{-t} - 1)\mu}}{e^{-t(1-\delta)\mu}}, \]
and if we take $t = \ln \frac{1}{1-\delta}$, then
\[ \Pr[X < (1 - \delta)\mu] < \left( \frac{e^{-\delta}}{(1 - \delta)^{1-\delta}} \right)^{\mu}. \quad (4) \]
Since $(1 - \delta)\ln(1 - \delta) \ge -\delta + \delta^2/2$ for $0 < \delta \le 1$, the bound (4) implies
\[ \Pr[X < (1 - \delta)\mu] < e^{-\mu\delta^2/2}. \]

Combining part 1 of the Corollary above with this lower-tail bound (note that $e^{-\mu\delta^2/2} \le e^{-\mu\delta^2/3}$), it follows:

Corollary: For $0 < \delta < 1$,
\[ \Pr(|X - \mu| \ge \delta\mu) \le 2e^{-\mu\delta^2/3}. \quad (5) \]

EXAMPLE - COIN TOSSING

Let $X$ be the number of heads in a sequence of $n$ independent fair coin flips. An application of the bound (5), for $\mu = n/2$ and $\delta = \sqrt{\frac{6 \ln n}{n}}$, gives
\[ \Pr\left[\left|X - \frac{n}{2}\right| \ge \frac{1}{2}\sqrt{6 n \ln n}\right] \le 2 e^{-\frac{1}{3} \cdot \frac{n}{2} \cdot \frac{6 \ln n}{n}} = \frac{2}{n}. \]
This implies that the concentration of the number of heads around the mean $n/2$ is very tight: with high probability the deviations from the mean are on the order of $O(\sqrt{n \ln n})$.
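A quick simulation (our illustrative sketch, not part of the lecture; $n$ and the number of runs are arbitrary) shows how rarely the deviation $\frac{1}{2}\sqrt{6 n \ln n}$ from the coin-tossing example is exceeded:

```python
import math
import random

n, runs = 10_000, 300
dev = 0.5 * math.sqrt(6 * n * math.log(n))    # threshold from the bound above
exceed = 0
for _ in range(runs):
    heads = sum(random.getrandbits(1) for _ in range(n))
    if abs(heads - n / 2) >= dev:
        exceed += 1
print("observed:", exceed / runs, " guaranteed bound:", 2 / n)
```

For $n = 10000$ the threshold is about 372 while the standard deviation of the count is only 50 (more than seven standard deviations), so exceeding it is essentially impossible; the guaranteed bound $2/n = 0.0002$ is itself already tiny.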
CHEBYSHEV versus CHERNOFF

Let $X$ be again the number of heads in a sequence of $n$ independent fair coin flips. Let us consider the probability of having either more than $3n/4$ or fewer than $n/4$ heads. Chebyshev's inequality gives us
\[ \Pr\left[\left|X - \frac{n}{2}\right| \ge \frac{n}{4}\right] \le \frac{4}{n}. \]
On the other side, using the Chernoff bound (5) we have
\[ \Pr\left[\left|X - \frac{n}{2}\right| \ge \frac{n}{4}\right] \le 2 e^{-\frac{1}{3} \cdot \frac{n}{2} \cdot \frac{1}{4}} = 2 e^{-n/24}. \]
Chernoff's method therefore gives an exponentially smaller upper bound than the one obtained using Chebyshev's inequality.

SOCCER GAMES REVISITED

Notation (the lower-tail bound function): $F^{-}(\mu, \delta) = e^{-\mu\delta^2/2}$.

Example: Assume that the probability that the STARS team wins a game is 3/4. What is the probability that in $n$ games STARS lose more than $n/2$ games? In such a case $\mu = 0.75n$, $\delta = 1/3$, and for $Y_n = \sum_{i=1}^{n} X_i$ we have
\[ \Pr\left[Y_n < \frac{n}{2}\right] < F^{-}\left(0.75n, \frac{1}{3}\right) < (0.9592)^n, \]
and therefore the probability decreases exponentially fast in $n$.

TWO-SIDED BOUNDS

By combining the two previous bounds we get
\[ \Pr[|X - \mu| \ge \delta\mu] \le 2 e^{-\mu\delta^2/3}, \]
and if we want this bound to be at most some $\varepsilon$, then we get
\[ \Pr\left[|X - \mu| \ge \sqrt{3\mu \ln(2/\varepsilon)}\right] \le \varepsilon, \]
provided $\varepsilon \ge 2 e^{-\mu/3}$ (so that the corresponding $\delta$ is at most 1).

Proof: If $\varepsilon = 2 e^{-\mu\delta^2/3}$, then
\[ \sqrt{3\mu \ln(2/\varepsilon)} = \sqrt{3\mu \ln\big(e^{\mu\delta^2/3}\big)} = \sqrt{3\mu \cdot \mu\delta^2/3} = \sqrt{\mu^2\delta^2} = \mu\delta. \]

NEW QUESTION

New question: Given $\varepsilon$, how large does $\delta$ have to be in order that $\Pr[X > (1 + \delta)\mu] < \varepsilon$? To deal with such and related questions, the following definitions/notations are introduced.

Definition: $\Delta^{+}(\mu, \varepsilon)$ is the number such that $F^{+}(\mu, \Delta^{+}(\mu, \varepsilon)) = \varepsilon$; $\Delta^{-}(\mu, \varepsilon)$ is the number such that $F^{-}(\mu, \Delta^{-}(\mu, \varepsilon)) = \varepsilon$.

In other words, a deviation of $\delta = \Delta^{+}(\mu, \varepsilon)$ suffices to keep $\Pr[X > (1 + \delta)\mu]$ below $\varepsilon$ (irrespective of the values of $n$ and the $p_i$'s).

EXAMPLE and ESTIMATIONS

There is a way to derive $\Delta^{-}(\mu, \varepsilon)$ explicitly. Indeed, by taking the inequality $\Pr[X < (1 - \delta)\mu] < e^{-\mu\delta^2/2}$ and setting $e^{-\mu\delta^2/2} = \varepsilon$ we get
\[ \Delta^{-}(\mu, \varepsilon) = \sqrt{\frac{2 \ln(1/\varepsilon)}{\mu}}, \quad (6) \]
because $\Delta^{-}(\mu, \varepsilon)$ is the number such that $F^{-}(\mu, \Delta^{-}(\mu, \varepsilon)) = \varepsilon$.

Example: Let $p_i = 0.75$. How large must $\delta$ be so that $\Pr[X < (1 - \delta)\mu] < n^{-5}$? From (6) it follows
\[ \delta = \Delta^{-}(0.75n, n^{-5}) = \sqrt{\frac{10 \ln n}{0.75n}} = \sqrt{\frac{13.3 \ln n}{n}}. \]

SOME OTHER USEFUL ESTIMATIONS

1. $F^{+}(\mu, \delta) < \left[e/(1 + \delta)\right]^{(1+\delta)\mu}$.
2. If $\delta > 2e - 1$, then $F^{+}(\mu, \delta) < 2^{-(1+\delta)\mu}$.
3. $\Delta^{+}(\mu, \varepsilon) < \frac{\lg(1/\varepsilon)}{\mu} - 1$.
4. If $\delta \le 2e - 1$, then $F^{+}(\mu, \delta) < e^{-\mu\delta^2/4}$ and $\Delta^{+}(\mu, \varepsilon) < \sqrt{\frac{4 \ln(1/\varepsilon)}{\mu}}$.

SUMMARY of NOTATION

Let us summarize the basic relations concerning the values
\[ F^{+}(\mu, \delta) = \left( \frac{e^{\delta}}{(1 + \delta)^{1+\delta}} \right)^{\mu} \quad \text{and} \quad F^{-}(\mu, \delta) = e^{-\mu\delta^2/2}, \]
as well as $\Delta^{+}(\mu, \varepsilon)$ and $\Delta^{-}(\mu, \varepsilon)$. It holds:
\[ \Pr[X > (1 + \delta)\mu] < F^{+}(\mu, \delta), \qquad \Pr[X < (1 - \delta)\mu] < F^{-}(\mu, \delta), \]
\[ \Pr\big[X > (1 + \Delta^{+}(\mu, \varepsilon))\mu\big] < F^{+}(\mu, \Delta^{+}(\mu, \varepsilon)) = \varepsilon, \]
\[ \Pr\big[X < (1 - \Delta^{-}(\mu, \varepsilon))\mu\big] < F^{-}(\mu, \Delta^{-}(\mu, \varepsilon)) = \varepsilon. \]
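All four quantities are straightforward to compute numerically. Below is a small helper sketch (the function names are ours); $\Delta^{-}$ has the closed form (6), while $\Delta^{+}$ is found by bisection, using the fact that $F^{+}(\mu, \cdot)$ is strictly decreasing for $\delta > 0$:

```python
import math

def F_plus(mu, delta):
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

def F_minus(mu, delta):
    return math.exp(-mu * delta * delta / 2)

def Delta_minus(mu, eps):
    # Closed form (6): solve F_minus(mu, delta) = eps for delta
    return math.sqrt(2 * math.log(1 / eps) / mu)

def Delta_plus(mu, eps, tol=1e-12):
    # Solve F_plus(mu, delta) = eps by bisection; F_plus decreases in delta
    lo, hi = 0.0, 1.0
    while F_plus(mu, hi) > eps:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F_plus(mu, mid) > eps else (lo, mid)
    return hi

mu, eps = 20.0, 1e-3
print("Delta+ =", Delta_plus(mu, eps), " Delta- =", Delta_minus(mu, eps))
```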
EXAMPLE 2 - MONTE CARLO METHOD - I

In this example we illustrate how the Chernoff bound helps us show that a simple Monte Carlo algorithm can be used to approximate the number π through sampling. The term Monte Carlo method refers to a broad collection of tools for estimating various values through sampling and simulation. Monte Carlo methods are used extensively in all areas of the physical sciences and in technology.

MONTE CARLO ESTIMATION OF π - I.

Let $(X, Y)$ be a point chosen randomly in the $2 \times 2$ square centered at $(0, 0)$. This is equivalent to choosing $X$ and $Y$ independently and uniformly from the interval $[-1, 1]$. Let $Z$ be the random variable that has value 1 if the point $(X, Y)$ lies in the circle of radius 1 centered at $(0, 0)$, and value 0 otherwise. Clearly,
\[ \Pr(Z = 1) = \frac{\pi}{4}. \]
If we perform such an experiment $m$ times, let $Z_i$ be the value of $Z$ at the $i$-th run and $W = \sum_{i=1}^{m} Z_i$, then
\[ E[W] = E\left[\sum_{i=1}^{m} Z_i\right] = \sum_{i=1}^{m} E[Z_i] = \frac{m\pi}{4}, \]
and therefore $W' = (4/m)W$ is a natural estimator of π.

MONTE CARLO ESTIMATION OF π - II.

How good is this estimator? Since $E[W] = m\pi/4$, an application of the two-sided Chernoff bound (5), i.e. of
\[ \Pr(|X - \mu| \ge \delta\mu) \le 2 e^{-\mu\delta^2/3} \quad \text{for } 0 < \delta < 1, \]
gives
\[ \Pr(|W' - \pi| \ge \varepsilon\pi) = \Pr\left(\left|W - \frac{m\pi}{4}\right| \ge \frac{\varepsilon m\pi}{4}\right) = \Pr(|W - E[W]| \ge \varepsilon E[W]) \le 2 e^{-m\pi\varepsilon^2/12}. \]
Therefore, by taking $m$ sufficiently large, we can get an arbitrarily good approximation of π with arbitrarily high probability.
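A direct implementation of this estimator (a minimal sketch; the sample size m and the accuracy eps are arbitrary choices of ours):

```python
import math
import random

def estimate_pi(m):
    # W counts random points of the square [-1,1]^2 falling into the unit circle
    w = sum(1 for _ in range(m)
            if random.uniform(-1, 1) ** 2 + random.uniform(-1, 1) ** 2 <= 1)
    return 4 * w / m                      # the estimator W' = (4/m) * W

m, eps = 100_000, 0.01
print("estimate:", estimate_pi(m))
print("Pr[relative error >=", eps, "] <=",
      2 * math.exp(-m * math.pi * eps ** 2 / 12))
```

For these values the Chernoff guarantee is about 0.15; quadrupling m squares it down, while the typical observed error is much smaller than the guarantee.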
A CASE STUDY - ROUTING ON HYPERCUBES

Networks are modeled by graphs: processors are represented by nodes and communication links by edges. We assume the principle of synchronous communication: in each step, each link can carry one packet $(i, X, d(i))$, where $i$ is the source node, $X$ is the data and $d(i)$ is the destination node.

Permutation routing on an $n$-processor network: the nodes are $1, 2, \dots, n$; node $i$ wants to send a packet to node $d(i)$, where $d(1), d(2), \dots, d(n)$ is a permutation of $1, 2, \dots, n$.

Problem: How many steps are necessary and sufficient to route an arbitrary permutation?

A route for a packet is a sequence of edges the packet has to follow from its source to its destination. A routing algorithm for the permutation routing problem has to specify a route for each packet. A packet may occasionally have to wait at a node because the next edge on its route is "busy", transmitting another packet at that moment. We assume each node contains one queue for each edge. A routing algorithm must therefore also specify a queueing discipline.

OBLIVIOUS ROUTING ALGORITHMS

Oblivious routing algorithms are routing algorithms in which the route followed by a packet from a source node $i$ to a destination $d(i)$ depends on $i$ and $d(i)$ only (and not on the other $d(j)$, $j \ne i$). The following theorem gives a limit on the performance of oblivious algorithms.

Theorem: For any deterministic oblivious permutation routing algorithm on a network of $n$ nodes, each of out-degree $d$, there is an instance of permutation routing requiring $\Omega\big(\sqrt{n/d}\big)$ steps.

Example: Consider a $d$-dimensional hypercube $H_d$ and the left-to-right (bit-fixing) routing: any packet with destination node $d(i)$ is sent from its current node $n_i$ to the node $n_j$ such that the binary representation of $n_j$ differs from the binary representation of $n_i$ in the leftmost bit in which $n_i$ and $d(i)$ differ.

Example: Consider the permutation routing in $H_{10}$ given by the "reverse" mapping $b_1 \dots b_{10} \to b_{10} \dots b_1$. Observe that if the left-to-right routing strategy is used, then all packets from the nodes $b_1 b_2 b_3 b_4 b_5 00000$ have to go through the node $0000000000$. Since $2^5$ packets must pass through a single node of out-degree 10, left-to-right routing on the hypercube $H_d$ sometimes requires $\Omega\big(\sqrt{2^d}/d\big)$ steps.

RANDOMIZED ROUTING

We now show a randomized (oblivious) routing algorithm whose expected number of steps is asymptotically far smaller than $\sqrt{2^d}/d$. Notation: $N = 2^d$.

Phase 1: Pick for each packet $v_i$ a random intermediate destination $\sigma(i)$ from $\{1, \dots, N\}$, and let the packet $v_i$ travel first to the node $\sigma(i)$.
Phase 2: Let the packet $v_i$ travel next from $\sigma(i)$ to its final destination $d(i)$.
(In both phases the deterministic left-to-right oblivious routing is used.)

Queueing discipline: FIFO for each edge. (Actually, any queueing discipline is good, provided that at each step at which a queue is nonempty, some packet from it is ready to travel.)
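The left-to-right (bit-fixing) routes used in both phases are easy to generate. The sketch below (ours; nodes are encoded as integers whose binary representations are read left to right, highest bit first) prints the nodes a packet visits during both phases:

```python
import random

def bit_fix_route(src, dst, d):
    # Left-to-right routing on H_d: repeatedly flip the leftmost bit
    # (highest position) in which the current node differs from dst.
    route, cur = [src], src
    for pos in reversed(range(d)):
        if (cur ^ dst) & (1 << pos):
            cur ^= 1 << pos
            route.append(cur)
    return route

d = 10
src, dst = 0b1101100000, 0b0000011011      # the "reverse" mapping example
sigma = random.randrange(2 ** d)            # Phase 1: random intermediate node
full = bit_fix_route(src, sigma, d) + bit_fix_route(sigma, dst, d)[1:]
print([format(v, "010b") for v in full])
```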
Question: How many steps are needed before a packet $v_i$ reaches its destination? (Let us consider at first only Phase 1.) Let $\rho_i$ denote the route of the packet $v_i$. It clearly holds: the number of steps taken by $v_i$ is equal to the length of $\rho_i$, which is at most $d$, plus the number of steps for which $v_i$ is queued at intermediate nodes of $\rho_i$.

Fact: For any two packets $v_i, v_j$ there is at most one queue $q$ such that $v_i$ and $v_j$ are both in the queue $q$ at the same time.

Lemma: Let the route of a packet $v_i$ follow the sequence of edges $\rho_i = (e_1, e_2, \dots, e_k)$. Let $S$ be the set of packets (other than $v_i$) whose routes pass through at least one of the edges $\{e_1, \dots, e_k\}$. Then the delay incurred by the packet $v_i$ is at most $|S|$.

Proof: A packet in $S$ is said to leave $\rho_i$ at that time step at which it traverses an edge of $\rho_i$ for the last time. If a packet is ready to follow an edge $e_j$ at time $t$, we define its lag at time $t$ to be $t - j$. Clearly, the lag of $v_i$ is initially 0, and the total delay of $v_i$ is its lag when it traverses the last edge $e_k$ of the route $\rho_i$. We show now that at each step at which the lag of $v_i$ increases by 1, the increase can be charged to a distinct member of $S$.

If the lag of $v_i$ reaches a value $l + 1$, some packet in $S$ leaves $\rho_i$ with lag $l$. (When the lag of $v_i$ increases from $l$ to $l + 1$, there must be at least one packet from $S$ that wishes to traverse the same edge as $v_i$ at that time step; thus $S$ contains at least one packet whose lag is $l$.) Let $t'$ be the last step at which any packet in $S$ has lag $l$; then there is a packet $v \in S$ ready to follow an edge $e_{j'}$ at time $t' = l + j'$. We show that some packet of $S$ leaves $\rho_i$ at $t'$; this establishes the Lemma by the Fact above. Since $v$ is ready to follow $e_{j'}$ at $t'$, some packet $\omega$ (which may be $v$ itself) in $S$ follows the edge $e_{j'}$ at $t'$. Now $\omega$ has to leave $\rho_i$ at $t'$; otherwise $\omega$ would be ready to follow the edge $e_{j'+1}$ at time $t' + 1$ with lag $l$, contradicting the choice of $t'$. We charge to $\omega$ the increase in the lag of $v_i$ from $l$ to $l + 1$; since $\omega$ leaves $\rho_i$, it will never be charged again. Thus, each member of $S$ is charged for at most one delay, which proves the lemma.

PROOF CONTINUATION - I.

Let $H_{ij}$ be the random variable defined as follows:
\[ H_{ij} = \begin{cases} 1 & \text{if the routes } \rho_i \text{ and } \rho_j \text{ share at least one edge,} \\ 0 & \text{otherwise.} \end{cases} \]
The total delay incurred by a packet $v_i$ is at most $\sum_{j=1}^{N} H_{ij}$. Since the routes of different packets are chosen independently and randomly, the $H_{ij}$'s are independent Poisson trials for $j \ne i$. Thus, to bound the delay of the packet $v_i$ from above using the Chernoff bound, it suffices to obtain an upper bound on $E\big[\sum_{j=1}^{N} H_{ij}\big]$.

For an edge $e$ of the hypercube, let the random variable $T(e)$ denote the number of routes that pass through $e$. Fix any route $\rho_i = (e_{i,1}, e_{i,2}, \dots, e_{i,k})$, $k \le d$. Then
\[ \sum_{j=1}^{N} H_{ij} \le \sum_{j=1}^{k} T(e_{i,j}) \implies E\left[\sum_{j=1}^{N} H_{ij}\right] \le \sum_{j=1}^{k} E[T(e_{i,j})]. \]

PROOF CONTINUATION - II.

It can be shown that $E[T(e)]$ is the same for every edge $e$ of the hypercube. The expected length of a route $\rho_i$ is $d/2$, so the expected total route length, summed over all packets, is $Nd/2$. The number of (directed) edges of the hypercube is $Nd$, and therefore
\[ E[T(e_{i,j})] \le \frac{Nd/2}{Nd} = \frac{1}{2} \quad \text{for all } i, j. \]
Therefore
\[ E\left[\sum_{j=1}^{N} H_{ij}\right] \le \frac{k}{2} \le \frac{d}{2}. \]
By the Chernoff bound for $\delta > 2e - 1$ (estimation 2 above), $\Pr[X > (1 + \delta)\mu] < 2^{-(1+\delta)\mu}$, applied with $X = \sum_{j=1}^{N} H_{ij}$, $\delta = 11$ and $\mu = d/2$, we get that the probability that $\sum_{j=1}^{N} H_{ij}$ exceeds $6d$ is less than $2^{-6d}$. The total number of packets is $N = 2^d$; hence the probability that any of the $N$ packets experiences a delay exceeding $6d$ is less than $2^d \cdot 2^{-6d} = 2^{-5d}$.

PROOF CONTINUATION - III.

By adding the length of the route (at most $d$) to the delay we get:

Theorem: With probability at least $1 - 2^{-5d}$, every packet reaches its intermediate destination in Phase 1 in $7d$ or fewer steps.

The routing scheme for Phase 2 can be seen as the scheme for Phase 1 "run backwards". Therefore the probability that any packet fails to reach its target in either phase is less than $2 \cdot 2^{-5d}$. To summarize:

Theorem: With probability at least $1 - 2 \cdot 2^{-5d}$, every packet reaches its destination in $14d$ or fewer steps.
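The key quantity $\sum_j H_{ij}$ can also be examined empirically. The following self-contained sketch (ours, for illustration; the dimension and trial count are arbitrary) draws random Phase 1 routes, counts how many of them share an edge with the route of packet 0, and compares the observed mean with the bound $d/2$:

```python
import random

def route_edges(src, dst, d):
    # Edge set of the left-to-right bit-fixing route from src to dst
    edges, cur = set(), src
    for pos in reversed(range(d)):
        if (cur ^ dst) & (1 << pos):
            edges.add((cur, cur ^ (1 << pos)))
            cur ^= 1 << pos
    return edges

d, N, trials = 8, 2 ** 8, 500
total = 0
for _ in range(trials):
    routes = [route_edges(i, random.randrange(N), d) for i in range(N)]
    # H_0j = 1 iff route j shares an edge with route 0
    total += sum(1 for j in range(1, N) if not routes[0].isdisjoint(routes[j]))
print("empirical E[sum_j H_0j]:", total / trials, " bound d/2 =", d / 2)
```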
WIRING PROBLEM - I.

Global wiring in gate arrays. A gate array is a $\sqrt{n} \times \sqrt{n}$ array of $n$ gates. A net is a pair of gates to be connected by a wire. In Manhattan wiring, wires can run vertically and horizontally only.

Global wiring problem I ($GWP_W$): given a set of nets and an integer $W$, we need to specify, if possible, the set of gates each wire should pass through in connecting its end-points, in such a way that no more than $W$ nets pass through any boundary between adjacent gates.

Global wiring problem II: For a boundary $b$ between two gates in the array, let $W_S(b)$ be the number of wires that pass through $b$ in a solution $S$ to the global wiring problem. Notation: $W_S = \max_b W_S(b)$. Goal: to find $S$ such that $W_S$ is minimal.

WIRING PROBLEM - II.

We will consider only so-called one-bend Manhattan routing, in which the direction of a wire is changed at most once. Problem: how to decide, for each net, which of the two one-bend connections to use (first horizontally and then vertically, or first vertically and then horizontally) in order to get a wiring $S$ with minimal $W_S$.

REFORMULATION of the WIRING PROBLEM

The global wiring problem can be reformulated as a 0-1 linear programming problem. For the net $i$ we use two binary variables $x_{i0}, x_{i1}$:
$x_{i0} = 1$ if and only if the $i$-th route runs first horizontally and then vertically;
$x_{i1} = 1$ if and only if the $i$-th route runs first vertically and then horizontally.

Notation: $T_{b0} = \{ i \mid \text{net } i \text{ passes through } b \text{ when } x_{i0} = 1 \}$ and $T_{b1} = \{ i \mid \text{net } i \text{ passes through } b \text{ when } x_{i1} = 1 \}$.

The corresponding 0-1 linear programming problem: minimize $W$, where
\[ x_{i0}, x_{i1} \in \{0, 1\} \quad \text{for each net } i, \quad (7) \]
\[ x_{i0} + x_{i1} = 1 \quad \text{for each net } i, \quad (8) \]
\[ \sum_{i \in T_{b0}} x_{i0} + \sum_{i \in T_{b1}} x_{i1} \le W \quad \text{for all } b. \quad (9) \]
The last condition requires that at most $W$ wires cross the boundary $b$. Denote by $W_0$ the minimum obtained this way.

TRICK - I.

1. Solve the corresponding rational linear programming problem with $x_{i0}, x_{i1} \in [0, 1]$ instead of (7). This trick is called linear relaxation. Denote by $\hat{x}_{i0}, \hat{x}_{i1}$, $1 \le i \le n$, the solutions of the above rational linear programming problem, and let $\hat{W}$ be the value of the objective function for this solution. Obviously, $W_0 \ge \hat{W}$.

2. Apply the technique called randomized rounding: independently for each $i$, set $x_{i0}$ to 1 (and $x_{i1}$ to 0) with probability $\hat{x}_{i0}$, and set $x_{i0}$ to 0 (and $x_{i1}$ to 1) with probability $\hat{x}_{i1} = 1 - \hat{x}_{i0}$.

The idea of randomized rounding is to interpret the fractional solutions provided by the linear program as probabilities for the rounding process.

TRICK - II.

A nice property of randomized rounding is that if the fractional value of a variable is close to 0 (or to 1), then this variable is likely to be set to 0 (or to 1, respectively).

Theorem: If $0 < \varepsilon < 1$, then with probability at least $1 - \varepsilon$ the global wiring $S$ produced by randomized rounding satisfies the inequalities
\[ W_S \le \hat{W}\left(1 + \Delta^{+}\left(\hat{W}, \frac{\varepsilon}{2n}\right)\right) \le W_0\left(1 + \Delta^{+}\left(W_0, \frac{\varepsilon}{2n}\right)\right). \]

Proof: We show that, following the rounding process, with probability at least $1 - \varepsilon$ no more than $\hat{W}\big(1 + \Delta^{+}\big(\hat{W}, \frac{\varepsilon}{2n}\big)\big)$ wires pass through any boundary. This will be done by showing, for any boundary $b$, that the probability that $W_S(b) > \hat{W}\big(1 + \Delta^{+}\big(\hat{W}, \frac{\varepsilon}{2n}\big)\big)$ is at most $\frac{\varepsilon}{2n}$. Since a $\sqrt{n} \times \sqrt{n}$ array has at most $2n$ boundaries, one then sums this probability of failure over all boundaries $b$ to get an upper bound of $\varepsilon$ on the overall failure probability.

TRICK - III.

Let $b$ be a boundary. The solution of the rational linear program satisfies its constraints, therefore we have
\[ \sum_{i \in T_{b0}} \hat{x}_{i0} + \sum_{i \in T_{b1}} \hat{x}_{i1} \le \hat{W}. \]
The number of wires passing through $b$ in the solution $S$ is
\[ W_S(b) = \sum_{i \in T_{b0}} x_{i0} + \sum_{i \in T_{b1}} x_{i1}. \]
The variables $x_{i0}$ and $x_{i1}$ are Poisson trials with probabilities $\hat{x}_{i0}$ and $\hat{x}_{i1}$; in addition, $x_{i0}$ and $x_{i1}$ are each independent of $x_{j0}$ and $x_{j1}$ for $i \ne j$. Therefore $W_S(b)$ is a sum of independent Poisson trials, and
\[ E[W_S(b)] = \sum_{i \in T_{b0}} E[x_{i0}] + \sum_{i \in T_{b1}} E[x_{i1}] = \sum_{i \in T_{b0}} \hat{x}_{i0} + \sum_{i \in T_{b1}} \hat{x}_{i1} \le \hat{W}. \]
Since $\Delta^{+}\big(\hat{W}, \frac{\varepsilon}{2n}\big)$ is such that
\[ \Pr\left[W_S(b) > \hat{W}\left(1 + \Delta^{+}\left(\hat{W}, \frac{\varepsilon}{2n}\right)\right)\right] \le \frac{\varepsilon}{2n}, \]
the theorem follows.
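A minimal sketch of the rounding step itself (ours; it assumes the fractional values $\hat{x}_{i0}$ are already available from an LP solver, which is not shown):

```python
import random

def randomized_rounding(x_hat):
    # x_hat[i] is the fractional LP value of x_i0 for net i.
    # Choosing route 0 with probability x_hat[i] gives E[x_i0] = x_hat[i],
    # so each boundary load W_S(b) becomes a sum of independent Poisson
    # trials whose mean equals the fractional load, as used in TRICK - III.
    return [0 if random.random() < p else 1 for p in x_hat]

x_hat = [0.9, 0.5, 0.1, 0.7]        # hypothetical fractional solution
print(randomized_rounding(x_hat))    # e.g. [0, 1, 1, 0]
```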
HOEFFDING INEQUALITY

The problem with the Chernoff bounds above is that they work only for 0-1 random variables. Hoeffding's inequality is another concentration bound based on moment generating functions; it applies to any sum of bounded independent random variables with mean 0.

Theorem: Let $X_1, \dots, X_n$ be independent random variables with $E[X_i] = 0$ and $|X_i| \le c_i$ for all $i$. Then for all $t > 0$,
\[ \Pr\left[\sum_{i=1}^{n} X_i \ge t\right] \le e^{-\frac{t^2}{2 \sum_{i=1}^{n} c_i^2}}. \]
In the case that the $X_i$ are dependent but form a so-called martingale, Hoeffding's inequality can be generalized, and we get the so-called Azuma-Hoeffding inequality.

MARTINGALES

Martingales are very special sequences of random variables that arise in numerous applications, such as gambling problems or random walks. These sequences have various interesting properties, and for them powerful techniques exist to derive special Chernoff-like tail bounds. Martingales can be very useful in showing that the values of a random variable $V$ are sharply concentrated around its expectation $E[V]$.

Martingales originally referred to systems of betting in which a player increases his stake (usually by doubling) each time he loses a bet. For the analysis of randomized algorithms it is of large importance that, as a general rule of thumb, most things that work for sums of independent random variables also work for martingales.

MARTINGALES - MAIN DEFINITION

Definition: A sequence of random variables $Z_0, Z_1, \dots$ is a martingale with respect to a sequence $X_0, X_1, \dots$ if, for all $n \ge 0$, the following conditions hold:
1. $Z_n$ is a function of $X_0, X_1, \dots, X_n$;
2. $E[|Z_n|] < \infty$;
3. $E[Z_{n+1} \mid X_0, \dots, X_n] = Z_n$.
A sequence of random variables $Z_0, Z_1, \dots$ is called a martingale if it is a martingale with respect to itself, that is, if $E[|Z_n|] < \infty$ and $E[Z_{n+1} \mid Z_0, \dots, Z_n] = Z_n$.

EXAMPLE

Let us have a gambler who plays a sequence of fair games. Let $X_i$ be the amount the gambler wins in the $i$-th game (negative if he loses), and let $Z_i$ be the gambler's total winnings at the end of the $i$-th game. Because each game is fair, we have $E[X_i] = 0$ and
\[ E[Z_{i+1} \mid X_1, X_2, \dots, X_i] = Z_i + E[X_{i+1}] = Z_i. \]
Thus $Z_1, Z_2, \dots, Z_n$ is a martingale with respect to the sequence $X_1, X_2, \dots, X_n$.

DOOB MARTINGALES

A Doob martingale is a martingale constructed using the following general scheme. Let $X_0, X_1, \dots, X_n$ be a sequence of random variables, and let $Y$ be another random variable with $E[|Y|] < \infty$. Then the sequence
\[ Z_i = E[Y \mid X_0, \dots, X_i], \quad i = 0, 1, \dots, n, \]
is a martingale with respect to $X_0, X_1, \dots, X_n$, since
\[ E[Z_{i+1} \mid X_0, \dots, X_i] = E\big[E[Y \mid X_0, \dots, X_{i+1}] \mid X_0, \dots, X_i\big] = E[Y \mid X_0, \dots, X_i] = Z_i. \]
Here we have used the fact that $E[V \mid W] = E[E[V \mid U, W] \mid W]$ for any three random variables $U, V, W$.

REMINDER - CONDITIONAL EXPECTATION

Definition: It is natural and useful to define the conditional expectation of a random variable $Y$ conditioned on an event $E$ by
\[ E[Y \mid E] = \sum_y y \Pr(Y = y \mid E). \]

Example: Let us roll two fair dice independently; let $X_i$ be the number that shows on the $i$-th die and let $X$ be the sum of the numbers on both dice. Then
\[ E[X \mid X_1 = 3] = \sum_x x \Pr(X = x \mid X_1 = 3) = \sum_{x=4}^{9} x \cdot \frac{1}{6} = \frac{13}{2}, \]
\[ E[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{\Pr(X_1 = x \cap X = 5)}{\Pr(X = 5)} = \frac{5}{2}. \]

Definition: For two random variables $Y$ and $Z$, $E[Y \mid Z]$ is defined to be the random variable $f(Z)$ that takes on the value $E[Y \mid Z = z]$ when $Z = z$.

Theorem: For any random variables $Y, Z$ it holds that $E[Y] = E[E[Y \mid Z]]$.

A USEFUL FACT

For random variables $X, Y$ it holds that
\[ E[E[X \mid Y]] = E[X]. \]
In words: what you expect to expect $X$ to be after learning $Y$ is the same as what you now expect $X$ to be.

Proof:
\[ E[X \mid Y = y] = \sum_x x \Pr[X = x \mid Y = y] = \sum_x x \frac{\Pr[x, y]}{\Pr_Y[y]}, \]
and therefore
\[ E[E[X \mid Y]] = \sum_y \Pr_Y[y] \sum_x x \frac{\Pr[x, y]}{\Pr_Y[y]} = \sum_x \sum_y x \Pr[x, y] = E[X]. \]
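The dice computations and the tower property are easy to verify by brute-force enumeration (an illustrative sketch of ours):

```python
from fractions import Fraction
from itertools import product

dice = list(product(range(1, 7), repeat=2))      # all 36 equally likely rolls

def cond_exp(value, event):
    # E[value | event] by direct enumeration over equally likely outcomes
    hits = [o for o in dice if event(o)]
    return Fraction(sum(value(o) for o in hits), len(hits))

print(cond_exp(lambda o: o[0] + o[1], lambda o: o[0] == 3))     # 13/2
print(cond_exp(lambda o: o[0], lambda o: o[0] + o[1] == 5))     # 5/2

# Tower property E[E[X|Y]] = E[X] with X = sum of dice, Y = first die:
tower = sum(cond_exp(lambda o: o[0] + o[1], lambda o, y=y: o[0] == y)
            for y in range(1, 7)) / 6
print(tower)                                                     # 7 = E[X]
```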
EXAMPLE - "EDGE EXPOSURE" MARTINGALE

Let $G_{n,p}$ be a random graph, let its $m$ possible edges be labeled in some arbitrary order, and let
\[ X_j = \begin{cases} 1 & \text{if there is an edge of } G_{n,p} \text{ in the } j\text{-th edge slot,} \\ 0 & \text{otherwise.} \end{cases} \]
Consider now any finite-valued function $F$ over graphs; for example, let $F(G)$ be the size of the largest independent set in $G$. Let
\[ Z_0 = E[F(G)] \quad \text{and} \quad Z_i = E[F(G) \mid X_1, \dots, X_i], \quad i = 1, \dots, m. \]
Then the sequence $Z_0, Z_1, \dots, Z_m$ is a Doob martingale; it represents the conditional expectation of $F(G)$ as the presence or absence of each edge is revealed, one edge at a time.

STOPPING TIME

A stopping time corresponds to a strategy for stopping a sequence of steps (say, of gambles) that is based only on the outcomes seen so far. Examples of rules for which the decision to stop gambling is a stopping time:
1. the first time the gambler wins 5 games in a row;
2. the first time the gambler has either won or lost 1000 dollars;
3. the first time the gambler wins 4 games in a row.
The rule "the last time the gambler wins 4 games in a row" is not a stopping time, since deciding it requires knowledge of future outcomes.

MARTINGALE STOPPING THEOREM

Theorem: If $Z_0, Z_1, \dots$ is a martingale with respect to $X_1, X_2, \dots$ and $T$ is a stopping time for $X_1, X_2, \dots$, then
\[ E[Z_T] = E[Z_0] \]
whenever one of the following conditions holds:
1. there is a constant $c$ such that $|Z_i| \le c$ for all $i$, that is, the $Z_i$ are bounded;
2. $T$ is bounded;
3. $E[T] < \infty$ and there is a constant $c$ such that $E[|Z_{i+1} - Z_i| \mid X_1, \dots, X_i] < c$.

EXAMPLE - GAMBLER'S PROBLEM

Consider a sequence of independent fair games, where in each round the player either wins or loses one euro, each with probability 1/2. Let $Z_0 = 0$, let $X_i$ be the amount won in the $i$-th game, and let $Z_i$ be the total amount won after $i$ games. Let us assume that the player quits the game when he has either lost $l_1$ euros or won $l_2$ euros (for given $l_1, l_2$). What is the probability that the player wins $l_2$ euros before losing $l_1$ euros?

GAMBLER'S PROBLEM - ANSWER

Let $T$ be the first time at which the gambler has either won $l_2$ or lost $l_1$ euros. $T$ is a stopping time for the sequence $X_1, X_2, \dots$, and the sequence $Z_0, Z_1, \dots$ is a martingale. Since the values of the $Z_i$ up to time $T$ are bounded (they lie between $-l_1$ and $l_2$), the martingale stopping theorem can be applied. Therefore, we have
\[ E[Z_T] = 0. \]
Let now $p$ be the probability that the gambler quits playing after winning $l_2$ euros. Then
\[ E[Z_T] = l_2 p - l_1(1 - p) = 0, \]
and therefore
\[ p = \frac{l_1}{l_1 + l_2}. \]
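A quick simulation (our sketch; the limits $l_1, l_2$ are arbitrary) agrees with this answer:

```python
import random

def win_probability(l1, l2, runs=100_000):
    # Fair +1/-1 random walk from 0, stopped at -l1 (ruin) or +l2 (win)
    wins = 0
    for _ in range(runs):
        z = 0
        while -l1 < z < l2:
            z += 1 if random.getrandbits(1) else -1
        wins += (z == l2)
    return wins / runs

l1, l2 = 3, 7
print(win_probability(l1, l2), "vs l1/(l1+l2) =", l1 / (l1 + l2))   # both ~0.3
```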
ELECTION PROBLEM

Suppose candidates A and B run for election; at the end, A gets $v_A$ votes, B gets $v_B$ votes, and $v_B < v_A$. Let us assume that the votes are counted in a random order. What is the probability that candidate A is always ahead during the counting process?

Let $n = v_A + v_B$ and let $S_k$ be the number of votes by which A is leading after $k$ votes have been counted (negative if A is trailing). Clearly $S_n = v_A - v_B$. For $0 \le k \le n - 1$ we define
\[ X_k = \frac{S_{n-k}}{n-k}. \]
It can be shown, after some calculation, that the sequence $X_0, X_1, \dots, X_{n-1}$ forms a martingale. Note that this sequence relates to the counting process in backward order: $X_0$ is a function of $S_n$, and so on.

ELECTION PROBLEM - RESULT

Let $T$ be the minimum $k$ such that $X_k = 0$ if such a $k$ exists, and $T = n - 1$ otherwise. $T$ is a bounded stopping time, and therefore the martingale stopping theorem gives
\[ E[X_T] = E[X_0] = \frac{E[S_n]}{n} = \frac{v_A - v_B}{v_A + v_B}. \]

Case 1: Candidate A leads throughout the count. In such a case all $S_{n-k}$, and therefore all $X_k$, are positive, so $T = n - 1$ and $X_T = X_{n-1} = S_1 = 1$ (the first counted vote must be for A).

Case 2: Candidate A does not lead throughout the count. Since the lead changes by one vote at a time and ends positive, if candidate B is ever ahead or tied there must be some $k \le n - 1$ with $S_{n-k} = 0$ and hence $X_k = 0$. In this case $T = k \le n - 1$ and $X_T = 0$.

We therefore have
\[ E[X_T] = \frac{v_A - v_B}{v_A + v_B} = 1 \cdot \Pr(\text{Case 1}) + 0 \cdot \Pr(\text{Case 2}), \]
and therefore the probability of Case 1, in which candidate A leads throughout the count, is
\[ \frac{v_A - v_B}{v_A + v_B}. \]

AZUMA-HOEFFDING INEQUALITY

Perhaps the main importance of the martingale concept for the analysis of randomized algorithms is due to various special Chernoff-type inequalities that can be applied even in cases where the underlying random variables are not independent.

Theorem: Let $X_0, X_1, \dots, X_n$ be a martingale such that, for every $k$,
\[ |X_k - X_{k-1}| \le c_k \]
for some constants $c_k$. Then, for all $t \ge 0$ and any $\lambda > 0$,
\[ \Pr(|X_t - X_0| \ge \lambda) \le 2 e^{-\lambda^2 / \left(2 \sum_{k=1}^{t} c_k^2\right)}. \]

EXAMPLE - PATTERN MATCHING - I.

Let $S = (s_1, \dots, s_n)$ be a string of symbols chosen independently at random from an $s$-ary alphabet $\Sigma$. Let $P = (p_1, \dots, p_k)$ be a fixed string (pattern) of $k$ characters from $\Sigma$. Let $F_{P,S}$ be the number of occurrences of the pattern $P$ in the random string $S$. Clearly,
\[ E[F_{P,S}] = (n - k + 1)\left(\frac{1}{s}\right)^k. \]
We now use a Doob martingale and the Azuma-Hoeffding inequality to show that, if $k$ is relatively small with respect to $n$, then the number of occurrences of the pattern $P$ in $S$ is highly concentrated around its mean. Let $Z_0 = E[F_{P,S}]$ and, for $1 \le i \le n$, let $Z_i = E[F_{P,S} \mid s_1, \dots, s_i]$. The sequence $Z_0, \dots, Z_n$ is a Doob martingale, and $Z_n = F_{P,S}$.

EXAMPLE - PATTERN MATCHING - II.

Since each character of the string can participate in no more than $k$ possible occurrences of the pattern, for any $0 \le i < n$ we have $|Z_{i+1} - Z_i| \le k$; in other words, the value of $s_{i+1}$ can affect the value of $F_{P,S}$ by at most $k$. Hence
\[ \big|E[F_{P,S} \mid s_1, \dots, s_{i+1}] - E[F_{P,S} \mid s_1, \dots, s_i]\big| = |Z_{i+1} - Z_i| \le k. \]
By the Azuma-Hoeffding inequality,
\[ \Pr(|F_{P,S} - E[F_{P,S}]| \ge \varepsilon) = \Pr(|Z_n - Z_0| \ge \varepsilon) \le 2 e^{-\varepsilon^2 / (2nk^2)}. \]

WAITING TIMES for PATTERNS

Problem: Let us suppose we flip a fair coin until some fixed pattern appears. What is the expected number of coin flips until this happens? Example: we flip coins until we see HTHH.

Suppose that $x_1 x_2 \dots x_k$ is the pattern we want to see. Let us imagine we have an army of gamblers, with a new one showing up before each coin flip. Each gambler starts by betting 1 dollar that the next coin flip will be $x_1$. If she wins, she bets her 2 dollars that the next coin flip will be $x_2$, continuing to play double-or-nothing until she either loses (and is out of her initial 1 dollar) or wins her last bet on $x_k$ (and is up $2^k - 1$ dollars). Because each gambler's winnings form a martingale, so does their sum, and so the expected total winnings of all gamblers up to the stopping time $\tau$, at which our pattern occurs for the first time, is 0.

These facts will now be used to compute $E[\tau]$. When we stop at time $\tau$, we have one gambler who has won $2^k - 1$. Other gamblers may still be in play: for each $i$ with $x_1 \dots x_i = x_{k-i+1} \dots x_k$ (that is, whenever the length-$i$ prefix of the pattern equals its length-$i$ suffix), there is a gambler with net winnings $2^i - 1$. All remaining gamblers are at $-1$, having lost their initial dollar. Let $\chi_i = 1$ if $x_1 \dots x_i = x_{k-i+1} \dots x_k$, and 0 otherwise. Then, using the stopping time theorem,
\[ E[X_\tau] = E\left[-\tau + \sum_{i=1}^{k} \chi_i 2^i\right] = -E[\tau] + \sum_{i=1}^{k} \chi_i 2^i = 0, \]
and therefore
\[ E[\tau] = \sum_{i=1}^{k} \chi_i 2^i. \]
Examples: if the pattern is HTHH (HHHH) [THHH], then $E[\tau]$ equals 18 (30) [16].
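The formula for $E[\tau]$ is easy to evaluate, and a simulation (our illustrative sketch) confirms the three quoted values:

```python
import random

def expected_wait(pattern):
    # E[tau] = sum of 2^i over those i where the length-i prefix
    # of the pattern equals its length-i suffix (chi_i = 1)
    k = len(pattern)
    return sum(2 ** i for i in range(1, k + 1) if pattern[:i] == pattern[k - i:])

def simulated_wait(pattern, runs=20_000):
    total = 0
    for _ in range(runs):
        window, flips = "", 0
        while not window.endswith(pattern):
            window += random.choice("HT")
            flips += 1
        total += flips
    return total / runs

for p in ("HTHH", "HHHH", "THHH"):
    print(p, expected_wait(p), round(simulated_wait(p), 1))   # 18, 30, 16
```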
EXAMPLE - POLYA'S URN SCHEME

Consider an urn that initially contains $b$ black balls and $w$ white balls. Let a sequence of random selections from this urn be performed, where at each step the chosen ball is replaced by $c$ balls of the same color. If $X_i$ denotes the fraction of black balls in the urn after the $i$-th trial, then the sequence $X_0, X_1, \dots$ is a martingale.

EXAMPLE - OCCUPANCY PROBLEM

Suppose that $m$ balls are thrown randomly into $n$ bins and let $Z$ denote the number of bins that remain empty at the end. For $0 \le t \le m$, let $Z_t$ be the expectation, at time $t$ (after $t$ balls have been thrown), of the number of bins that are empty at time $m$. The sequence of random variables $Z_0, Z_1, \dots, Z_m$ is a martingale, with $Z_0 = E[Z]$ and $Z_m = Z$.

SOME ESTIMATIONS

Kolmogorov-Doob inequality: Let $X_0, X_1, \dots$ be a martingale. Then for any $\lambda > 0$,
\[ \Pr\left[\max_{0 \le i \le n} X_i \ge \lambda\right] \le \frac{E[|X_n|]}{\lambda}. \]

Azuma inequality: Let $X_0, X_1, \dots$ be a martingale sequence such that $|X_k - X_{k-1}| \le c_k$ for each $k$. Then for all $t \ge 0$ and any $\lambda > 0$,
\[ \Pr[|X_t - X_0| \ge \lambda] \le 2 \exp\left(-\frac{\lambda^2}{2 \sum_{k=1}^{t} c_k^2}\right). \]

Corollary: Let $X_0, X_1, \dots$ be a martingale sequence such that $|X_k - X_{k-1}| \le c$ for each $k$, where $c$ is independent of $k$. Then, for all $t \ge 0$ and any $\lambda > 0$,
\[ \Pr\big[|X_t - X_0| \ge \lambda c \sqrt{t}\big] \le 2 e^{-\lambda^2/2}. \]

OCCUPANCY PROBLEM REVISITED

Let us have $m$ balls thrown randomly into $n$ bins and let $Z$ denote the number of bins that remain empty. The Azuma inequality allows one to show that
\[ \mu = E[Z] = n\left(1 - \frac{1}{n}\right)^m \approx n e^{-m/n} \]
and, for $\lambda > 0$,
\[ \Pr[|Z - \mu| \ge \lambda] \le 2 \exp\left(-\frac{\lambda^2 (n - 1/2)}{n^2 - \mu^2}\right). \]

APPENDIX

EXERCISES

1. Which is larger, $e^{\pi}$ or $\pi^{e}$ (where $e$ is the base of the natural logarithm)?
Hint 1: There exists a one-line proof of the correct relation.
Hint 2: Use the inequality $e^x > 1 + x$ with $x = \pi/e - 1$.
Solution: $e^{\pi/e - 1} > 1 + \pi/e - 1 = \pi/e$ implies $e^{\pi/e} > \pi$, and hence $e^{\pi} > \pi^{e}$.
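For the record, a one-line numeric check (illustrative only):

```python
import math
print(math.e ** math.pi, ">", math.pi ** math.e)   # 23.1407... > 22.4592...
```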