1 A review of basic probability theory
This is a book about the applications of probability. We hope to convey that the subject is both fascinating and important. The examples are drawn mainly from the biological sciences, but some originate in the engineering, physical, social and statistical sciences. Furthermore, the techniques are not limited to any one area.
The reader is assumed to be familiar with the elements of probability or to be studying them concomitantly. In this chapter we briefly review some of this basic material. This establishes notation and provides a convenient reference for formulas and theorems which are needed at various points later on.
1.1 PROBABILITY AND RANDOM VARIABLES
When an experiment is performed whose outcome is uncertain, the collection of possible elementary outcomes is called a sample space, often denoted by Ω. Points in Ω, denoted in the discrete case by ω_i, i = 1, 2, ..., have an associated probability P{ω_i}. This enables the probability of any subset A of Ω, called an event, to be ascertained by finding the total probability associated with all the points in the given subset:

Pr{A} = Σ_{ω ∈ A} P{ω}.
We always have 0 ≤ Pr{A} ≤ 1.

A binomial random variable X with parameters n and p takes the values k = 0, 1, ..., n with probability law

p_k = Pr{X = k} = (n!/(k!(n - k)!)) p^k q^(n-k),   q = 1 - p,   k = 0, 1, ..., n.     (1.1)

A Poisson random variable X with parameter λ > 0 takes on non-negative integer values and has the probability law

p_k = Pr{X = k} = e^(-λ) λ^k / k!,   k = 0, 1, 2, ....     (1.2)
For any random variable the total probability mass is unity. Hence if p_k is given by either (1.1) or (1.2),

Σ_k p_k = 1,

where the summation is over the possible values k as indicated.
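As a quick numerical check of (1.1) and (1.2), the following Python sketch evaluates both probability laws and confirms that the total mass is unity; the function names binomial_pmf and poisson_pmf and the parameter values are our own choices for illustration, not from the text, and the Poisson sum is truncated at a large k.

import math

def binomial_pmf(k, n, p):
    # Pr{X = k} for a binomial(n, p) random variable, as in (1.1)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # Pr{X = k} for a Poisson(lambda) random variable, as in (1.2)
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p, lam = 10, 0.3, 2.5

# The binomial mass sums to exactly 1 over k = 0, ..., n.
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))

# The Poisson mass sums to 1 over all k >= 0; truncating at k = 60
# already captures essentially all of it for this lambda.
print(sum(poisson_pmf(k, lam) for k in range(60)))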
For any random variable X, the distribution function is

F(x) = Pr{X ≤ x},   -∞ < x < ∞.
Continuous random variables take on a continuum of values. Usually the probability law of a continuous random variable can be expressed through its probability density function, f(x), which is the derivative of the distribution function. Thus
f(x) = dF(x)/dx
     = lim_{Δx→0} [F(x + Δx) - F(x)] / Δx
     = lim_{Δx→0} [Pr{X ≤ x + Δx} - Pr{X ≤ x}] / Δx
     = lim_{Δx→0} Pr{x < X ≤ x + Δx} / Δx
     = lim_{Δx→0} Pr{X ∈ (x, x + Δx]} / Δx.     (1.3)
The last two expressions in (1.3) often provide a convenient prescription for calculating probability density functions. The probability density function is often abbreviated to p.d.f., but we will usually just say 'density'.
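The limiting prescription in (1.3) can be illustrated numerically. In the sketch below the exponential distribution and the step sizes are arbitrary choices for illustration; Pr{x < X ≤ x + Δx}/Δx is seen to approach the density f(x) as Δx shrinks.

import math

lam = 2.0   # rate of an exponential random variable (value chosen arbitrarily)

def F(x):
    # distribution function F(x) = Pr{X <= x} = 1 - e^(-lam x), x >= 0
    return 1.0 - math.exp(-lam * x)

def f(x):
    # density f(x) = lam e^(-lam x)
    return lam * math.exp(-lam * x)

x = 0.7
for dx in (1e-1, 1e-3, 1e-5):
    # Pr{x < X <= x + dx} / dx approaches f(x) as dx -> 0, as in (1.3)
    print(dx, (F(x + dx) - F(x)) / dx, f(x))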
If the interval (x_1, x_2) is in the range of X then the probability that X takes values in this interval is obtained by integrating the probability density over (x_1, x_2):

Pr{x_1 < X ≤ x_2} = ∫_{x_1}^{x_2} f(x) dx.

A normal random variable with mean μ and variance σ² has the density

f(x) = (1/(σ√(2π))) exp(-(x - μ)²/(2σ²)),   -∞ < x < ∞,     (1.4)

a uniform random variable on (a, b) has the density f(x) = 1/(b - a), a < x < b, and a gamma random variable with parameters p > 0 and λ > 0 has the density

f(x) = λ^p x^(p-1) e^(-λx) / Γ(p),   x > 0.
The quantity Γ(p) is the gamma function, defined as

Γ(p) = ∫_0^∞ x^(p-1) e^(-x) dx,   p > 0.
When p = 1 the gamma density is that of an exponentially distributed random variable,

f(x) = λ e^(-λx),   x > 0.
For continuous random variables the density must integrate to unity:

∫ f(x) dx = 1,

where the interval of integration is the whole range of values of X.
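As a numerical illustration, the sketch below integrates a gamma density by a crude trapezoidal rule; the parameter values, grid size and cut-off at x = 40 are arbitrary choices for the example, not from the text. The total mass comes out close to unity, and Pr{x_1 < X ≤ x_2} is obtained by integrating over (x_1, x_2).

import math

p_shape, lam = 3.0, 1.5        # gamma parameters p and lambda (illustrative values)

def gamma_density(x):
    return lam**p_shape * x**(p_shape - 1) * math.exp(-lam * x) / math.gamma(p_shape)

def integrate(f, a, b, n=100_000):
    # crude trapezoidal rule, adequate for this smooth density
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# Total mass: the integral over (0, infinity) is 1; the upper limit 40
# already carries essentially all of the mass for these parameters.
print(integrate(gamma_density, 1e-9, 40.0))

# Pr{x1 < X <= x2} by integrating the density over (x1, x2).
print(integrate(gamma_density, 1.0, 3.0))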
1.2 MEAN AND VARIANCE
Let X be a discrete random variable with

Pr{X = x_k} = p_k,   k = 1, 2, ....

The mean, average or expectation of X is

E(X) = Σ_k x_k p_k.

For a binomial random variable E(X) = np, whereas a Poisson random variable has mean E(X) = λ. For a continuous random variable with density f(x),

E(X) = ∫ x f(x) dx.
If X is normal with density given by (1.4) then E(X) = μ; a uniform (a, b) random variable has mean E(X) = (a + b)/2; and a gamma variate has mean E(X) = p/λ.
The nth moment of X is the expected value of X^n:

E(X^n) = Σ_k x_k^n p_k          if X is discrete,
E(X^n) = ∫ x^n f(x) dx          if X is continuous.
If n = 2 we obtain the second moment E(X²). The variance, which measures the degree of dispersion of the probability mass of a random variable about its mean, is

Var(X) = E[(X - E(X))²] = E(X²) - E²(X).

The variances of the above-mentioned random variables are: binomial, npq; Poisson, λ; normal, σ²; uniform, (b - a)²/12; gamma, p/λ². The square root of the variance is called the standard deviation.
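For instance, building the Poisson probabilities recursively and forming the first two moments confirms numerically that the mean and variance are both λ; the value λ = 4 and the truncation at k = 100 in this sketch are arbitrary.

import math

lam = 4.0

# Build the Poisson probabilities p_k = e^(-lam) lam^k / k! recursively.
pmf = [math.exp(-lam)]
for k in range(1, 100):
    pmf.append(pmf[-1] * lam / k)

mean = sum(k * pk for k, pk in enumerate(pmf))               # E(X)
second_moment = sum(k**2 * pk for k, pk in enumerate(pmf))   # E(X^2)
variance = second_moment - mean**2                           # E(X^2) - E^2(X)

print(mean, variance)    # both equal lambda (= 4.0) up to truncation error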
1.3 CONDITIONAL PROBABILITY AND INDEPENDENCE
Let A and B be two random events. The conditional probability of A given B is, provided Pr{B} ≠ 0,

Pr{A|B} = Pr{AB} / Pr{B},
where AB is the intersection of A and B, being the event that both A and B occur (sometimes written A ∩ B). Thus only the occurrences of A which are simultaneous with those of B are taken into account. Similarly, if X, Y are random variables defined on the same sample space, taking on values x_i, i = 1, 2, ..., and y_j, j = 1, 2, ..., then the conditional probability that X = x_i given Y = y_j is, if Pr{Y = y_j} ≠ 0,

Pr{X = x_i | Y = y_j} = Pr{X = x_i, Y = y_j} / Pr{Y = y_j},
the comma between X = x_i and Y = y_j meaning 'and'. The conditional expectation of X given Y = y_j is

E(X | Y = y_j) = Σ_i x_i Pr{X = x_i | Y = y_j}.
The expected value of XY is

E(XY) = Σ_i Σ_j x_i y_j Pr{X = x_i, Y = y_j},
and the covariance of X, Y is
Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).
The covariance is a measure of the linear dependence of X on Y.
If X, Y are independent then the value of Y should have no effect on the probability that X takes on any of its values. Thus we define X, Y as independent if

Pr{X = x_i | Y = y_j} = Pr{X = x_i},   for all i, j.
Equivalently X, Y are independent if
Pr{X = x_i, Y = y_j} = Pr{X = x_i} Pr{Y = y_j},
with a similar formula for arbitrary independent events. Hence for independent random variables
E(XY) = E(X)E(Y),
so their covariance is zero. Note, however, that Cov (X, Y) = 0 does not always imply X, Y are independent. The covariance is often normalized by defining the correlation coefficient
ρ_XY = Cov(X, Y) / (σ_X σ_Y),
where σ_X, σ_Y are the standard deviations of X, Y. ρ_XY is bounded above and below by

-1 ≤ ρ_XY ≤ 1.
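A small worked example may help: given a joint probability table for X and Y (the numbers below are invented purely for illustration), the covariance and correlation follow directly from the formulas above.

# Joint probabilities Pr{X = x_i, Y = y_j} for a small illustrative table.
joint = {(0, 0): 0.30, (0, 1): 0.10,
         (1, 0): 0.20, (1, 1): 0.40}

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())

cov = EXY - EX * EY                          # Cov(X, Y) = E(XY) - E(X)E(Y)

EX2 = sum(x * x * p for (x, y), p in joint.items())
EY2 = sum(y * y * p for (x, y), p in joint.items())
sd_x = (EX2 - EX**2) ** 0.5                  # standard deviation of X
sd_y = (EY2 - EY**2) ** 0.5                  # standard deviation of Y

rho = cov / (sd_x * sd_y)                    # correlation coefficient, in [-1, 1]
print(cov, rho)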
Let X_1, X_2, ..., X_n be mutually independent random variables. That is,

Pr{X_1 ∈ A_1, X_2 ∈ A_2, ..., X_n ∈ A_n} = Pr{X_1 ∈ A_1} Pr{X_2 ∈ A_2} ... Pr{X_n ∈ A_n},

for all appropriate sets A_1, ..., A_n. Then

Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i),
so that variances add in the case of independent random variables. We also note the formula
Var(aX + bY) = a² Var(X) + b² Var(Y),
which holds if X, Y are independent. If X_1, X_2, ..., X_n are independent identically distributed (abbreviated to i.i.d.) random variables with E(X_1) = μ, Var(X_1) = σ², then

E(Σ_{i=1}^n X_i) = nμ,   Var(Σ_{i=1}^n X_i) = nσ².
If X is a random variable and {X_1, X_2, ..., X_n} are i.i.d. with the distribution of X, then the collection {X_k} is called a random sample of size n for X. Random samples play a key role in computer simulation (Chapter 5) and of course are fundamental in statistics.
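A brief simulation, along the lines of Chapter 5, illustrates these formulas; the normal distribution, sample size and number of trials below are arbitrary choices for the sketch.

import random

random.seed(1)
n, mu, sigma = 10, 2.0, 0.5
trials = 20_000

# Each trial draws a random sample X_1, ..., X_n of i.i.d. normals and records the sum.
sums = [sum(random.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]

sample_mean = sum(sums) / trials
sample_var = sum((s - sample_mean) ** 2 for s in sums) / (trials - 1)

print(sample_mean, n * mu)          # both close to n*mu = 20
print(sample_var, n * sigma**2)     # both close to n*sigma^2 = 2.5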
1.4 LAW OF TOTAL PROBABILITY
Let Ω be a sample space for a random experiment and let {A_i, i = 1, 2, ...} be a collection of nonempty subsets of Ω such that

(i) A_i A_j = ∅, i ≠ j,
(ii) ∪_i A_i = Ω.

(Here ∅ is the null set, the impossible event, being the complement of Ω.) Condition (i) says that the A_i represent mutually exclusive events. Condition (ii) states that when an experiment is performed, at least one of the A_i must be observed. Under these conditions the sets or events {A_i, i = 1, 2, ...} are said to form a partition or decomposition of the sample space.
The law or theorem of total probability states that for any event (set) B,
Pr{B} = Σ_i Pr{B | A_i} Pr{A_i}.
A similar relation holds for expectations. By definition the expectation of X conditioned on the event A_i is
E(X | A_i) = Σ_k x_k Pr{X = x_k | A_i},
where {x_k} is the set of possible values of X. Thus

E(X) = Σ_k x_k Pr{X = x_k}
     = Σ_k x_k Σ_i Pr{X = x_k | A_i} Pr{A_i}
     = Σ_i [ Σ_k x_k Pr{X = x_k | A_i} ] Pr{A_i}.
Thus
E(X) = Σ_i E(X | A_i) Pr{A_i},
which we call the law of total probability applied to expectations.
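As an illustration, the following sketch (with an invented partition and invented conditional laws, chosen only for the example) computes E(X) both from the law of total probability applied to expectations and from the unconditional law of X; the two answers agree.

# A partition A_1, A_2, A_3 with its probabilities, and for each A_i the
# conditional law of X given A_i (values of X and their conditional probabilities).
pr_A = [0.5, 0.3, 0.2]
cond_law = [
    {0: 0.7, 1: 0.3},        # Pr{X = k | A_1}
    {1: 0.6, 2: 0.4},        # Pr{X = k | A_2}
    {2: 0.5, 5: 0.5},        # Pr{X = k | A_3}
]

# E(X | A_i) for each member of the partition.
cond_means = [sum(x * p for x, p in law.items()) for law in cond_law]

# E(X) = sum_i E(X | A_i) Pr{A_i}
EX = sum(m * pa for m, pa in zip(cond_means, pr_A))

# The same answer from the unconditional law Pr{X = k} = sum_i Pr{X = k | A_i} Pr{A_i}.
unconditional = {}
for law, pa in zip(cond_law, pr_A):
    for x, p in law.items():
        unconditional[x] = unconditional.get(x, 0.0) + p * pa

print(EX, sum(x * p for x, p in unconditional.items()))   # the two agree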
We note also the fundamental relation for any two events A, B in the same sample space:
Pr{A ∪ B} = Pr{A} + Pr{B} - Pr{AB},

where A ∪ B is the union of A and B, consisting of those points which are in A or in B or in both A and B.
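For a concrete check, take a small sample space of equally likely points and two events chosen arbitrarily for the example:

# A tiny sample space of equally likely points, used to check
# Pr{A u B} = Pr{A} + Pr{B} - Pr{AB}.
omega = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5, 6}

def pr(event):
    return len(event) / len(omega)

print(pr(A | B))                         # direct computation of Pr{A u B}
print(pr(A) + pr(B) - pr(A & B))         # via the addition rule; the two agree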
1.5 CHANGE OF VARIABLES
Let X be a continuous random variable with distribution function F_X and density f_X. Let
y = g(x)
be a strictly increasing function of x (see Fig. 1.1) with inverse function
x = g^{-1}(y).
Then
Y = g(X)
is a random variable which we let have distribution function F_Y and density f_Y.
Figure 1.1 g(x) is a strictly increasing function of x.
It is easy to see that X ≤ x implies Y ≤ g(x). Hence we arrive at

Pr{X ≤ x} = Pr{Y ≤ g(x)},

so that F_X(x) = F_Y(g(x)). Equivalently, F_Y(y) = F_X(g^{-1}(y)), and differentiating with respect to y gives the density of Y:

f_Y(y) = f_X(g^{-1}(y)) dg^{-1}(y)/dy.
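As an illustration (not taken from the text), let X be exponential with rate 1 and Y = g(X) = X², which is strictly increasing on the range of X. The sketch below compares F_X(g^{-1}(y)) with a Monte Carlo estimate of Pr{Y ≤ y} and with the integral of the transformed density; all numbers used are arbitrary choices.

import math
import random

random.seed(2)

# X exponential with rate 1 (density f_X(x) = e^(-x), x > 0) and Y = g(X) = X**2,
# a strictly increasing map on the range of X.
g_inverse = math.sqrt                      # g^{-1}(y) = sqrt(y)

def f_Y(y):
    # f_Y(y) = f_X(g^{-1}(y)) * d g^{-1}(y)/dy
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y))

y0 = 2.0

# F_Y(y0) via the change-of-variables identity F_Y(y) = F_X(g^{-1}(y)).
exact = 1.0 - math.exp(-g_inverse(y0))

# The same probability by Monte Carlo simulation of Y = X**2.
trials = 100_000
simulated = sum(random.expovariate(1.0) ** 2 <= y0 for _ in range(trials)) / trials

# And again by integrating the transformed density over (0, y0] (midpoint rule).
n = 100_000
h = y0 / n
integral = sum(f_Y((i + 0.5) * h) for i in range(n)) * h

print(exact, simulated, integral)   # all three are close to one another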
A quantity which tends to zero faster than Δx as Δx → 0 is said to be 'little o of Δx', written o(Δx). Thus, for example, (Δx)² is o(Δx) because (Δx)² vanishes more quickly than Δx. In general, if

lim_{Δx→0} g(Δx)/Δx = 0,

we write

g(Δx) = o(Δx).

The little o notation is very useful for abbreviating expressions in which terms will not contribute after a limiting operation is taken. To illustrate, consider the Taylor expansion of e^(Δx):
e^(Δx) = 1 + Δx + (Δx)²/2! + (Δx)³/3! + ...
       = 1 + Δx + o(Δx).
We then have
(d/dx) e^x |_{x=0} = lim_{Δx→0} (e^(Δx) - 1)/Δx
                   = lim_{Δx→0} (1 + Δx + o(Δx) - 1)/Δx
                   = lim_{Δx→0} (Δx/Δx + o(Δx)/Δx)
                   = 1.
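Numerically, (e^(Δx) - 1)/Δx does indeed approach 1 as Δx → 0; the step sizes in this sketch are chosen arbitrarily.

import math

# (e^(dx) - 1)/dx = (dx + o(dx))/dx, which tends to 1 as dx -> 0.
for dx in (1e-1, 1e-3, 1e-6):
    print(dx, (math.exp(dx) - 1.0) / dx)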
Equal by definition
As seen already, when we write, for example,

q = 1 - p,

we are defining the symbol q to be equal to 1 - p. This is not to be confused with approximately equal to, which is indicated by ≈.