4.13. Remark. (1) In the case of k = 1 (one-step prediction) we shall simplify the notation by omitting the superscript (1). (2) Observe that dividing (4.2) by $\gamma(0) > 0$ leads to an equivalent SLE $R_n \phi_n^{(k)} = \rho_n^{(k)}$ expressed in terms of the autocorrelation function $\rho(\cdot)$ instead of the autocovariance function $\gamma(\cdot)$. (3) Neither $\Gamma_n$ nor $\gamma_n^{(k)}$ depends on t (by stationarity), and thus hereafter we can assume t = n without loss of generality.

4.14. Theorem (Durbin-Levinson algorithm for $\phi_n$). Let $X = \{X_t \mid t \in \mathbb{Z}\}$ be a stationary time series with $\mu_X = 0$ and autocovariance function satisfying $\gamma_X(h) \to 0$ as $h \to \infty$. If $\hat{X}_{n+1} = \phi_{n,1} X_n + \cdots + \phi_{n,n} X_1$ is the best linear prediction as in Theorem 4.12, then the coefficients $\phi_{n,j}$ and the mean square error $v_n = \mathsf{E}|X_{n+1} - \hat{X}_{n+1}|^2$ may be computed recursively as follows.

Initial step with n = 0:
$$v_0 = \gamma_X(0). \tag{4.4a}$$

Recursive step with n > 0:
$$\phi_{n,n} = \Bigl(\gamma_X(n) - \underbrace{\sum_{j=1}^{n-1} \phi_{n-1,j}\,\gamma_X(n-j)}_{=\,0 \text{ for } n=1}\Bigr)\, v_{n-1}^{-1}, \tag{4.4b}$$
$$v_n = v_{n-1}\bigl(1 - |\phi_{n,n}|^2\bigr), \tag{4.4c}$$
$$\begin{bmatrix} \phi_{n,1} \\ \phi_{n,2} \\ \vdots \\ \phi_{n,n-1} \end{bmatrix} = \begin{bmatrix} \phi_{n-1,1} \\ \phi_{n-1,2} \\ \vdots \\ \phi_{n-1,n-1} \end{bmatrix} - \phi_{n,n} \begin{bmatrix} \phi_{n-1,n-1} \\ \phi_{n-1,n-2} \\ \vdots \\ \phi_{n-1,1} \end{bmatrix} \quad \text{for } n > 1. \tag{4.4d}$$

Proof. See [BD93, §5.2].

4.15. Definition. Let $\hat{X} = P_{\mathcal{L}(1, X_1, \dots, X_n)} X$ and $\hat{Y} = P_{\mathcal{L}(1, X_1, \dots, X_n)} Y$, where $X, Y, X_1, \dots, X_n \in L^2$. Then
$$\rho(X, Y \mid X_1, \dots, X_n) := \rho(X - \hat{X},\, Y - \hat{Y})$$
is called the partial correlation coefficient of the random variables X and Y given $X_1, \dots, X_n$.

Interpretation: the partial correlation between X and Y given $X_1, \dots, X_n$ is the correlation between X and Y cleaned of the part transmitted via the influence of the random variables $X_1, \dots, X_n$.

4.16. Definition. Let $X = \{X_t \mid t \in \mathbb{Z}\}$ be a stationary time series with autocorrelation function $\rho_X(\cdot)$. Then the partial autocorrelation function (pacf) $\alpha_X(\cdot)$ of X is defined as follows:
$$\alpha_X(0) = \rho_X(0) = 1,$$
$$\alpha_X(1) = \rho_X(1) = \rho(X_{t+1}, X_t),$$
$$\alpha_X(h) = \rho(X_{t+h}, X_t \mid X_{t+1}, \dots, X_{t+h-1}) \quad \text{for } h \ge 2.$$

4.17. Theorem. If $X = \{X_t \mid t \in \mathbb{Z}\}$ is a stationary time series with $\mu_X = 0$ and partial autocorrelation function $\alpha_X(\cdot)$, then $\alpha_X(n) = \phi_{n,n}$ for $n \ge 1$, where $\phi_n = [\phi_{n,1}, \dots, \phi_{n,n}]^T$ is the solution to the 1-step best linear prediction problem of eq. (4.2), i.e. $\hat{X}_{n+1} = \sum_{j=1}^{n} \phi_{n,j} X_{n+1-j}$.

Proof. See [BD93, §5.2].

Clearly, $\alpha_X(h)$ may be computed recursively using the intermediate result (4.4b) of the Durbin-Levinson algorithm 4.14. Another procedure is based on Cramer's rule according to the next corollary; unfortunately, that method is computationally inefficient for large n.

4.18. Corollary. If the matrix $\Gamma_n$ of eq. (4.2) is nonsingular, then
$$\alpha_X(n) = \phi_{n,n} = \frac{\det \Gamma_n^*}{\det \Gamma_n},$$
where $\Gamma_n^* := [\Gamma_n(:,1), \dots, \Gamma_n(:,n-1), \gamma_n]$, i.e. $\Gamma_n$ with its last column replaced by the right-hand side $\gamma_n$.
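For illustration, the recursion (4.4a)-(4.4d) is straightforward to implement. The sketch below is a minimal Python version for the real-valued case (NumPy assumed; the function name durbin_levinson and the AR(1) check at the end are illustrative additions, not part of the text). It computes the coefficients $\phi_{n,j}$, the mean square errors $v_n$ and, by Theorem 4.17, the pacf values $\alpha_X(n) = \phi_{n,n}$ from a given autocovariance sequence.

```python
import numpy as np

def durbin_levinson(gamma):
    """Durbin-Levinson recursion (4.4a)-(4.4d), real-valued case.

    gamma : sequence [gamma(0), gamma(1), ..., gamma(n)] of autocovariances.
    Returns (phi, v, alpha) where
      phi[j-1] = phi_{n,j} for the largest order n = len(gamma) - 1,
      v[k]     = mean square one-step prediction error v_k,
      alpha[k-1] = phi_{k,k} = pacf value alpha_X(k), k = 1, ..., n.
    """
    gamma = np.asarray(gamma, dtype=float)
    n = len(gamma) - 1
    v = np.empty(n + 1)
    v[0] = gamma[0]                               # (4.4a)
    phi = np.zeros(0)                             # coefficients of the previous step
    alpha = np.empty(n)
    for k in range(1, n + 1):
        # (4.4b): phi_{k,k} = (gamma(k) - sum_{j=1}^{k-1} phi_{k-1,j} gamma(k-j)) / v_{k-1}
        phi_kk = (gamma[k] - np.dot(phi, gamma[k - 1:0:-1])) / v[k - 1]
        # (4.4d): phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}, j = 1, ..., k-1
        phi = np.concatenate([phi - phi_kk * phi[::-1], [phi_kk]])
        # (4.4c): update of the mean square error
        v[k] = v[k - 1] * (1.0 - phi_kk ** 2)
        alpha[k - 1] = phi_kk                     # Theorem 4.17: alpha_X(k) = phi_{k,k}
    return phi, v, alpha

if __name__ == "__main__":
    # Illustrative check with a causal AR(1), phi = 0.5, sigma^2 = 1:
    # gamma(h) = phi**h / (1 - phi**2), so the pacf should cut off after lag 1.
    phi0 = 0.5
    gamma = [phi0 ** h / (1 - phi0 ** 2) for h in range(6)]
    _, _, alpha = durbin_levinson(gamma)
    print(np.round(alpha, 6))   # approximately [0.5, 0, 0, 0, 0]
```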
4.19. Definition (ARMA process). A stochastic process $X = \{X_t \mid t \in \mathbb{Z}\}$ is called an ARMA process of order p, q ($0 \le p, q < \infty$), and we write $X \sim \mathrm{ARMA}(p, q)$, if
$$X_t = \underbrace{\phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p}}_{\text{autoregressive (AR) component}} + \underbrace{Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}}_{\text{moving average (MA) component}}, \tag{4.5a}$$
where $Z := \{Z_t \mid t \in \mathbb{Z}\} \sim \mathrm{WN}(0, \sigma^2)$ and $\phi_p \ne 0$, $\theta_q \ne 0$, $\sigma \ne 0$. Rewriting (4.5a) into the equivalent form
$$X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}, \tag{4.5b}$$
a short form may be used,
$$\phi(B) X_t = \theta(B) Z_t \quad \text{or} \quad \phi(z) X(z) = \theta(z) Z(z), \tag{4.5c}$$
where, with $\phi_0 = \theta_0 = 1$,
$$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q, \qquad z \in \mathbb{C}.$$

4.20. Remark. In the preceding definition we assumed $\theta_0 = 1$ without loss of generality, because otherwise it would be sufficient to replace the original white noise by the modified one $\{\theta_0 Z_t\} \sim \mathrm{WN}(0, (\theta_0 \sigma)^2)$ and the original $\theta_i$ by $\theta_i/\theta_0$ for $i = 1, \dots, q$.

4.21. Remark (Special cases). a) Autoregressive process (AR process): $X \sim \mathrm{AR}(p) = \mathrm{ARMA}(p, 0)$: $\phi(B) X_t = Z_t$ because $\theta(z) \equiv 1$. Then
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t. \tag{4.5d}$$
We admit $p = \infty$ provided that $\phi := \{\phi_j\}_{j=1}^{\infty} \in \ell^1$.

b) Moving average process (MA process): $X \sim \mathrm{MA}(q) = \mathrm{ARMA}(0, q)$: $X_t = \theta(B) Z_t$ because $\phi(z) \equiv 1$. Then
$$X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}. \tag{4.5e}$$
We admit $q = \infty$ provided that $\theta := \{\theta_j\}_{j=1}^{\infty} \in \ell^1$.

c) White noise: white noise is the only process which is both an AR and an MA process: $X \sim \mathrm{ARMA}(0, 0) = \mathrm{AR}(0) = \mathrm{MA}(0) = \mathrm{WN}(0, \sigma^2)$: $X_t = Z_t$.

d) General ARMA process: $X \sim \mathrm{ARMA}(p, q)$, $0 < p, q < \infty$: a true mixture of autoregressive and moving average components.

4.22. Definition. $X = \{X_t \mid t \in \mathbb{Z}\}$, $X \sim \mathrm{ARMA}(p, q)$, is called a causal ARMA process if there exists $\psi = \{\psi_j\}_{j=0}^{\infty}$ with $\sum_{j=0}^{\infty} |\psi_j| < \infty$ (i.e. $\psi \in \ell^1$) such that
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} \quad (\text{or in short form } X_t = \psi(B) Z_t), \qquad t \in \mathbb{Z}. \tag{4.6a}$$
X is called an invertible ARMA process if there exists $\pi = \{\pi_j\}_{j=0}^{\infty}$ with $\sum_{j=0}^{\infty} |\pi_j| < \infty$ (i.e. $\pi \in \ell^1$) such that
$$\sum_{j=0}^{\infty} \pi_j X_{t-j} = Z_t \quad (\text{or in short form } \pi(B) X_t = Z_t), \qquad t \in \mathbb{Z}. \tag{4.6b}$$

4.23. Remark. Consequently, causality in this context says that the ARMA process X is also a time series generated by the white noise $\{Z_t\}$ in the sense of Remark 4.5(1), or $X \sim \mathrm{MA}(\infty)$ in our notation. On the other hand, invertibility means that the white noise $\{Z_t\}$ itself may be generated by the given ARMA process X, which is equivalent to $X \sim \mathrm{AR}(\infty)$. Above we assumed $\psi_0 = \pi_0 = 1$ again, which will be confirmed in Section 4.32 later on; there the main issue will be the computation of the causal and invertible representations of an ARMA process.

4.24. Theorem (Autocovariance function of an MA process). $\{X_t\} \sim \mathrm{MA}(q)$, $q \le \infty$, is a stationary process having zero mean $\mu_X = 0$ and autocovariance function
$$\gamma_X(h) = \sigma^2 \sum_{k=0}^{q} \theta_{h+k}\,\overline{\theta}_k \quad \text{for } h \ge 0. \tag{4.7a}$$
Hence for $q < \infty$ we get
$$\gamma_X(h) = \begin{cases} \sigma^2 \sum_{k=0}^{q-h} \theta_{h+k}\,\overline{\theta}_k & \text{for } 0 \le h \le q, \\ 0 & \text{for } h > q, \end{cases} \tag{4.7b}$$
and in particular $\gamma_X(q) = \sigma^2 \theta_q \ne 0$ in view of $\theta_0 = 1$.

Proof. By Corollary 4.4, $\{X_t\}$ is stationary and it holds that
$$\mu_X = \mu_Z \sum_{j=0}^{q} \theta_j = 0, \quad \text{because } \mu_Z = 0,$$
$$\gamma_X(h) = \sum_{j,k=0}^{q} \theta_j \overline{\theta}_k\, \gamma_Z(h - j + k), \quad \text{where } \gamma_Z(h) = \begin{cases} \sigma^2 & \text{for } h = 0, \\ 0 & \text{for } h \ne 0. \end{cases}$$
The only nonzero terms in the sum are those with $h - j + k = 0$, i.e. with $j = h + k$, which yields (4.7a):
$$\gamma_X(h) = \sum_{k=0}^{q} \theta_{h+k}\,\overline{\theta}_k \underbrace{\gamma_Z(0)}_{\sigma^2}.$$
With $q < \infty$ we have $\theta_{h+k} = 0$ for $h + k > q$, or equivalently for $k > q - h$, which allows us to rewrite (4.7a) as (4.7b).

4.25. Corollary.
$$\sigma_X^2 = \gamma_X(0) = \sigma^2 \sum_{k=0}^{q} |\theta_k|^2,$$
$$\sigma_X^2 = \sigma^2 \bigl(1 + |\theta_1|^2 + |\theta_2|^2 + \cdots + |\theta_q|^2\bigr) \quad \text{for } q < \infty, \tag{4.8a}$$
$$\rho_X(h) = \frac{\sum_{k=0}^{q-h} \theta_{h+k}\,\overline{\theta}_k}{\sum_{k=0}^{q} |\theta_k|^2} \quad \text{for } h \ge 0. \tag{4.8b}$$

4.26. Theorem (Pacf of a causal AR process). Let $\{X_t\} \sim \mathrm{AR}(p)$, $p < \infty$, be a causal AR process. Then $\{X_t\}$ is stationary with zero mean $\mu_X = 0$ and partial autocorrelation function $\alpha_X$ satisfying $\alpha_X(p) = \phi_p \ne 0$ and $\alpha_X(h) = 0$ for $h > p$. Moreover, $\hat{X}_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} = P_{\mathcal{L}(X_{t-1}, \dots, X_{t-p})} X_t$, where $\phi_1, \dots, \phi_p$ are precisely the 1-step best linear prediction coefficients.

Proof. In view of 4.23 and due to causality, $\{X_t\} \sim \mathrm{MA}(\infty)$ is zero-mean stationary by 4.24. By (4.6a), $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ and consequently $X_t \in \overline{\mathcal{L}}(Z_t, Z_{t-1}, \dots) =: \mathcal{L}_t$ (the closure of the linear subspace in $L^2(\Omega, \mathcal{A}, P)$ spanned by the random variables $Z_u$, $u \le t$). Putting $\phi_j = 0$ for $j > p$, we can write for each $n \ge p$, in view of (4.5d),
$$X_t = \underbrace{\sum_{j=1}^{n} \phi_j X_{t-j}}_{\in\, \mathcal{L}_{t-1}} + Z_t.$$
The random variables $Z_t$ are uncorrelated: $Z_t \perp Z_u$ for $t \ne u$. In particular $Z_t \perp Z_u$ for $u < t$, and thus $Z_t \perp \mathcal{L}_{t-1}$ by the continuity and bilinearity of the inner product in $L^2(\Omega, \mathcal{A}, P)$. Applying the orthogonal projection theorem, we get $\hat{X}_t = \sum_{j=1}^{n} \phi_j X_{t-j}$ as the unique best linear prediction of $X_t$ in terms of $X_{t-1}, \dots, X_{t-n}$ for every $n \ge p$. By Theorem 4.17 it holds that $\alpha_X(p) = \phi_p \ne 0$ and $\alpha_X(n) = \phi_n = 0$ for $n > p$.
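For illustration, the finite-order formula (4.7b), together with (4.8b), translates directly into a short routine. The following Python sketch assumes NumPy and real-valued coefficients (so the conjugates in (4.7b) can be dropped); the function name ma_autocovariance is illustrative only, and the example uses the MA(2) parameters of the simulation in Figure 4.2 below.

```python
import numpy as np

def ma_autocovariance(theta, sigma2, max_lag):
    """Autocovariance gamma_X(h), h = 0, ..., max_lag, of a real-valued MA(q) process
    X_t = Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}, Z ~ WN(0, sigma2),
    following (4.7b) with theta_0 = 1.
    """
    theta = np.concatenate(([1.0], np.asarray(theta, dtype=float)))   # prepend theta_0 = 1
    q = len(theta) - 1
    gamma = np.zeros(max_lag + 1)                                     # gamma(h) = 0 for h > q
    for h in range(min(q, max_lag) + 1):
        # gamma(h) = sigma2 * sum_{k=0}^{q-h} theta_{h+k} theta_k
        gamma[h] = sigma2 * np.dot(theta[h:], theta[:q - h + 1])
    return gamma

if __name__ == "__main__":
    # MA(2) with theta = (-0.5, -0.2), sigma^2 = 2.25 (the parameters of Figure 4.2).
    gamma = ma_autocovariance([-0.5, -0.2], 2.25, max_lag=5)
    rho = gamma / gamma[0]          # autocorrelation, eq. (4.8b)
    print(np.round(rho, 4))         # rho(h) = 0 for h > 2, as Theorem 4.24 predicts
```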
Figures 4.1, 4.2 and 4.3 show the typical behaviour of estimated autocorrelation and partial autocorrelation functions of simulated AR(2), MA(2) and ARMA(2, 2) processes, respectively. The dashed band stands for the appropriate pointwise confidence interval containing zero with probability 0.95. We see that the AR(2) process in Fig. 4.1 and the MA(2) process in Fig. 4.2 exhibit $\alpha_X(h) \approx 0$ and $\rho_X(h) \approx 0$, respectively, for $h > 2$, in accordance with Theorems 4.26 and 4.24. Otherwise the envelope of $\rho_X(h)$ or $\alpha_X(h)$ (for the ARMA(2, 2) process in Fig. 4.3 both of them) exhibits exponential decay, possibly combined with oscillatory behaviour. This is because one can show that in such cases both $\rho_X$ and $\alpha_X$ may be expressed as linear combinations of decreasing geometric sequences and/or cosine waves with geometrically decreasing amplitudes.

Figure 4.1. AR(2): $\phi = [0.5, 0.2]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

Figure 4.2. MA(2): $\theta = [-0.5, -0.2]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

Figure 4.3. ARMA(2, 2): $\phi = [0.5, 0.2]$, $\theta = [-0.6, 0.3]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

References

[BD93] Peter J. Brockwell and Richard A. Davis, Time Series: Theory and Methods, 2nd ed., Springer-Verlag, New York, 1991 (corrected 2nd printing, 1993).