4.13. Remark. (1) In the case of k = 1 (one-step prediction) we shall simplify the notation by omitting the superscript (1). (2) Observe that dividing (4.2) by $\gamma(0) > 0$ leads to an equivalent SLE $R_n \phi_n^{(k)} = \rho_n^{(k)}$ expressed in terms of the autocorrelation function $\rho(\cdot)$ instead of the autocovariance function $\gamma(\cdot)$. (3) Neither $\Gamma_n$ nor $\gamma_n^{(k)}$ depends on t (by stationarity), and thus hereafter we can assume t = n without loss of generality.

4.14. Theorem (Durbin-Levinson algorithm for $\phi_n$). Let $X = \{X_t \mid t \in \mathbb{Z}\}$ be a stationary time series with $\mu_X = 0$ and autocovariance function satisfying $\gamma_X(h) \to 0$ as $h \to \infty$. If $\hat{X}_{n+1} = \phi_{n,1} X_n + \cdots + \phi_{n,n} X_1$ is the best linear prediction as in Theorem 4.12, then the coefficients $\phi_{n,j}$ and the mean square error $v_n = \mathsf{E}|X_{n+1} - \hat{X}_{n+1}|^2$ may be computed recursively as follows.

Initial step with n = 0:
$$v_0 = \gamma_X(0). \tag{4.4a}$$

Recursive step with n > 0:
$$\phi_{n,n} = \Bigl(\gamma_X(n) - \underbrace{\sum_{j=1}^{n-1} \phi_{n-1,j}\,\gamma_X(n-j)}_{=\,0 \text{ for } n=1}\Bigr)\, v_{n-1}^{-1}, \tag{4.4b}$$
$$v_n = v_{n-1}\bigl(1 - |\phi_{n,n}|^2\bigr), \tag{4.4c}$$
$$\begin{bmatrix} \phi_{n,1} \\ \phi_{n,2} \\ \vdots \\ \phi_{n,n-1} \end{bmatrix} = \begin{bmatrix} \phi_{n-1,1} \\ \phi_{n-1,2} \\ \vdots \\ \phi_{n-1,n-1} \end{bmatrix} - \phi_{n,n} \begin{bmatrix} \phi_{n-1,n-1} \\ \phi_{n-1,n-2} \\ \vdots \\ \phi_{n-1,1} \end{bmatrix} \quad \text{for } n > 1. \tag{4.4d}$$

Proof. See [BD93, §5.2].

4.15. Definition. Let $\hat{X} = P_{\mathcal{L}(1, X_1, \dots, X_n)} X$ and $\hat{Y} = P_{\mathcal{L}(1, X_1, \dots, X_n)} Y$, where $X, Y, X_1, \dots, X_n \in L^2$. Then
$$\rho(X, Y \mid X_1, \dots, X_n) := \rho(X - \hat{X},\, Y - \hat{Y})$$
is called the partial correlation coefficient of the random variables X and Y given $X_1, \dots, X_n$.

Interpretation: the partial correlation between X and Y given $X_1, \dots, X_n$ is the correlation between X and Y cleaned of the part transmitted via the influence of the random variables $X_1, \dots, X_n$.

4.16. Definition. Let $X = \{X_t \mid t \in \mathbb{Z}\}$ be a stationary time series with autocorrelation function $\rho_X(\cdot)$. Then the partial autocorrelation function (pacf) $\alpha_X(\cdot)$ of X is defined as follows:
$$\alpha_X(0) = \rho_X(0) = 1,$$
$$\alpha_X(1) = \rho_X(1) = \rho(X_{t+1}, X_t),$$
$$\alpha_X(h) = \rho(X_{t+h}, X_t \mid X_{t+1}, \dots, X_{t+h-1}) \quad \text{for } h \ge 2.$$

4.17. Theorem. If $X = \{X_t \mid t \in \mathbb{Z}\}$ is a stationary time series with $\mu_X = 0$ and partial autocorrelation function $\alpha_X(\cdot)$, then $\alpha_X(n) = \phi_{n,n}$ for $n \ge 1$, where $\phi_n = [\phi_{n,1}, \dots, \phi_{n,n}]^T$ is the solution to the 1-step best linear prediction problem of eq. (4.2), i.e. $\hat{X}_{n+1} = \sum_{j=1}^{n} \phi_{n,j} X_{n+1-j}$.

Proof. See [BD93, §5.2].

Clearly, $\alpha_X(h)$ may be computed recursively using the intermediate result (4.4b) of the Durbin-Levinson algorithm 4.14. Another procedure is based on Cramer's rule according to the next corollary; unfortunately, that method is computationally inefficient for large n.

4.18. Corollary. If the matrix $\Gamma_n$ of eq. (4.2) is nonsingular, then
$$\alpha_X(n) = \phi_{n,n} = \frac{\det \Gamma_n^*}{\det \Gamma_n},$$
where $\Gamma_n^* := [\Gamma_n(:,1), \dots, \Gamma_n(:,n-1), \gamma_n]$, i.e. $\Gamma_n$ with its last column replaced by the right-hand side $\gamma_n$.
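For illustration, the recursion (4.4a)-(4.4d) is straightforward to implement. The sketch below is a minimal Python version for the real-valued case (NumPy assumed; the function name durbin_levinson and the AR(1) check at the end are illustrative additions, not part of the text). It computes the coefficients $\phi_{n,j}$, the mean square errors $v_n$ and, by Theorem 4.17, the pacf values $\alpha_X(n) = \phi_{n,n}$ from a given autocovariance sequence.

```python
import numpy as np

def durbin_levinson(gamma):
    """Durbin-Levinson recursion (4.4a)-(4.4d), real-valued case.

    gamma : sequence [gamma(0), gamma(1), ..., gamma(n)] of autocovariances.
    Returns (phi, v, alpha) where
      phi[j-1] = phi_{n,j} for the largest order n = len(gamma) - 1,
      v[k]     = mean square one-step prediction error v_k,
      alpha[k-1] = phi_{k,k} = pacf value alpha_X(k), k = 1, ..., n.
    """
    gamma = np.asarray(gamma, dtype=float)
    n = len(gamma) - 1
    v = np.empty(n + 1)
    v[0] = gamma[0]                               # (4.4a)
    phi = np.zeros(0)                             # coefficients of the previous step
    alpha = np.empty(n)
    for k in range(1, n + 1):
        # (4.4b): phi_{k,k} = (gamma(k) - sum_{j=1}^{k-1} phi_{k-1,j} gamma(k-j)) / v_{k-1}
        phi_kk = (gamma[k] - np.dot(phi, gamma[k - 1:0:-1])) / v[k - 1]
        # (4.4d): phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}, j = 1, ..., k-1
        phi = np.concatenate([phi - phi_kk * phi[::-1], [phi_kk]])
        # (4.4c): update of the mean square error
        v[k] = v[k - 1] * (1.0 - phi_kk ** 2)
        alpha[k - 1] = phi_kk                     # Theorem 4.17: alpha_X(k) = phi_{k,k}
    return phi, v, alpha

if __name__ == "__main__":
    # Illustrative check with a causal AR(1), phi = 0.5, sigma^2 = 1:
    # gamma(h) = phi**h / (1 - phi**2), so the pacf should cut off after lag 1.
    phi0 = 0.5
    gamma = [phi0 ** h / (1 - phi0 ** 2) for h in range(6)]
    _, _, alpha = durbin_levinson(gamma)
    print(np.round(alpha, 6))   # approximately [0.5, 0, 0, 0, 0]
```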
4.19. Definition (ARMA process). A stochastic process $X = \{X_t \mid t \in \mathbb{Z}\}$ is called an ARMA process of order p, q ($0 \le p, q < \infty$), and we write $X \sim \mathrm{ARMA}(p, q)$, if
$$X_t = \underbrace{\phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p}}_{\text{autoregressive (AR) component}} + \underbrace{Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}}_{\text{moving average (MA) component}}, \tag{4.5a}$$
where $Z := \{Z_t \mid t \in \mathbb{Z}\} \sim \mathrm{WN}(0, \sigma^2)$ and $\phi_p \ne 0$, $\theta_q \ne 0$, $\sigma \ne 0$. Rewriting (4.5a) into the equivalent form
$$X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}, \tag{4.5b}$$
a short form may be used,
$$\phi(B) X_t = \theta(B) Z_t \quad \text{or} \quad \phi(z) X(z) = \theta(z) Z(z), \tag{4.5c}$$
where, with $\phi_0 = \theta_0 = 1$,
$$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q, \qquad z \in \mathbb{C}.$$

4.20. Remark. In the preceding definition we assumed $\theta_0 = 1$ without loss of generality, because otherwise it would be sufficient to replace the original white noise by the modified one $\{\theta_0 Z_t\} \sim \mathrm{WN}(0, (\theta_0 \sigma)^2)$ and the original $\theta_i$ by $\theta_i/\theta_0$ for $i = 1, \dots, q$.

4.21. Remark (Special cases). a) Autoregressive process (AR process): $X \sim \mathrm{AR}(p) = \mathrm{ARMA}(p, 0)$: $\phi(B) X_t = Z_t$ because $\theta(z) \equiv 1$. Then
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t. \tag{4.5d}$$
We admit $p = \infty$ provided that $\phi := \{\phi_j\}_{j=1}^{\infty} \in \ell^1$.

b) Moving average process (MA process): $X \sim \mathrm{MA}(q) = \mathrm{ARMA}(0, q)$: $X_t = \theta(B) Z_t$ because $\phi(z) \equiv 1$. Then
$$X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}. \tag{4.5e}$$
We admit $q = \infty$ provided that $\theta := \{\theta_j\}_{j=1}^{\infty} \in \ell^1$.

c) White noise: white noise is the only process which is both an AR and an MA process: $X \sim \mathrm{ARMA}(0, 0) = \mathrm{AR}(0) = \mathrm{MA}(0) = \mathrm{WN}(0, \sigma^2)$: $X_t = Z_t$.

d) General ARMA process: $X \sim \mathrm{ARMA}(p, q)$, $0 < p, q < \infty$: a true mixture of autoregressive and moving average components.

4.22. Definition. $X = \{X_t \mid t \in \mathbb{Z}\}$, $X \sim \mathrm{ARMA}(p, q)$, is called a causal ARMA process if there exists $\psi = \{\psi_j\}_{j=0}^{\infty}$ with $\sum_{j=0}^{\infty} |\psi_j| < \infty$ (i.e. $\psi \in \ell^1$) such that
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} \quad (\text{or in short form } X_t = \psi(B) Z_t), \qquad t \in \mathbb{Z}. \tag{4.6a}$$
X is called an invertible ARMA process if there exists $\pi = \{\pi_j\}_{j=0}^{\infty}$ with $\sum_{j=0}^{\infty} |\pi_j| < \infty$ (i.e. $\pi \in \ell^1$) such that
$$\sum_{j=0}^{\infty} \pi_j X_{t-j} = Z_t \quad (\text{or in short form } \pi(B) X_t = Z_t), \qquad t \in \mathbb{Z}. \tag{4.6b}$$

4.23. Remark. Consequently, causality in this context says that the ARMA process X is also a time series generated by the white noise $\{Z_t\}$ in the sense of Remark 4.5(1), or $X \sim \mathrm{MA}(\infty)$ in our notation. On the other hand, invertibility means that the white noise $\{Z_t\}$ itself may be generated by the given ARMA process X, which is equivalent to $X \sim \mathrm{AR}(\infty)$. Above we assumed $\psi_0 = \pi_0 = 1$ again, which will be confirmed in Section 4.32 later on; there the main issue will be the computation of the causal and invertible representations of an ARMA process.

4.24. Theorem (Autocovariance function of an MA process). $\{X_t\} \sim \mathrm{MA}(q)$, $q \le \infty$, is a stationary process having zero mean $\mu_X = 0$ and autocovariance function
$$\gamma_X(h) = \sigma^2 \sum_{k=0}^{q} \theta_{h+k}\,\overline{\theta}_k \quad \text{for } h \ge 0. \tag{4.7a}$$
Hence for $q < \infty$ we get
$$\gamma_X(h) = \begin{cases} \sigma^2 \sum_{k=0}^{q-h} \theta_{h+k}\,\overline{\theta}_k & \text{for } 0 \le h \le q, \\ 0 & \text{for } h > q, \end{cases} \tag{4.7b}$$
and in particular $\gamma_X(q) = \sigma^2 \theta_q \ne 0$ in view of $\theta_0 = 1$.

Proof. By Corollary 4.4, $\{X_t\}$ is stationary and it holds that
$$\mu_X = \mu_Z \sum_{j=0}^{q} \theta_j = 0, \quad \text{because } \mu_Z = 0,$$
$$\gamma_X(h) = \sum_{j,k=0}^{q} \theta_j \overline{\theta}_k\, \gamma_Z(h - j + k), \quad \text{where } \gamma_Z(h) = \begin{cases} \sigma^2 & \text{for } h = 0, \\ 0 & \text{for } h \ne 0. \end{cases}$$
The only nonzero terms in the sum are those with $h - j + k = 0$, i.e. with $j = h + k$, which yields (4.7a):
$$\gamma_X(h) = \sum_{k=0}^{q} \theta_{h+k}\,\overline{\theta}_k \underbrace{\gamma_Z(0)}_{\sigma^2}.$$
With $q < \infty$ we have $\theta_{h+k} = 0$ for $h + k > q$, or equivalently for $k > q - h$, which allows us to rewrite (4.7a) as (4.7b).

4.25. Corollary.
$$\sigma_X^2 = \gamma_X(0) = \sigma^2 \sum_{k=0}^{q} |\theta_k|^2,$$
$$\sigma_X^2 = \sigma^2 \bigl(1 + |\theta_1|^2 + |\theta_2|^2 + \cdots + |\theta_q|^2\bigr) \quad \text{for } q < \infty, \tag{4.8a}$$
$$\rho_X(h) = \frac{\sum_{k=0}^{q-h} \theta_{h+k}\,\overline{\theta}_k}{\sum_{k=0}^{q} |\theta_k|^2} \quad \text{for } h \ge 0. \tag{4.8b}$$

4.26. Theorem (Pacf of a causal AR process). Let $\{X_t\} \sim \mathrm{AR}(p)$, $p < \infty$, be a causal AR process. Then $\{X_t\}$ is stationary with zero mean $\mu_X = 0$ and partial autocorrelation function $\alpha_X$ satisfying $\alpha_X(p) = \phi_p \ne 0$ and $\alpha_X(h) = 0$ for $h > p$. Moreover, $\hat{X}_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} = P_{\mathcal{L}(X_{t-1}, \dots, X_{t-p})} X_t$, where $\phi_1, \dots, \phi_p$ are precisely the 1-step best linear prediction coefficients.

Proof. In view of 4.23 and due to causality, $\{X_t\} \sim \mathrm{MA}(\infty)$ is zero-mean stationary by 4.24. By (4.6a), $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ and consequently $X_t \in \overline{\mathcal{L}}(Z_t, Z_{t-1}, \dots) =: \mathcal{L}_t$ (the closure of the linear subspace in $L^2(\Omega, \mathcal{A}, P)$ spanned by the random variables $Z_u$, $u \le t$). Putting $\phi_j = 0$ for $j > p$, we can write for each $n \ge p$, in view of (4.5d),
$$X_t = \underbrace{\sum_{j=1}^{n} \phi_j X_{t-j}}_{\in\, \mathcal{L}_{t-1}} + Z_t.$$
The random variables $Z_t$ are uncorrelated: $Z_t \perp Z_u$ for $t \ne u$. In particular $Z_t \perp Z_u$ for $u < t$, and thus $Z_t \perp \mathcal{L}_{t-1}$ by the continuity and bilinearity of the inner product in $L^2(\Omega, \mathcal{A}, P)$. Applying the orthogonal projection theorem, we get $\hat{X}_t = \sum_{j=1}^{n} \phi_j X_{t-j}$ as the unique best linear prediction of $X_t$ in terms of $X_{t-1}, \dots, X_{t-n}$ for every $n \ge p$. By Theorem 4.17 it holds that $\alpha_X(p) = \phi_p \ne 0$ and $\alpha_X(n) = \phi_n = 0$ for $n > p$.
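For illustration, the finite-order formula (4.7b), together with (4.8b), translates directly into a short routine. The following Python sketch assumes NumPy and real-valued coefficients (so the conjugates in (4.7b) can be dropped); the function name ma_autocovariance is illustrative only, and the example uses the MA(2) parameters of the simulation in Figure 4.2 below.

```python
import numpy as np

def ma_autocovariance(theta, sigma2, max_lag):
    """Autocovariance gamma_X(h), h = 0, ..., max_lag, of a real-valued MA(q) process
    X_t = Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}, Z ~ WN(0, sigma2),
    following (4.7b) with theta_0 = 1.
    """
    theta = np.concatenate(([1.0], np.asarray(theta, dtype=float)))   # prepend theta_0 = 1
    q = len(theta) - 1
    gamma = np.zeros(max_lag + 1)                                     # gamma(h) = 0 for h > q
    for h in range(min(q, max_lag) + 1):
        # gamma(h) = sigma2 * sum_{k=0}^{q-h} theta_{h+k} theta_k
        gamma[h] = sigma2 * np.dot(theta[h:], theta[:q - h + 1])
    return gamma

if __name__ == "__main__":
    # MA(2) with theta = (-0.5, -0.2), sigma^2 = 2.25 (the parameters of Figure 4.2).
    gamma = ma_autocovariance([-0.5, -0.2], 2.25, max_lag=5)
    rho = gamma / gamma[0]          # autocorrelation, eq. (4.8b)
    print(np.round(rho, 4))         # rho(h) = 0 for h > 2, as Theorem 4.24 predicts
```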
Figures 4.1, 4.2 and 4.3 show the typical behaviour of estimated autocorrelation and partial autocorrelation functions of simulated AR(2), MA(2) and ARMA(2, 2) processes, respectively. The dashed band stands for the appropriate pointwise confidence interval containing zero with probability 0.95. We see that the AR(2) process in Fig. 4.1 and the MA(2) process in Fig. 4.2 exhibit $\alpha_X(h) \approx 0$ and $\rho_X(h) \approx 0$, respectively, for $h > 2$, in accordance with Theorems 4.26 and 4.24. Otherwise the envelope of $\rho_X(h)$ or $\alpha_X(h)$ (for the ARMA(2, 2) process in Fig. 4.3 both of them) exhibits exponential decay, possibly combined with oscillatory behaviour. This is because one can show that in such cases both $\rho_X$ and $\alpha_X$ may be expressed as linear combinations of decreasing geometric sequences and/or cosine waves with geometrically decreasing amplitudes.

Figure 4.1. AR(2): $\phi = [0.5, 0.2]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

Figure 4.2. MA(2): $\theta = [-0.5, -0.2]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

Figure 4.3. ARMA(2, 2): $\phi = [0.5, 0.2]$, $\theta = [-0.6, 0.3]$, $\sigma^2 = 2.25$. [Panels: sample path (n = 500) with the mean estimate; sample autocorrelation function; sample partial autocorrelation function.]

References

[BD93] Peter J. Brockwell and Richard A. Davis, Time Series: Theory and Methods, 2nd ed., Springer-Verlag, New York, 1991 (corrected 2nd printing, 1993).