TIME SERIES I

VÍTĚZSLAV VESELÝ
Department of Statistics and Operations Research, University of Malta¹
Email: vvese01@um.edu.mt

CONTENTS
• General notation and abbreviations
• Time series as a special case of a random process — definition, examples
• Consistent system of distribution functions
• Moment functions (mean, autocovariance and autocorrelation function), their properties and estimates
• Strict and weak stationarity, white noise
• Time-series models (overview): additive and multiplicative decomposition, Box-Jenkins methodology
• Transformations stabilizing variance (Box-Cox and power transformation)
• Identification of periodic components — periodogram, Fisher's and Siegel's testing statistics
• Using regression in time series decomposition and prediction, and other related techniques
• Detrending: linear filtration (moving average) & stepwise linear regression, exponential weighting and other related techniques
• Randomness tests

Date: May 20, 2002.
¹Visiting lecturer, 2nd semester, from 4th February to 12th July 2002.

1. General notation and abbreviations

$s := v$ or $v =: s$ ... denoting expression $v$ by symbol $s$.
iff stands for if and only if.

Sets and mappings:
• $\mathbb{N}, \mathbb{Z}, \mathbb{R}, \mathbb{C}$ ... natural numbers, integers, real and complex numbers, respectively.
• $\mathbb{Z}_N := \{0, 1, \dots, N-1\}$ ... residuals modulo $N \in \mathbb{N}$.
• $\mathbb{R}_+$ ... the set of all non-negative real numbers.
• $\exp X$ ... class of all subsets of the set $X$.
• $\mathrm{card}\, M$ ... cardinality of a set $M$.
• $(\cdot)_+ : \mathbb{R} \to \mathbb{R}$ ... mapping defined by $(x)_+ = \max(0, x)$.
• $(a,b), [a,b], (a,b], [a,b)$ ... intervals on the real line; $(a,b) = \{x \mid \min(a,b) < x < \max(a,b)\}$, $[a,b] = \{x \mid \min(a,b) \le x \le \max(a,b)\}$.
• $f(A) := \{y \in Y \mid y = f(x),\ x \in A \subseteq X\}$ ... range (image) of set $A$ under mapping $f : X \to Y$.
• $f^{-1}(B) := \{x \in X \mid f(x) \in B\} \subseteq X$ ... inverse image of set $B \subseteq Y$.
• $I_A$ ... indicator function of set $A \subseteq X$: $I_A(x) = 1$ for $x \in A$ and $I_A(x) = 0$ otherwise.
• $A_n \uparrow$ ... increasing or non-decreasing sequence of numbers or sets.
• $A_n \downarrow$ ... decreasing or non-increasing sequence of numbers or sets.
• $\sum_{i=1}^{\infty} A_i := \bigcup_{i=1}^{\infty} A_i$ ... union of a family of sets which are pairwise disjoint.
• $A^c := X - A$ ... complement of set $A \subseteq X$ in $X$, where $X$ is known a priori from the context.
• $\underline{A} := \liminf_{n\to\infty} A_n := \bigcup_{n=1}^{\infty} \bigcap_{j=n}^{\infty} A_j$ ... inferior limit of a sequence of sets.
• $\overline{A} := \limsup_{n\to\infty} A_n := \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j$ ... superior limit of a sequence of sets.
• $A = \lim_{n\to\infty} A_n$ iff $\underline{A} = \overline{A} = A$; clearly $A_n \uparrow$ implies $\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n$ and $A_n \downarrow$ implies $\lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n$.

Vectors and matrices:
• $\boldsymbol{x} := [x_1, \dots, x_n]^T$ ... vector of numbers (by default a column vector if not stated otherwise).
• $\boldsymbol{x} + h := [x_1 + h, \dots, x_n + h]^T$, $h \in \mathbb{C}$.
• $\boldsymbol{x_t} := [x_{t_1}, \dots, x_{t_k}]^T$ where $\boldsymbol{t} = [t_1, \dots, t_k]^T$ with $t_i \in \{1, \dots, n\}$ for $i = 1, \dots, k$ ... subvector selected at indices $t_1, \dots, t_k$.
• $\boldsymbol{x}_{(i)} := [x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n]^T$ for any $1 \le i \le n$ ... vector with the $i$-th entry removed.
• $f(\boldsymbol{x}) := f(x_1, \dots, x_n)$, $d\boldsymbol{x} := dx_1 \cdots dx_n$.
• $\boldsymbol{0}, \boldsymbol{0}_{n\times 1}$ ... vector of $n$ zero entries.
• $A, A_{m\times n} := [a_{ij}] = [A(i,j)]$ ... matrix of size $m \times n$.
• $\mathcal{R}(A) := \{\boldsymbol{y} \mid \boldsymbol{y} = A\boldsymbol{x}\}$ ... range space of matrix operator $A$.
• $\mathcal{N}(A) := \{\boldsymbol{x} \mid A\boldsymbol{x} = \boldsymbol{0}\}$ ... null space (kernel) of matrix operator $A$.
• $A^T := [a_{ji}]$ ... matrix transpose.
• $A^* := [\bar{a}_{ji}]$ ... matrix adjoint.
• $I, I_n := I_{n\times n} = [\delta_{ij}]$ ... identity matrix of order $n$.
• $\det A$ ... determinant of a square matrix $A$.
• $0, 0_{m\times n}$ ... zero matrix of size $m \times n$.
• $\mathrm{diag}(\boldsymbol{x}) := \begin{bmatrix} x_1 & 0 & \dots & 0 \\ 0 & x_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & x_n \end{bmatrix}$ ... diagonal matrix.
• $A(i,:) := [a_{i1}, \dots, a_{in}]$ ... $i$-th row of matrix $A$ using MATLAB style.
• $A(:,j) := [a_{1j}, \dots, a_{mj}]^T$ ... $j$-th column of matrix $A$ using MATLAB style.
• $A := [\boldsymbol{r}_1; \dots; \boldsymbol{r}_m] = [\boldsymbol{s}_1, \dots, \boldsymbol{s}_n]$ ... forming matrix $A$ row-by-row or column-wise using MATLAB style.
• $A > 0$ (or $A \ge 0$) ... positively (semi)definite (non-negatively definite) matrix.
• $\langle \boldsymbol{x}, \boldsymbol{y} \rangle := \sum_{i=1}^{n} x_i \bar{y}_i = \boldsymbol{y}^* \boldsymbol{x}$ ... scalar (inner) product of vectors $\boldsymbol{x}$ and $\boldsymbol{y}$.
• $\|\boldsymbol{x}\| := \sqrt{\sum_{i=1}^{n} |x_i|^2} = \sqrt{\langle \boldsymbol{x}, \boldsymbol{x} \rangle}$ ... Euclidean norm of vector $\boldsymbol{x}$.

Random variables and random vectors:
• $X$ ... random variable.
• $\boldsymbol{X} := [X_1, \dots, X_n]^T$ ... (real) random vector; the indexing conventions listed above for number vectors are adopted accordingly.
• $\mu := \mu_X := EX$ ... expectation of random variable $X$.
• $\boldsymbol{\mu} := \boldsymbol{\mu_X} := E\boldsymbol{X} := [EX_1, \dots, EX_n]^T$ ... expectation of random vector $\boldsymbol{X}$.
• $\sigma^2 := \sigma_X^2 := \mathrm{var}\,X := E|X - EX|^2 = E|X|^2 - |EX|^2 \ge 0$ ... variance of random variable $X$.
• $\sigma_{XY} := \mathrm{cov}(X, Y) := E(X - EX)(Y - EY) = EXY - (EX)(EY)$ ... covariance of random variables $X$ and $Y$.
• $\Sigma_{\boldsymbol{X}} := \mathrm{var}\,\boldsymbol{X} := [\mathrm{cov}(X_i, X_j)] = E(\boldsymbol{X} - E\boldsymbol{X})(\boldsymbol{X} - E\boldsymbol{X})^T = E\boldsymbol{X}\boldsymbol{X}^T - (E\boldsymbol{X})(E\boldsymbol{X})^T$ ... variance matrix of random vector $\boldsymbol{X}$.
• $\Sigma_{\boldsymbol{X}\boldsymbol{Y}} := \mathrm{cov}(\boldsymbol{X}, \boldsymbol{Y}) := [\mathrm{cov}(X_i, Y_j)] = E(\boldsymbol{X} - E\boldsymbol{X})(\boldsymbol{Y} - E\boldsymbol{Y})^T = E\boldsymbol{X}\boldsymbol{Y}^T - (E\boldsymbol{X})(E\boldsymbol{Y})^T$ ... covariance matrix of $\boldsymbol{X}$ and $\boldsymbol{Y}$.

It holds:
• $\mathrm{var}\,X = \mathrm{cov}(X, X)$.
• $\mathrm{cov}(Y, X) = \mathrm{cov}(X, Y)$.
• $\mathrm{cov}(\sum_r X_r, \sum_s Y_s) = \sum_r \sum_s \mathrm{cov}(X_r, Y_s)$ and hence in particular:
• $\mathrm{var}(X + Y) = \mathrm{var}\,X + \mathrm{cov}(X, Y) + \mathrm{cov}(Y, X) + \mathrm{var}\,Y = \mathrm{var}\,X + 2\,\mathrm{cov}(X, Y) + \mathrm{var}\,Y$.
• $\mathrm{cov}(\boldsymbol{X}, \boldsymbol{X}) = \mathrm{var}\,\boldsymbol{X}$.
• $\mathrm{cov}(\boldsymbol{Y}, \boldsymbol{X}) = \mathrm{cov}(\boldsymbol{X}, \boldsymbol{Y})^T$ implies:
• $\mathrm{var}\,\boldsymbol{X} = (\mathrm{var}\,\boldsymbol{X})^T$ ... the variance matrix of $\boldsymbol{X}$ is symmetric.
• Given number vectors $\boldsymbol{a}$ and $\boldsymbol{c}$, and matrices $B$ and $D$ of compatible sizes, then $\mathrm{cov}(\boldsymbol{a} + B\boldsymbol{X}, \boldsymbol{c} + D\boldsymbol{Y}) = \mathrm{cov}(B\boldsymbol{X}, D\boldsymbol{Y}) = B\,\mathrm{cov}(\boldsymbol{X}, \boldsymbol{Y})\,D^T$; with $D = B$, $\boldsymbol{Y} = \boldsymbol{X}$:
• $\mathrm{var}(\boldsymbol{a} + B\boldsymbol{X}) = \mathrm{cov}(\boldsymbol{a} + B\boldsymbol{X}, \boldsymbol{a} + B\boldsymbol{X}) = \mathrm{cov}(B\boldsymbol{X}, B\boldsymbol{X}) = B\,\mathrm{var}(\boldsymbol{X})\,B^T$; with $\boldsymbol{a} = \boldsymbol{0}$, $B = \boldsymbol{b}^T$:
• $0 \le \mathrm{var}(\boldsymbol{b}^T\boldsymbol{X}) = \boldsymbol{b}^T\,\mathrm{var}\,\boldsymbol{X}\,\boldsymbol{b}$ implies:
• $\mathrm{var}\,\boldsymbol{X} \ge 0$ ... the variance matrix is non-negatively definite; consequently it has non-negative eigenvalues $\lambda_i$ and its square root matrix $\Sigma_{\boldsymbol{X}}^{1/2}$, having eigenvalues $\lambda_i^{1/2}$, may be constructed such that $\Sigma_{\boldsymbol{X}} = \Sigma_{\boldsymbol{X}}^{1/2}\,\Sigma_{\boldsymbol{X}}^{1/2}$.
• $\mathrm{cov}(\sum_r \boldsymbol{X}_r, \sum_s \boldsymbol{Y}_s) = \sum_r \sum_s \mathrm{cov}(\boldsymbol{X}_r, \boldsymbol{Y}_s)$ and hence in particular:
• $\mathrm{var}(\boldsymbol{X} + \boldsymbol{Y}) = \mathrm{var}\,\boldsymbol{X} + \mathrm{cov}(\boldsymbol{X}, \boldsymbol{Y}) + \mathrm{cov}(\boldsymbol{Y}, \boldsymbol{X}) + \mathrm{var}\,\boldsymbol{Y}$.

2. Introduction

Definition 2.1. A random (stochastic) process $X$ is a nonempty family ($T \ne \emptyset$) of (real) random variables defined on the same probability space $(\Omega, \mathcal{A}, P)$. We write $X := \{X_t \mid t \in T\}$ or simply $\{X_t\}$.
Special cases:
$T \subseteq \mathbb{R}$ ... continuous-time process or random function.
$T \subseteq \mathbb{Z}$ ... discrete-time process, random sequence or time series.

Remark 2.2.
• The indexing set $T$ is usually ordered and interpreted as a continuous or discrete time interval. It may also be unordered, for example coordinates of points in the plane (meteorology) or in 3-D space (geophysics).
• As $X_t : \Omega \to \mathbb{R}$ is a (measurable) mapping for each $t \in T$, the stochastic process may be viewed as a mapping $X : \Omega \times T \to \mathbb{R}$ as well.

Definition 2.3. For fixed $\omega \in \Omega$ we get a function $x : T \to \mathbb{R}$ as an outcome of a random experiment: $x(t) := X_t(\omega)$. This function is called a sample path (trajectory, realization, observation) of $X$.

Remark 2.4. Figures 1.1–1.6 illustrate trajectories of various random processes (time series). Later on the term stochastic process refers to the general case with arbitrary $T$, in contrast with the term time series, which assumes $T = \mathbb{Z}$ or sometimes $T = \mathbb{N}$.

Definition 2.5. If $X = \{X_t\}$ is a time series where the $X_t$, $t \in T$, are all mutually independent and identically distributed with mean $\mu$ and variance $\sigma^2$, we write $X \sim \mathrm{IID}(\mu, \sigma^2)$.

Example 2.6 (Examples of time series).
(1) Sinusoid with random amplitude and phase (Fig. 1.1).
(2) Binary process of tossing a coin (cf. Fig. 1.4 as well).
(3) Random walk.
(4) Branching process.

Definition 2.7 (Consistent system of distribution functions of $X$).
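The identities $\mathrm{var}(\boldsymbol{a} + B\boldsymbol{X}) = B\,\mathrm{var}(\boldsymbol{X})\,B^T$ and $\mathrm{var}\,\boldsymbol{X} \ge 0$ (with the square root matrix $\Sigma_{\boldsymbol{X}}^{1/2}$) listed above can be checked numerically. The following is a minimal sketch, not part of the original text; it assumes NumPy, and the particular $\boldsymbol{a}$, $B$, variance matrix and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate N observations of a 3-dimensional random vector X with a known
# variance matrix, then check the identities from the notation section.
N = 200_000
Sigma_true = np.array([[2.0, 0.5, 0.3],
                       [0.5, 1.0, 0.2],
                       [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5], cov=Sigma_true, size=N)

def var_matrix(X):
    """Sample variance matrix var X = E(X - EX)(X - EX)^T (factor 1/N)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / len(X)

Sigma_X = var_matrix(X)

# var(a + BX) = B var(X) B^T   (a and B are arbitrary illustrative choices)
a = np.array([3.0, -1.0])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
Y = a + X @ B.T                        # rows are realizations of a + BX
print(np.allclose(var_matrix(Y), B @ Sigma_X @ B.T))   # True

# var X >= 0: eigenvalues are non-negative, so a square root matrix exists
lam, U = np.linalg.eigh(Sigma_X)
print((lam >= -1e-12).all())           # True (up to rounding)
Sigma_half = U @ np.diag(np.sqrt(lam.clip(min=0))) @ U.T
print(np.allclose(Sigma_half @ Sigma_half, Sigma_X))    # True
```

Note that the first identity holds exactly for the sample variance matrix (it is pure linear algebra), so the check does not depend on the simulation being long.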
Let us denote $\mathcal{T} := \{\boldsymbol{t} \mid \boldsymbol{t} = [t_1, t_2, \dots, t_n]^T \in T^n,\ t_i \ne t_j \text{ for } i \ne j,\ n \in \mathbb{N}\}$. For each $\boldsymbol{t} \in \mathcal{T}$ of any size $n \in \mathbb{N}$ let $F_{\boldsymbol{t}}(\boldsymbol{x})$ be the joint distribution function of the marginal random vector $\boldsymbol{X_t} = [X_{t_1}, \dots, X_{t_n}]^T$ selected from the stochastic process $X = \{X_t\}_{t \in T}$ at time instants $t_1, t_2, \dots, t_n$. The system $\{F_{\boldsymbol{t}}\}_{\boldsymbol{t} \in \mathcal{T}}$ describes completely the stochastic behaviour of $X$ and is called the consistent system of distribution functions of $X$ (cf. the next theorem).

Theorem 2.8. The system $\{F_{\boldsymbol{t}}\}_{\boldsymbol{t} \in \mathcal{T}}$ of definition 2.7 is called consistent because the following two consistency conditions hold for each $\boldsymbol{x} = [x_1, x_2, \dots, x_n]^T \in \mathbb{R}^n$ and $n \in \mathbb{N}$:
(i) $F_{\boldsymbol{t}_p}(\boldsymbol{x}_p) = F_{\boldsymbol{t}}(\boldsymbol{x})$ for any permutation $p$ of the indices $\{1, 2, \dots, n\}$.
(ii) $\lim_{x_i \to \infty} F_{\boldsymbol{t}}(\boldsymbol{x}) = F_{\boldsymbol{t}}(x_1, \dots, x_{i-1}, \infty, x_{i+1}, \dots, x_n) =: F_{\boldsymbol{t}_{(i)}}(\boldsymbol{x}_{(i)})$ for any $i \in \{1, 2, \dots, n\}$.

Theorem 2.9 (Kolmogorov's theorem). Given $T$ and $\mathcal{T}$ as of definition 2.7, let $\mathcal{F} := \{F_{\boldsymbol{t}}\}_{\boldsymbol{t} \in \mathcal{T}}$ be a consistent system of distribution functions. Then there exists a stochastic process $\{X_t\}_{t \in T}$ defined on a suitable probability space $(\Omega, \mathcal{A}, P)$ such that $\mathcal{F}$ is its system of distribution functions.

Remark 2.10. Conditions (i) and (ii) of theorem 2.8 can be replaced by equivalent conditions formulated in terms of the characteristic functions $\Phi_{\boldsymbol{t}}(\boldsymbol{u}) = E(\exp(i\boldsymbol{u}^T\boldsymbol{X_t})) = E(\exp(i\sum_{j=1}^{n} u_j X_{t_j}))$, $\boldsymbol{u} \in \mathbb{R}^n$, which are associated with the distribution functions $F_{\boldsymbol{t}}$:
(i') $\Phi_{\boldsymbol{t}_p}(\boldsymbol{u}_p) = \Phi_{\boldsymbol{t}}(\boldsymbol{u})$ for any permutation $p$ of the indices $\{1, 2, \dots, n\}$.
(ii') $\lim_{u_i \to 0} \Phi_{\boldsymbol{t}}(\boldsymbol{u}) = \Phi_{\boldsymbol{t}}(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) =: \Phi_{\boldsymbol{t}_{(i)}}(\boldsymbol{u}_{(i)})$ for any $i \in \{1, 2, \dots, n\}$.

Definition 2.11. We call a stochastic process normal or gaussian if every distribution function $F_{\boldsymbol{t}}$ of its consistent system ($\boldsymbol{t} \in \mathcal{T}$) is a joint distribution function of a normally distributed marginal random vector $\boldsymbol{X_t}$.

Now we are about to introduce moment functions as analogies of the expectation and variance matrix of a random vector, which may be considered as a special case of a stochastic process with finite index set $T = \{1, 2, \dots, n\}$.

Definition 2.12. Given stochastic processes $X = \{X_t\}_{t \in T}$ and $Y = \{Y_t\}_{t \in T}$, both on the same probability space, we define 1st and 2nd moment functions as follows.
(1) mean of $X$: $\mu_X : T \to \mathbb{R}$ by $\mu_X(t) := EX_t$, provided that the expectations exist for all $t \in T$.
(2) autocovariance function of $X$: $\gamma_X : T \times T \to \mathbb{R}$ by $\gamma_X(r, s) := \mathrm{cov}(X_r, X_s)$, provided that the covariances exist for all $r, s \in T$.
(3) variance of $X$: $\sigma_X^2 : T \to \mathbb{R}_+$ by $\sigma_X^2(t) := \mathrm{cov}(X_t, X_t) = \gamma_X(t, t)$, provided the variances exist for all $t \in T$.
(4) autocorrelation function of $X$: $\rho_X : T \times T \to [-1, 1]$ by $\rho_X(r, s) := \frac{\gamma_X(r, s)}{\sqrt{\gamma_X(r, r)}\sqrt{\gamma_X(s, s)}}$ if $\gamma_X(r, r)\gamma_X(s, s) \ne 0$, and $\rho_X(r, s) := 0$ otherwise, provided that the correlations exist for all $r, s \in T$.
(5) cross-covariance function of $X$ and $Y$: $\gamma_{XY} : T \times T \to \mathbb{R}$ by $\gamma_{XY}(r, s) := \mathrm{cov}(X_r, Y_s)$, provided that the covariances exist for all $r, s \in T$.
(6) cross-correlation function of $X$ and $Y$: $\rho_{XY} : T \times T \to [-1, 1]$ by $\rho_{XY}(r, s) := \frac{\gamma_{XY}(r, s)}{\sqrt{\gamma_X(r, r)}\sqrt{\gamma_Y(s, s)}}$ if $\gamma_X(r, r)\gamma_Y(s, s) \ne 0$, and $\rho_{XY}(r, s) := 0$ otherwise, provided that the correlations exist for all $r, s \in T$.

Theorem 2.13. Given a stochastic process $X := \{X_t \mid t \in T\}$ such that $E|X_t|^2 < \infty$ for all $t \in T$, then $\mu_X(\cdot)$, $\gamma_X(\cdot, \cdot)$ and $\rho_X(\cdot, \cdot)$ exist as well. In such a case we say that $X$ has finite second moments.

Definition 2.14. A time series $X := \{X_t \mid t \in \mathbb{Z}\}$ is called strictly stationary if each distribution function from its consistent system $\{F_{\boldsymbol{t}}\}_{\boldsymbol{t} \in \mathcal{T}}$ is shift-invariant (time-invariant): $F_{\boldsymbol{t}}(\cdot) = F_{\boldsymbol{t}+h}(\cdot)$ for each $\boldsymbol{t} \in \mathcal{T}$ and $h \in \mathbb{Z}$.

Definition 2.15. A time series $X := \{X_t \mid t \in \mathbb{Z}\}$ is called (weakly) stationary if the following three conditions are fulfilled:
(1) $X$ has finite second moments.
(2) $\gamma_X(r, s) = \gamma_X(r + h, s + h)$ for each $r, s, h \in \mathbb{Z}$.
(3) $\mu_X(\cdot) = \mu_X$ is a constant function.
If only the first two conditions are valid, then $X$ is called covariance stationary.

Remark 2.16.
(1) Clearly (2) implies, with $r = s$, that the variance function of a stationary time series is a constant function as well: $\sigma_X^2(\cdot) = \sigma_X^2$.
(2) If (3) holds, then $\gamma_X(r, s) = EX_rX_s - (EX_r)(EX_s) = EX_rX_s - \mu_X^2$ implies that (2) is equivalent with (and might thus be substituted by) the condition $EX_rX_s = EX_{r+h}X_{s+h}$ for each $r, s, h \in \mathbb{Z}$. We see altogether that under weak stationarity all first and second moments are shift-invariant. That is why weak stationarity is sometimes called 2nd order stationarity.

Remark 2.17. Clearly condition (2) of definition 2.15 may be substituted by the modified condition
(2') $\gamma_X(r, s)$ depends only on the difference of arguments $r - s$.
That is why we can introduce the autocovariance and autocorrelation function of a stationary time series as a function of one argument only:
$$\gamma_X(h) := \gamma_X(t + h, t), \qquad \rho_X(h) := \rho_X(t + h, t) = \frac{\gamma_X(t + h, t)}{\sigma_X^2} = \frac{\gamma_X(h)}{\gamma_X(0)}, \qquad (2.1)$$
where $\sigma_X^2 = \gamma_X(t, t) = \gamma_X(0)$ and $t, h \in \mathbb{Z}$ are arbitrary.

Theorem 2.18. Every strictly stationary time series with finite second moments is stationary.

Example 2.19. In general stationarity does not imply strict stationarity (counterexample).

Theorem 2.20. Every stationary gaussian time series is strictly stationary.

Definition 2.21. A time series $X = \{X_t\}$ is called white noise with mean $\mu$ and variance $\sigma^2$ if $\mu_X(t) = \mu$ and
$$\gamma_X(r, s) = \begin{cases} \sigma^2 & \text{for } r = s \\ 0 & \text{otherwise.} \end{cases}$$
We write $X \sim \mathrm{WN}(\mu, \sigma^2)$. A stationary time series which is not white noise is sometimes called coloured noise.

Remark 2.22. It is straightforward to verify the following implications:
$X \sim \mathrm{IID}(\mu, \sigma^2) \Rightarrow X \sim \mathrm{WN}(\mu, \sigma^2) \Rightarrow X$ is stationary.
Observe that neither of the reverse implications holds in general (cf. example 2.19).

Example 2.23.
(1) Let $X_t(\omega) := A(\omega)\cos(\theta t) + B(\omega)\sin(\theta t)$, $t \in \mathbb{Z}$, $\theta \in [-\pi, \pi]$, $\mathrm{cov}(A, B) = 0$, $EA = EB = 0$, $\sigma_A^2 = \sigma_B^2 = 1$. Then $\{X_t\}$ is a stationary time series.
(2) Let $X_t := Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, $t \in \mathbb{Z}$, $\theta \in \mathbb{R}$. Then $\{X_t\}$ is a stationary time series (a simulation sketch follows below).
(3) Let $X_t := Y_t$ for even $t$ and $X_t := Y_t + 1$ for odd $t$, $t \in \mathbb{Z}$, where $\{Y_t\}$ is a stationary time series. Then $\{X_t\}$ is a time series which is covariance stationary but not stationary.
(4) The random walk $\{S_t\}_{t \in \mathbb{Z}}$ from example 2.6(3) is neither stationary nor covariance stationary.
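For the moving-average series of example 2.23(2), direct computation from the white-noise properties of $\{Z_t\}$ gives $\mu_X = 0$, $\gamma_X(0) = \sigma^2(1 + \theta^2)$, $\gamma_X(\pm 1) = \theta\sigma^2$ and $\gamma_X(h) = 0$ for $|h| > 1$, independently of $t$. The following minimal simulation sketch (not part of the original text) compares sample moments with these values; it assumes NumPy, and $\sigma$, $\theta$ and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# MA(1) series from example 2.23(2): X_t = Z_t + theta * Z_{t-1},
# with {Z_t} ~ WN(0, sigma^2).  sigma, theta, n are illustrative choices.
sigma, theta, n = 1.5, 0.6, 100_000
Z = rng.normal(0.0, sigma, size=n + 1)
X = Z[1:] + theta * Z[:-1]

def acov(x, h):
    """Sample autocovariance (1/n) * sum_{j} (x_{j+h} - mean)(x_j - mean)."""
    xc = x - x.mean()
    return (xc[h:] * xc[:len(x) - h]).sum() / len(x)

# Theoretical values for the MA(1) process
gamma0 = sigma**2 * (1 + theta**2)    # gamma_X(0)
gamma1 = theta * sigma**2             # gamma_X(1)

print(round(acov(X, 0), 3), round(gamma0, 3))   # close to each other
print(round(acov(X, 1), 3), round(gamma1, 3))   # close to each other
print(round(acov(X, 5), 3))                     # close to 0 for |h| > 1
```

Since the theoretical autocovariance depends only on the lag $h$ and the mean is constant, this is consistent with the series being (weakly) stationary.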
Remark 2.24 (Multivariate Time Series). One can introduce the concept of an $m$-dimensional time series ($m \in \mathbb{N}$) following the analogy with the univariate case ($m = 1$): $\boldsymbol{X} := \{\boldsymbol{X}_t \mid t \in T\}$ where $\boldsymbol{X}_t = [X_{t,1}, \dots, X_{t,m}]^T$ are $m$-dimensional random vectors on the same probability space $(\Omega, \mathcal{A}, P)$. We obtain univariate partial time series, a vector mean function and matrix autocovariance/autocorrelation functions:
$X_i := \{X_{t,i} \mid t \in T\}$ ... $i$-th partial time series.
$\boldsymbol{\mu_X}(t) := [\mu_1(t), \dots, \mu_m(t)]^T$ where $\mu_i(t) := EX_{t,i} = \mu_{X_i}(t)$.
$\gamma_{\boldsymbol{X}}(r, s) := \Sigma_{\boldsymbol{X}}(r, s) := [\mathrm{cov}(X_{r,i}, X_{s,j})]_{i,j} = [\gamma_{ij}(r, s)]_{i,j}$ where $\gamma_{ij}(r, s) := \gamma_{X_iX_j}(r, s)$ is clearly just the cross-covariance function of the partial time series $X_i$ and $X_j$.
$\rho_{\boldsymbol{X}}(r, s) := [\rho(X_{r,i}, X_{s,j})]_{i,j} = [\rho_{ij}(r, s)]_{i,j}$ where $\rho_{ij}(r, s) := \rho_{X_iX_j}(r, s)$ is the cross-correlation function of the partial time series $X_i$ and $X_j$.
The $m$-dimensional stationarity is established quite in analogy to definition 2.15 (see also remarks 2.16 and 2.17), simply assuming finite second moments for all partial time series in (1) and substituting $\gamma_{\boldsymbol{X}}$ for $\gamma_X$ in (2) and $\boldsymbol{\mu_X}$ for $\mu_X$ in (3).
It is an easy exercise to prove the following statement: $\boldsymbol{X}$ is stationary iff the following two conditions are fulfilled:
(a) Each partial time series $X_i$ ($i = 1, \dots, m$) is stationary.
(b) $\gamma_{\boldsymbol{X}}(r, s) = \gamma_{\boldsymbol{X}}(r + h, s + h)$ for each $r, s, h \in \mathbb{Z}$.
Clearly the following relationships hold:
$$\rho_{ij}(r, s) = \frac{\gamma_{ij}(r, s)}{\sqrt{\gamma_i(r, r)}\sqrt{\gamma_j(s, s)}} \text{ in the general case}, \qquad \rho_{ij}(h) = \frac{\gamma_{ij}(h)}{\sqrt{\gamma_i(0)}\sqrt{\gamma_j(0)}} \text{ in the stationary case},$$
where $\gamma_i := \gamma_{ii}$ is the autocovariance function of the $i$-th partial time series.

Definition 2.25. A bivariate function $f : T \times T \to \mathbb{R}$, $T \ne \emptyset$, is said to be symmetric or non-negatively definite if each square matrix $[f(t_i, t_j)]_{i,j}$ of any size $n \in \mathbb{N}$ has the respective property for any choice of $\boldsymbol{t} := [t_1, \dots, t_n]^T \in T^n$, i.e. all such matrices are symmetric or non-negatively definite. A univariate function $g : \mathbb{Z} \to \mathbb{R}$ is said to be symmetric or non-negatively definite if the bivariate function $f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ defined by $f(r, s) := g(r - s)$ is symmetric or non-negatively definite.

Lemma 2.26. Let $f$ and $g$ be functions as of definition 2.25. Then $f$ (or $g$) is symmetric iff $f(s, r) = f(r, s)$ holds for any $r, s \in T$ (or $g(-t) = g(t)$ holds for any $t \in \mathbb{Z}$).

Lemma 2.27. The sum of two symmetric (or non-negatively definite) functions, which are both bivariate or both univariate and defined on the same domain, is a symmetric (or non-negatively definite) function as well.

Theorem 2.28 (Autocovariance and autocorrelation function properties). Let $X := \{X_t \mid t \in T\}$ be a stochastic process with autocovariance function $\gamma_X(\cdot, \cdot)$ [ autocorrelation function $\rho_X(\cdot, \cdot)$ ]. Then the following holds:
(1) $\gamma_X(t, t) \ge 0$ [ $\rho_X(t, t) = 1$ if $\gamma_X(t, t) = \sigma_X^2(t) \ne 0$, or $= 0$ otherwise ] for all $t \in T$.
(2) $|\gamma_X(r, s)| \le \sqrt{\gamma_X(r, r)}\sqrt{\gamma_X(s, s)}$ [ $|\rho_X(r, s)| \le 1$ ] for all $r, s \in T$.
(3) $\gamma_X$ [ $\rho_X$ ] is a symmetric and non-negatively definite function.

Corollary 2.29 (for stationary time series). Let $X := \{X_t \mid t \in \mathbb{Z}\}$ be a stationary time series with autocovariance function $\gamma_X(\cdot)$ [ autocorrelation function $\rho_X(\cdot)$ ]. Then the following holds:
(1') $\gamma_X(0) \ge 0$ [ $\rho_X(0) = 1$ if $\gamma_X(0) = \sigma_X^2 \ne 0$, or $= 0$ otherwise ].
(2') $|\gamma_X(h)| \le \gamma_X(0)$ [ $|\rho_X(h)| \le 1$ ] for all $h \in \mathbb{Z}$.
(3') $\gamma_X$ [ $\rho_X$ ] is a symmetric and non-negatively definite function.

Theorem 2.30. Given any bivariate function $\gamma(\cdot, \cdot) : T \times T \to \mathbb{R}$ (or univariate function $\gamma(\cdot) : \mathbb{Z} \to \mathbb{R}$) which is symmetric and non-negatively definite, there exists a gaussian stochastic process (or stationary gaussian time series) $X$ having autocovariance function $\gamma_X = \gamma$.

Corollary 2.31. Properties (1) and (2) (or (1') and (2')) are a direct consequence of property (3) (or (3')).

Corollary 2.32. Given two stochastic processes (stationary time series) $X$ and $Y$ with autocovariance functions $\gamma_X$ and $\gamma_Y$, there exists a stochastic process (stationary time series) $Z$, even gaussian, such that $\gamma_Z = \gamma_X + \gamma_Y$.

Theorem 2.33. The function
$$\rho(h) = \begin{cases} 1 & \text{for } h = 0 \\ r & \text{for } h = \pm 1 \\ 0 & \text{for } |h| > 1 \end{cases}$$
can be an autocorrelation function of a suitable stationary time series $X$ iff $|r| \le \frac{1}{2}$. In such a case one possible choice is the time series $X$ of example (2) in 2.23, $X_t := Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, $t \in \mathbb{Z}$, with $\theta = \frac{1 \pm \sqrt{1 - 4r^2}}{2r}$.

Definition 2.34 (Estimates of moment functions). Let $\boldsymbol{x} = [x_1, \dots, x_n]^T$ be $n$ samples ($x_t = X_t(\omega)$ for $t = 1, \dots, n$) of a stationary time series with mean $\mu$, variance $\sigma^2$, autocovariance function $\gamma(\cdot)$ and autocorrelation function $\rho(\cdot)$. Their estimates are computed as follows:
$\hat{\mu} := \bar{x} := \frac{1}{n}\sum_{j=1}^{n} x_j$ ... estimate of $\mu$;
$\hat{\gamma}(h) := \frac{1}{n}\sum_{j=1}^{n-h}(x_{j+h} - \hat{\mu})(x_j - \hat{\mu})$, $0 \le h \le n - 1$, and $\hat{\gamma}(-h) := \hat{\gamma}(h)$ ... estimate of $\gamma(h)$;
$\hat{\rho}(h) := \hat{\gamma}(h)/\hat{\gamma}(0)$ ... estimate of $\rho(h)$.

The following remarks concern these estimates:
(1) The estimate $\hat{\gamma}(h)$ is asymptotically unbiased: $E\hat{\gamma}(h) \to \gamma(h)$ with $n \to \infty$. Moreover it is consistent in the quadratic mean in the sense that $E|\hat{\gamma}(h) - \gamma(h)|^2 \to 0$ with $n \to \infty$, where the convergence is even faster than with the unbiased estimate (the one using the factor $\frac{1}{n-h}$ instead of $\frac{1}{n}$).
(2) The estimate is reliable only for $n \ge 50$ and $h \le n/4$.
(3) From the algebraic point of view $\hat{\gamma}(h)$ may be written in the form of a dot product $\hat{\gamma}(h) = \frac{1}{n}\langle \boldsymbol{x}_0, \boldsymbol{x}_h \rangle$ where $\boldsymbol{x}_h := [\underbrace{0, \dots, 0}_{h}, x_1 - \hat{\mu}, \dots, x_n - \hat{\mu}, \underbrace{0, \dots, 0}_{n-1-h}]^T$. Thus $\boldsymbol{x}_0$ represents the original centered sample vector (padded with $n - 1$ zeros) and $\boldsymbol{x}_h$ its copy shifted by $h$. Clearly $\|\boldsymbol{x}_0\|^2 = \|\boldsymbol{x}_h\|^2 = \sum_{j=1}^{n}|x_j - \hat{\mu}|^2$. From the Schwarz inequality we have $|\langle \boldsymbol{x}_0, \boldsymbol{x}_h \rangle| \le \|\boldsymbol{x}_0\|\,\|\boldsymbol{x}_h\| = \|\boldsymbol{x}_0\|^2$, resulting in $|\hat{\gamma}(h)| \le \frac{1}{n}\|\boldsymbol{x}_0\|^2 = \frac{1}{n}\langle \boldsymbol{x}_0, \boldsymbol{x}_0 \rangle = \hat{\gamma}(0)$. Hence we see that the estimate of the autocorrelation function preserves its natural property $|\hat{\rho}(h)| \le 1$.
(4) In view of (3) the estimate $\hat{\rho}(h)$ may be interpreted geometrically as the cosine of the angle between the original vector $\boldsymbol{x}_0$ and its shifted copy $\boldsymbol{x}_h$, which is a measure of their linear dependence (similarity): zero means orthogonality (full linear independence, no correlation between them), $\pm 1$ means linear dependence (full correlation: one of them is a scalar multiple of the other).
(5) A trend is indicated by correlations at large lags, implying a slow decay of $\hat{\gamma}(h)$ with $h \to \infty$. A periodic component is reflected by oscillatory behaviour of $\hat{\gamma}(h)$ with the basic period of that component, or by a mixture of such oscillations if there is more than one periodic component.
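Definition 2.34, theorem 2.33 and remark (3) above can be illustrated together in a short numerical sketch (not part of the original text): choose a target lag-one correlation $r$ with $|r| \le \frac{1}{2}$, build the MA(1) series with $\theta = \frac{1 - \sqrt{1 - 4r^2}}{2r}$, and check that $\hat{\rho}(1)$ computed by the estimates of definition 2.34 is close to $r$ and that $|\hat{\rho}(h)| \le 1$ for all lags. The sketch assumes NumPy; $r$, $\sigma$ and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def acf_estimates(x, max_lag):
    """Estimates of definition 2.34: mu_hat, gamma_hat(h), rho_hat(h), h = 0..max_lag."""
    n = len(x)
    mu_hat = x.mean()
    xc = x - mu_hat
    gamma_hat = np.array([(xc[h:] * xc[:n - h]).sum() / n for h in range(max_lag + 1)])
    rho_hat = gamma_hat / gamma_hat[0]
    return mu_hat, gamma_hat, rho_hat

# Target lag-one autocorrelation (|r| <= 1/2 as required by theorem 2.33)
r = 0.4
theta = (1 - np.sqrt(1 - 4 * r**2)) / (2 * r)   # one admissible root, here theta = 0.5

# MA(1) series of example 2.23(2) with this theta; sigma, n are arbitrary choices
sigma, n = 2.0, 200_000
Z = rng.normal(0.0, sigma, size=n + 1)
X = Z[1:] + theta * Z[:-1]

mu_hat, gamma_hat, rho_hat = acf_estimates(X, max_lag=50)
print(round(rho_hat[1], 3), r)              # rho_hat(1) should be close to r = 0.4
print(np.round(rho_hat[2:6], 3))            # close to 0 for |h| > 1
print(np.abs(rho_hat).max() <= 1.0)         # True, as guaranteed by remark (3)
```

The last check reflects remark (3): because each $\hat{\gamma}(h)$ is a scaled inner product of the zero-padded centered sample with its shifted copy, the Schwarz inequality forces $|\hat{\rho}(h)| \le 1$ for every realization, not merely on average.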