Bayesian Approach to Estimation: Natural Condition of Control and DSGE Models*

Martin Fukac†
Vladimir Havlena‡

October 2010

Abstract

This paper is written by authors from technical and economic fields, motivated to find a common language and common views on the problem of the optimal use of information in model estimation. The center of our interest is the natural condition of control, a common assumption in Bayesian estimation in the technical sciences. We discuss its validity in the estimation of dynamic stochastic general equilibrium (DSGE) models in economics. We argue that the models contain implicit control variables that have observable/measurable counterparts, and that these variables carry information that affects estimation efficiency and model adaptability. We illustrate our points on a basic RBC model.

Key words: natural condition of control, Bayesian estimation, DSGE model estimation
JEL: C11, C18

* PRELIMINARY VERSION. Please do not cite or quote. All usual disclaimers apply. This paper was prepared for the Conference in honor of Osvald Vašíček, held at Masaryk University in Brno, Czech Republic, 22-23 October 2010.
† Federal Reserve Bank of Kansas City, 1 Memorial Drive, Kansas City, Missouri 64198; e-mail: first.lastname@kc.frb.org.
‡ Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, 166 27 Prague 6, Czech Republic; e-mail: havlena@fel.cvut.cz

1 Introduction

Since the seminal paper by Peterka (1981), it has been well understood in the technical sciences that on the way from the Bayesian formula to the standard recursive least squares method for ARX model estimation, or to Kalman filter estimation, several assumptions about the information contained in observed input and output variables have to be adopted. While such assumptions are well justified and easy to interpret in technical applications like LQG observer-based state feedback or adaptive control, they may be violated in other areas such as economics.
Our attention is focused on the natural condition of control (henceforth NCC or the condition). In this way, we would like to stimulate the discussion on the proper use of the information available to econometricians and on the adaptation of theoretical model concepts to a particular estimation algorithm. We review the development of model estimation from a conceptual Bayesian solution, which results in a generic functional recursion on conditional probability density functions (c.p.d.f.), to the famous Kalman filter equations, and demonstrate the loss of optimality when the assumptions used to develop the standard Kalman filter are not satisfied.

The natural condition of control is an assumption made in the control system literature. This mathematical assumption significantly simplifies the algorithm for the optimal estimation of unknown variables like parameters or state (latent) variables. The condition says that if an external observer (econometrician/statistician) is at the same time also the one who controls the system, then his control decisions, if optimal, do not provide any additional information about the state of the system, and vice versa.

In contrast to economic applications, the NCC is a credible assumption in the technical sciences. The observer and the controller are one person, the system under his control is well identified, and he uses algorithms that lead to optimal estimation and decisions. In economic applications, on the other hand, it is almost always difficult to argue that the condition holds, because the observer (econometrician) is almost always different from the controller. The econometrician observes the real-time decisions (of workers, employers, central bankers, treasurers, etc.) with a substantial time delay. Moreover, the economic models that he uses are often poorly identified.

The NCC is a mathematical assumption, and its violation is difficult to detect in the data.
The problem is different from that of model misspecification, which manifests itself in residuals, shock estimates, or model-implied expectations. But if we (economic agents or econometricians) objectively know that the condition does not hold, i.e. there are observed control variables that are not explicitly included in our models (the NCC is violated), the NCC entitles us to use that knowledge in our favor. There are two effects we can expect. First, we improve the efficiency of our estimates. Second, because the observed control variable is the result of an optimal decision, we can use that variable to infer the encoded underlying information and so improve our own knowledge about the modelled system.

We are going to review the condition's validity for the estimation of dynamic stochastic general equilibrium (DSGE) models. We choose them because they have become the norm for optimal policy and decision analysis in policy institutions such as central banks. At the same time, they are exactly the class of economic models for which the NCC is the most relevant.

In the experiments we are going to conduct with the models, we will assume that (1) the econometrician differs from the controller, but (2) the econometrician knows the data generating process, (3) he also knows the technology for recovering hidden (unobserved) variables, but, and this is an important twist, (4) out of a data-rich database, the econometrician decides to use only a limited subset of observed time series, which, he assumes, carry all relevant information. Assumptions (1)-(4) describe a scenario very close to reality.

In fact, the choice of observable variables matters. There are many decision variables in DSGE models that are implicit in the models' reduced forms but have direct observable counterparts. For instance, such variables may be labor income, capital income, or all kinds of fees or tax revenues.
For model dynamics they may be viewed as being of second-order importance because they do not carry any extra information for the dynamics. Output, prices, and interest rates carry all of that information. But from the estimation point of view, the variables, if observed, carry an important piece of information about the optimal (control) decisions, from which we can infer the beliefs of the other agents and use them to improve our own beliefs. By the definition of DSGE models, every decision is optimal. And by the construction of the Kalman filter, every decision variable must be present in the estimation. Otherwise the filter does not provide optimal estimates.

Our paper is closely related to the literature on DSGE models in a data-rich environment. Boivin and Giannoni (2006) propose a framework for exploiting information from large datasets to improve the estimation of DSGE models. Our arguments go in a similar direction. In comparison to the data-rich literature, we provide a justification for why the use of all available information in estimation is a must. It is the dictate of the natural condition of control. In contrast to Boivin and Giannoni, who work with empirical relationships, we use only the information that can be linked directly to a decision process captured by the model (like the variables listed above).¹ In that respect we are also using the information springing from cross-equation restrictions.

¹ On the grounds of the NCC, the methodology proposed in Schorfheide, Sill and Kryshko (2010) is an example of an inconsistent and inefficient use of available information. The off-model variables, if relevant at all, should be used to update the information on the model states (variables, factors, or whatever one wishes to call them) and not be treated exogenously.

As is conventional, the paper is structured as follows. We begin by reviewing the derivation of the basic Kalman filter equations from the engineering point of view.
It will help us to understand the motivation and consequences of the natural condition of control. In the third section, we show how the engineering world maps into the world of dynamic stochastic general equilibrium models. In the fourth section, we illustrate our points on a neoclassical growth model.

2 State Estimation and Output Prediction

In engineering, the motivation for parameter and/or state estimation is typically the optimal control problem. The definition of the model is then implied by this task. Consider a discrete-time dynamic system depicted in Figure 1 with the observable/measurable input sequence $u_t$ and output sequence $y_t$ and some hidden variables that can be interpreted as the system parameters $\theta$ or the system state $x_t$. The input sequence enters in a closed loop, in which the control decision is based on the estimates of the system state.

Figure 1: Dynamic system (DSGE model)

2.1 Optimal control problem

Let the sequence of input and output data observed over the time interval from $t_1$ up to $t_2$ be $D_{t_1}^{t_2} = \{u_{t_1}, y_{t_1}, \ldots, u_{t_2}, y_{t_2}\}$. If the initial time is $t_1 = 1$, it can be omitted, i.e. $D_1^t = D^t$.

Suppose we have observed the input-output sequence up to time $t$ and are looking for optimal control over a $T$-step prediction horizon with the optimality criterion

$$\min E\left\{ J\left(D_{t+1}^{t+T}\right) \mid D^t \right\}.$$

This optimization problem requires the joint probability density function $p(D_{t+1}^{t+T} \mid D^t)$. Using the chain rule, this c.p.d.f. can be written as

$$p(D_{t+1}^{t+T} \mid D^t) = p(y_{t+T} \mid D^{t+T-1}, u_{t+T})\, p(u_{t+T} \mid D^{t+T-1}) \times \cdots \times p(y_{t+1} \mid D^t, u_{t+1})\, p(u_{t+1} \mid D^t).$$

The set of c.p.d.f.s $p(y_\tau \mid D^{\tau-1}, u_\tau)$ for $\tau = t+1, \ldots, t+T$ defines the dependence of the system output $y_\tau$ on the system history up to time $\tau - 1$ and on the system input at time $\tau$. These c.p.d.f.s define the model of the system. The set of c.p.d.f.s $p(u_\tau \mid D^{\tau-1})$ for $\tau = t+1, \ldots, t+T$ is a general description of the law by which the input $u_\tau$ is generated. We will call this set of c.p.d.f.s the control law.
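The factorization above also describes how a closed-loop data sequence is generated: at each step the input is drawn from the control law, and the output is then drawn from the system model. The following is a minimal sketch under assumptions that are not in the paper: a hypothetical scalar ARX system $y_\tau = a y_{\tau-1} + b u_\tau + e_\tau$, a hypothetical proportional control law $u_\tau = -k y_{\tau-1} + v_\tau$, and illustrative parameter values.

```python
import random


def simulate_closed_loop(T, a=0.9, b=0.5, k=0.8, sd_e=0.1, sd_v=0.05, y0=1.0, seed=0):
    """Draw (u_tau, y_tau) for tau = 1..T by alternating control law and model.

    The ARX system, the control law, and all parameter values are
    illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []          # the data set D^tau, stored as a list of (u, y) pairs
    y_prev = y0
    for _ in range(T):
        # Control law p(u_tau | D^{tau-1}): uses only the previous output.
        u = -k * y_prev + rng.gauss(0.0, sd_v)
        # System model p(y_tau | D^{tau-1}, u_tau): uses history and the new input.
        y = a * y_prev + b * u + rng.gauss(0.0, sd_e)
        data.append((u, y))
        y_prev = y
    return data


if __name__ == "__main__":
    for u, y in simulate_closed_loop(5):
        print(f"u = {u:+.3f}, y = {y:+.3f}")
```

Each pass through the loop corresponds to one pair of factors $p(y_\tau \mid D^{\tau-1}, u_\tau)\, p(u_\tau \mid D^{\tau-1})$ in the chain rule.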
Note the information delay in the control law: while the input $u_\tau$ is applied to the system to generate its output in the $\tau$-th period, the output $y_\tau$ is not available to calculate the control law for $u_\tau$.²

² In engineering applications it is typically assumed that a continuous process is observed at regular intervals with sampling period $T_s$, and that the input is constant during the sampling period, i.e. $u(s) = u_t$ for $tT_s < s < (t+1)T_s$, where $s$ denotes continuous time.

2.2 State Estimation

If there exists a hidden (latent) variable $x_t$ of fixed dimension such that

$$p(x_{t+1}, y_t \mid D^{t-1}, x_t, u_t) = p(x_{t+1}, y_t \mid x_t, u_t),$$

it is called the state of the system; $x_t$ contains all the information about $\{x_{t+1}, y_t\}$ […]. From the state definition above, the output model can be obtained as a marginal distribution

$$p(y_t \mid x_t, u_t) = \int p(x_{t+1}, y_t \mid x_t, u_t)\, dx_{t+1}$$

and the state transition model as a conditional distribution

$$p(x_{t+1} \mid x_t, u_t, y_t) = \frac{p(x_{t+1}, y_t \mid x_t, u_t)}{p(y_t \mid x_t, u_t)}.$$

[…] $x_{t+1}$ […] in the optimal prediction (see the sampling scheme in Figure 2). To calculate the output prediction

$$p(y_t \mid D^{t-1}, u_t) = \int p(y_t \mid u_t, x_t)\, p(x_t \mid D^{t-1}, u_t)\, dx_t,$$

information about the state given by the c.p.d.f. $p(x_t \mid D^{t-1}, u_t)$ is required at each step of the recursion. That is the point where the NCC comes into play.

Suppose the information about the state $p(x_t \mid D^{t-1})$ based on the data up to time $t-1$ is available. This information can be updated after a new input-output observation $\{u_t, y_t\}$ has been obtained, using the Bayes formula

$$p(x_t \mid D^t) = \frac{p(y_t \mid D^{t-1}, x_t, u_t)\, p(x_t \mid D^{t-1}, u_t)}{p(y_t \mid D^{t-1}, u_t)} = \frac{p(y_t \mid x_t, u_t)}{p(y_t \mid D^{t-1}, u_t)}\, p(x_t \mid D^{t-1}),$$

where the properties of the state and the natural condition of control for the state estimation (Peterka, 1981)

$$p(x_t \mid D^{t-1}, u_t) = p(x_t \mid D^{t-1})$$

are used to get the second term. The NCC assumption cannot be deduced from the properties of the dynamic system itself but rather from the process of information accumulation. In the technical context, its interpretation is twofold:

1.
The condition $p(x_t \mid D^{t-1}, u_t) = p(x_t \mid D^{t-1})$ says that the control variable $u_t$ carries no information about the state $x_t$ beyond that already contained in $D^{t-1}$. […] observer-based LQG control (incomplete information feedback), with the control variable based on the state estimate, $u_t = f(E[x_t \mid D^{t-1}])$. In […] $u_t$ […] $D^{t-1}$ […].

2. Using the equality $p(x_t \mid D^{t-1}, u_t)\, p(u_t \mid D^{t-1}) = p(u_t \mid x_t, D^{t-1})\, p(x_t \mid D^{t-1})$, the condition $p(x_t \mid D^{t-1}, u_t) = p(x_t \mid D^{t-1})$ implies that also $p(u_t \mid D^{t-1}, x_t) = p(u_t \mid D^{t-1})$. If the state estimation and the control are performed by the same subject, the system input is based only on the available data and is not modified by the state estimate, which does not provide any "new" information for the calculation of the control law.

If any additional information about the system state is available to calculate the control law, the standard Kalman filter is not optimal from the Bayesian inference/information accumulation point of view. That is why some applications in the economic literature may not fully comply with the NCC assumption: typically in a multi-agent environment, where individual agents operate on different information content, the control action of one agent may provide additional information to the remaining agents, i.e. $p(x_t \mid D^{t-1}) \neq p(x_t \mid D^{t-1}, u_t)$. If this additional information is not used to evaluate their optimal control strategy, their behavior is not optimal from the Bayesian inference/information accumulation point of view.

As an example, assume a statistician observing a linear system controlled by (complete information) state feedback. Then his (noisy) observation of the control variable, $u_t = -Kx_t + e_u$, provides significant information about the state. If the statistician knows the control law $K$, interpreting the control variable $u_t$ as an additional observation defined by the c.p.d.f.
$p(u_t \mid x_t) = p_{e_u}(u_t + Kx_t)$, in parallel to the observed outputs $y_t = Cx_t + Du_t + e_y$ defined by $p(y_t \mid x_t, u_t) = p_{e_y}(y_t - Cx_t - Du_t)$, the optimal data update step of the state estimation process (Kalman filter) should cover the input update step

$$p(x_t \mid D^{t-1}, u_t) \propto p(u_t \mid x_t)\, p(x_t \mid D^{t-1})$$

and the output update step

$$p(x_t \mid D^t) = p(x_t \mid D^{t-1}, u_t, y_t) \propto p(y_t \mid x_t, u_t)\, p(x_t \mid D^{t-1}, u_t).$$

[…] $K$ […] incorporate this information into the state estimation process. However, if he knows that the NCC is not satisfied³ and he is sure that the observed control $u_t$ […] $x_t$ […] to recover this information. One of his options is adaptation of his behavior […] $K$ […] observation model $p(u_t \mid x_t, K)$.

³ Detection of an NCC violation may be a separate topic of interest.

In DSGE models, the rational expectations operate over the information set that is pooled by the agents. But the idea of information heterogeneity has become popular in the adaptive learning literature.

Figure 2: Sampling from a continuous process - logic for the Kalman filter timing

3 General Equilibrium Models

Now we turn our attention to the dynamic stochastic general equilibrium (DSGE) model. We work with the following (log-linear) DSGE model:

$$\Gamma_0(\theta)\, x_t = \Gamma_1(\theta)\, E_t x_{t+1} + \Gamma_2(\theta)\, x_{t-1} + \Gamma_3(\theta)\, \varepsilon_t, \qquad (1)$$

where $x_t$ is an $(n \times 1)$ vector of endogenous variables (log-deviations from their steady state), and $\varepsilon_t$ is a $(k \times 1)$ vector of unobservable exogenous iid shocks. For simplicity of notation, we assume that $n = k$. This assumption will be relaxed in the later discussion. $\Gamma_0(\theta)$, $\Gamma_1(\theta)$, $\Gamma_2(\theta)$ and $\Gamma_3(\theta)$ are (time-invariant) matrices of structural parameters. Their elements are functions of deep structural parameters, $\theta$. $E_t(\cdot)$ is the rational expectation operator conditional on the model $M$ and the information available to the economic agents at time $t$; the information set is $\Omega_t = (x_t, x_{t-1}, \ldots, x_0, \varepsilon_t, M)$. The structural matrices $\Gamma_0(\theta)$, $\Gamma_1(\theta)$, $\Gamma_2(\theta)$ and $\Gamma_3(\theta)$ are such that the model has a unique and stable equilibrium.
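For intuition on what a unique and stable equilibrium requires, consider a scalar version of (1), $\gamma_0 x_t = \gamma_1 E_t x_{t+1} + \gamma_2 x_{t-1} + \gamma_3 \varepsilon_t$. Guessing a solution of the form $x_t = a x_{t-1} + b \varepsilon_t$ gives $E_t x_{t+1} = a x_t$, so $a$ must solve $\gamma_1 a^2 - \gamma_0 a + \gamma_2 = 0$ and $b = \gamma_3 / (\gamma_0 - \gamma_1 a)$. The sketch below applies this method of undetermined coefficients; it is not the solution method used in the paper, and the function name and parameter values are hypothetical.

```python
import cmath


def solve_scalar_re(g0, g1, g2, g3, tol=1e-12):
    """Solve g0*x_t = g1*E_t[x_{t+1}] + g2*x_{t-1} + g3*eps_t.

    Returns (a, b) such that x_t = a*x_{t-1} + b*eps_t, provided exactly one
    root of g1*a**2 - g0*a + g2 = 0 lies inside the unit circle.
    Raises ValueError otherwise (no stable solution, or more than one).
    """
    if abs(g1) < tol:
        raise ValueError("g1 = 0: the model has no forward-looking term")
    disc = cmath.sqrt(g0 * g0 - 4.0 * g1 * g2)
    roots = [(g0 + disc) / (2.0 * g1), (g0 - disc) / (2.0 * g1)]
    stable = [r for r in roots if abs(r) < 1.0]
    if len(stable) != 1:
        raise ValueError(f"{len(stable)} stable roots: no unique stable equilibrium")
    # Complex roots come in conjugate pairs of equal modulus, so a lone
    # stable root is necessarily real.
    a = stable[0].real
    b = g3 / (g0 - g1 * a)
    return a, b


if __name__ == "__main__":
    a, b = solve_scalar_re(1.0, 0.4, 0.4, 1.0)   # illustrative values only
    print(f"x_t = {a:.3f} * x_(t-1) + {b:.3f} * eps_t")
```

With one predetermined variable ($x_{t-1}$) and one expectation term, exactly one stable root is what pins down a single non-explosive solution; two stable roots leave the solution indeterminate, and none leaves no stable solution.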
Solving for the rational expectations $E_t(\cdot)$, model (1) has a minimum state representation

$$x_t = A(\theta)\, x_{t-1} + B(\theta)\, \varepsilon_t. \qquad (2)$$

Equation (2) characterizes the dynamic equilibrium in the reduced form. $A(\theta)$ and $B(\theta)$ are functions of the $\Gamma$s and, through them, functions of $\theta$. […] $x_t$ […] the measurement equation

$$y_t = C x_t, \qquad (3)$$

where $y_t$ is an $(m \times 1)$ vector of observable variables, and $C$ is the $(m \times n)$ (usually identity) matrix that maps the model variables into $y_t$. Together, (2) and (3) establish the state-space representation of (1).

When estimating (2) and (3), it is standard to assume that (i) model (1) is the […] $y_t$ […] $m \times 1$ […] available to estimate and evaluate $x_t$. Of course, in reality this is not always true, and for policy-oriented models such assumptions are unjustified.

Here is one example for all. DSGE models include implicit definitions of variables that have observable counterparts, e.g. income tax or capital (property) tax revenues. They almost never explicitly appear in DSGE models. Tax rates affect dynamics indirectly via resource allocation, but the tax revenues per se never appear in the minimum-state representation, as they do not bring any additional information about the dynamics. Unless we are interested in estimating the model, carrying around the capital tax revenues as an extra variable is not necessary. But if we estimate the model, we must include it.⁴

[…] $x_t$ […] be viewed as a model of control in closed loop (or full-state control). The hidden (implicit) control variables in DSGE models can be contemporaneously expressed as follows:

$$\Gamma_0(\theta)\, x_t = \Gamma_0(\theta_1)\, x_t + u_t.$$

$u_t = \Gamma_0(\theta_2)\, x_t$ is the cumulative effect of the structural parameters $\Gamma_0(\theta_2)$. […] $u_t$ […] equation (3) to inform the estimates of $\theta$ and $x_t$. To facilitate that, we augment measurement equation (3) to

$$x_t = A(\theta)\, x_{t-1} + B(\theta)\, \varepsilon_t, \qquad (4)$$

$$\begin{bmatrix} u_t \\ y_t \end{bmatrix} = \begin{bmatrix} \Gamma_0(\theta_2) & 0 \\ 0 & CA(\theta) \end{bmatrix} \begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} + \begin{bmatrix} 0 \\ CB(\theta) \end{bmatrix} \varepsilon_t. \qquad (5)$$

Let us illustrate the benefit of such a state-space specification on a simple example.
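Before turning to the full illustration, the effect of adding an observed control variable to the measurement equation can be sketched in a scalar setting. The sketch rests on assumptions that are not in the paper: a scalar state, measurement noise on both $y_t$ and $u_t$ (so that neither reveals the state exactly), independent noises (so that observations can be processed one at a time), and purely illustrative parameter values. It tracks only the posterior variance of the state, which in the linear Gaussian case does not depend on the observed values.

```python
def filtered_variances(a, b, obs, p0=1.0, steps=50):
    """Posterior variance of a scalar state after each Kalman filter step.

    State:        x_t = a * x_{t-1} + b * eps_t,  eps_t ~ N(0, 1)
    Observations: z_i = h_i * x_t + noise_i,      noise_i ~ N(0, r_i), independent
    `obs` is a list of (h_i, r_i) pairs. All values are illustrative assumptions.
    """
    p = p0
    path = []
    for _ in range(steps):
        p = a * a * p + b * b                # time update (prediction)
        for h, r in obs:                     # data update, one observation at a time
            gain = p * h / (h * h * p + r)
            p = (1.0 - gain * h) * p
        path.append(p)
    return path


if __name__ == "__main__":
    a, b = 0.5, 1.25                         # hypothetical reduced form (2)
    y_only = [(1.0, 0.5)]                    # y_t = x_t + noise
    y_and_u = [(1.0, 0.5), (0.8, 0.5)]       # additionally u_t = 0.8 * x_t + noise
    print("variance, y only :", filtered_variances(a, b, y_only)[-1])
    print("variance, y and u:", filtered_variances(a, b, y_and_u)[-1])
```

Under these assumptions the variance with the augmented observation set is smaller at every step, which is the efficiency gain the augmented specification is meant to capture.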
⁴ For similar reasons, the demand for money is excluded from the model, because it does not carry any additional information about the inflation rate and output gap. But when the output gap is not observable, then money has additional forecasting power.

4 Illustration

We run a Monte Carlo experiment to demonstrate our point that optimally using all the available information improves the estimates of latent (state) variables. This is done first under the assumption of information pooling but its partial use, and then under the assumption that one of the control variables is generated based on more precise information about the state variables.

We use a simple real business cycle model to generate artificial data on private consumption, hours worked, investment, consumption tax receipts, and disposable income. Those series form our set of available information. In the first experiment, we assume that an econometrician (observer) uses only two series out of the set. In the following experiments we gradually expand the information set that the econometrician utilizes and observe the improvement in efficiency and consistency. But let us define the real business cycle model first.

In the model, there are two sectors: households and firms. Households maximize their expected lifetime welfare

$$E_0\left[\sum_{t=0}^{\infty} \beta^t \left( \frac{(C_t + H_t)^{1-\sigma}}{1-\sigma} + \xi \log(1 - L_t) \right)\right]$$

subject to the budget constraint

$$w_t L_t + (1 - r_{t-1} - \delta)\, K_{t-1} + T_t = (1 - \tau_c)\, C_t + K_t.$$

The household's welfare derives from consumption $C_t$ and leisure $1 - L_t$. […] $H_t$ depends on past consumption and an iid habit shock: $H_t =$ […]. $\sigma > 0$ is the measure of the household's risk aversion.

Firms maximize their profits $\Pi_t = Y_t - r_t K_{t-1} - w_t L_t$ by optimally hiring labour and capital to produce the consumption good $Y_t$. They use a Cobb-Douglas technology: $Y_t = A_t K_{t-1}^{\alpha} L_t^{1-\alpha}$. $A_t$ is total factor productivity and follows a log-linear AR(1) process: $\log A_t = \rho \log A_{t-1} + \varepsilon^A_t$, with $\varepsilon^A_t \sim N(0, \sigma_A^2)$, which we interpret as the productivity shock. $\alpha \in (0,1)$ is the share of capital in production.
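The technology block is the simplest part of the model to simulate. The following minimal sketch draws a path of $A_t$ from the AR(1) process and evaluates the Cobb-Douglas production function. The function names and all parameter values ($\rho = 0.9$, $\sigma_A = 0.01$, $\alpha = 0.33$) are illustrative assumptions, not the values used in the paper.

```python
import math
import random


def simulate_tfp(T, rho=0.9, sigma_a=0.01, seed=0):
    """Simulate A_t from log A_t = rho * log A_{t-1} + eps_t, eps_t ~ N(0, sigma_a**2).

    Parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    log_a = 0.0                      # start from log A = 0, i.e. A = 1
    path = []
    for _ in range(T):
        log_a = rho * log_a + rng.gauss(0.0, sigma_a)
        path.append(math.exp(log_a))
    return path


def output(a_t, k_prev, l_t, alpha=0.33):
    """Cobb-Douglas output Y_t = A_t * K_{t-1}**alpha * L_t**(1 - alpha)."""
    return a_t * k_prev ** alpha * l_t ** (1.0 - alpha)


if __name__ == "__main__":
    tfp = simulate_tfp(5)
    # Capital and labour are held fixed here only to isolate the effect of A_t.
    for a_t in tfp:
        print(f"A = {a_t:.4f}, Y = {output(a_t, k_prev=10.0, l_t=0.3):.4f}")
```

In the full model $K_{t-1}$ and $L_t$ respond to the shocks as well; holding them fixed here is a simplification made only for the sketch.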
In equilibrium, all (labour, capital, and consumption goods) markets clear in this economy. The dynamic equilibrium of the model economy is then characterised by four equations: (i) the Euler equation for consumption, (ii) labour demand, (iii) the resource constraint, and (iv) the exogenous supply of technology.

$\left(C_t + \phi C_{t-1} e^{\varepsilon_t}\right)^{-\sigma}$ […] $E_t\{C_{t+1}\}$ […]

[…] is the drift term, which sets the economy on an exogenous but balanced growth path. Both capital and consumption grow at that rate in the long run.

The transitory parameters $\{\phi, \rho\}$ and the variances of the two shocks are estimated by maximum likelihood. The other parameters are kept fixed at their parametrized values mentioned above. We will not report the results here; instead we show the estimates of the capital stock $K_t$ and labor-augmenting technology $A_t$.

In the top two panels of Figure 4, we compare the confidence intervals […] $K_t$ […] $A_t$ […] through investment, we see that adding investment to our observable variables improves our confidence by about 2 percent. The standard errors become lower. The model's good (in-sample) predictive power for investment explains the marginal gain in efficiency. The graph in the third panel of Figure 4 compares the model-implied investment (when treated as latent) to the actually realized (observed) data. Clearly, the private consumption expenditures and hours worked data carry enough information about investment and thus about the capital stock and technology.

The same conclusion can be drawn from the (out-of-sample) forecasting exercise. Investment as an observable has a weak information content […] $K_t$ […] $A_t$ […] 5-year rolling forecasts coincide under the two setups.
Figure 4: Estimates of capital stock and labor productivity (US data)
Lower panel: Investment (annual) growth (horizontal axis: 1968 to 2008)

Note: The top two panels show the relative efficiency of the capital stock estimate (top left) and the labor-augmenting technology estimate (top right) when (i) the information on the growth of consumption and hours worked is used (model 1), and (ii) when that information is extended with the investment growth (model 2). The shaded areas are computed as $100\,(\mathrm{std}(X_{t,\text{model 1}})/\mathrm{std}(X_{t,\text{model 2}}) - 1)$. Positive values mean that the model with more information (model 2) outperforms the model with less information (model 1). This is the case for the estimate of labor-augmenting technology over the whole sample period 1951-2009. In the case of the capital stock, model 2 starts to outperform after the year 1971, when we accumulate enough new information from hours worked and investment, which we start observing in 1965 and 1967, respectively. The lower panel plots the estimate of the latent investment growth implied by model 1 (dashed line). The solid line is the actual observed series used in the estimation of model 2. From the graph we see that the model structure (6)-(8) and (10) has strong predictive power for investment.

Figure 5: Pseudo real-time forecast (US data)

Note: The pseudo real-time forecast exercise starts in 1990, and we roll a 5-year forecast window through the end of our sample in 2009. The solid line is the actual or smoothed estimate available in 2009, and the dashed lines are the rolling 5-year forecasts. The forecasts based on both model 1 (consumption and hours worked observed) and model 2 (including investment growth) exhibit the same central tendencies. That is why we only report the results based on model 1. The two forecasts differ in their efficiency.

5 Final Remarks

We reviewed the basic derivation of Kalman filter equations with the focus on the role of the natural condition of control.
We were interested in what this condition implies for the estimation of DSGE models used in economics. We provided a theoretically consistent justification for the use of all available (observable) information that can be structurally linked to the model. Under the assumption of information pooling, we showed that this leads to significantly improved efficiency.

The NCC can provide an alternative structural perspective for DSGE model developers. The model may be well specified and the NCC can still be violated. This is because the condition does not deal with the model structure per se but with the flow of information in it.

We would like to look at the possible avenues for formal testing of the NCC. We would like to use them for the empirical assessment of decision rules. DSGE models consist of optimal decision (control) rules; thus, in principle, each equation can be subject to testing.

In future work we will also relax the assumption of information pooling and look at the case of an agent with significant market power and private information. If the NCC is to hold, the remaining market players should try to infer the private information encoded in the decisions of the dominant player and adapt to it. The NCC then leads to the adaptive learning arguments.

References

Boivin, J., and M. Giannoni (2006), "DSGE Models in a Data-Rich Environment," NBER Working Paper No. 12772.

King, R. G., Ch. Plosser, and S. T. Rebelo (1988), "Production, growth and business cycles: I. The basic neoclassical model," Journal of Monetary Economics, vol. 21(2-3), pages 195-232.

Peterka, V. (1981), "Bayesian Approach to System Identification," in P. Eykhoff (ed.), Trends and Progress in System Identification, Pergamon Press, Oxford. Also available at http://moodle.utia.cas.cz/moodledata/4/peterka.pdf