STOCHASTIC CONTROL FOR ECONOMIC MODELS Second Edition

Books by David Andrew Kendrick

Programming Investment in the Process Industries
Notes and Problems in Microeconomic Theory (with Peter Dixon and Samuel Bowles)
The Planning of Industrial Investment Programs (with Ardy Stoutjesdijk)
The Planning of Investment Programs in the Steel Industry (with Alexander Meeraus and Jaime Alatorre)
GAMS: A User's Guide (with Anthony Brooke and Alexander Meeraus)
Feedback: A New Framework for Macroeconomic Policy
Models for Analyzing Comparative Advantage
Handbook of Computational Economics (edited with Hans M. Amman and John Rust)

STOCHASTIC CONTROL FOR ECONOMIC MODELS Second Edition

David A. Kendrick
The University of Texas

Typeset by VTEX Ltd., Vilnius, Lithuania (Rimas Maliukevicius and Vytas Statulevicius)

STOCHASTIC CONTROL FOR ECONOMIC MODELS Second Edition, Version 2.00, 2002

Copyright for the First Edition ©1981 by McGraw-Hill, Inc. Copyright transferred to David Kendrick in 1999.

David Andrew Kendrick
Department of Economics
The University of Texas
Austin, Texas, U.S.A.
kendrick@eco.utexas.edu
http://eco.utexas.edu/faculty/Kendrick

To Gail

Contents

Preface iv
Preface to Second Edition vii
1 Introduction 1

I Deterministic Control 3

2 Quadratic Linear Problems 4
2.1 Problem Statement 5
2.2 Solution Method 10

3 General Nonlinear Models 19
3.1 Problem Statement 20
3.2 Quadratic Linear Approximation Method 21
3.3 Gradient Methods 25
3.4 Special Problems 27
3.4.1 Accuracy and Roundoff Errors 27
3.4.2 Large Model Size 27
3.4.3 Inequality Constraints on State Variables 28

4 Example of Deterministic Control 29
4.1 System Equations 29
4.2 The Criterion Function 35
II Passive-Learning Stochastic Control 37

5 Additive Uncertainty 38
5.1 Uncertainty in economic problems 38
5.2 Methods of Modeling Uncertainty 39
5.3 Learning: Passive and Active 40
5.4 Additive Error Terms 43

6 Multiplicative Uncertainty 45
6.1 Statement of the Problem 45
6.2 Period N 47
6.3 Period N − 1 50
6.4 Period k 54
6.5 Expected Values of Matrix Products 55
6.6 Methods of Passive-Learning Stochastic Control 56

7 Example of Passive-Learning Control 57
7.1 The Problem 57
7.2 The Optimal Control for Period 0 58
7.3 Projections of Means and Covariances to Period 1 63

III Active-Learning Stochastic Control 68

8 Overview 70
8.1 Problem Statement 72
8.2 The Monte Carlo Procedure 75
8.3 The Adaptive-Control Problem: Initiation 76
8.4 Search for the Optimal Control in Period k 76
8.5 The Update 80
8.6 Other Algorithms 81

9 Nonlinear Active-Learning Control 83
9.1 Introduction 83
9.2 Problem Statement 83
9.3 Dynamic Programming Problem and Search Method 85
9.4 Computing the Approximate Cost-to-Go 85
9.5 Obtaining a Deterministic Approximation for the Cost-to-Go 95
9.6 Projection of Covariance Matrices 97
9.7 Summary of the Search for the Optimal Control in Period k 101
9.8 Updating the Covariance Matrix 102
9.9 Summary of the Algorithm 102

10 Quadratic Linear Active-Learning Control 103
10.1 Introduction 103
10.2 Problem Statement 103
10.2.1 Original System 103
10.2.2 Augmented System 105
10.3 The Approximate Optimal Cost-to-Go 106
10.4 Dual-Control Algorithm 112
10.4.1 Initialization 115
10.4.2 Search for the Optimal Control 115
10.5 Updating State and Parameter Estimates 118

11 Example: The MacRae Problem 120
11.1 Introduction 120
11.2 Problem Statement: MacRae Problem 120
11.3 Calculation of the Cost-To-Go 122
11.3.1 Initialization 122
11.3.2 Search for Optimal Control 123
11.4 The Search 129

12 Example: Model with Measurement Error 133
12.1 Introduction 133
12.2 The Model and Data 134
12.3 Adaptive Versus Certainty-Equivalence Policies 138
12.4 Results from a Single Monte Carlo Run 139
12.4.1 Time Paths of Variables and of Parameter Estimates 141
12.4.2 Decomposition of the Cost-to-Go 154
12.5 Summary 164

Appendices 166

A Second-Order Expansion of System Eqs 166
B Expected Value of Matrix Products 169
B.1 The Expected Value of a Quadratic Form 169
B.2 The Expected Value of a Matrix Triple Product 170
C Equivalence of Matrix Riccati Recursions 172
D Second-Order Kalman Filter 176
E Alternate Forms of Cost-to-Go Expression 184
F Expectation of Prod of Quadratic Forms 187
F.1 Fourth Moment of Normal Distribution: Scalar Case 188
F.2 Fourth Moment of Normal Distribution: Vector Case 189
F.3 Proof 199
G Certainty-Equivalence Cost-To-Go Problem 203
H Matrix Recursions for Augmented System 206
I Vector Recursions for Augmented System 216
J Proof That Term in Cost-To-Go is Zero 221
K Updating the Augmented State Covariance 224
L Deriv of Sys Equations wrt Parameters 228
M Projection of the Augmented State Vector 232
N Updating the Augmented State Vector 236
O Sequential Certainty-Equiv Method 238
P The Reestimation Method 240
Q Components of the Cost-to-Go 242
R The Measurement-Error Covariance 245
S Data for Deterministic Problem 249
T Solution to Measurement Error Model 253
T.1 Random Elements 253
T.2 Results 256
U Changes in the Second Edition 263

Bibliography 264

Preface

This book is about mathematical methods for the optimization of dynamic stochastic systems and about the application of these methods to economic problems. Most economic problems are dynamic. The economists who analyze these problems study the current state of an economic system and ask how various policies can be used to move the system from its present status to a more desirable future state. The problem may be a macroeconomic one in which the state of the economic system is described by levels of unemployment and inflation and the instruments are fiscal and monetary policy. It may be a microeconomic problem in which the system is characterized by inventory, sales, and profit levels and the policy variables are investment, production, and prices. It may be an international commodity-stabilization problem in which the state variables are levels of export revenues and inventories and the control variables are buffer-stock sales or purchases. Most economic problems are stochastic.
There is uncertainty about the present state of the system, uncertainty about the response of the system to policy measures, and uncertainty about future events. For example, in macroeconomics some time series are known to contain more noise than others. Also, policy makers are uncertain about the magnitude and timing of responses to changes in tax rates, government spending, and interest rates. In international commodity stabilization there is uncertainty about the effects of price changes on consumption. The methods presented in this book are tools to give the analyst a better understanding of dynamic systems under uncertainty. The book begins with deterministic dynamic systems and then adds various types of uncertainty until it encompasses dynamic systems with uncertainty about (1) the present state of the system, (2) the response of the system to policy measures, (3) the effects of unseen future events which can be modeled as additive errors, and (4) errors in measurement. In the beginning chapters, the book is more like a textbook, but in the closing chapters it is more like a monograph because there is relatively widespread agreement about methods of deterministic-model solution while there is still considerable doubt about which of a number of competing methods of stochastic control will prove to be superior. As a textbook, this book provides a detailed derivation of the main results in deterministic and stochastic control theory. It does this along with numerical examples of each kind of analysis so that one can see exactly how the solutions to such models are obtained on computers. Moreover, it provides the economist or management scientist with an introduction to the kind of notation and mathematics which is used in the copious engineering literature on the subject of control theory, making access to that literature easier.
Finally, it rederives some of the results in the engineering literature with the explicit inclusion of the kinds of terms typical of economic models. As a monograph, this book reports on a project explicitly designed to transfer some of the methodology of control theory from engineers to economists and to apply that methodology to economic problems to see whether it sheds additional light on those problems. The project has been funded by the National Science Foundation and has involved two engineers, Edison Tse and Yaakov Bar-Shalom, and two economists, Fred Norman and the author. Fred and I decided at an early stage in the project that we could best learn from Edison and Yaakov if we programmed their algorithm ourselves. This involved rederiving all the results and then making two separate codings of the algorithm (one by each of us). This procedure enabled us to understand and check both the algorithm and the computer codes. The principal application is to a macroeconomic stabilization problem which included all the kinds of uncertainty described above. The procedures are enabling us to determine the effects of various kinds of uncertainty on policy levels. Some readers of this book may find themselves disturbed by the fact that the derivations are given in such detail. This is in contrast with many books in econometrics and mathematical economics, where a theorem is stated and the proof is developed in a terse fashion. However, in contrast to econometrics and mathematical economics, control theory is still a relatively new area of concentration in economics. As a result the notation is not familiar, and the mathematical operations are different from those commonly used by economists. Therefore the derivations included in this book are spelled out in detail either in the text or in appendixes. 
Readers who are already familiar with the usual control-theory notation and mathematical operations may find parts of the text moving much too slowly for their taste, but the liberal relegation of derivations to appendixes should make the book read more smoothly for these researchers. The economist who is willing to learn the notation and style of control theory will find the investment well repaid. The effort will make it easier to understand the wealth of results contained in such journals as IEEE Transactions on Automatic Control, Automatica, and the Journal of Economic Dynamics and Control and in conference proceedings like those from the annual IEEE Conference on Decision and Control. It seems likely that the adaptive-control algorithm developed in Chapters 9 and 10 may eventually be superseded by more efficient algorithms. Thus although one can question the value of learning the notation and operations which are particularly associated with it, many of the operations contained in it are common to a variety of adaptive-control algorithms and much of the notation is common to the larger field of control theory. Not only the derivations but also the numerical examples given in the book are spelled out in considerable detail. The reason for this is that numerical methods are basic to the development of the work in this field and the existence of some thoroughly documented numerical examples will enhance the development and debugging of new algorithms and codes and the improvement in the efficiency of existing algorithms and codes. The reader who is interested in a shorter and less detailed discussion of some of the subjects covered in this book is referred to Kendrick (1980). In addition to Edison Tse, Yaakov Bar-Shalom, and Fred Norman, I am grateful to Bo Hyun Kang and Jorge Rizo-Patron, for their help in preparing some of the materials which constitute this book.
I am also indebted to Peggy Mills, for her excellent work as administrative assistant and secretary, and to the National Science Foundation for support of this work under grants SOC 72-05254 and SOC 76-11187. Michael Intriligator, Stephen Turnovsky, Homa Motamen, Mohamed Rismanchian, and Ed Hewett read an earlier draft and provided many helpful comments. Michael Athans provided hospitality in the Laboratory for Information and Decision Sciences and access to the Air Force Geophysical Laboratory Computational Facilities during a year on leave at M.I.T. Connie Kirkland helped with the final typing and reproduction of the manuscript and Susan Lane assisted in the typing. I am grateful to both of them for their help in a tedious task. Most of all I should like to thank my wife, Gail, for her warm support, even while the demands of her own career were great, and to thank my children, Ann and Colin, for adding so much to the joy and spontaneity in my life.

David Kendrick

Preface to the Second Edition

I have wanted for some years to make Stochastic Control for Economic Models available on the Internet. Therefore, a few years ago I asked McGraw-Hill Book Company, who published the first edition of the book, to return the copyright to me. They graciously did so. The original book had been typed on a typewriter so there was no electronic version available to be posted on the Internet. Therefore, I asked Rimas Maliukevicius, the President of VTEX Ltd. in Vilnius, Lithuania, if that firm would retype the book in LaTeX. Rimas agreed to do so and asked his colleague, Vytas Statulevicius, to design the book and oversee the project. My plan was to make substantial changes to the content of the book before posting it on the Internet as a second edition. However, it now appears that will take longer than I had expected, so this second edition is identical to the original book except for editing to make corrections (see Appendix U).
Since the book is now in an electronic form I have assigned it a version number as well as an edition number. This will permit small changes as necessary while keeping the same edition number but changing the version number. I would like to thank Hans Amman, Greg de Coster, Enrique Garcilazo, Pedro Gomis, Paula Hernandez-Verme, Haibo Huang, Chun Yau Kwok, Younghan Kwun, Josef Matulka, Yue Piyu, Marco Tucci and Felipe Vila for helpful comments on the first edition of the book. Also, thanks to my long-time collaborator, Hans Amman, for encouraging me to prepare an electronic version of the book and helping me along the way with many technical matters. Finally, thanks to Rimas Maliukevicius, Vytas Statulevicius and the staff at VTEX for preparing the electronic version. However, I alone am responsible for the final version of the book since I have made modifications in the content and style while taking account of the suggestions from those listed above.

David Kendrick

Chapter 1

Introduction

Many problems in economics are naturally formulated as dynamic models, in which control or policy variables are used to move a system over time from a less desirable to a more desirable position. One example is short-run macroeconomic problems. The controls are monetary and fiscal policy, the dynamic system is a macroeconometric model, and the desired position is low levels of inflation and unemployment. Another example is the problem of the firm. Here the controls are pricing and production levels, the dynamic system is a model of production and sales, and the desired position is high levels of profits. Economists and engineers have been applying control theory to economic problems since the early works of Tustin¹ (1953), Phillips (1954, 1957), Simon (1956), and Theil (1957).
These pioneers were followed by a sprinkling of studies in the 1960s by Holt (1962), Fisher (1962), Zellner (1966), and Dobell and Ho (1967) and by many studies in the early 1970s by Chow (1970), Kendrick and Taylor (1970), Prescott (1971, 1972), Livesey (1971), Pindyck (1972, 1973a,b), Shupp (1972), MacRae (1972), Athans (1972), Aoki (1973), Norman and Norman (1973), and many others. This work has been characterized by the solution of increasingly larger deterministic models and by movements into stochastic control theory. Surveys of this literature have been published by Arrow (1968), Dobell (1969), Athans and Kendrick (1974), Intriligator (1975), and Kendrick (1976). There are also a number of books on control theory and economics, including Chow (1975), Aoki (1976), and Pitchford and Turnovsky (1977). Some of the books on control theory are Athans and Falb (1966), Aoki (1967), and Bryson and Ho (1969).

¹A list of references appears after the appendixes.

This book covers deterministic control, passive-learning stochastic control, and active-learning stochastic control. The methods differ in their treatment of uncertainty. All uncertainty is ignored in deterministic control theory. In passive-learning stochastic control the effects of uncertainty on the system are considered, but there is no effort to choose the control so that learning about the uncertainty is enhanced. In active-learning stochastic control, also called adaptive control or dual control, the control is chosen with a view toward both (1) reaching the desired states at present and (2) reducing uncertainty through learning, permitting easier attainment of desired states in the future. Part One is devoted to deterministic control, Part Two to passive-learning stochastic control, and Part Three to active-learning stochastic control.

Part I

Deterministic Control

Chapter 2

Quadratic Linear Problems

Deterministic problems are control problems in which there is no uncertainty.
Most economic control problems which have been posed and solved to date are of this variety. Deterministic problems fall into two major groups: (1) quadratic linear problems and (2) general nonlinear problems. This chapter is devoted to quadratic linear problems, and the next chapter discusses general nonlinear problems. Quadratic linear problems (QLP) are problems in which the criterion function is quadratic and the system equations are linear. In continuous-time problems the criterion is an integral over time, and the system equations are linear differential equations. In discrete-time problems the criterion is a summation over time, and the system equations are difference equations. Discussion in this book is confined to discrete-time models since they lend themselves naturally to the computational approach used here. For a discussion of continuous- and discrete-time models together the reader is referred to Bryson and Ho (1969). As one progresses from deterministic, to passive-learning stochastic, to active-learning stochastic control methods, the size of the numerical models rapidly declines. For example, deterministic control models now commonly include hundreds of equations, passive-learning stochastic control models usually have tens of equations, and active-learning stochastic control models have fewer than ten equations. This pattern results from the increasing computational complexity inherent in the treatment of uncertainty. This chapter begins with the statement of the quadratic linear problem as the minimization of a quadratic form subject to a set of first-order linear difference equations. Then two types of common quadratic linear problems which are not exactly in this form are introduced, and the method of converting them into this form is given. The first of these problems is the quadratic linear tracking problem,
in which the goal is to cause the state and control variables to follow desired paths as closely as possible. The second problem is a quadratic linear problem with nth-order rather than first-order difference equations. Following the problem statement in Sec. 2.1, the solution method is described in Sec. 2.2. The solution method used here is the dynamic-programming approach rather than the maximum-principle method since dynamic programming lends itself well to generalization to stochastic control methods. Finally the chapter closes with a short discussion of the feedback rules used to represent the solutions to quadratic linear problems.

2.1 Problem Statement

In control-theory problems the variables are separated into two groups: state variables x and control variables u. State variables describe the state of the economic system at any point in time, and control variables represent the policy variables, which can be chosen. For example, in macroeconomic control models the state variables are typically levels of inflation and unemployment, as well as levels of consumption, investment, and gross national product. The control variables in these problems are levels of government taxation, government expenditure, and open-market purchases of bonds. Also since control models are dynamic models, initial conditions are normally specified, and at times terminal conditions are also given. These are conditions on the state variables. With this nomenclature in mind one can write the quadratic linear control problem as (the prime on a vector indicates transposition):

Find (u_k)_{k=0}^{N-1} to minimize the criterion

J = (1/2) x_N' W_N x_N + w_N' x_N + Σ_{k=0}^{N-1} [(1/2) x_k' W_k x_k + w_k' x_k + x_k' F_k u_k + (1/2) u_k' Λ_k u_k + λ_k' u_k]   (2.1)

subject to the system equations

x_{k+1} = A_k x_k + B_k u_k + c_k   for k = 0, 1, ..., N − 1   (2.2)

and the initial conditions

x_0 given   (2.3)
where

x_k = state vector for period k with n elements
u_k = control vector for period k with m elements
W_k = n × n positive definite symmetric matrix
w_k = n-element vector
F_k = n × m matrix
Λ_k = m × m positive definite symmetric matrix
λ_k = m-element vector
A_k = n × n matrix
B_k = n × m matrix
c_k = n-element vector

Also the notation (u_k)_{k=0}^{N-1} means the set of control vectors from period zero through period N − 1, that is, (u_0, u_1, u_2, ..., u_{N-1}). Period N is the terminal period of the model. Thus the problem is to find the time paths for the m control variables for the time periods from 0 to N − 1 to minimize the quadratic form (2.1) while starting at the initial conditions (2.3) and following the difference equation (2.2).

Most quadratic linear control models in economics are not exactly in the form of (2.1) to (2.3), but they can be easily transformed into that form. For example, the quadratic linear tracking model used by Pindyck (1973a) and Chow (1975) uses a form of the criterion differing from (2.1). Also the model in Pindyck (1973a) has nth-order difference equations rather than first-order equations of the form (2.2). Since (2.1) to (2.3) constitute a general form, we shall use them as the basis for computational algorithms and show what transformations are required on each class of quadratic linear problems to bring them into this form.

Quadratic Linear Tracking Problems

The criterion function in these problems is of the form

J = (1/2) [x_N − x#_N]' W#_N [x_N − x#_N] + (1/2) Σ_{k=0}^{N-1} ([x_k − x#_k]' W#_k [x_k − x#_k] + [u_k − u#_k]' Λ#_k [u_k − u#_k])   (2.4)

where

x#_k = desired vector for the state vector in period k
u#_k = desired vector for the control vector in period k
W#_k = positive definite symmetric penalty matrix on deviations of the state variables from their desired paths
Λ#_k = positive definite symmetric penalty matrix on deviations of the control variables from their desired paths

Normally the matrices W#_k and Λ#_k are diagonal.
The equivalence of (2.4) to the criterion in the original problem (2.1) can be seen by expanding (2.4). The results are given in Table 2.1, which shows the notational equivalence between (2.1) and (2.4). The constant term which results from the expansion of (2.4) is not shown in the table since it does not affect the solution and can be dropped from the optimization problem.

Table 2.1 Notational equivalence for quadratic linear problems

Equation (2.1)   Equation (2.4)        Equation (2.1)   Equation (2.4)
W_N              W#_N                  Λ_k              Λ#_k
w_N              −W#_N x#_N            λ_k              −Λ#_k u#_k
W_k              W#_k                  F_k              0
w_k              −W#_k x#_k

One example of the application of quadratic linear tracking problems to economics is Pindyck (1972, 1973a). The state variable x includes consumption, nonresidential investment, residential investment, the price level, unemployment, and short- and long-term interest rates. The control variable includes government expenditures, taxes, and the money supply. Desired paths for both the state variables and the control variables are included as x#_k and u#_k, respectively. The diagonal elements of the matrices W#_k and Λ#_k are used not only to represent different levels of desirability of tracking the targets but also to balance the relative magnitudes of the different variables.¹

¹For other examples of quadratic linear control (but not necessarily tracking problems) the reader is referred to Tustin (1953), Bogaard and Theil (1959), van Eijk and Sandee (1959), Holt (1962), Theil (1964, 1965), Erickson, Leondes, and Norton (1970), Sandblom (1970), Thalberg (1971a,b), Paryani (1972), Friedman (1972), Erickson and Norton (1973), Tinsley, Craine, and Havenner (1974), Shupp (1976a), You (1975), Kaul and Rao (1975), Fischer and Uebe (1975), and Oudet (1976).
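The Table 2.1 equivalences are mechanical enough to express in a few lines of code. The following sketch (in numpy; the function name and variable names are mine, not the book's, with `W_sharp` standing for W#_k and so on) converts one period of tracking-problem data into the standard form of (2.1). Since F_k = 0 and the expansion drops a constant, the two criteria agree up to the constant (1/2) x#' W# x# + (1/2) u#' Λ# u#.

```python
import numpy as np

def tracking_to_standard(W_sharp, x_sharp, Lam_sharp, u_sharp):
    """Map one period of the tracking criterion (2.4) into the standard
    form (2.1) using the Table 2.1 equivalences.  Names are illustrative."""
    W = W_sharp                    # W_k = W#_k
    w = -W_sharp @ x_sharp         # w_k = -W#_k x#_k
    Lam = Lam_sharp                # Lambda_k = Lambda#_k
    lam = -Lam_sharp @ u_sharp     # lambda_k = -Lambda#_k u#_k
    F = np.zeros((W.shape[0], Lam.shape[0]))   # F_k = 0
    return W, w, F, Lam, lam
```

The same mapping, with N in place of k, handles the terminal-period matrices W_N and w_N.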
Lagged State and Control Variables

For many economic problems the difference equations which represent the econometric model cannot be written as a set of first-order difference equations but must be written as second- and higher-order difference equations. The procedure for converting second-order difference equations in states and controls is given here. The procedure for higher-order equations is analogous. Consider an econometric model with second-order lags in control and state variables

x_{k+1} = A_0 x_k + A_1 x_{k-1} + B_0 u_k + B_1 u_{k-1}   (2.5)

Then define two new vectors

y_k = x_{k-1}   (2.6)

and

v_k = u_{k-1}   (2.7)

and rewrite (2.5) as

x_{k+1} = A_0 x_k + A_1 y_k + B_0 u_k + B_1 v_k   (2.8)

Next define the augmented state vector z_k as

z_k = [x_k  y_k  v_k]'   (2.9)

and rewrite (2.6) and (2.7) as

y_{k+1} = x_k   (2.10)

and

v_{k+1} = u_k   (2.11)

Then Eqs. (2.8), (2.10), and (2.11) can be written as

[x]         [A_0  A_1  B_1] [x]     [B_0]
[y]       = [ I    0    0 ] [y]   + [ 0 ] u_k   (2.12)
[v]_{k+1}   [ 0    0    0 ] [v]_k   [ I ]

or as

z_{k+1} = A z_k + B u_k   (2.13)

with

    [A_0  A_1  B_1]       [B_0]
A = [ I    0    0 ]   B = [ 0 ]   (2.14)
    [ 0    0    0 ]       [ I ]

Equation (2.13) is then a first-order linear difference equation in the augmented state vector z. An example of this can be found in Pindyck (1973a). The original state vector includes 10 elements, and the augmented state vector includes 28 elements [see Pindyck (1973a, p. 97)]. For example, the augmented state vector includes not only prices but also lagged prices and not only unemployment rates but also lagged unemployment rates and unemployment rates lagged two periods. It can be argued that for computational reasons it is unwise to convert nth-order difference equations of the form (2.5) into augmented systems of first-order equations of the form (2.13). Norman and Jung (1977) have compared the computational efficiency of the two approaches and have concluded that in certain cases it is better not to transform the equations into augmented systems of first-order difference equations.
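The stacking in (2.12) and (2.14) is easy to mechanize. Here is a minimal sketch (numpy; the function name is mine) that builds the augmented A and B from A_0, A_1, B_0, B_1 for the second-order case; higher-order lags just add more identity and zero blocks.

```python
import numpy as np

def augment_second_order(A0, A1, B0, B1):
    """Build the first-order augmented system (2.13)-(2.14) from the
    second-order system (2.5), with z_k = [x_k, y_k, v_k],
    y_k = x_{k-1}, v_k = u_{k-1}."""
    n, m = B0.shape
    A = np.block([
        [A0,               A1,               B1              ],
        [np.eye(n),        np.zeros((n, n)), np.zeros((n, m))],
        [np.zeros((m, n)), np.zeros((m, n)), np.zeros((m, m))],
    ])
    B = np.vstack([B0, np.zeros((n, m)), np.eye(m)])
    return A, B
```

One step of z_{k+1} = A z_k + B u_k then reproduces (2.5) in its first n rows and shifts x_k and u_k into the lag positions.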
A slightly different kind of problem occurs in many economic models. The difference equations are written as

x_{k+1} = A x_k + B u_{k+1}   (2.15)

i.e., the control vector is not u_k, as in Eq. (2.2), but u_{k+1}. While it may be true that there are some economic problems in which there is an important and immediate effect of the control variable on the state variables, usually the choice of control is actually made at least one time period before it has an effect. For example, the simple multiplier-accelerator model

Y_k = C_k + I_k + G_k
C_k = a + bY_k   (2.16)
I_k = e(C_k − C_{k-1})

where Y = gross national product, C = consumption, and I = investment, reduces to

Y_k = βY_{k-1} + γG_k + δ   (2.17)

with

β = −eb/(1 − b − eb)     γ = 1/(1 − b − eb)     δ = a/(1 − b − eb)

However, government expenditures are not actually the decision or control variable, since in fact the decision variable is appropriations made by the Congress or obligations made by the administration. Both these variables lead expenditure by at least one quarter. Therefore it is common to add to a model like Eq. (2.17) another relationship like

G_{k+1} = O_k   (2.18)

where O_k stands for government obligations. Then substitution of Eq. (2.18) into Eq. (2.17) yields

Y_k = βY_{k-1} + γO_{k-1} + δ   (2.19)

and this model is in the same form as the system equation (2.2). For models which truly have the simultaneous form of Eq. (2.15) the reader is referred to Chow (1975). The derivations in that book are made for system equations of the form (2.15). Although the difference between Eqs. (2.15) and (2.2) may be viewed as simply a matter of labels, in the stochastic control context, when one is dealing with the real timing of events and the arrival of information, the matter may be more than just one of labels. This concludes the demonstration of how a variety of types of quadratic linear economic control models can be reduced to the form (2.1) to (2.3). Next the problem (2.1) to (2.3) will be solved by the method of dynamic programming to obtain the feedback-control solution.
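The substitution of (2.18) into (2.17) can be checked numerically. The following sketch simulates the pair (2.17)-(2.18) and the reduced form (2.19) side by side; all parameter values here are made up for illustration.

```python
# Check that G_{k+1} = O_k turns Y_k = beta*Y_{k-1} + gamma*G_k + delta
# into Y_k = beta*Y_{k-1} + gamma*O_{k-1} + delta.  Values are made up.
beta, gamma, delta = 0.6, 0.8, 1.0
Y0, G1 = 10.0, 2.2                 # initial output and first-period expenditure
O = {1: 2.0, 2: 3.0, 3: 1.5}       # obligations O_k chosen by the policy maker

# Equations (2.17) and (2.18) simulated together
G = {1: G1}
for k in range(1, 4):
    G[k + 1] = O[k]                                  # (2.18)
Y = {0: Y0}
for k in range(1, 5):
    Y[k] = beta * Y[k - 1] + gamma * G[k] + delta    # (2.17)

# Reduced form (2.19), once the first-period expenditure is given
Z = {0: Y0, 1: beta * Y0 + gamma * G1 + delta}
for k in range(2, 5):
    Z[k] = beta * Z[k - 1] + gamma * O[k - 1] + delta

print(all(abs(Y[k] - Z[k]) < 1e-12 for k in range(5)))   # prints True
```

The two simulations agree exactly, which is just the substitution carried out period by period.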
2.2 Solution Method

The crucial notion from dynamic programming² is that of the optimal cost-to-go. Since the idea is more simply thought of in space than in time, a spatial example is used here; later the method will be applied in time. Consider an aircraft flying from New York to London. Different routes are flown each time the Atlantic is crossed because of the constantly shifting wind and weather patterns. Next consider flights on two different days when the weather is exactly the same in the eastern half of the crossing but different in the western half. Now suppose that on these two days the plane flies different routes over the western half of the Atlantic but ends up at the same point just as it begins to cross the eastern half. One can ask: Will the plane fly the same route the rest of the way into London on the two different days? Since the weather is the same in the eastern half on the two days, there is no reason not to use the same route for the rest of the way into London. This is the basic idea of dynamic programming, i.e., that from a given point the route the rest of the way home to the finish will be the same no matter how one happened to get to that point. Also since the route is the same from that point the rest of the way home, the cost-to-go from that point to London is the same no matter how one arrived at the point. It is called the optimal cost-to-go since it is the minimum-cost route for the rest of the trip. It is written in symbols as J*(x_k), where x_k is a vector giving the coordinates of a point in space and J*(x_k) is the cost of going from the point x_k to London. The elements of the vector x_k in this example could be the longitude and latitude of the point in the middle of the ocean.

²See Intriligator (1971, chap. 13) for a discussion of dynamic-programming methods.
The next idea is that one can associate with every point in the Atlantic a minimum-cost path to London and an associated optimal cost-to-go. If one had this information available on a chart, one could simply look at the chart and say that at a given latitude and longitude one should set the rudder of the aircraft in a certain position in order to arrive at London with minimum cost. This idea gives rise to the notion of a feedback rule of the form

    u_k = G_k x_k + g_k    (2.20)

where

    x_k = state vector giving location of aircraft at place k
    u_k = control vector consisting of settings for ailerons and rudder
    G_k = matrix of coefficients
    g_k = vector of coefficients

so the feedback rule (2.20) says that when the plane is in a position x_k, the various controls should be set in the positions u_k. Of course the problem is finding the elements of G_k and g_k; but that is what dynamic programming is all about.³

For the problems in this book the primary dimension is not space but time, so the feedback-rule index k changes from place k to time k. Then the feedback rule (2.20) is interpreted as "given that the economy is in state x_k at time k, the best policy to take is the set of policies in the vector u_k". For example, in a commodity-stabilization problem the state vector x would include elements for price and buffer-stock level, and the control would include an element for buffer-stock sales (or purchases). Then the feedback rule (2.20) would be interpreted as "given that the price and stocks are x_k, the amount u_k should be sold (or bought) by the stabilization scheme managers".⁴ The feedback rule is in general nonlinear, rather than linear as in Eq. (2.20), but for an important class of problems, namely the quadratic linear problems that are the subject of this chapter, the feedback rule is linear.

³For a full discussion of dynamic programming see Bellman (1957) and Bellman and Dreyfus (1962).
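As a preview of where the derivation in the remainder of this section ends up (a backward Riccati recursion that produces G_k and g_k, followed by a forward pass through the feedback rule), a minimal scalar sketch can be given. All system and penalty values below are illustrative assumptions; the equation numbers in the comments refer to results derived later in this chapter.

```python
# Scalar quadratic linear control: backward Riccati recursion, then
# forward simulation with the feedback rule u_k = G_k x_k + g_k.
# All numbers (A, B, c, W, w, F, Lam, lam, N, x0) are assumed for illustration.
A, B, c = 0.9, 0.5, 0.1         # system: x_{k+1} = A x_k + B u_k + c
W, w, F = 1.0, 0.0, 0.0         # state and cross cost terms
Lam, lam = 0.1, 0.0             # control cost terms
WN, wN = 1.0, 0.0               # terminal cost terms
N, x0 = 10, 2.0

K = [0.0] * (N + 1); p = [0.0] * (N + 1)
K[N], p[N] = WN, wN             # terminal conditions (2.55)-(2.56)
G = [0.0] * N; g = [0.0] * N
for k in range(N - 1, -1, -1):  # integrate the Riccati equations backward
    Theta = B * K[k + 1] * B + Lam
    G[k] = -(F + B * K[k + 1] * A) / Theta                       # Eq. (2.52)
    g[k] = -(B * (K[k + 1] * c + p[k + 1]) + lam) / Theta
    K[k] = W + A * K[k + 1] * A - (F + A * K[k + 1] * B) ** 2 / Theta
    p[k] = A * (K[k + 1] * c + p[k + 1]) + w \
         - (A * K[k + 1] * B + F) * (B * (K[k + 1] * c + p[k + 1]) + lam) / Theta

x, xs, us = x0, [x0], []
for k in range(N):              # forward pass with the feedback rule
    u = G[k] * x + g[k]         # Eq. (2.20)
    x = A * x + B * u + c       # system equation (2.2)
    us.append(u); xs.append(x)
```

With the state penalized toward zero, the closed-loop coefficient A + B·G_k is smaller than A, so the state is driven down rapidly from its initial value.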
Also, the cost-to-go for this class of problems is a quadratic function of the state of the system at time k

    J*(x_k) = J*(k) = (1/2) x_k' K_k x_k + p_k' x_k + ν_k    (2.21)

where K_k is an n x n matrix called the Riccati matrix, p_k is an n-element vector, and ν_k is a scalar term. In words this equation says that when the system is in the state x_k at time k, the optimal cost-to-go is a quadratic function of that state. To return momentarily to the New York-to-London flight example, Eq. (2.21) can be interpreted as saying that the cost to go from point x_k in the middle of the Atlantic is a quadratic function of the latitude and longitude at that point. It might seem more reasonable to say that the cost-to-go would be some function of the entire path from x_k to London, but that is not what Eq. (2.21) implies. Instead it states that the optimal cost-to-go from point x_k to London can be written as a quadratic function of the coordinates of that single point.

To derive the optimal feedback rule for the problem (2.1) to (2.3) one begins at the terminal time and works backward toward the initial time. So if the optimal cost-to-go at time k is defined by Eq. (2.21), the optimal cost-to-go at time N can be written as

    J*(x_N) = J*(N) = (1/2) x_N' K_N x_N + p_N' x_N + ν_N    (2.22)

From Eq. (2.1) the costs which are incurred in the terminal period N are

    (1/2) x_N' W_N x_N + w_N' x_N    (2.23)

so by comparison of Eqs. (2.22) and (2.23) one obtains

    K_N = W_N    (2.24)
    p_N = w_N    (2.25)
    ν_N = 0

Equations (2.24) and (2.25) provide the terminal values for a set of difference equations which are used to determine K_k and p_k for all time periods. In fact the information in K_k and p_k is like price information, in that W_N and w_N provide information about the value of having the economic system in state x_N at time N.

⁴For an application of control methods to commodity stabilization see Kim, Goreux, and Kendrick (1975).
Later it will become apparent how the difference equations in K and p (which are called the Riccati equations) are used to transmit this price information from the last period backward in time to the initial period. The K_k's and p_k's will in turn be used to compute the G_k and g_k components of the feedback rule (2.20).

The optimal cost-to-go for period N is given in Eq. (2.22). Now one can begin working backward in time to get the optimal cost-to-go in period N - 1, that is,

    J*(N - 1) = min_{u_{N-1}} {J*(N) + L_{N-1}(x_{N-1}, u_{N-1})}    (2.26)

where L_{N-1} is the cost-function term in Eq. (2.1) for period N - 1, that is, from Eq. (2.1),

    L_{N-1}(x_{N-1}, u_{N-1}) = (1/2) x'_{N-1} W_{N-1} x_{N-1} + w'_{N-1} x_{N-1} + x'_{N-1} F_{N-1} u_{N-1}
                                + (1/2) u'_{N-1} Λ_{N-1} u_{N-1} + λ'_{N-1} u_{N-1}    (2.27)

Equation (2.26) embodies an important notion from dynamic programming. It says that the optimal cost-to-go at time N - 1 will be the minimum, over the control at time N - 1, of the optimal cost-to-go at state x_N in time N plus the cost incurred in time period N - 1, that is, L_{N-1}. So in the airplane example the optimal cost-to-go from position N - 1 in the Atlantic will be the minimum over the available controls at time N - 1 of the cost incurred in period N - 1 plus the optimal cost-to-go in period N.

Substitution of Eqs. (2.22) and (2.27) into Eq. (2.26) then yields

    J*(N - 1) = min_{u_{N-1}} ((1/2) x_N' K_N x_N + p_N' x_N + (1/2) x'_{N-1} W_{N-1} x_{N-1} + w'_{N-1} x_{N-1}
                + x'_{N-1} F_{N-1} u_{N-1} + (1/2) u'_{N-1} Λ_{N-1} u_{N-1} + λ'_{N-1} u_{N-1})    (2.28)

Furthermore, the x_N in Eq. (2.28) can be written in terms of x_{N-1} and u_{N-1} by using the system equations (2.2), i.e.,

    x_N = A_{N-1} x_{N-1} + B_{N-1} u_{N-1} + c_{N-1}    (2.29)

Then substitution of Eq.
(2.29) into Eq. (2.28) and collection of like terms yields

    J*(N - 1) = min_{u_{N-1}} ((1/2) x'_{N-1} Φ_{N-1} x_{N-1} + (1/2) u'_{N-1} Θ_{N-1} u_{N-1} + x'_{N-1} Ψ_{N-1} u_{N-1}
                + φ'_{N-1} x_{N-1} + θ'_{N-1} u_{N-1} + η_{N-1})    (2.30)

where

    Φ_{N-1} = A'_{N-1} K_N A_{N-1} + W_{N-1}
    Θ_{N-1} = B'_{N-1} K_N B_{N-1} + Λ_{N-1}
    Ψ_{N-1} = A'_{N-1} K_N B_{N-1} + F_{N-1}    (2.31)
    φ_{N-1} = A'_{N-1} (K_N c_{N-1} + p_N) + w_{N-1}
    θ_{N-1} = B'_{N-1} (K_N c_{N-1} + p_N) + λ_{N-1}
    η_{N-1} = (1/2) c'_{N-1} K_N c_{N-1} + p_N' c_{N-1}

Next the minimization over u_{N-1} in Eq. (2.30) is performed to yield the first-order condition

    u'_{N-1} Θ_{N-1} + x'_{N-1} Ψ_{N-1} + θ'_{N-1} = 0    (2.32)

or

    Θ'_{N-1} u_{N-1} + Ψ'_{N-1} x_{N-1} + θ_{N-1} = 0

This first-order condition can then be solved for u_{N-1} in terms of x_{N-1} to obtain the feedback rule for period N - 1, that is,

    u_{N-1} = G_{N-1} x_{N-1} + g_{N-1}    (2.33)

where

    G_{N-1} = -(Θ'_{N-1})^{-1} Ψ'_{N-1}    and    g_{N-1} = -(Θ'_{N-1})^{-1} θ_{N-1}    (2.34)

This is the feedback rule for period N - 1; however, one needs the feedback rule for a general period k, not just for the next-to-last period N - 1. To accomplish this look back at Eq. (2.26), which gives the optimal cost-to-go for period N - 1. One can use the optimal cost-to-go for period N - 2 to obtain the feedback rule for period N - 2 and then see whether the results can be generalized to period k. The optimal cost-to-go for period N - 2 can be written, by analogy to Eq. (2.26), as

    J*(N - 2) = min_{u_{N-2}} {J*(N - 1) + L_{N-2}(x_{N-2}, u_{N-2})}    (2.35)

The second part of Eq. (2.35) is obtained simply by inspecting Eq. (2.1) for the cost terms which are appropriate to period N - 2

    L_{N-2}(x_{N-2}, u_{N-2}) = (1/2) x'_{N-2} W_{N-2} x_{N-2} + w'_{N-2} x_{N-2} + x'_{N-2} F_{N-2} u_{N-2}
                                + (1/2) u'_{N-2} Λ_{N-2} u_{N-2} + λ'_{N-2} u_{N-2}    (2.36)

but the first term in Eq. (2.35) is slightly more difficult to obtain. Equation (2.30) gives an expression for J*(N - 1), but it includes terms in both x_{N-1} and u_{N-1}. If one is to state the optimal cost-to-go strictly as a function of the state x_{N-1}, then u_{N-1} must be substituted out. Since this can be done by using the feedback rule (2.33), substitution of Eq.
(2.33) into (2.30) and collection of like terms yields

    J*(N - 1) = (1/2) x'_{N-1} K_{N-1} x_{N-1} + p'_{N-1} x_{N-1} + ν_{N-1}    (2.37)

where

    K_{N-1} = Φ_{N-1} + G'_{N-1} Θ_{N-1} G_{N-1} + 2 Ψ_{N-1} G_{N-1}    (2.38)
    p_{N-1} = (Ψ_{N-1} + G'_{N-1} Θ_{N-1}) g_{N-1} + G'_{N-1} θ_{N-1} + φ_{N-1}    (2.39)
    ν_{N-1} = -(1/2) θ'_{N-1} (Θ'_{N-1})^{-1} θ_{N-1} + η_{N-1}

The matrix K and the vector p have been used in Eq. (2.37) just as they were in the optimal cost-to-go term for J*(N) in Eq. (2.22). Next Eqs. (2.36) and (2.37) can be substituted into Eq. (2.35) to obtain an expression for the optimal cost-to-go at time N - 2 in terms of x_{N-1}, x_{N-2}, and u_{N-2}. Then x_{N-1} can be substituted out of this expression by using the system equations (2.2). This leaves the optimal cost-to-go as a function of x_{N-2} and u_{N-2} only. Then the first-order condition is obtained by taking the derivative with respect to u_{N-2}, and the resulting set of equations is solved for u_{N-2} in terms of x_{N-2}. This provides the feedback rule for period N - 2

    u_{N-2} = G_{N-2} x_{N-2} + g_{N-2}    (2.40)

where

    G_{N-2} = -(Θ'_{N-2})^{-1} Ψ'_{N-2}    and    g_{N-2} = -(Θ'_{N-2})^{-1} θ_{N-2}    (2.41)

with

    Φ_{N-2} = A'_{N-2} K_{N-1} A_{N-2} + W_{N-2}
    Θ_{N-2} = B'_{N-2} K_{N-1} B_{N-2} + Λ_{N-2}
    Ψ_{N-2} = A'_{N-2} K_{N-1} B_{N-2} + F_{N-2}    (2.42)
    φ_{N-2} = A'_{N-2} (K_{N-1} c_{N-2} + p_{N-1}) + w_{N-2}
    θ_{N-2} = B'_{N-2} (K_{N-1} c_{N-2} + p_{N-1}) + λ_{N-2}
    η_{N-2} = (1/2) c'_{N-2} K_{N-1} c_{N-2} + p'_{N-1} c_{N-2}

Then exactly as was done for period N - 1, the optimal cost-to-go for period N - 2 as a function of the state x_{N-2} alone can be obtained by substituting the feedback rule (2.40) back into the expression for the cost-to-go in terms of x_{N-2} and u_{N-2}. This procedure yields

    J*(N - 2) = (1/2) x'_{N-2} K_{N-2} x_{N-2} + p'_{N-2} x_{N-2} + ν_{N-2}    (2.43)

where

    K_{N-2} = Φ_{N-2} + G'_{N-2} Θ_{N-2} G_{N-2} + 2 Ψ_{N-2} G_{N-2}    (2.44)
    p_{N-2} = (Ψ_{N-2} + G'_{N-2} Θ_{N-2}) g_{N-2} + G'_{N-2} θ_{N-2} + φ_{N-2}    (2.45)
    ν_{N-2} = -(1/2) θ'_{N-2} (Θ'_{N-2})^{-1} θ_{N-2} + η_{N-2}

The feedback rules for periods N - 1 and N - 2 have now been obtained; comparing Eqs.
(2.33) and (2.40) shows them both to be of the form

    u_k = G_k x_k + g_k    (2.46)

with

    G_k = -(Θ'_k)^{-1} Ψ'_k    and    g_k = -(Θ'_k)^{-1} θ_k    (2.47)

So Eq. (2.46) is the optimal feedback rule for the problem (2.1) to (2.3). Also, by comparing Eqs. (2.44) and (2.45) with Eqs. (2.38) and (2.39), one can write the Riccati equations for the problem as

    K_k = Φ_k + G'_k Θ_k G_k + 2 Ψ_k G_k    (2.48)
    p_k = (Ψ_k + G'_k Θ_k) g_k + G'_k θ_k + φ_k    (2.49)

with

    Φ_k = A'_k K_{k+1} A_k + W_k
    Θ_k = B'_k K_{k+1} B_k + Λ_k
    Ψ_k = A'_k K_{k+1} B_k + F_k    (2.50)
    φ_k = A'_k (K_{k+1} c_k + p_{k+1}) + w_k
    θ_k = B'_k (K_{k+1} c_k + p_{k+1}) + λ_k
    η_k = (1/2) c'_k K_{k+1} c_k + p'_{k+1} c_k

In summary, then, the optimal control problem (2.1) to (2.3) is solved by beginning with the terminal conditions (2.24) and (2.25) on K_N and p_N and then integrating the Riccati equations (2.48) and (2.49) backward in time. With the K_k and p_k computed for all time periods, the G_k and g_k for each time period can be calculated with Eq. (2.47). These in turn are used in the feedback rule (2.46). First the initial condition x_0 in Eq. (2.3) is used in the feedback rule (2.46) to compute u_0. Then u_0 and x_0 are used in the system equations (2.2) to calculate x_1. Then x_1 is used in the feedback rule to calculate u_1. The calculations proceed in this fashion until all the x_k's and u_k's have been obtained.

For comparability to other texts, and to increase the intuitive nature of the solution slightly, it is worthwhile to define the feedback matrices and the Riccati equations in terms of the original matrices of the problem (2.1) to (2.3), i.e., in terms of A, B, c, W, w, F, Λ, and λ instead of in terms of the intermediate matrices and vectors Φ, Θ, Ψ, φ, θ, and η. This can be accomplished by substituting the intermediate results in Eqs. (2.50) into the feedback matrices defined in Eq.
(2.47) and the Riccati equations (2.48) and (2.49), yielding the feedback rule

    u_k = G_k x_k + g_k    (2.51)

where

    G_k = -[B'_k K_{k+1} B_k + Λ_k]^{-1} [F'_k + B'_k K_{k+1} A_k]
    g_k = -[B'_k K_{k+1} B_k + Λ_k]^{-1} [B'_k (K_{k+1} c_k + p_{k+1}) + λ_k]    (2.52)

with the Riccati equations

    K_k = W_k + A'_k K_{k+1} A_k - [F_k + A'_k K_{k+1} B_k] [B'_k K_{k+1} B_k + Λ_k]^{-1} [B'_k K_{k+1} A_k + F'_k]    (2.53)

    p_k = A'_k (K_{k+1} c_k + p_{k+1}) + w_k - [A'_k K_{k+1} B_k + F_k] [B'_k K_{k+1} B_k + Λ_k]^{-1} [B'_k (K_{k+1} c_k + p_{k+1}) + λ_k]    (2.54)

and with terminal conditions

    K_N = W_N    (2.55)

and

    p_N = w_N    (2.56)

The difference-equation nature of the Riccati equations is much clearer in Eqs. (2.53) and (2.54) than it was in Eqs. (2.48) and (2.49). It is also apparent how the equations can be integrated backward in time from the terminal conditions (2.55) and (2.56). Furthermore, these equations indicate how the pricelike information in the W, w, Λ, and λ elements of the criterion function is integrated backward in time in the Riccati equations and then used in the G and g elements of the feedback rule as the solution is brought forward in time using the feedback rule and the system equations.

Comparability to results for the quadratic linear problem published in other texts and articles can be obtained by using the fact that the cross term x'Fu in the criterion is frequently not used and that the constant term c_k in the system equations is usually omitted. When both F and c are set to zero, the results stated above can be considerably simplified. Also, for comparability of the results above to those derived for quadratic linear tracking problems it is necessary to use the notational equivalence given in Table 2.1.

Chapter 3

General Nonlinear Models

The previous chapter dealt with the restricted case of deterministic models with quadratic criterion functions and linear system equations. In this chapter the deterministic assumption is maintained, but the quadratic linear assumptions are dropped.
Thus both the criterion function and the system equations can take general nonlinear forms. If the model is written in continuous time, the criterion will be an integral over time and the system equations will be differential equations. If the model is written in discrete time, the criterion will be a summation over time periods and the system equations will be difference equations. Since the basic approach used throughout this book is one of numerical solution of the models, and since continuous-time problems are transformed into discrete-time problems when they are solved on digital computers, only discrete-time problems are discussed here.¹

This chapter begins with a statement of the general nonlinear problem in Sec. 3.1.² This is followed by a discussion of approximation methods for solving the problem. The approximation methods use a second-order approximation of the criterion function and a first-order approximation of the system equations. The approximation problem is then in the form of the quadratic linear problems discussed in the previous chapter. The approximation QLP is then solved iteratively until the results converge. While this approximation method may be adequate for solving some nonlinear optimization problems, convergence may be too slow.

¹For a discussion of continuous-time problems see Miller (1979), Intriligator (1971), or Pitchford and Turnovsky (1977).
²Examples of the application of nonlinear control theory to economic problems include Livesey (1971, 1978), Cheng and Wan (1972), Shupp (1972), Norman and Norman (1973), Fitzgerald, Johnston, and Bayes (1973), Holbrook (1973, 1974, 1975), Woodside (1973), Friedman and Howrey (1973), Healey and Summers (1974), Sandblom (1975), Fair (1974, 1976, 1978a,b), Rouzier (1974), Healey and Medina (1975), Gupta et al. (1975), Craine, Havenner, and Tinsley (1976), Ando, Norman, and Palash (1978), Athans et al. (1975), Palash (1977), and Klein (1979).
Therefore it is common to solve this class of problems with one of a variety of gradient methods. These methods commonly employ the maximum principle and then use iterative means to satisfy the optimality conditions. Basically, they integrate the costate equations backward in time and the state equations forward in time to satisfy these conditions and then check to see whether the derivative of the Hamiltonian with respect to the control variable has gone to zero. If it has not, the controls are moved in the direction of the gradient and the costate and state equations are integrated again. This procedure is repeated until the derivative is close enough to zero. These gradient methods are discussed in Sec. 3.3.

Even these gradient methods are inadequate to solve many economic optimization problems. Many economic models are very large, containing hundreds of nonlinear equations. To solve these problems on computers where the high-speed memory is limited, the sparsity of the model is exploited. Since not every variable enters every equation, it is not necessary to store large matrices fully; only the nonzero elements need be stored and manipulated. An introduction to this topic is provided in Sec. 3.4.

3.1 Problem Statement

The problem is to find the vector of control variables u_k in each time period k

    (u_k)_{k=0}^{N-1} = (u_0, u_1, u_2, ..., u_{N-1})

which will minimize the criterion function

    J = L_N(x_N) + Σ_{k=0}^{N-1} L_k(x_k, u_k)    (3.1)

where

    x_k = vector of state variables
    u_k = vector of control variables
    L_k = scalar function

The last period, period N, is separated from the other time periods to simplify the specification of terminal conditions. Also, the criterion function is assumed to be additive over time. This assumption is not essential, but its use greatly simplifies the analysis. In Chap. 2 the functions L_k were assumed to be quadratic forms; here they will remain general nonlinear forms.
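The additivity of the criterion (3.1) means that J can be evaluated for any candidate control sequence by one forward pass through the model. A minimal sketch, in which the nonlinear functions f, L, and L_N below are illustrative stand-ins and not a model from the text:

```python
import math

# Evaluating the criterion (3.1) for a candidate control sequence under
# assumed nonlinear system and cost functions (illustrative stand-ins).
def f(x, u):          # system equation x_{k+1} = f_k(x_k, u_k)
    return 0.9 * x + 0.4 * math.tanh(u)

def L(x, u):          # one-period cost L_k(x_k, u_k)
    return x ** 2 + 0.1 * u ** 2

def L_N(x):           # terminal cost L_N(x_N)
    return 10.0 * x ** 2

def criterion(x0, controls):
    J, x = 0.0, x0
    for u in controls:
        J += L(x, u)  # running cost, k = 0, ..., N-1
        x = f(x, u)   # advance the state with the system equation
    return J + L_N(x)
```

Gradient and approximation methods alike rely on such forward passes; only the way the candidate controls are updated differs.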
The criterion function (3.1) is minimized subject to the system equations

    x_{k+1} = f_k(x_k, u_k)    k = 0, 1, ..., N - 1    (3.2)

and the initial conditions

    x_0 given    (3.3)

where f is a vector-valued function. The system equations are written in explicit form; i.e., the variable x_{k+1} is an explicit function of x_k and u_k. Some econometric models are developed in implicit form; i.e., the system equations are written in the form

    g_k(x_{k+1}, x_k, u_k) = 0    (3.4)

For a discussion of computational methods which are specific to such problems see Drud (1976).

The problem (3.1) to (3.3) can be solved by a variety of methods. A discussion of a quadratic linear approximation method is given next, followed by an elaboration of gradient methods.

3.2 Quadratic Linear Approximation Method

The problem (3.1) to (3.3) can be approximated by a second-order expansion of the criterion function and a first-order expansion of the system equations.³ The resulting approximation problem can be solved using the quadratic linear problem methods discussed in the previous chapter. This procedure can be iterated, the equations being expanded each time around the solution obtained on the previous iteration. The iterations are continued until satisfactory convergence is obtained.

First consider a second-order expansion of the criterion function. This expansion is done about a path⁴

    (x_{o,k+1}, u_{ok})_{k=0}^{N-1}

which is chosen as close to the expected optimal path as possible.

³The method described here is like Garbade (1975a,b, chap. 2).
⁴A lowercase o is used to denote the nominal path, and 0 is used to denote period zero.
This second-order expansion of the criterion function is written as⁵

    J ≈ L'_{xN} Δx_N + (1/2) Δx_N' L_{xx,N} Δx_N
        + Σ_{k=0}^{N-1} {L'_{xk} Δx_k + L'_{uk} Δu_k + (1/2)(Δx_k' L_{xx,k} Δx_k + 2 Δx_k' L_{xu,k} Δu_k + Δu_k' L_{uu,k} Δu_k)}    (3.5)

where Δx_k = x_k - x_{ok} and Δu_k = u_k - u_{ok}, and where L_{xk} is the vector of derivatives of the function L with respect to each element of the vector x at time k, that is,

    L_{xk} = [∂L_k/∂x_{1k}  ···  ∂L_k/∂x_{nk}]'    (3.5a)

with x_{ik} the ith element of the n-vector x_k. Also, L_{uk} is the vector of derivatives of the function L with respect to each element of the vector u at time k, that is,

    L_{uk} = [∂L_k/∂u_{1k}  ···  ∂L_k/∂u_{mk}]'    (3.5b)

with u_{ik} the ith element of the m-vector u_k. Also, L_{xx,k} is the matrix of second derivatives of the function L_k with respect to the elements of the vector x_k, with typical element

    [L_{xx,k}]_{ij} = ∂²L_k/∂x_{ik}∂x_{jk}    i, j = 1, ..., n    (3.5c)

L_{xu,k} is the matrix of cross partial derivatives of the function L_k with respect to elements of the vectors x_k and u_k

⁵This notation differs from the convention of treating gradient vectors as row vectors. Thus the usual notation would treat, for example, L_{xN} as a row vector, and the transpose shown in Eq. (3.5) would not be necessary. Departure from that convention was adopted here so that all vectors can be treated as column vectors unless an explicit transpose is given, in which case they are row vectors.
with typical element

    [L_{xu,k}]_{ij} = ∂²L_k/∂x_{ik}∂u_{jk}    i = 1, ..., n;  j = 1, ..., m    (3.5d)

so that L_{xu,k} = L'_{ux,k}, and L_{uu,k} is the matrix of second derivatives of the function L_k with respect to the elements of the vector u_k, with typical element

    [L_{uu,k}]_{ij} = ∂²L_k/∂u_{ik}∂u_{jk}    i, j = 1, ..., m    (3.5e)

The approximate criterion (3.5) is minimized subject to a first-order expansion of the system equations around the path (x_{o,k+1}, u_{ok})_{k=0}^{N-1}, that is,

    x_{k+1} = f_k + f_{xk}[x_k - x_{ok}] + f_{uk}[u_k - u_{ok}]    k = 0, 1, ..., N - 1    (3.6)

where f_k is the vector-valued system function evaluated on the path, f_{xk} is the matrix of first-order derivatives of each of the functions f_k^i in f_k with respect to each of the variables x_{jk} in x_k, with typical element

    [f_{xk}]_{ij} = ∂f_k^i/∂x_{jk}    i, j = 1, ..., n    (3.6a)

and f_{uk} is the matrix of first-order derivatives of each of the functions f_k^i in f_k with respect to each of the variables u_{jk} in u_k, with typical element

    [f_{uk}]_{ij} = ∂f_k^i/∂u_{jk}    i = 1, ..., n;  j = 1, ..., m    (3.6b)

Thus the notation f_{xk}[x_k - x_{ok}] in Eq. (3.6) does not represent a matrix of derivatives evaluated at the point x_k - x_{ok} but rather the matrix of derivatives f_{xk} evaluated at x_{ok} and multiplied by the vector x_k - x_{ok}.

The approximation problem (3.5) and (3.6) is of the same form as the quadratic linear problem (2.1) and (2.2) discussed in the previous chapter. The equivalence between the matrices of these two problems is given in Table 3.1. Thus the problem (3.5) and (3.6) with the initial condition (3.3) is solved to obtain the optimal path (x_{k+1}, u_k)_{k=0}^{N-1} using the algorithm of the previous chapter. Then the iteration procedure is used to obtain a new nominal path (x_{o,k+1}, u_{ok})_{k=0}^{N-1} in the following manner.
Table 3.1: Equivalence of the arrays in QLP and the approximation QLP

    QLP (2.1)-(2.2)    Approximation QLP (3.5)-(3.6)
    W_N                L_{xx,N}
    w_N                L_{xN} - L_{xx,N} x_{oN}
    W_k                L_{xx,k}
    w_k                L_{xk} - L_{xx,k} x_{ok} - L_{xu,k} u_{ok}
    F_k                L_{xu,k}
    Λ_k                L_{uu,k}
    λ_k                L_{uk} - L_{uu,k} u_{ok} - L_{ux,k} x_{ok}
    A_k                f_{xk}
    B_k                f_{uk}
    c_k                f_k - (f_{xk} x_{ok} + f_{uk} u_{ok})

Let

    (x^p_{o,k+1}, u^p_{ok})_{k=0}^{N-1}

be the nominal path about which the expansion is done on the pth iteration.

The order condition for identification is

    K - K* = K** ≥ GΔ - 1    (4.20)

For the model (4.15), G = 2 and K = 4. Also, for the first equation GΔ = 2, since both endogenous variables appear in that equation. On the other hand, K* = 3, since, from Eq. (4.18), I_{k-1} does not appear in the first equation. Thus the inequality (4.20) becomes

    K - K* = K** ≥ GΔ - 1
    4 - 3 = 1 ≥ 2 - 1    (4.21)
    1 = 1

When, as in this case, the inequality holds as an equality, the equation is said to be exactly identified. Similarly, for the second equation in (4.15), GΔ = 2 and K* = 3, since from Eq. (4.18) G_k does not enter the equation. Thus the inequality (4.20) holds as an equality for the second equation, and it is also exactly identified. When all the equations of the model are exactly identified, the ordinary (unrestricted) least-squares estimates are consistent estimates of the π's. These estimates will also be equivalent to maximum-likelihood estimates and will possess the properties of asymptotic efficiency and asymptotic normality.²

The reduced-form equations (4.19) were estimated by ordinary least squares on the TROLL system at M.I.T. for the period 1947-II through 1969-I, to obtain³

    C_k = 1.014 C_{k-1} + .002 I_{k-1} - .004 G_k - 1.312    (4.22)
          (.016)         (.047)         (.031)    (1.52)
                                     R² = .998    DW = 2.19

    I_k = .093 C_{k-1} + .753 I_{k-1} - .100 G_k + .448      (4.23)
          (.023)        (.068)         (.044)    (2.164)
                                     R² = .938    DW = 1.62

As can be seen from quick examination, this model has some characteristics which make it something less than the perfect model for conducting stabilization

²See Kmenta (1971, p. 551).
experiments. First, the coefficient of 1.014 on C_{k-1} in the first equation gives the model an explosive character. Second, the small coefficient of -.004 on G_k in the same equation renders government policy very weak in affecting consumption. Also, the predominant effect of government spending on private consumption (as on investment in the second equation) is a "crowding out" effect. Thus increases in government spending result in decreases in both consumption and investment. This effect is of course not of significant magnitude in the consumption equation but is significant in the investment equation.

While these characteristics make it somewhat undesirable for stabilization experiments, the model in Eqs. (4.22) and (4.23) has the virtue of being derived and estimated in a straightforward manner from the Keynesian model which is widely taught in freshman economics textbooks. Also, as will become apparent in Chap. 12, in the experiments with active-learning stochastic control the model is rich enough to begin to provide some insights into the relative magnitudes involved. The consumption path proves to be uninteresting, but the investment path shows considerable realism in the stochastic control experiments.⁴

³These data are listed in Appendix S.
⁴For a more interesting example of deterministic control see Pindyck (1973a). A smaller model is used here so that it can also be used for stochastic control in later chapters.

Before the model (4.22) and (4.23) is written in control-theory notation, it is convenient to define government spending as equal to government obligations the previous quarter

    G_{k+1} = O_k = government obligations    (4.24)

Then by using Eqs. (4.22) and (4.23) the model can be written as the system equations of a control model

    x_{k+1} = A x_k + B u_k + c    (4.25)

where

    A = [1.014   .002]      B = [-.004]      c = [-1.312]
        [ .093   .753]          [-.100]          [  .448]

    x_k = [C_k]    u_k = [O_k]
          [I_k]

Also, the initial state variable for the model is

    x_0 = [460.1]
          [113.1]
where the first element corresponds to private-consumption expenditures and the second element to gross private domestic investment, in billions of 1958 dollars, for 1969-I.

4.2 The Criterion Function

The criterion function is written to minimize the deviation of control- and state-variable paths from desired paths

    J = (1/2)[x_N - x̃_N]' W_N [x_N - x̃_N]
        + (1/2) Σ_{k=0}^{N-1} {[x_k - x̃_k]' W_k [x_k - x̃_k] + [u_k - ũ_k]' Λ_k [u_k - ũ_k]}    (4.26)

where

    x̃ = desired state vector
    ũ = desired control vector
    W = matrix of weights on state-variable deviations from desired paths
    Λ = matrix of weights on control-variable deviations from desired paths

There has been considerable debate about the desirability of using quadratic rather than more general nonlinear functional forms for the criterion in macroeconomic problems.⁵ The arguments for using quadratic functions are:

Computational simplicity. Since the first-order conditions for quadratic linear problems are linear, solution methods for such problems can be highly efficient.

Ease of explanation. It is likely to be easier to discuss desired paths and relative weights in quadratic penalty functions with politicians than to discuss general nonlinear utility functions.

The arguments against using the quadratic are:

Accuracy. The quadratic does not capture the true nature of political preferences.

Symmetric nature. Symmetric penalties about a given point are not desirable.⁶

⁵See for example Palash (1977) and related comments by Shupp (1977) and Livesey (1977).
⁶See Friedman (1972), however, for an asymmetric quadratic penalty function.

For the problem at hand the quadratic formulation has been adopted. The paths x̃ and ũ were chosen by assuming desired growth rates of .75 percent per quarter.
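The desired paths and the estimated model can be combined in a short simulation: the sketch below grows the targets at .75 percent per quarter from the 1969-I values given in the text and runs the system equations (4.25) with the control set to its desired path. This is only an illustrative check of the model arrays, not the optimal-control solution.

```python
# Model arrays from Eq. (4.25) and desired paths growing at .75 percent
# per quarter from the 1969-I values (consumption 460.1, investment 113.1,
# obligations 153.644, billions of 1958 dollars).
A = [[1.014, 0.002], [0.093, 0.753]]
B = [-0.004, -0.100]
c = [-1.312, 0.448]
x0 = [460.1, 113.1]          # initial state [C, I]
u0_des = 153.644             # desired obligations in period 0
N = 7

def desired_state(k):        # desired path for the state vector
    return [1.0075 ** k * v for v in x0]

def desired_control(k):      # desired path for the control
    return 1.0075 ** k * u0_des

# Simulate x_{k+1} = A x_k + B u_k + c with u_k held on its desired path.
x = x0[:]
path = [x0[:]]
for k in range(N):
    u = desired_control(k)
    x = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u + c[i] for i in range(2)]
    path.append(x)
```

Running the system under the desired control path gives a baseline against which the feedback solution reported in Table 4.2 can be compared.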
The initial conditions for these desired paths are the actual data for the economy for 1969-I, that is,

    x̃_k = (1.0075)^k [460.1]    k = 0, 1, ..., N
                      [113.1]

    ũ_k = (1.0075)^k [153.644]    k = 0, 1, ..., N

The weighting matrices are chosen to represent the decision makers' preferences over the desired paths. For example, when unemployment levels and inflation rates are among the state variables, relatively higher penalties may be assigned to one or the other to represent political preferences.⁷ Also, the weights can be used to represent the fact that politicians may care much more about deviations of the economy from desired paths in some quarters than in others [see Fair (1978a,b)]. For example, the penalty matrices may be

    W_N = [100    0]      W_k = [1  0]    k = 1, 2, ..., N - 1      Λ_k = [1]
          [  0  100]            [0  1]

In this scheme the politician cares 100 times as much about deviations of the economy from its desired path in the last quarter (say the quarter before an election) as in other quarters. The solution to this problem is given in Table 4.2.

Table 4.2: Solution to a macro control problem

    States    k:    1      2      3      4      5      6      7
    C             464.8  469.6  474.5  479.4  484.4  489.5  494.6
    I             112.8  112.9  113.4  114.2  115.3  116.7  118.3

    Controls  k:    0      1      2      3      4      5      6
    O             156.4  156.8  157.2  157.4  157.2  156.7  155.6

⁷For a discussion and application of this procedure to a larger model see Pindyck (1973a).

Part II

Passive-Learning Stochastic Control

Chapter 5

Additive Uncertainty

5.1 Uncertainty in economic problems

Uncertainty is pervasive in dynamic economic problems, but it is frequently ignored for three reasons:

1. It is assumed that the effect of the uncertainty in the economic system under study is small enough to have no noticeable effect on the outcome.

2. It is conjectured that even if the uncertainty were considered, the resulting optimal policy would not be different.

3. It is thought that the incorporation of uncertainty into the analysis will make the problem intractable.

Now consider in turn each of these reasons for ignoring uncertainty.
First comes the argument that the effects of uncertainty are small and thus can be ignored. This may be true; however, one does not know until uncertainty is systematically incorporated into the analysis and the system is analyzed both with and without it. In some cases this analysis can be done by comparing terms in mathematical expressions. In other cases it is necessary to compare numerical results, since analytical mathematics is insufficient. It emerges from those numerical results that in some cases the degree of uncertainty matters: for example, if variances are sufficiently small, there is no significant effect on the solution.

Second, the case is put forward that even when the uncertainty is considered in posing the problem, its effects do not appear in the optimality conditions. This is the classic case of certainty equivalence, in which a deterministic problem is equivalent to the stochastic problem. This occurs in special cases of economic problems under uncertainty, particularly when the uncertainty can be modeled in an additive fashion. The latter part of this chapter is devoted to a discussion of the circumstances under which certainty equivalence holds. However, there are many economic problems where certainty equivalence does not hold.

Finally, it is thought that the incorporation of uncertainty into the analysis will make it intractable. This is unfortunately sometimes true, but even in these cases it is frequently possible to obtain approximate numerical solutions. These methods are relatively new to economics, and it is not yet known whether the quality of the approximation is sufficiently good. However, this knowledge will come in due course as experimentation with the methods increases. Approximation methods are used in the last part of this book, on active-learning control problems. Whether or not approximation is necessary depends on how the uncertainty is modeled.
5.2 Methods of Modeling Uncertainty

Uncertainty in economic problems can be separated into two broad classes: uncertainty in the economic system and uncertainty in the measurement of the system. Although most work on the economics of uncertainty has been concerned with the first type, econometricians are turning increasingly to work on measurement error.¹

Uncertainty in the system is commonly modeled in one of two ways: additive error terms and parameter uncertainty. Additive error (or noise) terms are the most common treatment of uncertainty. Cases of this type can usually be treated with the certainty-equivalence procedures discussed later in this chapter. Parameter uncertainty is more difficult to treat, since certainty-equivalence methods do not apply. However, procedures are available for analyzing this problem. Furthermore, they are sufficiently simple in computational terms to be applicable to large models involving hundreds of equations. This is the subject of Chap. 6.

When the uncertainty is in the parameters, it can be modeled with two kinds of assumptions. The simplest assumption is that the parameters are in fact constant but that the estimates of the parameters are unknown and stochastic. This case is analyzed later in this book. The alternative is that the parameters are themselves stochastic, a more difficult problem. Methods for analyzing this problem are discussed in this book, but no numerical examples of this type are given.²

This completes the discussion of uncertainty in the system equations and leaves only the uncertainty in the measurement relations. In engineering applications of control theory, measurement errors on various physical devices such as radar are used in the analysis. Since these devices are used to measure state variables, the existence of measurement error means that the states are not known exactly but are estimated.

¹See for example Geraci (1976).
Thus the engineering models include estimates of the mean and covariance of the state vector. These notions are also being adopted in economics. Certainly measurements of economic systems are also noisy, so it is reasonable to assume that although the state variables are not known exactly, estimates of their means and covariances can be made. The models used in the last chapters of this book will include measurement errors. The various kinds of uncertainty require different methods of analysis. One of the most important differences in the treatment of uncertainty is the distinction between passive and active learning. 5.3 Learning: Passive and Active Passive learning is a familiar concept in economics, though the term has not been widely used.³ It refers to the fact that new data are collected in each time period and are periodically used to reestimate the parameters in economic models. When measurement errors are present, this concept can be extended to include reestimation of the state of the system at each period after data have been collected. In contrast, active learning not only includes the idea of reestimation but also the notion that the existence of future measurements should be considered when choosing the control variables. That is, one should take account of the fact that changes in a control variable at time k will affect the yield of information in future time periods. Stated another way, perturbations to the system today will provide more accurate estimation of state variables and parameters in future time periods. Furthermore, the more accurate estimates will permit better control of the system in subsequent periods. ² For a discussion of this problem see also Sams and Athans (1973). ³ See Rausser (1978) for a more complete discussion of active- and passive-learning stochastic control. An example from guidance systems will serve to illustrate this point. The control theorist Karl Åström and his colleagues have used stochastic control
methods for developing a control system for large oil tankers. Whenever a tanker takes on or discharges crude oil, the response of the ship to changes in the wheel setting is different. With a passive-learning scheme the ship pulls away from the dock and the system reestimates the response parameters every few minutes as the ship is maneuvered out of the harbor. With an active-learning scheme the control system perturbs the controls on purpose to learn faster about the response of the ship to different control settings. In order to make these concepts somewhat precise it is useful to set out the scheme proposed by Bar-Shalom and Tse (1976b) and to distinguish between various types of control schemes. In order to do this some additional notation must be developed. Recall the notation x_k = state vector in period k and u_k = control vector in period k, and consider a model with system equations

x_{k+1} = f_k(x_k, u_k, ξ_k)    k = 0, 1, ..., N − 1    (5.1)

where ξ_k is the vector of process noise terms at time k. Further, as discussed above, assume that measurements are taken on the state of the system and that there is error in these measurements; i.e.,

y_k = h_k(x_k, ζ_k)    k = 0, 1, ..., N    (5.2)

where y_k is the measurement vector and ζ_k is the measurement error (noise). Next define variables which represent the collection of state and control variables, respectively, for all the time periods in the model

X^N = (x_k)_{k=0}^N    U^{N−1} = (u_k)_{k=0}^{N−1}

Also define the set of all observations between period 1 and period k as

Y^k = (y_j)_{j=1}^k

Next the notation M^k is used to represent the knowledge that a measurement is made. Note the distinction between Y and M: Y represents the actual measurement, but M
Finally the notation

S^N = P(x_0, (ξ_k)_{k=0}^{N−1}, (ζ_k)_{k=1}^N)

is used to represent the probability distribution of the initial state vector, the system error terms, and the measurement error terms. A subset of these data

S^0 = P(x_0, (ξ_k)_{k=0}^{N−1})

is defined for use in the definition of one kind of control policy. With this notation in mind the following breakdown of control policies made by Bar-Shalom and Tse (1976b) can be stated. First comes the open-loop policy, which ignores all measurement relationships, i.e.,

u_k^{OL} = g_k(S^0)    k = 0, 1, ..., N − 1

Next comes the feedback (or passive-learning) policy, which uses the measurement relations through period k, that is,

u_k^F = g_k(Y^k, U^{k−1}, M^k, S^k)    k = 0, 1, ..., N − 1

This policy makes use of both the actual measurements Y^k and the knowledge that measurements are made through period k. Finally there is the closed-loop (or active-learning) policy

u_k^{CL} = g_k(Y^k, U^{k−1}, M^{N−1}, S^{N−1})    k = 0, 1, ..., N − 1

which not only uses the state observations through period k but also takes account of the fact that the system will be measured in future time periods, i.e., M^{N−1} and S^{N−1}. In practice this means that in choosing the control under a passive-learning scheme one ignores the future covariances of the states and parameters, while under an active-learning scheme one considers the impact of the present choice of control on the future covariances of states and parameters. The idea is not that one can use actual future measurements (since they are not available) but that one can anticipate that present perturbations of the controls will improve the accuracy of future estimates as represented by the covariance matrices for future states and parameters. This completes the introductory material for the remainder of the book on stochastic control as well as the introductory material for Part Two, which is on passive-learning stochastic control.
Now a discussion of the first kind of passive-learning stochastic control, namely additive uncertainty, will be given. This will be followed in Chap. 6 by a discussion of an algorithm for the treatment of multiplicative uncertainty. 5.4 Additive Error Terms The most common form of uncertainty in economic models is an additive error term, i.e., a random error term is added to the system equations so that they become

x_{k+1} = f_k(x_k, u_k) + ξ_k    (5.3)

where ξ_k is a vector of additive error terms. Furthermore it is assumed that the error terms (1) have zero mean, (2) have the covariance Q_k, and (3) are serially uncorrelated; i.e.,

E{ξ_k} = 0    E{ξ_k ξ_k'} = Q_k    E{ξ_k ξ_l'} = 0 for k ≠ l    (5.4)

The mean-zero assumption is not crucial since a nonzero mean can be added into the f_k function. Also the serial-correlation assumption is not crucial since it can be treated by augmenting the state equations.⁴ The criterion function is no longer deterministic but is an expectation taken over the random quantities. Thus the problem is to find (u_k)_{k=0}^{N−1} to minimize

J = E{C} = E{ L_N(x_N) + Σ_{k=0}^{N−1} L_k(x_k, u_k) }    (5.5)

subject to Eqs. (5.3) and (5.4) and given initial conditions x_0 for the state variables. If L is quadratic and f is linear, the certainty-equivalence conditions hold and the results of Simon (1956) and Theil (1957) can be applied. This means that the expected value of the random components can be taken and the problem solved as a deterministic model. Alternatively, when L is not quadratic, the postponed-linear-approximation method of Ashley (1976) can be applied.⁵ Also for the general case when L is not quadratic and f is not linear, approximation methods are available. For example, see Athans (1972). ⁴ Correlated error terms in control problems are discussed in Pagan (1975). ⁵ For a generalization of this result to adaptive-control problems see Ashley (1979).
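As an illustration of the certainty-equivalence result, the sketch below (a hypothetical scalar example with criterion weights q and r; none of these numbers come from the text) computes the feedback gains by running the deterministic Riccati recursion on the expected-value model. The additive-noise covariance never appears in the gain calculation; it affects only the realized path and the expected cost:

```python
import numpy as np

def lqr_gains(a, b, q, r, N):
    """Deterministic Riccati recursion for min sum q*x_k^2 + r*u_k^2
    subject to x_{k+1} = a*x_k + b*u_k (the expected-value model)."""
    K = q                                   # terminal condition
    gains = []
    for _ in range(N):
        G = -(b * K * a) / (r + b * K * b)  # feedback gain: u_k = G * x_k
        K = q + a * K * (a + b * G)         # Riccati update with the rule substituted in
        gains.append(G)
    return gains[::-1]                      # reorder as G_0, ..., G_{N-1}

gains = lqr_gains(a=0.7, b=-0.5, q=1.0, r=1.0, N=2)

# By certainty equivalence, the same gains are applied to the noisy system (5.3):
rng = np.random.default_rng(0)
x = 1.0
for G in gains:
    u = G * x
    x = 0.7 * x + (-0.5) * u + rng.normal(0.0, 0.2)  # noise std 0.2; its size never entered the gains
```

The design point is that the gains are computed once from the deterministic model; only the simulated trajectory feels the noise.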
An application of this approach to macroeconomic stabilization problems is given in Garbade (1975a) and to a commodity-stabilization problem in Kim, Goreux, and Kendrick (1975). The latter is a cocoa-market stabilization study.⁶ As with most approximation methods a Taylor expansion is made around a nominal path. It is customary to choose the nominal path by taking expectations of all random variables and solving the resulting deterministic problem. In the cocoa-market stabilization problem the resulting deterministic nonlinear control problem was solved using the differential dynamic-programming method of Jacobson and Mayne (1970). In contrast, Garbade used the quadratic linear approximation method discussed in Chap. 3. These procedures yield a nominal path

(x_{0,k+1}, u_{0,k})_{k=0}^{N−1}

Next a second-order Taylor expansion of the criterion function (5.5) and a first-order expansion of the system equations (5.3) are made along the nominal path, as described in Sec. 3.2. Finally, the resulting quadratic linear control problem is solved. This yields a feedback rule of the form

u_k = u_{0,k} + G_k[x_k − x_{0,k}] + g_k    (5.6)

One merit of this procedure is that the quadratic approximation in the criterion function works like a tracking problem in the sense that the problem is solved to minimize some weighted sum of terms in [x_k − x_{0,k}] and [u_k − u_{0,k}] for all k. Thus the quality of the approximation is enhanced by the fact that the criterion works to keep the optimal path for both the controls and states close to the nominal paths about which the approximation is made. When this method is used for stabilization problems, the effect of this is to stabilize about the certainty-equivalence path.
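As a minimal sketch of how a rule of the form (5.6) operates (the system, gain, and intercept values here are illustrative, not from the text, and the additive noise is omitted), the rule corrects deviations from the nominal path rather than steering the state itself toward zero:

```python
# Applying a feedback rule of the form (5.6) around a nominal path.
# The gain G and intercept g are illustrative constants; in practice they
# come from solving the quadratic linear approximation of Chap. 3.

a, b = 0.9, 1.0                    # scalar system x_{k+1} = a x_k + b u_k
u_nom = [0.2, 0.2, 0.2]            # nominal control path u_0k
x_nom = [1.0]                      # nominal state path x_0k, consistent with the dynamics
for u0 in u_nom:
    x_nom.append(a * x_nom[-1] + b * u0)

G, g = -0.5, 0.0                   # illustrative gain and intercept

x = x_nom[0] + 0.3                 # start off the nominal path
for k in range(3):
    u = u_nom[k] + G * (x - x_nom[k]) + g  # feedback rule (5.6)
    x = a * x + b * u                      # system response (noise omitted)
```

Each pass through the loop multiplies the deviation from the nominal path by a + bG = 0.4, which is the tracking behavior described above.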
In some cases this may not be desirable.7 6For other applications of control theory to models with additive error terms see, for microeconomics, Kendrick, Rao, and Wells (1970), a water-pollution control problem; for macroeconomics (1) United States Economy, Pindyck and Roberts (1974), Chow (1972), Brito and Hester (1974), and Gordon (1974); (2) United Kingdom Economy, Bray (1974, 1975) and Wall and Westcott (1974, 1975); (3) theoretical models, Kareken, Muench, and Wallace (1973), Phelps and Taylor (1977), and Sargent and Wallace (1975). 7 See Denham (1964) for an alternative procedure for choosing the nominal path with consideration of the uncertainty. Chapter 6 Multiplicative Uncertainty If all uncertainty in economic problems could be treated as additive uncertainty, the method of the previous chapter could be applied; however, many economic problems of interest include multiplicative uncertainty. Consider, for example, agricultural problems. The total output is represented as the yield of the crop per acre times the number of acres planted. But since the yield is a random variable, multiplicative uncertainty occurs because the acreage is a state or control variable and the yield multiplies the acreage. Or consider policy choice in macroeconomic models. Since the coefficients in these models are estimated, they should be treated as random variables and once again multiplicative uncertainty is introduced. The optimal control problem with multiplicative uncertainty is stated in the next section. Then dynamic-programming methods are used to derive the optimal control just as was done in Chap. 2 for deterministic problems. As in Chap. 2, the analysis is restricted to problems with quadratic criterion functions and linear system equations. Unlike Chap. 2, however, an expectations operator is introduced into the criterion function. Therefore special attention is paid in this chapter to methods of taking expectations of products of matrices. 
The chapter closes with a brief discussion of methods of updating the estimates of the unknown parameters. 6.1 Statement of the Problem The system equations for the problem are written exactly as they were in Chap. 5 with an additive error term except that the parameters are considered to be stochastic rather than fixed. Thus the system equations are written

x_{k+1} = A_k x_k + B_k u_k + c_k + ξ_k    k = 0, 1, ..., N − 1    (6.1)

where ξ_k is the vector of additive noise terms in period k, with x_0 given. Means and covariances for the parameters are assumed to be known:

Means:  E{a_ij} for all i, j;  E{b_ij} for all i, j;  E{c_j} for all j    (6.2)

where E = the expectation operator, and

Covariances:  cov(a_ij, a_kl) for all i, j, k, l;  cov(b_ij, b_kl) for all i, j, k, l;  cov(c_i, c_j) for all i, j;  cov(a_ij, b_kl) for all i, j, k, l;  cov(a_ij, c_k) for all i, j, k;  cov(b_ij, c_k) for all i, j, k    (6.3)

The elements in Eq. (6.3) are the familiar covariance matrices obtained when estimating equations with econometrics packages. For example, consider the coefficients in the first row of the matrix A as the coefficients of a single equation. Then the first element in Eq. (6.3) becomes

cov(a_1j, a_1k) for all j, k

which is the familiar Σ matrix for the coefficients of a single equation, in this case Σ_11 since it is for the first equation. Of course in the first element of Eq. (6.3) there is a matrix like this for each equation, namely Σ_11, Σ_22, etc., and then there are also off-diagonal matrices which provide the covariances between the coefficients of each equation and those of every other equation. These matrices are obtained when one is performing simultaneous-equation estimation. Next consider the criterion function for the problem. It is the expected value of a quadratic function; i.e., the problem is to find the controls (u_k)_{k=0}^{N−1} to minimize

J = E{ L_N(x_N) + Σ_{k=0}^{N−1} L_k(x_k, u_k) }    (6.4)

where E is the expectations operator.
The functions L_N and L_k are the same quadratic functions as in Chap. 2

L_N(x_N) = ½ x_N' W_N x_N + w_N' x_N    (6.5)

and

L_k(x_k, u_k) = ½ x_k' W_k x_k + w_k' x_k + x_k' F_k u_k + ½ u_k' Λ_k u_k + λ_k' u_k    k = 0, 1, ..., N − 1    (6.6)

where N = last time period, k = all other time periods, x = state vector, u = control vector, W, Λ, F = matrices, and w, λ = vectors. So in summary, the problem is to minimize the criterion function (6.4) subject to the system equations (6.1) and the initial conditions. The problem is solved by using dynamic-programming methods and working backward in time.¹ First the problem is solved for period N and then for period N − 1. This leads to the solution for the general period k. ¹ The derivation here follows the procedure of Farison, Graham, and Shelton (1967) and Aoki (1967, pp. 44-47). Related algorithms have been developed by Bar-Shalom and Sivan (1969), Curry (1969), Tse and Athans (1972), and Ku and Athans (1973). Yaakov Bar-Shalom provided private communications that helped in developing the derivations used here. Also a few elements from Tse, Bar-Shalom, and Meier (1973) and Bar-Shalom, Tse, and Larson (1974) have been used. For a similar derivation see Chow (1975, chap. 10). For an alternative treatment of multiplicative uncertainty see Turnovsky (1975, 1977). 6.2 Period N It is useful to introduce notation for the cost-to-go, keeping in mind that it is usually written as the cost-to-go when one is N − j periods from the end. Thus the deterministic cost-to-go N − j periods from the terminal period is written as

C_{N−j} = L_N(x_N) + Σ_{k=j}^{N−1} L_k(x_k, u_k)    (6.7)

Thus C_{N−j} is the cost-to-go with N − j periods to go.
With this notation C_0 is the cost-to-go with zero periods remaining, and C_N is the cost-to-go with all N periods remaining, i.e.,

C_0 = L_N(x_N)    (6.8)

and

C_N = L_N(x_N) + Σ_{k=0}^{N−1} L_k(x_k, u_k)    (6.9)

The expected cost-to-go J is defined in the same manner as the random cost-to-go C:

J_N = E{C_N} = expected cost-to-go for full N periods
J_{N−j} = E{C_{N−j}} = expected cost-to-go at period j with N − j periods remaining
J_0 = E{C_0} = expected cost-to-go for terminal period

Finally, J* is defined as the optimal expected cost-to-go. It is written in an elaborate manner for the general period N − j as

J*_{N−j} = min_{u_j} E{ ··· min_{u_{N−2}} E{ min_{u_{N−1}} E{C_{N−j} | 𝒫^{N−1}} | 𝒫^{N−2} } ··· | 𝒫^j }    (6.10)

where 𝒫^j = (x̂_j, Σ_j) is the mean and covariance of the unknown elements. The expectations are nested in Eq. (6.10). That is, the inside expectation in the nested expressions is

min_{u_{N−1}} E{C_{N−j} | 𝒫^{N−1}}    (6.11)

This expression means the minimum over the control variables in the next-to-last period of the expectation of the term in the braces. Recall that since no control is chosen in the last period, the control in the next-to-last period is the final set of control variables chosen for the problem. The terms in the braces are the cost-to-go N − j periods from the end conditional on the information 𝒫^{N−1} being
available. The information 𝒫^j is defined as the means and covariances of the parameters at time j. The symbols J* and C have indices which indicate the number of periods remaining; all other symbols, like u and 𝒫, have subscripts and superscripts indicating the period in which the action occurs. Thus in a problem with eight time periods C_2 means the cost-to-go with two periods remaining, i.e., the cost-to-go at period 6 (C_{N−6} = C_{8−6} = C_2). Returning to the entire nested expression (6.10), one sees that each control u_j must be chosen with the information available only through time j. For example, u_3 is chosen with the means and covariances available in period 3, while u_6 has the advantage of being chosen three periods later when better estimates of the means and covariances will be available. If the general expression (6.10) is specialized to zero periods to go, i.e., to the last period, it becomes

J*_0 = E{C_0 | 𝒫^{N−1}}    (6.12)

Substitution of Eq. (6.8) into Eq. (6.12) yields

J*_0 = E{L_N(x_N) | 𝒫^{N−1}}    (6.13)

When Eq. (6.5) is used, this becomes

J*_0 = E{ ½ x_N' W_N x_N + w_N' x_N }    (6.14)

The information variable 𝒫^{N−1} is dropped here in order to simplify the notation. Then the expectation in Eq. (6.14) can be taken to yield

J*_0 = ½ x_N' E{W_N} x_N + E{w_N}' x_N    (6.15)

This expression gives the optimal cost-to-go with no periods remaining. Next recall from Chap. 2 that it was assumed for the deterministic problem that the optimal cost-to-go is a quadratic function of the state of the system. That assumption is used here, and the expected cost-to-go with zero periods to go is written as

J*_0 = ν_N + p_N' x_N + ½ x_N' K_N x_N    (6.16)

where the scalar ν, the vector p, and the matrix K are the parameters of the quadratic function. These parameters are determined recursively in the optimization procedure described in the remainder of this chapter. Then comparing Eqs. (6.15) and (6.16), one obtains the terminal conditions for the Riccati equations, namely

K_N = E{W_N} = W_N    p_N = E{w_N} = w_N    ν_N = 0    (6.17)

This completes the discussion for period N. Consider next the period before the last one, namely period N − 1. 6.3 Period N − 1 Recall from Chap.
2 the discussion of the dynamic-programming principle of optimality, which states that the optimal cost-to-go with N − j periods remaining will equal the minimum over the choice of the control at time j of the cost incurred during period j plus the optimal cost-to-go with N − j − 1 periods remaining, i.e.,

J*_{N−j} = min_{u_j} E{ L_j(x_j, u_j) + J*_{N−j−1} | 𝒫^j }    (6.18)

Equation (6.18) can be used to obtain the optimal cost-to-go in period N − 1. For this case it is written with j = N − 1 as

J*_{N−(N−1)} = min_{u_{N−1}} E{ L_{N−1}(x_{N−1}, u_{N−1}) + J*_{N−(N−1)−1} | 𝒫^{N−1} }

or as

J*_1 = min_{u_{N−1}} E{ L_{N−1}(x_{N−1}, u_{N−1}) + J*_0 | 𝒫^{N−1} }    (6.19)

Thus the optimal cost-to-go with one period remaining is the minimum over the control at time N − 1 of the expected value of the sum of the cost incurred in period N − 1 and the optimal cost-to-go with zero periods remaining. Both these terms have already been developed. The cost in each period is in Eq. (6.6), and the optimal cost-to-go with zero periods remaining is in Eq. (6.14). Substituting these two expressions into Eq. (6.19) yields

J*_1 = min_{u_{N−1}} E{ ½ x_N' W_N x_N + w_N' x_N + ½ x_{N−1}' W_{N−1} x_{N−1} + w_{N−1}' x_{N−1} + x_{N−1}' F_{N−1} u_{N−1} + ½ u_{N−1}' Λ_{N−1} u_{N−1} + λ_{N−1}' u_{N−1} | 𝒫^{N−1} }    (6.20)

The logical steps to follow, as shown in Eq. (6.20), are to take the expected value and then to find the minimum over u_{N−1}. However, it is helpful to write the entire expression in terms of x_{N−1} and u_{N−1} by using the system equations (6.1) to substitute out the x_N terms. Before doing so, however, we shall review the steps that remain:
1. Substituting the system equations into the optimal cost-to-go expression
2. Applying the expectations operator
3. Applying the minimization operator
4. Obtaining the feedback rule from the first-order conditions
5. Substituting the feedback rule back into the optimal cost-to-go in order to obtain the Riccati recursions
These are the same steps used in Chap. 2 except for the application of the expectations operator in step 2. The substitution of the system equations (6.1) into Eq.
(6.20) and the use of the Nth-period Riccati equations (6.17) yields the optimal cost-to-go entirely in terms of x_{N−1} and u_{N−1}:

J*_1 = min_{u_{N−1}} E{ ½ x_{N−1}' Φ_{N−1} x_{N−1} + φ_{N−1}' x_{N−1} + u_{N−1}' Ψ_{N−1}' x_{N−1} + ½ u_{N−1}' Θ_{N−1} u_{N−1} + u_{N−1}' θ_{N−1} + ½ ξ_{N−1}' Ω_{N−1} ξ_{N−1} + ω_{N−1}' ξ_{N−1} + x_{N−1}' Υ_{N−1} ξ_{N−1} + u_{N−1}' Γ_{N−1} ξ_{N−1} + ν̃_{N−1} | 𝒫^{N−1} }    (6.21)

where

Φ_{N−1} = W_{N−1} + A_{N−1}' K_N A_{N−1}
φ_{N−1} = A_{N−1}'(K_N c_{N−1} + p_N) + w_{N−1}
Ψ_{N−1} = F_{N−1} + A_{N−1}' K_N B_{N−1}
Θ_{N−1} = Λ_{N−1} + B_{N−1}' K_N B_{N−1}
θ_{N−1} = B_{N−1}'(K_N c_{N−1} + p_N) + λ_{N−1}
Ω_{N−1} = K_N
ω_{N−1} = K_N c_{N−1} + p_N
Υ_{N−1} = A_{N−1}' K_N
Γ_{N−1} = B_{N−1}' K_N
ν̃_{N−1} = ½ c_{N−1}' K_N c_{N−1} + p_N' c_{N−1}    (6.22)

Next we perform the expectations and minimization operations in Eq. (6.20). Taking the expectation in Eq. (6.21) yields

J*_1 = min_{u_{N−1}} [ ½ x_{N−1}' E{Φ_{N−1}} x_{N−1} + E{φ_{N−1}}' x_{N−1} + u_{N−1}' E{Ψ_{N−1}}' x_{N−1} + ½ u_{N−1}' E{Θ_{N−1}} u_{N−1} + u_{N−1}' E{θ_{N−1}} + ½ E{ξ_{N−1}' Ω_{N−1} ξ_{N−1}} + E{ν̃_{N−1}} ]    (6.23)

The expected value of the additive error term ξ is assumed to be zero, so all terms involving only its expected value are dropped. In contrast, the covariance of the noise term is not zero, and so the term involving it remains. Since the state variables x_k are assumed to be observed without error, they are a deterministic quantity. Also the control variables u_k are deterministic. This leaves expectations of matrices and vectors in Eq. (6.23). From Eq. (6.22) some of these expectations are of products of matrices. They are rather complicated, and a full explanation of this process will be given in Sec. 6.5. Now the minimization operation in Eq. (6.23) can be performed. This yields the first-order condition

E{Ψ_{N−1}'} x_{N−1} + E{Θ_{N−1}} u_{N−1} + E{θ_{N−1}} = 0    (6.24)

The feedback rule can then be obtained from Eq. (6.24) as

u_{N−1} = G_{N−1} x_{N−1} + g_{N−1}    (6.25)

where

G_{N−1} = −(E{Θ_{N−1}})^{−1} E{Ψ_{N−1}'}    g_{N−1} = −(E{Θ_{N−1}})^{−1} E{θ_{N−1}}    (6.26)

The feedback rule (6.25) and (6.26) provides the optimality condition sought for period N − 1. It is instructive to compare it with the feedback rule for period N − 1 in the deterministic problem, Eqs.
(2.33) and (2.34). The rules are identical except that the feedback gain matrix G_{N−1} and vector g_{N−1} are now products of expectations of matrices. In order to be able to evaluate G_{N−1} and g_{N−1} one must calculate the Riccati matrix and vector K and p, and to do that one needs a recursion in these elements. This recursion is obtained by substituting the feedback rule (6.25) back into the optimal cost-to-go expression (6.23) in order to eliminate u_{N−1} and to be able to write the optimal cost-to-go entirely in terms of x_{N−1}. This substitution yields the optimal cost-to-go

J*_1 = ½ x_{N−1}' K_{N−1} x_{N−1} + p_{N−1}' x_{N−1} + ν_{N−1}    (6.27)

where

K_{N−1} = E{Φ_{N−1}} − E{Ψ_{N−1}} (E{Θ_{N−1}})^{−1} (E{Ψ_{N−1}})'
p_{N−1} = E{φ_{N−1}} − E{Ψ_{N−1}} (E{Θ_{N−1}})^{−1} E{θ_{N−1}}    (6.28)

In order to see the recursive nature of these Riccati equations it is necessary to rewrite them in terms of the original parameters of the problem. This can be done by substituting Eq. (6.22) into Eq. (6.28) to obtain

K_{N−1} = W_{N−1} + E{A_{N−1}' K_N A_{N−1}} − (F_{N−1} + E{A_{N−1}' K_N B_{N−1}}) × (Λ_{N−1} + E{B_{N−1}' K_N B_{N−1}})^{−1} (E{B_{N−1}' K_N A_{N−1}} + F_{N−1}')    (6.29)

p_{N−1} = E{A_{N−1}' K_N c_{N−1}} + E{A_{N−1}}' p_N + w_{N−1} − [F_{N−1} + E{A_{N−1}' K_N B_{N−1}}] × [Λ_{N−1} + E{B_{N−1}' K_N B_{N−1}}]^{−1} × [E{B_{N−1}' K_N c_{N−1}} + E{B_{N−1}}' p_N + λ_{N−1}]

The Riccati equation for K is seen to be a difference equation with the value K_N on the right-hand side and K_{N−1} on the left-hand side. Since the terminal condition for this equation was obtained in Eq. (6.17), one can evaluate K_{N−1} by using Eq. (6.29). This is sometimes called backward integration since the integration occurs backward in time. In fact, the reader may recall from Chap. 2 that this is how quadratic linear control problems are solved. First the Riccati equations are integrated backward in time, and the feedback-gain matrices G_k and vectors g_k can be computed so that the system equations and the feedback rule can be used in tandem as they are integrated forward in time from x_0 to find the optimal paths for the states and controls. Also the p equation in Eq.
(6.29) can be integrated backward by using the terminal conditions for both K and p in Eq. (6.17):

K_N = W_N    p_N = w_N

The ν equation in Eq. (6.28) is not evaluated here since it does not affect the optimal control path but only the optimal cost-to-go. The optimal control problem has now been solved by dynamic programming for periods N and N − 1. The process can now be repeated for periods N − 2, N − 3, etc. It is not necessary to show this here since the basic structure of the solution is already present. The derivations will not be given, and the feedback and Riccati equations for the typical period k will simply be stated. 6.4 Period k The optimal feedback rule for period k is, from Eq. (6.25),

u_k = G_k x_k + g_k    (6.30)

where, from Eq. (6.26),

G_k = −(E{Θ_k})^{−1} E{Ψ_k'}    g_k = −(E{Θ_k})^{−1} E{θ_k}    (6.31)

where, from Eq. (6.22),

E{Ψ_k} = F_k + E{A_k' K_{k+1} B_k}
E{Θ_k} = Λ_k + E{B_k' K_{k+1} B_k}
E{θ_k} = E{B_k' K_{k+1} c_k} + E{B_k}' p_{k+1} + λ_k    (6.32)

Also the Riccati equations can be written using Eq. (6.28) as

K_k = E{Φ_k} − E{Ψ_k} (E{Θ_k})^{−1} (E{Ψ_k})'
p_k = E{φ_k} − E{Ψ_k} (E{Θ_k})^{−1} E{θ_k}    (6.33)

where, from Eq. (6.22),

E{Φ_k} = W_k + E{A_k' K_{k+1} A_k}
E{φ_k} = E{A_k' K_{k+1} c_k} + E{A_k}' p_{k+1} + w_k

or, in terms of the original matrices of the problem, by using Eq. (6.29), as

K_k = W_k + E{A_k' K_{k+1} A_k} − [F_k + E{A_k' K_{k+1} B_k}] × [Λ_k + E{B_k' K_{k+1} B_k}]^{−1} × [E{B_k' K_{k+1} A_k} + F_k']

p_k = E{A_k' K_{k+1} c_k} + E{A_k}' p_{k+1} + w_k − [F_k + E{A_k' K_{k+1} B_k}] × [Λ_k + E{B_k' K_{k+1} B_k}]^{−1} × [E{B_k' K_{k+1} c_k} + E{B_k}' p_{k+1} + λ_k]    (6.34)

In summary, the problem is solved by using the terminal conditions (6.17) in Eq. (6.34) to integrate the Riccati equations backward in time. Then the G_k and g_k elements can be computed for all time periods. Next the initial condition on the states, x_0, is used in the feedback rule (6.30) to compute u_0. Then u_0 and x_0 are used in the system equations (6.1) to compute x_1. Then x_1 is used in the feedback rule to get u_1.
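To make this summary concrete, here is a minimal sketch for a scalar model in which only the coefficient b is uncertain (mean mu_b, variance var_b, held fixed over time, so this is an open-loop-feedback scheme). With one state and one control, expectations such as E{b'K_{k+1}b} reduce to K_{k+1}(mu_b² + var_b). The function name and the illustrative numbers are assumptions, not from the text, and w_k, λ_k, and F_k are set to zero:

```python
def solve_olf(a, mu_b, var_b, c, q, r, N, x0):
    """Backward Riccati integration (6.34) and forward simulation via (6.30) and (6.1)
    for x_{k+1} = a x_k + b u_k + c + noise, with b ~ (mu_b, var_b)."""
    K, p = q, 0.0                               # terminal conditions (6.17): K_N = q, p_N = 0
    G, g = [0.0] * N, [0.0] * N
    for k in range(N - 1, -1, -1):
        Theta = r + K * (mu_b**2 + var_b)       # E{Theta_k} = r + E{b K b}, Eq. (6.32)
        Psi = a * K * mu_b                      # E{Psi_k} = E{a K b}; cov(a, b) = 0 here
        theta = K * mu_b * c + mu_b * p         # E{theta_k} = E{b K c} + E{b} p_{k+1}
        G[k] = -Psi / Theta                     # feedback gains, Eq. (6.31)
        g[k] = -theta / Theta
        K, p = (q + a * K * a - Psi * Psi / Theta,       # Riccati step, Eq. (6.34);
                a * K * c + a * p - Psi * theta / Theta) # old K, p are K_{k+1}, p_{k+1}
    xs, us = [x0], []
    for k in range(N):                          # forward integration, noise at its mean of zero
        us.append(G[k] * xs[-1] + g[k])         # feedback rule (6.30)
        xs.append(a * xs[-1] + mu_b * us[-1] + c)
    return G, g, us, xs

G, g, us, xs = solve_olf(a=0.7, mu_b=-0.5, var_b=0.5, c=3.5,
                         q=1.0, r=1.0, N=2, x0=0.0)
```

Setting var_b to zero collapses the expectations to their deterministic values, recovering the quadratic linear solution of Chap. 2.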
In this manner the system equations are integrated forward in time and the optimal controls and states are calculated for all time periods. 6.5 Expected Values of Matrix Products One loose end remains to be cleared up. This is the method for calculating the expected value of matrix products. Consider the general case

E{A'KB}    (6.35)

where A, K, and B are all matrices. The A and B matrices are assumed to be random, and the K matrix is assumed to be deterministic. If A and/or B is a vector, the method suggested here is somewhat simplified. Define the matrix D = A'KB, so that E{D} = E{A'KB}, and consider a single element in D, namely d_ij. Then

E{d_ij} = E{a_i' K b_j}    (6.36)

where a_i is the ith column of A and b_j is the jth column of B. From the result in Appendix B the expectation in Eq. (6.36) can be written as

E{d_ij} = (E{a_i})' K E{b_j} + tr[K Σ^{b_j a_i}]    (6.37)

where

Σ^{b_j a_i} = E{ [b_j − E{b_j}] [a_i − E{a_i}]' }

is the covariance matrix for the jth column of B and the ith column of A, and tr[·] is the trace operator, i.e., the sum of the diagonal elements of the matrix in the brackets. While Eq. (6.37) is the form of this expectations operator which is commonly used in displaying mathematical results, it is not the most efficient form to use in computers.² Observe that Eq. (6.36) can be written and rewritten as

E{d_ij} = E{a_i' K b_j} = E{ Σ_s a_si ( Σ_r k_sr b_rj ) }    (6.38)

where Σ is an ordinary summation sign (not a covariance matrix) and k_sr is the element in the sth row and rth column of the matrix K. Continuing from Eq. (6.38), one obtains

E{d_ij} = E{ Σ_s Σ_r a_si k_sr b_rj } = Σ_s Σ_r E{a_si k_sr b_rj} = Σ_s Σ_r k_sr E{a_si b_rj}

² This procedure was suggested to the author by Fred Norman.

Thus

E{d_ij} = Σ_s Σ_r k_sr [ (E a_si)(E b_rj) + cov(a_si, b_rj) ]    (6.39)

gives the form desired. The advantage of using Eq. (6.39) instead of Eq. (6.37) is that it is not necessary to store the matrix Σ^{b_j a_i} and to compute the K Σ product and take its trace. Only the scalar elements cov(a_si, b_rj) are necessary. This completes the discussion of the methods for obtaining the control for each time period, since the expectations evaluations discussed here can be coupled with the Riccati equations, feedback law, and system equations discussed in Sec. 6.4. Before ending the chapter, however, it is useful to describe briefly two methods of passive-learning stochastic control. 6.6 Methods of Passive-Learning Stochastic Control Methods of stochastic control include a procedure for choosing the control at each time period and a procedure for updating parameter estimates at each time period. The differences in the names for the procedures depend on the method for choosing the control at each time period. For example, if the control at each time period is chosen while ignoring the uncertainty in the parameters, the method is called sequential certainty equivalence, update certainty equivalence [Rausser (1978)], or heuristic certainty equivalence [Norman (1976)]. In contrast, if the control is chosen at each time period using the multiplicative uncertainty, the method is called open-loop feedback.³ ³ Rausser (1978) distinguishes between open-loop feedback and sequential stochastic control. In sequential stochastic control in his nomenclature the derivation of the control rule is based on the assumption that future observations will be made but they will not be used to adapt the probability distribution of the parameters. He classifies as open-loop-feedback studies those of Aoki (1967), Bar-Shalom and Sivan (1969), Curry (1969), Ku and Athans (1973), and Tse and Athans (1972). He classifies as sequential stochastic control the studies of Rausser and Freebairn (1974), Zellner (1971), Chow (1975, chap. 10), and Prescott (1971).
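Before turning to the numerical example, the elementwise formula (6.39) of Sec. 6.5 can be sketched in code. The helper name and the four-index covariance layout cov[s, i, r, j] = cov(a_si, b_rj) are illustrative storage choices, not the book's:

```python
import numpy as np

def expected_AKB(EA, K, EB, cov):
    """E{A'KB} via Eq. (6.39): E{d_ij} = sum_s sum_r k_sr [(E a_si)(E b_rj) + cov(a_si, b_rj)].
    cov has shape (n, m, n, q) with cov[s, i, r, j] = cov(a_si, b_rj)."""
    D = EA.T @ K @ EB                                # the (E a)(E b) part, done all at once
    m, q = D.shape
    for i in range(m):
        for j in range(q):
            D[i, j] += np.sum(K * cov[:, i, :, j])   # add sum_s sum_r k_sr cov(a_si, b_rj)
    return D

# Check against a case with a closed-form answer: if A = EA + e and B = EB + e
# share one perturbation e with independent mean-zero entries of variance sd^2,
# then cov(a_si, b_rj) = sd^2 only when (s, i) = (r, j), and
# E{A'KB} = EA'K EB + sd^2 tr(K) I.
rng = np.random.default_rng(0)
EA, EB, K = (rng.normal(size=(3, 3)) for _ in range(3))
sd = 0.1
cov = sd**2 * np.einsum('sr,ij->sirj', np.eye(3), np.eye(3))
D = expected_AKB(EA, K, EB, cov)
```

As the text notes, only the scalar covariances cov(a_si, b_rj) are needed; the trace form (6.37) would instead store and multiply whole Σ matrices for every (i, j) pair.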
Chapter 7 Example of Passive-Learning Stochastic Control 7.1 The Problem This chapter contains the solution of a two-period, one-unknown-parameter problem used by MacRae (1972),¹ i.e., find (u_0, u_1) to minimize

J = E{ ½ q x_2² + ½ Σ_{k=0}^{1} (q x_k² + r u_k²) }

subject to

x_{k+1} = a x_k + b u_k + c + ξ_k    k = 0, 1

with x_0 given. Also²

ξ_k ∼ N(0, Q)    b ∼ N(b_{0|0}, Σ^{bb}_{0|0})

i.e., both ξ_k and b are assumed to be normally distributed with means and variances as indicated. Consider the case with³

N = 2    q = 1    r = 1    a = .7    μ_b = −.5    Σ^{bb}_{0|0} = .5    c = 3.5    Q = .2    x_0 = 0

This corresponds to the (q : r) = (5 : 5) case in Table 2 of MacRae (1972) with N = 2. She solves only for the first-period control. In contrast, sample calculations will be presented here for a single Monte Carlo run in which the optimal policy for both period 0 and period 1 is calculated.⁴ Begin by solving the open-loop-feedback problem from period k to period N.⁵ ¹ This chapter has been written with an eye toward its use in debugging computer programs. For this reason, the calculations are presented in considerable detail with all intermediate results explicitly shown. ² This notation means that ξ_k is a normally distributed random variable with mean zero and covariance Q. ³ Σ^{bb}_{0|0} is the variance of b. The reason for this elaborate notation is given in subsequent chapters. 7.2 The Optimal Control for Period 0 The solution to the open-loop-feedback problem is given in Eq. (6.30), i.e.,

u_k = G_k x_k + g_k    (7.1)

where, from Eq. (6.31),

G_k = −(E{Θ_k})^{−1} E{Ψ_k'}    g_k = −(E{Θ_k})^{−1} E{θ_k}    (7.2)

with, from Eq. (6.32),

E{Θ_k} = Λ_k + E{B_k' K_{k+1} B_k}
E{Ψ_k} = F_k + E{A_k' K_{k+1} B_k}
E{θ_k} = E{B_k' K_{k+1} c_k} + E{B_k}' p_{k+1} + λ_k    (7.3)

Also, the K and p recursions are defined in Eq.
(6.33) as

K_k = E{Φ_k} − E{Ψ_k} (E{Θ_k})^{−1} (E{Ψ_k})'    K_N = W_N
p_k = E{φ_k} − E{Ψ_k} (E{Θ_k})^{−1} E{θ_k}    p_N = w_N    (7.4)

⁴ For other examples of the applications of passive-learning stochastic control methods to economic problems with multiplicative random variables see Fisher (1962), Zellner and Geisel (1968), Burger, Kalish III, and Babb (1971), Henderson and Turnovsky (1972), Bowman and Laporte (1972), Chow (1973), Turnovsky (1973, 1974, 1975, 1977), Kendrick (1973), Aoki (1974a,b), Cooper and Fischer (1975), Shupp (1976b,c), and Walsh and Cruz (1975). ⁵ The results are of course a function of the particular random quantities generated. However, the calculations are done here for a single set of random quantities to show how the calculations are performed.

with, from Eq. (6.32),

E{Φ_k} = W_k + E{A_k' K_{k+1} A_k}
E{φ_k} = E{A_k' K_{k+1} c_k} + E{A_k}' p_{k+1} + w_k    (7.5)

Also compare the criterion function for this problem with the criterion for the quadratic linear problem (2.1) to obtain

w_k = 0    λ_k = 0    (7.6)

For the problem at hand

A_k = a = .7    B_k = μ_b = −.5    c_k = c = 3.5    W_N = W_k = q = 1    Λ_k = r = 1    F_k = 0    (7.7)

and

Σ^{bb}_{0|0} = .5    b_{0|0} = μ_b = −.5    (7.8)

In order to obtain the solution u_0, one can work backward through the relationships above, obtaining Eq. (7.5), then Eq. (7.4), then Eq. (7.3), then Eq. (7.2), and finally Eq. (7.1). Begin with Eq. (7.5):

E{Φ_1} = W_1 + E{A'K_2A}    (7.9)

Then from Eqs. (7.4) and (7.7) we have K_2 = W_2 = 1, and from Eq. (6.39)

E{A'K_2A} = K_2[(Ea)(Ea) + cov(aa)] = K_2[a² + cov(aa)] = K_2 a² = (1)(.7)² = .49

since a is known, so cov(aa) = 0. So Eq. (7.9) becomes

E{Φ_1} = 1 + .49 = 1.49    (7.10)

Also, from Eq. (7.5),

E{φ_1} = E{A'K_2c} + E{A}'p_2 + w_1    (7.11)

and from Eq. (6.39)

E{A'K_2c} = K_2[E{a}E{c} + cov(ac)] = K_2(ac + 0) = K_2 ac = (1)(.7)(3.5) = 2.45    (7.12)

Also, E{A}'p_2 = a p_2, but p_2 = w_2 = 0 from Eqs. (7.4) and (7.6), and so

E{A}'p_2 = (.7)(0) = 0    (7.13)

Finally, from Eq. (7.6),

w_1 = 0    (7.14)

Then from Eqs.
(7.11) to (7.14)

E{φ_1} = 2.45 + 0 + 0 = 2.45   (7.15)

This completes the evaluation of Eq. (7.5). In order to evaluate Eq. (7.4) it is necessary first to evaluate the elements in Eq. (7.3). Begin with E{Θ_1}:

E{Θ_1} = Λ_1 + E{B'K_2B}   (7.16)

From Eq. (6.39)

E{B'K_2B} = K_2[μ_b μ_b + cov(bb)] = (1)[(−.5)(−.5) + .5] = .75   (7.17)

Then using Eqs. (7.17) and (7.7) in Eq. (7.16) yields

E{Θ_1} = 1 + .75 = 1.75

Therefore,

(E{Θ_1})^{-1} = .5714   (7.18)

The next element in Eq. (7.3) is

E{Ψ_1} = F_1 + E{A'K_2B}   (7.19)

Then, from Eq. (6.39),

E{A'K_2B} = K_2[(Ea)(Eb) + cov(ab)] = (1)[(.7)(−.5) + 0] = −.35   (7.20)

Using Eqs. (7.20) and (7.7) in (7.19) yields

E{Ψ_1} = 0 − .35 = −.35   (7.21)

The last element in Eq. (7.3) is

E{θ_1} = E{B'K_2c} + E{B}'p_2 + λ_1   (7.22)

From Eq. (6.39)

E{B'K_2c} = K_2[μ_b c + cov(bc)] = (1)[(−.5)(3.5) + 0] = −1.75   (7.23)

From Eqs. (7.4) and (7.6)

E{B}'p_2 = μ_b w_2 = (−.5)(0) = 0   (7.24)

From Eq. (7.6)

λ_1 = 0   (7.25)

Therefore, substitution of Eqs. (7.23) to (7.25) in Eq. (7.22) yields

E{θ_1} = −1.75 + 0 + 0 = −1.75   (7.26)

This completes the evaluation of Eq. (7.3). Now Eq. (7.4) can be evaluated:

K_1 = E{Φ_1} − E{Ψ_1}(E{Θ_1})^{-1}(E{Ψ_1})'   (7.27)

Substitution of Eqs. (7.10), (7.21), and (7.18) into Eq. (7.27) yields

K_1 = 1.49 − (−.35)(.5714)(−.35) = 1.42

Also from Eq. (7.4)

p_1 = E{φ_1} − E{Ψ_1}(E{Θ_1})^{-1}E{θ_1}

[...]

J*_{N-k} = f(u_k, (x_{o,j+1}, u_{o,j})_{j=k+1}^{N-1}, Σ_{k+1|k}, (Q_j)_{j=k+1}^{N-1}, (Σ_{j|j})_{j=k+1}^{N-1})   (8.6)

For better understanding it is useful to divide Eq. (8.6) into three components [Bar-Shalom and Tse (1976a)], called the deterministic, cautionary, and probing terms. They are written in general functional form as

J*_{N-k} = min_{u_k} (J_{D,N-k} + J_{C,N-k} + J_{P,N-k})   (8.7)

where

J_{D,N-k} = deterministic component = f(u_k, (x_{o,j+1}, u_{o,j})_{j=k+1}^{N-1})   (8.8)
J_{C,N-k} = cautionary term = f(Σ_{k+1|k}, (Q_j)_{j=k+1}^{N-1})   (8.9)
J_{P,N-k} = probing term = f((Σ_{j|j})_{j=k+1}^{N-1})   (8.10)

The deterministic component is a function only of the search value of the control and the nominal path. It contains no covariance terms.
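Because Chap. 7 is written with program debugging in mind, its scalar calculations can usefully be collected into a short backward recursion. The Python sketch below (the function and variable names are illustrative, not from the text) reproduces the open-loop-feedback recursions of Eqs. (7.1) to (7.5) for the MacRae parameter values and recovers the intermediate value K_1 = 1.42 computed above:

```python
# Scalar open-loop-feedback recursions of Eqs. (7.1)-(7.5), sketched for
# the MacRae example; sigma_bb is the prior variance of the unknown b.
def olf_control(N, q, r, a, mu_b, sigma_bb, c):
    K = {N: q}                  # K_N = W_N = q
    p = {N: 0.0}                # p_N = w_N = 0
    G, g = {}, {}
    for k in range(N - 1, -1, -1):
        Phi   = q + K[k + 1] * a * a                    # E{Phi_k}, Eq. (7.5)
        phi   = K[k + 1] * a * c + a * p[k + 1]         # E{phi_k}, w_k = 0
        Theta = r + K[k + 1] * (mu_b**2 + sigma_bb)     # E{Theta_k}, Eq. (7.3)
        Psi   = K[k + 1] * a * mu_b                     # E{Psi_k}, F_k = 0
        theta = K[k + 1] * mu_b * c + mu_b * p[k + 1]   # E{theta_k}, lambda_k = 0
        K[k] = Phi - Psi * Psi / Theta                  # Eq. (7.4)
        p[k] = phi - Psi * theta / Theta
        G[k] = -Psi / Theta                             # Eq. (7.2)
        g[k] = -theta / Theta
    return K, p, G, g

K, p, G, g = olf_control(N=2, q=1.0, r=1.0, a=0.7, mu_b=-0.5,
                         sigma_bb=0.5, c=3.5)
u0 = G[0] * 0.0 + g[0]   # Eq. (7.1) with x_0 = 0; K[1] comes out to 1.42
```

Running the k = 1 pass by hand reproduces the intermediate values E{Θ_1} = 1.75, E{Ψ_1} = −.35, and E{θ_1} = −1.75 computed in the text.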
The cautionary component is a function of Σ_{k+1|k}, which is the covariance of the state variable at time k + 1 as projected with data available at time k. This represents the uncertainty in the response of the system to a control applied at time k before the state of the system can be observed again at time k + 1 and a new control applied to bring the system back onto the desired path. The name "cautionary" comes from the fact that such uncertainty normally biases the choice of the control variable in a conservative direction, since one is uncertain about the magnitude of the response to expect. This component is also a function of the covariance of the system-equation error terms, which does not necessarily fit well into a component called cautionary. This shows that the separation into these particular three components is somewhat arbitrary; perhaps it would be better to separate these terms into yet a fourth component.

The probing component is a function of the covariance matrices Σ_{j|j} for all future time periods. This is the uncertainty associated with the state vector at each time period after the measurement has been taken at that time period and the covariance matrix has been updated. Since probing or perturbation of the system early in time will tend to reduce the uncertainty and to make the elements of these matrices smaller later, this term is called the probing term.

Now return to Fig. 8.2. The next step, shown in the fourth box down on the left-hand side of the figure, is to compute the Riccati matrices K. Analogous to the Riccati matrices in the deterministic and multiplicative-uncertainty problems, there are also Riccati matrices in this problem. They can be computed for all future time periods by integrating backward from terminal conditions. Next, since the nominal path is known, the deterministic component of the approximate cost-to-go can be computed.
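In a quadratic linear setting, the backward integration from terminal conditions takes the familiar Riccati form. A minimal sketch follows (Python/NumPy; the matrices are illustrative scalars wrapped as 1×1 arrays, with no cross term and no learning terms included):

```python
import numpy as np

# Backward Riccati recursion for a deterministic QLP (sketch):
# criterion sum x'Wx + u'L u, system x_{k+1} = A x_k + B u_k.
def riccati(A, B, W, L, N):
    K = [None] * (N + 1)
    K[N] = W                                  # terminal condition K_N = W_N
    for k in range(N - 1, -1, -1):            # integrate backward in time
        BKA = B.T @ K[k + 1] @ A
        K[k] = (W + A.T @ K[k + 1] @ A
                - BKA.T @ np.linalg.solve(L + B.T @ K[k + 1] @ B, BKA))
    return K

A = np.array([[0.7]]); B = np.array([[-0.5]])
W = np.array([[1.0]]); L = np.array([[1.0]])
K = riccati(A, B, W, L, N=2)    # K[1][0, 0] = 1.392 for these values
```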
Also, the part of the cautionary term involving Σ_{k+1|k} can be computed at this stage, since that matrix is available. It was computed in the step shown in the second box from the top on the left, along with the projected mean of the state variable x̂_{k+1|k}.

Next we enter the fourth of the nested do loops, which projects the covariance matrices Σ_{j|j} forward all the way to period N and uses these terms to compute the probing component. Also the part of the cautionary component which involves Q_j is computed in this loop. Once the do loop has been completed, the total approximate cost-to-go can be obtained by adding the three components. This is then used to determine whether or not the search is complete.

If the search is a grid search and the vector u_k consists of a single control, the problem reduces to a line search. This is the method used in the example in Chap. 12. The approximate cost-to-go is evaluated at many points on the interval between the highest and lowest likely values for the controls. The search value of the control which yields the lowest cost-to-go is then chosen as the optimal control. With a gradient technique the third loop is used as the procedure for evaluating the function at each iteration. The gradient method then proceeds until satisfactory convergence has been obtained.

It is useful to note the computational complexity of the problem at this stage. The iterations in the search for the optimal control require the backward integration of the Riccati equations and the forward integration of the covariance equations at each step. The search must in turn be carried out for each time period of the problem in the second of the nested do loops. Furthermore, the entire problem must be solved for each of the Monte Carlo runs. This means that only a fairly limited number of Monte Carlo runs can be made for even small econometric models. Return now to the search in Fig. 8.2.
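The nesting just described can be made concrete with a count of cost evaluations (the run counts below are hypothetical):

```python
# Loop skeleton of the adaptive-control procedure. Each cost evaluation
# hides a backward Riccati pass and a forward covariance projection,
# which is why the number of Monte Carlo runs must stay small.
n_runs, N, n_search = 10, 8, 25
evaluations = 0
for run in range(n_runs):              # Monte Carlo runs
    for k in range(N):                 # time periods
        for trial in range(n_search):  # search over u_k
            evaluations += 1           # one Riccati + covariance integration
print(evaluations)                     # 10 * 8 * 25 = 2000
```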
If the search is not completed, the iteration counter is increased and the evaluation of the cost-to-go is repeated. If the search is completed, the update procedure is entered in the concluding phase of the solution of the adaptive-control problem.

8.5 The Update

Once the search is completed and the optimal control u*_k for period k has been obtained, this control is used along with the additive-noise terms in Eq. (8.3) to obtain x_{k+1}. The vector x_{k+1} is used in turn in the measurement relationship Eq. (8.4), along with the measurement error term, to get y_{k+1}. The measurement is used to obtain updated estimates of the mean and covariance of the state vector at time k + 1 using data obtained through period k + 1, that is, x̂_{k+1|k+1} and Σ_{k+1|k+1}. This is shown on the right-hand side of Fig. 8.1.

Next the time-period index k is increased by 1 and a test is made to see whether all N periods have been completed. If not, the certainty-equivalence control for the new time period is computed and the search is made again. If all N periods have been completed, the Monte Carlo run counter is increased by 1 and a test is made to see whether the desired number of Monte Carlo runs has been completed.

8.6 Other Algorithms

As discussed in the introduction to this chapter, a variety of other algorithms are available for solving active-learning stochastic control problems, but very little work on comparison of algorithms has been done. It is beyond the scope of this book to provide a detailed comparison of the various algorithms, but a brief comparison to three other algorithms is provided, namely those of Norman (1976), MacRae (1975), and Chow (1975, Chap. 10).
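The update sequence of Sec. 8.5 can be sketched for a scalar system (Python; the parameter values and the assumption of a direct noisy observation y = x + ζ are purely illustrative):

```python
import numpy as np
rng = np.random.default_rng(0)

# One update step: apply u_k*, draw the system noise, take a noisy
# measurement, and update the mean and covariance (scalar Kalman step).
a, b, c, Q, R = 0.7, -0.5, 3.5, 0.2, 0.1     # illustrative values
x, u_star, S = 0.0, 1.7, 0.0                 # state, control, Sigma_{k|k}
x_next = a * x + b * u_star + c + rng.normal(0.0, np.sqrt(Q))   # Eq. (8.3)
y_next = x_next + rng.normal(0.0, np.sqrt(R))                   # Eq. (8.4)
x_pred = a * x + b * u_star + c              # x_{k+1|k}
S_pred = a * a * S + Q                       # Sigma_{k+1|k}
gain = S_pred / (S_pred + R)
x_upd = x_pred + gain * (y_next - x_pred)    # x_{k+1|k+1}
S_upd = (1.0 - gain) * S_pred                # Sigma_{k+1|k+1}
```

Note that the measurement always reduces the covariance (S_upd < S_pred), which is the mechanism that keeps projected uncertainty bounded in Sec. 9.6.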
Norman's algorithm is like the algorithm described above except that two simplifications are adopted: (1) he assumes that there is no measurement error, and (2) he employs a first-order rather than a second-order expansion of the cost-to-go function (hence the name first-order dual control).

MacRae also uses the assumption of no measurement noise. Thus the Σ matrix used in Chap. 10 of this book consists of one component, Σ^θθ, instead of four components. With this assumption MacRae derives an updating rule for the inverse of the covariance matrix of the form

Γ^{-1}_k = f(Γ^{-1}_{k-1})   (8.11)

This same type of relationship can be derived by assuming (in the notation of Chap. 10) that D = I, H = I, Σ^xx = 0, and R = 0, that is, by assuming that the parameters of the problem are constant over time and that the state variables can be measured exactly. Then Eq. (10.60) can be substituted into Eq. (10.69) to obtain a relationship like Eq. (8.11). In MacRae's algorithm the update relationship (8.11) is appended to the criterion function with Lagrangian variables, and the resulting function is minimized.

Chow's algorithm also relies on the assumption of perfect measurement of the state vector, but it is more general than the algorithm used in this book in at least one way: Chow's development includes cross terms from different time periods. Another difference is in the path about which the second-order approximation is made. In the Tse, Bar-Shalom, and Meier algorithm this path is chosen anew at each iteration in the search; in Chow's algorithm it is selected before the search is begun and not altered during the search. Finally, in the development of the algorithm Chow takes the expectation first and then performs the second-order expansion, while Tse, Bar-Shalom, and Meier reverse these steps. This completes the brief review of other algorithms and the survey of the adaptive-control algorithm used in this book.
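The flavor of Eq. (8.11) can be seen in the scalar case. With the state measured exactly, each observation of x_{k+1} = a x_k + b u_k + c + ξ_k is a noisy regression observation on the unknown b, so the posterior precision (inverse variance) of b grows with u_k². The sketch below illustrates this form; it is not MacRae's exact rule:

```python
# Inverse-covariance (precision) update of the form of Eq. (8.11) for a
# single unknown coefficient b; Q is the system-noise variance.
def precision_update(prec_b, u, Q):
    return prec_b + u * u / Q    # larger |u| -> faster learning about b

prec = 1.0 / 0.5                 # prior variance of b is 0.5
prec = precision_update(prec, u=2.0, Q=0.2)
var_b = 1.0 / prec               # posterior variance shrinks
```

This is the channel through which more vigorous controls buy information, the trade-off that the probing term of the dual-control cost captures.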
The next two chapters include a detailed development of the nonlinear algorithm and the application of this algorithm to a quadratic linear control problem with unknown parameters. The reader who is more interested in the application of stochastic control to economics than in the algorithms may prefer to skip to Chap. 12, which includes an application to a small econometric model of the United States economy.

Chapter 9

Nonlinear Active-Learning Stochastic Control

with Bo Hyun Kang

9.1 Introduction

This chapter provides a detailed description and derivation of the algorithm of Tse, Bar-Shalom, and Meier (1973).¹ It also extends that algorithm to cover cases where (1) a constant term is given explicitly in the system equations and (2) the criterion function includes a cross term in x and u.

9.2 Problem Statement

The problem is to select U^{N-1} = (u_k)_{k=0}^{N-1} to minimize the cost functional

J_N = E{C_N}   (9.1)

where

C_N = L_N(x_N) + Σ_{k=0}^{N-1} L_k(x_k, u_k)   (9.2)

where the expectation E{·} is taken over all random variables. The subscripts denote the time period. It will be convenient at times to divide the cost function into three component functions, one including only terms in x, another including only terms in u, and a third including cross terms in x and u, that is,

L_k(x_k, u_k) = v_k(x_k) + ω_k(x_k, u_k) + φ_k(u_k)   (9.3)

¹ See also Bar-Shalom, Tse, and Larson (1974).

The cost functional is to be minimized subject to the system equations

x_{k+1} = f_k(x_k, u_k) + ξ_k   k = 0, 1, ..., N − 1   (9.4)

and the measurement equations

y_k = h_k(x_k) + ζ_k   k = 1, ..., N   (9.5)

where x = n-element state vector, u = m-element control vector, y = r-element observation vector. It is assumed that x_0 and (ξ_k, ζ_{k+1})_{k=0}^{N-1} are independent gaussian vectors with statistics

E{x_0} = x̂_{0|0}   cov(x_0) = Σ_{0|0}
E{ξ_k} = 0   cov(ξ_k) = Q_k   (9.6)
E{ζ_k} = 0   cov(ζ_k) = R_k

As discussed in Chap.
5, we seek a control which is a closed-loop rather than a feedback control [see Bar-Shalom and Tse (1976b)], the distinction being that the feedback control depends only on past measurements and random variables while the closed-loop control includes some consideration of future measurements and random variables. In fact the control used here is of the form

u_k = u_k(Y^k, U^{k-1}, C_N, D, Σ^{N-1}, Q^{N-1}, R^N)   (9.7)

where

Y^k = (y_j)_{j=1}^k   U^{k-1} = (u_j)_{j=0}^{k-1}   Q^{N-1} = (Q_j)_{j=0}^{N-1}   R^N = (R_j)_{j=1}^N

and where C_N is the cost functional, D is the system dynamics f_k(·) for k = 0, 1, ..., N − 1, and Σ^{N-1} = (Σ_j)_{j=k}^{N-1}, where Σ_k is an estimate of Σ at k based on Y^k and U^{k-1} and (Σ_j)_{j=k+1}^{N-1} is a projection of Σ for future time periods based on Y^k, U^{k-1}, and the statistical description of the future measurements. So the control depends on the estimated state-variable covariance matrix at time k and on projections of this same matrix which take account of the fact that system noises will be increasing the variance but also that future measurements can be used to decrease the variance of the state vector. Also, it is assumed here that Q and R are known for all future time periods.

The dual-control method used here is said to be a wide-sense method in that it employs the first and second moments x̂ and Σ in computing the optimal control. Higher moments are ignored.

9.3 Dynamic Programming Problem and Search Method

As stated in Eq. (6.18), the dynamic-programming problem at time k is to find u_k to minimize the expected cost-to-go, i.e.,

J*_{N-k} = min_{u_k} E{L_k(x_k, u_k) + J*_{N-k-1} | Y^k, U^{k-1}}   (9.8)

The first problem then is to describe the search method over the space u_k. Since the search for the optimal control u*_k is initiated from the certainty-equivalence control u^{CE}_k, it is necessary first to solve the certainty-equivalence problem. Repeated values of u_k are chosen, and the cost-to-go is evaluated for each set of control values.
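A grid (line) search over a scalar control can be sketched as follows; the cost function here is only a stand-in for the approximate cost-to-go J_{N-k}(u_k), chosen to exhibit the multiple local optima that the text warns about:

```python
import numpy as np

# Grid search over a scalar control (sketch; approx_cost is illustrative).
def approx_cost(u):
    return (u - 1.0) ** 2 + 0.3 * np.cos(5.0 * u)   # several local minima

grid = np.linspace(-2.0, 4.0, 601)       # likely range of the control
costs = np.array([approx_cost(u) for u in grid])
u_best = float(grid[np.argmin(costs)])   # search value with lowest cost
```

A gradient method started from a single point could stop in any of the local dips; the grid evaluates the whole interval, which is why the text recommends grid searches or multiple starting points.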
If u_k is a scalar quantity, a line search is appropriate; in Tse and Bar-Shalom (1973) a quadratic-fit method is used. If u_k is a vector, more general gradient or grid-search methods can be used. However, the function J_{N-k}(u_k) may have multiple local optima. Therefore, if gradient methods are used, they should be given multiple starting points. Because of the presence of local optima, Kendrick (1979) employed both a quasi-Newton gradient method and a grid-search technique.²

9.4 Computing the Approximate Cost-to-Go

In order to evaluate Eq. (9.8), an approximate cost-to-go must be computed, and this requires a nominal path on which the second-order Taylor expansion of J*_{N-k-1} can be evaluated.

² The gradient technique used was ZXMIN from the IMSL Library (1974).

Choosing the Nominal Path

The nominal path is (x^{CE}_{j+1}, u^{CE}_j)_{j=k+1}^{N-1}, that is, the certainty-equivalence path of values which minimize the cost functional from time k + 1 to N with all random variables set to their mean values. In order to solve the problem one must have the value x_{k+1} as an initial condition. This value is obtained by using the current search value of u_k to obtain an estimate x̂_{k+1|k} of the state at time k + 1 as projected with data available at time k. In order to do this, consider the system equation (9.4)

x_{k+1} = f_k(x_k, u_k) + ξ_k   (9.9)

and expand it to second order about x̂_{k|k}, the current estimate of the state, and u^s_k, the current search value of u_k. This yields (see Appendix A for the derivation)

x_{k+1} ≈ f_k(x̂_{k|k}, u^s_k) + [f_x][x_k − x̂_{k|k}] + [f_u][u_k − u^s_k]
  + ½ Σ_i e^i [x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}]
  + ½ Σ_i e^i [u_k − u^s_k]' f^i_uu [u_k − u^s_k]
  + Σ_i e^i [u_k − u^s_k]' f^i_ux [x_k − x̂_{k|k}] + ξ_k   (9.10)

Taking the expected value of Eq.
(9.10) with data through period k and setting u_k = u^s_k, since we wish to find x̂_{k+1|k} conditional on u_k = u^s_k, yields

x̂_{k+1|k} ≈ f_k(x̂_{k|k}, u^s_k) + ½ Σ_i e^i E{[x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}]}   (9.11)

In Appendix B it is shown that the expected value of a quadratic form is

E{x'Ax} = x̄'Ax̄ + tr[AΣ]   (9.12)

where x̄ = E{x} and Σ = cov(x). The application of this result to Eq. (9.11) yields

x̂_{k+1|k} = f_k(x̂_{k|k}, u^s_k) + ½ Σ_i e^i tr[f^i_xx Σ_{k|k}]   (9.13)

since E{x_k} = x̂_{k|k} and Σ_{k|k} = E{[x_k − x̂_{k|k}][x_k − x̂_{k|k}]'}. Therefore, given the current statistics on x, namely (x̂_{k|k}, Σ_{k|k}), and the current search value of the control u^s_k, one can use Eq. (9.13) to obtain x̂_{k+1|k}, next period's state as estimated with data available through period k.

As indicated above, x̂_{k+1|k} then provides the initial condition for the certainty-equivalence problem to find a nominal path from period k + 1 to N. If the resulting certainty-equivalence problem is a quadratic linear problem, the Riccati method can be used. If the problem is a general nonlinear problem, a gradient method like the conjugate gradient used by Kendrick and Taylor (1970) or a variable-metric algorithm like that used by Norman and Norman (1973) can be employed. Now define the nominal path as (x_{o,j+1}, u_{o,j})_{j=k+1}^{N-1} and set it equal to the certainty-equivalence path (x^{CE}_{j+1}, u^{CE}_j)_{j=k+1}^{N-1}.

Second-Order Expansion of the Optimal Cost-to-Go

Following Eq. (6.10) the optimal cost-to-go at period k + 1 can be written as

J*_{N-k-1} = min_{u_{k+1}} E{··· min_{u_{N-1}} E{C_{N-k-1} | 𝒴^{N-1}} ··· | 𝒴^{k+1}}   (9.14)

where

C_{N-k-1} = L_N(x_N) + Σ_{j=k+1}^{N-1} L_j(x_j, u_j)   (9.15)

and 𝒴^j = (x̂_{j|j}, Σ_{j|j}) denotes the information available at period j. A second-order expansion of Eq. (9.15) about the nominal path is then

C_{N-k-1} = C_{o,N-k-1} + ΔC_{N-k-1}   (9.16)

where C_{o,N-k-1} is the zeroth-order term in the expansion and ΔC_{N-k-1} is the first- and second-order terms. Then

C_{o,N-k-1} = L_N(x_{o,N}) +
Σ_{j=k+1}^{N-1} L_j(x_{o,j}, u_{o,j})   (9.17)

and

ΔC_{N-k-1} = L'_{N,x}δx_N + ½ δx'_N L_{N,xx} δx_N
  + Σ_{j=k+1}^{N-1} (L'_{j,x}δx_j + ½ δx'_j L_{j,xx} δx_j + δx'_j L_{j,xu} δu_j
  + L'_{j,u}δu_j + ½ δu'_j L_{j,uu} δu_j)   (9.18)

where L_{N,x}, L_{j,x}, and L_{j,u} are the gradients and L_{N,xx}, L_{j,xx}, L_{j,xu}, and L_{j,uu} are the hessians evaluated at x_{o,N}, x_{o,j}, and u_{o,j}, and

δx_j = x_j − x_{o,j}   δu_j = u_j − u_{o,j}

Substitution of Eqs. (9.16) to (9.18) into Eq. (9.14) then yields an approximate optimal cost-to-go of the form

J*_{N-k-1} = J_{o,N-k-1} + ΔJ*_{N-k-1}   (9.19)

where

J_{o,N-k-1} = C_{o,N-k-1}   (9.20)

ΔJ*_{N-k-1} = min_{δu_{k+1}} E{··· min_{δu_{N-1}} E{ΔC_{N-k-1} | 𝒴^{N-1}} ··· | 𝒴^{k+1}}   (9.21)

Here the motivation for dividing J* into zeroth-order terms and first- and second-order terms becomes apparent. The expectation of the zeroth-order term is simply itself, since it contains no random variables but only the nominal-path variables. The first- and second-order terms now constitute a separate optimization problem with a quadratic criterion. This criterion is minimized subject to system equations which are constituted from the expansion of the original system equations. This can be obtained by rewriting Eq. (9.10) in perturbation form as

δx_{k+1} = f_x δx_k + f_u δu_k
  + ½ Σ_{i=1}^n e^i [δx'_k f^i_xx δx_k + 2 δu'_k f^i_ux δx_k + δu'_k f^i_uu δu_k] + ξ_k   (9.22)

where all the derivatives are evaluated on the nominal path and are for period k unless otherwise noted. In Eq. (9.22) n is the number of state variables. Now Eqs. (9.21) and (9.22) constitute a problem with a quadratic criterion and quadratic system equations. It is assumed that the solution to this problem can be represented up to second-order terms by the quadratic form

ΔJ*_{N-j-1} = g_{j+1} + E{p'_{j+1} δx_{j+1} + ½ δx'_{j+1} K_{j+1} δx_{j+1} | 𝒴^{j+1}}   (9.23)

Removal of the constant terms provides

J*_{N-j} − L_j(x_{o,j}, u_{o,j}) − J*_{N-j-1}
  = min_{δu_j} E{L'_x δx_j + ½ δx'_j L_xx δx_j + δx'_j L_xu δu_j
  + L'_u δu_j + ½ δu'_j L_uu δu_j + ΔJ*_{N-j-1} | Y^j, U^{j-1}}   (9.25)

Substitution of the optimal cost-to-go for the perturbation problem ΔJ*_{N-j-1} from Eq.
(9.23) yields

ΔJ*_{N-j} = J*_{N-j} − J_{o,N-j}
  = min_{δu_j} E{L'_x δx_j + ½ δx'_j L_xx δx_j + δx'_j L_xu δu_j + L'_u δu_j + ½ δu'_j L_uu δu_j
  + E[g_{j+1} + p'_{j+1} δx_{j+1} + ½ δx'_{j+1} K_{j+1} δx_{j+1} | 𝒴^{j+1}] | 𝒴^j}   (9.26)

and define

𝒜_xx = 𝒦_xu 𝒦̃^{-1}_uu 𝒦_ux   (9.37)

Then substitute Eq. (9.37) into Eq. (9.36) and collect terms to obtain

ΔJ*_{N-j} = E{g_{j+1} − ½ H'_u 𝒦̃^{-1}_uu H_u + (H_x − 𝒦_xu 𝒦̃^{-1}_uu H_u)'δx_j
  + ½ δx'_j 𝒦_xx δx_j − δx̂'_j 𝒜_xx δx_j + ½ δx̂'_j 𝒜_xx δx̂_j + ½ tr[K_{j+1}Q_j] | 𝒴^j}   (9.38)

Removal of the constant terms from the expectation leaves (also using 𝒦_xu = 𝒦'_ux)

ΔJ*_{N-j} = g_{j+1} − ½ H'_u 𝒦̃^{-1}_uu H_u + ½ tr[K_{j+1}Q_j]
  + E{[H_x − 𝒦_xu 𝒦̃^{-1}_uu H_u]'δx_j + ½ δx'_j 𝒦_xx δx_j | 𝒴^j} − ½ δx̂'_j 𝒜_xx δx̂_j   (9.39)

Now recall from the discussion of the trace operator in Appendix B that

E{δx'_j 𝒜_xx δx_j | 𝒴^j} = δx̂'_j 𝒜_xx δx̂_j + tr[𝒜_xx Σ_{j|j}]   (9.40)

Solving Eq. (9.40) for the δx̂'_j 𝒜_xx δx̂_j term and substituting the result back into Eq. (9.39) yields

ΔJ*_{N-j} = g_{j+1} − ½ H'_u 𝒦̃^{-1}_uu H_u + ½ tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]
  + E{(H_x − 𝒦_xu 𝒦̃^{-1}_uu H_u)'δx_j + ½ δx'_j 𝒦_xx δx_j | 𝒴^j} − E{½ δx'_j 𝒜_xx δx_j | 𝒴^j}   (9.41)

or

ΔJ*_{N-j} = g_j + E{p'_j δx_j + ½ δx'_j K_j δx_j | 𝒴^j}   (9.42)

where

g_j = g_{j+1} − ½ H'_u 𝒦̃^{-1}_uu H_u + ½ tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]   (9.43)
p_j = H_x − 𝒦_xu 𝒦̃^{-1}_uu H_u   (9.44)

and

K_j = 𝒦_xx − 𝒜_xx   (9.45)

Expression (9.42), which gives the approximate optimal cost-to-go at time j for the perturbation problem, is a quadratic function of the state of the system at time j. Thus the induction has shown that if the approximate optimal cost-to-go at time j + 1 is quadratic, it will also be quadratic at time j. Also, expressions (9.43) to (9.45) provide the recursions on g, p, and K which were sought. Slightly different expressions of the recursions on g, p, and K are given in Bar-Shalom, Tse, and Larson (1974) and in Tse, Bar-Shalom, and Meier (1973). Since both sets of results will be used in this book, Appendix C shows the equivalence of the two sets of recursions.

Partial Solution of the g Recursion

Expressions (9.42) to (9.45) provide a method of calculating the perturbation approximate cost-to-go, i.e., the approximate cost-to-go for the first- and second-order terms in the Taylor expansion.
These terms can be added to the zeroth-order term in the expansion to get the full approximate cost-to-go. However, before doing that it is useful to partially solve the difference equation for g in order to provide a clear separation of the stochastic terms from the nonstochastic terms in it. In order to solve Eq. (9.43) define

A_j = ½ H'_{u,j} 𝒦̃^{-1}_{uu,j} H_{u,j}   (9.46)

and

B_j = ½ tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]   (9.47)

so that Eq. (9.43) can be written as

g_j = g_{j+1} − A_j + B_j   (9.48)

Then solve Eq. (9.48) by working backward from period N:

g_{N-1} = g_N − A_{N-1} + B_{N-1}   (9.49)
g_{N-2} = g_{N-1} − A_{N-2} + B_{N-2}   (9.50)

Then substitute Eq. (9.49) into Eq. (9.50) to obtain

g_{N-2} = g_N − A_{N-1} + B_{N-1} − A_{N-2} + B_{N-2}
  = g_N − Σ_{j=N-2}^{N-1} A_j + Σ_{j=N-2}^{N-1} B_j   (9.51)

or in general

g_{N-k} = g_N − Σ_{j=N-k}^{N-1} A_j + Σ_{j=N-k}^{N-1} B_j   (9.52)

so the g difference equation has been solved for both the A and the B terms; however, it was desired only to solve it partially, for the B term. Therefore we define

γ_j = γ_{j+1} − A_j   γ_N = 0   (9.53)

or

γ_{N-1} = γ_N − A_{N-1}   (9.54)
γ_{N-2} = γ_{N-1} − A_{N-2} = γ_N − A_{N-1} − A_{N-2}   (9.55)

or in general

γ_{N-k} = γ_N − Σ_{j=N-k}^{N-1} A_j = −Σ_{j=N-k}^{N-1} A_j   (9.56)

Substitution of Eq. (9.56) into Eq. (9.52) yields

g_{N-k} = g_N + γ_{N-k} + Σ_{j=N-k}^{N-1} B_j   (9.57)

Then the use of the fact that g_N = 0 (from Appendix C) and substitution of the definition of B in Eq. (9.47) back into Eq. (9.57) results in

g_{N-k} = γ_{N-k} + ½ Σ_{j=N-k}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]   (9.58)

or

g_{k+1} = γ_{k+1} + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]   (9.59)

where

γ_k = γ_{k+1} − ½ H'_{u,k} 𝒦̃^{-1}_{uu,k} H_{u,k}   (9.60)

9.5 Obtaining a Deterministic Approximation for the Cost-to-Go

Substitution of Eq. (9.59) into the perturbation cost-to-go expression (9.23) yields

ΔJ*_{N-k-1} = γ_{k+1} + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]
  + E{p'_{k+1} δx_{k+1} + ½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^{k+1}}   (9.61)

Expression (9.61) then provides the optimal perturbation cost-to-go (the first- and second-order terms in the Taylor expansion).
Next this term is added to the zeroth-order term (9.20). So substitution of Eqs. (9.61) and (9.20) into Eq. (9.19) yields

J*_{N-k-1} = C_{o,N-k-1} + γ_{k+1} + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]
  + E{p'_{k+1} δx_{k+1} + ½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^{k+1}}   (9.62)

which is the approximate optimal cost-to-go at period k + 1. Substitution of Eq. (9.62) into Eq. (9.8) provides the optimal cost-to-go at period k,

J*_{N-k} = min_{u_k} E{L_k(x_k, u_k) + C_{o,N-k-1} + γ_{k+1}
  + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]
  + E{p'_{k+1} δx_{k+1} + ½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^{k+1}} | 𝒴^k}   (9.63)

Next use the result that

E{E{p'_{k+1} δx_{k+1} + ½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^{k+1}} | 𝒴^k}
  = E{p'_{k+1} δx_{k+1} + ½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^k}   (9.64)

i.e., since 𝒴^{k+1} ⊃ 𝒴^k, the expression reduces to one taken over the smaller set. Then

E{p'_{k+1} δx_{k+1} | 𝒴^k} = p'_{k+1} E{δx_{k+1} | 𝒴^k} = 0   (9.65)

and

E{½ δx'_{k+1} K_{k+1} δx_{k+1} | 𝒴^k} = ½ δx̂'_{k+1|k} K_{k+1} δx̂_{k+1|k} + ½ tr[K_{k+1} Σ_{k+1|k}]
  = ½ tr[K_{k+1} Σ_{k+1|k}]   (9.66)

since δx̂_{k+1|k} = 0 from Eq. (9.65). Substituting the results of Eqs. (9.64) to (9.66) into Eq. (9.63) and taking the expectation over the remaining terms yields

J*_{N-k} = min_{u_k} {L_k(x̂_k, u_k) + C_{o,N-k-1} + γ_{k+1}
  + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}] + ½ tr[K_{k+1} Σ_{k+1|k}]}   (9.67)

The reader may recall that a search is made over values of u_k in order to find the minimum of the function (9.67). Next we substitute Eq. (9.3) into Eq. (9.67). Since v_k does not depend on u_k, we can drop this term, leaving only the terms which are dependent on u_k in Eq. (9.3), i.e.,

J*_{d,N-k} = min_{u_k} {ω_k(x̂_k, u_k) + φ_k(u_k) + C_{o,N-k-1} + γ_{k+1}
  + ½ tr[K_{k+1} Σ_{k+1|k}] + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1}Q_j + 𝒜_xx Σ_{j|j}]}   (9.68)

where J*_{d,N-k} is the optimal cost-to-go which is dependent on u_k. The expression (9.68) can then be used in the search to find the best choice of u_k at period k. An alternative formulation of Eq.
(9.68), which is used less in the further development in this book, is

J*_{d,N-k} = min_{u_k} {ω_k(x̂_k, u_k) + φ_k(u_k) + J_{o,N-k-1} + γ_{k+1}
  + ½ tr[(Σ_{k+1|k} − Σ_{k+1|k+1})K_{k+1}
  + Σ_{j=k+1}^{N-1} [H_xx Σ_{j|j} + (Σ_{j+1|j} − Σ_{j+1|j+1})K_{j+1}]]}   (9.69)

This expression could be used in the search, since Eq. (9.69) is equivalent to Eq. (9.68).³ The derivation of Eq. (9.69) from Eq. (9.68) is given in Appendix E. In order to evaluate either (9.68) or (9.69) one needs the values of (Σ_{j|j})_{j=k+1}^{N-1}. The next section outlines the method used to project these covariance matrices.

³ Expression (9.69) is the same as the cost-to-go in Tse, Bar-Shalom, and Meier (1973). Expression (9.68) is the same as the cost-to-go in Bar-Shalom, Tse, and Larson (1974).

9.6 Projection of Covariance Matrices

When projections of economic data are made to compute both future means and variances, one ordinarily finds a rapidly growing variance, so that the confidence which can be attached to predictions in the distant future is sharply limited. The same phenomenon would occur here except for the fact that it is assumed that future measurements will be made. So the dynamics and the system-equation noise cause the variance to increase, and the measurements cause the variance to decrease. Also, the noise in the measurement equation modifies the ability of the measurements to decrease the variance. These notions are embodied in the mathematical model in the distinction between

Σ_{k+1|k} = E{[x_{k+1} − x̂_{k+1|k}][x_{k+1} − x̂_{k+1|k}]' | Y^k}   (9.70)

and

Σ_{k+1|k+1} = E{[x_{k+1} − x̂_{k+1|k+1}][x_{k+1} − x̂_{k+1|k+1}]' | Y^{k+1}}   (9.71)

where

x̂_{k+1|k} = E{x_{k+1} | Y^k}   and   x̂_{k+1|k+1} = E{x_{k+1} | Y^{k+1}}   for   Y^k = (y_j)_{j=1}^k

That is, Σ_{k+1|k} is the covariance matrix for period k + 1 calculated from observations through period k, and Σ_{k+1|k+1} is the covariance matrix for period k + 1 as calculated with observations through period k + 1. Consider first the method of obtaining Σ_{k+1|k}.
To do so one can use the system equations (9.4), make a second-order expansion of them as in Eq. (9.10), and set u_k = u^s_k to obtain

x_{k+1} ≈ f_k(x̂_{k|k}, u^s_k) + f_x[x_k − x̂_{k|k}]
  + ½ Σ_i e^i [x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}] + ξ_k   (9.72)

Also, the mean-value term x̂_{k+1|k} was obtained earlier in Eq. (9.13) as

x̂_{k+1|k} ≈ f_k[x̂_{k|k}, u^s_k] + ½ Σ_i e^i tr[f^i_xx Σ_{k|k}]   (9.73)

Then using Eqs. (9.72) and (9.73), we have

x_{k+1} − x̂_{k+1|k} = f_x[x_k − x̂_{k|k}] + ½ Σ_i e^i [x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}]
  + ξ_k − ½ Σ_i e^i tr[f^i_xx Σ_{k|k}]   (9.74)

The use of Eq. (9.74) in Eq. (9.70) yields

Σ_{k+1|k} = E{[x_{k+1} − x̂_{k+1|k}][x_{k+1} − x̂_{k+1|k}]'}
  = E{f_x[x_k − x̂_{k|k}][x_k − x̂_{k|k}]' f'_x}
  + ¼ E{[Σ_i e^i [x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}]][Σ_j e^j [x_k − x̂_{k|k}]' f^j_xx [x_k − x̂_{k|k}]]'}
  + E{ξ_k ξ'_k} + ¼ [Σ_i e^i tr[f^i_xx Σ_{k|k}]][Σ_j e^j tr[f^j_xx Σ_{k|k}]]'
  − ½ E{Σ_i e^i [x_k − x̂_{k|k}]' f^i_xx [x_k − x̂_{k|k}]}[Σ_j e^j tr[f^j_xx Σ_{k|k}]]'   (9.75)

since the other cross terms are equal to zero after expectations are taken. Next, Eq. (9.75) can be rewritten as

Σ_{k+1|k} = f_x Σ_{k|k} f'_x + Q_k
  + ¼ Σ_i Σ_j e^i(e^j)' E{[(x_k − x̂_{k|k})' f^i_xx (x_k − x̂_{k|k})][(x_k − x̂_{k|k})' f^j_xx (x_k − x̂_{k|k})]'}
  − ¼ Σ_i Σ_j e^i(e^j)' tr[f^i_xx Σ_{k|k}] tr[f^j_xx Σ_{k|k}]   (9.76)

Using the result derived in Appendix F that

E[(x'Ax)(x'Bx)] = 2 tr[AΣBΣ] + tr[AΣ] tr[BΣ]

one obtains⁴

¼ Σ_i Σ_j e^i(e^j)' E{[(x_k − x̂_{k|k})' f^i_xx (x_k − x̂_{k|k})][(x_k − x̂_{k|k})' f^j_xx (x_k − x̂_{k|k})]'}
  = ½ Σ_i Σ_j e^i(e^j)' tr[f^i_xx Σ_{k|k} f^j_xx Σ_{k|k}] + ¼ Σ_i Σ_j e^i(e^j)' tr[f^i_xx Σ_{k|k}] tr[f^j_xx Σ_{k|k}]   (9.77)

⁴ This result is given in Athans, Wishner, and Bertolini (1968, eq. (48)).

Substitution of Eq. (9.77) into Eq. (9.76) yields

Σ_{k+1|k} = f_x Σ_{k|k} f'_x + Q_k + ½ Σ_i Σ_j e^i(e^j)' tr[f^i_xx Σ_{k|k} f^j_xx Σ_{k|k}]   (9.78)

This expression propagates the covariance one period forward through the system equations.
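Eq. (9.78) is straightforward to implement directly. A sketch follows (Python/NumPy, with illustrative inputs), where fxx[i] stands for the hessian f^i_xx of the ith system equation:

```python
import numpy as np

# Covariance projection of Eq. (9.78):
#   S_{k+1|k} = f_x S f_x' + Q + (1/2) sum_ij e_i e_j' tr(fxx_i S fxx_j S)
def project_cov(fx, fxx, S, Q):
    n = fx.shape[0]
    S_next = fx @ S @ fx.T + Q                 # linear part plus noise
    for i in range(n):                         # second-order correction
        for j in range(n):
            S_next[i, j] += 0.5 * np.trace(fxx[i] @ S @ fxx[j] @ S)
    return S_next

# Scalar illustration: a linear system (zero hessian) reduces to
# S_{k+1|k} = fx * S * fx + Q.
S1 = project_cov(np.array([[0.7]]), [np.zeros((1, 1))],
                 np.array([[0.5]]), np.array([[0.2]]))
```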
The next step is to devise an expression for Σ_{k+1|k+1} based on knowledge of Σ_{k+1|k} and on the covariance of the measurement noise R_{k+1}. This is done by applying the method of the Kalman filter to the measurement equation (9.5)

y_k = h_k(x_k) + ζ_k   (9.79)

A first-order Taylor expansion of this equation is

y_k = h(x_{o,k}) + h_{x,k}[x_k − x_{o,k}] + ζ_k   (9.80)

or

δy_k = h_{x,k} δx_k + ζ_k   (9.81)

and the (k + 1)th-period version of Eq. (9.81) is

δy_{k+1} = h_{x,k+1} δx_{k+1} + ζ_{k+1}   (9.82)

At the time when the measurement relationship (9.82) is used, the covariance matrix for δx_{k+1},

Σ_{k+1|k} = E{[x_{k+1} − x̂_{k+1|k}][x_{k+1} − x̂_{k+1|k}]' | Y^k}   (9.83)

is known from Eq. (9.78) above, and the covariance matrix for ζ_{k+1} is given as R. In Appendix D the Kalman filter for a linear observation equation like Eq. (9.82) is derived following the method given in Bryson and Ho (1969). The notational equivalence between the appendix and Eqs. (9.82) and (9.83) is given in Table 9.1. The result obtained in Appendix D as (D.41) is, with M = Σ_{k+1|k},

Σ_{k+1|k+1} = M − M h'_x [h_x M h'_x + R + ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M h^j_xx M]]^{-1} h_x M

Table 9.1: Notational equivalence
  Eq. (D.17):              z = Hx + v
  Eqs. (9.82) and (9.83):  δy_{k+1} = h_{x,k+1} δx_{k+1} + ζ_{k+1}

[...]

with

p_N = −W_N x̃_N

Finally, the cost-to-go can be written as

J_{d,N-k-1} = ½ ẑ'_{k+1|k} K_{k+1} ẑ_{k+1|k} + p'_{k+1} ẑ_{k+1|k} + η_{k+1}   (10.33)

The scalar η here is from Eq. (6.22) and is not the same as the vector η of additive noise terms in the parameter evolution Eq. (10.9). Substituting Eqs. (10.18), (10.19), (10.26), and (10.35) into Eq. (10.24), dropping η_{k+1}, which is independent of the choice of u_k, and using γ_{k+1} = 0 (see Appendix J), we obtain

J_{d,N-k} = [x_k − x̃_k]'F[u_k − ũ_k] + ½[u_k − ũ_k]'Λ_k[u_k − ũ_k]
  + ½ ẑ'_{k+1|k} K_{k+1} ẑ_{k+1|k} + p'_{k+1} x̂_{k+1|k}
  + ½ tr{[Σ_{k+1|k} − Σ_{k+1|k+1}]K_{k+1} + W_N Σ^xx_{N|N}
  + Σ_{j=k+1}^{N-1} (Σ^xx_{j|j} W̃_j + [Σ_{j+1|j} − Σ_{j+1|j+1}]K_{j+1})}   (10.36)

where (see Appendix H)
W̃_j collects the remaining weighting terms, with

W̃_N = 0   (10.37)

The Riccati matrix for the augmented system is partitioned as

K_j = | K^xx_j  K^xθ_j |
      | K^θx_j  K^θθ_j |   (10.38)

with

K^xx_j = K_j   (10.39)

i.e., the xx block satisfies the same Riccati recursion as in the original (unaugmented) problem. The recursion for the off-diagonal block is

K^θx_j = [(f^θ_j)'K^xx_{j+1} + D'K^θx_{j+1}]A
  − [[(f^θ_j)'K^xx_{j+1} + D'K^θx_{j+1}]B + Σ_i e^i p^x_{j+1} b^i_θ] Π [B'K^xx_{j+1}A + F']
  + Σ_i e^i p^x_{j+1} a^i_θ   with K^θx_N = 0   (10.40)

in which (see Appendix L)⁴ f^θ_j collects the derivatives of the system equations with respect to the unknown parameters (10.41), and

K^θθ_j = (f^θ_j)'[K^xx_{j+1} f^θ_j + K^xθ_{j+1}D] + D'[K^θx_{j+1} f^θ_j + K^θθ_{j+1}D]
  − [[(f^θ_j)'K^xx_{j+1} + D'K^θx_{j+1}]B + Σ_i e^i p^x_{j+1} b^i_θ] Π
  × [B'[K^xx_{j+1} f^θ_j + K^xθ_{j+1}D] + Σ_i e^i p^x_{j+1} b^i_θ]   with K^θθ_N = 0   (10.42)

where

Π = [Λ + B'K^xx_{j+1}B]^{-1}

Also see Appendix I for the derivation of

p^x_j = K_j x_{o,j} + p_j   (10.43)

⁴ Expression (10.40) is similar to Tse and Bar-Shalom (1973, eq. (3.15)). Equation (10.40) contains a term in a^i_θ which should be added to the equation in Tse and Bar-Shalom.

Thus Eq. (10.36) provides one way of evaluating the approximate optimal cost-to-go. Alternatively, one may use a second approach to evaluation of the cost-to-go in order to separate it into deterministic, cautionary, and probing terms, as in Bar-Shalom and Tse (1976a). To do this, begin with Eq. (9.68) instead of Eq. (9.69). Expression (9.68) is

J*_{d,N-k} = min_{u_k} {ω_k(z_k, u_k) + φ_k(u_k) + C_{o,N-k-1} + γ_{k+1}
  + ½ tr[K_{k+1} Σ_{k+1|k}] + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q^z_j + 𝒜_zz Σ_{j|j}]}   (10.44)

(The notation Q^z is used for the covariance of the system-equation noise terms for the augmented system.) This can be separated into three components as

J*_{d,N-k} = min_{u_k} (J_{D,N-k} + J_{C,N-k} + J_{P,N-k})   (10.45)

where the deterministic component is

J_{D,N-k} = ω_k(z_k, u_k) + φ_k(u_k) + C_{o,N-k-1} + γ_{k+1}   (10.46)

the cautionary component is

J_{C,N-k} = ½ tr[K_{k+1} Σ_{k+1|k}] + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q^z_j]   (10.47)

and the probing term is

J_{P,N-k} = ½ Σ_{j=k+1}^{N-1} tr[𝒜_zz Σ_{j|j}]   (10.48)

Expression (10.46) contains all the deterministic terms, and this is the rationale for separating it from the stochastic terms in Eqs. (10.47) and (10.48). Increases in control do not affect Q^z_j in Eq. (10.47) but may increase Σ_{k+1|k}.
Therefore, minimization of the cautionary component (10.47) usually requires selecting u_k so as to decrease the K weighting matrices and the Σ_{k+1|k} term. In contrast, since the elements of the matrices Σ_{j|j} in Eq. (10.48) can in general be decreased through use of more vigorous control levels u_k, this expression is called the probing term. Expressions (10.46) to (10.48) define the cost components for the augmented system. For both computational efficiency and insight into the nature of the results, it is useful to write these components out in terms of the matrices which are the parts of the augmented system. This is done in Appendix Q. The results are shown below for deterministic terms [from Eq. (Q.3)]

J_{D,N-k} = [x_k - x̃_k]'F_k[u_k - ũ_k] + ½[u_k - ũ_k]'Λ_k[u_k - ũ_k] + ½[x_{0,N} - x̃_N]'W_N[x_{0,N} - x̃_N] + ½ Σ_{j=k+1}^{N-1} ([x_{0,j} - x̃_j]'W_j[x_{0,j} - x̃_j] + [x_{0,j} - x̃_j]'F_j[u_{0,j} - ũ_j] + [u_{0,j} - ũ_j]'Λ_j[u_{0,j} - ũ_j])   (10.49)

cautionary terms [from Eq. (Q.8)]

J_{C,N-k} = ½ tr(K^xx_{k+1} Σ^xx_{k+1|k}) + tr(K^θx_{k+1} Σ^xθ_{k+1|k}) + ½ tr(K^θθ_{k+1} Σ^θθ_{k+1|k}) + ½ Σ_{j=k+1}^{N-1} [tr(K^xx_{j+1} Q_j) + tr(K^θθ_{j+1} Γ_j)]   (10.50)

CHAPTER 10. QUADRATIC LINEAR ACTIVE-LEARNING CONTROL

and probing terms [from Eq. (Q.13)]

J_{P,N-k} = ½ Σ_{j=k+1}^{N-1} { tr([A'K^xx_{j+1}B + F]μ_j[B'K^xx_{j+1}A + F']Σ^xx_{j|j}) + 2 tr([A'K^xx_{j+1}B + F]μ_j[B'(K^xx_{j+1} f^x_{θ,j} + K^xθ_{j+1} D) + p^x_{j+1} b_θ]Σ^θx_{j|j}) + tr([[D'K^θx_{j+1} + (f^x_{θ,j})'K^xx_{j+1}]B + p^x_{j+1} b_θ]μ_j[B'(K^xx_{j+1} f^x_{θ,j} + K^xθ_{j+1} D) + p^x_{j+1} b_θ]Σ^θθ_{j|j}) }   (10.51)

With these components in hand the algorithm can now be explained in detail.

10.4 Dual-Control Algorithm

A flowchart of the algorithm is provided in Fig. 10.1. There are two major sections: (1) the search on the left in the figure and (2) the update procedure on the right. The purpose of the search is to determine the best control u_k to use in the current time period, and the update procedures project the system forward one time period and update the estimates of the means and covariances of both the original states and the parameters.
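The two-phase structure just described can be sketched as a loop. The stubs below merely stand in for the search of Sec. 10.4.2 and the update of Sec. 10.5; they are illustrative placeholders, not the book's computations.

```python
# Skeleton of the dual-control loop of Fig. 10.1; `search` and `update`
# are hypothetical stand-ins for the real procedures.
def search(k, mean, cov):
    # stand-in for steps 1-8 of the search (returns the cost-minimizing trial control)
    return -0.5 * mean

def update(k, u, mean, cov):
    # stand-in for the update: apply u plus noise, measure, Kalman-update
    return 0.7 * mean + u, cov * 0.9

def dual_control_run(N, mean0, cov0):
    mean, cov = mean0, cov0
    controls = []
    for k in range(N):
        u = search(k, mean, cov)             # left branch of Fig. 10.1
        mean, cov = update(k, u, mean, cov)  # right branch of Fig. 10.1
        controls.append(u)
    return controls, mean, cov
```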
The means and covariances are

ẑ_{k+1|k+1} = [x̂_{k+1|k+1} ; θ̂_{k+1|k+1}]    and    Σ_{k+1|k+1} = [Σ^xx  Σ^xθ ; Σ^θx  Σ^θθ]_{k+1|k+1}

The search procedure is further outlined in Fig. 10.2, which shows three trial values of the control u^r_k. In fact more trial values than this are generally used before the search converges. The trial value u^r_k for the rth trial is used to project the mean and covariance (ẑ_{k+1|k}, Σ_{k+1|k}) of the state of the system at time k+1 with measurements through time k. These values are then used as the initial condition for the solution of a deterministic optimization (certainty-equivalence) problem from time k+1 to final time N. This problem is a quadratic linear approximation of the nonlinear deterministic problem. The solution to this problem provides the nominal path (z^r_{0,j}, u^r_{0,j})_{j=k+1}^{N} around which the approximate cost-to-go J_d can be determined. This procedure is repeated for each u^r_k until the search algorithm converges, at which time the optimal control u*_k for period k is set equal to that search value u^r_k which minimizes J_d(u^r_k).

Figure 10.1: Flowchart of the algorithm (the search loop on the left, the update procedure on the right).

Figure 10.2: Search procedure to determine u*_k. Set u*_k = {u^r_k for the r that gives min J_d(u^r_k)}.
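The search over trial controls can be sketched as a simple grid search. Here `approx_cost_to_go` is a hypothetical placeholder for steps 1 to 6 of Sec. 10.4: a quadratic deterministic piece, a cautionary piece that rises with the control, and a probing piece that falls with it, mimicking the pattern of Table 11.1; the actual evaluation in the book is far more involved.

```python
# Grid search over trial controls, as in Fig. 10.2; the cost function is an
# illustrative stand-in, not the book's cost-to-go.
def approx_cost_to_go(u):
    deterministic = (u - 2.5) ** 2 + 16.0   # quadratic tracking cost
    cautionary = 0.7 * u * u                # grows with control vigor
    probing = 0.5 / (1.0 + u * u)           # shrinks with control vigor
    return deterministic + cautionary + probing

def grid_search(lo, hi, n=25):
    # the text suggests 20 to 30 points in the range where the optimum may lie
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=approx_cost_to_go)

u_best = grid_search(0.0, 3.0)
```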
Then the update procedure is begun by using u*_k and the system noise ξ_k, which is obtained from the random-number generator, to determine the state of the system z_{k+1} at time k+1 through the system equations (see Fig. 10.3). The state cannot be directly observed but is measured through the observed y_{k+1}. The measurement y_{k+1} is obtained from the measurement relation, where the measurement noise ζ_{k+1} is included. Then the measurement y_{k+1} is used with the Kalman filter to update the estimates of the mean and covariance of the augmented state, ẑ_{k+1|k+1} and Σ_{k+1|k+1}. These two procedures of a search iteration and then an update are repeated for each time period until the final time period is reached. The cost of the dual-control solution is then calculated for this single Monte Carlo run. A number of Monte Carlo runs are then performed in order to obtain a distribution of the cost associated with the use of the wide-sense dual-control strategy. The algorithm is now outlined step by step. The algorithm outlined here is based on the evaluation of the cost-to-go and its three components in Eq. (10.45). At each time period k there are three major steps in the algorithm:

1. Initialization
2. Search for the optimal control u*_k
3. Updating the estimates of the states and parameters

Figure 10.3: Monte Carlo and update procedures.

10.4.1 Initialization

The first step in the initialization is to compute the nominal value of the parameters. If the parameters are constant, this simply means setting θ_{0,k} to θ̂_{k-1|k-1}. If they are not constant, it means using Eq.
(10.29) to project θ_{0,j} for j = k, ..., N from θ̂_{k-1|k-1}. Once this has been completed, it is necessary to update A_j, B_j, and c_j for j = k, ..., N. Next the Riccati parameters K_j and p_j for j = k+1, ..., N are calculated by using Eqs. (10.33) and (10.34). Finally, it is necessary to choose a value of the control u^1_k with which to begin the search for the optimal control u*_k. While this may be done in a variety of ways, it is normally done by solving the certainty-equivalence problem for periods k to N and then setting u^r_k for step r = 1 in the search to the certainty-equivalence solution for period k as given by Eq. (10.30).

10.4.2 Search for the Optimal Control

There are eight steps in this procedure:

1. Use the control u^r_k to get the projected states and covariances in period k+1, that is, ẑ_{k+1|k} and Σ_{k+1|k}.
2. Get the nominal path for periods k+1 to N by solving the certainty-equivalence problem using the x component of ẑ_{k+1|k} as the initial condition.
3. Compute the Riccati matrices K^θx and K^θθ for all periods.
4. Calculate the deterministic component of the cost-to-go for period k and period N.
5. Calculate part of the cautionary component for period k+1.
6. Repeat the calculation of the following components for periods k+1 through N-1: a. Deterministic b. Cautionary c. Probing d. Total cost-to-go
7. Choose a new control u^{r+1}_k in the search.
8. Repeat steps 1 through 7 until all the search points have been evaluated and then select the control which yields the minimum total cost-to-go.

In greater detail the eight steps are as follows.

Step 1. Use u^r_k in Eqs. (9.73) and (9.78) to project the future state ẑ_{k+1|k} and covariance Σ_{k+1|k}. These results are specialized in Appendix M to the components x and θ of z for the linear problem and are given in Eqs. (M.8) and (M.9) as

x̂_{k+1|k} = A_k x̂_{k|k} + B_k u^r_k + c_k + Σ_i e^i tr(a^i_θ Σ^θx_{k|k})   (10.52)

and

θ̂_{k+1|k} = D θ̂_{k|k}   (10.53)

Also the covariance terms are given in Eqs.
(M.16) to (M.19) as

Σ_{k+1|k} = [Σ^xx  Σ^xθ ; Σ^θx  Σ^θθ]_{k+1|k}   (10.54)

with the component matrices

Σ^xx_{k+1|k} = A_k Σ^xx_{k|k} A_k' + A_k Σ^xθ_{k|k}(f^x_{θk})' + f^x_{θk} Σ^θx_{k|k} A_k' + f^x_{θk} Σ^θθ_{k|k}(f^x_{θk})' + Q_k + Σ_i Σ_j e^i(e^j)' tr[a^i_θ Σ^θθ_{k|k}(a^j_θ)' Σ^xx_{k|k} + a^i_θ Σ^θx_{k|k} a^j_θ Σ^θx_{k|k}]   (10.55)

Σ^θx_{k+1|k} = D_k Σ^θx_{k|k} A_k' + D_k Σ^θθ_{k|k}(f^x_{θk})'   (10.56)

Σ^θθ_{k+1|k} = D_k Σ^θθ_{k|k} D_k' + Γ_k   (10.57)

Also recall that the f^x_{θk} term is given as

f^x_{θk} = x̂_{k|k} a_θ + u^r_k b_θ + c_θ   (10.58)

The initial conditions for Eqs. (10.55) to (10.57) are normally set to be diffuse priors, i.e., the diagonal elements of Σ^θθ_{0|0} and Σ^xx_{0|0} are set to large numbers and the other elements of these two matrices and the elements of Σ^θx_{0|0} are set to zero.

Step 2. Obtain the nominal paths (x_{0,j})_{j=k+1}^{N} and (u_{0,j})_{j=k+1}^{N-1} by using x̂_{k+1|k} as the initial state and solving the certainty-equivalence problem from period k+1 to period N. This also provides the initial value of u^r_k for the search, i.e., for r = 1. Thus u^1_k = u^CE_k.

Step 3. Compute the Riccati matrices K^θx and K^θθ for periods k+1 to N. (Recall that K^xx = K was computed during the initialization stage.) For these computations use the backward recursions (10.40) and (10.42). The matrix K_j can then be formed from the components by using Eq. (10.38).

Step 4. Calculate the deterministic component of the approximate cost-to-go for period k and for period N (but not for the periods in between) by using the first through third terms on the right-hand side of Eq. (10.49).

Step 5. Calculate the cautionary component for period k+1 by using the first three terms in Eq. (10.50). This expression uses the terms Σ_{k+1|k}, which are available from step 1. It also uses the terms K_{k+1}, which were calculated in step 3.

Step 6a: Deterministic Component. For each period j = k+2, ..., N-1 evaluate the fourth through sixth terms in Eq. (10.49).

Step 6b: Cautionary Component. Use the fourth and fifth terms in Eq. (10.50).
Step 6c: Probing Component. Use the right-hand side of Eq. (10.51). The K_{j+1} matrices were calculated in step 3, but the Σ_{j|j} matrices must be calculated. They are obtained by using Eqs. (10.55) to (10.57) to get Σ_{j|j-1} from Σ_{j-1|j-1}. Then Σ_{j|j} can be obtained from Σ_{j|j-1} by using Eqs. (K.17) to (K.19)

Σ^xx_{k+1|k+1} = [I - Σ^xx_{k+1|k} H'_{k+1} S^{-1}_{k+1} H_{k+1}] Σ^xx_{k+1|k}   (10.59)

Σ^θx_{k+1|k+1} = (Σ^xθ_{k+1|k+1})' = Σ^θx_{k+1|k}[I - H'_{k+1} S^{-1}_{k+1} H_{k+1} Σ^xx_{k+1|k}]   (10.60)

Σ^θθ_{k+1|k+1} = Σ^θθ_{k+1|k} - Σ^θx_{k+1|k} H'_{k+1} S^{-1}_{k+1} H_{k+1} Σ^xθ_{k+1|k}   (10.61)

where, from Eq. (K.15),

S_{k+1} = H_{k+1} Σ^xx_{k+1|k} H'_{k+1} + R_{k+1}   (10.62)

Step 6d: Total Cost-to-Go. Sum the deterministic, cautionary, and probing terms over the periods k+1 to N.

Step 7. Choose a new control u^{r+1}_k for the grid search. In practice the total cost-to-go in step 6 is evaluated at 20 to 30 points in the range where the optimal control is expected to lie. This is used when u_k consists of a single control and when there is concern that J_d(u_k) may have local optima. If local optima are not a concern, gradient methods can be employed at this step to get the new control u^{r+1}_k.

Step 8. Repeat steps 1 through 7 until all the search points have been evaluated (for a grid-search technique) or until satisfactory convergence is obtained (for gradient methods).

This concludes the eight steps in the search for the optimal control u*_k at time period k. The final part of the algorithm is the updating, outlined next.

10.5 Updating State and Parameter Estimates

Once the optimal control u*_k has been determined, it is applied to the system equations (10.13), (10.20), and (10.21) to obtain the two components of z_{k+1}

z_{k+1} = [x_{k+1} ; θ_{k+1}]   (10.63)

where

x_{k+1} = A_k x_k + B_k u*_k + c_k + v_k   (10.64)

and

θ_{k+1} = D θ_k + η_k   (10.65)

A Monte Carlo procedure is used to generate the random variables v_k and η_k using the covariances Q_k and Γ_k, respectively.
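The update stage can be sketched for a scalar state with a single unknown slope parameter b (as in the MacRae problem of Chap. 11): apply the control, draw system and measurement noise Monte Carlo style, and run a linearized Kalman update in the spirit of Eqs. (10.64) to (10.69). The second-order trace terms of Eq. (10.55) are omitted, and all names below are illustrative rather than the book's code.

```python
import random

# Hedged sketch of one update step (Sec. 10.5) for a scalar system
# x' = a*x + b*u + c + v with unknown but constant b (Gamma = 0), H = 1.
a, b_true, c = 0.7, -0.5, 3.5    # true system; b is unknown to the filter
q, r_meas = 0.2, 0.1             # system and measurement noise variances
rng = random.Random(0)

def step(x_true, x_hat, b_hat, Sxx, Sbx, Sbb, u):
    # system equation (10.64) with Monte Carlo noise v ~ N(0, q)
    x_true_next = a * x_true + b_true * u + c + rng.gauss(0.0, q ** 0.5)
    # one-step predicted mean and covariances (Eqs. (10.52), (10.55)-(10.57))
    x_pred = a * x_hat + b_hat * u + c
    Sxx_p = a * a * Sxx + 2.0 * a * u * Sbx + u * u * Sbb + q
    Sbx_p = a * Sbx + u * Sbb
    # measurement (10.66) with noise w ~ N(0, r)
    y = x_true_next + rng.gauss(0.0, r_meas ** 0.5)
    # Kalman update of means (10.67)-(10.68) and covariances; S is Eq. (10.69)
    S = Sxx_p + r_meas
    innov = y - x_pred
    x_hat_new = x_pred + (Sxx_p / S) * innov
    b_hat_new = b_hat + (Sbx_p / S) * innov
    Sxx_n = Sxx_p - Sxx_p * Sxx_p / S
    Sbx_n = Sbx_p - Sbx_p * Sxx_p / S
    Sbb_n = Sbb - Sbx_p * Sbx_p / S
    return x_true_next, x_hat_new, b_hat_new, Sxx_n, Sbx_n, Sbb_n
```

Repeated over periods, the parameter variance Sbb falls as each measurement carries information about b, which is exactly the learning the probing term tries to encourage.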
Next the values x_{k+1} and θ_{k+1} are used in the measurement relationship (10.14) with (10.22) and (10.23) to obtain the measurement vector y_{k+1}

y_{k+1} = [H_{k+1} | 0][x_{k+1} ; θ_{k+1}] + w_{k+1}

or

y_{k+1} = H_{k+1} x_{k+1} + w_{k+1}   (10.66)

A Monte Carlo procedure is used to generate the random elements in w_{k+1} using the covariance R_{k+1}. Finally the measurement vector y_{k+1} is used in the augmented Kalman filter equations (N.7) and (N.8) to obtain updated estimates of the means of the original states x and of the parameters θ

x̂_{k+1|k+1} = x̂_{k+1|k} + Σ^xx_{k+1|k} H'_{k+1} S^{-1}_{k+1}[y_{k+1} - H_{k+1} x̂_{k+1|k}]   (10.67)

θ̂_{k+1|k+1} = θ̂_{k+1|k} + Σ^θx_{k+1|k} H'_{k+1} S^{-1}_{k+1}[y_{k+1} - H_{k+1} x̂_{k+1|k}]   (10.68)

where

S_{k+1} = H_{k+1} Σ^xx_{k+1|k} H'_{k+1} + R_{k+1}   (10.69)

These estimates are then used as the starting values for the next time period. The algorithm is then repeated for each time period until the last period is reached. If one wishes to make comparisons across Monte Carlo runs, the entire multiperiod problem must be solved for each set of random elements obtained. This is the procedure used in Chap. 12.

Chapter 11

Example: The MacRae Problem

11.1 Introduction

Two examples are presented in this chapter and Chap. 12. The first is a problem drawn from MacRae (1972), with a single state variable and a single control variable. It was chosen both because it was simple enough to permit hand calculations and because a variant of it was used in Chap. 7 to illustrate the calculations used for passive-learning stochastic control. The calculations for this problem are shown in considerable detail, both to enhance understanding and to make them more useful for debugging computer programs. This same problem was used by Bar-Shalom and Tse (1976a) to compare the performance of a number of active-learning stochastic control algorithms. The second problem is constructed from the small macroeconometric model used in the deterministic example in Chap. 4.
Detailed calculations for the second problem are not given; instead the focus is on the final results and their economic implications.

11.2 Problem Statement: MacRae Problem

Find (u_0, u_1) to minimize

J = E[½ q x²_2 + ½ Σ_{k=0}^{1} (q x²_k + r u²_k)]   (11.1)

subject to

x_{k+1} = a x_k + b u_k + c + ε_k    for k = 0, 1   (11.2)
x_0 = 0

MacRae uses a different set of parameter values. For this example, let

a = .7    b = -.5    c = 3.5    σ²_ε = .2    q = 1    r = 1    σ²_b = .5    σ²_a = σ²_c = 0

Only the b parameter is treated in an adaptive manner. The parameters a and c are treated as though they were known perfectly. In the notation used in the previous chapter the parameters of this problem are as follows:

A = .7    B = -.5    c = 3.5
x̃_2 = x̃_1 = x̃_0 = 0    ũ_1 = ũ_0 = 0
x̂_{0|0} = x_0 = 0    Σ^xx_{0|0} = 0    θ̂_{0|0} = -.5    Σ^θθ_{0|0} = .5
Q = .2    W_2 = W_1 = W_0 = 1    Λ_1 = Λ_0 = 1

The only element in θ is the single unknown parameter b. The desired paths are set to zero. Since the problem assumes perfect observation, the initial covariance of x is zero. Because A and c are not functions of θ but B is, we have

x_{k+1} = A x_k + B(θ) u_k + c + ε_k   (11.3)

Also, since it is assumed that b is unknown but constant, the parameter equations (10.9) become

θ_{k+1} = D θ_k + η_k   (11.4)

with D = 1. It is assumed that η̂_{0|0} = 0 and σ_η = 0, so, in the notation of Chap. 10, η_k ~ N(0, Γ_k) with Γ_k = 0. Since there is no measurement error, the measurement equation

y_k = H x_k + w_k   (11.5)

becomes

y_k = x_k + w_k    with w_k ~ N(0, R_k) and R = 0   (11.6)

11.3 Calculation of the Cost-to-Go

The calculations performed here follow the description in Sec. 10.4.

11.3.1 Initialization

(a) Initialize with k = 0.

(b) Generate θ_{0,j} with Eq. (11.4). Since the parameter is assumed to be constant, one has θ_{0,j} = -.5 for j = 0, 1.

(c) Compute K_j and p_j for j = 1, 2 from (10.33) and (10.34)

K_2 = W_2 = 1    p_2 = -W_2 x̃_2 = -(1)(0) = 0   (11.7)

K_1 = A'[I - K_2 B μ_1 B']K_2 A + W_1

where, from Eq.
(10.32),

μ_1 = [Λ_1 + B'K_2B]^{-1} = [1 + (-.5)(1)(-.5)]^{-1} = .8

Therefore,

K_1 = .7[1 - (1)(-.5)(.8)(-.5)](1)(.7) + 1 = 1.392

Also,

p_1 = -A'K_2 B μ_1[B'(K_2 c + p_2) - Λ_1 ũ_1] + A'(K_2 c + p_2)
    = -(.7)(1)(-.5)(.8){(-.5)[(1)(3.5) + 0] - (1)(0)} + (.7)[(1)(3.5) + 0] = 1.96   (11.8)

In summary,

k    K_k      p_k
2    1        0
1    1.392    1.96

(d) Set u^1_0 = u^CE_0 as given by Eq. (10.30)

u^CE_0 = -μ_0[B'(K_1 A x_0 + K_1 c + p_1) - Λ_0 ũ_0]   (11.9)

μ_0 = [Λ_0 + B'K_1B]^{-1} = [1 + (-.5)(1.392)(-.5)]^{-1} = .742

Then, from Eq. (11.9),

u^CE_0 = -(.742){(-.5)[(1.392)(.7)(0) + (1.392)(3.5) + 1.96] - (1)(0)} = 2.534

11.3.2 Search for Optimal Control

Search for the optimal control in period k as outlined in Sec. 10.4. Those steps are followed here.

Step 1. Apply u^r_k to get the predicted state ẑ_{k+1|k} and its covariance Σ_{k+1|k}. Use

ẑ_{k+1|k} = [x̂_{k+1|k} ; θ̂_{k+1|k}]

and, from Eq. (10.52),

x̂_{1|0} = A x̂_{0|0} + B(θ̂_{0|0})u^r_0 + c + tr[a_θ Σ^θx_{0|0}]   (11.10)

Since A is not a function of θ, a_θ = 0. Also u^r_0 = u^CE_0 = 2.534 from the initialization above. Therefore,

x̂_{1|0} = (.7)(0) + (-.5)(2.534) + 3.5 + 0 = 2.233

Similarly, from Eq. (10.53),

θ̂_{1|0} = D θ̂_{0|0}   (11.11)

and since D = 1, θ̂_{1|0} = θ̂_{0|0}.

The covariance Σ_{k+1|k} is obtained by using Eq. (10.54)

Σ_{1|0} = [Σ^xx  Σ^xθ ; Σ^θx  Σ^θθ]_{1|0}   (11.12)

where, from Eq. (10.55),

Σ^xx_{1|0} = A Σ^xx_{0|0} A' + A Σ^xθ_{0|0}(f^x_{θ0})' + f^x_{θ0} Σ^θx_{0|0} A' + f^x_{θ0} Σ^θθ_{0|0}(f^x_{θ0})' + Q_0 + tr[·]   (11.13)

with, from Eq. (10.58),

f^x_{θ0} = x̂_{0|0} a_θ + u^r_0 b_θ + c_θ   (11.14)

Since A and c are not functions of θ, a_θ and c_θ equal zero. However, b is a function of θ and b_θ = 1, so

f^x_{θ0} = 0 + (2.534)(1) + 0 = 2.534

Therefore, Eq. (11.13) becomes (the trace terms vanish because a_θ = 0)

Σ^xx_{1|0} = (.7)(0)(.7) + (.7)(0)(2.534) + (2.534)(0)(.7) + (2.534)(.5)(2.534) + .2 + 0 = 3.410   (11.15)

Next use Eq. (10.56) to obtain

Σ^θx_{1|0} = D Σ^θx_{0|0} A' + D Σ^θθ_{0|0}(f^x_{θ0})' = (1)(0)(.7) + (1)(.5)(2.534) = 1.267   (11.16)

Finally, use Eq.
(10.57) to obtain

Σ^θθ_{1|0} = D Σ^θθ_{0|0} D' + Γ_0 = (1)(.5)(1) + 0 = .5   (11.17)

In summary,

x̂_{1|0} = 2.233    θ̂_{1|0} = -.5    Σ^xx_{1|0} = 3.410    Σ^θx_{1|0} = 1.267    Σ^θθ_{1|0} = .5

Step 2. Use x̂_{1|0} as the initial state and solve the certainty-equivalence problem for periods 1 to 2 by computing (x_{0,j}) and (u_{0,j}) using Eqs. (10.30) and (10.28). From Eq. (10.30),

u_{0,1} = -μ_1[B'(K_2 A x̂_{1|0} + K_2 c + p_2) - Λ_1 ũ_1]

with μ_1 = [Λ_1 + B'K_2B]^{-1} = [1 + (-.5)(1.0)(-.5)]^{-1} = .8   (11.18)

u_{0,1} = -(.8){(-.5)[(1)(.7)(2.233) + (1)(3.5) + 0] - (1)(0)} = 2.025

and, from Eq. (10.28),

x_{0,2} = A x̂_{1|0} + B(θ_{0,1})u_{0,1} + c = (.7)(2.233) + (-.5)(2.025) + 3.5 = 4.050   (11.19)

Therefore, the nominal path is

k    x_{0,k}    u_{0,k}
1    2.233      2.025
2    4.050

Step 3. Compute K^θx_j and K^θθ_j for j = 1, 2 by using the backward recursions (10.40) and (10.42). Recall that K^xx_j = K_j from Eq. (10.39); therefore it is not necessary to evaluate it since K_j was computed above. First compute K^θx using Eq. (10.40)

K^θx_2 = 0   (11.20)

K^θx_1 = [(f^x_{θ1})'K^xx_2 + D'K^θx_2]A - [((f^x_{θ1})'K^xx_2 + D'K^θx_2)B + (p^x_2 b_θ)']μ_1[B'K^xx_2 A] + p^x_2 a_θ   (11.21)

where, from Eq. (10.43),

p^x_2 = K_2 x_{0,2} + p_2 = (1)(4.050) + 0 = 4.050   (11.22)

and, from Eq. (10.41),

f^x_{θ1} = x_{0,1} a_θ + u_{0,1} b_θ + c_θ = (2.233)(0) + (2.025)(1) + 0 = 2.025   (11.23)

and, from Eq. (10.32),

μ_1 = [Λ_1 + B'K_2B]^{-1} = [1 + (-.5)(1.0)(-.5)]^{-1} = (1.25)^{-1} = .8

Then Eq. (11.21) can be solved as

K^θx_1 = [(2.025)(1) + (1)(0)](.7) - {[(2.025)(1) + (1)(0)](-.5) + (4.050)(1)}(.8)(-.5)(1)(.7) + (4.050)(0) = 2.268   (11.24)

Next calculate K^θθ_1 from Eq.
(10.42) as

K^θθ_2 = 0   (11.25)

K^θθ_1 = (f^x_{θ1})'[K^xx_2 f^x_{θ1} + K^xθ_2 D] + D'[K^θx_2 f^x_{θ1} + K^θθ_2 D] - [((f^x_{θ1})'K^xx_2 + D'K^θx_2)B + p^x_2 b_θ]μ_1[B'(K^xx_2 f^x_{θ1} + K^xθ_2 D) + p^x_2 b_θ]
      = (2.025)[(1)(2.025) + (0)(1)] + (1)[(0)(2.025) + (0)(1)] - {[(2.025)(1) + (1)(0)](-.5) + (4.050)(1)}(.8){(-.5)[(1)(2.025) + (0)(1)] + (4.050)(1)} = -3.282

In summary, the Riccati matrices for the augmented problem are

k    K^xx     K^θx     K^θθ      p^x
1    1.392    2.268    -3.282
2    1.000    0        0         4.050

In order to show the breakdown of the cost-to-go into deterministic, cautionary, and probing components, steps 4 through 6 from Sec. 10.4 will be used.

Step 4. Calculate the deterministic cost for period k and period N by using the first through third terms on the right-hand side of Eq. (10.49). Calling the sum of these terms J^{k,N}_{D,N-k}, one obtains (F = 0 here)

J^{k,N}_{D,N-k} = [x_k - x̃_k]'F_k[u_k - ũ_k] + ½[u_k - ũ_k]'Λ_k[u_k - ũ_k] + ½[x_{0,N} - x̃_N]'W_N[x_{0,N} - x̃_N]

J^{0,2}_{D,2} = ½(2.534 - 0)(1)(2.534 - 0) + ½(4.050 - 0)(1)(4.050 - 0) = 11.412   (11.26)

Step 5. Calculate the cautionary cost for period k+1 by using the first three terms in Eq. (10.50)

J_{C,N-k} = ½ tr(K^xx_{k+1} Σ^xx_{k+1|k}) + tr(K^θx_{k+1} Σ^xθ_{k+1|k}) + ½ tr(K^θθ_{k+1} Σ^θθ_{k+1|k})   (11.27)

J^{k+1}_{C,2} = ½[(1.392)(3.410)] + (2.268)(1.267) + ½[(-3.282)(.5)] = 4.426

Step 6. Repeat steps 6a through 6d for time periods j = k+1 through j = N-1, that is, from j = 1 through j = 1.

Step 6a. Calculate the future deterministic cost for period j by evaluating the fourth through sixth terms in Eq. (10.49) (using F = 0)

J^j_{D,N-k} = ½ Σ_j ([x_{0,j} - x̃_j]'W_j[x_{0,j} - x̃_j] + [u_{0,j} - ũ_j]'Λ_j[u_{0,j} - ũ_j])

J^1_{D,2} = ½[(2.233)(1)(2.233) + (2.025)(1)(2.025)] = 4.543   (11.28)

Step 6b. Calculate the future cautionary cost for period j by evaluating the fourth and fifth terms in Eq. (10.50)

J^j_{C,N-k} = ½ Σ_j [tr(K^xx_{j+1} Q_j) + tr(K^θθ_{j+1} Γ_j)]

J^1_{C,2} = ½[tr(1.0)(.2) + tr(0)(0)] = ½(.2) + 0 = .100   (11.29)

Step 6c. Calculate the future probing cost for period j by evaluating the right-hand side of Eq. (10.51).
The K_{j+1} matrices are available from step 3, but the Σ_{j|j} matrices must be computed. From Eq. (10.51)

J_{P,k} = ½ tr([A'K^xx B]μ_1[B'K^xx A]Σ^xx_{j|j}) + tr([B'K^xx A]'μ_1(B'[K^xx f^x_{θ1} + K^xθ D] + p^x b_θ)Σ^θx_{j|j}) + ½ tr([[D'K^θx + (f^x_{θ1})'K^xx]B + p^x b_θ]μ_1[B'[K^xx f^x_{θ1} + K^xθ D] + p^x b_θ]Σ^θθ_{j|j})   (11.30)

All the matrices in Eq. (11.30) except the Σ_{j|j} terms have been computed before. To obtain the Σ_{j|j}'s use Eqs. (10.55) to (10.57) to get Σ_{j|j-1} from Σ_{j-1|j-1} and Eqs. (K.17) to (K.19) to get Σ_{j|j} from Σ_{j|j-1}. We need Σ_{1|1} in order to evaluate Eq. (11.30). This can be obtained by using Eqs. (10.55) to (10.57) to obtain Σ_{1|0} from Σ_{0|0}, but this has already been accomplished in step 1, with the result

Σ^xx_{1|0} = 3.410    Σ^θx_{1|0} = 1.267    Σ^θθ_{1|0} = .5

Then, to obtain Σ^xx_{1|1} use Eq. (10.59)

Σ^xx_{1|1} = [I - Σ^xx_{1|0} H'_1 S^{-1}_1 H_1]Σ^xx_{1|0}   (11.31)

where, from Eq. (10.62),

S_1 = H_1 Σ^xx_{1|0} H'_1 + R_1 = (1)(3.410)(1) + 0 = 3.410

Therefore, S^{-1}_1 = .293 and thus

Σ^xx_{1|1} = [1 - (3.410)(1)(.293)(1)](3.410) = 0

In this situation with no measurement error and H = 1, the covariance of the state variable reduces to zero after the new measurement is taken. Next, use Eq. (10.60) to obtain

Σ^θx_{1|1} = Σ^θx_{1|0}[I - H'_1 S^{-1}_1 H_1 Σ^xx_{1|0}] = 1.267[1 - (1)(.293)(1)(3.410)] = 0   (11.32)

with the result that Σ^θx_{1|1} also is zero after the measurement. Finally, use Eq. (10.61) to obtain

Σ^θθ_{1|1} = Σ^θθ_{1|0} - Σ^θx_{1|0} H'_1 S^{-1}_1 H_1 Σ^xθ_{1|0} = .5 - (1.267)(1)(.293)(1)(1.267) = .03   (11.33)

Also, with perfect measurement and a single unknown parameter there is a substantial reduction in the uncertainty associated with the parameter θ. In this case the variance of b (the only element in θ) is reduced from .5 to .03 by a single measurement. In summary, from the initial data, from the initialization, and this step:

j|k    Σ^xx     Σ^θx     Σ^θθ
0|0    0        0        .500
1|0    3.410    1.267    .500
1|1    0        0        .030

Now Eq. (11.30) can be evaluated.
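As a cross-check, the hand calculations up to this point (the Riccati values, the certainty-equivalence control, and the one-step covariances) can be reproduced with a short script. The recursions are transcribed for the scalar case and are a sketch, not the book's code.

```python
# Scalar MacRae problem of Sec. 11.2: a = .7, b = -.5, c = 3.5, with
# desired paths zero, so the u-tilde and x-tilde terms drop out below.
a, b, c = 0.7, -0.5, 3.5
W2 = W1 = Lam = 1.0

# Riccati recursion, Eqs. (10.33)-(10.34), terminal K2 = W2, p2 = 0
K2, p2 = W2, 0.0
mu1 = 1.0 / (Lam + b * K2 * b)                               # Eq. (10.32): .8
K1 = a * (1.0 - K2 * b * mu1 * b) * K2 * a + W1              # 1.392
p1 = -a * K2 * b * mu1 * (b * (K2 * c + p2)) + a * (K2 * c + p2)  # 1.96

# first-period certainty-equivalence control, Eq. (10.30), with x0 = 0
mu0 = 1.0 / (Lam + b * K1 * b)
u0 = -mu0 * (b * (K1 * a * 0.0 + K1 * c + p1))               # 2.534

# one-step projections, Eqs. (10.52) and (10.55)-(10.57): var(b) = .5, q = .2
sb2, qe = 0.5, 0.2
x10 = a * 0.0 + b * u0 + c        # predicted state, 2.233
Sxx = u0 * sb2 * u0 + qe          # predicted state variance, 3.410
Sbx = sb2 * u0                    # predicted cross covariance, 1.267
```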
Since both Σ^xx_{1|1} and Σ^θx_{1|1} are equal to zero, it can be simplified substantially to

J_{P,N} = ½ tr([[D'K^θx_2 + (f^x_{θ1})'K^xx_2]B + p^x_2 b_θ]μ_1[B'[K^xx_2 f^x_{θ1} + K^xθ_2 D] + p^x_2 b_θ]Σ^θθ_{1|1})   (11.34)

Also, f^x_{θ1}, μ_1, and p^x_2 can be obtained from Eqs. (11.23) and (11.22), respectively, giving

J_{P,N} = ½([[(1)(0) + (2.025)(1)](-.5) + (4.050)(1)](.8){(-.5)[(1)(2.025) + (0)(1)] + (4.050)(1)}(.03)) = .110

In summary, the deterministic component can be obtained from steps 4 and 6a as

J_{D,N} = J^{k,N}_{D,N} + J^j_{D,N} = 11.412 + 4.543 = 15.955   (11.35)

The cautionary component is obtained from steps 5 and 6b as

J_{C,N} = J^{k+1}_{C,N} + J^j_{C,N} = 4.426 + .1 = 4.526   (11.36)

Finally, the probing component is obtained from Eq. (11.34) in step 6c as

J_{P,N} = .110   (11.37)

Step 6d. The total cost-to-go conditional on u^r_k is then obtained by summing the three components, as in Eq. (10.45)

J*_{d,N-k}(u^r_k) = J_{D,N-k} + J_{C,N-k} + J_{P,N-k}   (11.38)

or, for u = 2.534 at time k = 0,

J*_{d,N}(2.534) = J_{D,N} + J_{C,N} + J_{P,N} = 15.955 + 4.526 + .110 = 20.591

This completes the evaluation of the approximate cost-to-go for a single value of the control, namely u_0 = 2.534. As the search proceeds, the cost-to-go function is evaluated at other values of the control.

11.4 The Search

The search is then carried out to find that value of the control u*_0 which minimizes J*_d.[1] Table 11.1 and Fig. 11.1 give the results of the evaluation of the deterministic, cautionary, and probing cost as well as the total cost for a number of values of the control u_0.[2]
Table 11.1: Evaluation of cost-to-go and its components for the MacRae problem

Control u_0    Deterministic J_D,N    Cautionary J_C,N    Probing J_P,N    Total J_d,N
1.17           17.201                 1.197               .496             18.894
1.28           17.005                 1.434               .423             18.863
1.32           16.935                 1.525               .400             18.860
1.37           16.869                 1.616               .378             18.863
1.56           16.588                 2.056               .294             18.938
2.53           15.957                 4.527               .108             20.593

Figure 11.1: Total cost-to-go and components of the two-period MacRae problem.

In Fig. 11.1 the deterministic cost component is relatively large and has the expected quadratic shape. The cautionary cost component rises with increases in the control value; i.e., caution results in a smaller control value than the deterministic component alone would imply. Finally, the probing cost component falls with increases in the control value. Thus, caution and probing work in opposite directions; however, the probing term is smaller and has a smaller slope. By way of contrast, and in order to emphasize that the function J_d(u_k) may have local minima, Fig. 11.2 provides a plot similar to Fig. 11.1 but for a slightly different problem. This problem is the same as the previous MacRae problem with two exceptions: (1) all three of the parameters a, b, and c are treated as unknown rather than only b (the initial variances of all three parameters are set at .5), and (2) the model is solved for 10 time periods instead of 2 (the penalty ratio of q to r is kept at 1:1 for all time periods). As Fig. 11.2 shows, the probing cost component is nonconvex, and this produces two local optima in the total cost-to-go. This situation was discovered by accident. The author and Fred Norman were using this problem to debug their separately programmed codes. Both obtained the local optimum around 5 and concluded that the codes were debugged.
The author subsequently modified his code, solved the problem again, and found the local optimum near 10. At first it seemed that there was an error in the modified code, but subsequent analysis revealed the nonconvex shape of the cost-to-go function.

[1] For a discussion of this, see Kendrick (1978).
[2] See also Bar-Shalom and Tse (1976a, p. 331).

Figure 11.2: Total cost-to-go and components for the 10-period MacRae problem.

Chapter 12

Example: A Macroeconomic Model with Measurement Error

12.1 Introduction

In Chap. 4 a small quarterly macroeconomic model of the United States economy was used as an example of deterministic control. Here that model is converted into a stochastic control model with measurement error and solved with the active-learning algorithm of Chap. 10. Four sources of uncertainty are added to the model of Chap. 4:

1. An additive error (or noise) term in the system equations
2. An error term in the measurement equations
3. Uncertainty about initial conditions
4. Uncertainty about parameter values in the system equations

Of these four sources of uncertainty, the first is the most widely considered in economic models. It was discussed as additive uncertainty in Chap. 5. The fourth type of uncertainty, i.e., uncertainty about the parameter values, was discussed under multiplicative uncertainty in Chap. 6; however, there the control was not chosen in an active-learning manner. Uncertainty of types 2 and 3 is much less widely used in economic models. There is a substantial literature in econometrics on measurement errors [see Geraci (1976)], but this has not previously been systematically incorporated into macroeconometric models to show the effect of measurement error on policy
Since different economic time series are of greatly varying accuracy, the use of measurement-error information provides a systematic way to take account of this fact while choosing policy levels. For example, the uncertainty associated with inventory investment data is much greater than that associated with aggregate consumption data; so one would like to discount somewhat inventory investment data relative to consumption data when making decisions about fiscal and monetary policy. The procedures outlined in Chaps. 9 and 10 provide a way to do this. Also, once one introduces measurement error, it becomes apparent that the initial conditions of the model can no longer be treated as though they were known with certainty. Instead one must take account of the fact that policy makers do not know the present state of the economy exactly. However, economists frequently have information about the degree of uncertainty attached to each element in a state vector describing the economy. It is this information which is exhibited in the application discussed in this chapter. 12.2 The Model and Data Recall from Eq. (4.25) that the model can be written as xfe+i = Axt + But + c where (12.1) Xfc with Ck h consumption investment u*: = Ok = [obligation] (12.2) and 1.014 .002 .093 .753 x0 .004 .100 460.1 113.1 -1.312 .448 (12.3) (12.4) CHAPTER 12. EXAMPLE: MODEL WITH MEASUREMENT ERROR 136 Also from Eq. (4.26) the criterion function is J = |[xat - XatJ'Wa^Xjv -x,v] -r N-l + 9 J2 (txfc - *fc]'wfc[xfc - Xfc] + [ufc - ük]'Ak[uk - üfc]) (12.5) where x = desired state vector, ü = desired control vector, W = weights on state deviations, A = weights on control deviations. The paths x and ü were chosen by assuming desired growth rates of .75 percent per quarter. 
The base for these desired paths is the actual data for 1969-I:

x̃_0 = [460.1 ; 113.1]    ũ_0 = [153.644]   (12.6)

The weighting matrices were chosen to give greater weight to state deviations in the terminal year than in other years in order to model the fact that political leaders care much more about the state of the economy in quarters just before elections than in other quarters. Therefore, these matrices were set as

W_N = diag[100, 100]    W_k = diag[1, 1]    Λ_k = [1]    k = 0, 1, ..., N-1   (12.7)

The stochastic version of the model is obtained by minimizing the expected value of Eq. (12.5) subject to the system equations

x_{k+1} = A x_k + B u_k + c + v_k    x_0 ~ N[x̂_{0|0}, Σ^xx_{0|0}]   (12.8)

and the measurement relations

y_k = H x_k + w_k    w_k ~ N[0, R]   (12.9)

where v = system-equation noises, w = measurement errors, H = measurement matrix, y = measurement vector. It is assumed here that the initial x is known imperfectly and that its estimate is normally distributed with mean x̂_{0|0} and covariance Σ^xx_{0|0}. The system-equation noise v_k and the measurement noise w_k are both assumed to be normally distributed and serially uncorrelated with means zero and covariances Q and R, respectively. Although it is not true that the error terms are uncorrelated, that assumption has been used here for the sake of simplicity. The diagonal elements of the covariance of the system-noise terms Q are the squares of the standard errors of the reduced-form equations:

Q = diag[9.61, 18.92]   (12.10)

The measurement-error covariance R was estimated from the revisions data by the procedure outlined in Appendix R. The resulting matrix is

R = [2.71  1.12 ; 1.12  2.78]   (12.11)

Note that the variance of the measurement error for consumption is low relative to its value in x_0 (2.71 on 460.1 billion) while that of investment is relatively high (2.78 on 113.1 billion).
The algorithm described here takes account of this fact and relies less heavily on the observed value of investment y_2 than on the observed value of consumption y_1 in updating estimates of both states and parameters, and therefore in determining the control to be used in subsequent periods. In the results reported here a single case of parameter uncertainty has been considered. In this case all eight of the coefficients in A, B, and c were learned.[1] As in Chap. 10, a parameter vector θ consisting of the uncertain parameters is created and added to the initial state vector x to create a new state vector z. The state equations for the augmented model are

x_{k+1} = A(θ_k)x_k + B(θ_k)u_k + c(θ_k) + v_k   (12.12)

θ_{k+1} = D θ_k + η_k   (12.13)

where D is assumed to be an identity matrix and η is assumed to have both mean and covariance equal to zero (to model the case of constant but unknown parameters).

[1] In contrast, in Kendrick (1979) only 5 of the 15 parameters were learned. The other 10 were assumed to be known perfectly.

Part of the system equations, namely Eq. (12.12), is now nonlinear in the new state vector

z_k = [x_k ; θ_k]   (12.14)

Also, the covariance of the state vector at time k as estimated with data obtained through period k is now defined as

Σ_{k|k} = [Σ^xx  Σ^xθ ; Σ^θx  Σ^θθ]_{k|k}   (12.15)

With this notation the initial conditions for the augmented state equations (12.12) and (12.13) are

ẑ_{0|0} = [x̂_{0|0} ; θ̂_{0|0}]    with x̂_{0|0} = [460.1 ; 113.1] and θ̂_{0|0} the stacked coefficients of Eq. (12.3)   (12.16)

Σ^xx_{0|0} = [2.71  1.12 ; 1.12  2.78]   (12.17)

Σ^θx_{0|0} = (Σ^xθ_{0|0})' = 0   (12.18)

and Σ^θθ_{0|0} is block-diagonal, with zero covariance across the two equations. The covariance of the first equation's parameters is

[ .2690E-03   .5469E-03   .3743E-03  -.5619E-02
  .5469E-03   .2297E-02   .1590E-03  -.1992E-01
  .3743E-03   .1590E-03   .9675E-03   .1039E-01
 -.5619E-02  -.1992E-01   .1039E-01   .2316E+01 ]
and its second block is

    [ .5440E-03   .1106E-02   .7568E-03  -.1136E-01]
    [ .1106E-02   .4644E-02   .3215E-03  -.4028E-01]
    [ .7568E-03   .3215E-03   .1956E-02   .2102E-01]
    [-.1136E-01  -.4028E-01   .2102E-01   .4684E+01]     (12.19)

with zeros in the two off-diagonal 4 × 4 blocks.

The prior mean of x is set to x_0, and the prior mean for θ is set to the estimated reduced-form parameter estimates. The state covariance (12.17) is set equal to the measurement-error covariance. The covariance (12.18) was set to zero.² The covariance (12.19) was estimated with the Time Series Processor (TSP) econometric package.

12.3 Adaptive Versus Certainty-Equivalence Policies

When measurement errors are considered, will adaptive-control methods yield substantially better results than certainty-equivalence and open-loop-feedback methods? Posed another way, the question is whether it is worthwhile to carry out the elaborate calculations required to consider the possibility of learning parameters and the gains which accrue from this learning. Results presented later in this section provide some evidence that it is not worthwhile; however, these results are based on assumptions that many economists, including the author, find unrealistic.

Before presenting the results, a word of caution about numerical results obtained from complicated calculations is in order. As is apparent from Chap. 10, the computer codes from which these results have been obtained are rather complex, since they include both estimation and optimization procedures embedded in a Monte Carlo routine. Independently coded programs have therefore been used to check results. Fred Norman, Yaakov Bar-Shalom, and the author have independently coded various versions and parts of the adaptive algorithms. The most complicated part of the code is in the evaluation of the cost-to-go.
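As a small illustration of the augmented-state bookkeeping such codes must handle, the state vector z of Eq. (12.14) and the dynamics of Eqs. (12.12) and (12.13) can be sketched as follows; the ordering of the eight coefficients within θ (consumption-equation parameters first) is an assumption for illustration.

```python
import numpy as np

def unpack(theta):
    """Split the 8-vector theta into A (2x2), B (2,), and c (2,).
    The ordering (a11, a12, b1, c1, a21, a22, b2, c2) is an assumption."""
    A = np.array([[theta[0], theta[1]],
                  [theta[4], theta[5]]])
    B = np.array([theta[2], theta[6]])
    c = np.array([theta[3], theta[7]])
    return A, B, c

def augmented_step(z, u):
    """Eqs. (12.12)-(12.13): x evolves with coefficients drawn from theta;
    theta follows a random walk with D = I and zero-variance noise, i.e.
    constant but unknown parameters."""
    x, theta = z[:2], z[2:]
    A, B, c = unpack(theta)
    x_next = A @ x + B * u + c
    return np.concatenate([x_next, theta])

# Initial augmented mean from Eq. (12.16)
z0 = np.array([460.1, 113.1,                   # x_{0|0}
               1.014, 0.002, -0.004, -1.312,   # first-equation parameters
               0.093, 0.753, -0.100, 0.448])   # second-equation parameters
```

Because A(θ)x multiplies components of θ by components of x, the augmented system is nonlinear in z even though the original model is linear in x, which is why an extended (second-order) Kalman filter rather than an ordinary one is required.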
Norman (using his program), Kent Wall (using Bar-Shalom's program), and the author have been able to duplicate each other's results on a number of other problems but have not fully checked the present problem. Therefore, the results presented here must be checked against one's intuition until complete numerical checking can be accomplished. It is in this spirit that the results are presented.

²This covariance could be estimated by applying a Kalman filter to the same data that were used for estimating the reduced form of the model. Some sensitivity tests on an earlier model indicated that the results were affected substantially by the choice of Σ^{θx}_{0|0}.

Table 12.1 shows the results from the 34 Monte Carlo runs completed. For each run random values of the system noises v_k, the measurement noises w_k, the initial state estimate x_{0|0}, and the initial parameter estimate θ_{0|0} are generated using the means and covariances described above. The evidence suggests that the sequential certainty-equivalence procedure of Appendix O is inferior to both the open-loop-feedback method (OLF) of Chap. 6 and the adaptive-control (dual) method of Chap. 10. Of the two stochastic methods the OLF was superior in 18 and the dual method in 12 of the 34 runs. As more data are obtained, it will be useful to see whether there is a statistically significant difference among the three methods.

If the OLF results continue to appear better than the dual results, it would be possible to use the computationally simple OLF procedure rather than the computationally complex dual procedure in performing stochastic control on macroeconomic models. Of course this tendency may not continue as larger models are used for experimentation. Also, these results are for a model in which the parameters are assumed to be constant over time.
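Each Monte Carlo run described above begins by drawing its random inputs from the stated distributions. A minimal sketch of that step follows; the Σ^{θθ} block is abbreviated here to a diagonal placeholder rather than the full matrix of Eq. (12.19).

```python
import numpy as np

rng = np.random.default_rng(42)
N = 8  # planning periods (quarters 0 through 7)

Q = np.diag([9.61, 18.92])                  # system-noise covariance (12.10)
R = np.array([[2.71, 1.12],
              [1.12, 2.78]])                # measurement-error covariance (12.11)
mean_x = np.array([460.1, 113.1])           # prior state mean (12.16)
cov_x = R.copy()                            # Eq. (12.17): prior state cov = R
mean_th = np.array([1.014, 0.002, -0.004, -1.312,
                    0.093, 0.753, -0.100, 0.448])
# Diagonal placeholder standing in for the full Sigma^{theta theta} of (12.19):
cov_th = np.diag([2.7e-4, 2.3e-3, 9.7e-4, 2.3,
                  5.4e-4, 4.6e-3, 2.0e-3, 4.7])

def draw_run_inputs():
    """Random inputs for one Monte Carlo run: noises and initial estimates."""
    v = rng.multivariate_normal(np.zeros(2), Q, size=N)   # system noises v_k
    w = rng.multivariate_normal(np.zeros(2), R, size=N)   # measurement noises w_k
    x0 = rng.multivariate_normal(mean_x, cov_x)           # initial state estimate
    th0 = rng.multivariate_normal(mean_th, cov_th)        # initial parameter estimate
    return v, w, x0, th0
```

All three control methods in a given run are fed the same draws, so differences in criterion values reflect the methods rather than the noise realizations.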
If, alternatively, it had been assumed that some or all of the parameters were time-varying (a realistic assumption for some parameters), the ranking of the three methods might be different. Under the assumption of time-varying parameters the initial covariance matrix for the parameters Σ^{θθ}_{0|0} would probably have larger elements, representing the fact that the parameters would be known with less certainty. Then there would be more to learn, and the dual method might be superior to the OLF method. However, though more could be learned, the information obtained would be less valuable, since its worth would decay over time with the time-varying paths of the parameters.

12.4 Results from a Single Monte Carlo Run

In order to provide more insight into the types of results obtained from stochastic control models, the results of one of the Monte Carlo runs (run 4) are presented in detail in the following pages. This run is representative in the sense that the OLF solution was the least costly (23.695), the dual was next (23.717), and the certainty-equivalence solution was the worst (23.914). Also the results make clear
that the model used has some characteristics which detract from its usefulness for testing the relative performance of different control-theory methods on economic models. The input data which are specific to Monte Carlo run 4 and the numerical results for that particular run are included in Appendix T. The primary results are displayed graphically in the remainder of this section.

Table 12.1: Comparison of criterion values (thousands)†

    Monte Carlo run   Order    Certainty equivalence C   Open-loop feedback O   Dual D
     1                O,D,C           22.450                   22.240           22.320
     2                O,C,D           28.710                   28.610           28.730
     3                C,D,O           31.850                   32.000           31.870
     4                O,D,C           23.941                   23.695           23.717
     5                D,O,C           23.020                   22.917           22.909
     6                O,D,C           29.229                   28.787           28.867
     7                C,D,O           21.597                   21.759           21.637
     8                O,D,C           19.219                   19.139           19.213
     9                D,O,C           25.392                   25.324           25.278
    10                O,D,C           27.907                   27.418           27.504
    11                D,O,C           25.975                   22.242           22.115
    12                O,D,C           26.402                   25.818           25.975
    13                D,O,C           21.615                   21.438           21.298
    14                O,C,D           27.810                   27.705           27.853
    15                D,O,C           15.893                   15.701           15.563
    16                D,O,C           23.078                   22.862           22.811
    17                O,D,C           23.545                   23.084           23.107
    18                C,O,D           26.899                   26.934           27.013
    19                O,D,C           22.366                   21.820           22.092
    20                O,D,C           18.360                   18.283           18.349
    21                D,O,C           22.069                   21.745           21.512
    22                D,O,C           20.177                   19.914           19.904
    23                O,C,D           17.938                   17.879           17.976
    24                O,D,C           36.133                   35.678           36.057
    25                O,C,D           24.143                   24.128           24.182
    26                C,D,O           27.871                   27.943           27.911
    27                O,D,C           21.019                   20.678           20.905
    28                O,D,C           26.181                   26.080           26.082
    29                O,D,C           27.344                   27.123           27.276
    30                D,C,O           18.488                   18.498           18.458
    31                D,C,O           29.107                   29.156           28.977
    32                D,O,C           26.894                   26.581           26.496
    33                O,D,C           16.811                   16.312           16.523
    34                D,O,C           28.878                   28.708           28.678

†In these runs, the number of times each method had the lowest cost was OLF = 18, Dual = 12, CE = 4.

Figure 12.1: Consumption (paths versus time in quarters).
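The Order column of Table 12.1 can be recomputed directly from the criterion values. A sketch using the first five rows (values reproduced from the table):

```python
# Criterion values (thousands) for runs 1-5 of Table 12.1:
# certainty equivalence (C), open-loop feedback (O), dual (D).
runs = {
    1: {"C": 22.450, "O": 22.240, "D": 22.320},
    2: {"C": 28.710, "O": 28.610, "D": 28.730},
    3: {"C": 31.850, "O": 32.000, "D": 31.870},
    4: {"C": 23.941, "O": 23.695, "D": 23.717},
    5: {"C": 23.020, "O": 22.917, "D": 22.909},
}

def order(costs):
    """Rank the three methods from lowest to highest cost, e.g. 'O,D,C'."""
    return ",".join(sorted(costs, key=costs.get))

orders = {run: order(costs) for run, costs in runs.items()}
# Run 4, examined in detail in Sec. 12.4, comes out O,D,C:
# OLF cheapest, dual next, certainty equivalence worst.
```

Tallying first places over all 34 rows in the same way reproduces the counts in the table footnote (OLF = 18, Dual = 12, CE = 4).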
12.4.1 Time Paths of Variables and of Parameter Estimates

Figures 12.1 and 12.2 show the time paths of the two state variables, consumption and investment, under each of the three control schemes, together with the desired path for each state variable. Figure 12.1 tells very little about the results but illustrates one of the undesirable properties of this model: the consumption path is explosive, and differences in controls have very little impact on it. These results come from the fact that the coefficient a_11 is 1.014 and the coefficient b_1 is -.004. Thus consumption grows almost independently of changes in government expenditures.

Figure 12.2: Investment (paths under the alternative control schemes versus time in quarters).

Figure 12.2 displays the investment paths under the alternative control schemes and is considerably more interesting. It illustrates the difficulty of maintaining an economy on a steady path in the face of the various kinds of uncertainty which confront economic policy makers: (1) there is the additive uncertainty in the equation, representing the impact of unpredictable changes which affect the level of investment additively; (2) the policy maker has an estimate of how the economy will respond to a policy measure but does not know what the actual response will be; and (3) the policy maker does not know the true state of the economy at the moment, because the statistics which report that state are affected by measurement errors.

Next compare the sequential certainty-equivalence path (CE) and the dual-control path (dual) in Fig. 12.2. Qualitatively, one would expect the dual-control path to deviate farther from the desired path than the certainty-equivalence path in the early time periods but to be closer to the desired path in the terminal period (just before the election).
This is what occurs in this particular Monte Carlo run. In the first time period desired investment is roughly 114.0 billion, the CE investment level is about 114.3, and the dual level is roughly 115.4; so the CE path deviates from the desired path by .3 billion while the dual path deviates by 1.4 billion. In contrast, in the last time period (period 7), supposedly the quarter just before the next presidential election, the CE path deviates from the desired path by roughly .5 billion while the dual path deviates by less than .1 billion. It should be emphasized that this kind of pattern is not observed in all the Monte Carlo runs but is illustrative of the kind of result one expects when comparing certainty-equivalence results with dual-control results.

Next compare the OLF path with the adaptive-control (dual) path. The OLF path is neither as far off the desired path in the first period nor as close to the desired path in the last period as the adaptive-control path. However, when all the costs are considered, including both the state and control costs, the OLF path has a slightly lower cost on average than the dual path.

If Fig. 12.2 seems to confirm one's preconceptions about adaptive-control results, Fig. 12.3 shows that matters are not so simple. This figure shows the desired, CE, OLF, and dual paths for the control variable, government obligations. The simplest preconceptions about the control path in the first time periods of stochastic control problems are (1) that solutions like OLF which consider uncertainty will be more "cautious," i.e., have smaller control values, than those like CE which do not consider uncertainty, and (2) that solutions like dual which consider learning as well as uncertainty will do more "probing," i.e., have control
values farther from the desired path than solutions like OLF which consider the uncertainty but do not consider learning.

Figure 12.3: Government obligations (desired, CE, OLF, and dual paths versus time in quarters, billions of 1958 dollars).

One of these propositions is borne out by this particular Monte Carlo run, but the other is not. The OLF path is indeed more cautious than the CE path in the first time period, but the dual path does not exhibit more probing than the OLF solution in the first few time periods. As work progresses in this field, it will be interesting to observe what classes of models will, on average over many Monte Carlo runs, exhibit both the caution and probing characteristics.³

Figures 12.4 to 12.11 show the paths of the eight parameter estimates in the vector θ over the eight time periods under each of the control methods. Figure 12.4 gives this information for the parameter a_11. The true value of the parameter is 1.014. The initial estimate of the parameter is 1.030, the same for all three methods; this initial estimate is generated by a Monte Carlo routine which uses the covariance of the parameter estimates. In glancing at all eight of the parameter-estimate figures (12.4 to 12.11) one observes that for all three methods the estimates change substantially in the early periods and much less in the later periods. This is due to the fact that as more data are collected, the state and parameter-estimate covariances become smaller and the extended Kalman filter tends to assign lower weights to new observations in updating the parameter estimates. One can also observe that some of the parameter estimates actually diverge from, rather than converge to, the true values.
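The differential weighting of new observations described above comes out of the Kalman gain. A minimal sketch with H = I, the R of Eq. (12.11), and a hypothetical prior state covariance M (not a value from the text):

```python
import numpy as np

R = np.array([[2.71, 1.12],
              [1.12, 2.78]])       # measurement-error covariance, Eq. (12.11)
H = np.eye(2)                      # both states observed directly
M = np.diag([10.0, 10.0])          # hypothetical prior state covariance

# Kalman gain: K = M H' (H M H' + R)^{-1}
K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)

# Measurement update for an observation y given prior mean xbar:
#   xhat = xbar + K (y - H xbar)
# Because the investment observation is noisier relative to its prior
# (R[1,1] > R[0,0] here), the filter puts less weight on it than on the
# consumption observation, and as the prior covariance M shrinks over
# time the whole gain shrinks, so later observations move the estimates less.
```

Running the update recursively with a shrinking M reproduces the qualitative pattern in Figs. 12.4 to 12.11: large estimate revisions early, small ones late.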
While this is somewhat disturbing, it is worth remembering that the estimation done in the context of an optimal control algorithm does not treat all parameters equally. Some parameters are obviously more important than others when one considers the impact of uncertainty on the choice of control. For example, one of the most important parameters in this problem is b_2, the coefficient of the government-obligations control variable in the investment equation. This parameter is shown in Fig. 12.10. The estimates for this parameter converge toward the true value, and the estimate made in the adaptive-control (dual) solution is closer to the true value at the terminal time than either the CE or OLF estimates. However, in the problem at hand there is a heavy weight on deviations of the states from the desired path at the terminal time (period 7), so it may be more important to have a good estimate of b_2 in period 6 than in period 7. At period 6 the CE estimate is the closest, while the OLF and dual estimates are about equidistant from the true value.

³Preliminary results from dual-control experiments on the Abel (1975) model with 2 state variables, 2 control variables, and 10 unknown parameters exhibit both the probing and the caution characteristics [Kendrick (1980a)].

As in Chap. 10, the cost-to-go is divided into deterministic, cautionary, and probing components, the latter two of the form

    Cautionary:  J_{C,N-k} = f(Σ_{k+1|k}, Σ_{k+1|k+1}, (Q_j, K_j)_{j=k+1}^{N-1})     (12.21)

    Probing:  J_{P,N-k} = f((Σ_{j|j})_{j=k+1}^{N-1})     (12.22)

The detailed expressions are in Eqs. (10.49) to (10.51), and their derivation is given in Appendix Q. The reader may recall from the earlier discussion that the deterministic component contains only nonrandom terms. All stochastic terms are in either the cautionary or the probing components. Of these stochastic terms the cautionary component includes terms in Σ_{k+1|k}, which represent the uncertainty in the system between the time a control is chosen at time k and the time the next control is chosen at time k + 1.
In contrast, the probing component includes terms in (Σ_{j|j})_{j=k+1}^{N-1}, which is the uncertainty remaining in the system after measurements have been taken in each time period after the current time period k. In particular, this component includes the parameter covariance Σ^{θθ} for all future time periods. Since probing will serve to reduce the elements of this covariance, the component which includes the covariance is called the probing component.⁴

Figures 12.12 to 12.18 show for each period the total cost-to-go and its breakdown into deterministic, cautionary, and probing terms as a function of the

⁴See, for example, Dersin, Athans, and Kendrick (1979).

Figure 12.12: Period 0 (total, deterministic, cautionary, and probing cost-to-go versus government obligations, billions of 1958 dollars).
Figure 12.13: Period 1.
Figure 12.14: Period 2.
Figure 12.15: Period 3.
Figure 12.16: Period 4.
Figure 12.17: Period 5.
Figure 12.18: Period 6.

control u_k (government obligations).⁵ Consider first Fig. 12.12 for period 0. The deterministic component is the largest of the three, followed by the cautionary and the probing components. Also the deterministic component is a convex function, the cautionary component is roughly a linear function, and the probing component is concave. Since the cost-to-go is the sum of these three functions, it is not necessarily concave or convex, and the problem of local optima is a real possibility. Recall that local optima did indeed occur in one variant of the MacRae problem discussed in Chap. 11 (see, for example, Fig. 11.2). For this reason a grid-search method was used in finding the minimum cost-to-go. The widely spaced points on each component represent the 20 values of government obligations at which the functions were evaluated. The closely spaced points in turn represent a finer grid evaluation at nine points centered on the minimum from the coarse-grid search.

A quick glance at Figs. 12.12 through 12.18 reveals that there is not a serious problem with local optima in this model. Thus gradient methods probably could have been used, and in fact this might have improved the relative standing of the dual method in the Monte Carlo runs. However, at this stage of the research, caution is advised. If, after a variety of macroeconomic models have been solved with grid-search methods, local optima prove not to be a serious problem, gradient methods can be employed.
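The two-stage grid search described above (and in footnote 5) can be sketched as follows, with a hypothetical smooth stand-in for the cost-to-go:

```python
import numpy as np

def cost_to_go(u):
    """Hypothetical stand-in for the (possibly nonconvex) cost-to-go J(u)."""
    return (u - 170.0) ** 2

def grid_search(f, lo=100.0, hi=195.0, coarse=20, fine=9):
    """Coarse grid over [lo, hi], then a finer grid centered on the coarse
    minimum -- the two-stage procedure of footnote 5 in outline."""
    grid = np.linspace(lo, hi, coarse)
    u_best = grid[np.argmin([f(u) for u in grid])]
    spacing = (hi - lo) / (coarse - 1)
    fine_grid = np.linspace(u_best - spacing, u_best + spacing, fine)
    return fine_grid[np.argmin([f(u) for u in fine_grid])]
```

Because every candidate point is evaluated, the method is robust to local optima at the coarse-grid scale, at the price of many cost-to-go evaluations per period; a gradient method would need far fewer evaluations per period but can stall in a local optimum, which is the trade-off discussed in the text.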
This would be an important development because it would substantially reduce the cost of each Monte Carlo run, permitting wider experimentation.

Now consider the effect of each of the three components on the location of the minimum. The minimum of the deterministic cost component in Fig. 12.12 occurs at a government-obligations level of about 185 billion dollars. (The interested reader can find the numerical results in Appendix T, for period 0 in Table T.4.) In contrast, the minimum of the total cost occurs at roughly 170 billion dollars. Since the probing component is relatively flat, it is apparent that the positive slope of the cautionary term results in a decrease in the optimal level of the control from 185 to 170. Thus in this particular problem the cautionary term does indeed result in a more cautious policy. In contrast, the slope of the probing term near the optimum of 170 is small but negative, so the probing term has the effect of raising the optimal level above the deterministic optimum.

Thus in this problem for this time period the effect of the cautionary term is to result in a lower level of government expenditures, and the effect of the probing term is to cause a tendency toward higher levels of government expenditures. However, the cautionary term has a fairly large positive slope, and the probing term has only a small negative slope. This suggests that, relative to the Σ_{k+1|k} terms, the (Σ_{j|j})_{j=k+1}^{N-1} terms are not changed much by changes in government obligations. Another way to say this is that increases in government obligations have two effects.

⁵A grid-search method was used to obtain the points shown in these figures. First the functions were evaluated at 20 points between u_k = 100 and u_k = 195. Then the function was evaluated at 10 points around the minimum found in the first grid search.
One is to increase the uncertainty about the levels of consumption and investment which will be obtained in the next period (period k + 1). The other is to decrease the uncertainty about postmeasurement values of the states and parameters in all future time periods. For this problem in this (and every other) time period the two effects work in opposite directions, but the uncertainty in period k + 1 is the overriding effect.

It seems reasonable to conjecture that larger values of Σ^{θθ}_{0|0}, that is, of the initial covariance matrix of the parameters, would result in relatively greater effects from the probing terms; i.e., if there were greater initial uncertainty about the parameters, probing would be more worthwhile. With the assumption used in this model that the parameters are constant but unknown, the initial covariance of the parameters is sufficiently small that further learning from probing is not a high priority. If, on the other hand, it were assumed that the parameters were time-varying, the initial parameter-covariance elements would be larger and there would probably be more gain from active learning. However, the value of knowing the parameters better at any time is less when parameters are time-varying, since the parameters will change. Therefore under the assumption of time-varying parameters it seems likely, but not certain, that there will be more active probing. Against this one can ask whether economists really know the parameters of United States macroeconomic models as well as is represented by the covariance of coefficients estimated on 20 to 30 years of quarterly data under the assumption that the parameters are constant over that entire period. An assumption that at least some of the parameters are time-varying seems much more realistic.

This completes the discussion of the results for period 0. A comparison of the results across all of the time periods follows. In looking at Figs.
12.13 to 12.18, the first thing one observes is that the deterministic cost term increases relative to the other two components. This is an artifact of the particular problem at hand and probably not a general result. The reason for it can be seen in Fig. 12.1, which shows the divergence of the dual path from the desired path for consumption. This divergence is a result of the explosive path of consumption in this particular model and thus is not a result that is likely to recur when more suitable models are used.

Next one can observe that the cautionary component becomes smaller and has a smaller positive slope as one moves from period 0 to period 6. This results from the fact that uncertainty about both states and parameters is reduced as time passes. Also, the probing component becomes smaller with the passage of time and becomes zero in period 6 (Fig. 12.18), when only one period remains and there is therefore nothing to be gained from active learning. Thus with a relatively high ratio of terminal-period penalties W_N to other-period penalties W_k of 100:1 there is not much gain from active learning in this small model with constant but unknown parameters. It remains to be seen whether this result will hold with larger models, different assumptions about parameters, and different ratios of terminal to other-period weights.

In summary the results show that (1) in the relevant range the slope of the cautionary term is positive and the slope of the probing term is negative; (2) the probing term is smaller in magnitude and has a smaller absolute slope than the cautionary term; and (3) both the cautionary and the probing terms decrease with the passage of time.
12.5 Summary

The methodology of control theory embodies a variety of notions which make it a particularly attractive means of analyzing many economic problems: (1) the focus on dynamics and thus on the evolution of an economic system over time; (2) the orientation toward reaching certain targets or goals and/or improving the performance of an economic system; and (3) the treatment of uncertainty not only in additive-equation error terms but also in uncertain initial states, uncertain parameter estimates, and measurement errors.

Appendices

Appendix A: Second-Order Expansion of the System Equations

For simplicity consider first an n vector x, an m vector u, and a set of n functions f^i of the form

    x = f(u)     (A.1)

where

    x = [x_1, ..., x_n]'     u = [u_1, ..., u_m]'     f(u) = [f^1(u), ..., f^n(u)]'     (A.2)

Then the derivative of a single function f^i with respect to the vector u is the column vector¹

    f^i_u = [∂f^i/∂u_1, ..., ∂f^i/∂u_m]'     (A.3)

¹This differs from the usual procedure of treating the gradient vector ∇f of a function as a row vector. Here all vectors are treated as column vectors unless they are explicitly transposed.

Also, the derivative of the column of functions f with respect to the vector u is defined to be the matrix whose (i, j) element is ∂f^i/∂u_j:

    f_u = [∂f^i/∂u_j]     i = 1, ..., n     j = 1, ..., m     (A.4)

The second derivative of a single function f^i with respect to the vector u is defined to be the matrix of second partials

    f^i_uu = [∂²f^i/(∂u_j ∂u_l)]     j, l = 1, ..., m     (A.5)

Using the above notation, one can write the second-order Taylor expansion of the ith equation in (A.1) around u_t as

    x_i ≈ f^i(u_t) + (f^i_u)'[u - u_t] + ½[u - u_t]' f^i_uu [u - u_t]

APPENDIX D. SECOND-ORDER KALMAN FILTER

For a linear (or linearized) measurement relationship the joint density of z and x is

    p(z, x) = N[(Hx̄, x̄)', P]     (D.25)

where

    P = [P_zz  P_zx] = [HMH' + R   HM]
        [P_xz  P_xx]   [MH'        M ]     (D.26)

Expressions (D.21), (D.25), and (D.26) provide the mean and covariance of p(z) and p(z, x) for the special case of a linear (or linearized) measurement relationship. These relationships can now be used in Eqs.
(D.15) and (D.16) to yield the conditional mean and variance of x given z, that is,²

    E{x|z} = x̄ + P_xz P_zz^{-1} [z - z̄]     (D.27)

    E{x|z} = x̄ + MH'[HMH' + R]^{-1} [z - Hx̄]     (D.28)

and³

    Σ = E{[x - x̂][x - x̂]' | z} = P_xx - P_xz P_zz^{-1} P'_xz     (D.29)

    Σ = M - MH'[HMH' + R]^{-1} HM     (D.30)

²A simple derivation is to recall that P_zz = HMH' + R from (D.21) and P_xx = M by definition. So it remains only to obtain P_xz; since P is symmetric, P_xz = P'_zx, and

    P_xz = E{[x - x̄][z - z̄]'} = E{[x - x̄][H[x - x̄] + v]'} = E{[x - x̄][x - x̄]'H' + [x - x̄]v'} = MH'

³Bryson and Ho use the notation P rather than Σ for the covariance matrix of x conditional on z. Thus Eqs. (D.28) and (D.30) provide the mean and covariance of p(x|z) for a first-order expansion of the measurement relationship. However, TBM use a second-order expansion.

To derive the Kalman filter for a second-order expansion of the measurement equation with additive noise, consider first such a form to replace the linear equation (D.17):

    z = h(x) + v     (D.31)

The second-order expansion of this expression is

    z ≈ h(x̄) + h_x δx + ½ Σ_i e^i δx' h^i_xx δx + v     (D.32)

Then the expected value is

    z̄ = E{z} = h(x̄) + ½ Σ_i e^i tr[h^i_xx M]     (D.33)

because E{δx} = 0, where M = E{(δx)(δx)'}. Then

    z - E{z} = h_x δx + ½ Σ_i e^i δx' h^i_xx δx + v - ½ Σ_i e^i tr[h^i_xx M]     (D.34)

The covariance for p(z) is obtained as

    P_zz = E{[z - E{z}][z - E{z}]'}
         = E{ h_x δx δx' h'_x + vv'
              + ¼ Σ_i Σ_j e^i(e^j)' [δx' h^i_xx δx][δx' h^j_xx δx]
              + ¼ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M] tr[h^j_xx M]
              - ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M][δx' h^j_xx δx] }     (D.35)

or

    P_zz = h_x M h'_x + R + ¼ Σ_i Σ_j e^i(e^j)' E{(δx' h^i_xx δx)(δx' h^j_xx δx)}
           + ¼ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M] tr[h^j_xx M]
           - ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M] tr[h^j_xx M]     (D.36)

Now consider only the third term on the right-hand side of Eq. (D.36). This is a fourth moment, and under gaussian assumptions one has [see Appendix F or Athans, Wishner, and Bertolini (1968, eq. (48), p. 508)]

    ¼ Σ_i Σ_j e^i(e^j)' E{(δx' h^i_xx δx)(δx' h^j_xx δx)}
        = ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M h^j_xx M] + ¼ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M] tr[h^j_xx M]     (D.37)

Then, using Eq. (D.37) in Eq.
(D.36) and collecting terms yields

    P_zz = h_x M h'_x + R + ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M h^j_xx M]     (D.38)

Using Eqs. (D.27) and (D.29) again, one obtains

    E{x|z} = x̄ + P_xz P_zz^{-1} [z - z̄]
           = x̄ + [M h'_x][h_x M h'_x + R + ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M h^j_xx M]]^{-1} [z - z̄]     (D.39)

and

    P = P_xx - P_xz P_zz^{-1} P'_xz     (D.40)

      = M - [M h'_x][h_x M h'_x + R + ½ Σ_i Σ_j e^i(e^j)' tr[h^i_xx M h^j_xx M]]^{-1} [h_x M]     (D.41)

Appendix E: Alternate Forms of the Cost-to-Go Expression

This appendix shows the equivalence of Eq. (9.69), the Tse, Bar-Shalom, and Meier (1973) (TBM) result, and Eq. (9.68), the Bar-Shalom, Tse, and Larson (1974) (BTL) result. Since C_{0,N-k-1} and γ_{k+1} in BTL are equivalent to J_{0,N-k-1} and g_{0,k+1}, respectively, in TBM, Eq. (9.68) is the same as Eq. (9.69) except for the term

    ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q_j + A_xx Σ_{j|j}]

This term is derived here and then substituted into Eq. (9.68). Start with a result from Appendix C, namely Eq. (C-11),

    g_j = ĝ_j - ½ tr[K_j Σ_{j|j}]     (C-11)

where

    g_j = g_{j+1} - ½ H'_u 𝓗_uu^{-1} H_u + ½ tr[K_{j+1} Q_j + A_xx Σ_{j|j}]        g_N = 0     (C-1)

    ĝ_j = ĝ_{j+1} - ½ H'_u 𝓗_uu^{-1} H_u + ½ tr[H_xx Σ_{j|j} + (Σ_{j+1|j} - Σ_{j+1|j+1}) K_{j+1}]        ĝ_N = ½ tr[L_{N,xx} Σ_{N|N}]     (C-4)

From Eq. (C-1) one can get

    g_{k+1} = g_{k+2} - ½ H'_u 𝓗_uu^{-1} H_u + ½ tr[K_{k+2} Q_{k+1} + A_xx Σ_{k+1|k+1}]
    g_{k+2} = g_{k+3} - ½ H'_u 𝓗_uu^{-1} H_u + ½ tr[K_{k+3} Q_{k+2} + A_xx Σ_{k+2|k+2}]
        ...
    g_{N-1} = g_N - ½ H'_u 𝓗_uu^{-1} H_u + ½ tr[K_N Q_{N-1} + A_xx Σ_{N-1|N-1}]     (E.1)

Successive substitution of all the expressions into the first expression in Eq. (E.1) leads to

    g_{k+1} = g_N - ½ Σ_{j=k+1}^{N-1} H'_u 𝓗_uu^{-1} H_u + ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q_j + A_xx Σ_{j|j}]     (E.2)

In exactly the same way, one gets from Eq. (C-4)

    ĝ_{k+1} = ĝ_N - ½ Σ_{j=k+1}^{N-1} H'_u 𝓗_uu^{-1} H_u + ½ Σ_{j=k+1}^{N-1} tr[H_xx Σ_{j|j} + (Σ_{j+1|j} - Σ_{j+1|j+1}) K_{j+1}]     (E.3)

On the other hand, by Eq. (C-11),

    g_{k+1} = ĝ_{k+1} - ½ tr[K_{k+1} Σ_{k+1|k+1}]     (E.4)

Substituting Eqs. (E.2) and (E.3) into Eq. (E.4) and simplifying the result leads to

    ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q_j + A_xx Σ_{j|j}] + g_N
        = ½ Σ_{j=k+1}^{N-1} tr[H_xx Σ_{j|j} + (Σ_{j+1|j} - Σ_{j+1|j+1}) K_{j+1}] + ĝ_N - ½ tr[K_{k+1} Σ_{k+1|k+1}]     (E.5)

Substituting g_N = 0 from Eq. (C-1) and ĝ_N = ½ tr[L_{N,xx} Σ_{N|N}] from Eq. (C-4) into Eq.
(E.5) yields

    ½ Σ_{j=k+1}^{N-1} tr[K_{j+1} Q_j + A_xx Σ_{j|j}]
        = ½ Σ_{j=k+1}^{N-1} tr[H_xx Σ_{j|j} + (Σ_{j+1|j} - Σ_{j+1|j+1}) K_{j+1}]
          + ½ tr[L_{N,xx} Σ_{N|N}] - ½ tr[K_{k+1} Σ_{k+1|k+1}]     (E.6)

Substituting Eq. (E.6) into Eq. (9.68) leads to

    J_{D,N-k} = min_{u_k} { L_k(x_k, u_k) + φ_k(u_k) + C_{0,N-k-1} + γ_{k+1}
        + ½ tr[(Σ_{k+1|k} - Σ_{k+1|k+1}) K_{k+1}] + ½ tr[L_{N,xx} Σ_{N|N}]
        + ½ Σ_{j=k+1}^{N-1} tr[H_xx Σ_{j|j} + (Σ_{j+1|j} - Σ_{j+1|j+1}) K_{j+1}] }     (E.7)

Equation (E.7) is exactly the same as Eq. (9.69), since the only notational differences between TBM and BTL are

    TBM             BTL
    J_{0,N-k-1}     C_{0,N-k-1}
    g_{0,k+1}       γ_{k+1}

Appendix F: Expected Value of the Product of Two Quadratic Forms (by Jorge Rizo Patron)

In Athans, Wishner, and Bertolini (1968, app. A, especially eq. (48), p. 508), a formal derivation of

    E{(x'Ax)(x'Bx)} = 2 tr[AΣBΣ] + tr[AΣ] tr[BΣ]

is given. However, Athans et al. take as given that for a vector of gaussian random variables with zero means one has

    E{x_i x_j x_k x_l} = σ_ij σ_kl + σ_ik σ_jl + σ_il σ_jk     (F.1)

where σ_ij is the covariance between x_i and x_j. In these notes the derivation of Eq. (F.1) is developed first. Thereafter, following closely the approach of the above article, a formal proof of the equality

    E{[x'Ax][x'Bx]} = 2 tr[AΣBΣ] + tr[AΣ] tr[BΣ]

where Σ is the covariance matrix of the x's, is given. As E{x_i x_j x_k x_l} is a fourth moment, the point of departure of the derivation is the moment-generating function.¹ To make exposition easier, this appendix is divided into three sections. The first section provides a derivation of the fourth moment in the scalar case. The second section generalizes this to the vector case, and the third section applies the result to obtain the desired derivation.

¹The author has been helped in certain aspects of the derivation by notes of Yaakov Bar-Shalom.
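The identity proved in this appendix is also easy to check numerically; a sketch with arbitrary symmetric A and B:

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # covariance of x
A = np.array([[1.0, 0.3], [0.3, 2.0]])       # arbitrary symmetric matrices
B = np.array([[0.5, -0.2], [-0.2, 1.5]])

# Theoretical value: E{(x'Ax)(x'Bx)} = 2 tr(A S B S) + tr(A S) tr(B S)
theory = (2 * np.trace(A @ Sigma @ B @ Sigma)
          + np.trace(A @ Sigma) * np.trace(B @ Sigma))

# Monte Carlo estimate over many zero-mean gaussian draws
x = rng.multivariate_normal(np.zeros(2), Sigma, size=500_000)
qa = np.einsum("ni,ij,nj->n", x, A, x)   # x'Ax for each draw
qb = np.einsum("ni,ij,nj->n", x, B, x)   # x'Bx for each draw
estimate = (qa * qb).mean()
```

With half a million draws the sample mean of the product of the two quadratic forms agrees with the trace formula to within sampling error of a percent or so.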
F.1 Fourth Moment of a Normal Distribution with Zero Mean: Scalar Case

Let x ~ N(0, σ²), where x is a normal variable and σ² is its variance. Therefore E{x - μ}² = E{x²} = σ², as the mean μ is 0. The moment-generating function is [Theil (1971, p. 75)]

    M_x(t) = e^{(1/2)σ²t²}

and its first four derivatives are

    M'_x(t) = (d/dt)M_x(t) = σ²t e^{(1/2)σ²t²} = σ²t M_x(t)

    (d²/dt²)M_x(t) = σ²t M'_x(t) + σ² M_x(t) = σ²t(σ²t M_x(t)) + σ² M_x(t) = (σ⁴t² + σ²) M_x(t)

    (d³/dt³)M_x(t) = (σ⁴t² + σ²) M'_x(t) + 2σ⁴t M_x(t) = (σ⁶t³ + 3σ⁴t) M_x(t)

    (d⁴/dt⁴)M_x(t) = (σ⁶t³ + 3σ⁴t) M'_x(t) + (3σ⁶t² + 3σ⁴) M_x(t) = (σ⁸t⁴ + 6σ⁶t² + 3σ⁴) M_x(t)

From the definition of the moment-generating function,

    E{x⁴} = (d⁴/dt⁴) M_x(0)

Substituting 0 for t in the fourth derivative gives E{x⁴} = 3σ⁴ M_x(0), and since M_x(0) = e^{(1/2)σ²(0)} = 1,

    E{x⁴} = 3σ⁴

F.2 Fourth Moment of a Normal Distribution with Zero Mean: Vector Case

In this case x ~ N(0, Σ), where Σ is the variance-covariance matrix. The moment-generating function when the mean is zero is given by Theil [1971, p. 77, eq. (5.7)]:

    M_x(t) = e^{(1/2)t'Σt}

At this point it is useful to develop some results on matrix derivatives which will be used later. If f(t) is an M × 1 vector of functions and t is an N × 1 vector, then, according to the notation used in Appendix A,² the derivative of a single function f_i with respect to t is the N × 1 vector

    (d/dt) f_i(t) = [∂f_i/∂t_1, ..., ∂f_i/∂t_N]'

and the derivative of the vector of functions f with respect to t is the M × N matrix with (i, j) element ∂f_i/∂t_j.

²In this appendix d is used to indicate partial derivatives.

The following rules apply (the prime stands for transpose):

Rule 1: (d/dt)(t'At) = 2At, where A is an N × N symmetric matrix and t is an N × 1 vector.
Rule 2: (d/dt)(At) = A, where A is an M × N matrix.
Rule 3: (d/dt)(a't) = (d/dt)(t'a) = a, where a is an N × 1 vector.
Rule 4: (d/dt)[f(t)'g(t)] = [(d/dt)f(t)]'g(t) + [(d/dt)g(t)]'f(t), the product rule for conformable vector functions f(t) and g(t).
Rule 4 (product rule). ∂/∂t [f'(t)g(t)] = [∂f'(t)/∂t]g(t) + [∂g'(t)/∂t]f(t).

Rule 5. For a constant vector c and a scalar function m(t), ∂/∂t'[c m(t)] = c ∂m(t)/∂t'.

Rule 6. The product rule applied term by term to a matrix-valued product.

Rule 7. For a scalar function α(t), ∂/∂t'[t α(t)] = t ∂α(t)/∂t' + I α(t).

Third derivative. Differentiating ∂²M_x(t)/(∂t ∂t_i) = [Σeⁱ + Σt(t'Σeⁱ)]M_x(t) with respect to t, by rule 6,

    ∂³M_x(t)/(∂t ∂t_i ∂t') = Σeⁱ(∂M_x(t)/∂t)' + ∂/∂t'[Σt(t'Σeⁱ)M_x(t)]            (F.2)

By rule 5 the first term in Eq. (F.2) equals

    Σeⁱ(∂M_x(t)/∂t)' = Σeⁱ(Σt M_x(t))' = Σeⁱ t'Σ M_x(t)

By rules 4 and 7, recalling that t'Σeⁱ is a scalar, one has

    ∂/∂t'(t t'Σeⁱ) = t(Σeⁱ)' + I(t'Σeⁱ) = t(eⁱ)'Σ + I(t'Σeⁱ)      by rule 3

Therefore, substituting these values into Eq. (F.2) gives

    ∂³M_x(t)/(∂t ∂t_i ∂t') = Σ[t(t'Σeⁱ)t'Σ + eⁱt'Σ + t(eⁱ)'Σ + I(t'Σeⁱ)]M_x(t)      (F.3)

Fourth derivative. Similarly to the third derivative, the value of the matrix ∂⁴M_x(t)/(∂t ∂t_i ∂t_j ∂t) is computed. Recall that it is obtained by differentiating the jth column of Eq. (F.3),

    ∂³M_x(t)/(∂t ∂t_i ∂t')eʲ

where eʲ is a vector of zero elements except in position j, where the element is 1. Therefore,

    ∂⁴M_x(t)/(∂t ∂t_i ∂t_j ∂t') = Σ[t(t'Σeⁱ)t'Σ + eⁱt'Σ + t(eⁱ)'Σ + I(t'Σeⁱ)]eʲ (∂M_x(t)/∂t)'
        + M_x(t) ∂/∂t'{Σ[t(t'Σeⁱ)t'Σ + eⁱt'Σ + t(eⁱ)'Σ + I(t'Σeⁱ)]eʲ}            (F.4)

by rule 4. The second term in Eq. (F.4) consists of a sum of four terms, each equal to M_x(t) multiplied by a derivative. These derivatives will be found first:

    ∂/∂t'[Σt(t'Σeⁱ)(t'Σeʲ)] = Σt(t'Σeⁱ)[∂/∂t'(t'Σeʲ)] + (t'Σeʲ)∂/∂t'[Σt(t'Σeⁱ)]
        = Σt(t'Σeⁱ)(eʲ)'Σ + (t'Σeʲ)∂/∂t'[(Σt)(t'Σeⁱ)]                            by rule 3
        = Σt(t'Σeⁱ)(eʲ)'Σ + (t'Σeʲ)[Σt(eⁱ)'Σ + Σ(t'Σeⁱ)]                         by rules 2 and 3
        = Σt(t'Σeⁱ)(eʲ)'Σ + (t'Σeʲ)Σt(eⁱ)'Σ + (t'Σeⁱ)(t'Σeʲ)Σ

On the other hand,

    ∂/∂t'[Σeⁱ(t'Σeʲ)] = Σeⁱ[∂/∂t'(t'Σeʲ)] = Σeⁱ(eʲ)'Σ                            by rules 5 and 3

Also, as (eⁱ)'Σeʲ is a scalar,

    ∂/∂t'[Σt((eⁱ)'Σeʲ)] = ((eⁱ)'Σeʲ)Σ                                            by rule 2

Finally,

    ∂/∂t'[Σeʲ(t'Σeⁱ)] = Σeʲ(eⁱ)'Σ                                                by rules 5 and 3

Substituting these values into Eq. (F.4) leads to

    ∂⁴M_x(t)/(∂t ∂t_i ∂t_j ∂t') = Σ[t(t'Σeⁱ)t'Σ + eⁱt'Σ + t(eⁱ)'Σ + I(t'Σeⁱ)]eʲ (∂M_x(t)/∂t)'
        + M_x(t){Σt(t'Σeⁱ)(eʲ)'Σ + (t'Σeʲ)Σt(eⁱ)'Σ + (t'Σeⁱ)(t'Σeʲ)Σ
                 + Σeⁱ(eʲ)'Σ + [(eⁱ)'Σeʲ]Σ + Σeʲ(eⁱ)'Σ}

To find the fourth moment, one needs to substitute zero for t in this expression. Then all terms with t' or t in them vanish, and what remains is

    ∂⁴M_x(t)/(∂t ∂t_i ∂t_j ∂t')|_{t=0} = M_x(0)[Σeⁱ(eʲ)'Σ + ((eⁱ)'Σeʲ)Σ + Σeʲ(eⁱ)'Σ]

As M_x(0) = e^{(1/2)0'Σ0} = 1,

    ∂⁴M_x(t)/(∂t ∂t_i ∂t_j ∂t')|_{t=0} = Σeⁱ(eʲ)'Σ + [(eⁱ)'Σeʲ]Σ + Σeʲ(eⁱ)'Σ

Our goal is to obtain E{x_i x_j x_k x_l}. It is most direct to show this by taking E{x x_i x_j x'}, the matrix whose (k, l)th element is E{x_k x_i x_j x_l}, for the case in which E{x} = 0. The (k, l)th elements of the three matrices above are, respectively, σ_ki σ_jl, σ_ij σ_kl, and σ_kj σ_il, so that

    E{x_k x_i x_j x_l} = σ_ij σ_kl + σ_ki σ_jl + σ_kj σ_il

which is Eq. (F.1).

APPENDIX H. MATRIX RECURSIONS FOR AUGMENTED SYSTEM

    K_j = 𝒦_xx − 𝒦'_ux 𝒦⁻¹_uu 𝒦_ux                                               (H.3)

    𝒦_ux = H_ux + f'_u K_{j+1} f_x        𝒦_uu = H_uu + f'_u K_{j+1} f_u          (H.4)

    Hʲ = Lʲ(x_j, u_j) + p'_{j+1} f                                                (H.5)

and, from Eq. (9.29),¹

    H_x  = L_x + f'_x p_{j+1}
    H_u  = L_u + f'_u p_{j+1}
    H_xx = L_xx + Σ_{i=1}^N [(eⁱ)'p_{j+1}] fⁱ_xx                                  (H.6)
    H_uu = L_uu + Σ_{i=1}^N [(eⁱ)'p_{j+1}] fⁱ_uu
    H_xu = L_xu + Σ_{i=1}^N [(eⁱ)'p_{j+1}] fⁱ_xu

The time subscript k is omitted from H, 𝒦, and Λ for simplicity. The subscript x now is changed to z, where z is the augmented state vector. Hence, for example, Eq.
(H.2) becomes

    K_j = 𝒦_zz − Λ_zz                                                             (H.7)

The recursions for K^θx and K^θθ can be obtained by expressing Eq. (H.7) in terms of fˣ and f^θ from Eq. (10.20). This requires in turn expressing Eqs. (H.4) and (H.6) in terms of fˣ and f^θ, as the rest of this appendix will show. From Eq. (10.20) it follows that

    f_z = [ fˣ_x   fˣ_θ ]
          [ f^θ_x  f^θ_θ ]                                                        (H.8)

where the subscript denotes the gradient of each set of functions with respect to the augmented state vector. Also

    f_u = [ fˣ_u ]
          [ f^θ_u ]                                                               (H.9)

and Hᶻ = Lᶻ + p'f, from Eq. (H.5), where the time subscript is omitted for simplicity, or

    Hᶻ = Lˣ + L^θ + [(pˣ)' ⋮ (p^θ)'] [ fˣ  ]
                                      [ f^θ ]                                     (H.10)

Note that Eq. (H.10) is still a scalar. Since θ does not enter the cost function, L^θ = 0, and Eq. (H.10) becomes

    Hᶻ = Lˣ + [(pˣ)' ⋮ (p^θ)'] [ fˣ  ]
                                [ f^θ ]                                           (H.11)

¹ Since (eⁱ)'p_{j+1} is a scalar quantity it is equal to the quantity p'_{j+1}eⁱ, which is used in Eq. (9.29).

and, from Eqs. (10.16) to (10.19),

    Lˣ = (1/2)[x_k − x̃_k]'W_k[x_k − x̃_k] + [x_k − x̃_k]'F_k[u_k − ũ_k]
         + (1/2)[u_k − ũ_k]'Λ_k[u_k − ũ_k]                                        (H.12)

The gradient of Hᶻ with respect to z is

    Hᶻ_z = [ Hᶻ_x ]
           [ Hᶻ_θ ]                                                               (H.13)

where

    Hᶻ_x = Lˣ_x + [(fˣ_x)' ⋮ (f^θ_x)'] [ pˣ  ] = Lˣ_x + (fˣ_x)'pˣ                 (H.14)
                                        [ p^θ ]

since f^θ_x = 0 in view of the system equations, and

    Hᶻ_θ = Lˣ_θ + [(fˣ_θ)' ⋮ (f^θ_θ)'] [ pˣ  ] = (fˣ_θ)'pˣ + (f^θ_θ)'p^θ          (H.15)
                                        [ p^θ ]

since Lˣ_θ = 0 in view of the criterion function. Hence

    Hᶻ_z = [ Lˣ_x + (fˣ_x)'pˣ           ]
           [ (fˣ_θ)'pˣ + (f^θ_θ)'p^θ ]                                            (H.16)

Similarly, the gradient of Hᶻ with respect to u is

    Hᶻ_u = Lˣ_u + [(fˣ_u)' ⋮ (f^θ_u)'] [ pˣ  ]                                    (H.17)
                                        [ p^θ ]

Since

    Lˣ_u = F'_k[x_k − x̃_k] + Λ_k[u_k − ũ_k]        (fˣ_u)' = B'       f^θ_u = 0

Eq. (H.17) becomes

    Hᶻ_u = F'_k[x_k − x̃_k] + Λ_k[u_k − ũ_k] + B'pˣ                               (H.18)

The hessian of Hᶻ with respect to z is given by

    Hᶻ_zz = [ H_xx  H_xθ ]
            [ H_θx  H_θθ ]                                                        (H.19)

Using Eqs.
(H.14) and (H.15) and letting i ∈ X and i ∈ Θ denote the indices of the original state equations and of the parameter evolution equations, respectively, we get

    H_xx = L_xx + Σ_{i∈X}((eⁱ)'pˣ)fⁱ_xx + Σ_{i∈Θ}((eⁱ)'p^θ)fⁱ_xx = L_xx           (H.20)

since fⁱ_xx = 0 for all i, and

    H_xθ = (H_θx)' = L_xθ + Σ_{i∈X}((eⁱ)'pˣ)fⁱ_xθ + Σ_{i∈Θ}((eⁱ)'p^θ)fⁱ_xθ
         = Σ_{i∈X}((eⁱ)'pˣ)fⁱ_xθ                                                  (H.21)

since L_xθ = 0 and fⁱ_xθ = 0 for i ∈ Θ, and

    H_θθ = L_θθ + Σ_{i∈X}((eⁱ)'pˣ)fⁱ_θθ + Σ_{i∈Θ}((eⁱ)'p^θ)fⁱ_θθ = 0              (H.22)

since L_θθ = 0 and fⁱ_θθ = 0 for all i.

A 2 × 2 example will make the derivation of Eq. (H.20) from Eq. (H.14) clearer. Consider the following 2 × 2 case. Let

    H = L(x₁, x₂) + [p₁  p₂] [ f¹(x₁, x₂) ]
                             [ f²(x₁, x₂) ]                                       (H.23)

Note that p₁ and p₂ are scalars. Then

    H_x = [ H_x₁ ]
          [ H_x₂ ]                                                                (H.24)

or

    H_x₁ = L_x₁ + [f¹_x₁  f²_x₁][ p₁ ]        H_x₂ = L_x₂ + [f¹_x₂  f²_x₂][ p₁ ]  (H.25)
                                 [ p₂ ]                                    [ p₂ ]

The hessian H_xx is obtained from Eq. (H.25):

    H_xx = [ H_x₁x₁  H_x₁x₂ ]
           [ H_x₂x₁  H_x₂x₂ ]                                                     (H.26)

Redefine H₁₁ = H_x₁x₁, H₁₂ = H_x₁x₂, H₂₂ = H_x₂x₂, L₁₁ = L_x₁x₁, f¹₁₁ = f¹_x₁x₁, and so on. Then

    H_xx = [ H₁₁  H₁₂ ]
           [ H₂₁  H₂₂ ]                                                           (H.27)

where

    H₁₁ = ∂H_x₁/∂x₁ = L₁₁ + [p₁  p₂][ f¹₁₁ ; f²₁₁ ]
    H₁₂ = ∂H_x₁/∂x₂ = L₁₂ + [p₁  p₂][ f¹₁₂ ; f²₁₂ ] = H₂₁'                        (H.28)
    H₂₂ = ∂H_x₂/∂x₂ = L₂₂ + [p₁  p₂][ f¹₂₂ ; f²₂₂ ]

Substituting Eq. (H.28) into Eq. (H.27) leads, after simplification, to

    H_xx = [ L₁₁ + p₁f¹₁₁ + p₂f²₁₁    L₁₂ + p₁f¹₁₂ + p₂f²₁₂ ]
           [ L₂₁ + p₁f¹₂₁ + p₂f²₂₁    L₂₂ + p₁f¹₂₂ + p₂f²₂₂ ]                     (H.29)

or

    H_xx = L_xx + Σ_i ((eⁱ)'p) fⁱ_xx        where e¹ = [1; 0],  e² = [0; 1]       (H.30)

Notice that Eq. (H.20) is equivalent to Eq. (H.30). Now, substituting Eqs. (H.20) to (H.22) into Eq. (H.19) leads to

    Hᶻ_zz = [ L_xx                          Σ_{i∈X}((eⁱ)'pˣ)fⁱ_xθ ]
            [ Σ_{i∈X}((eⁱ)'pˣ)(fⁱ_xθ)'     0                      ]               (H.31)

From the criterion function L_xx = W, and from the system equations fⁱ_xθ = (aⁱ_θ)', where aⁱ_θ denotes the gradient of the ith row of the coefficient matrix A_k(θ_k) with respect to θ_k. Therefore

    Hᶻ_zz = [ W                            Σ_{i∈X}((eⁱ)'pˣ)(aⁱ_θ)' ]
            [ Σ_{i∈X}((eⁱ)'pˣ)aⁱ_θ        0                        ]              (H.32)

which is the same as Eq. (10.37).
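The 2 × 2 example (H.23)–(H.30) can be verified numerically: the hessian of H = L + p'f must equal L_xx + p₁f¹_xx + p₂f²_xx. A sketch with illustrative test functions (L = x₁²x₂, f¹ = x₁x₂², f² = x₁³ + x₂, all chosen here for the example only), comparing analytic hessians against central finite differences:

```python
import numpy as np

p = np.array([0.7, -1.3])
# H(x) = L(x) + p'f(x) with L = x1^2*x2, f1 = x1*x2^2, f2 = x1^3 + x2
H = lambda x: x[0] ** 2 * x[1] + p[0] * x[0] * x[1] ** 2 + p[1] * (x[0] ** 3 + x[1])

def hessian_fd(fun, x, h=1e-5):
    """Central-difference hessian of a scalar function of a vector."""
    n = len(x)
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            out[i, j] = (fun(x + e_i + e_j) - fun(x + e_i - e_j)
                         - fun(x - e_i + e_j) + fun(x - e_i - e_j)) / (4 * h * h)
    return out

x = np.array([0.8, -0.5])
# analytic pieces of Eq. (H.30): L_xx + p1*f1_xx + p2*f2_xx
L_xx  = np.array([[2 * x[1], 2 * x[0]], [2 * x[0], 0.0]])
f1_xx = np.array([[0.0, 2 * x[1]], [2 * x[1], 2 * x[0]]])
f2_xx = np.array([[6 * x[0], 0.0], [0.0, 0.0]])
analytic = L_xx + p[0] * f1_xx + p[1] * f2_xx
```

`hessian_fd(H, x)` agrees with `analytic` to the accuracy of the difference scheme, which is the content of Eq. (H.30).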
In exactly the same way, the hessian of Hᶻ with respect to u, that is, H_uu, is obtained from Eq. (H.17):

    H_uu = L_uu + Σ_{i∈X}((eⁱ)'pˣ)fⁱ_uu + Σ_{j∈Θ}((eʲ)'p^θ)fʲ_uu                  (H.33)

In view of the system equations and cost functional again,

    L_uu = Λ        fⁱ_uu = 0 = fʲ_uu        for i ∈ X, j ∈ Θ

Therefore

    H_uu = Λ                                                                      (H.34)

On the other hand, from Eq. (H.17),

    Hᶻ_zu = Lᶻ_zu + Σ_{i∈X}((eⁱ)'pˣ)fⁱ_zu + Σ_{j∈Θ}((eʲ)'p^θ)fʲ_zu                (H.35)

Recall that z = [x ; θ]; hence Eq. (H.35) can be rewritten as

    Hᶻ_zu = [ Lˣ_xu ] + Σ_{i∈X}((eⁱ)'pˣ)[ fⁱ_xu ] + Σ_{j∈Θ}((eʲ)'p^θ)[ fʲ_xu ]    (H.36)
            [ Lˣ_θu ]                    [ fⁱ_θu ]                    [ fʲ_θu ]

Again, from the system equations and cost function, the following facts are found:

    Lˣ_xu = F        Lˣ_θu = 0
    fⁱ_xu = 0   for i ∈ X        fʲ_xu = 0   for j ∈ Θ
    fⁱ_θu = (bⁱ_θ)'   for i ∈ X        fʲ_θu = 0   for j ∈ Θ

where bⁱ_θ denotes the gradient of the ith row of the coefficient matrix B_k(θ_k) with respect to θ_k. Therefore Eq. (H.36) can be written as

    Hᶻ_zu = [ F                               ]
            [ Σ_{i∈X}((eⁱ)'pˣ)(bⁱ_θ)' ]                                           (H.37)

Now, using Eqs. (H.2) to (H.4), we can rewrite the recursion K for the augmented system as

    K_j = 𝒦_zz − Λ_zz
        = f'_z K_{j+1} f_z + H_zz − 𝒦'_uz 𝒦⁻¹_uu 𝒦_uz                             (H.38)

Then using Eqs. (H.38), (H.1), (H.8), (H.4), and (H.9), we can write the K recursion for the augmented system as

    [ K^xx  K^xθ ]     = f'_z [ K^xx  K^xθ ]       f_z + [ H_xx  H_xθ ]
    [ K^θx  K^θθ ]_j          [ K^θx  K^θθ ]_{j+1}       [ H_θx  H_θθ ]
        − [f'_u K_{j+1} f_z + H_uz]' [H_uu + f'_u K_{j+1} f_u]⁻¹ [f'_u K_{j+1} f_z + H_uz]   (H.39)

Since, from (10.20), fˣ = Ax + Bu + c and f^θ = Dθ,

    f_z = [ A   fˣ_θ ]        f_u = [ B ]
          [ 0   D    ]              [ 0 ]                                         (H.40)

Substituting Eqs. (H.32), (H.34), (H.37), and (H.40) into (H.39) yields the partitioned expression (H.41).
In full, Eq. (H.41) reads

    [ K^xx  (K^θx)' ]     [ A  fˣ_θ ]′ [ K^xx  K^xθ ]       [ A  fˣ_θ ]
    [ K^θx  K^θθ    ]_j = [ 0  D    ]  [ K^θx  K^θθ ]_{j+1} [ 0  D    ] + Hᶻ_zz − [⋯]'μ[⋯]   (H.41)

Consider the first term only:

    [ A  fˣ_θ ]′ [ K^xx  K^xθ ] [ A  fˣ_θ ]
    [ 0  D    ]  [ K^θx  K^θθ ] [ 0  D    ]
      = [ A'K^xx A                         A'(K^xx fˣ_θ + K^xθ D)                                  ]
        [ ((fˣ_θ)'K^xx + D'K^θx)A          (fˣ_θ)'(K^xx fˣ_θ + K^xθ D) + D'(K^θx fˣ_θ + K^θθ D) ]   (H.42)–(H.43)

Consider the term in the first brace after the minus sign:

    [B' ⋮ 0] K_{j+1} [ A  fˣ_θ ] + H_uz
                     [ 0  D    ]
      = [ B'K^xx A + F'  ⋮  B'(K^xx fˣ_θ + K^xθ D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ ]       (H.44)

Next consider the inverse term:

    [B' ⋮ 0] [ K^xx  K^xθ ] [ B ] + H_uu = B'K^xx B + Λ
             [ K^θx  K^θθ ] [ 0 ]

so that the required inverse is μ = [Λ + B'K^xx B]⁻¹.                             (H.45)

By using Eqs. (H.44) and (H.45) all terms after the minus sign in Eq. (H.41) can be rewritten as

    [⋯]'μ[⋯] = [ 𝒜  ℬ ]
               [ 𝒞  𝒟 ]                                                          (H.46)–(H.47)

where

    𝒜 = (A'K^xx B + F) μ (B'K^xx A + F')
    ℬ = (A'K^xx B + F) μ [B'(K^xx fˣ_θ + K^xθ D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ]
    𝒞 = [((fˣ_θ)'K^xx + D'K^θx)B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ (B'K^xx A + F')
    𝒟 = [((fˣ_θ)'K^xx + D'K^θx)B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ [B'(K^xx fˣ_θ + K^xθ D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ]

and where μ = [Λ + B'K^xx B]⁻¹. Next, combine the second and third terms on the right-hand side of Eq. (H.41) to obtain

    [ W  0 ] + [ 0                          Σ_{i∈X}((eⁱ)'pˣ)(aⁱ_θ)' ]
    [ 0  0 ]   [ Σ_{i∈X}((eⁱ)'pˣ)aⁱ_θ     0                        ]
      = [ W                            Σ_{i∈X}((eⁱ)'pˣ)(aⁱ_θ)' ]
        [ Σ_{i∈X}((eⁱ)'pˣ)aⁱ_θ        0                        ]                  (H.48)

Now Eq. (H.41) can be rewritten as a sum of the partitioned matrices (H.43), (H.47), and (H.48). Without actually writing out this total expression, consider each of the component matrices of K_j in Eq. (H.41) one at a time. First consider K^xx, which is

    K^xx_j = A'K^xx_{j+1}A − [A'K^xx_{j+1}B + F_j] μ_j [B'K^xx_{j+1}A + F'_j] + W_j   (H.49)

This is exactly the same as the certainty-equivalence Riccati matrix in Eq. (10.33). This proves Eq. (10.39).
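Equation (H.49) is the standard backward Riccati recursion, so it is easy to iterate numerically. A minimal sketch, assuming for brevity a time-invariant problem (A, B, W, F, Λ constant; all numerical values illustrative):

```python
import numpy as np

def riccati_backward(A, B, W, F, Lam, N):
    """Iterate K_j = A'K A - [A'K B + F] mu [B'K A + F'] + W backward from
    K_N = W, with mu = inv(Lam + B'K B), as in Eq. (H.49)."""
    K = W.copy()
    for _ in range(N):
        mu = np.linalg.inv(Lam + B.T @ K @ B)
        psi = A.T @ K @ B + F
        K = A.T @ K @ A - psi @ mu @ psi.T + W
    return K

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
W = np.eye(2)
F = np.zeros((2, 1))
Lam = np.array([[1.0]])
K = riccati_backward(A, B, W, F, Lam, 200)  # long horizon: K approaches a fixed point
```

For a stabilizable time-invariant problem the iteration converges, and the limit is symmetric and satisfies the corresponding algebraic Riccati equation, so one more application of the recursion leaves K essentially unchanged.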
Then, consider K^θx, which is

    K^θx_j = ((fˣ_θ)'K^xx_{j+1} + D'K^θx_{j+1})A
             − [((fˣ_θ)'K^xx_{j+1} + D'K^θx_{j+1})B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ_j [B'K^xx_{j+1}A + F'_j]
             + Σ_{i∈X}((eⁱ)'pˣ)aⁱ_θ        with K^θx_N = 0                        (H.50)

This is the same as Eq. (10.40). Finally, consider K^θθ, which is

    K^θθ_j = (fˣ_θ)'(K^xx_{j+1}fˣ_θ + K^xθ_{j+1}D) + D'(K^θx_{j+1}fˣ_θ + K^θθ_{j+1}D)
             − [((fˣ_θ)'K^xx_{j+1} + D'K^θx_{j+1})B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ_j
               [B'(K^xx_{j+1}fˣ_θ + K^xθ_{j+1}D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ]

APPENDIX I. VECTOR RECURSIONS FOR AUGMENTED SYSTEM

For the terminal period,

    pˣ_N = W_N[x_N − x̃_N] + F_N[u_N − ũ_N] + A'pˣ_{N+1}
           − [A'K_{N+1}B + F_N]μ_N[F'_N(x_N − x̃_N) + Λ_N(u_N − ũ_N) + B'pˣ_{N+1}]

but F_N = Λ_N = pˣ_{N+1} = 0 from the CE solution. Therefore, the equation above becomes

    pˣ_N = W_N[x_N − x̃_N]                                                         (I.9)

Also recall from the CE solution [Eqs. (10.33) and (10.34)] that K_N = W_N and p_N = −W_N x̃_N. Therefore Eq. (I.9) can be written as

    pˣ_N = K_N x_N + p_N                                                          (I.10)

Next consider the period N − 1. For this period Eq. (I.7) can be written as

    pˣ_{N−1} = W_{N−1}[x_{N−1} − x̃_{N−1}] + F_{N−1}[u_{N−1} − ũ_{N−1}] + A'pˣ_N
               − [A'K_N B + F_{N−1}]μ_{N−1}[F'_{N−1}(x_{N−1} − x̃_{N−1})
                 + Λ_{N−1}(u_{N−1} − ũ_{N−1}) + B'pˣ_N]                           (I.11)

Let

    ψ  = ψ_{N−1}  = A'K_N B + F_{N−1} = A'W_N B + F_{N−1}
    δx = δx_{N−1} = x_{N−1} − x̃_{N−1}                                             (I.12)
    δu = δu_{N−1} = u_{N−1} − ũ_{N−1}

Then Eq. (I.11) can be written as

    pˣ_{N−1} = W_{N−1}δx_{N−1} + Fδu_{N−1} + A'pˣ_N − ψμ[F'δx_{N−1} + Λδu_{N−1} + B'pˣ_N]
             = W_{N−1}δx_{N−1} + Fδu_{N−1} + A'pˣ_N − ψμF'δx_{N−1} − ψμΛδu_{N−1} − ψμB'pˣ_N
             = [W_{N−1} − ψμF']δx_{N−1} + [F − ψμΛ]δu_{N−1} + [A' − ψμB']pˣ_N     (I.13)

From Eqs. (I.9) and (I.12)

    pˣ_N = W_N[x_N − x̃_N] = W_N δx_N                                              (I.14)

and from the system equations

    δx_N = x_N − x̃_N = Ax_{N−1} + Bu_{N−1} + c − x̃_N                              (I.15)

For notational simplicity all variables without a time subscript are for period N − 1. Using this convention and substituting Eqs. (I.14) and (I.15) into Eq.
(I.13), we obtain

    pˣ = [W − ψμF']δx + [F − ψμΛ]δu + [A' − ψμB'][W_N Ax + W_N Bu + W_N c − W_N x̃_N]
       = [W − ψμF']x − [W − ψμF']x̃ + [F − ψμΛ]u − [F − ψμΛ]ũ
         + A'W_N Ax + A'W_N Bu + A'W_N c − A'W_N x̃_N
         − ψμB'W_N Ax − ψμB'W_N Bu − ψμB'W_N c + ψμB'W_N x̃_N                      (I.16)

Collecting terms in x, u, and x̃ yields

    pˣ = [W − ψμF' + A'W_N A − ψμB'W_N A]x + [F − ψμΛ + A'W_N B − ψμB'W_N B]u
         − [W − ψμF']x̃ − [F − ψμΛ]ũ + [ψμB' − A']W_N x̃_N + [A' − ψμB']W_N c      (I.17)

Consider only the second term on the right-hand side of Eq. (I.17):

    [F − ψμΛ + A'W_N B − ψμB'W_N B]u = [F + A'W_N B − ψμ(Λ + B'K_N B)]u
                                     = [ψ − ψμμ⁻¹]u = 0                           (I.18)

Therefore Eq. (I.17) reduces to

    pˣ = [W + A'W_N A − ψμψ']x − [W − ψμF']x̃ − [F − ψμΛ]ũ + [A' − ψμB']W_N[c − x̃_N]   (I.19)

or

    pˣ = [A'K_N A − ψμψ' + W]x + ψμ[F'x̃ + Λũ − B'W_N(c − x̃_N)] − Wx̃ − Fũ + A'W_N[c − x̃_N]   (I.20)

or

    pˣ = [A'K_N A − ψμψ' + W]x − ψμ[B'(K_N c + p_N) − (F'x̃ + Λũ)] + A'[K_N c + p_N] − [Wx̃ + Fũ]   (I.21)

Using Eqs. (10.33) and (I.12), we obtain

    K_{N−1} = A'K_N A − ψ_{N−1}μ_{N−1}ψ'_{N−1} + W_{N−1}                          (I.22)

and using Eqs. (10.34) and (I.12) provides

    p_{N−1} = −ψ_{N−1}μ_{N−1}[B'(K_N c + p_N) − (F'_{N−1}x̃_{N−1} + Λ_{N−1}ũ_{N−1})]
              + A'[K_N c + p_N] − [W_{N−1}x̃_{N−1} + F_{N−1}ũ_{N−1}]              (I.23)

Then using Eqs. (I.22) and (I.23), we can write Eq. (I.21) as

    pˣ_{N−1} = K_{N−1}x_{N−1} + p_{N−1}                                           (I.24)

which establishes the second step of the induction. In the same manner it can be shown that for any period j, Eqs. (I.7) and (I.8) are equivalent. This proves the p recursion (10.43).

Appendix J
Proof That a Constant Term in the Cost-to-Go Is Zero

This appendix proves that γ̃_{k+1} in the approximate optimal cost-to-go [Eq. (10.24)] is zero. γ_k has been defined to be

    γ_k = γ_{k+1} − (1/2)(H_u)'_k(H_uu)⁻¹_k(H_u)_k        γ_N = 0                 (J.1)

[See Eq. (9.60).] Similarly, γ̃_k is defined for the augmented system as

    γ̃_k = γ̃_{k+1} − (1/2)(Hᶻ_u)'_k(Hᶻ_uu)⁻¹_k(Hᶻ_u)_k        with γ̃_N = 0       (J.2)

From Eq. (H.18)

    (Hᶻ_u)' = [x_k − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + (pˣ_{k+1})'B_k                (J.3)

where u_ok is the nominal control obtained from the CE problem. From Eqs.
(H.4), (H.34), (H.40), and (10.32),

    [𝒦_uu]⁻¹ = [Λ + B'K_{k+1}B]⁻¹ = μ_k                                           (J.4)

Hence Eq. (J.2) becomes

    γ̃_k = γ̃_{k+1} − (1/2)([[x_k − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + (pˣ_{k+1})'B_k]
          × μ_k[F'_k[x_k − x̃_k] + Λ_k[u_ok − ũ_k] + B'_k pˣ_{k+1}])               (J.5)

Now, by using Eq. (10.43) we obtain (pˣ_{k+1})' as

    (pˣ_{k+1})' = x'_{k+1}K'_{k+1} + p'_{k+1}                                     (J.6)

Substituting the unaugmented system equation (10.7) into Eq. (J.6) gives

    (pˣ_{k+1})' = [A_k x_ok + B_k u_ok + c_k]'K'_{k+1} + p'_{k+1}
                = x'_ok A'_k K'_{k+1} + u'_ok B'_k K'_{k+1} + c'_k K'_{k+1} + p'_{k+1}   (J.7)

Now consider only the term

    [x_ok − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + (pˣ_{k+1})'B_k
      = [x_ok − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + x'_ok A'_k K'_{k+1}B_k
        + u'_ok B'_k K'_{k+1}B_k + c'_k K'_{k+1}B_k + p'_{k+1}B_k
      = u'_ok[Λ'_k + B'_k K'_{k+1}B_k] + x'_ok[F_k + A'_k K'_{k+1}B_k]
        + p'_{k+1}B_k − x̃'_k F_k − ũ'_k Λ'_k + c'_k K'_{k+1}B_k                   (J.8)

Also let

    Ψ_k = [F_k + A'_k K'_{k+1}B_k]                                                (J.9)

By using Eqs. (J.4) and (J.9) in Eq. (J.8) we get

    [x_ok − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + (pˣ_{k+1})'B_k
      = u'_ok μ_k⁻¹ + x'_ok Ψ_k + p'_{k+1}B_k − x̃'_k F_k − ũ'_k Λ'_k + c'_k K'_{k+1}B_k   (J.10)

The nominal control u_ok from Eq. (10.30) is

    u_ok = G_k x_ok + g_k                                                         (J.11)

where

    G_k = −μ_k Ψ'_k        and        g_k = −μ_k[B'(K_{k+1}c + p_{k+1}) − (F'_k x̃_k + Λ_k ũ_k)]

Substitution of Eq. (J.11) into Eq. (J.10) then yields

    [x_ok − x̃_k]'F_k + [u_ok − ũ_k]'Λ'_k + (pˣ_{k+1})'B_k
      = −[[B'(K_{k+1}c + p_{k+1}) − (F'_k x̃_k + Λ_k ũ_k)]'μ'_k + x'_ok Ψ_k μ'_k]μ_k⁻¹
        + x'_ok Ψ_k + p'_{k+1}B_k − (x̃'_k F_k + ũ'_k Λ'_k) + c'K_{k+1}B_k = 0     (J.12)

Substituting Eq. (J.12) into Eq. (J.5) leads to

    γ̃_k = γ̃_{k+1}                                                                (J.13)

Since γ̃_N = 0, Eq. (J.13) implies

    γ̃_{k+1} = 0                                                                  (J.14)

which was sought in Eq. (10.35).

Appendix K
Updating the Augmented State Covariance

Begin with Eqs. (9.84) and (9.85), that is,

    Σ_{k+1|k+1} = [I − V_{k+1}h_{z,k+1}]Σ_{k+1|k}                                 (K.1)

and

    V_{k+1} = Σ_{k+1|k}h'_{z,k+1}[h_{z,k+1}Σ_{k+1|k}h'_{z,k+1} + R_{k+1}
              + (1/2) Σ_i Σ_j eⁱ(eʲ)' tr(hⁱ_zz Σ_{k+1|k} hʲ_zz Σ_{k+1|k})]⁻¹      (K.2)

For the case at hand the observation relationship is Eq. (10.8), that is,

    y_k = H_k(θ_k)x_k + w_k                                                       (K.3)

Thus, in the notation of Eq.
(9.5),

    h_k = H_k(θ_k)x_k                                                             (K.4)

Therefore the observation relationship for the augmented system is

    y_k = [H_k(θ_k) ⋮ 0] [ x_k ] + w_k                                            (K.5)
                          [ θ_k ]

and in the augmented system

    h_z = [h_x ⋮ h_θ]                                                             (K.6)

with

    h_x = H_k(θ_k)        and        h_θ = Σ_i eⁱ x̂'_{k+1|k}(Hⁱ_θ)'

by analogy with results in Appendix L. Also

    Σ_{k+1|k} = [ Σ^xx  Σ^xθ ]
                [ Σ^θx  Σ^θθ ]_{k+1|k}                                            (K.7)

and

    hⁱ_zz = [ 0      (Hⁱ_θ)' ]
            [ Hⁱ_θ   0       ]                                                    (K.8)

Therefore

    tr[hⁱ_zz Σ_{k+1|k} hʲ_zz Σ_{k+1|k}]
      = tr[Hⁱ_θ Σ^θx Hʲ_θ Σ^θx + Hⁱ_θ Σ^θθ(Hʲ_θ)'Σ^xx
           + (Hⁱ_θ)'Σ^xx Hʲ_θ Σ^θθ + (Hⁱ_θ)'Σ^xθ(Hʲ_θ)'Σ^xθ]
      = 2 tr[Hⁱ_θ Σ^θx Hʲ_θ Σ^θx + Hⁱ_θ Σ^θθ(Hʲ_θ)'Σ^xx]                          (K.9)

Then substitution of Eq. (K.9) into Eq. (K.2) yields

    V_{k+1} = Σ_{k+1|k}h'_{z,k+1}[h_{z,k+1}Σ_{k+1|k}h'_{z,k+1} + R_{k+1}
              + Σ_i Σ_j eⁱ(eʲ)' tr(Hⁱ_θ Σ^θx Hʲ_θ Σ^θx + Hⁱ_θ Σ^θθ(Hʲ_θ)'Σ^xx)]⁻¹  (K.10)

For many problems H will not be a function of θ, so that Hⁱ_θ = 0 for all i. For this special case Eq. (K.6) becomes

    h_z = [H_{k+1} ⋮ 0]                                                           (K.11)

and Eq. (K.8) becomes

    hⁱ_zz = 0                                                                     (K.12)

Thus Eq. (K.10) becomes

    V_{k+1} = [ Σ^xx  Σ^xθ ]       [ H'_{k+1} ] [H_{k+1}Σ^xx_{k+1|k}H'_{k+1} + R_{k+1}]⁻¹   (K.13)
              [ Σ^θx  Σ^θθ ]_{k+1|k}[ 0        ]

or

    V_{k+1} = [ Σ^xx_{k+1|k}H'_{k+1}S⁻¹_{k+1} ]
              [ Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1} ]                                   (K.14)

where

    S_{k+1} = H_{k+1}Σ^xx_{k+1|k}H'_{k+1} + R_{k+1}                               (K.15)

Substitution of Eqs. (K.14) and (K.15) into Eq. (K.1) yields

    Σ_{k+1|k+1} = Σ_{k+1|k} − [ Σ^xx H'S⁻¹H Σ^xx   Σ^xx H'S⁻¹H Σ^xθ ]
                              [ Σ^θx H'S⁻¹H Σ^xx   Σ^θx H'S⁻¹H Σ^xθ ]_{k+1|k}     (K.16)
Therefore, component by component,

    Σ^xx_{k+1|k+1} = Σ^xx_{k+1|k} − Σ^xx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xx_{k+1|k}                       (K.17)
    Σ^θx_{k+1|k+1} = (Σ^xθ_{k+1|k+1})' = Σ^θx_{k+1|k} − Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xx_{k+1|k}   (K.18)
    Σ^θθ_{k+1|k+1} = Σ^θθ_{k+1|k} − Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xθ_{k+1|k}                       (K.19)

From Appendix L, the required gradient matrices are

    c_θ = ∂/∂θ [c₁(θ)  ⋯  c_n(θ)]                                                 (L.3)

Next define

    aⁱ_θ = [ ∂a_i1(θ)/∂θ₁   ∂a_i2(θ)/∂θ₁   ⋯ ]
           [ ∂a_i1(θ)/∂θ₂   ∂a_i2(θ)/∂θ₂   ⋯ ]
           [ ⋮ ]

the gradient of the ith row of A_k(θ) with respect to θ, so that

    (1/2) Σ_{i∈X} eⁱ tr[aⁱ_θ Σ^θx + (aⁱ_θ)'Σ^xθ] = Σ_{i∈X} eⁱ tr[aⁱ_θ Σ^θx]       (M.5)–(M.7)

Appendix M
Projection of the Augmented State Vector

Hence, the use of Eqs. (M.2) and (M.7) in Eq. (M.1) yields

    ẑ_{k+1|k} = [ x̂_{k+1|k} ] = [ A_k(θ_ok)x̂_{k|k} + B_k(θ_ok)u_k + c_k(θ_ok) + Σ_{i∈X} eⁱ tr[aⁱ_θ Σ^θx_{k|k}] ]
                [ θ̂_{k+1|k} ]   [ D_k θ̂_{k|k}                                                                  ]   (M.8)

and since θ̂ does not differ from θ_o, we need to use only the top half of Eq. (M.8); i.e.,

    x̂_{k+1|k} = A_k(θ_ok)x̂_{k|k} + B_k(θ_ok)u_k + c_k(θ_ok) + Σ_{i∈X} eⁱ tr[aⁱ_θ Σ^θx_{k|k}]   (M.9)

Next Eq. (9.78) can be used to project the variance one step ahead:

    Σ_{k+1|k} = f_z Σ_{k|k} f'_z + Q̃_k + (1/2) Σ_i Σ_j eⁱ(eʲ)' tr[fⁱ_zz Σ_{k|k} fʲ_zz Σ_{k|k}]   (M.10)

written in partitioned form with f_z = [fˣ_x  fˣ_θ ; f^θ_x  f^θ_θ] and Σ_{k|k} = [Σ^xx  Σ^xθ ; Σ^θx  Σ^θθ]_{k|k}.   (M.11)

And since

    fˣ = A_k(θ_k)x_k + B_k(θ_k)u_k + c_k(θ_k)        f^θ = D_k θ_k

we have fˣ_x = A_k(θ_k), f^θ_x = 0, and (see Appendix L)

    fˣ_θ = Σ_i eⁱ x̂'_{k|k}(aⁱ_θ)' + Σ_i eⁱ u'_k(bⁱ_θ)' + Σ_i eⁱ(cⁱ_θ)'        f^θ_θ = D_k   (M.12)

Substitution of Eqs. (M.3), (M.4), and (M.12) into Eq. (M.11) yields

    Σ_{k+1|k} = [ A  fˣ_θ ] Σ_{k|k} [ A  fˣ_θ ]′ + [ Q_k  0   ]
                [ 0  D    ]          [ 0  D    ]    [ 0    Γ_k ]
                + (1/2) Σ_{i∈X} Σ_{j∈X} eⁱ(eʲ)' tr{ [ 0     (aⁱ_θ)' ] Σ_{k|k} [ 0     (aʲ_θ)' ] Σ_{k|k} }   (M.13)
                                                    [ aⁱ_θ  0       ]         [ aʲ_θ  0       ]

¹ Note that the summations in the trace term are over X while the summations in Eq. (M.11) are over I. This explains why the correction term 𝒜 appears only in the upper left-hand corner in Eq. (M.13).

Then

    Σ_{k+1|k} = [ A  fˣ_θ ] Σ_{k|k} [ A  fˣ_θ ]′ + [ Q_k  0   ] + [ 𝒜  0 ]
                [ 0  D    ]          [ 0  D    ]    [ 0    Γ_k ]   [ 0  0 ]       (M.14)
where

    𝒜 = Σ_{i∈X} Σ_{j∈X} eⁱ(eʲ)' tr[ aⁱ_θ Σ^θx aʲ_θ Σ^θx + aⁱ_θ Σ^θθ(aʲ_θ)'Σ^xx ]_{k|k}   (M.15)

Carrying out the multiplication in Eq. (M.14) gives the components of the projection (with all Σ's on the right-hand side evaluated at k|k):

    Σ^xx_{k+1|k} = A_kΣ^xxA'_k + A_kΣ^xθ(fˣ_θ)' + fˣ_θΣ^θxA'_k + fˣ_θΣ^θθ(fˣ_θ)' + Q_k + 𝒜   (M.16)
    Σ^xθ_{k+1|k} = A_kΣ^xθD'_k + fˣ_θΣ^θθD'_k                                     (M.17)
    Σ^θx_{k+1|k} = D_kΣ^θxA'_k + D_kΣ^θθ(fˣ_θ)'                                   (M.18)
    Σ^θθ_{k+1|k} = D_kΣ^θθD'_k + Γ_k                                              (M.19)

Appendix N
Updating the Augmented State Vector

Begin with the augmented equation like Eq. (9.86)

    ẑ_{k+1|k+1} = ẑ_{k+1|k} + V_{k+1}[y_{k+1} − h_{z,k+1}ẑ_{k+1|k}]               (N.1)

and write it in augmented form as

    [ x̂ ]              [ x̂ ]
    [ θ̂ ]_{k+1|k+1} =  [ θ̂ ]_{k+1|k} + V_{k+1}( y_{k+1} − [h_{x,k+1} ⋮ h_{θ,k+1}] [ x̂ ]         )   (N.2)
                                                                                    [ θ̂ ]_{k+1|k}

where, from Eq. (K.6), for the case in which noisy measurements of x alone are available,

    h_{x,k+1} = H_{k+1}(θ̂_{k+1|k})        h_{θ,k+1} = Σ_i eⁱ x̂'_{k+1|k}(Hⁱ_θ)'    (N.3)

For the case in which h and H are not functions of θ, Eq. (N.3) can be rewritten as

    h_{x,k+1} = H_{k+1}        h_{θ,k+1} = 0                                      (N.4)

Substitution of Eqs. (N.3), (N.4), and (K.14) into Eq. (N.2) yields

    [ x̂ ]              [ x̂ ]              [ Σ^xx_{k+1|k}H'_{k+1}S⁻¹_{k+1} ]
    [ θ̂ ]_{k+1|k+1} =  [ θ̂ ]_{k+1|k} +   [ Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1} ] [y_{k+1} − H_{k+1}x̂_{k+1|k}]   (N.5)

where, from Eq. (K.15),

    S_{k+1} = H_{k+1}Σ^xx_{k+1|k}H'_{k+1} + R_{k+1}                               (N.6)

Then Eq. (N.5) can be rewritten as

    x̂_{k+1|k+1} = x̂_{k+1|k} + Σ^xx_{k+1|k}H'_{k+1}S⁻¹_{k+1}[y_{k+1} − H_{k+1}x̂_{k+1|k}]   (N.7)

and

    θ̂_{k+1|k+1} = θ̂_{k+1|k} + Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1}[y_{k+1} − H_{k+1}x̂_{k+1|k}]   (N.8)

Appendix O
The Sequential Certainty-Equivalence Method

Repeat the following calculations for each time period, beginning with k = 0.

Step 1. Generate the random vectors for the system noise v_k and the measurement noise w_{k+1}.

Step 2. Solve the certainty-equivalence problem from period k to period N and set u_k = u^CE_k, as given by Eq. (10.30).

Step 3. Obtain the actual value of the state vector with

    x_{k+1} = Ax_k + Bu_k + c_k + v_k                                             (O.1)

and the actual value of the measurement vector with

    y_{k+1} = H_{k+1}x_{k+1} + w_{k+1}                                            (O.2)

Step 4. Get x̂_{k+1|k} and θ̂_{k+1|k} by using Eqs.
(M.8) and (M.9):

    x̂_{k+1|k} = A_k(θ̂_{k|k})x̂_{k|k} + B_k(θ̂_{k|k})u_k + c_k(θ̂_{k|k}) + Σ_{i∈X} eⁱ tr[aⁱ_θ Σ^θx_{k|k}]   (O.3)

and

    θ̂_{k+1|k} = D_k θ̂_{k|k}                                                      (O.4)

Step 5. Get Σ_{k+1|k} by using Eqs. (M.16) to (M.19):

    Σ^xx_{k+1|k} = A_kΣ^xx_{k|k}A'_k + A_kΣ^xθ_{k|k}(fˣ_θ)' + fˣ_θΣ^θx_{k|k}A'_k + fˣ_θΣ^θθ_{k|k}(fˣ_θ)' + Q_k
                   + Σ_{i∈X} Σ_{j∈X} eⁱ(eʲ)' tr(aⁱ_θ Σ^θx aʲ_θ Σ^θx + aⁱ_θ Σ^θθ(aʲ_θ)'Σ^xx)   (O.5)
    Σ^θx_{k+1|k} = D_kΣ^θx_{k|k}A'_k + D_kΣ^θθ_{k|k}(fˣ_θ)'                       (O.6)
    Σ^θθ_{k+1|k} = D_kΣ^θθ_{k|k}D'_k + Γ_k                                        (O.7)

Step 6. For the case in which H is not a function of θ, use Eqs. (K.17) to (K.19) to get Σ_{k+1|k+1}:

    Σ^xx_{k+1|k+1} = Σ^xx_{k+1|k} − Σ^xx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xx_{k+1|k}
    Σ^θx_{k+1|k+1} = (Σ^xθ_{k+1|k+1})' = Σ^θx_{k+1|k} − Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xx_{k+1|k}
    Σ^θθ_{k+1|k+1} = Σ^θθ_{k+1|k} − Σ^θx_{k+1|k}H'_{k+1}S⁻¹_{k+1}H_{k+1}Σ^xθ_{k+1|k}

From the surviving portion of Appendix Q, the partitioned terms are

    𝒜 = [A'K^xx B + F] μ [B'K^xx A + F']
    ℬ = [A'K^xx B + F] μ [B'(K^xx fˣ_θ + K^xθ D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ]
    𝒞 = [(D'K^θx + (fˣ_θ)'K^xx)B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ [B'K^xx A + F']
    𝒟 = [(D'K^θx + (fˣ_θ)'K^xx)B + (Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ)'] μ [B'(K^xx fˣ_θ + K^xθ D) + Σ_{i∈X}((eⁱ)'pˣ)bⁱ_θ]   (Q.11)–(Q.12)

where μ = [Λ + B'K^xx B]⁻¹. From Eqs. (Q.9) and (Q.12) we can then determine the probing term

    J_{P,N−k} = −(1/2) Σ_{j=k+1}^{N−1} { tr(𝒜_j Σ^xx_{j|j}) + 2 tr(ℬ_j Σ^θx_{j|j}) + tr(𝒟_j Σ^θθ_{j|j}) }   (Q.13)

Appendix R
The Measurement-Error Covariance

Following the work of Conrad (1977), the revisions of the national-income accounts were used to obtain an estimate of the covariance matrix of the noise term of the measurement equations. This was done by assuming that the latest revision available is the true value and that the difference between this and the initial estimate is the size of the measurement error. Table R.1 gives the first reported values of GC58, GPI58, and GNP58, and Table R.2 gives the latest revision used in this study (those published in the Survey of Current Business on or before the November 1968 issue). The differences between these two series, of course, understate the magnitude of the true measurement errors.
Worse still, they may provide misleading estimates of the true measurement errors, since the series which have the largest true measurement errors may be the most difficult to revise and thus may be the series that show the least revision and therefore the smallest apparent errors. So the measurement errors shown in the revisions in Table R.3 reflect lower bounds on the true measurement errors. As this kind of work proceeds, it will be useful to attempt to obtain independent information on the magnitudes of the measurement errors by making detailed studies of some elements of the time series.

A glance at Table R.3 confirms that the revisions are serially correlated and have nonzero means. However, for purposes of this study, we have assumed that the measurement errors have zero means and are uncorrelated over time. For a study which exploits the information in the serial correlation and nonzero means of these statistics see Bar-Shalom and Wall (1978).

The covariance of these time series is given in Table R.4. This is the 3 × 3 matrix used for R, the covariance of the measurement noise. There is a slight inconsistency in the components for GNP58, since the model actually uses GNP58 − GNET58, that is, GNP net of net exports. However, the magnitude of this inconsistency is small.

Table R.1: First reported values, billions of 1958 dollars

Quarter    GC58    GPI58   GNP58
64-I       364.5    83.8   567.1
64-II      369.8    85.2   575.9
64-III     377.3    86.0   582.6
64-IV      376.8    90.2   584.7
65-I       385.9    94.7   597.5
65-II      390.2    93.0   601.4
65-III     396.7    92.9   609.7
65-IV      403.3   100.5   624.4
66-I       409.9   100.9   633.6
66-II      412.2   106.3   643.5
66-III     418.3   102.5   649.3
66-IV      418.5   106.4   657.2
67-I       422.0    95.7   656.7
67-II      430.6    91.3   664.7
67-III     431.5    96.4   672.0
67-IV      434.0   103.0   679.6
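The construction of Tables R.3 and R.4 from the first-reported and latest-revised series (Tables R.1 and R.2) can be reproduced directly: subtract the latest revision from the first report and take the covariance (with divisor 16, i.e. the population convention) of the three revision series. A sketch with both series hard-coded:

```python
import numpy as np

# first reported values (Table R.1) and latest revisions (Table R.2),
# quarters 64-I through 67-IV; columns GC58, GPI58, GNP58
first = np.array([
    [364.5,  83.8, 567.1], [369.8,  85.2, 575.9], [377.3,  86.0, 582.6],
    [376.8,  90.2, 584.7], [385.9,  94.7, 597.5], [390.2,  93.0, 601.4],
    [396.7,  92.9, 609.7], [403.3, 100.5, 624.4], [409.9, 100.9, 633.6],
    [412.2, 106.3, 643.5], [418.3, 102.5, 649.3], [418.5, 106.4, 657.2],
    [422.0,  95.7, 656.7], [430.6,  91.3, 664.7], [431.5,  96.4, 672.0],
    [434.0, 103.0, 679.6]])
latest = np.array([
    [366.3,  85.3, 571.1], [370.7,  87.3, 578.6], [378.6,  87.6, 585.8],
    [379.3,  90.8, 588.5], [387.9,  96.9, 601.6], [393.4,  96.8, 610.4],
    [400.3,  99.6, 622.5], [409.2, 103.4, 636.6], [415.7, 106.1, 648.6],
    [414.8, 109.5, 653.3], [420.0, 107.4, 659.5], [420.6, 112.3, 667.1],
    [424.8,  99.8, 665.7], [431.2,  94.2, 669.2], [431.8,  99.3, 675.6],
    [434.1, 104.7, 681.8]])

revisions = first - latest              # reproduces Table R.3
R = np.cov(revisions.T, bias=True)      # reproduces Table R.4 (divisor 16)
print(np.round(R, 2))
```

To two decimals this matches Table R.4 up to the rounding of the published entries.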
Table R.2: Latest revision, billions of 1958 dollars

Quarter    GC58    GPI58   GNP58
64-I       366.3    85.3   571.1
64-II      370.7    87.3   578.6
64-III     378.6    87.6   585.8
64-IV      379.3    90.8   588.5
65-I       387.9    96.9   601.6
65-II      393.4    96.8   610.4
65-III     400.3    99.6   622.5
65-IV      409.2   103.4   636.6
66-I       415.7   106.1   648.6
66-II      414.8   109.5   653.3
66-III     420.0   107.4   659.5
66-IV      420.6   112.3   667.1
67-I       424.8    99.8   665.7
67-II      431.2    94.2   669.2
67-III     431.8    99.3   675.6
67-IV      434.1   104.7   681.8

Table R.3: Size of revisions, billions of 1958 dollars

Quarter    GC58   GPI58   GNP58
64-I       -1.8    -1.5    -4.0
64-II      -0.9    -2.1    -2.7
64-III     -1.3    -1.6    -3.2
64-IV      -2.5    -0.6    -3.8
65-I       -2.0    -2.2    -4.1
65-II      -3.2    -3.8    -9.0
65-III     -3.6    -6.7   -12.8
65-IV      -5.9    -2.9   -12.2
66-I       -5.8    -5.2   -15.0
66-II      -2.6    -3.2    -9.8
66-III     -1.7    -4.9   -10.2
66-IV      -2.1    -5.9    -9.9
67-I       -2.8    -4.1    -9.0
67-II      -0.6    -2.9    -4.5
67-III     -0.3    -2.9    -3.6
67-IV      -0.1    -1.7    -2.2

Table R.4: Covariance of revisions

           GC58   GPI58   GNP58
GC58       2.71    1.12    5.52
GPI58      1.12    2.78    5.42
GNP58      5.52    5.42   16.22

Appendix S
Data for Deterministic Problem

Quarter    GC58   GPI58     YN    GNET58   GGE58
47-I       203.4   51.3   293.3   15.1     38.6
47-II      207.0   48.9   295.7   13.3     39.8
47-III     207.4   48.6   296.6   13.0     40.7
47-IV      207.3   57.1   304.8    9.70    40.3
48-I       208.5   59.8   309.4    7.70    41.1
48-II      210.7   60.9   317.1    5.80    45.5
48-III     211.1   61.3   320.2    5.60    47.8
48-IV      212.8   59.7   323.2    5.50    50.7
49-I       213.2   52.3   316.7    7.80    51.3
49-II      216.3   45.0   315.0    7.50    53.8
49-III     216.8   48.6   319.6    6.50    54.2
49-IV      219.7   46.0   319.5    3.80    53.8
50-I       223.5   59.1   336.0    3.60    53.4
50-II      227.6   66.3   345.1    3.40    51.3
50-III     238.8   70.8   361.3    1.50    51.7
50-IV      232.1   81.0   367.8    2.30    54.8
51-I       236.0   71.7   372.1    2.70    64.4
51-II      230.0   75.1   376.7    4.80    71.7
51-III     232.0   70.0   381.9    6.80    79.9
51-IV      233.3   63.0   381.9    6.80    85.6
Quarter    GC58   GPI58     YN    GNET58   GGE58
52-I       233.7   63.8   385.4   6.00     87.8
52-II      238.1   56.0   385.8   3.80     91.7
52-III     239.1   58.6   392.3   1.60     94.6
52-IV      246.8   63.6   404.7    .60     94.4
53-I       250.1   63.4   411.1   1.00     97.7
53-II      251.5   64.2   415.6    .80     99.9
53-III     251.1   61.5   412.6   1.10    100.0
53-IV      250.4   55.7   407.3   1.50    101.3
54-I       250.8   56.3   401.1   1.80     94.1
54-II      253.3   57.0   399.1   3.00     88.8
54-III     256.9   59.8   403.9   3.30     87.2
54-IV      261.9   64.3   411.7   4.00     85.4
55-I       267.6   70.8   423.9   4.10     85.5
55-II      273.0   75.5   432.7   2.70     84.2
55-III     276.3   76.9   439.0   3.10     85.8
55-IV      279.0   78.5   443.6   2.80     85.1
56-I       279.8   75.5   440.4   3.20     85.2
56-II      280.3   74.5   440.6   5.00     85.8
56-III     280.8   74.0   439.2   5.30     84.3
56-IV      284.7   73.3   443.6   6.70     85.7
57-I       286.6   70.5   446.1   7.30     89.0
57-II      287.0   69.9   446.2   7.00     89.4
57-III     289.3   70.9   449.2   6.00     89.1
57-IV      289.7   64.0   443.6   4.60     89.9
58-I       285.6   57.50  435.0   2.5      91.80
58-II      287.5   56.00  437.0   2.5      93.60
58-III     291.9   61.60  448.3   2.4      94.80
58-IV      295.2   68.50  460.3   1.3      96.50
59-I       302.3   70.90  468.7   -.10     95.50
59-II      307.0   78.50  480.6   -.70     95.10
59-III     310.0   70.20  474.4    .60     94.30
59-IV      310.1   75.00  479.2   1.2      94.20
60-I       313.9   79.90  487.6   2.6      93.90
60-II      317.8   73.50  485.9   3.9      94.70
60-III     316.5   71.00  482.9   4.5      95.40
60-IV      316.5   65.20  477.6   6.2      95.90
Quarter    GC58   GPI58     YN    GNET58   GGE58
61-I       316.3   62.40  476.3   6.4      97.60
61-II      320.5   67.80  487.9   5.0      99.50
61-III     324.0   71.20  497.2   4.4     102.0
61-IV      329.6   74.70  507.2   4.7     102.9
62-I       333.5   77.20  516.2   3.5     105.5
62-II      335.9   79.00  522.7   5.2     107.8
62-III     340.3   80.60  528.7   4.9     107.8
62-IV      344.8   80.70  534.1   4.4     108.5
63-I       348.3   78.70  537.2   4.0     110.3
63-II      350.0   80.50  539.1   5.8     108.7
63-III     355.1   83.00  548.2   5.5     110.0
63-IV      356.4   86.90  552.9   7.1     109.6
64-I       366.3   85.30  562.0   9.1     110.4
64-II      370.7   87.30  570.6   8.0     112.6
64-III     378.6   87.60  577.4   8.4     111.2
64-IV      379.3   90.80  580.6   7.9     110.5
65-I       387.9   96.90  596.2   5.4     111.4
65-II      393.4   96.80  603.4   7.0     113.1
65-III     400.3   99.60  615.8   6.7     115.9
65-IV      409.2  103.4   630.9   5.7     118.4
66-I       415.7  106.1   643.3   5.3     121.5
66-II      414.8  109.5   649.0   4.3     124.7
66-III     420.0  107.4   655.9   3.6     128.5
66-IV      420.6  112.3   664.2   2.9     131.3
67-I       424.8   99.80  662.7   3.0     138.1
67-II      431.2   94.20  666.4   2.8     141.0
67-III     431.8   99.30  672.5   3.1     141.4
67-IV      434.1  104.7   680.8   1.0     142.0
68-I       444.9  101.5   692.8   -.10    146.5
68-II      447.5  107.3   704.0   -.60    149.2
68-III     455.7  105.8   711.6    .70    150.1
68-IV      455.4  113.1   719.7   -1.3    151.2
69-I       460.1  113.1   725.8   -2.3    152.5

Appendix T
Solution to the Macroeconomic Model with Measurement Error

This appendix presents the detailed results for one Monte Carlo run of the macroeconomic model with measurement error discussed in Chap. 12. In particular, the results are for the fourth Monte Carlo run. Graphical results are displayed in the chapter. This appendix contains both the actual random elements used in the run and the numerical results, so that others can check these results and debug their own computer codes.

T.1 Random Elements

Four sets of random elements are required for each Monte Carlo run:

1. The system noise terms v_k in Eq. (12.8) for each time period, k = 0, 1, ..., N − 1

2. The measurement-noise terms w_k in Eq. (12.9) for each time period, k = 1, 2, ..., N

3.
The initial state-variable measurement error ξ, defined by

    x̂_{0|0} = x₀ + ξ                                                              (T.1)

where x̂_{0|0} is the initial estimate of the state vector and ξ is the initial-state-variable measurement error

4. The initial-parameter-vector error η, defined by

    θ̂_{0|0} = θ₀ + η                                                              (T.2)

where θ̂_{0|0} = initial estimate of parameter vector, θ₀ = true value of parameter vector, and η = initial-parameter-vector error.

For all the Monte Carlo runs x₀ and θ₀ were set as

    x₀ = [460.1, 113.1]'                                                          (T.3)

and

    θ₀ = [1.014, .002, −.004, −1.312, .093, .753, −.100, .448]'                   (T.4)

The covariance Q for the additive-error terms [see Eq. (12.10)] was used in the Monte Carlo routine to generate the system noise terms v_k. For Monte Carlo run 4 these values were

    v₀ = [.27538, 3.6310]'     v₁ = [2.8660, 1.7421]'     v₂ = [1.2624, .36733]'
    v₃ = [4.2377, 2.1751]'     v₄ = [1.4935, 1.1975]'     v₅ = [3.9079, .88018]'
    v₆ = [2.2937, 3.2589]'                                                        (T.5)

The covariance R for the measurement-error term [see Eq. (12.11)] was used to generate the measurement-noise terms w_k. For Monte Carlo run 4 these values
The value for the parameter a11 in this vector is the first element, 1.0301, and this is used for the initial value of the parameter a11 in Fig. 12.4 for all three control methods.

T.2 Results

This section presents the results for Monte Carlo run 4. The costs (in thousands) for the three methods for this particular run were

    Dual = 23.72        OLF = 23.69        CE = 23.94

The results for this particular run are consistent with the overall results, which found the Dual and OLF costs to be close to each other and somewhat better (lower) than the CE solution cost. The state-variable results are given in Table T.1, the control-variable results are in Table T.2, and the parameter-estimation results are in Table T.3. These results correspond to Figs. 12.1 and 12.2, 12.3, and 12.4 to 12.11, respectively. Table T.4 contains the approximate cost-to-go for periods 0, 1, and 6. These results correspond to Figs. 12.12, 12.13, and 12.18. A discussion of these results is given in Chap. 12 along with the figures.

[Tables T.1 (state variables), T.2 (control variables), T.3 (parameter estimates), and T.4 (approximate cost-to-go for periods 0, 1, and 6) appeared here as rotated tables; their contents are not recoverable from this scan.]
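The statement that Dual and OLF are close to each other and somewhat better than CE can be quantified directly from the three run-4 costs above; this small calculation is an illustration, not from the book:

```python
# Run-4 costs (in thousands) for the three methods, as reported above.
costs = {"Dual": 23.72, "OLF": 23.69, "CE": 23.94}

# Express each cost as a percentage above the cheapest method (OLF here).
best = min(costs.values())
gap_pct = {m: 100.0 * (c - best) / best for m, c in costs.items()}
# Dual is about 0.13 percent above OLF, while CE is about 1.06 percent above OLF.
```

So on this run the Dual-OLF gap is an order of magnitude smaller than the gap separating either method from CE, which is the pattern the chapter reports across runs.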
Appendix U

Changes in the Second Edition

As is discussed in the Preface to the Second Edition, the changes from the First Edition to the Second Edition consist primarily of providing an electronic version of the book that can be posted on the Internet. However, many minor corrections and a few major ones have been made in the process of creating the electronic version of the book. Many of the minor changes are corrections of transpose signs. Some major changes are listed below.

Eq. (10-64) has been changed from

    x_{k+1} = A_k x̂_{k|k} + B_k u_k + c_k + w_k                    (U.1)

to

    x_{k+1} = A_k x_k + B_k u_k + c_k + v_k                        (U.2)

Eq. (10-65) has been changed from

    θ_{k+1} = D θ̂_{k|k} + η_k                                      (U.3)

to

    θ_{k+1} = D θ_k + η_k                                          (U.4)

Eq. (M-1) has been changed from

    ẑ_{k+1|k} ≈ f(ẑ_{k|k}, u_k) + (1/2) Σ_{i∈I} e^i tr[f^i_zz Σ_{k|k}]        (U.5)

to

    ẑ_{k+1|k} ≈ f(ẑ_{k|k}, u_k) + (1/2) Σ_{i∈I} e^i tr[f^i_zz Σ^zz_{k|k}]     (U.6)

Bibliography

Abel, Andrew B. (1975): A Comparison of Three Control Algorithms to the Monetarist-Fiscalist Debate, Ann. Econ. Soc. Meas., 4(2):239-252, Spring.

Ando, Albert, Alfred Norman, and Carl Palash (1978): On the Application of Optimal Control to a Large Scale Econometric Model, in A. Bensoussan, T. Kleindorfer, and S. H. S. Tapiero (eds.), "Applied Optimal Control," vol. 9 of "Studies in the Management Sciences," North-Holland, Amsterdam.

Aoki, Masanao (1967): "Optimization of Stochastic Systems," Academic, New York.

Aoki, Masanao (1973): Sufficient Conditions for Optimal Stabilization Policies, Rev. Econ. Stud., 40:131-138, January.

Aoki, Masanao (1974a): Noninteracting Control of Macroeconomic Variables: Implication on Policy Mix Considerations, J. Econometr., 2(4):261-281.

Aoki, Masanao (1974b): Stochastic Control Theory in Economics: Applications and New Problems, IFAC Symp. Stochastic Control, Budapest.

Aoki, Masanao (1976): "Dynamic Economic Theory and Control in Economics," American Elsevier, New York.

Arrow, Kenneth J. (1968): Applications of Control Theory to Economic Growth, Lect. Appl. Math.: Math. Decision Sci., pt. 2, vol.
12, American Mathematical Society, Providence, R.I.

Ashley, Richard Arthur (1976): Postponed Linear Approximation in Stochastic Multiperiod Problems, Ph.D. dissertation, University of California, San Diego, Department of Economics.

Ashley, Richard Arthur (1979): Postponed Linear Approximations and Adaptive Control with Non-quadratic Losses, J. Econ. Dynam. Control, 1(4):347-360, November.

Athans, Michael (1972): The Discrete Time Linear-Quadratic-Gaussian Stochastic Control Problem, Ann. Econ. Soc. Meas., 1(4):449-492.

Athans, Michael, and Peter L. Falb (1966): "Optimal Control," McGraw-Hill, New York.

Athans, Michael, and D. Kendrick (1974): Control Theory and Economics: A Survey, Forecast, and Speculations, IEEE Trans. Autom. Control, 19(5):518-523, October.

Athans, Michael, Richard Ku, and Stanley B. Gershwin (1977): The Uncertainty Threshold Principle, IEEE Trans. Autom. Control, AC-22:491-495, June.

Athans, Michael, Edwin Kuh, Lucas Papademos, Robert Pindyck, Richard Ku, Turgay Ozkan, and Kent Wall (1975): Sequential Open Loop Optimal Control of a Nonlinear Macroeconomic Model, 3d World Congr. Econometric Soc., Toronto.

Athans, Michael, R. P. Wishner, and A. Bertolini (1968): Suboptimal State Estimation for Continuous Time Nonlinear Systems with Discrete Noise Measurements, IEEE Trans. Autom. Control, 13(5):504-514, October.

Ayres, Frank, Jr. (1962): "Theory and Problems of Matrices," Schaum, Waltham, Mass.

Bar-Shalom, Yaakov, and R. Sivan (1969): On the Optimal Control of Discrete-Time Linear Systems with Random Parameters, IEEE Trans. Autom. Control, AC-14:3-8, February.

Bar-Shalom, Yaakov, and Edison Tse (1976a): Caution, Probing and the Value of Information in the Control of Uncertain Systems, Ann. Econ. Soc. Meas., 5(2):323-338, Spring.

Bar-Shalom, Yaakov, and Edison Tse (1976b): Concepts and Methods in Stochastic Control, Control Dynam. Syst.: Adv. Theory Appl., 12:99-172.

Bar-Shalom, Yaakov, Edison Tse, and R. E.
Larson (1974): Some Recent Advances in the Development of Closed-Loop Stochastic Control and Resource Allocation Algorithms, Proc. IFAC Symp. Adaptive Control, Budapest.

Bar-Shalom, Yaakov, and Kent Wall (1978): Effect of Uncertainties on the Adaptive Control of Macroeconomic Systems, International Federation of Automatic Control (IFAC) Conference, Sweden.

Bellman, Richard (1957): "Dynamic Programming," Princeton University Press, Princeton, N.J.

Bellman, Richard, and Stuart Dreyfus (1962): "Applied Dynamic Programming," Princeton University Press, Princeton, N.J.

Boggard, P. J. M. van den, and H. Theil (1959): Macrodynamic Policy Making: An Application of Strategy and Certainty Equivalence Concepts to the Economy of the United States, 1933-36, Metroeconomica, 11:149-167.

Bowman, H. Woods, and Anne Marie Laporte (1972): Stochastic Optimization in Recursive Equation Systems and Random Parameters, Ann. Econ. Soc. Meas., 1(4):419-436.

Bray, Jeremy (1974): Predictive Control of a Stochastic Model of the U.K. Economy Simulating Present Policy Making Practice by the U.K. Government, Ann. Econ. Soc. Meas., 3(1):239-256, January.

Bray, Jeremy (1975): Optimal Control of a Noisy Economy with the U.K. as an Example, J. Roy. Statist. Soc., 138A:339-366.

Brito, D. L., and D. D. Hester (1974): Stability and Control of the Money Supply, Q. J. Econ., 88(2):278-303, May.

Bryson, Arthur E., Jr., and Yu-Chi Ho (1969): "Applied Optimal Control," Blaisdell, Waltham, Mass.

Burger, Albert E., Lionel Kalish III, and Christopher T. Babb (1971): Money Stock Control and Its Implications for Monetary Policy, Fed. Reserv. Bank St. Louis Rev., 53:6-22, October.

Cheng, David C., and San Wan (1972): Time Optimal Control of Inflation, Georgia Institute of Technology, College of Industrial Management (photocopy).

Chow, Gregory C. (1967): Multiplier, Accelerator, and Liquidity Preference in the Determination of National Income in the United States, Rev. Econ.
Statist., 49(1):1-15, February.

Chow, Gregory C. (1970): Optimal Stochastic Control of Linear Economic Systems, J. Money Credit Banking, 1:411-425.

Chow, Gregory C. (1972): How Much Could Be Gained by Optimal Stochastic Control Policies, Ann. Econ. Soc. Meas., 1(4):391-406.

Chow, Gregory C. (1973): Effect of Uncertainty on Optimal Control Policies, Int. Econ. Rev., 14:632-645.

Chow, Gregory C. (1975): "Analysis and Control of Dynamic Systems," Wiley, New York.

Conrad, William E. (1977): Imperfect Observation and Systematic Policy Error, Ann. Econ. Soc. Meas., 6:3.

Cooper, J. Phillip, and Stanley Fischer (1975): A Method for Stochastic Control of Nonlinear Econometric Models and an Application, Econometrica, 43(1):147-162, January.

Craine, Roger, Arthur Havenner, and Peter Tinsley (1976): Optimal Macroeconomic Control Policies, Ann. Econ. Soc. Meas., 5(2):191-203, Spring.

Curry, R. E. (1969): A New Algorithm for Suboptimal Stochastic Control, IEEE Trans. Autom. Control, AC-14:533-536.

Davidon, W. C. (1959): Variable Metric Method for Minimization, AEC Res. Dev. Rep. ANL-5990.

Denham, W. (1964): Choosing the Nominal Path for a Dynamic System with Random Forcing Function to Optimize Statistical Performance, Harvard Univ. Div. Eng. Appl. Phys., TR449.

Dersin, Pierre, Michael Athans, and David A. Kendrick (1979): Some Properties of the Dual Adaptive Stochastic Control Algorithm, M.I.T. Lab. Inf. Decis. Sci., LIDS-P-936, August.

Deshpande, J. G., T. N. Upadhyay, and D. G. Lainiotis (1973): Adaptive Control of Linear Stochastic Systems, Automatica, 9:107-115, January.

Dobell, A. R. (1969): Some Characteristic Features of Optimal Problems in Economic Theory, IEEE Trans. Autom. Control, AC-14(1):39-46, February.

Dobell, A. R., and Y. C. Ho (1967): Optimal Investment Policy: An Example of a Control Problem in Economic Theory, IEEE Trans. Autom. Control, AC-12(1):4-14, February.

Drud, Arne (1976): "Methods for Control of Complex Dynamic Systems," Tech.
Univ. Denmark, Inst. Math. Statist. Oper. Res., no. 27.

Drud, Arne (1977): An Optimization Code for Nonlinear Econometric Models Based on Sparse Matrix Techniques and Reduced Gradients, I: Theory, Technical University of Denmark, Department of Mathematical Statistics and Operations Research (photocopy).

Eijk, C. J. van, and J. Sandee (1959): Quantitative Determination of an Optimal Economic Policy, Econometrica, 27:1-13.

Erickson, D. L. (1968): Sensitivity Constrained Optimal Control Policies for a Dynamic Model of the U.S. National Economy, Ph.D. dissertation, University of California, Los Angeles, School of Engineering.

Erickson, D. L., C. T. Leondes, and F. E. Norton (1970): Optimal Decision and Control Policies in the National Economy, Proc. 9th IEEE Symp. Adaptive Process. Decis. Control, Univ. Texas, Austin, December, pp. XII.2.1-XII.2.6.

Erickson, D. L., and F. E. Norton (1973): Application of Sensitivity Constrained Optimal Control to National Economic Policy, Control Dynam. Syst., 9:131-237.

Fair, Ray C. (1974): On the Solution of Optimal Control Problems as Maximization Problems, Ann. Econ. Soc. Meas., 3(1):135-154, January.

Fair, Ray C. (1976): "A Model of Macroeconomic Activity," vol. II: "The Empirical Model," Ballinger, Cambridge, Mass.

Fair, Ray C. (1978a): The Effects of Economic Events on Votes for President, Rev. Econ. Statist., 60:159-173, May.

Fair, Ray C. (1978b): The Use of Optimal Control Techniques to Measure Economic Performance, Int. Econ. Rev., 19:289-309, June.

Farison, J. B., R. E. Graham, and R. C. Shelton (1967): Identification and Control of Linear Discrete Systems, IEEE Trans. Autom. Control, AC-12(4):438-442, August.

Fischer, Joachim, and Götz Uebe (1975): Stability and Optimal Control of a Large Linearized Econometric Model for Germany, Technische Universität München, Institut für Statistik und Unternehmensforschung (photocopy).

Fisher, W. D. (1962): Estimation in the Linear Decision Model, Int. Econ.
Rev., 3:1-29.

Fitzgerald, V. W., H. N. Johnston, and A. J. Bayes (1973): An Interactive Computing Algorithm for Optimal Policy Selection with Nonlinear Econometric Models, Commonwealth Bureau of Census and Statistics, Canberra, Australia (photocopy).

Fletcher, R., and M. J. D. Powell (1963): A Rapidly Convergent Descent Method of Minimization, Comput. J., 6:163-168.

Fletcher, R., and C. M. Reeves (1964): Function Minimization by Conjugate Gradients, Comput. J., 7:149-154, July.

Friedman, Benjamin M. (1972): Optimal Economic Stabilization Policy: An Extended Framework, J. Polit. Econ., 80:1002-1022, September-October.

Friedman, Benjamin M., and E. Phillip Howrey (1973): Nonlinear Models and Linear Optimal Policies: An Evaluation, Harvard Inst. Econ. Res., Discuss. Pap. 316.

Gantmacher, F. R. (1960): "The Theory of Matrices," Chelsea, New York.

Garbade, Kenneth D. (1975a): "Discretionary Control of Aggregate Economic Activity," Lexington, Lexington, Mass.

Garbade, Kenneth D. (1975b): Discretion in the Choice of Macroeconomic Policies, Ann. Econ. Soc. Meas., 4(2):215-238, Spring.

Garbade, Kenneth D. (1976): On the Existence and Uniqueness of Solutions of Multiperiod Linear Quadratic Control Problems, Int. Econ. Rev., 17(3):719-732, October.

Geraci, Vincent J. (1976): Identification of Simultaneous Equation Models with Measurement Error, J. Econometr., 4(3):263-283, August.

Gill, P. E., W. Murray, S. M. Picken, H. M. Barber, and H. M. Wright (1976): Subroutines LNSRCH and NEWPTC, National Physical Laboratory, Teddington, NPL Algorithm Library, Ef/16/0 Fortran/02/76.

Goldberger, Arthur S. (1964): "Econometric Theory," Wiley, New York.

Gordon, Roger H. (1974): The Investment Tax Credit as a Supplementary Discretionary Stabilization Tool, Harvard University, Department of Economics, Cambridge, Mass. (photocopy).

Gupta, Surender K., Laurence H. Meyer, Frederic Q.
Raines, and Tzyh-Jong Tarn (1975): Optimal Coordination of Aggregate Stabilization Policies: Some Simulation Results, Ann. Econ. Soc. Meas., 4:253-270, Spring.

Healey, A. J., and F. Medina (1975): Economic Stabilization from the Monetaristic Viewpoint Using the Dynamic Phillips Curve Concept, University of Texas, Department of Mechanical Engineering, Austin (photocopy).

Healey, A. J., and S. Summers (1974): A Suboptimal Method for Feedback Control of the St. Louis Econometric Model, Trans. ASME, J. Dynam. Syst., Meas. Control, 96(4):446-454, December.

Henderson, D. W., and S. J. Turnovsky (1972): Optimal Macroeconomic Policy Adjustment under Conditions of Risk, J. Econ. Theory, 4:58-71.

Holbrook, Robert S. (1973): An Approach to the Choice of Optimal Policy Using Large Econometric Models, Bank Can. Staff Res. Stud., No. 8, Ottawa.

Holbrook, Robert S. (1974): A Practical Method for Controlling a Large Nonlinear Stochastic System, Ann. Econ. Soc. Meas., 3(1):155-176, January.

Holbrook, Robert S. (1975): Optimal Policy Choice under a Nonlinear Constraint: An Iterative Application of Linear Techniques, J. Money, Credit Banking, 7(1):33-49, February.

Holly, Sean, Berc Rustem, and Martin B. Zarrop (eds.) (1979): "Optimal Control for Econometric Models: An Approach to Economic Policy Formulation," Macmillan, London.

Holt, C. C. (1962): Linear Decision Rules for Economic Stabilization and Growth, Q. J. Econ., 76:20-45.

IMSL Library 3 (1974): Edition 3 (Fortran 2.4), International Mathematical and Statistical Libraries, 6200 Hillcroft, Suite 510, Houston, Tex.

Intriligator, Michael D. (1971): "Mathematical Optimization and Economic Theory," Prentice-Hall, Englewood Cliffs, N.J.

Intriligator, Michael D. (1975): Applications of Optimal Control Theory in Economics, Synthese, 31:271-288.

Jacobson, D. H., and D. Q. Mayne (1970): "Differential Dynamic Programming," American Elsevier, New York.

Kareken, J. H., T. Muench, and N.
Wallace (1973): Optimal Open Market Strategy: The Use of Information Variables, Am. Econ. Rev., 63:156-172.

Kaul, T. K., and D. S. Rao (1975): Digital Simulation and Optimal Control of International Short-Term Capital Movements, 3d World Congr. Econometric Soc., Toronto.

Kendrick, D. A. (1973): Stochastic Control in Macroeconomic Models, Inst. Elec. Eng. IEE Conf. Publ. 101, pp. 200-207.

Kendrick, D. A. (1976): Applications of Control Theory to Macroeconomics, Ann. Econ. Soc. Meas., 5(2):171-190.

Kendrick, D. A. (1978): Non-convexities from Probing an Adaptive Control Problem, Econ. Lett., 1:347-351.

Kendrick, D. A. (1979): Adaptive Control of Macroeconomic Models with Measurement Error, chap. 9 in Holly, Rustem, and Zarrop (1979).

Kendrick, D. A. (1980): Control Theory with Application to Economics, chap. 4 in Kenneth J. Arrow and Michael D. Intriligator (eds.), "Handbook of Mathematical Economics," North-Holland, Amsterdam.

Kendrick, D. A. (1980a): "Caution and Probing in a Macroeconomic Model," Center for Economic Research, Univ. of Texas, Austin, Texas. Presented at the World Congress of the Econometric Society, Aix-en-Provence, France, August 1980.

Kendrick, D. A., and J. Majors (1974): Stochastic Control with Uncertain Macroeconomic Parameters, Automatica, 10(2):587-594.

Kendrick, D. A., H. Rao, and C. Wells (1970): Optimal Operation of a System of Waste Water Treatment Facilities, Proc. 9th IEEE Symp. Adaptive Process. Decis. Control, Univ. Texas, Austin.

Kendrick, D. A., and Lance Taylor (1970): Numerical Solutions of Nonlinear Planning Models, Econometrica, 38(3):453-467.

Kendrick, D. A., and Lance Taylor (1971): Numerical Methods and Nonlinear Optimizing Models for Economic Planning, chap. 1 in Hollis B. Chenery (ed.), "Studies in Development Planning," Harvard University Press, Cambridge, Mass.

Kim, Han K., Louis M. Goreux, and David A. Kendrick (1975): Feedback Control Rule for Cocoa Market Stabilization, chap. 9 in Walter C.
Labys (ed.), "Quantitative Models of Commodity Markets," Ballinger, Cambridge, Mass.

Klein, Lawrence R. (1979): Managing the Modern Economy: Econometric Specification, chap. 11 in Holly, Rustem, and Zarrop (1979).

Kmenta, Jan (1971): "Elements of Econometrics," Macmillan, New York.

Ku, R., and M. Athans (1973): On the Adaptive Control of Linear Systems Using the Open Loop Feedback Optimal Approach, IEEE Trans. Autom. Control, AC-18:489-493.

Ku, R., and M. Athans (1977): Further Results on the Uncertainty Threshold Principle, IEEE Trans. Autom. Control, AC-22(5):866-868.

Kydland, Finn (1973): Decentralized Macroeconomic Planning, Ph.D. dissertation, Carnegie-Mellon University, Pittsburgh.

Kydland, Finn (1975): Decentralized Stabilization Policies: Optimization and the Assignment Problem, Ann. Econ. Soc. Meas., 5(2):249-262.

Lasdon, L. S., S. K. Mitter, and A. D. Warren (1967): The Conjugate Gradient Method for Optimal Control Problems, IEEE Trans. Autom. Control, 12:132-138, April.

Livesey, D. A. (1971): Optimizing Short-Term Economic Policy, Econ. J., 81:525-546.

Livesey, D. A. (1976): A Minimal Realization of the Leontief Dynamic Input-Output Model, chap. 25 in K. Polenske and J. Školka (eds.), "Advances in Input-Output Analysis," Ballinger, Cambridge, Mass.

Livesey, D. A. (1977): On the Specification of Unemployment and Inflation in the Objective Function: A Comment, Ann. Econ. Soc. Meas., 6(3):291-293, Summer.

Livesey, D. A. (1978): Feasible Directions in Economic Policy, J. Optimization Theory Appl., 25(3):383-406.

MacRae, Elizabeth Chase (1972): Linear Decision with Experimentation, Ann. Econ. Soc. Meas., 1:437-447.

MacRae, Elizabeth Chase (1975): An Adaptive Learning Rule for Multiperiod Decision Problems, Econometrica, 43(5-6):893-906.

Mantell, J. B., and L. S. Lasdon (1977): Algorithms and Software for Large Econometric Control Problems, NBER Conf. Econ. Control, New Haven, Conn., May.

Miller, Ronald E.
(1979): "Dynamic Optimization and Economic Applications," McGraw-Hill, New York.

Murtagh, Bruce A., and Michael A. Saunders (1977): MINOS, A Large-Scale Nonlinear Programming System, Stanford Univ. Syst. Optimization Lab. Tech. Rep. SOL 77-9, February.

Norman, A. L. (1976): First Order Dual Control, Ann. Econ. Soc. Meas., 5(3):311-322, Spring.

Norman, A. L. (1979): Dual Control of Perfect Observations, pp. 343-349 in J. N. L. Janssen, L. M. Pau, and A. Straszak (eds.), "Models and Decision Making in National Economies," North-Holland, Amsterdam.

Norman, A. L., and M. R. Norman (1973): Behavioral Consistency Test of Econometric Models, IEEE Trans. Autom. Control, AC-18:465-472, October.

Norman, A. L., and Woo Sik Jung (1977): Linear Quadratic Control Theory for Models with Long Lags, Econometrica, 45(4):905-918.

Oudet, B. A. (1976): Use of the Linear Quadratic Approach as a Tool for Analyzing the Dynamic Behavior of a Model of the French Economy, Ann. Econ. Soc. Meas., 5(2):205-210, Spring.

Pagan, Adrian (1975): Optimal Control of Econometric Models with Autocorrelated Disturbance Terms, Int. Econ. Rev., 16(1):258-263, February.

Palash, Carl J. (1977): On the Specification of Unemployment and Inflation in the Objective Function, Ann. Econ. Soc. Meas., 6(3):275-300.

Paryani, K. (1972): Optimal Control of Linear Macroeconomic Systems, Ph.D. thesis, Michigan State University, Department of Electrical Engineering, East Lansing.

Perry, A. (1976): An Improved Conjugate Gradient Algorithm, Northwestern Univ. Dept. Decis. Sci. Tech. Note, Evanston, Ill.

Phelps, Edmund S., and John B. Taylor (1977): Stabilizing Properties of Monetary Policy under Rational Price Expectations, J. Polit. Econ., 85:163-190, February.

Phillips, A. W. (1954): Stabilization Policy in a Closed Economy, Econ. J., 64:290-323, June.

Phillips, A. W. (1957): Stabilization Policy and the Time Form of the Lagged Responses, Econ. J., 67:265-277, June.

Pindyck, Robert S.
(1972): An Application of the Linear Quadratic Tracking Problem to Economic Stabilization Policy, IEEE Trans. Autom. Control, AC-17(3):287-300, June.

Pindyck, Robert S. (1973a): "Optimal Planning for Economic Stabilization," North-Holland, Amsterdam.

Pindyck, Robert S. (1973b): Optimal Policies for Economic Stabilization, Econometrica, 41(3):529-560, May.

Pindyck, Robert S., and Steven M. Roberts (1974): Optimal Policies for Monetary Control, Ann. Econ. Soc. Meas., 3(1):207-238, January.

Pitchford, John, and Steve Turnovsky (1977): "Application of Control Theory to Economic Analysis," North-Holland, Amsterdam.

Polak, E., and G. Ribière (1969): Note sur la convergence de méthodes de directions conjuguées, Rev. Fr. Inf. Rech. Opér., 16(R1):35-43.

Prescott, E. C. (1967): Adaptive Decision Rules for Macroeconomic Planning, doctoral dissertation, Carnegie-Mellon University, Graduate School of Industrial Administration.

Prescott, E. C. (1971): Adaptive Decision Rules for Macroeconomic Planning, West. Econ. J., 9:369-378.

Prescott, E. C. (1972): The Multi-period Control Problem under Uncertainty, Econometrica, 40:1043-1058.

Preston, A. J., and K. D. Wall (1973): Some Aspects of the Use of State Space Models in Econometrics, Univ. London, Programme Res. Econometr. Methods Discuss. Pap. 5.

Rausser, Gordon (1978): Active Learning, Control Theory, and Agricultural Policy, Amer. J. Agricultural Economics, 60(3):476-490.

Rausser, Gordon, and J. Freebairn (1974): Approximate Adaptive Control Solution to the U.S. Beef Trade Policy, Ann. Econ. Soc. Meas., 3(1):177-204.

Rouzier, P. (1974): "The Evaluation of Optimal Monetary and Fiscal Policy with a Macroeconomic Model for Belgium," Catholic University of Louvain, Belgium.

Sandblom, C. L. (1970): On Control Theory and Economic Stabilization, Ph.D. dissertation, Lund University, Sweden, National Economy Institution.

Sandblom, C. L. (1975): Stabilization of a Fluctuating Simple Macroeconomic Model, Cybern.
Syst. Res., 2:251-262.

Sargent, T. J., and N. Wallace (1975): "Rational" Expectations, the Optimal Monetary Instrument and the Optimal Money Supply Rule, J. Polit. Econ., 83:241-254, April.

Sarris, Alexander H., and Michael Athans (1973): Optimal Adaptive Control Methods for Structurally Varying Systems, Natl. Bur. Econ. Res. Working Pap. 24, Cambridge, Mass., December.

Shanno, D. F. (1977): Conjugate Gradient Methods with Inexact Searches, Univ. Arizona Coll. Bus. Public Admin. Manage. Inf. Syst. Working Pap., Tempe, Ariz.

Shupp, Franklin R. (1972): Uncertainty and Stabilization Policies for a Nonlinear Macroeconomic Model, Q. J. Econ., 80(1):94-110, February.

Shupp, Franklin R. (1976a): Optimal Policy Rules for a Temporary Incomes Policy, Rev. Econ. Stud., 43(2):249-259, June.

Shupp, Franklin R. (1976b): Uncertainty and Optimal Policy Intensity in Fiscal and Incomes Policies, Ann. Econ. Soc. Meas., 5(2):225-238, Spring.

Shupp, Franklin R. (1976c): Uncertainty and Optimal Stabilization Policies, J. Public Financ., 6(4):243-253, November.

Shupp, Franklin R. (1977): Social Performance Functions and the Dichotomy Argument: A Comment, Ann. Econ. Soc. Meas., 6(3):295-300, Summer.

Simon, H. A. (1956): Dynamic Programming under Uncertainty with a Quadratic Criterion Function, Econometrica, 24:74-81, January.

Taylor, J. B. (1973): A Criterion for Multiperiod Control in Economic Models with Unknown Parameters, Columbia Univ. Dept. Econ. Discuss. Pap. 73-7406.

Taylor, J. B. (1974): Asymptotic Properties of Multiperiod Control Rules in the Linear Regression Model, Int. Econ. Rev., 15(2):472-482, June.

Thalberg, Björn (1971a): Stabilization Policy and the Nonlinear Theory of the Trade Cycle, Swed. J. Econ., 73:294-310.

Thalberg, Björn (1971b): A Note on Phillips' Elementary Conclusions on the Problems of Stabilization Policy, Swed. J. Econ., 73:385-408.

Theil, H. (1957): A Note on Certainty Equivalence in Dynamic Planning, Econometrica, 25:346-349, April.
Theil, H. (1964): "Optimal Decision Rules for Government and Industry," North-Holland, Amsterdam.

Theil, H. (1965): Linear Decision Rules for Macro-dynamic Policy Problems, in B. Hickman (ed.), "Quantitative Planning of Economic Policy," The Brookings Institution, Washington.

Theil, H. (1971): "Principles of Econometrics," Wiley, New York.

Tinsley, P., R. Craine, and A. Havenner (1974): On NEREF Solutions of Macroeconomic Tracking Problems, 3d NBER Stochastic Control Conf., Washington.

Tse, Edison, and Michael Athans (1972): Adaptive Stochastic Control for a Class of Linear Systems, IEEE Trans. Autom. Control, AC-17:38-52, February.

Tse, Edison, and Y. Bar-Shalom (1973): An Actively Adaptive Control for Linear Systems with Random Parameters, IEEE Trans. Autom. Control, AC-18:109-117, April.

Tse, Edison, Y. Bar-Shalom, and L. Meier (1973): Wide Sense Adaptive Dual Control for Nonlinear Stochastic Systems, IEEE Trans. Autom. Control, AC-18:98-108, April.

Turnovsky, Stephen J. (1973): Optimal Stabilization Policies for Deterministic and Stochastic Linear Systems, Rev. Econ. Stud., 40(121):79-96, January.

Turnovsky, Stephen J. (1974): Stability Properties of Optimal Economic Policies, Am. Econ. Rev., 64:136-147.

Turnovsky, Stephen J. (1975): Optimal Choice of Monetary Instruments in a Linear Economic Model with Stochastic Coefficients, J. Money Credit Banking, 7:51-80.

Turnovsky, Stephen J. (1977): Optimal Control of Linear Systems with Stochastic Coefficients and Additive Disturbances, chap. 11 in Pitchford and Turnovsky (1977).

Tustin, A. (1953): "The Mechanism of Economic Systems," Harvard University Press, Cambridge, Mass.

Upadhyay, Treveni (1975): Application of Adaptive Control to Economic Stabilization Policy, Int. J. Syst. Sci., 6(10):641-650.

Wall, K. D., and J. H. Westcott (1974): Macroeconomic Modelling for Control, IEEE Trans. Autom. Control, AC-19:862-873, December.

Wall, K. D., and J. H.
Westcott (1975): Policy Optimization Studies with a Simple BIBLIOGRAPHY 276 Control Model of the U.K. Economy, Proc. IFAC/75 Congress, Boston and Cambridge, Mass. Walsh, Peter, and J. B. Cruz (1975): Neighboring Stochastic Control of an Econometric Model, 4th NBER Stochastic Control Conf, Cambridge, Mass. Woodside, M. (1973): Uncertainty in Policy Optimization: Experiments on a Large Econometric Model, Inst. Elect. Eng. IEE Conf. Publ. 101, pp. 418— 429. You, Jong Keun (1975): A Sensitivity Analysis of Optimal Stochastic Control Policies, 4th NBER Stochastic Control Conf, Cambridge, Mass. Zellner, Arnold (1966): On Controlling, and Learning about a Normal Regression Model, University of Chicago, School of Business, Chicago (photocopy). Zellner, Arnold (1971): "An Introduction to Bayesian Inference in Econometrics," Wiley, New York. Zellner, Arnold, and M. V. Geisel (1968): Sensitivity of Control to Uncertainty and Form of the Criterion Function, pp. 269-283 in D. G. Watts (ed.), "The Future of Statistics," Academic, New York. Index Abel, Andrew B., 72, 146, 265 Adaptive control, 2, 71, 72, 77, 139, 140, 144 Additive error terms, 44^15 Additive uncertainty, 39^15 Agricultural problems, 46 Ando, Albert, 19, 28, 265 Aoki, Masanao, 1, 48, 57, 59, 77, 265 Arrow, Kenneth J., 1, 265 Ashley, Richard Arthur, 44, 265 Astrom Karl, 41 Äthans, Michael, x, 1, 19, 41, 45, 48, 57, 72, 99, 155, 184, 188, 265,266,268,271,274,275 Augmented state vector, 9, 233-238 Augmented system, 106-107 matrix recursions for, 207-216 vector recursions for, 217-221 Ayres, Frank, Jr., 178, 266 Babb, Christopher T., 59, 267 Backward integration, 54, 80, 81 Bar-Shalom, Yaakov, ix, x, 42, 43, 48, 57, 72, 79, 84-86, 90, 94, 97, 102, 104, 111, 121, 132, 139, 155, 173, 185, 246, 266, 275 Barber, H. M., 27, 269 Bayes, A. J., 19, 269 Bellman, Richard, 11, 266 Bertolini, A., 99, 184, 188,266 Boggard, R J. M. van den, 7, 266 Bowman, H. Woods, 59, 266 Bray, Jeremy, 45, 267 Brito, D. 
L., 45, 267 Bryson, Arthur E., Jr., 1, 4, 26, 100, 177, 178, 180, 182,267 BTL (Bar-Shalom, Tse, and Larson), 48, 84, 94, 97, 173, 185, 187 Buffer-stock level, viii, 12 Burger, Albert E., 59, 267 Cautionary term (component), 79, 80, 111, 112, 117, 118, 127, 130, 131, 155, 163, 165, 243, 244, 261 Certainty equivalence (CE), 40, 44, 77, 86, 108, 118, 125, 139-141,239 heuristic, 57 optimal cost-to-go problem, 204- 205 sequential, 57, 64, 144, 239-240 update, 57 Certainty equivalence (CE) sequential, 140 Cheng, David C, 19,267 Chow, Gregory C, 1, 6, 10, 45, 48, 57, 59, 72, 82, 267 Closed-loop policy, 43, 85 Commodity stabilization, viii, 11, 12 Conditional distribution, 178 277 INDEX 278 Conjugate gradient, 27, 88 Conrad, William E., 135, 246, 267 Consumption, 5, 7, 9, 30, 31, 135, 137, 142, 164, 258 Continuous-time problems, 4, 19 Control variables, 5-10 Control vector, 11 Cooper, J. Phillip, 59, 267 Cost-to-go, 49, 77, 80, 86, 110, 123, 155-164, 185-187,222-224, 243-245, 261 deterministic, 49, 243 expected, 49 optimal, 10-13, 49, 51, 79, 88, 204 random, 73 Costate equations, 20, 26 Costate variables, 25 Covariance matrices: projection of, 98-102 updating, 103 Craine, Roger, 7, 19, 267, 275 Criterion function, 36-37 quadratic, 36-37 Cruz, J. B., 59, 276 Curry, R. E., 48, 57, 267 Davidon, W. C, 27, 267 Denham, W., 45, 267 Dersin, Pierre, 155, 268 Deshpande, J. G., 72, 268 Deterministic control, 2, 4-37, 76 example of, 30-37 system equations, 30-36 examples of criterion function, 36-37 Deterministic cost-to-go, 49, 243 Deterministic problem, data for, 250- 253 Deterministic term (component), 79, 80, 111, 112, 117, 118, 127, 130, 131, 155, 163,243,261 Difference equations: nth-order, 5, 6, 9 first-order, 8, 9 second-order, 8 Discrete-time problems, 4 Dobell, A. R., 1, 268 Dreyfus, Stuart, 11,266 Drud, Arne, 21, 28, 29, 268 Dual control, 2, 71, 144, 146, 257 Dual-control algorithm, 113-120 Dynamic programming, 5, 10, 11, 13,48,86 Eijk, C. J. 
van, 7, 268 Endogenous variables, 33 Erickson, D. L., 7, 268 Error terms, additive, 33, 44-45 Expected cost-to-go, 49 Expected values of matrix products, 56-57 Expenditure, government, 5, 10, 30, 31, 164 Explicit form, 21,28 Fair, Ray C, 19, 37, 268 Falb, Peter L., 1,265 Farison, J. B., 48, 268 Feedback policy, 43, 85 Feedback rule, 5, 11, 12, 14 for deterministic problems, 15, 17, 205 for stochastic problems, 45, 53, 54 INDEX 279 Feedback-gain matrices, 54 Fiscal policy, viii, 135 Fischer, Joachim, 7, 269 Fischer, Stanley, 59, 267 Fisher, W. D., 1, 59, 269 Fitzgerald, V. W., 19, 269 Fletcher, R., 27, 269 Forward integration, 81 Freebairn, J., 57, 72, 274 Friedman, Benjamin M., 7, 19, 36, 269 Gantmacher, F. R., 179, 269 Garbade, Kenneth D., 21, 45, 269 Geisel, M. V, 59, 276 Generalized reduced gradient (GRG), 29 Geraci, Vincent J., 40, 134, 269 Gershwin, Stanley B., 266 Gill, P. E., 27, 269 Goldberger, Arthur S., 170, 200, 269 Gordon, Roger H., 45, 269 Goreux, Louis M., 12, 45, 271 Government expenditure, 5, 10, 31, 164 Government obligations, 10,35, 135, 258, 261 Government taxation, 5 Gradient conjugate, 27, 88 Gradient methods, 81, 86 for nonlinear problems, 25-27 Gradient vector, 208, 209, 212 Graham, R. E., 48, 268 Gross national product, 5, 9, 30, 31 Gupta, SurenderK., 19, 269 Hamiltonian, 25 Havenner, Arthur, 7, 19, 267, 275 Healey, A. J., 19, 270 Henderson, D. W., 59, 270 Hessians, 169 Hester, D. D., 45, 267 Heuristic certainty equivalence, 57 Hewett, Ed, x Ho, Yu-Chi, 1, 4, 26, 100, 177, 178, 180, 182,267,268 Holbrook, Robert S., 19, 270 Holly, Sean, 270 Holt, C. C, 1, 7, 270 Howrey, E. 
Phillip, 19, 269 Identifiability, 34 Identification, 33 Implicit form, 21,28 IMSL (International Mathematical and Statistical Libraries), 86 Inflation, viii, 5 Initial conditions, 26 Initialization, 115, 123 Integration: backward, 54, 81 forward, 81 Interest rates, 7 International Mathematical and Statistical Libraries (IMSL), 86 Intriligator, Michael D., 1, 10, 19, 270 Inventory, viii, 135 Investment, viii, 5, 9, 30, 31, 135, 137, 142, 144, 164, 258 nonresidential, 7 residential, 7 Jacobians, 169 Jacobson, D. H., 45, 270 Johnston, H. N., 19,269 INDEX Joint distribution, 178 Jung, Woo Sik, 9, 272 Kalish III, Lionel, 59, 267 Kaiman filter, 100, 120, 139, 146 second-order, 102, 177-184 Kang, Bo Hyun, x, 84, 104 Kareken, J. H., 45, 270 Kaul, T. K., 7, 270 Kendrick, David A., x, 1, 12, 26, 27, 45, 59, 72, 86, 88, 132, 137, 146, 155,265,268,270,271 Kim, Han K., 12,45,271 Kirkland, Connie, x Klein, Lawrence R., 19, 271 Kmenta, Jan, 33, 34, 271 Ku, Richard, 19, 48, 57, 266, 271 Kuh, Edwin, 19, 266 Kydland, Finn, 271 Lagrangian variable, 25 Lags, second-order, 8 Lainoitis, D. G., 72, 268 Lane, Susan, x Laporte, Anne Marie, 59, 266 Larson, R. E., 48, 84, 90, 94, 97, 173, 185,266 Lasdon,L. S., 27, 28, 271, 272 Learning: active, 2, 41-44, 71 examples of: MacRae problem, 121-132 nonlinear, 84-103 quadratic linear, 104-120 passive, 2, 41-44, 57, 76 example of, 58-68 Leondes, C. T., 7, 268 Line-search methods, 27 280 Livesey, D. A., 1, 19,36,272 Local optima, 119, 132, 163 MacRae problem, 121-132, 163 MacRae, Elizabeth Chase, 1, 58, 59, 64, 72, 82, 121, 272 Majors, J., 271 Mantell, J. B., 27, 28, 272 Matrix products, expected value of, 170-172 Matrix recursions for augmented systems, 207-216 Maximum-principle method, 5 Mayne, D. 
Q., 45, 270 Measurement error, 40-42, 134, 135, 246, 254-263 Measurement relationships, 43, 73, 136, 177 Measurement vector, 73, 85 Measurement-equation noise terms, 76, 254, 255 Measurement-error covariance, 137, 246 Measurement-noise terms, 76, 254, 255 Measurements, multiple, 74 Medina, E, 19, 270 Meier, Laurence H., 19, 48, 72, 84, 94,97, 102, 173, 185,275 Meyer, Laurence H., 269 Miller, Ronald E., 19,272 Mills, Peggy, x Mitter, S. K., 27, 271 Moment-generating function, 189, 190 Monetary policy, viii, 72, 135 Money supply, 7 Monte Carlo, 72, 76, 81, 115, 120, 139, 140, 142, 144, 146, INDEX 281 163,254,255,257 Motamen, Homa, x Muench, T., 45, 270 Multiplicative uncertainty, 46-57 Multiplier-accelerator model, 9, 30 Murray, W., 27, 269 Murtagh, Bruce A., 27, 28, 272 National Bureau of Economic Research, 31 Noise terms, 40, 42 measurement-equation, 76, 254, 255 system-equation, 76, 254, 255 Nominal path, 24, 45, 79, 87-88, 113, 126 Nonconvex shape, 132 Nonlinear problems, 19-29 gradient methods, 25-27 problem statement, 20-21 quadratic linear approximation method, 21-25 special problems: accuracy and roundoff errors, 27 inequality constraints on state variables, 29 large model size, 28 Norman, Alfred, ix, 1, 9, 19, 28, 57, 72, 82, 88, 106, 265, 272 Norman, M. R., 1, 19, 88, 272 Norton, R E., 7, 268 Notational equivalence, 33, 100 Open-loop feedback (OLF), 57, 59, 64, 68, 140, 141, 144, 257 Open-loop policy, 43 Open-market purchases, 5 Optimality conditions, 26 Optimality, principle of, 51 Oudet, B. A., 7, 272 Ozkan, Turgay, 19, 266 Pagan, Adrien, 44, 272 Palash, Carl J., 19, 28, 36, 265, 273 Papademos, Lucas, 19, 266 Parameter uncertainty, 40 Paryani, K., 7, 273 Penalties, 165 Perry, A., 27, 273 Perturbations, 41, 43, 71, 80 Phelps, Edmund S., 45, 273 Phillips, A. W., 1, 273 Picken, S. 
M., 27, 269 Pindyck, Robert, 19 Pindyck, Robert S., 1, 6, 7, 9, 35, 37, 45, 266, 273 Pitchford, John, 1, 19, 273 Polack, E., 27, 273 Postponed-linear-approximation method, 44 Powell, M. J. D., 27, 269 Predetermined variables, 34 Prescott, E. C, 1, 57, 72, 273 Preston, A. J., 273 Price level, 7, 9 Prices, viii Principle of optimality, 51 Probing, 80 Probing term (component), 79, 80, 112, 117, 119, 128, 130, 131, 155-165,243-245,261 Problem Statement State variables, 5-10 Production, viii Profit, viii Projections, 64, 79, 80, 98, 101, 233 INDEX 282 Quadratic criterion function, 36-37 Quadratic forms, 188-203 scalar case, 189 vector case, 190-200 Quadratic linear problems solution method, 10-18 Quadratic linear problems (QLP), 4-18 approximation, 21-25 problem statement, 5-10 Quadratic linear tracking problems, 6-8 Raines, Frederic Q., 19, 269 Random cost-to-go, 73 Random error term, 44 Rao, D. S., 7, 270 Rao, H., 27, 45, 271 Rausser, Gordon, 41, 57, 72, 273, 274 Recursions, 53, 94-95, 109 for augmented system: matrix, 207-216 vector, 217-221 Reduced gradient, generalized, 29 Reduced-form equation, 33 Reestimation method, 241 Reeves, CM., 27, 269 Ribiěre, G., 27, 273 Riccati equations deterministic, 13, 17 stochastic active-learning, 93, 109, 173 passive-learning, 54, 55 terminal conditions, 50, 54 Riccati matrices, 12, 53, 80, 117, 118, 127, 173-176,216 Rismanchian, Mohamed, x Rizo-Patron, Jorge, x, 188 Roberts, Steven M., 45, 273 Roundoff errors, 28 Rouzier, P., 19, 274 Růstem, Bere, 270 Sales, viii Sandblom, C. L., 7, 19, 274 Sandee, J., 7, 268 Sargent, T. J., 45, 274 Sarris, Alexander H., 41, 72, 274 Saunders, Michael A., 27, 28, 272 Search, 77, 81, 86, 113, 115, 124, 130-132 grid, 81, 163 Search-iteration counter, 77, 81 Serial correlation, 44 Shanno, D. F., 27, 274 Shelton, R. C, 48, 268 Shupp, Franklin R., 1, 7, 19, 36, 59, 274 Simon, H. 
A., 1, 44, 274 Sivan, R., 48, 57, 266 Stabilization cocoa-market, 45 commodity, viii, 11, 12 State equations, 20, 26 State variables inequality constraints on, 29 State vector, 11 augmented, 8, 233-238 Structural form, 32 Summers, S., 19, 270 System equations, 30, 229-232 second-order expansion of, 167-169 System-equation noise terms, 76, 254, 255 INDEX 283 Tarn, Tzyh-Jong, 19, 269 Taxation, 5 Taylor, JohnB., 45, 72, 273, 274 Taylor, Lance, 1,26,88,271 TBM (Tse, Bar-Shalom, and Meier), 48, 72, 82, 84, 94, 97, 102, 173, 182, 185, 187 Terminal conditions, 26 Thalberg, Björn, 7, 274, 275 Theil, Henri, 1, 7, 44, 189, 190, 266, 275 Time Series Processor (TSP), 139 Time-varying parameters, 140, 164 Tinsley, Peter, 7, 19, 267, 275 Tracking problems, quadratic linear, 6-8 TROLL system at M.I.T., 34 Tse, Edison, 42, 43, 48, 57, 72, 79, 84-86, 90, 94, 97, 102, 104, 111, 121, 132, 155, 173, 185,266,275 Turnovsky, Stephen J., 1,19, 48, 59, 270, 273, 275 Tustin, A., 1, 7, 275 Uebe, Götz, 7, 269 Uncertainty, 134 additive, 39-45 multiplicative, 46-57 parameter, 40 Unemployment, viii, 5, 7, 9 Upadhyay, Treveni, 72, 268, 275 Update, 81, 82, 102, 113, 115, 237-238 of augmented state covariance, 225-228 of covariance matrix, 103 of state and parameter estimates, 120 Update certainty equivalence, 57 Vector products, expected value of, 170-172 Vector recursions for augmented system, 217-221 Wall, Kent, 19, 45, 246, 266, 273, 275 Wallace, N., 45, 270, 274 Walsh, Peter, 59, 276 Wan, San, 19, 267 Warren, A. D., 27, 271 Weighting matrices, 37, 136 Wells, C, 27, 45, 271 Westcott, J. H., 45, 275 Wide-sense method, 86 Wishner, R. P., 99, 184, 188, 266 Woodside, M., 19, 276 Wright, H. M., 27, 269 You, Jong Keun, 7, 276 Zarrop, Martin B., 270 Zellner, Arnold, 1, 57, 59, 276