Econometrics - Lecture 1
Econometrics – First Steps

Contents
- Organizational Issues
- Some History of Econometrics
- An Introduction to Linear Regression
  - OLS: An Algebraic Tool
  - The Linear Regression Model
  - Gauss-Markov Assumptions
  - Small Sample Properties of the OLS Estimator
- Introduction to GRETL

Oct 4, 2019  Hackl, Econometrics, Lecture 1

Organizational Issues
- Course schedule

  Class  Date
  1      Fr, Oct 4
  2      Fr, Oct 11
  3      Fr, Oct 25
  4      Fr, Nov 1
  5      Fr, Nov 8
  6      Fr, Nov 15

- Time: 10:00-13:30 with a break of 30 minutes

Organizational Issues, cont'd
- Aims of the course
  - Use of econometric tools for analyzing economic data: specification of adequate models, identification of appropriate econometric methods, estimation of model parameters, interpretation of results
  - Introduction to commonly used econometric tools and techniques
  - Understanding of econometric concepts and principles
  - Use of GRETL

Example: Individual Wages
- Sample (US National Longitudinal Survey, 1987)
  - N = 3294 individuals (1569 females)
- Variable list
  - WAGE: wage (in 1980 $) per hour (p.h.)
  - MALE: gender (1 if male, 0 otherwise)
  - SCHOOL: years of schooling
  - EXPER: experience in years
  - AGE: age in years
- Questions of interest
  - Effect of gender on wage p.h.; average wage p.h.: 6.31 $ for males, 5.15 $ for females
  - Effects of education, of experience, of interactions, etc. on wage p.h.

Example: Income and Consumption
[Figure: time series of PCR and PYR]
- PCR: private consumption, real, in bn. EUR
- PYR: household's disposable income, real, in bn. EUR
- 1970:1-2003:4, 136 observations; base year 1995
- Source: AWM-Database

Organizational Issues, cont'd
- Literature
  - Course textbook
    - Marno Verbeek, A Guide to Modern Econometrics, 5th ed., Wiley, 2017; available in the MUNI Library.
  - Suggestions for further reading
    - Peter Kennedy, A Guide to Econometrics, 6th ed., Blackwell, 2008; available in the MUNI Library.
    - William H. Greene, Econometric Analysis, 8th ed., Prentice Hall, 2017.

Organizational Issues, cont'd
- Prerequisites are topics from
  - Linear algebra: linear equations, matrices, vectors (basic operations and properties); see M. Verbeek, Appendix A "Vectors and Matrices"
  - Descriptive statistics: measures of central tendency, measures of dispersion, measures of association, frequency tables, histogram, scatter plot, quantile
  - Theory of probability: probability and its properties, random variables and distribution functions in one and in several dimensions, moments, convergence of random variables, limit theorems, law of large numbers; see M. Verbeek, Appendix B "Statistical and Distribution Theory"
  - Mathematical statistics: point estimation, confidence interval, hypothesis testing, p-value, significance level

Organizational Issues, cont'd
- Teaching and learning method
  - Course in six blocks of 3 hours each
  - Class discussions; written homework (computer exercises with GRETL) submitted by groups of 3-5 students; presentations of homework by participants
  - Final exam
- Assessment of student work
  - For grading, the written homework, the presentation of homework in class, and a final written exam are relevant
  - Weights: homework 40 %, final written exam 60 %
  - Presentation of homework in class: students must be prepared to be called at random

Section: Some History of Econometrics

Empirical Economics Prior to the 1930s
- The situation in the early 1930s
  - Theoretical economics aims at "operationally meaningful theorems"; "operational" means purely logical mathematical deduction
  - Economic theories or laws are seen as deterministic relations; no inference from data as part of economic analysis
  - Data: limited availability; time series on agricultural commodities, foreign trade
  - Ignorance of the stochastic nature of economic concepts
- Use of statistical methods for
  - measuring theoretical coefficients, e.g., demand elasticities
  - representing business cycles

Early Institutions
- Applied demand analysis: US Bureau of Agricultural Economics
- Statistical analysis of business cycles: H. L. Moore (Columbia University): Fourier periodogram; W. M. Persons et al. (Harvard): business cycle forecasting; US National Bureau of Economic Research (NBER)
- Cowles Commission for Research in Economics
  - Founded 1932 by Alfred Cowles: what determines stock market prices?
  - Formalization of econometrics, development of econometric methodology
  - R. Frisch, G. Tintner; European refugees
  - J. Marschak (head 1943-55) recruited people like T. C. Koopmans, T. M. Haavelmo, T. W. Anderson, L. R. Klein
  - Interests shifted to theoretical and mathematical economics after 1950

Early Actors
- R. Frisch (Oslo Institute of Economic Research): econometric project, 1930-35; T. Haavelmo, O. Reiersøl
- J. Tinbergen (Dutch Central Bureau of Statistics, Netherlands Economic Institute; League of Nations, Geneva): macro-econometric model of the Dutch economy, ~1935; T. C. Koopmans, H. Theil
- Austrian Institute for Trade Cycle Research (Österreichisches Institut für Konjunkturforschung, founded 1927 by F. v. Hayek and L. v. Mises): O. Morgenstern (head), A. Wald, G. Tintner
- Econometric Society, founded 1930 by R. Frisch et al.
  - Facilitates exchange of scholars from Europe and the US
  - Deals with econometrics and mathematical statistics
[Photos: Trygve Magnus Haavelmo, Olav Reiersøl]

First Steps
- R. Frisch, J. Tinbergen:
  - Macro-economic modelling based on time series, ~1935
  - Aiming at measuring parameters, e.g., demand elasticities
  - Aware of problems due to the quality of data
  - Nobel Memorial Prize in Economic Sciences jointly in 1969 ("for having developed and applied dynamic models for the analysis of economic processes")
- T. Haavelmo
  - "The Probability Approach in Econometrics": PhD thesis (1946)
  - Econometrics as a tool for testing economic theories
  - Nobel Memorial Prize in Economic Sciences in 1989 ("for his clarification of the probability theory foundations of econometrics and his analyses of simultaneous economic structures")
[Photo: Trygve Haavelmo, 1911-1999]

First Steps, cont'd
- Cowles Commission (Cowles Foundation since 1955)
  - Formalization of econometrics, development of the econometric methodology
  - Methodology for macro-economic modelling based on Haavelmo's approach
  - Cowles Commission monographs by G. Tintner, T. C. Koopmans, et al.
[Photo: Tjalling Koopmans]

The Haavelmo Revolution
- Introduction of probabilistic concepts in economics
  - Obvious deficiencies of the traditional approach: residuals, measurement errors, omitted variables; stochastic time-series data
  - Advances in probability theory in the early 1930s
  - Fisher's likelihood function approach
- Haavelmo's ideas
  - Critical view of Tinbergen's macro-econometric models
  - Thorough adoption of probability theory in econometrics
  - Conversion of deterministic economic models into stochastic structural equation models
- Haavelmo's "The Probability Approach in Econometrics"
  - Why is the probability approach indispensable?
  - Modelling procedure based on ML estimation and hypothesis testing
  - Economic models may guide policies and may answer policy questions

Cowles Commission Methodology
- Assumptions underlying macro-econometric modelling and the testing of economic theories
- Time-series model
  $$Y_t = \alpha X_t + \beta W_t + u_{1t}$$
  $$X_t = \gamma Y_t + \delta Z_t + u_{2t}$$
1. Specification of the model equation(s) includes the choice of variables; the functional form is (approximately) linear
2. Time-invariant model equation(s): the model parameters α, …, δ are independent of time t
3. The parameters α, …, δ are structurally invariant, i.e., invariant with respect to changes in the variables
4. The causal ordering (exogeneity, endogeneity) of the variables is known
5. Statistical tests can falsify but not verify a model

Classical Econometrics and More
- "Golden age" of econometrics until ~1970
  - Multi-equation models for analyses and forecasting
  - Growing computing power
  - Development of econometric tools
- Skepticism
  - Poor forecasting performance
  - Dubious results due to
    - wrong specifications
    - imperfect estimation methods
- Time-series econometrics: non-stationarity of economic time series
  - Consequences of non-stationarity: misleading t- and DW-statistics, misleading R²
  - Non-stationarity requires new models (ARIMA, VAR, VEC); Box & Jenkins (1970, ARIMA models), Granger & Newbold (1974, spurious regression), Dickey & Fuller (1979, unit-root tests)

  Model                Year  Equations
  Tinbergen            1936  24
  Klein                1950  6
  Klein & Goldberger   1955  20
  Brookings            1965  160
  Brookings Mark II    1972  ~200

Econometrics …
- … consists of the application of statistical data and techniques to mathematical formulations of economic theory. It serves to test the hypotheses of economic theory and to estimate the implied interrelationships. (Tinbergen, 1952)
- … is the interaction of economic theory, observed data and statistical methods. It is the interaction of these three that makes econometrics interesting, challenging, and, perhaps, difficult. (Verbeek, 2017)
- … is a methodological science with the elements
  - economic theory
  - mathematical language
  - statistical methods
  - computer science
  aiming to give empirical content to economic relations. (Pesaran, 1987)

Our Course
1. Introduction to linear regression (Verbeek, Ch. 2): the linear regression model, OLS method, properties of OLS estimators
2. Introduction to linear regression (Verbeek, Ch. 2): goodness of fit, hypothesis testing, multicollinearity
3. Interpreting and comparing regression models (Verbeek, Ch. 3): interpretation of the fitted model, selection of regressors, testing the functional form
4. Heteroskedasticity and autocorrelation (Verbeek, Ch. 4): causes and consequences, testing, alternatives for inference
5. Endogeneity, instrumental variables and GMM (Verbeek, Ch. 5): the IV estimator, the generalized instrumental variables estimator, the generalized method of moments (GMM)
6. The practice of econometric modelling

Econometrics 2: An Advanced Course
- Univariate and multivariate time-series models: ARMA, ARCH, GARCH models; VAR, VEC models
- Models for panel data
- Models with limited dependent variables: binary choice, count data
(G)ARCH: (generalized) autoregressive conditional heteroskedasticity

Section: An Introduction to Linear Regression; OLS: An Algebraic Tool

Example: Individual Wages
- Sample (US National Longitudinal Survey, 1987)
  - N = 3294 individuals (1569 females)
- Variable list
  - WAGE: wage (in 1980 $) per hour (p.h.)
  - MALE: gender (1 if male, 0 otherwise)
  - SCHOOL: years of schooling
  - EXPER: experience in years
  - AGE: age in years
- Possible questions
  - Effect of gender on wage p.h.; average wage p.h.: 6.31 $ for males, 5.15 $ for females
  - Effects of education, of experience, of interactions, etc. on wage p.h.

Individual Wages, cont'd
[Figure: wage per hour vs. years of schooling]

Linear Regression
- Y: explained variable; X: explanatory or regressor variable
- The linear regression model describes the data-generating process of Y conditional on X
- Simple linear regression model
  $$Y = \beta_1 + \beta_2 X + \varepsilon$$
  with β2: coefficient of X; β1: intercept
- Multiple linear regression model
  $$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon$$

Fitting a Model to Data
- Choice of values b1, b2 for the model parameters β1, β2 of Y = β1 + β2 X, given the observations (yi, xi), i = 1, …, N
- Principle of (Ordinary) Least Squares, or OLS:
  $$(b_1, b_2) = \arg\min_{\beta_1, \beta_2} S(\beta_1, \beta_2)$$
- Objective function: sum of the squared deviations
  $$S(\beta_1, \beta_2) = \sum_{i=1}^{N} \left[y_i - (\beta_1 + \beta_2 x_i)\right]^2$$
- Deviation between observation and fitted value (residual): ei = yi - (b1 + b2 xi), so that the minimum is S(b1, b2) = Σi ei²

Observations and Fitted Regression Line
[Figure: simple linear regression, fitted line and observation points (Verbeek, Figure 2.1)]

OLS Estimators
- Equating the partial derivatives of S(β1, β2) to zero gives the normal equations
  $$\sum_{i} (y_i - b_1 - b_2 x_i) = 0, \qquad \sum_{i} (y_i - b_1 - b_2 x_i)\, x_i = 0$$
- The OLS estimators b1 and b2 result in
  $$b_2 = \frac{s_{xy}}{s_x^2}, \qquad b_1 = \bar{y} - b_2 \bar{x}$$
  with the mean values x̄ = (1/N) Σi xi and ȳ = (1/N) Σi yi and the second moments
  $$s_x^2 = \frac{1}{N}\sum_{i} (x_i - \bar{x})^2, \qquad s_{xy} = \frac{1}{N}\sum_{i} (x_i - \bar{x})(y_i - \bar{y})$$

Individual Wages, cont'd
- Sample (US National Longitudinal Survey, 1987): wage per hour, gender, experience, years of schooling; N = 3294 individuals (1569 females)
- Average wage p.h.: 6.31 $ for males, 5.15 $ for females
- Model:
  $$wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i$$
  with male_i: male dummy, equal to 1 if individual i is male and 0 otherwise
- OLS estimation gives
  $$\widehat{wage}_i = 5.15 + 1.17\, male_i$$
- Compare with the averages! With a single dummy regressor, the intercept reproduces the female average (5.15 $), and intercept plus slope reproduces the male average (6.31 $)

Individual Wages, cont'd
- OLS estimated wage equation (Table 2.1, Verbeek)
  [Table 2.1 (Verbeek): OLS results for the wage equation]
  $$\widehat{wage}_i = 5.15 + 1.17\, male_i$$
- Estimated wage p.h. for males: 6.313 $; for females: 5.150 $

OLS Estimators: General Case
- The model for Y contains K-1 explanatory variables
  $$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K = x'\beta$$
  with x = (1, X2, …, XK)' and β = (β1, β2, …, βK)'
- Observations: (yi, xi') = (yi, (1, xi2, …, xiK)), i = 1, …, N
- The OLS estimates b = (b1, b2, …, bK)' are obtained by minimizing the objective function with respect to the βk's
  $$S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2;$$
  this results in
  $$-2 \sum_{i} x_i (y_i - x_i' b) = 0$$

OLS Estimators: General Case, cont'd
- or
  $$\Big(\sum_{i} x_i x_i'\Big)\, b = \sum_{i} x_i y_i,$$
  the normal equations, a system of K linear equations for the components of b
- Given that the symmetric K×K matrix Σi xi xi' has full rank K and is hence invertible, the OLS estimators are
  $$b = \Big(\sum_{i} x_i x_i'\Big)^{-1} \sum_{i} x_i y_i$$

Best Linear Approximation
- Given the observations (yi, xi') = (yi, (1, xi2, …, xiK)), i = 1, …, N
- For yi, the linear combination or fitted value
  $$\hat{y}_i = x_i' b$$
  is the best linear combination for Y from X2, …, XK and a constant (the intercept)

Some Matrix Notation
- N observations (y1, x1), …, (yN, xN)
- Model: yi = β1 + β2 xi + εi, i = 1, …, N, or
  $$y = X\beta + \varepsilon$$
  with
  $$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$

OLS Estimators in Matrix Notation
- Minimizing
  $$S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2y'X\beta + \beta'X'X\beta$$
  with respect to β gives the normal equations
  $$X'X\, b = X'y,$$
  which result from differentiating S(β) with respect to β (the first derivative is -2X'y + 2X'Xβ) and setting the first derivative to zero
- The vector b of OLS solutions, or OLS estimators for β, is
  $$b = (X'X)^{-1} X'y$$
- The best linear combinations, or predicted values for y given X, i.e., the projections of y onto the space spanned by the columns of X, are obtained as
  $$\hat{y} = Xb = X(X'X)^{-1}X'y = P_X\, y;$$
  the N×N matrix P_X is called the projection matrix or hat matrix
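The matrix formula b = (X'X)^{-1}X'y can be sketched in pure Python (standard library only). The helper names `transpose`, `matmul`, `solve`, and `ols` are hypothetical illustrations and are not part of GRETL or any library. The sketch solves the normal equations X'Xb = X'y by Gauss-Jordan elimination instead of forming (X'X)^{-1} explicitly. The usage lines apply it to a small invented data set, not the NLS sample, with a single male dummy. This illustrates the "compare with the averages" point: the intercept reproduces the female mean, and the slope reproduces the male-minus-female difference.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose a'."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Return the matrix product a b."""
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, rhs: List[float]) -> List[float]:
    """Solve a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("X'X is singular: X lacks full column rank K")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X: Matrix, y: List[float]) -> List[float]:
    """OLS estimates b from the normal equations X'X b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)                                    # K x K
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]  # K-vector X'y
    return solve(XtX, Xty)


# Hypothetical data (NOT the NLS sample): 3 females, 4 males
male = [0, 0, 0, 1, 1, 1, 1]
wage = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 7.0]
X = [[1.0, float(m)] for m in male]  # intercept column and male dummy
b = ols(X, wage)
print(b)  # approximately [5.0, 2.0]: female mean 5.0, male mean 7.0 = 5.0 + 2.0

# Residuals e = y - Xb; by the normal equations, X'e = 0
e = [yi - sum(xij * bj for xij, bj in zip(xi, b)) for xi, yi in zip(X, wage)]
```

Solving the linear system directly is the usual numerical choice. Inverting X'X is only needed when its diagonal elements are required, for example for standard errors.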
Residuals in Matrix Notation
- The vector y can be written as y = Xb + e = ŷ + e with residuals
  $$e = y - Xb \quad \text{or} \quad e_i = y_i - x_i'b, \; i = 1, \dots, N$$
- From the normal equations it follows that
  $$-2(X'y - X'Xb) = -2X'e = 0,$$
  i.e., each column of X is orthogonal to e
- With
  $$e = y - Xb = y - P_X y = (I - P_X)\, y = M_X\, y,$$
  the residual-generating matrix M_X is defined as
  $$M_X = I - X(X'X)^{-1}X' = I - P_X;$$
  M_X projects y onto the orthogonal complement of the space spanned by the columns of X
- Properties of P_X and M_X: symmetry (P_X' = P_X, M_X' = M_X), idempotence (P_X P_X = P_X, M_X M_X = M_X), and orthogonality (P_X M_X = 0)

Properties of Residuals
- Residuals: ei = yi - xi'b, i = 1, …, N
- Minimum value of the objective function S(β) = (y - Xβ)'(y - Xβ):
  $$S(b) = e'e = \sum_{i} e_i^2$$
- e = (e1, …, eN)' is orthogonal to each column x^(k) = (x1k, …, xNk)' of X, k = 1, …, K, i.e., e'x^(k) = 0. If the model has an intercept, the first column of X is a vector of ones, and it follows that
  $$\sum_{i} e_i = 0,$$
  i.e., the average residual is zero

Section: The Linear Regression Model

US Wages
- US wages are gender-specific
- The relation
  $$wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i$$
  with male_i: male dummy (equals 1 for males, otherwise 0)
  - describes the wage of individual i as a function of his or her gender
  - is assumed to be true for all US citizens
- Given sample data (wage_i, male_i), i = 1, …, N, OLS estimation of β1 and β2 may result in
  $$\widehat{wage}_i = 5.15 + 1.17\, male_i$$
- This is not (only) a description of the sample!
- It also reflects a general relationship in the population

Income and Consumption
[Figure: PCR and PYR, Euro zone, 1970:1-2003:4]
- PCR: private consumption, real; PYR: household's disposable income, real; AWM-Database, 1970:1-2003:4, base year 1995
- The consumption function
  $$PCR_t = \beta_1 + \beta_2\, PYR_t + \varepsilon_t$$
  describes consumption in the Euro zone

Economic Models
- Describe economic relationships (not just a set of observations) and have an economic interpretation
- Linear regression model:
  $$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$$
- Variables Y, X2, …, XK: observable
- Error term εi (disturbance term): contains all influences that are not included explicitly in the model; not observable; the assumption E{εi | xi} = 0 gives
  - E{yi | xi} = xi'β
  - i.e., the model describes the expected value of y given x
- Sample (yi, xi2, …, xiK), i = 1, …, N, from a well-defined population
- Unknown coefficients β1, …, βK: population parameters

Sampling in the Economic Context
- The regression model yi = xi'β + εi, i = 1, …, N, or y = Xβ + ε, describes one realization out of all possible samples of size N from the population
- A) Sampling process with fixed, i.e., non-stochastic xi's
  - New sample: new error terms εi, i = 1, …, N, and hence new yi's
  - The joint distribution of the εi's determines the properties of b etc.
  - A laboratory setting; typically does not apply to the economic context
  - Example: yt = β t + εt, t = 1, …, T; t represents time
- B) Sampling process with samples of (xi, yi) or (xi, εi)
  - New sample: new error terms εi and new xi, i = 1, …, N
  - Random sampling of (xi, εi), i = 1, …, N: the joint distribution of the (xi, εi)'s determines the properties of b etc.
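Sampling process B can be illustrated with a small Monte Carlo sketch in standard-library Python. The data-generating process is an invented assumption, not taken from the lecture: β1 = 1, β2 = 0.5, standard-normal xi, and normal errors with σ = 1, drawn independently of x. Each replication draws new (xi, εi) pairs and recomputes b2 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)². The resulting list of estimates shows b as a random variable whose average over many samples lies close to β2.

```python
import random
import statistics


def ols_simple(x, y):
    """Closed-form OLS for y_i = b1 + b2 x_i: b2 = s_xy / s_x^2, b1 = ybar - b2 * xbar."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx  # the 1/N factors of s_xy and s_x^2 cancel
    return ybar - b2 * xbar, b2


def draw_estimates(beta1, beta2, sigma, n, reps, seed=0):
    """Sampling process B: each replication draws new (x_i, eps_i) pairs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]      # new regressor values
        eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # new errors, independent of x
        y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]
        estimates.append(ols_simple(x, y))
    return estimates


# Hypothetical data-generating process: beta1 = 1, beta2 = 0.5, sigma = 1, N = 50
est = draw_estimates(beta1=1.0, beta2=0.5, sigma=1.0, n=50, reps=2000, seed=42)
b2_draws = [b2 for _, b2 in est]
print(statistics.fmean(b2_draws))  # average of b2 over samples, close to 0.5
print(statistics.stdev(b2_draws))  # sampling variability of b2
```

Each replication plays the role of one "possible sample of size N". Because the errors are drawn independently of x here, the average of the estimates settles near the true coefficient. The spread of `b2_draws` is what the standard error of b2 tries to measure.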
Sampling in the Economic Context, cont'd
- Sampling with fixed, non-stochastic xi's is not realistic for economic data
- Sampling with samples of (xi, yi) is appropriate for modelling cross-sectional data
  - Example: household surveys, e.g., US National Longitudinal Survey, EU-SILC
- Sampling with samples of (xi, yi) from time-series data: the sample is seen as one out of all possible realizations of the underlying data-generating process
  - Example: time series PYR and PCR of the AWM-Database

Assumptions of the Linear Regression Model
- The linear regression model yi = xi'β + εi makes use of assumptions
- Assumption for the εi's: E{εi | xi} = 0; exogeneity of the variables X
  - E{εi | xi} = 0 implies that εi and xi are uncorrelated
  - X contains no information on the error term ε
- This implies
  $$E\{y_i \mid x_i\} = x_i'\beta,$$
  i.e., the regression line describes the conditional expectation of yi given xi
- Coefficient βk measures the change of the expected value of Y if Xk changes by one unit and all other Xj values, j ≠ k, remain the same (ceteris paribus condition)

Regression Coefficients
- Linear regression model:
  $$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$$
- Coefficient βk measures the change of the expected value of Y if Xk changes by one unit and all other Xj values, j ≠ k, remain the same (ceteris paribus condition); it is the marginal effect of changing Xk on Y:
  $$\frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} = \beta_k$$
- Example
  - Wage equation: wage_i = β1 + β2 male_i + β3 school_i + β4 exper_i + εi
  - β3 measures the impact of one additional year at school on a person's expected wage, keeping gender and years of experience fixed

Estimation of β
- Given a sample (xi, yi), i = 1, …, N, the OLS estimator for β,
  $$b = (X'X)^{-1}X'y,$$
  can be used as an approximation for β
- For the given sample, the vector b is a vector of numbers, the estimates
- Across possible samples, b is the realization of a vector of random variables
- The sampling concept and the assumptions on the εi's determine the quality, i.e., the statistical properties, of b

Section: Gauss-Markov Assumptions

Fitting Economic Models to Data
- Observations allow us
  - to estimate parameters
  - to assess how well the data-generating process is represented by the model, i.e., how well the model coincides with reality
  - to improve the model if necessary
- Fitting a linear regression model to data provides
  - parameter estimates b = (b1, …, bK)' for the coefficients β = (β1, …, βK)'
  - standard errors se(bk) of the estimates bk, k = 1, …, K
  - t-statistics, F-statistic, R², Durbin-Watson test statistic, etc.

Individual Wages, cont'd
- Wage equation with three regressors (Table 2.2, Verbeek)
  [Table 2.2 (Verbeek): OLS results for the wage equation with three regressors]

OLS Estimator and OLS Estimates b
- The OLS estimates b are a realization of the OLS estimator
- The OLS estimator is a random variable
  - Observations are a random sample from the population
  - Observations are generated by some random sampling process
- Distribution of the OLS estimator
  - The actual distribution is not known
  - The distribution is determined by assumptions on
    - the model specification
    - the error term εi and the regressor variables xi
- Quality criteria (bias, accuracy, efficiency) of OLS estimates are determined by the properties of this distribution

Gauss-Markov Assumptions
Observation yi is a linear function yi = xi'β + εi of the observations xik of the regressor variables Xk, k = 1, …, K, and of the error term εi, for i = 1, …, N; xi' = (xi1, …, xiK); X = (xik)
- (A1) E{εi} = 0 for all i
- (A2) All εi are independent of all xi (exogenous xi)
- (A3) V{εi} = σ² for all i (homoskedasticity)
- (A4) Cov{εi, εj} = 0 for all i and j with i ≠ j (no autocorrelation)
In matrix notation: E{ε} = 0, V{ε} = σ² I_N

Systematic Part of the Model
- The systematic part E{yi | xi} of the model yi = xi'β + εi, given observations xi, is derived under the Gauss-Markov assumptions as follows:
- (A2), together with (A1), (A3), and (A4), implies E{ε | X} = E{ε} = 0 and V{ε | X} = V{ε} = σ² I_N
  - The observations xi, i = 1, …, N, do not affect the properties of ε
- The systematic part
  $$E\{y_i \mid x_i\} = x_i'\beta$$
  can be interpreted as the conditional expectation of yi, given the observations xi

Section: Small Sample Properties of the OLS Estimator

Is the OLS Estimator a Good Estimator?
- Under the Gauss-Markov assumptions, the OLS estimator has favourable properties; see below
- The Gauss-Markov assumptions are strong and are not always satisfied
- Relaxations of the Gauss-Markov assumptions and the consequences of such relaxations are important topics in econometrics

Properties of OLS Estimators
1. The OLS estimator b is unbiased: E{b | X} = E{b} = β
   Needs assumptions (A1) and (A2)
2. The variance of the OLS estimator b is given by
   $$V\{b \mid X\} = \sigma^2 \Big(\sum_{i} x_i x_i'\Big)^{-1} = \sigma^2 (X'X)^{-1},$$
   written V{b} for short
   Needs assumptions (A1), (A2), (A3), and (A4)
3. Gauss-Markov theorem: the OLS estimator b is BLUE¹ (best linear unbiased estimator) for β
   Needs assumptions (A1), (A2), (A3), and (A4) and requires linearity in parameters
¹ The OLS estimator is the most accurate among the linear unbiased estimators; see next slide

The Gauss-Markov Theorem
- The OLS estimator b is BLUE (best linear unbiased estimator) for β
- Linear estimator: b* = Ay with any full-rank K×N matrix A (which may depend on X)
- b* is an unbiased estimator: E{b*} = E{Ay} = β
- b is BLUE: V{b*} - V{b} is positive semi-definite, i.e., the variance of any linear combination d'b* is not smaller than that of d'b:
  $$V\{d'b^*\} \ge V\{d'b\},$$
  e.g., V{bk*} ≥ V{bk} for any k
- The OLS estimator is the most accurate among the linear unbiased estimators

Standard Errors of OLS Estimators
- Variance (covariance matrix) of the OLS estimators:
  $$V\{b\} = \sigma^2 (X'X)^{-1} = \sigma^2 \Big(\sum_{i} x_i x_i'\Big)^{-1}$$
- Standard error of the OLS estimate bk: the square root of the k-th diagonal element of V{b}
- V{b} is proportional to the variance σ² of the error terms
- Estimator for σ²: the sampling variance s² of the residuals ei
  $$s^2 = \frac{1}{N - K} \sum_{i} e_i^2$$
  Under assumptions (A1)-(A4), s² is unbiased for σ²
  Attention: the estimator (N - 1)^{-1} Σi ei² is biased (for K > 1)
- Estimated variance (covariance matrix) of b:
  $$\hat{V}\{b\} = s^2 (X'X)^{-1} = s^2 \Big(\sum_{i} x_i x_i'\Big)^{-1}$$

Estimated Standard Errors of OLS Estimators
- Variance (covariance matrix) of the OLS estimators:
  $$V\{b\} = \sigma^2 (X'X)^{-1} = \sigma^2 \Big(\sum_{i} x_i x_i'\Big)^{-1}$$
- Standard error of the OLS estimate bk: the square root of the k-th diagonal element of V{b},
  $$\sigma \sqrt{c_{kk}},$$
  with ckk the k-th diagonal element of (X'X)^{-1}
- Estimated variance (covariance matrix) of b:
  $$\hat{V}\{b\} = s^2 (X'X)^{-1} = s^2 \Big(\sum_{i} x_i x_i'\Big)^{-1}$$
- Estimated standard error of bk:
  $$se(b_k) = s \sqrt{c_{kk}}$$

Two Examples
1. Simple regression yi = α + β xi + εi
   The variance of the OLS estimator b of β is
   $$V\{b\} = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \frac{\sigma^2}{N s_x^2}$$
   b is the more accurate, the larger N and s_x² and the smaller σ²
2. Regression with two regressors:
   yi = β1 + β2 xi2 + β3 xi3 + εi
   The variance of the OLS estimator b2 of β2 is
   $$V\{b_2\} = \frac{\sigma^2}{(1 - r_{23}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2}$$
   r23: correlation coefficient between X2 and X3
   b2 is most accurate if X2 and X3 are uncorrelated

Normality of Error Terms
- For the purpose of statistical inference, a distributional assumption for the εi's is needed:
  (A5) εi normally distributed for all i
- Together with assumptions (A1), (A3), and (A4), (A5) implies
  $$\varepsilon_i \sim NID(0, \sigma^2) \quad \text{for all } i,$$
  i.e., all εi are
  - independent drawings
  - from the normal distribution
  - with mean 0
  - and variance σ²
- Error terms are "normally and independently distributed" (NID)

Properties of OLS Estimators
1. The OLS estimator b is unbiased: E{b} = β
2. The variance of the OLS estimator is given by V{b} = σ²(X'X)^{-1}
3. The OLS estimator b is BLUE (best linear unbiased estimator) for β
4. The OLS estimator b is normally distributed with mean β and covariance matrix V{b} = σ²(X'X)^{-1}:
   $$b \sim N\big(\beta, \sigma^2 (X'X)^{-1}\big), \qquad b_k \sim N(\beta_k, \sigma^2 c_{kk}),$$
   with ckk the k-th diagonal element of (X'X)^{-1}
   Needs assumptions (A1)-(A5)

Individual Wages: Relevance of Assumptions
  $$wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i$$
- What do the assumptions mean? Are they acceptable?
- (A1): β1 + β2 male_i contains the entire systematic part of the model; are no regressors other than gender relevant?
- (A2): xi independent of εi for all i: knowledge of a person's gender provides no information about further variables that affect the person's wage; is this realistic?
- (A3) V{εi} = σ² for all i: the variance of the error terms (and of wages) is the same for males and females; is this realistic?
- (A4) Cov{εi, εj} = 0, i ≠ j: implied by random sampling
- (A5) Normality of εi: is this realistic?
  (Normal errors would allow, e.g., for negative wages.)

Individual Wages, cont'd
- OLS estimated wage equation (Table 2.1, Verbeek)
  [Table 2.1 (Verbeek): OLS results for the wage equation]
  - b1 = 5.1479, se(b1) = 0.0812: mean wage p.h. for females 5.15 $, with a standard error of 0.08 $
  - b2 = 1.166, se(b2) = 0.112
  - 95% confidence interval for β1: b1 ± 1.96 se(b1), i.e., 4.989 ≤ β1 ≤ 5.307

Your Homework
1. Verbeek's data set "bwages" contains the gross hourly wage (wage) in euro and other variables for a sample of 1472 individuals. Using GRETL, calculate for the variable wage (a) the mean of the whole sample, (b) the means for males and for females, and (c) the standard deviation for males and for females.
2. For Verbeek's data set "bwages", use GRETL to draw, for the whole sample, (a) scatter plots of wage over educ and over exper, and (b) a factorized box plot of wage over educ. For individuals with educ = 5, compare (c) the mean wages of males and females. Discuss the results.

Your Homework, cont'd
3. For the simple regression yi = α + β xi + εi, i = 1, …, N, show that the variance of the OLS estimator b of β is σ²/(N s_x²), where σ² is the error-term variance and s_x² the variance of the xi's.
4. For the sample (yi, xi), i = 1, …, N, and the linear regression yi = β1 + β2 xi + εi: (a) write out the matrices X'X and X'y; (b) write out the determinant det[(X'X)^{-1}], the matrix (X'X)^{-1}, and the OLS estimator b = (X'X)^{-1}X'y.
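The standard-error and confidence-interval slides above combine several pieces: s² = e'e/(N - K), se(bk) = s√ckk with ckk from (X'X)^{-1}, and an interval bk ± 1.96 se(bk), as used for β1 in the Table 2.1 example. The following pure-Python sketch puts them together. The helper names `invert`, `ols_with_se`, and `conf_int` are hypothetical, and the small data set is invented. The interval uses the normal quantile from `statistics.NormalDist`, which is a large-N approximation; for small N, a t quantile with N - K degrees of freedom would be the usual choice.

```python
import math
import statistics


def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols_with_se(X, y):
    """Return b, s^2 = e'e / (N - K), and se(b_k) = s * sqrt(c_kk)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][r] * X[i][q] for i in range(n)) for q in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    cinv = invert(xtx)  # (X'X)^{-1}; its diagonal holds the c_kk
    b = [sum(cinv[r][j] * xty[j] for j in range(k)) for r in range(k)]
    e = [y[i] - sum(X[i][j] * b[j] for j in range(k)) for i in range(n)]
    s2 = sum(ei * ei for ei in e) / (n - k)  # unbiased for sigma^2 under (A1)-(A4)
    se = [math.sqrt(s2 * cinv[j][j]) for j in range(k)]
    return b, s2, se


def conf_int(bk, se_k, level=0.95):
    """Normal-approximation interval b_k -/+ z * se(b_k)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return bk - z * se_k, bk + z * se_k


# Invented data for a simple regression y_i = b1 + b2 x_i + e_i
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
b, s2, se = ols_with_se([[1.0, xi] for xi in x], y)
print(b, se)
print(conf_int(b[1], se[1]))

# Table 2.1 figures for beta_1: b1 = 5.1479, se(b1) = 0.0812
print(conf_int(5.1479, 0.0812))  # roughly (4.989, 5.307)
```

For the slope of a simple regression, c22 equals 1/Σ(xi - x̄)², so se(b2) reduces to s/√(Σ(xi - x̄)²). This is the estimated counterpart of the variance formula in the "Two Examples" slide.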