LECTURE 2
Introduction to Econometrics
INTRODUCTION TO LINEAR REGRESSION ANALYSIS I
Hieu Nguyen
Fall Semester, 2024

PREVIOUS LECTURE...
Introduction, organization, review of statistical background
• random variables
• mean, variance, standard deviation
• covariance, correlation, independence
• normal distribution
• standardized random variables

WARM-UP EXERCISE
► What is the correlation between X and Y?
► Correlation: $\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
► Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$

LECTURE 2
• Introduction to simple linear regression analysis
  - Sampling and estimation
  - OLS principle
• Readings:
  - Studenmund, A. H., Using Econometrics: A Practical Guide, Chapters 1, 2.1, 16.1, 16.2
  - Wooldridge, J. M., Introductory Econometrics: A Modern Approach, Chapters 2.1, 2.2

SAMPLING
• Population: the entire group of items that interests us
• Sample: the part of the population that we actually observe
• Statistical inference: use of the sample to draw conclusions about the characteristics of the population from which the sample came
• Examples: medical experiments, opinion polls

RANDOM SAMPLING VS SELECTION BIAS
• Correct statistical inference can be performed only on a random sample: a sample that reflects the true distribution of the population
• Biased sample: any sample that differs systematically from the population that it is intended to represent
• Selection bias: occurs when the selection of the sample systematically excludes or underrepresents certain groups
  Example: opinion poll about tuition payments among undergraduate students vs all citizens
• Self-selection bias: occurs when we examine data for a group of people who have chosen to be in that group
  Example: accident records of people who buy collision insurance

EXERCISE 1
• American Express and the French tourist office sponsored a survey that found that most visitors to France do not consider the French to be especially unfriendly.
• The sample consisted of 1,000 Americans who have visited France more than once for pleasure over the past two years.
• Is this survey unbiased?

ESTIMATION
• Parameter: a true characteristic of the distribution of a variable, whose value is unknown, but can be estimated
  Example: population mean $E[X]$
• Estimator: a sample statistic that is used to estimate the value of the parameter
  Example: sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  Note that the estimator is a random variable (it has a probability distribution, mean, variance, ...)
• Estimate: the specific value of the estimator that is obtained on a specific sample

PROPERTIES OF AN ESTIMATOR
• An estimator is unbiased if the mean of its distribution is equal to the value of the parameter it is estimating
• An estimator is consistent if it converges to the value of the true parameter as the sample size increases
• An estimator is efficient if the variance of its sampling distribution is the smallest possible

EXERCISE 2
• A young econometrician wants to estimate the relationship between foreign direct investments (FDI) in her country and firm profitability.
• Her reasoning is that better managerial skills introduced by foreign owners increase firms' profitability.
• She collects a random sample of 8,750 firms and finds that one sixth of the firms were entered by foreign investors within the last few years. The rest of the firms are owned domestically.
• When she compares indicators of profitability, such as ROA and ROE, between the domestic and foreign-owned firms, she finds significantly better outcomes for foreign-owned firms.
• She concludes that FDI increases firms' profitability. Is this conclusion correct?

ECONOMETRIC MODELS
• An econometric model is an estimable formulation of a theoretical relationship
• Theory says: $Q = f(P, P_s, Y)$
  $Q$ ... quantity demanded
  $P$ ... commodity's price
  $P_s$ ... price of substitute good
  $Y$ ... disposable income
• We simplify: $Q = \beta_0 + \beta_1 P + \beta_2 P_s + \beta_3 Y$
• We estimate: $Q = 31.50 - 0.73P + 0.11P_s + 0.23Y$

ECONOMETRIC MODELS
• Today's econometrics deals with different, even very general models
• During this course we will cover just linear regression models
• We will see how these models are estimated by
  - Ordinary Least Squares (OLS)
  - Generalized Least Squares (GLS)
  - Instrumental Variables (IV)
• We will perform estimation on different types of data

DATA USED IN ECONOMETRICS
• cross-section: sample of units (e.g. firms, individuals) taken at a given point in time
• repeated cross-section: several independent samples of units (e.g. firms, individuals) taken at different points in time
• time series: observations of variable(s) at different points in time (e.g. GDP)
• panel data: time series for each cross-sectional unit in the data set (e.g. GDP of various countries)

DATA USED IN ECONOMETRICS - EXAMPLES
• Country's macroeconomic indicators (GDP, inflation rate, net exports, etc.) month by month
• Data about firms' employees or financial indicators as of the end of the year
• Records of bank clients who were given a loan
• Annual social security or tax records of individual workers

STEPS OF AN ECONOMETRIC ANALYSIS
1. Formulation of an economic model (rigorous or intuitive)
2. Formulation of an econometric model based on the economic model
3. Collection of data
4. Estimation of the econometric model
5. Interpretation of results

EXAMPLE - ECONOMIC MODEL
• Denote:
  $p$ ... price of the good
  $c$ ... firm's average cost per unit of output
  $q(p)$ ... demand for firm's output
  Demand for good: $q(p) = a - b \cdot p$
  Firm profit: $\pi = q(p) \cdot (p - c)$
• Derive: maximizing profit over $p$ gives $p = \frac{a + bc}{2b}$; substituting into the demand function yields
  $q = \frac{a}{2} - \frac{b}{2} \cdot c$
• We call $q$ the dependent variable and $c$ the explanatory variable

EXAMPLE - ECONOMETRIC MODEL
• Write the relationship in a simple linear form:
  $q = \beta_0 + \beta_1 c$
  (having in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
• There are other (unpredictable) things that influence firms' sales ⇒ add a disturbance term:
  $q = \beta_0 + \beta_1 c + \varepsilon$
• Find the values of the parameters $\beta_1$ (slope) and $\beta_0$ (intercept)

EXAMPLE - DATA
• Ideally: investigate all firms in the economy
• Reality: investigate a sample of firms
  We need a random (unbiased) sample of firms
• Collect data:

  Firm    1    2    3    4    5    6
  q      15   32   52   14   37   27
  c     294  247  153  350  173  218

EXAMPLE - DATA
[Figure: output (vertical axis) plotted against average cost (horizontal axis)]

EXAMPLE - ESTIMATION
[Figure: output (vertical axis) plotted against average cost (horizontal axis)]

EXAMPLE - ESTIMATION
[Figure: output (vertical axis) plotted against average cost (horizontal axis)]
OLS method:
Make the fit as good as possible
⇓
Make the misfit as low as possible
Minimize the (vertical) distance between data points and regression line
Minimize the sum of squared deviations

TERMINOLOGY
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ ... regression line
$y_i$ ... dependent/explained variable (i-th observation)
$x_i$ ... independent/explanatory variable (i-th observation)
$\varepsilon_i$ ... random error term/disturbance (of i-th observation)
$\beta_0$ ... intercept parameter ($\hat{\beta}_0$ ... estimate of this parameter)
$\beta_1$ ... slope parameter ($\hat{\beta}_1$ ... estimate of this parameter)

ORDINARY LEAST SQUARES
• OLS = fitting the regression line by minimizing the sum of squared vertical distances between the regression line and the observed points
[Figure: output (vertical axis) plotted against average cost (horizontal axis)]

ORDINARY LEAST SQUARES - PRINCIPLE
• Take the squared differences between the observed point $y_i$ and the regression line $\beta_0 + \beta_1 x_i$:
  $\varepsilon_i^2 = (y_i - \beta_0 - \beta_1 x_i)^2$
• Sum them over all $n$ observations:
  $\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
• Find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that they minimize this sum

ORDINARY LEAST SQUARES - DERIVATION
• Set the derivatives of the sum with respect to $\beta_0$ and $\beta_1$ equal to zero:
  $-2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
  $-2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
• Solve the two equations:
  $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

RESIDUAL
• Residual is the vertical difference between the estimated regression line and the observation points
• OLS minimizes the sum of squares of all residuals
• The residual is the difference between the observed value $y_i$ and the estimated value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
• We define: $e_i = y_i - \hat{y}_i$
• Residual $e_i$ (observed) is not the same as the disturbance $\varepsilon_i$ (unobserved)!
• Residual is an estimate of the disturbance: $e_i = \hat{\varepsilon}_i$

RESIDUAL VS. DISTURBANCE
[Figure: output (vertical axis) plotted against average cost (horizontal axis), showing the true relationship, the estimated relationship, a disturbance and a residual]

GETTING BACK TO THE EXAMPLE
We have the economic model:
  $q = \frac{a}{2} - \frac{b}{2} \cdot c$
We estimate:
  $q_i = \beta_0 + \beta_1 c_i + \varepsilon_i$
  (having in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
Our data:

  Firm    1    2    3    4    5    6
  q      15   32   52   14   37   27
  c     294  247  153  350  173  218

GETTING BACK TO THE EXAMPLE
• When we plug the data into the formulas:
  $\hat{\beta}_1 = \frac{\sum_{i=1}^{6} (c_i - \bar{c})(q_i - \bar{q})}{\sum_{i=1}^{6} (c_i - \bar{c})^2} = -0.177$
  $\hat{\beta}_0 = \bar{q} - \hat{\beta}_1 \bar{c} = 71.74$
  $\hat{q} = 71.74 - 0.177c$
  Since $\beta_1 = -\frac{b}{2}$, the implied estimate of $b$ is $0.353$.

MEANING OF REGRESSION COEFFICIENT
• Consider the model $q = \beta_0 + \beta_1 c$ estimated as $\hat{q} = 71.74 - 0.177c$
  $q$ ... demand for firm's output
  $c$ ... firm's average cost per unit of output
• The meaning of $\beta_1$ is the impact of a one-unit increase in $c$ on the dependent variable $q$
• When average costs increase by 1 unit, quantity demanded decreases by 0.177 units

BEHIND THE ERROR TERM
• The stochastic error term must be present in a regression equation because of:
  1. omission of many minor influences (unavailable data)
  2. measurement error
  3. possibly incorrect functional form
  4. stochastic character of unpredictable human behavior
• Remember that all of these factors are included in the error term and may alter its properties
• The properties of the error term determine the properties of the estimates

SUMMARY
• We have learned that an econometric analysis consists of
  1. definition of the model
  2. estimation
  3. interpretation
• We have explained the principle of OLS: minimizing the sum of squared differences between the observations and the regression line
• We have derived the formulas of the estimates:
  $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

WHAT'S NEXT
• In the next lectures, we will
  - derive estimation formulas for multivariate models
  - specify properties of the OLS estimator
  - start using Gretl for data description and estimation
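
APPENDIX - CHECKING THE EXAMPLE NUMERICALLY
The estimates from the worked example can be reproduced from the six data points with the textbook formulas for simple OLS (slope = sample covariance term divided by sample variance term, intercept = mean of q minus slope times mean of c). The following is only a minimal sketch in plain Python, not the course's own tooling (the lectures announce Gretl); the function name `ols_simple` and the variable names are hypothetical and chosen for illustration.

```python
# Data from the lecture example: output q and average cost c of six firms.
q = [15, 32, 52, 14, 37, 27]
c = [294, 247, 153, 350, 173, 218]


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line y = b0 + b1 * x.

    Hypothetical helper for illustration; uses the standard
    closed-form solution of the simple regression problem.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


b0_hat, b1_hat = ols_simple(c, q)

# Fitted values and residuals e_i = q_i - q_hat_i.
fitted = [b0_hat + b1_hat * ci for ci in c]
residuals = [qi - fi for qi, fi in zip(q, fitted)]

print(f"intercept: {b0_hat:.2f}")  # 71.74
print(f"slope:     {b1_hat:.3f}")  # -0.177
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model, the residuals sum to zero up to floating-point rounding, which is a quick sanity check on any implementation of these formulas.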