Introduction to econometrics II. Non-technical introduction to econometrics
Introduction to econometrics (INEC) II. Non-technical introduction, Autumn 2011

Content
1 Simple regression model
    Ordinary least squares method
    Basic statistical concepts
2 Multiple regression
3 Dummy variables

Introduction
Topics: how to use gretl + non-technical introduction to regression.
Reading for next week: Koop (2008), chapters 1 and 2.

1 Simple regression model

Regression
Relationships among variables (linear, non-linear, two or more variables).
Dependent variables and explanatory variables.
$E(Y \mid X)$ as a function of $x$ ($Y$: dependent variable, $X$: explanatory variables, $x$: realizations of the explanatory variables).
Interesting examples: Gujarati, Porter (2009).

Example: costs of production
Replicate the example in Koop (2008).

Linear regression model
Linear relationship between costs, $Y$, and output, $X$: $Y = \alpha + \beta X$.
Unknown parameters of the model: $\alpha$ is the intercept, $\beta$ is the slope parameter (the effect of variable $X$ on $Y$).
Error term, $\varepsilon$: measurement error, omitted explanatory variables, unobserved variables ⇒ observations do not lie exactly on the line.
$Y = \alpha + \beta X + \varepsilon$.
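
To make the role of the error term concrete, the following minimal sketch simulates observations from the linear model with an error term and prints how far each one falls from the line. The parameter values, output levels and the Gaussian form of the error are assumptions chosen for illustration; they are not taken from Koop (2008).

```python
import random


def simulate_costs(alpha, beta, outputs, sigma, seed=0):
    """Generate pairs (X_i, Y_i) with Y_i = alpha + beta * X_i + eps_i."""
    rng = random.Random(seed)
    data = []
    for x in outputs:
        eps = rng.gauss(0.0, sigma)  # error term: everything the line leaves out
        data.append((x, alpha + beta * x + eps))
    return data


# Hypothetical parameter values and output levels (illustration only).
ALPHA, BETA = 2.0, 5.0
sample = simulate_costs(ALPHA, BETA, outputs=[1, 2, 3, 4, 5, 6], sigma=1.0)

for x, y in sample:
    on_line = ALPHA + BETA * x
    print(f"X={x}  Y={y:6.2f}  line={on_line:6.2f}  error={y - on_line:+.2f}")
```

Setting `sigma=0.0` puts every observation exactly on the line, which is the case the error term rules out in practice.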

Ordinary least squares method

How to estimate parameters?
Parameter estimates: $\hat{\alpha}$, $\hat{\beta}$.
Best fitting line?

Error terms and residuals
Observations: $Y_i = \alpha + \beta X_i + \varepsilon_i$.
Error term: $\varepsilon_i = Y_i - \alpha - \beta X_i$.
Residual: $\hat{\varepsilon}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$.
Fitted regression line: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$.
Fitted values: $\hat{Y}_i$.

Regression: best fitting line
Sum of squared residuals (SSR):
$SSR = \sum_{i=1}^{N} \hat{\varepsilon}_i^2 = \sum_{i=1}^{N} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2 = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$.
Ordinary least squares method (OLS): choose $\hat{\alpha}$ and $\hat{\beta}$ so that SSR is as small as possible.

Interpreting OLS estimates
Intercept: sometimes has an economic interpretation.
$\hat{\alpha} = 2.19$: fixed costs of the industry.
Slope parameter: $dY_i / dX_i = \beta$.
$\hat{\beta} = 4.79$: estimated marginal costs of the industry.

Interpreting OLS estimates: review
Table: Interpretation of parameters by functional form.

Model         Dependent   Explanatory   Interpretation of $\beta$
Level-Level   $Y$         $X$           $\Delta Y = \beta \Delta X$
Level-Log     $Y$         $\ln X$       $\Delta Y = (\beta/100)\,\%\Delta X$
Log-Level     $\ln Y$     $X$           $\%\Delta Y = (100\beta)\,\Delta X$
Log-Log       $\ln Y$     $\ln X$       $\%\Delta Y = \beta\,\%\Delta X$
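
The idea of the best fitting line can be sketched in a few lines of code. The example below uses the standard closed-form OLS solution for one explanatory variable (slope = sum of cross-deviations divided by sum of squared deviations of X, intercept = mean of Y minus slope times mean of X), which is not written out on the slides. The data are hypothetical and are not the electric utility data from Koop (2008).

```python
def ols_simple(xs, ys):
    """Return (alpha_hat, beta_hat) that minimise the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def ssr(xs, ys, a, b):
    """Sum of squared residuals of the line Y = a + b * X."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical output and cost figures (illustration only).
output = [1.0, 2.0, 3.0, 4.0, 5.0]
costs = [7.1, 11.8, 17.2, 21.9, 27.0]

alpha_hat, beta_hat = ols_simple(output, costs)
best = ssr(output, costs, alpha_hat, beta_hat)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}, SSR = {best:.4f}")

# Any other line gives a larger SSR, e.g. after shifting the intercept.
print(ssr(output, costs, alpha_hat + 0.5, beta_hat) > best)
```

In a level-level model like this one, `beta_hat` is read as the change in costs for a one-unit change in output.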

Measuring the fit
Total sum of squares: $TSS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$.
Regression sum of squares: $RSS = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$.
Total variability of $Y$: $TSS = RSS + SSR$.
Coefficient of determination, $R^2$ ($0 \le R^2 \le 1$): $R^2 = \frac{RSS}{TSS} = 1 - \frac{SSR}{TSS}$.

Basic statistical concepts

Confidence intervals
Confidence interval for a parameter: a measure of the uncertainty of the point estimate.
$\Pr(\mathrm{Int}_D < \beta < \mathrm{Int}_H) = 0.95$, where $\mathrm{Int}_D$ and $\mathrm{Int}_H$ are the lower and upper bounds of the interval.
Confidence level (e.g. 95 %). Usually 0.99 = 99 %, 0.95 = 95 %, 0.90 = 90 %.

Hypothesis testing
"Does education increase an individual's earning potential?"
"Will a certain advertising strategy increase sales?"
"Will a new government training scheme lower unemployment?"
Mostly: "Does the explanatory variable have an effect on the dependent variable?", or "Is $\beta = 0$ in the regression of $Y$ on $X$?".

Hypothesis testing involving a parameter
Null and alternative hypothesis: $H_0: \beta = 0$ against $H_1: \beta \neq 0$.
Test statistic: $t = \hat{\beta} / s_b$.
Significance level: usually 0.01, 0.05 or 0.10, i.e. 1 − confidence level; it is the threshold the p-value must exceed for the null hypothesis not to be rejected, given the observations.
Critical value of the test: based on the significance level; it defines the critical region, i.e. the value that the test statistic must exceed in order for the null hypothesis to be rejected.
p-value: the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true; compare it with the significance level.
Hypothesis testing using confidence intervals.

Using computer software
$\hat{\beta}$: point estimate.
95% confidence interval.
Standard error of the parameter estimate $\hat{\beta}$: $s_b$.
t-statistic for $H_0: \beta = 0$.
p-value for $H_0: \beta = 0$.
Example: electric utility industry (see Koop and replicate the example).

Hypothesis testing involving $R^2$
$H_0: R^2 = 0$ ($X$ does not have any explanatory power for $Y$) against $H_1: R^2 \neq 0$.
F-statistic: $F = \frac{(N - 2) R^2}{1 - R^2}$.
Compare with the critical value or use the p-value.

2 Multiple regression

Model and OLS estimates
Model: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \varepsilon_i$.
Sum of squared residuals: $SSR = \sum_{i=1}^{N} (Y_i - \hat{\alpha} - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \ldots - \hat{\beta}_k X_{ki})^2$.
$R^2$: the explanatory power of all the variables taken together.
F-test: test of whether the regression explains anything at all: $F = \frac{(N - k - 1) R^2}{k (1 - R^2)}$, which reduces to the simple-regression formula when $k = 1$.
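
The quantities that regression software reports (coefficients, standard errors, t-statistics, $R^2$ and the F-statistic) can be computed by hand for a small data set. The sketch below solves the OLS problem through the normal equations, using a small Gauss-Jordan matrix inverse written for this example. It computes F in its standard form, $(R^2/k) / ((1 - R^2)/(N - k - 1))$. The data and variable names are hypothetical, and real work would rely on gretl or a statistics library rather than this hand-rolled algebra.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular: perfect collinearity")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """OLS of y on an intercept plus the k explanatory columns."""
    n, k = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    xty = matmul(Xt, [[v] for v in y])
    coefs = [row[0] for row in matmul(xtx_inv, xty)]  # intercept first
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ssr / tss
    s2 = ssr / (n - k - 1)  # estimated error variance
    std_errs = [(s2 * xtx_inv[j][j]) ** 0.5 for j in range(k + 1)]
    t_stats = [c / se for c, se in zip(coefs, std_errs)]
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return {"coefs": coefs, "std_errs": std_errs, "t_stats": t_stats,
            "r2": r2, "f_stat": f_stat}


# Hypothetical data: two explanatory variables, eight observations.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [10.2, 7.9, 14.1, 10.8, 17.3, 24.6, 15.9, 22.1]

result = ols([x1, x2], y)
for name, c, se, t in zip(["const", "x1", "x2"], result["coefs"],
                          result["std_errs"], result["t_stats"]):
    print(f"{name:5s} coef={c:8.4f}  se={se:7.4f}  t={t:8.3f}")
print(f"R2 = {result['r2']:.4f}, F = {result['f_stat']:.2f}")
```

With a single explanatory variable the F-statistic equals the square of the slope's t-statistic, which is a convenient check on the arithmetic.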

Interpreting OLS estimates
Parameter: the marginal effect of the explanatory variable on the dependent variable, holding the other explanatory variables constant.
Example: house prices (page 45).

Choosing explanatory variables
Two important considerations pull in opposite directions:
1. Include as many variables as possible (all variables that help explain the dependent variable).
2. Include as few explanatory variables as possible (including irrelevant, statistically insignificant variables can reduce the statistical significance of all the explanatory variables).
Why not exclude important explanatory variables? Omitted variable bias: see the example in Koop (2008).

Practical guide
It is not possible to include all relevant variables.
Start with the largest set of variables → sequentially eliminate insignificant variables.
Final regression: statistically significant variables only + intercept.
Competing models → $R^2$.

Multicollinearity
Arises if some or all of the explanatory variables are highly correlated with one another.
Some consequences: high $R^2$ but all parameters statistically insignificant (high standard errors).
Perfect collinearity: estimation impossible (intuition based on the interpretation of the parameters).
Solution: exclude appropriate variables.
Testing: correlation matrix.
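
Checking the correlation matrix for multicollinearity can be sketched as follows. The house data, the variable names and the 0.9 warning threshold are all assumptions made for this illustration; the slides do not prescribe a cut-off.

```python
from itertools import combinations


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical explanatory variables for a house-price regression.
size_sqft = [1000.0, 1500.0, 1200.0, 2000.0, 1800.0, 900.0]
variables = {
    "size_sqft": size_sqft,
    # Same information in other units: perfectly collinear with size_sqft.
    "size_m2": [s * 0.0929 for s in size_sqft],
    "age": [30.0, 5.0, 12.0, 20.0, 8.0, 25.0],
}

THRESHOLD = 0.9  # assumed rule of thumb, not a formal test

flagged = []
for p, q in combinations(variables, 2):
    r = correlation(variables[p], variables[q])
    print(f"corr({p}, {q}) = {r:+.3f}")
    if abs(r) > THRESHOLD:
        flagged.append((p, q))

print("Candidates for exclusion (one variable of each pair):", flagged)
```

Here the two size variables measure the same thing, so keeping both would make it impossible to hold one constant while changing the other, which is the intuition behind dropping one of them.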

3 Dummy variables

Working with dummy variables
Qualitative variables ("1 or 0" variables).
Interpreted like "ordinary" variables.
Regression with "variable" intercepts or slopes (for each category).
Dummy dependent variable = a different class of models (logit, probit)!
Examples: see Koop (2008), pages 51–55.

Exercises
Koop (2008): exercises 1, 2 and 3 (chapter 2).
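
As a small illustration of how a dummy explanatory variable is interpreted, the sketch below regresses a hypothetical house price on a 0/1 air-conditioning dummy. The data are invented for this example and are not Koop's. With a single dummy regressor, the intercept equals the mean price of the houses with dummy = 0 and the slope equals the difference between the two group means.

```python
def ols_simple(xs, ys):
    """Return (alpha_hat, beta_hat) for the regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: price in thousands, aircon = 1 if the house has air conditioning.
price = [50.0, 62.0, 55.0, 57.0, 75.0, 85.0]
aircon = [0, 0, 0, 0, 1, 1]

alpha_hat, beta_hat = ols_simple(aircon, price)

mean_without = mean([p for p, d in zip(price, aircon) if d == 0])
mean_with = mean([p for p, d in zip(price, aircon) if d == 1])

print(f"alpha_hat = {alpha_hat:.2f}  (mean price without aircon: {mean_without:.2f})")
print(f"beta_hat  = {beta_hat:.2f}  (difference in means: {mean_with - mean_without:.2f})")
```

The dummy shifts the intercept: houses with air conditioning have fitted price `alpha_hat + beta_hat`, the others `alpha_hat`.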