LECTURE 1
Introduction to Econometrics
Dali Laxton
September 24, 2021

WHAT IS ECONOMETRICS?

"To beginning students, it may seem as if econometrics is an overly complex obstacle to an otherwise useful education. (...) To professionals in the field, econometrics is a fascinating set of techniques that allows the measurement and analysis of economic phenomena and the prediction of future economic trends."
Studenmund (Using Econometrics: A Practical Guide)

• Econometrics is a set of statistical tools and techniques for the quantitative measurement of actual economic and business phenomena
• It attempts to
  1. quantify economic reality
  2. bridge the gap between the abstract world of economic theory and the real world of human activity
• It has three major uses:
  1. describing economic reality
  2. testing hypotheses about economic theory
  3. forecasting future economic activity

EXAMPLE

• Consumer demand for a particular commodity can be thought of as a relationship between
  - quantity demanded (\(Q\))
  - the commodity's price (\(P\))
  - the price of a substitute good (\(P_s\))
  - disposable income (\(Y\))
• Theoretical functional relationship: \(Q = f(P, P_s, Y)\)
• Econometrics allows us to estimate a specific relationship, for example: \(Q = 31.50 - 0.73P + 0.11P_s + 0.23Y\)

INTRODUCTORY ECONOMETRICS COURSE

• Lecturer: Dali Laxton (CERGE-EI, Prague), dali.laxton@gmail.com
• Lectures / Seminars: Friday, 9:00-11:50, room VT 105
• Office hours: Saturday, 17:00-18:00, by appointment
• Course requirements:
  - 2 quizzes and 2 home assignments (40 points in total)
  - Midterm exam (30 points)
  - Final exam/project (30 points)
  - To pass the course, a student has to earn at least 50 points in total
• Recommended literature:
  - Studenmund, A. H., Using Econometrics: A Practical Guide
  - Wooldridge, J. M., Introductory Econometrics: A Modern Approach
  - Adkins, L., Using gretl for Principles of Econometrics

COURSE CONTENT

• Lectures:
  - Lecture 1: Introduction, review of statistical background, non-technical introduction to regression
  - Lectures 2-4: Linear regression models
  - Lectures 5-11: Violations of standard assumptions
• In-class exercises:
  - Will serve to clarify and apply the concepts presented in lectures
  - We will use statistical software to solve the exercises

LECTURE 1

• Introduction, review of statistical background
  - probability theory
  - statistical inference
• Readings:
  - Studenmund, A. H., Using Econometrics: A Practical Guide, Chapter 16
  - Wooldridge, J. M., Introductory Econometrics: A Modern Approach, Appendices B and C

RANDOM VARIABLES

• A random variable X is a variable whose numerical value is determined by chance. It is a quantification of the outcome of a random phenomenon.
• Discrete random variable: has a countable number of possible values
  Example: the number of times a coin is flipped before heads is obtained
• Continuous random variable: can take on any value in an interval
  Example: the time until the first goal is scored in a football match between Liverpool and Manchester United

DISCRETE RANDOM VARIABLES

• Described by listing the possible values and the probability associated with each value
• Probability distribution of a variable X that can take values \(x_1, x_2, x_3, \dots\):
  \(P(X = x_1) = p_1\)
  \(P(X = x_2) = p_2\)
  \(P(X = x_3) = p_3\)
  ...
• Cumulative distribution function (CDF): \(F_X(x) = P(X \le x)\)
  Example: the probability that heads shows up at most 3 times in 5 coin flips is the sum of the probabilities of 0, 1, 2 and 3 heads

SIX-SIDED DIE: PROBABILITY DISTRIBUTION FUNCTION

SIX-SIDED DIE: HISTOGRAM OF DATA (100 ROLLS)

SIX-SIDED DIE: HISTOGRAM OF DATA (1000 ROLLS)

As the number of observations increases (100 -> 1000), the empirical distribution approaches the theoretical one

CONTINUOUS RANDOM VARIABLES

• Probability density function (PDF) \(f_X(x)\): describes the relative likelihood that the random variable X takes on a particular value x
• Cumulative distribution function (CDF): \(F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt\)
• Computational rule: \(P(X > x) = 1 - P(X \le x)\)

EXPECTED VALUE AND MEDIAN

• Expected value (mean): the (long-run) average value of a random variable
  - Discrete variable: \(E[X] = \sum_i x_i p_i\)
  - Continuous variable: \(E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx\)
  Example: calculating mean wind speed given wind speed distribution and power curve
• Median: "the value in the middle"

EXERCISE 1

• A researcher is analyzing data on the financial wealth of 100 professors at a small liberal arts college. The values of their wealth range from $400 to $400,000, with a mean of $40,000 and a median of $25,000.
• However, when entering these data into a statistical software package, the researcher mistakenly enters $4,000,000 for the person with $400,000 of wealth.
• How much does this error affect the mean and the median?

VARIANCE AND STANDARD DEVIATION

• Variance: \(\mathrm{Var}[X] = E[(X - E[X])^2]\)
  Measures the extent to which the values of a random variable are dispersed around the mean. If the values (outcomes) are far away from the mean, the variance is high. If they are close to the mean, the variance is low.
• Standard deviation: \(\sigma_X = \sqrt{\mathrm{Var}[X]}\)
  - Note: outliers strongly influence the variance and the standard deviation.
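The note on outliers, and the data-entry error in Exercise 1, can be illustrated with a minimal Python sketch. The five wealth figures are hypothetical (they are not the professors' data from the exercise), and the helper name `summarize` is introduced here only for illustration. The standard deviation used is the population version, i.e. the square root of the average squared deviation from the mean.

```python
import statistics

# Hypothetical wealth figures in dollars; NOT the data from Exercise 1.
wealth = [400, 12_000, 25_000, 40_000, 400_000]

# Same sample, but the largest value is mistyped with an extra zero.
wealth_typo = wealth[:-1] + [4_000_000]


def summarize(values):
    """Return the mean, median and population standard deviation of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),  # sqrt of the mean squared deviation
    }


clean = summarize(wealth)
typo = summarize(wealth_typo)

for name in ("mean", "median", "sd"):
    print(f"{name:>6}: clean = {clean[name]:>12,.0f}   with typo = {typo[name]:>12,.0f}")
```

In this sketch a single mistyped value moves the mean and the standard deviation a great deal, while the median stays where it was, because the middle observation is unchanged.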
[Figure: example of variance (darts)]

DANCING STATISTICS

Watch the video "Dancing statistics: Explaining the statistical concept of variance through dance":
https://www.youtube.com/watch?v=pGfwj4GrUlA&list=PLEzw67WWDg82xKriFiOoixGpNLXK2GNs9&index=4

Use the "dancing" terminology to answer these questions:
1. How do we define variance?
2. How can we tell if variance is large or small?
3. What does it mean to evaluate variance within a set?
4. What does it mean to evaluate variance between sets?
5. What is homogeneity of variance?
6. What is heterogeneity of variance?

EXERCISE 2

• Which has a higher expected value and which has a higher standard deviation: a standard six-sided die or a four-sided die with the numbers 1 through 4 printed on its sides?
• Explain your reasoning without doing any calculations, then verify it by doing the calculations.

COVARIANCE, CORRELATION, INDEPENDENCE

• Covariance:
  - How, on average, two random variables vary with one another.
  - Do the two variables move in the same or in opposite directions?
  - Measures the amount of linear dependence between two variables.
  \(\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]\)
• Correlation: a concept similar to covariance, but easier to interpret. It takes values between -1 and 1.
  \(\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}\)

INDEPENDENCE OF VARIABLES

• Independence: X and Y are independent if the conditional probability distribution of X given the observed value of Y is the same as if the value of Y had not been observed.
• If X and Y are independent, then \(\mathrm{Cov}(X, Y) = 0\) (the converse does not hold in general)
• Dancing statistics: explaining the statistical concept of correlation through dance
  https://www.youtube.com/watch?v=VFjaBh12C6s&index=3&list=PLEzw67WWDg82xKriFiOoixGpNLXK2GNs9

COMPUTATIONAL RULES

\(E[aX + b] = aE[X] + b\)
\(\mathrm{Var}[aX + b] = a^2 \mathrm{Var}[X]\)
\(\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y)\)
\(\mathrm{Cov}(aX, bY) = \mathrm{Cov}(bY, aX) = ab\,\mathrm{Cov}(X, Y)\)
\(\mathrm{Cov}(X + Z, Y) = \mathrm{Cov}(X, Y) + \mathrm{Cov}(Z, Y)\)
\(\mathrm{Cov}(X, X) = \mathrm{Var}[X]\)

RANDOM VECTORS

• Sometimes, we deal with vectors of random variables
• Example: a vector of n random variables, \(X = (X_1, X_2, \dots, X_n)'\)
• Expected value: \(E[X] = (E[X_1], E[X_2], \dots, E[X_n])'\)
• Variance/covariance matrix: \(\mathrm{Var}[X] = E[(X - E[X])(X - E[X])']\), whose \((i, j)\) element is \(\mathrm{Cov}(X_i, X_j)\), with the variances \(\mathrm{Var}[X_i]\) on the diagonal

STANDARDIZED RANDOM VARIABLES

• Standardization makes different variables easier to compare
• Define Z to be the standardized variable of X: \(Z = \frac{X - \mu_X}{\sigma_X}\)
• The standardized variable Z measures how many standard deviations X is below or above its mean
• Whatever the expected value and variance of X, it always holds that \(E[Z] = 0\) and \(\mathrm{Var}[Z] = \sigma_Z^2 = 1\)

NORMAL (GAUSSIAN) DISTRIBUTION

• Notation: \(X \sim N(\mu, \sigma^2)\)
• \(E[X] = \mu\)
• \(\mathrm{Var}[X] = \sigma^2\)
• Dancing statistics:
  https://www.youtube.com/watch?v=dr1DynUzjq0&index=2&list=PLEzw67WWDg82xKriFiOoixGpNLXK2GNs9

EXERCISE 3

• The heights of U.S. females between ages 25 and 34 are approximately normally distributed with a mean of 66 inches and a standard deviation of 2.5 inches.
• What fraction of the U.S. female population in this age bracket is taller than 70 inches, the height of the average adult U.S. male of this age?

EXERCISE 4

• A woman wrote to Dear Abby, saying that she had been pregnant for 310 days before giving birth.
• Completed pregnancies are normally distributed with a mean of 266 days and a standard deviation of 16 days.
• Use statistical tables to determine the probability that a completed pregnancy lasts
  a) at least 270 days
  b) at least 310 days

SUMMARY

• Today, we reviewed some concepts from statistics that we will use throughout our econometrics classes
• It was a very brief overview, intended only to indicate what students are expected to know already
• The focus was on properties of statistical distributions and on working with normal distribution tables

NEXT LECTURE

• We will go through the terminology of sampling and estimation
• We will start with regression analysis and introduce the Ordinary Least Squares estimator
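As a cross-check on the table lookups practised in Exercises 3 and 4, the normal CDF is also available in Python's standard library as `statistics.NormalDist` (Python 3.8 or later). The following is a minimal sketch, not part of the course software: the helper names `z_score` and `upper_tail` are hypothetical, and the inputs are the means and standard deviations stated in the exercises.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies above the mean."""
    return (x - mu) / sigma


def upper_tail(x, mu, sigma):
    """Return P(X > x) for X ~ N(mu, sigma^2), using P(X > x) = 1 - P(X <= x).

    For a continuous variable, P(X >= x) and P(X > x) coincide.
    """
    return 1.0 - NormalDist(mu=mu, sigma=sigma).cdf(x)


# Exercise 3: heights with mean 66 inches and standard deviation 2.5 inches.
p_taller_than_70 = upper_tail(70, mu=66, sigma=2.5)

# Exercise 4: pregnancy length with mean 266 days and standard deviation 16 days.
p_at_least_270 = upper_tail(270, mu=266, sigma=16)
p_at_least_310 = upper_tail(310, mu=266, sigma=16)

print(f"z for 70 inches:  {z_score(70, 66, 2.5):.2f}")
print(f"P(height > 70):   {p_taller_than_70:.4f}")
print(f"P(length >= 270): {p_at_least_270:.4f}")
print(f"P(length >= 310): {p_at_least_310:.4f}")
```

The z-score printed first is the value one would look up in a standard normal table; the tail probabilities should agree with the table up to rounding.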