Econometrics 2 - Lecture 2
Models with Limited Dependent Variables

Contents
- Limited Dependent Variable Cases
- Binary Choice Models
- Binary Choice Models: Estimation
- Binary Choice Models: Goodness of Fit
- Multiresponse Models
- Multinomial Models
- Count Data Models
- The Tobit Model
- The Tobit II Model

March 18, 2011 Hackl, Econometrics 2, Lecture 2

Cases of Limited Dependent Variables
- Typical situations in which a dependent variable with a limited range is to be explained as a function of explanatory variables:
  - Dichotomous dependent variable, e.g., ownership of a car (yes/no), employment status (employed/unemployed), etc.
  - Ordered response, e.g., qualitative assessment (good/average/bad), working status (full-time/part-time/not working), etc.
  - Multinomial response, e.g., trading destinations (Europe/Asia/Africa), means of transportation (train/bus/car), etc.
  - Count data, e.g., number of orders a company receives in a week, number of patents granted to a company in a year
  - Censored data, e.g., expenditures on durable goods, duration of study with dropouts

Example: Car Ownership and Income
- What is the probability that a randomly chosen household owns a car?
- Sample of N = 32 households
  - Proportion of car-owning households: 19/32 = 0.59
- But: this probability will differ between rich and poor households!
- The sample data include income information:
  - Yearly income: average EUR 20,524, minimum EUR 12,000, maximum EUR 32,517
  - Proportion of car-owning households among the 16 households with income below EUR 20,000: 9/16 = 0.56
  - Proportion of car-owning households among the 16 households with income above EUR 20,000: 10/16 = 0.63

Car Ownership and Income, cont'd
- How can the prediction of car ownership take the income of a household into account?
- Notation: for N households
  - dummy yi for car ownership: yi = 1 if household i owns a car, yi = 0 otherwise
  - income xi2
- Predicting yi, or P{yi = 1}, requires a model that takes income into account

Modeling Car Ownership
- How is car ownership related to the income of a household?
1. Linear regression yi = xi'β + εi = β1 + β2xi2 + εi for describing y
  - With E{εi|xi} = 0, the model gives P{yi = 1|xi} = xi'β, because
    E{yi|xi} = 1·P{yi = 1|xi} + 0·P{yi = 0|xi} = P{yi = 1|xi}
  - In the model yi = xi'β + εi, xi'β can therefore be interpreted as P{yi = 1|xi}!
  - Problems:
    - xi'β is not necessarily in [0,1]
    - Error terms: for a given xi
      - εi takes only two values, viz. 1 − xi'β and −xi'β
      - V{εi|xi} = xi'β(1 − xi'β): heteroskedastic, depending upon β
  - The model for y actually specifies the probability that y = 1 as a function of x

Modeling Car Ownership, cont'd
2. Use a function G(xi, β) with values in the interval [0,1]:
   P{yi = 1|xi} = E{yi|xi} = G(xi, β)
  - The probability that yi = 1, i.e., that the household owns a car, depends on income (and on other characteristics, e.g., family size)
  - Use for G(xi, β) the standard logistic distribution function:
$$ P\{y_i = 1 \mid x_i\} = F(x_i'\beta), \qquad F(z) = \frac{e^z}{1 + e^z} = \frac{1}{1 + e^{-z}} $$
  - F(z) fulfills lim z→−∞ F(z) = 0 and lim z→∞ F(z) = 1
  - Interpretation: from P{yi = 1|xi} = pi = exp{xi'β}/(1 + exp{xi'β}) follows
$$ \log\frac{p_i}{1 - p_i} = x_i'\beta = \beta_1 + \beta_2 x_{i2} $$
  - An increase of xi2 by 1 changes the log-odds by β2, i.e., it multiplies the odds pi/(1 − pi) by exp{β2}; for small β2 this is a relative change of approximately β2, or 100β2%; cf. the notion of semi-elasticity

Car Ownership and Income, cont'd
- E.g., P{yi = 1|xi} = 1/(1 + exp(−zi)) with zi = −0.5 + 1.1xi, where xi is the income in EUR 1000 per month
- Increasing income is associated with an increasing probability of owning a car: z goes up by 1.1 for every additional EUR 1000
- For a person with an income of EUR 3000, z = −0.5 + 1.1·3 = 2.8, and the probability of owning a car is 1/(1 + exp(−2.8)) = 0.94
- [Figure: the standard logistic distribution function, with z on the horizontal and F(z) on the vertical axis]

Odds
- The odds in favor of an event are the ratio of a pair of integers, the first (the second) representing the relative likelihood that the event will happen (will not happen)
- If p is the probability in favor of the event, and hence 1 − p the probability against it, the odds of the event are the quotient p/(1 − p)
- Example: the odds that a randomly chosen day of the week is a Sunday are 1:6 (say "one to six"), because p = P{Sunday} = 1/7 and p/(1 − p) = (1/7)/(6/7) = 1/6
- In bookmakers' language, odds are stated against, not in favor of, the event
  - The bookmaker would say: "The odds that a randomly chosen day of the week is a Sunday are 6:1"
- The logarithm of the odds is the logit of the probability:
$$ \operatorname{logit}(p) = \log\frac{p}{1 - p} $$

Binary Choice Models
- Model for the probability P{yi = 1|xi} as a function of K (numerical or categorical) explanatory variables xi and unknown parameters β:
  E{yi|xi} = P{yi = 1|xi} = G(xi, β)
- Typical functions G(xi, β): distribution functions (cdf's) F(xi'β)
- Probit model: standard normal distribution function; V{z} = 1
$$ F(w) = \Phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}t^2\right\} dt $$
- Logit model: standard logistic distribution function; V{z} = π²/3 ≈ 3.29 = (1.81)²
$$ F(w) = L(w) = \frac{e^w}{1 + e^w} $$
- Linear probability model (LPM)
$$ F(w) = \begin{cases} 0 & \text{if } w < 0 \\ w & \text{if } 0 \le w \le 1 \\ 1 & \text{if } w > 1 \end{cases} $$

Linear Probability Model (LPM)
- Assumes that
  P{yi = 1|xi} = xi'β for 0 ≤ xi'β ≤ 1
  but sets
  P{yi = 1|xi} = 0 for xi'β < 0
  P{yi = 1|xi} = 1 for xi'β > 1
- Typically, the model is estimated by OLS, ignoring the probability restrictions
- Standard errors should be replaced by heteroskedasticity-consistent (White) standard errors

Probit Model: Standardization
- E{yi|xi} = P{yi = 1|xi} = G(xi, β): assume G(.) to be the distribution function of N(0, σ²)
$$ P\{y_i = 1 \mid x_i\} = P\{\varepsilon_i \le x_i'\beta\} = \Phi\left(\frac{x_i'\beta}{\sigma}\right) $$
- Given xi, only the ratio β/σ determines P{yi = 1|xi}
- Standardization restriction σ² = 1: allows unique estimates of β
- Similarly, the logit model fixes the scale by using the standard logistic distribution, with variance π²/3

Probit vs Logit Model
- Differences between the probit and the logit model:
  - The shape of the distribution is slightly different, particularly in the tails
  - The scaling of the distribution is different: the implicit variance of εi is π²/3 = (1.81)² in the logit model, but 1 in the probit model
  - The probit model is relatively easy to extend to multivariate cases using the multivariate normal or conditional normal distribution
- In practice, the probit and logit models produce quite similar results
  - Because of the scaling difference, the values of β are not directly comparable across the two models, while the signs are typically the same
  - The estimates in the logit model are roughly a factor π/√3 ≈ 1.81 larger than those in the probit model

Interpretation of Coefficients
- For assessing the effect of changing xk, not only the coefficient βk is of interest, but also related characteristics such as
  - its sign
  - the slope, i.e., the "average" marginal effect ∂F(xi'β)/∂xik

Binary Choice Models: Marginal Effects
- Linear regression models: βk is the marginal effect of a change in xk
- For E{yi|xi} = F(xi'β):
$$ \frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} = f(x_i'\beta)\,\beta_k $$
  with density function f(.)
- The effect of changing the regressor xk depends upon xi'β, the shape of F, and βk
- The marginal effect of changing xk:
  - Probit model: φ(xi'β)βk, with the standard normal density function φ
  - Logit model: L(xi'β)[1 − L(xi'β)]βk
  - Linear probability model: βk for 0 < xi'β < 1, and 0 otherwise

Binary Choice Models: Slopes
- Interpretation of the effect of a change in xk:
- "Slope", i.e., the gradient of E{yi|xi} at the sample means x̄ of the regressors:
$$ \left.\frac{\partial F(x'\beta)}{\partial x_k}\right|_{x = \bar x} = f(\bar x'\beta)\,\beta_k $$
- For a dummy variable D, the marginal effect is calculated as the difference of probabilities P{yi = 1|x̄(d), D = 1} − P{yi = 1|x̄(d), D = 0}, where x̄(d) stands for the sample means of all regressors except D
- For the logit model:
$$ \log\frac{P\{y_i = 1 \mid x_i\}}{1 - P\{y_i = 1 \mid x_i\}} = x_i'\beta, \qquad \frac{\partial}{\partial x_{ik}} \log\frac{P\{y_i = 1 \mid x_i\}}{1 - P\{y_i = 1 \mid x_i\}} = \beta_k $$
  - The coefficient βk is the change of the log-odds, i.e., approximately the relative change of the odds, when xk increases by 1 unit

Binary Choice Models: Estimation
- Typically, binary choice models are estimated by maximum likelihood
- Likelihood function:
$$ L(\beta) = \prod_{i=1}^N P\{y_i = 1 \mid x_i; \beta\}^{y_i}\, P\{y_i = 0 \mid x_i; \beta\}^{1 - y_i} = \prod_{i=1}^N F(x_i'\beta)^{y_i} \left(1 - F(x_i'\beta)\right)^{1 - y_i} $$
- Maximization via the log-likelihood function:
$$ \ell(\beta) = \log L(\beta) = \sum_i y_i \log F(x_i'\beta) + \sum_i (1 - y_i) \log\left(1 - F(x_i'\beta)\right) $$
- First-order conditions of the maximization problem:
$$ \frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^N \left[\frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\, f(x_i'\beta)\right] x_i = \sum_{i=1}^N e_i x_i = 0 $$
- ei: generalized residuals

Generalized Residuals
- The first-order conditions allow us to define the generalized residuals
- From
$$ e_i = \frac{y_i - F(x_i'b)}{F(x_i'b)\left(1 - F(x_i'b)\right)}\, f(x_i'b) $$
  it follows that the generalized residuals ei can assume only two values:
  - ei = f(xi'b)/F(xi'b) if yi = 1
  - ei = −f(xi'b)/(1 − F(xi'b)) if yi = 0
  where b are the estimates of β
- The generalized residuals are orthogonal to each regressor; cf. the first-order conditions of OLS estimation

Estimation of Logit Model
- For the logit model, f = L(1 − L), so the first-order condition of the maximization problem
$$ \sum_{i=1}^N \left[y_i - \frac{\exp\{x_i'b\}}{1 + \exp\{x_i'b\}}\right] x_i = 0 $$
  gives
$$ \sum_i \hat p_i x_i = \sum_i y_i x_i, \qquad \hat p_i = \frac{\exp\{x_i'b\}}{1 + \exp\{x_i'b\}} $$
- From Σi p̂i xi = Σi yi xi it follows, given that one regressor is an intercept:
  - The predicted frequency Σi p̂i equals the observed frequency Σi yi
- Approximately similar results hold for the probit model, due to the similarity of the logit and probit functions

Properties of ML Estimators
- Consistent
- Asymptotically efficient
- Asymptotically normally distributed
- These properties require that the assumed distribution is correct:
  - correct shape
  - no autocorrelation and/or heteroskedasticity
  - no dependence between errors and regressors
  - no omitted regressors

Goodness-of-Fit Measures
- Concepts:
  - Comparison of the maximum likelihood of the model with that of the naïve model, i.e., a model with only an intercept and no regressors
    - Pseudo-R²
    - McFadden R²
  - Index based on the proportion of correctly predicted observations
    - Hit rate

McFadden R²
- Based on the log-likelihood function
- ℓ(b) = ℓ1: maximum log-likelihood of the model to be assessed
- ℓ0: maximum log-likelihood of the naïve model, i.e., a model with only an intercept; ℓ0 ≤ ℓ1 ≤ 0
  - The larger ℓ1 − ℓ0, the more the regressors contribute
  - ℓ1 = ℓ0 if all slope coefficients are zero
  - ℓ1 = 0 if yi is exactly predicted for all i
- Pseudo-R²: a number in [0,1), defined by
$$ \text{pseudo-}R^2 = 1 - \frac{1}{1 + 2(\ell_1 - \ell_0)/N} $$
- McFadden R²: a number in [0,1], defined by
$$ \text{McFadden } R^2 = 1 - \frac{\ell_1}{\ell_0} $$
- Both are 0 if ℓ1 = ℓ0
- McFadden R² attains its upper limit 1 if ℓ1 = 0
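The illustrative car-ownership index zi = −0.5 + 1.1xi from earlier in this lecture can be checked numerically. The minimal Python sketch below (plain standard library; the function names are hypothetical helpers, not part of any package) reproduces z = 2.8 and P{yi = 1|xi} ≈ 0.94 for an income of EUR 3000, and shows that one more EUR 1000 multiplies the odds by exp{β2} regardless of the income level.

```python
import math

def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds in favor of an event with probability p."""
    return p / (1.0 - p)

# Illustrative index from the slides: z = -0.5 + 1.1 * x, x = income in EUR 1000 per month
B1, B2 = -0.5, 1.1

def p_car(income: float) -> float:
    """P{y = 1 | x} under the illustrative logit model."""
    return logistic_cdf(B1 + B2 * income)

z3 = B1 + B2 * 3.0
print(round(z3, 2), round(p_car(3.0), 2))        # 2.8 0.94

# One more EUR 1000 multiplies the odds by exp(B2), at any income level
for x in (1.0, 3.0):
    print(x, round(odds(p_car(x + 1.0)) / odds(p_car(x)), 4))   # 3.0042 both times
print(round(math.exp(B2), 4))                    # 3.0042
```

Note that with a coefficient as large as β2 = 1.1 the odds roughly triple, so reading β2 as "a relative change of about 100β2%" is only a good approximation for small coefficients.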
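The logit first-order conditions above can be solved numerically. The sketch below is one possible approach, assuming Newton-Raphson iterations (the slides only say "maximum likelihood"; GRETL's own optimizer may differ) and simulated, hypothetical data generated from the illustrative index −0.5 + 1.1x. It checks the two properties derived on the slides: the generalized residuals, which for the logit model reduce to yi − p̂i, are orthogonal to every regressor, and with an intercept the predicted frequency Σi p̂i equals the observed frequency Σi yi.

```python
import numpy as np

def logit_mle(X, y, tol=1e-10, max_iter=50):
    """Maximize the logit log-likelihood by Newton-Raphson; returns b, fitted p, max log-likelihood."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))                     # L(x_i'b)
        score = X.T @ (y - p)                                 # gradient: sum_i (y_i - p_i) x_i
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X       # negative definite for the logit
        step = np.linalg.solve(hessian, score)                # H^{-1} g
        b = b - step                                          # Newton step for a maximum
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ b))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return b, p, loglik

# Hypothetical simulated data: intercept plus income (EUR 1000 per month)
rng = np.random.default_rng(0)
N = 2000
income = rng.uniform(0.0, 3.0, N)
X = np.column_stack([np.ones(N), income])
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.1 * income)))
y = (rng.uniform(size=N) < true_p).astype(float)

b, p, loglik = logit_mle(X, y)
resid = y - p      # logit generalized residuals e_i = y_i - p_i
print("b =", b.round(3), " log-likelihood =", round(loglik, 2))
print("sum of fitted p =", round(p.sum(), 6), " sum of y =", y.sum())
print("X'e =", (X.T @ resid).round(8))
```

Newton-Raphson is a natural choice here because the logit log-likelihood is globally concave, so the iterations converge from the starting value b = 0 in a handful of steps.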
Hit Rate
- Comparison of correct and incorrect predictions
- Predicted outcome:
  ŷi = 1 if xi'b > 0
  ŷi = 0 if xi'b ≤ 0
- Cross-tabulation of actual and predicted outcomes:

           ŷ = 0    ŷ = 1    Σ
  y = 0    n00      n01      N0
  y = 1    n10      n11      N1
  Σ        n0       n1       N

- Proportion of incorrect predictions:
  wr1 = (n01 + n10)/N
- Hit rate: 1 − wr1, the proportion of correct predictions
- Comparison with the naïve model:
  - Predicted outcome of the naïve model, which predicts the more frequent outcome for all observations:
    ŷi = 1 if p̂ = N1/N > 0.5, ŷi = 0 if p̂ ≤ 0.5
  - Rp² = 1 − wr1/wr0, with wr0 = 1 − p̂ if p̂ > 0.5 and wr0 = p̂ if p̂ ≤ 0.5, the proportion of incorrect predictions of the naïve model
  - Rp² becomes negative if the model predicts worse than the naïve model

Example: Effect of Teaching Method
- Study by Spector & Mazzeo (1980); see Greene (2003), Chapter 21
- Personalized System of Instruction (PSI): a new teaching method in economics; does it affect student performance in later courses?
- Data:
  - GRADE (0/1): indicator whether the grade was higher than in the principal course
  - PSI (0/1): participation in the program with the new teaching method
  - GPA: grade point average
  - TUCE: score on a pretest, measuring entering knowledge
- 32 observations

Effect of Teaching Method, cont'd
- Logit model for GRADE, GRETL output:

Model 1: Logit, using observations 1-32
Dependent variable: GRADE

           Coefficient   Std. Error   z-stat    Slope*
  const    -13.0213      4.93132      -2.6405
  GPA        2.82611     1.26294       2.2377   0.533859
  TUCE       0.0951577   0.141554      0.6722   0.0179755
  PSI        2.37869     1.06456       2.2344   0.456498

  Mean dependent var    0.343750    S.D. dependent var    0.188902
  McFadden R-squared    0.374038    Adjusted R-squared    0.179786
  Log-likelihood      -12.88963     Akaike criterion     33.77927
  Schwarz criterion    39.64221     Hannan-Quinn         35.72267

  *Slope evaluated at the means of the regressors
  Number of cases 'correctly predicted' = 26 (81.3%)
  f(beta'x) at mean of independent vars = 0.189
  Likelihood ratio test: Chi-square(3) = 15.4042 [0.0015]

              Predicted
               0     1
  Actual  0   18     3
          1    3     8

Effect of Teaching Method, cont'd
- [Figure: logit model for GRADE, actual and fitted values of the 32 observations]

Effect of Teaching Method, cont'd
- Comparison of the LPM, logit, and probit models for GRADE
- Estimated models: coefficients and their standard errors

           LPM                Logit              Probit
           coeff    s.e.      coeff    s.e.      coeff    s.e.
  const    -1.498   0.524     -13.02   4.931     -7.452   2.542
  GPA       0.464   0.162       2.826  1.263      1.626   0.694
  TUCE      0.010   0.019       0.095  0.142      0.052   0.084
  PSI       0.379   0.139       2.379  1.065      1.426   0.595

- Coefficients of the logit model: due to the larger variance, they are larger than those of the probit model by a factor of roughly √(π²/3) = 1.81

Effect of Teaching Method, cont'd
- Goodness-of-fit measures for the logit model
- With N1 = 11 and N = 32:
  ℓ0 = 11 log(11/32) + 21 log(21/32) = −20.59
- As p̂ = N1/N = 0.34 < 0.5, the proportion wr0 of incorrect predictions of the naïve model is
  wr0 = p̂ = 11/32 = 0.34
- From the GRETL output: ℓ1 = −12.89, wr1 = 6/32
- Goodness-of-fit measures:
  - Rp² = 1 − wr1/wr0 = 1 − 6/11 = 0.45
  - McFadden R² = 1 − (−12.89)/(−20.59) = 0.374

Example: Utility of Car Owning
- Latent variable yi*: utility difference between owning and not owning a car; unobservable (latent)
- Decision on owning a car:
  - yi* > 0: in favor of owning a car
  - yi* ≤ 0: against owning a car
- yi* depends upon observed characteristics (like income) and unobserved characteristics εi:
  yi* = xi'β + εi
- Observation yi = 1 (i.e., owning a car) if yi* > 0:
  P{yi = 1} = P{yi* > 0} = P{xi'β + εi > 0} = 1 − F(−xi'β) = F(xi'β)
  where the last step requires a distribution function F(.) that is symmetric around zero
- Latent variable model: based on a latent variable that represents underlying behavior

Latent Variable Model
- Model for the latent variable yi*:
  yi* = xi'β + εi
  - yi* is not necessarily a utility difference
- The εi's are independent of the xi's
- εi has a standardized distribution:
  - probit model if εi has the standard normal distribution
  - logit model if εi has the standard logistic distribution
- Observations:
  - yi = 1 if yi* > 0
  - yi = 0 if yi* ≤ 0
- ML estimation

Binary Choice Models in GRETL
- Model > Nonlinear Models > Logit > Binary
  - Estimates the specified model using error terms with the standard logistic distribution
- Model > Nonlinear Models > Probit > Binary
  - Estimates the specified model using error terms with the standard normal distribution

Multiresponse Models
- Models for explaining the choice between discrete outcomes
- Examples:
  a. working status (full-time/part-time/not working), qualitative assessment (good/average/bad), etc.
  b. trading destinations (Europe/Asia/Africa), means of transportation (train/bus/car), etc.
- Multiresponse models describe the probability of each of these outcomes as a function of variables like
  - person-specific characteristics
  - alternative-specific characteristics
- Types of multiresponse models (cf. the examples above):
  - Ordered response models: the outcomes have a natural ordering (examples a)
  - Multinomial (unordered) models: the ordering of the outcomes is arbitrary (examples b)

Example: Credit Rating
- Credit rating: numbers indicating experts' opinion about a firm's capacity to satisfy its financial obligations, i.e., its creditworthiness
- Standard & Poor's rating scale: AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB-, B+, B, B-, CCC+, CCC, CCC-, CC, C, D
- Verbeek's data set CREDIT:
  - categories "1", …, "7" (highest)
  - investment grade, with alternatives "1" (better than category 3) and "0" (category 3 or lower, also called "speculative grade")
- Explanatory variables, e.g.,
  - firm sales
  - EBIT, i.e., earnings before interest and taxes
  - ratio of working capital to total assets

Ordered Response Model
- Choice between M alternatives
- Observed alternative for sample unit i: yi
- Latent variable model:
  yi* = xi'β + εi
  with explanatory variables xi, and
  yi = j if γj−1 < yi* ≤ γj, for j = 1, …, M
- Boundaries γj, j = 0, …, M, with γ0 = −∞ < γ1 < … < γM = ∞
- The εi's are independent of the xi's
- The εi's typically follow the
  - standard normal distribution: ordered probit model
  - standard logistic distribution: ordered logit model

Example: Willingness to Work
- "How much would you like to work?"
- Potential answers of individual i: yi = 1 (not working), yi = 2 (part time), yi = 3 (full time)
- Measures the desired labor supply
- Depends upon factors like age, education level, husband's income
- Ordered response model with M = 3:
  yi* = xi'β + εi
  with
  yi = 1 if yi* ≤ 0
  yi = 2 if 0 < yi* ≤ γ
  yi = 3 if yi* > γ
- The εi's have distribution function F(.)
- yi* stands for "willingness to work" or "desired hours of work"

Willingness to Work, cont'd
- In terms of observed quantities:
  P{yi = 1|xi} = P{yi* ≤ 0|xi} = F(−xi'β)
  P{yi = 3|xi} = P{yi* > γ|xi} = 1 − F(γ − xi'β)
  P{yi = 2|xi} = F(γ − xi'β) − F(−xi'β)
- Unknown parameters: γ and β
- Standardization: with respect to location (γ1 = 0) and scale (V{εi} = 1)
- ML estimation
- Interpretation of the parameters β:
  - With respect to yi*: for positive βk, the willingness to work increases with larger xk
  - With respect to the probabilities P{yi = j|xi}: for positive βk, e.g., P{yi = 3|xi} increases and P{yi = 1|xi} decreases with larger xk

Example: Credit Rating
- Verbeek's data set CREDIT: 921 observations of US firms' credit ratings in 2005, including firm characteristics
- Rating models:
  1. Ordered logit model for the assignment of categories "1", …, "7" (highest)
  2. Binary logit model for the assignment of "investment grade", with alternatives "1" (better than category 3) and "0" (category 3 or lower, also called "speculative grade")

Credit Rating, cont'd
- Verbeek's data set CREDIT
- Ratings and characteristics for 921 firms: summary statistics
- Book leverage: ratio of debts to assets

Credit Rating, cont'd
- Verbeek, Table 7.5.
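The goodness-of-fit figures reported for the GRADE logit model can be recomputed from the few numbers given in the output. The minimal sketch below uses only ℓ1 = −12.88963, N = 32, N1 = 11, and the cross-tabulation of actual and predicted outcomes; it reproduces ℓ0 = −20.59, McFadden R² = 0.374, the likelihood ratio statistic 15.4042 of the output, the hit rate 26/32, and Rp² = 0.45, and it adds the pseudo-R² defined earlier, which the slides do not report for this example.

```python
import math

# GRADE example (logit model): N = 32 students, N1 = 11 with GRADE = 1
N, N1 = 32, 11
ell1 = -12.88963        # maximum log-likelihood of the logit model (GRETL output)

# Naive model: intercept only, fitted probability N1/N for every observation
ell0 = N1 * math.log(N1 / N) + (N - N1) * math.log((N - N1) / N)
mcfadden_r2 = 1 - ell1 / ell0
pseudo_r2 = 1 - 1 / (1 + 2 * (ell1 - ell0) / N)
lr_stat = 2 * (ell1 - ell0)   # likelihood ratio statistic for the 3 slope coefficients

# Hit rate from the cross-tabulation (rows: actual y, columns: predicted y-hat)
n00, n01, n10, n11 = 18, 3, 3, 8
wr1 = (n01 + n10) / N
hit_rate = 1 - wr1
p_hat = N1 / N
wr0 = 1 - p_hat if p_hat > 0.5 else p_hat   # naive model predicts the more frequent outcome
r2_p = 1 - wr1 / wr0

print(round(ell0, 2), round(mcfadden_r2, 3), round(pseudo_r2, 3), round(lr_stat, 2))  # -20.59 0.374 0.325 15.4
print(round(hit_rate, 4), round(r2_p, 3))                                             # 0.8125 0.455
```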
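The "Slope" column of the GRADE logit output follows the slope formula f(x̄'b)·bk. The short sketch below multiplies the reported coefficients of the continuous regressors GPA and TUCE by the reported f(beta'x) = 0.189 at the means; the results, roughly 0.534 and 0.018, agree with the output up to the rounding of 0.189. PSI is left out on purpose: as a dummy, its slope is a difference of two probabilities, which would need the sample means of GPA and TUCE, and these are not reported here.

```python
# Logit estimates for GRADE from the GRETL output (continuous regressors only)
coefficients = {"GPA": 2.82611, "TUCE": 0.0951577}
f_at_mean = 0.189     # f(beta'x) at the means of the regressors, as reported in the output

# Slope of regressor k at the sample means: f(x_bar'b) * b_k
slopes = {name: f_at_mean * beta_k for name, beta_k in coefficients.items()}
for name, slope in slopes.items():
    print(name, round(slope, 4))
```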
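The three probabilities of the willingness-to-work example can be evaluated directly. The sketch below assumes an ordered probit (F = Φ, written via `math.erf`) and uses a hypothetical cut point γ = 1.2 and hypothetical index values xi'β, since the slides give no estimates. It confirms that the probabilities sum to one and that, as xi'β grows, P{yi = 1|xi} falls while P{yi = 3|xi} rises.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z), written with math.erf so no extra packages are needed."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb: float, gamma: float) -> tuple:
    """P{y=1}, P{y=2}, P{y=3} for y* = x'b + e, e ~ N(0,1), cut points 0 and gamma."""
    if gamma <= 0.0:
        raise ValueError("the upper cut point gamma must exceed the first cut point 0")
    p1 = std_normal_cdf(-xb)                                # not working
    p2 = std_normal_cdf(gamma - xb) - std_normal_cdf(-xb)   # part time
    p3 = 1.0 - std_normal_cdf(gamma - xb)                   # full time
    return p1, p2, p3

gamma = 1.2                      # hypothetical cut point
for xb in (-1.0, 0.5, 2.0):      # hypothetical values of x_i'beta
    probs = ordered_probit_probs(xb, gamma)
    print(xb, [round(p, 3) for p in probs])
```

The middle category behaves differently: for these values P{yi = 2|xi} first rises (about 0.14, then 0.45) and then falls again (about 0.19), so the sign of βk alone does not tell how the probability of an intermediate outcome changes.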
Ordered Response Model: Estimation
- ML estimation of β1, …, βK and γ1, …, γM−1
- Log-likelihood function in terms of the probabilities, with the indicator function I(.):
$$ \ell(\beta, \gamma) = \sum_{i=1}^N \sum_{j=1}^M I(y_i = j) \log\left[F(\gamma_j - x_i'\beta) - F(\gamma_{j-1} - x_i'\beta)\right] $$
- Numerical optimization
- ML estimators are
  - consistent
  - asymptotically efficient
  - asymptotically normally distributed

Multinomial Models
- Choice between M alternatives without natural order
- Observed alternative for sample unit i: yi
- "Random utility" framework: individuals
  - attach utility levels Uij to each of the alternatives j = 1, …, M
  - choose the alternative with the highest utility level
- Utility levels Uij, j = 1, …, M, as a function of characteristics xij:
  Uij = xij'β + εij
- If the error terms εij are independent and follow the Type I extreme value distribution, then
$$ P\{y_i = j\} = \frac{\exp\{x_{ij}'\beta\}}{\exp\{x_{i1}'\beta\} + \cdots + \exp\{x_{iM}'\beta\}}, \qquad j = 1, \ldots, M $$
  and Σj P{yi = j} = 1

Variants of the Logit Model
- For setting the location: constraint xi1'β = 0, i.e., exp{xi1'β} = 1
- Conditional logit model: for j = 1, …, M
$$ P\{y_i = j\} = \frac{\exp\{x_{ij}'\beta\}}{1 + \sum_{k=2}^M \exp\{x_{ik}'\beta\}} $$
  - Alternative-specific characteristics xij
  - E.g., the mode of transportation is affected by travel costs, travel duration, etc.
- Multinomial logit model: for j = 1, …, M, with the normalization β1 = 0
$$ P\{y_i = j\} = \frac{\exp\{x_i'\beta_j\}}{1 + \sum_{k=2}^M \exp\{x_i'\beta_k\}} $$
  - Person-specific characteristics xi
  - E.g., the mode of transportation is affected by income, gender, etc.

Multinomial Logit Model
- The term "multinomial logit model" is also used for both
  - the conditional logit model and
  - the multinomial logit model (see above)
- and also for the mixed logit model, which combines
  - alternative-specific characteristics and
  - person-specific characteristics

Independence of Errors
- Independence of the error terms εij implies independent utility levels of the alternatives
- A restrictive assumption
- Example: a high utility of the alternative "travel by red bus" implies a high utility of "travel by blue bus"
- Implies that the odds ratio of two alternatives does not depend upon the other alternatives, e.g., upon how many alternatives are available: "independence of irrelevant alternatives" (IIA)

Multiresponse Models in GRETL
- Model > Nonlinear Models > Logit > Ordered
  - Estimates the specified model using error terms with the standard logistic distribution, assuming ordered alternatives for the responses
- Model > Nonlinear Models > Logit > Multinomial
  - Estimates the specified model using error terms with the standard logistic distribution, assuming alternatives without order
- Model > Nonlinear Models > Probit > Ordered / Multinomial
  - Estimates the specified model using error terms with the standard normal distribution, assuming alternatives with or without order

Models for Count Data
- Describe the number of times an event occurs, depending upon certain characteristics
- Examples:
  - number of visits to the library per week
  - number of misspellings in an email
  - number of patent applications of a firm, as a function of
    - firm size
    - R&D expenditures
    - industrial sector
    - country, etc.
    - see Verbeek's data set PATENT

Poisson Regression Model
- Observed variable for sample unit i:
  yi: a count, with possible outcomes 0, 1, …, y, …
- Aim: to explain E{yi|xi} on the basis of characteristics xi:
  E{yi|xi} = exp{xi'β}
- Poisson regression model:
$$ P\{y_i = y \mid x_i\} = \frac{\exp\{-\lambda_i\}\,\lambda_i^{y}}{y!}, \qquad y = 0, 1, 2, \ldots $$
  with λi = E{yi|xi} = exp{xi'β}, y! = 1·2·…·y, 0! = 1

Poisson Distribution
- [Figure: Poisson distribution]

Poisson Regression Model: The Practice
- Unknown parameters: coefficients β
- Fitting the model to data: ML estimators are
  - consistent
  - asymptotically efficient
  - asymptotically normally distributed
- Equidispersion condition:
  - a Poisson-distributed X obeys E{X} = V{X} = λ
  - in many situations not realistic
  - overdispersion: V{X} > E{X}
- Remedies: alternative distributions, e.g., the negative binomial distribution, and alternative estimation procedures, e.g., quasi-ML, robust standard errors

Tobit Models
- Tobit models are regression models in which the range of the (continuous) dependent variable is non-negative, i.e., the variable is censored from below at zero
- Examples:
  - expenditures on durable goods as a function of income, age, etc.: some units do not spend any money on durable goods
  - hours of work as a function of qualification, age, etc.
  - expenditures on alcoholic beverages and tobacco
- Tobit models:
  - standard Tobit model or Tobit I model; Tobin (1958) on expenditures on durable goods
  - generalizations: Tobit II to V

Example: Expenditures on Tobacco
- Verbeek's data set TOBACCO: expenditures on tobacco in 2724 Belgian households, Belgian household budget survey of 1995/96
- Model:
  yi* = xi'β + εi
  with optimal expenditures yi* on tobacco in household i, characteristics xi of the household, and unobserved heterogeneity εi (or measurement error or optimization error)
- Actual expenditures yi:
  yi = yi* if yi* > 0
  yi = 0 if yi* ≤ 0

The Standard Tobit Model
- The latent variable yi* depends upon characteristics xi:
  yi* = xi'β + εi
  with error terms (or unobserved heterogeneity) εi ~ NID(0, σ²), independent of xi
- Actual outcome of the observable variable yi:
  yi = yi* if yi* > 0
  yi = 0 if yi* ≤ 0
- Standard Tobit model or censored regression model
- Censoring: all negative values are replaced by zero
- Censoring in general:
  - censoring from below (above): all values below a lower bound (above an upper bound) are replaced by that bound
- OLS produces inconsistent estimators for β

The Standard Tobit Model, cont'd
- The standard Tobit model describes
  1. the probability P{yi = 0} as a function of xi:
     P{yi = 0} = P{εi ≤ −xi'β} = 1 − Φ(xi'β/σ)
  2. the distribution of yi given that it is positive, i.e., a truncated normal distribution:
     E{yi|yi > 0} = xi'β + E{εi|εi > −xi'β} = xi'β + σλ(xi'β/σ)
     with λ(xi'β/σ) = φ(xi'β/σ)/Φ(xi'β/σ) ≥ 0
- Attention: a single set β of parameters characterizes both expressions
- The effects of a characteristic
  - on the probability of a non-zero observation and
  - on the value of the observation
  have the same sign!
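The two expressions of the standard Tobit model can be checked against a simulation. The sketch below uses hypothetical values xi'β = 0.5 and σ = 1 (no estimates from the slides), computes P{yi = 0} = 1 − Φ(xi'β/σ) and E{yi|yi > 0} = xi'β + σλ(xi'β/σ), and compares them with censored draws of yi* = xi'β + εi; `tobit_moments` is a hypothetical helper name.

```python
import math
import random

def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_moments(xb: float, sigma: float):
    """P{y = 0} and E{y | y > 0} in the standard Tobit model."""
    z = xb / sigma
    p_zero = 1.0 - std_normal_cdf(z)                  # P{y = 0} = 1 - Phi(x'b/s)
    mills = std_normal_pdf(z) / std_normal_cdf(z)     # lambda(x'b/s) = phi/Phi
    mean_pos = xb + sigma * mills                     # E{y | y > 0}
    return p_zero, mean_pos

xb, sigma = 0.5, 1.0           # hypothetical values of x_i'beta and sigma
p_zero, mean_pos = tobit_moments(xb, sigma)

# Monte Carlo check: draw y* = x'b + e, e ~ N(0, sigma^2), censor at zero
rng = random.Random(42)
draws = [max(0.0, xb + rng.gauss(0.0, sigma)) for _ in range(200_000)]
positive = [y for y in draws if y > 0.0]
sim_p_zero = 1.0 - len(positive) / len(draws)
sim_mean_pos = sum(positive) / len(positive)

print(round(p_zero, 3), round(mean_pos, 3))          # 0.309 1.009
print(round(sim_p_zero, 3), round(sim_mean_pos, 3))  # close to the line above
```

The mean of the positive observations (about 1.009) clearly exceeds xi'β = 0.5; this gap σλ(xi'β/σ) is the reason why OLS applied to the positive observations alone cannot recover β.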
The Standard Tobit Model: Interpretation
- From
  P{yi = 0} = 1 − Φ(xi'β/σ)
  E{yi|yi > 0} = xi'β + σλ(xi'β/σ)
  it follows:
- A positive coefficient means that an increase in the explanatory variable increases the probability of a positive yi
- The marginal effect of xik upon E{yi|yi > 0} differs from βk
- The marginal effect of xik upon E{yi} is βk P{yi > 0}
  - It is close to βk if P{yi > 0} is close to 1, i.e., if there is little censoring
- The marginal effect of xik upon E{yi*} is βk

The Standard Tobit Model: Estimation
- OLS produces inconsistent estimators for β
- ML estimation based on the log-likelihood
$$ \log L_1(\beta, \sigma^2) = \ell_1(\beta, \sigma^2) = \sum_{i \in I_0} \log P\{y_i = 0\} + \sum_{i \in I_1} \log f(y_i) $$
  with P{yi = 0} = 1 − Φ(xi'β/σ), the normal density f(yi) = (1/σ)φ((yi − xi'β)/σ), I0 the set of censored observations, and I1 the set of uncensored observations
- For the correctly specified model, the estimates are
  - consistent
  - asymptotically efficient
  - asymptotically normally distributed
- ML estimation based on the observations with yi > 0 only, i.e., on the truncated regression model:
$$ \ell_2(\beta, \sigma^2) = \sum_{i \in I_1} \left[\log f(y_i) - \log P\{y_i > 0\}\right] $$
- Estimates based on ℓ1 are more efficient than those based on ℓ2

Example: Model for Budget Share for Tobacco
- Verbeek's data set TOBACCO: Belgian household budget survey of 1995/96
- Budget share wi* for expenditures on tobacco corresponding to maximal utility:
  wi* = xi'β + εi
  with xi: log of total expenditures and various characteristics like
  - number of children ≤ 2 years old
  - number of adults in the household
  - age
- Actual budget share for expenditures on tobacco:
  wi = wi* if wi* > 0
  wi = 0 otherwise
- 2724 households

Model for Budget Share for Tobacco
- Tobit model, GRETL output:

Model 2: Tobit, using observations 1-2724
Dependent variable: SHARE1 (Tobacco)

             coefficient    std. error     t-ratio   p-value
  ----------------------------------------------------------
  const      -0,170417      0,0441114      -3,863    0,0001    ***
  AGE         0,0152120     0,0106351       1,430    0,1526
  NADULTS     0,0280418     0,0188201       1,490    0,1362
  NKIDS      -0,00295209    0,000794286    -3,717    0,0002    ***
  NKIDS2     -0,00411756    0,00320953     -1,283    0,1995
  LNX         0,0134388     0,00326703      4,113    3,90e-05  ***
  AGELNX     -0,000944668   0,000787573    -1,199    0,2303
  NADLNX     -0,00218017    0,00136622     -1,596    0,1105
  WALLOON     0,00417202    0,000980745     4,254    2,10e-05  ***

  Mean dependent var   0,017828    S.D. dependent var   0,021658
  Censored obs         466         sigma                0,024344
  Log-likelihood       4764,153    Akaike criterion    -9508,306
  Schwarz criterion   -9449,208    Hannan-Quinn        -9486,944

Model for Budget Share for Tobacco, cont'd
- Truncated regression model, GRETL output:

Model 7: Tobit, using observations 1-2724 (n = 2258)
Missing or incomplete observations dropped: 466
Dependent variable: W1 (Tobacco)

             coefficient    std. error     t-ratio   p-value
  ---------------------------------------------------------
  const       0,0433570     0,0458419       0,9458   0,3443
  AGE         0,00880553    0,0110819       0,7946   0,4269
  NADULTS    -0,0129409     0,0185585      -0,6973   0,4856
  NKIDS      -0,00222254    0,000826380    -2,689    0,0072    ***
  NKIDS2     -0,00261220    0,00335067     -0,7796   0,4356
  LNX        -0,00167130    0,00337817     -0,4947   0,6208
  AGELNX     -0,000490197   0,000815571    -0,6010   0,5478
  NADLNX      0,000806801   0,00134731      0,5988   0,5493
  WALLOON     0,00261490    0,000922432     2,835    0,0046    ***

  Mean dependent var   0,021507    S.D. dependent var   0,022062
  Censored obs         0           sigma                0,021450
  Log-likelihood       5471,304    Akaike criterion   -10922,61
  Schwarz criterion  -10865,39     Hannan-Quinn       -10901,73

Two Models for Budget Share for Tobacco, Comparison
- Estimates and standard errors (in parentheses) for some coefficients of the Tobit and the truncated regression model

                          constant    NKIDS      LNX        WALLOON
  Tobit model             -0,1704     -0,0030     0,0134    0,0042
                          (0,0441)    (0,0008)   (0,0033)   (0,0010)
  Truncated regression     0,0433     -0,0022    -0,0017    0,0026
                          (0,0458)    (0,0008)   (0,0034)   (0,0009)

Specification Tests
- Various tests are based on
  - the generalized residuals
    −λ(−xi'b/s) if yi = 0
    ei/s if yi > 0 (standardized residuals, with ei = yi − xi'b)
    with λ(z) = φ(z)/Φ(z), evaluated at the estimates b and s of β and σ
  - and "second-order" generalized residuals corresponding to the estimation of σ²
- Tests
  - for normality
  - for heteroskedasticity
  - for omitted variables
- The test for normality is a standard test in GRETL's Tobit procedure: consistency of the ML estimators requires normality

An Example: Wage Equation
- Wage observations: available only for the working population
- Model that explains wages as a function of characteristics, e.g., the person's age
- Tobit model: for a positive coefficient of age, an increase of age
  - increases the wage
  - increases the probability that the person is working
  - Not always realistic!
nTobin II model: allows two separate equations qfor labor force participation and qfor the wage of a person nTobin II model is also called “sample selection model” March 18, 2011 Hackl, Econometrics 2, Lecture 2 69 Wage Model: Tobit II nWage equation describes the wage of person i n wi* = x1i’b1 + e1i n with exogenous characteristics (age, education, …) nLabor force participation or selection equation n hi* = x2i’b2 + e2i nObservation rule: wi actual wage of person i n wi = wi*, hi = 1 if hi* > 0 n wi not observed, hi = 0 if hi* £ 0 n hi: indicator for working nDistributional assumption for e1i, e2i March 18, 2011 Hackl, Econometrics 2, Lecture 2 70 Wage Model: Selection Equation nSelection equation: a binary choice model; probit model needs standardization (s22 = 1) nSpecial cases qIf s12 = 0, sample selection is exogenous qIf x1i’b1 = x2i’b2 and e1i = e2i, the Tobit II model coincides with the Tobit I model nCharacteristics x1i and x2i may be different; however, qIf the selection depends upon wi*: x2i is expected to include x1i qBecause the model describes the joint distribution of wi and hi given one set of conditioning variables: x2i is expected to include x1i qSign and value of coefficients of the same variables in x1i and x2i can be different q March 18, 2011 Hackl, Econometrics 2, Lecture 2 71 Wage Model: Wage Equation nExpected value of wi, given sample selection: n E{wi | hi =1} = x1i’b1 + s12 l(x2i’b2) n with the inverse Mill’s ratio or Heckman’s lambda n l(x2i’b2) = f(x2i’b2) / F(x2i’b2) nHeckman’s lambda qPositive and decreasing in its argument qThe smaller the probability that a person is working, the larger the value of the correction term l nExpected value of wi only equals x1i’b1 if s12 = 0: “no sample selection” March 18, 2011 Hackl, Econometrics 2, Lecture 2 72 Tobit II Model: Log-likelihood Function nLog-likelihood n ℓ3(b,s12,s12) = SiϵI0 log P{hi=0} + SiϵI1 [log f(yi|hi=1) + log P{hi=1}] n = SiϵI0 log P{hi=0} + SiϵI1 [log f(yi) + log P{hi=1|yi}] 
  with $P\{h_i = 0\} = 1 - \Phi(x_{2i}'\beta_2)$; $I_0$ and $I_1$ denote the sets of persons with $h_i = 0$ and $h_i = 1$, and $y_i$ is the observed wage $w_i$

Tobit II Model: Estimation
- Maximum likelihood estimation, based on the log-likelihood
  $\ell_3(\beta, \sigma_1^2, \sigma_{12}) = \sum_{i \in I_0} \log P\{h_i = 0\} + \sum_{i \in I_1} [\log f(y_i \mid h_i = 1) + \log P\{h_i = 1\}]$
- Two-step approach (Heckman, 1979)
  1. Estimate the coefficients $\beta_2$ of the selection equation by standard probit maximum likelihood
  2. Compute $\hat\lambda_i = \lambda(x_{2i}'\hat\beta_2) = \phi(x_{2i}'\hat\beta_2) / \Phi(x_{2i}'\hat\beta_2)$
  3. Estimate the coefficients $\beta_1$ and $\sigma_{12}$ by OLS, using the working persons ($h_i = 1$) only:
     $w_i = x_{1i}'\beta_1 + \sigma_{12}\hat\lambda_i + \eta_i$
- GRETL: the procedure "Heckit" allows both the ML and the two-step estimation

Tobit II Model for Budget Share for Tobacco
- Heckit ML estimation, GRETL output:

  Model 7: ML Heckit, using observations 1-2724
  Dependent variable: SHARE1
  Selection variable: D1

                 coefficient   std. error     t-ratio  p-value
    -------------------------------------------------------------------
    const        0,0444178     0,0492440      0,9020   0,3671
    AGE          0,00874370    0,0110272      0,7929   0,4278
    NADULTS     -0,0130898     0,0165677     -0,7901   0,4295
    NKIDS       -0,00221765    0,000585669   -3,787    0,0002 ***
    NKIDS2      -0,00260186    0,00228812    -1,137    0,2555
    LNX         -0,00174557    0,00357283    -0,4886   0,6251
    AGELNX      -0,000485866   0,000807854   -0,6014   0,5476
    NADLNX       0,000817826   0,00119574     0,6839   0,4940
    WALLOON      0,00260557    0,000958504    2,718    0,0066 ***
    lambda      -0,00013773    0,00291516    -0,04725  0,9623

    Mean dependent var   0,021507   S.D. dependent var   0,022062
    sigma                0,021451   rho                 -0,006431
    Log-likelihood       4316,615   Akaike criterion    -8613,231
    Schwarz criterion   -8556,008   Hannan-Quinn        -8592,349

Tobit II Model for Budget Share for Tobacco, cont'd
- Heckit ML estimation, GRETL output:

  Model 7: ML Heckit, using observations 1-2724
  Dependent variable: SHARE1
  Selection variable: D1

  Selection equation
                 coefficient   std. error     t-ratio  p-value
    -------------------------------------------------------------------
    const       -16,2535       2,58561       -6,286    3,25e-010 ***
    AGE          0,753353      0,653820       1,152    0,2492
    NADULTS      2,13037       1,03368        2,061    0,0393 **
    NKIDS       -0,0936353     0,0376590     -2,486    0,0129 **
    NKIDS2      -0,188864      0,141231      -1,337    0,1811
    LNX          1,25834       0,192074       6,551    5,70e-011 ***
    AGELNX      -0,0510698     0,0486730     -1,049    0,2941
    NADLNX      -0,160399      0,0748929     -2,142    0,0322 **
    BLUECOL     -0,0352022     0,0983073     -0,3581   0,7203
    WHITECOL     0,0801599     0,0852980      0,9398   0,3473
    WALLOON      0,201073      0,0628750      3,198    0,0014 ***

Models for Budget Share for Tobacco
Estimates and standard errors (in parentheses) for some coefficients of the standard Tobit, the truncated regression and the Tobit II model

                         constant   NKIDS        LNX          WALLOON
  Tobit model           -0,1704    -0,0030***    0,0134***    0,0042***
                        (0,0441)   (0,0008)     (0,0033)     (0,0010)
  Truncated regression   0,0433    -0,0022***   -0,0017       0,0026***
                        (0,0458)   (0,0008)     (0,0034)     (0,0009)
  Tobit II model         0,0444    -0,0022***   -0,0017       0,0026***
                        (0,0492)   (0,0006)     (0,0036)     (0,0010)
  Tobit II selection   -16,2535    -0,0936**     1,2583***    0,2011***
                        (2,5856)   (0,0377)     (0,1921)     (0,0629)

Test for Sample Selection Bias
- If $\sigma_{12} \neq 0$, the error terms $\eta_i$ of the second-step regression are heteroskedastic; the usual OLS standard errors and tests may then lead to misleading inferences
- Test of $H_0: \sigma_{12} = 0$ in the second step of Heckit, i.e., when fitting the regression $w_i = x_{1i}'\beta_1 + \sigma_{12}\hat\lambda_i + \eta_i$
- t-test on the coefficient of Heckman's lambda
- Test results are sensitive to exclusion restrictions on $x_{1i}$

Tobit Models in GRETL
- Model > Nonlinear Models > Tobit
  - Estimates the Tobit model; censored dependent variable
- Model > Nonlinear Models > Heckit
  - Estimates in addition the selection equation (Tobit II), optionally by ML or by two-step estimation

Your Homework
1. Verbeek's data set CREDIT contains credit ratings of 921 US firms, as well as characteristics of the firms; the variable rating has categories "1", …, "7"
   (highest). Generate the variable GF (good firm) with value 1 if rating > 4 and 0 otherwise, and the more detailed variable CR (credit rating) with CR = 1 if rating < 3, CR = 2 if rating = 3, CR = 3 if rating = 4, and CR = 4 otherwise.
   a. Estimate a binary logit model for the assignment of the GF ratings, and an ordered logit model for the assignment of CR.
   b. Compare the effects of the regressors in the models, based on coefficients and slopes.
   c. What is the percentage of firms correctly rated by GF that are incorrectly rated by CR?
2. People buy shares of an investment fund for the amount $y_i^*$, with $y_i^* = x_i'\beta + \varepsilon_i$ and $\varepsilon_i \sim N(0,1)$; $x_i$ consists of an intercept and the variables age and income. The dummy $d_i = 1$ if $y_i^* > 0$ and $d_i = 0$ otherwise.
   a. Derive the probability of $d_i = 1$ as a function of $x_i$.
   b. Derive the log-likelihood function of the probit model for $d_i$.
3. Verbeek's data set TOBACCO also contains expenditures on alcohol of 2724 Belgian households, taken from the Belgian household budget survey of 1995/96, as well as other characteristics of the households; for the expenditures on alcohol, the dummy D1 = 1 if the budget share for alcohol SHARE1 differs from 0, and D1 = 0 otherwise.
   a. Model the budget share for alcohol using (i) a Tobit model, (ii) a truncated regression, and (iii) a Tobit II model, using, besides other characteristics of the household, the dummy FLANDERS.
   b. Compare the effects of the regressors in the models, based on coefficients and slopes.
   c. Compare the results for Flanders with those for Wallonia.
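The wage-equation result of the Tobit II model, $E\{w_i \mid h_i = 1\} = x_{1i}'\beta_1 + \sigma_{12}\,\lambda(x_{2i}'\beta_2)$, can be checked numerically. Below is a minimal Python sketch (standard library only; `statistics.NormalDist` and `statistics.fmean` need Python 3.8 or later), assuming bivariate normal errors with $\sigma_2^2 = 1$. It evaluates Heckman's lambda, computes the selection-corrected conditional mean, and compares it with a Monte Carlo average over simulated "working persons" ($h_i^* > 0$). The function names and all numerical values (index values $x_{1i}'\beta_1$ and $x_{2i}'\beta_2$, $\sigma_1$, the correlation $\rho$, sample size, seed) are hypothetical illustration choices; this is not GRETL's Heckit routine.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def heckman_lambda(z: float) -> float:
    """Inverse Mills ratio (Heckman's lambda): phi(z) / Phi(z)."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def conditional_wage_mean(xb1: float, xb2: float, sigma12: float) -> float:
    """E{w | h = 1} = x1'b1 + sigma12 * lambda(x2'b2), assuming sigma2^2 = 1."""
    return xb1 + sigma12 * heckman_lambda(xb2)


def simulated_selected_mean(xb1: float, xb2: float, sigma1: float, rho: float,
                            n: int = 200_000, seed: int = 1) -> float:
    """Hypothetical Monte Carlo check of the formula above.

    Draws (e1, e2) bivariate normal with V{e2} = 1, V{e1} = sigma1^2 and
    cov(e1, e2) = rho * sigma1; keeps the draws with h* = xb2 + e2 > 0 and
    returns the average of w* = xb1 + e1 over the selected draws.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        # e1 correlated with e2: corr(e1, e2) = rho, sd(e1) = sigma1
        e1 = sigma1 * (rho * e2 + math.sqrt(1.0 - rho ** 2) * u)
        if xb2 + e2 > 0:  # observation rule: w is observed only if h* > 0
            selected.append(xb1 + e1)
    return fmean(selected)


if __name__ == "__main__":
    # Heckman's lambda is positive and decreasing in its argument
    for z in (-1.0, 0.0, 1.0):
        print(f"lambda({z:+.1f}) = {heckman_lambda(z):.4f}")

    xb1, xb2, sigma1, rho = 1.0, 0.0, 2.0, 0.5
    sigma12 = rho * sigma1  # cov(e1, e2) when V{e2} = 1
    print("formula:  ", round(conditional_wage_mean(xb1, xb2, sigma12), 4))
    print("simulated:", round(simulated_selected_mean(xb1, xb2, sigma1, rho), 4))
```

With $\sigma_1 = 2$ and $\rho = 0.5$ we have $\sigma_{12} = 1$ and $\lambda(0) \approx 0.798$, so both printed means should be close to 1.80. Setting `rho = 0.0` makes the selected mean fall back to `xb1`, which is the case $\sigma_{12} = 0$ of "no sample selection".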