Econometrics 2 - Lecture 2
Models with Limited Dependent Variables
March 1, 2013 Hackl, Econometrics 2, Lecture 2

Contents
- Limited Dependent Variable Cases
- Binary Choice Models
- Binary Choice Models: Estimation
- Binary Choice Models: Goodness of Fit
- Application to Latent Models
- Multiresponse Models
- Multinomial Models
- Count Data Models
- The Tobit Model
- The Tobit II Model

Cases of Limited Dependent Variable
- Typical situations in which the dependent variable, to be explained as a function of explanatory variables, takes only a limited range of values:
  - Dichotomous dependent variable, e.g., ownership of a car (yes/no), employment status (employed/unemployed), etc.
  - Ordered response, e.g., qualitative assessment (good/average/bad), working status (full-time/part-time/not working), etc.
  - Multinomial response, e.g., trading destinations (Europe/Asia/Africa), transportation means (train/bus/car), etc.
  - Count data, e.g., number of orders a company receives in a week, number of patents granted to a company in a year
  - Censored data, e.g., expenditures for durable goods, duration of study with dropouts

Example: Car Ownership and Income
- What is the probability that a randomly chosen household owns a car?
- Sample of N = 32 households
  - Proportion of car-owning households: 19/32 = 0.59
- Estimated probability of owning a car: 0.59
- But: the probability will differ for rich and poor!
- The sample data contain income information:
  - Yearly income: average EUR 20,524, minimum EUR 12,000, maximum EUR 32,517
  - Proportion of car-owning households among the 16 households with less than EUR 20,000 income: 9/16 = 0.56
  - Proportion of car-owning households among the 16 households with more than EUR 20,000 income: 10/16 = 0.63

Car Ownership and Income, cont'd
- How can the probability (or the prediction) of car ownership take the income of a household into account?
- Notation: N households
  - dummy yi for car ownership; yi = 1: household i has a car
  - income xi2
- For predicting yi, or P{yi = 1}, a model is needed that takes the income into account

Modelling Car Ownership
- How is car ownership related to the income of a household?
1. Linear regression: yi = xi'β + εi = β1 + β2xi2 + εi
   - With E{εi|xi} = 0, the model yi = xi'β + εi gives
     P{yi = 1|xi} = xi'β
     due to E{yi|xi} = 1·P{yi = 1|xi} + 0·P{yi = 0|xi} = P{yi = 1|xi}
   - Model yi = xi'β + εi: xi'β can be interpreted as P{yi = 1|xi}!
   - Problems:
     - xi'β is not necessarily in [0,1]
     - Error terms: for a given xi
       - εi = yi - xi'β has only two values, viz. 1 - xi'β (if yi = 1) and -xi'β (if yi = 0)
       - V{εi|xi} = xi'β(1 - xi'β): heteroskedastic, dependent upon β
   - The model for y actually specifies the probability that y = 1 as a function of x

Modelling Car Ownership, cont'd
2. Use of a function G(xi,β) with values in the interval [0,1]
   P{yi = 1|xi} = E{yi|xi} = G(xi,β)
   - The probability that yi = 1, i.e., that the household owns a car, depends on the income (and other characteristics, e.g., family size)
   - Use for G(xi,β) the standard logistic distribution function
     $$L(z) = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}$$
   - L(z) fulfils lim_{z→-∞} L(z) = 0, lim_{z→∞} L(z) = 1
   - Interpretation:
     - From P{yi = 1|xi} = pi = exp{xi'β}/(1 + exp{xi'β}) follows
       $$\log\frac{p_i}{1-p_i} = x_i'\beta$$
     - An increase of xi2 by 1 results in a relative change of the odds pi/(1 - pi) by approximately β2, i.e., by about 100β2%; cf. the notion of semi-elasticity

Car Ownership and Income, cont'd
- E.g., P{yi = 1|xi} = 1/(1 + exp(-zi)) with zi = -0.5 + 1.1xi, where xi is the income in EUR 1000 per month
- Increasing income is associated with an increasing probability of owning a car: z goes up by 1.1 for every additional EUR 1000
- For a person with an income of EUR 1000, z = 0.6 and the probability of owning a car is 1/(1 + exp(-0.6)) = 0.65

  income x (EUR/month)   z     P{y = 1|x}
  1000                   0.6   0.646
  2000                   1.7   0.846
  3000                   2.8   0.943

  [Figure: the standard logistic distribution function, with z on the horizontal and F(z) on the vertical axis]

Odds
- The odds in favour of an event are the ratio of a pair of numbers, the first (the second) representing the relative likelihood that the event will happen (will not happen)
- If p is the probability in favour of the event, the probability against the event therefore being 1 - p, the odds of the event are the quotient
  $$\frac{p}{1-p}$$
- Odds are read as "p/(1-p) to 1" or, equivalently, "1 to (1-p)/p", written "1:(1-p)/p"

  p         0.1   0.2   0.3     0.4     0.5   0.6      0.7      0.8      0.9
  odds      1:9   1:4   1:2.3   1:1.5   1:1   1:0.67   1:0.43   1:0.25   1:0.11
  p/(1-p)   0.11  0.25  0.43    0.67    1     1.5      2.33     4        9

- The logarithm of the odds, log[p/(1-p)], is called the logit of p

Odds: Example
- Example: the odds that a randomly chosen day of the week is a Sunday are 1:6 (say "one to six"), because p = P{Sunday} = 1/7 = 0.143 and p/(1-p) = (1/7)/(6/7) = 1/6
- In bookmakers' language, odds are stated not in favour of but against the event
- The bookmaker would say
  - The odds that a randomly chosen day of the week is a Sunday are 6:1
  - The odds that the Czech Republic men's national ice hockey team wins the World Championship are 2:1; i.e., the probability is considered to be 1/3 = 0.333

Binary Choice Models
- Model for the probability P{yi = 1|xi} as a function of K (numerical or categorical) explanatory variables xi and unknown parameters β, such as
  E{yi|xi} = P{yi = 1|xi} = G(xi,β)
- Typical functions G(xi,β): distribution functions (cdf's) F(xi'β)
- Probit model: standard normal distribution function; V{z} = 1
  $$F(w) = \Phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
- Logit model: standard logistic distribution function; V{z} = π²/3 = 1.81²
  $$F(w) = L(w) = \frac{e^{w}}{1+e^{w}}$$
- Linear probability model (LPM)
  $$F(w) = \begin{cases} 0 & w < 0 \\ w & 0 \le w \le 1 \\ 1 & w > 1 \end{cases}$$

Linear Probability Model (LPM)
- Assumes that
  P{yi = 1|xi} = xi'β for 0 ≤ xi'β ≤ 1
  but sets
  P{yi = 1|xi} = 0 for xi'β < 0
  P{yi = 1|xi} = 1 for xi'β > 1
- Typically, the model is estimated by OLS, ignoring the probability restrictions
- Standard errors should be adjusted using heteroskedasticity-consistent (White) standard errors

Probit Model: Standardization
- E{yi|xi} = P{yi = 1|xi} = G(xi,β): assume G(.) to be the distribution function of N(0, σ²)
  $$P\{y_i = 1 \mid x_i\} = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)$$
- Given xi, only the ratio β/σ determines P{yi = 1|xi}
- Standardization restriction σ² = 1: allows unique estimates for β

Probit vs Logit Model
- Differences between the probit and the logit model:
  - Shape of distribution is slightly different, particularly in the tails.
  - Scaling of the distribution is different: the implicit variance of εi is π²/3 = (1.81)² in the logit model, but 1 in the probit model
  - The probit model is relatively easy to extend to multivariate cases using the multivariate normal or conditional normal distribution
- In practice, the probit and logit model produce quite similar results
  - The scaling difference makes the values of β not directly comparable across the two models, while the signs are typically the same
  - The estimates in the logit model are roughly a factor π/√3 ≈ 1.81 larger than those in the probit model

Interpretation of Coefficients
- For assessing the effect of changing xk, the coefficient βk is of interest, but also related characteristics such as
  - the sign of βk
  - the slope, i.e., the "average" marginal effect ∂F(xi'β)/∂xik

Binary Choice Models: Marginal Effects
- Linear regression models: βk is the marginal effect of a change in xk
- For E{yi|xi} = F(xi'β):
  $$\frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} = f(x_i'\beta)\,\beta_k$$
  with density function f(.)
- The effect of changing the regressor xk depends upon xi'β, the shape of F, and βk
- The marginal effect of changing xk
  - Probit model: φ(xi'β)βk, with standard normal density function φ
  - Logit model: L(xi'β)[1 - L(xi'β)]βk
  - Linear probability model: βk (for 0 < xi'β < 1, otherwise 0)

Binary Choice Models: Slopes
- Interpretation of the effect of a change in xk
- "Slope", i.e., the gradient of E{yi|xi} at the sample means x̄ of the regressors
  $$\frac{\partial E\{y \mid \bar{x}\}}{\partial x_k} = f(\bar{x}'\beta)\,\beta_k$$
- For a dummy variable D: the marginal effect is calculated as the difference of probabilities P{yi = 1|x̄(d), D = 1} - P{yi = 1|x̄(d), D = 0}; x̄(d) stands for the sample means of all regressors except D
- For the logit model:
  $$\log\frac{p_i}{1-p_i} = x_i'\beta, \qquad \frac{\partial \log[p_i/(1-p_i)]}{\partial x_{ik}} = \beta_k$$
  - The coefficient βk is (approximately) the relative change of the odds when increasing xk by 1 unit

Binary Choice Models: Estimation
- Typically, binary choice models are estimated by maximum likelihood
- Likelihood function, given N observations (yi, xi)
  $$L(\beta) = \prod_{i=1}^{N} P\{y_i = 1 \mid x_i;\beta\}^{y_i}\, P\{y_i = 0 \mid x_i;\beta\}^{1-y_i} = \prod_{i=1}^{N} F(x_i'\beta)^{y_i}\,\bigl(1 - F(x_i'\beta)\bigr)^{1-y_i}$$
- Maximization via the log-likelihood function
  $$\ell(\beta) = \log L(\beta) = \sum_i y_i \log F(x_i'\beta) + \sum_i (1-y_i) \log\bigl(1 - F(x_i'\beta)\bigr)$$
- First-order conditions of the maximization problem
  $$\frac{\partial \ell(\beta)}{\partial \beta} = \sum_i \frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\bigl(1 - F(x_i'\beta)\bigr)}\, f(x_i'\beta)\, x_i = \sum_i e_i x_i = 0$$
- ei: generalized residuals

Generalized Residuals
- The first-order conditions allow generalized residuals to be defined
- From
  $$e_i = \frac{y_i - F(x_i'b)}{F(x_i'b)\bigl(1 - F(x_i'b)\bigr)}\, f(x_i'b)$$
  it follows that the generalized residuals ei can assume two values:
  - ei = f(xi'b)/F(xi'b) if yi = 1
  - ei = -f(xi'b)/(1 - F(xi'b)) if yi = 0
  b are the estimates of β
- Generalized residuals are orthogonal to each regressor; cf. the first-order conditions of OLS estimation

Estimation of Logit Model
- First-order condition of the maximization problem
  $$\sum_i \left[ y_i - \frac{\exp\{x_i'\beta\}}{1 + \exp\{x_i'\beta\}} \right] x_i = 0$$
  gives [due to P{yi = 1|xi} = L(xi'β)]
  $$\sum_i \hat{p}_i\, x_i = \sum_i y_i\, x_i, \qquad \hat{p}_i = L(x_i'b)$$
- From Σi p̂i xi = Σi yi xi it follows, given that one regressor is an intercept:
  - The sum of estimated probabilities Σi p̂i equals the observed frequency Σi yi
- Similar results hold approximately for the probit model, due to the similarity of the logit and probit functions

Properties of ML Estimators
- Consistent
- Asymptotically efficient
- Asymptotically normally distributed
- These properties require that the assumed distribution is correct:
  - correct shape
  - no autocorrelation and/or heteroskedasticity
  - no dependence between errors and regressors
  - no omitted regressors

Goodness-of-Fit Measures
- Concepts
- Comparison of the maximum likelihood of the model with that of the naïve model, i.e., a model with only an intercept and no regressors
  - Pseudo-R²
  - McFadden R²
- Index based on the proportion of correctly predicted observations
  - Hit rate

McFadden R²
- Based on the log-likelihood function
- ℓ(b) = ℓ1: maximum log-likelihood of the model to be assessed
- ℓ0: maximum log-likelihood of the naïve model, i.e., a model with only an intercept; ℓ0 ≤ ℓ1 ≤ 0
  - The larger ℓ1 - ℓ0, the more the regressors contribute
  - ℓ1 = ℓ0 if all slope coefficients are zero
  - ℓ1 = 0 if yi is exactly predicted for all i
- Pseudo-R²: a number in [0,1), defined by
  $$\text{pseudo-}R^2 = 1 - \frac{1}{1 + 2(\ell_1 - \ell_0)/N}$$
- McFadden R²: a number in [0,1], defined by
  $$R^2_{\text{McF}} = 1 - \frac{\ell_1}{\ell_0}$$
- Both are 0 if ℓ1 = ℓ0, i.e., if all slope coefficients are zero
- McFadden R² attains the upper limit 1 if ℓ1 = 0

Naïve Model: Calculation of ℓ0
- Maximum log-likelihood of the naïve model, i.e., a model with only an intercept: ℓ0
- Log-likelihood function (cf. urn experiment)
  log L(p) = N1 log(p) + (N - N1) log(1 - p)
  with N1 = Σi yi, i.e., the observed frequency
- The maximum likelihood estimator for p is N1/N
- Maximum log-likelihood of the naïve model
  ℓ0 = N1 log(N1/N) + (N - N1) log(1 - N1/N)

Hit Rate
- Comparison of correct and incorrect predictions
- Predicted outcome
  ŷi = 1 if xi'b > 0 (i.e., F(xi'b) > 0.5 for symmetric F)
     = 0 if xi'b ≤ 0
- Cross-tabulation of actual and predicted outcome

           ŷ = 0   ŷ = 1   Σ
  y = 0    n00     n01     N0
  y = 1    n10     n11     N1
  Σ        n0      n1      N

- Proportion of incorrect predictions
  wr1 = (n01 + n10)/N
- Hit rate: 1 - wr1, the proportion of correct predictions
- Comparison with the naïve model:
  - Predicted outcome of the naïve model (for all i): ŷi = 1 if p̂ = N1/N > 0.5, ŷi = 0 if p̂ ≤ 0.5
  - Rp² = 1 - wr1/wr0
    with wr0 = 1 - p̂ if p̂ > 0.5, wr0 = p̂ if p̂ ≤ 0.5, i.e., the proportion of incorrect predictions of the naïve model

Example: Effect of Teaching Method
- Study by Spector & Mazzeo (1980); see Greene (2003), Chapter 21
- Personalized System of Instruction (PSI): a new teaching method in economics; does it have an effect on student performance in later courses?
- Data:
  - GRADE (0/1): indicator whether the grade was higher than in the principal course
  - PSI (0/1): participation in the program with the new teaching method
  - GPA: grade point average
  - TUCE: score on a pretest, entering knowledge
- 32 observations

Effect of Teaching Method, cont'd
- Logit model for GRADE, GRETL output:

  Model 1: Logit, using observations 1-32
  Dependent variable: GRADE

              Coefficient   Std. Error   z-stat    Slope*
    const     -13.0213      4.93132      -2.6405
    GPA         2.82611     1.26294       2.2377   0.533859
    TUCE        0.0951577   0.141554      0.6722   0.0179755
    PSI         2.37869     1.06456       2.2344   0.456498

    Mean dependent var    0.343750   S.D. dependent var   0.188902
    McFadden R-squared    0.374038   Adjusted R-squared   0.179786
    Log-likelihood      -12.88963    Akaike criterion    33.77927
    Schwarz criterion    39.64221    Hannan-Quinn        35.72267

    Number of cases 'correctly predicted' = 26 (81.3%)
    f(beta'x) at mean of independent vars = 0.189
    Likelihood ratio test: Chi-square(3) = 15.4042 [0.0015]

                 Predicted
                  0    1
    Actual   0   18    3
             1    3    8

Effect of Teaching Method, cont'd
- Logit model for GRADE: actual and fitted values of the 32 observations
  [Figure: actual and fitted values of GRADE]

Effect of Teaching Method, cont'd
- Comparison of the LPM, logit, and probit model for GRADE
- Estimated models: coefficients and their standard errors

           LPM              Logit            Probit
           coeff    s.e.    coeff    s.e.    coeff    s.e.
  const    -1.498   0.524   -13.02   4.931   -7.452   2.542
  GPA       0.464   0.162     2.826  1.263    1.626   0.694
  TUCE      0.010   0.019     0.095  0.142    0.052   0.084
  PSI       0.379   0.139     2.379  1.065    1.426   0.595

- Coefficients of the logit model: due to the larger variance, roughly larger by the factor √(π²/3) = 1.81 than those of the probit model

Effect of Teaching Method, cont'd
- Goodness-of-fit measures for the logit model
- With N1 = 11 and N = 32:
  ℓ0 = 11 log(11/32) + 21 log(21/32) = -20.59
- As p̂ = N1/N = 0.34 < 0.5, the proportion wr0 of incorrect predictions of the naïve model is
  wr0 = p̂ = 11/32 = 0.34
- From the GRETL output: ℓ1 = -12.89, wr1 = 6/32
- Goodness-of-fit measures
  - Rp² = 1 - wr1/wr0 = 1 - 6/11 = 0.45
  - McFadden R² = 1 - (-12.89)/(-20.59) = 0.374

Example: Utility of Car Owning
- Latent variable yi*: utility difference between owning and not owning a car; unobservable (latent)
- Decision on owning a car
  - yi* > 0: in favour of car owning
  - yi* ≤ 0: against car owning
- yi* depends upon observed characteristics (like income) and unobserved characteristics εi
  yi* = xi'β + εi
- Observation yi = 1 (i.e., owning a car) if yi* > 0
  P{yi = 1} = P{yi* > 0} = P{xi'β + εi > 0} = P{εi > -xi'β} = 1 - F(-xi'β) = F(xi'β)
  with F the distribution function of εi; the last step requires a symmetric distribution function F(.)
- Latent variable model: based on a latent variable that represents underlying behavior

Latent Variable Model
- Model for the latent variable yi*
  yi* = xi'β + εi
  yi*: not necessarily a utility difference
- εi's are independent of the xi's
- εi has a standardized distribution
  - Probit model if εi has a standard normal distribution
  - Logit model if εi has a standard logistic distribution
- Observations
  - yi = 1 if yi* > 0
  - yi = 0 if yi* ≤ 0
- ML estimation

Binary Choice Models in GRETL
- Model > Nonlinear Models > Logit > Binary
  Estimates the specified model using error terms with standard logistic distribution
- Model > Nonlinear Models > Probit > Binary
  Estimates the specified model using error terms with standard normal distribution

Multiresponse Models
- Model for explaining the choice between discrete outcomes
- Examples:
  a. Working status (full-time/part-time/not working), qualitative assessment (good/average/bad), etc.
  b. Trading destinations (Europe/Asia/Africa), transportation means (train/bus/car), etc.
- Multiresponse models describe the probability of each of these outcomes as a function of variables like
  - person-specific characteristics
  - alternative-specific characteristics
- Types of multiresponse models (cf. the examples above)
  - Ordered response models: outcomes have a natural ordering
  - Multinomial (unordered) models: the ordering of outcomes is arbitrary

Example: Credit Rating
- Credit rating: an indicator of experts' opinion about a firm's capacity to satisfy its financial obligations, i.e., its credit-worthiness
- Standard & Poor's rating scale: AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB-, B+, B, B-, CCC+, CCC, CCC-, CC, C, D
- Verbeek's data set CREDIT
  - Categories "1", …, "7" (highest)
  - Investment grade with alternatives "1" (better than category 3) and "0" (category 3 or less, also called "speculative grade")
- Explanatory variables, e.g.,
  - firm sales
  - EBIT, i.e., earnings before interest and taxes
  - ratio of working capital to total assets

Ordered Response Model
- Choice between M alternatives
- Observed alternative for sample unit i: yi
- Latent variable model
  yi* = xi'β + εi
  with K-vector of explanatory variables xi and
  yi = j if γ(j-1) < yi* ≤ γj for j = 1, …, M
- M+1 boundaries γj, j = 0, …, M, with γ0 = -∞ < γ1 < … < γ(M-1) < γM = ∞
- εi's are independent of the xi's
- εi typically follow the
  - standard normal distribution: ordered probit model
  - standard logistic distribution: ordered logit model

Example: Willingness to Work
- "How much would you like to work?"
- Potential answers of individual i: yi = 1 (not working), yi = 2 (part time), yi = 3 (full time)
- Measure of the desired labour supply
- Dependent upon factors like age, education level, husband's income
- Ordered response model with M = 3
  yi* = xi'β + εi
  with
  yi = 1 if yi* ≤ 0
  yi = 2 if 0 < yi* ≤ γ
  yi = 3 if yi* > γ
- εi's with distribution function F(.)
- yi* stands for "willingness to work" or "desired hours of work"

Willingness to Work, cont'd
- In terms of observed quantities:
  P{yi = 1|xi} = P{yi* ≤ 0|xi} = F(-xi'β)
  P{yi = 3|xi} = P{yi* > γ|xi} = 1 - F(γ - xi'β)
  P{yi = 2|xi} = F(γ - xi'β) - F(-xi'β)
- Unknown parameters: γ and β
- Standardization with respect to location (the lower boundary is fixed at 0) and scale (V{εi} = 1)
- ML estimation
- Interpretation of parameters β
  - With respect to yi*: the willingness to work increases with larger xk for positive βk
  - With respect to the probabilities P{yi = j|xi}: e.g., P{yi = 3|xi} increases and P{yi = 1|xi} decreases with larger xk for positive βk

Example: Credit Rating
- Verbeek's data set CREDIT: 921 observations of US firms' credit ratings in 2005, including firm characteristics
- Rating models:
  1. Ordered logit model for the assignment of categories "1", …, "7" (highest)
  2. Binary logit model for the assignment of "investment grade", with alternatives "1" (better than category 3) and "0" (category 3 or less, also called "speculative grade")

Credit Rating, cont'd
- Verbeek's data set CREDIT
- Ratings and characteristics for 921 firms: summary statistics
  [Table: summary statistics]
- Book leverage: ratio of debts to assets

Credit Rating, cont'd
- Verbeek, Table 7.5.
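The three probabilities of the willingness-to-work model can be evaluated directly. The following is a minimal Python sketch, assuming standard normal errors (ordered probit) and made-up values for the index xi'β and the boundary γ; the function name `ordered_probit_probs` is hypothetical and not part of GRETL.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function: the ordered probit F(.)


def ordered_probit_probs(index, gamma):
    """Probabilities of y = 1, 2, 3 for y* = index + eps, eps ~ N(0, 1),
    with boundaries 0 (location standardization) and gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must exceed the lower boundary 0")
    p1 = PHI(-index)                    # y = 1 (not working): y* <= 0
    p3 = 1.0 - PHI(gamma - index)       # y = 3 (full time):   y* > gamma
    p2 = PHI(gamma - index) - p1        # y = 2 (part time):   0 < y* <= gamma
    return p1, p2, p3


# Hypothetical values of x_i'beta and gamma, for illustration only
for index in (-0.5, 0.5, 1.5):
    p1, p2, p3 = ordered_probit_probs(index, gamma=1.0)
    print(f"x'b = {index:4.1f}: P(y=1) = {p1:.3f}, P(y=2) = {p2:.3f}, P(y=3) = {p3:.3f}")
```

Raising xi'β moves probability mass from "not working" to "full time". In this illustration P(y=2) first rises and then falls, so the effect on the middle category has no fixed sign. This is why the interpretation above states directions only for the outer categories.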
Ordered Response Model: Estimation
- Latent variable model
  yi* = xi'β + εi
  with explanatory variables xi and
  yi = j if γ(j-1) < yi* ≤ γj for j = 1, …, M
- ML estimation of β1, …, βK and γ1, …, γ(M-1)
- Log-likelihood function in terms of probabilities
- Numerical optimization
- ML estimators are
  - consistent
  - asymptotically efficient
  - asymptotically normally distributed

Multinomial Models
- Choice between M alternatives without natural order
- Observed alternative for sample unit i: yi
- "Random utility" framework: individual i
  - attaches utility levels Uij to each of the alternatives, j = 1, …, M,
  - chooses the alternative with the highest utility level
- Utility levels Uij, j = 1, …, M, as a function of characteristics xij
  Uij = xij'β + εij
- The error terms εij follow (independently) the Type I extreme value distribution, F(ε) = exp{-exp(-ε)}; this gives
  $$P\{y_i = j\} = \frac{\exp\{x_{ij}'\beta\}}{\sum_{k=1}^{M} \exp\{x_{ik}'\beta\}}, \qquad j = 1, \dots, M$$
  and Σj P{yi = j} = 1

Variants of the Logit Model
- For setting the location: constraint xi1'β = 0, i.e., exp{xi1'β} = 1 (alternative 1 serves as reference)
- Conditional logit model: for j = 1, …, M
  $$P\{y_i = j\} = \frac{\exp\{x_{ij}'\beta\}}{\sum_{k=1}^{M} \exp\{x_{ik}'\beta\}}$$
  - Alternative-specific characteristics xij
  - E.g., mode of transportation is affected by travel costs, travel duration, etc.
- Multinomial logit model: for j = 1, …, M
  $$P\{y_i = j\} = \frac{\exp\{x_i'\beta_j\}}{\sum_{k=1}^{M} \exp\{x_i'\beta_k\}}$$
  - Person-specific characteristics xi
  - E.g., mode of transportation is affected by income, gender, etc.
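The multinomial logit probabilities can be sketched in a few lines. This is a minimal Python illustration, assuming person-specific characteristics xi and alternative 1 as reference (β1 = 0). The coefficient values and the function name `multinomial_logit_probs` are invented for illustration. Subtracting the largest index before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import math


def multinomial_logit_probs(x, betas):
    """P{y_i = j} = exp(x_i'beta_j) / sum_k exp(x_i'beta_k) for j = 1, ..., M.

    x     -- person-specific characteristics x_i (first element 1 for the intercept)
    betas -- one coefficient vector per alternative; betas[0] is the reference (beta_1 = 0)
    """
    index = [sum(xk * bk for xk, bk in zip(x, beta)) for beta in betas]
    top = max(index)                              # shift cancels in the ratio; avoids overflow
    weights = [math.exp(v - top) for v in index]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical person: intercept and income in EUR 1000; alternatives train, bus, car
x = (1.0, 2.5)
betas = [(0.0, 0.0),    # train (reference alternative, beta_1 = 0)
         (0.3, -0.2),   # bus
         (-1.0, 0.6)]   # car
probs = multinomial_logit_probs(x, betas)
print([round(p, 3) for p in probs])

# IIA: the odds of bus versus train equal exp(x_i'(beta_bus - beta_train)),
# whatever other alternatives are in the choice set
print(probs[1] / probs[0], math.exp(0.3 - 0.2 * 2.5))
```

Adding a fourth alternative to `betas` changes every probability but leaves the bus-to-train odds untouched. This is the "independence of irrelevant alternatives" property of the logit specification.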
Multinomial Logit Model
- The term "multinomial logit model" is also used for both
  - the conditional logit model and
  - the multinomial logit model (see above)
- and also for the mixed logit model, which combines
  - alternative-specific characteristics and
  - person-specific characteristics

Independence of Errors
- Independence of the error terms εij implies independent utility levels of the alternatives
- The independence assumption may be restrictive
- Example: a high utility of the alternative "travel with red bus" implies a high utility of "travel with blue bus"
- Independence implies that the odds ratio of two alternatives does not depend upon the other alternatives, e.g., upon their number: "independence of irrelevant alternatives" (IIA)

Multiresponse Models in GRETL
- Model > Nonlinear Models > Logit > Ordered...
  Estimates the specified model using error terms with standard logistic distribution, assuming ordered alternatives for responses
- Model > Nonlinear Models > Logit > Multinomial...
  Estimates the specified model using error terms with standard logistic distribution, assuming alternatives without order
- Model > Nonlinear Models > Probit > Ordered...
  Estimates the specified model using error terms with standard normal distribution, assuming ordered alternatives

Models for Count Data
- Describe the number of times an event occurs, depending upon certain characteristics
- Examples:
  - Number of visits to the library per week
  - Number of misspellings in an email
  - Number of patent applications of a firm, as a function of
    - firm size
    - R&D expenditures
    - industrial sector
    - country, etc.
    - See Verbeek's data set PATENT

Poisson Regression Model
- Observed variable for sample unit i:
  yi: number of occurrences, with possible outcomes 0, 1, …, y, …
- Aim: to explain E{yi|xi}, based on characteristics xi
  E{yi|xi} = exp{xi'β}
- Poisson regression model
  $$P\{y_i = y \mid x_i\} = \frac{\exp\{-\lambda_i\}\, \lambda_i^{y}}{y!}, \qquad y = 0, 1, 2, \dots$$
  with λi = E{yi|xi} = exp{xi'β}
  y! = 1·2·…·y, 0! = 1

Poisson Distribution
  [Figure: Poisson probability functions]

Poisson Regression Model: The Practice
- Unknown parameters: coefficients β
- Fitting the model to data: ML estimators are
  - consistent
  - asymptotically efficient
  - asymptotically normally distributed
- Equidispersion condition
  - A Poisson distributed X obeys E{X} = V{X} = λ
  - In many situations not realistic
  - Overdispersion: V{X} > E{X}
- Remedies: alternative distributions, e.g., the negative binomial, and alternative estimation procedures, e.g., quasi-ML, robust standard errors

Count Data Models in GRETL
- Model > Nonlinear Models > Count data…
  Estimates the specified model using the Poisson or the negative binomial distribution

Tobit Models
- Tobit models are regression models where the range of the (continuous) dependent variable is constrained, i.e., censored from below
- Examples:
  - Expenditures on durable goods as a function of income, age, etc.: some units do not spend any money on durable goods
  - Hours of work as a function of qualification, age, etc.
  - Expenditures on alcoholic beverages and tobacco
- Tobit models
  - Standard Tobit model or Tobit I model; James Tobin (1958) on expenditures on durable goods
  - Generalizations: Tobit II to V

Example: Expenditures on Tobacco
- Verbeek's data set TOBACCO: expenditures on tobacco in 2724 Belgian households, Belgian household budget survey of 1995/96
- Model:
  yi* = xi'β + εi
  - yi*: optimal expenditures on tobacco in household i
  - xi: characteristics of the i-th household
  - εi: unobserved heterogeneity (or measurement error or optimization error)
- Actual expenditures yi
  yi = yi* if yi* > 0
     = 0 if yi* ≤ 0

The Standard Tobit Model
- The latent variable yi* depends upon characteristics xi
  yi* = xi'β + εi
  with error terms (or unobserved heterogeneity)
  εi ~ NID(0, σ²), independent of xi
- Actual outcome of the observable variable yi
  yi = yi* if yi* > 0
     = 0 if yi* ≤ 0
- Standard Tobit model or censored regression model
- Censoring: all negative values are substituted by zero
- Censoring in general
  - Censoring from below (above): all values below (above) a lower (an upper) bound are substituted by the lower (upper) bound
- OLS produces inconsistent estimators for β

The Standard Tobit Model, cont'd
- The standard Tobit model describes
  1. the probability P{yi = 0} as a function of xi
     P{yi = 0} = P{εi ≤ -xi'β} = 1 - Φ(xi'β/σ)
  2. the distribution of yi given that it is positive, i.e., the truncated normal distribution with expectation
     E{yi|yi > 0} = xi'β + E{εi|εi > -xi'β} = xi'β + σλ(xi'β/σ)
     with λ(xi'β/σ) = φ(xi'β/σ)/Φ(xi'β/σ) ≥ 0
- Attention! A single set β of parameters characterizes both expressions
- The effect of a characteristic
  - on the probability of a non-zero observation and
  - on the value of the observation
  has the same sign!
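The two expressions of the standard Tobit model can be evaluated numerically. The following is a minimal Python sketch, assuming illustrative values for xi'β and σ rather than estimates from the TOBACCO data; the function name `tobit_parts` is hypothetical. It also checks P{yi = 0} and E{yi|yi > 0} against a brute-force simulation of the censored model yi = max(0, yi*).

```python
import random
from statistics import NormalDist, fmean

N01 = NormalDist()  # standard normal: cdf = Phi, pdf = phi


def tobit_parts(index, sigma):
    """Return P{y = 0} and E{y | y > 0} for y = max(0, index + eps), eps ~ N(0, sigma^2)."""
    z = index / sigma
    p_zero = 1.0 - N01.cdf(z)                   # P{y = 0} = 1 - Phi(x'b / sigma)
    mills = N01.pdf(z) / N01.cdf(z)             # lambda(z) = phi(z) / Phi(z) >= 0
    cond_mean = index + sigma * mills           # E{y | y > 0} = x'b + sigma * lambda(x'b / sigma)
    return p_zero, cond_mean


index, sigma = 0.5, 1.0                         # hypothetical values of x'b and sigma
p_zero, cond_mean = tobit_parts(index, sigma)

# Simulation check of the analytic formulas (censoring at zero)
rng = random.Random(42)
draws = [max(0.0, index + rng.gauss(0.0, sigma)) for _ in range(200_000)]
positive = [y for y in draws if y > 0]
print(f"P(y=0):   analytic {p_zero:.4f}, simulated {1 - len(positive) / len(draws):.4f}")
print(f"E(y|y>0): analytic {cond_mean:.4f}, simulated {fmean(positive):.4f}")
```

Both quantities are driven by the same index xi'β/σ. A larger xi'β therefore lowers P{yi = 0} and raises E{yi|yi > 0} at the same time, which is the single-parameter-set restriction pointed out above.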
March 1, 2013 Hackl, Econometrics 2, Lecture 2 64 The Standard Tobit Model: Interpretation nFrom n P{yi = 0} = 1 - F(xi’b/s) n E{yi | yi > 0} = xi’b + s l(xi’b/s) n follows: nA positive coefficient bk means that an increase in the explanatory variable xik increases the probability of having a positive yi nThe marginal effect of xik upon E{yi | yi > 0} is different from bk nThe marginal effect of xik upon E{yi} is bkP{yi > 0} qIt is close to bk if P{yi > 0} is close to 1, i.e, little censoring nThe marginal effect of xik upon E{yi*} is bk n March 1, 2013 Hackl, Econometrics 2, Lecture 2 65 The Standard Tobit Model: Estimation nOLS produces inconsistent estimators for b 1.ML estimation based on the log-likelihood n log L1(b, s2) = ℓ1(b, s2) = SiϵI0 log P{yi = 0} + SiϵI1 log f(yi) n with appropriate expressions for P{.} and f(.), I0 the set of censored observations, I1 the set of uncensored observations nFor the correctly specified model: estimates are nConsistent nAsymptotically efficient nAsymptotically normally distributed 2.Truncated regression model: ML estimation based on observations with yi > 0 only: n ℓ2(b, s2) = SiϵI1[ log f(yi) - log P{yi > 0}] nEstimates based on ℓ1 are more efficient than those based on ℓ2 March 1, 2013 Hackl, Econometrics 2, Lecture 2 66 Example: Model for Budget Share for Tobacco nVerbeek‘s data set TOBACCO: Belgian household budget survey of 1995/96 nBudget share wi* for expenditures on tobacco corresponding to maximal utility: wi* = xi’b + eI n xi: log of total expenditures (LNX) and various characteristics like qnumber of children £ 2 years old (NKIDS2) qnumber of adults in household (NADULTS) qAge (AGE) nActual budget share for expenditures on tobacco n wi = wi* if wi* > 0, n = 0 otherwise n2724 households n March 1, 2013 Hackl, Econometrics 2, Lecture 2 67 Model for Budget Share for Tobacco nTobit model, nGRETL output March 1, 2013 Hackl, Econometrics 2, Lecture 2 68 Model 2: Tobit, using observations 1-2724 Dependent variable: 
SHARE1 (Tobacco)

             coefficient    std. error     t-ratio    p-value
  -----------------------------------------------------------
  const      -0,170417      0,0441114      -3,863     0,0001    ***
  AGE         0,0152120     0,0106351       1,430     0,1526
  NADULTS     0,0280418     0,0188201       1,490     0,1362
  NKIDS      -0,00295209    0,000794286    -3,717     0,0002    ***
  NKIDS2     -0,00411756    0,00320953     -1,283     0,1995
  LNX         0,0134388     0,00326703      4,113     3,90e-05  ***
  AGELNX     -0,000944668   0,000787573    -1,199     0,2303
  NADLNX     -0,00218017    0,00136622     -1,596     0,1105
  WALLOON     0,00417202    0,000980745     4,254     2,10e-05  ***

  Mean dependent var   0,017828    S.D. dependent var   0,021658
  Censored obs         466         sigma                0,024344
  Log-likelihood       4764,153    Akaike criterion    -9508,306
  Schwarz criterion   -9449,208    Hannan-Quinn        -9486,944

Model for Budget Share for Tobacco, cont'd
- Truncated regression model, GRETL output:

Model 7: Tobit, using observations 1-2724 (n = 2258)
Missing or incomplete observations dropped: 466
Dependent variable: W1 (Tobacco)

             coefficient    std. error     t-ratio    p-value
  -----------------------------------------------------------
  const       0,0433570     0,0458419       0,9458    0,3443
  AGE         0,00880553    0,0110819       0,7946    0,4269
  NADULTS    -0,0129409     0,0185585      -0,6973    0,4856
  NKIDS      -0,00222254    0,000826380    -2,689     0,0072    ***
  NKIDS2     -0,00261220    0,00335067     -0,7796    0,4356
  LNX        -0,00167130    0,00337817     -0,4947    0,6208
  AGELNX     -0,000490197   0,000815571    -0,6010    0,5478
  NADLNX      0,000806801   0,00134731      0,5988    0,5493
  WALLOON     0,00261490    0,000922432     2,835     0,0046    ***

  Mean dependent var   0,021507    S.D. dependent var   0,022062
  Censored obs         0           sigma                0,021450
  Log-likelihood       5471,304    Akaike criterion    -10922,61
  Schwarz criterion   -10865,39    Hannan-Quinn        -10901,73

Two Models for Budget Share for Tobacco: Comparison
- Estimates (coeff.) and standard errors (s.e.) for some coefficients of the Tobit model (2724 observations, 466 censored) and the truncated regression model (2258 uncensored observations)

                               constant    NKIDS      LNX        WALLOON
  Tobit model           coeff.  -0,1704    -0,0030     0,0134    0,0042
                        s.e.     0,0441     0,0008     0,0033    0,0010
  Truncated regression  coeff.   0,0433    -0,0022    -0,0017    0,0026
                        s.e.     0,0458     0,0008     0,0034    0,0009

Specification Tests
- Various tests are based on
  - generalized residuals, evaluated at the estimates of $\beta$ and $\sigma$, with $e_i = y_i - x_i'\beta$:
$$\hat\varepsilon_i^G = \begin{cases} -\lambda(-x_i'\beta/\sigma) = -\phi(x_i'\beta/\sigma)/\Phi(-x_i'\beta/\sigma) & \text{if } y_i = 0 \\ e_i/\sigma \text{ (standardized residual)} & \text{if } y_i > 0 \end{cases}$$
  - "second order" generalized residuals corresponding to the estimation of $\sigma^2$
- Tests
  - for normality
  - for omitted variables
- The test for normality is a standard test in GRETL's Tobit procedure: consistency of the ML estimates requires normality

An Example: Modeling Wages
- Wage observations are available only for the working population
- Model that explains wages as a function of characteristics, e.g., the person's age
- Tobit model: for a positive coefficient of age, an increase in age
  - increases the wage
  - increases the probability that the person is working
  - Not always realistic!
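Why the standard Tobit model can be unrealistic here follows from its interpretation formulas: one coefficient vector drives both the probability of a positive outcome and its level. The Python sketch below (standard library only; the function names are hypothetical helpers, not GRETL functions) assumes normally distributed errors, as in the standard Tobit model. At an index value $x_i'\beta$, with $z = x_i'\beta/\sigma$, it evaluates $P\{y_i > 0\}$, $E\{y_i \mid y_i > 0\}$ and $E\{y_i\} = \Phi(z)\,x_i'\beta + \sigma\phi(z)$. It also computes the marginal effects of a regressor $x_{ik}$, using $\beta_k[1 - \lambda(z)(z + \lambda(z))]$ for the effect on $E\{y_i \mid y_i > 0\}$. Each effect is $\beta_k$ multiplied by a positive factor, so all of them share the sign of $\beta_k$.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_moments(xb: float, sigma: float) -> dict:
    """P{y > 0}, E{y | y > 0} and E{y} in the standard Tobit model, at index xb = x_i'beta."""
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)  # inverse Mills ratio lambda(z) = phi(z) / Phi(z)
    return {
        "P(y>0)": norm_cdf(z),
        "E(y|y>0)": xb + sigma * lam,
        "E(y)": norm_cdf(z) * xb + sigma * norm_pdf(z),
    }


def tobit_marginal_effects(xb: float, sigma: float, beta_k: float) -> dict:
    """Marginal effects of regressor x_k; each one is beta_k times a positive factor."""
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)
    return {
        "P(y>0)": beta_k * norm_pdf(z) / sigma,
        "E(y|y>0)": beta_k * (1.0 - lam * (z + lam)),
        "E(y)": beta_k * norm_cdf(z),
        "E(y*)": beta_k,
    }


# Hypothetical index value, scale and coefficients (not estimates from the tobacco data)
xb, sigma = 0.5, 1.0
for beta_k in (0.3, -0.3):
    effects = tobit_marginal_effects(xb, sigma, beta_k)
    print(beta_k, {name: round(value, 4) for name, value in effects.items()})
```

Flipping the sign of the hypothetical coefficient flips the sign of every effect at once. In the standard Tobit model, the same characteristic therefore cannot raise the wage while lowering the probability of working.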
The Tobit II model allows two separate equations
- for labor force participation and
- for the wage of a person

The Tobit II model is also called the "sample selection model".

Tobit II Model for Wages
- Wage equation describes the wage of person i
  $w_i^* = x_{1i}'\beta_1 + \varepsilon_{1i}$
  with exogenous characteristics (age, education, ...)
- Selection equation or labor force participation
  $h_i^* = x_{2i}'\beta_2 + \varepsilon_{2i}$
- Observation rule: $w_i$ is the actual wage of person i
  - $w_i = w_i^*$, $h_i = 1$ if $h_i^* > 0$
  - $w_i$ not observed, $h_i = 0$ if $h_i^* \le 0$
  - $h_i$: indicator for working
- Distributional assumption for $\varepsilon_{1i}, \varepsilon_{2i}$: bivariate normal,
$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right)$$

Tobit II Model for Wages, cont'd
- Selection equation: a binary choice model; the probit model needs the standardization $\sigma_2^2 = 1$
- The characteristics $x_{1i}$ and $x_{2i}$ may be different; however,
  - if the selection depends upon $w_i^*$, $x_{2i}$ is expected to include $x_{1i}$
  - because the model describes the joint distribution of $w_i$ and $h_i$ given one set of conditioning variables, $x_{2i}$ is expected to include $x_{1i}$
  - sign and value of the coefficients of the same variables in $x_{1i}$ and $x_{2i}$ can be different
- Special cases
  - If $\sigma_{12} = 0$, sample selection is exogenous
  - If $x_{1i}'\beta_1 = x_{2i}'\beta_2$ and $\varepsilon_{1i} = \varepsilon_{2i}$, the Tobit II model coincides with the Tobit I model

Tobit II Model for Wages: Wage Equation
- Expected value of $w_i$, given sample selection:
$$E\{w_i \mid h_i = 1\} = x_{1i}'\beta_1 + \sigma_{12}\,\lambda(x_{2i}'\beta_2)$$
  with the inverse Mills ratio or Heckman's lambda
$$\lambda(x_{2i}'\beta_2) = \phi(x_{2i}'\beta_2)/\Phi(x_{2i}'\beta_2)$$
- Heckman's lambda
  - is positive and decreasing in its argument
  - the smaller the probability that a person is working, the larger the value of the correction term $\lambda$
- The expected value of $w_i$ equals $x_{1i}'\beta_1$ only if $\sigma_{12} = 0$ ("no sample selection bias")

Tobit II Model: Log-likelihood Function
- Log-likelihood, with $I_0$ and $I_1$ the sets of persons with $h_i = 0$ and $h_i = 1$:
$$\ell_3(\beta_1, \beta_2, \sigma_1^2, \sigma_{12}) = \sum_{i \in I_0} \log P\{h_i = 0\} + \sum_{i \in I_1} \left[\log f(w_i \mid h_i = 1) + \log P\{h_i = 1\}\right]$$
$$= \sum_{i \in I_0} \log P\{h_i = 0\} + \sum_{i \in I_1} \left[\log f(w_i) + \log P\{h_i = 1 \mid w_i\}\right]$$
  with
$$P\{h_i = 0\} = 1 - \Phi(x_{2i}'\beta_2), \qquad f(w_i) = \frac{1}{\sigma_1}\phi\left(\frac{w_i - x_{1i}'\beta_1}{\sigma_1}\right),$$
$$P\{h_i = 1 \mid w_i\} = \Phi\left(\frac{x_{2i}'\beta_2 + (\sigma_{12}/\sigma_1^2)(w_i - x_{1i}'\beta_1)}{\sqrt{1 - \sigma_{12}^2/\sigma_1^2}}\right)$$
  and using $f(w_i \mid h_i = 1)\,P\{h_i = 1\} = P\{h_i = 1 \mid w_i\}\,f(w_i)$

Tobit II Model: Estimation
- Maximum likelihood estimation, based on the log-likelihood $\ell_3(\beta_1, \beta_2, \sigma_1^2, \sigma_{12})$
- Two-step approach (Heckman, 1979)
  1. Estimate the coefficients $\beta_2$ of the selection equation by standard probit maximum likelihood, giving the estimate $b_2$
  2. Compute estimates of Heckman's lambdas: $\hat\lambda_i = \lambda(x_{2i}'b_2) = \phi(x_{2i}'b_2)/\Phi(x_{2i}'b_2)$ for i = 1, ..., N
  3. Estimate the coefficients $\beta_1$ and $\sigma_{12}$ by OLS, using the observations with $h_i = 1$:
     $w_i = x_{1i}'\beta_1 + \sigma_{12}\hat\lambda_i + \eta_i$
- GRETL: the procedure "Heckit" allows both ML and two-step estimation

Tobit II Model for Budget Share for Tobacco
- Heckit ML estimation, GRETL output:

Model 7: ML Heckit, using observations 1-2724
Dependent variable: SHARE1
Selection variable: D1

             coefficient    std. error     t-ratio    p-value
  -----------------------------------------------------------
  const       0,0444178     0,0492440       0,9020    0,3671
  AGE         0,00874370    0,0110272       0,7929    0,4278
  NADULTS    -0,0130898     0,0165677      -0,7901    0,4295
  NKIDS      -0,00221765    0,000585669    -3,787     0,0002    ***
  NKIDS2     -0,00260186    0,00228812     -1,137     0,2555
  LNX        -0,00174557    0,00357283     -0,4886    0,6251
  AGELNX     -0,000485866   0,000807854    -0,6014    0,5476
  NADLNX      0,000817826   0,00119574      0,6839    0,4940
  WALLOON     0,00260557    0,000958504     2,718     0,0066    ***
  lambda     -0,00013773    0,00291516     -0,04725   0,9623

  Mean dependent var   0,021507    S.D. dependent var   0,022062
  sigma                0,021451    rho                 -0,006431
  Log-likelihood       4316,615    Akaike criterion    -8613,231
  Schwarz criterion   -8556,008    Hannan-Quinn        -8592,349

Tobit II Model for Budget Share for Tobacco, cont'd
- Heckit ML estimation, GRETL output (continued), selection equation:

             coefficient    std. error     t-ratio    p-value
  -----------------------------------------------------------
  const     -16,2535        2,58561        -6,286     3,25e-010 ***
  AGE         0,753353      0,653820        1,152     0,2492
  NADULTS     2,13037       1,03368         2,061     0,0393    **
  NKIDS      -0,0936353     0,0376590      -2,486     0,0129    **
  NKIDS2     -0,188864      0,141231       -1,337     0,1811
  LNX         1,25834       0,192074        6,551     5,70e-011 ***
  AGELNX     -0,0510698     0,0486730      -1,049     0,2941
  NADLNX     -0,160399      0,0748929      -2,142     0,0322    **
  BLUECOL    -0,0352022     0,0983073      -0,3581    0,7203
  WHITECOL    0,0801599     0,0852980       0,9398    0,3473
  WALLOON     0,201073      0,0628750       3,198     0,0014    ***

Models for Budget Share for Tobacco
- Estimates and standard errors for some coefficients of the standard Tobit, the truncated regression and the Tobit II model

                                const.      NKIDS      LNX        WALLOON
  Tobit model           coeff.   -0,1704    -0,0030     0,0134    0,0042
                        s.e.      0,0441     0,0008     0,0033    0,0010
  Truncated regression  coeff.    0,0433    -0,0022    -0,0017    0,0026
                        s.e.      0,0458     0,0008     0,0034    0,0009
  Tobit II model        coeff.    0,0444    -0,0022    -0,0017    0,0026
                        s.e.      0,0492     0,0006     0,0036    0,0010
  Tobit II selection    coeff.  -16,2535    -0,0936     1,2583    0,2011
                        s.e.      2,5856     0,0377     0,1921    0,0629

Test for Sample Selection Bias
- If the error terms of the Tobit II model have $\sigma_{12} \ne 0$, standard errors and tests may lead to misleading inferences
- Test of $H_0: \sigma_{12} = 0$ in the second step of Heckit, i.e., when fitting the regression $w_i = x_{1i}'\beta_1 + \sigma_{12}\hat\lambda_i + \eta_i$
  - t-test on the coefficient of Heckman's lambda
- Test results are sensitive to exclusion restrictions on $x_{1i}$

Tobit Models in GRETL
- Model > Nonlinear Models > Tobit
  Estimates the Tobit model; censored dependent variable
- Model > Nonlinear Models > Heckit
  Estimates in addition the selection equation (Tobit II), optionally by ML or by two-step estimation

Your Homework
1. Verbeek's data set CREDIT contains credit ratings of 921 US firms, as well as characteristics of the firms; the variable rating has categories "1", …, "7" (highest). Generate the variable GF (good firm) with value 1 if rating > 4 and 0 otherwise, and the more detailed variable CR (credit rating) with CR = 1 if rating < 3, CR = 2 if rating = 3, CR = 3 if rating = 4, and CR = 4 otherwise.
   a. Estimate a binary logit model for the assignment of the GF ratings, and an ordered logit model for the assignment of CR.
   b. Compare the effects of the regressors in the models, based on coefficients and slopes.
   c. Compare the hit rates of the models based on GF and on CR.
2. People buy shares of an investment fund for an amount $y_i^*$, with $y_i^* = x_i'\beta + \varepsilon_i$ and $\varepsilon_i \sim N(0,1)$; $x_i$ consists of an intercept and the variables age and income. The dummy $d_i = 1$ if $y_i^* > 0$ and $d_i = 0$ otherwise.
   a. Derive the probability for $d_i = 1$ as a function of $x_i$.
   b. Derive the log-likelihood function of the probit model for $d_i$.
3. Verbeek's data set TOBACCO contains expenditures on alcohol in 2724 Belgian households, taken from the Belgian household budget survey of 1995/96, as well as other characteristics of the households; for the expenditures on alcohol, the dummy D1 = 1 if the budget share for alcohol SHARE1 differs from 0, and D1 = 0 otherwise.
   a. Model the budget share for alcohol, using (i) a Tobit model, (ii) a truncated regression, and (iii) a Tobit II model, with the household characteristics AGE, LNX, NKIDS, and the dummy FLANDERS.
   b. Compare the effects of the regressors in the models, based on coefficients and slopes.
   c. Compare the results for FLANDERS with those for WALLOON.