Binary dependent variables
LECTURE 8, 03.12.2021

2 Lecture Outline
•The linear probability model
•Nonlinear probability models: probit and logit
•Brief introduction to maximum likelihood estimation
•Interpretation of coefficients in logit and probit models

3 Introduction
•So far the dependent variable (Y) has been continuous:
 •average hourly earnings
 •birth weight of babies
•What if Y is binary?
 •Y = get into college, or not; X = parental income
 •Y = person smokes, or not; X = cigarette tax rate, income
 •Y = mortgage application is accepted, or not; X = race, income, house characteristics, marital status, ...

4 The linear probability model
•Multiple regression model with continuous dependent variable:
 Yi = β0 + β1X1i + · · · + βkXki + ui
•The coefficient βj can be interpreted as the change in Y associated with a unit change in Xj.
•We will now discuss the case of a binary dependent variable.
•We know that the expected value of a binary variable Y is
 E[Y] = 1 · Pr(Y = 1) + 0 · Pr(Y = 0) = Pr(Y = 1)
•In the multiple regression model with a binary dependent variable we therefore have
 E[Yi | X1i, · · · , Xki] = Pr(Yi = 1 | X1i, · · · , Xki)
•It is therefore called the linear probability model.

5 Mortgage applications
Example:
•Most individuals who want to buy a house apply for a mortgage at a bank.
•Not all mortgage applications are approved.
•What determines whether a mortgage application is approved or denied?
•During this lecture we use a subset of the Boston HMDA data (N = 2380),
 •a data set on mortgage applications collected by the Federal Reserve Bank of Boston.

Variable   Description                                           Mean    SD
deny       = 1 if mortgage application is denied                 0.120   0.325
pi_ratio   anticipated monthly loan payments / monthly income    0.331   0.107
black      = 1 if applicant is black, = 0 if applicant is white  0.142   0.350

6 Mortgage applications
•Does the payment-to-income ratio affect whether or not a mortgage application is denied?

. regress deny pi_ratio, robust

Linear regression                          Number of obs = 2380
                                           F(1, 2378)    = 37.56
                                           Prob > F      = 0.0000
                                           R-squared     = 0.0397
                                           Root MSE      = .31828

------------------------------------------------------------------------
         |             Robust
    deny |     Coef.  Std. Err.      t   P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  .6035349   .0984826   6.13   0.000    .4104144   .7966555
   _cons | -.0799096   .0319666  -2.50   0.012   -.1425949  -.0172243
------------------------------------------------------------------------

7 The linear probability model
•The conditional expectation equals the probability that Yi = 1 conditional on X1i, · · · , Xki:
 E[Yi | X1i, · · · , Xki] = Pr(Yi = 1 | X1i, · · · , Xki) = β0 + β1X1i + · · · + βkXki
•The population coefficient βj equals the change in the probability that Yi = 1 associated with a unit change in Xj:
 ∂Pr(Yi = 1 | X1i, · · · , Xki) / ∂Xj = βj

In the mortgage application example:
•A change in the payment-to-income ratio by 1 is estimated to increase the probability that the mortgage application is denied by 0.60.
•A change in the payment-to-income ratio by 0.10 is estimated to increase the probability of denial by 6 percentage points (0.10 · 0.60 · 100).

8 The linear probability model

9 The linear probability model: heteroskedasticity
Yi = β0 + β1X1i + · · · + βkXki + ui
•The variance of a Bernoulli random variable is
 Var(Y) = Pr(Y = 1) × (1 − Pr(Y = 1))
•We can use this to find the conditional variance of the error term:
 Var(ui | X1i, · · · , Xki) = pi × (1 − pi), where pi = β0 + β1X1i + · · · + βkXki
•The error variance depends on the regressors, so the error term is heteroskedastic.
•Solution: always use heteroskedasticity-robust standard errors when estimating a linear probability model!

10 The linear probability model: shortcomings
•In the linear probability model the predicted probability can be below 0 or above 1!
Example: linear probability model, HMDA data
[Figure: mortgage denial v. ratio of debt payments to income (P/I ratio) in a subset of the HMDA data set (n = 127).]
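The linear probability model above can be reproduced outside Stata. A minimal Python sketch using pandas and statsmodels, assuming the HMDA extract is available as a file hmda.csv with columns deny and pi_ratio (the file name and layout are assumptions, not part of the lecture materials); cov_type="HC1" requests the same heteroskedasticity-robust standard errors as Stata's `, robust` option:

```python
# Linear probability model: OLS on a 0/1 outcome with robust (HC1) std. errors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hmda.csv")  # hypothetical file with columns deny, pi_ratio

lpm = smf.ols("deny ~ pi_ratio", data=df).fit(cov_type="HC1")
print(lpm.summary())

# Fitted values are interpreted as probabilities, but nothing constrains
# them to [0, 1] -- the shortcoming noted on the slide above.
print(lpm.predict().min(), lpm.predict().max())
```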
11 Nonlinear probability models
•Probabilities cannot be less than 0 or greater than 1.
•To address this problem we will consider nonlinear probability models
 Pr(Yi = 1) = G(Z), with Z = β0 + β1X1i + · · · + βkXki and 0 ≤ G(Z) ≤ 1
•We will consider two nonlinear functions:
 1. Probit: G(Z) = Φ(Z)
 2. Logit: G(Z) = 1 / (1 + e^(−Z))

12 Probit
Probit regression models the probability that Y = 1
•using the cumulative standard normal distribution function Φ(Z),
•evaluated at Z = β0 + β1X1i + · · · + βkXki.
•Since Φ(z) = Pr(Z ≤ z), the predicted probabilities of the probit model are between 0 and 1.

Example
•Suppose we have only one regressor and Z = −2 + 3X1.
•We want to know the probability that Y = 1 when X1 = 0.4:
 z = −2 + 3 · 0.4 = −0.8
 Pr(Y = 1) = Pr(Z ≤ −0.8) = Φ(−0.8)

13 Probit
Pr(Y = 1) = Pr(Z ≤ −0.8) = Φ(−0.8) = 0.2119

14 Logit
Logit regression models the probability that Y = 1
•using the cumulative standard logistic distribution function
 F(Z) = 1 / (1 + e^(−Z)),
•evaluated at Z = β0 + β1X1i + · · · + βkXki.
•Since F(z) = Pr(Z ≤ z), the predicted probabilities of the logit model are between 0 and 1.

Example
•Suppose we have only one regressor and Z = −2 + 3X1.
•We want to know the probability that Y = 1 when X1 = 0.4:
 z = −2 + 3 · 0.4 = −0.8
 Pr(Y = 1) = Pr(Z ≤ −0.8) = F(−0.8)

15 Logit
[Figure: standard logistic density on [−5, 5]; the shaded area equals Pr(Z ≤ −0.8).]
•Pr(Y = 1) = Pr(Z ≤ −0.8) = 1 / (1 + e^(0.8)) = 0.31

16 Logit & probit
[Figure: standard logistic CDF and standard normal CDF plotted together on [−5, 5].]
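Both worked examples are easy to check numerically. A minimal Python sketch using scipy's normal and logistic distributions (the intercept −2 and slope 3 are the illustrative values from the example, not estimates):

```python
# Check the probit and logit examples: Pr(Y = 1) when Z = -2 + 3*X1, X1 = 0.4.
import numpy as np
from scipy.stats import norm, logistic

z = -2 + 3 * 0.4                 # = -0.8
print(norm.cdf(z))               # probit:  Phi(-0.8)        ~ 0.2119
print(logistic.cdf(z))           # logit:   1/(1 + e^{0.8})  ~ 0.3100
print(1 / (1 + np.exp(-z)))      # the logistic CDF written out explicitly
```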
17 How to estimate logit and probit models
•In previous lectures we discussed regression models that are nonlinear in the independent variables;
 •these models can be estimated by OLS.
•Logit and probit models are nonlinear in the coefficients β0, β1, · · · , βk;
 •these models cannot be estimated by OLS.
•The method used to estimate logit and probit models is maximum likelihood estimation (MLE).
•The MLE consists of the values of (β0, β1, · · · , βk) that best describe the full distribution of the data.

18 Maximum likelihood estimation
•The likelihood function is the joint probability distribution of the data, treated as a function of the unknown coefficients.
•The maximum likelihood estimator (MLE) is the set of coefficient values that maximizes the likelihood function.
•The MLE is the set of parameter values "most likely" to have produced the data.

Let's start with a special case: the MLE with no X.
•We have n i.i.d. observations Y1, . . . , Yn on a binary dependent variable.
•Y is a Bernoulli random variable.
•There is only one unknown parameter to estimate:
 •the probability p that Y = 1,
 •which is also the mean of Y.

19 Maximum likelihood estimation (Optional)
Step 1: write down the likelihood function, the joint probability distribution of the data.
•Yi is a Bernoulli random variable; we therefore have
 Pr(Yi = y) = Pr(Yi = 1)^y · (1 − Pr(Yi = 1))^(1−y) = p^y (1 − p)^(1−y)
 •Pr(Yi = 1) = p^1 (1 − p)^0 = p
 •Pr(Yi = 0) = p^0 (1 − p)^1 = 1 − p
•Y1, . . . , Yn are i.i.d.; the joint probability distribution is therefore the product of the individual distributions.

20 Maximum likelihood estimation (Optional)
We have the likelihood function
 f(p; Y1, . . . , Yn) = p^(ΣYi) · (1 − p)^(n − ΣYi)

Step 2: maximize the likelihood function w.r.t. p.
•It is easier to maximize the logarithm of the likelihood function:
 log f(p; Y1, . . . , Yn) = (ΣYi) log p + (n − ΣYi) log(1 − p)
•Since the logarithm is a strictly increasing function, maximizing the likelihood or the log likelihood gives the same estimator.

21 Maximum likelihood estimation (Optional)
•Taking the derivative w.r.t. p gives
 d log f / dp = (ΣYi)/p − (n − ΣYi)/(1 − p)
•Setting this to zero and rearranging gives
 (1 − p) ΣYi = p (n − ΣYi)
•Solving for p gives the MLE
 p̂ = (1/n) ΣYi = Ȳ,
 the sample fraction of ones.

22 MLE of the probit model (Optional)
Step 1: write down the likelihood function. Replacing the constant p by pi = Φ(β0 + β1X1i + · · · + βkXki), the same Bernoulli argument gives
 f(β0, . . . , βk; Y1, . . . , Yn) = p1^(Y1) (1 − p1)^(1−Y1) × · · · × pn^(Yn) (1 − pn)^(1−Yn)

23 MLE of the probit model (Optional)
Also when obtaining the MLE of the probit model it is easier to take the logarithm of the likelihood function:
 log f = Σ [ Yi log Φ(Zi) + (1 − Yi) log(1 − Φ(Zi)) ], with Zi = β0 + β1X1i + · · · + βkXki

Step 2: maximize the log likelihood function.
•There is no simple formula for the probit MLE; the maximization must be done using a numerical algorithm on a computer.

24 MLE of the logit model (Optional)
•The logit likelihood has the same form, with the logistic CDF F(Zi) = 1 / (1 + e^(−Zi)) in place of Φ(Zi).
•There is no simple formula for the logit MLE; the maximization must be done using a numerical algorithm on a computer.

25 Probit: mortgage applications

. probit deny pi_ratio

Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -832.02975
Iteration 2: log likelihood = -831.79239
Iteration 3: log likelihood = -831.79234

Probit regression                          Number of obs = 2380
                                           LR chi2(1)    = 80.59
                                           Prob > chi2   = 0.0000
Log likelihood = -831.79234                Pseudo R2     = 0.0462

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  2.967907   .3591054   8.26   0.000    2.264073    3.67174
   _cons | -2.194159     .12899 -17.01   0.000   -2.446974  -1.941343
------------------------------------------------------------------------

•The estimated MLE coefficient on the payment-to-income ratio equals 2.97.

26 Probit: mortgage applications
The estimate of β1 in the probit model CANNOT be interpreted as the change in the probability that Yi = 1 associated with a unit change in X1!
•In general the effect on Y of a change in X is the expected change in Y resulting from the change in X.
•Since Y is binary, the expected change in Y is the change in the probability that Y = 1.

In the probit model the predicted change in the probability that the mortgage application is denied when the payment-to-income ratio increases from 0.3 to 0.4 is
 Φ(−2.19 + 2.97 · 0.4) − Φ(−2.19 + 2.97 · 0.3) = Φ(−1.00) − Φ(−1.30) = 0.159 − 0.097 = 0.062

27 Probit: mortgage applications
Predicted values in the probit model:
 Pr(deny = 1 | pi_ratio) = Φ(−2.19 + 2.97 · pi_ratio)
•All predicted probabilities are between 0 and 1!

28 Logit: mortgage applications

. logit deny pi_ratio

Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -830.96071
Iteration 2: log likelihood = -830.09497
Iteration 3: log likelihood = -830.09403
Iteration 4: log likelihood = -830.09403

Logistic regression                        Number of obs = 2380
                                           LR chi2(1)    = 83.98
                                           Prob > chi2   = 0.0000
Log likelihood = -830.09403                Pseudo R2     = 0.0482

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  5.884498   .7336006   8.02   0.000    4.446667   7.322328
   _cons | -4.028432   .2685763 -15.00   0.000   -4.554832  -3.502032
------------------------------------------------------------------------

•The estimated MLE coefficient on the payment-to-income ratio equals 5.88.
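The two fits above can be mirrored in Python with statsmodels, which maximizes the log likelihood numerically exactly as described on the MLE slides. A minimal sketch, reusing the hypothetical hmda.csv from the earlier LPM example:

```python
# Probit and logit MLE, mirroring `probit deny pi_ratio` and `logit deny pi_ratio`.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hmda.csv")  # hypothetical file with columns deny, pi_ratio

probit = smf.probit("deny ~ pi_ratio", data=df).fit()  # prints the iteration log
logit = smf.logit("deny ~ pi_ratio", data=df).fit()
print(probit.params)   # coefficient on pi_ratio ~ 2.97
print(logit.params)    # coefficient on pi_ratio ~ 5.88

# Unlike the LPM, predicted probabilities lie in (0, 1) by construction.
print(probit.predict().min(), probit.predict().max())
```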
29 Logit: mortgage applications
Also in the logit model: the estimate of β1 CANNOT be interpreted as the change in the probability that Yi = 1 associated with a unit change in X1!

In the logit model the predicted change in the probability that the mortgage application is denied when the payment-to-income ratio increases from 0.3 to 0.4 is
 F(−4.03 + 5.88 · 0.4) − F(−4.03 + 5.88 · 0.3) = 0.157 − 0.094 = 0.063

30 Logit: mortgage applications
The predicted probabilities from the probit and logit models are very close in these HMDA regressions:
[Figure: predicted probit and logit denial probabilities plotted against the P/I ratio.]

31 Probit & logit with multiple regressors
•We can easily extend the logit and probit regression models by including additional regressors.
•Suppose we want to know whether white and black applicants are treated differently.
•Is there a significant difference in the probability of denial between black and white applicants, conditional on the payment-to-income ratio?
•To answer this question we need to include two regressors:
 •P/I ratio
 •black

32 Probit with multiple regressors

Probit regression                          Number of obs = 2380
                                           LR chi2(2)    = 149.90
                                           Prob > chi2   = 0.0000
Log likelihood = -797.13604                Pseudo R2     = 0.0859

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
   black |  .7081579   .0834327   8.49   0.000    .5446328   .8716831
pi_ratio |  2.741637   .3595888   7.62   0.000    2.036856   3.446418
   _cons | -2.258738    .129882 -17.39   0.000   -2.513302  -2.004174
------------------------------------------------------------------------

•To say something about the size of the impact of race we need to specify a value for the payment-to-income ratio.
•Predicted denial probability for a white applicant with a P/I ratio of 0.3:
 Φ(−2.26 + 0.71 · 0 + 2.74 · 0.3) = 0.0749
•Predicted denial probability for a black applicant with a P/I ratio of 0.3:
 Φ(−2.26 + 0.71 · 1 + 2.74 · 0.3) = 0.2327
•The difference is 15.8 percentage points.

33 Logit with multiple regressors

Logistic regression                        Number of obs = 2380
                                           LR chi2(2)    = 152.78
                                           Prob > chi2   = 0.0000
Log likelihood = -795.69521                Pseudo R2     = 0.0876

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
   black |  1.272782   .1461983   8.71   0.000    .9862385   1.559325
pi_ratio |  5.370362   .7283192   7.37   0.000    3.942883   6.797841
   _cons | -4.125558   .2684161 -15.37   0.000   -4.651644  -3.599472
------------------------------------------------------------------------

•To say something about the size of the impact of race we need to specify a value for the payment-to-income ratio.
•Predicted denial probability for a white applicant with a P/I ratio of 0.3:
 1 / (1 + e^(−(−4.13 + 5.37 · 0.30))) = 0.075
•Predicted denial probability for a black applicant with a P/I ratio of 0.3:
 1 / (1 + e^(−(−4.13 + 1.27 + 5.37 · 0.30))) = 0.224
•The difference is 14.8 percentage points (these computations are verified in the sketch below).
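These predicted probabilities follow directly from the rounded coefficients. A minimal Python sketch; small discrepancies with the slides come only from coefficient rounding:

```python
# Race gap in predicted denial probabilities at a P/I ratio of 0.3,
# using the rounded probit and logit estimates from the slides above.
import numpy as np
from scipy.stats import norm

pi = 0.3

# Probit: Pr(deny) = Phi(-2.26 + 0.71*black + 2.74*pi_ratio)
white_probit = norm.cdf(-2.26 + 0.71 * 0 + 2.74 * pi)   # ~ 0.075
black_probit = norm.cdf(-2.26 + 0.71 * 1 + 2.74 * pi)   # ~ 0.233
print(black_probit - white_probit)                      # ~ 0.158

# Logit: Pr(deny) = 1 / (1 + exp(-(-4.13 + 1.27*black + 5.37*pi_ratio)))
def F(z):
    return 1 / (1 + np.exp(-z))

white_logit = F(-4.13 + 1.27 * 0 + 5.37 * pi)           # ~ 0.075
black_logit = F(-4.13 + 1.27 * 1 + 5.37 * pi)           # ~ 0.223
print(black_logit - white_logit)                        # ~ 0.148
```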
34 LPM, probit & logit

Table 1: Mortgage denial regressions using the Boston HMDA data
Dependent variable: deny = 1 if mortgage application is denied, = 0 if accepted

regression model      LPM          Probit       Logit
---------------------------------------------------------
black                 0.177***     0.71***      1.27***
                      (0.025)      (0.083)      (0.15)
P/I ratio             0.559***     2.74***      5.37***
                      (0.089)      (0.44)       (0.96)
constant             -0.091***    -2.26***     -4.13***
                      (0.029)      (0.16)       (0.35)
---------------------------------------------------------
difference in Pr(deny = 1) between black and white
applicant when P/I ratio = 0.3:
                      17.7%        15.8%        14.8%

35 Threats to internal and external validity
For the linear probability model as well as for the probit and logit models we have to consider threats to:
1. Internal validity
 •Is there omitted variable bias?
 •Is the functional form correct?
  •Probit model: is the assumption of a normal distribution correct?
  •Logit model: is the assumption of a logistic distribution correct?
 •Is there measurement error?
 •Is there sample selection bias?
 •Is there a problem of simultaneous causality?
2. External validity
 •These data are from Boston in 1990-91.
 •Do you think the results also apply today, where you live?

36 Distance to college & probability of obtaining a college degree

Linear regression                          Number of obs = 3796
                                           F(1, 3794)    = 15.77
                                           Prob > F      = 0.0001
                                           R-squared     = 0.0036
                                           Root MSE      = .44302

------------------------------------------------------------------------
         |             Robust
 college |     Coef.  Std. Err.      t   P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.012471    .0031403  -3.97   0.000   -.0186278  -.0063142
   _cons |  .2910057   .0093045  31.28   0.000    .2727633   .3092481
------------------------------------------------------------------------

Probit regression                          Number of obs = 3796
                                           LR chi2(1)    = 14.48
                                           Prob > chi2   = 0.0001
Log likelihood = -2204.8977                Pseudo R2     = 0.0033

------------------------------------------------------------------------
 college |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.0407873   .0109263  -3.73   0.000   -.0622025  -.0193721
   _cons | -.5464198    .028192 -19.38   0.000   -.6016752  -.4911645
------------------------------------------------------------------------

Logistic regression                        Number of obs = 3796
                                           LR chi2(1)    = 14.68
                                           Prob > chi2   = 0.0001
Log likelihood = -2204.8006                Pseudo R2     = 0.0033

------------------------------------------------------------------------
 college |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.0709896   .0193593  -3.67   0.000   -.1089332   -.033046
   _cons | -.8801555   .0476434 -18.47   0.000   -.9735349   -.786776
------------------------------------------------------------------------

37 Distance to college & probability of obtaining a college degree
[Figure: predicted probability of obtaining a college degree against distance to college (× 10 miles), from 0 to 15, for the linear probability, probit, and logit models.]
•The three different models produce very similar results (see the sketch after the summary).

38 Summary
•If Yi is binary, then E[Yi | Xi] = Pr(Yi = 1 | Xi).
•Three models:
 1. linear probability model (linear multiple regression)
 2. probit (cumulative standard normal distribution)
 3. logit (cumulative standard logistic distribution)
•LPM, probit, and logit all produce predicted probabilities.
•The effect of ∆X is a change in the conditional probability that Y = 1.
 •For logit and probit, this effect depends on the initial value of X.
•Probit and logit are estimated via maximum likelihood.
 •The coefficients are normally distributed for large n.
 •Large-n hypothesis testing and confidence intervals are as usual.
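As a closing illustration, the curves in the distance-to-college figure can be recomputed directly from the three sets of coefficients reported above. A minimal Python sketch (the grid of distance values is an assumption for plotting):

```python
# Predicted probability of obtaining a college degree as a function of
# distance to college (in tens of miles), from the reported estimates.
import numpy as np
from scipy.stats import norm

dist = np.linspace(0, 15, 100)                      # hypothetical grid

p_lpm = 0.2910 - 0.0125 * dist                      # linear probability model
p_probit = norm.cdf(-0.5464 - 0.0408 * dist)        # probit
p_logit = 1 / (1 + np.exp(0.8802 + 0.0710 * dist))  # logit: F(-0.8802 - 0.0710*dist)

# At dist = 0 all three give ~0.29, and the curves stay within roughly
# two percentage points of each other over the whole range.
print(p_lpm[0], p_probit[0], p_logit[0])
```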