Binary dependent variables
LECTURE 8, 03.12.2021

2 Lecture Outline
•The linear probability model
•Nonlinear probability models: probit and logit
•Brief introduction to maximum likelihood estimation
•Interpretation of coefficients in logit and probit models

3 Introduction
•So far the dependent variable (Y) has been continuous:
 •average hourly earnings
 •birth weight of babies
•What if Y is binary?
 •Y = get into college, or not; X = parental income
 •Y = person smokes, or not; X = cigarette tax rate, income
 •Y = mortgage application is accepted, or not; X = race, income, house characteristics, marital status, ...

4 The linear probability model
•Multiple regression model with continuous dependent variable:
 Yi = β0 + β1X1i + · · · + βkXki + ui
•The coefficient βj can be interpreted as the change in Y associated with a unit change in Xj.
•We will now discuss the case of a binary dependent variable.
•We know that the expected value of a binary variable Y is
 E[Y] = 1 · Pr(Y = 1) + 0 · Pr(Y = 0) = Pr(Y = 1)
•In the multiple regression model with a binary dependent variable we therefore have
 E[Yi | X1i, · · · , Xki] = Pr(Yi = 1 | X1i, · · · , Xki)
•It is therefore called the linear probability model.

5 Mortgage applications
Example:
•Most individuals who want to buy a house apply for a mortgage at a bank.
•Not all mortgage applications are approved.
•What determines whether a mortgage application is approved or denied?
•During this lecture we use a subset of the Boston HMDA data (N = 2380),
 •a data set on mortgage applications collected by the Federal Reserve Bank of Boston.

Variable   Description                                           Mean    SD
deny       = 1 if mortgage application is denied                 0.120   0.325
pi_ratio   anticipated monthly loan payments / monthly income    0.331   0.107
black      = 1 if applicant is black, = 0 if applicant is white  0.142   0.350

6 Mortgage applications
•Does the payment-to-income ratio affect whether or not a mortgage application is denied?

. regress deny pi_ratio, robust

Linear regression                          Number of obs = 2380
                                           F(1, 2378)    = 37.56
                                           Prob > F      = 0.0000
                                           R-squared     = 0.0397
                                           Root MSE      = .31828

------------------------------------------------------------------------
         |             Robust
    deny |     Coef.  Std. Err.      t   P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  .6035349   .0984826   6.13   0.000    .4104144   .7966555
   _cons | -.0799096   .0319666  -2.50   0.012   -.1425949  -.0172243
------------------------------------------------------------------------

7 The linear probability model
•The conditional expectation equals the probability that Yi = 1 conditional on X1i, · · · , Xki:
 E[Yi | X1i, · · · , Xki] = Pr(Yi = 1 | X1i, · · · , Xki) = β0 + β1X1i + · · · + βkXki
•The population coefficient βj equals the change in the probability that Yi = 1 associated with a unit change in Xj:
 ∂Pr(Yi = 1 | X1i, · · · , Xki) / ∂Xj = βj

In the mortgage application example:
•A change in the payment-to-income ratio by 1 is estimated to increase the probability that the mortgage application is denied by 0.60.
•A change in the payment-to-income ratio by 0.10 is estimated to increase the probability of denial by 6 percentage points (0.10 · 0.60 · 100).

8 The linear probability model

9 The linear probability model: heteroskedasticity
Yi = β0 + β1X1i + · · · + βkXki + ui
•The variance of a Bernoulli random variable is
 Var(Y) = Pr(Y = 1) × (1 − Pr(Y = 1))
•We can use this to find the conditional variance of the error term:
 Var(ui | X1i, · · · , Xki) = pi × (1 − pi), where pi = β0 + β1X1i + · · · + βkXki
•The error variance depends on the regressors, so the error term is heteroskedastic.
•Solution: always use heteroskedasticity-robust standard errors when estimating a linear probability model!

10 The linear probability model: shortcomings
•In the linear probability model the predicted probability can be below 0 or above 1!
Example: linear probability model, HMDA data
[Figure: mortgage denial v. ratio of debt payments to income (P/I ratio) in a subset of the HMDA data set (n = 127).]
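The linear probability model above can be reproduced outside Stata. A minimal Python sketch using pandas and statsmodels, assuming the HMDA extract is available as a file hmda.csv with columns deny and pi_ratio (the file name and layout are assumptions, not part of the lecture materials); cov_type="HC1" requests the same heteroskedasticity-robust standard errors as Stata's `, robust` option:

```python
# Linear probability model: OLS on a 0/1 outcome with robust (HC1) std. errors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hmda.csv")  # hypothetical file with columns deny, pi_ratio

lpm = smf.ols("deny ~ pi_ratio", data=df).fit(cov_type="HC1")
print(lpm.summary())

# Fitted values are interpreted as probabilities, but nothing constrains
# them to [0, 1] -- the shortcoming noted on the slide above.
print(lpm.predict().min(), lpm.predict().max())
```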
11 Nonlinear probability models
•Probabilities cannot be less than 0 or greater than 1.
•To address this problem we will consider nonlinear probability models
 Pr(Yi = 1) = G(Z), with Z = β0 + β1X1i + · · · + βkXki and 0 ≤ G(Z) ≤ 1
•We will consider two nonlinear functions:
 1. Probit: G(Z) = Φ(Z)
 2. Logit: G(Z) = 1 / (1 + e^(−Z))

12 Probit
Probit regression models the probability that Y = 1
•using the cumulative standard normal distribution function Φ(Z),
•evaluated at Z = β0 + β1X1i + · · · + βkXki.
•Since Φ(z) = Pr(Z ≤ z), the predicted probabilities of the probit model are between 0 and 1.

Example
•Suppose we have only one regressor and Z = −2 + 3X1.
•We want to know the probability that Y = 1 when X1 = 0.4:
 z = −2 + 3 · 0.4 = −0.8
 Pr(Y = 1) = Pr(Z ≤ −0.8) = Φ(−0.8)

13 Probit
Pr(Y = 1) = Pr(Z ≤ −0.8) = Φ(−0.8) = 0.2119

14 Logit
Logit regression models the probability that Y = 1
•using the cumulative standard logistic distribution function
 F(Z) = 1 / (1 + e^(−Z)),
•evaluated at Z = β0 + β1X1i + · · · + βkXki.
•Since F(z) = Pr(Z ≤ z), the predicted probabilities of the logit model are between 0 and 1.

Example
•Suppose we have only one regressor and Z = −2 + 3X1.
•We want to know the probability that Y = 1 when X1 = 0.4:
 z = −2 + 3 · 0.4 = −0.8
 Pr(Y = 1) = Pr(Z ≤ −0.8) = F(−0.8)

15 Logit
[Figure: standard logistic density on [−5, 5]; the shaded area equals Pr(Z ≤ −0.8).]
•Pr(Y = 1) = Pr(Z ≤ −0.8) = 1 / (1 + e^(0.8)) = 0.31

16 Logit & probit
[Figure: standard logistic CDF and standard normal CDF plotted together on [−5, 5].]
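Both worked examples are easy to check numerically. A minimal Python sketch using scipy's normal and logistic distributions (the intercept −2 and slope 3 are the illustrative values from the example, not estimates):

```python
# Check the probit and logit examples: Pr(Y = 1) when Z = -2 + 3*X1, X1 = 0.4.
import numpy as np
from scipy.stats import norm, logistic

z = -2 + 3 * 0.4                 # = -0.8
print(norm.cdf(z))               # probit:  Phi(-0.8)        ~ 0.2119
print(logistic.cdf(z))           # logit:   1/(1 + e^{0.8})  ~ 0.3100
print(1 / (1 + np.exp(-z)))      # the logistic CDF written out explicitly
```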
17 How to estimate logit and probit models
•In previous lectures we discussed regression models that are nonlinear in the independent variables;
 •these models can be estimated by OLS.
•Logit and probit models are nonlinear in the coefficients β0, β1, · · · , βk;
 •these models cannot be estimated by OLS.
•The method used to estimate logit and probit models is maximum likelihood estimation (MLE).
•The MLE consists of the values of (β0, β1, · · · , βk) that best describe the full distribution of the data.

18 Maximum likelihood estimation
•The likelihood function is the joint probability distribution of the data, treated as a function of the unknown coefficients.
•The maximum likelihood estimator (MLE) is the set of coefficient values that maximizes the likelihood function.
•The MLE is the set of parameter values "most likely" to have produced the data.

Let's start with a special case: the MLE with no X.
•We have n i.i.d. observations Y1, . . . , Yn on a binary dependent variable.
•Y is a Bernoulli random variable.
•There is only one unknown parameter to estimate:
 •the probability p that Y = 1,
 •which is also the mean of Y.

19 Maximum likelihood estimation (Optional)
Step 1: write down the likelihood function, the joint probability distribution of the data.
•Yi is a Bernoulli random variable; we therefore have
 Pr(Yi = y) = Pr(Yi = 1)^y · (1 − Pr(Yi = 1))^(1−y) = p^y (1 − p)^(1−y)
 •Pr(Yi = 1) = p^1 (1 − p)^0 = p
 •Pr(Yi = 0) = p^0 (1 − p)^1 = 1 − p
•Y1, . . . , Yn are i.i.d.; the joint probability distribution is therefore the product of the individual distributions.

20 Maximum likelihood estimation (Optional)
We have the likelihood function
 f(p; Y1, . . . , Yn) = p^(ΣYi) · (1 − p)^(n − ΣYi)

Step 2: maximize the likelihood function w.r.t. p.
•It is easier to maximize the logarithm of the likelihood function:
 log f(p; Y1, . . . , Yn) = (ΣYi) log p + (n − ΣYi) log(1 − p)
•Since the logarithm is a strictly increasing function, maximizing the likelihood or the log likelihood gives the same estimator.

21 Maximum likelihood estimation (Optional)
•Taking the derivative w.r.t. p gives
 d log f / dp = (ΣYi)/p − (n − ΣYi)/(1 − p)
•Setting this to zero and rearranging gives
 (1 − p) ΣYi = p (n − ΣYi)
•Solving for p gives the MLE
 p̂ = (1/n) ΣYi = Ȳ,
 the sample fraction of ones.

22 MLE of the probit model (Optional)
Step 1: write down the likelihood function. Replacing the constant p by pi = Φ(β0 + β1X1i + · · · + βkXki), the same Bernoulli argument gives
 f(β0, . . . , βk; Y1, . . . , Yn) = p1^(Y1) (1 − p1)^(1−Y1) × · · · × pn^(Yn) (1 − pn)^(1−Yn)

23 MLE of the probit model (Optional)
Also when obtaining the MLE of the probit model it is easier to take the logarithm of the likelihood function:
 log f = Σ [ Yi log Φ(Zi) + (1 − Yi) log(1 − Φ(Zi)) ], with Zi = β0 + β1X1i + · · · + βkXki

Step 2: maximize the log likelihood function.
•There is no simple formula for the probit MLE; the maximization must be done using a numerical algorithm on a computer.

24 MLE of the logit model (Optional)
•The logit likelihood has the same form, with the logistic CDF F(Zi) = 1 / (1 + e^(−Zi)) in place of Φ(Zi).
•There is no simple formula for the logit MLE; the maximization must be done using a numerical algorithm on a computer.

25 Probit: mortgage applications

. probit deny pi_ratio

Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -832.02975
Iteration 2: log likelihood = -831.79239
Iteration 3: log likelihood = -831.79234

Probit regression                          Number of obs = 2380
                                           LR chi2(1)    = 80.59
                                           Prob > chi2   = 0.0000
Log likelihood = -831.79234                Pseudo R2     = 0.0462

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  2.967907   .3591054   8.26   0.000    2.264073    3.67174
   _cons | -2.194159     .12899 -17.01   0.000   -2.446974  -1.941343
------------------------------------------------------------------------

•The estimated MLE coefficient on the payment-to-income ratio equals 2.97.

26 Probit: mortgage applications
The estimate of β1 in the probit model CANNOT be interpreted as the change in the probability that Yi = 1 associated with a unit change in X1!
•In general the effect on Y of a change in X is the expected change in Y resulting from the change in X.
•Since Y is binary, the expected change in Y is the change in the probability that Y = 1.

In the probit model the predicted change in the probability that the mortgage application is denied when the payment-to-income ratio increases from 0.3 to 0.4 is
 Φ(−2.19 + 2.97 · 0.4) − Φ(−2.19 + 2.97 · 0.3) = Φ(−1.00) − Φ(−1.30) = 0.159 − 0.097 = 0.062

27 Probit: mortgage applications
Predicted values in the probit model:
 Pr(deny = 1 | pi_ratio) = Φ(−2.19 + 2.97 · pi_ratio)
•All predicted probabilities are between 0 and 1!

28 Logit: mortgage applications

. logit deny pi_ratio

Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -830.96071
Iteration 2: log likelihood = -830.09497
Iteration 3: log likelihood = -830.09403
Iteration 4: log likelihood = -830.09403

Logistic regression                        Number of obs = 2380
                                           LR chi2(1)    = 83.98
                                           Prob > chi2   = 0.0000
Log likelihood = -830.09403                Pseudo R2     = 0.0482

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
pi_ratio |  5.884498   .7336006   8.02   0.000    4.446667   7.322328
   _cons | -4.028432   .2685763 -15.00   0.000   -4.554832  -3.502032
------------------------------------------------------------------------

•The estimated MLE coefficient on the payment-to-income ratio equals 5.88.
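The two fits above can be mirrored in Python with statsmodels, which maximizes the log likelihood numerically exactly as described on the MLE slides. A minimal sketch, reusing the hypothetical hmda.csv from the earlier LPM example:

```python
# Probit and logit MLE, mirroring `probit deny pi_ratio` and `logit deny pi_ratio`.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hmda.csv")  # hypothetical file with columns deny, pi_ratio

probit = smf.probit("deny ~ pi_ratio", data=df).fit()  # prints the iteration log
logit = smf.logit("deny ~ pi_ratio", data=df).fit()
print(probit.params)   # coefficient on pi_ratio ~ 2.97
print(logit.params)    # coefficient on pi_ratio ~ 5.88

# Unlike the LPM, predicted probabilities lie in (0, 1) by construction.
print(probit.predict().min(), probit.predict().max())
```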
29 Logit: mortgage applications
Also in the logit model: the estimate of β1 CANNOT be interpreted as the change in the probability that Yi = 1 associated with a unit change in X1!

In the logit model the predicted change in the probability that the mortgage application is denied when the payment-to-income ratio increases from 0.3 to 0.4 is
 F(−4.03 + 5.88 · 0.4) − F(−4.03 + 5.88 · 0.3) = 0.157 − 0.094 = 0.063

30 Logit: mortgage applications
The predicted probabilities from the probit and logit models are very close in these HMDA regressions:
[Figure: predicted probit and logit denial probabilities plotted against the P/I ratio.]

31 Probit & logit with multiple regressors
•We can easily extend the logit and probit regression models by including additional regressors.
•Suppose we want to know whether white and black applicants are treated differently.
•Is there a significant difference in the probability of denial between black and white applicants, conditional on the payment-to-income ratio?
•To answer this question we need to include two regressors:
 •P/I ratio
 •black

32 Probit with multiple regressors

Probit regression                          Number of obs = 2380
                                           LR chi2(2)    = 149.90
                                           Prob > chi2   = 0.0000
Log likelihood = -797.13604                Pseudo R2     = 0.0859

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
   black |  .7081579   .0834327   8.49   0.000    .5446328   .8716831
pi_ratio |  2.741637   .3595888   7.62   0.000    2.036856   3.446418
   _cons | -2.258738    .129882 -17.39   0.000   -2.513302  -2.004174
------------------------------------------------------------------------

•To say something about the size of the impact of race we need to specify a value for the payment-to-income ratio.
•Predicted denial probability for a white applicant with a P/I ratio of 0.3:
 Φ(−2.26 + 0.71 · 0 + 2.74 · 0.3) = 0.0749
•Predicted denial probability for a black applicant with a P/I ratio of 0.3:
 Φ(−2.26 + 0.71 · 1 + 2.74 · 0.3) = 0.2327
•The difference is 15.8 percentage points.

33 Logit with multiple regressors

Logistic regression                        Number of obs = 2380
                                           LR chi2(2)    = 152.78
                                           Prob > chi2   = 0.0000
Log likelihood = -795.69521                Pseudo R2     = 0.0876

------------------------------------------------------------------------
    deny |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
   black |  1.272782   .1461983   8.71   0.000    .9862385   1.559325
pi_ratio |  5.370362   .7283192   7.37   0.000    3.942883   6.797841
   _cons | -4.125558   .2684161 -15.37   0.000   -4.651644  -3.599472
------------------------------------------------------------------------

•To say something about the size of the impact of race we need to specify a value for the payment-to-income ratio.
•Predicted denial probability for a white applicant with a P/I ratio of 0.3:
 1 / (1 + e^(−(−4.13 + 5.37 · 0.30))) = 0.075
•Predicted denial probability for a black applicant with a P/I ratio of 0.3:
 1 / (1 + e^(−(−4.13 + 1.27 + 5.37 · 0.30))) = 0.224
•The difference is 14.8 percentage points (these computations are verified in the sketch below).
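These predicted probabilities follow directly from the rounded coefficients. A minimal Python sketch; small discrepancies with the slides come only from coefficient rounding:

```python
# Race gap in predicted denial probabilities at a P/I ratio of 0.3,
# using the rounded probit and logit estimates from the slides above.
import numpy as np
from scipy.stats import norm

pi = 0.3

# Probit: Pr(deny) = Phi(-2.26 + 0.71*black + 2.74*pi_ratio)
white_probit = norm.cdf(-2.26 + 0.71 * 0 + 2.74 * pi)   # ~ 0.075
black_probit = norm.cdf(-2.26 + 0.71 * 1 + 2.74 * pi)   # ~ 0.233
print(black_probit - white_probit)                      # ~ 0.158

# Logit: Pr(deny) = 1 / (1 + exp(-(-4.13 + 1.27*black + 5.37*pi_ratio)))
def F(z):
    return 1 / (1 + np.exp(-z))

white_logit = F(-4.13 + 1.27 * 0 + 5.37 * pi)           # ~ 0.075
black_logit = F(-4.13 + 1.27 * 1 + 5.37 * pi)           # ~ 0.223
print(black_logit - white_logit)                        # ~ 0.148
```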
34 LPM, probit & logit

Table 1: Mortgage denial regressions using the Boston HMDA data
Dependent variable: deny = 1 if mortgage application is denied, = 0 if accepted

regression model      LPM          Probit       Logit
---------------------------------------------------------
black                 0.177***     0.71***      1.27***
                      (0.025)      (0.083)      (0.15)
P/I ratio             0.559***     2.74***      5.37***
                      (0.089)      (0.44)       (0.96)
constant             -0.091***    -2.26***     -4.13***
                      (0.029)      (0.16)       (0.35)
---------------------------------------------------------
difference in Pr(deny = 1) between black and white
applicant when P/I ratio = 0.3:
                      17.7%        15.8%        14.8%

35 Threats to internal and external validity
For the linear probability model as well as for the probit and logit models we have to consider threats to:
1. Internal validity
 •Is there omitted variable bias?
 •Is the functional form correct?
  •Probit model: is the assumption of a normal distribution correct?
  •Logit model: is the assumption of a logistic distribution correct?
 •Is there measurement error?
 •Is there sample selection bias?
 •Is there a problem of simultaneous causality?
2. External validity
 •These data are from Boston in 1990-91.
 •Do you think the results also apply today, where you live?

36 Distance to college & probability of obtaining a college degree

Linear regression                          Number of obs = 3796
                                           F(1, 3794)    = 15.77
                                           Prob > F      = 0.0001
                                           R-squared     = 0.0036
                                           Root MSE      = .44302

------------------------------------------------------------------------
         |             Robust
 college |     Coef.  Std. Err.      t   P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.012471    .0031403  -3.97   0.000   -.0186278  -.0063142
   _cons |  .2910057   .0093045  31.28   0.000    .2727633   .3092481
------------------------------------------------------------------------

Probit regression                          Number of obs = 3796
                                           LR chi2(1)    = 14.48
                                           Prob > chi2   = 0.0001
Log likelihood = -2204.8977                Pseudo R2     = 0.0033

------------------------------------------------------------------------
 college |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.0407873   .0109263  -3.73   0.000   -.0622025  -.0193721
   _cons | -.5464198    .028192 -19.38   0.000   -.6016752  -.4911645
------------------------------------------------------------------------

Logistic regression                        Number of obs = 3796
                                           LR chi2(1)    = 14.68
                                           Prob > chi2   = 0.0001
Log likelihood = -2204.8006                Pseudo R2     = 0.0033

------------------------------------------------------------------------
 college |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
    dist | -.0709896   .0193593  -3.67   0.000   -.1089332   -.033046
   _cons | -.8801555   .0476434 -18.47   0.000   -.9735349   -.786776
------------------------------------------------------------------------

37 Distance to college & probability of obtaining a college degree
[Figure: predicted probability of obtaining a college degree against distance to college (× 10 miles), from 0 to 15, for the linear probability, probit, and logit models.]
•The three different models produce very similar results (see the sketch after the summary).

38 Summary
•If Yi is binary, then E[Yi | Xi] = Pr(Yi = 1 | Xi).
•Three models:
 1. linear probability model (linear multiple regression)
 2. probit (cumulative standard normal distribution)
 3. logit (cumulative standard logistic distribution)
•LPM, probit, and logit all produce predicted probabilities.
•The effect of ∆X is a change in the conditional probability that Y = 1.
 •For logit and probit, this effect depends on the initial value of X.
•Probit and logit are estimated via maximum likelihood.
 •The coefficients are normally distributed for large n.
 •Large-n hypothesis testing and confidence intervals are as usual.
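As a closing illustration, the curves in the distance-to-college figure can be recomputed directly from the three sets of coefficients reported above. A minimal Python sketch (the grid of distance values is an assumption for plotting):

```python
# Predicted probability of obtaining a college degree as a function of
# distance to college (in tens of miles), from the reported estimates.
import numpy as np
from scipy.stats import norm

dist = np.linspace(0, 15, 100)                      # hypothetical grid

p_lpm = 0.2910 - 0.0125 * dist                      # linear probability model
p_probit = norm.cdf(-0.5464 - 0.0408 * dist)        # probit
p_logit = 1 / (1 + np.exp(0.8802 + 0.0710 * dist))  # logit: F(-0.8802 - 0.0710*dist)

# At dist = 0 all three give ~0.29, and the curves stay within roughly
# two percentage points of each other over the whole range.
print(p_lpm[0], p_probit[0], p_logit[0])
```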