Causal Models
Lukáš Lafférs, Matej Bel University, Dept. of Mathematics
MUNI Brno, 11.11.2021

Causality
Is it possible to recover a causal relationship from an observational dataset?

Graphical models
Judea Pearl (UCLA) and his book Causality.

Graphical models
A unified setup for thinking about causality. Every problem is visualized in terms of a causal graph. It is easier to think about a problem once you have a graph that visualizes the relationships. It provides a set of rules that show when and how it is possible to identify causal effects. This set of rules may be automated. It makes thinking about identification easier.

[Graph: X, D, Y]
The relationship of D and Y is of interest. D and Y are associated directly, and also indirectly via X.

[Graph: X, D, Y]
The relationship of D and Y is of interest. D and Y are associated directly, and also indirectly via X. X is a confounder.

Notation
[Graph: X, D, Y]
Node, Edge, Path, Directed path, Parent/Child, Ancestor/Descendant, Acyclic graph

Directed Acyclic Graphs - DAGs
Directed - arrows have a direction. Acyclic - there is no cycle in the graph. Causality is an asymmetric concept. A graph is an object that encodes the causal structure of the problem.

Direct effect
[Graph: D → Y, with X isolated]
P(y,d,x) is the joint distribution (shorthand for P(Y = y, D = d, X = x)).
Testable implication: P(x|d,y) = P(x) (no edge between X and (D,Y)).
P(y,d,x) = P(x)·P(d)·P(y|d), each factor being of the form P(xi|pa_xi)
=⇒ P(y,d,x) = P(x)·P(y,d) and therefore X ⊥⊥ (D,Y)

Bayesian factorization
[Graph: D → Y, with X isolated]
P(y,d,x) = P(x)·P(d)·P(y|d), or in general
P(x1,x2,...,xn) = P(x1|pa_x1)·P(x2|pa_x2)···P(xn|pa_xn)
Given its parents, a variable is independent of all of its non-descendants. Every parent is a direct cause of all its children.

Effect in reverse direction
[Graph: Y → D, with X isolated]
P(y,d,x) is the joint distribution.
Testable implication: P(x|d,y) = P(x) (no edge between X and (D,Y)).
P(y,d,x) = P(x)·P(d|y)·P(y) =⇒ P(y,d,x) = P(x)·P(y,d) and therefore X ⊥⊥ (D,Y)

Confounded effect
The graph includes information about independencies.
[Graph: X → D, X → Y, D → Y]
P(y,d,x) is the joint distribution. No testable implications.

Direct and indirect effect
[Graph: D → X → Y and D → Y]
P(y,d,x) is the joint distribution. No testable implications.

No effect (fork)
[Graph: D ← X → Y]
P(y,d,x) is the joint distribution.
Testable implication: P(d|x,y) = P(d|x) (no edge between D and Y).
P(y,d,x) = P(x)·P(y|x)·P(d|x) =⇒ P(y,d|x) = P(y|x)·P(d|x) and therefore Y ⊥⊥ D | X

Indirect effect via X (chain)
[Graph: D → X → Y]
P(y,d,x) is the joint distribution.
Testable implication: P(y|x,d) = P(y|x) (no edge between D and Y).
P(y,d,x) = P(x|d)·P(d)·P(y|x) = P(d|x)·P(x)·P(y|x) =⇒ P(y,d|x) = P(y|x)·P(d|x) and therefore Y ⊥⊥ D | X

So far we have seen that very different setups (in terms of the direction of effects) have the same testable implications. Graphs are helpful if we want to study their implications for statistical independencies. Graphs alone are not sufficient; we need to equip this setup with something else in order to talk about causality.

Collider (immorality)
[Graph: D → X ← Y]
P(y,d,x) is the joint distribution.
Testable implication: P(y|d) = P(y) (no edge between D and Y).
P(y,d,x) = P(x|d,y)·P(d)·P(y) =⇒ (summing across x) P(y,d) = P(y)·P(d) and therefore Y ⊥⊥ D

Collider (immorality) continued
[Graph: D → X ← Y]
Conditioning induces dependence: conditioned on X, the previously independent D and Y are now dependent.
P(y,d,x) = P(x|d,y)·P(d)·P(y) =⇒ P(y,d|x) ≠ P(y|x)·P(d|x) in general, and therefore Y ⊥̸⊥ D | X
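The collider mechanism is easy to verify numerically. Below is a minimal simulation sketch (Python with numpy; the functional form and coefficients are illustrative, not taken from the slides): D and Y are independent by construction, yet restricting attention to a slice of the collider X makes them strongly associated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# D and Y are drawn independently: no edge and no common cause.
d = rng.normal(size=n)
y = rng.normal(size=n)

# X is a collider, a common consequence of D and Y (D -> X <- Y).
x = d + y + 0.5 * rng.normal(size=n)

# Marginally, D and Y are (nearly) uncorrelated, as Y indep. of D implies.
print(np.corrcoef(d, y)[0, 1])              # approx. 0.00

# Conditioning on X -- here crudely, by selecting a narrow slice of X --
# induces a strong negative association between D and Y.
keep = np.abs(x) < 0.1
print(np.corrcoef(d[keep], y[keep])[0, 1])  # clearly negative
```

Intuition for the sign: within the slice x ≈ 0, a large d forces a small y, which is exactly the selection effect driving the examples that follow.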
Example 1 (collider bias) - known as "bad controls"
[Graph: X1 → S ← X2]
X1 - academic ability; X2 - sporting ability; S - admitted to university.
Griffith, Gareth J., et al. "Collider bias undermines our understanding of COVID-19 disease risk and severity." Nature Communications 11.1 (2020).

Example 2 (collider bias)
[Graph: X, Z, Y, U]
X - maternal smoking; Y - infant mortality; Z - birth weight; U - unobserved risk factors (e.g. birth defects, malnutrition).
Hernández-Díaz, Sonia, Enrique F. Schisterman, and Miguel A. Hernán. "The birth weight 'paradox' uncovered?" American Journal of Epidemiology 164.11 (2006): 1115-1120.

Example 3 (collider bias) - Obesity paradox
[Graph: X, Z, Y, U]
X - obesity; Y - mortality; Z - heart failure; U - unobserved risk factors (e.g. genetic factors, lifestyle behaviour).
Banack, Hailey R., and Jay S. Kaufman. "The 'obesity paradox' explained." Epidemiology 24.3 (2013): 461-462.

Example 4 (collider bias) - Gender wage gap
[Graph: D, X, Y, U]
D - gender; Y - log wages; X - {education, work experience, occupation}; U - unobserved variables.
Blau, Francine D., and Lawrence M. Kahn. "The gender wage gap: Extent, trends, and explanations." Journal of Economic Literature 55.3 (2017): 789-865.

Example 5 (collider bias) - Nutrition/height puzzle
[Graph: D, X, Y]
D - childhood nutrition; Y - adult height; X - in military.
Schneider, Eric B. "Collider bias in economic history research." Explorations in Economic History 78 (2020): 101356.

All these examples show the importance of the causal structure of the problem at hand. Conditioning on certain variables may (or may not) induce an association that is not of interest. Failing to condition on the right variables may result in a mixed set of associations - also not of interest.

More notation to come...
Blocked path; d-separation; causal vs non-causal association; manipulated graph; intervention - the "do-operator"; sufficient adjustment set; structural causal models; endogenous vs exogenous variables.

Blocked path
Any path p is blocked by a set of variables B if:
(1) p contains a chain or a fork such that the middle node is in B, or
(2) p contains a collider such that neither the middle node nor any of its descendants is in B.

Blocked path
[Graph: ten nodes X1, D, Y, X2, X3, X4, X5, X6, M, X7]

d-separation
For a given graph G, let B1, B2, B3 be three disjoint sets of variables.
B1 and B2 are d-connected by B3 ⟺ there exists an undirected path p between some vertex in B1 and some vertex in B2 such that for every collider C on p, either C or a descendant of C is in B3, and no non-collider on p is in B3.
B1 and B2 are d-separated by B3 ⟺ B1 and B2 are not d-connected by B3.
Equivalently: B1 and B2 are d-separated by B3 ⟺ B3 blocks every path between B1 and B2.

{D} and {Y} are d-connected by {X5}. There are 3 paths.
[Graph: the ten-node example graph]

{D} and {Y} are d-separated by {X1, M}. All three paths are blocked.
[Graph: the ten-node example graph]

d-separation and statistical independence
Notation: (B1 ⊥⊥ B2 | B3)_G ⟺ B1 and B2 are d-separated by B3 in graph G.
(B1 ⊥⊥ B2 | B3)_G =⇒ B1 ⊥⊥ B2 | B3
d-separation implies statistical independence (assuming that the graph is correct).
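The two d-separation claims above are purely graphical, so they can be checked mechanically. A sketch using networkx (assuming a version, roughly 2.8-3.2, where nx.d_separated is available; newer releases rename it to nx.is_d_separator). The edge list is read off the structural-equation slide later in the deck, since the figure itself is not reproduced here:

```python
import networkx as nx

# Ten-node example graph, edges read off its structural equations.
G = nx.DiGraph([
    ("X2", "X1"), ("X2", "X3"), ("X1", "D"),
    ("D", "M"), ("M", "Y"),
    ("D", "X4"), ("X4", "X5"), ("X6", "X5"),
    ("Y", "X6"), ("X5", "X7"),
])

# {D} and {Y} are d-connected given {X5}: the chain D -> M -> Y is open,
# and conditioning on the collider X5 opens D -> X4 -> X5 <- X6 <- Y too.
print(nx.d_separated(G, {"D"}, {"Y"}, {"X5"}))       # False

# {D} and {Y} are d-separated given {X1, M}: M blocks the chain, and the
# collider X5 (and its descendant X7) is left unconditioned.
print(nx.d_separated(G, {"D"}, {"Y"}, {"X1", "M"}))  # True
```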
Causal vs non-causal association
[Graph: the ten-node example graph; the paths between D and Y are marked non-causal, non-causal, and causal]
[Graph: X a confounder of D → Y; the path via X is a non-causal association, D → Y is a causal association]
[Graph: X a mediator, D → X → Y and D → Y; both paths are causal associations]

Intervention - the "do-operator"
[Graphs: the original graph, and the manipulated graph in which all arrows into D are removed]
Manipulating D to be equal to d, written do(D = d), removes all arrows into node D and sets P(D = d) = 1.

Intervention - the "do-operator"
Manipulating D to be equal to d, do(D = d), removes all arrows into node D and sets P(D = d) = 1. It induces an interventional distribution P(Y, X | do(D = d)), which can be used to define potential outcomes: E[Y(d)].

Three different causal graphs
[Graph 1: X a confounder, D ← X → Y and D → Y] non-causal association via X; causal association D → Y
[Graph 2: X a mediator, D → X → Y and D → Y] causal association via X; causal association D → Y
[Graph 3: X a collider, D → X ← Y and D → Y] causal association D → Y; non-causal association via X

Controlling for X
[Graph 1: confounder] path via X blocked; causal association D → Y remains
[Graph 2: mediator] causal path via X blocked; causal association D → Y remains
[Graph 3: collider] causal association D → Y remains; non-causal path via X opened

Not controlling for X
[Graph 1: confounder] path via X open; causal association D → Y
[Graph 2: mediator] causal path via X open; causal association D → Y
[Graph 3: collider] causal association D → Y; path via X blocked

Back-door criterion
A set of variables B satisfies the back-door criterion if it:
- blocks all spurious paths (non-causal, non-directed) from D to Y,
- does not block any of the causal paths from D to Y,
- does not open any spurious paths (via colliders or their descendants).
Then E[Y(d)] = E[Y | do(D = d)] = E[ E[Y | D = d, B] ], where the inner conditional expectation is random due to B and the outer expectation is taken with respect to B. Thus we get the mean of the potential outcome Y(d) from non-experimental data (!)

Back-door criterion
[Graph 1: confounder] B = {X}: E[Y(d)] = E[E[Y|D = d, X]]
[Graph 2: mediator] B = {}: E[Y(d)] = E[Y|D = d]
[Graph 3: collider] B = {}: E[Y(d)] = E[Y|D = d]

Example 6: Different conclusions based on the same data
[Graph: D, X, Y]
X - management position; D - gender or lifestyle; Y - wage.
Causal structure matters: very different conclusions can be reached from the same data.

Example 6a:
[Graph: X a mediator, D → X → Y and D → Y]
X - management position; D - gender; Y - wage.
E[Y(d)] = E[Y|D = d] = Σ_{x∈{0,1}} E[Y|D = d, X = x]·Pr(X = x|D = d)

Example 6b:
[Graph: X a confounder, D ← X → Y and D → Y]
X - management position; D - lifestyle; Y - wage.
E[Y(d)] = E[ E[Y|D = d, X] ] = Σ_{x∈{0,1}} E[Y|D = d, X = x]·Pr(X = x)

Example 6a:
              ♀           ♂
Not manager   3163 (87)   3015 (59)
Manager       5592 (13)   5319 (41)
(mean wages, with cell counts in parentheses; 100 people per group)
X - management position; D - gender; Y - wage.
E[Y(♀)] = Σ_x E[Y|D = ♀, X = x]·Pr(X = x|D = ♀) = 3163·0.87 + 5592·0.13 = 3478.77
E[Y(♂)] = Σ_x E[Y|D = ♂, X = x]·Pr(X = x|D = ♂) = 3015·0.59 + 5319·0.41 = 3959.64
E[Y(♀) − Y(♂)] = 3478.77 − 3959.64 = −480.87

Example 6b:
(same table, with the columns now the two lifestyle groups g1 and g2)
X - management position; D - lifestyle; Y - wage.
E[Y(g1)] = Σ_x E[Y|D = g1, X = x]·Pr(X = x) = 3163·(87+59)/200 + 5592·(13+41)/200 = 3818.83
E[Y(g2)] = Σ_x E[Y|D = g2, X = x]·Pr(X = x) = 3015·(87+59)/200 + 5319·(13+41)/200 = 3637.08
E[Y(g1) − Y(g2)] = 3818.83 − 3637.08 = 181.75
Examples 6-6b are from Paul Hunermund's course: https://www.udemy.com/course/causal-data-science/
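Both adjustment formulas are a few lines of arithmetic. A minimal sketch reproducing the numbers above (the labels g1/g2 stand for ♀/♂ in Example 6a and for the two lifestyle groups in Example 6b; the counts in parentheses are read as stratum sizes, 100 respondents per group):

```python
# Mean wages and stratum counts from the 2x2 table above.
# Strata: x = 0 (not manager), x = 1 (manager).
wage  = {"g1": {0: 3163, 1: 5592}, "g2": {0: 3015, 1: 5319}}
count = {"g1": {0: 87,   1: 13},   "g2": {0: 59,   1: 41}}
n = {g: sum(count[g].values()) for g in count}   # 100 per group
N = sum(n.values())                              # 200 in total

def mediator_case(g):
    # Example 6a: X is a mediator, so do not adjust --
    # weight stratum means by P(X = x | D = g), i.e. compute E[Y | D = g].
    return sum(wage[g][x] * count[g][x] / n[g] for x in (0, 1))

def confounder_case(g):
    # Example 6b: X is a confounder -- back-door adjustment,
    # weight stratum means by the marginal P(X = x).
    return sum(wage[g][x] * (count["g1"][x] + count["g2"][x]) / N
               for x in (0, 1))

print(mediator_case("g1") - mediator_case("g2"))      # -480.87
print(confounder_case("g1") - confounder_case("g2"))  #  181.75
```

Same table, two graphs, opposite-signed conclusions: only the weighting of the strata differs.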
Do-calculus
The back-door criterion is only an application of one of the three rules of "do-calculus". The three rules provide exhaustive manipulation of the do-operator, and the whole process can be fully automated (!) - yes, that's correct, automated!
1. Ignoring observations: P(y|z, do(x), w) = P(y|do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X removed.
2. Treating interventions as observations: P(y|do(z), do(x), w) = P(y|z, do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X and all arrows pointing out of Z removed.
3. Ignoring interventions: P(y|do(z), do(x), w) = P(y|do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X and into Z(W) removed, where Z(W) is the set of Z-nodes that are not ancestors of any W-node.

Back-door criterion?
P(y|do(x)) = Σ_z P(y|do(x), z)·P(z|do(x))   (chain rule)
           = Σ_z P(y|x, z)·P(z|do(x))       (Rule 2)
           = Σ_z P(y|x, z)·P(z)             (Rule 3)
which is the back-door adjustment formula.
Note that do(x) is shorthand notation for do(X = x), the event "X is manipulated to be equal to x".

[Graph: the ten-node example graph]
D = fD(X1), M = fM(D), Y = fY(M), X1 = f1(X2), X2 exogenous, X3 = f3(X2), X4 = f4(D), X5 = f5(X4, X6), X6 = f6(Y), X7 = f7(X5)

Structural causal models
[Graph: the ten-node example graph with exogenous noise terms U1, ..., U7, UD, UM, UY]
D = fD(X1, UD), M = fM(D, UM), Y = fY(M, UY), X1 = f1(X2, U1), X2 = f2(U2), X3 = f3(X2, U3), X4 = f4(D, U4), X5 = f5(X4, X6, U5), X6 = f6(Y, U6), X7 = f7(X5, U7), with U ∼ P.

Modified Structural Causal Model
[Graphs: X → D → Y with X → Y and noise terms UD, UX, UY; after do(D = d), the arrows into D are removed]
Before the intervention: D = fD(X, UD), X = fX(UX), Y = fY(D, X, UY).
After do(D = d): D = d, X = fX(UX), Y = fY(D, X, UY).
(A simulation sketch of this manipulation appears after Example 16 below.)

Example 7 (unobserved confounders) - Returns to education
[Graph: D, Y, U]
D - education; Y - log wages; U - unobserved ability.
We cannot close the back-door path via U because it is unobserved.

Example 8: Human Capital Model (Becker 1994)
[Graph: Z, X, U, D, Y]
Z - parental education; D - education; Y - log wages; X - family income; U - unobserved background characteristics.
Conditioning on X closes all the back-door paths.

Example 9: Human Capital Model (Becker 1994) - ver. 2
[Graph: Z, X, U, D, Y]
Z - parental education; D - education; Y - log wages; X - family income; U - unobserved background characteristics.
It is not possible to close the back-door path via U, as U is unobserved.

Example 10: Schooling again
[Graph: D, Y, X, U1, U2]
D - education; Y - log wages; X - family income; U1 - unobserved mother's characteristics; U2 - unobserved father's characteristics.
Conditioning on X makes things even worse, as it opens up two new paths.

Example 11: Discrimination
[Graph: G, D, X, U, Y]
G - gender; D - discrimination; X - occupation; Y - log wages; U - unobserved ability.
Conditioning on X closes the mediated path, but it opens up a new path D → X ← U → Y.

Example 12: Covid risk factors
[Graph: X1, X2, D, Y]
X1 - smoking; X2 - frailty; D - Covid hospitalization; Y - death.
Looking at hospitalized patients only (conditioning on D) induces a spurious correlation among different, independent(!) risk factors: smoking (X1) and frailty (X2).
https://www.hdruk.ac.uk/news/we-should-be-cautious-about-associations-of-patient-characteristics-with-covid-19-outcomes-that-are-identified-in-

Example 13: Age adjustment for vaccine effectiveness
[Graph: D, X, Y]
X - age; D - vaccination; Y - severe Covid.
Adjusting for age closes the back-door path.
https://www.covid-datascience.com/post/israeli-data-how-can-efficacy-vs-severe-disease-be-strong-when-60-of-hospitalized-are-vaccinated

Example 14 - many confounders
[Graph: D, Y and controls X1, ..., X9]
X1, X2, ... - controls; D - treatment; Y - outcome.
We can hopefully close all the back-door paths. How plausible is this model?

Example 15 - Gender wage gap decomposition
Y - wage; G - gender; X - education, work experience, occupation, region, ... (in 1998); W - parents' education, foreign born (in 1979).
Figs. 2 and 3 from Huber, Martin. "Causal pitfalls in the decomposition of wage gaps." Journal of Business & Economic Statistics 33.2 (2015): 179-191.

Example 16 - Mitigating measures and Covid-19
I_{i,t} - information; P_{i,t} - adopted policies; W_{i,t} - unobserved confounding factors; B_{i,t} - behavior variables; Y_{i,t+l} - future health outcomes.
Fig. 4 from Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. "Causal impact of masks, policies, behavior on early covid-19 pandemic in the US." Journal of Econometrics 220.1 (2021): 23-62.
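As promised above, the modified-SCM view of do(D = d) translates directly into code: intervening means deleting D's structural equation and setting D to a constant. A minimal simulation sketch with a confounded SCM (X → D, X → Y, D → Y; all functional forms and coefficients are illustrative), comparing the naive observational contrast with the back-door-adjusted one and with the interventional truth:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def scm(do_d=None):
    # Structural equations: X -> D, X -> Y, D -> Y, with exogenous noise U.
    x = rng.binomial(1, 0.5, n)
    d = rng.binomial(1, 0.2 + 0.6 * x) if do_d is None else np.full(n, do_d)
    y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(size=n)  # true effect of D: 2.0
    return x, d, y

# Interventional truth: E[Y|do(D=1)] - E[Y|do(D=0)], approx. 2.0.
print(scm(do_d=1)[2].mean() - scm(do_d=0)[2].mean())

# Observational data: the naive contrast E[Y|D=1] - E[Y|D=0] is confounded.
x, d, y = scm()
print(y[d == 1].mean() - y[d == 0].mean())            # approx. 3.8, biased

# Back-door adjustment with B = {X}: E[Y(d)] = sum_x E[Y|D=d,X=x] P(X=x).
def adjusted(dv):
    return sum(y[(d == dv) & (x == xv)].mean() * (x == xv).mean()
               for xv in (0, 1))

print(adjusted(1) - adjusted(0))                      # approx. 2.0
```

Replacing D's equation with a constant is exactly the "manipulated graph" of the do-operator slides, and the adjusted estimate recovers the interventional contrast from purely observational draws.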
Lessons to take:
- causal structure is important
- beware of colliders
- working with causal models can be useful; it may clarify your thinking
- there are different views on how useful the whole DAG literature is (epidemiology and CS vs. economics)

Further topics
- maybe I cannot manipulate D, but I can manipulate Z (surrogate experiments)
- there are tools for addressing external validity (transportability)
- from the data it is possible to construct a class of admissible DAGs (causal discovery); this is currently an area of active research in CS, and it seems to be slowly leaking into economics

Thank you for your attention!

References
This is an overview written for economists; it is sufficient if you read it up to page 19: Hünermund, Paul, and Elias Bareinboim. "Causal Inference and Data Fusion in Econometrics." arXiv preprint arXiv:1912.09104 (2019).
This is the comprehensive DAG book; no other book matches it in depth of exposition: Pearl, Judea. Causality. Cambridge University Press, 2009.
A book on the other side of the spectrum, short and succinct, very readable: Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
Please read this; it is difficult to find anything better. Appendix A provides a quick intro to DAG calculus: Cinelli, Carlos, Andrew Forney, and Judea Pearl. "A Crash Course in Good and Bad Controls." Available at SSRN 3689437 (2020).
Banack, Hailey R., and Jay S. Kaufman. "The 'obesity paradox' explained." Epidemiology 24.3 (2013): 461-462.
Hernández-Díaz, Sonia, Enrique F. Schisterman, and Miguel A. Hernán. "The birth weight 'paradox' uncovered?" American Journal of Epidemiology 164.11 (2006): 1115-1120.
Griffith, Gareth J., et al. "Collider bias undermines our understanding of COVID-19 disease risk and severity." Nature Communications 11.1 (2020): 1-12.
Blau, Francine D., and Lawrence M. Kahn. "The gender wage gap: Extent, trends, and explanations." Journal of Economic Literature 55.3 (2017): 789-865.
Schneider, Eric B. "Collider bias in economic history research." Explorations in Economic History 78 (2020): 101356.
P. Hunermund's course on DAGs (paid): https://www.udemy.com/course/causal-data-science/
P. Hunermund's lecture on DAGs, a compact version of the above course: https://www.youtube.com/watch?v=GtpnWQ9uTL8, based on Hünermund, Paul, and Elias Bareinboim. "Causal Inference and Data Fusion in Econometrics." arXiv preprint arXiv:1912.09104 (2019).
Excellent exposition of do-calculus here: https://www.andrewheiss.com/blog/2021/09/07/do-calculus-backdoors
Huber, Martin. "Causal pitfalls in the decomposition of wage gaps." Journal of Business & Economic Statistics 33.2 (2015): 179-191.
DAGs in action on a very relevant topic: Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. "Causal impact of masks, policies, behavior on early covid-19 pandemic in the US." Journal of Econometrics 220.1 (2021): 23-62.
An excellent and super clear course on many of the concepts we covered here; it is hard to compete with this one: https://www.bradyneal.com/causal-inference-course
The CausalAI Lab of Elias Bareinboim is on the research frontier of causal inference with machine learning: https://causalai.net