Causal Models
Lukáš Lafférs, Matej Bel University, Dept. of Mathematics
MUNI Brno, 11.11.2021

Causality
Is it possible to recover a causal relationship from an observational dataset?

Graphical models
Judea Pearl (UCLA) and his book Causality.

Graphical models
A unified setup for thinking about causality. Every problem is visualized in terms of a causal graph. It is easier to think about a problem once you have a graph that visualizes the relationships. It provides a set of rules that show when and how it is possible to identify causal effects. This set of rules may be automated. It makes thinking about identification easier.

[Graph: X, D, Y]
The relationship of D and Y is of interest. D and Y are associated directly, and also indirectly via X.

[Graph: X, D, Y]
The relationship of D and Y is of interest. D and Y are associated directly, and also indirectly via X. X is a confounder.

Notation
[Graph: X, D, Y]
Node, Edge, Path, Directed path, Parent/Child, Ancestor/Descendant, Acyclic graph

Directed Acyclic Graphs - DAGs
Directed - arrows have a direction. Acyclic - there is no cycle in the graph. Causality is an asymmetric concept. A graph is an object that encodes the causal structure of the problem.

Direct effect
[Graph: D → Y, with X isolated]
P(y,d,x) is the joint distribution (shorthand for P(Y = y, D = d, X = x)).
Testable implication: P(x|d,y) = P(x) (no edge between X and (D,Y)).
P(y,d,x) = P(x)·P(d)·P(y|d), each factor being of the form P(xi|pa_xi)
=⇒ P(y,d,x) = P(x)·P(y,d) and therefore X ⊥⊥ (D,Y)

Bayesian factorization
[Graph: D → Y, with X isolated]
P(y,d,x) = P(x)·P(d)·P(y|d), or in general
P(x1,x2,...,xn) = P(x1|pa_x1)·P(x2|pa_x2)···P(xn|pa_xn)
Given its parents, a variable is independent of all of its non-descendants. Every parent is a direct cause of all its children.

Effect in reverse direction
[Graph: Y → D, with X isolated]
P(y,d,x) is the joint distribution.
Testable implication: P(x|d,y) = P(x) (no edge between X and (D,Y)).
P(y,d,x) = P(x)·P(d|y)·P(y) =⇒ P(y,d,x) = P(x)·P(y,d) and therefore X ⊥⊥ (D,Y)

Confounded effect
The graph includes information about independencies.
[Graph: X → D, X → Y, D → Y]
P(y,d,x) is the joint distribution. No testable implications.

Direct and indirect effect
[Graph: D → X → Y and D → Y]
P(y,d,x) is the joint distribution. No testable implications.

No effect (fork)
[Graph: D ← X → Y]
P(y,d,x) is the joint distribution.
Testable implication: P(d|x,y) = P(d|x) (no edge between D and Y).
P(y,d,x) = P(x)·P(y|x)·P(d|x) =⇒ P(y,d|x) = P(y|x)·P(d|x) and therefore Y ⊥⊥ D | X

Indirect effect via X (chain)
[Graph: D → X → Y]
P(y,d,x) is the joint distribution.
Testable implication: P(y|x,d) = P(y|x) (no edge between D and Y).
P(y,d,x) = P(x|d)·P(d)·P(y|x) = P(d|x)·P(x)·P(y|x) =⇒ P(y,d|x) = P(y|x)·P(d|x) and therefore Y ⊥⊥ D | X

So far we have seen that very different setups (in terms of the direction of effects) have the same testable implications. Graphs are helpful if we want to study their implications for statistical independencies. Graphs alone are not sufficient; we need to equip this setup with something else in order to talk about causality.

Collider (immorality)
[Graph: D → X ← Y]
P(y,d,x) is the joint distribution.
Testable implication: P(y|d) = P(y) (no edge between D and Y).
P(y,d,x) = P(x|d,y)·P(d)·P(y) =⇒ (summing across x) P(y,d) = P(y)·P(d) and therefore Y ⊥⊥ D

Collider (immorality) continued
[Graph: D → X ← Y]
Conditioning induces dependence: conditioned on X, the previously independent D and Y are now dependent.
P(y,d,x) = P(x|d,y)·P(d)·P(y) =⇒ P(y,d|x) ≠ P(y|x)·P(d|x) in general, and therefore Y ⊥̸⊥ D | X
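The collider mechanism is easy to verify numerically. Below is a minimal simulation sketch (Python with numpy; the functional form and coefficients are illustrative, not taken from the slides): D and Y are independent by construction, yet restricting attention to a slice of the collider X makes them strongly associated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# D and Y are drawn independently: no edge and no common cause.
d = rng.normal(size=n)
y = rng.normal(size=n)

# X is a collider, a common consequence of D and Y (D -> X <- Y).
x = d + y + 0.5 * rng.normal(size=n)

# Marginally, D and Y are (nearly) uncorrelated, as Y indep. of D implies.
print(np.corrcoef(d, y)[0, 1])              # approx. 0.00

# Conditioning on X -- here crudely, by selecting a narrow slice of X --
# induces a strong negative association between D and Y.
keep = np.abs(x) < 0.1
print(np.corrcoef(d[keep], y[keep])[0, 1])  # clearly negative
```

Intuition for the sign: within the slice x ≈ 0, a large d forces a small y, which is exactly the selection effect driving the examples that follow.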
Example 1 (collider bias) - known as "bad controls"
[Graph: X1 → S ← X2]
X1 - academic ability; X2 - sporting ability; S - admitted to university.
Griffith, Gareth J., et al. "Collider bias undermines our understanding of COVID-19 disease risk and severity." Nature Communications 11.1 (2020).

Example 2 (collider bias)
[Graph: X, Z, Y, U]
X - maternal smoking; Y - infant mortality; Z - birth weight; U - unobserved risk factors (e.g. birth defects, malnutrition).
Hernández-Díaz, Sonia, Enrique F. Schisterman, and Miguel A. Hernán. "The birth weight 'paradox' uncovered?" American Journal of Epidemiology 164.11 (2006): 1115-1120.

Example 3 (collider bias) - Obesity paradox
[Graph: X, Z, Y, U]
X - obesity; Y - mortality; Z - heart failure; U - unobserved risk factors (e.g. genetic factors, lifestyle behaviour).
Banack, Hailey R., and Jay S. Kaufman. "The 'obesity paradox' explained." Epidemiology 24.3 (2013): 461-462.

Example 4 (collider bias) - Gender wage gap
[Graph: D, X, Y, U]
D - gender; Y - log wages; X - {education, work experience, occupation}; U - unobserved variables.
Blau, Francine D., and Lawrence M. Kahn. "The gender wage gap: Extent, trends, and explanations." Journal of Economic Literature 55.3 (2017): 789-865.

Example 5 (collider bias) - Nutrition/height puzzle
[Graph: D, X, Y]
D - childhood nutrition; Y - adult height; X - in military.
Schneider, Eric B. "Collider bias in economic history research." Explorations in Economic History 78 (2020): 101356.

All these examples show the importance of the causal structure of the problem at hand. Conditioning on certain variables may (or may not) induce an association that is not of interest. Failing to condition on the right variables may result in a mixed set of associations - also not of interest.

More notation to come...
Blocked path; d-separation; causal vs non-causal association; manipulated graph; intervention - the "do-operator"; sufficient adjustment set; structural causal models; endogenous vs exogenous variables.

Blocked path
Any path p is blocked by a set of variables B if:
(1) p contains a chain or a fork such that the middle node is in B, or
(2) p contains a collider such that neither the middle node nor any of its descendants is in B.

Blocked path
[Graph: ten nodes X1, D, Y, X2, X3, X4, X5, X6, M, X7]

d-separation
For a given graph G, let B1, B2, B3 be three disjoint sets of variables.
B1 and B2 are d-connected by B3 ⟺ there exists an undirected path p between some vertex in B1 and some vertex in B2 such that for every collider C on p, either C or a descendant of C is in B3, and no non-collider on p is in B3.
B1 and B2 are d-separated by B3 ⟺ B1 and B2 are not d-connected by B3.
Equivalently: B1 and B2 are d-separated by B3 ⟺ B3 blocks every path between B1 and B2.

{D} and {Y} are d-connected by {X5}. There are 3 paths.
[Graph: the ten-node example graph]

{D} and {Y} are d-separated by {X1, M}. All three paths are blocked.
[Graph: the ten-node example graph]

d-separation and statistical independence
Notation: (B1 ⊥⊥ B2 | B3)_G ⟺ B1 and B2 are d-separated by B3 in graph G.
(B1 ⊥⊥ B2 | B3)_G =⇒ B1 ⊥⊥ B2 | B3
d-separation implies statistical independence (assuming that the graph is correct).
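The two d-separation claims above are purely graphical, so they can be checked mechanically. A sketch using networkx (assuming a version, roughly 2.8-3.2, where nx.d_separated is available; newer releases rename it to nx.is_d_separator). The edge list is read off the structural-equation slide later in the deck, since the figure itself is not reproduced here:

```python
import networkx as nx

# Ten-node example graph, edges read off its structural equations.
G = nx.DiGraph([
    ("X2", "X1"), ("X2", "X3"), ("X1", "D"),
    ("D", "M"), ("M", "Y"),
    ("D", "X4"), ("X4", "X5"), ("X6", "X5"),
    ("Y", "X6"), ("X5", "X7"),
])

# {D} and {Y} are d-connected given {X5}: the chain D -> M -> Y is open,
# and conditioning on the collider X5 opens D -> X4 -> X5 <- X6 <- Y too.
print(nx.d_separated(G, {"D"}, {"Y"}, {"X5"}))       # False

# {D} and {Y} are d-separated given {X1, M}: M blocks the chain, and the
# collider X5 (and its descendant X7) is left unconditioned.
print(nx.d_separated(G, {"D"}, {"Y"}, {"X1", "M"}))  # True
```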
Causal vs non-causal association
[Graph: the ten-node example graph; the paths between D and Y are marked non-causal, non-causal, and causal]
[Graph: X a confounder of D → Y; the path via X is a non-causal association, D → Y is a causal association]
[Graph: X a mediator, D → X → Y and D → Y; both paths are causal associations]

Intervention - the "do-operator"
[Graphs: the original graph, and the manipulated graph in which all arrows into D are removed]
Manipulating D to be equal to d, written do(D = d), removes all arrows into node D and sets P(D = d) = 1.

Intervention - the "do-operator"
Manipulating D to be equal to d, do(D = d), removes all arrows into node D and sets P(D = d) = 1. It induces an interventional distribution P(Y, X | do(D = d)), which can be used to define potential outcomes: E[Y(d)].

Three different causal graphs
[Graph 1: X a confounder, D ← X → Y and D → Y] non-causal association via X; causal association D → Y
[Graph 2: X a mediator, D → X → Y and D → Y] causal association via X; causal association D → Y
[Graph 3: X a collider, D → X ← Y and D → Y] causal association D → Y; non-causal association via X

Controlling for X
[Graph 1: confounder] path via X blocked; causal association D → Y remains
[Graph 2: mediator] causal path via X blocked; causal association D → Y remains
[Graph 3: collider] causal association D → Y remains; non-causal path via X opened

Not controlling for X
[Graph 1: confounder] path via X open; causal association D → Y
[Graph 2: mediator] causal path via X open; causal association D → Y
[Graph 3: collider] causal association D → Y; path via X blocked

Back-door criterion
A set of variables B satisfies the back-door criterion if it:
- blocks all spurious paths (non-causal, non-directed) from D to Y,
- does not block any of the causal paths from D to Y,
- does not open any spurious paths (via colliders or their descendants).
Then E[Y(d)] = E[Y | do(D = d)] = E[ E[Y | D = d, B] ], where the inner conditional expectation is random due to B and the outer expectation is taken with respect to B. Thus we get the mean of the potential outcome Y(d) from non-experimental data (!)

Back-door criterion
[Graph 1: confounder] B = {X}: E[Y(d)] = E[E[Y|D = d, X]]
[Graph 2: mediator] B = {}: E[Y(d)] = E[Y|D = d]
[Graph 3: collider] B = {}: E[Y(d)] = E[Y|D = d]

Example 6: Different conclusions based on the same data
[Graph: D, X, Y]
X - management position; D - gender or lifestyle; Y - wage.
Causal structure matters: very different conclusions can be reached from the same data.

Example 6a:
[Graph: X a mediator, D → X → Y and D → Y]
X - management position; D - gender; Y - wage.
E[Y(d)] = E[Y|D = d] = Σ_{x∈{0,1}} E[Y|D = d, X = x]·Pr(X = x|D = d)

Example 6b:
[Graph: X a confounder, D ← X → Y and D → Y]
X - management position; D - lifestyle; Y - wage.
E[Y(d)] = E[ E[Y|D = d, X] ] = Σ_{x∈{0,1}} E[Y|D = d, X = x]·Pr(X = x)

Example 6a:
              ♀           ♂
Not manager   3163 (87)   3015 (59)
Manager       5592 (13)   5319 (41)
(mean wages, with cell counts in parentheses; 100 people per group)
X - management position; D - gender; Y - wage.
E[Y(♀)] = Σ_x E[Y|D = ♀, X = x]·Pr(X = x|D = ♀) = 3163·0.87 + 5592·0.13 = 3478.77
E[Y(♂)] = Σ_x E[Y|D = ♂, X = x]·Pr(X = x|D = ♂) = 3015·0.59 + 5319·0.41 = 3959.64
E[Y(♀) − Y(♂)] = 3478.77 − 3959.64 = −480.87

Example 6b:
(same table, with the columns now the two lifestyle groups g1 and g2)
X - management position; D - lifestyle; Y - wage.
E[Y(g1)] = Σ_x E[Y|D = g1, X = x]·Pr(X = x) = 3163·(87+59)/200 + 5592·(13+41)/200 = 3818.83
E[Y(g2)] = Σ_x E[Y|D = g2, X = x]·Pr(X = x) = 3015·(87+59)/200 + 5319·(13+41)/200 = 3637.08
E[Y(g1) − Y(g2)] = 3818.83 − 3637.08 = 181.75
Examples 6-6b are from Paul Hunermund's course: https://www.udemy.com/course/causal-data-science/
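Both adjustment formulas are a few lines of arithmetic. A minimal sketch reproducing the numbers above (the labels g1/g2 stand for ♀/♂ in Example 6a and for the two lifestyle groups in Example 6b; the counts in parentheses are read as stratum sizes, 100 respondents per group):

```python
# Mean wages and stratum counts from the 2x2 table above.
# Strata: x = 0 (not manager), x = 1 (manager).
wage  = {"g1": {0: 3163, 1: 5592}, "g2": {0: 3015, 1: 5319}}
count = {"g1": {0: 87,   1: 13},   "g2": {0: 59,   1: 41}}
n = {g: sum(count[g].values()) for g in count}   # 100 per group
N = sum(n.values())                              # 200 in total

def mediator_case(g):
    # Example 6a: X is a mediator, so do not adjust --
    # weight stratum means by P(X = x | D = g), i.e. compute E[Y | D = g].
    return sum(wage[g][x] * count[g][x] / n[g] for x in (0, 1))

def confounder_case(g):
    # Example 6b: X is a confounder -- back-door adjustment,
    # weight stratum means by the marginal P(X = x).
    return sum(wage[g][x] * (count["g1"][x] + count["g2"][x]) / N
               for x in (0, 1))

print(mediator_case("g1") - mediator_case("g2"))      # -480.87
print(confounder_case("g1") - confounder_case("g2"))  #  181.75
```

Same table, two graphs, opposite-signed conclusions: only the weighting of the strata differs.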
Do-calculus
The back-door criterion is only an application of one of the three rules of "do-calculus". The three rules provide exhaustive manipulation of the do-operator, and the whole process can be fully automated (!) - yes, that's correct, automated!
1. Ignoring observations: P(y|z, do(x), w) = P(y|do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X removed.
2. Treating interventions as observations: P(y|do(z), do(x), w) = P(y|z, do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X and all arrows pointing out of Z removed.
3. Ignoring interventions: P(y|do(z), do(x), w) = P(y|do(x), w) ⟺ (Y ⊥⊥ Z | W, X) in the graph with all arrows pointing into X and into Z(W) removed, where Z(W) is the set of Z-nodes that are not ancestors of any W-node.

Back-door criterion?
P(y|do(x)) = Σ_z P(y|do(x), z)·P(z|do(x))   (chain rule)
           = Σ_z P(y|x, z)·P(z|do(x))       (Rule 2)
           = Σ_z P(y|x, z)·P(z)             (Rule 3)
which is the back-door adjustment formula.
Note that do(x) is shorthand notation for do(X = x), the event "X is manipulated to be equal to x".

[Graph: the ten-node example graph]
D = fD(X1), M = fM(D), Y = fY(M), X1 = f1(X2), X2 exogenous, X3 = f3(X2), X4 = f4(D), X5 = f5(X4, X6), X6 = f6(Y), X7 = f7(X5)

Structural causal models
[Graph: the ten-node example graph with exogenous noise terms U1, ..., U7, UD, UM, UY]
D = fD(X1, UD), M = fM(D, UM), Y = fY(M, UY), X1 = f1(X2, U1), X2 = f2(U2), X3 = f3(X2, U3), X4 = f4(D, U4), X5 = f5(X4, X6, U5), X6 = f6(Y, U6), X7 = f7(X5, U7), with U ∼ P.

Modified Structural Causal Model
[Graphs: X → D → Y with X → Y and noise terms UD, UX, UY; after do(D = d), the arrows into D are removed]
Before the intervention: D = fD(X, UD), X = fX(UX), Y = fY(D, X, UY).
After do(D = d): D = d, X = fX(UX), Y = fY(D, X, UY).
(A simulation sketch of this manipulation appears after Example 16 below.)

Example 7 (unobserved confounders) - Returns to education
[Graph: D, Y, U]
D - education; Y - log wages; U - unobserved ability.
We cannot close the back-door path via U because it is unobserved.

Example 8: Human Capital Model (Becker 1994)
[Graph: Z, X, U, D, Y]
Z - parental education; D - education; Y - log wages; X - family income; U - unobserved background characteristics.
Conditioning on X closes all the back-door paths.

Example 9: Human Capital Model (Becker 1994) - ver. 2
[Graph: Z, X, U, D, Y]
Z - parental education; D - education; Y - log wages; X - family income; U - unobserved background characteristics.
It is not possible to close the back-door path via U, as U is unobserved.

Example 10: Schooling again
[Graph: D, Y, X, U1, U2]
D - education; Y - log wages; X - family income; U1 - unobserved mother's characteristics; U2 - unobserved father's characteristics.
Conditioning on X makes things even worse, as it opens up two new paths.

Example 11: Discrimination
[Graph: G, D, X, U, Y]
G - gender; D - discrimination; X - occupation; Y - log wages; U - unobserved ability.
Conditioning on X closes the mediated path, but it opens up a new path D → X ← U → Y.

Example 12: Covid risk factors
[Graph: X1, X2, D, Y]
X1 - smoking; X2 - frailty; D - Covid hospitalization; Y - death.
Looking at hospitalized patients only (conditioning on D) induces a spurious correlation among different, independent(!) risk factors: smoking (X1) and frailty (X2).
https://www.hdruk.ac.uk/news/we-should-be-cautious-about-associations-of-patient-characteristics-with-covid-19-outcomes-that-are-identified-in-

Example 13: Age adjustment for vaccine effectiveness
[Graph: D, X, Y]
X - age; D - vaccination; Y - severe Covid.
Adjusting for age closes the back-door path.
https://www.covid-datascience.com/post/israeli-data-how-can-efficacy-vs-severe-disease-be-strong-when-60-of-hospitalized-are-vaccinated

Example 14 - many confounders
[Graph: D, Y and controls X1, ..., X9]
X1, X2, ... - controls; D - treatment; Y - outcome.
We can hopefully close all the back-door paths. How plausible is this model?

Example 15 - Gender wage gap decomposition
Y - wage; G - gender; X - education, work experience, occupation, region, ... (in 1998); W - parents' education, foreign born (in 1979).
Figs. 2 and 3 from Huber, Martin. "Causal pitfalls in the decomposition of wage gaps." Journal of Business & Economic Statistics 33.2 (2015): 179-191.

Example 16 - Mitigating measures and Covid-19
I_{i,t} - information; P_{i,t} - adopted policies; W_{i,t} - unobserved confounding factors; B_{i,t} - behavior variables; Y_{i,t+l} - future health outcomes.
Fig. 4 from Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. "Causal impact of masks, policies, behavior on early covid-19 pandemic in the US." Journal of Econometrics 220.1 (2021): 23-62.
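As promised above, the modified-SCM view of do(D = d) translates directly into code: intervening means deleting D's structural equation and setting D to a constant. A minimal simulation sketch with a confounded SCM (X → D, X → Y, D → Y; all functional forms and coefficients are illustrative), comparing the naive observational contrast with the back-door-adjusted one and with the interventional truth:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def scm(do_d=None):
    # Structural equations: X -> D, X -> Y, D -> Y, with exogenous noise U.
    x = rng.binomial(1, 0.5, n)
    d = rng.binomial(1, 0.2 + 0.6 * x) if do_d is None else np.full(n, do_d)
    y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(size=n)  # true effect of D: 2.0
    return x, d, y

# Interventional truth: E[Y|do(D=1)] - E[Y|do(D=0)], approx. 2.0.
print(scm(do_d=1)[2].mean() - scm(do_d=0)[2].mean())

# Observational data: the naive contrast E[Y|D=1] - E[Y|D=0] is confounded.
x, d, y = scm()
print(y[d == 1].mean() - y[d == 0].mean())            # approx. 3.8, biased

# Back-door adjustment with B = {X}: E[Y(d)] = sum_x E[Y|D=d,X=x] P(X=x).
def adjusted(dv):
    return sum(y[(d == dv) & (x == xv)].mean() * (x == xv).mean()
               for xv in (0, 1))

print(adjusted(1) - adjusted(0))                      # approx. 2.0
```

Replacing D's equation with a constant is exactly the "manipulated graph" of the do-operator slides, and the adjusted estimate recovers the interventional contrast from purely observational draws.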
Lessons to take:
- causal structure is important
- beware of colliders
- working with causal models can be useful; it may clarify your thinking
- there are different views on how useful the whole DAG literature is (epidemiology and CS vs. economics)

Further topics
- maybe I cannot manipulate D, but I can manipulate Z (surrogate experiments)
- there are tools for addressing external validity (transportability)
- from the data it is possible to construct a class of admissible DAGs (causal discovery); this is currently an area of active research in CS, and it seems to be slowly leaking into economics

Thank you for your attention!

References
This is an overview written for economists; it is sufficient if you read it up to page 19: Hünermund, Paul, and Elias Bareinboim. "Causal Inference and Data Fusion in Econometrics." arXiv preprint arXiv:1912.09104 (2019).
This is the comprehensive DAG book; no other book matches it in depth of exposition: Pearl, Judea. Causality. Cambridge University Press, 2009.
A book on the other side of the spectrum, short and succinct, very readable: Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
Please read this; it is difficult to find anything better. Appendix A provides a quick intro to DAG calculus: Cinelli, Carlos, Andrew Forney, and Judea Pearl. "A Crash Course in Good and Bad Controls." Available at SSRN 3689437 (2020).
Banack, Hailey R., and Jay S. Kaufman. "The 'obesity paradox' explained." Epidemiology 24.3 (2013): 461-462.
Hernández-Díaz, Sonia, Enrique F. Schisterman, and Miguel A. Hernán. "The birth weight 'paradox' uncovered?" American Journal of Epidemiology 164.11 (2006): 1115-1120.
Griffith, Gareth J., et al. "Collider bias undermines our understanding of COVID-19 disease risk and severity." Nature Communications 11.1 (2020): 1-12.
Blau, Francine D., and Lawrence M. Kahn. "The gender wage gap: Extent, trends, and explanations." Journal of Economic Literature 55.3 (2017): 789-865.
Schneider, Eric B. "Collider bias in economic history research." Explorations in Economic History 78 (2020): 101356.
P. Hunermund's course on DAGs (paid): https://www.udemy.com/course/causal-data-science/
P. Hunermund's lecture on DAGs, a compact version of the above course: https://www.youtube.com/watch?v=GtpnWQ9uTL8, based on Hünermund, Paul, and Elias Bareinboim. "Causal Inference and Data Fusion in Econometrics." arXiv preprint arXiv:1912.09104 (2019).
Excellent exposition of do-calculus here: https://www.andrewheiss.com/blog/2021/09/07/do-calculus-backdoors
Huber, Martin. "Causal pitfalls in the decomposition of wage gaps." Journal of Business & Economic Statistics 33.2 (2015): 179-191.
DAGs in action on a very relevant topic: Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. "Causal impact of masks, policies, behavior on early covid-19 pandemic in the US." Journal of Econometrics 220.1 (2021): 23-62.
An excellent and super clear course on many of the concepts we covered here; it is hard to compete with this one: https://www.bradyneal.com/causal-inference-course
The CausalAI Lab of Elias Bareinboim is on the research frontier of causal inference with machine learning: https://causalai.net