Probabilistic Graphical Models (PGM) PA154 Jazykové modelovaní (8.1) Pavel Rychlý pary@fi.muni.cz April 20, 2021 Source: Probabilistic Graphical Models Daphne Koller http://www.coursera.org/learn/probabilistic-graphical-models PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) Models Models Domain expert Declarative representation ELuilation Algorithm Learning Algorithm Daphne Kbller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 2/22 Graphical models Bayesian networks X\,.. . , Xn - nodes directed graph Markov networks undirected graph Difficulty J I Intelligence f Grade SAT Letter Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 3/22 Textual Information Extraction Mrs. Green spoke today in New York. Green chairs the finance committee. Person Location Person Organization Mrs. Green spoke today in New York Green chairs the finance committee Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 4/22 Graphical models Bayesian networks ■ Grade ■ Course Difficulty ■ Student Intelligence ■ Student SAT ■ Reference Letter P(G,D,I,S,L) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 5/22 Graphical models Difficulty ) C Intelligence Grade^) Letter Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 6/22 Graphical models PA154 Jazykové modelování (8.1) Daphne Koller Probabilistic Graphical Models (PGM) 7/22 Chain Rule for Bayesian Networks P(D) ( Difficulty ) (intelligence) P(l) "3^. P(G|I,D)(^Grade^) C^AT^) p(sľ) Letter^) P(L|G) P(D,I,G,S,L) = P(D)P(I)P(G\I,D)P(S\I)P(L\G) Distribution defined as a product of factors! Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 8/22 Chain Rule for Bayesian Networks d° d1 0.6 0.4 g1(A) g2(B) g3(C) .0.0 I ,d 0.3 0.4 0.3 .0.1 i ,d 0.05 0.25 0.7 .1 .0 i ,d 0.9 0.08 0.02 i ,d 0.5 0.3 0.2 P(D)( Difficulty P(G|I,D) ' .0 .1 i i 0.7 0.3 I Intelligence) P(l) SAT Letter TT P(L|G) 0 1 s s i° 0.95 0.05 i1 0.2 0.8 i° I1 1 g 0.1 0.9 2 g 0.4 0.6 3 g 0.99 0.01 P(S|I) D/.0.1 3 1 ,1, P(d ,i , g ,s , I ) = 0.6*0.3*0.02*0.01*0.8 PA154 Jazykové modelovaní (8.1) Daphne Koller Probabilistic Graphical Models (PGM) 9/22 Bayesian network ■ A Bayesian network is: - A directed acyclic graph ( DAG) G whose nodes represent random variables Xi,..., Xn - For each node X/ a CPD P( Xj\ ParG(Xj)) ■ The BN represents a joint distribution via the chain rule for Bayesian networks P(Xi ...,Xn) = UiP(Xi\ParG(Xi)) Daphne Koller Probabilistic Graphical Models (PGM) 10/22 BN Is a Legal Distribution: P > 0 ■ P is a product of CPDs ■ CPDs are non-negative Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 11/22 BN Is a Legal Distribution: J] P = 1 ľd,/,g,s,l P(D, I, G, S, L) = Ed,,,g,s,l P(D)P(I)P(G\I, D)P(S\I)P(L\G) = ľd,/,6,s P(D)P(I)P(G\I, D)P(S\l)j:Lr(L\G) = Ed,i,g,sP(D)P(I)P(G\I,D)P(S\I) = ED,,,G P(D)P(I)P(G\I, D)Es p(J|0 = ED)/ P(P)P(/)Eg p(g|/,o) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 12/22 P Factorizes over G ■ Let G be a graph over Xi,..., Xn. ■ P factorizes over G if P(X1,...,Xn) = UiP(Xi\ParG{Xi)) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 13/22 Naive B ayes Model n P(C,Xi,...,Xll) = P(C)JIP(X/|C) i=l features X/, Xj independent Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 14/22 Na'íve Bayes Classifier P(C=c1\x1,...,xn) P(C=c1) —A P(C=c2|xi,...,x„) P(C=c2) P(C=r^ 11/= 1 P(x/-|C=c1) P(x,-|C=c2) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 15/22 Bernoulli Na'íve Bayes for Text Document Label cats dogs buy sell Financial 0.001 0.001 0.2 0.3 Pets 0.3 0.4 0.02 0.0001 P("cat" appears | Label) P(C=c1\x1,...,xn) _ P(C=c2|xi,...,xn) P(C=c2) P(C=c1) nn P(x/-|C=c1) P(C=c^ 11/=1 P(xi\C=c2) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 16/22 Multinomial Na'íve Bayes for Text Document Label cats dogs buy sell Financial 0.001 0.001 0.02 0.02 Pets 0.02 0.03 0.003 0.001 P(C=C1|xi,...,Xn) P(C=C1) 1-TA7 P(X/|C=C1) p(c=r*\x,.....*„i p(r=c^ 11/=i prx.ir=r2^ P(C=c2|xi,...,x„) P(C=c2) P(x,-|C=c2) Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 17/22 Summary ■ Simple approach for classification ► Computationally efficient ► Easy to construct ■ Surprisingly effective in domains with many weakly relevant features ■ Strong independence assumptions reduce performance when many features are strongly correlated Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 18/22 Application: Diagnosis Probabilistic Graphical Models A- Representation Bayesian Networks Application: Diagnosis Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 19/22 Medical Diagnosis: Pathfinder (1992) ■ Help pathologist diagnose lymph node pathologies (60 different diseases) ■ Pathfinder I: Rule-based system ■ Pathfinder II used naive Bayes and got superior performance Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 20/22 Medical Diagnosis: Pathfinder (1992) ■ Pathfinder III: Naive Bayes with better knowledge engineering ■ No incorrect zero probabilities ■ Better calibration of conditional probabilities ► P(finding\diseasei) to P(finding\disease2) ► Not P(findingi\disease) to P(finding2\disease) Heckerman et al. Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 21/22 Medical Diagnosis: Pathfinder (1992) ■ Pathfinder IV: Full Bayesian network ► Removed incorrect independencies; ► Additional parents led to more accurate estimation of probabilities ■ BN model agreed with expert panel in 50/53 cases, vs 47/53 for naive Bayes model ■ Accuracy as high as expert that designed the model Heckerman et al. Daphne Koller PA154 Jazykové modelování (8.1) Probabilistic Graphical Models (PGM) 22/22