Markov Models
PA154 Jazykové modelování (5.1)
Pavel Rychlý, pary@fi.muni.cz
March 16, 2020
Source: Introduction to Natural Language Processing (600.465), Jan Hajic, CS Dept., Johns Hopkins Univ., www.cs.jhu.edu/~hajic

Review: Markov Process
■ Bayes formula (chain rule):
  P(W) = P(w_1, w_2, ..., w_T) = ∏_{i=1..T} p(w_i | w_1, w_2, ..., w_{i-1})
■ n-gram language models:
  ► Markov process (chain) of order n-1:
    P(W) = P(w_1, w_2, ..., w_T) = ∏_{i=1..T} p(w_i | w_{i-n+1}, w_{i-n+2}, ..., w_{i-1})
  ► Using just one distribution (Ex.: trigram model: p(w_i | w_{i-2}, w_{i-1})):
    Positions: 1  2   3     4    5 6   7      8     9   10 11  12    13   14 15  16
    Words:     My car broke down , and within hours Bob 's car broke down ,  too .
    p(, | broke down) = p(w_5 | w_3, w_4) = p(w_14 | w_12, w_13)
    (a runnable sketch of such a trigram model appears at the end of this section)

Markov Properties
■ Generalize to any process (not just words/LM):
  ► Sequence of random variables: X = (X_1, X_2, ..., X_T)
  ► Sample space S (states), size N: S = {s_0, s_1, s_2, ..., s_N}
1. Limited History (Context, Horizon):
   ∀i ∈ 1..T: P(X_i | X_1, ..., X_{i-1}) = P(X_i | X_{i-1})
   [Figure: digit sequence 1 7 3 7 9 0 6 7 3 4 5 ..., predicting each digit from only the one immediately preceding it]
2. Time invariance (M.C. is stationary, homogeneous):
   ∀i ∈ 1..T, ∀y, x ∈ S: P(X_i = y | X_{i-1} = x) = p(y | x)
   ... ok, same distribution throughout.

Long History Possible
■ What if we want trigrams?
  [Figure: the same digit sequence 1 7 3 7 9 0 6 7 3 4 5 ..., now with the two preceding digits as history]
■ Formally, use a transformation: define new variables Q_i such that X_i = (Q_{i-1}, Q_i); then
  P(X_i | X_{i-1}) = P(Q_{i-1}, Q_i | Q_{i-2}, Q_{i-1})
  (see the pair-state sketch at the end of this section)

Graph Representation
■ Split the states, mark them by output symbols, get rid of output distributions:
  p(toe) = .48 × .616 × .6 + .2 × 1 × .176 + .2 × 1 × .12 ≅ .237
  (this three-path sum is checked in a snippet at the end of this section)
■ In the future, we will use the view more convenient for the problem at hand.

Formalization
■ HMM (the most general case):
  ► a five-tuple (S, s_0, Y, P_S, P_Y), where:
    ► S = {s_0, s_1, s_2, ..., s_T} is the set of states, s_0 is the initial state,
    ► Y = {y_1, y_2, ..., y_V} is the output alphabet,
    ► P_S(s_j | s_i) is the set of probability distributions of transitions; size of P_S: |S|^2,
    ► P_Y(y_k | s_i, s_j) is the set of output (emission) probability distributions; size of P_Y: |S|^2 × |Y|.
■ Example:
  ► S = {x, 1, 2, 3, 4}, s_0 = x
  ► Y = {t, o, e}

Using the HMM
■ The generation algorithm (of limited value :-)):
  1. Start in s = s_0.
  2. Move from s to s' with probability P_S(s' | s).
  3. Output (emit) symbol y_k with probability P_Y(y_k | s, s').
  4. Repeat from step 2 (until somebody says enough).
  (a runnable sketch of this generator appears at the end of this section)
■ More interesting usage:
  ► Given an output sequence Y = {y_1, y_2, ..., y_k}, compute its probability (see the forward-algorithm sketch below).
  ► Given an output sequence Y = {y_1, y_2, ..., y_k}, compute the most likely sequence of states which has generated it (see the Viterbi sketch below).
  ► ... plus variations: e.g., n best state sequences.

Formalization - Example
■ Example (for graph, see foils 11, 12):
  ► S = {x, 1, 2, 3, 4}, s_0 = x
  ► Y = {e, o, t}
  ► P_S: [transition table not recoverable from the source; each row sums to Σ = 1]
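
Sketch: trigram model
To make the trigram example concrete, here is a minimal sketch (mine, not the lecture's) of a maximum-likelihood trigram model; the corpus, the `<s>` padding symbols, and the function names are my own illustration.

```python
from collections import defaultdict

def train_trigram(corpus):
    """MLE estimate: p(w | u, v) = count(u, v, w) / count(u, v)."""
    tri, bi = defaultdict(int), defaultdict(int)
    for sent in corpus:
        words = ["<s>", "<s>"] + sent          # pad so every word has two predecessors
        for i in range(2, len(words)):
            tri[(words[i-2], words[i-1], words[i])] += 1
            bi[(words[i-2], words[i-1])] += 1
    def p(w, u, v):
        return tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    return p

# The example sentence from the slide, one token per position:
sent = "My car broke down , and within hours Bob 's car broke down , too .".split()
p = train_trigram([sent])
# Positions 5 and 14 share one distribution: p(, | broke down)
print(p(",", "broke", "down"))   # 1.0 in this tiny corpus
```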
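
Sketch: pair-state transformation
The "Long History Possible" transformation fits in a few lines. The sketch below (my construction, not from the slides) turns a second-order chain over Q into a first-order chain over pair states X = (Q_{i-1}, Q_i); a pair state can follow another only when their histories overlap, all other transitions have probability 0.

```python
from itertools import product

def pair_state_chain(states, p2):
    """Given a second-order chain p2(c, a, b) = P(Q_i=c | Q_{i-2}=a, Q_{i-1}=b),
    return first-order transitions P(X'=(b, c) | X=(a, b)) over pair states."""
    trans = {}
    for a, b, c in product(states, repeat=3):
        trans[((a, b), (b, c))] = p2(c, a, b)   # non-overlapping pairs stay at prob. 0
    return trans

# Toy usage: a uniform second-order chain over two states
trans = pair_state_chain(["0", "1"], lambda c, a, b: 0.5)
print(trans[(("0", "1"), ("1", "0"))])   # 0.5
```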
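
Sketch: the p(toe) sum
The p(toe) figure on the graph-representation slide is just a sum over the three state paths that can emit "t o e"; with the path probabilities taken from the slide, the arithmetic checks out:

```python
# Three paths through the graph that emit "toe", summed:
p_toe = .48 * .616 * .6 + .2 * 1 * .176 + .2 * 1 * .12
print(round(p_toe, 3))   # 0.237
```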
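
Sketch: the generation algorithm
The four-step generator translates almost line for line into code. A minimal sketch, assuming arc-emission tables as on the slides (P_Y conditioned on the pair (s, s')); the dict-of-dicts table shapes and the toy model are my own choices, since the slide's actual tables are not recoverable here.

```python
import random

def generate(s0, P_S, P_Y, n_symbols):
    """Arc-emission HMM generator.
    P_S[s] maps successor state -> probability; P_Y[(s, s2)] maps symbol -> probability."""
    s, out = s0, []                                    # 1. start in s = s_0
    for _ in range(n_symbols):                         # 4. repeat until told to stop
        succs = list(P_S[s])
        s2 = random.choices(succs, weights=[P_S[s][t] for t in succs])[0]        # 2. move
        syms = list(P_Y[(s, s2)])
        out.append(random.choices(syms, weights=[P_Y[(s, s2)][y] for y in syms])[0])  # 3. emit
        s = s2
    return out

# Hypothetical two-state model for demonstration:
P_S = {"x": {"1": 1.0}, "1": {"x": 1.0}}
P_Y = {("x", "1"): {"t": 0.5, "o": 0.5}, ("1", "x"): {"e": 1.0}}
print("".join(generate("x", P_S, P_Y, 6)))   # e.g. "teoete"
```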
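
Sketch: output-sequence probability (forward algorithm)
The first "more interesting usage", computing the probability of an output sequence, is the forward algorithm: sum over all state paths, computed left to right by dynamic programming. Same assumed table format as the generator sketch; the one-state toy model is again hypothetical.

```python
def forward(y_seq, s0, P_S, P_Y):
    """P(y_seq) summed over all state paths.
    alpha[s] = probability of emitting the prefix so far and ending in state s."""
    alpha = {s0: 1.0}
    for sym in y_seq:
        nxt = {}
        for s, a in alpha.items():
            for s2, pt in P_S.get(s, {}).items():
                pe = P_Y.get((s, s2), {}).get(sym, 0.0)
                if a * pt * pe > 0.0:
                    nxt[s2] = nxt.get(s2, 0.0) + a * pt * pe
        alpha = nxt
    return sum(alpha.values())

# Hypothetical one-state loop; with the slide's (lost) tables this
# computation would reproduce p(toe) ≅ .237.
P_S = {"x": {"x": 1.0}}
P_Y = {("x", "x"): {"t": 0.4, "o": 0.3, "e": 0.3}}
print(round(forward("toe", "x", P_S, P_Y), 3))   # 0.4 * 0.3 * 0.3 = 0.036
```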
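
Sketch: most likely state sequence (Viterbi)
The second usage, finding the most likely state sequence, is the Viterbi algorithm: replace the forward sum with a max and remember the best path. Same assumed table format; a sketch, not the lecture's code.

```python
def viterbi(y_seq, s0, P_S, P_Y):
    """Best state path for y_seq: like forward(), but max instead of sum."""
    best = {s0: (1.0, [s0])}                 # state -> (path probability, path so far)
    for sym in y_seq:
        nxt = {}
        for s, (prob, path) in best.items():
            for s2, pt in P_S.get(s, {}).items():
                pe = P_Y.get((s, s2), {}).get(sym, 0.0)
                cand = prob * pt * pe
                if cand > nxt.get(s2, (0.0, None))[0]:
                    nxt[s2] = (cand, path + [s2])
        best = nxt
    return max(best.values(), key=lambda t: t[0]) if best else (0.0, [])

# Same hypothetical one-state toy model as above:
P_S = {"x": {"x": 1.0}}
P_Y = {("x", "x"): {"t": 0.4, "o": 0.3, "e": 0.3}}
print(viterbi("toe", "x", P_S, P_Y))   # (~0.036, ['x', 'x', 'x', 'x'])
```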