HMM Tagging
PA154 Language Modeling (5.2)
Pavel Rychly, pary@fi.muni.cz
March 16, 2023
Source: Introduction to Natural Language Processing (600.465), Jan Hajic, CS Dept., Johns Hopkins Univ., www.cs.jhu.edu/~hajic

Review
■ Recall:
  ■ tagging ~ morphological disambiguation
  ■ tagset $V_T \subset (C_1, C_2, \ldots, C_n)$
    ■ $C_i$ - morphological categories, such as POS, NUMBER, CASE, PERSON, TENSE, GENDER, ...
  ■ mapping $w \rightarrow \{t \in V_T\}$ exists
    ■ restriction of Morphological Analysis: $A^{+} \rightarrow 2^{(L, C_1, C_2, \ldots, C_n)}$, where $A$ is the language alphabet and $L$ is the set of lemmas
  ■ extension to punctuation, sentence boundaries (treated as words)

The Setting
■ Noisy Channel setting:
  Input (tags): NNP VBZ DT ... → The channel (adds "noise") → Output (words): John drinks the ...
■ Goal (as usual): discover the "input" to the channel ($T$, the tag sequence) given the "output" ($W$, the word sequence)
  ■ $p(T \mid W) = p(W \mid T)\, p(T) / p(W)$
  ■ $p(W)$ is fixed ($W$ is given), so $\arg\max_T p(T \mid W) = \arg\max_T p(W \mid T)\, p(T)$

The Model
■ Two models ($d = |W| = |T|$, the word sequence length):
  ■ $p(W \mid T) = \prod_{i=1}^{d} p(w_i \mid w_1, \ldots, w_{i-1}, t_1, \ldots,$