Taggers
PA154 Language Modeling (7.2)
Pavel Rychlý
pary@fi.muni.cz
Pavel Rychlý · Taggers · 1 / 14

Statistical Tagger
- uses the Viterbi algorithm to find the most probable sequence of tags
- sometimes even greedy search works
- the hard part is estimating the probabilities

TreeTagger
- Helmut Schmid, Stuttgart 1994
- originally developed and evaluated on English, later also German
- disambiguation of proper nouns (named entities) and regular words
- smoothing with equivalence classes: words with the same set of possible tags
- tags are atomic: no attributes or categories
- probabilities: decision trees
- Viterbi algorithm
- https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

TreeTagger - decision tree
- binary tree
- one condition in each inner node
- the answer yes/no leads to the left/right child
- every input (token) reaches a leaf
- construction based on information gain

TreeTagger - decision tree example
- "house" in "The big house" is NN with probability 0.7 and ADJ with probability 0.1

TreeTagger - lexicon
- lexicon of words with their possible tags
- tag probabilities estimated from a training corpus
- words not included in the lexicon:
  1. look up the lowercased form in the lexicon
  2. suffix lexicon
  3. default entry (relative frequencies in the suffix tree)

TreeTagger - suffix lexicon

TreeTagger - results

RFTagger
- Helmut Schmid, Florian Laws, Stuttgart 2008
- non-atomic tags
- https://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/
- example: Das zu versteuernde Einkommen sinkt. (literally "The to be taxed income decreases."; in English: "The taxable income decreases.")

RFTagger - tags
- tag = vector of attributes
- each part of speech has a different vector (set of attributes)
- attribute values are separated by a dot (.)
- the first attribute is the PoS
- tagging: first the PoS, then its attributes

RFTagger - decision tree
- each condition tests one attribute only
- a separate tree for each value of an attribute
- each leaf holds one probability: that of the value

RFTagger - decision tree
- example: nominative case of nouns (N.Nom)

RFTagger - results
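The Statistical Tagger slide names Viterbi decoding as the search method and notes that greedy search sometimes works too. A minimal sketch contrasting the two, assuming a first-order (bigram) model with hand-set toy probabilities: the names `start_p`, `trans_p`, `emit_p` and all numbers are hypothetical, and TreeTagger itself conditions on two preceding tags and takes those probabilities from its decision trees.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a first-order (bigram) model.

    start_p[t]    ~ P(t | sentence start)
    trans_p[s][t] ~ P(t | previous tag s)
    emit_p[t][w]  ~ P(w | t); missing entries fall back to a tiny floor
    Scores are summed log probabilities to avoid underflow.
    """
    def lg(p):
        return math.log(max(p, 1e-12))

    # best[i][t]: best log score of any tag path for words[:i+1] ending in t
    best = [{t: lg(start_p.get(t, 0.0)) + lg(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]  # back[i][t]: predecessor of t on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lg(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lg(trans_p[prev].get(t, 0.0))
                          + lg(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def greedy(words, tags, start_p, trans_p, emit_p):
    """Pick the locally best tag at each position, never revisiting a choice."""
    path, prev = [], None
    for w in words:
        scores = {}
        for t in tags:
            p_trans = start_p.get(t, 0.0) if prev is None else trans_p[prev].get(t, 0.0)
            scores[t] = p_trans * emit_p[t].get(w, 0.0)
        prev = max(scores, key=scores.get)
        path.append(prev)
    return path


# Toy model: all numbers are made up for illustration.
tags = ["DET", "ADJ", "NN"]
start_p = {"DET": 0.8, "ADJ": 0.1, "NN": 0.1}
trans_p = {
    "DET": {"DET": 0.01, "ADJ": 0.4, "NN": 0.59},
    "ADJ": {"DET": 0.01, "ADJ": 0.2, "NN": 0.79},
    "NN": {"DET": 0.3, "ADJ": 0.2, "NN": 0.5},
}
emit_p = {
    "DET": {"the": 0.6},
    "ADJ": {"big": 0.3, "house": 0.01},
    "NN": {"house": 0.05, "big": 0.001},
}
words = ["the", "big", "house"]
print(viterbi(words, tags, start_p, trans_p, emit_p))  # ['DET', 'ADJ', 'NN']
print(greedy(words, tags, start_p, trans_p, emit_p))   # ['DET', 'ADJ', 'NN']
```

Both agree on this toy input. Greedy search can fail when a locally attractive tag makes the following transitions unlikely; Viterbi avoids that by keeping the best path into every tag at every position.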
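The TreeTagger decision-tree slides describe a binary tree with one yes/no condition per inner node, leaves holding tag probabilities, and construction guided by information gain. A minimal sketch of both parts, assuming the tests inspect the two preceding tags; the class `Node`, the helpers `lookup`, `entropy`, `information_gain`, the tests in the tree, and every leaf except the one carrying the slide's 0.7/0.1 values are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union

Leaf = Dict[str, float]  # tag -> probability


@dataclass
class Node:
    test: Callable[[Sequence[str]], bool]  # one condition on the preceding tags
    yes: Union["Node", Leaf]               # left child
    no: Union["Node", Leaf]                # right child


def lookup(tree, context):
    """Descend from the root until a leaf is reached: yes -> left, no -> right."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(context) else tree.no
    return tree


def entropy(labels):
    """Entropy (in bits) of the tag distribution in `labels`."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, yes, no):
    """Entropy reduction achieved by splitting `parent` into `yes` and `no`."""
    n = len(parent)
    return entropy(parent) - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)


# Hypothetical tree; context = (tag two positions back, previous tag).
tree = Node(
    test=lambda ctx: ctx[-1] == "ADJ",
    yes=Node(
        test=lambda ctx: ctx[-2] == "DET",
        yes={"NN": 0.7, "ADJ": 0.1},  # the two values from the slide; other tags omitted
        no={"NN": 0.5, "ADJ": 0.2},   # made-up placeholder
    ),
    no={"NN": 0.3, "VB": 0.3},        # made-up placeholder
)

# "house" in "The big house": the preceding tags are DET, ADJ.
print(lookup(tree, ("DET", "ADJ")))  # {'NN': 0.7, 'ADJ': 0.1}
print(information_gain(["NN", "NN", "ADJ", "ADJ"], ["NN", "NN"], ["ADJ", "ADJ"]))  # 1.0
```

During construction, the candidate condition with the highest information gain would be chosen for each node; a split that separates the tags perfectly, as in the last line, gains the full 1 bit.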
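The RFTagger slides treat a tag as a dot-separated vector of attributes (PoS first) and predict the PoS before its attributes, with one tree per attribute value. A minimal sketch of how such per-value probabilities could combine into a tag probability by the chain rule, assuming each value is conditioned on the context and on the values already predicted. The tag `N.Reg.Nom.Sg.Neut` is only illustrative of the format, and `ValueModel`, `split_tag`, `tag_probability` and the `TOY` table (a stand-in for trained trees, with made-up numbers) are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical interface for the per-value decision trees:
# (values already predicted, candidate value, context) -> probability of that value
ValueModel = Callable[[Sequence[str], str, Sequence[str]], float]


def split_tag(tag: str) -> List[str]:
    """Split a dot-separated tag into its attribute values; the first is the PoS."""
    return tag.split(".")


def tag_probability(tag: str, context: Sequence[str], value_prob: ValueModel) -> float:
    """Chain rule over the attributes: PoS first, then each further attribute
    conditioned on the context and on the values predicted before it."""
    values = split_tag(tag)
    p = 1.0
    for i, value in enumerate(values):
        p *= value_prob(values[:i], value, context)
    return p


# Toy stand-in for the trees: made-up probabilities that ignore the context.
TOY = {
    ((), "N"): 0.6,
    (("N",), "Reg"): 0.9,
    (("N", "Reg"), "Nom"): 0.5,  # cf. the N.Nom tree on the slide
    (("N", "Reg", "Nom"), "Sg"): 0.8,
    (("N", "Reg", "Nom", "Sg"), "Neut"): 0.4,
}


def toy_value_prob(prefix: Sequence[str], value: str, context: Sequence[str]) -> float:
    return TOY.get((tuple(prefix), value), 0.0)


print(split_tag("N.Reg.Nom.Sg.Neut"))  # ['N', 'Reg', 'Nom', 'Sg', 'Neut']
print(round(tag_probability("N.Reg.Nom.Sg.Neut", [], toy_value_prob), 4))  # 0.0864
```

Splitting the tag this way lets each value's model be trained on every tag that shares that value, rather than treating each full tag as an unrelated atomic symbol as TreeTagger does.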