MUNI
FI
HMM Algorithms: Trellis and Viterbi
PA154 Language Modeling (4.2)
Pavel Rychlý
pary@fi.muni.cz March 9,2023
HMM: The Two Tasks
HMM (the general case):
■ five-tuple (S, S0. Y, Ps>Py)> where:
■ S = {s1,s2,... :sT} is the set of states, SQ is the initiaL,
■ Y = {yi,y2, ■ ■ ■ ,yv} is the output aLphabet,
■ Ps(sj\si) is the set of prob. distributions of transitions,
■ Py(yk\si:Sj) is the set of output (emission) probability distributions.
Given an HMM & an output sequence V = {yi,yi, ■ ■ ■ ,yi<} (Task 1) compute the probability of Y;
(Task 2) compute the most likely sequence of states which has generated Y.
Source: Introduction to NaturaL Language Processing (600.465) Jan Hajic.CS Dept.Jonns Hopkins Univ. www.cs.jnu.edu/~najic
~aveL RycnLy • HMM Algorithms: Trellis and Viterbi • Marc
Trellis - Deterministic Output
HMM:
Trellis: time/position f
0 12 3
p(toe)=x.6 x .88 x + x.4 x .1 x 1 = .568
Y'     t       o e -trellis state: (HMM state, position) '
-each state: holds one number (prob):a <*(x,o) =1^,1) = .60(0,2) = - probability orY: la in the last state ^(c 1) = 4
Creating the Trellis: The Start
Start in the start state (x),
■ its q(x,0) to 1. Create the first stage:
■ get the first "output" symbol y\
■ create the first stage (column)
■ but only those trellis states
which generate yi
■ set their a(state,l) to the Ps(state\ x'
...and forget about the 0-th stage
position/stage 0 1
q(x,0)
y-:t
-'aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
■'aveL RychLy • HMM Algorithms: Trellis and Viterbi • Marc
Trellis: The Next Step
Trellis: The Last Step
Suppose we are in stage /, Creating the next stage:
■ create all trellis state in the next stage which generate y,+1, but only those reachable from any of the stage-/ states
■ set their a(state, i + 1) to:
Ps{state\ prev.state) xa(prev.state, i) (add up all such numbers on arcs going to a common trellis state)
■ ...and forget about stage /
position/stage i=l
Continue until "output" exhausted
■ |/| = 3: until stage 3
Add together all the a(state,\ Y\) That's the P(Y). Observation (pleasant):
■ memory usage max: 2|S|
■ multiplications max: |S|2|y|
last position/stage
P(Y)-568
3aveL RycnLy • HMM Algorithms: Trellis and Viterbi • Marcn 9, 2023
5/22
3aveL RycnLy • HMM Algorithms: Trellis and Viterbi • Marcn 9, 2023
6/22
Trellis: The General Case (still, bigrams)
General Trellis: The Next Step
Start as usual:
■ start state (x), set its q(x,0) to 1.
p(toe) = .48 x .616 x .6 + .2 x 1 x .176 +
.2 x 1 x .12 .237
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
We are in stage /:
■ Generate the next stage i+1 as before (except now arcs generate output, thus use only those arcs marked by the output symbol y,+1)
■ For each generated state compute a(state,i + 1) -
:sPy(yi+i \state,prev.state)
...and forget about stage / as usual
^aveL RychLy . HMM
March 9,2023
Trellis: The Complete Example
Stage:
9/22
The Case of Trigrams
■ Like before, but:
■ states correspond to bigrams,
■ output function always emits the second output symbol of the pair
(state) to which the arc goes:
Multiple paths not possible -> trellis not really needed
^aveL RychLy . HMM Algorithms: Trellis and Viterbi . March 9,2023 10/22
Trigrams with Classes
■ More interesting:
■ n-gram class LM:p(wi\wi_2,        = p(rV/|c/)p(c,-|c;
-> states are pairs of classes {cj_^ o],and emit "words' fo,e,y
^letters in our example)
P(t|Q -p(o|V) = p(e|V) --p(y|V) =
„(roe) = .6 p(reo) = .6 p(roy) = p(rr/) = .6
usual, non-overlapping classes
.88 x .3 x .07 x .6 £S .00665 .88 x .6 x .07 x .3 ~ .00332 : .88 x .3 x .07 x .1 ~ .00111 .12 x 1 x 1 x .1 ~ .0072
Class Trigrams: the Trellis
■ Trellis generation (Y = "toy"): p(t|C) = 1 p(o|V) = .3 p(e|V) = .6 p(y|V) = .1
again, treLLis usefuL but not really needed
3aveL RychLy • HMM Algorithms: TreLLis and Viterbi • March 9, 2023
11/22
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
12/22
Overlapping Classes
Imagine that classes may overlap
■ e.g. Y is sometimes vowel sometimes consonant, belongs to V as well as C:
o,e,y,r o,e,y,r
p(t|C) = .3 p(r|C) = .7 p(o|V) = .1 p(e|V) = .3 p(y|V) = .4 p(r|V) = .2
p(try) = ?
Overlapping Classes: Trellis Example
p(t|C) = .3 p(r|C) = .7 p(o|V) = .1 p(e|V) = .3 P(V|V) = .4 p(r|V) = .2
y p(Y)=.006955
o,e,y,r o,e,y,r
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
Trellis: Remarks
The Viterbi Algorithm
So far, we went left to right (computing a) Same result: going right to left (computing fS)
■ supposed we know where to start (finite data)
In fact, we might start in the middle going left and right Important for parameter estimation (Forward-Backward Algortihm alias Baum-Welch) Implementation issues:
■ scaling/normalizing probabilities, to avoid too small numbers & addition problems with many transitions
■ Solving the task of finding the most likely sequence of states which generated the observed data
■ i.e.,finding
Sbest - argmaxsP(S\Y) which is equal to (Y is constant and thus P(Y) is fixed): Sbest ~ argmaxsP(S,Y) -= argmaxsP(so,s1,S2, ■ ■ ■,sk,y1,y2, ■ ■ ■,») = = argmaxsY\j=i k P(yi|s;,s;_i)P(s;|s,-_i)
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
The Crucial Observation
Viterbi Example
Imagine the trellis build as before (but do not compute the as yet; assume they are o.k.); stage /:
stage stage
NB: remember previous state 5     from which we got the maximum
q = .max{3, .32) = .32
©
this is certainLythe "backwards" maximum to (D,2)...but
it cannot change even whenever we go forward (M. Property: Limited History)
Y classification (C or V?, sequence?): fo,e,y,r
o,e,y,r o,e,y,r
P(t|Q = .3 p(r|C) = .7 p(o|V) = .1 p(e|V) = .3 p(y|V) = .4 p(r|V) = .2
argmaxxvz p(rry|XYZ) = ?
Possible state seq.: (x, V)(V, C)(C, V)[VCV], (x,C)(C,C)(C,V)[CQ/], (x, C)(C, V)(V, V)[CW]
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
17/22
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
18/22
Tracking Back the n-best paths
Backtracking-style algorithm:
■ Start at the end, in the best of the n states (s6est)
■ Put the other n-1 best nodes/back pointer pairs on stack, except those leading from sbest to the same best-back state.
Follow the back "beam" towards the start of the data, spitting out nodes on the way (backwards of course) using always only the best back pointer.
At every beam split, push the diverging node/back pointer pairs onto
the stack (node/beam width is sufficient!).
When you reach the start of data, close the path, and pop the
topmost node/back pointer(width) pair from the stack.
Repeat until the stack is empty; expand the result tree if necessary.
Pruning
Sometimes, too many trellis states in a stage:
a = .002
criteria: (a) a < threshold (b) Zir < threshold (c) # of states > threshold (get rid of smallest a)
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
21/22
3aveL RychLy • HMM Algorithms: Trellis and Viterbi • March 9, 2023
22/22