Lecture 4: Dependency Syntax and Parsing
Aleš Horák, Miloš Jakubíček, Vojtěch Kovář
(based on slides by Juyeon Kang)
ia161@nlp.fi.muni.cz
Autumn 2013

Outline
1 Motivation
2 Dependency Syntax
3 Dependency Parsing

Motivation
- what you have seen so far: applying the analysis of formal languages to a natural language – creating a phrase-structure derivation tree according to some grammar
- phrase structure (PS) accounts for one important syntactic property: constituency
- is that all? what about discontinuous phrases and structure sharing?

Motivation
- another crucial syntactic phenomenon is dependency
- what is a dependency? "some relation between two words"
- what is the difference to phrase structure? what does constituency express? what does dependency express?

Dependency Syntax (Mel'čuk 1988)
A more formal account – what is a dependency? A relation!

Dependency Relation (definition)
Let W be the set of all words within a sentence; then the dependency relation → is D ⊆ W × W such that:
- D is anti-reflexive: a → b ⇒ a ≠ b
- D is anti-symmetric: a → b ∧ b → a ⇒ a = b (together with anti-reflexivity, equivalent to: a → b ⇒ b ↛ a)
- D is anti-transitive: a → b ∧ b → c ⇒ a ↛ c
- optionally, D is labeled: there is a mapping l : D → L, L being the set of labels

Dependency Representation
- a → b: a depends on b, a is a dependent of b, b is the head of a
- a dependency graph
- a dependency tree

Dependency Tree vs. PS Tree
[Figure: a dependency tree and a phrase-structure tree (S → NP VP; A N V ADV) for the sentence "Green ideas sleep furiously"]

Non-projectivity
- a property of a dependency tree: a sentence is non-projective whenever drawing (projecting) a line from a node to the surface of the tree crosses an arc
- a lot of attention has been paid to this problem
- practical implications are rather limited (in most cases non-projectivity can be easily handled or avoided)
- hard cases exist: [Figure: a non-projective Czech example with the words "koupil", "malou", "chaloupku"] (a projectivity test is sketched at the end of this section)

Czech Tradition of Dependency Syntax
- a long tradition of dependency syntax in the Prague linguistic school (Sgall, Hajičová, Panevová)
- Institute of Formal and Applied Linguistics at Charles University
- formalized as the Functional Generative Description (FGD) of language
- Prague Dependency Treebank (PDT)

Dependencies vs. PS
Is one of the formalisms clearly better than the other one? No.
- dependencies: + account for relational phenomena, + simple
- phrase structure: + accounts for constituency, + easy chunking
Can we transform one formalism into the other and vice versa? Technically yes, but...
- it is not a problem to convert the structure between a dependency tree and a PS tree...
- ...but it is a problem to transfer the information included
⇒ the two formalisms are convertible but not mutually equivalent
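Before turning to parsing, a minimal sketch may make the notion of (non-)projectivity above concrete. The head-array encoding and all names below are our illustration, not part of the lecture:

# Projectivity test: a tree is projective iff no two dependency arcs cross.
# heads[i] is the head of word i+1; words are numbered 1..n, 0 is the
# artificial root (an assumed encoding, chosen for illustration).

def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (i, j) in arcs:
        for (k, l) in arcs:
            # two arcs cross iff exactly one endpoint of one arc lies
            # strictly inside the span of the other
            if i < k < j < l:
                return False
    return True

print(is_projective([2, 0, 2]))      # True: w2 is the root and heads w1, w3
print(is_projective([3, 4, 0, 3]))   # False: arcs (1,3) and (2,4) cross

For hard cases like the Czech example above, a parser either produces such crossing arcs directly or first projectivizes the tree, as discussed later in this lecture.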
Dependency Parsing
- rule-based vs. statistical
- transition-based (→ deterministic parsing)
- graph-based (→ spanning tree algorithms)
- various other approaches (ILP, PS conversion, ...)
- very recent advances (vs. long-studied PS parsing algorithms)

Introduction to Dependency Parsing
Motivation
a. dependency-based syntactic representations seem to be useful in many applications of language technology, such as machine translation and information extraction
→ transparent encoding of predicate-argument structure
b. dependency grammar is better suited than phrase-structure grammar for languages with free or flexible word order
→ analysis of diverse languages within a common framework
c. leading to the development of accurate syntactic parsers for a number of languages
→ combination with machine learning from syntactically annotated corpora (e.g. treebanks)

Introduction to Dependency Parsing
Dependency parsing: "the task of automatically analyzing the dependency structure of a given input sentence"
Dependency parser: "the task of producing a labeled dependency structure of the kind depicted in the following figure, where the words of the sentence are connected by typed dependency relations"
[Figure: labeled dependency tree for "ROOT Economic news had little effect on financial markets ." with arc labels PRED, SBJ, OBJ, ATT, PC, PU]

Definitions of Dependency Graphs and Dependency Parsing
Dependency graphs: syntactic structures over sentences
Def. 1: A sentence is a sequence of tokens denoted by S = w0 w1 ... wn.
Def. 2: Let R = {r1, ..., rm} be a finite set of possible dependency relation types that can hold between any two words in a sentence. A relation type r ∈ R is additionally called an arc label.

Definitions of Dependency Graphs and Dependency Parsing
Def. 3: A dependency graph G = (V, A) is a labeled directed graph consisting of nodes V and arcs A, such that for the sentence S = w0 w1 ... wn and the label set R the following holds:
1 V ⊆ {w0, w1, ..., wn}
2 A ⊆ V × R × V
3 if (wi, r, wj) ∈ A then (wi, r′, wj) ∉ A for all r′ ≠ r
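A minimal sketch of Defs. 1-3 in code may help; the index-based encoding, the function name, and the arc reading of the figure are our own assumptions, purely for illustration:

# Def. 3 as a check: nodes are token indices 0..n (w0 = artificial root),
# arcs are (head, label, dependent) triples. Encoding is illustrative.

def is_dependency_graph(n, labels, arcs):
    for (i, r, j) in arcs:
        if not (0 <= i <= n and 0 <= j <= n):   # conditions 1-2: A ⊆ V × R × V
            return False
        if r not in labels:                     # arc labels must come from R
            return False
    # condition 3: at most one labeled arc per ordered pair of nodes
    pairs = [(i, j) for (i, r, j) in arcs]
    return len(pairs) == len(set(pairs))

# one plausible reading of the figure's tree for
# "ROOT(0) Economic(1) news(2) had(3) little(4) effect(5) on(6)
#  financial(7) markets(8) .(9)":
R = {"PRED", "SBJ", "OBJ", "ATT", "PC", "PU"}
A = {(0, "PRED", 3), (3, "SBJ", 2), (2, "ATT", 1),
     (3, "OBJ", 5), (5, "ATT", 4), (5, "ATT", 6),
     (6, "PC", 8), (8, "ATT", 7), (3, "PU", 9)}
print(is_dependency_graph(9, R, A))   # True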
Approaches to Dependency Parsing
a. data-driven: makes essential use of machine learning from linguistic data in order to parse new sentences
b. grammar-based: relies on a formal grammar, defining a formal language, so that it makes sense to ask whether a given input is in the language defined by the grammar or not
→ data-driven approaches have attracted the most attention in recent years

Data-driven Approach
Approaches differ in the type of parsing model adopted, the algorithms used to learn the model from data, and the algorithms used to parse new sentences with the model.
a. transition-based: start by defining a transition system, or state machine, for mapping a sentence to its dependency graph
b. graph-based: start by defining a space of candidate dependency graphs for a sentence

Data-driven Approach
a. transition-based
- learning problem: induce a model for predicting the next state transition, given the transition history
- parsing problem: construct the optimal transition sequence for the input sentence, given the induced model
b. graph-based
- learning problem: induce a model for assigning scores to the candidate dependency graphs for a sentence
- parsing problem: find the highest-scoring dependency graph for the input sentence, given the induced model

Transition-based Parsing
A transition system consists of a set C of parser configurations and a set D of transitions between configurations.
Main idea: a sequence of valid transitions, starting in the initial configuration for a given sentence and ending in one of several terminal configurations, defines a valid dependency tree for the input sentence:
D1,m = d1(c1), ..., dm(cm)

Transition-based Parsing
Definition: the score of D1,m factors by configuration-transition pairs (ci, di):
s(D1,m) = ∑_{i=1}^{m} s(ci, di)
Learning: scoring function s(ci, di) for di(ci) ∈ D1,m
Inference: search for the highest-scoring sequence D*1,m given s(ci, di)

Transition-based Parsing
Inference for transition-based parsing – common strategies:
- deterministic [Yamada and Matsumoto 2003, Nivre et al. 2004]
- beam search [Johansson and Nugues 2006, Titov and Henderson 2007]
Complexity is given by an upper bound on the length of the transition sequence:
- projective: O(n) [Yamada and Matsumoto 2003, Nivre 2003]
- limited non-projective: O(n) [Attardi 2006, Nivre 2007]
- unrestricted non-projective: O(n²) [Nivre 2008, Nivre 2009]

Transition-based Parsing – Nivre algorithm
[Figure: the Nivre algorithm; see the sketch after this section]

Transition-based Parsing
Learning for transition-based parsing – typical scoring function:
s(ci, di) = w · f(ci, di)
where f(ci, di) is a feature vector over configuration ci and transition di, and w is a weight vector [wi = weight of feature fi(ci, di)]
Problem: learning is local, but features are based on the global history.

Transition-based Parsing
[Figure: projectivization to pseudo-projectivity]
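To make the configuration/transition terminology concrete, here is a greedy parser sketch using one common formulation of the arc-standard transition system (stack-top pair variant). The scoring stub stands in for a learned s(c, d); the encoding and names are our illustration, not Nivre's reference implementation:

# Arc-standard transition system with greedy deterministic inference.
# A configuration is (stack, buffer, arcs); words are indices 1..n,
# 0 is the artificial root.

def parse(n, score):
    stack, buffer, arcs = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        legal = []
        if buffer:
            legal.append("SHIFT")
        if len(stack) > 1:
            legal.append("RIGHT-ARC")          # top becomes dependent of second-top
            if stack[-2] != 0:
                legal.append("LEFT-ARC")       # second-top becomes dependent of top
        # greedy choice of the highest-scoring legal transition;
        # score plays the role of the learned s(c, d) = w . f(c, d)
        d = max(legal, key=lambda t: score(stack, buffer, arcs, t))
        if d == "SHIFT":
            stack.append(buffer.pop(0))
        elif d == "LEFT-ARC":
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))      # (head, dependent)
        else:                                  # RIGHT-ARC
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# even a random scoring stub yields a well-formed projective tree,
# always in exactly 2n transitions, hence the O(n) bound above:
import random
print(parse(4, lambda s, b, a, t: random.random()))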
Graph-based Parsing
For an input sentence S we define a graph GS = (VS, AS) where
VS = {w0, w1, ..., wn} and AS = {(wi, wj, l) | wi, wj ∈ VS and l ∈ L}.
The score of a dependency tree T factors by subgraphs G1, ..., Gm:
s(T) = ∑_{i=1}^{m} s(Gi)
Learning: scoring function s(Gi) for a subgraph Gi ∈ T
Inference: search for the maximum spanning tree T* of GS given s(Gi)

Graph-based Parsing
Learning graph-based models – typical scoring function:
s(Gi) = w · f(Gi)
where f(Gi) is a high-dimensional feature vector over subgraphs and w is a weight vector [wj = weight of feature fj(Gi)]
Structured learning [McDonald et al. 2005a, Smith and Johnson 2007]: learn weights that maximize the score of the correct dependency tree for every sentence in the training set.
Problem: learning is global (trees), but features are local (subgraphs).

Graph-based Parsing – Eisner algorithm
[Figure: the Eisner algorithm]

Graph-based Parsing – Chu-Liu-Edmonds algorithm
[Figure: the Chu-Liu-Edmonds algorithm; see the sketch at the end of this lecture]

Grammar-based Approach
a. context-free dependency parsing: exploits a mapping from dependency structures to CFG structure representations and reuses parsing algorithms originally developed for CFG → chart parsing algorithms
b. constraint-based dependency parsing: parsing viewed as a constraint satisfaction problem; the grammar is defined as a set of constraints on well-formed dependency graphs; the task is finding a dependency graph for a sentence that satisfies all the constraints of the grammar (having the best score)

Grammar-based Approach
a. context-free dependency parsing
Advantage: well-studied parsing algorithms such as CKY and Earley's algorithm can be used for dependency parsing as well
→ need to convert dependency grammars into efficiently parsable context-free grammars (e.g. bilexical CFG; Eisner and Smith, 2005)
b. constraint-based dependency parsing: defines the problem as constraint satisfaction
- Weighted Constraint Dependency Grammar (WCDG; Foth and Menzel, 2005)
- transformation-based CDG

Conclusions
1 Dependency syntax vs. constituency (phrase-structure) syntax
2 Non-projectivity
3 Graph-based and transition-based methods
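To close, here is a compact sketch of the Chu-Liu-Edmonds maximum spanning tree step referenced above. Arc scores stand in for the learned arc-factored s(Gi) = w · f(Gi); the recursive-contraction formulation and all names are our illustration, and every non-root node is assumed to have at least one scored incoming arc:

def _find_cycle(heads):
    # return the set of nodes on some cycle in the head map, or None
    for start in heads:
        seen, v = set(), start
        while v in heads:                  # walk upward until root or repeat
            if v in seen:                  # v lies on a cycle: extract it
                cycle, u = {v}, heads[v]
                while u != v:
                    cycle.add(u)
                    u = heads[u]
                return cycle
            seen.add(v)
            v = heads[v]
    return None

def chu_liu_edmonds(scores, nodes, root=0):
    # 1. greedily pick the best-scoring incoming arc for every non-root node
    heads = {d: max((h for h in nodes if h != d and (h, d) in scores),
                    key=lambda h: scores[(h, d)])
             for d in nodes if d != root}
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads                       # already a spanning arborescence
    # 2. contract the cycle into a fresh node c and rescore arcs around it
    c = max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in cycle and d not in cycle:        # arc leaving the cycle
            if new_scores.get((c, d), float("-inf")) < s:
                new_scores[(c, d)], origin[(c, d)] = s, (h, d)
        elif h not in cycle and d in cycle:      # arc entering the cycle:
            s -= scores[(heads[d], d)]           # gain over the cycle arc it breaks
            if new_scores.get((h, c), float("-inf")) < s:
                new_scores[(h, c)], origin[(h, c)] = s, (h, d)
        elif h not in cycle and d not in cycle:  # unrelated arcs kept as-is
            new_scores[(h, d)] = s               # (arcs inside the cycle drop out)
    rest = [u for u in nodes if u not in cycle] + [c]
    sub = chu_liu_edmonds(new_scores, rest, root)
    # 3. expand c back, breaking the cycle at the chosen entry arc
    tree = {}
    for d, h in sub.items():
        if d == c:
            eh, ed = origin[(h, c)]
            tree[ed] = eh                  # the entry arc replaces one cycle arc
        elif h == c:
            tree[d] = origin[(c, d)][0]
        else:
            tree[d] = h
    for v in cycle:
        tree.setdefault(v, heads[v])       # keep the remaining cycle arcs
    return tree

# toy arc scores over root 0 and words 1..3; the greedy step picks the
# cycle w1 <-> w2, which contraction then repairs:
scores = {(0, 1): 5, (0, 2): 1, (0, 3): 1,
          (1, 2): 11, (2, 1): 10, (2, 3): 8,
          (1, 3): 9, (3, 1): 9, (3, 2): 8}
print(chu_liu_edmonds(scores, [0, 1, 2, 3]))   # {1: 0, 3: 1, 2: 1}: root -> w1 -> {w2, w3}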