Lecture 9
Outline
HPSG Parser : Enju
Parsing method
Description of parser
Result
CCG Parser : C&C Tools
Parsing method
Description of parser
Result

Theoretical backgrounds
Lecture 3 about HPSG Parsing
Lecture 6 & 7 about CCG Parsing and Combinatory Logic

Enju (Y. Miyao and J. Tsujii, 2004, 2008)
Syntactic parser for English
Developed by the Tsujii Laboratory of the University of Tokyo
Based on a wide-coverage probabilistic HPSG
HPSG theory [Pollard and Sag, 1994]
Useful links to Enju
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/

Motivations
Parsing based on a proper linguistic formalism is one of the core research fields in CL and NLP.
But it has been a monolithic, esoteric and inward-looking field, largely dissociated from real-world applications.

Motivations
So why not!
Integrate linguistic grammar formalisms with statistical models to obtain a parser that is robust, efficient and open to eclectic sources of information other than syntactic ones

Motivations
Two main ideas
Development of wide-coverage linguistic grammars
Deep parser which produces semantic representation (predicate-argument structures)

Parsing method
Application of probabilistic model in the HPSG grammar and development of an efficient parsing algorithm
Accurate deep analysis
Disambiguation
Wide-coverage
High speed
Useful for high-level NLP applications

Parsing method
1 Parsing based on HPSG
Mathematically well-defined with sophisticated constraint-based system
Linguistically justified
Deep syntactic grammar that provides semantic analysis

Parsing method
Difficulties in parsing based on HPSG
Difficult to develop a broad-coverage HPSG grammar
Difficult to disambiguate
Low efficiency: very slow

Parsing method
Solution: Corpus-oriented development of an HPSG grammar
The principal aim of grammar development is treebank construction
The Penn Treebank is converted into an HPSG treebank
A lexicon and a probabilistic model are extracted from the HPSG treebank

Parsing method
Approach: develop grammar rules and an HPSG treebank
Collect lexical entries from the HPSG treebank
How to make an HPSG treebank?
Convert the Penn Treebank into HPSG and develop the grammar by restructuring the treebank in conformity with HPSG grammar rules

Parsing method
HPSG = lexical entries and grammar rules
The Enju grammar has 12 grammar rules and 3,797 lexical entries for 10,536 words (Miyao et al. 2004)

Parsing method
Overview of grammar development
1. Treebank conversion: modify constituent structures by adding feature structures
2. Grammar rule application: apply a grammar rule when the parse tree contains the correct analysis and the specified feature values are filled
3. Lexical entry collection: collect the terminal nodes of the HPSG parse trees and assign predicate-argument structures
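The lexical entry collection step can be illustrated with a minimal Python sketch. It assumes a toy treebank in which every sentence has already been reduced to its terminal nodes, each given as a (word, lexical entry template) pair. The template strings and the `collect_lexicon` helper are hypothetical and do not reflect Enju's actual data format.

```python
from collections import Counter, defaultdict

# Hypothetical toy "HPSG treebank": each sentence is reduced to its terminal
# nodes, given as (word, lexical entry template) pairs. The template names
# are illustrative only.
toy_treebank = [
    [("Mary", "noun<>"), ("loved", "verb<NP,NP>"), ("John", "noun<>")],
    [("John", "noun<>"), ("walked", "verb<NP>")],
    [("Mary", "noun<>"), ("walked", "verb<NP>")],
]


def collect_lexicon(treebank):
    """Map each word to a Counter of the templates seen on its terminal nodes."""
    lexicon = defaultdict(Counter)
    for terminals in treebank:
        for word, template in terminals:
            lexicon[word][template] += 1
    return lexicon


lexicon = collect_lexicon(toy_treebank)
templates = {t for counts in lexicon.values() for t in counts}
print(len(lexicon), "words,", len(templates), "distinct templates")
```

The counts kept per word are the kind of statistics from which a probabilistic model could later be estimated; here they only show that a few templates are shared by many words.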

Parsing method
2 Probabilistic model and HPSG: Log-linear model for unification-based grammars (Abney 1997, Johnson et al. 1999, Riezler et al. 2000, Miyao et al. 2003, Malouf and van Noord 2004, Kaplan et al. 2004, Miyao and Tsujii 2005)
p(T|w)
w = "A blue eyes girl with white hair and skin walked"
T = parse tree of w [figure: tree over the sentence with nodes labelled NP, PP, VP and S]

Parsing method
T1, T2, T3, T4, …, Tn
All possible parse trees derived from w with a grammar.
For example, p(T3|w) is the probability of selecting T3 from T1, T2, …, and Tn.

Parsing method
Log-linear model for unification-based grammars
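Input sentence: w
w = w1/P1, w2/P2, ..., wn/Pn (words paired with their POS tags)
Output: parse tree T
$p(T \mid w) = \frac{1}{Z_w} \exp\left( \sum_i \lambda_i f_i(T) \right)$
$Z_w = \sum_{T'} \exp\left( \sum_i \lambda_i f_i(T') \right)$: normalization factor, summed over all parse trees T' of w
$\lambda_i$: weight for a feature function
$f_i$: feature function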
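A log-linear model of this kind scores each candidate tree by a weighted sum of its feature values and normalizes over all candidates for the sentence. The following Python sketch shows that computation on two made-up candidate trees. The feature names, the weights and the `log_linear_probs` helper are assumptions for illustration, not Enju's feature set or trained parameters.

```python
import math


def log_linear_probs(feature_vectors, weights):
    """Return p(T|w) for each candidate tree T of one sentence.

    feature_vectors: one dict per candidate tree, feature name -> value
    weights: dict, feature name -> weight (lambda)
    """
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in fv.items())
        for fv in feature_vectors
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)  # normalization factor over all candidates
    return [e / z for e in exps]


# Two hypothetical analyses of "I saw a girl with a telescope".
candidates = [
    {"pp_attaches_to_verb": 1},
    {"pp_attaches_to_noun": 1},
]
weights = {"pp_attaches_to_verb": 1.0, "pp_attaches_to_noun": 0.0}

probs = log_linear_probs(candidates, weights)
for i, p in enumerate(probs, start=1):
    print(f"p(T{i}|w) = {p:.3f}")
```

In a real parser the candidate set is far too large to enumerate like this, which is why an efficient parsing algorithm is needed alongside the model.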

Description of parser
Parsing proceeds in the following steps:
1. Preprocessing
The preprocessor converts an input sentence into a word lattice.
2. Lexicon lookup
The parser calls a predicate to find lexical entries for the word lattice.
3. Kernel parsing
The parser performs phrase analysis using the defined grammar rules.

Description of parser
Chart data structure
A two-dimensional table; each cell in the table is called a "CKY cell."
Example
Let the input sentence be s = w1, w2, w3, ..., wn, with
w1 = "I", w2 = "saw", w3 = "a", w4 = "girl", w5 = "with", w6 = "a", w7 = "telescope"
For the sentence "I saw a girl with a telescope", the chart is arranged as follows (cell i,j covers the words from position i to position j):
I     saw   a     girl  with  a     telescope
0,1   1,2   2,3   3,4   4,5   5,6   6,7
0,2   1,3   2,4   3,5   4,6   5,7
0,3   1,4   2,5   3,6   4,7
0,4   1,5   2,6   3,7
0,5   1,6   2,7
0,6   1,7
0,7
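The chart layout above can be reproduced with a short Python sketch. It builds one empty cell per span (i, j) and lists, for a given cell, the pairs of smaller cells that a binary rule would combine. The function names `build_chart` and `split_points` are hypothetical, and the sketch covers only the data structure, not Enju's actual parsing code.

```python
def build_chart(words):
    """Return a dict mapping every span (i, j), 0 <= i < j <= n, to an empty cell."""
    n = len(words)
    chart = {}
    for length in range(1, n + 1):          # one chart row per span length
        for i in range(0, n - length + 1):  # left edge of the span
            chart[(i, i + length)] = []     # the cell would hold signs for this span
    return chart


def split_points(i, j):
    """Pairs of adjacent sub-spans that a binary rule combines into cell (i, j)."""
    return [((i, k), (k, j)) for k in range(i + 1, j)]


words = "I saw a girl with a telescope".split()
chart = build_chart(words)

# Print the cells row by row, shortest spans first.
for length in range(1, len(words) + 1):
    print(" ".join(f"{i},{i + length}" for i in range(len(words) - length + 1)))

print(split_points(0, 3))
```

Filling the cells in order of increasing span length guarantees that both sub-cells of every split are complete before the larger cell is processed.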

Description of parser
System overview
[Figure: the sentence "Mary loved John" is processed by a supertagger, followed by enumeration of assignments and deterministic disambiguation. The candidate lexical entries shown include (HEAD noun, SUBJ < >, COMPS < >) and (HEAD verb, SUBJ, COMPS).]

Demonstration
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html

Results
Fast, robust and accurate analysis
Phrase structures
Predicate argument structures
Accurate deep analysis – the parser can output both phrase structures and predicate-argument structures. The accuracy of predicate-argument relations is around 90% for newswire articles and biomedical papers.
High speed – parsing takes less than 500 msec per sentence by default (faster than most Penn Treebank parsers), and less than 50 msec with the high-speed setting ("mogura").

C&C tools
Developed by Curran and Clark [Clark and Curran, 2002, Curran, Clark and Bos, 2007], University of Edinburgh
Wide-coverage statistical parser based on the CCG: CCG Parser
Computational semantics tool named Boxer
Useful links
http://svn.ask.it.usyd.edu.au/trac/candc
http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo

CCG Parser [Clark, 2007]
Statistical parsing and CCG
Advantages of CCG
provides a compositional semantics for the grammar
→completely transparent interface between syntax and semantics
the recovery of long-range dependencies can be integrated into the parsing process in a straightforward manner

Parsing method
Penn Treebank conversion : TAG, LFG, HPSG and CCG
CCGBank [Hockenmaier and Steedman, 2007]
CCG version of the Penn Treebank
Grammar used in CCG parser
CCGBank
Some rules used as the grammar
Lexical category set
Training data for the statistical models
Supertagger
Parser

Parsing method-CCG Bank
CCGBank is a corpus translated from the Penn Treebank. It contains:
Syntactic derivations
Word-word dependencies
Predicate-argument structures

Parsing method-CCG Bank
Semi-automatic conversion of phrase-structure trees in the Penn Treebank into CCG derivations
Consists mainly of newspaper texts
Grammar:
Lexical category set
Combinatory rules
Unary type-changing rules
Normal-form constraints
Punctuation rules

Parsing method
Supertagging [Clark, 2002]
Uses conditional maximum entropy models
Implemented as a maximum entropy supertagger
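Example (French sentence fragment, with POS tags and candidate lexical categories):
POS    ADV   NOM          PRP  PRO:DEM  NOM          KON  VER:pres  VER:infi  DET:ART
Word   tout  commentaire  sur  cette    proposition  et   prefere   avancer   les
Candidate categories (cells truncated in the figure):
(s\1 s)/(s np/n (s\1 s)/n np/np s\1 s n np (np\np)/n
(s\1 s)/np (n\n)/np pp_sur/np np/n n ((np\s)\( ((np\s)/n
(np\s)/np (s/np)/(n np\s (np\s)/(n ((np\s_inf) (np\s_inf) np/n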
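Given a probability distribution over lexical categories for each word, a supertagger can return either the single best category or, as in the multi-tagging variant mentioned later in these slides, every category whose probability is within a factor β of the best one. The Python sketch below shows that cutoff on a made-up distribution. The probabilities and the `multi_tag` helper are assumptions for illustration and are not the C&C implementation.

```python
def multi_tag(distribution, beta):
    """Keep every category whose probability is at least beta times the best one.

    distribution: dict, lexical category -> probability for one word
    beta: cutoff in (0, 1]; beta = 1.0 keeps only the most probable categories
    """
    best = max(distribution.values())
    kept = [cat for cat, p in distribution.items() if p >= beta * best]
    return sorted(kept, key=lambda cat: -distribution[cat])


# Hypothetical supertagger output for one verb; the numbers are invented.
dist = {r"(S\NP)/NP": 0.80, r"S\NP": 0.15, "N": 0.05}

for beta in (1.0, 0.1, 0.01):
    print(beta, multi_tag(dist, beta))
```

Lowering β trades ambiguity (more categories per word) for a higher chance that the correct category is among those passed to the parser.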

Parsing method-Supertagger
Set of 425 lexical categories from CCGBank
The per-word accuracy of the supertagger is around 92% on unseen WSJ text.
→ Using the multi-supertagger increases the accuracy significantly – to over 98% – with only a small cost in increased ambiguity.

Parsing method-Supertagger
Log-linear models in NLP applications:
POS tagging
Named entity recognition
Chunking
Parsing
→ also referred to as maximum entropy models and random fields

Parsing method-Supertagger
Log-linear parsing models for CCG
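1) the dependency model: the probability of a dependency structure
2) the normal-form model: the probability of a single derivation
→ modeling 2) is simpler than 1)
1) is defined as $P(\pi \mid S) = \sum_{d \in \Delta(\pi)} P(d, \pi \mid S)$
2) is defined using a log-linear form as follows:
$P(\omega \mid S) = \frac{1}{Z_S} e^{\lambda \cdot f(\omega)}$
$Z_S = \sum_{\omega' \in \rho(S)} e^{\lambda \cdot f(\omega')}$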
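The difference between scoring a single derivation and scoring a dependency structure can be seen in a small Python sketch. It normalizes made-up log-linear scores over all parses of a sentence and then sums the probabilities of the derivations that lead to the same dependency structure. The derivation and structure identifiers, the scores and both helper functions are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical parses of one sentence S:
# (derivation id, dependency structure id, score lambda . f)
parses = [
    ("d1", "pi_A", 2.0),
    ("d2", "pi_A", 2.0),
    ("d3", "pi_B", 2.5),
]


def parse_probs(parses):
    """Log-linear probability of each (derivation, structure) pair."""
    z = sum(math.exp(score) for _, _, score in parses)  # normalization Z_S
    return {(d, pi): math.exp(score) / z for d, pi, score in parses}


def dependency_probs(parses):
    """Sum the probabilities of all derivations that yield the same structure."""
    totals = defaultdict(float)
    for (_, pi), p in parse_probs(parses).items():
        totals[pi] += p
    return dict(totals)


by_parse = parse_probs(parses)
by_structure = dependency_probs(parses)
print(max(by_parse, key=by_parse.get))
print(max(by_structure, key=by_structure.get))
```

With these invented scores the most probable single derivation and the most probable dependency structure disagree, which is the situation where the summation matters. It also shows why the single-derivation model is simpler: no sum over derivations is needed.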

Parsing method-Supertagger
Features common to the dependency and normal-form models
Feature type      Example
LexCat + Word     (S/S)/NP + Before
LexCat + POS      (S/S)/NP + IN
RootCat           S[dcl]
RootCat + Word    S[dcl] + was
RootCat + POS     S[dcl] + VBD
Rule              S[dcl] → NP S[dcl]\NP
Rule + Word       S[dcl] → NP S[dcl]\NP + bought
Rule + POS        S[dcl] → NP S[dcl]\NP + VBD

Parsing method-Supertagger
Predicate-argument dependency features for the dependency model
Feature type            Example
Word-Word               ⟨bought, (S\NP1)/NP2, 2, stake, (NP\NP)/(S[dcl]/NP)⟩
Word-POS                ⟨bought, (S\NP1)/NP2, 2, NN, (NP\NP)/(S[dcl]/NP)⟩
POS-Word                ⟨VBD, (S\NP1)/NP2, 2, stake, (NP\NP)/(S[dcl]/NP)⟩
POS-POS                 ⟨VBD, (S\NP1)/NP2, 2, NN, (NP\NP)/(S[dcl]/NP)⟩
Word + Distance(words)  ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 2
Word + Distance(punct)  ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
Word + Distance(verbs)  ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
POS + Distance(words)   ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 2
POS + Distance(punct)   ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
POS + Distance(verbs)   ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0

Parsing method-Supertagger
Rule dependency features for the normal-form model
Feature type            Example
Word-Word               ⟨company, S[dcl] → NP S[dcl]\NP, bought⟩
Word-POS                ⟨company, S[dcl] → NP S[dcl]\NP, VBD⟩
POS-Word                ⟨NN, S[dcl] → NP S[dcl]\NP, bought⟩
POS-POS                 ⟨NN, S[dcl] → NP S[dcl]\NP, VBD⟩
Word + Distance(words)  ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + >2
Word + Distance(punct)  ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + 2
Word + Distance(verbs)  ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + 0
POS + Distance(words)   ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + >2
POS + Distance(punct)   ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + 2
POS + Distance(verbs)   ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + 0

Description of parser
Input sentence
CCGBank
C&C taggers
Supertaggers
POStagger
Chunker
Parser
Boxer

Demonstration
http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo

Results
Supertagger ambiguity and accuracy on Section 00
β      k   CATS/WORD  ACC    SENT ACC  ACC(POS)  SENT ACC
0.075  20  1.27       97.34  67.43     96.34