Building and Maintenance of Semantic Networks
Tomáš Čapek
March 21, 2011

Natural Language Processing
- Efforts to make computers understand our language
- Philosopher's stones: machine translation in the general domain, HCI based on natural language

Crisis in NLP
- Often unclear goals and results (word juggling)
- No real major progress after decades of research in: syntax, WSD, MT
- What we are good at: morphology, KR, corpora, CAT, ...

Main Approaches in NLP
- Statistical (Praha)
- Rule-based (Brno)
- Both use language resources to study languages

Language Resources in NLP
- Corpora
- Dictionaries
- Knowledge representation schemes: ontologies, semantic networks

Applications of NLP
- Dictionary creation (corpora)
- Information Retrieval (Google)
- Information Extraction (summarization)
- Text Categorization, plagiarism detection
- Word Sense Disambiguation
- Machine Translation, Dialogue Systems

Semantics in NLP: Role of Annotated Resources
- Most valuable for both „rules“ and „statistics“
- Most expensive to create by hand
- Often „set in stone“ and thus unreliable and compromising

Issues in Semantic Networks
- Style guide (nonexistent, too big, or vague)
- Balance in the data
- Maintenance (manual, random, automated)
- Accumulated errors of all kinds
- Application-unfriendly → GIGO

My Proposed Contribution
- Application-driven development (merge model)
- Clusters of data kept separate from ontologies
- Heuristic tests for automated maintenance (see the first sketch below)
- Focus on precision rather than recall

Chain of Succession
- morphology → MWE/NE recognition → (syntax) → knowledge representation (SN) → semantic annotation → WSD → MT (sketched as a pipeline below)
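
The heuristic tests mentioned under "My Proposed Contribution" are not specified in the slides; the following is a minimal sketch of what such automated maintenance checks could look like, assuming a semantic network stored as synset nodes with typed relations. The SemanticNetwork class, the dangling-link check, and the hypernym-cycle check are illustrative assumptions, not the actual tests proposed here.

```python
from collections import defaultdict

class SemanticNetwork:
    """Toy semantic network: nodes are synset identifiers, edges are
    (relation, target) pairs such as ('hypernym', 'animal.n.01')."""

    def __init__(self):
        self.edges = defaultdict(list)            # node -> [(relation, target), ...]

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))


def dangling_links(net):
    """Heuristic 1: edges pointing at nodes that have no outgoing edges of
    their own -- a common symptom of accumulated manual errors."""
    defined = set(net.edges)
    return [(src, rel, tgt)
            for src, out in net.edges.items()
            for rel, tgt in out
            if tgt not in defined]


def hypernym_cycles(net):
    """Heuristic 2: the hypernym relation should form a DAG, so any cycle is
    a definite structural error (favouring precision over recall)."""
    hyper = {src: [tgt for rel, tgt in out if rel == 'hypernym']
             for src, out in net.edges.items()}
    cycles, state = [], {}                        # state: 1 = on stack, 2 = finished

    def visit(node, path):
        state[node] = 1
        for nxt in hyper.get(node, []):
            if state.get(nxt) == 1:               # back edge -> cycle
                cycles.append(path + [nxt])
            elif nxt not in state:
                visit(nxt, path + [nxt])
        state[node] = 2

    for node in hyper:
        if node not in state:
            visit(node, [node])
    return cycles


if __name__ == '__main__':
    net = SemanticNetwork()
    net.add('dog.n.01', 'hypernym', 'animal.n.01')
    net.add('animal.n.01', 'hypernym', 'organism.n.01')
    net.add('organism.n.01', 'hypernym', 'dog.n.01')   # deliberate error: a cycle
    net.add('cat.n.01', 'hypernym', 'felid.n.01')      # deliberate error: dangling target
    print('dangling links:', dangling_links(net))
    print('hypernym cycles:', hypernym_cycles(net))
```

Checks of this kind only flag definite structural violations, which keeps false positives low and matches the stated preference for precision over recall.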
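
The chain of succession can be read as a processing pipeline. In the sketch below, only the ordering of the stages comes from the slides; every stage function is a hypothetical placeholder that merely marks where the corresponding step would enrich the analysis, not a real implementation.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Placeholder stages mirroring the chain of succession.
def morphology(doc: Doc) -> Doc:
    doc['lemmas'] = doc['text'].lower().split()   # crude stand-in for lemmatisation
    return doc

def mwe_ne_recognition(doc: Doc) -> Doc:
    doc['entities'] = []                          # multi-word expressions / named entities
    return doc

def syntax(doc: Doc) -> Doc:
    doc['parse'] = None                           # optional step, per the chain
    return doc

def knowledge_representation(doc: Doc) -> Doc:
    doc['synset_candidates'] = {}                 # lookup against a semantic network
    return doc

def semantic_annotation(doc: Doc) -> Doc:
    doc['annotations'] = []
    return doc

def wsd(doc: Doc) -> Doc:
    doc['senses'] = {}                            # word sense disambiguation
    return doc

def machine_translation(doc: Doc) -> Doc:
    doc['translation'] = doc['text']              # identity stub
    return doc

# The ordering below is the only part taken from the slides.
PIPELINE: List[Callable[[Doc], Doc]] = [
    morphology, mwe_ne_recognition, syntax, knowledge_representation,
    semantic_annotation, wsd, machine_translation,
]

def run(text: str) -> Doc:
    doc: Doc = {'text': text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

if __name__ == '__main__':
    print(run('Dogs are animals.'))
```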