PA154 Corpus Tools

Faculty of Informatics
Spring 2008
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Thu 8:00–9:50 B410
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 21 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part on computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based, such as DIS; stochastic; and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128 pp.
  • Studie z korpusové lingvistiky [Studies in Corpus Linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms: Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.
  • Permalink: https://is.muni.cz/course/fi/spring2008/PA154