FI:PA154 Corpus Tools - Course Information
PA154 Corpus Tools
Faculty of Informatics, Spring 2009
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
- Guaranteed by
- prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
- Timetable
- Tue 15:00–16:50 B410
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- Applied Informatics (programme FI, N-AP)
- Information Technology Security (programme FI, N-IN)
- Bioinformatics (programme FI, N-AP)
- Czech Language and Literature (programme FF, M-FI) (2)
- Czech Language and Literature (programme FF, M-HS)
- Information Systems (programme FI, N-IN)
- Informatics (programme FI, M-IN)
- Informatics (programme FI, N-IN)
- Parallel and Distributed Systems (programme FI, N-IN)
- Computer Graphics (programme FI, N-IN)
- Computer Networks and Communication (programme FI, N-IN)
- Computer Systems (programme FI, N-IN)
- Embedded Systems (eng.) (programme FI, N-IN)
- Theoretical Informatics (programme FI, N-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-SS)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-TV)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-SS) (2)
- Artificial Intelligence and Natural Language Processing (programme FI, N-IN)
- Image Processing (programme FI, N-AP)
- Course objectives
- The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging, and disambiguation. The part on computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
- Syllabus
- Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora. Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP). Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD). Disambiguation and disambiguators (rule-based: DIS; stochastic and others). Parallel corpora, alignment and aligners. Using corpora in computer lexicography, context, word sense disambiguation. Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench. Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
- Literature
- Assessment methods
- Lectures, written exam.
- Language of instruction
- Czech
- Further Comments
- Study Materials
The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2009/PA154