IB047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of Informatics, Spring 2005
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
- Guaranteed by
- prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
- Timetable
- Thu 14:00–15:50 B007
- Prerequisites (in Czech)
- ! I047 Introduction to Corpus Linguistics and Computer Lexicography
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- Applied Informatics (programme FI, B-AP)
- Czech Language and Literature (programme FF, M-FI) (2)
- Czech Language and Literature (programme FF, M-HS)
- Informatics with another discipline (programme FI, B-BI)
- Informatics with another discipline (programme FI, B-FY)
- Informatics with another discipline (programme FI, B-GE)
- Informatics with another discipline (programme FI, B-GK)
- Informatics with another discipline (programme FI, B-CH)
- Informatics with another discipline (programme FI, B-IO)
- Informatics with another discipline (programme FI, B-MA)
- Informatics with another discipline (programme FI, B-SO)
- Informatics with another discipline (programme FI, B-TV)
- Informatics (programme FI, B-IN)
- Course objectives
- A basic introduction to the field of corpus linguistics and computational lexicography. Students will study types of corpora, corpus building, and corpus usage, especially for the purpose of building dictionaries.
- Syllabus
- Information technologies and language (text) corpora. Beginnings of corpus linguistics, purpose of corpora.
- Corpus data, corpus types and their standardization: SGML, XML, TEI, CES. Annotated corpora, tagging on various levels: structural tagging, grammatical tagging (POS, lemmata, word forms). Syntactic tagging, treebanks, skeleton analysis. Parallel corpora, alignment programmes. Tools for automatic and semi-automatic annotation, disambiguation.
- Building corpora, maintenance. Corpus tools: corpus manager. Concordance programmes. Queries, regular expressions and their use. Statistical programmes, absolute and relative frequencies, mutual information (MI) and t-score. Work with corpus attributes and tags.
- Working with corpora: CNC, SUSANNE, Prague Dependency Treebank. Words, constructions, collocations.
- Computational lexicography, lexicology.
- Description of meanings (semantic features).
- Types of computer dictionaries. Lexicography standards.
- Data for dictionary building -- corpora.
- Lexicography software tools. Lemmatizers.
- Literature
- SAMPSON, Geoffrey. English for the computer : the SUSANNE corpus and analytic scheme. Oxford: Clarendon Press, 1995, ix, 499 p. ISBN 0198240236.
- RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace. Brno, 2000, xiv, 128 p.
- Computational lexicography for natural language processing. Edited by Ted Briscoe and Bran Boguraev. London: Longman, 1989, xiv, 310 p. ISBN 0-470-21187-3.
- SAMPSON, Geoffrey. Empirical linguistics. London: Continuum, 2001, viii, 226 p. ISBN 0-8264-4883-6.
- Corpus processing for lexical acquisition. Edited by Bran Boguraev and James Pustejovsky. Cambridge: Bradford Book, 1996, xi, 245 p. ISBN 0-262-02392-X.
- Language of instruction
- Czech
- Further Comments
- The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2005/IB047