I047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of Informatics
Spring 2001
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- prof. PhDr. Karel Pala, CSc. (lecturer)
doc. Mgr. Pavel Rychlý, Ph.D. (seminar tutor)
doc. RNDr. Pavel Smrž, Ph.D. (seminar tutor)
- Guaranteed by
- prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
- Timetable
- Wed 16:00–17:50 B410
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- Fields of study / plans the course is directly associated with
- Czech Language and Literature (programme FF, M-FI) (2)
- Czech Language and Literature (programme FF, M-HS)
- Syllabus
- Introduction to Corpus Linguistics and Computational Lexicography
- Information technologies and language (text) corpora. Beginnings of corpus linguistics, purpose of corpora.
- Building corpora, collecting corpus data and their standardization, SGML, TEI, representativeness of corpora, their maintenance.
- Corpus tools, query processors: CQP, MANATEE; concordance programmes: XKWIC, OCP, LEXA, WORDCRUNCHER. Queries, regular expressions and their use. Statistical programmes, absolute and relative frequencies, MI-score (mutual information) and t-score. Sorting programmes, different character encodings, encoding conversions.
- Annotated corpora, tagging on various levels: structural tagging (SGML), grammatical tagging (POS, lemmata, word forms), the program AJKA.
- Syntactic tagging, treebanks, skeleton analysis, constraint grammars, disambiguation at the morphological and syntactic levels.
- Parallel corpora, alignment programmes.
- Czech National Corpus (CNC), working with the CNC: words, constructions, collocations. Building dictionaries.
- Basic concepts of Computational Lexicography.
- Language of instruction
- Czech
- Further comments (probably available only in Czech)
- The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2001/I047