FI:I047 Introduction to Corpus Linguistics - Course Information
I047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of Informatics, Spring 2000
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- prof. PhDr. Karel Pala, CSc. (lecturer)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
doc. RNDr. Pavel Smrž, Ph.D. (lecturer)
- Guaranteed by
- prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- Fields of study / plans the course is directly associated with
- Informatics (programme FI, B-IN)
- Informatics (programme FI, M-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-SS)
- Information Technology (programme FI, B-IN)
- Syllabus
- Introduction to Corpus Linguistics and Computational Lexicography
- Information technologies and language (text) corpora. The beginnings of corpus linguistics; the purpose of corpora.
- Building corpora: collecting corpus data and standardizing it (SGML, TEI); representativeness of corpora; corpus maintenance.
- Corpus tools and query processors: CQP, CUE, CQM; concordance programmes: XKWIC, OCP, LEXA, WORDCRUNCHER. Queries and the use of regular expressions. Statistical programmes: absolute and relative frequencies, MI-score (mutual information) and T-score. Sorting programmes, character encodings and code conversions.
- Annotated corpora and tagging on various levels: structural tagging (SGML); grammatical tagging (POS, lemmata, word forms); the LEMMA programme.
- Syntactic tagging, treebanks, skeleton analysis, constraint grammars, disambiguation at the morphological and syntactic levels.
- Parallel corpora, alignment programmes.
- The Czech National Corpus (CNC): working with the CNC; words, constructions, collocations. Building dictionaries.
- Basic concepts of Computational Lexicography.
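The statistical measures named in the syllabus (relative frequency, MI-score, T-score) can be illustrated with a short sketch. It assumes the common corpus-linguistic formulations, where N is the corpus size in tokens, f(x) and f(y) are the frequencies of a node word and a collocate, and f(x,y) is their co-occurrence count: MI = log2(f(x,y) * N / (f(x) * f(y))) and T = (f(x,y) - f(x) * f(y) / N) / sqrt(f(x,y)). The function names and the counts in the example are hypothetical and not taken from any particular tool covered in the course.

```python
import math


def relative_frequency(freq: int, corpus_size: int, per: int = 1_000_000) -> float:
    """Normalise an absolute frequency to occurrences per `per` tokens (default: per million)."""
    return freq * per / corpus_size


def expected_cooccurrence(f_x: int, f_y: int, n: int) -> float:
    """Co-occurrence count expected if x and y were independent: f(x) * f(y) / N."""
    return f_x * f_y / n


def mi_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """MI-score: log2 of the observed over the expected co-occurrence frequency."""
    return math.log2(f_xy / expected_cooccurrence(f_x, f_y, n))


def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """T-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - expected_cooccurrence(f_x, f_y, n)) / math.sqrt(f_xy)


if __name__ == "__main__":
    # Hypothetical counts: a 1,000,000-token corpus, a node word seen 1,000 times,
    # a collocate seen 500 times, and the pair co-occurring 20 times.
    N, F_X, F_Y, F_XY = 1_000_000, 1_000, 500, 20
    print(f"relative frequency of x: {relative_frequency(F_X, N):.1f} per million")
    print(f"MI-score: {mi_score(F_XY, F_X, F_Y, N):.2f}")
    print(f"T-score:  {t_score(F_XY, F_X, F_Y, N):.2f}")
```

With these counts the expected co-occurrence is 0.5, so the pair scores MI = log2(40), about 5.32, and T = 19.5 / sqrt(20), about 4.36. The two measures are usually read together: MI tends to favour rare, exclusive pairs, while the T-score gives more weight to frequent ones.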
- Language of instruction
- Czech
- Further comments (probably available only in Czech)
- The course is taught annually, every week.
- Permalink: https://is.muni.cz/course/fi/spring2000/I047