FI:IB047 Intro to Corpus Linguistics - Course Information
IB047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of InformaticsSpring 2020
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
prof. PhDr. Karel Pala, CSc. (alternate examiner)
RNDr. Miloš Jakubíček, Ph.D. (assistant) - Guaranteed by
- doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics - Timetable
- Mon 17. 2. to Fri 15. 5. Wed 14:00–15:50 B411
- Course Enrolment Limitations
- The course is also offered to the students of the fields other than those the course is directly associated with.
- fields of study / plans the course is directly associated with
- Image Processing and Analysis (programme FI, N-VIZ)
- Applied Informatics (programme FI, B-AP)
- Bioinformatics and systems biology (programme FI, N-UIZD)
- Bioinformatics (programme FI, B-AP)
- Computer Games Development (programme FI, N-VIZ_A)
- Computer Graphics and Visualisation (programme FI, N-VIZ_A)
- Computer Networks and Communications (programme FI, N-PSKB_A)
- Cybersecurity Management (programme FI, N-RSSS_A)
- Czech Language and Literature (programme FF, M-FI) (2)
- Czech Language and Literature (programme FF, M-HS)
- Czech Language with Orientation on Computational Linguistics (programme FF, B-FI)
- Formal analysis of computer systems (programme FI, N-TEI)
- Graphic design (programme FI, N-VIZ)
- Graphic Design (programme FI, N-VIZ_A)
- Hardware Systems (programme FI, N-PSKB_A)
- Hardware systems (programme FI, N-PSKB)
- Image Processing and Analysis (programme FI, N-VIZ_A)
- Information security (programme FI, N-PSKB)
- Informatics with another discipline (programme FI, B-BI)
- Informatics with another discipline (programme FI, B-EB)
- Informatics with another discipline (programme FI, B-FY)
- Informatics with another discipline (programme FI, B-GE)
- Informatics with another discipline (programme FI, B-GK)
- Informatics with another discipline (programme FI, B-CH)
- Informatics with another discipline (programme FI, B-IO)
- Informatics with another discipline (programme FI, B-MA)
- Informatics with another discipline (programme FI, B-SO)
- Informatics with another discipline (programme FI, B-TV)
- Informatics (programme FI, B-IN)
- Informatics (programme FI, B-INF) (2)
- Public Administration Informatics (programme FI, B-AP)
- Informatics in education (programme FI, B-IVV) (2)
- Information Security (programme FI, N-PSKB_A)
- Quantum and Other Nonclassical Computational Models (programme FI, N-TEI)
- Mathematical Informatics (programme FI, B-IN)
- Parallel and Distributed Systems (programme FI, B-IN)
- Computer graphics and visualisation (programme FI, N-VIZ)
- Computer Graphics and Image Processing (programme FI, B-IN)
- Computational Linguistics (programme FF, B-PLIN_) (3)
- Computer Networks and Communication (programme FI, B-IN)
- Computer Networks and Communications (programme FI, N-PSKB)
- Computer Systems and Data Processing (programme FI, B-IN)
- Principles of programming languages (programme FI, N-TEI)
- Programming and development (programme FI, B-PVA)
- Programmable Technical Structures (programme FI, B-IN)
- Embedded Systems (programme FI, N-IN)
- Cybersecurity management (programme FI, N-RSSS)
- Services development management (programme FI, N-RSSS)
- Software Systems Development Management (programme FI, N-RSSS)
- Services Development Management (programme FI, N-RSSS_A)
- Service Science, Management and Engineering (programme FI, N-AP)
- Social Informatics (programme FI, B-AP)
- Software Systems Development Management (programme FI, N-RSSS_A)
- Software Systems (programme FI, N-PSKB_A)
- Software systems (programme FI, N-PSKB)
- Machine learning and artificial intelligence (programme FI, N-UIZD)
- Teacher of Informatics and IT administrator (programme FI, N-UCI)
- Informatics for secondary school teachers (programme FI, N-UCI) (2)
- Artificial Intelligence and Natural Language Processing (programme FI, B-IN)
- Computer Games Development (programme FI, N-VIZ)
- Processing and analysis of large-scale data (programme FI, N-UIZD)
- Natural language processing (programme FI, N-UIZD)
- Course objectives
- A basic introduction to the field of corpus linguistics and computational lexicography. Students will study types of corpora, corpus building and usage, especially in the sake of dictionaries building.
- Learning outcomes
- Student will be able to: choose the right korpus type for specific purpose; interpret all layers of corpus annotation; use statistical methods on text corpora; design the structure of a dictionary; use free tools for dictionary writing.
- Syllabus
- Information technologies and language (text) corpora. Beginning of corpus linguistics, purpose of corpora.
- Corpus data, corpus types and their standardization, SGML, XML, TEI, CES. Annotated corpora, tagging on various levels: structural tagging, grammatical tagging -- POS, lemmata, word forms. Syntactic tagging, treebanks, skeleton analysis. Parallel corpora, alignment. Tools for automatic and semi-automatic annotation, disambiguation.
- Building corpora, maintenance. Corpus tools: corpus manager. Concordance programmes. Queries, regular expressions and their use. Statistical programmes, absolute and relative frequencies, MI and T-score. Work with corpus attributes and tags.
- Working with corpora -- CNC, SUSANNE, Prague Dependency Treebank
- Words, constructions, collocations.
- Computational lexicography, lexicology.
- Descripton of meanings (semantic features).
- Types of computer dictionaries. Lexicography standards.
- Data for dictionary building -- corpora.
- Lexicography Software tools. Lemmatizers.
- Literature
- SAMPSON, Geoffrey. English for the computer : the SUSANNE corpus and analytic scheme. Oxford: Clarendon Press, 1995, ix, 499. ISBN 0198240236. info
- RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace. Brno, 2000, xiv, 128. info
- Computational lexicography for natural language processing. Edited by Ted Briscoe - Bran Boguraev. London: Longman, 1989, xiv, 310 p. ISBN 0-470-21187-3. info
- SAMPSON, Geoffrey. Empirical linguistics. London: Continuum, 2001, viii, 226. ISBN 0-8264-4883-6. info
- Corpus processing for lexical acquisition. Edited by Bran Boguraev - J. (James) Pustejovsky. Cambridge: Bradford Book, 1996, xi, 245 s. ISBN 0-262-02392-X. info
- Teaching methods
- Teaching is performed in the form of oral lectures and seminars, in which the slides and demos of the relevant software tools are combined. Students work out homeworks, prepare presentations based on the literature they had read and develop smaller projects. At the appropriate points of the teaching the open dialog between a teacher and students is used.
- Assessment methods
- written exam
- Language of instruction
- Czech
- Follow-Up Courses
- Further Comments
- Study Materials
The course is taught annually.
- Enrolment Statistics (Spring 2020, recent)
- Permalink: https://is.muni.cz/course/fi/spring2020/IB047