PA107 Corpus Tools Project

Faculty of Informatics
Spring 2025
Extent and Intensity
0/2/0. 2 credit(s) (plus extra credits for completion). Type of Completion: z (credit).
In-person direct teaching
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 29 fields of study the course is directly associated with.
Course objectives
The seminar aims to give students deeper knowledge of a chosen area of corpus linguistics and to test that knowledge in practice through work on a project. Popularising corpus linguistics and other areas of language engineering is one of the main goals of the Natural Language Processing Laboratory at the Faculty of Informatics.
Basic information about the Natural Language Processing Laboratory and corpus linguistics in general can be found at http://www.fi.muni.cz/nlp/.
Learning outcomes
Students will be able to: create a text corpus from different sources; use automatic tools for corpus annotation; evaluate the accuracy of automatic tools; present the evaluation results.
Syllabus
  • The seminar aims to give students deeper knowledge of a chosen area of corpus linguistics and to test that knowledge in practice through work on a project. Popularising corpus linguistics and other areas of language engineering is one of the main goals of the Natural Language Processing Laboratory at the Faculty of Informatics.
  • Basic information about the Natural Language Processing Laboratory and corpus linguistics in general can be found at http://www.fi.muni.cz/nlp/.
Literature
  • OAKES, Michael P. Statistics for corpus linguistics. Edinburgh: Edinburgh University Press, 1998, xvi, 287 p. ISBN 0-7486-0817-6.
  • PALA, Karel, Pavel RYCHLÝ and Pavel SMRŽ. DESAM - Annotated Corpus for Czech. In Proceedings of SOFSEM 97. Heidelberg: Springer Verlag, 1997, p. 523-530. ISBN 3-540-63774-5.
  • Corpus processing for lexical acquisition. Edited by Bran Boguraev and James Pustejovsky. Cambridge: Bradford Book, 1996, xi, 245 p. ISBN 0-262-02392-X.
  • ALLEN, James. Natural language understanding. 2nd ed. Redwood City: Benjamin/Cummings Publishing Company, 1995, xv, 654 p. ISBN 0-8053-0334-0.
  • SINCLAIR, John McHardy. Corpus, concordance, collocation. Edited by Ronald Carter. Oxford: Oxford University Press, 1991, xviii, 179 p. ISBN 0194371441.
  • Computational lexicography for natural language processing. Edited by Ted Briscoe and Bran Boguraev. London: Longman, 1989, xiv, 310 p. ISBN 0-470-21187-3.
Teaching methods
Lectures, work on an individual project, personal consultations, presentation.
Assessment methods
Project. Evaluation based on presentation of project results.
Language of instruction
English
Further Comments
The course is taught annually.
The course is taught every week.
The course is also listed under the following terms: Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024.
  • Permalink: https://is.muni.cz/course/fi/spring2025/PA107