FI:PA154 Corpus Tools - Course Information

PA154 Corpus Tools

Faculty of Informatics
Spring 2014

Extent and Intensity

2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).

Teacher(s)

doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)

Guaranteed by

prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics

Timetable

Wed 10:00–11:50 G125

Course Enrolment Limitations

The course is also offered to students of fields other than those it is directly associated with.

Fields of study / plans the course is directly associated with

Applied Informatics (programme FI, N-AP)
Information Technology Security (programme FI, N-IN)
Bioinformatics (programme FI, N-AP)
Information Systems (programme FI, N-IN)
Parallel and Distributed Systems (programme FI, N-IN)
Computer Graphics (programme FI, N-IN)
Computer Networks and Communication (programme FI, N-IN)
Computer Systems (programme FI, N-IN)
Embedded Systems (eng.) (programme FI, N-IN)
Embedded Systems (programme FI, N-IN)
Service Science, Management and Engineering (eng.) (programme FI, N-AP)
Service Science, Management and Engineering (programme FI, N-AP)
Social Informatics (programme FI, B-AP)
Theoretical Informatics (programme FI, N-IN)
Upper Secondary School Teacher Training in Informatics (programme FI, N-SS) (2)
Artificial Intelligence and Natural Language Processing (programme FI, N-IN)
Image Processing (programme FI, N-AP)

Course objectives

This course aims to give students an overview of the state of the art in (mainly statistical) methods, algorithms and tools used for processing large text corpora, both when the corpora are created and when they are subsequently used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
At the end of the course, students will not only be able to use these tools but, more importantly, will understand the underlying theories and algorithms, which is often key to using the tools effectively and correctly.

Syllabus

NLTK toolkit
Elements of Probability and Information Theory
Language Modeling in General and the Noisy Channel Model
Smoothing and the Expectation-Maximization algorithm
Markov models, Hidden Markov Models (HMMs)
Viterbi Algorithm
Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
Statistical Alignment and Machine Translation
Text Categorization and Clustering
Graphical Models
Parallelization, MapReduce

Literature

RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus Managers and Their Efficient Implementation]. Brno, 2000, xiv, 128.
MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of statistical natural language processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.

Teaching methods

lectures

Assessment methods

Written exam.

Language of instruction

Czech

Further Comments

The course is taught annually.

The course is also listed under the following terms: Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

Permalink: https://is.muni.cz/course/fi/spring2014/PA154
