FI:PV030 Textual Information Systems - Course Information
PV030 Textual Information Systems
Faculty of Informatics
Spring 2003
- Extent and Intensity
- 2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. RNDr. Petr Sojka, Ph.D. (lecturer)
RNDr. David Antoš, Ph.D. (seminar tutor)
- Guaranteed by
- doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
- Timetable
- Mon 9:00–10:50 D2
- Timetable of Seminar Groups:
PV030/02: Mon 14:00–14:50 B204, Mon 14:00–14:50 B311, D. Antoš
PV030/03: Mon 15:00–15:50 B204, Mon 15:00–15:50 B311, D. Antoš
PV030/04: Mon 16:00–16:50 B204, Mon 16:00–16:50 B311, D. Antoš
PV030/05: Mon 17:00–17:50 B204, Mon 17:00–17:50 B311, D. Antoš
- Prerequisites
- ! P030 Textual Information Systems
Students are strongly advised to have basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- Applied Informatics (programme FI, B-AP)
- Applied Informatics (programme FI, N-AP)
- Informatics (programme FI, B-IN)
- Informatics (programme FI, N-IN)
- Information Technology (programme FI, B-IN)
- Course objectives
- The course teaches the basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
- Syllabus
- Basic notions. TIS - text information system. Classification of information systems.
- Searching in TIS. Searching and pattern matching classification and data structures.
- Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
- Theory of automata for searching. Classification of searching problems.
- Indexes. Indexing methods. Data structures for searching and indexing.
- Google as an example of a search and indexing engine.
- Signature methods.
- Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
- Data compression. Basic notions. Statistical methods.
- Compression methods based on dictionary. Neural nets for text compression.
- Syntactic methods. Context modelling.
- Spell checking. Filtering information channels. Document classification.
- Literature
- Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
- KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
- Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
- Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
- Assessment methods
- Teaching follows the traditional lecture format and ends with a written test, which makes up 70 % of the final grade. Sample tests from previous years are posted on the course website. The remaining 30 % of the final grade comes from written homework assignments set during the semester. Seminars are used to practise the lecture material and for brainstorming. Throughout the course, students are motivated by smaller tasks rewarded with bonus points.
- Language of instruction
- Czech
- Follow-Up Courses
- Further comments (probably available only in Czech)
- The course is taught annually.
- Teacher's information
- http://www.fi.muni.cz/~sojka/PV030/
- Enrolment Statistics (Spring 2003, recent)
- Permalink: https://is.muni.cz/course/fi/spring2003/PV030