P030 Textual Information Systems

Faculty of Informatics
Spring 2002
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Wed 14:00–15:50 D2
  • Timetable of Seminar Groups:
P030/raz: Thu 13:00–13:50 B311
P030/dva: Thu 14:00–14:50 B311
P030/tri: Thu 15:00–15:50 B311
P030/vnouzi: Thu 16:00–16:50 B311
Prerequisites
I005 Formal Languages and Automata I || I505 Formal Languages and Automata I
Students are advised to bring some basic knowledge of automata theory (I005) and natural language processing (I030, I047). Some database basics (P002) are helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes and Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504 pp. ISBN 0-13-463837-9.
Assessment methods
Teaching follows the classical lecture format and concludes with a written test (sample tests from previous years are posted at the course URL). Seminars are used to practise the lecture material and for brainstorming.
Language of instruction
Czech
Further Comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/P030/
The course is also listed under the following terms Spring 1996, Spring 1997, Spring 1998, Spring 1999, Spring 2000, Spring 2001.

P030 Textual Information Systems

Faculty of Informatics
Spring 2001
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 10:00–11:50 A107
  • Timetable of Seminar Groups:
P030/01: Mon 12:00–13:50 B204, P. Sojka
P030/02: Mon 14:00–15:50 B204, P. Sojka
Prerequisites
I005 Formal Languages and Automata I
Students are advised to bring some basic knowledge of automata theory (I005) and natural language processing (I030, I047). Some database basics (P002) are helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349 pp. ISBN 0471143383.
  • WITTEN, Ian H., Alistair MOFFAT and Timothy C. BELL. Managing gigabytes: compressing and indexing documents and images. New York: Van Nostrand Reinhold, 1994, xiv, 429 pp. ISBN 0-442-01863-0.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes and Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504 pp. ISBN 0-13-463837-9.
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
Assessment methods
Teaching follows the classical lecture format and concludes with a written test (sample tests from previous years are posted at the course URL). Seminars are used to practise the lecture material and to work on a team project.
Language of instruction
Czech
Further Comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1996, Spring 1997, Spring 1998, Spring 1999, Spring 2000, Spring 2002.

P030 Textual Information Systems

Faculty of Informatics
Spring 2000
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Prerequisites
I005 Formal Languages and Automata I
Students are advised to bring some basic knowledge of automata theory (I005) and natural language processing (I030|I047). Some database basics (P002) are helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Signature methods.
  • Google as an example of a search and indexing engine. Query languages and data structures for searching and indexing.
  • Data compression. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349 pp. ISBN 0471143383.
  • WITTEN, Ian H., Alistair MOFFAT and Timothy C. BELL. Managing gigabytes: compressing and indexing documents and images. New York: Van Nostrand Reinhold, 1994, xiv, 429 pp. ISBN 0-442-01863-0.
Assessment methods
Teaching follows the classical lecture format and concludes with a written test (sample tests from previous years are posted at the course URL). Seminars are used to practise the lecture material and to work on a team project.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1996, Spring 1997, Spring 1998, Spring 1999, Spring 2001, Spring 2002.

P030 Textual Information Systems

Faculty of Informatics
Spring 1999
Extent and Intensity
2/1. 3 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Prerequisites
I005 Formal Languages and Automata I && P002 Introduction to Database Systems && I030 Introduction to Computer Linguistics
Students must have completed the courses I005 Formal Languages and Automata I, P002, and I030 Introduction to Computational Linguistics.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter. Theory of automata for searching.
  • Indexes. Indexing methods. Signature methods.
  • Languages for searching.
  • Data compression. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1996, Spring 1997, Spring 1998, Spring 2000, Spring 2001, Spring 2002.

P030 Textual Information Systems

Faculty of Informatics
Spring 1998
Extent and Intensity
2/1. 3 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Prerequisites
I005 Formal Languages and Automata I && P002 Introduction to Database Systems && I030 Introduction to Computer Linguistics
Students must have completed the courses I005 Formal Languages and Automata I, P002, and I030 Introduction to Computational Linguistics.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter. Theory of automata for searching.
  • Indexes. Indexing methods. Signature methods.
  • Languages for searching.
  • Data compression. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking.
Language of instruction
Czech
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1996, Spring 1997, Spring 1999, Spring 2000, Spring 2001, Spring 2002.

P030 Textual Information Systems

Faculty of Informatics
Spring 1997
Extent and Intensity
2/1. 3 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Prerequisites
Students must have completed the courses I005 Formal Languages and Automata I, P002, and I030 Introduction to Computational Linguistics.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter. Theory of automata for searching.
  • Indexes. Indexing methods. Signature methods.
  • Languages for searching.
  • Data compression. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking.
Language of instruction
Czech
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1996, Spring 1998, Spring 1999, Spring 2000, Spring 2001, Spring 2002.

P030 Textual Information Systems

Faculty of Informatics
Spring 1996
Extent and Intensity
0/0. 3 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Prerequisites
Students must have completed the courses I005 Formal Languages and Automata I, P002, and I030 Introduction to Computational Linguistics.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter. Theory of automata for searching.
  • Indexes. Indexing methods. Signature methods.
  • Languages for searching.
  • Data compression. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking.
Language of instruction
Czech
Teacher's information
http://www.fi.muni.cz/~sojka/tis/
The course is also listed under the following terms Spring 1997, Spring 1998, Spring 1999, Spring 2000, Spring 2001, Spring 2002.