PV030 Textual Information Systems

Faculty of Informatics
Spring 2013
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Timetable
Tue 10:00–12:50 C416, Tue 12:00–12:50 B311
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 45 fields of study the course is directly associated with.
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine. PageRank.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Teaching follows the classical format. Students are examined by written tests during the course and at its end: the final test carries 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2012
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Timetable
Thu 10:00–11:50 C511, Thu 12:00–12:50 B311, Thu 12:00–12:50 C511
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 45 fields of study the course is directly associated with.
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine. PageRank.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Teaching follows the classical format. Students are examined by written tests during the course and at its end: the final test carries 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2011
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 12:00–13:50 B411, Mon 14:00–14:50 B116, Mon 14:00–14:50 B411
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 44 fields of study the course is directly associated with.
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine. PageRank.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Teaching follows the classical format. Students are examined by written tests during the course and at its end: the final test carries 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2010
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 12:00–13:50 B204, Mon 18:00–18:50 B311, Mon 18:00–18:50 B410
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 41 fields of study the course is directly associated with.
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine. PageRank.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Teaching follows the classical format. Students are examined by written tests during the course and at its end: the final test carries 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2009
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 12:00–13:50 B204
  • Timetable of Seminar Groups:
PV030/01: Mon 16:00–16:50 B410, Mon 16:00–16:50 B311, P. Sojka
PV030/02: Mon 17:00–17:50 B311, Mon 17:00–17:50 B410, P. Sojka
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 38 fields of study the course is directly associated with.
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
Teaching follows the classical format. Students are examined by written tests during the course and at its end: the final test carries 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2008
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Wed 8:00–9:50 C511, Wed 14:00–14:50 C525, Wed 14:00–14:50 B311
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 37 fields of study the course is directly associated with.
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
Teaching follows the classical format and concludes with a written test (worth 70 % of the grade). Sample tests from previous years are posted on the course web page. The remaining 30 % of the final grade comes from written tests given in seminars during the semester. Seminars are used for practicing the lecture material and for brainstorming. During the course, students are motivated by small tasks rewarded with bonus points.
Language of instruction
English
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2007
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 12:00–13:50 A107
  • Timetable of Seminar Groups:
PV030/01: Mon 16:00–16:50 B311, Mon 16:00–16:50 B411, P. Sojka
PV030/03: Mon 18:00–18:50 B411, Mon 18:00–18:50 B311, P. Sojka
Prerequisites
! P030 Textual Information Systems
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 18 fields of study the course is directly associated with.
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
Teaching follows the classical format and concludes with a written test (worth 70 % of the grade). Sample tests from previous years are posted on the course web page. The remaining 30 % of the final grade comes from written tests given in seminars during the semester. Seminars are used for practicing the lecture material and for brainstorming. During the course, students are motivated by small tasks rewarded with bonus points.
Language of instruction
Czech
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2006
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Wed 10:00–11:50 D2, Thu 17:00–17:50 B410, Thu 17:00–17:50 B311
  • Timetable of Seminar Groups:
PV030/01: Thu 16:00–16:50 B410, Thu 16:00–16:50 B311, P. Sojka
PV030/03: Thu 18:00–18:50 B410, Thu 18:00–18:50 B311, P. Sojka
Prerequisites
! P030 Textual Information Systems
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 18 fields of study the course is directly associated with.
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: Boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Dictionary-based compression methods. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
Teaching follows the classical format and concludes with a written test (worth 70 % of the grade). Sample tests from previous years are posted on the course web page. The remaining 30 % of the final grade comes from written tests given in seminars during the semester. Seminars are used for practicing the lecture material and for brainstorming. During the course, students are motivated by small tasks rewarded with bonus points.
Language of instruction
Czech
Further comments
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2005
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Tue 10:00–11:50 D1
  • Timetable of Seminar Groups:
PV030/01: Tue 16:00–16:50 B204, Tue 16:00–16:50 B311, P. Sojka
PV030/02: Tue 17:00–17:50 B311, Tue 17:00–17:50 B204, P. Sojka
PV030/03: Tue 18:00–18:50 B204, Tue 18:00–18:50 B311, P. Sojka
Prerequisites
! P030 Textual Information Systems
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
There are 18 fields of study the course is directly associated with.
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of a textual information system that uses natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
The course is taught in the classical way and concludes with a written test (70 % of the grade). Sample tests from previous years are posted on the course website. The remaining 30 % of the final grade comes from written tests given in the seminars during the semester. The seminars are used to practise the lecture material and for brainstorming. Throughout the course, students are motivated by smaller tasks rewarded with bonus points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
Study Materials
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2004
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 12:00–13:50 D2
  • Timetable of Seminar Groups:
PV030/01: Mon 15:00–15:50 A107, Mon 15:00–15:50 B311, P. Sojka
PV030/02: Mon 16:00–16:50 A107, Mon 16:00–16:50 B311, P. Sojka
PV030/03: Mon 17:00–17:50 A107, Mon 17:00–17:50 B311, P. Sojka
Prerequisites
! P030 Textual Information Systems
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of textual information systems that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
The course is taught in the classical way and concludes with a written test (70 % of the grade). Sample tests from previous years are posted on the course website. The remaining 30 % of the final grade comes from written tests given in the seminars during the semester. The seminars are used to practise the lecture material and for brainstorming. Throughout the course, students are motivated by smaller tasks rewarded with bonus points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
Study Materials
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2003
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
RNDr. David Antoš, Ph.D. (seminar tutor)
Guaranteed by
doc. Ing. Jan Staudek, CSc.
Department of Computer Systems and Communications – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Timetable
Mon 9:00–10:50 D2
  • Timetable of Seminar Groups:
PV030/01: Mon 13:00–13:50 B204, Mon 13:00–13:50 B311, P. Sojka
PV030/02: Mon 14:00–14:50 B204, Mon 14:00–14:50 B311, D. Antoš
PV030/03: Mon 15:00–15:50 B204, Mon 15:00–15:50 B311, D. Antoš
PV030/04: Mon 16:00–16:50 B204, Mon 16:00–16:50 B311, D. Antoš
PV030/05: Mon 17:00–17:50 B204, Mon 17:00–17:50 B311, D. Antoš
Prerequisites
! P030 Textual Information Systems
Students are strongly advised to bring some basic knowledge of automata theory (IB005) and natural language processing (IB030 or IB047). Some database basics (PB154) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
Course objectives
The course teaches basic techniques and algorithms used in textual information systems: text search algorithms (KMP, AC, BM, RK, ...), data structures used for index storage, query languages, and the architecture of textual information systems that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modelling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Assessment methods
The course is taught in the classical way and concludes with a written test (70 % of the grade). Sample tests from previous years are posted on the course website. The remaining 30 % of the final grade comes from written homework assignments given during the semester. The seminars are used to practise the lecture material and for brainstorming. Throughout the course, students are motivated by smaller tasks rewarded with bonus points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is taught annually.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2019

The course is not taught in Spring 2019

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 39 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of search and indexing engine. Pagerank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Students are examined by written tests during the course and at its end. The final test accounts for 70 % of the points and the midterm test for 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is no longer offered.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2018

The course is not taught in Spring 2018

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 39 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of search and indexing engine. Pagerank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Students are examined by written tests during the course and at its end. The final test accounts for 70 % of the points and the midterm test for 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is no longer offered.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2017

The course is not taught in Spring 2017

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 39 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of search and indexing engine. Pagerank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Students are examined by written tests during the course and at its end. The final test accounts for 70 % of the points and the midterm test for 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is no longer offered.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2016

The course is not taught in Spring 2016

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 39 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of search and indexing engine. Pagerank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Students are examined by written tests during the course and at its end. The final test accounts for 70 % of the points and the midterm test for 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is no longer offered.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2015

The course is not taught in Spring 2015

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 38 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of search and indexing engine. Pagerank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistic methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Students are examined by written tests during the course and at its end. The final test accounts for 70 % of the points and the midterm test for 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is taught once every two years.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013.

PV030 Textual Information Systems

Faculty of Informatics
Spring 2014

The course is not taught in Spring 2014

Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Guaranteed by
doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Prerequisites
Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
Course Enrolment Limitations
The course is also offered to the students of the fields other than those the course is directly associated with.
fields of study / plans the course is directly associated with
there are 38 fields of study the course is directly associated with, display
Course objectives
At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with data structures used for index storage, query languages, and architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
Syllabus
  • Basic notions. TIS - text information system. Classification of information systems.
  • Searching in TIS. Searching and pattern matching classification and data structures.
  • Algorithms of Knuth-Morris-Pratt, Aho-Corasick. Boyer-Moore, Commentz-Walter, Buczilowski.
  • Theory of automata for searching. Classification of searching problems.
  • Indexes. Indexing methods. Data structures for searching and indexing.
  • Google as an example of a search and indexing engine. PageRank.
  • Signature methods.
  • Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
  • Data compression. Basic notions. Statistical methods.
  • Compression methods based on dictionary. Neural nets for text compression.
  • Syntactic methods. Context modeling.
  • Spell checking. Filtering information channels. Document classification.
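
As an illustration of the pattern-matching part of the syllabus, the following is a minimal sketch of Knuth-Morris-Pratt search in Python. It is not taken from the course materials; the function names `build_failure` and `kmp_search` are chosen here for illustration only. The failure table records, for each prefix of the pattern, the length of its longest proper prefix that is also a suffix, which lets the search continue after a mismatch without re-reading text characters.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text
    (overlapping occurrences included)."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

For example, `kmp_search("abababa", "aba")` returns `[0, 2, 4]`. Each text character is read once, so the search runs in time linear in the combined length of text and pattern.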
Literature
  • Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
  • KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
  • Information retrieval: data structures & algorithms. Edited by William B. Frakes - Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
  • Finite-state language processing. Edited by Emmanuel Roche - Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
Teaching methods
Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
Assessment methods
Teaching methods are classical. Students are examined by written tests during the course and at its end: the final test accounts for up to 70 % of the points and the midterm test for up to 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small examples rewarded with extra points.
Language of instruction
Czech
Follow-Up Courses
Further comments (probably available only in Czech)
The course is taught once every two years.
The course is taught: every week.
Teacher's information
http://www.fi.muni.cz/~sojka/PV030/