PA154 Language Modeling

Faculty of Informatics
Spring 2025
Extent and Intensity
2/0/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
In-person direct teaching
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Zuzana Nevěřilová, Ph.D. (assistant)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 32 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
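Efficient indexing and search, one of the application areas listed above, usually rests on an inverted index. The following Python sketch is a generic positional index (build_index and phrase_positions are hypothetical helper names, not the implementation of any particular corpus manager): it records every position of each word and answers a phrase query by intersecting posting lists shifted by each word's offset within the phrase.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each word to the ascending list of corpus positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return index


def phrase_positions(index, phrase):
    """Return the start positions of every occurrence of a multi-word phrase."""
    words = phrase.split()
    if not words:
        return []
    # Candidate start positions are the positions of the first word.
    candidates = set(index.get(words[0], []))
    # Keep a start p only if word k of the phrase occurs at position p + k.
    for offset, word in enumerate(words[1:], start=1):
        candidates &= {p - offset for p in index.get(word, [])}
    return sorted(candidates)


corpus = "the cat sat on the mat and the cat slept".split()
index = build_index(corpus)
print(phrase_positions(index, "the cat"))   # [0, 7]
```

Production systems typically also compress the posting lists and keep them on disk; the sketch keeps everything in memory for clarity.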
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
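The first syllabus topic, elements of information theory, can be illustrated in a few lines of Python. This is a minimal sketch, assuming a maximum-likelihood unigram distribution estimated from a toy token list; unigram_entropy and perplexity are hypothetical helper names. It computes the entropy H(p) = -sum over w of p(w) log2 p(w) in bits and the corresponding perplexity 2^H.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Entropy in bits of the maximum-likelihood unigram distribution of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(entropy_bits):
    """Perplexity is 2 raised to the entropy measured in bits."""
    return 2 ** entropy_bits


tokens = "a b a c a b a d".split()   # p(a)=1/2, p(b)=1/4, p(c)=p(d)=1/8
h = unigram_entropy(tokens)
print(h, perplexity(h))              # 1.75 bits, perplexity about 3.36
```

A uniform distribution over the same four words would give 2 bits and perplexity 4, the maximum for four outcomes; the skewed distribution is more predictable, so both numbers are lower.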
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
English
Further Comments
The course is taught annually.
The course is taught every week.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024.

PA154 Language Modeling

Faculty of Informatics
Spring 2024
Extent and Intensity
2/0/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Zuzana Nevěřilová, Ph.D. (assistant)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Tue 12:00–13:50 C416
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 51 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
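Smoothing, listed above, is needed because maximum-likelihood n-gram estimates give zero probability to unseen events. A minimal sketch of the simplest scheme, add-one (Laplace) smoothing of bigram probabilities, P(w | prev) = (c(prev, w) + 1) / (c(prev) + V), follows; the function name and toy sentence are illustrative assumptions, and practical models use better methods such as interpolation or discounting.

```python
from collections import Counter


def train_bigram_add_one(tokens):
    """Return an add-one smoothed bigram probability function and the vocabulary."""
    vocab = set(tokens)
    V = len(vocab)
    history_counts = Counter(tokens[:-1])             # c(prev): prev followed by some word
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # c(prev, word)

    def prob(prev, word):
        # (c(prev, word) + 1) / (c(prev) + V): every bigram gets one pseudo-count
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + V)

    return prob, vocab


prob, vocab = train_bigram_add_one("the cat sat on the mat".split())
print(prob("the", "cat"))   # seen bigram:   (1 + 1) / (2 + 5) = 2/7
print(prob("the", "sat"))   # unseen bigram: (0 + 1) / (2 + 5) = 1/7
```

Because c(prev) is counted over the same positions as the bigrams, each conditional distribution sums to exactly 1 over the vocabulary.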
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
English
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2023
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Thu 16. 2. to Thu 11. 5. Thu 14:00–15:50 C511
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 51 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
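The smoothing and EM topics above meet in linear interpolation, where a smoothed distribution is a weighted mix p'(w | h) = sum over j of lambda_j p_j(w | h) of simpler estimates (for instance uniform, unigram and bigram), and the weights lambda_j are fitted on held-out data with the Expectation-Maximization algorithm. The sketch below assumes the component probabilities have already been computed for every held-out token; em_interpolation_weights is a hypothetical name, and at least one component should be non-zero everywhere (e.g. uniform) so that the mixture never vanishes.

```python
def em_interpolation_weights(component_probs, max_iter=100, tol=1e-9):
    """Fit linear interpolation weights by EM.

    component_probs[i][j] is p_j(w_i | h_i), the probability that component j
    assigns to held-out token i. Returns weights that sum to 1.
    """
    k = len(component_probs[0])
    lambdas = [1.0 / k] * k                      # start from equal weights
    for _ in range(max_iter):
        expected = [0.0] * k
        for probs in component_probs:
            mixture = sum(l * p for l, p in zip(lambdas, probs))
            # E-step: posterior share of each component for this token
            for j in range(k):
                expected[j] += lambdas[j] * probs[j] / mixture
        # M-step: new weights are the normalised expected counts
        total = sum(expected)
        new_lambdas = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_lambdas, lambdas)) < tol
        lambdas = new_lambdas
        if converged:
            break
    return lambdas


# Toy held-out data: component 0 fits better on every token, so its weight grows.
heldout = [[0.5, 0.1]] * 10
print(em_interpolation_weights(heldout))
```

Each EM iteration cannot decrease the held-out likelihood. Fitting the weights on the training data instead would tend to push all weight onto the most specific component, which is why a separate held-out set is used.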
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
English
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2022
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Thu 17. 2. to Thu 12. 5. Thu 12:00–13:50 C416
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 51 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
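The Viterbi algorithm from the syllabus finds the most probable hidden state sequence of an HMM (for tagging, the most probable tag sequence) by dynamic programming: for each state it keeps the best-scoring path ending there plus a back-pointer to its predecessor. A minimal Python sketch follows; it works in log space to avoid underflow, and the two-tag model and its probabilities are invented for illustration, not estimated from any corpus.

```python
import math


def _log(p):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its log-probability) for an HMM."""
    # delta[s]: log-probability of the best path that ends in state s
    delta = {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] + _log(trans_p[r][s]))
            new_delta[s] = (delta[best_prev] + _log(trans_p[best_prev][s])
                            + _log(emit_p[s][obs]))
            pointers[s] = best_prev
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


states = ("N", "V")
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.7, "V": 0.3}, "V": {"N": 0.4, "V": 0.6}}
emit_p = {"N": {"time": 0.5, "flies": 0.4, "fast": 0.1},
          "V": {"time": 0.1, "flies": 0.3, "fast": 0.6}}
path, logp = viterbi(["time", "flies", "fast"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))   # ['N', 'N', 'V'] and probability about 0.01512
```

The running time is proportional to the sequence length times the square of the number of states, instead of exponential in the length as with enumerating all paths.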
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
English
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2021
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Tue 10:00–11:50 Virtual room
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 51 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
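The noisy channel model, listed above, decodes an observed output x by choosing the source w that maximises P(w) * P(x | w): a language model supplies the prior P(w), and a channel (error) model P(x | w) describes how the source gets corrupted. The sketch below applies this to spelling correction; the candidate list and all probabilities are made-up toy values, not estimates from real data.

```python
def noisy_channel_decode(observed, candidates, prior, channel):
    """Return the candidate w maximising P(w) * P(observed | w)."""
    return max(candidates,
               key=lambda w: prior.get(w, 0.0) * channel.get((observed, w), 0.0))


# Toy numbers (assumptions): P(w) from a language model, P(x | w) from an error model.
prior = {"the": 0.05, "thee": 0.0001, "then": 0.002}
channel = {("teh", "the"): 0.01, ("teh", "thee"): 0.02, ("teh", "then"): 0.005}

print(noisy_channel_decode("teh", prior.keys(), prior, channel))   # the
```

Ranking by the channel probability alone would pick "thee" (0.02 is the largest P(x | w)); it is the language-model prior that overrides it.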
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2020
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. Mgr. Pavel Rychlý, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Mon 17. 2. to Fri 15. 5. Mon 12:00–13:50 A218
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 51 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
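Transformation-based (Brill-style) tagging, mentioned in the syllabus, starts from an initial tagging (for instance each word's most frequent tag) and then applies learned rewrite rules such as "change NN to VB when the previous tag is TO". The sketch below covers a single rule template and checks the rule condition on the input tags, which is only one of several possible application strategies; apply_rule and rule_score are hypothetical names. rule_score is the quantity a greedy learner would maximise when choosing the next rule: errors fixed minus correct tags broken.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Rewrite from_tag to to_tag wherever the preceding tag is prev_tag.

    The condition is checked on the input tags, not on tags rewritten
    earlier in the same pass.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def rule_score(tags, gold, rule):
    """Errors fixed minus errors introduced when rule is applied to tags."""
    new = apply_rule(tags, *rule)
    score = 0
    for old_tag, new_tag, gold_tag in zip(tags, new, gold):
        if old_tag != new_tag:
            score += 1 if new_tag == gold_tag else -1 if old_tag == gold_tag else 0
    return score


words = ["expected", "to", "race", "tomorrow"]
initial = ["VBN", "TO", "NN", "NN"]     # e.g. a most-frequent-tag baseline
gold = ["VBN", "TO", "VB", "NN"]
rule = ("NN", "VB", "TO")               # NN -> VB if the previous tag is TO
print(list(zip(words, apply_rule(initial, *rule))))
print(rule_score(initial, gold, rule))  # 1
```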
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2019
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. RNDr. Aleš Horák, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Wed 10:00–11:50 C525
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 19 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
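Statistical alignment is classically introduced through IBM Model 1, whose word-translation probabilities t(e | f) are learned from sentence pairs alone by EM. The following is a minimal sketch, assuming no NULL word and a toy three-sentence German-English corpus (a common textbook example, not course data); in each E-step, every English word's alignment mass is split among the foreign words of its sentence in proportion to the current t(e | f).

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=20):
    """Estimate t(e | f) with EM. pairs: list of (foreign_tokens, english_tokens)."""
    english_vocab = {e for _, es in pairs for e in es}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)                 # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)                   # expected c(e, f)
        total = defaultdict(float)                   # expected c(f)
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)       # normaliser over possible links of e
                for f in fs:
                    share = t[(e, f)] / z            # E-step: posterior of link e-f
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts per foreign word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = ibm_model1(corpus)
print(round(t[("the", "das")], 3), round(t[("house", "haus")], 3))
```

Although no word alignments are given, the co-occurrence pattern across the three pairs is enough for EM to concentrate t(the | das), t(house | haus), t(book | buch) and t(a | ein) as iterations proceed.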
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2018
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. RNDr. Aleš Horák, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Mon 14:00–15:50 B411
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 19 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
Learning outcomes
At the end of the course, students will be able to: use tools containing language models; understand the related theories and algorithms; include probabilistic models in the design of text processing applications; and implement selected techniques in their own applications.
Syllabus
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
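For text categorization, a standard baseline (not necessarily the method used in the course) is multinomial Naive Bayes, which picks the class c maximising log P(c) + sum over i of log P(w_i | c). The sketch below uses add-one smoothing for P(w | c), ignores words never seen in training, and trains on four invented documents; train_nb is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns a classify(tokens) function."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    V = len(vocab)

    def log_score(tokens, label):
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)            # log prior
        for w in tokens:
            if w in vocab:                                       # skip unseen words
                score += math.log((word_counts[label][w] + 1) / (total + V))
        return score

    def classify(tokens):
        return max(class_docs, key=lambda c: log_score(tokens, c))

    return classify


training = [("cheap pills buy now".split(), "spam"),
            ("buy cheap watches".split(), "spam"),
            ("meeting agenda for monday".split(), "ham"),
            ("monday lunch meeting".split(), "ham")]
classify = train_nb(training)
print(classify("buy cheap pills".split()))          # spam
print(classify("agenda for the meeting".split()))   # ham
```

Summing log-probabilities instead of multiplying probabilities keeps long documents from underflowing to zero.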
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2017
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. RNDr. Aleš Horák, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Thu 14:00–15:50 C525
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 19 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
At the end of the course, students will not only be able to use these tools but, more importantly, will understand the related theories and algorithms, which is often key to using these tools correctly and effectively.
Syllabus
  • NLTK toolkit
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
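The MapReduce model listed above splits a job into a map function that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce function that aggregates each group. The sketch below simulates the three phases for word counting in a single Python process; in a real framework the mappers and reducers run in parallel on many machines, and the function names here are illustrative only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every token of one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: aggregate all values emitted for one key."""
    return key, sum(values)


def mapreduce_word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(mapreduce_word_count(["the cat sat", "the dog sat down"]))
```

Because each mapper sees only its own document and each reducer only its own key, both phases can be distributed without shared state; only the shuffle moves data between workers.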
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2016
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. RNDr. Aleš Horák, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Mon 10:00–11:50 C416
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 19 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
At the end of the course, students will not only be able to use these tools but, more importantly, will understand the related theories and algorithms, which is often key to using these tools correctly and effectively.
Syllabus
  • NLTK toolkit
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
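Clustering, the second half of the "Text Categorization and Clustering" topic above, is often introduced with k-means. This sketch runs k-means on tiny bag-of-words count vectors over a six-word vocabulary; the vectors and the deliberately poor starting centroids (both taken from sports documents) are assumptions chosen to show that the algorithm still separates the two topics after a few iterations.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, centroids, max_iter=20):
    """Return (cluster index of each vector, final centroids)."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        assignment = [min(range(len(centroids)),
                          key=lambda j: squared_distance(v, centroids[j]))
                      for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            new_centroids.append(mean_vector(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids


# Vocabulary (assumed): goal, match, team, recipe, oven, flour
docs = [[2, 1, 1, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [0, 0, 0, 2, 1, 1],
        [0, 0, 0, 1, 1, 2]]
assignment, _ = kmeans(docs, centroids=[docs[0], docs[1]])
print(assignment)   # [1, 1, 0, 0]
```

In practice documents are usually length-normalised (or compared by cosine similarity) before clustering, so that long documents do not dominate the distances.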
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999, xxxvii, 68. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Language Modeling

Faculty of Informatics
Spring 2015
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
doc. RNDr. Aleš Horák, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Wed 8:00–9:50 C416
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
Fields of study / plans the course is directly associated with
There are 18 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms, and tools used for processing large text corpora, both when the corpora are being built and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
At the end of the course, students will not only be able to use these tools but, more importantly, will understand the related theories and algorithms, which is often key to using these tools correctly and effectively.
Syllabus
  • NLTK toolkit
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of statistical natural language processing. Cambridge: MIT Press, 1999, xxxvii, 680. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2014
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Wed 10:00–11:50 G125
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 18 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms and tools used for processing large text corpora, both when the corpora are being created and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
By the end of the course, students will not only be able to use these tools but, more importantly, will understand the related theories and algorithms, which is often key to using these tools effectively and correctly.
Syllabus
  • NLTK toolkit
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of statistical natural language processing. Cambridge: MIT Press, 1999, xxxvii, 680. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2013
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
RNDr. Miloš Jakubíček, Ph.D. (seminar tutor)
RNDr. Vojtěch Kovář, Ph.D. (seminar tutor)
RNDr. Vít Suchomel, Ph.D. (assistant)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Tue 8:00–9:50 B411
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 25 fields of study the course is directly associated with.
Course objectives
This course aims to provide students with the state of the art in (mainly statistical) methods, algorithms and tools used for processing large text corpora, both when the corpora are being created and when they are later used for information retrieval.
These tools are used in practice in many areas of natural language processing (semi-automatic building of text corpora, morphological analysis and disambiguation, syntactic analysis, efficient indexing and search in text corpora, statistical machine translation, semantic analysis, etc.).
By the end of the course, students will not only be able to use these tools but, more importantly, will understand the related theories and algorithms, which is often key to using these tools effectively and correctly.
Syllabus
  • NLTK toolkit
  • Elements of Probability and Information Theory
  • Language Modeling in General and the Noisy Channel Model
  • Smoothing and the Expectation-Maximization algorithm
  • Markov models, Hidden Markov Models (HMMs)
  • Viterbi Algorithm
  • Tagging methods, HMM Tagging, Statistical Transformation Rule-Based Tagging
  • Statistical Alignment and Machine Translation
  • Text Categorization and Clustering
  • Graphical Models
  • Parallelization, MapReduce
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • MANNING, Christopher D. and Hinrich SCHÜTZE. Foundations of statistical natural language processing. Cambridge: MIT Press, 1999, xxxvii, 680. ISBN 0-262-13360-1.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2012
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
Timetable
Thu 14:00–15:50 G124
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 25 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2011
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Thu 10:00–11:50 C511
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 24 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2010
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Tue 13:00–14:50 B313
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 24 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Teaching methods
lectures
Assessment methods
Written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2009
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Tue 15:00–16:50 B410
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 21 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Assessment methods
Lectures, written exam.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2008
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Thu 8:00–9:50 B410
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 21 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2007
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Wed 18:00–19:50 B411
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 9 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2006, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2006
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Thu 10:00–11:50 B411
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
There are 9 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2005, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2005
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
Guaranteed by
prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. Mgr. Pavel Rychlý, Ph.D.
Timetable
Tue 18:00–19:50 B411
Course Enrolment Limitations
The course is only offered to students of the fields of study it is directly associated with.
fields of study / plans the course is directly associated with
There are 9 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2004, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2004
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
prof. PhDr. Karel Pala, CSc. (lecturer)
Guaranteed by
prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
Timetable
Tue 18:00–19:50 B204
Course Enrolment Limitations
The course is only offered to students of the fields of study it is directly associated with.
fields of study / plans the course is directly associated with
There are 8 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2003, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

PA154 Corpus Tools

Faculty of Informatics
Spring 2003
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
prof. PhDr. Karel Pala, CSc. (lecturer)
Guaranteed by
prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
Timetable
Tue 10:00–11:50 B204
Course Enrolment Limitations
The course is only offered to students of the fields of study it is directly associated with.
fields of study / plans the course is directly associated with
There are 8 fields of study the course is directly associated with.
Course objectives
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging and disambiguation. The part dealing with computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
Syllabus
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora.
  • Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP).
  • Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD).
  • Disambiguation and disambiguators (rule-based DIS, stochastic and others).
  • Parallel corpora, alignment and aligners.
  • Using corpora in computer lexicography: context, word sense disambiguation.
  • Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench.
  • Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
Literature
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000, xiv, 128.
  • Studie z korpusové lingvistiky [Studies in corpus linguistics]. 1st ed. Prague: Karolinum, 2000, 531 pp. ISBN 80-7184-893-X.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms Spring 2004, Spring 2005, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2013, Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.