PV211 Introduction to Information Retrieval

Faculty of Informatics
Spring 2024
Extent and Intensity
2/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Petr Sojka, Ph.D. (lecturer)
Mgr. Marek Toma (seminar tutor)
Ing. Martin Fajčík (seminar tutor)
Santosh Kesiraju, Ph.D. (seminar tutor)
Mgr. Šárka Ščavnická (seminar tutor)
Mgr. Michal Štefánik (assistant)
RNDr. Viktória Spišaková (assistant)
Mgr. Tereza Vrabcová (assistant)
Mgr. Marek Kadlčík (assistant)
Guaranteed by
doc. RNDr. Petr Sojka, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
Timetable
Wed 12:00–13:50 D2, except Wed 17. 4.; and Wed 17. 4. 12:00–13:50 B517
  • Timetable of Seminar Groups:
PV211/01: Thu 12:00–12:50 B011, M. Fajčík, S. Kesiraju, Š. Ščavnická, M. Štefánik, M. Toma
PV211/02: Thu 13:00–13:50 B011, M. Fajčík, S. Kesiraju, Š. Ščavnická, M. Štefánik, M. Toma
Prerequisites
SOUHLAS
As the main teacher will be on sabbatical in Spring 2024, this year's lectures will be partly replaced by recordings from the previous year and by invited lectures. Enrollment is limited (SOUHLAS, i.e. the teacher's consent, is required), with preference given to UMI students. The course expects curiosity and motivation to retrieve information about information retrieval. Chapters 1–5 benefit from a basic course on algorithms and data structures. Chapters 6–7 additionally need linear algebra, vectors, and dot products. Chapters 11–13 need basic notions of probability. Chapters 18–21 require a course in linear algebra, including the notions of matrix rank, eigenvalues, and eigenvectors.
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
The course is directly associated with 73 fields of study.
Course objectives
The main objectives of this course are to introduce the principles of information retrieval and to acquaint students with machine learning algorithms for NLP-based text processing.
Learning outcomes
Students will understand document preprocessing, tokenization, lemmatization, indexing, and querying at up to web scale (as Google does). The course teaches the first principles and algorithms of NLP-based text preprocessing, semantic text filtering and classification, and web searching that are needed for the design of information systems and digital libraries.
Syllabus
  • Boolean retrieval; The term vocabulary and postings lists
  • Dictionaries and tolerant retrieval
  • Index construction, Index compression
  • Scoring, term weighting, and the vector space model
  • Computing scores in a complete search system
  • Evaluation in information retrieval
  • Relevance feedback and query expansion
  • XML/MathML retrieval
  • Text classification with vector space model
  • Machine learning and information retrieval
  • Matrix decompositions and latent semantic indexing
  • Web search basics
  • Web crawling and indexes
  • Link analysis, PageRank
  • Invited lectures on hot topics, e.g. deep learning approaches to multilingual NLP and multimodal IR.
Literature
    required literature
  • MANNING, Christopher D., Prabhakar RAGHAVAN and Hinrich SCHÜTZE. Introduction to information retrieval. 1st pub. Cambridge: Cambridge University Press, 2008, xxi, 482 pp. ISBN 9780521865715.
  • http://informationretrieval.org
    recommended literature
  • BAEZA-YATES, Ricardo and Berthier RIBEIRO-NETO. Modern information retrieval: the concepts and technology behind search. 2nd ed. Harlow: Pearson, 2011, xxx, 913 pp. ISBN 9780321416919.
Teaching methods
Student activities are explicitly welcomed as a part of the evaluation. The course favors mentoring over ex-cathedra lectures: "The flipped classroom is a pedagogical model in which the typical lecture and homework elements of a course are reversed." Students are expected to come prepared, having read the assigned materials in advance. Contact hours are devoted to topically constrained discussion or to solving examples during exercises. This respects each student's learning speed and prior knowledge. Rich study materials are available: a MOOC, the materials at http://web.stanford.edu/class/cs276/, and the whole IIR book at http://nlp.stanford.edu/IR-book/.
These teaching methods may be complemented by invited lectures from specialists in the IR community (researchers from Seznam, Facebook, RaRe Technologies, etc.).
Assessment methods
Evaluation is based on a system that motivates students to work continuously during the semester and to participate actively in the course.
Grading is based on points earned (100 pts in total). A student can earn 60 pts during the term: 20 pts for each of two programming tasks, 12 pts (2 × 6) for evaluating colleagues' results, and 8 pts for activity during the term (in lectures, discussion forums, etc.). A further 40 pts can be earned in the final test (ROPOT in IS), which consists of multiple-choice questions (2 × 20 pts). Bonus points can additionally be awarded for activity during lectures and exercises (good answers) or for negotiated related projects. The grading scale (adjusted according to ECTS suggestions) z/k[/E/D/C/B/A] corresponds approximately to 50/57/[64/71/78/85/92] points.
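As an illustration only, the point arithmetic and the grading scale can be sketched in a few lines of Python. The thresholds are the approximate values stated in the grading scale; the names MAX_POINTS, exam_grade, and passes are hypothetical and are not part of IS or of any official grading tool. Bonus points are not modeled, and returning None below the E threshold is an assumption of this sketch.

```python
# Maximum points per component, as listed in the assessment description.
MAX_POINTS = {
    "programming_task_1": 20,
    "programming_task_2": 20,
    "peer_evaluation": 12,  # 2 x 6 pts
    "term_activity": 8,
    "final_test": 40,       # 2 x 20 pts
}

# Approximate thresholds from the grading scale, highest first.
EXAM_THRESHOLDS = [(92, "A"), (85, "B"), (78, "C"), (71, "D"), (64, "E")]
COLLOQUIUM_THRESHOLD = 57  # k (colloquium)
CREDIT_THRESHOLD = 50      # z (credit)


def exam_grade(points):
    """Return the exam letter grade, or None below the E threshold (assumption)."""
    for threshold, letter in EXAM_THRESHOLDS:
        if points >= threshold:
            return letter
    return None


def passes(points, completion):
    """Check whether `points` suffice for completion type 'zk', 'k', or 'z'."""
    if completion == "zk":
        return exam_grade(points) is not None
    if completion == "k":
        return points >= COLLOQUIUM_THRESHOLD
    if completion == "z":
        return points >= CREDIT_THRESHOLD
    raise ValueError(f"unknown completion type: {completion!r}")


if __name__ == "__main__":
    # Hypothetical student: two tasks, peer evaluation, activity, final test.
    total = 18 + 16 + 10 + 6 + 31
    print(total, exam_grade(total), passes(total, "k"))
```

Because the scale is stated as approximate, the actual grade boundaries may differ from what this sketch computes.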
At least three final exam dates will be announced via IS.muni.cz.
Language of instruction
English
Further comments (probably available only in Czech)
Study Materials
The course is taught annually.
Teacher's information
https://www.fi.muni.cz/~sojka/PV211/
Materials will be posted and updated in the interactive syllabus at https://is.muni.cz/auth/el/fi/jaro2024/PV211/index.qwarp.
The course is also listed under the following terms: Spring 2014, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2025.
  • Permalink: https://is.muni.cz/course/fi/spring2024/PV211