FI:PV211 Information Retrieval - Course Information
PV211 Introduction to Information Retrieval
Faculty of Informatics
Spring 2018
- Extent and Intensity
- 2/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. RNDr. Petr Sojka, Ph.D. (lecturer)
RNDr. Vít Starý Novotný, Ph.D. (seminar tutor)
RNDr. Michal Růžička, Ph.D. (assistant)
- Guaranteed by
- doc. RNDr. Petr Matula, Ph.D.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
Supplier department: Department of Visual Computing – Faculty of Informatics
- Timetable
- Wed 12:00–13:50 D3, except Wed 16. 5.
- Timetable of Seminar Groups:
PV211/02: Thu 9:00–9:50 B410, except Thu 22. 2., Thu 12. 4., Thu 19. 4., and Thu 10. 5., when the seminar meets 9:00–9:50 in B311; P. Sojka, V. Starý Novotný
- Prerequisites
- Curiosity and motivation to retrieve information about information retrieval. Chapters 1–5 benefit from a basic course on algorithms and data structures. Chapters 6–7 additionally need linear algebra, vectors, and dot products. Chapters 11–13 need basic notions of probability. Chapters 18–21 require a course in linear algebra covering matrix rank, eigenvalues, and eigenvectors.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- There are 36 fields of study the course is directly associated with.
- Course objectives
- The main objectives can be summarized as follows:
- to understand the basic principles of information retrieval: document preprocessing, indexing, and querying at up to web scale (as Google does);
- to understand the principles and algorithms of NLP-based text preprocessing, semantic text filtering and classification, and web searching needed for the design of information systems and digital libraries.
- Learning outcomes
- Students who successfully complete the course will be able:
- to understand the architecture, algorithms, and basic principles of indexing and searching in (textual) information systems;
- to evaluate the design and properties of information systems;
- to understand the key algorithms (PageRank, kNN) and metrics (precision, recall, F-measure) used in the information retrieval domain;
- to gain insight into the inner workings of scalable, Google-type systems.
- Syllabus
- Boolean retrieval; The term vocabulary and postings lists
- Dictionaries and tolerant retrieval
- Index construction, Index compression
- Scoring, term weighting and the vector space model
- Computing scores in a complete search system
- Evaluation in information retrieval
- Relevance feedback and query expansion
- XML and MathML retrieval
- Text classification with vector space model
- Machine learning and information retrieval
- Hierarchical clustering
- Matrix decompositions and latent semantic indexing
- Web search basics
- Web crawling and indexes
- Link analysis, PageRank
- Invited lectures on related topics: image indexing, machine learning to rank, deep learning approaches, or even gait recognition.
- Literature
- required literature
- MANNING, Christopher D., Prabhakar RAGHAVAN and Hinrich SCHÜTZE. Introduction to information retrieval. 1st pub. Cambridge: Cambridge University Press, 2008, xxi, 482. ISBN 9780521865715.
- http://informationretrieval.org
- recommended literature
- BAEZA-YATES, Ricardo and Berthier RIBEIRO-NETO. Modern information retrieval: the concepts and technology behind search. 2nd ed. Harlow: Pearson, 2011, xxx, 913. ISBN 9780321416919.
- Teaching methods
- Student activity is explicitly welcomed and counts as part of the evaluation (10 pts). The course favours mentoring over ex cathedra lectures: "The flipped classroom is a pedagogical model in which the typical lecture and homework elements of a course are reversed." Students are expected to come prepared, having read the given materials in advance. Contact hours are devoted to topically constrained discussion (during lecture hours) or to solving examples (during exercises). This respects each student's learning speed and prior knowledge.
Questions on the PV211 discussion forum in IS are welcome, especially before lectures.
Rich study materials are available: MOOC, materials on http://web.stanford.edu/class/cs276/, including the whole IIR book http://nlp.stanford.edu/IR-book/.
These teaching methods will be complemented by invited lectures of specialists from the IR community (researchers of Seznam, Facebook, RaRe Technologies, etc.).
- Assessment methods
- Evaluation is based on a system that motivates students to work continuously during the semester and to participate actively in the course.
Classification is based on points achieved (100 pts in total). A student can get 50 pts during the term (20 pts for each of two midterm tests and 10 pts for activity in lectures, discussion forums, etc.) and 50 pts for the final test. The final written exam will consist of open exercises (30 pts, similar to the midterm ones) and multiple-choice questions (20 pts). In addition, one can get premium points for activity during lectures and exercises (good answers) or for negotiated related projects. The grading scale (adjusted according to ECTS suggestions) z/k/E/D/C/B/A corresponds approximately to 50/57/64/71/78/85/92 points.
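The point arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function names `total_points` and `highest_label` are hypothetical, the thresholds are the approximate values quoted above, and the sketch assumes that each label is reached as soon as its threshold is met, regardless of which type of completion a student registered for.

```python
# Approximate lower bounds from the grading scale (z = credit, k = colloquium).
GRADE_THRESHOLDS = [
    (92, "A"), (85, "B"), (78, "C"), (71, "D"), (64, "E"), (57, "k"), (50, "z"),
]

# Maximum points per component, as stated in the assessment description.
COMPONENT_MAXIMA = {
    "midterm1": 20,
    "midterm2": 20,
    "activity": 10,
    "final_open": 30,
    "final_choice": 20,
}


def total_points(scores, premium=0):
    """Sum the component scores plus any premium points.

    `scores` maps each component name to the points earned; every value
    is checked against the stated maximum for that component.
    """
    for name, maximum in COMPONENT_MAXIMA.items():
        value = scores[name]
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}, got {value}")
    return sum(scores[name] for name in COMPONENT_MAXIMA) + premium


def highest_label(points):
    """Return the highest label whose threshold is met, or None below 50."""
    for threshold, label in GRADE_THRESHOLDS:
        if points >= threshold:
            return label
    return None


example = {"midterm1": 15, "midterm2": 18, "activity": 10,
           "final_open": 22, "final_choice": 16}
print(total_points(example), highest_label(total_points(example)))
```

With the example scores the components sum to 81 points, which falls between the approximate thresholds for C (78) and B (85).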
At least three final exam dates will be announced via IS.muni.cz. One substitute midterm test will be held during the exam period for those officially excused at the study department.
- Language of instruction
- English
- Further comments (probably available only in Czech)
- Study Materials
The course is taught annually.
- Teacher's information
- http://www.fi.muni.cz/~sojka/PV211/
- Enrolment Statistics (Spring 2018, recent)
- Permalink: https://is.muni.cz/course/fi/spring2018/PV211