👷 Introduction to Information Retrieval
doc. RNDr. Petr Sojka, Ph.D.

Dear students,

Welcome to the 2025 run of the FI:PV211 Introduction to Information Retrieval course.

The course starts with an introduction based on the textbook Introduction to Information Retrieval by Manning, Raghavan, and Schütze (hard copies are available in MU libraries), which is taught at Stanford, Munich, and other places. Among other things, you will learn how a web-scale search system can fulfill seekers' information needs at a rate of 10,000+ queries per second, answering each within milliseconds. Since 2023, the syllabus has also covered transformers and large language models.

Students will be motivated to try active/flipped learning approaches wherever possible.

The course moved from its  to IS MU in 2011. Please look if you would like to take a sneak peek at the and the topics we will discuss in the course. However, this interactive syllabus is this course's primary source of information.

Course trailer (in Czech)
A trailer for the PV211 Introduction to Information Retrieval course by Tomáš Effenberger
Second project assignment
CQADupStack Collection and the ARQMath Collection
Second project assignment (CQADupStack Collection)
Google Colaboratory code for the second project
Second project leaderboard (CQADupStack Collection)
Google Spreadsheet leaderboard for the second project
Alternative second project assignment (ARQMath Collection)
Google Colaboratory code for the alternative second project
Alternative second project leaderboard (ARQMath Collection)
Google Spreadsheet leaderboard for the alternative second project
Projects' Jupyter Hub
Dedicated computational resources for your projects

Introduction and Boolean retrieval 19. 2. 2025
The teacher recommends studying from 17. 2. 2025 to 23. 2. 2025.
The term vocabulary and postings lists, and dictionaries and tolerant retrieval 26. 2. 2025
The teacher recommends studying from 24. 2. 2025 to 2. 3. 2025.
Index construction 5. 3. 2025
The teacher recommends studying from 3. 3. 2025 to 9. 3. 2025.
Index compression, and scoring, term weighting and the vector space model 12. 3. 2025
The teacher recommends studying from 10. 3. 2025 to 16. 3. 2025.

2024-03-19: Submissions due for the first project

Computing scores in a complete search system, and evaluation in information retrieval 19. 3. 2025
The teacher recommends studying from 17. 3. 2025 to 23. 3. 2025.

2024-03-26: Peer reviews due for the first project

Anatomy of the web-scale IR system and embedding revolution 26. 3. 2025
The teacher recommends studying from 19. 3. 2025 to 30. 3. 2025.
Latent semantic representations: Introduction to LLM, matrix decompositions, LSI, and distributed word representations 2. 4. 2025
The teacher recommends studying from 27. 3. 2025 to 6. 4. 2025.
Question Answering 9. 4. 2025
The teacher recommends studying from 7. 4. 2025 to 13. 4. 2025.
Neural Information Retrieval 16. 4. 2025
The teacher recommends studying from 14. 4. 2025 to 20. 4. 2025.
Relevance feedback, query expansion, text classification (and a lot more) 23. 4. 2025
The teacher recommends studying from 21. 4. 2025 to 27. 4. 2025.
Information retrieval by question answering by large language models, and clustering 30. 4. 2025
The teacher recommends studying from 24. 4. 2025 to 4. 5. 2025.

2024-05-12: Submissions due for the second project

Web search basics 7. 5. 2025
The teacher recommends studying from 5. 5. 2025 to 11. 5. 2025.

2024-05-19: Peer reviews due for the second project

Link analysis and Web crawling 14. 5. 2025
The teacher recommends studying from 12. 5. 2025 to 18. 5. 2025.

    Here are materials from the previous runs of the course: spring 2019, spring 2020, spring 2021, spring 2022, and spring 2023.

    I will be glad if the course topics spark your interest and you decide to gain insight into them by solving [mini]projects. Activities in this direction will be rewarded with several premium points toward successful grading. The number of stars below is an estimate of project difficulty, from a mini project [(*), 10 points] to a big project [(*****), 30+ points]. I am also open to assigning or extending a project as a Bachelor/Master/Dissertation thesis.

    • (*)+ Pointing out any (factual or typographical) errors in the course materials.
    • (**)+ Preparation of Deepnote instructions, documentation, and support for solving the course projects.
    • (**)+ Preparation of hot-topic slides, production of a motivating Khan Academy-style video, or preparation of other course materials in LaTeX.
    • (**)+ Presentation or teaching video on topics relevant to the course. Possible topics: Sketch Engine, search with linguistic attributes, random walks in texts, topic search and corpora, time-constrained search, topic modeling with gensim, LDA, Wolfram Alpha, specifics of searching structured data (chemical and mathematical formulae, syntactic or dependency linguistic trees), etc.
    • (***) Participation in an IR competition at Kaggle.com.
    • (***)+ Participation in IR research in our Math Information Retrieval group, on its research agendas and the ARQMath task, the EuDML project, or the DML project.
    • (***)+ Evaluation of Math Information Retrieval in the MIaS system, possible as a Dean project or a Bachelor/Master/Dissertation thesis.

    To a pupil who was in danger, the Master said, “Those who do not make mistakes are the most mistaken of all: they do not try anything new.” Anthony de Mello
