FI:PA212 Advanced Search Techniques

PA212 Advanced Search Techniques for Large Scale Data Analytics

Faculty of Informatics
Spring 2025

Extent and Intensity

2/0/0. 2 credit(s) (plus extra credits for completion). Type of Completion: zk (examination).
In-person direct teaching

Teacher(s)

doc. RNDr. Jan Sedmidubský, Ph.D. (lecturer)
prof. Ing. Pavel Zezula, CSc. (lecturer)

Guaranteed by

doc. RNDr. Jan Sedmidubský, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics

Prerequisites

Knowledge of the basic principles of data processing is assumed.

Course Enrolment Limitations

The course is also offered to students of fields other than those with which it is directly associated.

fields of study / plans the course is directly associated with

Image Processing and Analysis (programme FI, N-VIZ)
Bioinformatics and systems biology (programme FI, N-UIZD)
Computer Games Development (programme FI, N-VIZ_A)
Computer Graphics and Visualisation (programme FI, N-VIZ_A)
Computer Networks and Communications (programme FI, N-PSKB_A)
Cybersecurity Management (programme FI, N-RSSS_A)
Discrete algorithms and models (programme FI, N-TEI)
Formal analysis of computer systems (programme FI, N-TEI)
Graphic design (programme FI, N-VIZ)
Graphic Design (programme FI, N-VIZ_A)
Hardware Systems (programme FI, N-PSKB_A)
Hardware systems (programme FI, N-PSKB)
Image Processing and Analysis (programme FI, N-VIZ_A)
Information security (programme FI, N-PSKB)
Information Security (programme FI, N-PSKB_A)
Quantum and Other Nonclassical Computational Models (programme FI, N-TEI)
Computer graphics and visualisation (programme FI, N-VIZ)
Computer Networks and Communications (programme FI, N-PSKB)
Principles of programming languages (programme FI, N-TEI)
Cybersecurity management (programme FI, N-RSSS)
Services development management (programme FI, N-RSSS)
Software Systems Development Management (programme FI, N-RSSS)
Services Development Management (programme FI, N-RSSS_A)
Software Systems Development Management (programme FI, N-RSSS_A)
Software Systems (programme FI, N-PSKB_A)
Software systems (programme FI, N-PSKB)
Machine learning and artificial intelligence (programme FI, N-UIZD)
Computer Games Development (programme FI, N-VIZ)
Processing and analysis of large-scale data (programme FI, N-UIZD)
Natural language processing (programme FI, N-UIZD)

Course objectives

The course explains the problems of information retrieval in large collections of unstructured data, such as text documents or multimedia objects. The main emphasis is on the basic principles of distributed algorithms for processing large volumes of data, e.g., Locality Sensitive Hashing, MapReduce, or PageRank. Algorithms for processing data streams are introduced as well. Students also gain practical experience by applying the presented algorithms to specific tasks.

Learning outcomes

After completing the course, students are able to:

Describe the algorithmic differences between processing offline data collections and online data streams;

Understand the basic principles of distributed algorithms for processing large volumes of data;

Evaluate the results of algorithms by several metrics;

Apply presented algorithms, such as k-Means, Locality Sensitive Hashing, MapReduce, or PageRank, to specific tasks.

Syllabus

Introduction – what is searching, things useful to know
Support for distributed processing – distributed processing, MapReduce, performance evaluation
Retrieval operators and metrics – common similarity search operators, retrieval metrics for evaluating search results
Clustering – clustering in Euclidean and non-Euclidean spaces; hierarchical, k-means, and BFR clustering algorithms
Finding frequent item sets – counting frequent items; A-Priori and PCY algorithms
Finding similar items – near-neighbor search, shingling of documents, min-hashing, Locality Sensitive Hashing
Processing data streams – sampling data from a stream, queries over sliding windows, filtering a stream
Link analysis – PageRank, topic sensitive PageRank, link spam
Search applications – advertising on the web, recommender systems

Literature

recommended literature

P, Deepak and Prasad M. DESHPANDE. Operators for similarity search : semantics, techniques and usage scenarios. Cham: Springer, 2015, xi, 115. ISBN 9783319212562.
LESKOVEC, Jurij, Anand RAJARAMAN and Jeffrey D. ULLMAN. Mining of massive datasets. 2nd ed. Cambridge: Cambridge University Press, 2014, xi, 467. ISBN 9781107077232.
BAEZA-YATES, Ricardo and Berthier RIBEIRO-NETO. Modern information retrieval : the concepts and technology behind search. 2nd ed. Harlow: Pearson, 2011, xxx, 913. ISBN 9780321416919.

Teaching methods

Lectures with slides in English. The approach combines theory, algorithms, and practical examples.

Assessment methods

The final exam is written only. Students answer several theoretical and practical questions that verify the knowledge gained in the course lectures.

Language of instruction

English

Further Comments

The course is taught annually.
The course is taught every week.

The course is also listed under the following terms: Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024.

Permalink: https://is.muni.cz/course/fi/spring2025/PA212
