FI:PA212 Advanced Search Techniques - Course Information
PA212 Advanced Search Techniques for Large Scale Data Analytics
Faculty of Informatics
Spring 2023
- Extent and Intensity
- 2/0/0. 2 credit(s) (plus extra credits for completion). Type of Completion: zk (examination).
- Teacher(s)
- doc. RNDr. Jan Sedmidubský, Ph.D. (lecturer)
prof. Ing. Pavel Zezula, CSc. (lecturer)
- Guaranteed by
- doc. RNDr. Jan Sedmidubský, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics
- Timetable
- Thu 16. 2. to Thu 11. 5. Thu 12:00–13:50 B410
- Prerequisites
- Knowledge of the basic principles of data processing is assumed.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- Image Processing and Analysis (programme FI, N-VIZ)
- Applied Informatics (programme FI, B-AP)
- Applied Informatics (programme FI, N-AP)
- Information Technology Security (eng.) (programme FI, N-IN)
- Information Technology Security (programme FI, N-IN)
- Bioinformatics and systems biology (programme FI, N-UIZD)
- Bioinformatics (programme FI, B-AP)
- Bioinformatics (programme FI, N-AP)
- Computer Games Development (programme FI, N-VIZ_A)
- Computer Graphics and Visualisation (programme FI, N-VIZ_A)
- Computer Networks and Communications (programme FI, N-PSKB_A)
- Cybersecurity Management (programme FI, N-RSSS_A)
- Discrete algorithms and models (programme FI, N-TEI)
- Formal analysis of computer systems (programme FI, N-TEI)
- Graphic design (programme FI, N-VIZ)
- Graphic Design (programme FI, N-VIZ_A)
- Hardware Systems (programme FI, N-PSKB_A)
- Hardware systems (programme FI, N-PSKB)
- Image Processing and Analysis (programme FI, N-VIZ_A)
- Information security (programme FI, N-PSKB)
- Information Systems (programme FI, N-IN)
- Informatics with another discipline (programme FI, B-EB)
- Informatics with another discipline (programme FI, B-FY)
- Informatics with another discipline (programme FI, B-GE)
- Informatics with another discipline (programme FI, B-GK)
- Informatics with another discipline (programme FI, B-CH)
- Informatics with another discipline (programme FI, B-IO)
- Informatics with another discipline (programme FI, B-MA)
- Informatics with another discipline (programme FI, B-TV)
- Public Administration Informatics (programme FI, B-AP)
- Information Security (programme FI, N-PSKB_A)
- Quantum and Other Nonclassical Computational Models (programme FI, N-TEI)
- Mathematical Informatics (programme FI, B-IN)
- Parallel and Distributed Systems (programme FI, B-IN)
- Parallel and Distributed Systems (programme FI, N-IN)
- Computer graphics and visualisation (programme FI, N-VIZ)
- Computer Graphics and Image Processing (programme FI, B-IN)
- Computer Graphics (programme FI, N-IN)
- Computer Networks and Communication (programme FI, B-IN)
- Computer Networks and Communication (programme FI, N-IN)
- Computer Networks and Communications (programme FI, N-PSKB)
- Computer Systems and Data Processing (programme FI, B-IN)
- Computer Systems (programme FI, N-IN)
- Principles of programming languages (programme FI, N-TEI)
- Embedded Systems (eng.) (programme FI, N-IN)
- Programmable Technical Structures (programme FI, B-IN)
- Embedded Systems (programme FI, N-IN)
- Cybersecurity management (programme FI, N-RSSS)
- Services development management (programme FI, N-RSSS)
- Software Systems Development Management (programme FI, N-RSSS)
- Services Development Management (programme FI, N-RSSS_A)
- Service Science, Management and Engineering (eng.) (programme FI, N-AP)
- Service Science, Management and Engineering (programme FI, N-AP)
- Social Informatics (programme FI, B-AP)
- Software Systems Development Management (programme FI, N-RSSS_A)
- Software Systems (programme FI, N-PSKB_A)
- Software systems (programme FI, N-PSKB)
- Machine learning and artificial intelligence (programme FI, N-UIZD)
- Theoretical Informatics (programme FI, N-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-EB)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-FY)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-GK)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-MA)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-SS)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-TV)
- Upper Secondary School Teacher Training in Informatics (programme FI, N-SS) (2)
- Artificial Intelligence and Natural Language Processing (programme FI, B-IN)
- Artificial Intelligence and Natural Language Processing (programme FI, N-IN)
- Computer Games Development (programme FI, N-VIZ)
- Processing and analysis of large-scale data (programme FI, N-UIZD)
- Image Processing (programme FI, N-AP)
- Natural language processing (programme FI, N-UIZD)
- Course objectives
- The objective of the course is to explain the problems of information retrieval in large collections of unstructured data, such as text documents or multimedia objects. The main emphasis is on the basic principles of distributed algorithms for processing large volumes of data, e.g., locality-sensitive hashing, MapReduce, or PageRank. Algorithms for processing stream data are introduced as well. Students also gain practical experience by applying the presented algorithms to specific tasks.
- Learning outcomes
- After completing the course, students are able to:
- Describe the algorithmic differences between processing offline data collections and online data streams;
- Understand the basic principles of distributed algorithms for processing large volumes of data;
- Evaluate the results of algorithms by several metrics;
- Apply the presented algorithms, such as K-means, locality-sensitive hashing, MapReduce, or PageRank, to specific tasks.
- Syllabus
- Introduction – What is searching, Things useful to know
- Support for Distributed Processing – Distributed file system, MapReduce, Algorithms using MapReduce, Cost model and performance evaluation
- Retrieval Operators and Result Evaluations – Common similarity search operators, Retrieval metrics
- Clustering – K-means algorithms, Clustering in non-Euclidean spaces, Clustering for streams and parallelism
- Finding Frequent Item Sets – Handling large datasets in main memory, Counting frequent items in a stream
- Finding Similar Items – Applications of near-neighbor search, Shingling of documents, Similarity-preserving summaries of sets, Locality-sensitive hashing
- Searching in Data Streams – The stream data model, Filtering streams
- Link Analysis – PageRank, Topic-sensitive PageRank, Link spam
- Search Applications – Advertising on the web, Recommendation systems (collaborative filtering), Mining social-network graphs
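As an illustration of the Link Analysis topic in the syllabus above, the following is a minimal sketch of PageRank computed by power iteration. It is not taken from the course materials: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from dead-end pages, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(iterations):
        # Rank held by dead ends (pages without out-links) is spread evenly,
        # so the ranks keep summing to 1.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes its damped rank in equal shares along its out-links.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Toy graph (assumed for illustration): page "d" has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

On web-scale graphs the same update is usually expressed as a repeated matrix-vector multiplication and distributed, for example with MapReduce; the dictionary-based loop here only shows the update rule on a graph that fits in memory.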
- Literature
- recommended literature
- P, Deepak and Prasad M. DESHPANDE. Operators for similarity search : semantics, techniques and usage scenarios. Cham: Springer, 2015, xi, 115. ISBN 9783319212562.
- LESKOVEC, Jurij, Anand RAJARAMAN and Jeffrey D. ULLMAN. Mining of massive datasets. 2nd ed. Cambridge: Cambridge University Press, 2014, xi, 467. ISBN 9781107077232.
- BAEZA-YATES, R. and Berthier de Araújo Neto RIBEIRO. Modern information retrieval : the concepts and technology behind search. 2nd ed. Harlow: Pearson, 2011, xxx, 913. ISBN 9780321416919.
- Teaching methods
- Lectures with slides in English. The approach combines theory, algorithms and practical examples.
- Assessment methods
- The final exam consists of a written part only. Students answer several theoretical and practical questions that verify the knowledge gained in the lectures.
- Language of instruction
- English
- Further Comments
The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2023/PA212