👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization
doc. RNDr. Petr Sojka, Ph.D.

News

  • The course is a regular research seminar in the research areas named in its title. Every enrolled student must give a presentation on a research topic during the term. Presentation topics focus primarily (but not necessarily) on those of the group: machine learning, information retrieval, representation learning, and scientific visualization.
  • There is a discussion group with official course information and a communication channel, in addition to the course outline below: watch both frequently!

Topics and Course Outline

Week 1

Join us in room A502 at the Faculty of Informatics MU on September 21st at 10 AM (CET).

  1. 10:00 Class introduction, warm-up round-up discussion (expectations, topics/expertise/background, suggested presentations, and readings). Why do you go to college? Bring your research presentation offers and ideas to present, read, study, and discuss!
  2. 10:30 Principles of research communication and scientific work, reading a scientific paper. Put readers in your place!
    Specifics of CS research and doctoral studies and their evaluation at FI MU: CS conference rankings
  3. 10:40 Importance of "selling the ideas and work", picking the right topics and questions, researching "big issues", picking the right publication forums (in CS and NLP), and h-index as a measure of impact. The danger of Tyranny of metrics.
  4. 10:50 Motivating video: DEK's advice to young students.
  5. 10:55 Preparation of schedule of talks for this term, topics to cover.
  6. 11:20 Varia, socializing (team builder wanted!), lunch.

Week 2 – no meeting (state holiday)


Week 3 – Michal Štefánik: Can In-context Learners Learn a New Reasoning Concept from Demonstrations?

Join us in room A502 at FI MU on October 5th at 10 AM (CET) [or on Zoom.]

Chapter contains: 1 image, 1 study text.
Teacher recommends studying from 24/9/2023 to 7/10/2023.

Week 4 – Denisa Šrámková: Interpretability of Deep Learning

Canceled due to the speaker's illness. The talk will be presented next week together with Adam's talk. Join us in room A502 at FI MU on October 12th at 9:15 AM! (CET) [or on Zoom.]

Proteins with knotted backbones are an exceedingly rare phenomenon, and the mechanisms governing knot formation and its functional implications remain poorly understood. We fine-tuned the ProtBert-BFD Transformer to classify proteins as either knotted or unknotted solely from their primary structure. As a training set, we used a collection of proteins from selected protein families whose 3D structures were predicted by AlphaFold2. The knotted status of proteins was assigned using Topoly (a polymer topology analysis tool). While the model exhibits high accuracy (98%) in predicting a protein's knot status, it does not directly provide a biological explanation or pinpoint which regions of the protein contribute to knot formation. To address this, we propose a patching technique: a sliding window (patch) replaces part of the sequence, which tests how important that part is for knot formation. We tested this method on proteins from the SPOUT family and found that the most influential patches reside within the C-terminal portion of the knot core, which is also responsible for substrate binding.
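
The sliding-window patching idea from the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the filler residue "X", the window size, and the `toy_score` function (a stand-in for the fine-tuned classifier) are assumptions made only for this example.

```python
from typing import Callable, List, Tuple


def patch_importance(
    sequence: str,
    score: Callable[[str], float],
    window: int = 3,
    stride: int = 1,
    filler: str = "X",  # assumption: the abstract does not say what a patch contains
) -> List[Tuple[int, float]]:
    """Return (start, drop) pairs: how far the score falls when a window is patched."""
    baseline = score(sequence)
    drops = []
    for start in range(0, len(sequence) - window + 1, stride):
        # Replace one window of residues with the filler and re-score the sequence.
        patched = sequence[:start] + filler * window + sequence[start + window:]
        drops.append((start, baseline - score(patched)))
    return drops


def toy_score(seq: str) -> float:
    """Hypothetical classifier stand-in: 'knotted' iff a made-up motif is present."""
    return 1.0 if "WCW" in seq else 0.0


drops = patch_importance("AAAAAWCWAAAAA", toy_score)
influential = [start for start, drop in drops if drop > 0]
print(influential)  # window starts whose patch destroys the motif
```

Windows with the largest score drop mark the regions the classifier relies on most; with a real model, `score` would return the predicted probability of the knotted class.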

Chapter contains: 2 images, 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 5/10/2023 to 20/10/2023.

Week 5 – Adam Hájek: De-novo identification of small molecules from their GC-EI-MS spectra

Join us in room A502 at FI MU on October 19th at 10:00 AM (CET) [or on Zoom.]

Mass spectrometry is an analytical technique used to determine the mass-to-charge ratio of ions. When combined with chromatography, it becomes a powerful tool for identifying molecules in chemical samples. Typically, the analysis of experimental spectra relies on comparing them to a well-maintained database of reference data. However, a significant challenge arises because existing spectral databases don't adequately cover the vast chemical space. To address this limitation, recent attention has shifted towards machine learning-based de-novo methods. These methods can directly derive the molecular structure from the mass spectrum. In this context, we introduce a novel approach that addresses a specific use case involving GC-EI-MS spectra. This case is particularly challenging because it lacks additional information from the initial stage of MS/MS experiments, which previous methods depend on.
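
For contrast with de-novo methods, the database comparison mentioned above can be sketched as a nearest-neighbour search over reference spectra. This is only an illustration: the abstract does not name a similarity measure, so cosine similarity over binned peaks is an assumption here, and the compound names and peak values are invented toy data, not real spectra.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) peaks


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Spectrum, library: Dict[str, Spectrum]) -> Tuple[str, float]:
    """Return the name and score of the most similar reference spectrum."""
    q = bin_spectrum(query)
    scored = [(name, cosine_similarity(q, bin_spectrum(ref))) for name, ref in library.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy reference library (invented values, for illustration only).
library = {
    "compound_a": [(43.0, 100.0), (58.0, 40.0), (71.0, 15.0)],
    "compound_b": [(43.0, 20.0), (77.0, 100.0), (105.0, 60.0)],
}
query = [(43.1, 100.0), (58.2, 40.0), (71.0, 15.0)]
name, score = best_match(query, library)
print(name, round(score, 3))
```

A search like this can only return compounds that are already in the library, which is the coverage limitation that motivates de-novo methods.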

Chapter contains: 1 image, 2 videos, 1 study text.
Teacher recommends studying from 13/10/2023 to 21/10/2023.

Week 6 –

Canceled (no speaker found; we will meet in Week 14 instead).

Week 7 – Adam Hájek: De-novo identification (cont.) + Vlastimil Martinek: TBA

Join us in room A502 at FI MU on November 2nd at 10 AM (CET) [or on Zoom.]

Chapter contains: 1 image, 1 video, 1 study text.
Teacher recommends studying from 19/10/2023.

Week 8 – Dávid Meluš: Enhancing Quality of Optical Character Recognition for Financial Document Processing

Join us in room A502 at FI MU on November 9th at 10 AM (CET) [or on Zoom.]

[Dávid Meluš]: Enhancing Quality of Optical Character Recognition for Financial Document Processing
[Šárka Ščavnická]: CIVQA

Chapter contains: 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 7/11/2023 to 16/11/2023.

Week 9 – David Valecký: Transformers in Computer Vision

Join us in room A502 at FI MU on November 16th at 10 AM (CET) [or on Zoom.]

Chapter contains: 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 9/11/2023 to 30/11/2023.

Week 10 – Marek Kadlčík: TBA

Join us in room A502 at FI MU on November 23rd at 10 AM (CET) [or on Zoom.]

Large language models (LLMs) are commonly used for solving natural language tasks like question answering or generating text. However, their outputs can be outdated, factually incorrect, or untruthful. In particular, LLMs are notoriously bad at arithmetic computation. A promising way to mitigate this problem is to allow LLMs to interact with external tools, such as a calculator, a computer algebra system, or a code interpreter. In this talk, we will cover the training of calculator-using models, compare their capability of solving math word problems to vanilla LLM baselines, and discuss possible improvements in the training workflow.
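
A minimal sketch of the tool-use idea follows: the host program finds calculator calls in generated text, evaluates them exactly, and writes the result back. The `<calc>...</calc>` markup and the function names are hypothetical, not the format used in the talk, and a real system would typically pause generation at the call rather than post-process finished text.

```python
import ast
import operator
import re

# Arithmetic operators the calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expr: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


# Hypothetical call markup emitted by the model.
CALL = re.compile(r"<calc>(.*?)</calc>")


def run_calculator_calls(text: str) -> str:
    """Replace each calculator call with the expression and its computed value."""
    return CALL.sub(lambda m: f"{m.group(1)} = {safe_eval(m.group(1))}", text)


print(run_calculator_calls("Total: <calc>12*7+3</calc> apples"))
```

Walking the syntax tree instead of calling `eval` keeps the tool limited to arithmetic, so generated text cannot execute arbitrary code.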

Chapter contains: 1 image, 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 16/11/2023 to 24/11/2023.

Week 11 – Jan Rodák: TBA

Join us in room A502 at FI MU on November 30th at 10 AM (CET) [or on Zoom.]

Cybersecurity has become increasingly important in recent years. Many organisations such as large enterprises, governments, hospitals, the military and airports use computers to communicate, process data, serve customers, etc. These computers may contain sensitive data or be part of a company's critical infrastructure. Protecting and securing these machines is becoming an increasingly important issue for many companies. One approach to securing the system is an automated security audit (according to the organization's security policy). These security policies define, through a set of rules and recommendations, what a secure system should look like for specific use cases. SCAP is used to check whether the system complies with the policy. There are ideas on how to use machine learning to improve user-friendliness and simplify the work of developers.

Chapter contains: 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 14/11/2023 to 30/11/2023.

Week 12 – Andrej Kubanda: Forecasting of glycemia

Join us in room A502 at FI MU on December 7th at 10 AM (CET) [or on Zoom.]

Chapter contains: 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 30/11/2023 to 7/12/2023.

Week 13 – David Čechák: TBA

Join us in room A502 at FI MU on December 14th at 10 AM (CET) [or on Zoom.]

MicroRNAs are small non-coding RNAs that play a central role in many molecular processes, but the exact rules of their activity are not known. One of these processes is gene regulation, in which a miRNA pairs with the Ago protein and, as a pair, binds to mRNA. The common techniques used in this field are manual feature selection followed by a classical ML method. However, these methods depend heavily on a short sequence pattern called a seed. As a result, they work well on conventional binding caused by the seed but fall short in the less frequent cases of unconventional binding. We build an explainable CNN model for the binding of miRNA and a subsequence of mRNA. Subsequently, we use this model to scan the whole mRNA sequence (transcript) and produce a signal of SHAP values. This scanning method creates a signal sample for each transcript. We try to correlate the signal with the fold change in gene expression that the miRNA would cause if introduced in large quantities to the environment. We build a CNN + RNN regression to predict the fold change based on the signal. We hypothesize that using the signal could help to shield the model from overfitting on simple sequence patterns and help with cases where the conventional seed pattern is not strongly present.

Chapter contains: 1 PDF, 1 video, 1 study text.
Teacher recommends studying from 30/11/2023 to 8/12/2023.

Week 14 – [Michal Štefánik, Marek Kadlčík]: EMNLP breaking news

Join us in room A502 at FI MU on December 21st at 10 AM (CET) [or on Zoom.]

EMNLP is among the most impactful NLP conferences (A* in CORE ranking). In this presentation, we will report on our paper, which was accepted and presented in the main track. We will comment on the main research directions in empirical NLP and show you the highlights of the research that caught our attention during the event.

Chapter contains: 8 images, 1 study text.
Teacher recommends studying from 7/12/2023 to 22/12/2023.

Tips for readings, discussions, and presentation preparations:

  1. Top2Vec towardsdatascience.com/top2vec-new-way-of-topic-modelling 
  2. How to Speak by Patrick Winston (YouTube video)

Žákovi, který se hrozil chyb, Mistr řekl: "Ti, kdo nedělají chyby, chybují nejvíc ze všech – nepokoušejí se o nic nového." Anthony de Mello: O cestě

To a student who dreaded making mistakes, the Master said: "Those who make no mistakes err most of all: they do not attempt anything new." Anthony de Mello
