👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization

Denisa Šrámková: Interpretability of Binary Protein Knot Classification (19. 10. 2023)

Abstract

Proteins with knotted backbones are exceedingly rare, and both the mechanisms governing knot formation and the functional implications of knots remain poorly understood. We fine-tuned the ProtBert-BFD Transformer to classify proteins as knotted or unknotted solely from their primary structure (amino-acid sequence). The training set was a collection of proteins from selected protein families whose 3D structures were predicted by AlphaFold2. Each protein's knot status was assigned using Topoly, a polymer topology analysis tool.

While the model predicts a protein's knot status with high accuracy (98%), it does not directly provide a biological explanation or pinpoint which regions of the protein contribute to knot formation. To identify these regions, we propose a patching technique: a sliding window (patch) replaces part of the sequence, and the resulting change in the model's prediction indicates how important that part is for knot formation. We tested this method on proteins from the SPOUT family and found that the most influential patches reside within the C-terminal portion of the knot core, which is also responsible for substrate binding.
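The patching idea can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the function names, the patch size, the stride, and the use of "X" as the replacement residue are all assumptions made for illustration, and `toy_predict_knotted` is a hypothetical stand-in for the fine-tuned classifier that simply checks for a made-up motif.

```python
from typing import Callable, List, Tuple


def patch_importance(
    sequence: str,
    predict_knotted: Callable[[str], float],
    patch_size: int = 10,   # assumed window length
    stride: int = 1,        # assumed step between windows
    fill: str = "X",        # assumed replacement residue
) -> List[Tuple[int, float]]:
    """Slide a patch over the sequence and record the drop in knotted score.

    Returns (start_index, score_drop) pairs; a large drop suggests the
    patched region matters for the model's "knotted" prediction.
    """
    baseline = predict_knotted(sequence)
    drops = []
    for start in range(0, len(sequence) - patch_size + 1, stride):
        # Overwrite one window with the fill residue, keeping the length fixed.
        patched = sequence[:start] + fill * patch_size + sequence[start + patch_size:]
        drops.append((start, baseline - predict_knotted(patched)))
    return drops


def toy_predict_knotted(seq: str) -> float:
    """Hypothetical classifier: 'knotted' only if a made-up motif is intact."""
    return 1.0 if "GTGSG" in seq else 0.0


# Usage: find the patch positions that flip the toy prediction.
drops = patch_importance("AAAAAGTGSGAAAAA", toy_predict_knotted, patch_size=3)
influential = [start for start, drop in drops if drop > 0]
```

With a real model, `predict_knotted` would wrap the classifier's forward pass and return the predicted probability of the knotted class. The positions with the largest drops could then be compared against known structural regions such as the knot core.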

Slides

Lecture recording

Readings

  1. Denisa Šrámková, Maciej Sikora, Dawid Uchal, Eva Klimentová, Agata P. Perlinska, Mai Lan Nguyen, Marta Korpacz, Roksana Malinowska, Pawel Rubach, Petr Šimeček, Joanna I. Sulkowska: Knot or Not? Sequence-Based Identification of Knotted Proteins With Machine Learning