👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization
[Denisa Šrámková]: Interpretability of Binary Protein Knot Classification 19. 10. 2023
Abstract
Proteins with knotted backbones are exceedingly rare, and both the mechanisms governing knot formation and the functional implications of knots remain poorly understood. We fine-tuned the ProtBert-BFD Transformer to classify proteins as knotted or unknotted solely from their primary structure. As the training set, we used proteins from selected protein families whose 3D structures were predicted by AlphaFold2. The knot status of each protein was assigned using Topoly, a polymer topology analysis tool.
While the model predicts a protein's knot status with high accuracy (98%), it does not directly provide a biological explanation or pinpoint which regions of the protein contribute to knot formation. To address this, we propose a patching technique: a sliding window (patch) replaces part of the sequence, and the resulting change in the model's prediction indicates how important that part is for knot formation. We tested this method on proteins from the SPOUT family and found that the most influential patches lie within the C-terminal portion of the knot core, which is also responsible for substrate binding.
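The patching idea can be sketched as a simple occlusion loop: slide a window along the sequence, overwrite it, re-run the classifier, and record how much the predicted probability of "knotted" drops. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the window and stride values, and masking with `"X"` (the usual code for an unknown residue) are all hypothetical choices, since the abstract does not say what the patch is filled with. `toy_predict_knotted` is a made-up stand-in for the fine-tuned ProtBert-BFD model.

```python
from typing import Callable, List, Tuple


def patch_importance(
    sequence: str,
    predict_knotted: Callable[[str], float],
    window: int = 10,
    stride: int = 5,
    fill: str = "X",
) -> List[Tuple[int, int, float]]:
    """Mask each window in turn and return (start, end, drop in P(knotted))."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    baseline = predict_knotted(sequence)  # prediction on the intact sequence
    last_start = max(len(sequence) - window, 0)
    starts = list(range(0, last_start + 1, stride))
    # Make sure the final (C-terminal) window is included even if the
    # stride does not divide the sequence length evenly.
    if starts[-1] != last_start:
        starts.append(last_start)
    scores = []
    for start in starts:
        end = min(start + window, len(sequence))
        # HYPOTHETICAL masking choice: overwrite the patch with "X" residues.
        patched = sequence[:start] + fill * (end - start) + sequence[end:]
        # A large positive drop means this region mattered for the "knotted" call.
        scores.append((start, end, baseline - predict_knotted(patched)))
    return scores


def toy_predict_knotted(seq: str) -> float:
    # HYPOTHETICAL stand-in for the fine-tuned classifier: it "detects a knot"
    # only when the motif WHW is present.
    return 0.75 if "WHW" in seq else 0.25


seq = "A" * 20 + "WHW" + "A" * 17  # motif at positions 20-22
scores = patch_importance(seq, toy_predict_knotted, window=10, stride=5)
top = max(scores, key=lambda s: s[2])
print(top)  # (15, 25, 0.5)
```

Only the windows that overlap the motif (starting at 15 and 20) lower the toy prediction, so they receive the highest scores. With a real model, one would plot these per-window drops along the sequence to locate influential regions such as the C-terminal part of the knot core.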
Slides
Lecture recording
Readings
- Maciej Sikora, , Klimentová, Agata P. Perlinska, Mai Lan Nguyen, Korpacz, , Pawel Rubach, Petr Šimeček, Joanna I. Sulkowska: Knot or Not? Sequence-Based Identification of Knotted Proteins With Machine Learning https://www.biorxiv.org/content/10.1101/2023.09.06.556468v1