Seminar group 02 of the course Laboratory of Electronic and Multimedia Applications

[Vlastimil Martinek]: Deep learning for genomic data 11. 11. 2021


Abstract

Biological sequences have traditionally been analyzed using rule-based methods and handcrafted features. Following the success of deep learning in natural language processing (NLP), the field of genomics is now adopting many similar techniques. I will introduce various types of biological sequences, namely DNA, RNA, and proteins. We will then discuss papers that successfully applied NLP methods to these sequences, including ULMFiT and transformer architectures. Finally, I will introduce the topics we are working on in the CEITEC Bioinformatics lab.

Seminar 29. 4. 2021 10:00
2021-04-29 lecture by Vlasta Martinek

Readings

Strodthoff, N., Wagner, P., Wenzel, M., & Samek, W. (2019). UDSMProt: Universal Deep Sequence Models for Protein Classification. Cold Spring Harbor Laboratory. https://doi.org/10.1101/704874

Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118