BIOINFORMATICS

PROTEIN SEQUENCE DATABASES
UniProtKB
• SWISS-PROT: high-quality manual annotation
• TrEMBL: automatic annotation (TrEMBL -> SWISS-PROT)

DNA SEQUENCE DATABASES
• EMBL-Bank: Europe (EMBL-EBI), accessed through ENA (European Nucleotide Archive)
• GenBank: USA, search engine Entrez
• DDBJ: Japan, search engines ARSA and DBGET

STRUCTURE DATABASES
• PDB
• PDBsum: summaries and analyses
• SCOP: fold-superfamily-family
• CATH: class-architecture-topology-homology

PAIRWISE ALIGNMENT

DAGTKVSAEQIL
DAGTKECHQIL
score=5, gap=0

DAGTKVSAEQIL
DAGTKECH-QIL
score=8, gap=1

DAGTKVSAE--QIL
DAGTK---ECHQIL
score=9, gap=5

BLOSUM62
[Table: BLOSUM62 substitution matrix, lower triangle, amino acids ordered C S T A G P D E Q N H R K M I L V W Y F]

PAIRWISE DATABASE SEARCHING
"Fast local similarity algorithms"
• FastA
• BLAST

MGIKQYSQEELKEMALVEIAHELFEEHKKPVPFQELLNEIASLLGVKKEELGDRIAQFYT
DLNIDGRFLALSDQTWGLRSWYPYDQLDEETQPTVKAKKKKAKKAVEEDLDLDEFEEIDE
DDLDLDEVEEELDLEADDFDEEDLDEDDDDLEIEEDIIDEDDEDYDDEEEEIK

ttgggtatca aacaatattc acaggaagag ctaaaggaaa tggctttagt tgaaatcgct
cacgaattat ttgaagaaca taaaaaacca gttccttttc aggagctttt aaatgaaatc
gcatctttgc tcggcgtgaa aaaagaagag cttggagacc gcattgctca attttataca
gatttaaaca ttgacggccg cttcctggcg ctttctgacc agacgtgggg gcttcgcagc
tggtatcctt atgatcagct tgatgaagaa actcagccga cagtcaaggc gaaaaagaaa
aaagcgaaga aagcagtcga agaagatctt gatcttgacg agtttgaaga gatcgacgaa
gacgaccttg
atttggatga agttgaggaa gaactcgatc ttgaagccga cgattttgac
gaagaagatc ttgatgaaga cgacgatgat cttgagatcg aagaagatat tattgatgaa
gatgatgaag actatgatga tgaagaagag gaaattaaat ag

MULTIPLE SEQUENCE ALIGNMENT (MSA)
Progressive algorithms
• CLUSTAL: pairwise alignment + construction of a relatedness tree

>1SEM_1|Chains A, B|SEM-5|Caenorhabditis elegans (6239)
ETKFVQALFDFNPQESGELAFKRGDVITLINKDDPNWWEGQLNNRRGIFPSNYVCPYN
>1GL5_1|Chain A|TYROSINE-PROTEIN KINASE TEC|MUS MUSCULUS (10090)
GSEIVVAMYDFQATEAHDLRLERGQEYIILEKNDLHWWRARDKYGSEGYIPSNYVTGKKSNNLDQY
>P42682SH3
DERIQVKALYDFLPREPGNLALKRAEEYLILERCDPHWWKARDRFGNEGLIPSNYVTENRL

STRUCTURE PREDICTION FROM SEQUENCE
• Secondary structure: PSIPRED
• Fold: threading
• Tertiary structure from a homologous structure: homology modeling
• Tertiary structure from an MSA: AlphaFold2

Test sequence:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

ALPHAFOLD2
Article: Highly accurate protein structure prediction with AlphaFold
https://doi.org/10.1038/s41586-021-03819-2
Received: 11 May 2021; Accepted: 12 July 2021; Published online: 15 July 2021; Open access
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W.
Senior, Koray Kavukcuoglu, Pushmeet Kohli & Demis Hassabis
Nature | Vol 596 | 26 August 2021 | 583

SRC  VKLGQGCFGEV
HCK  KKLGAGQFGEV
ABL  HKLGGGQYGEV
LCK  ERLGAGQFGEV
SLK  GELGDGAFGKV
SBK  RELGKGTYGKV

GXGXXG loop, position +4
SRC kinase: E ··· K
SLK kinase: K ··· E

Species                  Common name      24      64     116
Homo sapiens             human            HGQEV   HGAT   QSKH
Sus scrofa               pig              HGQEV   HGNT   QSKH
Equus caballus           horse            HGQEV   HGTV   HSKH
Dugong dugon             dugong           HGLEV   HGTT   QSKH
Balaena mysticetus       bowhead whale    HGQDV   HGNT   HSRH
Physeter macrocephalus   sperm whale      HGQDI   HGVT   HSRH

Side chains: G small; E long; K short; D short; R long
Pairs: long E ··· short K; short D ··· long R

[Figure: AlphaFold2 pipeline. Input sequence -> genetic database search -> MSA (with pairing).]
[Figure: block operations. Column-wise gated self-attention; transition; outer product mean; triangle update using outgoing edges; triangle update using incoming edges; triangle self-attention around starting node; triangle self-attention around ending node; transition. Output: MSA representation.]
[Figure: structure module. Inputs: single representation (r, c) and backbone frames (r, 3x3) and (r, 3), initially all at the origin. 8 blocks (shared weights): IPA module; predict relative rotations and translations; predict χ angles and compute all atom positions.]
[Figure: plots labelled "Loss function" and "Energy".]
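
The score and gap values on the pairwise alignment slide can be checked with a few lines of Python. This is a minimal sketch, not taken from the slides. It assumes that "score" is the number of identical residues in aligned columns and that "gap" is the number of '-' characters in the two rows; an unaligned overhang at the end of the longer row is not counted as a gap. The function name `score_alignment` is hypothetical.

```python
# Minimal sketch (assumption: score = identical aligned residues,
# gap = number of '-' characters; trailing overhang is ignored).

def score_alignment(top, bottom):
    """Return (identities, gap_characters) for two aligned rows."""
    # zip() stops at the shorter row, so a trailing overhang is skipped.
    identities = sum(
        1 for a, b in zip(top, bottom) if a == b and a != "-"
    )
    # Every '-' in either row counts as one gap position.
    gaps = top.count("-") + bottom.count("-")
    return identities, gaps


# The three alignments of the same sequence pair shown on the slide.
alignments = [
    ("DAGTKVSAEQIL", "DAGTKECHQIL"),
    ("DAGTKVSAEQIL", "DAGTKECH-QIL"),
    ("DAGTKVSAE--QIL", "DAGTK---ECHQIL"),
]

for top, bottom in alignments:
    score, gap = score_alignment(top, bottom)
    print(top)
    print(bottom)
    print(f"score={score}, gap={gap}")
```

Under these assumptions the three alignments give (5, 0), (8, 1) and (9, 5), matching the slide. Identity counting alone rewards adding gaps, which is why practical aligners combine a substitution matrix such as BLOSUM62 with explicit gap penalties.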