IV107 Bioinformatics I, Lecture 4

IV107 Bioinformatics I
Lecture 5
Department of Information Technologies, Masaryk University, Brno
Spring 2019

Previous week
► Gene structure
  ► prokaryotic
  ► eukaryotic
► Sequence alignment
  ► global (Needleman-Wunsch)
  ► semi-global
  ► local (Smith-Waterman)

Types of data in databases
[Figure: pie chart of database categories; only the labels 9% and 8% survive. Legend: nucleotide sequence; RNA sequence/structure; microarray/gene expression; molecular biology; nonhuman genomes; human/vertebrate genomes; human genes/diseases; protein sequences; proteomics data; structural data; pathways/interactions; organelle data; plant data; immunological data]
http://www.agr.kuleuven.ac.be/vakk

Growth of the GenBank database
GenBank Genetic Sequence Data Bank, August 2009
NCBI-GenBank Flat File Release 164.0
National Center for Biotechnology Information
► 106 533 156 756 bp
► 108 431 692 sequences
http://www.ncbi.nlm.nih.gov/genbank/

GenBank
NCBI-GenBank Flat File Release 232.0, June 15 2019
Distribution Release Notes
► 329 835 282 370 bp
► 213 383 758 sequences
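The two release figures above can be compared directly. The following is a minimal sketch of that arithmetic, using only the numbers on the slides. The doubling-time estimate assumes constant exponential growth and an interval of about 9.8 years between the two releases; both are assumptions made for illustration, not statements from the release notes.

```python
import math

# Figures copied from the two GenBank release slides above.
RELEASES = {
    "164.0": {"bases": 106_533_156_756, "sequences": 108_431_692},
    "232.0": {"bases": 329_835_282_370, "sequences": 213_383_758},
}

# Assumption: roughly 9.8 years separate August 2009 and June 2019.
YEARS_BETWEEN = 9.8


def growth_factor(old: int, new: int) -> float:
    """How many times larger the newer value is than the older one."""
    return new / old


def doubling_time(old: int, new: int, years: float) -> float:
    """Doubling time in years, assuming constant exponential growth."""
    return years * math.log(2) / math.log(new / old)


def mean_length(release: dict) -> float:
    """Average number of base pairs per sequence record."""
    return release["bases"] / release["sequences"]


old, new = RELEASES["164.0"], RELEASES["232.0"]
base_growth = growth_factor(old["bases"], new["bases"])
seq_growth = growth_factor(old["sequences"], new["sequences"])
base_doubling = doubling_time(old["bases"], new["bases"], YEARS_BETWEEN)

print(f"bases grew {base_growth:.2f}x, sequences grew {seq_growth:.2f}x")
print(f"estimated doubling time for bases: {base_doubling:.1f} years")
print(f"mean record length: {mean_length(old):.0f} bp -> {mean_length(new):.0f} bp")
```

The number of bases roughly tripled while the number of records roughly doubled, so the average record became longer between the two releases.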
ftp://ftp.ncbi.nlm.nih.gov/genbank/

Components of the GenBank database
► INV, VRT, MAM, PLN, PRI, ROD, BCT, VRL
► PAT (Patents)
► HTGS (High Throughput Genomic Sequences)
► GSS (Genome Survey Sequences)
► EST (Expressed Sequence Tags)
► STS (Sequence Tagged Sites)
► WGS (Whole Genome Shotgun)

Example of a GenBank record
  LOCUS       SCU49845     5028 bp    DNA
  DEFINITION  Saccharomyces cerevisiae TCP1-beta gene,
              Axl2p (AXL2) and Rev7p (REV7) genes, complete
  ACCESSION   U49845
  VERSION     U49845.1  GI:1293613
  KEYWORDS
  SOURCE      Saccharomyces cerevisiae (baker's yeast)
    ORGANISM  Saccharomyces cerevisiae
              Eukaryota; Fungi; Ascomycota; Saccharomy
              Saccharomycetes; Saccharomycetales; Saccharomycetaceae; S

Searching sequence databases
► text-based (keywords)
► sequence-based (BLAST)

UniProt
September 2019, UniProtKB release 2019_08
The UniProt consortium: European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) and Protein Information Resource (PIR)
► 560,823 (Swiss-Prot)
► 171,501,488 (TrEMBL)
► 37,597,356 (UniRef50)
Release 2019_08 of 18-Sep-19 of UniProtKB/Swiss-Prot contains 560823 sequence entries, comprising 201585439 amino acids abstracted from 268349 references.
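The UniProt counts above lend themselves to two quick derived figures. This is a small sketch using only the slide's numbers. It relies on the background fact that UniProtKB consists of the manually reviewed Swiss-Prot section and the unreviewed TrEMBL section; the variable names are made up for this example.

```python
# Counts copied from the UniProt release 2019_08 slide above.
SWISS_PROT_ENTRIES = 560_823
SWISS_PROT_RESIDUES = 201_585_439
TREMBL_ENTRIES = 171_501_488

# Average length of a Swiss-Prot sequence, in amino acids.
mean_protein_length = SWISS_PROT_RESIDUES / SWISS_PROT_ENTRIES

# Share of UniProtKB entries that belong to the reviewed section.
reviewed_fraction = SWISS_PROT_ENTRIES / (SWISS_PROT_ENTRIES + TREMBL_ENTRIES)

print(f"mean Swiss-Prot length: {mean_protein_length:.1f} aa")
print(f"reviewed share of UniProtKB: {reviewed_fraction:.2%}")
```

The result shows how small the curated part is: well under one percent of UniProtKB entries were in Swiss-Prot in this release.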
http://expasy.org/sprot/

Example of a UniProt record
[Screenshot of a UniProtKB/Swiss-Prot entry; the fields below are those still legible in the extracted text]
  Entry name: LMO7_HUMAN
  Primary accession number: Q8WWI1
  Secondary accession numbers: partly illegible (Q9UKC1 is among them)
  Annotations last modified: July 25, 2006 (entry version 39)
  Protein name: LIM domain only protein 7
  Gene name synonyms: FBX20, FBXO20, KIAA0858
  Organism: Homo sapiens (Human), taxonomy ID 9606
  Lineage (legible part): Vertebrata; Mammalia; Primates; Hominidae; Homo
  Reference [1]: nucleotide sequence [mRNA] and tissue specificity; tissue: brain and peripheral blood leukocyte. Rozenblum et al., "A genomic map of a 6-Mb region at 13q21-q22 implicated in cancer development: identification and characterization of candidate genes." Hum. Genet. 110:111-121 (2002).
http://www.uniprot.org/

Example of a UniProt record
  Key      From   To     Length  Description                  FTId
  CHAIN       1   1683   1683    LIM domain only protein 7.   PRO_0000075824
  DOMAIN     54    168    115    CH.
  DOMAIN   1042   1128     87    PDZ.
  DOMAIN   1612   1678     67    LIM zinc-binding.

          10         20         30         40         50         60
  MKKIRICHIF TFYSWMSYDV LFQRTELGAL EIWRQLICAH VCICVGWLYL RDRVCSKKDI
          70         80         90        100        110        120
  ILRTEQNSGR TILIKAVTEK NFETKDFRAS LENGVLLCDL INKLKPGVIK KINRLSTPIA
         130        140        150        160        170        180
  GLDNINVFLK ACEQIGLKEA QLFHPGDLQD LSNRVTVKQE ETDRRVKNVL ITLYWLGRKA

PDB
[Screenshot: RCSB Protein Data Bank ("An Information Portal to Biological Macromolecular Structures"), search results listing NMR solution structures of LIM domains; the entry details are not legible in the extracted text]

PDB record
  HELIX  1 A  ARG   5  HIS  15  1                          1HEW  75
  HELIX  2 B  LEU  25  GLU  35  1                          1HEW  76
  HELIX  3 C  CYS  80  LEU  84  5                          1HEW  77
  HELIX  4 D  THR  89  ILE  98  1                          1HEW  78
  HELIX  5 E  VAL 109  ASN 113  1                          1HEW  79
  SHEET  1 S1 2  LYS   1  PHE   3  0                       1HEW  80
  SHEET  2 S1 2  PHE  38  THR  40 -1  N THR  40  O LYS   1 1HEW  81
  SHEET  1 S2 3  ALA  42  ASN  46  0                       1HEW  82
  SHEET  2 S2 3  SER  50  GLY  54 -1  O SER  50  N ASN  46 1HEW  83
  SHEET  3 S2 3  GLN  57  SER  60 -1  O ILE  58  N TYR  53 1HEW  84
  TURN   1 T1  MET  12  HIS  15  TYPE III                  1HEW  85
  TURN   2 T2  LYS  13  GLY  16  TYPE I                    1HEW  86
  TURN   3 T3  LEU  17  TYR  20  TYPE II                   1HEW  87
  TURN   4 T4  ASN  19  GLY  22  DISTORTED TYPE II         1HEW  88
  TURN   5 T5  TYR  20  TYR  23  TYPE I'                   1HEW  89
  TURN   6 T6  SER  24  ASN  27  TYPE III                  1HEW  90
  TURN   7 T7  LEU  25  TRP  28  TYPE III                  1HEW  91
  TURN   8 T8  SER  36  ASN  39  TYPE III'                 1HEW  92

PDB record
  CRYST1   78.860   78.860   38.250  90.00  90.00  90.00 P 43 21 2    8   1HEW 113
  ORIGX1   1.000000  0.000000  0.000000   0.00000                     1HEW 114
  ORIGX2   0.000000  1.000000  0.000000   0.00000                     1HEW 115
  ORIGX3   0.000000  0.000000  1.000000   0.00000                     1HEW 116
  SCALE1   0.012681  0.000000  0.000000   0.00000                     1HEW 117
  SCALE2   0.000000  0.012681  0.000000   0.00000                     1HEW 118
  SCALE3   0.000000  0.000000  0.026144   0.00000                     1HEW 119
  ATOM    1  N   LYS   1    3.398   9.981  10.408  1.00 30.48         1HEW 120
  ATOM    2  CA  LYS   1    2.459  10.365   9.364  1.00 28.03         1HEW 121
  ATOM    3  C   LYS   1    2.458  11.880   9.149  1.00 21.93         1HEW 122
  ATOM    4  O   LYS   1    2.481  12.672  10.100  1.00 14.10         1HEW 123
  ATOM    5  CB  LYS   1    1.026   9.935   9.695  1.00 30.54         1HEW 124
  ATOM    6  CG  LYS   1    0.028  10.169   8.558  1.00 37.93         1HEW 125
  ATOM    7  CD  LYS   1   -1.415  10.089   9.048  1.00 33.23         1HEW 126
  ATOM    8  CE  LYS   1   -2.357  10.822   8.082  1.00 32.17         1HEW 127
  ATOM    9  NZ  LYS   1   -3.661  10.090   8.025  1.00 31.92         1HEW 128
  ATOM   10  N   VAL   2    2.429  12.232   7.880  1.00 17.30         1HEW 129
  ATOM   11  CA  VAL   2    2.395  13.653   7.465  1.00 14.47         1HEW 130
  ATOM   12  C   VAL   2    0.977  13.868   6.903  1.00 17.58         1HEW 131
  ATOM   13  O   VAL   2    0.642  13.368   5.826  1.00 32.65         1HEW 132
  ATOM   14  CB  VAL   2    3.533  14.012   6.536  1.00 22.88         1HEW 133
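ATOM records are fixed-width: each field sits at a set column position, so they are read by slicing, not by splitting on whitespace. The sketch below reads the first two ATOM records of 1HEW and computes the N-CA distance in LYS 1. The extracted slide text lost its column alignment, so the sample lines are rebuilt from the slide values by `format_atom`, a hypothetical helper written for this example. The column positions follow the standard PDB ATOM layout, and the chain ID column is left blank as on the slide.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def format_atom(serial, name, res_name, res_seq, x, y, z, occupancy, b_factor):
    """Hypothetical helper: lay values out in PDB fixed-width ATOM columns."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3}  {res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
    )


def parse_atom(line: str) -> Atom:
    """Read one ATOM record by column position (1-based columns in comments)."""
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        b_factor=float(line[60:66]),     # columns 61-66
    )


# Values taken from ATOM records 1 and 2 on the slide above.
lines = [
    format_atom(1, " N  ", "LYS", 1, 3.398, 9.981, 10.408, 1.00, 30.48),
    format_atom(2, " CA ", "LYS", 1, 2.459, 10.365, 9.364, 1.00, 28.03),
]

n_atom, ca_atom = (parse_atom(line) for line in lines)

# Euclidean distance between the two atoms, in angstroms.
bond_length = math.dist(
    (n_atom.x, n_atom.y, n_atom.z), (ca_atom.x, ca_atom.y, ca_atom.z)
)
print(f"{n_atom.name}-{ca_atom.name} distance in {n_atom.res_name} "
      f"{n_atom.res_seq}: {bond_length:.2f} A")
```

The computed distance is close to 1.46 Å, the usual length of an N-CA bond, which is a quick check that the coordinates were read from the right columns.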
Gene Ontology
► The functions of genes and proteins are determined experimentally
► Free-text descriptions are ambiguous
  ► protein synthesis
  ► polypeptide synthesis
  ► translation
  ► ribosome activity
► An ontology is a way of bringing a system into the terms in use

Gene Ontology
[Figure: fragment of the Gene Ontology graph. Terms shown: biological process, physiological process, cellular process, cellular physiological process, cell cycle, cell division, M phase, meiotic cell cycle, cytokinesis, M phase of meiotic cell cycle. Edges are labeled is_a and part_of.]

Gene Ontology
► Molecular function
  ► catalytic activity
  ► transport
  ► intermolecular binding
► Biological process
  ► signal transduction
  ► activation of the immune system
  ► gene regulation
► Cellular component
  ► cell nucleus
  ► plasma membrane

Gene Ontology - evidence codes
► Curator-assigned Evidence Codes
  ► Experimental Evidence Codes
    ► IDA: Inferred from Direct Assay
    ► IPI: Inferred from Physical Interaction
    ► IMP: Inferred from Mutant Phenotype
    ► IGI: Inferred from Genetic Interaction
    ► IEP: Inferred from Expression Pattern
  ► Computational Analysis Evidence Codes
    ► ISS: Inferred from Sequence or Structural Similarity
    ► IGC: Inferred from Genomic Context
    ► RCA: Inferred from Reviewed Computational Analysis
  ► Author Statement Evidence Codes
    ► TAS: Traceable Author Statement
    ► NAS: Non-traceable Author Statement
  ► Curator Statement Evidence Codes
    ► IC: Inferred by Curator
    ► ND: No biological Data available
► Automatically-assigned Evidence Codes
  ► IEA: Inferred from Electronic Annotation
► Obsolete Evidence Codes

Metabolic pathways
[Figure: pathway diagram with a legend of interaction/association types (stimulatory interaction, inhibitory interaction, transcriptional activation) and signalling modules: IRF3 module, TLR (Toll-like receptor) module, IFN-α/β module, IFN-γ module, chemokine module, STAT1 module]
http://www.genome.jp/kegg/

UCSC Genome Browser
[Screenshot: UCSC Genome Browser on the Human Mar. 2006 assembly (hg18), position chr5:70,256,524-70,284,592 (28,069 bp, band q13.2). Tracks shown: STS Markers on Genetic and Radiation Hybrid Maps; UCSC Known Genes based on UniProt, RefSeq, and GenBank mRNA (SMN1 and SMN2 visible); RefSeq Genes; Human mRNAs from GenBank; Human ESTs That Have Been Spliced; Vertebrate Multiz Alignment & Conservation (17 species, including mouse, rat, rabbit, dog, armadillo, elephant, opossum)]
http://genome.ucsc.edu/

Ensembl Genome Browser
[Screenshot: Ensembl genome browser view of a chromosome 9 region. Tracks shown: DNA (contigs), markers (D9S... STS markers), Ensembl genes, Vega Havana genes, ncRNA genes, EST genes, gene legend. Labeled genes include MTAP, C9orf53, CDKN2A and DMRTA1.]

GBrowse
[Screenshot: GBrowse view of the Pto DC3000 region near PSPTO_1375 (about 1507k to 1518k). Tracks shown: all genes with links to pseudomonas.com (hopN1, hopAA1-1, hrpW1, PSPTO_1371, shcM, hopM1, shcE, avrE1); all proteins with links to NCBI (type III effectors HopN1, HopM1, HopAA1-1 and AvrE1, type III helper protein HrpW1, type III chaperones ShcM and ShcE, conserved effector locus protein); putative orthologs in Pseudomonas aeruginosa PAO1, Pseudomonas aeruginosa PA14, Pseudomonas fluorescens Pf-5, Pseudomonas putida, Pseudomonas syringae B728a, Pseudomonas syringae pv. phaseolicola and Pseudomonas entomophila L48; all COGs with links to the NCBI COG database]

Argo
[Screenshot: Argo genome browser window showing a nucleotide sequence view; the content is not legible in the extracted text]

DecodeMe Browser

Golden Helix Genome Browser
[Screenshot: Golden Helix genome browser showing a plot of Corr/Trend -log10 P-values from an association test (additive model) along the chromosomes, above a known gene annotation track]
IGB
[Screenshot: Integrated Genome Browser 5.5 showing chromosome 1 of Arabidopsis thaliana (TAIR8)]

JGI Browser
[Screenshot: JGI genome browser at position scaffold_1:1-100000 (size 100000). Tracks shown: Base Position; GC Content; Contigs in Scaffolds; GeneCatalog ("39161 transcripts in catalog per Fri Jan 30 17:18:22 2009, 750 manually curated"); User Models; EuGene models; EST-extended Fgenesh cDNA-based models; EST-extended Fgenesh homology-based models; CSUSM_unigenes Blat; day 7 post-inoculation ESTs Blat]

RIKEN Genome Browser
[Screenshot: RIKEN genome browser; the content is not legible in the extracted text]

GenoDive

Next time
Analysis of protein sequences, structural and functional data