IV107 Bioinformatics I, Lecture 5
Department of Information Technologies, Masaryk University, Brno
Spring 2011

Last week
► Gene structure
  - prokaryotic
  - eukaryotic
► Sequence comparison
  - global (Needleman-Wunsch)
  - semi-global
  - local (Smith-Waterman)

Outline
Bioinformatics databases
  - GenBank database
  - UniProt database
  - Protein Data Bank
  - Gene Ontology
  - KEGG
Analysis of protein sequences, structural and functional data

[Figure: Types of data in databases]

[Figure: Growth of the GenBank database]

GenBank
Genetic Sequence Data Bank, NCBI-GenBank Flat File Release 164.0, August 2009
National Center for Biotechnology Information
► 106,533,156,756 bp
► 108,431,692 sequences
http://www.ncbi.nlm.nih.gov/genbank/

GenBank: Whole Genome Shotgun sequences, August 2009
National Center for Biotechnology Information
► 148,165,117,763 bp
► 48,443,067 sequences
http://www.ncbi.nlm.nih.gov/genbank/

Components of the GenBank database
► INV, VRT, MAM, PLN, PRI, ROD, BCT, VRL (taxonomic divisions: invertebrates, other vertebrates, other mammals, plants, fungi and algae, primates, rodents, bacteria, viruses)
► PAT (Patents)
► HTGS (High Throughput Genomic Sequences)
► GSS (Genome Survey Sequences)
► EST (Expressed Sequence Tags)
► STS (Sequence Tagged Sites)
► WGS (Whole Genome Shotgun)

Example of a GenBank record
    LOCUS       SCU49845     5028 bp    DNA
    DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, Axl2p (AXL2) and Rev7p
                (REV7) genes, complete
    ACCESSION   U49845
    VERSION     U49845.1  GI:1293613
    KEYWORDS
    SOURCE      Saccharomyces cerevisiae (baker's yeast)
      ORGANISM  Saccharomyces cerevisiae
                Eukaryota; Fungi; Ascomycota; Saccharomy
                Saccharomycetes; Saccharomycetales; Saccharomycetaceae; S

Searching sequence databases
► text-based (keywords)
► sequence-based (BLAST)

UniProt
UniProtKB release 2011_03, March 8, 2011
The UniProt consortium: European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) and Protein Information Resource (PIR)
► 14,423,061 entries
  - 525,997 in Swiss-Prot
  - 13,897,064 in TrEMBL
► 3,785,756 UniRef50 clusters
► 4,651,472,673 amino acids
http://expasy.org/sprot/
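The GenBank record excerpt above uses a fixed-column flat-file layout: a keyword in the first 12 columns (sub-keywords such as ORGANISM are indented by two spaces), the value from column 13 on, and continuation lines that leave those 12 columns blank. The following is a minimal Python sketch that relies only on that layout. `parse_genbank_header` and `locus_length` are hypothetical helper names, repeated keywords (e.g. several REFERENCE blocks) are not handled, and for real files a maintained parser such as Biopython's `SeqIO` module is the safer choice.

```python
from typing import Dict, List, Optional


def parse_genbank_header(text: str) -> Dict[str, str]:
    """Group GenBank header lines into {keyword: value}.

    Assumes the flat-file layout: keyword in columns 1-12, value from
    column 13 on, continuation lines with blank columns 1-12. A repeated
    keyword overwrites the earlier value (a deliberate simplification).
    """
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key:                      # a new (sub)keyword starts on this line
            current = key
            fields[current] = [value] if value else []
        elif current is not None:    # continuation of the previous keyword
            fields[current].append(value)
    return {name: " ".join(parts) for name, parts in fields.items()}


def locus_length(locus_value: str) -> int:
    """Read the length from a LOCUS value such as 'SCU49845  5028 bp  DNA'."""
    tokens = locus_value.split()
    return int(tokens[tokens.index("bp") - 1])


record = """\
LOCUS       SCU49845     5028 bp    DNA
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, Axl2p (AXL2) and Rev7p
            (REV7) genes, complete
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
"""

fields = parse_genbank_header(record)
print(fields["ACCESSION"])            # U49845
print(locus_length(fields["LOCUS"]))  # 5028
```

Splitting on the column boundary rather than on whitespace keeps multi-word values such as DEFINITION intact across continuation lines; the indented ORGANISM sub-keyword is stored as a flat key rather than nested under SOURCE.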
Example of a UniProt record
    Entry name:     LMO7_HUMAN
    Accessions:     Q8WWI1, O15462, Q9UKC1, …
    Dates:          March 15, 2004; March 15, 2004 (sequence version 2); July 25, 2006 (entry version 39)
    Protein name:   LIM domain only protein 7
    Synonyms:       F-box only protein 20
    Gene name:      LMO7 (synonyms include FBXO20)
    Organism:       Homo sapiens (Human)
    Taxonomy:       …; Vertebrata; …
    Reference [1]:  NUCLEOTIDE SEQUENCE [mRNA] (ISOFORM …), TISSUE SPECIFICITY
                    Tissue: Brain, and Peripheral blood
                    "A genomic map of a 6-Mb region at 13q21-q22 implicated in cancer development:
                    identification and characterization of candidate genes."
                    Hum. Genet. 110:111-131 (2002).
http://www.uniprot.org/

Example of a UniProt record (continued)
    Key     From    To    Length  Description                  FTId
    CHAIN      1  1683      1683  LIM domain only protein 7.   PRO_0000075824
    DOMAIN    54   168       115  CH.
    DOMAIN  1042  1128        87  PDZ.
    DOMAIN  1612  1678        67  LIM zinc-binding.

    Sequence (residues 1-180 of 1683):
            10         20         30         40         50         60
    MKKIRICHIF TFYSWMSYDV LFQRTELGAL EIWRQLICAH VCICVGWLYL RDRVCSKKDI
            70         80         90        100        110        120
    ILRTEQNSGR TILIKAVTEK NFETKDFRAS LENGVLLCDL INKLKPGVIK KINRLSTPIA
           130        140        150        160        170        180
    GLDNINVFLK ACEQIGLKEA QLFHPGDLQD LSNRVTVKQE ETDRRVKNVL ITLYWLGRKA

PDB

PDB record
    HEADER    HYDROLASE(O-GLYCOSYL)                   20-JAN-92   1HEW
    COMPND    LYSOZYME (E.C.3.2.1.17) COMPLEXED WITH THE INHIBITOR
    COMPND   2 TRI-N-ACETYLCHITOTRIOSE
    SOURCE    HEN (GALLUS GALLUS) EGG WHITE
    AUTHOR    J.C.CHEETHAM,P.J.ARTYMIUK,D.C.PHILLIPS
    REVDAT   1   31-JAN-94 1HEW    0
    JRNL        AUTH   J.C.CHEETHAM,P.J.ARTYMIUK,D.C.PHILLIPS
    JRNL        TITL   REFINEMENT OF AN ENZYME COMPLEX WITH INHIBITOR
    JRNL        TITL 2 BOUND AT PARTIAL OCCUPANCY. HEN EGG-WHITE
    JRNL        TITL 3 LYSOZYME AND TRI-N-ACETYLCHITOTRIOSE AT 1.75
    JRNL        TITL 4 ANGSTROMS RESOLUTION
    JRNL        REF    J.MOL.BIOL.                   V. 224   613 1992
    JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                 070
    REMARK   1
    REMARK   1 REFERENCE 1
    REMARK   1  AUTH   L.N.JOHNSON,J.C.CHEETHAM,P.J.MC*LAUGHLIN,
    REMARK   1  AUTH 2 K.R.ACHARYA,D.BARFORD,D.C.PHILLIPS
    REMARK   1  TITL   PROTEIN-OLIGOSACCHARIDE INTERACTIONS: LYSOZYME,
    REMARK   1  TITL 2 PHOSPHORYLASE, AMYLASES
    REMARK   1  REF    CURR.TOP.MICROBIOL.IMMUNOL.  V. 139    81 1988
    REMARK   1  REFN   ASTM CTMIA3  GW ISSN 0070-217X                 761

PDB record (continued)
    REMARK     THE THREE SUGAR UNITS OF THE INHIBITOR MOLECULE ARE BOUND
    REMARK     IN THE UPPER THREE SITES (A TO C) OF THE LYSOZYME ACTIVE
    REMARK     SITE CLEFT. NAG MOLECULES, NUMBERED 203, 202, AND 201, ARE
    REMARK     BOUND IN SITES A, B, AND C, RESPECTIVELY.
    SEQRES     129  LYS VAL PHE GLY ARG CYS GLU LEU ALA ALA ALA MET LYS
    SEQRES     129  ARG HIS GLY LEU ASP ASN TYR ARG GLY TYR SER LEU GLY
    SEQRES     129  ASN TRP VAL CYS ALA ALA LYS PHE GLU SER ASN PHE ASN
    SEQRES     129  THR GLN ALA THR ASN ARG ASN THR ASP GLY SER THR ASP
    SEQRES     129  TYR GLY ILE LEU GLN ILE ASN SER ARG TRP TRP CYS ASN
    SEQRES     129  ASP GLY ARG THR PRO GLY SER ARG ASN LEU CYS ASN ILE
    SEQRES     129  PRO CYS SER ALA LEU LEU SER SER ASP ILE THR ALA SER
    SEQRES     129  VAL ASN CYS ALA LYS LYS ILE VAL SER ASP GLY ASN GLY
    SEQRES     129  MET ASN ALA TRP VAL ALA TRP ARG ASN ARG CYS LYS GLY
    SEQRES     129  THR ASP VAL GLN ALA TRP ILE ARG GLY CYS ARG LEU
    HET    NAG    201      15     N-ACETYL-D-GLUCOSAMINE
    HET    NAG    202      14     N-ACETYL-D-GLUCOSAMINE
    HET    NAG    203      14     N-ACETYL-D-GLUCOSAMINE
    FORMUL   2  NAG    3(C8 H15 N1 O6)
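The SEQRES records above spell out the whole lysozyme chain in three-letter residue codes. A short Python sketch that converts them to the one-letter sequence used in sequence databases; `seqres_to_one_letter` is a hypothetical helper, and the lookup table covers only the 20 standard amino acids (an assumption: modified residues would need extra entries). Instead of relying on exact column positions, it keeps every token that is a known residue code, which skips the record name, serial number, chain ID and residue count.

```python
from typing import Iterable

# Standard three-letter -> one-letter amino-acid codes (20 standard residues only).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(lines: Iterable[str]) -> str:
    """Concatenate the residues of all SEQRES lines into a one-letter string.

    Tokens that are not standard residue codes are skipped silently, so the
    result should be checked against the residue count in the record.
    """
    residues = []
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        residues.extend(THREE_TO_ONE[tok] for tok in line.split() if tok in THREE_TO_ONE)
    return "".join(residues)


seqres_1hew = """\
SEQRES     129  LYS VAL PHE GLY ARG CYS GLU LEU ALA ALA ALA MET LYS
SEQRES     129  ARG HIS GLY LEU ASP ASN TYR ARG GLY TYR SER LEU GLY
SEQRES     129  ASN TRP VAL CYS ALA ALA LYS PHE GLU SER ASN PHE ASN
SEQRES     129  THR GLN ALA THR ASN ARG ASN THR ASP GLY SER THR ASP
SEQRES     129  TYR GLY ILE LEU GLN ILE ASN SER ARG TRP TRP CYS ASN
SEQRES     129  ASP GLY ARG THR PRO GLY SER ARG ASN LEU CYS ASN ILE
SEQRES     129  PRO CYS SER ALA LEU LEU SER SER ASP ILE THR ALA SER
SEQRES     129  VAL ASN CYS ALA LYS LYS ILE VAL SER ASP GLY ASN GLY
SEQRES     129  MET ASN ALA TRP VAL ALA TRP ARG ASN ARG CYS LYS GLY
SEQRES     129  THR ASP VAL GLN ALA TRP ILE ARG GLY CYS ARG LEU
"""

sequence = seqres_to_one_letter(seqres_1hew.splitlines())
print(len(sequence))   # 129
print(sequence[:13])   # KVFGRCELAAAMK
```

The 129 residues produced agree with the residue count repeated on every SEQRES line, which is a cheap consistency check worth keeping when parsing real entries.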
PDB record (continued)
    HELIX    A ARG      5  HIS     15
    HELIX    B LEU     25  GLU     35
    HELIX    C CYS     80  LEU     84
    HELIX    D THR     89  ILE     98
    HELIX    E VAL    109  ASN    113
    SHEET    S1 2 LYS     1  PHE     3  0
    SHEET    S1 2 PHE    38  THR    40 -1
    SHEET    S2 3 ALA    42  ASN    46  0
    SHEET    S2 3 SER    50  GLY    54 -1
    SHEET    S2 3 GLN    57  SER    60 -1
    TURN     T1 MET    12  HIS    15     TYPE III
    TURN     T2 LYS    13  GLY    16     TYPE I
    TURN     T3 LEU    17  TYR    20     TYPE II
    TURN     T4 ASN    19  GLY    22     DISTORTED TYPE II
    TURN     T5 TYR    20  TYR    23     TYPE I'
    TURN     T6 SER    24  ASN    27     TYPE III
    TURN     T7 LEU    25  TRP    28     TYPE III
    TURN     T8 SER    36  ASN    39     TYPE III'

PDB record (continued)
    CRYST1   78.860   78.860   38.250  90.00  90.00  90.00 P 43 21 2
    ORIGX1      1.000000  0.000000  0.000000        0.00000
    ORIGX2      0.000000  1.000000  0.000000        0.00000
    ORIGX3      0.000000  0.000000  1.000000        0.00000
    SCALE1      0.012681  0.000000  0.000000        0.00000
    SCALE2      0.000000  0.012681  0.000000        0.00000
    SCALE3      0.000000  0.000000  0.026144        0.00000

    ATOM records for residues LYS 1 and VAL 2 (x, y, z in Å):
    Residue       x        y        z    Occupancy  B-factor
    LYS 1     3.398    9.981   10.408     1.00      30.48
    LYS 1     2.459   10.365    9.364     1.00      28.03
    LYS 1     2.458   11.880    9.149     1.00      21.93
    LYS 1     2.481   12.672   10.100     1.00      14.10
    LYS 1     1.026    9.935    9.695     1.00      30.54
    LYS 1     0.028   10.169    8.558     1.00      37.93
    LYS 1    -1.415   10.089    9.048     1.00      33.23
    LYS 1    -2.357   10.822    8.082     1.00      32.17
    LYS 1    -3.661   10.090    8.025     1.00      31.92
    VAL 2     2.429   12.232    7.880     1.00      17.30
    VAL 2     2.395   13.653    7.465     1.00      14.47
    VAL 2     0.977   13.868    6.903     1.00      17.58
    VAL 2     0.642   13.368    5.826     1.00      32.65
    VAL 2     3.533   14.012    6.536     1.00      22.88

Gene Ontology
► The functions of genes and proteins are determined experimentally
► Verbal descriptions are ambiguous; the same process may be described as
  - protein synthesis
  - polypeptide synthesis
  - translation
  - ribosome activity
► An ontology is a way of imposing a systematic structure on the terminology in use

Gene Ontology
[Figure: fragment of the GO graph; the terms biological process, physiological process, cellular process, cellular physiological process, cell cycle, cell division, M phase, meiotic cell cycle, M phase of meiotic cell cycle and cytokinesis are connected by is_a and part_of relations]
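The graph fragment above illustrates that GO is a directed acyclic graph with typed edges, so one term can have several parents, as the is_a and part_of labels in the figure indicate. Because an annotation to a term also holds for all of its ancestors (GO's "true path rule"), a basic operation is collecting those ancestors. A minimal breadth-first sketch in Python; the `EDGES` table is a hypothetical, simplified excerpt modelled on the terms in the figure, not data read from a GO release, and `ancestors` with its `relations` filter is an illustrative helper.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified excerpt modelled on the figure:
# child term -> [(relation, parent term), ...]. Term names only.
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "M phase of meiotic cell cycle": [("is_a", "M phase"), ("part_of", "meiotic cell cycle")],
    "M phase": [("part_of", "cell cycle")],
    "meiotic cell cycle": [("is_a", "cell cycle")],
    "cytokinesis": [("part_of", "cell division")],
    "cell division": [("is_a", "cellular physiological process")],
    "cell cycle": [("is_a", "cellular physiological process")],
    "cellular physiological process": [("is_a", "cellular process"),
                                       ("is_a", "physiological process")],
    "cellular process": [("is_a", "biological process")],
    "physiological process": [("is_a", "biological process")],
}


def ancestors(term: str,
              edges: Dict[str, List[Tuple[str, str]]],
              relations: Tuple[str, ...] = ("is_a", "part_of")) -> Set[str]:
    """Return all terms reachable upward from `term` along the given relation types."""
    found: Set[str] = set()
    queue = deque([term])
    while queue:                      # breadth-first walk towards the root
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


print(sorted(ancestors("M phase of meiotic cell cycle", EDGES)))
print(ancestors("M phase of meiotic cell cycle", EDGES, relations=("is_a",)))  # {'M phase'}
```

Restricting `relations` to is_a shows why the edge type matters: M phase of meiotic cell cycle then reaches only M phase, because in this excerpt M phase is linked upward by part_of alone.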
Gene Ontology
► Molecular function
  - catalytic activity
  - transport
  - binding between molecules
► Biological process
  - signal transduction
  - activation of the immune system
  - gene regulation
► Cellular component
  - cell nucleus
  - plasma membrane

Gene Ontology: evidence codes
Curator-assigned Evidence Codes
► Experimental Evidence Codes
  - IDA: Inferred from Direct Assay
  - IPI: Inferred from Physical Interaction
  - IMP: Inferred from Mutant Phenotype
  - IGI: Inferred from Genetic Interaction
  - IEP: Inferred from Expression Pattern
► Computational Analysis Evidence Codes
  - ISS: Inferred from Sequence or Structural Similarity
  - IGC: Inferred from Genomic Context
  - RCA: Inferred from Reviewed Computational Analysis
► Author Statement Evidence Codes
  - TAS: Traceable Author Statement
  - NAS: Non-traceable Author Statement
► Curator Statement Evidence Codes
  - IC: Inferred by Curator
  - ND: No biological Data available
► Automatically-assigned Evidence Codes
  - IEA: Inferred from Electronic Annotation
► Obsolete Evidence Codes
  - NR: Not Recorded

[Figure: Metabolic pathways]

[Screenshots: UCSC Genome Browser, Ensembl Genome Browser, GBrowse, Argo, DecodeMe Browser, JGI Browser, RIKEN Genome Browser, GenoDive]

Next time
Analysis of protein sequences, structural and functional data