Using Internet Resources in the Study of Microorganisms
doc. RNDr. Milan Bartoš, Ph.D.
bartosm@vfu.cz
Faculty of Science, MU (Přírodovědecká fakulta MU), 2014

Lecture contents
1) Working with sequence data
2) Basic publicly available databases
3) Working with the NCBI website
4) How sequence similarity is assessed
5) The BLAST and BLAST2 search tools
6) Multiple alignment: the CLUSTAL program

Recommended literature
Cvrčková F. (2006): Úvod do praktické bioinformatiky, Academia, Praha
http://www.ncbi.nlm.nih.gov/

Working with sequence data
Sequence data = a record of the primary sequence of macromolecules, i.e. DNA (RNA) and proteins
> DNA and RNA are written in the 5'-3' direction
> Proteins are written from the N-terminus to the C-terminus
> One-letter codes are used (according to IUPAC)

Abbreviations for nucleic acids (DNA, RNA)
Code  Base
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
R     A, G (purine)
Y     C, T (pyrimidine)
S     G, C (strong)
W     A, T (weak)
K     G, T (keto)
M     A, C (amino)
B     C, G, T (not A)
D     A, G, T (not C)
H     A, C, T (not G)
V     A, C, G (not T, U)
N     any
-     gap

Abbreviations for proteins
Code  Abbr.  Amino acid      Code  Abbr.  Amino acid
A     Ala    Alanine         P     Pro    Proline
C     Cys    Cysteine        Q     Gln    Glutamine
D     Asp    Aspartate       R     Arg    Arginine
E     Glu    Glutamate       S     Ser    Serine
F     Phe    Phenylalanine   T     Thr    Threonine
G     Gly    Glycine         V     Val    Valine
H     His    Histidine       W     Trp    Tryptophan
I     Ile    Isoleucine      Y     Tyr    Tyrosine
K     Lys    Lysine          X     Xxx    any
L     Leu    Leucine         B     Asx    Asp, Asn
M     Met    Methionine      Z     Glx    Glu, Gln
N     Asn    Asparagine

Ways of writing sequences
Raw data (raw format)
> Some programs can accept and process it
> It is not suitable for long-term storage, however
Specialized formats
> The basic public databases can convert between them
Simple formats - FASTA
> Preferably without spaces and special characters

>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT

FASTA and Word
What to watch out for
> Save the file in "plain text" format
> Do not use tabs or other foreign characters
> Turn off the "AutoCorrect" and "AutoText" functions and the "smart cut and paste" function

Typeface
I recommend the "Courier New" font: every letter occupies the same width

Courier New 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA
Arial 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA

Caution: the abbreviations for nucleic acids (NA) and proteins are identical in some cases!
Input formats for computer processing must be specified so that the program can tell whether the sequence is an NA or a protein.

Molecular biology databases
> European Bioinformatics Institute (EBI), United Kingdom: EMBL, 1980, www.ebi.ac.uk
> National Center for Biotechnology Information (NCBI), established within the National Library of Medicine (NLM), USA: GenBank, 1982, www.ncbi.nlm.nih.gov
> Center for Information Biology (CIB), a department of the National Institute of Genetics (NIG), Japan: DDBJ, 1984, www.cib.nig.ac.jp

GenBank/EMBL/DDBJ
> They exchange information with one another
> They are freely available
> They accept new sequences from genome centers and from laboratories engaged in sequencing
Anyone can publish a sequence in the databases!
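
The FASTA format and the warning about overlapping one-letter codes can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names `parse_fasta` and `looks_like_nucleic_acid` are made up for this example and do not come from any library, and the type check is a simple heuristic (every symbol is an IUPAC nucleotide code), not the rule used by any particular program.

```python
# IUPAC one-letter nucleotide codes, including ambiguity codes and the gap symbol.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN-")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop stray whitespace inside the sequence line and normalize case.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_nucleic_acid(sequence):
    """Heuristic (assumption): True if every symbol is an IUPAC nucleotide code."""
    return bool(sequence) and set(sequence.upper()) <= IUPAC_NUCLEOTIDES


example = """>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT
"""

for header, seq in parse_fasta(example):
    kind = "nucleic acid" if looks_like_nucleic_acid(seq) else "protein"
    print(header[:40], len(seq), kind)
```

A short peptide such as GATTACA (Gly-Ala-Thr-Thr-Ala-Cys-Ala) also passes this check, which is exactly why the sequence type has to be stated explicitly rather than guessed.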
Protein sequence databases
SWISS-PROT database
> Founded at the University of Geneva in 1986
> Maintained by the Swiss Institute of Bioinformatics (SIB)
> www.expasy.org
> Contains automatically added translations of sequences from EMBL
PDB database (Protein Data Bank)
> Archives and analyzes protein structures and complexes of informational biomacromolecules
> http://www.rcsb.org/pdb/home/home.do

Working with the NCBI database
www.ncbi.nlm.nih.gov
[Screenshot: NCBI home page. Left menu with resource categories: All Resources, Chemicals & Bioassays, Data & Software, DNA & RNA, Domains & Structures, Genes & Expression, Genetics & Medicine, Genomes & Maps, Homology, Literature, Proteins, Sequence Analysis, Taxonomy, Training & Tutorials, Variation. Center: "Welcome to NCBI. The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information." Get Started links: Tools, Downloads, How-To's, Submissions. Right: Popular Resources (PubMed, Bookshelf, PubMed Central, PubMed Health, BLAST, Nucleotide, Genome, SNP, Gene, Protein, PubChem).]
Working with the NCBI database
[Screenshot: NCBI "Tools" page (All Resources > Tools), an alphabetical list of tools. Entries visible:]
> 1000 Genomes Browser: an interactive graphical viewer that allows users to explore variant calls, genotype calls and supporting evidence (such as aligned sequence reads) produced by the 1000 Genomes Project
> ASN.1 Format Summary: an International Standards Organization (ISO) data representation format used to achieve interoperability between platforms
> Amino Acid Explorer: explores the characteristics of amino acids by comparing their structural and chemical properties, predicting protein sequence changes caused by mutations, viewing common substitutions, and browsing the functions of given residues in conserved domains
> Assembly Archive: links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ)
> BLAST Link (BLink): a link option on protein records that displays the results of a pre-computed BLAST search of that protein against all other protein sequences at NCBI
> BLAST Microbial Genomes: performs a BLAST search for similar sequences from selected complete eukaryotic and prokaryotic genomes
> BLAST RefSeqGene: performs a BLAST search of the genomic sequences in the RefSeqGene/LRG set
> BLAST Tutorials and Guides: links to BLAST-related tutorials and guides, including directions for setting up standalone BLAST on local machines and using the BLAST URL API

Working with the NCBI database
[Screenshot: BLAST (Basic Local Alignment Search Tool) page, blastn suite for microbial genomes. "Enter Query Sequence": accession number(s), gi(s), or FASTA sequence(s), optional query subrange, file upload, job title. "Choose Search Set": database (complete or draft genomes), organism, optional Entrez query. Program selection.]
You have reached the BLAST search tool

Other interesting "Tools"
STS search
[Screenshot: NCBI "Tools" page. Entries visible:]
> (entry name cut off): an interactive tool for building E-utility URLs, either from a form or by hand, and viewing their raw output
> E-Utilities: tools that provide access to data within NCBI's Entrez system outside of the regular web query interface; each utility performs a specialized retrieval task and can be used simply by writing a specially formatted URL
> Ebot: constructs an E-utility analysis pipeline from an online form and then generates a Perl script to execute the pipeline
> Electronic PCR (e-PCR): identifies sequence tagged sites (STSs) within DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing
> Frequency-weighted Link (FLink): links from a group of records in a source database to a ranked list of associated records in a destination database, based on frequency-weighted statistics
> Gene Expression Omnibus (GEO) BLAST: aligns a query sequence (nucleotide or protein) to GenBank sequences included on microarray or SAGE platforms in the GEO database
> Gene Plot: pairwise comparison of two prokaryotic genomes that displays pairs of protein homologs that are symmetrical best hits between the two genomes
> Genetic Codes: displays the genetic codes for organisms in the Taxonomy database in tables and on a taxonomic tree
> Genome BLAST

Other interesting "Tools"
Comparison of two prokaryotic genomes
> Gene Plot (same "Tools" page)

Other interesting "Tools"
Genetic code tables
> Genetic Codes (same "Tools" page)

Other interesting "Tools"
Primer design for PCR
[Screenshot: NCBI "Tools" page. Entries visible:]
> PSSM Viewer: displays, sorts, subsets and downloads position-specific score matrices (PSSMs), either from CDD records or from Position Specific Iterated (PSI)-BLAST protein searches
> Phenotype-Genotype Integrator (PheGenI): supports finding human phenotype/genotype relationships with queries by phenotype, chromosome location, gene, and SNP identifiers
> Primer-BLAST: uses Primer3 to design PCR primers to a sequence template; the potential products are then automatically analyzed with a BLAST search against user-specified databases to check specificity to the intended target
> ProSplign: computes alignments of proteins to genomic nucleotide sequence, based on a variation of the Needleman-Wunsch global alignment algorithm that specifically accounts for introns and splice signals
> PubChem Power User Gateway (PUG): provides access to PubChem services via a programmatic interface
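
The E-Utilities entry in the tool list says that each utility "can be used simply by writing a specially formatted URL". As a rough illustration, the Python sketch below builds (but does not send) such a URL for the ESearch utility. The base address and the `db`, `term` and `retmax` parameters are those of the public E-utilities service as commonly documented; the helper name `build_esearch_url` and the example query are assumptions made for this example only.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed; check the current NCBI documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given Entrez database and query term.

    Hypothetical helper: it only formats the URL and performs no network access.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes the query, e.g. spaces become "+".
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# Example: nucleotide records matching a free-text query.
url = build_esearch_url("nucleotide", "Borrelia burgdorferi 16S")
print(url)
```

Opening the resulting address in a browser, or fetching it from a script, should return the matching record identifiers; NCBI's usage guidelines for automated access apply.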
Primer-BLAST
[Screenshot: Primer-BLAST input form, "Finding primers specific to your PCR template (using Primer3 and BLAST)". PCR Template: enter accession, gi, or FASTA sequence (a RefSeq record is preferred), optional range for the forward and reverse primer, or upload a FASTA file. Primer Parameters: use my own forward primer (5'->3' on plus strand), use my own reverse primer (5'->3' on minus strand), PCR product size (min 70, max 1000), number of primers to return (5), primer melting temperatures (min 57, opt 60.0, max 63.0), max Tm difference (3). Exon/intron selection: exon junction span, exon junction match (a RefSeq mRNA sequence as PCR template input is required for options in this section).]
Let us look at this page in detail

Design primers for identifying the 16S rRNA gene of Borrelia burgdorferi by PCR
> Enter the accession number (Acc. No.) of a 16S rRNA sequence, e.g. HQ433693.1, into the sequence input box
> Use the DEFAULT settings or change the parameters at your own discretion

Sample result
[Screenshot: Primer-BLAST results page]
Input PCR template: HQ433693.1 Borrelia burgdorferi strain QSYSP3 16S ribosomal RNA gene, partial sequence
Range: 1 - 481
Specificity of primers: primers may not be specific to the input PCR template as targets were found in selected database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, environmental samples or phase 0, 1 or 2 HTGS sequences)

Sample result
Detailed primer reports, primer pair 1
                Sequence (5'->3')     Strand  Length  Start  Stop  Tm     GC%    Self compl.  Self 3' compl.
Forward primer  GCGAAAGCCTGACGGAGCGA  Plus    20      322    341   59.77  65.00  3.00         0.00
Reverse primer  ATTACCGCGGCTGCTGGCAC  Minus   20      478    459   60.39  65.00  6.00         2.00
Product length  157

Products on intended target
>HQ433693.1 Borrelia burgdorferi strain QSYSP3 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        322  ....................  341
Reverse primer  1    ATTACCGCGGCTGCTGGCAC  20
Template        478  ....................  459

Products on potentially unintended templates
>EU135595.1 Borrelia valaisiana strain QSYSP3 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        350  ....................  369

Look up the sequence HQ433693.1 (16S rRNA, Borrelia burgdorferi) and mark the positions of the primers found on it
1) Enter "Borrelia burgdorferi 16S" into the BLAST search tool
2) Find the sequence HQ433693.1
3) You can also enter the Acc. No. directly into the search tool
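
Marking the primer positions on a template can also be sketched in code. The Python sketch below assumes a plain DNA template without ambiguity codes and exact primer matches; the function names are illustrative and are not part of Primer-BLAST. The forward primer is searched for directly, whereas the reverse primer anneals to the minus strand, so its reverse complement is searched for on the plus strand. Positions are 1-based, as in the Primer-BLAST report.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def locate_primers(template, forward, reverse):
    """Find 1-based primer positions on the plus strand.

    Returns (fwd_start, fwd_stop, rev_start, rev_stop, product_length),
    with rev_start > rev_stop because the reverse primer runs on the
    minus strand. Returns None if either primer has no exact match.
    """
    template = template.upper()
    f = template.find(forward.upper())
    # Look for the reverse primer's binding site downstream of the forward primer.
    r = template.find(reverse_complement(reverse), f if f >= 0 else 0)
    if f < 0 or r < 0:
        return None
    fwd_start, fwd_stop = f + 1, f + len(forward)
    rev_stop, rev_start = r + 1, r + len(reverse)
    return fwd_start, fwd_stop, rev_start, rev_stop, rev_start - fwd_start + 1


forward = "GCGAAAGCCTGACGGAGCGA"
reverse = "ATTACCGCGGCTGCTGGCAC"
# Toy template (not the real HQ433693.1 record): two primer sites with a short spacer.
template = "TT" + forward + "ACGT" + reverse_complement(reverse) + "A"

print(locate_primers(template, forward, reverse))
print(gc_percent(forward), gc_percent(reverse))
```

Applied to the 481-nt template from the exercise, the same call would be expected to reproduce the coordinates listed in the Primer-BLAST report, provided the sequence is copied without errors.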
Result
AGCATGCAAGTCAAACGGGATGTAGCAATACATCTAGTGGCGAAC
GGGTGAGTAACGCGTGGATGATCTACCTATGAGATGGGGATAACT
ATTAGAAATAGTAGCTAATACCGAATAAAGTCAATTAATTTGTTA
ATTGATGAAAGGAAGCCTTTAAAGCTTCGCTTGTAGATGAGTCTG
CGTCTTATTAGTTAGTTGGTAGGGTAAATGCCTACCAAGGCGATG
ATAAGTAACCGGCCTGAGAGGGTGAACGGTCACACTGGAACTGAG
ACACGGTCCAGACTCCTACGGGAGGCAGCAGCTAAGAATCTTCCG
CAATGGGCGAAAGCCTGACGGAGCGACACTGCGTGAATGAAGAAG
GTCGAAAGATTGTAAAATTCTTTTATAAATGAGGAATAAGCTTTG
TAGGAAATGACAAAGTGATGACGTTAATTTATGAATAAGCCCCGG
CTAATTACGTGCCAGCAGCCGCGGTAATACG
Forward 322-341 5'- GCGAAAGCCTGACGGAGCGA - 3'
Reverse 478-459 5'- ATTACCGCGGCTGCTGGCAC - 3'

Other interesting "Tools"
Taxonomy
[Screenshot: NCBI "Tools" page. Entries visible:]
> Splign (beginning of the entry illegible): based on a variation of the Needleman-Wunsch global alignment algorithm that specifically accounts for introns and splice signals; accurate in determining splice sites and tolerant to sequencing errors
> TaxPlot: compares genomes on the basis of the protein sequences they encode. One selects a reference genome and two species for comparison; pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared
> Taxonomy Browser: supports searching the taxonomy tree using partial taxonomic names, common names, wild cards and phonetically similar names; for each taxonomic node it provides links to all data in Entrez for that node, displays the lineage, and links to related external sites
> Taxonomy Common Tree: generates a taxonomic tree for a selected group of organisms; users can upload a file of taxonomy IDs or names, or enter them directly
> Taxonomy Statistics: displays the number of taxonomic nodes in the database for a given rank and date of inclusion
> Taxonomy Status Reports: displays the current status of a set of taxonomic nodes or IDs
> Variation Reporter: searches for and reports human sequence variation data from dbSNP and dbVar; individual variations or batch files can be submitted in HGVS, GVF or BED formats
> VecScreen: a system for quickly identifying segments of a nucleic acid sequence that may be of vector origin

How many DNA sequence records and how many protein sequence records does the database hold for the species Thermus aquaticus?
At the end of June 2012 there were 338 DNA records and 562 (5 641) protein records

Working with the NCBI database
www.ncbi.nlm.nih.gov
[Screenshot: NCBI home page, shown on two consecutive slides]

How to work with the tools
[Screenshot: NCBI "How To" page (All Resources > How To). Guides visible:]
> Find bioassays in which a given drug is active
> Find bioassays that test a particular disease or protein target
> Submit data to NCBI
> Save text searches and set up automated searches with E-mail
> Download NCBI Software
> Retrieve all sequences for an organism or taxon
> Find the function of a gene or gene product
> Find expression patterns
> Find genes associated with a phenotype or disease
> Compare protein homologs between two microbial genomes
> View/download features around an object or between two objects on a chromosome
> Find sequenced genomes, including those in progress, for a taxonomic group
> Download the complete genome for an organism
> Display genomic annotation graphically
> Submit sequence data to NCBI
> Convert feature coordinates between genomic assemblies
> Determine conserved synteny between the genomes of two organisms
> Find a homolog for a gene in another organism
> Obtain the full text of an article (we will see this later)

Comparing the proteins of two genomes
How to: Compare protein homologs between two microbial genomes
Starting with the Prokaryotic Genome Project homepage...

Instructions
FOR TWO ORGANISMS
1) Scroll down to find the genome of interest.
2) Click the NC_ accession link from the RefSeq column.
3) Click GenePlot (if available) from the BLAST homologs column of the resulting table interface.
4) Select the two organisms of choice and then click "Compare Selected Pair".
FOR THREE ORGANISMS
1) Proceed as in Steps 1 and 2 above.
2) Select TaxPlot from the BLAST homologs column of the resulting table interface.
3) Select two other organisms from the drop-down menus below the selected genome of interest.
4) Click the "compare" button located just below the graphical plot.
How to work with the tools
[Screenshot: NCBI "How To" page, lower part of the list. Guides visible:]
> Download the complete genome for an organism
> Display genomic annotation graphically
> Submit sequence data to NCBI
> Convert feature coordinates between genomic assemblies
> Determine conserved synteny between the genomes of two organisms
> Find a homolog for a gene in another organism
> Obtain the full text of an article
> Find articles about a topic similar to that in a given article
> View the 3D structure of a protein
> Find a curated version of a sequence record (NCBI Reference Sequence)
> Align two or more 3D structures to a given structure
> Find published information on a gene or sequence
> Find transcript sequences for a gene
> Link from an object on a map to another resource
> Design PCR primers and check them for specificity
> Automate BLAST searches performed on NCBI servers
> Obtain genomic sequence for/near a gene, marker, transcript or protein
> Compare your sequence to the RefSeqGene/LRG standard
> Run BLAST software on a local computer
> Submit multiple query sequences in a single BLAST search
> Find the complete taxonomic lineage for an organism
> Generate a Common Tree for a set of taxa
> Complete an NCBI tutorial
> Find out what's new at NCBI
> Learn about an NCBI resource
> Learn about the basics of molecular biology and bioinformatics
> Download a large, custom set of records from NCBI
> Find human variations associated with a phenotype or disease (clinical association)
> View a mutation site in a 3D structure
> View all SNPs associated with a gene
> View genotype frequency data for a gene, disease or short genetic variation

PubMed database
How to: Obtain the full text of an article
Please note that there is a YouTube tutorial about this.
Starting with an abstract in PubMed...
1. Search PubMed with a search term, author name, or PubMed ID. Author name can be entered as follows: smith aj[au].
2. Click on the title of an entry of interest.
3. Look for icons in the upper-right-hand corner of the record:
   > Click on the PubMed Central link or a Publisher's link to access the full text of the article. Articles in PubMed Central are freely available. Articles on Publisher's websites are either freely available or can be accessed with a fee. Contact the specific publisher for questions about their site.
   > For PubMed records with no icons in the upper-right-hand corner, Loansome Doc can be accessed to order the article following these directions: PubMed Help.

PubMed database
[Screenshot: PubMed home page (US National Library of Medicine, National Institutes of Health). "PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites." Using PubMed: PubMed Quick Start Guide, Full Text Articles, PubMed FAQs, PubMed Tutorials, New and Noteworthy. PubMed Tools: PubMed Mobile, Single Citation Matcher, Batch Citation Matcher, Clinical Queries, Topic-Specific Queries. More Resources: MeSH Database, Journals in NCBI Databases, Clinical Trials, E-Utilities, LinkOut.]

Find publications on Deinococcus radiodurans
How many reviews does the database contain?
1) At the end of June 2012 there were about 962 publications
2) Of these, 52 were reviews
3) Note that only some of them are freely available

How to work with the tools
[Screenshot: the "How To" list, repeated]

3D structures of proteins
How to: View the 3D structure of a protein
Starting with...
A PDB CODE
1. Go to the Structure Home Page.
2. Enter the PDB code in the search box and press the Go button.
3. Click the structure image, and on the resulting page click the "Structure View in Cn3D" button.
A PDB-FORMAT FILE THAT IS NOT IN PDB
1. Go to the VAST search page.
2. Enter or browse for the PDB file name and click the Submit button.
3. Click the "View 3D Structure" button on the next page.
A PROTEIN ACCESSION NUMBER (e.g. NP_000240) OR SEQUENCE
1. Use the Finding a Structural Template guide to find the most appropriate PDB structure.
2. Continue with step 1 under "a PDB code" above.

3D protein structures: the Structure database
Three-dimensional structures provide a wealth of information on the biological function and the evolutionary history of macromolecules. They can be used to examine sequence-structure-function relationships, interactions, active sites, and more.
Structure tools: CBLAST, Cn3D, IBIS, VAST
Related resources: PDB, Protein, CDD, PubChem

Task: Find the structure of a mycobacterial catalase. How many records do you find?
1) Search term: "catalase Mycobacterium"
2) At the end of June 2012 there were 46 records, all obtained from X-ray crystallographic data, none by NMR

Comparing a sequence with reference sequences
How to: Compare your sequence to the RefSeqGene/LRG standard
1. From the RefSeqGene homepage, click on RefSeqGene BLAST in the Tools section.
2. Submit your query sequence.
3. Review the results as aligned to the RefSeqGene records by clicking on the Graphics in the Descriptions table.
4. If you submitted more than one query sequence and would like to review the alignment of a particular sequence, click on "Configure", select your chosen alignment and remove the check box in front of the alignments you don't want displayed. Then click on "Configure" at the bottom of the page to apply your revised selections.
5. If you identify any differences between your sequence and the RefSeqGene, you can evaluate whether others have reported sequence variation in that region by reviewing the variation annotated on the RefSeqGene.

Comparing a sequence with reference sequences: the BLAST form
[Screenshot: RefSeqGene Nucleotide BLAST (blastn suite). Fields: Enter Query Sequence (accession number(s), gi(s), or FASTA sequence(s), or upload a file); Query subrange; Job Title; Align two or more sequences; Choose Search Set, Database: Reference genomic sequences (refseq_genomic); Organism (optional); Exclude: Models (XM/XP), Uncultured/environmental sample sequences; Entrez Query (optional)]

Task: Copy the sequence below and compare it with the database of reference sequences. Which organism does it belong to?
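When a sequence is copied out of a slide or a word-processor document, list numbers, spaces and tabs often come along with it, which is exactly what the FASTA advice earlier in the lecture warns against. A minimal sketch of a clean-up step before pasting into BLAST follows; the function names, the record name `query_1` and the sample text are hypothetical, and the sketch assumes the pasted text contains only sequence lines (no FASTA header).

```python
import re


def clean_sequence(raw: str) -> str:
    """Keep only the letters of a pasted sequence and return them in upper case.

    Digits, brackets, spaces, tabs and line breaks are all dropped, so list
    markers such as "12)" disappear together with the whitespace.
    """
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def to_fasta(name: str, sequence: str, width: int = 70) -> str:
    """Format a sequence as a FASTA record with lines of at most `width` letters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# A short made-up fragment with list numbers, a space and a tab in it.
pasted = "1) ATGAGTGAAA TGAAATGCCC\n2) TTATGACCAT\tACC\n"
cleaned = clean_sequence(pasted)
print(to_fasta("query_1", cleaned, width=20))
```

The same two functions can be applied to a whole copied block; only the text passed in as `pasted` changes.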
ATGAGTGAAATGAAATGCCCTTATGACCATACCAACTTGACCATGAGTAATGGCGCGCCTGTTATTGACA
ACCAAAATTCAATGACCGCAGGTGCCAGAGGGCCACTGCTTGCCCAAGATTTATGGCTCAATGAAAAATT
AGCCGACTTTGCCCGTGAGGTCATTCCAGAACGCCGCATGCACGCCAAAGGCTCAGGCGCATTTGGCACA
TTCACGGTAACGCACGACATCACCCAATACACCCGTGCTAAGATTTTTAGTGAAGTTGGCAAAAAAACTG
AGATGTTCGCTCGTTTTACCACCGTAGCAGGCGAGCGGGGGGCGGCGGACGCTGAGCGTGATATCCGTGG
TTTTGCCCTAAAATTCTACACCGAAGAGGGTAATTGGGACATGGTGGGTAATAACACGCCTGTTTTCTTT
TTAAGAGACCCAAAAAAATTCCCTGATTTAAATAAAGCGGTCAAACGAGACCCACGCACCAACATGCGTT
CTGCCACCAATAACTGGGATTTTTGGACACTGCTGCCAGAGGCGTTTCATCAGGTGACCATTGTGATGAG
CGACCGTGGCATTCCTAAATCTTACCGTCATATGCACGGCTTTGGCTCGCACACTTATAGCTTTATCAAT
GCTGATAATGAACGCTTTTGGGTCAAATTTCACTTTCGCACCCAACAAGGCATTGAAAATCTAACCGATG
CCGAAGCTGAAATGGTGGTTGGTAAAGACCGTGAGAGCAATCAGCGTGATTTGTTTGATGCCATTGAGCG
TGGCGATTTCCCAAAATGGACAATGTATGTGCAAATCATGCCAGAAACCGATGCCCAAACTGTGCCTTAT
CACCCATTTGATTTAACCAAAGTGTGGCCAAAAGGCGACTATCCGCTCATTGAAGTGGGTGAGTTTGAGT
TAAATAAAAATCCTGAAAACTTCTTTTTAGACGTTGAACAATCCGCTTTTGCCCCAAGCAACCTAGTCCC
GGGCATCAGTGTGTCCCCTGACCGCATGCTCCAAGCACGCCTATTTAACTATGCTGATGCGCAGCGTTAT
CGTTTGGGCGTCAATCGTAACCAAATTCCAGTGAATGCCCCACGCTGTCCTGTGTACTCAAACCAAAGAG
ACGGACAAGGGCGAGTGGGCGATAACTATGGCGGTCGTCCGCACTATGAACCGAACAGTTTTGGACAATG
GCAAGACCAGCCGCATTTGGCTGAACCAGCATTAAAAATTCATGGCGATGCTAAGTTTTGGGATTATCGT
GAGAATGATGATGATTATTTTAGCCAACCCAGAGCCTTGTTTGAGTTGATGAGCGATGAGCAAAAACAGG
CGTTATTTGGTAATACGGCTCGTGCGATGGGCGATGCCCCTGATTTTATTAAATACCGCCATATCCGTAA
TTGCGATAAATGCCACCCTGATTATGCCATGGGTGTGGCCAAAGCGTTAGGCCTTACGGTTGAAGATGCC
AAAAATGCGTATGAGAGCGACCCTGCTCGCCATCTGCCCAGCTTTTTATA

Your result may look like the one on the next slide.

[Graphic: Distribution of 5 BLAST hits on the query sequence; color key for alignment scores: <40, 40-50, 50-80, 80-200, >=200]

Sequences producing significant alignments:
Accession     Description                                                   Max score  Total score  Query coverage  E-value  Max ident
NC_014147.1   Moraxella catarrhalis RH4 chromosome, complete genome         2808       2808         100%            0.0      100%
NC_015460.1   Gallibacterium anatis UMN179 chromosome, complete genome      763        763          83%             0.0      78%
NC_009524.1   Psychrobacter sp. PRwf-1 chromosome, complete genome          695        695          87%             0.0      76%
NC_014752.1   Neisseria lactamica 020-06 chromosome, complete genome        553        553          89%             7e-153   74%
NC_010382.1   Lysinibacillus sphaericus C3-41 chromosome, complete genome   333        333          56%             1e-86    74%

Working with the NCBI database
www.ncbi.nlm.nih.gov
Welcome to NCBI. The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
Get Started
> Tools: Analyze data using NCBI software
> Downloads: Get NCBI data or software
> How-To's: Learn how to accomplish specific tasks at NCBI
> Submissions: Submit data to GenBank or other NCBI databases
Instructions for submitting your own data
How to: Submit data to NCBI
Starting with...
SEQUENCE DATA
For guidance on the submission process for your sequence(s), please see How To: Submit sequence data to NCBI. Your data will be submitted to one of the following databases: GenBank, Sequence Read Archive (SRA), dbSNP, dbVar, GEO.
MICROARRAY DATA
If you have microarray data from clinical studies that require controlled access, you should submit your data to dbGaP. For all other microarray data, you should submit your data to GEO via GEO's Submission page.
BIOASSAY DATA, SUBSTANCE OR SEQUENCE-BASED REAGENTS
BioAssay data and chemical substance information should be submitted to PubChem via the PubChem Deposition Gateway.

Assessing sequence similarity
We are looking for homologous sequences that arose in the course of evolution.
Task: Which pair is more similar, sequences A and B or sequences C and D?
Source sequences
A = ATTGCTCTGT
B = ATAGCTCGGT
C = ATTGCACTGTAATGCCATGT
D = ATTGCTCTGAAATGCCCTGT

Assessing sequence similarity
We lay the sequences side by side = alignment (| marks a match, a blank marks a mismatch)
A = ATTGCTCTGT
    || |||| ||
B = ATAGCTCGGT

C = ATTGCACTGTAATGCCATGT
    ||||| ||| |||||| |||
D = ATTGCTCTGAAATGCCCTGT

Assessing sequence similarity
Calculating the normalized similarity value (score)
S_AB = (8 x 1 + 2 x 0) / 10 = 0.80
where 8 = number of matches, 1 = value of a match, 2 = number of mismatches, 0 = value of a mismatch, 10 = number of positions

S_CD = (17 x 1 + 3 x 0) / 20 = 0.85
0.85 > 0.80 -> C and D are more similar to each other

Global and local alignment
The problem: sequences of different length, or very different sequences of the same length
Global alignment
> The sequences are aligned along their whole length, even at the cost of introducing gaps
> Suitable only for related sequences
> Suitable for multiple alignments
Local alignment
> The sequences are aligned only where they are very similar; the rest is ignored
> Suitable for unrelated sequences
> For similar sequences it corresponds to the global alignment

Global and local alignment
Global alignment
SLAV----------APATNIK-------PIQNYR-I------AKSETQRYMVIE
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
Local alignment
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
-------------NAPATNIKSECVRA-PIQNYRRVEHVRA-------------

Dot plot
A graphical map of the similarities between sequences; an aid for choosing the type of alignment
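The normalized score and the dot plot described here are both simple enough to reproduce by hand, and the following minimal sketch does so for the four source sequences A to D. It assumes ungapped sequences of equal length and the slide's values of 1 for a match and 0 for a mismatch; the function names and the text rendering of the dot plot ('*' for identical letters) are choices made for this illustration only.

```python
def similarity(seq_a: str, seq_b: str, match: float = 1.0, mismatch: float = 0.0) -> float:
    """Normalized score of two equal-length sequences laid side by side (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles sequences of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    mismatches = len(seq_a) - matches
    # (number of matches * match value + number of mismatches * mismatch value) / positions
    return (matches * match + mismatches * mismatch) / len(seq_a)


def dot_plot(horizontal: str, vertical: str) -> list[str]:
    """Return one text row per letter of `vertical`; '*' marks identical letters."""
    return [
        "".join("*" if h == v else "." for h in horizontal)
        for v in vertical
    ]


A, B = "ATTGCTCTGT", "ATAGCTCGGT"
C, D = "ATTGCACTGTAATGCCATGT", "ATTGCTCTGAAATGCCCTGT"

print(f"S_AB = {similarity(A, B):.2f}")
print(f"S_CD = {similarity(C, D):.2f}")

# Dot plot of A (horizontal) against B (vertical); similar regions show up as diagonals.
for row in dot_plot(A, B):
    print(row)
```

In the printed grid the long diagonal corresponds to the stretch where A and B agree, while isolated marks off the diagonal are chance matches of single letters.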
[Dot plot figure with the sequence ATTGATCGGTCTTG along the top, two panels: "Matches found" and "Filtering of short diagonals"]

Choosing the alignment algorithm
[Figure: dot plots of sequence pairs]
Global alignment is possible only for the pair A-B

Search tools
FASTA
> A model heuristic algorithm
> Created in 1988
> Little used today; more powerful methods exist
BLAST
> The most widely used heuristic algorithm
> Created in 1990
> About 6x faster than FASTA

BLAST
Basic Local Alignment Search Tool
http://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAST finds regions of similarity between biological sequences.
This search tool scans the entire database, and we have already used it several times.

BLAST
Basic BLAST: Choose a BLAST program to run.
nucleotide blast   Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
protein blast      Search protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
blastx             Search protein database using a translated nucleotide query
tblastn            Search translated nucleotide database using a protein query
tblastx            Search translated nucleotide database using a translated nucleotide query

Specialized BLAST: Choose a type of specialized search (or database name in parentheses).
□ Make specific primers with Primer-BLAST
□ Search trace archives
□ Find conserved domains in your sequence (cds)
□ Find sequences with similar conserved domain architecture (cdart)
□ Search sequences that have gene expression profiles (GEO)
□ Search immunoglobulins (IgBLAST)
□ Search using SNP flanks
□ Screen sequence for vector contamination (vecscreen)
□ Align two (or more) sequences using BLAST (bl2seq)
□ Search protein or nucleotide targets in PubChem BioAssay

Use of the BLAST variants
Program   Query     Database   Level of comparison   Use
blastn    DNA       DNA        DNA                   Finding identical DNA sequences
blastp    protein   protein    protein               Finding homologous proteins
blastx    DNA*      protein    protein               Finding genes and homologous proteins on new DNA
tblastn   protein   DNA*       protein               Finding genes in uncharacterized DNA
tblastx   DNA*      DNA*       protein               Studying gene structure
* Translated DNA sequences are compared in all reading frames

Data records
They are uniform across all the databases mentioned
> Each record has an accession number: a variable number of letters and digits, depending on the database through which it was submitted; it works like a personal identification number
> On publication in GenBank it receives a unique GI number (GenInfo Identifier), comparable to an identity card number
> The authors of a primary record can edit it, which creates versions; the first one has number 1
> A change of version changes the GI number
> All versions are kept

Record header
Mycobacterium avium insertion element hot spot flanking region FR300
GenBank: AF369936.1

LOCUS       AF369936   312 bp   DNA   linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210032

Items labelled on the slide: accession number, name, record type, version, GI number
gb = GenBank, emb = EMBL, dbj = DDBJ
Sometimes several groups sequence the same region independently; it is then present in the database in several forms with different accession numbers, and often under different names as well!

Anatomy of a database record
LOCUS       AF369936   312 bp   DNA   linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210032
KEYWORDS    .
SOURCE      Mycobacterium avium
  ORGANISM  Mycobacterium avium
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).
REFERENCE   1  (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Insertion element IS901 hot spot FR300
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-APR-2001) Department of Bacteriology, Veterinary
            Research Institute, Hudcova 70, Brno 621 32, Czech Republic
FEATURES             Location/Qualifiers
     source          1..312
                     /organism="Mycobacterium avium"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:1764"
                     1..312
                     /note="insertion element hot spot flanking region FR300; contains hot spot for IS901 insertion."
ORIGIN      (sequence of 312 bases)

The bl2seq program
Comparing two or more sequences
Specialized BLAST: Choose a type of specialized search (or database name in parentheses).
□ Make specific primers with Primer-BLAST
□ Search trace archives
□ Find conserved domains in your sequence (cds)
□ Find sequences with similar conserved domain architecture (cdart)
□ Search sequences that have gene expression profiles (GEO)
□ Search immunoglobulins (IgBLAST)
□ Search using SNP flanks
□ Align two (or more) sequences using BLAST (bl2seq)
□ Search SRA transcript and genomic libraries
□ Constraint Based Protein Multiple Alignment Tool
□ Needleman-Wunsch Global Sequence Alignment Tool
□ Search RefSeqGene
□ Search WGS sequences grouped by organism

The bl2seq program
[Screenshot: blastn suite form with "Align two or more sequences" ticked. Fields: Enter Query Sequence (accession number(s), gi(s), or FASTA sequence(s), or upload a file); Query subrange; Job Title; Enter Subject Sequence (same options); Subject subrange; Program Selection, Optimize for: Highly similar sequences (megablast), More dissimilar sequences (discontiguous megablast), Somewhat similar sequences (blastn)]
BLASTN programs search nucleotide subjects using a nucleotide query.

Result of comparing two sequences
Query: nucleotide sequence, 774 letters, molecule type nucleic acid
Program: BLASTN 2.2.26+
[Graphic Summary: distribution of BLAST hits on the query sequence; color key for alignment scores: <40, 40-50, 50-80, 80-200, >=200]
[Dot Matrix View: plot of the query against the subject]

Sequences producing significant alignments:
Max score   Total score   Query coverage   E value   Max ident
1057        1222          87%              0.0       100%

Alignments
Score = 1057 bits (572), Expect = 0.0
Identities = 584/590 (99%), Strand = Plus/Plus

Score = 165 bits (89), Expect = 8e-45
Identities = 89/89 (100%), Gaps = 0/89 (0%), Strand = Plus/Plus

Identities = the fraction of identical positions
Score (the similarity value found) = if it reaches the chosen threshold (cutoff), the program records the alignment as an HSP (high-scoring segment pair); otherwise it discards it
Expect, E-value (the expectation value) = 8e-45 = 8 x 10^-45; values below 0.001 are significant

Something extra for practising BLAST
Search the database and find out which organism the following sequence belongs to
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGGCATGGCTGCGTCAGGGTTCCC
CCCATTGCGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCA
GTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGAGATCGCAGGCTTGGTAGG
CCTTTACCCCACCAACTACCTAATCCCACTTGGGCTCATCTTATGGCAGGTGGCCC
TAAGGTCCCACCCTTTCCTCCTCAGAGAATACGCGGTATTAGCTGCAGTTTCCCAC
AGTTATCCCCCTCCATAAGCCAGATTCCCAAGCATTACTCACCCGTCCGCCACTCG
TCAGCAAAGAAAGCAAGCTTTCTTCCTGCTACCGTTCGACTTGCATGTGTTAAGCC
TGCCGCCAGCGTTCAATCTGAGCCAGGATCAACNTCTTTCTCCAAA

It should be Pasteurella multocida

Compare these two sequences. Do they belong to the same species?
Sequence 1
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG

Sequence 2
GCTTTCGCGCATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAAAGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCGCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG

YES, identity 368/371, 99%
We spent plenty of time on this in the third-year practical classes.

Multiple alignment
> One example of its use is comparing several sequences at once
CLUSTAL
> CLUSTAL W = generally available
> CLUSTAL X = CLUSTAL W with a graphical interface for Windows
> CLUSTAL OMEGA = the latest version
http://www.clustal.org

Summary
1) Working with sequence data
2) The basic publicly available databases
3) Working with the NCBI pages
4) How sequence similarity is assessed
5) The BLAST and BLAST2 search tools
6) Multiple alignment: the CLUSTAL program