Using Internet Resources in the Study of Microorganisms
doc. RNDr. Milan Bartoš, Ph.D.
Bartoš.Milan@atlas.cz
Faculty of Science, Masaryk University (MU), 2017

Lecture outline
1) Working with sequence data
> Basic publicly available databases
> Working with the NCBI website
> How sequence similarity is assessed
> The BLAST and BLAST2 search tools
> Multiple alignment: the CLUSTAL program

Recommended reading
Cvrčková F. (2006): Úvod do praktické bioinformatiky, Academia, Praha
http://www.ncbi.nlm.nih.gov/

Working with sequence data
Sequence data = a record of the primary sequence of macromolecules, i.e. DNA (RNA) and proteins
> DNA and RNA are written in the 5'->3' direction
> Proteins are written from the N-terminus to the C-terminus
> One-letter codes are used (according to IUPAC)

Abbreviations for nucleic acids (DNA, RNA)
Code  Base                 Code  Base
A     Adenine              K     G, T (keto)
C     Cytosine             M     A, C (amino)
G     Guanine              B     C, G, T (not A)
T     Thymine              D     A, G, T (not C)
U     Uracil               H     A, C, T (not G)
R     A, G (purine)        V     A, C, G (not T, U)
Y     C, T (pyrimidine)    N     anything (any)
S     G, C (strong)        -     gap
W     A, T (weak)

Abbreviations for proteins
Code  Abbr.  Amino acid       Code  Abbr.  Amino acid
A     Ala    Alanine          P     Pro    Proline
C     Cys    Cysteine         Q     Gln    Glutamine
D     Asp    Aspartate        R     Arg    Arginine
E     Glu    Glutamate        S     Ser    Serine
F     Phe    Phenylalanine    T     Thr    Threonine
G     Gly    Glycine          V     Val    Valine
H     His    Histidine        W     Trp    Tryptophan
I     Ile    Isoleucine       Y     Tyr    Tyrosine
K     Lys    Lysine           X     Xxx    anything
L     Leu    Leucine          B     Asx    Asp, Asn
M     Met    Methionine       Z     Glx    Glu, Gln
N     Asn    Asparagine

Ways of writing sequences
Raw data (raw format)
> Some programs can accept and process it
> It is not suitable for long-term storage
Specialized formats
> The main public databases can convert between them
Simple formats: FASTA
> Preferably without spaces and special characters

>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT

FASTA and WORD
What to watch out for
> Save the file in "plain text" format
> Do not use tabs or other foreign characters
> Turn off the "AutoCorrect" and "AutoText" functions as well as "smart cut and paste"

Typeface
I recommend the "Courier New" font: every letter takes up the same width, so sequences written one under another stay aligned.
Courier New 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA
Arial 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA

Caution: the abbreviations for nucleic acids (NA) and proteins are identical in some cases! Input formats for computer processing must be specified so that the program can tell whether the sequence is an NA or a protein.

Molecular biology databases
European Bioinformatics Institute (EBI), United Kingdom
> EMBL, 1980
> www.ebi.ac.uk
National Center for Biotechnology Information (NCBI), established within the National Library of Medicine (NLM), USA
> GenBank, 1982
> www.ncbi.nlm.nih.gov
Center for Information Biology (CIB), a department of the National Institute of Genetics (NIG), Japan
> DDBJ, 1984
> www.cib.nig.ac.jp

GenBank/EMBL/DDBJ
> They exchange information with one another
> They are freely accessible
> They accept new sequences from genome centers and from laboratories engaged in sequencing
Anyone can publish a sequence in the databases!
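The FASTA layout and the one-letter codes described in this lecture can be checked with a few lines of code. The following is only a minimal sketch: the function names `parse_fasta` and `guess_alphabet` are hypothetical, and the alphabet guess is a simple heuristic (it assumes that a sequence made up only of IUPAC nucleotide codes is a nucleic acid), which is exactly why real programs ask the user to state the sequence type.

```python
# IUPAC nucleotide codes from the table in this lecture (plus "-" for a gap).
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN-")

# The FASTA record shown on the slide.
EXAMPLE = """>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop stray spaces and tabs inside the sequence and normalise case.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_alphabet(sequence):
    """Heuristic only: 'nucleotide' if every symbol is an IUPAC nucleotide code."""
    return "nucleotide" if set(sequence) <= IUPAC_NUCLEOTIDE else "protein"


records = parse_fasta(EXAMPLE)
header, sequence = records[0]
print(header.split()[0], len(sequence), guess_alphabet(sequence))
```

Note that a peptide such as GATTACA (Gly-Ala-Thr-Thr-Ala-Cys-Ala) would also be reported as a nucleotide sequence by this heuristic, which illustrates the warning about shared abbreviations.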
Protein sequence databases
SWISS-PROT database
> Founded at the University of Geneva in 1986
> Maintained by the Swiss Institute of Bioinformatics (SIB)
> www.expasy.org
> Contains automatically added translations of sequences from EMBL
PDB database (The Protein Data Bank)
> Archives and analyzes protein structures and complexes of informational biomacromolecules
> http://www.rcsb.org/pdb/home/home.do

Working with the NCBI database
www.ncbi.nlm.nih.gov
[Screenshot: NCBI home page. It shows the resource menu (All Resources, Chemicals & Bioassays, Data & Software, DNA & RNA, Domains & Structures, Genes & Expression, Genetics & Medicine, Genomes & Maps, Homology, Literature, Proteins, Sequence Analysis, Taxonomy, Training & Tutorials, Variation), the Get Started links (Tools, Downloads, How-To's, Submissions) and the Popular Resources list (PubMed, Bookshelf, PubMed Central, PubMed Health, BLAST, Nucleotide, Genome, SNP, Gene, Protein, PubChem).]
Working with the NCBI database
[Screenshot: the "Tools" tab of the NCBI All Resources page, an alphabetical list of tools with short descriptions.]
> 1000 Genomes Browser: An interactive graphical viewer that allows users to explore variant calls, genotype calls and supporting evidence (such as aligned sequence reads) that have been produced by the 1000 Genomes Project.
> ASN.1 Format Summary: An International Standards Organization (ISO) data representation format used to achieve interoperability between platforms. For data specifications and conversion tools, see NCBI Data Specification below.
> Amino Acid Explorer: This tool allows users to explore the characteristics of amino acids by comparing their structural and chemical properties, predicting protein sequence changes caused by mutations, viewing common substitutions, and browsing the functions of given residues in conserved domains.
> Assembly Archive: Links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ). The Assembly Viewer allows a user to see the multiple sequence alignments as well as the actual sequence chromatogram.
> BLAST Link (BLink): A link option on protein records that displays the results of a pre-computed BLAST search of that protein against all other protein sequences at NCBI.
> BLAST Microbial Genomes: Performs a BLAST search for similar sequences from selected complete eukaryotic and prokaryotic genomes.
> BLAST RefSeqGene: Performs a BLAST search of the genomic sequences in the RefSeqGene/LRG set. The default display provides ready navigation to review alignments in the Graphics display.
> BLAST Tutorials and Guides: This page links to a number of BLAST-related tutorials and guides, including a selection guide for BLAST algorithms, descriptions of BLAST output formats, explanations of the parameters for stand-alone BLAST, directions for setting up stand-alone BLAST on local machines and using the BLAST URL API.

Working with the NCBI database
[Screenshot: the BLAST microbial genomes page (blastn suite). BLASTN programs search nucleotide databases using a nucleotide query. The form contains the "Enter Query Sequence" box (accession number(s), gi(s), or FASTA sequence(s)), an optional query subrange, file upload and job title, and the "Choose Search Set" section (complete or draft genomes, organism, optional Entrez query), followed by "Program Selection".]
You have reached the BLAST search tool.

Other interesting "Tools"
> STS search (Electronic PCR)
> Comparison of two prokaryotic genomes (Gene Plot)
> Genetic code tables (Genetic Codes)
[Screenshot: the section of the tool list containing these entries.]
> [tool name cut off]: This interactive tool allows users to build E-utility URLs, either from a form or by hand, and then view their raw output. The tool provides a simple environment for testing E-utility URLs before including them in applications.
> E-Utilities: Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.
> Ebot: A tool that allows users to construct an E-utility analysis pipeline using an online form, and then generates a Perl script to execute the pipeline.
> Electronic PCR (e-PCR): A computational procedure that is used to identify sequence tagged sites (STSs) within DNA sequences. e-PCR looks for potential STSs in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing that could represent the PCR primers used to generate known STSs.
> Frequency-weighted Link (FLink): FLink is a tool that enables you to link from a group of records in a source database to a ranked list of associated records in a destination database based on frequency-weighted statistics.
> Gene Expression Omnibus (GEO) BLAST: Tool for aligning a query sequence (nucleotide or protein) to GenBank sequences included on microarray or SAGE platforms in the GEO database.
> Gene Plot: A tool for pairwise comparison of two prokaryotic genomes that displays pairs of protein homologs that are symmetrical best hits between the two genomes.
> Genetic Codes: Displays the genetic codes for organisms in the Taxonomy database in tables and on a taxonomic tree.
> Genome BLAST (description cut off)
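The E-Utilities entry in the tool list says that each utility "can be used simply by writing a specially formatted URL". As a hedged illustration, the sketch below composes such a URL for the EFetch utility and asks for the FASTA record of the accession used later in this lecture (HQ433693.1). The helper name `eutils_url` is hypothetical; the base address and the `db`, `id`, `rettype` and `retmode` parameters follow the public E-utilities conventions, but check the current NCBI documentation before relying on them. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed unchanged; verify in the NCBI docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(utility, **params):
    """Compose an E-utility URL such as .../efetch.fcgi?db=...&id=..."""
    return f"{EUTILS_BASE}/{utility}.fcgi?{urlencode(params)}"


# FASTA record of the Borrelia burgdorferi 16S rRNA sequence used in the exercises.
fasta_url = eutils_url(
    "efetch",
    db="nuccore",       # nucleotide database
    id="HQ433693.1",    # accession number
    rettype="fasta",    # return the record in FASTA format
    retmode="text",     # as plain text
)
print(fasta_url)
```

Pasting the printed address into a browser, or passing it to `urllib.request.urlopen`, should return the record as plain text, provided the service is reachable.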
Other interesting "Tools"
> Designing primers for PCR (Primer-BLAST)
[Screenshot: the section of the tool list containing Primer-BLAST.]
> PSSM Viewer: Allows users to display, sort, subset and download position-specific score matrices (PSSMs) either from CDD records or from Position Specific Iterated (PSI)-BLAST protein searches. The tool also can align a query protein to the PSSM and highlight positions of high conservation.
> Phenotype-Genotype Integrator (PheGenI): Supports finding human phenotype/genotype relationships with queries by phenotype, chromosome location, gene, and SNP identifiers. Currently includes information from dbGaP, the NHGRI GWAS Catalog, and GTEx. Displays results on the genome, on sequence, or in tables for download.
> Primer-BLAST: The Primer-BLAST tool uses Primer3 to design PCR primers to a sequence template. The potential products are then automatically analyzed with a BLAST search against user specified databases, to check the specificity to the target intended.
> ProSplign: A utility for computing alignment of proteins to genomic nucleotide sequence. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, ProSplign is accurate in determining splice sites and tolerant to sequencing errors.
> PubChem Power User Gateway (PUG): PUG provides access to PubChem services via a programmatic interface. PUG allows users to download data, initiate chemical structure searches, standardize chemical structures and interact with the E-utilities. PUG can be accessed using either standard URLs or via SOAP.
> PubChem Standardization Service: Standardization, in PubChem terminology, is the processing of chemical structures in the same way used to create PubChem Compound records from contributors' original structures. This service lets users see how PubChem would handle any structure they would like to submit.
> PubChem Structure Search: PubChem Structure Search allows the PubChem Compound Database to be queried by chemical structure or chemical ...

Primer-BLAST
[Screenshot: the Primer-BLAST input form, "Finding primers specific to your PCR template (using Primer3 and BLAST)". Fields: PCR template (accession, gi, or FASTA sequence; a RefSeq record is preferred) with an optional range or FASTA file upload; optional own forward primer (5'->3' on plus strand) and reverse primer (5'->3' on minus strand); PCR product size (min 70, max 1000); number of primers to return (5); primer melting temperatures (min 57, opt 60.0, max 63.0, max Tm difference 3); exon/intron selection (a RefSeq mRNA sequence as PCR template is required for these options).]
Let us look at this page in detail.

Design primers for identifying the Borrelia burgdorferi 16S rRNA gene by PCR
> Enter the accession number (Acc. No.) of a 16S rRNA sequence in the sequence input box, e.g. HQ433693.1
> Use the DEFAULT settings or change the parameters at your own discretion

Sample Primer-BLAST result
Input PCR template: HQ433693.1 Borrelia burgdorferi strain QSYEP3 16S ribosomal RNA gene, partial sequence
Range: 1-481
Specificity of primers: primers may not be specific to the input PCR template, as targets were found in the selected database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, environmental samples or phase 0, 1 or 2 HTGS sequences)

Detailed primer reports
Primer pair 1
                Sequence (5'->3')      Template strand  Length  Start  Stop  Tm     GC%    Self complementarity  Self 3' complementarity
Forward primer  GCGAAAGCCTGACGGAGCGA   Plus             20      322    341   59.77  65.00  3.00                  0.00
Reverse primer  ATTACCGCGGCTGCTGGCAC   Minus            20      478    459   60.39  65.00  6.00                  2.00
Product length  157

Products on intended target
>HQ433693.1 Borrelia burgdorferi strain QSYEP3 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        322  ....................  341
Reverse primer  1    ATTACCGCGGCTGCTGGCAC  20
Template        478  ....................  459

Products on potentially unintended templates
>[accession illegible] Borrelia valaisiana strain [illegible] 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        350  ....................  369

Look up sequence HQ433693.1 (16S rRNA of Borrelia burgdorferi) and mark the positions of the primers found on it
1) Enter "Borrelia burgdorferi 16S" into the BLAST search box
2) Find sequence HQ433693.1
3) You can also enter the accession number (Acc. No.) directly into the search box
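Marking the primers on the template can also be done programmatically. The sketch below is an illustration only: the function names are hypothetical, and the template is the 481-nt partial sequence of HQ433693.1 as transcribed from this lecture's result slide, so it should be verified against the GenBank record before any real use. The forward primer is searched for directly on the plus strand; the reverse primer anneals to the minus strand, so its reverse complement is searched for instead, and its start is reported as the higher coordinate, as in the Primer-BLAST report.

```python
# 16S rRNA partial sequence of HQ433693.1, transcribed from the lecture slide
# (assumption: the transcription is accurate; verify against GenBank).
TEMPLATE = (
    "AGCATGCAAGTCAAACGGGATGTAGCAATACATCTAGTGGCGAAC"
    "GGGTGAGTAACGCGTGGATGATCTACCTATGAGATGGGGATAACT"
    "ATTAGAAATAGTAGCTAATACCGAATAAAGTCAATTAATTTGTTA"
    "ATTGATGAAAGGAAGCCTTTAAAGCTTCGCTTGTAGATGAGTCTG"
    "CGTCTTATTAGTTAGTTGGTAGGGTAAATGCCTACCAAGGCGATG"
    "ATAAGTAACCGGCCTGAGAGGGTGAACGGTCACACTGGAACTGAG"
    "ACACGGTCCAGACTCCTACGGGAGGCAGCAGCTAAGAATCTTCCG"
    "CAATGGGCGAAAGCCTGACGGAGCGACACTGCGTGAATGAAGAAG"
    "GTCGAAAGATTGTAAAATTCTTTTATAAATGAGGAATAAGCTTTG"
    "TAGGAAATGACAAAGTGATGACGTTAATTTATGAATAAGCCCCGG"
    "CTAATTACGTGCCAGCAGCCGCGGTAATACG"
)
FORWARD = "GCGAAAGCCTGACGGAGCGA"
REVERSE = "ATTACCGCGGCTGCTGGCAC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def locate_primers(template, forward, reverse):
    """Return 1-based primer coordinates and the product length."""
    f = template.find(forward)
    r = template.find(reverse_complement(reverse))
    if f < 0 or r < 0:
        raise ValueError("primer not found on the template")
    fwd_start, fwd_stop = f + 1, f + len(forward)
    # The reverse primer is reported from its 5' end, so start > stop.
    rev_start, rev_stop = r + len(reverse), r + 1
    return {
        "forward": (fwd_start, fwd_stop),
        "reverse": (rev_start, rev_stop),
        "product_length": rev_start - fwd_start + 1,
    }


sites = locate_primers(TEMPLATE, FORWARD, REVERSE)
print(sites, gc_percent(FORWARD), gc_percent(REVERSE))
```

With the sequence as transcribed, the coordinates agree with the Primer-BLAST report (322-341, 478-459, product length 157, GC content 65 % for both primers).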
Result
AGCATGCAAGTCAAACGGGATGTAGCAATACATCTAGTGGCGAAC
GGGTGAGTAACGCGTGGATGATCTACCTATGAGATGGGGATAACT
ATTAGAAATAGTAGCTAATACCGAATAAAGTCAATTAATTTGTTA
ATTGATGAAAGGAAGCCTTTAAAGCTTCGCTTGTAGATGAGTCTG
CGTCTTATTAGTTAGTTGGTAGGGTAAATGCCTACCAAGGCGATG
ATAAGTAACCGGCCTGAGAGGGTGAACGGTCACACTGGAACTGAG
ACACGGTCCAGACTCCTACGGGAGGCAGCAGCTAAGAATCTTCCG
CAATGGGCGAAAGCCTGACGGAGCGACACTGCGTGAATGAAGAAG
GTCGAAAGATTGTAAAATTCTTTTATAAATGAGGAATAAGCTTTG
TAGGAAATGACAAAGTGATGACGTTAATTTATGAATAAGCCCCGG
CTAATTACGTGCCAGCAGCCGCGGTAATACG
Forward 322-341 5'- GCGAAAGCCTGACGGAGCGA -3'
Reverse 478-459 5'- ATTACCGCGGCTGCTGGCAC -3'
The forward primer is identical to positions 322-341 of the sequence; the reverse primer is the reverse complement of positions 459-478.

Other interesting "Tools"
> Taxonomy
[Screenshot: the section of the tool list containing the taxonomy tools.]
> Splign: A utility for computing cDNA-to-genomic sequence alignments. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, Splign is accurate in determining splice sites and tolerant to sequencing errors.
> TaxPlot: A tool for comparing genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome and two species for comparison. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared.
> Taxonomy Browser: Supports searching the taxonomy tree using partial taxonomic names, common names, wild cards and phonetically similar names. For each taxonomic node, the tool provides links to all data in Entrez for that node, displays the lineage, and provides links to external sites related to the node.
> Taxonomy Common Tree: Generates a taxonomic tree for a selected group of organisms. Users can upload a file of taxonomy IDs or names, or they can enter names or IDs directly.
> Taxonomy Statistics: Displays the number of taxonomic nodes in the database for a given rank and date of inclusion.
> Taxonomy Status Reports: Displays the current status of a set of taxonomic nodes or IDs.
> Variation Reporter: A tool designed to search for and report human sequence variation data from dbSNP and dbVar. Individual variations or batch files can be submitted in HGVS, GVF or BED formats. Related information will be retrieved and reported in a downloadable table containing variation identifiers, nucleotide and cytogenetic band locations on various genomic assemblies, allele type and minor allele frequencies, predicted functional consequences (missense, nonsense, frameshift, splice site, etc.), reported clinical significance, and relevant citations.
> VecScreen: A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen searches a ...

How many DNA sequence records and how many protein sequence records does the database hold for the species Thermus aquaticus?
At the end of June 2012 there were 338 DNA records and 562 (5 641) protein records.

Working with the NCBI database
www.ncbi.nlm.nih.gov
[Screenshot: NCBI home page, shown again.]
How to work with the tools
[Screenshot: the "How To" tab of the NCBI All Resources page, a list of task-oriented guides.]
> Find bioassays in which a given drug is active
> Find bioassays that test a particular disease or protein target
> Submit data to NCBI
> Save text searches and set up automated searches with E-mail
> Download NCBI Software
(slide note: we will see this later)
> Retrieve all sequences for an organism or taxon
> Find the function of a gene or gene product
> Find expression patterns
> Find genes associated with a phenotype or disease
> Compare protein homologs between two microbial genomes
> View/download features around an object or between two objects on a chromosome
> Find sequenced genomes, including those in progress, for a taxonomic group
> Download the complete genome for an organism
> Display genomic annotation graphically
> Submit sequence data to NCBI
> Convert feature coordinates between genomic assemblies
> Determine conserved synteny between the genomes of two organisms
> Find a homolog for a gene in another organism
> Obtain the full text of an article

Comparing the proteins of two genomes
[Screenshot: the NCBI guide "How to: Compare protein homologs between two microbial genomes".]
Instructions
Starting with the Prokaryotic Genome Project homepage.
FOR TWO ORGANISMS
1) Scroll down to find the genome of interest.
2) Click the NC_ accession link from the RefSeq column.
3) Click GenePlot (if available) from the BLAST homologs column of the resulting table interface.
4) Select the two organisms of choice and then click "Compare Selected Pair".
FOR THREE ORGANISMS
1) Proceed as in Steps 1 and 2 above.
2) Select TaxPlot from the BLAST homologs column of the resulting table interface.
3) Select two other organisms from the drop-down menus below the selected genome of interest.
4) Click the "compare" button located just below the graphical plot.

How to work with the tools
[Screenshot: the "How To" list, continued.]
> Find articles about a topic similar to that in a given article
> View the 3D structure of a protein
> Find a curated version of a sequence record (NCBI Reference Sequence)
> Align two or more 3D structures to a given structure
> Find published information on a gene or sequence
> Find transcript sequences for a gene
> Link from an object on a map to another resource
> Design PCR primers and check them for specificity
> Automate BLAST searches performed on NCBI servers
> Obtain genomic sequence for/near a gene, marker, transcript or protein
> Compare your sequence to the RefSeqGene/LRG standard
> Run BLAST software on a local computer
> Submit multiple query sequences in a single BLAST search
> Find the complete taxonomic lineage for an organism
> Generate a Common Tree for a set of taxa
> Complete an NCBI tutorial
> Find out what's new at NCBI
> Learn about an NCBI resource
> Learn about the basics of molecular biology and bioinformatics
> Download a large, custom set of records from NCBI
> Find human variations associated with a phenotype or disease (clinical association)
> View a mutation site in a 3D structure
> View all SNPs associated with a gene
> View genotype frequency data for a gene, disease or short genetic variation

The PubMed database
[Screenshot: the NCBI guide "How to: Obtain the full text of an article". Please note that there is a tutorial about this.]
Starting with an abstract in PubMed...
1) Search PubMed with a search term, author name, or PubMed ID. Author name can be entered as follows: smith aj[au].
2) Click on the title of an entry of interest.
3) Look for icons in the upper-right-hand corner of the record:
> Click on the PubMed Central link or a Publisher's link to access the full text of the article. Articles in PubMed Central are freely available. Articles on Publisher's websites are either freely available or can be accessed with a fee. Contact the specific publisher for questions about their site.
> For PubMed records with no icons in the upper-right-hand corner, Loansome Doc can be accessed to order the article following these directions: PubMed Help.

The PubMed database
[Screenshot: PubMed home page (US National Library of Medicine, National Institutes of Health) with the sections Using PubMed, PubMed Tools and More Resources.]
PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.

Find publications on Deinococcus radiodurans
How many reviews does the database contain?
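The PubMed search in this exercise uses field tags in square brackets, like the author tag in `smith aj[au]`. The sketch below shows, under stated assumptions, how such a query could be composed and sent to the ESearch utility to obtain only the number of hits. The helper names are hypothetical; `[pt]` is the PubMed tag for publication type; the sample XML is a hand-written stand-in for a server reply, using the review count quoted in this lecture for June 2012, not live data. The code builds the URL and parses the sample without contacting NCBI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_term(text, author=None, publication_type=None):
    """Combine free text with optional [au] and [pt] field tags."""
    parts = [text]
    if author:
        parts.append(f"{author}[au]")
    if publication_type:
        parts.append(f"{publication_type}[pt]")
    return " AND ".join(parts)


def esearch_url(term):
    """URL of an ESearch request against PubMed; retmax=0 asks for no ID list."""
    return f"{ESEARCH}?{urlencode({'db': 'pubmed', 'term': term, 'retmax': 0})}"


def parse_count(xml_text):
    """Read the total number of hits from an ESearch XML reply."""
    return int(ET.fromstring(xml_text).findtext("Count"))


term = pubmed_term("Deinococcus radiodurans", publication_type="review")
url = esearch_url(term)

# Hand-written sample reply (illustrative only, not fetched from NCBI).
SAMPLE_REPLY = "<eSearchResult><Count>52</Count><RetMax>0</RetMax><IdList/></eSearchResult>"
print(term)
print(url)
print(parse_count(SAMPLE_REPLY))
```

Dropping the `publication_type` argument gives the query for all publications on the organism, so the two counts asked for in the exercise come from two calls that differ only in the term.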
1) At the end of June 2012 there were about 962 of them.
2) Of these, 52 were reviews.
3) Note that only some of them are freely available.

3D protein structures
How to: View the 3D structure of a protein
Starting with...
A PDB CODE (e.g. 1B8G)
1. Go to the Structure Home Page.
2. Enter the PDB code in the search box and press the Go button.
3. Click the structure image, and on the resulting page click the "Structure View in Cn3D" button.
A PDB-FORMAT FILE THAT IS NOT IN PDB
1. Go to the VAST search page.
2. Enter or browse for the PDB file name and click the Submit button.
3. Click the "View 3D Structure" button on the next page.
A PROTEIN ACCESSION NUMBER (e.g. NP_000240) OR SEQUENCE
1. Use the Finding a Structural Template guide to find the most appropriate PDB structure.
2. Continue with step 1 under "A PDB code" above.

3D protein structures
Three-dimensional structures provide a wealth of information on the biological function and the evolutionary history of macromolecules. They can be used to examine sequence-structure-function relationships, interactions, active sites, and more.

Task: Find the structure of mycobacterial catalase. How many records do you find?
1) Search term: "catalase Mycobacterium"
2) At the end of June 2012 there were 46 records, all of them obtained from X-ray crystallography data and none from NMR.

Comparing a sequence with reference sequences
How to: Compare your sequence to the RefSeqGene/LRG standard
Starting with a sequence or sequences:
1. From the RefSeqGene homepage, click on RefSeqGene BLAST in the Tools section.
2. Submit your query sequence or multiple sequences.
3. Review the results as aligned to the RefSeqGene records by clicking on the Graphics in the Descriptions table.
4. If you submitted more than one query sequence and would like to review the alignment of a particular sequence, click on "Configure", select your chosen alignment and clear the check box in front of the alignments you don't want displayed. Then click on "Configure" at the bottom of the page to apply your revised selections.
5. If you identify any differences between your sequence and the RefSeqGene, you can evaluate whether others have reported sequence variation in that region by reviewing the variation annotated on the RefSeqGene.

Comparing a sequence with reference sequences
BLAST form (blastn suite, RefSeqGene Nucleotide BLAST):
> Enter Query Sequence: accession number(s), gi(s), or FASTA sequence(s), or upload a file; optional query subrange (From, To)
> Job Title: a descriptive title for your BLAST search
> Choose Search Set, Database: Reference genomic sequences (refseq_genomic)
> Organism (optional): common name, binomial, or tax id
> Entrez Query (optional)

Task: Copy the sequence below and compare it with the database of reference sequences. Which organism does it belong to?
ATGAGTGAAATGAAATGCCCTTATGACCATACCAACTTGACCATGAGTAATGGCGCGCCTGTTATTGACA
ACCAAAATTCAATGACCGCAGGTGCCAGAGGGCCACTGCTTGCCCAAGATTTATGGCTCAATGAAAAATT
AGCCGACTTTGCCCGTGAGGTCATTCCAGAACGCCGCATGCACGCCAAAGGCTCAGGCGCATTTGGCACA
TTCACGGTAACGCACGACATCACCCAATACACCCGTGCTAAGATTTTTAGTGAAGTTGGCAAAAAAACTG
AGATGTTCGCTCGTTTTACCACCGTAGCAGGCGAGCGGGGGGCGGCGGACGCTGAGCGTGATATCCGTGG
TTTTGCCCTAAAATTCTACACCGAAGAGGGTAATTGGGACATGGTGGGTAATAACACGCCTGTTTTCTTT
TTAAGAGACCCAAAAAAATTCCCTGATTTAAATAAAGCGGTCAAACGAGACCCACGCACCAACATGCGTT
CTGCCACCAATAACTGGGATTTTTGGACACTGCTGCCAGAGGCGTTTCATCAGGTGACCATTGTGATGAG
CGACCGTGGCATTCCTAAATCTTACCGTCATATGCACGGCTTTGGCTCGCACACTTATAGCTTTATCAAT
GCTGATAATGAACGCTTTTGGGTCAAATTTCACTTTCGCACCCAACAAGGCATTGAAAATCTAACCGATG
CCGAAGCTGAAATGGTGGTTGGTAAAGACCGTGAGAGCAATCAGCGTGATTTGTTTGATGCCATTGAGCG
TGGCGATTTCCCAAAATGGACAATGTATGTGCAAATCATGCCAGAAACCGATGCCCAAACTGTGCCTTAT
CACCCATTTGATTTAACCAAAGTGTGGCCAAAAGGCGACTATCCGCTCATTGAAGTGGGTGAGTTTGAGT
TAAATAAAAATCCTGAAAACTTCTTTTTAGACGTTGAACAATCCGCTTTTGCCCCAAGCAACCTAGTCCC
GGGCATCAGTGTGTCCCCTGACCGCATGCTCCAAGCACGCCTATTTAACTATGCTGATGCGCAGCGTTAT
CGTTTGGGCGTCAATCGTAACCAAATTCCAGTGAATGCCCCACGCTGTCCTGTGTACTCAAACCAAAGAG
ACGGACAAGGGCGAGTGGGCGATAACTATGGCGGTCGTCCGCACTATGAACCGAACAGTTTTGGACAATG
GCAAGACCAGCCGCATTTGGCTGAACCAGCATTAAAAATTCATGGCGATGCTAAGTTTTGGGATTATCGT
GAGAATGATGATGATTATTTTAGCCAACCCAGAGCCTTGTTTGAGTTGATGAGCGATGAGCAAAAACAGG
CGTTATTTGGTAATACGGCTCGTGCGATGGGCGATGCCCCTGATTTTATTAAATACCGCCATATCCGTAA
TTGCGATAAATGCCACCCTGATTATGCCATGGGTGTGGCCAAAGCGTTAGGCCTTACGGTTGAAGATGCC
AAAAATGCGTATGAGAGCGACCCTGCTCGCCATCTGCCCAGCTTTTTATA

Your result may look like the one on the following slide.
[Figure: BLAST graphic summary, "Distribution of 5 Blast Hits on the Query Sequence", with the color key for alignment scores and a query axis running to about 1500]
Sequences producing significant alignments:

Max score   Total score   Query coverage   E-value   Max ident
2808        2808          100%             0.0       100%
753         753           83%              0.0       78%
695         695           87%              0.0       76%
553         553           89%              7e-153    74%
333         333           56%              1e-36     74%

Accessions legible in the source (four of the five hits): NC_015460.1, NC_009524.1, NC_014752.1, NC_010332.1

Working with the NCBI database
www.ncbi.nlm.nih.gov
Welcome to NCBI: The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
Get Started
> Tools: Analyze data using NCBI software
> Downloads: Get NCBI data or software
> How-To's: Learn how to accomplish specific tasks at NCBI
> Submissions: Submit data to GenBank or other NCBI databases
Instructions for submitting your own data
How to: Submit data to NCBI
Starting with...
SEQUENCE DATA
For guidance on the submission process for your sequence(s), please see How To: Submit sequence data to NCBI. Your data will be submitted to one of the following databases: GenBank, Sequence Read Archive (SRA), dbSNP, dbVar, GEO.
MICROARRAY DATA
If you have microarray data from clinical studies that require controlled access, you should submit your data to dbGaP. For all other microarray data, you should submit your data to GEO via GEO's Submission page.
BIOASSAY DATA, SUBSTANCE OR SEQUENCE-BASED REAGENTS
BioAssay data and chemical substance information should be submitted to PubChem via their PubChem Deposition Gateway.

Assessing sequence similarity
We look for homologous sequences that arose in the course of evolution.
Task: Which pair of sequences is more similar, A and B or C and D?
Starting sequences
A = ATTGCTCTGT
B = ATAGCTCGGT
C = ATTGCACTGTAATGCCATGT
D = ATTGCTCTGAAATGCCCTGT

Assessing sequence similarity
We place the sequences next to each other = alignment. A position with identical letters is a match (|), a position with different letters is a mismatch.
A = ATTGCTCTGT
    || |||| ||
B = ATAGCTCGGT

C = ATTGCACTGTAATGCCATGT
    ||||| ||| |||||| |||
D = ATTGCTCTGAAATGCCCTGT

Assessing sequence similarity
Computing the normalized similarity value (score):
S = (number of matches x match value + number of mismatches x mismatch value) / number of positions
S_AB = (8 x 1 + 2 x 0) / 10 = 0.80
S_CD = (17 x 1 + 3 x 0) / 20 = 0.85
0.85 > 0.80, so C and D are more similar to each other than A and B.

Global and local alignment
Problem: sequences of different length, or very different sequences of the same length.
Global alignment
> The sequences are aligned along their whole length, even at the cost of introducing gaps
> Suitable only for related sequences
> Suitable for multiple alignments
Local alignment
> The sequences are aligned only where they are very similar; the rest is ignored
> Suitable for unrelated sequences
> For similar sequences it corresponds to the global alignment

Global and local alignment
Global alignment
SLAV----------APATNIK-------PIQNYR--------AKSETQRYMVIE
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
Local alignment
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
-------------NAPATNIKSECVRA-PIQNYRRVEHVRA-------------

Dot plot
A graphical map of sequence similarities, an aid for choosing the type of alignment.
[Figure: dot plot with the sequence ATTGATCGGTCTTG on the horizontal axis, shown in three panels: the empty grid, "Matches found", and "Filtering out short diagonals"]

Choosing the alignment algorithm
Global alignment is possible only for the pair A-B.

Search tools
FASTA
> A model heuristic algorithm
> Created in 1988
> Rarely used today; more powerful methods exist
BLAST
> The most widely used heuristic algorithm
> Created in 1990
> About 6 times faster than FASTA

BLAST
Basic Local Alignment Search Tool
http://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAST finds regions of similarity between biological sequences.
BLAST Assembled RefSeq Genomes: choose a species genome to search (Human, Mouse, Rat, Arabidopsis thaliana, Oryza sativa, Bos taurus, Danio rerio, Drosophila melanogaster, Gallus gallus, Microbes), or list all genomic BLAST databases.
This search tool scans the entire database, and we have already used it several times.
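The normalized similarity score and the dot plot with filtering of short diagonals, both introduced on the slides above, can be sketched in a few lines. This is a minimal illustration, not the method of any particular program: the function names similarity and dot_plot and the parameter min_run are assumptions, and the score expects two sequences that are already aligned and equally long.

```python
def similarity(a: str, b: str, match: float = 1.0, mismatch: float = 0.0) -> float:
    """Normalized similarity of two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    mismatches = len(a) - matches
    return (matches * match + mismatches * mismatch) / len(a)


def dot_plot(horizontal: str, vertical: str, min_run: int = 1) -> list[str]:
    """Return the dot plot as text rows ('*' = match, '.' = no match).

    A dot is kept only if it lies on a diagonal run of at least
    `min_run` consecutive matches (filtering of short diagonals).
    """
    rows, cols = len(vertical), len(horizontal)
    hit = [[vertical[i] == horizontal[j] for j in range(cols)] for i in range(rows)]
    keep = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Measure a diagonal only from its first cell.
            if hit[i][j] and not (i > 0 and j > 0 and hit[i - 1][j - 1]):
                n = 0
                while i + n < rows and j + n < cols and hit[i + n][j + n]:
                    n += 1
                if n >= min_run:
                    for k in range(n):
                        keep[i + k][j + k] = True
    return ["".join("*" if cell else "." for cell in row) for row in keep]


A, B = "ATTGCTCTGT", "ATAGCTCGGT"
C, D = "ATTGCACTGTAATGCCATGT", "ATTGCTCTGAAATGCCCTGT"
print(similarity(A, B))  # 0.8
print(similarity(C, D))  # 0.85
for row in dot_plot(A, B, min_run=3):
    print(row)
```

Raising min_run removes the scattered single dots and leaves only the longer diagonals, which is what makes a dot plot useful for deciding whether a global alignment is worth attempting.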
Basic BLAST: choose a BLAST program to run
> nucleotide blast: search a nucleotide database using a nucleotide query (algorithms: blastn, megablast, discontiguous megablast)
> protein blast: search a protein database using a protein query (algorithms: blastp, psi-blast, phi-blast, delta-blast)
> blastx: search a protein database using a translated nucleotide query
> tblastn: search a translated nucleotide database using a protein query
> tblastx: search a translated nucleotide database using a translated nucleotide query

Specialized BLAST: choose a type of specialized search (database name in parentheses)
> Make specific primers with Primer-BLAST
> Search trace archives
> Find conserved domains in your sequence (cds)
> Find sequences with similar conserved domain architecture (cdart)
> Search sequences that have gene expression profiles (GEO)
> Search immunoglobulins (IgBLAST)
> Search using SNP flanks
> Screen sequence for vector contamination (vecscreen)
> Align two (or more) sequences using BLAST (bl2seq)
> Search protein or nucleotide targets in PubChem BioAssay

Use of the BLAST variants
Program   Query     Database   Comparison level   Use
blastn    DNA       DNA        DNA                Finding identical DNA sequences
blastp    protein   protein    protein            Finding homologous proteins
blastx    DNA*      protein    protein            Finding genes and homologous proteins in new DNA
tblastn   protein   DNA*       protein            Finding genes in uncharacterized DNA
tblastx   DNA*      DNA*       protein            Studying gene structure
* The DNA sequences are translated in all reading frames before comparison.

Data files
The records have the same form in all the databases mentioned.
> Every record has an access code, the Accession Number: a variable number of letters and digits, depending on the database through which the record was accepted. It works like a birth number.
> On publication in GenBank the record receives a unique GI number (GenBank Identifier), comparable to an identity card number.
> The authors of the primary record can revise it, which creates versions; the first has number 1.
> A change of version changes the GI number.
> All versions are kept.

Record header
Mycobacterium avium insertion element hot spot flanking region FR300
GenBank: AF369936.1

LOCUS       AF369936   312 bp   DNA   linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210082

Labelled on the slide: access code, name, record type, version, GI number.
Database abbreviations: gb = GenBank, emb = EMBL, dbj = DDBJ
Sometimes several groups sequence the same region independently; it is then present in the database in several forms, with different access codes and often under different names!

Anatomy of a database record
LOCUS       AF369936   312 bp   DNA   linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210082
KEYWORDS    .
SOURCE      Mycobacterium avium
  ORGANISM  Mycobacterium avium
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium avium complex (MAC).
REFERENCE   1 (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Insertion element IS901 hot spot FR300
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-APR-2001) Department of Bacteriology, Veterinary Research Institute, Hudcova 70, Brno 621 32, Czech Republic
FEATURES             Location/Qualifiers
     source          1..312
                     /organism="Mycobacterium avium"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:1764"
     misc_feature    1..312
                     /note="insertion element hot spot flanking region FR300; contains hot spot for IS901 insertion"
ORIGIN
[312 bp sequence; the base-level text is not reliably legible in the source]

Anatomy of a database record
[Figure: records for Mycobacterium avium FR300 and Neisseria gonorrhoeae]

Program bl2seq
Comparing two or more sequences
bl2seq is reached from the BLAST home page under Specialized BLAST, "Align two (or more) sequences using BLAST".
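The identifiers described in the data files section above (accession number, version, GI number, and the database abbreviations gb, emb, dbj) can be pulled apart programmatically. The sketch below is a hedged illustration: the helper names parse_accession and parse_defline are hypothetical, the regular expression covers only the simple accession shapes seen in these slides, the "ref" abbreviation for RefSeq is an addition not explained on the slides, and the example header line was assembled here from the values in the FR300 record rather than copied from a database.

```python
import re
from typing import NamedTuple, Optional

# Database abbreviations from the slides; "ref" (RefSeq) is an added assumption.
DB_NAMES = {"gb": "GenBank", "emb": "EMBL", "dbj": "DDBJ", "ref": "RefSeq"}


class SeqId(NamedTuple):
    accession: str
    version: Optional[int]


def parse_accession(text: str) -> SeqId:
    """Split 'AF369936.1' into the accession 'AF369936' and version 1."""
    m = re.fullmatch(r"([A-Z]+_?\d+)(?:\.(\d+))?", text.strip())
    if m is None:
        raise ValueError(f"not an accession: {text!r}")
    version = int(m.group(2)) if m.group(2) else None
    return SeqId(m.group(1), version)


def parse_defline(line: str) -> dict:
    """Parse a header of the form '>gi|GI|db|ACCESSION.VERSION| description'."""
    parts = line.lstrip(">").split("|", 4)
    if len(parts) < 4 or parts[0] != "gi":
        raise ValueError("unexpected header layout")
    seq_id = parse_accession(parts[3])
    return {
        "gi": int(parts[1]),
        "database": DB_NAMES.get(parts[2], parts[2]),
        "accession": seq_id.accession,
        "version": seq_id.version,
        "description": parts[4].strip() if len(parts) > 4 else "",
    }


# Header assembled for illustration from the record shown above.
header = ">gi|14210082|gb|AF369936.1| Mycobacterium avium insertion element hot spot flanking region FR300"
print(parse_defline(header))
```

Keeping the version separate from the accession matters in practice: the accession stays fixed for the life of the record, while the version changes whenever the authors revise the sequence.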
Specialized BLAST: choose a type of specialized search (database name in parentheses)
> Make specific primers with Primer-BLAST
> Search trace archives
> Find conserved domains in your sequence (cds)
> Find sequences with similar conserved domain architecture (cdart)
> Search sequences that have gene expression profiles (GEO)
> Search immunoglobulins (IgBLAST)
> Search using SNP flanks
> Align two (or more) sequences using BLAST (bl2seq)
> [one item not legible in the source]
> Search SRA transcript and genomic libraries
> Constraint Based Protein Multiple Alignment Tool
> Needleman-Wunsch Global Sequence Alignment Tool
> Search RefSeqGene
> Search WGS sequences grouped by organism

Program bl2seq
BLAST form (blastn suite, with "Align two or more sequences" checked):
> Enter Query Sequence: accession number, gi, or FASTA sequence, or upload a file; optional query subrange (From, To)
> Job Title: a descriptive title for your BLAST search
> Enter Subject Sequence: accession number, gi, or FASTA sequence, or upload a file; optional subject subrange (From, To)
> Program Selection, optimize for: highly similar sequences (megablast), more dissimilar sequences (discontiguous megablast), or somewhat similar sequences (blastn)
> BLAST: search the nucleotide sequence using the chosen algorithm

Result of comparing two sequences
Query: nucleotide sequence, 774 letters; molecule type: nucleic acid
Subject: ID 31917; molecule type: nucleic acid; length 689
Program: BLASTN 2.2.26+
Graphic Summary: distribution of 2 BLAST hits on the query sequence
[Figure: Dot Matrix View of the two compared sequences]

Result of comparing two sequences
Sequences producing significant alignments:
Accession   Max score   Total score   Query coverage   E-value   Max ident
31917       1057        1222          87%              0.0       100%

Alignment 1: Score = 1057 bits (571), Expect = 0.0, Identities over 590 positions (count not legible in the source), Gaps = 0/590 (0%), Strand = Plus/Plus
Alignment 2: Score = 165 bits (89), Expect = 8e-45, Identities = 89/89 (100%), Gaps = 0/89 (0%), Strand = Plus/Plus
[Figure: BLAST alignment view of the two alignments, Query against Sbjct; the second alignment covers Query 673 to 761. The base-level text is not legible in the source.]
Identities = the fraction of identical positions
Score (the similarity value found) = if it reaches the chosen threshold (cutoff), the program records the alignment as an HSP (high-scoring pair); otherwise it discards it.
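The cutoff rule for an HSP can be illustrated with a toy example. The sketch below scores two equally long sequences position by position without gaps, finds the best-scoring contiguous segment, and keeps it only if its score reaches the cutoff. It is a deliberately simplified stand-in, not the BLAST algorithm itself, which starts from short word matches and extends them: the function names, the match value of +1, the mismatch value of -2 and the cutoff values are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple


def best_ungapped_segment(a: str, b: str, match: int = 1,
                          mismatch: int = -2) -> Tuple[int, int, int]:
    """Return (score, start, end) of the best-scoring ungapped segment.

    Positions are 0-based and `end` is exclusive. Uses a running sum
    that is reset whenever it drops below zero.
    """
    best_score, best_start, best_end = 0, 0, 0
    run, start = 0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if run == 0:
            start = i  # a new candidate segment begins here
        run += match if x == y else mismatch
        if run < 0:
            run = 0  # abandon a segment that scores below zero
        if run > best_score:
            best_score, best_start, best_end = run, start, i + 1
    return best_score, best_start, best_end


def find_hsp(a: str, b: str, cutoff: int) -> Optional[Tuple[int, int, int]]:
    """Keep the best segment as an HSP only if it reaches the cutoff."""
    segment = best_ungapped_segment(a, b)
    return segment if segment[0] >= cutoff else None


A, B = "ATTGCTCTGT", "ATAGCTCGGT"
print(find_hsp(A, B, cutoff=4))  # (4, 3, 7): segment GCTC is kept
print(find_hsp(A, B, cutoff=5))  # None: below the cutoff, discarded
```

With these assumed values the short pair A-B yields only a four-letter segment, so a slightly stricter cutoff already discards it; this is the same trade-off between sensitivity and noise that the cutoff controls in a real search.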
Expectancy, E-value (expected value) = 8e-45 means 8 x 10^-45; values below 0.001 are considered significant.

Something extra to practice BLAST
Search the database and find out which organism the following sequence belongs to:
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGGCATGGCTGCGTCAGGGTTCCC
CCCATTGCGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCA
GTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGAGATCGCAGGCTTGGTAGG
CCTTTACCCCACCAACTACCTAATCCCACTTGGGCTCATCTTATGGCAGGTGGCCC
TAAGGTCCCACCCTTTCCTCCTCAGAGAATACGCGGTATTAGCTGCAGTTTCCCAC
AGTTATCCCCCTCCATAAGCCAGATTCCCAAGCATTACTCACCCGTCCGCCACTCG
TCAGCAAAGAAAGCAAGCTTTCTTCCTGCTACCGTTCGACTTGCATGTGTTAAGCC
TGCCGCCAGCGTTCAATCTGAGCCAGGATCAACNTCTTTCTCCAAA

It should be Pasteurella multocida.

Task: Compare these two sequences. Do they belong to the same species?
Sequence 1
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG

Sequence 2
GCTTTCGCGCATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT
CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG
TACTCTAGACTCCCAGTCTGAAAAGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA
TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCGCAGTTATTCCGATTAACGCT
CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT
AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA
ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG

Multiple alignment
> One example of its use is the comparison of several sequences at once
CLUSTAL
> CLUSTAL W = generally available
> CLUSTAL X = CLUSTAL W with a graphical interface for Windows
> CLUSTAL OMEGA = the latest version
http://www.clustal.org

Summary
1) Working with sequence data
2) Basic publicly available databases
3) Working with the NCBI pages
4) How sequence similarity is assessed
5) The BLAST and BLAST2 search tools
6) Multiple alignment: the CLUSTAL program