Funded by the European Union (NextGenerationEU), National Recovery Plan, Ministry of Education, Youth and Sports of the Czech Republic

Sequence Data Analysis
Vratislav Peška - vpeska@ibp.cz
Central European Institute of Technology, BRNO | CZECH REPUBLIC
NPO-4 Advanced Methods in Genomics and Proteomics, NPO_MUNI_MSMT-16606/2022

Sanger sequencing - results

Labels on the slide: PCR, ELFO (gel electrophoresis), cloning into a vector, sequencing service.

The sequencing service returns a ZIP archive. In the example it is 101125FN-084.zip (ZIP file, 4177 kB, modified 02.12.2010 14:05), stored in This PC > DATA (D:) > metody > sanger-sequencing-output and opened with 7-Zip (https://www.7-zip.org).

[Screenshot: 7-Zip listing of 101125FN-084.zip with the columns Name, Size, Packed size and Modified. The archive holds a blast_result item (362 475 bytes, 78 618 packed) and five files for each of the reads 6A-A-M13F-pUC, 6B-B-M13F-pUC, 6B-B-M13R-pUC, 7A-1-M13F-pUC and 7B-5-M13F-pUC, all dated 2010-11-30.]

Files delivered for each read (sizes as far as they are legible in the screenshot):
Extension | Content | Size
.ab1 | ABI trace file (chromatogram) | about 266-280 kB
.scf | Standard Chromatogram Format trace | about 91-105 kB
.pdf | PDF document | about 85-131 kB
.phd.1 | Phred base calls with quality values | about 9-11 kB
.txt | base calls as plain text | about 0.9-1 kB

[Screenshot: 6A-A-M13F-pUC.txt opened in a text editor. The file holds the base calls of the read as plain text; the read starts with a run of N (uncalled bases) before readable sequence begins.]

Images created by Vratislav Peška.
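The run of N at the start of the plain-text read is typical for the ends of a Sanger read, so the ends are usually trimmed before the sequence is used. The following is a minimal sketch of such trimming that works on the base calls alone. The function name `trim_read` and its parameters are illustrative assumptions, and real tools (for example the Trim Ends function in Geneious) normally use the per-base quality values from the .ab1 or .phd.1 file instead.

```python
def trim_read(seq: str, window: int = 10, max_n: int = 0) -> str:
    """Trim both ends of a read until a window of `window` bases
    contains at most `max_n` uncalled bases (N).

    Simplified illustration: it looks only at N characters,
    not at Phred quality values.
    """
    seq = seq.upper()

    def first_clean(s: str) -> int:
        # Index of the first window with an acceptable number of N,
        # or len(s) when no such window exists.
        for i in range(0, len(s) - window + 1):
            if s[i:i + window].count("N") <= max_n:
                return i
        return len(s)

    start = first_clean(seq)
    if start == len(seq):
        return ""  # no clean window anywhere in the read
    # Apply the same scan from the 3' end by reversing the read.
    end_offset = first_clean(seq[::-1])
    return seq[start:len(seq) - end_offset]


if __name__ == "__main__":
    print(trim_read("NNANNACGTACGTACGTNN", window=5))
```

A shorter window trims less aggressively; a non-zero `max_n` tolerates isolated uncalled bases inside the read.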
[Figure: chromatogram of 6A-A-M13F-pUC.ab1, positions 510-620.]

Programs shown on the slides, with prices:

• Geneious (https://www.geneious.com): $900 / year; FREE on MetaCentrum. "Geneious Prime is the world's leading bioinformatics software platform for molecular biology and sequence analysis." Academic group activations: 2 for $900, 5 for $2,250, 10 for $4,250; custom tiers are available. A 30-day Geneious Prime free trial is offered; no credit card is required.
• DNASTAR Lasergene (http://www.dnastar.com): $2600 / year. Packages shown: Lasergene Molecular Biology, Lasergene Genomics, Lasergene Protein. The pricing page lists annual prices starting at $699, $1,599, $799 and $2,599 per year.
• Unipro UGENE (ugene.net): "UGENE is a free open-source bioinformatics software for Windows, macOS, and Linux."
• QIAGEN CLC: $??? / year; FREE on MetaCentrum. Products listed under Secondary Analysis: QIAGEN CLC Workbench Premium, QIAGEN CLC Genomics Workbench, QIAGEN CLC Main Workbench, QCI Secondary Analysis, QIAGEN CLC Genome Finishing Module, QIAGEN CLC Microbial Genomics Module, QIAGEN CLC Genomics Server. Under Interpretation and Visualization: QIAGEN Ingenuity Pathway Analysis (IPA), QIAGEN OmicSoft Suite.
• BioEdit (https://thalljiscience.github.io): FREE. Copyright ©1997-2017 Tom Hall; sequence alignment editor for Windows 95/98/NT/2K/XP.

Availability on MetaCentrum (excerpt, cut off in the screenshot):
• Geneious release R7 (version 7.1.5): module gei…
• Geneious release R7 (version 7.0.3): module g…
• a license purchased by CERIT-SC Centre allow…
• CLCbio Genomics Workbench version 9.5.3: mo…
• upgrade of a license purchased by Centre CER…
• CLCbio Genomics Workbench version 7.0: modi…
• upgrade of a license purchased by Centre CER…
• CLCbio Genomics Workbench version 6.5.1: mo…
• a license purchased by Centre CERIT-SC (avai…

[Screenshot: Geneious Prime (trial, expires in 28 days) with the document 6A-A-M13F-pUC.ab1 selected. Viewer tabs: Sequence View, Dotplot (Self), Chromatogram, Text View, Lineage, Info. The Graphs panel offers Chromatogram, Quality, Highlight, GC Content (sliding window) and Protein Coding Prediction (window size 200, step size 3), the last one based on the EMBOSS 6.5.7 tool tcode.]

[Screenshot: Geneious Prime menus. Annotate & Predict: Trim Ends..., Trim using BBDuk..., Transfer Annotations..., Annotate from Database..., Compare Annotations..., Annotate by BLAST, Find ORFs..., Find Motifs..., Find CRISPR Sites..., Analyze CRISPR Editing Results..., Find Variations/SNPs..., Find High/Low Coverage..., Calculate Expression Levels..., Compare Expression Levels..., Locate Tandem Repeat(s) with Phobos..., Find Protein Domains with InterProScan..., Search for Transcription Factors... Sequence: New Sequence..., Extract Regions..., Reverse Complement..., Translate..., Back Translate..., Circularize Sequence, Free End Gaps Alignment, Change Residue Numbering..., Convert between RNA and DNA, Set Read Direction..., Set Read Technology..., Set Paired Reads...]
[Screenshot: further Geneious Prime menu items. Sequence: Merge Paired Reads..., Remove Duplicate Reads..., Remove Chimeric Reads..., Error Correct & Normalise Reads..., Separate Reads by Barcode..., Group Sequences into a List..., Extract Sequences from List... Tools: Align/Assemble, Tree..., Primers, Cloning, BLAST..., Add/Remove Databases, Classify Sequences..., Extract Annotations..., Mask Alignment..., Concatenate Sequences or Alignments..., Generate Consensus Sequence..., Submit to GenBank, Workflows, Plugins..., Preferences... (Ctrl+Shift+P). In the Sequence View a user-defined annotation labelled "MOJE VLASTNI ANOTACE" (my own annotation) has been added to the read.]

DATABASES - the big three: EMBL/NCBI/DDBJ

Data type | DDBJ | EMBL-EBI | NCBI
Next generation reads | Sequence Read Archive | European Nucleotide Archive (ENA) | Sequence Read Archive
Capillary reads | Trace Archive | European Nucleotide Archive (ENA) | Trace Archive
Annotated sequences | DDBJ | European Nucleotide Archive (ENA) | GenBank
Samples | BioSample | European Nucleotide Archive (ENA) | BioSample
Studies | BioProject | European Nucleotide Archive (ENA) | BioProject

Entry points: https://www.ncbi.nlm.nih.gov (NCBI), the European Nucleotide Archive at EMBL-EBI, https://www.ddbj.nig.ac.jp/index-e.html (DDBJ).

EMBL-EBI, featured data resources and tools:
• AlphaFold DB: database for protein structure predictions for numerous species (CC-BY).
• BioModels: a repository of peer-reviewed, published, computational models (Web API, CC0).
• ChEMBL: an open data resource of binding, functional and ADMET bioactivity data (Web API, CC-BY).
• Annotation Platform: consolidating text-mined and curated annotations.
• Clustal Omega: multiple sequence alignment of DNA or protein sequences. Clustal Omega replaces the older ClustalW alignment tools (Web API).
• HMMER: fast sensitive protein homology searches using profile hidden Markov models (HMMs) for querying against both sequence and HMM target databases (Web API).

NCBI tools (first part of the alphabetical list):
Tool | Description
Amino Acid Explorer | Explores amino acid properties, substitutions and functions
Assembly Archive | Links the raw sequence information found in the Trace Archive with assembly information found in GenBank/EMBL/DDBJ
Basic Local Alignment Search Tool (BLAST) | Finds regions of local similarity between biological sequences
Batch Entrez | Retrieves records specified in an uploaded file of identifiers
BioAssay Services | Tools that summarize the biological test results in the PubChem database
BLAST Link (BLink) | Displays the results of a pre-computed BLAST search of a protein against all other protein sequences at NCBI
BLAST Microbial Genomes | Finds regions of local similarity between query sequences and sequences from complete microbial genomes
BLAST RefSeqGene | Finds regions of local similarity between query sequences and genomic sequences in the RefSeqGene/LRG set
CDTree | Classifies protein sequences and investigates their evolutionary relationships
CGV | Compares genomes based on assembly-assembly alignments
Cn3D | Displays and manipulates 3-dimensional structures and alignments from the Structure database
COBALT | Performs protein multiple sequence alignments
Concise Microbial Protein BLAST | Finds regions of local similarity between query proteins and proteins from complete microbial (prokaryotic) genomes
Conserved Domain Architecture Retrieval Tool (CDART) | Displays the functional domains that make up a given protein sequence
Conserved Domain Search Service (CD Search) | Identifies the conserved domains present in a protein sequence
Digital Differential Display (DDD) | Identifies genes with significantly different expression levels by comparing EST profiles
Electronic PCR (e-PCR) | Identifies sequence tagged sites (STSs) within DNA sequences
Frequency-weighted Link (FLink) | Links a group of records in a source database to a ranked list of associated records in a destination database based on frequency-weighted statistics
Gene Expression Omnibus (GEO) BLAST | Finds regions of local similarity between query sequences and GenBank sequences included on microarray or SAGE platforms in the GEO database
Genetic Codes | Displays the genetic codes for organisms in the Taxonomy database in tables and on a taxonomic tree
Genome BLAST | Finds regions of local similarity between query sequences and genome sequences

NCBI databases

Assembly; BioCollections; BioProject (formerly Genome Project); BioSample; Bookshelf; ClinVar; ClinicalTrials.gov; Computational Resources from NCBI's Structure Group; Consensus CDS (CCDS); Conserved Domain Database (CDD); Database of Genomic Structural Variation (dbVar); Database of Genotypes and Phenotypes (dbGaP); Database of Short Genetic Variations (dbSNP); GenBank; Gene; Gene Expression Omnibus (GEO) Database; Gene Expression Omnibus (GEO) Datasets; Gene Expression Omnibus (GEO) Profiles; GeneReviews; Genes and Disease; Genetic Testing Registry (GTR); Genome; Genome Reference Consortium (GRC); Glycans; HIV-1, Human Protein Interaction Database; Identical Protein Groups; Influenza Virus; Journals in NCBI Databases; MeSH Database; MedGen; NCBI C++ Toolkit Manual; NCBI Education Page; NCBI Glossary; NCBI Handbook; NCBI Help Manual; NCBI Pathogen Detection Project; National Library of Medicine (NLM) Catalog; Nucleotide Database; Online Mendelian Inheritance in Man (OMIM); PopSet; Protein Clusters; Protein Database; Protein Family Models; PubChem BioAssay; PubChem Compound; PubChem Substance; PubMed; PubMed Central (PMC); RefSeqGene; Reference Sequence (RefSeq); Retrovirus Resources; SARS CoV; Sequence Read Archive (SRA); Structure (Molecular Modeling Database); Taxonomy; Third Party Annotation (TPA) Database; Trace Archive; Viral Genomes; Virus Variation.

Other databases shown on the slides:
• Joint Genome Institute, a DOE Office of Science user facility (https://jgi.doe.gov/data-and-tools/): IMG, Data Portal, MycoCosm, PhycoCosm, Phytozome.
• CoGe (https://genomevolution.org/coge/): "CoGe is a platform for performing Comparative Genomics research. It provides an open-ended network of interconnected tools to manage, analyze, and visualize next-gen data." Counts shown: 21,634 organisms, 57,631 genomes, 4,066,190,572 features.
• Published plant genomes (https://www.plabipd.de).
• KEGG: Kyoto Encyclopedia of Genes and Genomes (https://www.genome.jp/kegg/).
• Gene Ontology (geneontology.org).
• UniProtKB/Swiss-Prot (https://www.expasy.org/resources/uniprotkb-swiss-prot).
• FlyBase (flybase.org).
• TAIR, The Arabidopsis Information Resource (https://www.arabidopsis.org).
• Repbase (https://www.girinst.org/repbase/).
Further resources shown on the slides:
• WormBase (https://wormbase.org), version WS287.
• Alliance of Genome Resources.
• NAR Database Summary Paper category list (Oxford Journals): Nucleotide Sequence Databases; RNA sequence databases; Protein sequence databases; Structure Databases; Genomics Databases (non-vertebrate); Metabolic and Signaling Pathways; Human and other Vertebrate Genomes; Human Genes and Diseases; Microarray Data and other Gene Expression Databases; Proteomics Resources; Other Molecular Biology Databases; Organelle databases; Plant databases; Immunological databases; Cell biology.
• DATABASE: The Journal of Biological Databases and Curation (Oxford Academic). Impact Factor 4.462, 5 year Impact Factor 4.776, Editor-in-Chief David Landsman, latest volume: Volume 2023.

Example: downloading data from NCBI

[Screenshot: a Geneious Prime window (document GCA_947563725.1_qqArgBrue1.1_genomic) next to a browser showing an "Index of /genomes/all/…" directory listing on the NCBI server.]

Files in the listed assembly directory (all entries start with GCA_026543865.1_Aaur_10X_AMNH unless noted):
Name | Last modified | Size
…_assembly_structure/ | 2022-12-01 | -
…_assembly_report.txt | 2022-12-01 00:46 | 9.7K
…_assembly_stats.txt | 2022-12-01 00:46 | 7.0K
…_feature_count.txt.gz | 2022-12-01 00:46 | 172
…_genomic.fna.gz | 2022-12-01 00:46 | 560M
…_genomic.gbff.gz | 2022-12-01 00:46 | 744M
…_genomic_gaps.txt.gz | 2022-12-01 00:46 | 541K
…_wgsmaster.gbff.gz | 2022-12-01 00:46 | 1.3K
README.txt | 2020-09-02 | 43K
annotation_hashes.txt | 2022-12-01 | 410
assembly_status.txt | 2023-04-03 | 14
md5checksums.txt | 2022-12-01 00:47 | 1.7K

[Screenshot: NCBI home page (National Library of Medicine, National Center for Biotechnology Information) with "homo sapiens" entered in the All Databases search box. The page offers six entry points: Submit (deposit data or manuscripts into NCBI databases), Download (transfer NCBI data to your computer), Learn (find help documents, attend a class or watch a tutorial), Develop (use NCBI APIs and code libraries to build applications), Analyze (identify an NCBI tool for your data analysis task) and Research (explore NCBI research and collaborative projects).]

The search for "homo sapiens" returns results in 32 databases. The Taxonomy box reads: Homo sapiens. Human (Homo sapiens) is a species of primate in the family Hominidae (great apes). Taxonomy ID: 9606. From here you can go to Genomes (browse and download), Genes (browse and download), Genome Data Viewer (browse the reference genome) and BLAST (search the reference sequence).

Genome

The Genome page lets you download a genome data package including genome, transcript and protein sequence, annotation and a data report. With the selected taxon Homo sapiens (human) it lists 1 082 genomes (rows 1-20 shown):
Assembly | GenBank | RefSeq | Modifier
GRCh38.p14 | GCA_000001405.29 | GCF_000001405.40 | -
HuRef | GCA_000002125.2 | GCF_000002125.1 | male (sex)
CHM1_1.1 | GCA_000306695.2 | GCF_000306695.2 | CHM1 (isolate)
T2T-CHM13v2.0 | GCA_009914755.4 | GCF_009914755.1 | -
WGSA | GCA_000002115.2 | - | -

The assembly page of the reference genome offers the buttons Download, datasets and curl. The download dialog reports 1 genome available for download and asks for:
• File source: All (2), RefSeq only (1) or GenBank only (1).
• File types: Genome sequences (FASTA), Annotation features (GTF), Annotation features (GFF), Sequence and annotation (GBFF), Transcripts (FASTA), Genomic coding sequences (FASTA), Protein (FASTA), Sequence report (JSONL), Assembly data report (JSONL).
• File name: ncbi_dataset.zip by default.

The selected data are downloaded as a single ZIP archive, and the dialog shows an estimated file size before the download starts.

[Screenshot: browser download prompt asking what to do with the file ncbi_dataset.zip.]
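Before unpacking a multi-gigabyte download it can help to look inside the archive first. The sketch below lists the members of a ZIP file such as ncbi_dataset.zip, grouped by file extension, using only the Python standard library. The function name `summarize_archive` is an illustrative assumption and not part of any NCBI tool.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Return {extension: [(member name, uncompressed size in bytes), ...]}
    for all files in a ZIP archive, without extracting anything."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix or "(none)"
            groups[suffix].append((info.filename, info.file_size))
    return dict(groups)


def print_summary(zip_path):
    """Print one line per member, grouped by extension."""
    for suffix, members in sorted(summarize_archive(zip_path).items()):
        print(suffix)
        for name, size in members:
            print(f"  {name}  ({size} bytes)")
```

Calling `print_summary("ncbi_dataset.zip")` on the downloaded file would show, for example, which FASTA (.fna, .faa) and annotation (.gff) members the package contains and how large they are once unpacked.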
The archive ncbi_dataset.zip contains a README.md and the folder ncbi_dataset. After unpacking, the data are in This PC > DATA (D:) > metody > ncbi_dataset > ncbi_dataset > data > GCF_000001405.40:
Name | Modified | Type | Size
cds_from_genomic.fna | 06.04.2023 15:20 | FNA file | 357 020 kB
GCF_000001405.40_GRCh38.p14_genomic.fna | 06.04.2023 15:20 | FNA file | 3 261 464 kB
genomic.gff | 06.04.2023 15:20 | GFF file | 1 480 761 kB
protein.faa | 06.04.2023 15:20 | FAA file | 103 388 kB
rna.fna | 06.04.2023 15:20 | FNA file | 725 186 kB

Importing the genome into Geneious Prime:
1. Drag the file into the Geneious window ("Drop files here to import"). A progress dialog appears (for example "36% - Importing Files: Analyzing file contents").
2. Dialog "Ambiguous Sequences": "What type of sequences are in GCA_000001405.29_GRCh38.p14_genomic.fna? Geneious thinks this data is probably nucleotide sequences." Choose Nucleotide sequences (the alternative is Protein sequences); the choice can be applied to all ambiguous sequences.
3. Dialog "Grouping Sequences": "How do you want to store the sequences from GCA_000001405.29_GRCh38.p14_genomic.fna in Geneious Prime? This file contains a large number of sequences so creating a list is highly recommended. Large numbers of separately stored sequences may cause Geneious to run slowly and use more memory. Sequences can be grouped into or extracted from lists at any time using the Sequence menu." Choose between "Keep sequences separate" and "Create sequence list"; the preference can be remembered.

[Screenshot: Geneious Prime after the import. The new document GCA_000001405.29_GRCh38.p14_genomic appears in the local folder "human". Viewer tabs: Sequence View, Lengths Graph, Text View, Lineage, Info. The Annotations and Tracks panel states "This sequence has no annotations."]

[Screenshot: Sequence View of CM000663.2 (Homo sapiens chromosome 1, …) zoomed in around position 10,000, where the leading run of N ends and the sequence TAACCCTAACCCTAACCCT… begins.]

[Screenshot: Sequence View of the whole sequence list on a 0-240 Mbp scale. The list contains the chromosome records (CM000663.2 for chromosome 1, CM000664.2 for chromosome 2, and so on up to CM000682.2 in the visible part) interleaved with shorter scaffolds, for example KI270706.1 to KI270714.1 ("Homo sapiens chromosome 1 u…"), KI270715.1 and KI270716.1 (chromosome 2), GL000221.1 (chromosome 3), GL000008.2 (chromosome 4) and GL000208.1 (chromosome 5).]
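The Lengths Graph and the sequence list above boil down to a table of record names and lengths, which can also be obtained without a graphical program. The following sketch computes it from a FASTA file with the Python standard library. The function names `fasta_lengths` and `open_fasta` are illustrative assumptions; the file name in the usage note is the one from the unpacked NCBI package.

```python
import gzip
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record identifier (first word after '>')
    to the number of residues in that record."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

For the human reference this would be used as `with open_fasta("GCF_000001405.40_GRCh38.p14_genomic.fna") as handle: lengths = fasta_lengths(handle)`. The file is read line by line, so the 3 GB genome never has to fit into memory at once.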
[Screenshot: the folder GCF_000001405.40 in Windows Explorer while Geneious imports genomic.gff ("1.19% - Importing Files: Reading GFF data").]

Assembly page of the human reference genome at NCBI:
NCBI RefSeq sequence | GCF_000001405.40
Submitted GenBank sequence | GCA_000001405.29
Taxon | Homo sapiens (human)
Synonym | hg38
Assembly type | haploid-with-alt-loci
Submitter | Genome Reference Consortium
Date | Feb 3, 2022
BioProject | PRJNA31257

Links on the page: Download, RefSeq BLAST against this genome, See in Genome Data Viewer, See more files on FTP, View the legacy Assembly page.

"See more files on FTP" opens the directory listing "Index of /genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14". Most entries are dated 2023-03-21 10:15; the sizes are not reliably legible in the screenshot. Contents:
• Directories: Annotation_comparison/, GCF_000001405.40_GRCh38.p14_GRCh38_major_release_seqs_for_alignment_pipelines/, RefSeq_transcripts_alignments/.
• Files starting with GCF_000001405.40_GRCh38.p14: _assembly_regions.txt, _assembly_report.txt, _assembly_stats.txt, _cds_from_genomic.fna.gz, _feature_count.txt.gz, _feature_table.txt.gz, _genomic.fna.gz, _genomic.gbff.gz, _genomic.gff.gz, _genomic.gtf.gz, _genomic_gaps.txt.gz, _protein.faa.gz, _protein.gpff.gz, _pseudo_without_product.fna.gz, _rm.out.gz, _rm.run, _rna.fna.gz, _rna.gbff.gz, _rna_from_genomic.fna.gz, _translated_cds.faa.gz.
• Other files: GCF_000001405.40-RS_2023_03_annotation_report.xml, README_GCF_000001405.40-RS_2023_03, README_patch_release.txt, all_alt_scaffold_placement.txt, annotation_hashes.txt, assembly_status.txt, md5checksums.txt.

File Transfer Protocol - FTP (FTPS, SFTP)

FTP via a web browser

Using your browser as an FTP client:
1. Open your browser; the example uses Chrome.
2. In the address bar, enter ftp://Host. The example uses mars.whfweb.com as the host name, so the address is ftp://mars.whfweb.com.
3. If you did not include your FTP user name and password in the URL, you will be prompted for them.

[Screenshot: browser sign-in dialog for ftp://mars.whfweb.com with the fields Username and Password and the notice "Your connection to this site is not private".]

FTP software (from sources across the web):
Program | License
FileZilla | GNU General Public License
Commander One | Proprietary software
CuteFTP | Proprietary software
WinSCP | GNU General Public License
Transmit | Proprietary software
Core FTP | Freeware
Cyberduck | GNU General Public License
SmartFTP | Proprietary software
FileZilla Server | GNU General Public License

What I need to know: host, user name, password (plus port and protocol).

[Screenshot: FileZilla Site Manager. Saved sites include copy, copy-share 1, copy-share 2, javor, a METACENTRUM folder with storage and frontend servers (for example elmo-praha5-elixir, onyx-brno2, perian-brno2, storage-du-archiv, storage-du-zaloha, tarkil-vestec1-elixir, tilia-pruhonice1-ibot, zuphux-brno3-cerit), NCBI and repeatexplorer.]

Settings of the site "NCBI" (General tab):
Protocol | FTP - File Transfer Protocol
Host | ftp-private.ncbi.nlm.nih.gov
Encryption | Use explicit FTP over TLS if available
Logon type | Normal
User name | subftp
Password | (hidden)

[Screenshot: FileZilla connected as subftp@ftp-private.ncbi.nlm.nih.gov. Log messages: connection established, waiting for welcome message; insecure server, it does not support FTP over TLS; logged in; retrieving the directory listing of "/uploads/vpeska_sci.muni.cz_ARrmq53e".]
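The items a client needs (host, user name, password, port and protocol) map directly onto Python's standard `ftplib` module, so the same connection can be scripted. The sketch below is a minimal illustration under stated assumptions: the names `FtpSite`, `connect`, `assembly_dir` and `list_assembly_files` are hypothetical helpers, the public host `ftp.ncbi.nlm.nih.gov` with anonymous login is assumed to be reachable, and the path scheme is inferred from the directory listing shown earlier (/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14).

```python
from dataclasses import dataclass
from ftplib import FTP, FTP_TLS


@dataclass
class FtpSite:
    """The things you need to know before connecting."""
    host: str
    user: str = "anonymous"
    password: str = ""
    port: int = 21
    use_tls: bool = False  # explicit FTP over TLS (FTPS)


def connect(site: FtpSite, timeout: float = 30.0) -> FTP:
    """Open and log in to an FTP or FTPS session described by `site`."""
    ftp = FTP_TLS(timeout=timeout) if site.use_tls else FTP(timeout=timeout)
    ftp.connect(site.host, site.port)
    ftp.login(site.user, site.password)
    if site.use_tls:
        ftp.prot_p()  # also encrypt the data channel
    return ftp


def assembly_dir(accession: str, assembly_name: str) -> str:
    """Build the directory path under /genomes/all for an assembly,
    e.g. GCF_000001405.40 + GRCh38.p14 (scheme assumed from the listing)."""
    prefix, rest = accession.split("_", 1)   # "GCF", "000001405.40"
    digits = rest.split(".")[0]              # "000001405"
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected accession format: {accession}")
    triplets = [digits[i:i + 3] for i in range(0, 9, 3)]
    folder = f"{accession}_{assembly_name.replace(' ', '_')}"
    return "/genomes/all/" + "/".join([prefix, *triplets, folder])


def list_assembly_files(accession: str, assembly_name: str):
    """List the files of one assembly (requires network access)."""
    ftp = connect(FtpSite(host="ftp.ncbi.nlm.nih.gov"))
    try:
        return ftp.nlst(assembly_dir(accession, assembly_name))
    finally:
        ftp.quit()
```

A private upload account would be described the same way, for example `FtpSite(host="ftp-private.ncbi.nlm.nih.gov", user="subftp", password="...")`, with the password supplied at run time and not stored in the script.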
[Figure: FileZilla after the directory listing of /uploads/vpeska_sci.muni.cz_ARrmq53e completed successfully; local folder D:\PAVOUCI2024\ on the left, empty remote folder on the right.]

[Figure: FileZilla uploading D:\9-10-analyza_sekvenci_metody2022.pptx (9,735,531 bytes) to 130.14.250.5:21; 3,670,016 bytes sent at 973.4 KiB/s, 00:00:03 elapsed, 00:00:08 remaining.]

Sequencing service or central laboratory (CF - Core Facility)

[Figure: web pages of sequencing providers (Novogene, Macrogen, GeneCore) offering human, animal and plant, and microbial whole-genome sequencing. Contacts shown: Vladimir Benes, Head of GeneCore, and MVDr. Boris Tichý, Ph.D., head of a shared laboratory. Platforms listed include the PacBio Sequel II/IIe System.]

What the product delivered by a sequencing service or central laboratory (CF - core facility) looks like

[Figure: folder of gzip-compressed paired-end FASTQ files, two per sample (R1 and R2), for example:
TEL102_S10_L001_R1_001.fastq.gz and TEL102_S10_L001_R2_001.fastq.gz
TEL97_S14_L002_R1_001.fastq.gz and TEL97_S14_L002_R2_001.fastq.gz
SME_S6_L001_R1_001.fastq.gz and SME_S6_L001_R2_001.fastq.gz
SCENE_S13_L002_R1_001.fastq.gz and SCENE_S13_L002_R2_001.fastq.gz
Further samples in the same layout: RACR_S7_L001, PYR_S20_L002, PpWT1_S1_L001, PARA_S12_L002, CHX302_S8_L001.]
[Figure: further R1/R2 pairs in the delivered folder (FTAR_S5_L001, FESC_S15_L002, DKC_MOCK_S4_L001, DKC_IP7_S3_L001, DKC_IP6_S2_L001); individual files are roughly 1.3 to 5.0 GB.]

Ways the data are handed over:
• via a web interface
• external disk
• FTP
• FileSender
• Aspera
• etc.

[Figure: a second folder in which one run is split into many files whose names combine PAM42230, pass, 135f8216 and ed2e97e7 with a running number.]
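Paired-end deliveries like the Illumina folder shown here follow the usual naming pattern `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz`, so R1 and R2 files can be matched by name. The sketch below assumes that every file follows this pattern. The names `pair_fastq_files` and `unpaired` are invented for the example.

```python
import re
from collections import defaultdict

# <sample>_S<number>_L<lane>_R<read>_<chunk>.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def pair_fastq_files(filenames):
    """Group names by (sample, lane, chunk): {key: {"R1": name, "R2": name}}."""
    groups = defaultdict(dict)
    for name in filenames:
        match = ILLUMINA_NAME.match(name)
        if match is None:
            continue  # not an Illumina-style name
        key = (match["sample"], match["lane"], match["chunk"])
        groups[key]["R" + match["read"]] = name
    return dict(groups)


def unpaired(groups):
    """Return the keys for which R1 or R2 is missing."""
    return [key for key, files in groups.items() if set(files) != {"R1", "R2"}]


files = [
    "TEL102_S10_L001_R1_001.fastq.gz",
    "TEL102_S10_L001_R2_001.fastq.gz",
    "SME_S6_L001_R1_001.fastq.gz",  # its R2 partner is left out on purpose
]
groups = pair_fastq_files(files)
print(groups[("TEL102", "001", "001")])
print(unpaired(groups))
```

Checking for unpaired files right after a download is a cheap way to notice an incomplete transfer before any analysis starts.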
[Figure: the numbered files of the run, from ..._135.fastq.gz to ..._162.fastq.gz; download progress 53.5 GB of 53.5 GB, 2,855 of 2,855 files.]

Uploading raw NGS data to the NCBI SRA

[Figure: NCBI home page (https://www.ncbi.nlm.nih.gov) with the entry points Submit (deposit data or manuscripts into NCBI databases), Download, Learn, Develop, Analyze and Research.]

Other Tools
• TSA: Submit computationally assembled, transcribed RNA sequences after submitting unassembled reads to SRA.
• GEO: Submit RNA-seq, ChIP-seq, and other types of gene expression and epigenomics datasets.
• BioProject & BioSample: Choose a tool above if submitting sequence data.

What You Should Expect: How to submit
You should register BioProject or BioSamples separately from your data only in the following situations:
1. Large and long-term projects where samples are collected over the course of a year or more.
2. If an NCBI curator instructed you to register a separate BioProject or BioSamples.
3. If you are submitting an annotated genome before submitting the reads or the unannotated genome.

[Figure: NCBI Submission Portal, BioProject tab. Notice shown: to update an existing record or recent submission, use "Manage data"; if the change cannot be made there, email the request with the BioProject accession or Submission ID; do not create a new submission to update an existing one. The list shows five submissions, among them PRJNA796755 (Evolution of telomeres and telomerases in plants), PRJNA594842 (Telomere sequence identification in Zostera genus), PRJNA542932 (Identification of telomerase RNA in plants) and PRJNA512235 (Comparative study of repeats in onion, garlic, and wild garlic); one unfinished submission reports the error "Similar projects already exist: PRJNA796755".]

Sequence data formats
• FASTQ
• FASTA
• SAM/BAM
• GFF
• BED
• VCF

FASTQ [fa:st kju:]
Control question: a FASTQ file has 100 lines. How many reads is that, and what does it imply about whether the data are paired?
• Nucleotide sequence + the corresponding Phred quality scores (Q)
• Text format (4 lines per sequence)
• Extension usually *.fq or *.fastq (often gzip-compressed, *.fastq.gz)
• Each base and each quality value is represented by a single ASCII character
• A small file can be opened in Notepad; a large file is inspected with less/more or head/tail (Linux terminal)
• FASTQC / MULTIQC

Phred score (Q)
Ten times the negative base-10 logarithm of the error probability.

Phred quality score   Probability of incorrect base call   P          Base call accuracy
10                    1 in 10                              0.1        90%
20                    1 in 100                             0.01       99%
30                    1 in 1,000                           0.001      99.9%
40                    1 in 10,000                          0.0001     99.99%
50                    1 in 100,000                         0.00001    99.999%
60                    1 in 1,000,000                       0.000001   99.9999%

$Q = -10 \log_{10} P$, where $x = \log_a y \iff a^x = y$

Conversion table for the different Q encodings

[Figure: the printable ASCII characters from ! (33) to ~ (126), with the range used by each quality encoding marked by its letter (S, X, I, J, L, P).]
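The relation $Q = -10 \log_{10} P$ and the one-character-per-base encoding can be tried out directly. The sketch below assumes the Phred+33 offset (Sanger and Illumina 1.8+); older Phred+64 data would need `offset=64`. The function names are chosen for this illustration only.

```python
import math


def phred_to_error_probability(q):
    """P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Turn a FASTQ quality line into Phred scores, one per ASCII character."""
    return [ord(char) - offset for char in quality_string]


for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))

# '!' is ASCII 33, '#' is 35, '5' is 53 and 'I' is 73
print(decode_quality("!#5I"))
```

Decoding a Phred+64 file with offset 33 gives scores that are 31 too high, so implausibly large values are a hint that the wrong encoding was assumed.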
S - Sanger: Phred+33, raw reads typically (0, 40)
X - Solexa: Solexa+64, raw reads typically (-5, 40)
I - Illumina 1.3+: Phred+64, raw reads typically (0, 40)
J - Illumina 1.5+: Phred+64, raw reads typically (3, 41), with 0 = unused, 1 = unused, 2 = Read Segment Quality Control Indicator
L - Illumina 1.8+: Phred+33, raw reads typically (0, 41)
P - PacBio: Phred+33, HiFi reads typically (0, 93)

Read headers

@HWI-ST193:542:C2H0GACXX:8:1101:4404:2179 1:Y:0:ACACGA
ATGCNTTTTATAATCAAAGCAGAAGCTTTATGCTAGCTAGCATATAAT
+
<<<<<#2<:5>:944>:??AAAAAAAAABAAAABBBBB??????????

@ - start of the read (header)
HWI-ST193 - instrument name (assigned by the manufacturer)
542 - run ID
C2H0GACXX - flowcell ID
8 - flowcell lane
1101 - tile number
4404 - x-coordinate
2179 - y-coordinate
1 - member of the pair
Y - filter status (Y/N)
0 - control bits (0 or an even number)
ACACGA - index sequence

FASTQC
• First look at NGS data (short reads, Illumina)
• Output in HTML format
• Can be run from the terminal (Linux) or within GALAXY (web)

FASTQC summary + basic statistics

[Figure: FastQC Summary panels for bad data and good data. Each module (Basic Statistics, Per base sequence quality, Per tile sequence quality, Per sequence quality scores, Per base sequence content, Per sequence GC content, Per base N content, Sequence Length Distribution, Sequence Duplication Levels, Overrepresented sequences, Adapter Content) is marked as pass, warning or fail.]

Good data: proceed to analysis (mapping, genome assembly)

Basic Statistics (good data)
Filename: good_sequence_short.txt
File type: Conventional base calls
Encoding: Illumina 1.5
Total Sequences: 256000
Sequences flagged as poor quality: 0
Sequence length: 40
%GC: 45
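The header fields described in the "Read headers" slide can be split apart with a few string operations, and the 4-line layout makes reading records just as simple. The sketch below assumes the Illumina header layout from that slide and uses a shortened, made-up sequence and quality line. `IlluminaHeader`, `parse_illumina_header` and `read_fastq` are names chosen for this example.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class IlluminaHeader(NamedTuple):
    instrument: str
    run_id: str
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read_in_pair: int
    filter_status: str
    control_bits: int
    index_sequence: str


def parse_illumina_header(line: str) -> IlluminaHeader:
    """Split '@instrument:run:flowcell:lane:tile:x:y read:filter:control:index'."""
    location, _, description = line.strip().lstrip("@").partition(" ")
    instrument, run_id, flowcell_id, lane, tile, x, y = location.split(":")
    read_in_pair, filter_status, control_bits, index = description.split(":")
    return IlluminaHeader(instrument, run_id, flowcell_id, int(lane), int(tile),
                          int(x), int(y), int(read_in_pair), filter_status,
                          int(control_bits), index)


def read_fastq(handle: TextIO) -> Iterator[tuple]:
    """Yield (header, sequence, quality) for every 4-line record."""
    while True:
        header = handle.readline()
        if not header:
            return
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\r\n")
        yield header.rstrip("\r\n"), sequence, quality


# Header taken from the slide; sequence and quality shortened for illustration.
example = "@HWI-ST193:542:C2H0GACXX:8:1101:4404:2179 1:Y:0:ACACGA\nATGCN\n+\n<<<#2\n"
records = list(read_fastq(io.StringIO(example)))
header = parse_illumina_header(records[0][0])
print(len(records), header.flowcell_id, header.lane, header.read_in_pair)
```

For real files the same generator works on `gzip.open(path, "rt")`, since *.fastq.gz is ordinary text after decompression.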
Bad data: pre-processing first (trimming, filtering)

Basic Statistics (bad data)
Filename: bad_sequence.txt
File type: Conventional base calls
Encoding: Illumina 1.5
Total Sequences: 395288
Sequences flagged as poor quality: 0
Sequence length: 40
%GC: 47

Per base sequence quality
[Figure: good data vs bad data; quality scores across all bases (Illumina 1.5 encoding) plotted against position in read (bp).]

Per tile sequence quality
[Figure: good data vs bad data; quality per tile plotted against position in read (bp).]

Per sequence quality scores
[Figure: good data vs bad data; quality score distribution over all sequences, average quality per read against mean sequence quality (Phred score).]

Per base sequence content
[Figure: good data vs bad data; sequence content across all bases plotted against position in read (bp).]

Per sequence GC content
[Figure: good data vs bad data; GC distribution over all sequences, GC count per read compared with the theoretical distribution, against mean GC content (%).]

Per base N content
[Figure: good data vs bad data; N content (%N) across all bases plotted against position in read (bp).]

Sequence Length Distribution
[Figure: good data vs bad data; distribution of sequence lengths over all sequences, against sequence length (bp).]

Sequence Duplication Levels
[Figure: good data vs bad data.]

Overrepresented sequences
Good data: no overrepresented sequences.
Bad data: [Figure: table of overrepresented 40-nt sequences with the columns Sequence, Count, Percentage and Possible Source; the possible source is "No Hit" for every entry.]
[Figure: remaining rows of the Overrepresented sequences table for the bad data (Sequence, Count, Percentage, Possible Source); every entry has "No Hit" as its possible source.]

Adapter Content
[Figure: good data vs bad data; % adapter plotted against position in read (bp) for Illumina Universal Adapter, Illumina Small RNA 3' Adapter, Illumina Small RNA 5' Adapter, Nextera Transposase Sequence and SOLID Small RNA Adapter.]

GALAXY

FASTA [fa:st ei]
• Text file; extensions *.fa, *.fas, *.fasta, *.fna
• Lines matter when working on the command line
• Line wrapping and line endings (UNIX/DOS)
• NCBI example
• multifasta:

>header-1
GATCGATCG
>header-2
ATCGATCGATCG

[Figure: browser view of an NCBI FTP directory below https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/026/543/865/ with the columns Name, Last modified and Size.]
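Because a FASTA sequence may be wrapped over several lines and files may use UNIX or DOS line endings, a reader has to join lines until the next `>` header. Below is a minimal sketch of such a reader for the multifasta layout shown in the FASTA slide. The function name `read_fasta` and the example string are illustrative assumptions.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a (multi-)FASTA text stream."""
    header, chunks = None, []
    for raw_line in handle:
        line = raw_line.strip()  # also removes a DOS '\r' before the '\n'
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are joined
    if header is not None:
        yield header, "".join(chunks)


# Two records with DOS line endings; the first sequence is wrapped.
example = ">header-1\r\nGATC\r\nGATCG\r\n>header-2\r\nATCGATCGATCG\r\n"
records = dict(read_fasta(io.StringIO(example)))
print(records)
```

Collecting the pieces in a list and joining once per record keeps the reader fast on long chromosomes, where repeated string concatenation would be slow.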
Files listed in the NCBI directory for assembly GCA_026543865.1 (Aaur_10X_AMNH), each prefixed with the assembly name:
• assembly_report.txt (9.7M)
• assembly_stats.txt (7.9K)
• feature_count.txt.gz (172)
• genomic.fna.gz (560M)
• genomic.gbff.gz (744M)
• genomic_gaps.txt.gz (541K)
• wgsmaster.gbff.gz (1.3K)
• README.txt (43K), annotation_hashes.txt (410), assembly_status.txt (14), md5checksums.txt (1.7K)

SAM (BAM) - sequences, mapping, quality...
• Text format, TAB-delimited (TAB-separated); *.sam, *.bam
• SAM = Sequence Alignment/Map (BAM = binary SAM, i.e. written as ones and zeros)

The header is optional and consists of these lines:

@HD - header line
  VN: format version
  SO: sorting order of the alignments (unknown/unsorted/queryname/coordinate)
  GO: grouping of the alignments (none/query/reference)

@SQ - reference sequence dictionary (there may be several @SQ lines)
  SN: reference sequence name (each @SQ must have a unique SN tag)
  LN: reference sequence length, from 1 to $2^{31}-1$
  AS: genome assembly identifier
  M5: MD5 checksum of the sequence
  SP: species
  UR: URI (http:, ftp: or a file-system path)

@RG - read group information (there may be several @RG lines)
  ID: unique for each @RG
  CN: name of the sequencing center
  DS: description
  DT: date of the run (ISO 8601)
  FO: flow order
  KS: key sequence
  LB: library
  PG: programs used for processing
  PI: median insert size
  PL: platform
  PM: platform model
  PU: platform unit (unique identifier)
  SM: sample name

@PG - information about the program used to create the BAM/SAM
  ID: unique identifier
  PN: program name
  CL: command line
  PP: previous @PG-ID
  DS: program description
  VN: program version

@CO - one-line text comment

Alignment section
11 mandatory fields (columns) plus further optional ones (a SEQ value of * means the sequence is not stored)
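Since SAM is TAB-separated text, both header lines and alignment lines can be taken apart with plain string operations. The sketch below assumes the 11 mandatory column names from the SAM specification (QNAME through QUAL) and uses the example record from that specification. The flag labels in `FLAG_BITS` are descriptive names chosen here, not official identifiers.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# Bit values follow the SAM specification; the labels are informal.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse_strand",
    0x20: "mate_reverse_strand", 0x40: "first_in_pair",
    0x80: "second_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def parse_header_line(line):
    """'@SQ<TAB>SN:ref<TAB>LN:45' -> ('SQ', {'SN': 'ref', 'LN': '45'})."""
    record_type, *tags = line.rstrip("\r\n").split("\t")
    if record_type == "@CO":
        return "CO", {"text": "\t".join(tags)}
    return record_type[1:], dict(tag.split(":", 1) for tag in tags)


def parse_alignment_line(line):
    """Map the 11 mandatory columns to their names; keep optional tags as a list."""
    columns = line.rstrip("\r\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    record["OPTIONAL"] = columns[11:]
    return record


def decode_flag(flag):
    """List the labels of all bits set in the FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


line = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
record = parse_alignment_line(line)
print(parse_header_line("@SQ\tSN:ref\tLN:45"))
print(record["QNAME"], record["POS"], decode_flag(record["FLAG"]))
```

In practice BAM files are read with a dedicated library, because the binary format cannot be split on TABs; the sketch only illustrates the logical structure.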
QNAME: query name (= FASTQ)
FLAG: bitwise flag
RNAME: reference sequence name (one of the SN values from the @SQ lines)
POS: leftmost position of the first match on the reference
MAPQ: mapping quality
CIGAR: description of the match
RNEXT: reference name of the next mapped read
PNEXT: position of the next mapped read
TLEN: observed length of the match
SEQ: read sequence (= FASTQ), or * when the sequence is not stored
QUAL: quality scores (= FASTQ)

@HD VN:1.6 SO:coordinate
@SQ SN:ref LN:45
[Figure: example alignment line r001 annotated with the field names QNAME, FLAG, RNAME, POS, MAPQ, CIGAR and SEQ.]

VCF example: metadata lines (##INFO=..., ##FORMAT=...) followed by data lines

INFO           FORMAT     SAMPLE1    SAMPLE2    SAMPLE3     SAMPLE4     SAMPLE5    SAMPLE6     SAMPLE7
AC=9;AN=7424   GT:DP:GQ   0/0:4:12   0/0:3:9    0/1:1:3     0/1:9:24    1/0:4:12   0/0:5:15    0/0:4:12
AC=6;AN=7446   GT:DP:GQ   0/1:4:12   0/0:3:9    0/0:1:3     0/0:9:24    0/1:4:12   0/1:5:15    0/0:4:12
AC=5;AN=7506   GT:DP:GQ   0/0:5:15   0/0:4:12   0/0:5:15    0/0:9:24    0/0:4:12   0/0:4:12    0/0:4:12
AC=2;AN=7542   GT:DP:GQ   1/0:5:15   0/0:9:27   0/0:10:30   0/0:15:39   0/0:9:27   1/0:13:39   0/1:14:42

Each sample column unpacks according to FORMAT; for the first record: GT 0/0, 0/0, 0/1, 0/1, 1/0, 0/0, 0/0; DP 4, 3, 1, 9, 4, 5, 4; GQ 12, 9, 3, 24, 12, 15, 12.

What is the difference between SNP and SNV?
• single nucleotide variant (SNV)
• single nucleotide polymorphism (SNP)
• single-base substitution (real-time PCR, microarrays, NGS)
• SNV: a variant at the level of a single nucleotide in the genome of a population
• SNP: a variant at the level of a single nucleotide in the germline genome, present in at least 1% of the population

[Figure: NCBI dbSNP page, section "Genomic regions, transcripts, and products", placement GRCh38.p12 (NC_000008.11), with the link "See rs248 in Variation Viewer".]
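The 1% threshold in the SNP definition is a frequency statement, and the AC (allele count) and AN (total number of called alleles) values in a VCF INFO column give an allele frequency directly. The sketch below assumes biallelic records, where AC is a single number. The helper names are invented for this example, and treating AC/AN as the population frequency is a simplification.

```python
def parse_info(info):
    """'AC=9;AN=7424' -> {'AC': '9', 'AN': '7424'}; flags without '=' map to True."""
    result = {}
    for item in info.split(";"):
        key, separator, value = item.partition("=")
        result[key] = value if separator else True
    return result


def parse_sample(format_column, sample_column):
    """Pair FORMAT keys with one sample's values, e.g. 'GT:DP:GQ' and '0/1:4:12'."""
    return dict(zip(format_column.split(":"), sample_column.split(":")))


def allele_frequency(info):
    """AC / AN for a biallelic record."""
    fields = parse_info(info)
    return int(fields["AC"]) / int(fields["AN"])


print(parse_sample("GT:DP:GQ", "0/1:4:12"))
for info in ("AC=9;AN=7424", "AC=6;AN=7446", "AC=5;AN=7506", "AC=2;AN=7542"):
    frequency = allele_frequency(info)
    print(info, f"{frequency:.5f}", "at least 1%" if frequency >= 0.01 else "below 1%")
```

All four example records come out far below 1%, so under this simplified reading they would be described as rare variants rather than polymorphisms.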
[Figure: NCBI Variation Viewer for NC_000008.11 around positions 119,953,260 to 119,953,360, showing the reference sequence, the gene annotation (NCBI Homo sapiens Annotation Release 109.20210226) and dbSNP b154 v2 tracks: Live RefSNPs (including rs247, rs248 and rs343), Clinical, Cited Variations, 1000 Genomes Phase 3, Splice Donor Region Variations, Splice Acceptor Region Variations, Missense Variations and Frameshift Variations.]

Command line introduction
Type cmd and press Enter
• dir (+ Enter)
• cd
• cd..
• G:
• del
• mkdir
• rmdir
• ipconfig
• netstat
• ping
• systeminfo
• cls
• up and down arrow keys
• Tab key
• F7 key
• color 0A
• title
• select text + right-click

See also: "Is there a Windows command line with smart bash-like autocompletions / command history?" (Super User)

[Figure: Command Prompt window, Microsoft Windows [Version 10.0.19042.928], prompt C:\Users\21286>.]

Dot plot (dotter)

Dotplot
[Figure: dot plot with Sequence A (AAATCGGCTAGCCGCAATGCCCAGTAAAGCAGCTTAAGAAA) on one axis and Sequence B on the other; dots mark matching positions.]

Dotter
[Figure: Google search results for "jdotter".]
• Viral Bioinformatics Resource Center (4virology.net): Java Dot Plot Alignments (JDotter) is a platform-independent Java interactive interface for the Linux version of Dotter, a widely used program for generating dotplots.
• IPK Gatersleben: JDotter: Java Dot Plot Alignments.
• PubMed: "JDotter: a Java interface to multiple dotplots generated by ..." (R. Brodie, 2004).
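The idea behind a dot plot fits in a few lines: put one sequence on each axis and mark every pair of positions where the two sequences agree. The sketch below uses exact matching of short words, which is a simplification; Dotter itself scores sliding windows instead. The function names and the word length of 4 are assumptions made for illustration.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return all (i, j) where seq_a[i:i+window] equals seq_b[j:j+window]."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    positions = {}
    for j in range(len(seq_b) - window + 1):
        positions.setdefault(seq_b[j:j + window], []).append(j)
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in positions.get(seq_a[i:i + window], []):
            dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


# Sequence A from the slide, compared with itself.
sequence_a = "AAATCGGCTAGCCGCAATGCCCAGTAAAGCAGCTTAAGAAA"
dots = dot_plot(sequence_a, sequence_a, window=4)
print(render(sequence_a, sequence_a, dots))
```

Comparing a sequence with itself always yields the main diagonal; dots off the diagonal point to repeated words. Increasing `window` removes random single-base matches and makes longer repeats easier to see.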
BLAST

[Figure: NCBI BLAST, blastn suite (tabs blastn, blastp, blastx, tblastn, tblastx), form "Standard Nucleotide BLAST". Enter Query Sequence: accession number(s), gi(s) or FASTA sequence(s), optional query subrange, or upload a file; job title; option to align two or more sequences. Choose Search Set: standard databases (nr etc.), rRNA/ITS databases, genomic + transcript databases, Betacoronavirus; selected database Nucleotide collection (nr/nt); optional organism filter.]

BLASTN programs search nucleotide databases using a nucleotide query.

Virtualization - the way to Linux (on Windows)

WSL - Windows Subsystem for Linux

Setting up Windows Subsystem for Linux (WSL1)
[Figure: Windows System Information, System Summary. OS Name: Microsoft Windows 10 Pro; Version: 10.0.19042 Build 19042; System Type: x64-based PC; Processor: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz; BIOS Mode: Legacy.]

Install the application (virtual system)
[Figure: Microsoft Store search results for "linux": Ubuntu, Ubuntu 20.04 LTS, Ubuntu 18.04 LTS, Kali Linux, SUSE Linux Enterprise and others, all free.]

Ubuntu 20.04 LTS: download the virtual system, install it, then restart the whole computer.

MetaCentrum
[Figure: MetaCentrum registration page (metavo.metacentrum.cz), "Where are you coming from?". The service is intended only for academic staff, employees and students of research institutions in the Czech Republic. Personal data are verified through the Czech academic identity federation eduID.cz; the identity and the verified data are provided by the user's home organization. Most universities and academic institutions in the Czech Republic belong to the federation; if an institution is not on the list, CESNET verifies the academic affiliation. Choices: "I have an account at an organization that is part of eduID.cz" or "My organization is not in eduID.cz and I need to verify an alternative identity". Direct login links are offered for selected institutions, e.g. Masaryk University and Charles University.]
Západočeská univerzita v Plzni

Prenatal diagnostics

[Figure: karyotype image.]
• Classical karyotype (resolution ~5 Mb)
• Array-CGH (resolution ~0.1 Mb)
• NGS (resolution: single bases)

Turnaround time (TAT)
• QF-PCR: on the day of sampling
• Array-CGH (resolution ~0.1 Mb): 1-2 weeks
• NGS (resolution: single bases): 2-3 weeks

Cost
• QF-PCR: $
• Array-CGH (resolution ~0.1 Mb): $$
• NGS (resolution: single bases): $$$

"We have no idea where to look or what to look for"

NGS in prenatal diagnostics
• Gene panels for specific phenotype groups
• e.g. congenital heart defects, RASopathies, skeletal defects
• WES (or WGS) - in our laboratory the Heredity panel (clinical exome, 3332 genes)
• trend: WES + CNV (2-in-1)
• NIPT - non-invasive prenatal testing

RASopathies - Noonan syndrome
Noonan syndrome
Noonan-like (CBL syndrome)
Noonan-like syndrome with loose anagen hair
Costello syndrome
Hereditary gingival fibromatosis
Jaffe-Campanacci syndrome
Cardio-facio-cutaneous syndrome (CFC)
Legius syndrome (NF1-like)
LEOPARD syndrome (multiple lentigines)
Neurofibromatosis type 1
Neurofibromatosis type 2
Neurofibromatosis-Noonan syndrome (NFNS)
Watson syndrome

RASopathies - Noonan syndrome
RASopathy genes: PTPN11, SHOC2, SOS1, A2ML1, RAF1, LZTR1, RIT1, RASA2, BRAF, SOS2, KRAS, MAP2K2, NRAS, HRAS, MAP2K1, SPRED1, RRAS, NF1, CBL, NF2

Indications for prenatal diagnostics of RASopathies:
• increased nuchal translucency (NT) or cystic hygroma (karyotype and aCGH normal) in combination with:
» hydrops fetalis
» cardiac anomalies
» polyhydramnios and/or pleural effusion
» specific facial anomalies (hypertelorism and micrognathia)
» renal anomalies

RASopathies - Noonan syndrome
Clinical description
mother, 32 years old
Ultrasound:
• gestational week (gw) 13/14/16/19: NT 2.3/3.1/1.8 mm
• gw 14/16: jugular lymphatic sacs
• gw 19: duplex kidney anomaly and right ventricular hypertrophy

Noonan syndrome
IVF
13 gw - NT 6 mm
17 gw - total fetal hydrops with subcutaneous infiltration, ascites
Mutation in RIT1: c.319A>G, p.Met107Val, de novo
RIT1: ~5 % of cases (Aoki et al., 2016), causal gene since 2013

14 gw - NT 8 mm, lymphatic sacs, hygroma colli, renal pelvis bilaterally 2.5 mm, suspected CHD
16 gw - NT 5 mm, lymphatic sacs, agenesis of the ductus venosus
19 gw - NT 4 mm, dilatation of renal pelvis 8 mm, hypertelorism, low-set earlobes, CHD
RAF1: c.770C>T, de novo (several publications)

RASopathies - Noonan syndrome
Molecular diagnostics:
• QF-PCR normal (13, 18, 21, X and Y)
• Karyotype normal
• array-CGH - no pathogenic variant
• RASopathy NGS panel

[Figure: Sanger chromatogram and IGV read pile-up in PTPN11 at chr12:112,910,840-112,910,860 bp. Total read count 194; the two alleles are supported by 99 reads (51 %) and 95 reads (49 %), which is consistent with a heterozygous variant.]

Causal mutation: PTPN11: c.853T>C (p.Phe285Leu) - de novo

Cardio panel
230 genes; CHD ~90 %
Groups: aortopathies, congenital heart disease (CHD), cardiomyopathies, RASopathies, arrhythmias

Etiology of CHD

WES - whole-exome sequencing
Milroy syndrome
Mutation in the FLT4 gene: c.3075G>A, p.(Met1025Ile)
FLT4 encodes vascular endothelial growth factor receptor 3 (VEGFR-3), which regulates the development of lymphatic system structures

KARYOTYPE/QF-PCR - diagnostic yield
• Down syndrome
• Patau syndrome
• Edwards syndrome
• Turner syndrome
• Triploidy
• other aneuploidies
• unbalanced rearrangements (5-10 Mb)

KARYOTYPE/QF-PCR + array-CGH - diagnostic yield
Everything detected by karyotype/QF-PCR, plus:
• microdeletions and microduplications (10 kb)
• UPD, AOH

KARYOTYPE/QF-PCR + array-CGH + WES - diagnostic yield
Everything detected by karyotype/QF-PCR and array-CGH, plus:
• monogenic diseases

Interpretation of SNP variants
ACMG Standards and Guidelines, Genetics in Medicine (American College of Medical Genetics and Genomics):
Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology
Sue Richards, Nazneen Aziz, Sherri Bale, David Bick, Soma Das, Julie Gastier-Foster, Wayne W. Grody, Madhuri Hegde, Elaine Lyon, Elaine Spector, Karl Voelkerding and Heidi L. Rehm; on behalf of the ACMG Laboratory Quality Assurance Committee

Databases
• use the current version! (hg38 / GRCh38)
• UCSC, NCBI, Ensembl - Genome Browser
• LOVD
• HGMD
• ClinVar
• OMIM + dbSNP
• UniProt

[Figure: timeline spanning 1966-2015; one label reads "TRISOMY".]

Other sources of fetal material for prenatal diagnostics
Cell-free fetal DNA in the maternal circulation
• Detectable from the 5th week of gestation
• Disappears within 30-60 minutes after delivery
• 3-25 % of total circulating DNA
• Originates from the trophoblast

Cell-free fetal DNA circulating in the mother's blood
• sex determination
» for X-linked disorders, e.g. hemophilia, DMD
» when the genitalia are unclear on ultrasound
• fetal RhD genotyping
• paternity testing
• monogenic disorders
» AD: achondroplasia, thanatophoric dysplasia, Huntington's chorea, myotonic dystrophy
» AR: beta-thalassemia, cystic fibrosis, congenital adrenal hyperplasia
» X-linked: hemophilia, retinitis pigmentosa
• NIPT for aneuploidies

• If above the limit -> amplification of 23+1 STR markers

Forensic DNA analysis - amplification
• Analysis of 23+1 STR markers
• Rape cases - exclusion method using the Y chromosome
• mtDNA - exclusion method in the maternal line
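The exclusion idea behind the STR analysis above can be illustrated with a minimal sketch. It is not the method used by any forensic laboratory or software: it assumes clean single-source profiles (no mixtures, no allele drop-out) and applies no statistical weighting. The function names and all allele values are hypothetical; only the locus names are taken from the slides.

```python
from typing import Dict, List, Tuple

# A profile maps an STR locus name to its two alleles (repeat counts).
# Floats are used because microvariant alleles such as TH01 9.3 exist.
Profile = Dict[str, Tuple[float, float]]


def mismatching_loci(evidence: Profile, reference: Profile) -> List[str]:
    """List loci typed in both profiles whose genotypes differ."""
    shared = sorted(set(evidence) & set(reference))
    return [
        locus
        for locus in shared
        # Allele order within a genotype carries no meaning, so sort first.
        if sorted(evidence[locus]) != sorted(reference[locus])
    ]


def is_excluded(evidence: Profile, reference: Profile) -> bool:
    """Simplified rule: any differing shared locus excludes the person."""
    return bool(mismatching_loci(evidence, reference))


# Hypothetical allele values, for illustration only.
evidence = {"TH01": (6, 9.3), "vWA": (16, 18), "D18S51": (12, 15), "TPOX": (8, 8)}
suspect_a = {"TH01": (9.3, 6), "vWA": (18, 16), "D18S51": (15, 12), "TPOX": (8, 8)}
suspect_b = {"TH01": (7, 9.3), "vWA": (16, 18), "D18S51": (12, 15), "TPOX": (8, 11)}

print(mismatching_loci(evidence, suspect_a))  # []
print(mismatching_loci(evidence, suspect_b))  # ['TH01', 'TPOX']
```

In this toy comparison, suspect_b would be excluded because two loci differ, while suspect_a cannot be excluded. A non-exclusion is not an identification; assessing its weight requires population allele frequencies, which this sketch deliberately leaves out.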
[Figure: STR electropherogram; loci shown include D19S433, vWA, TPOX and D18S51; horizontal axis ticks from 70 to 420.]

Forensic DNA analysis - evaluation
• GeneMapper
• CODIS - Combined DNA Index System (FBI)
• DNA database of the Police of the Czech Republic (PČR) (names are kept independently of CODIS)

[Screenshot: Bureau of Justice Statistics (BJS) glossary entry for CODIS.]
"CODIS is an acronym for Combined DNA Index System, which is a computer software program that operates local, state, and national databases of DNA profiles from convicted offenders, unsolved crime scene evidence, and missing persons."

[Figure: "13 CODIS Core STR Loci with Chromosomal Positions"; labels in the figure include TH01, vWA, D5S818, FGA, CSF1PO, D8S1179, D7S820, D13S317, D16S539, D18S51, D21S11 and AMEL (X and Y).]

Conclusion:
• We introduced methods for analyzing sequences (e.g. genomic sequences) in basic and applied research.
• We showed where to obtain genomic sequences (by sequencing or by downloading them from databases) and how to work with large volumes of data files.
• Databases covered: GenBank, SRA, and an overview of several others.
• We explained the function and structure of the formats in which sequencing data are stored and processed.
• We covered the software Geneious, FASTQC, JDotter and RepeatExplorer, and gave an introduction to working in Linux and on the command line.
• We briefly touched on forensic DNA analysis and sequencing-based prenatal diagnostics.