From discovery to technology explosion
• 1868: Discovery of DNA
• 1953: Watson and Crick propose double helix structure
• 1977: Sanger sequencing
• 1985: PCR
• 2000: Working draft of the human genome announced (Sanger method)
• 2005: 454 sequencer launched (pyrosequencing)
• 2006: Genome Analyzer launched (Solexa sequencing)
• 2007: SOLiD launched (ligation sequencing)
• 2009: Whole human genome no longer merits a Nature/Science paper
• 2010: “third-gen” systems
[Figure: cost of a human genome: $3 billion, $2-3 million, $250k, $50k, $20k, <$1k]

Oxford Nanopore
• Sensor array chip: many nanopores in parallel
• Adaptable protein nanopore: application-specific
• Electronic read-out system: generic platform
• Analytes: DNA sequencing, proteins, polymers, small molecules

• Mechanical damage during tissue homogenization.
• Wrong pH and ionic strength of extraction buffer.
• Incomplete removal of nucleases / contamination with nucleases.
• Phenol: too old, or not buffered to pH 7.8 – 8.0; incomplete removal.
• Wrong pH of DNA solvent (acidic water). Recommended: 1:10 TE for short-term storage, or 1x TE for long-term storage.
• Vigorous pipetting (use wide-bore pipette tips).
• Vortexing of DNA at high concentrations.
• Too many freeze-thaw cycles (we tested 5, still OK).
• Debatable: sequence-dependent DNA degradation

Two strategies
• Whole genome shotgun (bottom-up)
• Clone-by-clone (top-down)

Genome sequencing
• Rapid progress in next-generation sequencing technologies promises to provide complete (reference) DNA sequences
• The bottleneck:
– NOT the sequencing capacity
– BUT the ability to assemble many short reads, given the prevalence of repeated DNA (and polyploidy)

Sequencing without a limit?
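The assembly bottleneck named above (many short reads, much repeated DNA) can be illustrated with a toy sketch. It is not any assembler's actual algorithm. It assumes error-free reads that cover every position, so the k-mers seen in the reads are the k-mers of the genome itself. The function name `branching_nodes` and the toy sequences are made up for illustration. The function counts (k-1)-mers that are followed by more than one base, which are the points where a de Bruijn-style assembly cannot decide how to continue.

```python
from collections import defaultdict


def branching_nodes(sequence: str, k: int) -> int:
    """Count (k-1)-mers that are followed by more than one distinct base."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # node = first k-1 bases, edge label = last base
        successors[kmer[:-1]].add(kmer[-1])
    return sum(1 for bases in successors.values() if len(bases) > 1)


# Toy genome (hypothetical): two copies of a 10 bp repeat between unique flanks.
repeat = "ACGTACGGTC"
genome = "TTGCA" + repeat + "GGATC" + repeat + "CCTAG"

# k shorter than the repeat: the end of the repeat has two possible continuations.
short_k = branching_nodes(genome, 8)
# k longer than the repeat: every (k-1)-mer spans a unique flank, no ambiguity.
long_k = branching_nodes(genome, 12)

print("ambiguous nodes with k=8: ", short_k)
print("ambiguous nodes with k=12:", long_k)
```

In this toy case the ambiguity disappears once k exceeds the repeat length, which is the intuition behind using longer reads or long-range information for repeat-rich genomes.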
Genome sequencing
• GenBank, 1982: Los Alamos Sequence Database (Walter Goad)

Frederick Sanger
• 1958: Nobel Prize (insulin structure)
• 1975: Dideoxy sequencing method
• 1977: Φ-X174 sequence (5,386 bp)
• 1980: Second Nobel Prize
• λ phage sequence, shotgun method (48,502 bp)

Genome sequencing
• 1986: Leroy Hood: automated sequencing machine
• 1986: Human Genome Initiative

Genome sequencing
• 1995: John Craig Venter: first bacterial genome

Craig Venter
• Global Ocean Sampling Expedition
• Synthetic genomics
• Human Longevity Inc
• http://www.youtube.com/watch?v=J0rDFbrhjtI

Which applications are labs performing?

2010 Human genome reference

23andMe (30% GSK)
• Anne Wojcicki, CEO: wife of Google co-founder Sergey Mikhaylovich Brin

454 pyrosequencing (2005)
• http://www.454.com
• Genome Sequencer 20 System
• DNA library preparation:
– DNA fragmentation
– Adaptor ligation
– Capture of DNA molecules
– Denaturation
• emPCR:
– Emulsion formation (oil)
– emPCR
– Bead capture
– Denaturation
• Sequencing primer
• Dispersal onto the slide
• Microreactor parameters
• Sequencing

SOLiD (Sequencing by Oligonucleotide Ligation and Detection)
• 2-base encoding sequencing (2007)

Solexa (2007)

HELICOS (2008)
• True Single Molecule Sequencing (tSMS)

Pacific Biosciences
• Single Molecule Real-Time (SMRT)
• 20 zeptoliters

Ion Torrent

Oxford Nanopore

Other technologies
• Microelectrophoresis
• Microarray-based sequencing

CHALLENGES IN GENOME SEQUENCING
De novo genome assemblies using only short-read data from NGS technologies are generally incomplete and highly fragmented due to:
• Large duplications
• High proportion of repetitive DNA: chromosomal approach, BAC-by-BAC sequencing (a challenge!)
• Large genome size (~17 Gb)
• Polyploidy (3 subgenomes)

Chromosomal approach

BAC-BY-BAC SEQUENCING
BAC clones
• The physical map is composed of contigs of overlapping BAC clones
• BAC contigs are anchored to the chromosome through markers contained in the contigs

SOLUTIONS FOR THE REPEATS
• Long mate-pair reads > 10 kb
• Long-read technologies: PacBio, Oxford Nanopore
• Optical mapping

OPTICAL MAPPING
• Single-molecule mapping of genomic DNA hundreds of kilobases to several megabases in size
• Creates sequence-motif maps, which provide a long-range template for ordering genomic sequences
• Visualisation of reality: “Seeing is Believing”
• Three enzymatic approaches:
– restriction enzymes: sequence-specific cleavage of DNA immobilized on a surface
– nicking enzymes: fluorescent labelling of the nicking site in solution (BioNano Genomics, Irys)
– methyltransferase enzymes: labelling at ultra-high density

BIONANO GENOME MAPPING ON NANOCHANNEL ARRAYS
[Figure: workflow, Lam et al., Nat. Biotechnol. 30(8) 2012]
1. Sequence-specific labelling with a nickase (Nt.BspQI): nicking, strand displacement, incorporation of fluorescent nucleotides
2. DNA linearization
3. Fluorescence imaging
4. Map construction
5. Building consensus map
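The idea of a sequence-motif map can be sketched in silico: find every occurrence of the nickase recognition motif in a sequence and report the distances between neighbouring labels. This is a minimal sketch only, not the BioNano software. It assumes the Nt.BspQI recognition sequence is GCTCTTC (check the enzyme datasheet), reports motif start positions rather than exact nick positions, and ignores the optical resolution limit. The helper names and the toy sequence are hypothetical.

```python
import re

# Assumed recognition sequence of Nt.BspQI; verify against the supplier's datasheet.
MOTIF = "GCTCTTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(seq: str, motif: str = MOTIF) -> list[int]:
    """Sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    hits = set()
    for pattern in (motif, reverse_complement(motif)):
        # A lookahead finds overlapping occurrences as well.
        hits.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return sorted(hits)


def label_spacings(positions: list[int]) -> list[int]:
    """Distances between neighbouring labels: the pattern an optical map records."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence (hypothetical): one site on the top strand, one on the bottom strand.
toy = "A" * 10 + "GCTCTTC" + "T" * 20 + "GAAGAGC" + "C" * 5
sites = label_positions(toy)
print("label positions:", sites)
print("label spacings: ", label_spacings(sites))
```

Comparing such predicted spacing patterns from sequence contigs with the spacings measured on imaged molecules is what lets a map act as a long-range template for ordering contigs.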