CG920 Genomics
Lesson 8: Next Generation Sequencing
Roman Hobza
Institute of Biophysics of the Czech Academy of Sciences
hobza@ibp.cz

From discovery to technology explosion
• 1868: Discovery of DNA
• 1953: Watson and Crick propose the double helix structure
• 1977: Sanger sequencing
• 1985: PCR
• 2000: Working draft of the human genome announced (Sanger method)
• 2005: 454 sequencer launched (pyrosequencing)
• 2006: Genome Analyzer launched (Solexa sequencing)
• 2007: SOLiD launched (ligation sequencing)
• 2009: A whole human genome no longer merits a Nature/Science paper
• 2010: "Third-generation" systems
Cost of a human genome: $3 billion, then $2-3 million, $250k, $50k, $20k, and <$1k

Oxford Nanopore
• Application specific: adaptable protein nanopore (DNA sequencing, proteins, polymers, small molecules)
• Generic platform: sensor array chip with many nanopores in parallel; electronic read-out system

• Mechanical damage during tissue homogenization.
• Wrong pH and ionic strength of the extraction buffer.
• Incomplete removal of, or contamination with, nucleases.
• Phenol: too old, or inappropriately buffered (it should be at pH 7.8-8.0); incomplete removal.
• Wrong pH of the DNA solvent (acidic water). Recommended: 1:10 TE for short-term storage, or 1x TE for long-term storage.
• Vigorous pipetting (use wide-bore pipette tips).
• Vortexing of DNA at high concentrations.
• Too many freeze-thaw cycles (we tested 5, still OK).
• Debatable: sequence-dependent DNA degradation.

Genome sequencing
Two strategies
• Whole genome shotgun (bottom-up)
• Clone-by-clone (top-down)

Sequencing without a limit?
• Rapid progress in next generation sequencing technologies promises to deliver complete (reference) DNA sequences
• The bottleneck:
  – NOT the sequencing capacity
  – BUT the ability to assemble many short reads when repeated DNA is prevalent (and the genome is polyploid)
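The assembly bottleneck named above can be illustrated with a toy example. The sketch below is not any particular assembler: it assumes error-free reads that cover every position, uses a simple de Bruijn graph (a technique not named on the slides), and all names (`shred`, `de_bruijn`, `branching_nodes`, `GENOME`) are hypothetical. It reports graph nodes with more than one successor, which are the places where a contig cannot be extended unambiguously from local information.

```python
from collections import defaultdict

# Toy genome (made up for illustration): a 7 bp repeat between unique flanks.
REPEAT = "GATTACA"
GENOME = "CCGT" + REPEAT + "GGTC" + REPEAT + "TCCA"


def shred(genome, read_len):
    """Return every error-free read of length read_len (idealised coverage)."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with several successors: a contig cannot be extended unambiguously here."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


if __name__ == "__main__":
    for read_len in (6, 10):
        # For simplicity, k is set equal to the read length.
        graph = de_bruijn(shred(GENOME, read_len), k=read_len)
        print(read_len, branching_nodes(graph))
```

With 6 bp reads, shorter than the repeat, the graph branches at the end of the repeat (`TTACA`). With 10 bp reads, longer than the repeat, no node branches, which is why longer reads help with repeated DNA.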
Genome sequencing
• GenBank (1982)
• Los Alamos Sequence Database, Walter Goad

Frederick Sanger
• 1958: Nobel Prize (insulin structure)
• 1975: plus and minus sequencing method (precursor of the dideoxy method, 1977)
• 1977: ΦX174 sequence (5,386 bp)
• 1980: second Nobel Prize
• λ phage sequence by the shotgun method (48,502 bp)

Genome sequencing
• 1986: Leroy Hood, automatic sequencing machine
• 1986: Human Genome Initiative

Genome sequencing
• 1995: John Craig Venter, first bacterial genome

Craig Venter
• Global Ocean Sampling Expedition
• Synthetic genomics
• Human Longevity Inc
http://www.youtube.com/watch?v=J0rDFbrhjtI

Which applications are labs performing?

2010: Human genome reference

Anne Wojcicki, CEO of 23andMe (30% GSK); wife of Google co-founder Sergey Mikhaylovich Brin

454 pyrosequencing (2005)
Genome Sequencer 20 System, http://www.454.com
• DNA library preparation
• DNA fragmentation
• Adapter ligation
• Capture of DNA molecules
• Denaturation
• emPCR: emulsion formation (oil)
• Bead capture
• Denaturation
• Sequencing primer
• Dispersal onto the slide
• Microreactor parameters
• Sequencing

SOLiD (Sequencing by Oligonucleotide Ligation and Detection)
2-base encoding sequencing (2007)

Solexa (2007)

Helicos (2008)
True Single Molecule Sequencing (tSMS)

Pacific Biosciences
Single Molecule Real-Time (SMRT) sequencing; observation volume of 20 zeptoliters

Ion Torrent

Oxford Nanopore

Other technologies
• Microelectrophoresis
• Microarray-based sequencing

CHALLENGES IN GENOME SEQUENCING
De novo genome assemblies built only from short-read NGS data are generally incomplete and highly fragmented due to
• Large duplications
• High proportion of repetitive DNA
• Large genome size (~17 Gb)
• Polyploidy (3 subgenomes)
Chromosomal approach, BAC-by-BAC sequencing: a challenge!

BAC-BY-BAC SEQUENCING
• The physical map is composed of contigs of overlapping BAC clones
• BAC contigs are anchored to the chromosome through markers contained in the contigs

SOLUTIONS FOR THE REPEATS
• Long mate-pair reads > 10 kb
• Long-read technologies: PacBio, Oxford Nanopore
• Optical mapping
  – Single-molecule mapping of genomic DNA hundreds of kilobases to several megabases in size
  – Creates sequence-motif maps, which provide a long-range template for ordering genomic sequences
  – Visualisation of reality: "Seeing is Believing"

OPTICAL MAPPING
Three enzymatic approaches
• Restriction enzymes: sequence-specific cleavage of DNA immobilized on a surface
• Nicking enzymes: fluorescent labelling of the nicking site in solution (BioNano Genomics, Irys)
• Methyltransferase enzymes: labelling at ultra-high density
Nick labelling steps: nicking, strand displacement, incorporation of fluorescent nucleotides

BIONANO GENOME MAPPING ON NANOCHANNEL ARRAYS
1 Sequence-specific labeling (nickase Nt.BspQI)
2 DNA linearization
3 Fluorescence imaging
4 Map construction
5 Building consensus map
Lam et al., Nat. Biotechnol. 30(8), 2012

Fluorescent dye-conjugated nucleotides (Alexa 546 dUTP) were incorporated at the Nt.BspQI sites by Vent (exo−) polymerase. Next, we stained the labeled DNA molecules with the DNA-intercalating dye, YOYO-1, which facilitates visualization of the DNA molecule and measurement of its size. Then, we loaded the DNA onto a nanochannel array chip and applied an electric field, which gradually drives the long, coiled DNA molecules in free suspension through a series of micro- and nanofluidic structures. Once the nanochannels were populated by a set of linearized DNA molecules, we imaged them with automated high-resolution fluorescence microscopy. We determined the size of each DNA molecule by directly measuring its contour length.
The histogram peaks represent the location of each sequence motif along the molecules.

Discussion
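As a starting point for discussion, the sequence-specific labeling step can be mimicked in silico to see what a sequence-motif map looks like. This is a minimal sketch, not the BioNano software. It assumes the recognition motif of Nt.BspQI is GCTCTTC (taken from the enzyme's published specification, not from the slides), reports motif start positions rather than exact nick coordinates, and uses hypothetical function names and a made-up toy sequence.

```python
# Assumed recognition motif of the Nt.BspQI nickase (not stated on the slides).
NT_BSPQI_MOTIF = "GCTCTTC"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def label_sites(seq, motif=NT_BSPQI_MOTIF):
    """Start positions of the motif on either strand, in forward-strand coordinates."""
    seq = seq.upper()
    sites = set()
    for pattern in (motif, reverse_complement(motif)):
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def label_intervals(sites):
    """Distances between neighbouring labels: the pattern an optical map records."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    toy_sequence = "AAGCTCTTCAAAAGAAGAGCTT"  # made-up example
    sites = label_sites(toy_sequence)
    print(sites, label_intervals(sites))
```

Comparing such predicted label intervals from a sequence assembly with the intervals measured on imaged molecules is the idea behind using the map as a long-range template for ordering sequences.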