Nucleic acids

DNA determines the nature of the future organism
• The hereditary information in the egg cell determines the nature of the whole multicellular organism (a mouse egg gives rise to a mouse)
• Each species is different, and each reproduces itself faithfully, yielding progeny that belong to the same species; the parent organism hands down information specifying, in extraordinary detail, the characteristics that the offspring shall have
• Heredity is a central part of the definition of life: it distinguishes life from other processes (the growth of crystals, the burning of a candle, the formation of waves on water), in which orderly structures are generated but without the same link between the peculiarities of parents and the peculiarities of offspring
Figure: eggs of the seaweed Fucus, the mouse, and the sea urchin, and the organisms that develop from them.

• All cells store their hereditary information in the same linear chemical code (DNA)
• All cells replicate their hereditary information by templated polymerization
• All cells transcribe portions of their hereditary information into the same intermediary form (RNA)
• All cells use proteins as catalysts
• All cells translate RNA into protein in the same way
• In all cells: one gene = one protein

Central dogma of molecular biology
Figure: extended central dogma with enzymes (https://upload.wikimedia.org/wikipedia/commons/d/dd/Extended_Central_Dogma_with_Enzymes.jpg)

Basic principles of molecular biology
1. The information encoded within DNA, which directs the functioning of living cells and is transmitted to offspring, consists of a specific sequence of nitrogenous bases.
2. The physiological and genetic function of DNA requires the synthesis of relatively error-free copies. DNA synthesis involves the complementary pairing of nucleotide bases.
3. The mechanism by which genetic information is used to direct cellular processes involves the synthesis of another type of nucleic acid, ribonucleic acid (RNA).
4. RNA synthesis occurs through the complementary pairing of ribonucleotide bases with the bases in a DNA molecule.
5. Several types of RNA are responsible for the synthesis of the enzymes, structural proteins, and other polypeptides that are required for organismal function.
6. The “central dogma of molecular biology” describes the flow of genetic information from DNA through RNA and eventually to proteins.
• DNA synthesis = replication; DNA-dependent RNA synthesis = transcription; protein synthesis = translation

Proof of DNA as a carrier of genetic information
Indirect evidence:
- DNA is localized on chromosomes, whereas RNA and proteins are found mainly in the cytoplasm
- The amount of DNA in somatic cells correlates with the number of chromosomes; sex cells carry half that amount of DNA
- DNA is more stable than RNA or proteins
Direct evidence:
- Transformation in Streptococcus pneumoniae: change of virulence
- Analysis of Enterobacteria phage T2: only DNA, not protein, enters the bacterial cell
- RNA viruses: the carrier of genetic information is RNA (coronaviruses, HIV)

Transformation in Streptococcus pneumoniae (Griffith 1928) – “transforming factor”
• In 1928 the British scientist Frederick Griffith wanted to know how bacteria make people sick, especially with pneumonia
Figure: Griffith's experiment (https://byjus.com/biology/griffith-experiment-genetic-material/)

What substance is the transforming factor? (O. Avery, C. MacLeod, M. McCarty 1944)

1953: J. Watson, F. Crick, M. Wilkins, R.
Franklin (the 1962 Nobel Prize was awarded to Watson, Crick, and Wilkins)

Genetic information is encoded in DNA
The information used to construct the Watson-Crick model included the following:
• The chemical structures and molecular dimensions of deoxyribose, the nitrogenous bases, and phosphate
• The 1:1 ratios of adenine:thymine and guanine:cytosine in DNA isolated from a wide variety of species, investigated by Erwin Chargaff (Chargaff's rules)
• Superb X-ray diffraction studies performed by Rosalind Franklin, indicating that DNA is a symmetrical molecule and probably a helix
• The diameter and pitch of the helix, estimated by Maurice Wilkins and his colleague Alex Stokes from other X-ray diffraction studies

Deoxyribonucleic acid - structure
• The antiparallel orientation of the two polynucleotide strands allows the formation of hydrogen bonds between the nitrogenous bases, which are oriented toward the helix interior
• There are two types of base pair (bp) in DNA: A-T (adenine-thymine) and G-C (guanine-cytosine)
• Because each base pair is oriented at a right angle to the long axis of the helix, the overall structure of DNA resembles a twisted staircase

Nucleotides
Figure: structures of nucleotides and of the double helix.

The dimensions of crystalline DNA have been precisely measured
• One turn of the double helix spans 3.4 nm and consists of approximately 10.4 bp
• The diameter of the double helix is 2 nm; the interior of the double helix has sufficient space only for base pairing between a purine and a pyrimidine
• The distance between adjacent base pairs is 0.34 nm

DNA stability
• DNA is a relatively stable molecule
• Several types of noncovalent bonding contribute to this stability:
– Hydrophobic interactions between the stacked base pairs in the double helix play an important (but poorly understood) role in stabilizing DNA: π-π interactions
– The sugar-phosphate backbone is hydrophilic, and therefore DNA's external surface is solvated with water
– Hydrogen bonding between complementary
bases promotes stability and also provides a mechanism for accurate pairing between the bases

“Quaternary structure” of DNA – nucleosome, chromatin

Packaging of DNA
Figure: DNA packaging.

Analysis of DNA
In situ hybridization
• Rudkin GT, Stollar BD. High resolution detection of DNA-RNA hybrids in situ by indirect immunofluorescence. Nature. 1977 Feb 3;265(5593):472-3.
• 1970s: fluorescence in situ hybridization

DNA sequencing
• The process of determining the order of nucleotides in DNA
• Idea: to visualize the bases of DNA in a manner that allows them to be sorted and identified
• 1970s: two approaches, “chemical” and enzymatic (both originally relied on radioactive labelling)

Chemical sequencing (1976) A. Maxam, W. Gilbert
• Based on chemical modification of DNA and subsequent cleavage at specific bases
• The method requires radioactive labelling at one end
• Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T)
• A series of labelled fragments is generated, each running from the radiolabelled end to the first “cut” site in a molecule
• The fragments from the four reactions are run side by side in gel electrophoresis for size separation
• The gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred

Sanger (dideoxy termination) sequencing
• 1977: F. Sanger and colleagues, Cambridge
• Based on the random incorporation of chain-terminating dideoxynucleotides (ddNTPs) by DNA polymerase during in vitro DNA replication
• Enzymatic (DNA polymerase), cost-effective, less manual work, and amenable to automation
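The dideoxy termination principle described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a description of any real instrument or software: the primer is ignored, the template is written 3'→5' so the new strand reads 5'→3', and every possible termination point is enumerated once instead of being sampled at random. The function names (`dideoxy_lanes`, `read_gel`) and the example template are hypothetical.

```python
# Toy model of Sanger (dideoxy termination) sequencing.
# Assumptions: no primer, template written 3'->5', every termination
# point occurs exactly once. All names here are illustrative only.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Return the strand that DNA polymerase would build on the template."""
    return "".join(COMPLEMENT[base] for base in template)


def dideoxy_lanes(template: str) -> dict[str, list[int]]:
    """For each ddNTP reaction (A, C, G, T), list the fragment lengths.

    A fragment of length n appears in lane X when the n-th base of the
    newly synthesized strand is X, because ddX terminates the chain there.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest to the longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGATC"            # hypothetical template, 3'->5'
lanes = dideoxy_lanes(template)  # one list of band positions per lane
read = read_gel(lanes)           # sequence of the synthesized strand
print(lanes)
print(read)
```

Reading the bands in order of increasing fragment length across the four lanes recovers the synthesized strand, which is the complement of the template; this is the same logic used to read an autoradiograph by eye.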
Figure: Sanger's method of gene sequencing (https://www.onlinebiologynotes.com/sangers-method-gene-sequencing/)
Figure (https://agctsequencing.files.wordpress.com/2012/08/figure1.jpg): Left: X-ray film showing the columns and bands for the four nucleotides. Right: the bands and how they can be used to identify the order of the nucleotides.
Figure: the direct blotting electrophoresis system GATC1500 (1984) (https://the-dna-universe.com/wp-content/uploads/2020/11/Direct_Blotting_Electrophoresis_System_GATC _1500-1.jpg)
Figure: Applied Biosystems 370A Prototype Automated DNA Gene Sequencer (Hood, Hunkapiller 1987) (https://coimages.sciencemuseumgroup.org.uk/images/675/medium_1989_1242__0001_.jpg)
Figure: ABI 3730xl DNA Analyzer (384-well plate, 2010)
Figure: Applied Biosystems SeqStudio Genetic Analyzer (Thermo Fisher Scientific)

PCR – polymerase chain reaction
• A method widely used to rapidly make millions to billions of copies (complete or partial) of a specific DNA sample
• Discovery of DNA polymerase (1957, Kornberg; mechanism of DNA replication)
• Development of synthetic DNA oligonucleotides (early 1960s, Khorana, while studying the genetic code)
• The thermophilic bacterium Thermus aquaticus, the source of thermostable DNA polymerase, was isolated (1969, Brock)
• PCR uses short synthetic DNA fragments called primers to select the segment of the genome to be amplified, followed by multiple rounds of DNA synthesis to amplify that segment
• Developed at Cetus Corporation by Kary B.
Mullis in 1983 (Nobel Prize 1993)
Figure: the process of the polymerase chain reaction (https://cdn.britannica.com/77/22477-050-16EFB7B3/process-polymerase-chain-reaction.jpg)

Next generation sequencing (NGS)
• Massively parallel (shotgun) sequencing
• Used for the analysis of a large number of genes in one experiment
• Favourable cost per base
• Relies on computer processing of data and on bioinformatics

NGS workflow
NGS technologies
• ABI SOLiD (Life)
• 454 Life Sciences (Roche Inc.)
• Semiconductor sequencing (Life)
• Sequencing by synthesis (SBS), Illumina Inc.
Figure: NGS technologies (https://cdn.technologynetworks.com/tn/images/body/ngsseo31615391054366.png)

NGS technologies – sequencing capacity
Figure: The evolution of sequencing methodologies. The timeline (x-axis) indicates the introduction of the first, second, and third generation sequencing technologies against the number of kilobases of DNA that could be sequenced per day per machine (y-axis).

Nowadays: NovaSeq 6000 (Illumina)
• The most powerful high-throughput Illumina sequencing system to date
• 48 human genomes in 2 days
• Highly flexible and robust (2 flow cell ports = cost-effective)
• WGS, WES, panels, RNA-seq…

3rd generation of NGS: “single molecule NGS”
• SMRT (Single Molecule, Real-Time Sequencing), PacBio
• Oxford Nanopore

History of whole genome sequencing
Figure: https://ars.els-cdn.com/content/image/1-s2.0-S2001037019303277-gr1.jpg (Gianni et al., Computational and Structural Biotechnology Journal, 2019)
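To place the NovaSeq 6000 figure quoted above (48 human genomes in 2 days) on the y-axis of the sequencing timeline (kilobases per day per machine), a rough back-of-the-envelope calculation can help. The sketch below is only an estimate: the genome size of about 3.2 billion base pairs is the value given in the HGP results of these notes, while the 30-fold sequencing coverage per genome is an assumption that the notes do not state.

```python
# Rough throughput estimate for "48 human genomes in 2 days".
# ASSUMED_COVERAGE is an assumption (not given in the notes); the other
# values are taken from the notes.

GENOME_SIZE_BP = 3.2e9     # haploid human genome, base pairs
GENOMES_PER_RUN = 48       # genomes per run
RUN_DAYS = 2               # duration of the run in days
ASSUMED_COVERAGE = 30      # assumed fold coverage per genome


def kilobases_per_day(genomes: int, genome_size_bp: float,
                      coverage: float, days: float) -> float:
    """Return sequenced kilobases per day for one machine."""
    total_bases = genomes * genome_size_bp * coverage
    return total_bases / 1_000 / days


kb_per_day = kilobases_per_day(
    GENOMES_PER_RUN, GENOME_SIZE_BP, ASSUMED_COVERAGE, RUN_DAYS
)
print(f"{kb_per_day:.3g} kilobases per day per machine")
```

Under these assumptions the result is on the order of 10^9 kilobases per day; a different coverage assumption scales the estimate proportionally.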
Human Genome Project (HGP) 1990-2003
Figure: timeline of the HGP (https://www.mun.ca/biology/scarr/timelineHGP_image2.jpg)

Human Genome Project - origin
Figure: E. Southern and G. Church.

HGP - background
• 1984 – 1986: The U.S. Department of Energy (DOE) and the International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) initiate early meetings to assess the feasibility of a Human Genome Project
• 1988
– The National Institutes of Health (NIH) assembles scientists, administrators, and science policy experts to plan for a possible Human Genome Project
– Two published reports recommend creating an effort to sequence the human genome (National Research Council; the U.S. Congress Office of Technology Assessment)
• 1989: The National Center for Human Genome Research (NCHGR) is established to carry out the United States Human Genome Project. The center's first director is James D. Watson
• 1990: The Human Genome Project begins with an initial five-year plan
– NIH allocates the first funds to research grants aimed at developing the scientific approaches, technologies, and resources needed to map and sequence the human genome
– Expected cost: 3 billion dollars over 15 years

The National Center for Human Genome Research (NCHGR) – sequencing centers
1. The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
4. United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex., U.S.
6. RIKEN Genomic Sciences Center, Yokohama, Japan
7. Genoscope and CNRS UMR-8030, Evry, France
8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
12. Stanford Genome Technology Center, Stanford, Calif., U.S.
13. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.
14. University of Washington Genome Center, Seattle, Wash., U.S.
15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
16. University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla., U.S.
18. Max Planck Institute for Molecular Genetics, Berlin, Germany
19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S.
20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany

NCHGR - hierarchical shotgun method
Figure (https://www.nature.com/scitable/content/21124/v409_p861_409860a0-f2_large_2.jpg): A fingerprint clone contig of overlapping clones was assembled using a computer program and is shown at the top of the diagram. The fingerprint clone contig is depicted as several short, horizontal line segments arranged in parallel. The segments are black, blue, or red. Blue and red segments represent clones with minimal overlap, which are selected for sequencing. Once these overlapping clones have been sequenced, the set is called a sequenced-clone contig. In a closer look at a portion of the sequenced-clone contig, two overlapping individual clones, labeled A and B, are sequenced to at least draft coverage.
Sequenced clone A is depicted as red dashed line segments and sequenced clone B as blue dashed line segments. The data from these sequenced clones are merged to form a merged sequence contig, depicted as a purple line segment. By ordering and orienting the data with mRNA, paired end reads, and other information, the sequences are linked to form a sequence-contig scaffold.

HGP – major achievements
• 1995: complete physical map of the human genome (= the physical locations of identifiable landmarks on chromosomes; the backbone for genome assembly)
• 1996: The Bermuda principles
• “All human genomic sequence information should be made freely available and placed in the public domain within 24 hours of being generated by federally funded large-scale human sequencing centers”
1. Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
2. Immediate publication of finished annotated sequences.
3. Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.

HGP - The Celera story
• Celera was established in May 1998 by Dr. J. Craig Venter (formerly of the HGP) from The Institute for Genomic Research (TIGR)
• TIGR: “shotgun” sequencing of the genome of H. influenzae
• Human genome: $300 million of private funding for 3 years (whole genome in 2001)
• Celera intended to sell subscriptions to its database, release data quarterly, and obtain patents on genes and related technologies.
• Political pressure led to a joint effort; the drafts of the human genome were published at the same time in 2001

HGP - The Celera story
• Whole genome shotgun sequencing
• Two independent data sets together with two distinct computational approaches
• Celera: 27.27 million DNA sequence reads, each with an average length of 543 base pairs, derived from five different individuals
• Bactigs: DNA sequence from the HGP (GenBank), 16.05 million sequence reads
• Whole genome assembly from 43.32 million sequence reads
Figure: a diagram of ovals and arrows outlines the steps in Celera's two-pronged assembly strategy to determine the sequence of the human genome. Ovals represent computational processes, and arrows between ovals represent the direction and sequence of the steps in the assembly process.

Whole genome assembly
• In whole-genome assembly, the BAC fragments (red line segments) and the reads from five individuals (black line segments) are combined to produce a contig and a consensus sequence (green line). The contigs are connected into scaffolds, shown in red, by pairing end sequences, which are also called mates. If there is a gap between consecutive contigs, it has a known size. Next, the scaffolds are mapped to the genome (gray line) using sequence tagged site (STS) information, represented by blue stars (Venter, C. et al. The sequence of the human genome. Science 291, 2001)
Figure: the Human Genome Project, NCHGR and Celera.

HGP – major achievements
• 1999: first complete sequence of a human chromosome obtained (chromosome 22)
• On Feb.
12, 2001, the HGP and Celera announced the publication of a working draft of the sequence of the human genome, the genetic blueprint for a human being (HGP in Nature; Celera in Science)
• President Bill Clinton had held a ceremony at the White House on June 26, 2000, to announce the working draft
Figure: journal covers, Nature (Volume 409, Issue 6822, 15 February 2001) and Science (16 February 2001).
• On April 14, 2003, the IHGSC announced the successful completion of the Human Genome Project
• The project finished more than two years ahead of schedule, with a total budget of $2.7 billion
Figure: President George W. Bush awards the Presidential Medal of Freedom to physician Francis S. Collins, director of the National Human Genome Research Institute, in the East Room, Nov. 5, 2007. White House photo by Eric Draper.

HGP - results
• The human genome contains roughly 3.2 billion base pairs
• Only 2 to 3% of the genome encodes proteins; the remaining roughly 97% is non-coding (historically called “junk” DNA), much of it repetitive
• There are around 25,000 to 30,000 protein-coding genes (later estimates are closer to 20,000)
• The average human gene consists of 3,000 nucleotide bases, but sizes vary greatly (the largest gene is dystrophin, 2.4 Mb in size)
• Gene-rich areas of the genome are predominantly made up of G and C bases, whereas gene-poor regions are mainly composed of A and T bases
• Chromosome 1 has the most genes (2968), whereas the Y chromosome has the fewest (231)
• The order of 99.9% of nucleotide bases is exactly the same in all people
• About 1.4 million SNPs were known in the human genome
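The percentages in the results above become more tangible when converted to absolute numbers of bases. The following short sketch performs that arithmetic using only figures quoted in these notes (3.2 billion base pairs, 99.9% identity between people, 2 to 3% protein-coding, 1.4 million SNPs); the variable names are illustrative, and the outputs are order-of-magnitude estimates, not measured values.

```python
# Order-of-magnitude arithmetic on the HGP summary figures quoted above.
# All inputs come from the notes; variable names are illustrative only.

GENOME_SIZE_BP = 3_200_000_000        # roughly 3.2 billion base pairs
IDENTICAL_FRACTION = 0.999            # 99.9% of bases identical in all people
CODING_FRACTION_RANGE = (0.02, 0.03)  # 2 to 3% of the genome encodes proteins
KNOWN_SNPS = 1_400_000                # SNPs known at the time

# Positions at which two people may differ (the remaining 0.1%).
differing_bases = round(GENOME_SIZE_BP * (1 - IDENTICAL_FRACTION))

# Size of the protein-coding portion, lower and upper estimate.
coding_bases = tuple(round(GENOME_SIZE_BP * f) for f in CODING_FRACTION_RANGE)

# Average spacing between known SNPs along the genome.
bp_per_snp = GENOME_SIZE_BP / KNOWN_SNPS

print(f"Differing positions: about {differing_bases:,}")
print(f"Protein-coding bases: {coding_bases[0]:,} to {coding_bases[1]:,}")
print(f"One known SNP per roughly {bp_per_snp:,.0f} bp")
```

So a 0.1% difference still corresponds to about 3.2 million positions, and the protein-coding part amounts to several tens of millions of bases.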
HGP - outcomes
Area | Goal | Achieved | Date
Genetic Map | 2- to 5-cM resolution map (600 - 1,500 markers) | 1-cM resolution map (3,000 markers) | September 1994
Physical Map | 30,000 STSs | 52,000 STSs | October 1998
DNA Sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 99% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003
Capacity and Cost of Finished Sequence | Sequence 500 Mb/year at < $0.25 per finished base | Sequence > 1,400 Mb/year at < $0.09 per finished base | November 2002
Human Sequence Variation | 100,000 mapped human SNPs | 3.7 million mapped human SNPs | February 2003
Gene Identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003
Model Organisms | Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster | Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat | April 2003
Functional Analysis | Develop genomic-scale technologies | High-throughput oligonucleotide synthesis; DNA microarrays; eukaryotic whole-genome knockouts (yeast); scale-up of two-hybrid system for protein-protein interaction | 1994; 1996; 1999; 200

HGP - outcomes
• Ethical, legal and social implications (ELSI)
– 5% of the annual budget of the NHGRI was dedicated to developing ELSI
– Genes cannot be patented
– Informed consent should be guaranteed to those who have a genetic test
– Issues related to the privacy and confidentiality of a person's genetic information must be addressed
– The ELSI program at NHGRI now serves as a model for large, publicly funded science efforts

The outcomes of the Human Genome Project allow us to unravel some of the mysteries of life
https://link.springer.com/book/10.1007/978-90-481-3261-4
“Every dollar we invested to map the human genome returned $140 to our economy, every dollar,” (B.
Obama, 2013)

The Cost of Sequencing a Human Genome
• https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost

Following projects – HapMap (2003)
Figure: Julia Krushkal, The International HapMap Project: a Rich Resource of Genetic Information (https://slideplayer.com/slide/3053410/)

Following projects – 1000 Genomes (2008-2015)
• The goal of the 1000 Genomes Project was to find common genetic variants with frequencies of at least 1% in the populations studied
• WGS of 2,504 individuals from 26 populations
• Creation of a global human genome reference
Figures 1 and 3 from Sudmant et al., 2015, Nature

1+ Million Genomes
• 22 EU countries, the UK, and Norway signed the Member States' declaration on stepping up efforts towards creating a European data infrastructure for genomic data and implementing common national rules enabling federated data access
Figure: “1+ Million Genomes” and B1MG logos.

Local population genome projects
• Worldwide, there are at least 86 projects focused on improving genetic diagnostics and paving the way for the integration of precision medicine into health systems
• Kovanda et al., 2021, Hum. Genomics
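The 1000 Genomes criterion mentioned above (variants with a frequency of at least 1%) can be made concrete with a minimal sketch. It assumes diploid individuals and genotypes coded as the number of alternative alleles (0, 1, or 2); the variant names and genotype data are entirely made up for illustration, and the functions are hypothetical helpers, not part of any 1000 Genomes tool.

```python
# Minimal sketch of an allele-frequency filter (threshold 1%).
# Genotype coding (0/1/2 alternative alleles per diploid individual),
# the helper names, and the toy data are assumptions for illustration.
import math


def allele_frequency(genotypes: list[int]) -> float:
    """Fraction of chromosomes that carry the alternative allele."""
    return sum(genotypes) / (2 * len(genotypes))


def common_variants(variants: dict[str, list[int]],
                    threshold: float = 0.01) -> list[str]:
    """Names of variants whose allele frequency reaches the threshold."""
    return [name for name, genotypes in variants.items()
            if allele_frequency(genotypes) >= threshold]


# Toy cohort of 200 individuals (400 chromosomes); made-up genotypes.
toy_variants = {
    "var_a": [1] * 4 + [0] * 196,              # 4/400  = 1.0%
    "var_b": [1] * 3 + [0] * 197,              # 3/400  = 0.75%
    "var_c": [2] * 10 + [1] * 30 + [0] * 160,  # 50/400 = 12.5%
}
common = common_variants(toy_variants)
print(common)

# With 2,504 individuals there are 5,008 chromosomes, so a variant
# needs at least this many copies to reach a 1% frequency:
min_copies = math.ceil(0.01 * 2 * 2504)
print(min_copies)
```

The last calculation shows why a cohort of a few thousand individuals is enough to detect variants at the 1% level: such a variant is expected to appear several dozen times in the sample.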