Molecular diagnostics

The human genome
The human genome is the total genetic information (DNA content) in human cells. It comprises two genomes:
- nuclear
- mitochondrial: double-stranded DNA organized into one circular molecule; inheritance is exclusively maternal

The human nuclear and mitochondrial genomes
Superstructure of the human genome:
- Nuclear genome: 3 000 Mb, approx. 22 000 genes
  • Genes: coding DNA (1–2 %) and non-coding DNA (pseudogenes, gene fragments, introns, untranslated regions)
  • Extragenic DNA: unique sequences and repetitive sequences (repeating sequences, interspersed sequences)
- Mitochondrial genome: 16.6 kb, 37 genes (22 tRNA genes, 13 structural genes, 2 rRNA genes)
- About 1 % of the DNA is coding

Human Genome Project (HGP)
- Identify all of the genes in human DNA
- Determine the sequence of the 3 billion chemical nucleotide bases that make up human DNA
- Store this information in databases
- Develop faster, more efficient sequencing technologies
- Develop tools for data analysis
- Address the ethical, legal, and social issues (ELSI) that may arise from the project
- A $3-billion project founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health. The international consortium also comprised geneticists in the United Kingdom, France, Germany, Japan, China and India.
- A parallel project was conducted outside of government by the Celera Corporation
- On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that together they had completed ~97% of the human genome

Key findings of the Human Genome Project:
1. There are approx. 22,000 genes in human beings, the same range as in mice and twice that of roundworms. Understanding how these genes express themselves will provide clues to how diseases are caused.
2. All human races are 99.9 % alike, so racial differences are genetically insignificant.
3. Most genetic mutations occur in the male of the species, and as such males are agents of change.
They are also more likely to be responsible for genetic disorders.
4. Genomics has led to advances in genetic archaeology and has improved our understanding of how we evolved as humans and diverged from apes 25 million years ago. It also tells us how our body works, including the mystery behind how the sense of taste works.

The central dogma of molecular biology
The flow of genetic information in the cell is DNA → RNA → protein.
A gene is expressed in two steps:
- Transcription: RNA synthesis
- Translation: protein synthesis

The central dogma describes the transfer of sequence information between sequential information-carrying biopolymers: DNA and RNA (both nucleic acids), and protein.
The general transfers describe the normal flow of biological information:
- DNA can be copied to DNA (DNA replication)
- DNA information can be copied into mRNA (transcription)
- proteins can be synthesized using the information in mRNA as a template (translation)

Mutations
- A mutation is any alteration in a gene from its natural state; it may be disease-causing or a benign, normal variant
- Frequency less than 1 %
- Mutations can be positive (variability, selection), negative (4500 monogenic diseases, ageing) or neutral
- Each human carries 5–10 pathological mutations
- Mutations are changes in the DNA base sequence
- They are caused by errors in DNA replication or by mutagens

Types of mutations
[Figure: effect of a base substitution and of a base deletion on the mRNA and the protein]
Normal gene:        Met Lys Phe Gly Ala
Base substitution:  Met Lys Phe Ser Ala
Base deletion:      Met Lys Leu Ala His (one base missing)

- Silent mutations do not alter the amino acid sequence of the polypeptide
- Missense mutations: an amino acid change does occur
  • Example: sickle-cell anemia
  • If the substituted amino acids have similar chemistry, the mutation is said to be neutral
- Nonsense mutations change a normal codon to a termination codon
- Frameshift mutations involve the addition or deletion of a number of nucleotides that is not a multiple of three
  • This shifts the reading frame so that a completely different amino acid sequence occurs
downstream from the mutation
(The mutation types above are mutations in the coding sequence of a structural gene.)

Classification of mutations according to their effect on the gene product
1. Product with reduced or zero function (loss-of-function)
   - the typical product is an enzyme
   - the type of mutation is frequently a deletion
2. Product with abnormal function (gain-of-function)
   - the typical product is a non-enzymatic protein
   - frequent in tumours (somatic mutations), rare in monogenic diseases
   - deletions do not lead to a new function
Type 1 mutations are frequently recessive, type 2 mutations dominant. In some genes both types of mutation occur.

Disease inheritance is complex
[Figure: gene changes in cystic fibrosis. Labels: mucus production gene, normal, mutation 1, mutation 2, mutation 3. Outcomes shown: no symptoms, mild symptoms, severe symptoms.]

Major types of genetic diseases
a.) Chromosomal diseases
- are the result of the addition or deletion of entire chromosomes or parts of chromosomes
- most major chromosome disorders are characterised by growth retardation, mental retardation and a variety of somatic abnormalities
- typical examples of major chromosomal diseases are Down syndrome (trisomy 21), Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13)
b.) Monogenic diseases (single gene defects)
- only a single gene is altered (mutant) → flawed protein → manifestation (development) of a disease
- inherited in simple Mendelian fashion
- some 6000 distinct disorders are now known (sickle cell anemia, familial hypercholesterolemia, cystic fibrosis, hemophilia A, Duchenne muscular dystrophy, Huntington disease...)
c.) Multifactorial diseases
- result from the interaction of multiple genes, each of which may have a relatively minor effect
- environmental factors contribute to the manifestation of these diseases (e.g.
nutrition, exercise)
- for this group of illnesses, the contribution of the gene can be thought of as a “predisposition”
- examples: diabetes mellitus, hypertension, schizophrenia and congenital defects such as cleft lip, cleft palate and most congenital heart diseases
- very common in the population

Human pedigree

Autosomal dominant inheritance
Only one of the two homologous genes is mutated, and although a normal gene is also present (heterozygosity), the illness still appears (dominant gene effect). If, therefore, one of the parents carries this gene, there is a 50% probability that it will be transmitted to each child. Both men and women can be affected. This inheritance pattern accounts for over 60% of monogenic diseases, making it by far the most common inheritance pattern. In such cases, a mutated protein present at just half the total amount evidently has a pathological effect on the human organism. Example: achondroplasia.

Autosomal recessive inheritance
In this inheritance pattern, both homologous genes must be mutated (homozygosity) in order to produce an illness in the affected person. Individuals who receive only one copy of the mutated gene are called carriers. Both sexes can be affected. If, for example, both parents are carriers, there is a 25% chance that the child will receive both mutated genes and so develop the illness. Many metabolic diseases fall into this category (e.g. cystic fibrosis, phenylketonuria, adrenogenital syndrome, haemochromatosis).

X-chromosomal inheritance (sex-linked inheritance)
Women have two X chromosomes. If they have a recessively acting mutated gene on one X chromosome, they are carriers of the corresponding illness. Men have only one X chromosome, since the other sex chromosome is a Y chromosome. If they have the mutated gene on the X chromosome, they will develop the illness as a rule.
If a woman is a carrier of an X-linked illness, there is a 50% chance that she will pass this illness on to her son. Her daughters have a 50% chance of becoming carriers of this illness.

Identification of inherited diseases
1.) Phenotype analysis
Genes are directly responsible for the production of hormones, enzymes and other proteins.
Investigation procedure: diagnostic measurement of altered or missing proteins using blood or urine analysis. This provides indirect evidence of a mutation in the responsible gene.
Examples: phenylketonuria, alpha1-antitrypsin deficiency

2.) Chromosome analysis (cytogenetic investigations)
This includes microscope examinations to investigate chromosome alterations in terms of number (duplication or loss of individual chromosomes = numeric chromosome aberration) and in terms of structure (wrong composition, chromosome breakage = structural chromosome aberration). There is no detailed investigation of individual genes in such cases.
Indications: anomalies in children (malformations, retarded development), prenatal diagnosis, tendency to miscarriages, infertility.

3.) Molecular genetics testing (DNA analysis, genome analysis, DNA tests)
This provides evidence of a gene mutation responsible for producing the illness. Here it is determined whether the sequence of the DNA bases (nucleotide sequence) has changed within the affected gene.

DNA/RNA diagnosis of genetic diseases
Not all mutation tests use DNA. Testing RNA by RT-PCR has advantages when screening genes with many exons (NF1 gene, DMD gene...) or when seeking splicing mutations.
A protein-based functional assay is very important in molecular genetic testing: it may classify the products into two simple groups, functional and nonfunctional, which is the essential question in most diagnostics.

Monogenic and also polygenic diseases sometimes do not occur in both twins, even though the genetic information is the same in identical twins.
This is due to several factors:
- Penetrance: not every pathogenic mutation leads to the manifestation of a disease in the lifetime of a person.
- Expressivity, on the other hand, describes quantitative differences in the manifestation of the disease or its symptoms.
Sometimes the two concepts are difficult to separate, for example when a disease is so weakly manifested that it can no longer be diagnosed.

Limitations of DNA analysis
The age at which the disease manifests itself can vary strongly. An example of this is Huntington’s chorea. Differences in the onset of diseases are sometimes explained by so-called dynamic mutations. When passed on to the next generation, the disease-inducing mutation can lead to an earlier onset of the illness (anticipation), involving the extension of a mutated sequence of bases.
In many cases, genetic information is manifested differently when it is inherited from the mother than when it is inherited from the father. This is called imprinting.

Molecular genetics testing (DNA analysis, genome analysis, DNA tests)
A.) Direct testing: DNA from a patient is tested to see whether or not it carries a given pathogenic mutation
B.) Indirect testing (gene tracking): linked markers are used in family studies to discover whether or not the consultand inherited the disease-carrying chromosome from a parent

A.) Direct testing
- provides evidence of a gene mutation responsible for producing the illness; it is determined whether the sequence of the DNA bases (nucleotide sequence) has changed
- shows whether the DNA of the tested person carries the normal or the mutant gene
- detection of a mutation in the relevant gene always confirms the clinical diagnosis
- we must know which gene to examine and the relevant “normal” (wild-type) sequence

Mutation testing methods can be divided into two groups:
1. Mutation detection methods (scoring): test the DNA for the presence or absence of one specific mutation, i.e. searching for known mutations
2.
Mutation screening methods (scanning): screen a sample for any deviation from the standard sequence.

1. Mutation detection methods: test a DNA sample for the presence or absence of one specific mutation
Searching for a known sequence change is possible for:
- diseases where all affected people in the population have one particular mutation
- diseases where most affected people in the population have one of a limited number of specific mutations
- diagnosis within a family: once the mutation is characterized, other family members need to be tested for that particular mutation

2. Mutation screening methods: screen a sample for any deviation from the standard sequence
Mutation screening is possible for diseases where a good proportion of patients carry independent mutations. Testing for unknown mutations in the laboratory suffers from two limitations:
- the methods are quite laborious and expensive for use in a diagnostic service, which needs to produce answers quickly
- the methods detect differences between the patient's sequence and the published normal sequence, but do not distinguish between pathogenic and nonpathogenic changes

Polymerase chain reaction (PCR)
PCR is used to amplify a single copy or a few copies of a piece of DNA across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence.
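The step from a few copies to "thousands to millions" follows from repeated doubling. The sketch below is an idealized model only: it assumes every cycle copies every template, and the `efficiency` parameter (1.0 = perfect doubling) is a hypothetical knob added for illustration, not something stated on the slide.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    Assumption: each cycle multiplies the copy number by (1 + efficiency),
    so efficiency=1.0 means perfect doubling. Real reactions plateau.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Starting from a single template molecule.
    for cycles in (10, 20, 30):
        print(f"{cycles} cycles -> {pcr_copies(1, cycles):,.0f} copies")
```

With perfect doubling, 10 cycles already give about a thousand copies and 20 cycles about a million, which matches the range quoted above.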
The method relies on thermal cycling, consisting of cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA.
Kary Mullis discovered the PCR procedure in 1983, for which he was awarded the Nobel Prize.

PCR: selective amplification of a specific target DNA sequence within a heterogeneous collection of DNA (total genomic DNA or complex cDNA). It requires:
- sequence information from the target sequence for the construction of two oligonucleotide primers (15–30 nucleotides long)
- denatured genomic DNA
- heat-stable DNA polymerase
- DNA precursors (the four deoxynucleotide triphosphates dATP, dCTP, dGTP and dTTP)

PCR involves sequential cycles composed of three steps:
- Denaturation (typically at about 93–95 °C)
- Reannealing (at temperatures usually from about 50 °C to 70 °C, depending on the Tm of the expected duplex)
- DNA synthesis (typically at about 70–75 °C)

The sensitivity of PCR allows us to use a wide range of samples:
- blood samples
- mouthwashes or buccal scrapes
- chorionic villus biopsy samples
- amniocentesis specimens
- one or two cells (removed from eight-cell stage embryos)
- hair, semen
- archived pathological specimens
- Guthrie cards (spots of dried blood)

Electrophoresis
- used to separate and visualize DNA or RNA fragments by size and reactivity
- migration of DNA in an electric field
- ethidium bromide (DNA stain)
- agarose electrophoresis
- polyacrylamide gel electrophoresis (PAGE)

Sequence analysis (synonym: sequencing): process by which the nucleotide sequence is determined for a segment of DNA

Denaturing gradient gel electrophoresis (DGGE)
DGGE: the sequence-specific denaturation characteristics in a chemical gradient (in the gel) lead to partial separation of strands.
This in turn leads to differential mobility and results in a single band per variant.
[Figure: DGGE gel; labels: ds DNA, ss DNA]

SSCP in gel (single-strand conformation polymorphism)
[Figure: gel lanes for the genotypes non-mutated/non-mutated, non-mutated/mutation and mutation/mutation; migration from - to +]
SSCP: after denaturation, single strands form a sequence-specific structure. This structure leads to differential mobility in a non-denaturing matrix and two bands per variant.

SSCP in capillary
[Figure: capillary traces (mV against time) for non-mutated/non-mutated, non-mutated/mutation and mutation/mutation]

RFLP (restriction fragment length polymorphism)
- Unique sequence primers are used to amplify a mapped DNA sequence from two related individuals, A/A and B/B, and from the heterozygote A/B. In the case of the heterozygote A/B, two different PCR products will be obtained, one which is cleaved three times and one which is cleaved twice.

Mutation scanning (synonym: mutation screening): a process by which a segment of DNA is screened via one of a variety of methods to identify variant gene region(s). Variant regions are further analyzed (by sequence analysis or mutation analysis) to identify the sequence alteration.

Some clinical implications
- Mutation scanning is used when mutations are distributed throughout a gene, when most families have different mutations, and when sequence analysis would be excessively time-consuming due to the size of a given gene.
- Mutation scanning may cover the entire gene or selected regions.
- The sequence alteration identified in a segment of DNA may be a benign variant (polymorphism), a disease-causing mutation, or an alteration of undetermined significance.
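The RFLP comparison described above (one PCR product cleaved three times, the other twice) can be sketched as a simulated restriction digest. Everything concrete here is an assumption made for illustration: the two allele sequences are invented, and the enzyme is taken to be EcoRI (recognition site GAATTC, cutting after the first G). The slide does not name an enzyme or give sequences.

```python
def cut_positions(seq, site):
    """Return the 0-based start index of every occurrence of the recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, offset):
    """Cut seq `offset` bases into each site and return the fragment lengths in order."""
    cuts = [pos + offset for pos in cut_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


SITE, OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC (assumed enzyme)

# Hypothetical PCR product of allele A: three recognition sites.
allele_a = "ACGT" * 3 + SITE + "TTGCA" * 2 + SITE + "CCATG" * 2 + SITE + "AGT" * 4
# Hypothetical allele B: one base substitution destroys the middle site (GAATTC -> GAGTTC).
allele_b = allele_a[:30] + "G" + allele_a[31:]

fragments_a = digest(allele_a, SITE, OFFSET)  # cleaved three times
fragments_b = digest(allele_b, SITE, OFFSET)  # cleaved twice
# A heterozygote A/B yields both PCR products, so the gel shows the union of both band sets.
bands_ab = sorted(set(fragments_a + fragments_b))

if __name__ == "__main__":
    print("A/A fragments:", fragments_a)
    print("B/B fragments:", fragments_b)
    print("A/B bands:    ", bands_ab)
```

The point of the sketch is only that a single base change at a recognition site changes the number and sizes of fragments, which is what makes the genotypes distinguishable on a gel.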
Types of sequence alterations that may be detected:
- Pathogenic sequence alteration reported in the literature
- Sequence alteration predicted to be pathogenic but not reported in the literature
- Unknown sequence alteration of unpredictable clinical significance
- Sequence alteration predicted to be benign but not reported in the literature
- Benign sequence alteration reported in the literature

Possibilities if a sequence alteration is not detected:
- Patient does not have a mutation in the tested gene (e.g., a sequence alteration exists in another gene at another locus)
- Patient has a sequence alteration that cannot be detected by sequence analysis (e.g., a large deletion)
- Patient has a sequence alteration in a region of the gene (e.g., an intron or regulatory region) not covered by the laboratory's test

Array CGH (aCGH)
- used for analysing copy number variations (CNVs) in the DNA of a test sample compared to a reference sample
- compares two genomic DNA samples arising from two sources
- used for: genomic abnormalities in cancer, submicroscopic aberrations, preimplantation genetic diagnosis
- limitation: inability to detect structural chromosomal aberrations without copy number changes, such as mosaicism, balanced chromosomal translocations and inversions

Next generation sequencing (NGS)
- Four main technologies
- All are massively parallel sequencing
- Sequencing by synthesis
  • Sanger/dideoxy chain termination
  • Pyrosequencing (Roche/454)
  • Reversible terminator (Illumina)
  • Ion Torrent (Life Technologies)
  • Zero-mode waveguide (Pacific Biosciences): 3rd generation sequencing
- Sequencing by ligation
  • SOLiD (Applied Biosystems)
- Direct reading of DNA sequence: 3rd generation sequencing
  • Nanopore sequencing
  • Electron microscope

Sequencing matrices
Method                            Reads x read length / run time    Price
Sanger, 96-well, 8 capillaries    96 x 600 bp / 24 h                1400 €
Pyrosequencing, 2 regions         1,000,000 x 600 bp / 20 h         5500 €
Reversible terminator, MiSeq      10,000,000 x 250 bp / 40 h        1150 €

Sequencing DNA clusters one base at a time: reversible terminator (HiSeq, MiSeq, NextSeq)
- A mix of sequencing primers (complementary to one of the adapter sequences), DNA polymerase and differentially fluorescently labelled reversible chain terminator dNTPs (A, C, T and G) is added to the flow cell.
- Depending on the first nucleotide in the cluster, a specific fluorescent reversible chain terminator dNTP is incorporated, which stops DNA synthesis.
- After unincorporated nucleotides have been washed away, a laser excites the flow cell and the instrument detects which of the four fluorescent chain terminator dNTPs was incorporated in each cluster, i.e. it decodes the first sequenced base.
- Once an image recording the first nucleotide incorporated in each cluster has been taken, both the fluorescent dyes and the blocking group that prevents extension of the DNA are removed (hence “reversible chain terminator dNTPs”) and the cycle is repeated.

Sequencing by synthesis platforms
- Reversible terminator (HiSeq, MiSeq, NextSeq)
- Pyrosequencing (GS FLX, GS Junior)
- Ion Torrent sequencing: at each step, the chip is flooded with a single type of nucleotide. If the nucleotide matches the sequence, H+ is released and the pH changes. If it does not match the sequence, the pH does not change. The change in pH is measured.

Other platforms
- Oligo ligation detection (SOLiD)
- Zero-mode waveguide (single-molecule real-time sequencing): 3rd generation sequencing
- Nanopore sequencing (direct reading)

Gene tracking
- historically the first type of DNA diagnostic method
- most Mendelian diseases went through a phase of gene tracking and moved on to direct tests once the genes were cloned
- with some diseases, even though the gene has been cloned, mutations are hard to find because of:
  • mutations scattered widely over a large gene
  • the existence of homologous pseudogenes
  • the lack of mutational hot spots
- gene tracking never confirms the clinical diagnosis!

B.)
Indirect testing
Linkage analysis (synonym: indirect DNA analysis): testing DNA sequence polymorphisms (normal variants) that are near or within a gene of interest in order to track, within a family, the inheritance of a disease-causing mutation in a given gene

DNA sequence polymorphisms
- Single nucleotide polymorphism (SNP): substitution of bases. The genome contains approx. 30 million SNPs
- Minisatellites (VNTR) consist of repetitive, generally GC-rich, variant repeats (> 6 bp) that range in length from 10 to over 100 bp; these variant repeats are tandemly intermingled
- Microsatellites, or short tandem repeats (STR), consist of a short sequence, typically 2 to 6 nucleotides long, tandemly repeated several times (2–100x), and are characterised by many alleles

Use of polymorphic regions
- identification of persons and DNA samples
- paternity testing (VNTR, STR)
- indirect diagnostics of monogenic diseases
- searching for new genes
- SNPs and multifactorial diseases

The three steps of linkage analysis
- Establish haplotypes: multiple DNA markers lying on either side of (flanking) or within (intragenic) a gene region of interest are tested to determine the set of markers (haplotypes) of each family member.
- Establish phase: the haplotypes are compared between family members whose genetic status is known (e.g., affected, unaffected) in order to establish the haplotype associated with the disease-causing allele.
- Determine genetic status: once the disease-associated haplotype is established, it is possible to determine the genetic status of at-risk family members.

Indirect DNA analysis: CFTR gene, intron 8, polymorphic site (CA)n on chr. 7
Sequence at the polymorphic site: GTATCACACACATTCGG
allele A1: ------GTATCACATTCGG------  the length of this allele is 130 bp
allele A2: ------GTATCACACACATTCGG--  the length of this allele is 134 bp
[Figure: pedigree showing the chr. 7 copies inherited from the mother and from the father, with a mutation in the CFTR gene. CFTR genotypes: dF508 / non, non / ?, dF508 / ?, non / non. Informative marker genotypes: A1 / A3, A1 / A2, A1 / A2, A1 / A3. Non-informative marker genotypes: A1 / A3, A1 / A1, A1 / A1, A1 / A3.]

Linkage analysis is often used when direct DNA analysis is not possible because the gene of interest is unknown or a mutation within that gene cannot be detected in a specific family. In most instances, the haplotype itself has no significance; it has meaning only in the context of a family study.

The accuracy of linkage analysis is dependent on:
- The accuracy of the clinical diagnosis in affected family member(s).
- The distance between the disease-causing mutation and the markers. Linkage analysis may yield false positive or false negative results if recombination of markers between maternally and paternally inherited chromosomes occurs during gamete formation. The risk of recombination is proportional to the distance between the disease-causing mutation and the markers. The risk of recombination is lowest if intragenic markers are used.
- The informativeness of genetic markers in the patient's family. If the DNA sequence for a given variant differs on the maternally inherited and paternally inherited chromosomes, that marker is informative. If the DNA sequence for a given variant does not differ on the two chromosomes, that marker is not informative.
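The informativeness rule just stated reduces to a simple check: a marker is informative in a person when the maternally and paternally inherited alleles differ. The sketch below applies that rule to two marker genotypes of the kind shown in the CFTR example (A1/A3 and A1/A1); the function name and the tuple representation are assumptions made for illustration.

```python
def is_informative(genotype):
    """Return True when the two inherited alleles of a marker differ.

    `genotype` is a (maternal_allele, paternal_allele) pair; a heterozygous
    marker lets the two chromosomes be told apart, a homozygous one does not.
    """
    maternal_allele, paternal_allele = genotype
    return maternal_allele != paternal_allele


if __name__ == "__main__":
    for genotype in (("A1", "A3"), ("A1", "A1")):
        status = "informative" if is_informative(genotype) else "not informative"
        print("/".join(genotype), "->", status)
```

A person who is A1/A1 passes on an A1 allele whichever chromosome is transmitted, so the marker cannot show which copy of the gene a child received.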
Indirect diagnosis: neurofibromatosis type 1
Autosomal dominant, unknown mutation
Polymorphic systems: GXAlu / i27b and IVS38GT / i38
[Figure: pedigree with the haplotype associated with the unknown mutation marked. Allele sizes as extracted from the figure, in groups of four: 135 135 181 185; 135 131 181 179; 131 131 179 179; 135 131 181 179; 135 131 187 179; 131 135 179 179; 135 131 181 179; 131 131 179 179]

Indirect diagnosis: cystic fibrosis
Autosomal recessive; mutations: F508del and an unknown mutation
Polymorphic systems: IVS17BTA (alleles 1-6) and IVS8BTA (alleles A-D)
[Figure: pedigree with the haplotype associated with the unknown mutation marked. Genotypes as extracted from the figure, in groups of four: A A 6 6; A C 3 5; A B 3 1; A B 3 1; A A 2 3; A C 2 2; A C 3 5; C A 5 6; C A 5 6; B A 1 3; A A 3 2; A D 2 2; A A 2 2; A D 2 2; A D 2 2; A C 3 2; A D 3 2]

Indirect diagnosis: cystic fibrosis, de novo mutation
[Figure: pedigree. CFTR genotypes: [F508]+[=], [=]+[=], [F508]+[=], [=]+[=], [F508]+[G542X]. Marker haplotypes: [A1]+[A1], [A1]+[A3], [A2]+[A5], [A1]+[A2], [A3]+[A5]]

Retinoblastoma (RB1)
- Mutation analysis of RB1 was done
- No pathology was detected in the RB1 gene
Polymorphic markers:
• extragenic (DS13S 1307, DS13S 272, DS13S 164)
• intragenic (Rb1.20B)

Haplotype   DS13S 1307   DS13S 272   DS13S 164   Rb1.20B
A1          141          133         179         3
A2          151          133         188         4
A3          139          127         179         1
A4          139          133         186         1
A5          139          131         188         1
A6          126          133         188         2
A7          126          129         188         4
A8          139          127         178         5

Retinoblastoma: indirect diagnostics
[Figure: RB1 pedigree with haplotypes [A1]+[A2], [A3]+[A4], [A7]+[A8], [A5]+[A6], [A1]+[A3], [A6]+[A7]]
The haplotype associated with the pathology cannot be established.
Explanation:
• occurrence of a mutation in another system of cell division and growth regulation
• nonhereditary form of retinoblastoma in both cousins
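One way to see why no disease-associated haplotype can be established is to check whether two affected relatives share any haplotype at all. The sketch below uses the A1 to A8 marker values listed above. Which pedigree members carry which genotype is not recoverable from the extracted figure, so treating [A1]+[A3] and [A6]+[A7] as the two affected cousins is an assumption made purely for illustration.

```python
# Marker order: DS13S 1307, DS13S 272, DS13S 164, Rb1.20B (values as listed above).
HAPLOTYPES = {
    "A1": (141, 133, 179, 3),
    "A2": (151, 133, 188, 4),
    "A3": (139, 127, 179, 1),
    "A4": (139, 133, 186, 1),
    "A5": (139, 131, 188, 1),
    "A6": (126, 133, 188, 2),
    "A7": (126, 129, 188, 4),
    "A8": (139, 127, 178, 5),
}


def shared_haplotypes(person_a, person_b):
    """Return the marker-allele tuples carried by both persons.

    Each person is a pair of haplotype names; comparison is done on the
    marker alleles themselves, not on the labels.
    """
    alleles_a = {HAPLOTYPES[name] for name in person_a}
    alleles_b = {HAPLOTYPES[name] for name in person_b}
    return alleles_a & alleles_b


# Assumption: these two genotypes belong to the two affected cousins.
cousin_1 = ("A1", "A3")
cousin_2 = ("A6", "A7")
shared = shared_haplotypes(cousin_1, cousin_2)

if __name__ == "__main__":
    print("shared haplotypes:", shared if shared else "none")
```

Under that assumption the two genotypes have no haplotype in common, which is in line with the conclusion that a shared inherited RB1 haplotype cannot be identified.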