Genome and chromosome evolution
Martin A. Lysák
CEITEC and Faculty of Science, Masaryk University
www.plantcytogenomics.org

Genome size variation
Polychaos dubium: perhaps the largest known genome, 670 billion base pairs (670 Gb), ~200 times larger than the human genome (3.2 Gb). Some authors suggest treating this value with caution; Amoeba proteus has ~34–43 Gb.
Protopterus aethiopicus

Eukaryotic chromosome
"Species-specific" chromosome sets = karyotypes
• haploid chromosome set (n)
• diploid chromosome set (2n) = pairs of homologous chromosomes

Anatomy of eukaryotic chromosome
[Figure label: NOR]

Holocentric or holokinetic chromosomes: chromosomes without a localized centromere
[Figure labels: diffuse kinetochore; chromosome segregation in anaphase]
• traditional view: chromosome fission (agmatoploidy) and fusion (symploidy) → extensive chromosome number variation
• holocentrics: huge variation in chromosome numbers [the largest number of chromosomes in animals (2n = 446) is found in the blue butterfly Polyommatus atlanticus, which has holokinetic chromosomes]
• in c. 5,500 angiosperm species
• chromosome numbers from n = 2 up to n = 110

Angiosperm species with holokinetic chromosomes
Juncaceae, Cyperaceae, Myristica fragrans (Myristicaceae), Drosera (Droseraceae), Chionographis (Melanthiaceae), …

Eukaryotes: minimal chromosome numbers
Myrmecia pilosula ("Jack jumper ant", Australia): males (haploid) n = 1, females (diploid) 2n = 2
five angiosperm species, e.g., Haplopappus gracilis (Asteraceae), n = 2

Eukaryotes: highest chromosome numbers
Polyommatus atlanticus, n = c. 220
fern Ophioglossum reticulatum, n = c.
530

Genome and chromosome evolution
➢ genome size variation
  - variation in coding DNA amount
  - variation in non-coding DNA amount
➢ chromosome number variation

[Diagram labels: genome size; chromosome number; decrease; increase]
Variation in genome size and chromosome number is driven by two principal processes:
• DNA/genome duplication
• DNA recombination

Genome size increase
• amplification of retrotransposons (and tandem repeats)
• gene and segmental duplications
• polyploidy

Genome size variation in angiosperms is driven by amplification (and elimination) of repetitive DNA

Genome size variation in seed plants is driven by amplification (and elimination) of repetitive DNA
Repeat turnover changes in very large genomes (> 10 Gb). Novák et al. 2020, Nat Plants
[Figure: content of repeats present in more than 20 copies in the genomes of 101 seed plant species ranging in size from 0.063 to 88.55 Gbp; 10 Gbp marked]

LTR (Long Terminal Repeat) retrotransposons (LTR-RTs)
Gag – gene for the Gag protein
INT – integrase
PBS – primer binding site
PR – protease
RT – reverse transcriptase
VLP – virus-like particle

Genome size increase by retrotransposition (nested retrotransposon insertion)
(a) a retrotransposon is inserted into itself
(b) the event is repeated
(c) DNA structure after 2 rounds of retrotransposition

Genome size increase by gene duplication
• replication slippage (errors in replication → gene duplication)
• ectopic recombination (between two direct repeats, typically TEs)
• unequal crossing-over in meiosis (due to misaligned chromosomes)
• via retrotransposition = retrogenes (cellular mRNA is transcribed into cDNA by the reverse transcriptase of a retrotransposon or retrovirus; a retrogene contains no introns and lacks regulatory elements, so it is a pseudogene, but it can evolve into a functional gene)

Retrogenes
• mRNA is reverse-transcribed into cDNA and inserted in a new genomic position
[Figure labels: inactive gene copy; retrogene; chimeric retrogene (functional by capturing elements of another gene)]
Science 325, 995 (2009)
Extra
copies on chromosome 18: short legs and moderate risk of slipped disc, e.g. Cairn Terrier and West Highland Terrier.
Extra copies on chromosome 12: legs not quite so short but greater risk of slipped disc, e.g. French Bulldog and Beagle.
Extra copies on chromosome 18 as well as 12: short legs and high risk of slipped disc, e.g. Dachshund and Welsh Corgi.

Segmental duplications
• duplicated segment of chromosomal DNA (usually defined as ≥ 1 kb in length, ≥ 95% sequence identity)
• either tandem or interspersed organization, either intra-chromosomal or inter-chromosomal
• also known as low copy repeats (LCRs)
• human genome: 159 Mb of gene-rich duplicated sequence (5.5% of the genome) = c. the size of the Arabidopsis genome

Variation in the segmentally duplicated amylase locus in humans

© Carl Warner
Once upon a time in the land of giant broccoli trees…

Polyploidy (whole-genome duplication)
AUTOPOLYPLOIDY / ALLOPOLYPLOIDY
Hegarty and Hiscock 2008, Current Biol

Examples of allopolyploid speciation
[Figure labels: T. aestivum (2n = 6x = 42); T. turgidum (2n = 28); Ae. tauschii (2n = 14)]
Phylogenomic history of bread wheat (Triticum aestivum; AABBDD). Three rounds of hybridization/polyploidy. Marcussen et al. (2014), Science

Whole-genome duplications in protozoa
• the unicellular eukaryote Paramecium tetraurelia
• most of its 40,000 genes arose through at least 3 successive whole-genome duplications
• the most recent duplication most likely caused an explosion of speciation events that gave rise to the P. aurelia complex (15 sibling species)
• some genes have been lost, some retained
• many retained (duplicated) genes do not generate functional innovations but are important because of the gene dosage effect

Whole-genome duplications in yeast
• genome comparison between two yeast species, Saccharomyces cerevisiae (n = 16) and Kluyveromyces waltii (n = 8)
• each region of K. waltii corresponds to two regions of S. cerevisiae
• the S.
cerevisiae genome underwent a WGD after the two yeast species diverged
• in nearly every case (95%), accelerated evolution was confined to only one of the two paralogues (= one of the paralogues retained an ancestral function, the other was free to evolve more rapidly and acquired a derived function)
Kellis et al. 2004, Nature

First evidence of a WGD in plants. Alpha WGD in Arabidopsis.
What does the duplication in the Arabidopsis genome tell us about the ancestry of the species? As the majority of the Arabidopsis genome is represented in duplicated (but not triplicated) segments, it appears most likely that Arabidopsis, like maize, had a tetraploid ancestor … The diploid genetics of Arabidopsis and the extensive divergence of the duplicated segments have masked its evolutionary history.
AGI (2000)

Nature 449, 2007
The formation of the palaeo-hexaploid ancestral genome occurred after divergence from monocots and before the radiation of the Eurosids.
Star = a WGD (tetraploidization) event.
[Figure labels: β; γ; αp]

The γ triplication may have been an ancient auto-hexaploidy formed from fusions of three identical genomes, or allo-hexaploidy formed from fusions of three somewhat diverged genomes.
Tang et al. 2008, Genome Res

Multiple whole-genome duplications in evolution of land plants
[Figure labels: Dicots; Monocots; Angiosperms; Gymnosperms; Gamma (6x); Epsilon (4x); Tau (4x)]
Van de Peer (2017) Nat Rev Genet (modified)

WGD events in seed plants and angiosperms
Jiao et al. (2011) Nature; Clark and Donoghue (2017) Proc R Soc
[Figure labels: 399–381 and 319–297 Myr ago]

There is evidence of ancient polyploidy throughout the major angiosperm lineages. This means that a genome-scale duplication event probably occurred PRIOR to the rapid diversification of flowering plants.

Charles Darwin's abominable mystery solved (?)
"The rapid development as far as we can judge of all the higher plants within recent geological times is an abominable mystery."
(Charles Darwin in a letter to Sir Joseph Hooker, 1879)

assumed ancient whole-genome duplication events (e.g. γ = gamma WGD), De Bodt et al. 2005
[Timeline figure labels: Archaefructus liaoningensis (140-million-year-old fossil); (319–297); Afropollis (245-million-year-old angiosperm pollen); lag (27–65 million years); diversification (267–247)]
PNAS 106 (2009)

Could WGD event(s) have helped plants to survive the mass extinction (one or more catastrophic events such as a massive asteroid impact) at the Cretaceous–Tertiary boundary?

K-Pg boundary, Drumheller, Alberta
The K-Pg extinction was the consequence of the Chicxulub impact event, 66 million years ago.

Possible establishment of polyploid plants following the K/Pg mass extinction (66 million years ago)
Lohaus and Van de Peer (2016) Curr Opin Pl Biol
➢ WGDs clustered around the Cretaceous–Tertiary (KT) boundary
➢ the KT extinction event: the most recent mass extinction (one or more catastrophic events such as a massive asteroid impact and/or increased volcanic activity)
➢ the KT extinction event: extinction of 60% of plant species, as well as a majority of animals, including dinosaurs

Polyploidization–diploidization cycle
[Diagram labels: "diploids"; polyploids; genome reshuffling; descending dysploidy (chromosome number reduction); genome downsizing and/or upsizing; diversification / species radiation; WGD (4x) / WGT (6x)]
Wendel et al. (2016) Genome Biol

Whole-genome duplication and diploidization

Genome diploidization: biased fractionation and (sub)genome dominance
Liang and Schnable (2018) Mol Plant
[Figure labels: Subgenome 1; Subgenome 2]

Biased (sub)genome fractionation and dominance can be explained by the mode of polyploidization
[Figure labels: Class I; Class II]
Garsmeur et al.
(2013) Mol Biol Evol

The fate of duplicated genes
Adams and Wendel (2005)

Genome evolution through cyclic WGD and diploidization

Allopolyploid origin and diploidization in the tribe Microlepidieae (Brassicaceae)
• Australia: 15 genera, 47 species
• New Zealand: Pachycladon, 11 species
• chromosome number variation (from n = 4 to n = 24)

Whole-genome duplication and diploidization
[Diagram labels: ♂ ♀; Crucihimalayeae n = 8; Descurainieae/Smelowskieae n = 7; allotetraploid ancestor n = 15; INTER-TRIBAL HYBRIDIZATION; Pachycladon n = 10; crown group n = 4–7; n = 12; LONG DISTANCE DISPERSAL; (15 genera / 42 spp.); Arabidella n = 24; 2n = 4x = 30; 2n = 8, 10, 12, 14, 20; (1 / 11); (1 / 3); AUTOPOLYPLOIDY]
Mandáková et al. (2010) Plant Cell; (2010) BMC Evol Biol; (2017) Mol Ecol

Allopolyploid origin and diploidization in the tribe Microlepidieae

Imagine blue mustards!? South African Heliophila
Huang et al. (2023) Plant J

Genome of Heliophila variabilis
H. variabilis: n = 11, 330 Mb (~2× the Arabidopsis genome)

[Figure: genomic blocks (A–X) on the chromosomes of the four subgenomes]
subgenome #1 (n = 7): AK1 (A B C), AK2 (D E), AK3 (F G H), AK4 (I J), AK5 (K-L M-N), AK7 (S T U), AK6+8 (V W X O P Q R)
subgenome #2 (n = 8): AK1 (A B C), AK2 (D E), AK3 (F G H), AK4 (I J), AK5 (K-L M-N), AK7 (S T U), AK8/6 (V Wa X Q), AK6/8 (O P Wb R)
subgenome #3 (n = 7): AK1 (A B C), AK2 (D E), AK3 (F G H), AK4 (I J), AK7 (S T U), AK6/8 (O P Wb R), AK5/6/8 (V Wa X Q M-N K-L)
subgenome #4 (n = 8): AK1 (A B C), AK2 (D E), AK3 (F G H), AK4 (I J), AK5 (K-L M-N), AK7 (S T U), AK8/6 (V Wa X Q), AK6/8 (O P Wb R)
Huang et al. (2023) Plant J

2n = 8x = 60 → 2n = 22
H.
variabilis: rediploidization of an allo-octoploid genome
✓ reduction by 38 chromosomes (63% diploidization)
✓ end-to-end translocations
✓ nested chromosome insertions

Genome size decrease (downsizing)
• recombination
• chromosome rearrangements

Recombinational deletions after double-strand breaks (DSBs) – DSB repair
Chromosome rearrangements (…in principle again DSBs and recombination)
[Figure labels: large-scale deletion; Robertsonian translocation; lost]

Genome size decrease (downsizing)
• unequal homologous recombination, including unequal crossing-over
• illegitimate recombination (non-homologous end joining, NHEJ)

Genome size decrease by unequal homologous recombination between two LTRs or between two LTR-retrotransposons
~70% of retrotransposon sequences in the A. thaliana genome are no longer autonomous: solo LTRs, probably the consequence of unequal homologous recombination, are inactive, truncated elements that cannot contribute to genome expansion

Deletion through unequal crossing-over
[Figure label: complementary nucleotides]

Two main pathways of non-homologous end joining (NHEJ)
[Figure labels: DNA lost (but some DNA can be inserted: filler DNA); microhomology-mediated end joining (MMEJ)]

NHEJ in plant somatic cells
• NHEJ seems to be the main mode of DSB repair in higher eukaryotes
• NHEJ might lead, in some cases, to genomic changes (deletions, insertions or various kinds of genomic rearrangements)
• genomic alterations in meristematic cells can be transferred to the offspring
• alternative NHEJ can mediate genome size loss

Arabidopsis vs. tobacco (genome size larger in tobacco)
- tobacco: almost every second deletion event is accompanied by the insertion of filler sequence
- Arabidopsis: no insertions
- overall length of the deletions is about one-third shorter in tobacco than in Arabidopsis
>>> inverse correlation between genome size and the average length of deletions
>>> can species-specific differences in DSB repair pathways contribute to the evolution of eukaryotic genome size?
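The solo-LTR formation described above (unequal homologous recombination between the two LTRs of one element) can be illustrated on a toy sequence. This is only a sketch under simplifying assumptions: the two LTRs are exact copies, recombination is modelled as a plain string operation, and the sequences and the helper name `recombine_direct_repeats` are invented for this example.

```python
def recombine_direct_repeats(seq: str, repeat: str) -> str:
    """Model unequal homologous recombination between two direct repeats.

    Hypothetical helper: everything between the first and the last copy of
    `repeat`, plus one repeat copy, is deleted; a single copy remains.
    """
    first = seq.find(repeat)
    last = seq.rfind(repeat)
    if first == -1 or first == last:
        # fewer than two copies: nothing to recombine
        return seq
    return seq[:first] + seq[last:]


# Toy sequences (invented for illustration, not real LTR or gene sequences)
LTR = "TGCAGT"
INTERNAL = "AAACCCGGGAAA"  # stands in for the internal coding region
FLANK_5, FLANK_3 = "GATTACA", "CATCAT"

full_element = FLANK_5 + LTR + INTERNAL + LTR + FLANK_3
solo_ltr_locus = recombine_direct_repeats(full_element, LTR)

print(solo_ltr_locus)                           # GATTACATGCAGTCATCAT
print(len(full_element) - len(solo_ltr_locus))  # 18
```

The locus shrinks by the length of the internal region plus one LTR, and the remaining solo LTR carries no coding sequence, which is why such remnants cannot drive further genome expansion.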
[Figure labels: tobacco, 1C = 4.5 Gb; Arabidopsis, 1C = 157 Mb]
- A. thaliana (157 Mb) has lost 6× more introns than Arabidopsis lyrata (210 Mb) since the divergence of the two species, but gained very few introns

Genome and chromosome evolution
➢ genome size variation
  - variation in coding DNA amount
  - variation in non-coding DNA amount
➢ chromosome number variation

Chromosome number variation: chromosome rearrangements
[Figure labels: n = 8; n = 8; n = 7; n = 8; n = 5; n = 6]

Chromosome rearrangements result from double-strand breaks and their mis-repair
[Diagram labels: DSB; mis-repair; chromosome rearrangement]

Chromosome rearrangements – the role of repeats
In organisms with repetitive DNA, homologous repetitive segments within one chromosome or on different chromosomes can act as sites of DSBs and their mis-repair, i.e. non-allelic homologous recombination.

Deletion formation by breakage and rejoining
Deletion formation by intra-chromosomal (unequal) recombination

Deletion (and duplication) formation by unequal cross-over
Sometimes during meiosis, two chromatids from homologous chromosomes (A) are misaligned during a cross-over event (B); as a result, one chromatid gains a duplicated region and the other loses the corresponding region (C). Both the duplication and the deletion are inherited by the resulting gametes.
[Figure label: misaligned homologous chromosomes]

Inversions
Inversions, as balanced rearrangements, are generally viable and show no particular abnormalities at the phenotypic level. Many inversions can be made homozygous.
Inversion heterozygote: a cell that contains one normal haploid chromosome set plus one set carrying the inversion. Microscopic observation of meioses in inversion heterozygotes reveals an inversion loop.
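To make "balanced rearrangement" concrete, the sketch below models a chromosome as an ordered list of markers and an inversion as a segment whose order is reversed between two breakpoints. The marker names, the breakpoint positions and the function `invert_segment` are hypothetical and serve only as an illustration.

```python
from typing import List


def invert_segment(chromosome: List[str], start: int, end: int) -> List[str]:
    """Return a copy of `chromosome` with markers start..end-1 in reverse order."""
    if not 0 <= start < end <= len(chromosome):
        raise ValueError("breakpoints must satisfy 0 <= start < end <= length")
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]


standard = list("ABCDEFG")                 # hypothetical marker order
inverted = invert_segment(standard, 2, 5)  # breakpoints flank C-D-E

print("".join(inverted))                           # ABEDCFG
print(sorted(inverted) == sorted(standard))        # True: same content, new order
print(invert_segment(inverted, 2, 5) == standard)  # True: a second inversion restores the order
```

No marker is gained or lost, only the order changes. In a heterozygote one homologue carries the `standard` order and the other the `inverted` order, so the two can only align marker by marker within the inverted segment by forming a loop.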
[Figure label: meiotic inversion loop]

Inversion formation by intra-chromosomal recombination
[Figure labels: inverted repeats; inversion]

Two types of inversions
Mechanism of inversion formation: breakage and rejoining

Inversions and recombination: evolutionary significance
An inversion can be "adaptive" when it stabilizes/disrupts a superior combination of alleles on a chromosome (examples seen in Drosophila)

Inversions may suppress recombination
Chromosome rearrangements (typically inversions) may reduce gene flow by suppressing recombination. Inversions allow genes located in these regions to differentiate, in contrast to genes in freely recombining collinear regions.

Reciprocal translocations
• attachment of a chromosome fragment to a non-homologous chromosome (leading to deletions and duplications in the progeny)
• exchange of chromosome fragments between non-homologous chromosomes
Unequal reciprocal translocation

Robertsonian translocations – ROBs (centric "fusions")
Robertsonian translocations are the most common recurrent structural anomaly in humans, with about 1 in 1000 individuals carrying this rearrangement. Carriers of ROBs have 45 chromosomes instead of the normal 46.
• a type of reciprocal translocation between two acrocentric/telocentric chromosomes
• also called whole-arm translocations or centric-fusion translocations
• named after the American insect geneticist W. R. B. Robertson, who first described a Robertsonian translocation in grasshoppers in 1916
• evolutionary significance >>> chromosome number reduction (two acrocentric chromosomes → one metacentric chromosome)
[Figure labels: dicentric ROB (more frequent); monocentric ROB; lost]

Speciation by Robertsonian translocations ("centric fusions")

End-to-end chromosome translocations ("chromosome fusions")
[Figure label: centromere inactivation/elimination]
In principle an unequal reciprocal translocation with breakpoints in (sub)telomeric regions. The second translocation product is minute and eliminated.
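The chromosome count of a Robertsonian translocation carrier (45 instead of 46) can be checked with a small bookkeeping sketch. It assumes a simplified karyotype in which every chromosome is just a pair of arm labels; the function `robertsonian` and the choice of chromosomes 13 and 14 as the two acrocentrics are illustrative assumptions, not taken from the slides.

```python
from typing import List, Tuple

Chromosome = Tuple[str, ...]  # arm labels, short arm first


def robertsonian(karyotype: List[Chromosome], i: int, j: int) -> List[Chromosome]:
    """Fuse the long arms of chromosomes i and j; their short arms are lost."""
    if i == j:
        raise ValueError("two different chromosomes are required")
    fused: Chromosome = (karyotype[i][-1], karyotype[j][-1])
    rest = [c for k, c in enumerate(karyotype) if k not in (i, j)]
    return rest + [fused]


# Simplified human-like karyotype: 23 pairs, each chromosome as (p arm, q arm).
# The sex chromosomes are treated as a 23rd pair for counting purposes only.
names = [str(n) for n in range(1, 23)] + ["X"]
normal = [(f"{n}p", f"{n}q") for n in names for _ in range(2)]

# Hypothetical carrier: one copy of 13 and one copy of 14 are involved.
i = normal.index(("13p", "13q"))
j = normal.index(("14p", "14q"))
carrier = robertsonian(normal, i, j)

print(len(normal), len(carrier))  # 46 45
print(carrier[-1])                # ('13q', '14q')
```

The carrier keeps one normal copy of each of the two chromosomes plus the fused one. If such a rearrangement becomes fixed in a lineage, both homologues are fused and 2n drops by two, which is the chromosome number reduction referred to above.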
Chromosome "fusion": the origin of the human (dicentric) chromosome 2
[Figure labels: 2n = 48; 2n = 46; inactive centromere; active centromere]
Chiatante et al. (2017), MBE

Did the origin of "fusion" chromosome 2 contribute to the reproductive isolation of hominid species from great apes?
[Figure labels: 2n = 48; 2n = 46]
• different no. of chromosomes → reproductive isolation
• loss of gene(s) → adaptive advantage
• gene linkage? changed regulation of gene expression?

diploma & doctoral students
martin.lysak@ceitec.muni.cz
www.plantcytogenomics.org