COALESCENCE

Fate of individual gene copies in the population → gene trees

Species trees vs. gene trees: gene A
[Figure: gene tree of gene A within the species tree (human, chimpanzee, gorilla)]

Species trees vs. gene trees: gene B
[Figure: gene tree of gene B within the same species tree]

Phylogenetic relationships of 2 descendant populations (e.g. mtDNA):
• polyphyly
• paraphyly
• reciprocal monophyly

Ancestral polymorphism and lineage sorting
[Figure labels: barrier; polyphyly → paraphyletic stage → reciprocal monophyly; species A, species B]

Problem: it is often difficult to distinguish between incomplete lineage sorting and the consequences of recent gene flow.
[Figure panels: incomplete lineage sorting; recent gene flow]

Wright-Fisher model:
Sewall Wright, Ronald A. Fisher

W-F population:
• haploid or diploid hermaphrodite
• finite size, no fluctuations of N
• random mating
• complete isolation (no gene flow)
• discrete generations
• no age structure
• no selection
• variance of gamete sampling → Poisson distribution of offspring number

Lineage sorting in W-F model:
[Figure: gene copies traced through successive generations; axis: time]
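The W-F assumptions listed above can be turned into a tiny simulation of lineage sorting. This is a minimal sketch, assuming a haploid population of constant size N in which every copy of the next generation picks its parent at random; the function name, parameters and seed are illustrative and not taken from the lecture.

```python
import random


def wright_fisher_lineages(N: int, generations: int, seed: int = 0) -> list[int]:
    """Count surviving founder lineages in a haploid W-F population.

    Each copy in the next generation picks its parent uniformly at random
    (with replacement) from the current generation; N stays constant.
    Returns the number of distinct founder lineages after each generation.
    """
    rng = random.Random(seed)
    lineage = list(range(N))  # generation 0: every copy founds its own lineage
    surviving = []
    for _ in range(generations):
        # discrete generations: the whole population is replaced at once
        lineage = [lineage[rng.randrange(N)] for _ in range(N)]
        surviving.append(len(set(lineage)))
    return surviving


if __name__ == "__main__":
    counts = wright_fisher_lineages(N=20, generations=200)
    print(counts[:10], "...", counts[-1])
```

The count can only stay the same or drop from one generation to the next, because a lost lineage never reappears. That steady loss is lineage sorting.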
Coalescent:
John F.C. Kingman
[Figure: genealogy of sampled copies traced back from the current generation; labels: time, coalescence, MRCA]

MRCA = most recent common ancestor
• we don't know how many copies were in the generation of the MRCA
• we don't know what was before the MRCA

[Figure: N = 20 copies in population, n = 5 copies in sample, MRCA]
usually n << N

Cockroaches in a box:
• probability of encounter of 2 cockroaches is n(n – 1)/4N, where n = number of cockroaches in the box and N = number of "places" in the box
• after coalescence, the number of cockroaches (copies) is reduced by 1 ...
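The box analogy above can be put into numbers. The sketch below assumes a diploid population of size N (2N gene copies) and n << N, so the per-generation probability that some pair among n lineages coalesces is approximately n(n – 1)/4N. The expected waiting time is the reciprocal of that probability. Function names are illustrative.

```python
from fractions import Fraction


def coalescence_probability(n: int, N: int) -> Fraction:
    """Approximate per-generation probability that any 2 of n lineages
    coalesce in a diploid population of size N: n(n - 1) / 4N."""
    return Fraction(n * (n - 1), 4 * N)


def expected_waiting_times(n: int, N: int) -> list[Fraction]:
    """Expected number of generations spent with k lineages,
    for k = n, n - 1, ..., 2 (reciprocal of the coalescence probability)."""
    return [1 / coalescence_probability(k, N) for k in range(n, 1, -1)]


if __name__ == "__main__":
    N = 1000
    times = expected_waiting_times(5, N)
    for k, t in zip(range(5, 1, -1), times):
        print(f"{k} lineages: about {float(t):.0f} generations to the next coalescence")
    print(f"expected time to MRCA: about {float(sum(times)):.0f} generations")
```

Each interval is longer than the one before it, and the final step from 2 lineages to 1 takes 2N generations on average.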
• with decreasing number of cockroaches (n), the time to the next contact (coalescence) increases
• ... to finish with just 1 copy

• with decreasing number of remaining copies, the process of coalescence gets slower (expected time to the MRCA: ~4N generations for large n, ~2N generations for 2 copies)
• coalescence of the last k copies takes a fraction (1 – 1/k)/(1 – 1/n) of the total time ⇒ the first 90% of copies coalesce during 9% of the total time; for the remaining 91% of the time we wait for coalescence of the last 10% of copies!
• if there are 100 lineages, the probability that the 101st lineage adds a deeper root is only 0.02% ⇒ including additional gene copies is unlikely to result in a deeper (older) MRCA

Kingman's coalescent:
[Figure axes: n, f(n)]
• distribution of time between coalescences is approximately exponential
• with decreasing number of copies (n), time to next coalescence increases*)
*) see number of cockroaches in box

[Figure: simulated coalescent trees]
• adding other sequences is unlikely to result in deeper coalescence
• with decreasing number of free copies the process slows down

50 gene copies, 10 randomly chosen: in this case, 10 copies are sufficient for finding the deepest root of the coalescent tree

If we are interested in "old" coalescences, we don't need large samples
e.g. only 2 copies give, on average, 50% of the coalescent time for the whole population!

By contrast, if we are interested in the time to the first coalescence (from n to n – 1 copies), the estimate 4Ne/[n(n – 1)] is sensitive to n
e.g. the mean times of the first and the last coalescence for 10 genes are 0.0444Ne and 3.60Ne; by increasing n to 100 genes, they will be 0.0004Ne and 3.96Ne

Therefore, for estimates of old evolutionary events, small samples are sufficient; for estimates of recent events, large samples are necessary
[Figure annotations: by increasing n 10×, the time to the first coalescence shortens about 100× ... for the last coalescence there is almost no difference]

Coalescent is affected by various factors, e.g.:
• mutation
• recombination
• selection
• changes of population size
⇒ we can use coalescent theory for estimating these parameters

Coalescent is affected by various factors, e.g.: by migration
• weak migration leads to most coalescences within local populations ...
• ... and to increasing time to the MRCA and its variance

Coalescent is affected by various factors, e.g.: by recombination
[Figure: coalescent with recombination]

Effect of selection on shape of coalescent tree
[Figure panels: neutral; recent selective sweep; balancing selection]
• positive selection results in shorter coalescence
• balancing selection results in longer coalescence

Effect of changes in population size on shape of coalescent tree
• growing population: coalescent rate decreases
• declining population: coalescent rate increases
[Figure, n = 10: Ne = 100 → 360 gen.; Ne = 25 → 90 gen.; Ne = 10 → 36 gen.]

Gene vs. species trees once more:
• long intervals between speciation events → gene and species trees are identical
• short intervals between speciation events → gene and species trees can differ (hemiplasy)
• since we assess divergence among sequences and not between species, our divergence-time estimates are necessarily overestimates
• discrepancies between gene and species trees can be minimized by using markers with low Ne, e.g. mtDNA or Y chromosome

PHYLOGEOGRAPHY

John C. Avise
• studies principles and processes affecting the geographic distribution of genealogical lineages
• in a way, it combines microevolutionary processes (population genetics) with macroevolution (phylogenesis)
• mostly intraspecific studies or studies of related species
Phylogeography: The History and Formation of Species

• Minimum Spanning Tree (MST)
• Minimum Spanning Network (MSN)
• Median-joining network etc.
[Figure examples: Mus macedonicus (mouse), Mustela erminea]

Recent expansion:
• rapid expansion of a single haplotype
• accumulation of a low number of mutations
• star structure

Changes of population size
• Tajima's test (Tajima's D)
• mismatch distribution
• coalescent, ML or BA, MCMC
• Bayesian Skyline Plot

1. Tajima's test
• based on comparison of the number of segregating sites and nucleotide diversity
• primarily it is a test of selective neutrality, but it can also indicate population expansion or bottleneck

Let's revisit the neutral theory: equilibrium heterozygosity θ = 4Neμ
if evolution is neutral, θ can be estimated in various ways, e.g. as the mean number of pairwise differences π (or θπ)*, or as θW**:

$$\theta_W = \frac{S}{\sum_{i=1}^{n-1} 1/i}$$

where S = number of segregating sites and n = number of sequences
*) nucleotide diversity
**) Watterson's theta

If the neutral theory (NT) and the infinite-sites model hold: θπ = θW

Fumio Tajima (1989):

$$D = \frac{\theta_\pi - \theta_W}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_W)}}$$

E.g.:
    * *        *        *
1  ACCCG AATTC CAATC CGGTT
2  AACTG AATTC GAATC CGGTT
3  AACTG AATTC CAATC CGGTT
4  ACCTG AATTC TAATC CGGAT

pairwise comparisons:
1-2: 3 differences
1-3: 2 differences
1-4: 3 differences
2-3: 1 difference
2-4: 3 differences
3-4: 3 differences

av. π = (3+2+3+1+3+3)/6 = 2.5
S = 4 segregating sites
θW = 4/(1/1 + 1/2 + 1/3) = 4/1.833 = 2.182
θπ - θW = 2.5 – 2.182 = 0.318

• very negative values indicate population expansion: prevalence of "young" polymorphisms, when new haplotypes were arising but nucleotide diversity was still low
• programs Arlequin, DnaSP etc.
• likewise Fu's test etc.
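The four-sequence example above can be checked with a few lines of code. This is a minimal sketch that computes only the numerator of Tajima's D (θπ minus θW), not the variance term in the denominator. Function names are illustrative, and the sequences are the ones from the example with the spaces removed.

```python
from itertools import combinations

SEQS = [
    "ACCCGAATTCCAATCCGGTT",
    "AACTGAATTCGAATCCGGTT",
    "AACTGAATTCCAATCCGGTT",
    "ACCTGAATTCTAATCCGGAT",
]


def theta_pi(seqs: list[str]) -> float:
    """Mean number of pairwise differences (nucleotide diversity per sequence)."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in pairs]
    return sum(diffs) / len(pairs)


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def theta_w(seqs: list[str]) -> float:
    """Watterson's theta: S divided by the harmonic sum 1/1 + ... + 1/(n - 1)."""
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


if __name__ == "__main__":
    print(f"theta_pi = {theta_pi(SEQS):.3f}")
    print(f"S = {segregating_sites(SEQS)}")
    print(f"theta_W = {theta_w(SEQS):.3f}")
    print(f"theta_pi - theta_W = {theta_pi(SEQS) - theta_w(SEQS):.3f}")
```

With the unrounded harmonic sum 11/6, θW comes out at about 2.18, so the difference is slightly positive. Programs such as Arlequin or DnaSP additionally divide by the standard deviation to obtain D and its significance.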
2. Mismatch distribution
• pairwise comparison of all sequences → histogram
[Figure panels (x-axis: divergence, %): sequences very similar; sequences very divergent; mixture of similar and divergent sequences]
[Figure: distribution of pairwise differences in a growing vs. a stable population]
• test of agreement between the real distribution and the prediction:
  • Harpending's raggedness index (Harpending 1994)
  • sum of squared deviations
• time of expansion/bottleneck: t = τ/2u, where τ is the expansion-time parameter estimated from the mismatch distribution and u is the mutation rate for the whole sequence
• we can also estimate population size before and after expansion

3. ML and Bayesian inference, MCMC
• comparison of the stable population model and the model of exponential growth/decline using LRT with 1 degree of freedom
• program Fluctuate: growth parameter g
• ML and BA approach
[Figure: lineages-through-time (LTT) plots: stable population vs. exponential growth]

4. Bayesian Skyline Plot (BSP)
• distribution of genealogical lineages in time: BSP is based on this approach
• programs BEAST/Tracer
• changes in population size between nodes
• classical BSP, generalized BSP

Mouse colonization of Europe
[Figure labels: domesticus - Europe; musculus - Europe; origin outside Europe, expansion to Europe]

[Figure: Karmin et al., Genome Research 2015]

Possible results of phylogeographical studies (Avise 2000)

• Category I:
  • distinct allopatric lineages
  • barriers to gene flow or low dispersal
  • differences because of lineage sorting, or accumulation of new mutations
  • e.g.: Apteryx australis (kiwi)

• Category II:
  • sympatric, but deep lineages ⇒ secondary contact of previously separated populations

• Category III:
  • allopatric, only slightly separated lineages
  • closely related, but geographically localized haplotypes
  • recently, populations in contact
  • but: gene flow sufficiently low → drift and lineage sorting → divergence of populations
  • often: Category I on coarse scale, Category III on fine scale
  • e.g.: Geomys pinetis

• Category IV:
  • sympatric, only slightly separated lineages
  • strong gene flow
  • absence of geographic barriers or
  • recent expansion
  • e.g.: Anguilla rostrata (random dispersal of larvae, panmictic aggregation during spawning)

• Category V:
  • combination of III and IV
  • low divergence of lineages
  • some lineages widely distributed (likely ancestral), others (new) geographically limited
  • we should use private haplotypes as characters

Genealogical concordance
[Figure: fishes in SE USA, gene trees and map]

Genealogical concordance (congruence on different levels)
• Various parts of gene sequence
• More sequences (genes) of the same species
• More species in the same region
• Support of biogeographical regions (more species, more areas)

Genetic consequences of glaciations
[Figure: Chorthippus parallelus]
• Refugia (Iberian, Apennine, Balkan peninsulas)
• In refugia, small populations during relatively long time
• Lineage sorting (+ mutations)
• Subsequent expansion → intraspecific hybrid zones
• But in several species, there were also northern refugia!
Horáček, Vesmír 94 (2015)

[Figure: dispersal vs. vicariance; area cladograms of taxa a, b, c (a1, a2, b1, b2, c1, c2) in areas A, B, C]

Relationship between genetic population structure, sex-specific dispersal and gene flow regimes (Avise 2000)
[Figure axis: female dispersal and gene flow, low to high]
geographic structure in:
• mtDNA: YES; autosomes: yes; chr. Y: yes
• mtDNA: NO; autosomes: yes; chr. Y: ***
• mtDNA (in females): YES; autosomes: no; chr. Y: no
• mtDNA: NO; autosomes: no; chr. Y: no

markers:
• mtDNA sequences
• Y chr. sequences
• microsatellites
• SNP

Why is mtDNA advantageous?
[Figure label: control region]
• Small (15-20 kb), circular molecule ?
• Without introns ?
• Minimum of non-coding regions ?
• Uniparental (maternal) inheritance ?
• Non-recombining ?
• Only one type in many copies in the cell ?
• Neutrality (same fitness of different variants) ?
... and why the question marks?

Problems for population genetics:
• Neutrality
• Interspecific transmission
• Nuclear pseudogenes
• Biparental inheritance
• Recombination

Neutrality?
influence on fitness (experimental evidence):
• mouse (Mus)
• fruit fly (Drosophila)
• human
[Figure: schematic representation of the mitochondrial OXPHOS system]

Interspecific introgression:
• hares in Spain: presence of Lepus timidus mtDNA in L. granatensis, L. castroviejoi and L. europaeus
• however, L. timidus disappeared from the region at the end of the last glacial; multiple transmission of various mtDNA lineages = mtDNA capture

Nuclear Mitochondrial DNA = NUMT:
• copies of mtDNA segments integrated into nuclear DNA
• loss of function
• "molecular fossils"
• similarity with the original sequence → risk of amplification instead of mtDNA ⇒ problem!!
• occurrence differs among groups and among species within the groups
• e.g.: numt > 12.5 kb in 7 felid species
• humans: 27 numts after the split from the chimpanzee lineage

What to do?
• ultracentrifugation (usually fresh samples needed, or at least deep-frozen)
• tissues with a large number of mitochondria (e.g. muscles)
• long-range PCR
• RT-PCR
• electronic PCR (in species with known genomes)

Recombination of mtDNA:
necessary conditions:
• biparental inheritance
• fusion of mitochondria
• existence of protein machinery for recombination: also in humans

biparental inheritance: despite myths, the father's mitochondria are usually transmitted to the zygote, where they are labelled and subsequently eliminated (in mammals, mitochondria are labelled by the father's nuclear genes)
→ in some species paternal leakage: Mus, Drosophila, Parus, Homo

Recombination of mtDNA: biparental inheritance:

Gyllensten et al., 1991: Paternal inheritance of mitochondrial DNA in mice. Nature 352: 255–257.
• F1 hybrids Mus spretus × C57BL
• frequency of paternal mtDNA relative to maternal ≈ 10^-4

Shitara et al., 1998: Genetics 148: 851–857.
• F1 hybrids Mus spretus × C57BL
• leakage of paternal mtDNA not in all tissues
• only in F1, not in subsequent generations (in backcrosses) → species-specific exclusion