15/05/2017

III. POPULATION HISTORY MODELLING
MOLECULAR ECOLOGY, 6 April 2017

[Diagram labels: SPECIES, POPULATIONS, SUBPOPULATIONS (DEMES)]

We are interested in the genetic structure of a population (or populations) and HOW IT HAS BEEN CREATED

[Figure labels: SOURCE AREA, COLONIZED AREA; historical data available; genetic data available]

Population history (& genetic data)
• Past evolutionary and demographic processes have left traces in genetic variation; by analysing them we attempt to reconstruct the evolutionary history of populations
• Studying population history = modelling
  – Selection of the most appropriate model (evolutionary scenario)
  – Estimation of parameters (e.g. time of events, number of founders, duration of bottlenecks, population size, mutation rate)
• Description of recent invasions (invasion genetics)
• Description of older history (phylogeography)

Inferring population history – ABC modelling
• We have observed data (e.g. microsatellite genotypes)
• We know the genetic variation and structure
• We would like to know which demographic processes created the observed data, and how and when = population evolutionary history
• Why is the ABC approach useful in modelling population history?
It can handle much more complex models with many parameters and large, complex datasets (many samples, populations, genetic loci, sequences), and hence much more realistic models
• Model choice and parameter estimation
• The exact LIKELIHOOD function is intractable in complex situations; it can be bypassed (approximated) by a SIMILARITY MEASURE between many simulated datasets (generated under various models) and the single real (observed) dataset
• Data SIMULATION under various models
• COMPARISON of simulated and observed data – model choice
• According to the most supported model we can ESTIMATE the VALUES of its parameters – parameter estimation

Approximate Bayesian Computation

Reduction of dimensionality: SIMULATED DATASETS versus OBSERVED DATASET, compared through SUMMARY STATISTICS
[Figure: PCA plot kolonizace3_PCA_1_2_5700 showing simulated datasets under Scenario 1, Scenario 2 and Scenario 3 and the observed dataset; axes P.C.1 (60.0%) and P.C.2 (18.6%)]
Comparison of simulated and real datasets to infer the probability of the various models (evolutionary scenarios of population history)

NEW APPROACH: Approximate Bayesian Computation (ABC)
Beaumont et al. 2002, Genetics
- estimation of parameters
- useful for model choice among various scenarios applied to the same data
- the likelihood criterion is replaced by a similarity criterion between simulated and observed datasets
- similarity is measured by a distance between summary statistics computed on both datasets

[Figure: bar chart of the "Approximate Bayesian Computation" topic in Web of Science (in genetics, evolutionary biology and ecology); yearly counts for 2000 to 2014, in order: 0, 0, 1, 1, 2, 5, 5, 5, 9, 24, 50, 52, 84, 79, 115]

The ABC approach has been used successfully to describe recent invasion scenarios
• Estoup & Clegg 2003, Molecular Ecology: Zosterops lateralis, Pacific islands
• Estoup et al. 2004, Evolution: Bufo marinus, Australia
• Pascual et al.
2007, Molecular Ecology: Drosophila subobscura, invasion across the Atlantic Ocean
• Lombaert et al. 2010, PLoS ONE: Harmonia axyridis, invasion across the Atlantic and subsequently worldwide

Software
Sunnåker et al. 2013, PLOS Computational Biology

SOFTWARE
Without GUI:
  – SimCoal: simulator + ABC regression (Anderson et al. 2005)
  – msBayes: simulator + ABC regression (Hickerson et al. 2007)
With GUI:
  – ONeSAMP: ABC rejection, only a single Wright-Fisher population (Tallmon et al. 2004)
  – popABC: ABC rejection (Lopes et al. 2009)
abc (Csilléry et al. 2011, Methods in Ecology and Evolution)

No simple software solution => inaccessible to most biologists
BUT NOW
• Do It Yourself: the DIYABC software makes it possible to infer population history using the ABC approach (Cornuet et al. 2008, 2010, 2014)

DIYABC
Genetic data
• Sequences
• SNPs
• Genotypes

ABC works in 3 steps
1. SIMULATION STEP: a very large reference table is produced and recorded
  – prior parameter distributions
  – scenario
  – mutation model
  – summary statistics (e.g. number of alleles, expected heterozygosity, FST)
  – the most time-consuming step
  – based on the genealogical tree of sampled genes and coalescent theory
Cornuet et al.
2008, Bioinformatics

[Figure labels: SOURCE REGION, COLONIZED REGION; historic background; genetic data (microsatellites, SNPs)]

Prior distribution of parameters describing the scenario:
Ty, Tz: divergence times (establishment of the Y and Z populations)
Uniform distribution (min 100, max 500 generations)

Prior distribution of parameters describing the model
[Figure: prior ranges shown for Ty (100 to 500), X (10 to 10000) and µ (0.0001 to 0.001)]

Evolutionary scenarios = models

SIMULATED DATASETS
Genetic data → summary statistics

                                                      mean number of alleles    mean heterozygosity
scenario      X      Y      Z    Ty    Tz        µ    (three values per row)    (three values per row)
       2   3797   7013   9839   484   486  0.00083      8.4    13.2    11.3     0.7841  0.8669  0.8589
       3   3648   1355   1206   453   209  0.00072      7.9     6.1     4       0.7894  0.6371  0.5465
       1   6802   7945   3929   176   346  0.0003       8.8    11.4     7.1     0.7877  0.8367  0.7824
       1   4715   9090   5767   290   301  0.00048      7.5    12.6     9.1     0.7842  0.8211  0.7919
       3    134   2714   3804   406   342  0.00029      1.4     4.8     4.7     0.0651  0.5182  0.5906
       1   9331    902   4882   305   197  0.00096     13.6     6.5    13       0.863   0.5471  0.8294
       3   1912   1785   6813   385   124  0.00035      4.3     5.5     7.1     0.5924  0.6414  0.7134

DIYABC works in 3 steps
1. SIMULATION STEP: a very large reference table is produced and recorded
2. REJECTION STEP: only the simulated datasets closest to the observed dataset are retained, based on Euclidean distances in the multidimensional space of summary statistics

[Figure: PCA plot kolonizace3_PCA_1_2_5700 showing simulated datasets under Scenario 1, Scenario 2 and Scenario 3 and the observed dataset; axes P.C.1 (60.0%) and P.C.2 (18.6%)]
Comparison of our observed dataset with the simulated ones, and inference of the posterior probabilities of the scenarios

[Figure labels: SOURCE REGION, COLONIZED REGION; one scenario marked THE WINNER]
Now: posterior distributions will be estimated according to the winning scenario

1. SIMULATION STEP: a very large reference table is produced and recorded
2.
REJECTION STEP: only the simulated datasets closest to the observed dataset are retained
3. ESTIMATION STEP: posterior distributions of parameters are estimated through a local linear regression procedure
DIYABC works in 3 steps

[Figure: PCA plot kolonizace3_PCA_1_2_5700 showing simulated datasets under Scenario 1 and the observed dataset; axes P.C.1 (60.0%) and P.C.2 (18.6%)]

Posterior distributions of parameters are estimated according to the most supported scenario
[Figure: posterior distributions of Ty (axis 100 to 500), X (axis 10 to 10000) and µ (axis 0.0001 to 0.001)]
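
The three steps can be illustrated with a minimal Python sketch. This is not DIYABC and not a coalescent simulation: the simulator, the two toy scenarios, the single parameter theta, the two summary statistics (mean and standard deviation) and all sizes are assumptions chosen only to keep the example short. The scenario probabilities are computed as simple shares among the retained simulations, which is a crude stand-in for what dedicated software does.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SIM, N_POINTS, N_KEEP = 20_000, 50, 200   # illustrative sizes (assumptions)

def simulate(scenario, theta):
    """Toy simulator (NOT a coalescent model): one dataset per row,
    reduced to two summary statistics (mean, standard deviation)."""
    shape = (theta.size, N_POINTS)
    normal = rng.normal(theta[:, None], 1.0, shape)   # toy scenario 1
    expo = rng.exponential(theta[:, None], shape)     # toy scenario 2
    x = np.where(scenario[:, None] == 1, normal, expo)
    return np.column_stack([x.mean(axis=1), x.std(axis=1)])

# 1. SIMULATION STEP: reference table (scenario, parameter, summary statistics)
scenario = rng.integers(1, 3, N_SIM)        # uniform prior over scenarios 1 and 2
theta = rng.uniform(1.0, 10.0, N_SIM)       # uniform prior on the parameter
stats = simulate(scenario, theta)

# "Observed" dataset: generated here under scenario 1 with theta = 4
TRUE_THETA = 4.0
obs = simulate(np.array([1]), np.array([TRUE_THETA]))[0]

# 2. REJECTION STEP: Euclidean distance in standardized summary-statistic space
scale = stats.std(axis=0)
diff = (stats - obs) / scale
dist = np.sqrt((diff ** 2).sum(axis=1))
closest = np.argsort(dist)[:N_KEEP]
# Share of each scenario among the retained simulations (crude probability)
p_scenario = {s: float(np.mean(scenario[closest] == s)) for s in (1, 2)}
winner = max(p_scenario, key=p_scenario.get)

# 3. ESTIMATION STEP: local linear regression within the winning scenario
in_winner = np.flatnonzero(scenario == winner)
keep = in_winner[np.argsort(dist[in_winner])[:N_KEEP]]
d, th, dk = diff[keep], theta[keep], dist[keep]
w = 1.0 - (dk / dk.max()) ** 2              # Epanechnikov kernel weights
X = np.column_stack([np.ones(len(keep)), d])
sw = np.sqrt(w)                             # weighted least squares via row scaling
coef, *_ = np.linalg.lstsq(X * sw[:, None], th * sw, rcond=None)
theta_adj = th - d @ coef[1:]               # shift retained draws to the observed statistics
post_mean = float(np.average(theta_adj, weights=w))

print("scenario probabilities:", p_scenario)
print("posterior mean of theta:", round(post_mean, 2))
```

The regression adjustment follows the idea of Beaumont et al. 2002: retained parameter values are corrected for the remaining difference between their simulated summary statistics and the observed ones, so the adjusted values approximate draws from the posterior distribution.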