03/05/2016

Population history (& genetic data)

[Map labels: SOURCE AREA; COLONIZED AREA; historical data available; genetic data available]

Past evolutionary and demographic processes have left traces in genetic variation. By analyzing these traces, we attempt to reconstruct the evolutionary history of populations.

Studying population history = modelling
- Selection of the most appropriate model (evolutionary scenario)
- Estimation of parameters (e.g. time of events, number of founders, duration of bottlenecks, population size, mutation rate)

Description of recent invasions (invasion genetics)
Description of older history (phylogeography)

Inferring population history - ABC modelling

We have observed data (e.g. microsatellite genotypes)
We know the genetic variation and structure
We would like to know which demographic processes created the observed data, and how and when = population evolutionary history

Why is the ABC approach useful in modelling population history?
It can handle much more complex models with many parameters and large, complex datasets (many samples, populations, genetic loci, sequences), and hence models that are much more realistic.

[Figure labels: Population; African population; Non-African population. Source: Nature Reviews Genetics]

Approximate Bayesian Computation
■ model choice and parameter estimation
■ the exact LIKELIHOOD function is intractable in complex situations and can be bypassed (approximated) by a SIMILARITY MEASURE between many simulated datasets (generated under various models) and a single real (observed) dataset
■ data SIMULATION under various models
■ COMPARISON of simulated and observed data - model choice
■ According to the most supported model, we can ESTIMATE VALUES of its parameters - parameter estimation

Reducing dimensionality
SIMULATED DATASETS VERSUS OBSERVED DATASET, compared through their SUMMARY STATISTICS

Comparison of simulated and real datasets to infer the probability of various models (evolutionary scenarios of population history)
[Figure: PCA plot, kolonizace3_PCA_l_2_5700]
[Figure axis: P.C.1 (60.0%)]

Approximate Bayesian Computation in Population Genetics
Mark A. Beaumont, Wenyang Zhang and David J. Balding
School of Animal and Microbial Sciences, The University of Reading, Whiteknights, Reading RG6 6AJ, United Kingdom; [...], United Kingdom; and [...], St. Mary's Campus, Norfolk Place, London W2 [...], United Kingdom
Manuscript received March 22, 2002
Accepted for publication [...]

ABSTRACT
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.
Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based [...]

RESEARCH ARTICLE, BMC Bioinformatics, Open Access ([...]/1471-2105/11/401)
Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0)
Jean-Marie Cornuet, Virginie Ravigné, Arnaud Estoup

APPLICATIONS NOTE
DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data
Jean-Marie Cornuet, Pierre Pudlo, Julien Veyssier, Alexandre Dehne-Garcia, Mathieu Gautier, Raphaël Leblois, Jean-Michel Marin and Arnaud Estoup
Inra, UMR1062 Cbgp, Montpellier, France; Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France; Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France; CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France

No simple software solution => inaccessible to most biologists
BUT NOW
Do It Yourself ABC (DIYABC): software for inferring population history using the ABC approach (Cornuet et al. 2008, 2010, 2014)

[Screenshot: DIYABC 2.0.4 (19-03-2014) start window. "A computer software to make inference on population evolutionary history using genetic data (microsatellites, DNA sequences and SNPs) obtained from population [...]". http://www.montpellier.inra.fr/CBGP/diyabc/ Options: New Microsat/Sequence project, New SNP project, Open project]

Genetic data: Sequences, SNPs, Genotypes

ABC works in 3 steps
1. SIMULATION STEP: a very large reference table is produced and recorded
- prior parameter distributions
- scenario
- mutation model
- summary statistics (e.g.
number of alleles, expected heterozygosity, FST)
- the most time-consuming step
- based on the genealogical tree of sampled genes and coalescent theory

[Flowchart: Cornuet et al. 2008, Bioinformatics]
SIMULATION STEP (repeated n times, for each scenario)
- Draw parameter values from prior distributions
- Simulate genetic data according to scenario and mutation model
- Compute summary statistics
- Record parameter and summary statistic values in a reference table file
REJECTION STEP
- Compute distances between observed and simulated summary statistics
- Retain simulated data sets closest to observed data
ESTIMATION STEP
- Estimate posterior probability of scenarios by logistic regression
- Estimate posterior distribution of parameters by local linear regression

[Map label: SOURCE REGION]

DIYABC works in 3 steps
1. SIMULATION STEP: a very large reference table is produced and recorded
2. REJECTION STEP: only the simulated data closest to the observed dataset are retained, based on Euclidean distances in the multidimensional space of summary statistics

Comparison of our observed dataset with simulated ones and inferring posterior probabilities of scenarios
[Figure: PCA plot, kolonizace3_PCA_l_2_5700]

DIYABC works in 3 steps
1. SIMULATION STEP: a very large reference table is produced and recorded
2. REJECTION STEP: only the simulated data closest to the observed dataset are retained
3.
ESTIMATION STEP: posterior distributions of parameters are estimated through a local linear regression procedure

Posterior distributions of parameters are estimated according to the most supported scenario
[Figure: PCA plot, kolonizace3_PCA_l_2_5700; horizontal axis P.C.1 (60.0%)]

Black rat (Rattus rattus) invasion in Senegal
Konečný et al. 2013, Molecular Ecology
Photo: Jaroslav Červený
[Figure: Rattus rattus distribution]

Historic background

Genetic data
- 14 microsatellites (9 - 22 alleles, mean: 14.14)
- mean allelic richness: 3.06 (range 1.87 - 4.71)
- mean expected heterozygosity: 0.538 (range 0.323 - 0.762)
- both allelic richness and heterozygosity decreased with longitude

Genetic information + Historic information
Formulation of our models and related parameters = evolutionary scenarios
Model choice
Parameter estimation

ABC analysis in four steps - four questions

Comparison of our observed dataset with simulated ones and inferring posterior probabilities of scenarios
[Figure: PCA plot, kolonizace3_PCA_l_2_5700]
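
The three steps (simulation, rejection, estimation) can be sketched on a toy problem. This is not DIYABC code and not the analysis from the slides. It is a minimal illustration under stated assumptions: a single constant-size population, one parameter (theta = 4*N*mu) with a flat prior, one summary statistic (mean number of segregating sites per locus), and a made-up observed value of 8.5. All function names and default settings are hypothetical. The estimation step uses a weighted local linear regression in the spirit of Beaumont et al. (2002).

```python
import math
import random


def tree_length(n_genes, rng):
    """Total branch length of a standard coalescent tree for n_genes sampled genes."""
    # While k lineages remain, the waiting time to the next coalescence is
    # exponential with rate k(k-1)/2 (time measured in units of 2N generations).
    return sum(k * rng.expovariate(k * (k - 1) / 2) for k in range(n_genes, 1, -1))


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def mean_segregating_sites(theta, n_genes, n_loci, rng):
    """Summary statistic: mean number of segregating sites over independent loci."""
    total = sum(poisson(theta / 2 * tree_length(n_genes, rng), rng)
                for _ in range(n_loci))
    return total / n_loci


def abc_theta(s_obs, n_sims=5000, n_keep=250, n_genes=10, n_loci=10, seed=1):
    rng = random.Random(seed)

    # 1. SIMULATION STEP: build the reference table (parameter, summary statistic).
    table = []
    for _ in range(n_sims):
        theta = rng.uniform(0.1, 10.0)  # assumed flat prior, for illustration only
        table.append((theta, mean_segregating_sites(theta, n_genes, n_loci, rng)))

    # 2. REJECTION STEP: retain the simulations closest to the observed statistic.
    table.sort(key=lambda row: abs(row[1] - s_obs))
    kept = table[:n_keep]
    delta = abs(kept[-1][1] - s_obs) or 1.0  # largest retained distance

    # 3. ESTIMATION STEP: weighted local linear regression of theta on the statistic.
    weights = [1.0 - ((s - s_obs) / delta) ** 2 for _, s in kept]  # Epanechnikov kernel
    w_sum = sum(weights)
    s_bar = sum(w * s for w, (_, s) in zip(weights, kept)) / w_sum
    t_bar = sum(w * t for w, (t, _) in zip(weights, kept)) / w_sum
    s_var = sum(w * (s - s_bar) ** 2 for w, (_, s) in zip(weights, kept))
    cov = sum(w * (s - s_bar) * (t - t_bar) for w, (t, s) in zip(weights, kept))
    slope = cov / s_var if s_var > 0 else 0.0

    # Shift each retained theta to where it would sit if its statistic equaled s_obs.
    adjusted = [t - slope * (s - s_obs) for t, s in kept]
    posterior_mean = sum(w * a for w, a in zip(weights, adjusted)) / w_sum
    return posterior_mean, adjusted


posterior_mean, adjusted = abc_theta(s_obs=8.5)
print(f"approximate posterior mean of theta: {posterior_mean:.2f}")
```

With several summary statistics, the rejection step would use a Euclidean distance over normalized statistics instead of the absolute difference used here, and comparing scenarios would require one reference table block per scenario.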