>gi|5835135|ref|NC_001644.1| Pan paniscus mitochondrion, complete genome
GTTTATGTAGCTTACCCCCTTAAAGCAATACACTGAAAATGTTTCGACGGGTTTATATCACCCCATAAAC
AAACAGGTTTGGTCCTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCGTCCCGTGAG
TCACCCTCTAAATCACCATGATCAAAAGGAACAAGTATCAAGCACACAGCAATGCAGCTCAAGACGCTTA
GCCTAGCCACACCCCCACGGGAGACAGCAGTGATAAACCTTTAGCAATAAACGAAAGTTTAACTAAGCCA
TACTAACCTCAGGGTTGGTCAATTTCGTGCTAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAA
CCGGCGTAAAGAGTGTTTTAGATCACCCCCCCCCCAATAAAGCTAAAATTCACCTGAGTTGTAAAAAACT
CCAGCTGATACAAAATAAACTACGAAAGTGGCTTTAACACATCTGAACACACAATAGCTAAGACCCAAAC
TGGGATTAGATACCCCACTATGCTTAGCCCTAAACTTCAACAGTTAAATTAACAAAACTGCTCGCCAGAA
CACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCATATCCCTCTAGAGGAGCCTGTTCT
GTAATCGATAAACCCCGATCAACCTCACCGCCTCTTGCTCAGCCTATATACCGCCATCTTCAGCAAACCC
TGATGAAGGTTACAAAGTAAGCGCAAGTACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCTATGAGGCG
GCAAGAAATGGGCTACATTTTCTACCCCAGAAAATTACGATAACCCTTATGAAACCTAAGGGTCGAAGGT
GGATTTAGCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCCCGT
CACCCTCCTCAAGTATACTTCAAAGGATATTTAACTTAAACCCCTACGCATTTATATAGAGGAGATAAGT
CGTAACATGGTAAGTGTACTGGAAAGTGCACTTGGACGAACCAGAGTGTAGCTTAACATAAAGCACCCAA
CTTACACTTAGGAGATTTCAACTCAACTTGACCACTCTGAGCCAAACCTAGCCCCAAACCCCCTCCACCC
TACTACCAAACAACCTTAACCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGTAAATCGGCG
CAATAGATATAGTACCGCAAGGGAAAGATGAAAAATTACACCCAAGCATAATACAGCAAGGACTAACCCC
TGTACCTTTTGCATAATGAATTAACTAGAAATAACTTTGCAAAGAGAACTAAAGCCAAGATCCCCGAAAC
CAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATA
GGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTA
AATTTACCTACAGAACCCTCTAAATCCCCCTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTA
GACACTAGGAAAAAACCTTATGAAGAGAGTAAAAAATTTAATGCCCATAGTAGGCCTAAAAGCAGCCACC
AATTAAGAAAGCGTTCAAGCTCAACACCCACAACCTCAAAAAATCCCAAGCATACAAGCGAACTCCTTAC
GCTCAATTGGACCAATCTATTACCCCATAGAAGAGCTAATGTTAGTATAAGTAACATGAAAACATTCTCC
TCCGCATAAGCCTACTACAGACCAAAATATTAAACTGACAATTAACAGCCCAATATCTACAATCAACCAA

MODULARIZACE VÝUKY EVOLUČNÍ A EKOLOGICKÉ BIOLOGIE (Modularization of Teaching in Evolutionary and Ecological Biology), CZ.1.07/2.2.00/15.0204
MAXIMUM LIKELIHOOD, ML (Czech: maximální věrohodnost)

15 coin tosses → score TTHHHTHTTTHTHHT, i.e. 7× head (H), 8× tail (T)

Probability of head = p, probability of tail = (1 − p)
Because the tosses are independent, the probability of the final score is
$(1-p)(1-p)\,p\,p\,p\,(1-p)\,p\,(1-p)(1-p)(1-p)\,p\,(1-p)\,p\,p\,(1-p) = p^{7}(1-p)^{8}$
This function of p reaches its maximum at p = 0.4667 ≈ 7/15.

Likelihood = conditional probability of the data (final score) given the hypothesis:
$L = \Pr(D \mid H) = \Pr(7 \times \text{head},\ 8 \times \text{tail} \mid \text{hypothesis})$

Hypothesis? E.g. H = the coin is not biased, i.e. p = 1/2 ⇒ $L = 3.0518 \times 10^{-5}$
If the coin is biased so that we get tail in 2/3 of cases, p = 1/3 ⇒ $L = 1.7841 \times 10^{-5}$
⇒ the observed result of the tosses is 1.7× more probable with the unbiased coin

[Figure: likelihood curve as a function of p; max L = maximum value of the likelihood function; the corresponding p = maximum likelihood estimate (MLE) of the hypothesis parameter]

Maximum likelihood in phylogenetic analysis

data (D): sequence matrix, sites 1 … j … N
1 TCAAAAATGGCTTTATTCGCTTAATGCCGTTAACCCTTGCGGGGGCCATG
2 TCCGTGATGGATTTATTTCCGCAATGCCTGTCATCTTATTCTCAAGTATC
3 TTCGTGATGGATTTATTGCAGGTATGCCAGTCATCCTTTTCTCATCTATC
4 TTCGTGACGGGTTTATCTCGGCAATGCCGGTCATCCTATTTTCGAGTATT

hypothesis (H): tree (topology τ + branch lengths ν) + evolutionary model θ
$L = P(D \mid H)$: D = sequence matrix (data), H = τ (topology) + ν (branch lengths) + θ (model)

[Figure: tree of the four sequences with internal nodes x and y; ν1 … ν5 = branch lengths. Branches ν1 and ν2 lead from x to sequences 1 and 2, ν3 connects x and y, ν4 and ν5 lead from y to sequences 3 and 4]

At site j the four sequences carry C, C, A and G.
x: 4 nucleotides, y: 4 nucleotides ⇒ 4 × 4 = 16 possible scenarios
Probability of one scenario (one particular choice of x and y), with each transition probability evaluated over the branch length given as subscript:
$P(\text{scenario}) = P(y)\,P_{\nu_3}(y \to x)\,P_{\nu_1}(x \to C)\,P_{\nu_2}(x \to C)\,P_{\nu_4}(y \to A)\,P_{\nu_5}(y \to G)$
Likelihood of site j:
L(j) = P(scenario 1) + …
+ P(scenario 16)

All sites (ν_i = branch lengths):
$L = L(1) \times L(2) \times \dots \times L(j) \times \dots \times L(N) = \prod_{j=1}^{N} L(j)$
$\ln L = \ln L(1) + \ln L(2) + \dots + \ln L(N) = \sum_{j=1}^{N} \ln L(j)$

Search for the maximum likelihood of the tree → e.g. Newton (Newton-Raphson) method
[Animation: https://upload.wikimedia.org/wikipedia/commons/e/e0/NewtonIteration_Ani.gif]

Maximum likelihood tree search: heuristic search
stepwise addition ... e.g. PHYLIP
star decomposition ... e.g. MOLPHY; neighbor-joining tree
branch swapping

Likelihood (ML) and parsimony (MP)
[Figure: site pattern A, A, A, G; two panels plotting parsimony against the number of changes; simulation]

Likelihood and consistency
[Figure: consistency table, labelled “wrong”]
[Figure: consistency table, labelled “true”]
“Long-branch repulsion”: Farris (anti-Felsenstein, inverse Felsenstein) zone
[Figure: consistency table for taxa A, C, B, D]

BAYESIAN ANALYSIS (Czech: Bayesovská analýza)

ML: probability of the data given the hypothesis, P(D | H)
Bayesian approach: conditional probability of the hypothesis given the data, P(H | D)

Example: a set of 100 dice, from which we choose one.
We know that of the 100 dice, 80 are “fair” and 20 are biased towards 6.
We throw the chosen die twice: 1st throw = [die image], 2nd throw = [die image].
What is the probability that our die is biased?
Probability of individual results: all the same for an unbiased die, varied for a biased die:

unbiased   biased
1/6        1/21
1/6        3/21
1/6        3/21
1/6        4/21
1/6        4/21
1/6        6/21

[Portrait: Thomas Bayes]

The posterior probability that the die is biased is given by the Bayes equation:
$P(H \mid D) = \dfrac{P(D \mid H)\,P(H)}{\sum_i P(D \mid H_i)\,P(H_i)}$
P(H | D) is called the posterior probability (Czech: aposteriorní pravděpodobnost).
The posterior probability is a function of the likelihood L = P(D | H) and the prior probability P(H) (Czech: apriorní pravděpodobnost), which reflects our a priori expectation or knowledge.
The denominator is the sum of the numerators across all alternative hypotheses.

For our example of 2 dice throws (D = the two observed throws):
prior probability (biased) = 0.2 (20 of the 100 dice in the set are biased)
Probability of getting D with an unbiased die: P = 1/6 × 1/6 = 1/36
Probability of getting D with a biased die: P = 3/21 × 6/21 = 18/441
$P(\text{biased} \mid D) = \dfrac{P(D \mid \text{biased})\,P(\text{biased})}{P(D \mid \text{biased})\,P(\text{biased}) + P(D \mid \text{fair})\,P(\text{fair})} = \dfrac{18/441 \times 2/10}{18/441 \times 2/10 + 1/36 \times 8/10} = 0.269$

Bayesian method in phylogenetic analysis:
posterior probability = (likelihood × prior probability) / sum across all hypotheses (= marginal likelihood)

Parameters of Bayesian analysis are mostly continuous ⇒ probabilities P → probability density functions
either ML estimates → empirical BA, or all combinations → hierarchical BA
$f(\theta \mid D) = \dfrac{P(D \mid \theta)\,f(\theta)}{\int P(D \mid \theta)\,f(\theta)\,d\theta}$
posterior distribution = (likelihood × prior distribution) / marginal likelihood
θ = set of (continuous) parameters in the model, including the tree, substitution model parameters, clock rates, etc.
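The two-throw dice calculation above can be checked numerically. The following is a minimal Python sketch rather than part of the original slides; the function name `posterior` and the dictionary layout are illustrative assumptions. It applies the Bayes equation with exact fractions to the prior (20 of 100 dice biased) and to the two likelihoods quoted above, 3/21 × 6/21 and 1/6 × 1/6.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses.

    prior and likelihood are dicts keyed by hypothesis name
    (hypothetical layout chosen for this sketch).
    """
    joint = {h: likelihood[h] * prior[h] for h in prior}
    marginal = sum(joint.values())  # sum of numerators over all hypotheses
    return {h: joint[h] / marginal for h in joint}


# Prior: 20 of the 100 dice are biased, 80 are fair.
prior = {"biased": Fraction(20, 100), "fair": Fraction(80, 100)}

# Likelihood of the two observed throws under each hypothesis,
# using the per-throw probabilities quoted in the example.
likelihood = {
    "biased": Fraction(3, 21) * Fraction(6, 21),  # 18/441
    "fair": Fraction(1, 6) * Fraction(1, 6),      # 1/36
}

post = posterior(prior, likelihood)
print(post["biased"], float(post["biased"]))  # 18/67, about 0.269
```

Exact fractions are used only to make the agreement with the hand calculation easy to see; floating-point numbers would serve equally well.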
[Screenshot: “Analysing BEAST output using Tracer”, BEAST Documentation]
outcome = probability distribution
HPD = highest posterior density

Markov process: t−1: A → t0: C → t+1: G …
P the same across the whole phylogeny = homogeneous Markov process

Problem: the calculations are too complex ⇒ impossible to solve analytically, only numerically
Solution: Monte Carlo methods: random sampling, which approximates reality when the sample size is high
Markov chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm: change of parameter x → x′
1. if P(x′) > P(x), accept x′
2. if P(x′) ≤ P(x), calculate R = P(x′)/P(x); since P(x′) ≤ P(x), R must be ≤ 1
3. generate a random number U from the uniform distribution on the interval (0, 1)
4. if R ≥ U, accept x′; if not, retain x

[Figure: Metropolis-Hastings algorithm as the directed movement of a robot across an arena]
[Figure: MCMC trace with the “burn-in” followed by the stationary phase (plateau)]

MrBayes: http://morphbank.ebc.uu.se/mrbayes/

Reversible jump MCMC: allows the number of parameters to change in each MC step
It can be used, e.g., for modelling variation of evolution between sites in sequences, for choosing models, or for building non-homogeneous substitution models (e.g. different base composition along branches)

Metropolis coupled MCMC (MCMCMC, MC3): 1 “cold” chain, 3 “heated” chains
same starting point; due to stochasticity, rapid divergence of the “robots”

Problem with priors
15 coin tosses, score 5 H : 10 T: maximum likelihood estimate = 0.333, prior = 0.5; due to the prior, the posterior probability is shifted to the right
30 coin tosses, score 10 H : 20 T: the difference from ML is smaller

Which priors to choose?
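Before turning to specific prior families, the Metropolis-Hastings steps and the coin-toss prior example above can be tried out together. The sketch below is illustrative and not from the slides: the Beta(5, 5) prior (centred on 0.5), the uniform proposal of width 0.1, the seed and the function names are all assumptions. Because the proposal is symmetric, the acceptance ratio reduces to R = P(x′)/P(x), as in the steps listed above.

```python
import math
import random


def log_posterior(p, heads, tails, a=5.0, b=5.0):
    """Unnormalised log posterior of the head probability p.

    Binomial likelihood times a Beta(a, b) prior; the Beta(5, 5) default
    is an assumption standing in for the slide's "prior = 0.5".
    """
    if not 0.0 < p < 1.0:
        return -math.inf
    log_like = heads * math.log(p) + tails * math.log(1.0 - p)
    log_prior = (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)
    return log_like + log_prior


def metropolis(heads, tails, n_steps=20000, burn_in=2000, step=0.1, seed=1):
    """Sample p with a symmetric random-walk proposal (Metropolis rule)."""
    rng = random.Random(seed)
    p = 0.5
    samples = []
    for i in range(n_steps):
        proposal = p + rng.uniform(-step, step)  # x -> x'
        log_r = log_posterior(proposal, heads, tails) - log_posterior(p, heads, tails)
        # Steps 1-4: accept if P(x') > P(x), otherwise accept when U < R.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            p = proposal
        if i >= burn_in:  # discard the burn-in phase
            samples.append(p)
    return samples


def mean(xs):
    return sum(xs) / len(xs)


mean_15 = mean(metropolis(heads=5, tails=10))   # 15 tosses, MLE = 0.333
mean_30 = mean(metropolis(heads=10, tails=20))  # 30 tosses, MLE = 0.333
print(round(mean_15, 3), round(mean_30, 3))
```

With this assumed prior the exact posterior means are 10/25 = 0.4 for 15 tosses and 15/40 = 0.375 for 30 tosses, so the sampled means should lie to the right of the ML estimate 0.333 and move closer to it as the data grow.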
uniform (min, max)
lognormal (μ, σ)
gamma (α, β)
exponential (λ)
[Figure: prior density plotted against time, with the minimal divergence time marked]

Setting priors
[Figure: hyperparameters and parameters; Lognormal (μ, σ), Gamma (α, β)]

Time-trees and phylodynamics
Phylodynamics = synthesis of mathematical epidemiology and statistical phylogenetics

Resampling methods
without replacement = jackknife
with replacement = bootstrap
[Images: http://saunapraha.cz/wp-content/uploads/IMG_6173-300x224.jpg, http://www.taxjusticeblog.org/lottery.jpg, http://www.kvalitninoze.cz/images/sklady/113110_2.jpg]

Measuring tree reliability
bootstrap [Figures: bootstrap procedure]
Bayesian analysis: posterior probabilities
parametric bootstrap: evolutionary model

Hypothesis testing
Test of molecular clock:
Relative rate test (RRT): AC = BC?
Linearized trees: removing significantly different taxa [Figure: 7.6.tif]

Relaxed molecular clock
enables rates to change along branches
[Figure labels: multiplier, unscaled time, scaled time (expected no. of substitutions/site)]

Tree comparison
Are two trees significantly different?
Tests of paired positions:
winning sites test
Felsenstein’s z test
Templeton’s test
Kishino-Hasegawa test (KHT, RELL)
For more than two trees: Shimodaira-Hasegawa (SH) test

To what degree are two trees different?
Tree distances:
partition metric
quartet metric
path difference metric
methods incorporating branch lengths
Problems with tree distances

Consensus trees
strict consensus [Figure: source trees → strict consensus tree]
majority-rule consensus [Figure: source trees → majority-rule consensus tree]
problem with consensus trees: combined vs. separate analysis, supermatrix vs. supertree
consensus trees in resampling methods
Bayesian analysis:
consensus tree
maximum a posteriori tree = tree with the greatest posterior probability (i.e.
was sampled most often in the MCMC)
maximum clade credibility tree = tree with the maximum product of the posterior clade probabilities (BEAST, TreeAnnotator)
95% highest posterior density (HPD) interval of node heights

Phylogenetic programs
phylogeny inference: http://evolution.gs.washington.edu/phylip/software.html
PAUP*, PHYLIP, MOLPHY, PHYML, MEGA ... ML
MrBayes, BEAST ... BA
managing trees: TreeView, FigTree
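The 95% HPD interval of node heights mentioned above can be estimated from MCMC samples as the shortest interval that contains the requested share of the samples. The sketch below shows that common sample-based approach; the function name `hpd_interval` and the example values are hypothetical, and it is not a description of how any particular program implements it.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples.

    Assumes a unimodal posterior, for which the shortest such interval
    approximates the highest posterior density interval.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the window
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical node-height samples (arbitrary time units), for illustration only.
heights = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
low, high = hpd_interval(heights, mass=0.5)
print(low, high)  # 10 14
```

In practice the samples would be the post-burn-in values of one node height, and `mass` would be left at 0.95.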