CIBIV m. MFPL CIBIV m. MFPL B Distance to Mm-Br1 centroid Distance to Mm-Br1 centroid Datamanagement/Visualization CIBIV m. MFPL Our research object CIBIV m. MFPL Our research object eiidospore (chitin) plasma membrane ■ rote in) \ \ polaroplast anchoring disk polar rube coiled polar filament Generalized microspoildian spore MFPL Our research object E hellem E cuniculi E. intestinalis E. bienetisi E intestinalis Intracellular development of E bieneusi and E intestinalis spores O O O pYmYYmYYYYYYYYVvv. -<■ 99,9 VS&« o!o or Enterocytozoon bieneusi CIBIV MFPL What are Microsporidia? Streptophytes Land plants Green Algae Charophytes Chlorophytes Trebouxiophytes Ulvophytes Prasinophytes Mesostigma Discicnstates Heterolobosea Kinctoplastids Diplonemids Excavates Algae Floridiophytes Bangiophytes Cyanidiophytes Glaucophytes Alveoiates ■fco i^ ./srY^Éfré Stramenopiles akobids Trimastix Oxymonads Trichomonads Hypermastigotes Carpcdiemonas Retortamonads Diplomonads Malawimonads Apicomplexa Colpodella Dinoflagellates Oxyhrris Perkinsus Ciliates Colponema Ellobiopsids Diatoms Raphidiophytes Eustigmatophytes Chrysophytes Phaeophytes Bolidophytes Blastocysts Actinophryids Labyrinthulids Thraustrochytrids Oomycetes Opalinids Bicosoecids Cerco, Cercomonads Euglyphids Phaeodarea Heteromitids Thaumatomonads Chlorarachniophytes Phytomyxids Haplosporidia Foraminrfera Polycystines Acantharia Rhizaria Ascomycetes Basidiomycetes Zygomycetes Microsporidia Chytrids Nucleariids Animals Choanoflagellates Capsaspora Ichthyosporea Op sfokonts Haptophytes Cryptomonads Dictyostelids Myxogastrids Protostelids Lobosea Archa moebae Amoebozoa 'Unikonts' Modified from Keeling et al., (2005) TRENDS in Ecology and Evolution Vol.20 No.12 CIBIV m. MFPL Our Workplan Functional annotation Phylogeny reconstruction Ortholog prediction CIBIV m. MFPL Introduction into Molecular Evolution A B C D E F G Living species - - Fossil species u Ancestral (common) species CIBIV Hill MFPL he evolution of biological sequences k- 4 # CIBIV Hill MFPL he evolution of biological sequences k- 4 # CIBIV Hill MFPL he evolution of biological sequences k- 4 # Sequences are changed by substitutions CIBIV Hill MFPL he evolution of biological sequences *■ 4 f ACGG AT CC ATTAC AAfM P ÍVaTi T ACTG Sequences are changed by substitutions and insertions/deletions CIBIV Hill MFPL he evolution of biological sequences T ACTG Contemporary sequences comprise the leafs of the tree CIBIV Hill MFPL he evolution of biological sequences What do we have ACGGGATCCCAATAC ACGGGATCCCATTAC CCGGGATAGCTTCCATTAC ACCCCCTATCCACTGGATTAC ACGACATATCCACTGGATTCC CIBIV Hill MFPL he evolution of biological sequences What do we have ACGGGATCCCAATAC ACGGGATCCCATTAC CCGGGATAGCTTCCATTAC ACCCCCTATCCACTGGATTAC ACGACATATCCACTGGATTCC CIBIV HI/mf«. Fading the homologous positions a c G A T G CIBIV Hill MFPL Fir|ding the homologous positions A C G A T G - A C G T A C G - A T - T - A T G Sequence Alignment CIBIV m. MFPL >What should a biologicalľ >To what extent can we d< Mathematically Optimal Alignmen Sequence Alignment minimize, Biologically Optimal Alignment CIBIV m. MFPL >What should a biologicalľ >To what extent can we d< Mathematically Optimal Alignmen Sequence Alignment minimize, Biologically Optimal Alignment CIBIV m. MFPL Sequence Alignment ^ mathematical function to measure the biological quality of an alignment a... o{a) = 2JS{ai,bi) i=\ Objective: find a that maximizes a(a)! CIBIV m. MFPL Sequence Alignment Given two sequences A B={b1,b2,__jbm} and a S(a.,b.) = +5,if ai=bj -2, if a. * b. -6, for introduction of a gap CIBIV m. MFPL Sequence Alignment Given two sequences A B={b1,b2,__jbm} and a S(a.,b.) = +5,if ai=bj -2, if a. * b. -6, for introduction of a gap E. s -< -n < i~ — 2 ^ 73 Z o m o ■z. O > "D H 01 o nj N» NJ CO W W CO 4** CO CO CO W > c/> N) M M N) - en -t -P- CO -P* NJ CO CO NJ ± NJ NJ ^ ■^ - NJ NJ i -g TJ CO nj N) ^ - ± ± - NJ ^ - NJ NJ O ■ŕ. > nj OJ CO CO u ■P" CO NJ l\> NJ NJ K) - o o> O ^ Ňj is CO CO CO ISJ O O -- o O -* z ^ CO CO CO i». CO CO - N) - o K) o> o w ro CO NJ CO CO NJ ->• O O NJ Ol m ňj - CO Ňj Ňj CO o -■■ ^ O CTI D rŇj NJ - CO CO CO Ňj i. O 00 I to Ňj CO CO Ňj CO - NJ C^ 73 co nj CO NJ nj CO i CTI 7Š - ^ o -* NJ -> en S to - o CO NJ p* - M - o -> 4* r- to - - *«. < -* OJ O) -n INJ -J < ^ S £ -< -n < r — s * ~J> x D m o ^ G) > TJ -i (n O 3 s í «1 ^1 hi ■ : C'J 'J ä íl < r (D oT i r 2r K 0 3 TJ í: (B O 3" l'.'J Q :'.'■ ° i. 10 0\ NJ j CJi cji; -k i, !., i,^ -K :„ NJ CO : NJ i O r o DJ H ►d > NJ : CO :CO 0\ : Cfl |C/i -O : CH |C/i O O r-j CO -P* — i CO o i l^ NJ \ NJ co : co O : O O j j_, J-.; o >-* : NJ NJ : CO O N) - O o - o - —> ': —L \ NJ - o: o O O ŕ o -i- - - - 0\ o o: o o - r-j ■'■ o \ —' o '-" --'i m -K nj ;-K r, j [■-j [.o CO ■p- NJ ! NJ NJ \ CO NJ : NJ O O O NJ [-0 - ľ^ ! J ■--j ■--j CO ŕ [-J - .O - -K ;lji 10 NJ NJ : —* - ~ "' - io °;nj [.o í -j CO NJ : NJ Ň:0 O - r '-' ^ 4^ n - -p.;6i [O i, Ňi° g — r, j u NJ : . N) : — * - O -J :<0 iq' i fD CIBIV m. MFPL There are far too many al possibility > number of possible ps > for two sequences of I Alignment algorithms CIBIV m. MFPL A dynamic programming apprc > A mathematical description i.e. an recursive objective fu > The computation of all inter Alignment algorithms CIBIV m. MFPL Unde Al: rlying pr rT G^ inciple: c T +5 -6 --6 -\ T G T - u ^^^^^^l j. ,+5 -6 , ^^H Alignment algorithms CIBIV m. MFPL Alignment algorithms Scoring function +5,ifai = bj Sia^b.) = -2, if a. * b j -6, for introduction of a gap T G 3 C 4 T 5 C 6 6 7 T 8 A 1 t 2 T 3 C 4 A 5 T 6 A Objective function a(i,j-ľ) + S(gap,bj) oii-Ufi + Sia^gap) o(i,j) = max CIBIV m. MFPL Alignment algorithms n oji-lj-V + Sja^b,) a(i,j) = max<| o|(/,j-l) + S{gap,b.) o{i-\,j) + S{apgap) a(i,j) is the optimal alignment score up to and including aj and bj CIBIV m. MFPL Needleman-Wunsch algorith 0 1 2 T 6 3 4 5 C T C 6 G 7 T 8 A 0 0 -6 -12 -18-24-30-36-42-48 1 t -6 2 T 3 C -12 -18 t 4 A -24 5 X -30 6 A -36 i Alignment algorithms r S(ai,bj) = L +5,if at = bj -2, if a. * b j -6, for introduction of a gap CIBIV m. MFPL Needleman-Wunsch algorith 0 i T 2 3 4 5 G C T C 6 G 7 T 8 A 0 0 -6 - -12 -18 -24 -30 -36 -42 -48 1 t -6 5 2 T 3 C -12 -18 4 A -24 5 T -30 6 A -36 Alignment algorithms r S(ai,bj) = L +5,if at = bj -2, if a. * b j -6, for introduction of a gap CIBIV m. MFPL Needleman-Wunsch algorith 0 1 2 T 6 3 4 5 C T C 6 G 7 T 8 A 0 0 -6 -12 -18-24-30-36-42-48 1 t -6 5 -1 2 T 3 C -12 -18 t 4 A -24 5 X -30 6 A -36 i Alignment algorithms r S(ai,bj) = L +5,if at = bj -2, if a. * b j -6, for introduction of a gap CIBIV m. MFPL Needleman-Wunsch algorith 0 1 2 T 6 3 4 5 C T C 6 7 8 G T A 0 0 -6 -12 -18-24-30-36-42-48 1 t -6 k 5 1-1 I-7pl 3-19-25-31 -37 2 T -12 -1 3 „-3 -2^-8^-14^-20^-26 t v tí It TT 3 C -18 -7-3 8^2 3 ,.-3 I-9 „-15 ■ a. ■ . Sa. a. ,i 1 a 4 A -24 -13 -9 2 6^0 1 „-5 -4 .TT T A jk ^ 5 X -30 -19-15 -4ľ7 4, -2 6^0 L f I 6 A -36 T \, T 1 T ^ -25-21-10 1 5 2 0 11 Alignment algorithms r S(ai,bj) = L +5,if at = bj -2, if at * b j -6, for introduction of a gap CIBIV m. MFPL Needleman-Wunsch algorith "ö 1 2 T T G C 4 5 6 7 8 T C G T A Ľ C -6 -12-18-24-30-36-42-48 -6 12 ^8 5 ^-1 I-7 1-13^-19^-25^-31^-37 -1 3 -3 -2 -8 -14-20-26 -7-38^2 3 ^-3^-9^-15 13 -9260 4 A 5 T B A -24 ■30 ?\ ? ? \ \ \ \ 19-15 -4 7 4-2 6 -36 f \-f- T* -25-21 -10 1 1.-5 -4 0 2 0 11 Alignment algorithms r x * v * J CIBIV m. MFPL Needleman-Wunsch algorith 0 1 2 T 6 3 4 5 C T C 6 7 8 G T A 0 0 -6 -12 -18-24-30-36-42-48 1 t -6 5 -1 -7 -13-19-25-31 -37 2 T -12 -1 3 ^-3 -2^-8^-14^-20^-26 ? <. t^ It TT 3 C -18 -7-3 8^2 3 ^-3^-9^-15 ■ A. ■ . ■ A. A jí I A. 4 A -24 -13 -9 2 6^0 1 I-5 -4 .TT T A jk ^ 5 X -30 -19-15-4ľ7 4, ff i ■ ■ f * -2 6^0 L ti 6 A -36 T \, T 1 T \ -25-21-10 1 5 V T ^ 2 0 11 Alignment algorithms CIBIV m. MFPL Needleman-Wunsch algorith 0 1 2 T 6 3 4 5 C T C b G 7 8 T A 0 0 -6 -12 -18-24-30-36-42-48 1 t -6 5 1-11-7 "-13-19-25-31-37 2 T -12 * T \. -V \. r -1 3^-3 -2^-8^-14 * L ■ J 1 x r20^-26 3 C -18 -7-3 8^2 3 ^-3^-9^-15 ■ M. ■ . ■ A. M. 4 A -24 -13 -9 2 6^0 1 ,^-5 -4 5 X -30 -19-15-4ľ7 4, • k. w I ■ w ▲ -2 6^0 L ti 6 A -36 T \, T 1 T ^ -25-21-10 1 5 V T ^ 2 0 11 Alignment algorithms CIBIV m. MFPL 3 4 5 6 7 8 C T C G T A -6 -12-18-24-30-36-42-48 5 -11-71-13-19-25-31-37 -fc—▼ \ \ \ * -1 3 _ -3 -2 ^-8 ,r14^-20^-26 3 ^-3^-9^-15 r-5 -4 -19-15 -4 7 4^-2 6^0 -f] -f4- * f% -25-21 -10 1 5 2 0 [ 11 Alignment algorithms *TGCTCGTA* *T—TCATA* Alignment Score: 11 CIBIV m. MFPL Smith-Waterman pair-wise lo TT b 5 T 6 C T C G T A 0 0 0 0 0 0 0 0 0 0 1 T 2 T 0 I x 5 0 0 5 0 0 5 0 0 5 3 0 5 3 0 5 3 3 C 0 0 3 8^2 "10J4 I 0 [3 tt ti 1 4 A 0 0 0 2 6 4 8 2 5 5 X 0 n° \ \ \ \ 0 7 4 2 13 7 6 A 0 0 3 0 1 5 2 7 18 Alignment algorithms Sia^bj) = < -2, if a, * b j -6, for introduction of a gap o(iJ) = max« a(ij -1) + S(gap) a(i -1 J) + S(gap) 0 CIBIV m. MFPL Smith-Waterman pair-wise lo TT b 5 0 1 T 6 C T C G T A 0 0 0 0 0 0 0 0 0 T 0 5 0 0 5 0 0 5 0 T 0 5 3 0 5 3 0 5 3 C 0 0 3 8 2 10 J 4 0 3 ft ft 4. <■'. A 0 0 0 2 6 4 8 2 5 L L L T I 5 T 0 n° \ \ % \ 0 7 4 2 13^7 Iva X I ■! . b A 0 0 3 0 1 5 [2 7 18 Alignment algorithms *TCGTA* *TCATA* , Alignment Score: 18 CIBIV m. MFPL Both, Needleman-Wunsď methods are exact methc optimal solution for the o Drawback: Computation. Alignment algorithms CIBIV MFPL Alignment algorithms: BLAST* 100 I 1 I 1982 1986 1990 1994 1998 2002 2006 *Gapped Blast, Altschul era/. (1997) Nucleic Acids Res. 25:3389-3402. CIBIV m. MFPL 1. Given a query q and a 1 of length k (/c-mers) of to 5 for amino acids an 2. Extend each hit to a /o< BLAST heuristics CIBIV m. MFPL Finding k-mers quic Preprocess the databas For each sequence in ti hash-table. BLAST heuristics CIBIV m. MFPL BLAST heuristics (1) For the query find the list ol high scoring words of length w Query Sequence of tengti L Mawmumof lw»i wont* (typicafy w • 3 br prorewTs) — For each word from the query sequence I find twhsl of words thai w* sc ore toast T when scored usmq a pavscore nvjtriKHr«f Imp 85831. ^H ZINC PROTEASE (INSULINASE FAMILY) [ Encephalitozoon cuniculi emb | CAD2 5 | " I IMC BOřVPlTIlClT fXHSULIHASB "HTTV, [Pr,Tl,.n^.™.n ,.„ liculi i.b -Ml] Length-882 [Encephali B59255 ECU06 0750 1 ZINC PROTEASE (INSULINASE FAMILY) tozoon cuniculi GB-M1) (10 or fewer PubKod links) Identitie s = 882/8B2 (100%), Positives ■ 882/B82 (100%), Gaps - 0/882 (0 ■ i:.i . Query 1 MNALITIARCLESIRMVHKNLTDTTRYEYVEIPKCMRALIMSDPGLDKCSCAVSVRVGSF í, ú sbjct 1 MNALITIARCLESIRMVHKNLTDTTRYEYVEIPNGMRALIMSDPGLDKCSCAVSVRVGSF 60 Query 61 D[::i7o;15í;:.A[ii'::.::;:i:i.[-:!(::';:>:Y?v:;:i(;L:^vrL:;>;M]icx;ľHíTTV(;EA7VYYFDVRľEA DDPADAÜGLAHFLEHMLFMGTEKYPVEDGLSYFLSKNNGGYNATTYGEATVYYFDVRPEA DDPADAQGLAHFLEHMLFMGTEKYPVEDGLSYFLSKNNGGYNATTYGEATVYYFDVRPEA 11 J 120 Query 121 Sbjct 121 FEEAVDMFADFFKSPLLKRDSVEREVSAVNSEFCNGLNNDGWRTWRMMKKCCKKEQALSK FEEAVDMFADFFKSPLLKRDSVEREVSAVNSEFCNGLNNDGWRTWRMMKKCCKKEQALSK FE EAVDMFAD F F KS P LLKRDS VE REV SAVN S E FCN GLN N DGWR TWRMMK KCC KKE QALS K L ŕ J http://blast.ncbi.nlm.nih.gov/Blast.cgi CIBIV m. MFPL The NCBI BLAST family of programs blastp compares an amino acid query database blastn compares a nucleotide query s< database blastx compares a nucleotide query s' Different BLAST programs CIBIV MFPL Multiple Sequence Alignments Ther_tengcongensis Clos_acetobutylicum Clos_tetani Desudesulfuricans Vibr vulnificus Caulcrescentus Micr_degradans Vibr_cholerae Shew_oneidensis RatbetalsGC Rat_beta2_sGC Nost_punctiforme Nost_sp. consensus>50 Ther_tengcongens i s Closacetobutylicum Clos_tetani Desu desulfuricans Vibr_vulnificus Caulcrescentus Micrdegradans Vibrcholerae Shew_oneidensis RatbetalsGC Rat_beta2_sGC Nostpunctiforme Nostsp. consensus>50 KPVAKD. KPISKR. EPISSK. DD.QGD. IEQDRM. RAIDAG. TKLSDS. NRVEEK. Q L L P N N . TDAEKGK EEGADG. TDMEEN. QHTSSK. ß2 VA Aj 1 TlXSi IAJI.RÍ I H L H j LS;LHj v,E|L|H| KgK. ryKjG. iQiD - KjAjC. T Q' R . Hk?q. P jR . e!eg D0HG D0EG T0CG aG JULflJLa_QJLQ_QJLQ_fi_fiJ 14 0 15 0 1SIKY d.iH..v.k. ß3 yp.a.pP.f, 115 12 0 ß4 :■ CALAQgF L CVLAE j L ]CAÄAE; cfcae' S S K F sNnhf 3|C|AjKH A RQ A EH -:Ia|a!kQ A Q'H TV QQ AV KD iLiGTR LGKR g.a. . 15 5 EEI EKI KV EL EVI ADV QPV K TEI TDV TEV TKV SlvlE E E K Q. TA . q. e . H T aa|c c p DT I L T GEKDGF TKES.. TNES.. GKG... NPTG. . V Ľ K G Ľ . VHQGA. MHCGA. MHTGA. ....QR NEEVER . ... NR ....FR SRLKV . . VVL . . LKL . RTTA SHVRF S A C V F DECLI DHCEI DHCML SEECD TGKKE DEGAE ETGED R K K F T KLK RVT N A V V E I : E HTQ HVVT HDE.Í h|dJi|fj . v. f . 18 5 KNPVFEYKKN DKDIYYKKSF QKNIYLKKEF IK........ VKGKQDG. . . PSLEAAAND. I......... LPS....... QND....... LIEEKESKEE LVVQKAHRQI LVIYKPN. . . SIKYEDSNLY CIBIV IffM MFPL Multiple Sequence Alignments Optimal Solution: Extend Needleman-Wunscl sequences m c tu 3 5 Q) j*Z^\ /^ * sequence A CIBIV ^ MFPL Multiple Sequence Alignments Sum Of Pairs Seql: AGA—CTA Seq2: G-A—CTT Seql Seq2 Seq3 Seql Seq3 S(a.,bj) +5,/f a. =b. -2, if a. * b. -6, for introduction of a gap SUM OF PAIRS SCORE: 16 CIBIV m. MFPL Multiple Sequence Alignments Progressive alignment strategy The sequences are more than two seq alignments) are sir Sequences or MSA; Programming afa^bi). n,m COx, COy o{al ,b') = 1 n m nxm ^^S(ax>bPX0>xX0>y x=l y=l score for aligning column i from alignment (or sequence) a to column j from alignment or sequence b number of sequences in alignments a and b, respectively score for aligning position / in sequence x from alignment a to position y in sequence y from alignment b respective weights of the sequences x and y CIBIV >4i MFPL Evolutionary relationships of Species Speciation Speciation 3' re CIBIV ^ MFPL Evolutionary relationships of genes and gene products Speciation Speciation 3' re CIBIV m. MFPL Speciation Speciation Evolutionary relationships of genes and gene products 3' re CIBIV m. MFPL Speciation Gene duplication Speciation Evolutionary relationships of genes and gene products N CIBIV m. MFPL Speciation Gene duplication Speciation Evolutionary relationships of genes and gene products N CIBIV m. MFPL Speciation Gene duplication Speciation Evolutionary relationships of genes and gene products CIBIV m. MFPL Speciation Gene duplication Speciation Evolutionary relationships of genes and gene products CIBIV m. MFPL Speciation Speciation Orthology prediction > > > CIBIV H\ MFPL Reciprocal Blast O Reciprocal Best Blast Hit Pair Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 Orthology prediction The RBH approach CIBIV m. MFPL Orthology prediction: The InParanoid approach Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV >4i MFPL Orthology prediction: The InParanoid approach Proteome A Reciprocal Blast Reciprocal Best Blast Hit Pair L_____________J Para Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV >4i MFPL Orthology prediction: The InParanoid approach Proteome A Reciprocal Blast Reciprocal Best Blast Hit Pair L_____________J Para Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV m. MFPL Orthology prediction: The InParanoid approach Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV m. MFPL Orthology prediction: The InParanoid approach Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV m. MFPL Orthology prediction: The InParanoid approach JSpeciation Modified from Remm et al. J. Mol. Biol (2001) 314: 1041-1052 CIBIV >4i MFPL Tree reconstruction CIBIV m. MFPL Working Hypothesis: Species evolution is tree like A B C D E F G Living species - - Fossil species u Ancestral (common) species CIBIV MFPL Tree reconstruction The Tree of Life Archaea v-- Bacteria Eucarya CIBIV m. MFPL Tree notation internal branch multiiurcation external brine h CIBIV h MFPL \ 1! / N 9—® *\^ How many trees exist? CIBIV m. MFPL How \ \ t N e—® '1^ many unrooted trees exist? (2/1-5)! 2*») = T^--------- 2"-3(rc-3)! MIO) = 2027025 6(55) = 2.9x10 84 6(100) = 1.7x10 182 CIBIV >4i MFPL ((A.B.CMD.E)) ((D.E)(A.B.C)) Tree formats E DCAB ECBA D ((E. D) (C. A. B)) «E. (C. B)) (A, D)) CIBIV m. MFPL Three representations of the same tree (((((CIOIN:0.4222,HOMSA:0.2777)97:0.0575, (ACRMI: 0.2611, HYDMA:0.3700)100:0.0745)100:0.0764,DROME: 0.4200)100:0.1034, CAEEL:0.6027) :0.5804,SACCE:0.5832); CIBIV m. MFPL Paraphyletic group (e.g. reptiles) Some more notations Some more notations Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. CIBIV ^ MFPL Some more notations Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. An evolutionary derived character (state) is called an Apomorphy CIBIV ^ MFPL Some more notations □ □ Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. An evolutionary derived character (state) is called an Apomorphy řyn-Apomorphy: an evolutionary derived haracter (state) shared by a group of taxa. CIBIV ^ MFPL Some more notations □ □ □ Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. An evolutionary derived character (state) is called an Apomorphy E yn-Apomorphy: an evolutionary derived haracter (state) shared by a group of taxa. ut-Apomorphy: an evolutionary derived character (state) present only in a single taxon Some more notations Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. An evolutionary derived character (state) is called an Apomorphy E yn-Apomorphy: an evolutionary derived haracter (state) shared by a group of taxa. ut-Apomorphy: an evolutionary derived character (state) present only in a single taxon Plesiomorphy: an ancestral character (state) shared by a group of extant taxa. CIBIV ^ MFPL Some more notations Character based phylogeny reconstruction: A character has to be expressed in at least two states in the taxa under study. Taxa are grouped on the basis of shared character states. An evolutionary derived character (state) is called an Apomorphy E yn-Apomorphy: an evolutionary derived haracter (state) shared by a group of taxa. ut-Apomorphy: an evolutionary derived character (state) present only in a single taxon Plesiomorphy: an ancestral character (state) shared by a group of extant taxa. Homoplasy: A derived character (state) that is shared for reasons other than common decent.. CIBIV m. MFPL How to infer a tree from data Data Method Evaluation Criterion Characters (Alignment) Maximum Parsimony Parsimony Statistical Approaches: Likelihood, Bayesian k Evolutionary Models Distances Distance Methods CIBIV >4i MFPL The Maximum Parsimony Principle William of Ockham, 1285-1347/49 CIBIV H MFPL The Maximum Parsimony Criterion CIBIV H MFPL The Maximum Parsimony Criterion í CIBIV H MFPL The Maximum Parsimony Criterion CIBIV H MFPL The Maximum Parsimony Criterion CIBIV H MFPL The Maximum Parsimony Criterion CIBIV H MFPL The Maximum Parsimony Criterion CIBIV H MFPL The Maximum Parsimony Criterion t ^ H CIBIV „ami,,,, The MFPL lllfc: Maximum Parsimony Tree < >^ CIBIV m. MFPL The Fitch algorithm (1970) CIBIV m. MFPL The Fitch algorithm (1970) CIBIV m. MFPL Uroo -- £- ■tree The Fitch algorithm (1970) 1. Initialize state set Sk at each leaf k with the characters from the alignment. 2. Construct the state sets of all internal nodes in a post-order-traversal starting at the root node 3. Let k be the current node and i,j its descendents, then build the intersection of S, and Sy. 1. If Si/lSj* {}: set Sk=SinSj 2. If S,- n Sj = {}: set Sk=Sf U Sj and increase the tree length by 1. CIBIV m. MFPL Uree ~ 3 The Fitch algorithm (1970) 1. Initialize state set Sk at each leaf k with the characters from the alignment. 2. Construct the state sets of all internal nodes in a post-order-traversal starting at the root node 3. Let k be the current node and i,j its descendents, then build the intersection of S, and Sy. 1. If Si/lSj* {}: set Sk=SinSj 2. If S,- n Sj = {}: set Sk=Sf U Sj and increase the tree length by 1. 4. Continue with the traversal until the state set Sroot has been reconstructed. CIBIV m. MFPL \ ÍA,C,G} {A,C} Ltree — ^ The Fitch algorithm (1970) 1. Initialize state set Sk at each leaf k with the characters from the alignment. 2. Construct the state sets of all internal nodes in a post-order-traversal starting at the root node 3. Let k be the current node and i,j its descendents, then build the intersection of S, and Sy. 1. If Si/lSj* {}: set Sk=SinSj 2. If S,- n Sj = {}: set Sk=Sf U Sj and increase the tree length by 1. 4. Continue with the traversal until the state set Sroot has been reconstructed. CIBIV m. MFPL MP Objective Function A ^lJJ\XV i>Xk" i) CIBIV H MFPL Aspects of Maximum Parsimony 1. Parsimony is ofte 2. One has no choio algorithm assume backmutations d( CIBIV m. MFPL T T A A A A T T The criterion of distance CIBIV m. MFPL I C A T T A A A A T T Edit distance (Hamming) CIBIV m. MFPL C A T T A A A A T T Edit distance (Hamming) 2 2 CIBIV m. MFPL Distance based tree reconstruction d measured A P » vx + v2 + v3 + v4 CIBIV m. MFPL The Four-Point-Condition Theorem: Four-Point-Coi A distance matrix (djj, i,j= and only if d(a,b) + d(c,d) ^ max{d(a,c) + d(b,d),d(a,d) + d(b,cj\ for all fl,Ŕ,c,ííe{l,2,...,/i} aBIV HI/ „FPL Theorem: The ultrametric A distance matrix (djj, i,j= like tree, if and only if The ultrametric inequality d(A,B) < max{d(A,C),d(B,C)} for all triple (^,5, Q CIBIV H\ MFPL The Neighbor Joining Algorithm* 1 1 2(n - 2) ~ 2 — y.» 34i MFPL The Neighbor Joining Algorithm D(A,B) +------- y D(A,k) - D(B,k) \ m~2ť3 ) v(A,W) = - 2 \ v(B,W) = D(A,B)-v(A,W) CIBIV m. MFPL The Neighbor Joining Algorithm D(W,k) = -(D(A,k) + D(B,k)-D(A,B)) CIBIV H MFPL The Maximum Likelihood criterion Modelling Sx: ...AAGGCTTCAG... Time t v A CIBIV H MFPL The Maximum Likelihood criterion Modelling Sx: ...AAGGCTTCAG... Time t v A CIBIV H MFPL The Maximum Likelihood criterion Modelling Sx: ...AAGGCTTCAG... Time t v A CIBIV H MFPL The Maximum Likelihood criterion Modelling Sx: ...AAGGCTTCAG... Time t v A CIBIV m. MFPL Modelling sequence evolution / Q = \ -ab c\ a - d e b d - f cef "/ Á n = (jtA ,tic ,jig ,jtT)) CIBIV H\ MFPL Evolutionary models are ofi rate matrix Q and characte V Modelling sequence evolution A C G T '-ab cN ß- a - d e b d - f ^_ [c e f -j CIBIV MFPL DNA sequence evolution models different bai»c frequencies HKY85 3 subst. types (tnttvenions, 2 transitions) 2 subst. types - GTR KXI Further modification: rate heterogeneity: invariant sites, f-distributed rates, mixed CIBIV [^%\íí MFPL Protein sequence evolution models Generally this is the same for | matrices. Some protein model; • Poisson model ("JC69" for proteins, rarely used) • Dayhoff (Dayhoff eř a/., 1978, general matrix) • JTT (Jones eř a/., 1992, general matrix) 9 WAG (Whelan & Goldman, 2000, more distant sequences) a VT (Müller & Vingron, 2000, distant sequences) • mtREV (Adachi & Hasegawa, 1996, mitochondrial sequences) • cpREV (Adachi eř a/., 2000, cloroplast sequences) • mtMAM (Yang eř a/., 1998, Mammalian mitochondria) » mtART (Abascal eř a/., 2007, Arthropod mitochondria) • rtREV (Dimmic era/., 2002, reverse transcriptases) CIBIV m. MFPL The Likelihood function c G ľľl •>) CIBIV m. MFPL The likelihood of sequei S: G TCCTGA AGAAATAAAC S': G TCCTGA AGAAATAAAC 'he Likelihood function E c -C "ČĎ O) o o CO CO CO E3 i Ifl i CC co co co l o O: co 0.00 0.05 0.10 0.15 0.20 0.25 branch length [subst. per site] CIBIV >4i MFPL Given a tree with branch l< the computation of likelih< Usually no sequences are (ancestral sequences). He possible labeling at the inr lIXc =zJXJ+l x \G C J \G C ) \G C' Tree likelihoods - + L X + "'+^ X CIBIV >4i MFPL Given a tree with branch l< the computation of likelih< Usually no sequences are (ancestral sequences). He possible labeling at the inr lIXc =zJXJ+l x \G C J \G C ) \G C' Tree likelihoods - + L X + "'+^ X CIBIV m. MFPL Calculating tree likelihoods with d = 0.1 V x e íl,..,5}, and R.(0.1) = \ l' J' y v [0.09 if í = y CIBIV m. MFPL Calculating tree likelihoods Ls(i) = [PiC(d,) x L(Q] x [PiG(d2) x L(G)], V i G {A,C,G,7} with i/x = 0.1 V x e {l,..,5}, and fV(O.l) = [0.91 if í = y 10.03 if íV y CIBIV m. MFPL Calculating tree likelihoods Ls(i) = [PiC(d,) x L(Q] x [PiG(d2) x L(G)], V i G {A,C,G,7} ^6(o= n v= {3,4,5} 2W)xM.O y={A,C,G,r} ,v/e{A,c,G,r> with d, = 0.1 V * E {l,..,5}, and fV(O.l) = [0.91 if i = j 10.03 if íV y CIBIV m. MFPL Calculating tree likelihoods Ls(i) = [PiC(d,) x L(Q] x [PiG(d2) x L(G)], V i G {A,C,G,7} ^6(o= n v= {3,4,5} 2W)xM.O y={A,C,G,r} ,v/e{A,c,G,r> (t) = ^jt. x L6(i) = 0.005489 ;={A,C,G,r} with i/x = 0.1 V x e {l,..,5}, and fV(O.l) = [0.91 if í = y 10.03 if íV y CIBIV m. MFPL Calculating tree likelihoods L(T) = Y\Lk) = 0.0054892 x 0.005489 = 0.0000001653381 lnL(r) = ^m\nL{k) = -15.61527 CIBIV MFPL To compute optimal branch the branch lengths. Choose to an adjacent node (B). Cc recursively (C). Adjust the likelihood value (D). Optimizing branch lengths c D CIBIV m. MFPL 1. Exhaustive Search: < hence an optimal solu taxa 2. Branch and Bound: < from the search when Guarantees to find th' Finding the best tree CIBIV m. MFPL X B D •3920.21 Finding the best tree Heuristic search B CIBIV m. MFPL JS D C •3920.98 J^W' C D •3689 22 "X 3 320 Ľ I Finding the best tree Heuristic search B A D c V •4710.37 \ A D E 4560.70 ♦ A ' B •X" C \ í D i i B .V-. 4 D / -4521.39 B / \ E D A 4610.40 A E B •4579.17 CIBIV m. MFPL Finding the best tree Heuristic search CIBIV >4i MFPL Finding the best tree Tree rearrangements H G DC Nearest Neighbor Interchange (NNI) O(n) NNI trees Subtree Pruning + Regrafting (SPR) 0(n2) SPR trees Tree-Bisection + Reconnection (TBR) 0(n3) TBR trees CIBIV m. MFPL Definition: A split Y|Z ir leaves/taxa into two sub an edge from the tree. x AB|CDEF ABCIDEF Summarizing trees CIBIV m. MFPL Definition: A split Y|Z ir leaves/taxa into two sub an edge from the tree. Definition: Two splits W not contradictory, if at le wnz,xnY,xnzi: x AB|CDEF ABCIDEF Summarizing trees X AC|BDEF ABCIDEF CIBIV MFPL Summarizing trees ACIBDEF ABCIDEF ABCDIEF ABCIDEF -3(100%) ACIBDEF -2(66.7%) ABICDEF - 1 (33.3%) ABCDIEF - 1 (33.3%) strict consensus ABCIDEF B semi-strict ABCIDEF ABCDIEF majority-rule ACIBDEF ABCIDEF Tree C CIBIV MFPL Sample x ..AGUUCCAAAA, ..AGUUCCAAAA, ..ACCCCCAAAA, ..AUCCCCAACC, Assessing the confidence of trees The (non-parametric) Bootstrap A C H D i I A C B M KM) I» E I Tiee x CIBIV m. MFPL the goal? J I\ IW