>gi|5835135|ref|NC_001644.1| Pan paniscus mitochondrion, complete genome
GTTTATGTAGCTTACCCCCTTAAAGCAATACACTGAAAATGTTTCGACGGGTTTATATCACCCCATAAAC
AAACAGGTTTGGTCCTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCGTCCCGTGAG
TCACCCTCTAAATCACCATGATCAAAAGGAACAAGTATCAAGCACACAGCAATGCAGCTCAAGACGCTTA
GCCTAGCCACACCCCCACGGGAGACAGCAGTGATAAACCTTTAGCAATAAACGAAAGTTTAACTAAGCCA
TACTAACCTCAGGGTTGGTCAATTTCGTGCTAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAA
CCGGCGTAAAGAGTGTTTTAGATCACCCCCCCCCCAATAAAGCTAAAATTCACCTGAGTTGTAAAAAACT
CCAGCTGATACAAAATAAACTACGAAAGTGGCTTTAACACATCTGAACACACAATAGCTAAGACCCAAAC
TGGGATTAGATACCCCACTATGCTTAGCCCTAAACTTCAACAGTTAAATTAACAAAACTGCTCGCCAGAA
CACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCATATCCCTCTAGAGGAGCCTGTTCT
GTAATCGATAAACCCCGATCAACCTCACCGCCTCTTGCTCAGCCTATATACCGCCATCTTCAGCAAACCC
TGATGAAGGTTACAAAGTAAGCGCAAGTACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCTATGAGGCG
GCAAGAAATGGGCTACATTTTCTACCCCAGAAAATTACGATAACCCTTATGAAACCTAAGGGTCGAAGGT
GGATTTAGCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCCCGT
CACCCTCCTCAAGTATACTTCAAAGGATATTTAACTTAAACCCCTACGCATTTATATAGAGGAGATAAGT
CGTAACATGGTAAGTGTACTGGAAAGTGCACTTGGACGAACCAGAGTGTAGCTTAACATAAAGCACCCAA
CTTACACTTAGGAGATTTCAACTCAACTTGACCACTCTGAGCCAAACCTAGCCCCAAACCCCCTCCACCC
TACTACCAAACAACCTTAACCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGTAAATCGGCG
CAATAGATATAGTACCGCAAGGGAAAGATGAAAAATTACACCCAAGCATAATACAGCAAGGACTAACCCC
TGTACCTTTTGCATAATGAATTAACTAGAAATAACTTTGCAAAGAGAACTAAAGCCAAGATCCCCGAAAC
CAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATA
GGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTA
AATTTACCTACAGAACCCTCTAAATCCCCCTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTA
GACACTAGGAAAAAACCTTATGAAGAGAGTAAAAAATTTAATGCCCATAGTAGGCCTAAAAGCAGCCACC
AATTAAGAAAGCGTTCAAGCTCAACACCCACAACCTCAAAAAATCCCAAGCATACAAGCGAACTCCTTAC
GCTCAATTGGACCAATCTATTACCCCATAGAAGAGCTAATGTTAGTATAAGTAACATGAAAACATTCTCC
TCCGCATAAGCCTACTACAGACCAAAATATTAAACTGACAATTAACAGCCCAATATCTACAATCAACCAA


MODULARIZATION OF TEACHING OF EVOLUTIONARY AND ECOLOGICAL BIOLOGY
CZ.1.07/2.2.00/15.0204

Definition of basic concepts:
- phylogenetic tree = phylogeny (fylogenie): rooted, unrooted
- branches = edges (větve): peripheral, internal, central
- nodes = vertices (uzly): internal, terminal
- dichotomy = bifurcation, polytomy = multifurcation
- OTU = operational taxonomic unit, HTU = hypothetical taxonomic unit
- tree topology

Definition of basic concepts:
- path (dráha): connects two terminal nodes
- lineage (linie): connects a terminal node with the root

Definition of basic concepts: [figure]


http://www.almob.org/content/figures/1748-7188-2-8-1-l.jpg
http://www.vizachero.com/R1b1/R1bSplits.png


How many trees?


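[table: numbers of possible trees for increasing numbers of taxa; for larger numbers of taxa they exceed Avogadro's constant* and even the number of electrons in the visible universe (Eddington number)]
*) 6.022 140 76 × 10²³ mol⁻¹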
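The growth behind this comparison follows from the standard counts for n labelled taxa: (2n − 5)!! unrooted and (2n − 3)!! rooted binary trees. The short Python sketch below (not part of the original slides; the function names are illustrative) computes both and finds where the rooted count first exceeds Avogadro's constant.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ...; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n):
    """Number of unrooted binary trees for n >= 3 labelled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def n_rooted(n):
    """Number of rooted binary trees for n >= 2 labelled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


AVOGADRO = 6.02214076e23

for n in (4, 5, 10, 20, 50):
    print(f"{n:>3} taxa: {n_unrooted(n):.3e} unrooted, {n_rooted(n):.3e} rooted")

# smallest number of taxa whose rooted trees outnumber Avogadro's constant
first_above_avogadro = next(n for n in range(2, 100) if n_rooted(n) > AVOGADRO)
print(first_above_avogadro)
```

Python integers have arbitrary precision, so the counts are exact. Only the printed scientific notation is rounded.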

What type of data can we use?
DATA
- distances: immunology, DNA-DNA hybridization
- discrete characters:
  - binary, e.g. 11010010011
  - multistate, e.g. ABCDEF:
    - unordered, e.g. ACGTTAGCT
    - ordered, e.g. A → B → C

Types of data
Nucleotide and protein sequences:
H_sapiens MTPMRKINPLMKLINHSFIDLPTPSNISAWWNFGS
P_troglod ATGACCCCGACACGCAAAATTAACCCACTAATAAA
site = character
base = character state

Types of data
retroelements: SINE (Alu, B1, B2), LINE
microsatellites, SNP

Problem with homology of sequences
[figure]


Problem with homology of sequences
Individual sites in DNA sequences may not be fully independent!

Sequences
DNA databases:
- EMBL (European Molecular Biology Laboratory) – European Bioinformatics Institute, Hinxton, UK: http://www.ebi.ac.uk/embl/
- GenBank – NCBI (National Center for Biotechnology Information), Bethesda, Maryland, USA: http://www.ncbi.nlm.nih.gov/Genbank/
- DDBJ (DNA Data Bank of Japan) – National Institute of Genetics, Mishima, Japan: http://www.ddbj.nig.ac.jp/
Database management: usually the Sybase or ORACLE packages
Outputs: ASCII (American Standard Code for Information Interchange)

Sequences
Protein databases:
- SWISS-PROT – University of Geneva & Swiss Institute of Bioinformatics: http://www.expasy.ch/sprot/ and http://www.ebi.ac.uk/swissprot/
- PIR (Protein Information Resource) – NBRF (National Biomedical Research Foundation, Washington, D.C., USA) & Tokyo University & JIPID (Japanese International Protein Information Database, Tokyo) & MIPS (Martinsried Institute for Protein Sequences, Martinsried, Germany): http://www-nbrf.georgetown.edu/
- PRF/SEQDB (Protein Research Foundation) – Osaka, Japan: http://www.prf.or.jp/en/os.htm
- PDB (Protein Data Bank) – Rutgers, The State University of New Jersey & San Diego Supercomputer Center, University of California & National Institute of Standards and Technology: http://www.rcsb.org/pdb/

File formats:
FASTA:
>H_sapiens
ATGACCCCAATACGCAAAATTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGACCTCCCCACCC
CATCCAACATCTCCGCATGATGAAACTTCGGCTCACTCCTTGGCGCCTGCCTGATCCTCCAAATCACCAC
AGGACTATTCCTAGCCATACACTACTCACCAGACGCCTCAACCGCCTTTTCATCAATCGCCCACATCACT
CGAGACGTAAATTATGGCTGAATCATCCGCTACCTTCACGCCAATGGCGCCTCAATATTCTTTATCTGCC
TCTTCCTACACATCGGGCGAGGCCTATATTACGGATCATTTCTCTACTCAGAAACCTGAAACATCGGCAT
...
>P_troglod
ATGACCCCGACACGCAAAATTAACCCACTAATAAAATTAATTAATCACTCATTTATCGACCTCCCCACCC
CATCCAACATTTCCGCATGATGGAACTTCGGCTCACTTCTCGGCGCCTGCCTAATCCTTCAAATTACCAC
AGGATTATTCCTAGCTATACACTACTCACCAGACGCCTCAACCGCCTTCTCGTCGATCGCCCACATCACC
CGAGACGTAAACTATGGTTGGATCATCCGCTACCTCCACGCTAACGGCGCCTCAATATTTTTTATCTGCC
TCTTCCTACACATCGGCCGAGGTCTATATTACGGCTCATTTCTCTACCTAGAAACCTGAAACATTGGCAT
...
>P_paniscus
ATGACCCCAACACGCAAAATCAACCCACTAATAAAATTAATTAATCACTCATTTATCGACCTCCCCACCC
CATCCAATATTTCCACATGATGAAACTTCGGCTCACTTCTCGGCGCCTGCCTAATCCTTCAAATCACCAC
AGGACTATTCCTAGCTATACACTACTCACCAGACGCCTCAACCGCCTTCTCATCGATCGCCCACATTACC
CGAGACGTAAACTATGGTTGAATCATCCGCTACCTTCACGCTAACGGCGCCTCAATACTTTTCATCTGCC
TCTTCCTACACGTCGGTCGAGGCCTATATTACGGCTCATTTCTCTACCTAGAAACCTGAAACATTGGCAT
...
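A FASTA file can be read with a few lines of code. The Python sketch below assumes two things: the identifier is the first word after '>', and sequence lines may wrap. `read_fasta` is a hypothetical helper, not a library function. Duplicate identifiers would overwrite each other.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {identifier: sequence}, keeping file order.

    The identifier is the first word after '>'; sequence lines may wrap.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())  # FASTA is case-insensitive
    return {k: "".join(parts) for k, parts in records.items()}


example = """>H_sapiens
ATGACCCCAATACGCAAAAT
TAACCCCCTAATAAAATTAA
>P_troglod
ATGACCCCGACACGCAAAAT
TAACCCACTAATAAAATTAA
"""
seqs = read_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq))
```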

GenBank:
ORIGIN
        1 tgaaatgaag atattctctt ctcaagacat caagaagaag gaactactcc ccaccaccag
       61 cacccaaagc tggcattcta attaaactac ttcttgtgta cataaattta catagtacaa
      121 tagtacattt atgtatatcg tacattaaac tattttcccc aagcatataa gcaagtacat
      181 ttaatcaatg atataggcca taaaacaatt atcaacataa actgatacaa accatgaata
      241 ttatactaat acatcaaatt aatgctttaa agacatatct gtgttatctg acatacacca
      301 tacagtcata aactcttctc ttccatatga ctatcccctt ccccatttgg tctattaatc
      361 taccatcctc cgtgaaacca acaacccgcc caccaatgcc cctcttctcg ctccgggccc
      421 attaaacttg ggggtagcta aactgaaact ttatcagaca tctggttctt acttcagggc
      481 catcaaatgc gttatcgccc atacgttccc cttaaataag acatctcgat ggtatcgggt
      541 ctaatcagcc catgaccaac ataactgtgg tgtcatgcat ttggtatttt tttattttgg
      601 cctactttca tcaacatagc cgtcaaggca tgaaaggaca gcacacagtc tagacgcacc
      661 tacggtgaag aatcattagt ccgcaaaacc caatcaccta aggctaatta ttcatgcttg
      721 ttagacataa atgctactca ataccaaatt ttaactctcc aaacccccca accccctcct
      781 cttaatgcca aaccccaaaa acactaagaa cttgaaagac atatattatt aactatcaaa
      841 ccctatgtcc tgatcgattc tagtagttcc caaaatatga ctcatatttt agtacttgta
      901 aaaattttac aaaatcatgc tccgtgaacc aaaactctaa tcacactcta ttacgcaata
      961 aatattaaca agttaatgta gcttaataac aaagcaaagc actgaaaatg cttagatgga
     1021 taattttatc cca
//

PHYLIP (“interleaved” format):
6 1120
H_sapiens    ATGACCCCAA TACGCAAAAT TAACCCCCTA ATAAAATTAA TTAACCACTC
P_troglod    ATGACCCCGA CACGCAAAAT TAACCCACTA ATAAAATTAA TTAATCACTC
P_paniscus   ATGACCCCAA CACGCAAAAT CAACCCACTA ATAAAATTAA TTAATCACTC
G_gorilla    ATGACCCCTA TACGCAAAAC TAACCCACTA GCAAAACTAA TTAACCACTC
P_pygmaeus   ATGACCCCAA TACGCAAAAC CAACCCACTA ATAAAATTAA TTAACCACTC
H_lar        ATGACCCCCC TGCGCAAAAC TAACCCACTA ATAAAACTAA TCAACCACTC
             ATTCATCGAC CTCCCCACCC CATCCAACAT CTCCGCATGA TGAAACTTCG
             ATTTATCGAC CTCCCCACCC CATCCAACAT TTCCGCATGA TGGAACTTCG
             ATTTATCGAC CTCCCCACCC CATCCAATAT TTCCACATGA TGAAACTTCG
             ATTCATTGAC CTCCCTACCC CGTCCAACAT CTCCACATGA TGAAACTTCG
             ACTCATCGAC CTCCCCACCC CATCAAACAT CTCTGCATGA TGGAACTTCG
             ACTTATCGAC CTTCCAGCCC CATCCAACAT TTCTATATGA TGAAACTTTG
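Reading the interleaved layout means appending each later block to the taxa in their first-block order. The Python sketch below uses a relaxed reading of PHYLIP: names contain no spaces and are separated from the sequence by whitespace. Strict PHYLIP (names padded to exactly 10 characters, possibly containing spaces) would need a different rule. The function name is hypothetical. The header counts must match the data actually present, so the 100-site excerpt above with its header "6 1120" would be rejected.

```python
def read_phylip_interleaved(text):
    """Parse a relaxed interleaved PHYLIP alignment into ({name: sequence}, nchar).

    Assumes taxon names contain no spaces and appear only in the first block;
    later blocks list the taxa in the same order. Spaces inside sequences and
    blank lines between blocks are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = map(int, lines[0].split()[:2])
    names, seqs = [], []
    for i, line in enumerate(lines[1:]):
        if i < ntax:                           # first block: name + sequence
            name, *chunks = line.split()
            names.append(name)
            seqs.append("".join(chunks))
        else:                                  # later blocks: sequence only
            seqs[i % ntax] += "".join(line.split())
    alignment = dict(zip(names, seqs))
    if any(len(s) != nchar for s in alignment.values()):
        raise ValueError("sequence length does not match the header")
    return alignment, nchar


example = """3 20
H_sapiens    ATGACCCCAA
P_troglod    ATGACCCCGA
P_paniscus   ATGACCCCAA

             TACGCAAAAT
             CACGCAAAAT
             CACGCAAAAT
"""
alignment, nchar = read_phylip_interleaved(example)
print(alignment)
```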

NEXUS (PAUP*, “interleaved”):
#NEXUS
begin data;
dimensions ntax=6 nchar=1120;
format datatype=DNA interleave missing=? gap=-;
matrix
P_troglod   ATGACCCCGACACGCAAAATTAACCCACTAATAAAATTAATTAATCACTC
P_paniscus  ATGACCCCAACACGCAAAATCAACCCACTAATAAAATTAATTAATCACTC
H_sapiens   ATGACCCCAATACGCAAAATTAACCCCCTAATAAAATTAATTAACCACTC
G_gorilla   ATGACCCCTATACGCAAAACTAACCCACTAGCAAAACTAATTAACCACTC
P_pygmaeus  ATGACCCCAATACGCAAAACCAACCCACTAATAAAATTAATTAACCACTC
H_lar       ATGACCCCCCTGCGCAAAACTAACCCACTAATAAAACTAATCAACCACTC
P_troglod   ATTTATCGACCTCCCCACCCCATCCAACATTTCCGCATGATGGAACTTCG
P_paniscus  ATTTATCGACCTCCCCACCCCATCCAATATTTCCACATGATGAAACTTCG
H_sapiens   ATTCATCGACCTCCCCACCCCATCCAACATCTCCGCATGATGAAACTTCG
G_gorilla   ATTCATTGACCTCCCTACCCCGTCCAACATCTCCACATGATGAAACTTCG
P_pygmaeus  ACTCATCGACCTCCCCACCCCATCAAACATCTCTGCATGATGGAACTTCG
H_lar       ACTTATCGACCTTCCAGCCCCATCCAACATTTCTATATGATGAAACTTTG
;
end;

Clustal X:
P_troglod  ATGACCCCGACACGCAAAATTAACCCACTAATAAAATTAATTAATCACTCATTTATCGAC
P_paniscus ATGACCCCAACACGCAAAATCAACCCACTAATAAAATTAATTAATCACTCATTTATCGAC
H_sapiens  ATGACCCCAATACGCAAAATTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGAC
G_gorilla  ATGACCCCTATACGCAAAACTAACCCACTAGCAAAACTAATTAACCACTCATTCATTGAC
P_pygmaeus ATGACCCCAATACGCAAAACCAACCCACTAATAAAATTAATTAACCACTCACTCATCGAC
H_lar      ATGACCCCCCTGCGCAAAACTAACCCACTAATAAAACTAATCAACCACTCACTTATCGAC
           ********    *******  ***** ***  **** **** ** ****** * ** ***
P_troglod CTCCCCACCCCATCCAACATTTCCGCATGATGGAACTTCGGCTCACTTCTCGGCGCCTGC
P_paniscus CTCCCCACCCCATCCAATATTTCCACATGATGAAACTTCGGCTCACTTCTCGGCGCCTGC
H_sapiens  CTCCCCACCCCATCCAACATCTCCGCATGATGAAACTTCGGCTCACTCCTTGGCGCCTGC
G_gorilla  CTCCCTACCCCGTCCAACATCTCCACATGATGAAACTTCGGCTCACTCCTTGGTGCCTGC
P_pygmaeus CTCCCCACCCCATCAAACATCTCTGCATGATGGAACTTCGGCTCACTTCTAGGCGCCTGC
H_lar      CTTCCAGCCCCATCCAACATTTCTATATGATGAAACTTTGGTTCACTCCTAGGCGCCTGC
           ** **  **** ** ** ** **   ****** ***** ** ***** ** ** ******

FASTQ:
Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2 and must contain the same number of symbols as there are letters in the sequence.
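The four-line rule above translates directly into a parser. The Python sketch below assumes the quality symbols are Phred scores with an ASCII offset of 33 (the common Sanger/Illumina 1.8+ convention). The slide does not state this, and some older files use an offset of 64. The helper name and the example read are hypothetical.

```python
def read_fastq(text, offset=33):
    """Yield (identifier, sequence, Phred quality scores) from 4-line FASTQ records.

    Assumes Phred+33 encoding (offset=33); some older files use offset=64.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ text must consist of complete 4-line records")
    for i in range(0, len(lines), 4):
        title, seq, plus, qual = lines[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at non-empty line {i + 1}")
        if len(qual) != len(seq):              # Line 4 must match Line 2 in length
            raise ValueError("quality string and sequence differ in length")
        identifier = (title[1:].split() or [""])[0]
        yield identifier, seq, [ord(c) - offset for c in qual]


example = "@read1 hypothetical read\nACGT\n+\nII#5\n"
records = list(read_fastq(example))
print(records)
```

Records are read strictly in groups of four lines because a quality line may itself start with '@'.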


Progressive alignment - ClustalX
3 phases:
1. alignment of sequence pairs → pairwise distances
2. construction of a guide tree (e.g. Neighbor-Joining)
3. alignment of all sequences according to the guide tree
[figure: phases I, II, III]
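Phase 1 can be illustrated with a classic global pairwise alignment. The Python sketch below is a Needleman-Wunsch alignment with hypothetical scores (match +1, mismatch −1, linear gap −1), not ClustalX's own scoring scheme. It is followed by a p-distance over gap-free columns, which is only one possible choice of pairwise distance.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap cost; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)

    def sub(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    # F[i][j] = best score for aligning s[:i] with t[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j), F[i - 1][j] + gap, F[i][j - 1] + gap)
    # traceback from the bottom-right corner: diagonal first, then gap in t, then gap in s
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


def p_distance_aligned(x, y):
    """Proportion of differing sites among alignment columns without gaps."""
    pairs = [(c, d) for c, d in zip(x, y) if c != "-" and d != "-"]
    return sum(c != d for c, d in pairs) / len(pairs)


score, ax, ay = needleman_wunsch("AGGTT", "AGTT")
print(score, ax, ay, p_distance_aligned(ax, ay))
```

For AGGTT vs. AGTT, the alignments A-GTT and AG-TT both score 3. The traceback order chosen here returns A-GTT. Such arbitrary tie-breaking is one source of the inconsistent gap placement illustrated in the next example.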


Problem with progressive alignment
6 species:
gorilla  AGGTT
horse    AG-TT
panda    AG-TT
penguin  A-GTT
chicken  A-GTT
ostrich  AGGTT
[figure: the six sequences merged step by step along the guide tree]
Gaps placed in an early pairwise step (AG-TT in the mammals vs. A-GTT in the birds) are kept in all later steps (“once a gap, always a gap”), so the final alignment can put equivalent gaps into different columns.

Many other alignment programs: e.g. MAFFT, MUSCLE, Geneious...


There are also methods without alignment:


Methods by data type:
- distances: UPGMA, neighbor-joining, Fitch-Margoliash, minimum evolution
- characters: maximum parsimony, maximum likelihood, Bayesian analysis

How to assess the methods?
- Efficiency: how fast is the method?
- Power: how many characters do we need?
- Consistency: does increasing the number of characters lead to the true tree?
- Robustness: how does the method perform when its assumptions are violated?
- Falsifiability: does it allow testing of its assumptions?

MAXIMUM PARSIMONY, MP
(maximální úspornost)

character:          I   II  III
A                   1   0   1
B                   0   0   1
C                   1   0   0
D                   0   1   0
E                   1   0   1
steps on the tree:  2   1   2

minimal number of steps = 3
real number of steps = 5
⇒ 2 extra steps → homoplasy
William of Ockham (c. 1287 – 1347): Occam's razor


Estimation of number of steps: Fitch algorithm
[figure: example tree with internal nodes w, x, y, z]
1. arbitrary root
2. Downward pass:
   w = C or T
   x = T
   y = A or T
   z = T
3. Upward pass:
   z = T or A
   DELTRAN (DELayed TRANsformation)
   ACCTRAN (ACCelerated TRANsformation)
total length = 3
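The first (tips-to-root) pass of the Fitch algorithm fits in a few lines. This Python sketch assumes three things: unordered characters, a rooted binary tree written as nested tuples, and hypothetical example trees and tip states. The second pass, with its DELTRAN/ACCTRAN choice, is not implemented. For unordered characters the resulting length does not depend on where the arbitrary root is placed.

```python
def fitch(tree, states):
    """Fitch's first (tips-to-root) pass for one character.

    tree: nested 2-tuples of taxon names, e.g. ((("t1", "t2"), "t3"), ("t4", "t5"))
    states: dict taxon -> observed state
    Returns (state set at this node, number of steps in this subtree).
    """
    if isinstance(tree, str):                  # tip: its observed state
        return {states[tree]}, 0
    left, right = tree
    set_l, steps_l = fitch(left, states)
    set_r, steps_r = fitch(right, states)
    common = set_l & set_r
    if common:                                 # intersection: no extra step
        return common, steps_l + steps_r
    return set_l | set_r, steps_l + steps_r + 1  # union: one extra step


def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters; matrix: taxon -> string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
               for i in range(n_chars))


# hypothetical tree and tip states for a single nucleotide site
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
root_set, steps = fitch(tree, {"t1": "C", "t2": "T", "t3": "T", "t4": "A", "t5": "T"})
print(root_set, steps)

# the 0/1 matrix from the MP slide on a hypothetical tree ((A,E),(C,(B,D)))
mp_matrix = {"A": "101", "B": "001", "C": "100", "D": "010", "E": "101"}
mp_length = tree_length((("A", "E"), ("C", ("B", "D"))), mp_matrix)
print(mp_length)
```

On the hypothetical tree ((A,E),(C,(B,D))) the MP matrix needs 4 steps, one more than the minimum of 3.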

Problem of homoplasy:
parsimony-informative and non-informative characters (sites); non-informative characters are:
  - invariant sites (symplesiomorphies)
  - singletons (autapomorphies)
consistency index, CI: $CI = m/s$
retention index, RI: $RI = (g - s)/(g - m)$
rescaled consistency index, RC: $RC = CI \times RI$
homoplasy index, HI: $HI = 1 - CI$
where
m = minimum possible number of steps
s = minimum number of steps needed to explain the data on the given tree
g = maximum number of steps on any tree
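The ensemble indices for a whole matrix follow directly from these definitions. This Python sketch assumes unordered characters. For such a character, m is the number of observed states minus 1, and g is the number of taxa minus the frequency of the most common state (the length on a star tree). The per-character step counts s are for a hypothetical tree ((A,E),(C,(B,D))) and were counted by hand.

```python
from collections import Counter


def min_max_steps(column):
    """m and g for one unordered character (column of states across taxa)."""
    counts = Counter(column)
    m = len(counts) - 1                       # every additional state needs >= 1 change
    g = len(column) - max(counts.values())    # star tree: change every non-majority taxon
    return m, g


def homoplasy_indices(matrix, steps_per_char):
    """Ensemble CI, RI, RC and HI; matrix maps taxon -> string of states."""
    m = g = 0
    for column in zip(*matrix.values()):
        m_i, g_i = min_max_steps(column)
        m += m_i
        g += g_i
    s = sum(steps_per_char)
    ci = m / s
    ri = (g - s) / (g - m)                    # undefined if g == m (nothing informative)
    return {"CI": ci, "RI": ri, "RC": ci * ri, "HI": 1 - ci}


matrix = {"A": "101", "B": "001", "C": "100", "D": "010", "E": "101"}
indices = homoplasy_indices(matrix, steps_per_char=[1, 1, 2])
print(indices)
```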

Methods of parsimony:
- Fitch: X → Y and Y → X; unordered characters (A → T or A → G etc.)
- Wagner: X → Y and Y → X; ordered characters (1 → 2 → 3)
- Dollo: X → Y and Y → X, but after that X → Y is no longer possible
  … restriction-site and restriction-fragment data
  “relaxed Dollo criterion”
- Camin-Sokal: X → Y, not Y → X
  … SINE, LINE
- weighted parsimony, e.g. transversion parsimony
- generalized parsimony: cost matrix = step matrix

[figure: step matrices for Wagner, Fitch, Dollo* and transversion parsimony]
*) M is an arbitrarily large number, guaranteeing that only one transformation to each derived state will be permitted.
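A tree's cost under any step matrix can be computed with Sankoff's dynamic-programming algorithm, which the slide does not name. This Python sketch makes several assumptions: nucleotide states, a rooted binary tree as nested tuples, and a hypothetical four-taxon example. Costs are read as parent state → child state. Asymmetric matrices (Camin-Sokal, or Dollo with a large M) can therefore be plugged in, although their cost then depends on the root.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def step_matrix(transition_cost, transversion_cost):
    """cost[x][y] = cost of a change from state x (parent) to state y (child)."""
    cost = {}
    for x in STATES:
        cost[x] = {}
        for y in STATES:
            if x == y:
                cost[x][y] = 0
            elif (x in PURINES) == (y in PURINES):
                cost[x][y] = transition_cost      # A<->G, C<->T
            else:
                cost[x][y] = transversion_cost
    return cost


def sankoff(tree, tips, cost):
    """Return {state: minimal cost of the subtree if its root is in that state}."""
    if isinstance(tree, str):                     # tip: only the observed state is allowed
        return {s: 0 if s == tips[tree] else math.inf for s in STATES}
    children = [sankoff(child, tips, cost) for child in tree]
    return {s: sum(min(cost[s][t] + child[t] for t in STATES) for child in children)
            for s in STATES}


def tree_cost(tree, tips, cost):
    return min(sankoff(tree, tips, cost).values())


FITCH = step_matrix(1, 1)            # every change costs one step
TRANSVERSION = step_matrix(0, 1)     # transversion parsimony: transitions are free

tree = (("t1", "t2"), ("t3", "t4"))  # hypothetical tree and tip states
tips = {"t1": "A", "t2": "G", "t3": "A", "t4": "G"}
print(tree_cost(tree, tips, FITCH), tree_cost(tree, tips, TRANSVERSION))
```

Both example changes are A ↔ G transitions, so the tree costs 2 steps under the Fitch matrix and 0 under transversion parsimony.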

Parsimony and consistency
“true” tree: ((A,B),(C,D))*
“wrong” tree: ((A,C),(B,D))*
[figure: branch lengths p >> q]
* tree written in Newick format

Parsimony and consistency
[figure: “Felsenstein zone”]
In the Felsenstein zone, parsimony is inconsistent: as more characters are added, it converges on the wrong tree.

Parsimony and consistency
Simulation: [figure]


Parsimony and consistency
long-branch attraction (LBA): [figure: long branches]

Search for optimal tree
1.Exact methods:
a) exhaustive search
b) branch-and-bound

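branch-and-bound
- starts with 3 taxa; the remaining taxa are added sequentially
- if a partial tree is already longer than a reference tree (e.g. a randomly chosen complete tree), this path of the search is abandoned, because adding more taxa can never make the tree shorter; whenever a shorter complete tree is found, it becomes the new reference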
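A compact Python sketch of branch-and-bound for Fitch parsimony, assuming unordered characters and only a few taxa. Unrooted trees are stored as (outgroup, subtree) nested tuples. With that representation, attaching the next taxon to each edge of the subtree (including its edge to the outgroup) enumerates every unrooted tree exactly once. The initial bound comes from an arbitrary "caterpillar" tree rather than a random one. All names are illustrative.

```python
def fitch_steps(tree, states):
    """Fitch steps for one unordered character; returns (state set, steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (set_l, steps_l), (set_r, steps_r) = (fitch_steps(child, states) for child in tree)
    common = set_l & set_r
    if common:
        return common, steps_l + steps_r
    return set_l | set_r, steps_l + steps_r + 1


def length(tree, matrix):
    """Parsimony length summed over all characters; taxa absent from the tree are ignored."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch_steps(tree, {t: row[i] for t, row in matrix.items()})[1]
               for i in range(n_chars))


def insertions(tree, leaf):
    """All trees made by attaching `leaf` to one edge of `tree` (incl. its root edge)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(matrix):
    """Exact search for one most-parsimonious unrooted tree (few taxa only)."""
    outgroup, first, second, *rest = list(matrix)
    # initial bound: an arbitrary complete "caterpillar" tree
    start = (first, second)
    for taxon in rest:
        start = (start, taxon)
    best = {"tree": (outgroup, start), "length": length((outgroup, start), matrix)}

    def search(sub, remaining):
        current = length((outgroup, sub), matrix)
        if current > best["length"]:
            return                      # bound: more taxa can never shorten the tree
        if not remaining:
            if current < best["length"]:
                best["tree"], best["length"] = (outgroup, sub), current
            return
        for bigger in insertions(sub, remaining[0]):
            search(bigger, remaining[1:])

    search((first, second), rest)       # start from the single 3-taxon tree
    return best["tree"], best["length"]


mp_matrix = {"A": "101", "B": "001", "C": "100", "D": "010", "E": "101"}
best_tree, best_length = branch_and_bound(mp_matrix)
print(best_tree, best_length)
```

For the five-taxon MP matrix the search reports a minimum length of 4. Characters I and III define incompatible splits, so no tree can reach the theoretical minimum of 3.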

[figure: all possible trees]
2. Heuristic search
- stepwise addition
- star decomposition
- branch swapping
[figure: heuristic search]

Branch swapping:
- nearest-neighbor interchanges (NNI)
- subtree pruning and regrafting (SPR)
- tree bisection and reconnection (TBR)

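Evolutionary models and distance methods
Jukes-Cantor (JC): equal base frequencies, equal substitution rates
                  base after substitution
                    A      C      G      T
original base  A  -3/4    1/4    1/4    1/4
               C   1/4   -3/4    1/4    1/4
               G   1/4    1/4   -3/4    1/4
               T   1/4    1/4    1/4   -3/4
$$Q = \begin{pmatrix} - & a & a & a \\ a & - & a & a \\ a & a & - & a \\ a & a & a & - \end{pmatrix}$$
(the diagonal entries “−” are set so that each row sums to zero)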

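Kimura 2-parameter (K2P): transitions ≠ transversions (Ts/Tv)
$$Q = \begin{pmatrix} - & b & a & b \\ b & - & b & a \\ a & b & - & b \\ b & a & b & - \end{pmatrix}$$
(a = transition rate, b = transversion rate; bases in the order A, C, G, T)
If a = b, K2P = JC.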

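Felsenstein (F81): different base frequencies
$$Q = \begin{pmatrix} - & \pi_C & \pi_G & \pi_T \\ \pi_A & - & \pi_G & \pi_T \\ \pi_A & \pi_C & - & \pi_T \\ \pi_A & \pi_C & \pi_G & - \end{pmatrix}$$
If πA = πC = πG = πT, F81 = JC.
Hasegawa-Kishino-Yano (HKY): different base frequencies, transitions ≠ transversions
$$Q = \begin{pmatrix} - & \pi_C b & \pi_G a & \pi_T b \\ \pi_A b & - & \pi_G b & \pi_T a \\ \pi_A a & \pi_C b & - & \pi_T b \\ \pi_A b & \pi_C a & \pi_G b & - \end{pmatrix}$$
General time-reversible (GTR, REV): different base frequencies, different substitution rates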
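These matrices share one structure: each off-diagonal entry is an exchangeability times the frequency of the target base, $Q_{ij} = r_{ij}\pi_j$ (the GTR form). The Python sketch below builds JC, K2P, F81 and HKY as special cases with one transition rate a and one transversion rate b. The unequal frequencies and the value a = 4 are hypothetical. A constant factor is absorbed into the rates, so with equal frequencies every entry is 1/4 of the corresponding slide entry; JC with rate 1 reproduces the −3/4 / 1/4 table. Full GTR would pass six different exchangeabilities instead of `two_rate`.

```python
from itertools import combinations

BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """A<->G and C<->T are transitions; other changes are transversions."""
    return x != y and (x in PURINES) == (y in PURINES)


def two_rate(a, b):
    """Exchangeabilities: a for transitions, b for transversions (K2P/HKY style)."""
    return {frozenset(pair): (a if is_transition(*pair) else b)
            for pair in combinations(BASES, 2)}


def gtr_q(pi, rates):
    """Rate matrix Q[i][j] = rates[{i, j}] * pi[j]; the diagonal makes rows sum to 0."""
    q = {}
    for i in BASES:
        q[i] = {j: rates[frozenset((i, j))] * pi[j] for j in BASES if j != i}
        q[i][i] = -sum(q[i].values())
    return q


equal = {x: 0.25 for x in BASES}
unequal = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}   # hypothetical frequencies

jc = gtr_q(equal, two_rate(1.0, 1.0))      # JC
k2p = gtr_q(equal, two_rate(4.0, 1.0))     # K2P (a != b)
f81 = gtr_q(unequal, two_rate(1.0, 1.0))   # F81 (unequal frequencies)
hky = gtr_q(unequal, two_rate(4.0, 1.0))   # HKY (both)
print(jc["A"])
```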

model                                    base frequencies      substitution rates
Jukes-Cantor (JC)                        πA = πC = πG = πT     a = b
Felsenstein (F81)                        πA ≠ πC ≠ πG ≠ πT     a = b
Kimura's two-parameter (K2P)             πA = πC = πG = πT     a ≠ b
Hasegawa-Kishino-Yano (HKY)              πA ≠ πC ≠ πG ≠ πT     a ≠ b
Felsenstein (F84)                        πA ≠ πC ≠ πG ≠ πT     a = c = d = f = 1, b = 1 + K/πR, e = 1 + K/πY,
                                                               where πR = πA + πG, πY = πC + πT
Kimura's three-substitution-type (K3ST)  πA = πC = πG = πT     a ≠ b
Tamura-Nei (TrN)                         πA ≠ πC ≠ πG ≠ πT     a ≠ b
General time-reversible (GTR)            πA ≠ πC ≠ πG ≠ πT     a, b, c, d, e, f
Key: unequal base frequencies; more than 1 type of substitution; 2 transition types

Heterogeneity of substitution rates in different parts of sequences:
- gamma distribution: shape parameter α; discrete gamma model
- invariant sites
→ GTR+Γ+I (or GTR+G+I)
The higher α, the more homogeneous the substitution rates.

Model comparison:
Likelihood ratio test (LRT), for nested models (model 2 is the more complex one, with p2 > p1 parameters):
$LR = 2(\ln L_2 - \ln L_1)$, compared with a $\chi^2$ distribution with $p_2 - p_1$ degrees of freedom
Akaike information criterion (AIC), also for non-nested models:
$AIC = -2\ln L + 2p$, where p = number of free parameters
better model → lower AIC
Bayesian information criterion (BIC), also for non-nested models:
$BIC = -2\ln L + p\ln N$, where N = sample size
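A stdlib-only Python sketch of the three criteria. The chi-square tail probability is computed from the regularized incomplete gamma function; with SciPy one would normally call `scipy.stats.chi2.sf`. The log-likelihoods and parameter counts in the example are hypothetical. N is a user-supplied sample size, often taken as the number of alignment sites, which is itself a modelling choice.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with a positive integer df (stdlib only).

    Uses the regularized upper incomplete gamma function Q(df/2, x/2) and the
    recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2:                                # odd df: Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:                                     # even df: Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def lrt(lnl_simple, p_simple, lnl_complex, p_complex):
    """Likelihood ratio test of two nested models: returns (LR, p-value)."""
    lr = 2 * (lnl_complex - lnl_simple)
    return lr, chi2_sf(lr, p_complex - p_simple)


def aic(lnl, p):
    return -2 * lnl + 2 * p


def bic(lnl, p, n):
    return -2 * lnl + p * math.log(n)


# hypothetical log-likelihoods and free-parameter counts of two nested models
lr, p_value = lrt(-5000.0, 1, -4990.0, 4)
print(lr, p_value)
```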

Model comparison:
hierarchical LRT – ModelTest (Posada and Crandall), jModelTest


Model comparison:
dynamic LRT: [figure]

Model comparison:
More parameters ⇒ more realism, but also less confidence (more parameters are estimated from the same amount of data!)

Distances
- computed for each pair of taxa; the resulting distance (or similarity) matrix is used for tree inference
- distance methods are based on the assumption that if we knew the true distances, we could easily infer the true phylogeny
- advantage: very fast and simple (can be done even with a calculator)

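position:     1       10        20        30
sequence 1:   ACCCGTTAAGCTTAACGTACTTGGATCGAT
sequence 2:   ACCCGTTAGGCTTAATGTACGTGGATCGAT
p-distance:   p = k/n = 3/30 = 0.10 (k = number of differing sites, n = number of compared sites)
Problem of saturation: multiple substitutions at the same site are not visible, so p underestimates the true amount of change. [figure]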

Distances for some models: [figure]
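The model formulas on this slide are in a figure. The standard Jukes-Cantor and Kimura 2-parameter corrections are $d_{JC} = -\tfrac34 \ln(1 - \tfrac43 p)$ and $d_{K2P} = -\tfrac12\ln(1 - 2P - Q) - \tfrac14\ln(1 - 2Q)$, with P and Q the proportions of transitions and transversions. The Python sketch below applies them to the two 30-site sequences above, which differ by 2 transitions and 1 transversion. Skipping gap positions is an assumption. The corrected distances (about 0.107 and 0.108) are slightly larger than p = 0.10 because they account for hidden multiple substitutions.

```python
import math

PURINES = {"A", "G"}


def count_differences(s1, s2):
    """Return (sites compared, transitions, transversions) over gap-free aligned positions."""
    n = ts = tv = 0
    for x, y in zip(s1, s2):
        if x == "-" or y == "-":
            continue
        n += 1
        if x != y:
            if (x in PURINES) == (y in PURINES):
                ts += 1          # A<->G or C<->T
            else:
                tv += 1
    return n, ts, tv


def p_distance(s1, s2):
    n, ts, tv = count_differences(s1, s2)
    return (ts + tv) / n


def jc_distance(s1, s2):
    """Jukes-Cantor: d = -3/4 ln(1 - 4p/3); undefined once p >= 3/4 (saturation)."""
    p = p_distance(s1, s2)
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(s1, s2):
    """Kimura 2-parameter: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    n, ts, tv = count_differences(s1, s2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


seq1 = "ACCCGTTAAGCTTAACGTACTTGGATCGAT"
seq2 = "ACCCGTTAGGCTTAATGTACGTGGATCGAT"
print(p_distance(seq1, seq2), jc_distance(seq1, seq2), k2p_distance(seq1, seq2))
```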


Cluster analysis - UPGMA
1. Find the minimum d(ij).
2. Calculate a new matrix, e.g. d(ŠB-k) = [d(B-k) + d(Š-k)]/2.
3. Repeat steps 1 and 2.

                chimp   bonobo  gorilla human   orang.
chimp (Š)       --
bonobo (B)      0.0118  --
gorilla (G)     0.0427  0.0416  --
human (Č)       0.0382  0.0327  0.0371  --
orangutan (O)   0.0953  0.0916  0.0965  0.0928  --

[figure: resulting tree with leaves Š, B, Č, G, O]
UPGMA (unweighted pair-group method using arithmetic means):
d[(BŠČ)G] = {d(BG) + d(ŠG) + d(ČG)}/3
WPGMA: d[(BŠČ)G] = {d[(BŠ)G] + d(ČG)}/2
single-linkage (nearest-neighbor method)
complete-linkage (farthest-neighbor method)

                ŠB      gorilla human   orang.
ŠB              --
gorilla (G)     0.0422  --
human (Č)       0.0355  0.0371  --
orangutan (O)   0.0935  0.0965  0.0928  --
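A minimal UPGMA sketch in Python on the primate distances above, with English taxon names in place of the abbreviations. Clusters are labelled by their Newick string. A node's height is half the distance at which its two clusters join, and ties are broken by input order. The size-weighted average used here equals the UPGMA formula above, e.g. d[(BŠČ)G] = {d(BG) + d(ŠG) + d(ČG)}/3. The first merge reproduces the reduced matrix, e.g. d(ŠB, G) = (0.0427 + 0.0416)/2 ≈ 0.0422.

```python
from itertools import combinations


def upgma(names, matrix):
    """UPGMA on a symmetric distance matrix; returns (Newick topology, node heights)."""
    size = {n: 1 for n in names}           # number of taxa in each cluster
    height = {n: 0.0 for n in names}       # ultrametric height of each cluster's node
    dist = {frozenset((names[i], names[j])): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    active = list(names)
    while len(active) > 1:
        # 1. find the closest pair of clusters
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        new = f"({a},{b})"
        size[new] = size[a] + size[b]
        height[new] = dist[frozenset((a, b))] / 2
        # 2. new distances = average over all taxon pairs (size-weighted)
        for k in active:
            if k not in (a, b):
                dist[frozenset((new, k))] = (size[a] * dist[frozenset((a, k))]
                                             + size[b] * dist[frozenset((b, k))]) / size[new]
        # 3. repeat with the merged cluster
        active = [k for k in active if k not in (a, b)] + [new]
    return active[0] + ";", height


taxa = ["chimp", "bonobo", "gorilla", "human", "orangutan"]
d = [[0, 0.0118, 0.0427, 0.0382, 0.0953],
     [0.0118, 0, 0.0416, 0.0327, 0.0916],
     [0.0427, 0.0416, 0, 0.0371, 0.0965],
     [0.0382, 0.0327, 0.0371, 0, 0.0928],
     [0.0953, 0.0916, 0.0965, 0.0928, 0]]
tree, heights = upgma(taxa, d)
print(tree)
```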

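UPGMA and consistency
additive distances: $d_{AB} + d_{CD} \le \max(d_{AC} + d_{BD},\; d_{AD} + d_{BC})$ for every four taxa A, B, C, D (four-point condition), i.e. the distance between 2 taxa equals the sum of the branches connecting them
ultrametric distances: $d_{AC} \le \max(d_{AB},\; d_{BC})$ for every three taxa A, B, C (three-point condition)
[figure: additive tree (taxa A–D) and ultrametric tree (taxa A–C)]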
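Both conditions can be checked directly on a distance matrix. This Python sketch tests every triple (three-point condition) and every quartet (four-point condition) of two hypothetical matrices. The tolerance `tol` is an assumption, there to absorb rounding in real data.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested distance dict d[x][y] from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(sorted(d), 3):
        _, second, largest = sorted((d[x][y], d[x][z], d[y][z]))
        if largest - second > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quartet the two largest pair-sums are equal."""
    for w, x, y, z in combinations(sorted(d), 4):
        _, second, largest = sorted((d[w][x] + d[y][z],
                                     d[w][y] + d[x][z],
                                     d[w][z] + d[x][y]))
        if largest - second > tol:
            return False
    return True


# hypothetical tree: A-1-u, B-2-u, u-1-v, v-1-C, v-3-D (additive, no clock)
additive = symmetric({("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 5,
                      ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 4})
# hypothetical clock-like tree ((A:1,B:1):1,(C:1,D:1):1)
clocklike = symmetric({("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
                       ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2})
print(is_additive(additive), is_ultrametric(additive))
print(is_additive(clocklike), is_ultrametric(clocklike))
```

Every ultrametric matrix is also additive, but not vice versa. The first matrix fits a tree exactly yet violates the three-point condition.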

UPGMA and consistency
Simulation: [figure]


Neighbor-Joining, NJ
- algorithmic method
- principle of minimum evolution → minimizes the sum of branch lengths S
- each pair of nodes is adjusted according to its divergence from all other nodes
- yields a single additive tree


[figure: neighbor-joining steps]
1. star tree
2. finding nearest neighbors
3. distance recalculation
4. repeating... (in the figure, S decreases in successive steps: S = 32.4, S = 29.5, S = 28.0)
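A compact Python sketch of neighbor joining. It uses Studier and Keppler's Q-criterion formulation, which selects the same pair as Saitou and Nei's minimization of S. Stopping at three nodes and formatting to 5 decimals are implementation choices. On an additive matrix it recovers the generating branch lengths exactly. The first example uses the hypothetical tree A–1–u, B–2–u, u–1–v, v–1–C, v–3–D; the second uses the primate distances from the UPGMA slide.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Neighbor joining; returns an unrooted Newick string with branch lengths."""
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][k] for k in nodes if k != a) for a in nodes}
        # choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # branch lengths from a and b to their new common node u
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.5f},{b}:{lb:.5f})"
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # join the last three nodes at one central node
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:.5f},{y}:{ly:.5f},{z}:{lz:.5f});"


additive_tree = neighbor_joining(["A", "B", "C", "D"],
                                 [[0, 3, 3, 5], [3, 0, 4, 6], [3, 4, 0, 4], [5, 6, 4, 0]])
print(additive_tree)

primates = neighbor_joining(
    ["chimp", "bonobo", "gorilla", "human", "orangutan"],
    [[0, 0.0118, 0.0427, 0.0382, 0.0953],
     [0.0118, 0, 0.0416, 0.0327, 0.0916],
     [0.0427, 0.0416, 0, 0.0371, 0.0965],
     [0.0382, 0.0327, 0.0371, 0, 0.0928],
     [0.0953, 0.0916, 0.0965, 0.0928, 0]])
print(primates)
```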

Drawbacks of distance data:
1. loss of information during the transformation
2. after transformation to distances, we cannot infer the original data (different sequences may result in the same distance)
3. we cannot study evolution in different parts of the sequence
4. difficult biological interpretation of branch lengths
5. we cannot combine several distance matrices