POPULATION GENETICS
Biological Hierarchy
Genes
Populations Species
Communities
Ecosystems
I. GENETIC DIVERSITY
13 March 2017
[Figure: a SPECIES divided into POPULATIONS and SUBPOPULATIONS]
I. GENETIC DIVERSITY-ANALYSIS OF SINGLE POPULATIONS
15.5.2017
POPULATION and problems of definition
• a population is a group of interbreeding individuals that exist together in time and space
• to develop the basic concepts of population genetics, we initially consider the ideal population = large, random-mating
ALLELE FREQUENCY
• proportion of an allele relative to all alleles of the same locus (gene) in a population sample
• basic characteristics for genetic diversity (variation) of a population
• population genetics studies genetic diversity and the processes that have created it and influence it - i.e. the dynamics of the distribution and frequency of alleles (genotypes -> phenotypes), i.e. the processes shaping evolution:
increase of gen. diversity: mutation and migration
decrease of gen. diversity: genetic drift (and natural selection)
MUTATIONS
increase genetic diversity; responsible for variation/heterogeneity in populations; essential to evolution
1. substitutions (transitions, transversions)
silent substitutions:
- in non-coding regions
- synonymous: GTC -> GTA (Val -> Val)
nonsynonymous substitutions:
- missense: GTC -> TTC (Val -> Phe)
- nonsense: AAG -> TAG (Lys -> amber stop codon)
2. insertions and deletions (= indels -> frameshift mutations)
- insertion: ACGGT -> ACAGGT
- deletion: ACGGT -> AGGT
Mutation rate - rate at which the various types of mutations occur at a given position over time
OBSERVATION
Callimorpha dominula
scarlet tiger moth (Czech: přástevník hluchavkový)
Table 3.1. Data from a collection of 1612 scarlet tiger moths.
Phenotype	No. of individuals
White spotting	1469
Intermediate	138
Little spotting	5
Genotype and allele frequency
Relative numbers = frequencies:
genotype frequencies: P ($G_{AA}$), Q ($G_{Aa}$), R ($G_{aa}$)
allele (gene) frequencies: p (A), q (a)
$P + Q + R = 1$, $p + q = 1$
Genotype	AA	Aa	aa	Total
Number	$n_1$	$n_2$	$n_3$	N
Frequency	$P = n_1/N$	$Q = n_2/N$	$R = n_3/N$	
$p = (2n_1 + n_2)/2N$
$q = (n_2 + 2n_3)/2N$
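The frequency definitions above can be sketched in a few lines of Python. The function name is illustrative (not from the lecture); the counts are the scarlet tiger moth data from Table 3.1.

```python
def genotype_and_allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (P, Q, R, p, q) for one locus with two alleles, from genotype counts."""
    n_total = n_AA + n_Aa + n_aa
    # Genotype frequencies: counts divided by the number of individuals.
    P = n_AA / n_total
    Q = n_Aa / n_total
    R = n_aa / n_total
    # Allele frequencies: each individual carries two gene copies.
    p = (2 * n_AA + n_Aa) / (2 * n_total)
    q = (n_Aa + 2 * n_aa) / (2 * n_total)
    return P, Q, R, p, q


# Scarlet tiger moth sample: 1469 AA, 138 Aa, 5 aa (N = 1612).
P, Q, R, p, q = genotype_and_allele_frequencies(1469, 138, 5)
print(f"P={P:.4f} Q={Q:.4f} R={R:.4f} p={p:.4f} q={q:.4f}")
```

The same allele frequencies follow from the genotype frequencies directly, since p = P + Q/2 and q = R + Q/2.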
Hardy-Weinberg Equilibrium (HWE)
Ex. Single locus with 2 alleles
Allele	Allele frequency
A	p
a	q
Genotype	Expected genotype frequency
AA	$p^2$
Aa	$2pq$
aa	$q^2$
$p + q = 1$
$p^2 + 2pq + q^2 = 1$ = Hardy-Weinberg equilibrium
> p, q - allele frequencies are known from our samples
> observed genotype frequencies (Ho) are known from our samples
> deviation of Ho from HWE expectations => for example $\chi^2$ test
Expected heterozygosity (He) under HWE
$H_e = 1 - (p^2 + q^2)$ ... for 1 locus with the allele frequencies p and q
Assumptions for ideal population in HWE
• random-mating
• negligible effect of mutations and migration ("closed populations")
• infinitely large population (negligible effect of random fluctuations in allele frequencies in time - genetic drift) - in HWE population the allele frequencies are stable = do not change between generations
• Mendelian inheritance of the analysed loci
• neutral loci - not under selection
• diploid, sexually reproducing organisms with discrete generations
• loci are independent of each other - test for "linkage disequilibrium"
[Figure: two pairs of loci on a chromosome]
- 2 loci physically close to each other (decreased probability of recombination - linkage disequilibrium)
- 2 loci physically distant (probability of recombination not influenced - linkage equilibrium)
LINKAGE DISEQUILIBRIUM (LD)
loci in LINKAGE EQUILIBRIUM - segregate independently of each other during meiosis
the most common reason for non-random association
among loci (LD) is the proximity of two loci on a
chromosome (others e.g. small pop. size - gen. drift, immigration, overlapping generations, admixture, etc.)
haplotype frequencies under LD: $p(AB) \neq p(A) \times p(B)$
in presence of LD:
we have fewer independent loci for our genetic analysis than anticipated
neutral loci (alleles) linked to selected ones will appear non-neutral
presence of LD needs to be tested when analysing data from multiple loci
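A minimal sketch of how the non-random association between two loci can be quantified. The coefficient D = p(AB) - p(A) p(B) is the standard textbook measure and is not named in the slides; the haplotype counts below are hypothetical.

```python
def ld_coefficient(n_AB, n_Ab, n_aB, n_ab):
    """Return D = p(AB) - p(A) * p(B) from counts of the four two-locus haplotypes."""
    n_total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n_total
    p_A = (n_AB + n_Ab) / n_total   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n_total   # frequency of allele B at the second locus
    return p_AB - p_A * p_B


# Hypothetical counts: an excess of AB and ab haplotypes, i.e. loci in LD.
d_linked = ld_coefficient(40, 10, 10, 40)
# Hypothetical counts: all four haplotypes equally common, i.e. linkage equilibrium.
d_unlinked = ld_coefficient(25, 25, 25, 25)
print(d_linked, d_unlinked)
```

D equals zero when p(AB) = p(A) x p(B), which is the linkage equilibrium case described above.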
[Figure: genotype frequencies under HWE plotted against allele frequency; x-axis p = 1 - q from 0.0 to 1.0 (equivalently q = 1 - p)]
Figure 3.4 The combinations of homozygote and heterozygote frequencies that can be found in populations that are in HWE. Note that the frequency of heterozygotes is at its maximum when p = q = 0.5. When the allele frequencies are between 1/3 and 2/3, the genotype with the highest frequency will be the heterozygote.
Example of genetic diversity estimation in a sample of 4 individuals (on 4 loci)
	Locus 1	Locus 2	Locus 3	Locus 4	Mean
Ind 1	170/170	223/227	116/116	316/316	
Ind 2	170/172	223/225	112/112	316/316	
Ind 3	172/172	223/225	112/112	316/316	
Ind 4	170/172	223/227	112/112	316/316	
No. of alleles	2	3	2	1	2
Ho	0.5	1.00	0	0	0.375
p	0.5	0.5	0.75	1.00	
q	0.5	q = 0.25, r = 0.25	0.25	0	
He	0.5	0.625	0.375	0	0.375
$H_e = 1 - (p^2 + q^2)$; with three alleles $H_e = 1 - (p^2 + q^2 + r^2)$
Proportion of polymorphic loci (polymorphism) = 0.75
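The per-locus values in this example can be recomputed with a short script. This is a sketch: the function and variable names are illustrative, and the 0.95 threshold for calling a locus polymorphic is one of the two conventions mentioned in the lecture.

```python
from collections import Counter

# Genotypes of the 4 individuals at the 4 loci (allele sizes from the example table).
genotypes = {
    "locus 1": [(170, 170), (170, 172), (172, 172), (170, 172)],
    "locus 2": [(223, 227), (223, 225), (223, 225), (223, 227)],
    "locus 3": [(116, 116), (112, 112), (112, 112), (112, 112)],
    "locus 4": [(316, 316), (316, 316), (316, 316), (316, 316)],
}


def locus_stats(pairs):
    """Return (number of alleles, Ho, He, polymorphic?) for one locus."""
    n = len(pairs)
    ho = sum(a != b for a, b in pairs) / n            # observed heterozygosity
    counts = Counter(allele for pair in pairs for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]    # allele frequencies
    he = 1 - sum(f * f for f in freqs)                # expected heterozygosity
    polymorphic = len(freqs) >= 2 and max(freqs) <= 0.95
    return len(counts), ho, he, polymorphic


stats = {name: locus_stats(pairs) for name, pairs in genotypes.items()}
mean_ho = sum(s[1] for s in stats.values()) / len(stats)
mean_he = sum(s[2] for s in stats.values()) / len(stats)
polymorphism = sum(s[3] for s in stats.values()) / len(stats)
print(stats, mean_ho, mean_he, polymorphism)
```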
Is our population in HWE?
Table 3.1. Data from a collection of 1612 scarlet tiger moths.
Phenotype	No. of individuals	Assumed genotype	No. of A alleles	No. of a alleles
White spotting	1469	AA	1469 × 2 = 2938	-
Intermediate	138	Aa	138	138
Little spotting	5	aa	-	5 × 2 = 10
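One way to answer the question for this table is a chi-square goodness-of-fit test, sketched below. The function name is illustrative. The critical value 3.841 (5 % level, 1 degree of freedom: 3 genotype classes minus 1, minus 1 for the estimated allele frequency) is a standard table value and does not come from the slides. Note that the expected number of aa individuals is below 5, so the chi-square approximation is rough here and an exact test would be preferable.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return (expected genotype counts, chi-square statistic) under HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


expected, chi2 = hwe_chi_square(1469, 138, 5)
CRITICAL_5_PERCENT_DF1 = 3.841  # standard chi-square table value (assumption, not from the slides)
print([round(e, 1) for e in expected], round(chi2, 3), chi2 > CRITICAL_5_PERCENT_DF1)
```

With these counts the statistic stays well below the critical value, so this test gives no evidence against HWE in the moth sample.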
Deviation from HWE
• HWE test - e.g. Genepop software ("exact probability tests") - any significant deviation from HWE indicates that some of the HWE assumptions were not fulfilled -> detailed inspection required:
• heterozygote excess
- negative assortative mating (i.e. preferential mating between dissimilar individuals)
- used loci are advantageous in heterozygote situation (= balancing selection favouring heterozygotes, e.g. MHC genes)
- mutation
- migration
• heterozygote deficit
- inbreeding (all loci are equally affected), assortative mating
- genetic structure in populations
- null alleles (only some loci affected by heterozygote deficit)
Quantifying genetic diversity
Polymorphism (proportion of polymorphic loci) - P
• polymorphic locus = locus with at least two alleles, where the frequency of the most common allele is less than or equal to 0.95 (or 0.99)
• e.g. a population sample with four polymorphic loci out of five -> P = 0.8
Number of alleles - Na
• number of alleles per locus (mean over loci)
• number of alleles corrected for sample size (rarefaction method, e.g. in FSTAT software)
Observed heterozygosity - Ho
• observed frequency of heterozygote genotypes (mean over loci)
[Figure: number of alleles (0-50) plotted against sample size (1-1500) for Corallina, Gastroclonium, Gelidium and Perumytilus]
HAPLOID DIVERSITY
• genetic diversity for haploid data
HAPLOTYPE DIVERSITY (h; Nei and Tajima 1981)
- based on the frequencies of the different haplotypes
$h = \frac{N}{N-1}\left(1 - \sum_i x_i^2\right)$
$x_i$ - frequency of the i-th haplotype in the sample; N - sample size
NUCLEOTIDE DIVERSITY (π; Nei 1987)
- quantifies the mean nucleotide divergence between sequences
- probability that two randomly chosen homologous nucleotides will be different
$\pi = \sum_{ij} x_i x_j \pi_{ij}$
$x_i$ and $x_j$ - respective frequencies of the i-th and j-th sequences; $\pi_{ij}$ - number of nucleotide differences per nucleotide site between the i-th and j-th sequences
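Both indices can be sketched for a small alignment. The four sequences below are hypothetical, and the function names are illustrative. For π the sum is assumed to run over all ordered pairs (i, j) of distinct sequences, which is the same as twice the sum over unordered pairs; no sample-size correction is applied to π.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """h = N / (N - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    counts = Counter(sequences)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(sequences):
    """pi = sum over ordered pairs of x_i * x_j * (differences per site)."""
    n = len(sequences)
    length = len(sequences[0])
    counts = Counter(sequences)
    pi = 0.0
    for (seq_i, c_i), (seq_j, c_j) in combinations(counts.items(), 2):
        diffs_per_site = sum(a != b for a, b in zip(seq_i, seq_j)) / length
        # Factor 2: each unordered pair appears twice in the sum over (i, j).
        pi += 2 * (c_i / n) * (c_j / n) * diffs_per_site
    return pi


# Hypothetical aligned sequences of equal length.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
print(round(h, 4), round(pi, 5))
```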
WHAT INFLUENCES GENETIC DIVERSITY?
• influenced by a multitude of factors
• varies considerably between populations
MOST IMPORTANT DETERMINANTS OF GENETIC DIVERSITY:
> genetic drift
> population bottlenecks
> natural selection
> methods of reproduction
GENETIC DRIFT
population not infinitely large —> population not in HWE —> increase of influence of CHANCE —> allele frequencies vary between generations
in absence of selection, each allele goes to:
1. fixation
2. extinction
-> DECREASE of genetic diversity
more quickly in smaller populations
genetic drift - process causing a population's allele frequencies to change from one generation to the next as a result of CHANCE
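The effect of chance can be illustrated with a small simulation. This sketch assumes the Wright-Fisher model (each generation's 2N gene copies are drawn at random from the previous generation's allele frequency), which the slides do not name; the population sizes, number of generations and seed are arbitrary.

```python
import random


def simulate_drift(n_individuals, p0, generations, rng):
    """Return the allele frequency trajectory of one neutral allele under drift."""
    n_copies = 2 * n_individuals          # diploid population: 2N gene copies
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Each gene copy of the next generation carries the allele with probability p.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


rng = random.Random(1)                    # fixed seed so the run is repeatable
small = simulate_drift(10, 0.5, 200, rng)
large = simulate_drift(1000, 0.5, 200, rng)
print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

Once the frequency reaches 0 (extinction) or 1 (fixation) it stays there, because no mutation or migration is included; in repeated runs this happens much sooner in the small population.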
GENETIC DRIFT
[Figure: random sampling and genetic drift in a population of n = 20]
very profound effect of genetic drift in small populations - founder effect, bottleneck
inextricable link between genetic drift and population size - the effective population size
OVERVIEW
[Diagram labels: mutation, sexual reproduction, gene flow, balancing selection, directional selection, inbreeding, small population size, reduced Ne, variance in reproductive success, uneven sex ratio, population bottlenecks, immediate loss of alleles; arrows mark each factor as increasing (↑) or decreasing (↓) genetic diversity]
Figure 3.16 An overview of some of the main factors that influence levels of genetic diversity within populations.
Freeland et al. 2011
Assumption for population structure analysis:
• neutral loci = no effect of selection included
• classical population genetics approach = populations are (thought to be) known (e.g. we want to quantify level of genetic differentiation between two localities / ?populations)
• BUT populations are usually not known in advance (e.g. because there is no obvious spatial heterogeneity over the distribution range) - we want to reveal any potential population differentiation/structure from our genetic data
We are interested in genetic structure of populations
The currently observed genetic structure indicates what happened in the past
Genetic structure - any pattern in the genetic make-up of individuals within a population
AIMS:
• Detection of any genetic structure (subdivision) in a population (in my dataset)
• Are there any differences between "different" (in space and time) populations?
• Quantification of such differences = description of genetic structure in population
• What factors shape (have shaped) these differences? e.g. population history
• Is there any migration/connection between different populations? = detection and quantification of gene flow, what influences gene flow (e.g. spatial heterogeneity)
• What happens during migration/connection of populations? = hybridisation
Population genetic structure
neutral markers
GENETIC DRIFT
- creates subpopulation differentiation
(changes in allele frequencies, in the extreme case up to fixation of distinct alleles)
MUTATION
may increase differentiation (not necessarily - homoplasy)
MIGRATION (GENE FLOW)
- AGAINST subpopulation differentiation
Effect of population structure on heterozygosity
• Wahlund effect - first documented by the Swedish geneticist Sten Wahlund (1901-1976) in 1928
• two isolated subpopulations with fixed distinct alleles
• both SUBPOPULATIONS are in HWE, but the pooled dataset (the whole POPULATION) shows deficit of heterozygotes
Wahlund effect (isolate breaking)
Homozygosity reduction when subpopulations merge
Wahlund, S. (1928) Zusammensetzung von Population und Korrelationserscheinung vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas, 11: 65-106
Wahlund effect - an example
• Bunnersjöarna lakes (northern Sweden) - brown trout
• one locus with 2 alleles
	170/170	170/172 (= Ho)	172/172	Total	p	2pq (= He)
Inlet	50	0 (0)	0	50	1.000	0.000
Outlet	1	13 (0.26)	36	50	0.150	0.255
Whole lake	51	13 (0.13)	36	100	0.575	0.489
(expected)	(33.1)	(48.9)	(18.1)			
Expected counts for the whole lake: $p^2 = 0.575^2$, $2pq$ and $q^2 = 0.425^2$, each multiplied by 100
Ryman et al. 1979
Wright's F-statistics
FIS, FST, FIT
Masatoshi Nei *1931
Wright (1950), Nei (e.g. 1987)
Sewall Wright 1889-1988
detecting and describing population structure
describe heterozygosity (i.e. deviation from HWE) at different levels
Estimate of population structure effect on genetic diversity
Total population
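[Figure: a total population divided into three subpopulations (S1, S2, S3); individuals are labelled e.g. I1-13 and their genotypes are shown by different symbols]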
• 3 levels (Total, Subpopulation, Individual)
• k subpopulations, indexed x (x = 1 to k; here k = 3)
• each subpopulation has $N_x$ individuals
• AA, AB, BB - genotypes shown with different symbols
• e.g. I1-13 = 13th individual from the 1st subpopulation
F-statistics and heterozygosity
HI - mean observed heterozygosity of an individual in a subpopulation
HS - expected heterozygosity of an individual in a subpopulation under HWE
HT - expected heterozygosity of an individual in the total population under HWE
$H_I = \frac{1}{k}\sum_{x=1}^{k} H_x$, where $H_x$ = observed heterozygosity in subpopulation x
$H_{S,x} = 1 - \sum_i p_{i,x}^2$, where $p_{i,x}$ = frequency of the i-th allele in subpopulation x
$H_S = \frac{1}{k}\sum_{x=1}^{k} H_{S,x}$ = averaged expected heterozygosity in a subpopulation
$H_T = 2\bar{p}\bar{q}$, where $\bar{p}$, $\bar{q}$ = allele frequencies in the total population
> for two alleles at a single locus (Wright 1950)
> more complicated for more alleles (Nei 1987)
F-statistics
$F_{IS} = \frac{H_S - H_I}{H_S}$: heterozygosity decrease of an individual due to non-random mating in a subpopulation (vs. HWE)
$F_{ST} = \frac{H_T - H_S}{H_T}$: influence of the division of the total population into subpopulations (i.e. heterozygosity decrease due to the Wahlund effect)
$F_{IT} = \frac{H_T - H_I}{H_T}$: total coefficient of inbreeding; measures the heterozygosity decrease of an individual in relation to the total population
$(1 - F_{IT}) = (1 - F_{ST})(1 - F_{IS})$
Weir & Cockerham (1984): f (~ FIS), θ (~ FST), F (~ FIT); correction for sample size and number of subpopulations
Computation of F-statistics
$\bar{p}$ - mean frequency of allele A in the whole population
	Subpopulation 1 (N1 = 40)				Subpopulation 2 (N2 = 20)					
Locus	AA	AB	BB	p1	AA	AB	BB	p2	$\bar{p}$	Note
Loc I	10	20	10	0.5	5	10	5	0.5	0.5	HWE
Loc II	16	8	16	0.5	4	4	12	0.3	0.4	heterozygote deficit
Loc III	12	28	0	0.65	6	12	2	0.6	0.625	heterozygote excess
Loc IV	0	0	40	0.0	20	0	0	1.0	0.5	alternatively fixed alleles
Computation of allele frequencies
	Observed heterozygosity			Expected heterozygosity		Wright's F-statistics		
Locus	H1	H2	HI	HS	HT	FIS	FST	FIT
Loc I	0.5	0.5	0.5	0.5	0.5	0.0	0.0	0.0
Loc II	0.2	0.2	0.2	0.46	0.48	0.565	0.042	0.583
Loc III	0.7	0.6	0.65	0.4675	0.46875	-0.390	0.0027	-0.387
Loc IV	0.0	0.0	0.0	0.0	0.5	-	1.0	1.0
Mean						0.058	0.261	0.300
Mean values of F-statistics may hide distinct evolution history of different loci
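The table values can be reproduced from the genotype counts. This sketch uses unweighted means over subpopulations (as the table does, even though N1 and N2 differ) and covers only the two-allele case; the function name is illustrative.

```python
def f_statistics(subpops):
    """subpops: list of (n_AA, n_AB, n_BB) tuples, one per subpopulation.

    Returns (F_IS, F_ST, F_IT); a value is NaN when its denominator is zero.
    """
    k = len(subpops)
    h_obs, h_exp, p_values = [], [], []
    for n_AA, n_AB, n_BB in subpops:
        n = n_AA + n_AB + n_BB
        p = (2 * n_AA + n_AB) / (2 * n)       # frequency of allele A
        p_values.append(p)
        h_obs.append(n_AB / n)                # observed heterozygosity H_x
        h_exp.append(2 * p * (1 - p))         # expected heterozygosity under HWE
    h_i = sum(h_obs) / k
    h_s = sum(h_exp) / k
    p_bar = sum(p_values) / k
    h_t = 2 * p_bar * (1 - p_bar)
    nan = float("nan")
    f_is = (h_s - h_i) / h_s if h_s > 0 else nan
    f_st = (h_t - h_s) / h_t if h_t > 0 else nan
    f_it = (h_t - h_i) / h_t if h_t > 0 else nan
    return f_is, f_st, f_it


# Loc II: heterozygote deficit within both subpopulations.
f_is, f_st, f_it = f_statistics([(16, 8, 16), (4, 4, 12)])
print(round(f_is, 3), round(f_st, 3), round(f_it, 3))

# Loc IV: alternatively fixed alleles, so F_IS is undefined and F_ST = 1.
print(f_statistics([(0, 0, 40), (20, 0, 0)]))
```

The Loc II result also satisfies the identity (1 - FIT) = (1 - FST)(1 - FIS).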
F-statistics
• FIS: decrease of heterozygosity in a local subpopulation
high values - inbreeding
• FIT: summary measure - limited use
• FST = subdivision measure = limited gene flow between subpopulations (i.e. existence of a barrier - Wahlund effect)
- originally developed for estimation of the amount of allelic fixation due to genetic drift (fixation index)
Permutation test of Fst significance
1. Real measured populations
2. Merged into a single dataset
3. 1000 x randomly re-separated populations
Real Fst
1000 x simulated Fst
TWO DIFFERENT CASES:
Fst = 0.072: 0.8 % of simulated values are higher than the real Fst, p = 0.008 (i.e. significant difference)
Fst = 0.0013: 35.4 % of simulated values are higher than the real Fst, p = 0.354 (i.e. non-significant difference)
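The permutation procedure can be sketched for a single locus with two alleles. Everything here is illustrative: individuals are coded by their number of copies of allele A (0, 1 or 2), Fst is computed as (HT - HS)/HT with unweighted means, and simulated values equal to the real one are counted as "higher" (a slightly conservative choice). Real software works on multilocus data.

```python
import random


def fst_two_alleles(groups):
    """Fst = (HT - HS) / HT for groups of individuals coded as counts of allele A."""
    freqs = [sum(group) / (2 * len(group)) for group in groups]
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def permutation_test(groups, n_permutations=1000, seed=0):
    """Return (real Fst, proportion of permuted Fst values >= real Fst)."""
    rng = random.Random(seed)
    observed = fst_two_alleles(groups)
    pooled = [individual for group in groups for individual in group]   # step 2: merge
    sizes = [len(group) for group in groups]
    n_at_least = 0
    for _ in range(n_permutations):                                     # step 3: re-separate
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst_two_alleles(shuffled) >= observed:
            n_at_least += 1
    return observed, n_at_least / n_permutations


# Hypothetical data: two populations fixed for different alleles.
real_fst, p_value = permutation_test([[2] * 20, [0] * 20])
print(real_fst, p_value)
```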
FST computation - an example
	170/170	170/172 (= Ho)	172/172	Total	p	2pq (= He)
Inlet	50	0 (0)	0	50	1.000	0.000
Outlet	1	13 (0.26)	36	50	0.150	0.255
Whole lake	51	13 (0.13)	36	100	0.575	0.489
(expected)	(33.1)	(48.9)	(18.1)			
$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.489 - 0.128}{0.489} = 0.738$
As a consequence of the gene flow barrier, heterozygosity is about 73.8 % lower than it would be under HWE
Ryman et al. 1979
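The arithmetic of this example can be checked directly from the genotype counts. This is a sketch with illustrative names. With the rounded values, (0.489 - 0.128)/0.489 evaluates to about 0.738; with unrounded intermediate values the result is about 0.739.

```python
def allele_frequency(n_11, n_12, n_22):
    """Frequency of the first allele from the three genotype counts."""
    return (2 * n_11 + n_12) / (2 * (n_11 + n_12 + n_22))


def expected_heterozygosity(p):
    """2pq for a locus with two alleles."""
    return 2 * p * (1 - p)


# Genotype counts (170/170, 170/172, 172/172) from the brown trout example.
counts = {"inlet": (50, 0, 0), "outlet": (1, 13, 36)}

# HS: mean expected heterozygosity within the two sites.
h_s = sum(expected_heterozygosity(allele_frequency(*c)) for c in counts.values()) / len(counts)

# HT: expected heterozygosity of the pooled sample (whole lake).
pooled = tuple(sum(column) for column in zip(*counts.values()))
p_total = allele_frequency(*pooled)
h_t = expected_heterozygosity(p_total)

h_obs_pooled = pooled[1] / sum(pooled)   # observed heterozygosity of the pooled sample
f_st = (h_t - h_s) / h_t
print(round(p_total, 3), round(h_s, 4), round(h_t, 4), round(h_obs_pooled, 2), round(f_st, 3))
```

The pooled sample shows the Wahlund effect: the observed heterozygosity (0.13) is far below the 0.489 expected under HWE.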
FST analysis - BE AWARE
Global vs. pairwise indices
Absolute values depend on the heterozygosity level of the loci used! (i.e. microsatellite-based FST cannot be compared to allozyme-based FST)
This demands standardization: $F_{ST}' = F_{ST}/F_{ST,max}$ (Hedrick 2005) - e.g. GenAlEx
In the presence of null alleles, FST needs to be corrected (null alleles increase homozygosity and thereby inflate FST); FreeNA software
Giant Panda
192 faeces samples -> 136 genotypes -> 53 unique genotypes
separation by a river (ca. 26 ky ago) and by roads (recently)
even the roads are important barriers, although weaker ones
[Table 3: pairwise FST between habitat patches (A, B, C, ...) in the Xiaoxiangling and Daxiangling populations; the values are not recoverable from the source]
*Significant after Bonferroni correction (P < 0.01).
GST (Nei 1973)
• Analogy of FST for haploid (haplodiploid) organisms and mtDNA sequences
• Takes into account haplotype (gene) diversity instead of heterozygosity
• Haplotype diversity = probability that any two randomly chosen sequences in a population will be different
• It therefore works only with allele frequencies, not with the proportion of heterozygotes
RST
• Analogy of FST
• Takes into account the size of alleles (number of repeats in microsatellite loci)
• Assumes a known mutation model: the SMM (stepwise mutation model)
• Indicates traces of mutations
• RST > FST: higher effect of mutations
• RST = FST: higher effect of genetic drift
• Randomisation tests for RST significance (Hardy et al. 2003, program SPAGeDi 1.1)
AMOVA - Analysis of Molecular Variance (Excoffier et al. 1992)
[Screenshot: Arlequin ver. 2.000 start screen; URL: http://anthropologie.unige.ch/arlequin]
• Analysis of the variance of allele frequencies (previously in Cockerham & Weir 1987, 1993)
• Quantifies population differentiation
• Takes into account difference between alleles - allelic state (mutations)
• Program ARLEQUIN
• Data: sequences
microsatellites (assuming SMM stepwise mutation model)
Hierarchical AMOVA
How much variation may be explained by:
• differentiation in big groups of populations
• differentiation in populations within the groups
• differentiation between individuals
within the populations
Bombus pascuorum
Widmer & Schmid-Hempel 1999
Microsatellites, AMOVA: most of the variation is explained by the Alps
Source of variation	F/Φ	d.f.	SSD†	Variance component	% of total variance
Among populations	F	17	77.71	0.07	[illegible]
	Φ	17	5198.20	5.02	8.74*
Among regions	F	4	56.15	0.08	5.16*
	Φ	4	[illegible]	4.55	7.49*
Among populations within regions	F	11	24.35	0.02	[illegible]
	Φ	11	1773.71	2.16	3.53*
Between north and south of the Alps	F	1	38.57	0.11	7.12*
	Φ	1	2622.89	7.25	11.74*
Among populations north and south of the Alps, respectively	F	16	39.14	0.02	1.46*
	Φ	16	2575.31	2.18	3.53*
†Sum of squared deviations. *Significant (the P threshold is illegible in the source).
AMOVA and F-statistics
description of results, not causes -> alternative explanations are possible (use population history analyses based on coalescent theory and allele phylogenies)
Clustering methods
DISTANCE-BASED methods
• a tree or a plot is constructed according to a pairwise distance matrix
• clusters then may be defined visually
MODEL-BASED methods
• observations from each cluster are random draws from some parametric model
• inference for the parameters corresponding to each cluster is done jointly with inference for the cluster membership of each individual
• standard statistical methods are used (e.g. maximum-likelihood or Bayesian methods)
Turdus helleri
Fragments of humid tropical forest
Localities Chawia, Ngangao, Mbololo, Yale (Kenya)
7 microsatellite loci
Neighbour-joining
* wrongly clustered individuals
Clustering method based on microsatellite distances
Factorial correspondence analysis
[Figure: factorial correspondence analysis plot; x-axis: Factorial axis 1 (13%); groups: Danube and Struma basins; Dnieper, Dniester and Vistula basins; Turkey and Greece]
Fig. 2 A two-dimensional plot of the factorial correspondence analysis performed using GENETIX based on 12 microsatellite loci. Three geographical groups are bounded by grey lines.
- each locus as one variable, reduction of number of variables
- Genetix - inference about population structure
- individuals vs. populations
STRUCTURE program
Pritchard, Stephens and Donnelly 2000, Genetics
• a model-based Bayesian clustering method
• uses multilocus genotype data (e.g. microsatellites, RFLPs, SNPs; various levels of ploidy)
• MCMC algorithm
• INFERS POPULATION STRUCTURE:
- presence of population structure
- assignment of individuals to populations
- identification of migrants or admixed individuals (parameter Q - individual membership coefficient)
Model implemented in STRUCTURE assumes:
- K populations/clusters (K may be unknown)
- each of K populations is characterized by a set of allele frequencies at each locus
- within each of K populations marker loci are at LINKAGE EQUILIBRIUM with each other and in HARDY-WEINBERG EQUILIBRIUM
under these assumptions each allele at each locus in each genotype is an independent draw from the appropriate frequency distribution, and this is completely specified by the probability distribution P(X | Z, P)
X - genotypes of the sampled individuals
Z - unknown populations of origin of the individuals
P - unknown allele frequencies in all populations
MODELS in STRUCTURE
ANCESTRY MODELS
• no admixture model
• admixture model
• linkage model
• models with informative priors
ALLELE FREQUENCY MODELS
• independent frequencies model
• correlated frequencies model
Ancestry models:
NO ADMIXTURE MODEL
• each individual comes purely from one of the K populations
• the output reports the posterior probability that individual i is from population k
• the prior probability for each population is 1/K
This model is appropriate for studying fully discrete populations and is often more powerful than the admixture model at detecting subtle structure.
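The idea behind the no-admixture model can be sketched in a strongly simplified form. The big assumption here is that the allele frequencies of each population are already known; STRUCTURE itself estimates them jointly with the assignments by MCMC. All names and frequencies are hypothetical. Under HWE and linkage equilibrium each allele copy is an independent draw, so the likelihood is a product over loci and allele copies, and the uniform prior 1/K cancels in the posterior.

```python
import math


def log_likelihood(genotype, allele_freqs):
    """log P(genotype | population) as a sum over loci and both allele copies."""
    total = 0.0
    for (allele_1, allele_2), freqs in zip(genotype, allele_freqs):
        total += math.log(freqs[allele_1]) + math.log(freqs[allele_2])
    return total


def posterior_membership(genotype, populations):
    """Posterior probability of each population, with a uniform prior of 1/K."""
    log_liks = [log_likelihood(genotype, freqs) for freqs in populations]
    max_ll = max(log_liks)                       # subtract the maximum for numerical stability
    weights = [math.exp(ll - max_ll) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical allele frequencies at two loci in K = 2 populations.
pop_1 = [{"A": 0.9, "a": 0.1}, {"B": 0.8, "b": 0.2}]
pop_2 = [{"A": 0.2, "a": 0.8}, {"B": 0.3, "b": 0.7}]

# Hypothetical individual: AA at the first locus, Bb at the second.
posterior = posterior_membership([("A", "A"), ("B", "b")], [pop_1, pop_2])
print([round(value, 3) for value in posterior])
```

In this sketch a frequency of exactly zero would make the logarithm fail; real data need a prior or pseudocount on the allele frequencies.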
Ancestry models:
ADMIXTURE MODEL
• individuals may have mixed ancestry
• each individual has inherited some proportion of its genome from each of the K populations = Q
• the output records the posterior mean estimates of these proportions
Recommended as a starting point for most populations.
"It is a reasonably flexible model for dealing with many of the complexities of real populations. Admixture is a common feature of real data, and you probably won't find it if you use the no-admixture model."
Allele frequency models: INDEPENDENT FREQUENCIES MODEL
• the allele frequencies in each population are independent draws from a distribution that is specified by a parameter λ
• this prior says that we expect allele frequencies in different populations to be reasonably different from each other
Allele frequency models:
CORRELATED FREQUENCIES MODEL
• frequencies in the different populations are likely to be similar (probably due to migration or shared ancestry)
• this prior says that the allele frequencies in different populations may be quite similar between the populations
• better clustering for closely related populations
• but may increase the risk of over-estimating K
• If one population is quite divergent from the others, the correlated model can sometimes achieve better inference if that population is removed.
Falush, Stephens and Pritchard 2003, Genetics
How long to run it
• it is not possible to determine suitable run lengths theoretically; this requires some experimentation on the part of the user
burnin length: how long to run the simulation before collecting data to minimize the effect of the starting configuration
• typically a burnin of 10,000-100,000 is more than adequate
run length: how long to run the simulation after the burnin to get accurate parameter estimates
• several runs at each K, possibly of different lengths, and see whether you get consistent answers
• you can get good estimates of the parameter values (P and Q) with runs of 10,000-100,000 steps, but accurate estimation of Pr(X|K) may require longer runs
• at least 500,000
In practice your run length may be determined by your computer speed and patience as much as anything else.
STRUCTURE program
Pritchard, Stephens et Donnelly 2000, Genetics		
		
[Screenshot: STRUCTURE graphical interface showing the project data table (Label, Pop ID, Locus 1 to Locus 8) and a list of parameter-set runs for several values of K]
Data format: genotypes of an						
individual in TWO rows						
		loc_a	loc_b	loc_c	loc_d	loc_e
George	1	-9	145	66	0	92
George	1	-9	-9	64	0	94
Paula	1	106	142	68	1	92
Paula	1	106	148	64	0	94
Matthew	2	110	145	-9	0	92
Matthew	2	110	148	66	1	-9
Bob	2	108	142	64	1	94
Bob	2	-9	142	-9	0	94
Anja	1	112	142	-9	1	-9
Anja	1	114	142	66	1	94
Peter	1	-9	145	66	0	-9
Peter	1	110	145	-9	1	-9
Carsten	2	108	145	62	0	-9
Carsten	2	110	145	64	1	92
Needs to be specified:						
number of individuals, ploidy of the data, number of loci, missing value symbol						
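A minimal sketch of how the two-row layout can be read into per-individual genotypes. It assumes whitespace-separated columns (label, population, then one allele per locus), no header row, diploid data and -9 as the missing value symbol. The function name and the short sample are illustrative; real input files may contain additional rows and columns.

```python
MISSING = "-9"   # missing value symbol assumed for this sample


def parse_two_row(text):
    """Return {label: {"pop": str, "genotype": [(allele_1, allele_2), ...]}}.

    Missing alleles are returned as None.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]

    def alleles(row):
        return [None if value == MISSING else int(value) for value in row[2:]]

    individuals = {}
    # Consecutive pairs of rows hold the two allele copies of one individual.
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[0] != second[0]:
            raise ValueError(f"rows for {first[0]} and {second[0]} do not match")
        individuals[first[0]] = {
            "pop": first[1],
            "genotype": list(zip(alleles(first), alleles(second))),
        }
    return individuals


sample = """
George 1 -9 145 66 0 92
George 1 -9 -9 64 0 94
Paula 1 106 142 68 1 92
Paula 1 106 148 64 0 94
"""
individuals = parse_two_row(sample)
print(individuals["Paula"]["genotype"])
```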
Data format: genotypes of an individual in ONE row									
[Example data table: one row per individual (labels KV22 to KV48, HR1 to HR13, LET62 to LET83), with both alleles of each microsatellite locus in adjacent columns; the values are not recoverable from the source]
  87         124       124       HS US 37        137       87         87         122       122       IIS IIS 37       1)7       87         87         122       122       115 115 )7       1)7       67         87         122       122       US US )7       117       87         87         122       122       IIS US 17       1)7       87         87         122       122       US US 46       149       87         87         122       122       U7 U7 17       146       87         87         1 7?       122       U) 11) )7        146        87          87          122        124        US US 46       146       87         67         122       122       US US )7        1)7        87          67          124        124         11) US 40        148        87          87          124        126        117 117 17       117       87         87         124       124       US IIS 17       1)7       87         67         124       124       U3 US 37       146       87         87         120       120       US HS 37        146        87          67          122        127        US HS 37       149       67         67         124       124        ll 118 40       140       87         87         124       126        11 U7 17       146       87         87         122       122       111 US 40        146        87          87          122        122        US US 17       140       87         87         122       124       US US 37      146      87        87        122      122      US U7 17       146       87         67         127        122        US US 46        146        87          87          122        122        111 IIS 17       146       87         87         122       122       U) Ul 40       146       87         87         122       122       11) U) )7       146       87         87         12?       
122       11) US 17        140        87          87          122        122        Ul US 40       140       87         87         122       127       Ul US )7       1)7       87         87         122       122       US US 46       146       87         87         122       12?       US US 17       117       87         87         122       122       Ul Ul 46       146       87         67         122       122       US US 17       117       87         87         122       122       US US 40        140        87          87          122        122        11) 11) 37        140       87         87         122       122       US 117 40       140       87         87         122       122       113 U3 )7       140       87         67         122       122       US US 46        146        87          87          126        126        US US )7       166       87         87         122       122       U) Iii
Data format: microsatellites of haploid organisms
Columns: individual ID, population, msat_1 ... msat_8 (one allele size per locus)
Example row: 1-001	1	240	195	219	225	199	197	191	221
[Screenshot: 20 individuals from population 1 (1-001 to 1-020), 20 from population 2 (2-001 to 2-020) and the first individual of population 3 (3-001); the remaining values are not legible in this copy]
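The allele frequencies defined earlier can be counted directly from a file in this layout. The sketch below is only an illustration: the first data row is the legible row from the slide, the other rows and their IDs are invented, and treating "0" as the missing-data code is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical excerpt in the layout shown above: ID, population, then one
# allele size per locus. Only the first data row comes from the slide; the
# other rows are invented. "0" is ASSUMED to mark missing data.
RAW = """\
id\tpop\tmsat_1\tmsat_2
1-001\t1\t240\t195
1-00x\t1\t240\t205
1-00y\t1\t233\t0
2-00x\t2\t235\t201
2-00y\t2\t235\t203
"""

def allele_frequencies(text, missing="0"):
    """Return {population: {locus: {allele: frequency}}} for haploid data."""
    lines = text.strip().splitlines()
    loci = lines[0].split("\t")[2:]            # locus names from the header
    counts = defaultdict(lambda: defaultdict(Counter))
    for line in lines[1:]:
        fields = line.split("\t")
        pop, alleles = fields[1], fields[2:]
        for locus, allele in zip(loci, alleles):
            if allele != missing:              # skip missing genotypes
                counts[pop][locus][allele] += 1
    freqs = {}
    for pop, per_locus in counts.items():
        freqs[pop] = {}
        for locus, counter in per_locus.items():
            total = sum(counter.values())      # haploid: one allele per individual
            freqs[pop][locus] = {a: n / total for a, n in counter.items()}
    return freqs

freqs = allele_frequencies(RAW)
print(freqs["1"]["msat_1"])
```

Because the organisms are haploid, each individual contributes one allele per locus, so the denominator is simply the number of individuals with a scored genotype.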
Program STRUCTURE - graphical output
Inferring the value of K, the number of populations, for the T. helleri data
K	ln P(X|K)	P(K|X)
1	-3144	~0
2	-2769	~0
3	-2678	0.993
4	-2683	0.007
5	-2688	0.00005
[Triangle plot of individual Q-values with vertices Ngangao, Chawia and Mbololo]
recent migrants and a hybrid?
Q-values (probability of assignment to a given cluster)
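The posterior probabilities of K follow from the log-likelihoods if every K is given the same prior weight. A minimal sketch of that arithmetic, assuming a uniform prior on K = 1..5 and taking the ln P(X|K) values from the T. helleri table (digits that were hard to read on the slide are an assumption):

```python
import math

# ln P(X|K) for K = 1..5 from the T. helleri table.
# Digits that were hard to read on the slide are an ASSUMPTION.
log_lik = {1: -3144.0, 2: -2769.0, 3: -2678.0, 4: -2683.0, 5: -2688.0}

def posterior_of_k(log_lik):
    """P(K|X) under a uniform prior on K.

    The maximum is subtracted before exponentiating so that the
    very negative log-likelihoods do not underflow to zero together.
    """
    top = max(log_lik.values())
    weights = {k: math.exp(v - top) for k, v in log_lik.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

post = posterior_of_k(log_lik)
for k in sorted(post):
    print(k, f"{post[k]:.5f}")
```

A difference of only 5 log units between K = 3 and K = 4 is already enough to put almost all of the posterior weight on K = 3.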
Admixture model - allows assignment of an individual to several clusters
Barplot for K = 7
Genome proportion of each individual assigned to each of K clusters
Eurasia
K=4
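Under the admixture model each individual gets one Q-value per cluster, and these sum to 1. The sketch below shows one way such a row might be read; the individuals, their Q-values and both cut-offs (0.8 and 0.1) are invented for illustration and are not part of STRUCTURE itself.

```python
# Hypothetical Q-matrix: one row per individual, one column per cluster (K = 3).
# Names and values are invented for illustration only.
Q = {
    "ind_A": [0.97, 0.02, 0.01],
    "ind_B": [0.05, 0.93, 0.02],
    "ind_C": [0.52, 0.46, 0.02],
}

def classify(q_row, threshold=0.8, minor=0.1):
    """Label one individual from its Q-values.

    threshold and minor are ASSUMED, arbitrary cut-offs:
    - largest Q >= threshold  -> ('assigned', [that cluster])
    - otherwise               -> ('admixed', [clusters with Q >= minor])
    Clusters are numbered from 1.
    """
    if abs(sum(q_row) - 1.0) > 1e-6:
        raise ValueError("Q-values of one individual must sum to 1")
    best = max(range(len(q_row)), key=q_row.__getitem__)
    if q_row[best] >= threshold:
        return ("assigned", [best + 1])
    return ("admixed", [i + 1 for i, q in enumerate(q_row) if q >= minor])

for name, row in Q.items():
    print(name, classify(row))
```

An individual such as ind_C, with its genome split between two clusters, is the kind of case that appears as a two-coloured bar in the barplot and may represent a hybrid or recent migrant ancestry.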
Which K is the best?
[Plot: likelihood values (y-axis, -30000 to -20000) plotted against K (number of clusters)]
Molecular Ecology (2005) 14, 2611-2620
doi: 10.1111/j.1365-294X.2005.02553.x
Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study
G. EVANNO, S. REGNAUT and J. GOUDET
Department of Ecology and Evolution, Biology building, University of Lausanne, CH 1015 Lausanne, Switzerland
[Figure panels from Evanno et al. (2005): statistics plotted against K, x-axis ticks from 5 to 20]
K=5
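The statistic proposed by Evanno et al. is ΔK, the absolute second-order rate of change of ln P(X|K) divided by the standard deviation of ln P(X|K) across replicate runs. A minimal sketch follows, using the absolute second difference of the run means (a common way of implementing it); the replicate values are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical ln P(X|K) values from three replicate runs per K.
# These numbers are invented for illustration only.
runs = {
    1: [-3144.0, -3145.0, -3143.0],
    2: [-2769.0, -2779.0, -2759.0],
    3: [-2678.0, -2679.0, -2677.0],
    4: [-2683.0, -2690.0, -2676.0],
    5: [-2688.0, -2700.0, -2676.0],
}

def delta_k(runs):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)),
    with L(K) the mean ln P(X|K) over replicate runs.
    It is undefined for the smallest and largest K tested."""
    m = {k: mean(v) for k, v in runs.items()}
    s = {k: stdev(v) for k, v in runs.items()}
    out = {}
    for k in sorted(runs):
        if k - 1 in m and k + 1 in m and s[k] > 0:
            out[k] = abs(m[k + 1] - 2 * m[k] + m[k - 1]) / s[k]
    return out

dk = delta_k(runs)
best_k = max(dk, key=dk.get)
for k in sorted(dk):
    print(k, round(dk[k], 2))
print("best K by Delta K:", best_k)
```

Because ΔK needs both neighbours of K, it cannot be evaluated for K = 1, so the method can never indicate a single unstructured population.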
Post-processing of the STRUCTURE outputs
CLUMPAK - Cluster Markov Packager Across K
CLUMPAK was designed to aid users in four main objectives:
• Separate distinct solutions obtained from STRUCTURE-like programs.
• Compare and align solutions obtained for different K values.
• Compare results obtained using different models/data subsets/programs.
• Indicate the preferred value of K according to Evanno et al.
Graphical output from STRUCTURE: a series of barplots with increasing K
K=2    K=3    K=4   K=5   K=6   K=7    K=8    K=9 K=10 K=11 K=12 K=13 K=14 K=15
"Forced clustering"
Picture of hierarchical structure between clusters
Bartáková et al. 2013