GENETIC SIGNATURES OF NATURAL SELECTION
Jamie Winternitz
Institute of Botany and Vertebrate Biology, Czech Academy of Sciences

Outline of talk
1. The Chimp and the River
   • Negative frequency-dependent selection
   • Phylogenetic methods
2. The Island Fox
   • Balancing selection
   • Accounting for demography
3. Men in the Mountains
   • Positive selection
   • Genome scans

A strange set of symptoms
• 1980s, USA
• Opportunistic infections
• Ubiquitous fungus Pneumocystis jirovecii
• Oral candidiasis (yeast)
• Depleted white blood cell counts (thymus-dependent lymphocytes)
• Kaposi’s sarcoma
Something is wrong with the immune system

Clusters of infection
• High incidence of AIDS among homosexual men linked by sexual contact -> infectious disease
• Incidence among intravenous drug users -> blood-borne
• Cases among hemophiliacs who received processed/filtered blood transfusions -> must be a virus

“Patient 0” (Zero)
• A Canadian airline steward named Gaëtan Dugas was referred to as "Patient 0" in an early AIDS study by Dr. William Darrow of the CDC
• 2500 sexual partners
[Figure: AIDS index case graph]

HIV Worldwide
[Map: adult HIV prevalence rate, 2012]

HIV variation
• Retrovirus (reverse transcription)
• No proofreading = high error rate
• For a virus with a genome about 10 thousand bases in length, that means that basically every time HIV replicates itself, it makes a mistake.
• High viral production: 10^8 copies per day
• Recombination, genetic drift, genetic shift, bottlenecks and immune-driven selection

HIV types & subtypes
[Diagram: HIV-1 (Groups M, N, O and P; Group P discovered Aug 2009) and HIV-2; Group M subtypes A, B, C, D, F, G, H, J, K and recombinants; distribution labels: Worldwide distribution, Africa]
HIV-1 group M is responsible for 95% of HIV infections globally.

SIV in captive primates
• Evolutionary relationship among the different simian immunodeficiency viruses
• >30 African Old World monkey species are naturally infected with various SIV strains
• Absent in Asian Old World monkey species

Symptoms of SIV
• Monkey hosts appear to tolerate heavy viral loads
• No pathogenic effects
• Suggests long coevolution

SIV precursor to HIV

Cross-species transmission
Chimpanzee (Pan troglodytes verus) from the Tai National
[Image: eastern chimpanzees feeding on a red colobus monkey]
• Chimps may have contracted an SIV-like infection from Old World monkeys

Spillover
Locatelli Banner Bontrop and Watkins 2005.
• Zoonotic transfers of SIV to humans have been documented on no fewer than eight occasions

HIV: Where
[Figure: map of HIV origins]

HIV: When
• 2 samples from nearly the same year, same city: 1959-60, Kinshasa, DRC.
• 12% genetic distance between DRC60 and ZR59 directly demonstrates that there were already at least two distinct clades of HIV in 1960.
• Most recent common ancestor (MRCA) ~1890-1920
Worobey et al 2008 Nature

MHC Gene Family
• MHC immune genes of vertebrates
• Self vs. non-self
• High diversity
[Figure labels: MHC, peptide, T-cell receptor]

Major Histocompatibility Complex

Structure & function of MHC
• Class I
  • Receptors on all nucleated cells
  • Intracellular pathogens
  • Cytotoxic “killer” T cells
• Class II
  • B cells and lymphocytes
  • Extracellular pathogens

MHC evolution
• MHC gene lineages are shared across primates
• Humans and chimps share 98.6% genetic similarity
Bontrop and Watkins 2005.

MHC supertypes and HIV
• Binding motifs across alleles that recognize the same protein fragments
• Similar supertypes = similar binding affinities
• Disease progression ranges from as short as 1 year or less to a lack of disease progression after more than 35 years and counting in some rare individuals.
• Supertype associations.

Cross-species protection
• Some chimpanzee MHC class I-restricted immune responses target conserved epitopes of the HIV-1 virus
• These Patr alleles occur at relatively high frequencies.
• Identical viral epitopes are recognized by human long-term nonprogressors
de Groot and Bontrop Retrovirology 2013 10:53

SIV, HIV and primate MHC resistance

Selective sweeps and genetic hitchhiking
• Evidence of reduced MHC I variation
• Extant variation recognizes/resists HIV-1
• Evidence of lost MHC Class II loci

2. The Island Fox
   • Balancing selection
   • Accounting for demography

Balancing selection
• Selection alters allele frequencies.
• Selection for even “balanced” allele frequencies

Genetic drift
• Genetic drift alters allele frequencies
• Sampling error with sexually reproducing individuals
• (Effective) population size matters

Island Fox
• “The San Nicolas Island fox (Urocyon littoralis dickeyi) is genetically the most monomorphic sexually reproducing animal population yet reported and has no variation in hypervariable genetic markers.”
Aguilar A et al. PNAS 2004;101:3490-3494

Problems with reduced diversity
• Lower resistance to pathogens
• Reduced fitness (deleterious recessive alleles unmasked)
• Problems in distinguishing kin from non-kin

Population history
• Levels of genetic variation reflect population size and colonization history
• The San Nicolas Island population has the second smallest effective population size and a recent colonization history
Aguilar A et al. PNAS 2004;101:3490-3494
[Figure: Location of the six southern California Channel Islands where island foxes are found. Dotted line indicates hypothesized colonization routes (3). Approximate colonization times in years before present (ybp) based on the archeological record are provided (4).]

Fox neutral genetic variation
[Figure: mean heterozygosity (number of alleles)]

Selective pressures on fox
• Canine pathogens
• Recent canine distemper epidemic
• Inbreeding avoidance and discrimination between kin and non-kin in territorial encounters

Has MHC variation been maintained?
• To determine whether MHC variation has been maintained by natural selection despite the intense genetic drift implied by the genetic monomorphism of neutral genetic markers:
• Assess genetic variability at two class II MHC genes (DRB and DQB) and three class II MHC-linked microsatellite loci.
• Compare variation in San Nicolas Island foxes with that on the other Channel Islands
  • estimate levels of MHC variation in populations ancestral to the San Nicolas population
  • account for the influence of population history on levels of MHC variation.
• Simulations to establish the intensity of selection needed to maintain the observed heterozygosity

Objective
• Quantify MHC variation
• Compare MHC variation before and after population separation
• Simulations

Results: MHC variation
[Figure: mean heterozygosity (number of alleles)]
Similar MHC allelic diversity to ancestral populations

Results: Simulations
• SMM: stepwise-mutation model for microsatellites
• IAM: infinite-alleles model for MHC
• μ: mutation rate
Heterozygosity ~ effective population size x mutation rate x selection coefficient

Strength of selection
• Linkage disequilibrium (LD) between DQB and microsatellites, but not between DRB and microsatellites
• Genetic monomorphism at neutral loci and high MHC variation could arise only through:
  • an extreme population bottleneck of <10 individuals
  • ≈10–20 generations ago
  • unprecedented selection coefficients of >0.5 on MHC loci (range: 0.05–0.15 in nature)
High periodic selection “rescued” MHC diversity

Critique of story
Hedrick 2004. Heredity 93, 237–238
• Lack of LD between DRB and microsatellites.
• Strong recent selection should show an association between microsatellites near DRB and DRB alleles.
• DRB shows no variation at all on San Miguel or San Clemente Islands

Critique of story
• If DRB were the gene under strong balancing selection, then it is surprising that it shows no variation at all on San Clemente Island, a much larger population.
• If there were strong selection on DRB, or even on other closely linked loci, then the two closely linked MHC microsatellite loci would be expected to still show linkage disequilibrium with DRB.
• A combination of nonselective effects (founder effects) and not-so-extreme balancing selection is responsible for the empirical results

Meta-analyses and bottlenecks
• Most populations have less MHC variation than neutral variation. Why?
• Meta-analysis with 109 populations (17 studies)
[Figure: Positive values indicate loss of genetic diversity from pre-bottlenecked/control to bottlenecked populations.]
• Usually, selection acting on MHC loci prior to a bottleneck event, combined with drift during the bottleneck, will result in an overall loss of MHC polymorphism that is ~15% greater than the loss of neutral genetic diversity.

3. Men in the Mountains
   • Positive selection
   • Genome scans

Men of the mountains
• In 1924, George Mallory and Andrew Irvine, who may have been the first to reach the summit of Mount Everest, vanished high on the mountain.
Death on the mountain
• In 1999, Mallory’s body was discovered frozen on the slope
• Since 1922, over 250 people have died climbing Everest, the majority due to events exacerbated by acclimatization issues

The Death Zone
• Above 8,000 metres (26,000 ft)
• Feeling “drunk”, fatigue, headaches, nausea, loss of appetite, ear-ringing, blistering and purpling of the hands and feet, and dilated veins
• Body tries to get more oxygen to the brain by increasing blood flow -> swelling
• High Altitude Cerebral Edema (HACE)
• High Altitude Pulmonary Edema (HAPE)

High altitude adaptations
• Decreased oxygen availability (>2,500 m)
• Decreased barometric pressure
• Physiological changes
  • increased lung volumes
  • increased breathing
  • higher resting metabolism
  • hemoglobin changes

Geography of human adaptation to high altitude
• Andean Altiplano, Ethiopian Highlands, Tibetan Plateau
• Populated 11,000 - 25,000 years ago
Bigham et al 2010. PLOS Genetics

Genome scans for selection
• Goal: Identify candidate genes for high-altitude adaptation based on signatures of positive selection in Tibetan and Andean populations
• What are we looking for?
• How do we know if the region is under selection vs random variation between individuals?

Design of study
1. Contrast high-altitude populations with low-altitude population controls
   1. Andean vs Mesoamerican and East Asian
   2. Tibetan vs European and East Asian
2. Use 4 different complementary tests of natural selection
3. Compare independent high-altitude population results

Tests of natural selection
• 1) natural-log ratio of heterozygosity (lnRH)
• 2) standardized difference of Tajima’s D
• 3) whole genome long range haplotype (WGLRH)
• Statistical significance is determined from genome-wide empirical distributions generated from the data itself.
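The last point above, judging significance against a genome-wide empirical distribution rather than a theoretical one, can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline of Bigham et al.: the function names, the example window values and the 10% cutoff are all hypothetical, chosen only to show the ranking idea.

```python
from bisect import bisect_left, bisect_right


def empirical_p_values(values, tail="lower"):
    """Empirical p-value of each window against all windows in the scan.

    values: one test statistic per genomic window (hypothetical input).
    tail="lower": fraction of windows with a value <= this window's value.
    tail="upper": fraction of windows with a value >= this window's value.
    """
    ordered = sorted(values)
    n = len(ordered)
    if tail == "lower":
        return [bisect_right(ordered, v) / n for v in values]
    if tail == "upper":
        return [(n - bisect_left(ordered, v)) / n for v in values]
    raise ValueError("tail must be 'lower' or 'upper'")


def flag_outliers(values, tail="lower", alpha=0.05):
    """Mark windows that fall in the chosen empirical tail (alpha is an assumption)."""
    return [p <= alpha for p in empirical_p_values(values, tail)]


# Hypothetical per-window statistics; strongly negative values are the
# kind of signal the lower tail is meant to pick up.
window_stats = [-3.0, -0.5, 0.0, 0.2, 0.4, 0.1, -0.1, 0.3, 0.6, -0.2]
lower_flags = flag_outliers(window_stats, tail="lower", alpha=0.1)
upper_flags = flag_outliers(window_stats, tail="upper", alpha=0.1)
```

Because every window is ranked against the same genome, demographic effects that shift the whole distribution are partly absorbed into the background, which is the motivation for using an empirical rather than a theoretical null.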
1) Ratio of heterozygosity (lnRH)
• Natural log of the ratio of heterozygosity between the 2 populations of interest (high- vs low-altitude populations)
• Sliding window of 100,000 bp in 25,000 bp increments along a chromosome
[Figure: window sliding]
Negative lnRH values = regions with a reduction in variation in the high-altitude population

Tajima’s D
• Under neutrality, average heterozygosity and the standardized number of segregating sites estimate the same quantity:
  $$E[\pi]=\theta=E\left[\frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}\right]=4N\mu$$
• D = (average # pairwise differences - standardized # segregating sites) / stdDev(d)
• Example with 3 sequences: E(π) = (4+0+4)/3 = 2.67; standardized S = 4 sites/(1/1+1/2) = 2.67; D = (2.67-2.67)/sqrt[Var(d)] = 0 -> neutrality
• If AvgHet > segregating sites, D>0: intermediate-frequency alleles; balancing selection or a recent population bottleneck that removed rare alleles
• If AvgHet < segregating sites, D<0: high frequency of singletons; positive or purifying selection, selective sweep

Worked D examples
Number of pairs = n(n-1)/2 = 4(3)/2 = 12/2 = 6

Blue table
Site  1 2 3 4 5 6 7 8
A     0 1 0 0 1 0 0 0
B     0 0 0 1 0 0 1 1
C     0 0 0 0 0 0 1 0
D     0 1 0 1 0 0 0 0
π = (5+3+2+2+3+3)/6 = 18/6 = 3
Standardized S = 5 sites/(1/1+1/2+1/3) = 5/1.83 = 2.73
d = 3 - 2.73 = 0.27, so D>0

Green table
Site  1 2 3 4 5 6 7 8
A     1 1 1 1 1 0 0 0
B     0 0 0 0 0 0 0 0
C     0 0 0 0 0 0 0 0
D     0 0 0 0 0 0 0 0
π = (5+5+5+0+0+0)/6 = 15/6 = 2.50
Standardized S = 5 sites/(1/1+1/2+1/3) = 2.73
d = 2.5 - 2.73 = -0.23, so D<0

These differences are only the numerator of D: must know the standard deviation to determine significance

Frequency spectrum
• In a standard neutral model
  • Random mating
  • Constant population size
  • No population subdivision
[Figure labels: Singletons; Many low-frequency variants; High-frequency variants]
How often does a variant appear only once in your sample? Twice? Three times? That proportion is what the spectrum plots. Under neutrality, mutations arise at random; many are silent, and deleterious ones are removed.
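To make the worked D examples and the frequency spectrum concrete, the sketch below recomputes both slide tables in Python. It is an illustration under stated assumptions: the function names are hypothetical, "1" is taken to mark the derived allele, and the variance constants follow the standard definition from Tajima (1989), which the slides mention only as "the standard deviation".

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def site_frequency_spectrum(haplotypes):
    """Map 'number of sequences carrying 1' -> number of segregating sites.

    Assumption: '1' is the derived state, as in the slide tables.
    """
    n = len(haplotypes)
    counts = (col.count("1") for col in zip(*haplotypes))
    return dict(Counter(c for c in counts if 0 < c < n))


def tajimas_d(haplotypes):
    """Return (pi, theta_w, d, D) for equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        # For n < 4 the variance term below is zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    pairs = list(combinations(haplotypes, 2))
    # pi: average number of pairwise differences
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # S: number of segregating (variable) sites
    s = sum(len(set(col)) > 1 for col in zip(*haplotypes))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1          # standardized number of segregating sites
    d = pi - theta_w          # numerator of D, as computed on the slide
    # Variance constants (Tajima 1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    big_d = d / sqrt(var) if var > 0 else float("nan")
    return pi, theta_w, d, big_d


# The two tables from the slide, one string per sequence A-D.
blue = ["01001000", "00010011", "00000010", "01010000"]
green = ["11111000", "00000000", "00000000", "00000000"]

blue_result = tajimas_d(blue)
green_result = tajimas_d(green)
blue_sfs = site_frequency_spectrum(blue)
green_sfs = site_frequency_spectrum(green)
```

In the green table every segregating site is a singleton, which is the excess of rare variants that pulls D below zero; the blue table has more sites shared by two sequences, which pushes D above zero.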
In the neutral frequency spectrum, variants become rarer as their frequency in the sample increases.
• Negative selection: new variants stay at low frequency, most are singletons, and few new variants reach high frequency
• Positive selection: high-frequency alleles increase because they are selected
• Sweep: excess of HIGH-frequency alleles because the beneficial allele and alleles nearby are dragged to fixation. Excess of LOW-frequency alleles because the sweep removes the other variants, and mutations arising afterwards are still rare.

2) Standardized difference in D
Negative standardized D = regions under selection in the high-altitude population, controlling for demographic events

3) Whole genome long range haplotype (WGLRH)
Young allele (neutral)
• Low frequency
• Long-range LD
• No time for recombination
Old allele (neutral)
• Low or high frequency (drift)
• Short-range LD
• Lots of recombination
Young selected allele
• High frequency
• Long-range LD
• Hitch-hiking of linked sites
[Figure labels: Chromosome; Long range haplotype]
[Figure 1: Decay of EHH in Simulated Data for an Allele at Frequency 0.5]
• Compare Relative Extended Haplotype Homozygosity to a flexible gamma distribution parameterized with maximum likelihood methods from the rest of the dataset
• Values in the upper 5% tail of the gamma distribution = regions under positive selection in the high-altitude population

Results: individual ancestry estimates
[Figure panels: Andean; Tibetan]

Results: population stratification
[Figure panels: Andean; Tibetan]

Results: Genome scans
• MANY significant SNPs for both populations, varying by test
• Strength of selection, time since selection, and recombination background all affect signal and test sensitivity

Results: Genetic variation at cellular oxygen sensing gene
• A & B: Allele frequency distribution of the 2 highest-ranked SNPs for Andeans and Tibetans (Derived = red, Positive selection = black)
• C: Significant SNPs are in red for Andeans
• D: and for Tibetans
• E: Haplotypes, with arrow showing the most significant SNP; grey region is the gene
Take-home message: Adaptation has occurred independently at this gene in the two highland groups

Take Home Message
1. The Chimp and the River
   • Phylogenetic methods to detect selection in a parasite and host
2. The Island Fox
   • Balancing selection to resist effects of drift, but be careful with conclusions
3. Men in the Mountains
   • Positive selection across the genome can affect different regions for convergent phenotypes

Acknowledgements
The excellent popular science book Spillover: Animal Infections and the Next Human Pandemic by David Quammen
Funding Sources: European Social Fund in the Czech Republic, European Union, Ministry of Education, OP Education for Competitiveness, Veda vsemi smysly (CZ.1.07/2.3.00/35.0026)

Thanks for your attention!