Mapping and modeling species distributions
Department of Botany and Zoology, Masaryk University
Bi9661 Selected issues in Ecology, Autumn 2013
Borja Jiménez-Alfaro, PhD

Introduction: ASSESSING SPECIES DISTRIBUTIONS

Biogeography
“attempts to explain why species and higher taxa are distributed as they are, and why the diversity and taxonomic composition of the biota vary from one region to another”
Biogeography is the science that attempts to document and understand spatial patterns of biodiversity.
Philip Sclater (1829-1913)

Historical biogeography
- Reconstructs the origins, dispersal, and extinctions of taxa
- Primarily focused on evolution, dispersal and vicariance

Ecological biogeography
- Primarily focused on present distributions, species responses to the biotic environment and interactions with other organisms

Paleoecology
- Combines historical and ecological biogeography, investigating the relationships between communities (abundance, distribution, and diversity of species) and abiotic conditions

Conservation biogeography
- Works on the protection and restoration of natural environments
- Special issue: Diversity and Distributions 13 (3) 2006
- Special feature: The application of predictive modeling of species distribution to biodiversity conservation

Recent tools in biogeography (MAPPING, MODELING)
- Computational power (computers)
- Geographic Information Systems
- Geostatistics

[Figure slides labeled 2002, 2010 and 2011]

Species distribution models (ecological niche models) are used for:
- Predicting species occurrences and estimating ranges
- Modeling ecological spatial responses
- Reconstructing past distributions
- Biogeography of genetic and physiological data
- Assessing responses to climate changes
- Establishing diversity patterns (endemicity, richness)
- and much more…

Three main steps:
1. Compile spatial data associated with the target element and environmental data for the area of interest
2. Build a statistical model based on the association of the element to environmental variables at sites of known occurrence
3. Map the model via GIS across the area of interest

[Figure: Distribution Modeling diagram contrasting DEDUCTIVE and INDUCTIVE approaches. Labels: Localities, Environmental Data, Specialist’s Knowledge, Species-Environment Relationship, Spatial Model, Range Prediction, Specialist Review. Source: www.naturserve.org]

[Figure: Franklin 2009]

[Figure: Model building process (from Guisan and Zimmermann 2000)]

Distribution modeling (per se) is EASY
- Just some technical skills are required
- Anyone can compute it with user-friendly software

Applying distribution modeling is more TRICKY
- You need a good purpose to do it (research question, conservation goal)
- You must know how to do it properly

About this course:
Part 1 – Mapping
- Dealing with occurrence data
- Environmental variables
- Spatial terms and PRACTICE with GIS
Part 2 – Modeling
- Background theory (niche concept)
- Modeling methods
- Maximum Entropy and PRACTICE with MaxEnt
Part 3 – Mapping and Modeling
- Model implementation and evaluation
- Applications and future challenges
- Using your OWN DATA and GROUP PRACTICE

Part 1: MAPPING
OCCURRENCE DATA

What is occurrence data?
- A record for one species/organism/community in one locality
- Presences (and absences) are the MAIN dependent variable for Species Distribution Modeling (but there are more)
(Guisan & Zimmermann 2000)

Where to obtain occurrence data?
Two main options:
2. Using existing data (e.g. biodiversity databases)
Pro: huge amount of data around the world
Con: uncertainties on sampling, accuracy, etc.
1.
Design your own field sampling
Pro: you have control over your data
Con: often you lack the time or money

1. Sampling design
Probabilistic design is required to quantify species responses along gradients, so that the edges of the environmental distribution are considered.

[Figures: Geographic distribution of a species; Distribution in the environmental space]
Distribution in the environmental space is different!

There is no universal design for all questions, but…
- Simple random design is used for relatively homogeneous spaces (when the probabilities of occurrence are equal), but it is not a good option if you have to sample organisms that are rare or disjunctly distributed
- Regular, systematic, clustered or stratified designs are preferred for sampling occurrence data if the organism is clearly influenced by geographical, environmental or topographical gradients
- Mixed designs can be even better, e.g. random stratified sampling

You “should” avoid biased survey sampling
-> Geographic bias: along roads, near cities, …
-> Taxonomic bias: wrong identification of species
E.g.: identification of bias in biological collections of Lupinus hispanicus (Parra-Quijano et al.): different geographic coverage (map legend: Genetic bank; Both; Herbarium / literature)

You “should” avoid purposive sampling
-> non-probabilistic, based on a priori knowledge
-> usually produces undersampling of the study subject
E.g.: Comparison of sampling survey designs for predicting lichen species in USA (Edwards et al.
2006)

A visual example of designed versus purposive sampling (vegetation plots in Picos de Europa, Spain)
- Designed (systematic) (N = 80)
- Purposive (biased) (N = 100)

More important than the number of observations are:
- the degree to which the range of the environmental space occupied by the species is covered in the sample (= COMPLETENESS)
- the frequency of events (records of species presences) in the sample (= PREVALENCE)

How many samples? Or, a better question: what is the minimum sample size for my study?
For SDMs, there are some rules:
→ A minimum of 50 observations can be fine
→ 20-40 times as many observations as predictors
→ For rare species and some algorithms, 20 occurrences can be enough…!

Wisz et al. 2008. Effect of sample size on the performance of species distribution models. Diversity and Distributions 14: 763
Very few samples can be valid for rare organisms … but it depends on the method

In summary, the quality of our data for modeling distributions will depend on many factors:
-> EXTENT of the study area and ACCURACY of occurrences
-> The ECOLOGY of the species
-> How we sample the ENVIRONMENTAL SPACE
-> How many PRESENCES and ABSENCES are sampled
-> The PREDICTORS and the modeling METHOD

2. Using existing data (e.g.
biodiversity databases)
-> SDMs are mainly used to map unknown species distributions
-> Species mapping, however, has a long history of using known distributions from many different sources

Main types of sources:
- Grid-based atlases (compilation of information)
- Natural history collections (museums, botanic gardens)
- Surveys (conservation, vegetation or faunistic surveys)

Grid-based Atlases
Pro: Cover large territories and represent distribution ranges well
Con: Coarse grain (10 km, 50 km) and low spatial accuracy
E.g.: ATLAS FLORAE EUROPAEAE, 50 km x 50 km, 2559 species (20% of European flora)

Natural history collections
Pro: Large amount of data for the whole world
Con: Low spatial resolution and high uncertainty

Biodiversity Surveys
Pro: Spatial accuracy is heterogeneous, but can be good
Con: Generally biased or purposive sampling
E.g.: Czech National Phytosociological Database

[Figure from Franklin 2009]
Is there a sampling design behind this? PROBABLY NOT
Are the data valid? PROBABLY YES

Problems associated with biodiversity databases
1. Low spatial accuracy: locations and coordinates (if they exist) are generally imprecise
2. Unknown sampling design: generally biased or purposive, and usually not reported
How this affects our data:
- Incomplete distributions (bias)
- Undersampling
- Pseudo-replication
- Spatial autocorrelation of samples
- Low spatial accuracy of the analyses
Next week we will go back to the spatial issues

How to solve these limitations?
Georeferencing: it takes time, but it allows us to measure spatial uncertainty
Resampling: to have some control of the data (e.g.
analyzing subsets separately)
Adaptive sampling: resampling after a first assessment
Evaluating bias: using spatial information
Measuring spatial autocorrelation

Georeferencing
The main challenge of biological collections is the assignment of geographic coordinates to millions of historical records (Baker & al., 1998)

Spatial autocorrelation
Oversampling of areas produces pseudo-replication and, in turn, overfitting of the models.
What to do?
- Sampling (or resampling) according to spatial criteria
- Assessing spatial autocorrelation (e.g. Moran’s I) after modeling

Part 1: MAPPING
b) ENVIRONMENTAL VARIABLES

In the ecological space, what factors are important for the distribution of species?
At the macro-distributional scale, the ultimate controlling factors have to do with the energy requirements of species. Energy requirements are, in turn, determined by physiology and morphology.
[Figure: World Potential Evapotranspiration, http://ruig.grid.unep.ch/?p=95]

And… what is the primary source of energy for the Earth?
Solar energy brings:
- Light (quantity and quality)
- Heat

What factors affect solar radiation and temperature?
- Latitude
- Topography: elevation, slopes, exposure

But the factory of primary production (vegetation) also needs WATER, which is also affected by latitude and topography

There are also functional manifestations of the interplay of all these factors:
- evapotranspiration
- productivity

In sum
NOTE: Energy and water income is dynamic in time.
For some questions regarding the eco-geographic distribution of species, the time dimension is crucial.
There are distal factors that determine, directly or indirectly, the distribution of all species (at broad spatial scales):
-> Amount of light
-> Amount of heat
-> Amount of water
-> Topography

Implications
When modelling individual species, more proximal variables become relevant:
-> Soil types
-> Evapotranspiration
-> Primary productivity
-> Light quality
-> Number of frost days
NOTE: Further research is needed to better understand how the inclusion of different types of variables in the modelling process, and redundancy among them, affects model quality.

But EACH organism has its own requirements
E.g.: interactions, parasitism…

Types of environmental variables
In statistical terms, there are two main types of variables:
- QUANTITATIVE: elevation, temperature, precipitation, etc.
- QUALITATIVE (CATEGORICAL): soil type, land cover, vegetation
-> They will be used differently in the modeling process

In practice, we should identify the most appropriate variables for our study case, paying special attention to the SCALE:
- Broad scale studies: more focused on DIRECT variables, mostly climatic: temperature, precipitation, solar radiation, evapotranspiration
- Local scale studies: more focused on INDIRECT variables, mostly topographic: elevation, slope aspect, exposure, topographical indices, etc.

[Figure: Conceptual model of relationships between resources, direct and indirect variables, and their influence on plant performance (from Guisan & Zimmermann 2000)]

WORLDCLIM
Averaged from long-term (30 yr) series of temperature and precipitation
BIOCLIM: “Bioclimatic variables are derived from the monthly temperature and rainfall values in order to generate more biologically meaningful variables.
These are often used in ecological niche modeling (e.g., BIOCLIM, GARP). The bioclimatic variables represent annual trends (e.g., mean annual temperature, annual precipitation) seasonality (e.g., annual range in temperature and precipitation) and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest month, and precipitation of the wet and dry quarters). A quarter is a period of three months (1/4 of the year)”

WORLDCLIM (www.worldclim.org)
BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (mean of monthly (max temp - min temp))
BIO3 = Isothermality (BIO2/BIO7) (* 100)
BIO4 = Temperature Seasonality (standard deviation * 100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5 - BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter

National data at higher resolution (e.g. Spain, 200 m x 200 m)

TOPOGRAPHY
- Potential solar radiation (from the Digital Elevation Model): indirect variable reflecting heat accumulation
- Topographic Position Index (from the Digital Elevation Model): indirect variable reflecting moisture and wind exposure

MORE VARIABLES (AT DIFFERENT SCALES)
A map of land use in Europe. Yellow: cropland and arable, light green: grassland and pasture, dark green: forest, light brown: tundra or bogs, unshaded areas: other (including towns and cities).
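
Returning to the WORLDCLIM bioclimatic variables listed above, the following is a minimal sketch of how a few of them could be derived from twelve monthly values. It is not the WORLDCLIM implementation. The function name bioclim_subset and the monthly values are hypothetical and invented for illustration; the sketch also assumes that the monthly mean temperature is the midpoint of the monthly minimum and maximum, and that quarters are any three consecutive months, wrapping around the end of the year.

```python
from statistics import mean


def bioclim_subset(tmin, tmax, prec):
    """Derive a few bioclimatic variables from 12 monthly values.

    tmin, tmax: monthly minimum / maximum temperature (deg C)
    prec: monthly precipitation totals (mm)
    Hypothetical helper for illustration only.
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each variable")

    # Assumption: monthly mean temperature = midpoint of min and max.
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]

    bio1 = mean(tavg)                                   # annual mean temperature
    bio2 = mean(hi - lo for lo, hi in zip(tmin, tmax))  # mean diurnal range
    bio5 = max(tmax)                                    # max temp of warmest month
    bio6 = min(tmin)                                    # min temp of coldest month
    bio7 = bio5 - bio6                                  # temperature annual range
    # Isothermality; undefined when the annual range is zero.
    bio3 = bio2 / bio7 * 100 if bio7 else float("nan")
    bio12 = sum(prec)                                   # annual precipitation
    bio13 = max(prec)                                   # wettest month
    bio14 = min(prec)                                   # driest month

    # Assumption: a quarter is any run of three consecutive months,
    # including runs that wrap from December into January.
    quarters = [(m, (m + 1) % 12, (m + 2) % 12) for m in range(12)]
    wettest = max(quarters, key=lambda q: sum(prec[i] for i in q))
    bio8 = mean(tavg[i] for i in wettest)               # mean temp of wettest quarter
    bio16 = sum(prec[i] for i in wettest)               # precipitation of wettest quarter

    return {
        "BIO1": bio1, "BIO2": bio2, "BIO3": bio3, "BIO5": bio5,
        "BIO6": bio6, "BIO7": bio7, "BIO8": bio8, "BIO12": bio12,
        "BIO13": bio13, "BIO14": bio14, "BIO16": bio16,
    }


# Invented monthly values for a single grid cell (January to December).
tmin = [-4, -3, 0, 4, 9, 12, 14, 13, 10, 5, 1, -2]
tmax = [1, 3, 8, 14, 19, 22, 25, 24, 19, 13, 6, 2]
prec = [30, 28, 35, 40, 65, 75, 80, 70, 45, 35, 40, 35]

result = bioclim_subset(tmin, tmax, prec)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In practice these layers are usually downloaded ready-made rather than recomputed; the sketch only makes explicit how variables such as BIO3 and BIO7 depend on the simpler ones.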

Paleoclimatic models
- Last inter-glacial (LIG; ~120,000 - 140,000 years BP)
- Last glacial maximum (LGM; ~21,000 years BP)
- Mid-Holocene (~6,000 years BP)

Future climate projections

NEXT WEEK
Part 1 – Mapping
- Dealing with occurrence data
- Environmental variables
- Spatial issues and PRACTICE with GIS (October 18)
Part 2 – Modeling
- Background theory (niche concept)
- Modeling methods
- Maximum Entropy and PRACTICE with MaxEnt
Part 3 – Mapping and Modeling
- Model implementation and evaluation
- Applications and future challenges
- Using your own data and GROUP PRACTICE