Modern trends in crime analysis and mapping + Geoinformation technologies in social geography
Petr Kubíček, kubicek@geogr.muni.cz
Laboratory on Geoinformatics and Cartography (LGC), Institute of Geography, Masaryk University, Czech Republic

PREDICTIVE MODELLING USING GIS

What is predictive modelling?
• Modelling with GIS: descriptive, predictive, prescriptive.
• Using historical data to extrapolate future conditions.
• Case studies:
– Agriculture: predicting yield using remote sensing (RS).
– Archaeology: predicting ancient archaeological sites using ModelBuilder.
– Crime mapping and prediction using Risk Terrain Modelling.

Case study I – precision farming
• Predictive modelling in agriculture (J. Berry).
• Corn yield, from low (39, red) to high (279, green), is the dependent variable: the phenomenon we want to predict.
• Independent variables are used to uncover the spatial relationship, which is expressed as a prediction equation.
• RS data: relative reflectance of red light (RED) off the plant canopy and the near-infrared response (NIR).

Case study I
• Correlation diagram (scatter plot) for all existing pairs of values.
• Prediction equation derived by regression analysis: the curve that best characterizes the data distribution.
• Applying the prediction equation to other locations.
The set of "blue dots" in both scatter plots represents the data pairs for each grid location. The blue lines in the plots represent the prediction equations derived through regression analysis.

Case study I
• What is the problem?
• A major problem is that the r-squared statistic for both prediction equations is fairly small (26% and 4.7%), which suggests that the prediction lines do not fit the data very well.
• Use a combination of both independent variables:
– Normalized Difference Vegetation Index (NDVI)
– NDVI = (NIR − Red) / (NIR + Red)

Case study I
Error map = predicted value − actual measurement

Case study
• The predictions smooth out the real yield.
• This is not a real calibration or validation, merely an empirical verification of the technique.
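The workflow above (NDVI from the RED and NIR bands, a regression prediction equation, and an error map of predicted minus actual values) can be sketched in a few lines of plain Python. This is a minimal illustration, not Berry's implementation: the grid cells are flattened into lists, a simple linear least-squares equation is assumed as the regression form, and the reflectance and yield values are invented for the example.

```python
"""Minimal sketch: NDVI-based yield prediction and error map (hypothetical data)."""


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def fit_linear(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def error_map(predicted: list[float], actual: list[float]) -> list[float]:
    """Error map, cell by cell: predicted value - actual measurement."""
    return [p - m for p, m in zip(predicted, actual)]


# Hypothetical grid cells (flattened): reflectance values and measured yield in bu/ac.
nir = [0.60, 0.55, 0.70, 0.45, 0.65]
red = [0.10, 0.15, 0.08, 0.20, 0.12]
yield_obs = [180.0, 150.0, 210.0, 110.0, 190.0]

ndvi_values = [ndvi(n, r) for n, r in zip(nir, red)]
a, b, r2 = fit_linear(ndvi_values, yield_obs)
predicted = [a + b * v for v in ndvi_values]
errors = error_map(predicted, yield_obs)
print(f"yield = {a:.1f} + {b:.1f} * NDVI, r-squared = {r2:.2f}")
print("errors:", [round(e, 1) for e in errors])
```

Positive values in `errors` are over-guesses and negative values under-guesses, matching the sign convention of the error map.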
• How can we improve the model?

Case study
• A closer look at the error map.
• Not bad: the average error is 2.62 bu/ac.
• 67% of the estimates are within ±20 bu/ac.
• BUT: some locations are off by as much as +144 and −173 q/ha.
• This is not satisfactory.

Case study I
• Solution?
• Data set stratification: division into groups with similar characteristics.
• The vertical bars identify the breakpoints at plus/minus one standard deviation and divide the map values into three strata: zone 1 of unusually high under-guesses (red), zone 2 of typical error (yellow) and zone 3 of unusually high over-guesses (green).

Case study
• The prediction equation works better for particular zones than for the whole field.
• The breakpoints at plus/minus one standard deviation divide the map values into three strata.
• Prediction works fine for zone 2.
• In zones 1 and 3 the results are systematically under- or over-estimated.
• We must define a specific prediction equation for each zone.

Overall prediction using three separate prediction equations
• Input: NDVI and yield maps, Error Zones map.
• For each map location, the algorithm first checks the value on the Error Zones map, then sends the data to the appropriate group for analysis.
• Once the data have been grouped, a regression equation is generated for each zone.
• The r-squared statistics for the three equations (.68, .60 and .42, respectively) suggest that the equations fit the data fairly well and ought to be good predictors.
• The composite prediction map is generated by applying the equations to the NDVI data, respecting the zones.

Case study I
• Visual comparison:
– Real yield
– Prediction map for the whole field
– Prediction map for the stratified prediction
• Error map for the stratified predictions: 80% within ±20 bu/ac.
• The average error is only 4 bu/ac.
• A fairly good guess of the yield based on a remote sensing image of the field taken nearly a month before harvest.

Other ways to stratify
1) Geographic zones, such as proximity to the field edge;
2) Dependent map zones, such as areas of low, medium and high yield;
3) Data zones, such as areas of similar soil nutrient levels; and
4) Correlated map zones, such as micro-terrain features identifying small ridges and depressions.

Lessons learned
• The error map is important for evaluating and refining the prediction equations. This is particularly important if the equations are to be extended in space and time.
• Other locations (PLACE) and dates (TIME) should be used to verify performance.
• In precision agriculture it is possible to combine detailed, continuous RS data, available only occasionally (once a month or season), with regular sensor measurements taken at point locations.

Predictive modelling in ArcGIS
Single procedures are used independently and repetitively, so use ModelBuilder. ModelBuilder offers three main benefits:
1) It records all of the steps involved in a procedure;
2) It allows the procedures to be easily repeated and shared with others; and
3) It provides a visual representation that helps in understanding what is going on in the procedure.

Case study: predictive model for Mayan archaeological sites
• In the context of archaeology, a predictive model is "a tool that indicates the probability of encountering an archaeological site anywhere within a landscape".
• Developing a predictive model essentially consists of trying to determine the logic and preferences in site selection of the people who built the archaeological sites in question.
• It includes a descriptive analysis of environmental factors to see whether any of their possible combinations are repeatedly associated with a type of archaeological site: a quest for spatial pattern.
• Example: previously known Mayan sites seemed to prefer locations near the ocean and close to glades where white sage (Salvia apiana) tends to grow.
• Analyze a region to see which areas in it meet those criteria, and direct your field searches to them.

Step 1: Limit the region of interest
• Clip the rivers layer down to just the region of northern Guatemala on which we are focusing.
• Input + output + function (tool).

Step 2: Measure proximity of sites to rivers
• The rivers feature class has been reduced to our study area.
• Next comes the procedure for determining how close the archaeological sites are to rivers. For this we use the Near tool, which calculates the distance of points from line features within a given search radius.
• Near tool with a 5 km search radius; the distance is stored in NEAR_DIST.
• Sites with no river within the search radius are labelled -1.

Step 3: Combine vegetation and soils feature classes
• Determining which environmental attributes surround each of our archaeological sites.
• In which vegetation, soils and aspect polygons do the sites reside?
• The analysis requires a multi-step procedure using the Identity tool:
– Vegetation + soils = PP1
– PP1 + aspect = PP2

Step 4: Environmental conditions for archaeological sites
• Archaeological sites + PP2 using Identity.
• Attribute selection: the Identity tool preserves all attributes.
• Frequency tool.

Step 5: Determine site attributes
• Simplify the output:
– NEAR_DIST: distance
– DESC_: vegetation
– R_FERT: soil
– ASP_CODE: aspect

Final model in ModelBuilder

Step 6: Run and results
• Check the clusters based on Frequency.
• Establish a working hypothesis for selected sites (what is different, what complies with the original hypothesis).
• Test the hypothesis both in GIS and in the field.

CRIME MAPPING AND ANALYSIS

The role of "place" in crime
Two key considerations (Spencer Chainey):
• Crime has an inherent geographical quality.
• Crime is not randomly distributed.
• DO YOU AGREE?

Crime has an inherent geographical quality
The four dimensions of crime:
• Legal (a law must be broken).
• Victim (someone or something has to be targeted).
• Offender (someone has to commit the crime).
• Spatial (it has to happen at a place, somewhere in space and time).
• CRIME: person V (victim) + person O (offender) + WHERE.

Crime is not randomly distributed
If crimes were random:
– there would be an equal chance of them happening anywhere at any time.
But crime is not randomly distributed:
• It is concentrated into places of activity: crime hot spots.
• Series follow geographic patterns: both serious and volume crime.

Where did it all begin?
• From pin maps to virtual pin maps.
• Space and time limitations and overlaps.
• Crime typology problems.

Current use of GIS in police practice
• Community policing

Major GIS trends in law enforcement: predictive policing
• Geographic profiling
• Temporal patterns
• Weather
• Risk terrain modelling
• Socioeconomic indicators
• Near-repeat patterns

Descriptive vs. predictive modelling

Topics to be covered in detail
• Hot spot analysis
• Near-repeat victimisation
• Risk terrain mapping principles

Hot spot mapping (source: 178919.pdf)
• Most hot spot analysis methods fall into one of five categories:
– visual interpretation,
– choropleth mapping,
– grid cell analysis,
– cluster analysis,
– and spatial autocorrelation.

Crime mapping techniques: point mapping
• The most common approach for displaying geographic patterns of crime is point mapping.
• Interpreting spatial patterns and hot spots in crime point data can be difficult.

Point and graduated symbols
• Point maps do have their applications for:
– mapping individual crime events,
– small volumes of crime,
– and repeat locations, through the use of graduated symbol sizes.
• They are less effective for identifying hot spots of crime, particularly from large data volumes.

Spatial ellipses
• Software-based spatial clustering.
• Creating standard deviational ellipses around crime point clusters.
• Spatial ellipse techniques use hierarchical clustering and the k-means clustering routine.
• Plausible for identifying hot spot areas.
• However, the method does not prioritize the main crime hot spots to assist in targeting prevention.

Thematic mapping of geographic boundaries
• A popular technique for representing any spatial distribution.
• Geographic boundaries are usually defined administrative or political areas such as census blocks, polling districts, wards or borough boundaries.
• Due to the varying size and shape of most geographic boundaries, thematic shading can mislead the audience in identifying where spatial clusters of crime may exist.

Quadrat thematic mapping: raster-based analysis
• Use of a uniform grid.
• Thematic value:
– a count of crimes per grid cell (SUM),
– a density value calculated from the count and the cell area.
• Uniformity: loss of spatial detail within each quadrat and across quadrat boundaries. This can lead to inaccurate interpretation.

Interpolation and continuous surface smoothing methods
• IDW, kriging, spline? These interpolate a continuously varying value, but crime consists of discrete events, not a continuous surface.
• Surfaces that represent the distribution of crime should act as visualizations that help analysts understand crime patterns.
• Methods that suit the analysts' application should therefore represent, as a continuous surface, the relationships or densities between crime point distributions.
• The quartic kernel estimation method requires two parameters to be set before running: the grid cell size and the bandwidth (search radius).
• Bandwidth is the parameter that produces the largest differences in output when varied.
• Guidelines exist for working out suitable values for these two parameters.

Hot spot mapping
• Areas with high concentrations of crime.
• Sherman (1995) defined hot spots "as small places in which the occurrence of crime is so frequent that it is highly predictable, at least over a 1-year period."
• Hot spot mapping (HM) uses the locations of past events to anticipate the locations of future similar events.
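The two raster approaches described above, quadrat counts per grid cell and a quartic kernel density surface, can be sketched in plain Python. This is an illustrative sketch only, assuming projected coordinates in metres and hypothetical crime points; the kernel is the common quartic (biweight) form, normalised so that each point contributes a total density of 1 over its search circle. GIS packages may differ in edge handling and output scaling.

```python
import math

Point = tuple[float, float]


def quadrat_counts(points: list[Point], x0: float, y0: float, cell: float,
                   ncols: int, nrows: int) -> list[list[int]]:
    """Count crime points per square grid cell; the grid origin is (x0, y0)."""
    grid = [[0] * ncols for _ in range(nrows)]
    for x, y in points:
        col = int((x - x0) // cell)
        row = int((y - y0) // cell)
        if 0 <= col < ncols and 0 <= row < nrows:  # ignore points outside the grid
            grid[row][col] += 1
    return grid


def quartic_kde(points: list[Point], x0: float, y0: float, cell: float,
                ncols: int, nrows: int, bandwidth: float) -> list[list[float]]:
    """Quartic kernel density evaluated at each cell centre.

    density = 1/h^2 * sum over points with d < h of 3/pi * (1 - (d/h)^2)^2,
    where d is the point-to-centre distance and h is the bandwidth (search radius).
    """
    h = bandwidth
    grid = [[0.0] * ncols for _ in range(nrows)]
    for row in range(nrows):
        cy = y0 + (row + 0.5) * cell
        for col in range(ncols):
            cx = x0 + (col + 0.5) * cell
            total = 0.0
            for px, py in points:
                d = math.hypot(px - cx, py - cy)
                if d < h:  # points beyond the bandwidth contribute nothing
                    total += 3.0 / math.pi * (1.0 - (d / h) ** 2) ** 2
            grid[row][col] = total / h ** 2
    return grid


# Hypothetical crime locations (metres) on a 500 m x 500 m area with 50 m cells.
crimes = [(120.0, 130.0), (140.0, 110.0), (135.0, 150.0), (400.0, 420.0)]
counts = quadrat_counts(crimes, 0.0, 0.0, 50.0, 10, 10)
density = quartic_kde(crimes, 0.0, 0.0, 50.0, 10, 10, bandwidth=150.0)
peak = max((v, r, c) for r, line in enumerate(density) for c, v in enumerate(line))
print("densest cell (row, col):", peak[1:], "count there:", counts[peak[1]][peak[2]])
```

Re-running `quartic_kde` with a smaller or larger `bandwidth` is an easy way to see how strongly this parameter shapes the surface compared with the cell size.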
• Continuous surface hot spot maps:
– allow easier interpretation of crime clusters,
– reflect more accurately the location and spatial distribution of crime hot spots.

Quartic kernel density

Hot spot tech box: hot spot mapping methods
• Technical problems:
– kernel type and its size;
– kernel types: normal, uniform, quartic, triangular, etc.

Kernel type dependencies
• Triangular vs. normal (Gaussian).

Kernel size
• Even more important than type.
• Adaptive size vs. fixed size.
• Adaptive: a varying size covering at least a defined number of events. Suited to heterogeneous data (different concentrations of events/crimes in different parts of the city).
• Fixed size: regardless of the number of events. Can be crime-specific.

Variations in time
• Each hot spot map considered in this lecture accounts only for a specific snapshot period in time.
• New areas of research are beginning to explore space-time interaction.
• These methods aim to reveal whether certain types of crime display temporal hot spots in particular areas (e.g., crime hot spots that emerge only on certain days of the week).
• Crime hot spot animations can be created to visualize space-time interaction.

Crime analysis: example
Analysing vehicle crime in central London:
• Hypothesis: "We think it relates mainly to local residents having their cars stolen at night" (the police).
• Crime analysis involves breaking the problem apart and exploring its specifics.
• We have a series of questions that we can turn into hypotheses.
• Explore "place" across these.
• This helps to explain the problem.

Locals vs. visitors
Vehicle statistics
Detail view

Methodology for KDE (Horák et al., 2015)
Standardized manual of kernel density estimation (KDE) use for identifying anomalous crime localities.
• Data preparation: point data, correct and precise.
• Method settings: fixed or adaptive kernel, cell size, extent.
• Data processing: highest values only (depending on the purpose).
• Visualization.
50 m – 200 m – 400 m extent
Complete data vs. upper 10%

Near-repeat victimisation
Concept
• After an initial crime event, nearby targets have an increased risk of victimisation for a short period of time.
• Space and time clustering.
• High-crime areas: these areas are high primarily because of the number of repeat victims.
• The British Crime Survey contains no area where more than half the people are victimised, but it does contain areas where those victimised each suffer many times.

Austin repeat residential burglaries, 1999

Explaining repeat victimisation
• Possible explanations: flag or boost.

Boost explanations
• Repeat victimisation reflects the successful outcome of an initial offence. Specific offenders gain important knowledge about a target from prior experience and use this information to re-offend.
• Example: PřF MU?

Flag explanations
• Some targets are unusually attractive to criminals or particularly vulnerable to crime.

Predictive crime analysis
• "Predictive policing in the context of place is the use of historical data to create a spatiotemporal forecast of crime hot spots that will be the basis for police resource allocation decisions with the expectation that having officers at the proposed place and time will deter or detect criminal activity."

Risk terrain modeling prediction
• Risk terrain modeling (RTM) is an approach to risk assessment in which separate map layers, each representing the influence and intensity of a crime risk factor at every place throughout a geography, are created in a geographic information system (GIS).
• Map layers are combined to produce a composite "risk terrain" map with values that account for all risk factors at every place throughout the geography.
• Available as a PDF: ask your lecturer.

RTM steps
1. Select an outcome event of particular interest.
2. Choose a study area.
3. Choose a time period.
4. Obtain base maps of your study area.
5. Identify aggravating and mitigating factors related to the outcome event.
6. Select particular factors to include in the RTM.
7. Operationalize the spatial influence of factors to risk map layers.
8. Weight risk map layers relative to one another.
9. Combine risk map layers to form a composite map.
10. Finalize the risk terrain map to communicate meaningful and actionable information.

Steps 1–2
1. Select an outcome event of particular interest: gun shooting incidents.
2. Choose a study area on which risk terrain maps will be created: the Township of Irvington, NJ.

Step 3
STEP 3: Choose a time period to create risk terrain maps for.
• Six-month time period: January 1 to June 30.
• It is expected that this time period will adequately assess the place-based risk of shootings during the next six-month period (July 1 to December 31).
• Data availability and comparability? Is this really justifiable and valid for the Czech Republic?

Step 4
STEP 4: Obtain base maps of your study area.
• Two base maps were obtained from the Census 2000 TIGER/Line Shapefiles:
1) a polygon shapefile of the Township, and
2) a street centerline shapefile for the Township.

Step 5
STEP 5: Identify aggravating and mitigating risk factors related to the outcome event.
• Three aggravating factors were identified based on a review of the empirical literature:
– dwellings of known gang members (habitual offenders),
– locations of retail business infrastructure (bars, strip clubs, bus stops, check cashing outlets, pawn shops, fast food restaurants and liquor stores),
– locations of drug arrests (places where police action took place).

Step 6
STEP 6: Select particular risk factors to include in the risk terrain model.
• All three risk factors identified in Step 5 will be included.
• Raw data in tabular form (e.g., Excel spreadsheets) were provided by the Township police, drawn from the many datasets they maintain, validate and update regularly to support internal crime analysis and police investigations.
• Attributes + addresses + time stamps + ?
• Status of the investigation, including punishment and legal proceedings.

Step 7
STEP 7: Operationalize risk factors to risk map layers.
• The tabular data were geocoded to the street centerlines of Irvington to create point features representing:
– the locations of gang members' residences (hidden on the map to protect the gang members),
– retail business outlets,
– and drug arrests,
as three separate map layers.

Step 7a: gang member residences
The spatial influence of the "gang members' residences" risk factor was operationalized as: "Areas with greater concentrations of gang members residing will increase the risk of those places having shootings." So a density map was created from the points of gang members' residences.

Step 7b: infrastructure
• The spatial influence of the "infrastructure" risk factor was operationalized as:
• "High concentrations of bars, strip clubs, bus stops, check cashing outlets, pawn shops, fast food restaurants and liquor stores will increase the risk of those dense places having shootings."

Step 7c: drug arrests
The "drug arrest" risk factor was operationalized as:
• "Areas with high concentrations of drug arrests will be at a greater risk for shootings because these arrests create new 'open turf' that other drug dealers fight over to control."

Step 7: density method details
• Kernel density values were calculated for each of the risk map layers so that points lying near the center of a cell's search area are weighted more heavily than those lying near the edge, in effect smoothing the distribution of values.
• Cells within each density map layer were classified into four groups according to standard deviational breaks. The dark blue cells had values in the top five percent of the distribution and were considered the "highest risk" places.
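The standard-deviational classification described above can be sketched as follows. The exact breaks are not given, so the four classes used here (below the mean, mean to +1 SD, +1 SD to +2 SD, +2 SD and above) are an assumption for illustration; the density values are hypothetical and stand in for one flattened kernel density layer. Because the text defines the top class by the top five percent of the distribution, a simple rank-based top-5% cut is included for comparison; an SD break need not coincide exactly with it.

```python
import statistics


def sd_classes(values: list[float]) -> list[int]:
    """Classify density values into four groups using standard-deviational breaks.

    Assumed (hypothetical) breaks: class 0 below the mean, class 1 from the mean to
    mean + 1 SD, class 2 from mean + 1 SD to mean + 2 SD, class 3 at mean + 2 SD or above.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    breaks = (mean, mean + sd, mean + 2 * sd)
    # The class index is the number of breaks a value reaches or exceeds.
    return [sum(v >= b for b in breaks) for v in values]


def top_percent_threshold(values: list[float], percent: float = 5.0) -> float:
    """Smallest value within the top `percent` of the distribution (rank-based cut)."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * percent / 100.0))
    return ranked[k - 1]


# Hypothetical kernel density values for 20 cells of one risk map layer.
layer_density = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3,
                 0.2, 0.1, 0.5, 0.2, 0.1, 0.3, 0.2, 0.9, 0.1, 2.5]
classes = sd_classes(layer_density)
cut = top_percent_threshold(layer_density)
print(classes)
print("top 5% threshold:", cut)
```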
Step 7d: distance from infrastructure
• The spatial influence of the "infrastructure" risk factor was also operationalized as:
• "The distance of one block, or about 350 ft (approx. 100 m), from a facility poses the greatest risk of shootings because victims are often targeted when arriving at or leaving the establishment."

Step 7e: final operationalization
• We are only interested in knowing which places are most at risk for shootings, so we used a binary-valued schema to designate the "highest risk" places across all four risk map layers.
• The highest risk places of each risk map layer are given a value of "1"; all other places are given a value of "0".
• All risk factors are operationalized as aggravating factors, so these values remain positive.

Step 7: reclassification

Step 7: final comparison
• We now have four (final) risk map layers, operationalized from three risk factors.
• Binary reclassification: 0 or 1.
• Because the cells of the different map layers are the same size and were classified in a standard way, the risk map layers can be summed together to form a composite risk terrain map.

Steps 8 + 9: inter-risk map layer weighting and the composite risk terrain map (CRTM)
All risk map layers carry equal weights to produce an unweighted risk terrain model. It is assumed, for example, that being in a place with a high concentration of drug arrests poses the same risk of a shooting as being in a place with a high concentration of gang member residences (unless we know better!).

Step 10: finalize the risk terrain map to communicate meaningful information
• Clip the risk terrain map to the boundary of Irvington.
• Produce a final map in shades of grey, with a layout.
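The binary reclassification, the one-block distance layer and the summation into a composite map described above can be sketched in plain Python. The cell centres, facility locations and class values below are hypothetical; the 100 m threshold follows the one-block rule in Step 7d, and `weights` is an illustrative parameter that defaults to the equal weighting of Steps 8 + 9.

```python
import math
from typing import Optional

Point = tuple[float, float]


def to_binary(classes: list[int], top_class: int = 3) -> list[int]:
    """Step 7e style recoding: 1 for the highest-risk class, 0 for everything else."""
    return [1 if c == top_class else 0 for c in classes]


def within_distance(cells: list[Point], facilities: list[Point],
                    max_dist: float = 100.0) -> list[int]:
    """Binary layer: 1 if a cell centre lies within max_dist metres of any facility."""
    return [1 if any(math.dist(c, f) <= max_dist for f in facilities) else 0
            for c in cells]


def composite_risk(layers: list[list[int]],
                   weights: Optional[list[float]] = None) -> list[float]:
    """Cell-by-cell (weighted) sum of aligned binary risk map layers."""
    if weights is None:
        weights = [1.0] * len(layers)  # unweighted model: all layers count equally
    n_cells = len(layers[0])
    return [sum(w * layer[i] for w, layer in zip(weights, layers)) for i in range(n_cells)]


# Hypothetical row of five 100 m cells (centre coordinates in metres) and one facility.
cells = [(50.0, 50.0), (150.0, 50.0), (250.0, 50.0), (350.0, 50.0), (450.0, 50.0)]
facilities = [(160.0, 60.0)]

# Hypothetical SD classes (0-3) for each density layer, recoded to 0/1.
gang = to_binary([1, 3, 0, 2, 0])
infrastructure = to_binary([2, 3, 3, 0, 1])
drug_arrests = to_binary([3, 3, 1, 0, 0])
distance = within_distance(cells, facilities)

crtm = composite_risk([gang, infrastructure, drug_arrests, distance])
print(crtm)
```

In this toy row the second cell scores 4, the maximum possible with four binary layers, so it would be the first candidate for the highest-risk areas.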
Step 10: make the risk count
• After converting the risk terrain map from raster to vector (each regular grid cell becomes a square polygon), we can:
– count the number of shootings that actually occur in the high-risk areas during the subsequent time period;
– calculate the area of the highest risk places (i.e., places with a composite risk value of 3).

Step 10: make the risk count
• Select all street segments within these areas to inform police commanders about where patrols might be increased.
• Operationalize command and control on a day-by-day basis.

RTM validation
• Comparison with the subsequent time period (July 1 – December 31): high-risk RTM classes vs. hot spot analysis of the actual shooting incidents.
• About 50% (15 out of 31) of the shootings during the subsequent time period (July 1 to December 31) happened in these high-risk cluster areas.

Things to remember
• Remember, risk terrain modeling is only a tool for spatial risk assessment; it is not the solution to crime problems.
• You (the analyst) give value and meaning to RTM, so be innovative in your thinking about risk factors and how risk terrain maps can be applied to police operations.

Risk Terrain Modelling synthesis

Homework for the last lesson
• Work on your dissertation topic. Prepare a brief Introduction section (max. 1.5 pages).
• Expected structure:
1. Problem and broad context for your specific topic.
2. What is already known: only important references!
3. What we need to know: the need for your topic made clear.
4. Purpose/aim or research question(s) for your topic.
• Write in a way that takes the reader from general to specific, from the known to the unknown.
• Prepare a presentation based on your manuscript (6–8 minutes max.).