Modern trends in crime analysis and mapping
+ Geoinformation technologies in social geography
Petr Kubíček
kubicek@geogr.muni.cz
Laboratory on Geoinformatics and Cartography (LGC)
Institute of Geography
Masaryk University
Czech Republic
PREDICTIVE MODELLING
USING GIS
What is predictive modelling?
• Modelling with GIS – descriptive – predictive –
prescriptive.
• Using historical data to extrapolate the future
conditions.
• Case studies:
– Agriculture – predicting the yield using RS.
– Archaeology – predicting ancient archaeological
sites using ModelBuilder.
– Crime mapping and prediction using Risk
Terrain modelling.
Case study I – precision
farming
• Predictive modelling
in agriculture (J.
Berry).
• Corn yield – low (39, red)
to high (279, green)
– the dependent variable,
the phenomenon we
want to predict.
• Independent variables
– are used to uncover
the spatial relationship
(the prediction equation).
• RS data – relative
reflectance of red light
(RED) off the plant
canopy and near-infrared
response (NIR).
Case study I
• Scatter plot for all existing pairs of values.
• Prediction equation derived by regression analysis – the curve
that best characterises the data distribution.
• The prediction equation is then applied to other locations.
The set of “blue dots” in both of the scatter plots represents data pairs for each grid
location. The blue lines in the plots represent the prediction equations derived through
regression analysis.
Case study I
• What is the problem?
• A major problem is that the “r-squared” statistic for
both of the prediction equations is fairly small (26%
and 4.7%) which suggests that the prediction lines
do not fit the data very well.
• Use a combination of both independent variables:
• Normalized Difference Vegetation Index
(NDVI)
• NDVI = (NIR – Red) / (NIR + Red)
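The NDVI formula above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's workflow: the reflectance values are made up, and returning 0.0 for a pixel where both bands are zero is an assumption made only to avoid division by zero.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # Assumption: treat a pixel with no signal in either band as neutral.
        return 0.0
    return (nir - red) / total

# Hypothetical (NIR, Red) reflectance pairs for three grid cells.
cells = [(0.50, 0.10), (0.30, 0.30), (0.20, 0.40)]
ndvi_values = [ndvi(nir, red) for nir, red in cells]
print([round(v, 3) for v in ndvi_values])
```

Values close to +1 indicate a dense, healthy canopy; values near zero or below indicate sparse vegetation or bare soil.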
Case study I
predicted value - actual measurement = error map
Case study
• The results smooth the real yield.
• Not a real calibration or validation, merely
empirical verification of the technique.
• How can we improve the model?
Case study
• A closer look at the
error map.
• Average error
of 2.62 bu/ac.
• 67% of the
estimates are within ±
20 bu/ac.
• BUT – extreme
locations are off by as much as
+144 and -173
q/ha.
• This is not
satisfactory.
Case study I
• Solution?
• Data set stratification – division into groups with similar
characteristics.
• The vertical bars identify the breakpoints at plus/minus one
standard deviation and divide the map values into three
strata—zone 1 of unusually high under-guesses (red), zone 2
of typical error (yellow) and zone 3 of unusually high
over-guesses (green).
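The stratification rule described above can be sketched as follows. It is a minimal illustration under stated assumptions: error is taken as predicted minus actual (so negative values are under-guesses), the population standard deviation is used, and the error values are invented.

```python
from statistics import mean, pstdev

def stratify_errors(errors):
    """Assign each error value to zone 1, 2 or 3 using mean +/- one standard deviation."""
    mu = mean(errors)
    sd = pstdev(errors)
    low, high = mu - sd, mu + sd
    zones = []
    for e in errors:
        if e < low:
            zones.append(1)  # unusually high under-guess (prediction far below actual)
        elif e > high:
            zones.append(3)  # unusually high over-guess (prediction far above actual)
        else:
            zones.append(2)  # typical error
    return zones

# Hypothetical error values (predicted - actual) in bu/ac for six grid cells.
errors = [-40, -5, 0, 3, 8, 45]
error_zones = stratify_errors(errors)
print(error_zones)
```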
Case study
• The prediction equation works
better for particular zones
than for the whole field.
• The breakpoints at
plus/minus one standard
deviation divide the map
values into three strata.
• Prediction works fine for
zone 2.
• For zones 1 and 3 the
results are under- and
over-estimated, respectively.
• We must define a specific
prediction equation for each
zone.
Overall prediction using three
separate prediction equations
• Input – NDVI and yield map, Error Zones
map.
• For each map location, the algorithm first checks the
value on the Error Zones map then sends the data to
the appropriate group for analysis.
• Once the data has been grouped, a regression
equation is generated for each zone.
• The “r-squared” statistic for all three equations (.68,
.60, and .42 respectively) suggests that the
equations fit the data fairly well and ought to be
good predictors.
• The composite prediction map is generated by applying
the equations to the NDVI data, respecting the zones.
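The stratified procedure above (group cells by error zone, fit one regression per zone, then apply the matching equation to each cell) could look roughly like this. The sketch assumes a simple linear model of yield against NDVI; the function names and all data values are hypothetical.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns r-squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2

def fit_by_zone(ndvi, yields, zones):
    """Group the cells by error zone and fit one regression equation per zone."""
    groups = defaultdict(lambda: ([], []))
    for x, y, z in zip(ndvi, yields, zones):
        groups[z][0].append(x)
        groups[z][1].append(y)
    return {z: fit_line(xs, ys) for z, (xs, ys) in groups.items()}

def predict(ndvi, zones, models):
    """Composite prediction: each cell uses the equation of its own zone."""
    return [models[z][1] + models[z][0] * x for x, z in zip(ndvi, zones)]

# Hypothetical cells: NDVI, measured yield (bu/ac) and error zone.
ndvi = [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6]
yields = [100, 140, 180, 80, 100, 120, 50, 60, 70]
zones = [1, 1, 1, 2, 2, 2, 3, 3, 3]

models = fit_by_zone(ndvi, yields, zones)
predicted = predict(ndvi, zones, models)
```

The toy data lie exactly on three lines, so each zone fits perfectly; real field data would give r-squared values well below 1, as reported on the slide.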
Case study I
• Visual comparison:
– Real yield
– Prediction map for the whole field
– Prediction map for stratified prediction
• Error map for stratified predictions: 80% within ± 20
bu/ac.
• The average error is only 4 bu/ac.
• Fairly good guessing of yield based on a remote sensing shot of
the field nearly a month before the field was harvested.
Other ways to stratify
1) Geographic Zones, such as proximity to the
field edge;
2) Dependent Map Zones, such as areas of low,
medium and high yield;
3) Data Zones, such as areas of similar soil
nutrient levels; and
4) Correlated Map Zones, such as micro terrain
features identifying small ridges and
depressions.
Lessons learned
• Error map is important in evaluating and
refining the prediction equations. This point
is particularly important if the equations
are to be extended in space and time.
• Other locations (PLACE) and dates
(TIME) should be used to verify
performance.
• In precision agriculture it is possible to
combine detailed, continuous RS
data that are available only occasionally (once a
month/season) with regular sensor
measurements taken at point locations.
Predictive modelling in ArcGIS
Single procedures are used independently and
repetitively – use ModelBuilder.
ModelBuilder offers three main benefits:
1) It records all of the steps involved in a
procedure;
2) It allows the procedures to be easily repeated
and shared with others; and
3) It provides a visual representation to help
with understanding what is going on in the
procedure.
Case study Predictive model for
Mayan archaeological sites
• In the context of archaeology, a predictive model is "a tool
that indicates the probability of encountering an
archaeological site anywhere within a landscape"
• Developing a predictive model essentially consists of
trying to determine the logic and preferences in site
selection of the people who built the archaeological sites
in question.
• It includes a descriptive analysis of environmental
factors to see if any of their possible combinations seem
to be repeatedly associated with a type of archaeological
site. A quest for spatial pattern.
• Example: previously-known Mayan sites seemed to
prefer occupying locations that are near the ocean
and close to glades where white sage (Salvia apiana)
tends to grow.
• Analyze a region to see which areas in it meet those
criteria, and direct your field searches to them.
Step 1: Limit the region of
interest
• Clip the rivers layer down to just the region of
northern Guatemala on which we are focusing.
• Input + output + function.
Step 2: Measure proximity of
sites to rivers
• The rivers feature class has been reduced to our
study area.
• Next comes a procedure for determining how close the
archaeological sites are located to rivers. For this
we will be using the Near tool, which calculates
the distance of points from line features within a
given search radius.
• Near: search radius of 5 km; the distance is stored in NEAR_DIST.
• Sites beyond the search radius are labelled -1.
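The logic of this step (distance from each site to the nearest river within a search radius, with -1 for sites that have no river in range) can be illustrated without ArcGIS. The sketch below is a plain-Python approximation, not the Near tool itself: it assumes projected coordinates in metres, and the river and site coordinates are invented.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def near_dist(point, polylines, search_radius):
    """Distance to the closest polyline, or -1.0 when none lies within the radius."""
    best = min(
        (point_segment_distance(point, a, b)
         for line in polylines
         for a, b in zip(line, line[1:])),
        default=math.inf,
    )
    return best if best <= search_radius else -1.0

# Hypothetical river (one straight polyline) and three sites, coordinates in metres.
rivers = [[(0.0, 0.0), (10000.0, 0.0)]]
sites = [(2000.0, 3000.0), (5000.0, 8000.0), (12000.0, 0.0)]
distances = [near_dist(s, rivers, search_radius=5000.0) for s in sites]
print(distances)
```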
Step 3: Combine vegetation &
soils feature classes
• Determine which environmental attributes
surround each of our archaeological sites.
• In which vegetation, soils, and aspect polygons do the
sites reside?
• The analysis requires a multi-step procedure.
• Use of the Identity tool.
• Vegetation + soils = PP1
• PP1 + aspect = PP2
Step 4: Environmental conditions for
archaeological sites
• Archaeological sites + PP2
using Identity.
• Attribute
selection – the Identity
tool preserves all
attributes.
• Frequency tool.
Step 5: Determine
site attributes
• Simplify output:
– NEAR_DIST - distance
– DESC_ vegetation
– R_FERT - soil
– ASP_CODE - aspect
Final model in ModelBuilder
Step 6: Run and results
• Check the clusters based on Frequency.
• Establish a working hypothesis for selected sites (what is
different / what complies with the original hypothesis).
• Test the hypothesis both using GIS and in the field.
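The frequency check can be pictured as counting how often each combination of site attributes occurs. The sketch below only illustrates that idea in plain Python and is not the ArcGIS Frequency tool; the field names follow Step 5, while the attribute values and the two-class treatment of NEAR_DIST are assumptions.

```python
from collections import Counter

FIELDS = ("DESC_", "R_FERT", "ASP_CODE")

def near_class(near_dist):
    """Collapse NEAR_DIST into two classes; -1 marks sites outside the search radius."""
    return "within radius" if near_dist >= 0 else "outside radius"

def frequency(records, fields=FIELDS):
    """Count the sites for each unique combination of attribute values."""
    counts = Counter()
    for rec in records:
        key = (near_class(rec["NEAR_DIST"]),) + tuple(rec[f] for f in fields)
        counts[key] += 1
    return counts

# Hypothetical site records after the Identity overlay.
sites = [
    {"NEAR_DIST": 1200.0, "DESC_": "forest", "R_FERT": "high", "ASP_CODE": "S"},
    {"NEAR_DIST": 3400.0, "DESC_": "forest", "R_FERT": "high", "ASP_CODE": "S"},
    {"NEAR_DIST": -1.0, "DESC_": "savanna", "R_FERT": "low", "ASP_CODE": "N"},
    {"NEAR_DIST": 800.0, "DESC_": "forest", "R_FERT": "high", "ASP_CODE": "S"},
]
site_frequency = frequency(sites)
for combo, n in site_frequency.most_common():
    print(n, combo)
```

The most frequent combinations are the candidates for a working hypothesis about site preferences.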
CRIME MAPPING AND
ANALYSIS
The role of ‘place’ in crime
Two key considerations (Spencer Chainey)
• Crime has an inherent geographical quality.
• Crime is not randomly distributed.
• DO YOU AGREE?
Crime has an inherent
geographical quality
The four dimensions of crime:
• Legal (a law must be broken).
• Victim (someone or something has to be
targeted).
• Offender (someone has to do the crime).
• Spatial (it has to happen at a place somewhere,
in space and time).
• CRIME – Person V – Person O - WHERE
Crime is not randomly
distributed
If crimes were random:
– Equal chance of them happening anywhere at
anytime.
But crime is not randomly distributed
• Concentrated into places of activity
– Crime hotspots
• Series follow geographic patterns
– Serious and volume crime
Where did it all begin?
• From pin maps to
virtual pin maps.
• Space and time
limitations and
overlaps.
• Crime typology
problems.
Current use of GIS in police
practice
• Community policing
Major GIS Trends in Law
Enforcement
Predictive Policing
• Geographic Profiling
• Temporal patterns
• Weather
• Risk-Terrain Modelling
• Socioeconomic Indicators
• Near-Repeat Patterns
Descriptive vs. Predictive
modelling
Topics to be covered in detail
• Hot spot analysis
• Near repeat victimisation
• Risk Terrain mapping principles
Hot Spot mapping ??
178919.pdf
• most hot spot analysis methods fall into one of
five categories:
• visual interpretation,
• Choropleth mapping,
• grid cell analysis,
• Cluster analysis,
• and spatial autocorrelation
Crime mapping techniques Point
mapping
• The most common approach for displaying
geographic patterns of crime is point mapping
• Interpreting spatial patterns and hot spots in the crime
point data can be difficult.
Point and graduated symbols
• Point maps do have
their application for:
– mapping
individual events
of crime,
– small volumes of
crime,
– and repeat
locations through
the use of
graduating symbol
sizes
• less effective for
identifying hot spots
of crime, particularly
from large data
volumes.
Spatial ellipses
• Software-based spatial
clustering.
• Creating standard
deviational ellipses around
crime point clusters.
• Spatial ellipse techniques
use hierarchical
clustering and the K-means
clustering routine.
• Plausible for identifying hot spot
areas.
• However, no
prioritization of the main
crime hot spots to assist in
prevention targeting.
Thematic mapping of
geographic boundaries
• A popular technique for
representing any spatial
distribution.
• Geographic boundaries usually
are defined administrative or
political areas such as census
blocks, polling districts, wards, or
borough boundaries.
• Due to the varying size and
shape of most geographic
boundaries, thematic shading can
mislead the audience in
identifying where the spatial
cluster of crime may exist.
Quadrat thematic mapping –
raster based analysis
• Use of uniform grid.
• Thematic value:
– a count of crimes per grid cell
- SUM.
– a density value calculated
from the count and cell area.
• Uniformity - loss of spatial
detail within each quadrat and
across quadrat boundaries. This
can lead to problems of
inaccurate interpretation.
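Both thematic values (count per cell and density per cell area) can be sketched in plain Python. This is a minimal illustration assuming projected coordinates in metres and a grid anchored at a chosen origin; the crime locations are invented.

```python
from collections import Counter

def quadrat_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count the points falling into each (column, row) cell of a uniform grid."""
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        col = int((x - ox) // cell_size)
        row = int((y - oy) // cell_size)
        counts[(col, row)] += 1
    return counts

def quadrat_density(counts, cell_size):
    """Convert counts to densities (events per square unit of cell area)."""
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}

# Hypothetical crime locations in metres, aggregated to 100 m cells.
crimes = [(10, 10), (20, 30), (90, 95), (110, 20), (150, 160)]
counts = quadrat_counts(crimes, cell_size=100)
density = quadrat_density(counts, cell_size=100)
print(dict(counts))
```

The three events inside cell (0, 0) become a single value, which is the loss of within-quadrat detail noted above.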
Interpolation and continuous
surface smoothing methods
• IDW, kriging, spline?? Non-continuous crime
surface!
• Surfaces that represent the distribution of crime
should act as visualizations that help analysts
understand crime patterns.
• Methods that suit the analysts’ application should
therefore represent, as a continuous surface, the
relationships or densities between crime point
distributions.
• The quartic kernel estimation method requires two
parameters to be set prior to running. These are the
grid cell size and bandwidth (search radius).
• Bandwidth is the parameter that will lead to most
differences in output when varied.
• Guidelines exist for working out suitable values for
these two parameters.
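To make the two parameters concrete, here is a minimal plain-Python sketch of quartic kernel density estimation on a regular grid. It assumes the commonly used normalised quartic kernel, K(d) = 3 / (pi * h^2) * (1 - d^2 / h^2)^2 for d < h, evaluated at cell centres; the extent, cell size, bandwidth and the single event are illustrative only.

```python
import math

def quartic_kde(points, cell_size, bandwidth, extent):
    """Quartic kernel density on a grid; extent is (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    ncols = int(math.ceil((xmax - xmin) / cell_size))
    nrows = int(math.ceil((ymax - ymin) / cell_size))
    h2 = bandwidth ** 2
    norm = 3.0 / (math.pi * h2)  # makes each event's kernel integrate to 1
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell_size
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell_size
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < h2:  # only events inside the search radius contribute
                    total += norm * (1.0 - d2 / h2) ** 2
            grid[r][c] = total
    return grid

# One hypothetical event at the centre of cell (row 4, column 4); units are metres.
surface = quartic_kde([(45.0, 45.0)], cell_size=10.0, bandwidth=30.0,
                      extent=(0.0, 0.0, 100.0, 100.0))
mass = sum(sum(row) for row in surface) * 10.0 * 10.0
print(round(surface[4][4], 6), round(mass, 3))
```

Enlarging the bandwidth spreads each event over more cells and flattens the surface, which is why it changes the output more than the cell size does.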
Hotspot Mapping
• Areas with high concentrations of crime.
• Sherman (1995) defined hot spots “as small places
in which the occurrence of crime is so frequent
that it is highly predictable, at least over a 1-year
period.”
• Hot spot mapping (HM) uses locations of past events to anticipate
locations of future similar events.
• Continuous surface
hot spot maps:
• Allow easier
interpretation of crime
clusters.
• Reflect more accurately
the location and
spatial distribution of
crime hot spots.
Quartic kernel
density Hot
spot
Tech box – Hot spot mapping
methods
• Technical problems:
– Kernel type and its size:
• Normal, Uniform, Quartic, Triangular, ...
Kernel type dependencies
• Triangular vs. Normal (Gauss)
Kernel size
• Even more important than type.
• Adaptive size vs. Fixed size
• Adaptive – varying size covering at least a
defined number of events. For heterogeneous
data (different concentrations of
events/crimes in different parts of the city).
• Fixed size – regardless of the number of events.
Can be crime specific.
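One simple way to read the adaptive option is to grow the search radius at each location until it covers the required number of events. The sketch below implements that reading as the distance to the k-th nearest event; it is an assumption about how an adaptive kernel might be set up, not the behaviour of any particular software, and the coordinates are invented.

```python
import math

def adaptive_bandwidth(center, points, min_events):
    """Smallest radius around center that covers at least min_events points."""
    dists = sorted(math.hypot(px - center[0], py - center[1]) for px, py in points)
    if len(dists) < min_events:
        raise ValueError("not enough events to satisfy min_events")
    return dists[min_events - 1]

# Hypothetical events: a dense group near the origin and one remote event.
events = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 10.0)]
radius_dense = adaptive_bandwidth((0.0, 0.0), events, min_events=3)
radius_sparse = adaptive_bandwidth((0.0, 10.0), events, min_events=3)
print(radius_dense, round(radius_sparse, 2))
```

In the dense part of the data the radius stays small, while around the remote event it has to grow, which is the behaviour wanted for heterogeneous data.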
Variations in time
• Each hot spot map considered in this lecture
accounts only for a specific snapshot period in
time.
• New areas of research are beginning to explore
space-time interaction.
• These methods aim to reveal whether certain
types of crime display temporal hot spots in
particular areas (e.g., crime hot spots that
emerge only on certain days of the week).
• The creation of crime hot spot animations to
visualize space and time interaction.
Crime analysis - example
Analysing vehicle crime in central London:
• Hypothesis: “We think it relates mainly to local
residents having their cars stolen at night” (The
Police)
• Crime analysis involves breaking the problem
apart and exploring the specifics of the problem.
• We have a series of questions that we can turn
into hypotheses.
• Explore ‘place’ across these.
• This helps to explain the problem.
Locals vs visitors
Vehicle statistics
Detail view
Methodology for KDE (Horák et
al., 2015)
Standardized manual of kernel density
estimation (KDE) utilisation for identification
of anomalous crime localities.
• Data preparation – point data, correct and
precise.
• Method settings – fixed, adaptive, cell size,
extent
• Data processing – highest values only
(based on purpose)
• Visualization
50 – 200 – 400 m extent
Complete data vs.
Upper 10%
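The "highest values only" processing step can be sketched as a simple threshold on the density grid. The interpretation used here (keep the top 10% of cells ranked by value) is an assumption; the methodology may define the upper 10% differently, and the grid values are invented.

```python
def upper_share_threshold(values, share=0.10):
    """Value of the cell that closes the top `share` of cells, ranked from the highest."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(round(len(ordered) * share)))
    return ordered[k - 1]

def keep_upper(grid, share=0.10):
    """Keep cells at or above the threshold; all other cells become None."""
    flat = [v for row in grid for v in row]
    threshold = upper_share_threshold(flat, share)
    return [[v if v >= threshold else None for v in row] for row in grid]

# Hypothetical 4 x 5 density grid with values 1..20.
density = [[r * 5 + c + 1 for c in range(5)] for r in range(4)]
hot = keep_upper(density, share=0.10)
kept = [v for row in hot for v in row if v is not None]
print(kept)
```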
Near Repeat Victimization
Concept
• After an initial crime event, nearby targets have an
increased risk of victimization for a short period of
time.
• Space and time clustering
• High crime areas – areas are high in crime primarily
because of the number of repeat victims.
• The British Crime Survey contains no area
where more than half the people are
victimised, but does contain areas where those
victimised each suffer many times.
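Space and time clustering can be explored by counting pairs of events that are close in both space and time. The sketch below shows only that counting step; a proper near-repeat analysis would also compare the count with what random timing would produce (for example a Knox test with permutations), which is not included here. The thresholds and events are invented.

```python
import math
from itertools import combinations

def near_repeat_pairs(events, max_dist, max_days):
    """Index pairs of events that lie within max_dist and within max_days of each other.

    Each event is (x, y, day), with x and y in metres and day as a day number.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        dist = math.hypot(a[0] - b[0], a[1] - b[1])
        gap = abs(a[2] - b[2])
        if dist <= max_dist and gap <= max_days:
            pairs.append((i, j))
    return pairs

# Hypothetical burglaries: (x, y, day).
burglaries = [(0, 0, 0), (50, 0, 3), (60, 10, 30), (1000, 1000, 2)]
close_pairs = near_repeat_pairs(burglaries, max_dist=100, max_days=7)
print(close_pairs)
```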
Austin Repeat Residential
Burglaries 1999
Explaining Repeat Victimisation
• Possible explanations - contagion or boost
Boost Explanations
• repeat victimization reflects the successful
outcome of an initial offense. Specific
offenders gain important knowledge about a
target from prior experience and use this
information to re-offend. - PřF MU??
Contagion (Flag) Explanations
• some targets are unusually attractive to
criminals or particularly vulnerable to crime.
Predictive Crime Analysis
• “Predictive policing in the context of place
is the use of historical data to create a
spatiotemporal forecast of crime hot
spots
• that will be the basis for police resource
allocation decisions with the expectation
that having officers at the proposed place
and time will deter or detect criminal
activity.”
Risk Terrain Modeling
Prediction
• Risk terrain modeling
(RTM) is an approach to
risk assessment in which
separate map layers
representing the influence
and intensity of a crime
risk factor at every place
throughout a geography are
created in a geographic
information system (GIS).
• Map layers are combined
to produce a composite
“risk terrain” map with
values that account for all
risk factors at every place
throughout the
geography.
• Available in PDF – ask your
lecturer.
RTM steps
1. Select an outcome event of particular interest
2. Choose a study area
3. Choose a time period
4. Obtain base maps of your study area
5. Identify aggravating and mitigating factors related to
the outcome event
6. Select particular factors to include in the RTM
7. Operationalize the spatial influence of factors to risk
map layers
8. Weight risk map layers relative to one another
9. Combine risk map layers to form a composite map
10. Finalize the risk terrain map to communicate
meaningful and actionable information.
Steps 1–2
1. Select an outcome event of particular interest
Gun shooting incidents.
2. Choose a study area on which risk terrain
maps will be created.
The Township of Irvington, NJ.
Step 3
STEP 3: Choose a time period to create risk
terrain maps for.
• Six month time period: January 1 to June 30.
• It is expected that this time period will
adequately assess the place‐based risk of
shootings during the next 6‐month time period
(July 1 to December 31).
• Data availability and comparability: is this
really justifiable and valid for the Czech
Republic?
Step 4
• STEP 4: Obtain base
maps of your study
area.
• Two base maps were
obtained from Census
2000 TIGER/Line
Shapefiles:
– 1) Polygon shapefile of
the Township and
– 2) Street centerline
shapefile for the
Township.
Step 5
STEP 5: Identify aggravating and
mitigating risk factors that are related to
the outcome event.
• Three aggravating factors were identified based on
a review of empirical literature:
– dwellings of known gang members (habitual
offenders),
– locations of retail business infrastructure (bars,
strip clubs, bus stops, check cashing outlets, pawn
shops, fast food restaurants, and liquor stores),
– locations of drug arrests (places where police
action took place).
Step 6
• STEP 6: Select particular risk factors to
include in the risk terrain model.
• All three risk factors identified in Step 5 will be
included.
• Raw data in tabular form (i.e. Excel spreadsheets)
was provided by the Township police from the
many datasets they maintain, validate and
update regularly to support internal crime
analysis and police investigations.
• Attributes + addresses + time stamps + ??
• Status of the investigation, including
the punishment and legal procedure.
Step 7
• STEP 7: Operationalize
risk factors to risk map
layers.
• The tabular data was
geocoded to street
centerlines of Irvington
to create point features
representing:
– the locations of gang
members’ residences
(hidden on the map to
protect the gang
members),
– retail business
outlets,
– and drug arrests,
respectively as three
separate map layers.
Step 7a – gang member
residence
The spatial influence of the “gang members’ residences” risk factor
was operationalized as: “Areas with greater concentrations of gang
members residing will increase the risk of those places having
shootings.” So, a density map was created from the points of gang
members’ residences.
Step 7b - infrastructure
• The spatial influence of the “infrastructure” risk
factor was operationalized as:
• “High concentrations of bars, strip clubs, bus
stops, check cashing outlets, pawn shops, fast
food restaurants, and liquor stores will increase
the risk of those dense places having shootings.”
Step 7c – drug arrests
• The “drug arrest” risk factor was operationalized as:
• “Areas with high concentrations of drug arrests
will be at a greater risk for shootings
because these arrests create new ‘open turf’ that
other drug dealers fight over to control.”
Step 7 – map density method
details
• Kernel density values were calculated
for each of the risk map layers so that
points lying near the center of a cell’s
search area would be weighted more
heavily than those lying near the edge,
in effect smoothing the distribution of
values.
• Cells within each density map layer were
classified into four groups according
to standard deviational breaks. The
dark blue colored cells had values in the
top five percent of the distribution and
were considered the “highest risk”
places.
Step 7d – distance from
infrastructure
• The spatial influence of the “infrastructure” risk
factor was also operationalized as:
• “The distance of one block, or about 350 ft
(approx. 100 m), from a facility poses the greatest
risk of shootings because victims are often
targeted when arriving at or leaving the
establishment.”
Step 7e – final operationalization
• We are only interested in knowing where places
are the most at risk for shootings, so we used a
binary‐valued schema to designate the
“highest risk” places across all four risk map
layers.
• The highest risk places of each risk map layer,
respectively, will be given a value of “1”; all other
places will be given a value of “0”.
• All risk factors are operationalized as
aggravating factors, so these values will
remain positive.
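The binary schema can be sketched as a reclassification of each density layer. Note the assumption: the lecture's layers were classified with standard deviational breaks, whereas this sketch simply flags the top share of cells by rank (5% by default); the grid values are invented.

```python
def binary_highest_risk(layer, top_share=0.05):
    """Return a 0/1 grid in which the highest-valued cells (top share) are set to 1."""
    flat = sorted((v for row in layer for v in row), reverse=True)
    k = max(1, int(len(flat) * top_share))
    threshold = flat[k - 1]
    # Cells tied with the threshold value are also flagged as highest risk.
    return [[1 if v >= threshold else 0 for v in row] for row in layer]

# Hypothetical 4 x 5 density layer with values 1..20.
density_layer = [[r * 5 + c + 1 for c in range(5)] for r in range(4)]
risk_layer = binary_highest_risk(density_layer, top_share=0.05)
print(risk_layer)
```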
Step 7 - reclassification
Step 7 – final comparison
• We now have four (final)
risk map layers,
operationalized from three
risk factors.
• Binary reclassification – 0 – 1.
• Because the cells of the different map
layers are the same size and
were classified in a standard
way, the risk map layers can
be summed together to
form a composite risk
terrain map.
Step 8 + 9 - Inter Risk Map
Layer Weighting and CRTM
All risk map layers will carry equal weights to produce an
un‐weighted risk terrain model. It is assumed, for example,
that being in a place with a high concentration of drug arrests
poses the same risk of having a shooting as being in a place
with a high concentration of gang member residences. Unless we
know better!
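Summing equally weighted binary layers into a composite risk terrain map can be sketched as below. The optional `weights` argument is a hypothetical extension for the case where the analyst does know better; the four 2 x 2 layers are invented.

```python
def composite_risk(layers, weights=None):
    """Cell-by-cell weighted sum of risk map layers that share the same grid."""
    if weights is None:
        weights = [1] * len(layers)  # un-weighted model: every layer counts equally
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[r][c] for w, layer in zip(weights, layers)) for c in range(cols)]
        for r in range(rows)
    ]

# Four hypothetical binary risk map layers (1 = highest risk place).
gang_density = [[1, 0], [0, 0]]
infrastructure_density = [[1, 0], [1, 0]]
infrastructure_distance = [[1, 1], [0, 0]]
drug_arrests = [[0, 0], [1, 0]]

risk_terrain = composite_risk(
    [gang_density, infrastructure_density, infrastructure_distance, drug_arrests]
)
print(risk_terrain)
```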
STEP 10 - Finalize the Risk Terrain
Map to Communicate Meaningful
Information.
• Clip our risk terrain map
to the boundary of
Irvington.
• Produce a final map with
shades of grey and layout.
Step 10 – make the risk count
• Having converted the risk terrain map from raster to vector
(still using the regular structure,
converted to square polygons), we can:
• count the number of shootings that actually
occur in the high‐risk areas during the
subsequent time period;
• calculate the square area of the highest risk
areas (i.e., places with a composite risk value of
3);
Step 10 – make the risk count
• Select all street segments within these areas to
inform police commanders about where patrols
might be increased.
• Operationalise command and control on a
day-to-day basis.
RTM validation
• Comparison with the
subsequent time
period (July 1 –
December 31) – high
risk RTM classes and
hot spot analysis of
actual shooting
incidents.
• About 50% (15 out of
31) of the shootings
during the subsequent
time period (July 1 to
December 31)
happened in these
high‐risk cluster areas.
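The validation figure is simply the share of later shootings that fall inside the high-risk cells: 15 / 31 ≈ 48%. A minimal sketch of that check follows; the grid, the coordinates and the risk threshold are invented, and only the 15-out-of-31 figure comes from the slide.

```python
def hit_rate(events, risk_grid, cell_size, origin, min_risk):
    """Count events located in cells whose composite risk is at least min_risk."""
    hits = 0
    for x, y in events:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        inside = 0 <= row < len(risk_grid) and 0 <= col < len(risk_grid[0])
        if inside and risk_grid[row][col] >= min_risk:
            hits += 1
    rate = hits / len(events) if events else 0.0
    return hits, rate

# Hypothetical composite risk grid (100 m cells) and later shooting locations.
risk_grid = [[3, 0], [1, 3]]
shootings = [(50, 50), (150, 50), (50, 150), (150, 150), (500, 500)]
hits, rate = hit_rate(shootings, risk_grid, cell_size=100, origin=(0, 0), min_risk=3)
print(hits, rate)

# Share reported on the slide: 15 of 31 shootings in high-risk areas.
reported_share = 15 / 31
print(round(reported_share, 3))
```

A hit rate is most informative when read together with the share of the study area that the high-risk cells cover.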
Things to remember
• Remember, risk terrain modeling is only a tool
for spatial risk assessment; it is not the solution
to crime problems.
• You (the analyst) give value and meaning to
RTM, so be innovative in your thinking about risk
factors and how risk terrain maps can be applied
to police operations.
Risk Terrain Modelling
Synthesis
Homework for the last lesson
• Work on your dissertation topic. Prepare a brief
Introduction section (max. 1.5 pages).
• Expected structure:
1. Problem and broad context for your specific topic
2. What is already known – only important references!
3. What we need to know – the need for your topic
made clear.
4. Purpose/Aim or Research Question(s) for your topic.
• Write in a way that takes the reader from
general to specific, from the known to the
unknown.
• Prepare a presentation based on your
manuscript – 6-8 minutes max.