5. SPATIAL INTERPOLATION

5.1 Introduction

Spatial interpolation is the procedure for estimating the value of properties at unsampled sites within an area covered by existing observations (Waters, 1989). A wide variety of spatial interpolation techniques exist. You may already be familiar with some of these techniques, for example, contouring. Others, such as Thiessen polygons, Delaunay triangles and spatial moving averages, are likely to be new. Many of the more complex interpolation techniques, such as kriging and Fourier analysis, require a comprehensive understanding of spatial statistics and mathematics and, in some cases, involve solving linked simultaneous equations. Don't panic! The primary objective of this section is to explain the concepts which underpin the interpolation process. You will be introduced, in detail, only to those interpolation techniques which you are likely to find in commercially available GIS software. Where a more complex technique is introduced, additional references will be provided to guide you to a more detailed and practical discussion of the approach. A good overview of spatial interpolation can be found in Burrough and McDonnell (1998) and in Lam (1983).

5.2 What do we mean by spatial interpolation?

The definition by Waters (1989), provided in the first sentence of the section above, is a good starting point from which to explore what is meant by spatial interpolation. Another definition states the geographical nature of the problem more explicitly:

"Given a set of spatial data either in the form of discrete points or for subareas, find the function that will best represent the whole surface and that will predict values at points or for other subareas." (Lam, 1983)

From a GIS perspective (and for those of you who have never come across the term before) it is useful to dissect these definitions and translate them into the language of attributes and entities introduced in Spatial Data Modelling and Database Theory.

Let's assume that you have a GIS which contains the locations of a set of weather stations represented as a series of point entities. For each weather station a range of attribute information is recorded on a regular basis. One attribute observed for each station is the mean daily wind speed. Now, let us assume that a new airport is planned for the region. You would already have used a variety of the spatial operations outlined in Sections 2 and 3 of this module to identify a series of potential sites for the airport. The question that still remains is: which of these sites has the most favourable wind conditions? The problem is that none of the potential airport sites you have selected contain, or are adjacent to, a weather station. Even if one of the weather stations was bang in the middle of one of your preferred sites, the wind values available to you are for a point source and may not be representative of the wind conditions across the entire site.

With this airport example in mind, let us reconsider the definition of interpolation given by Waters (1989):

"Spatial interpolation is the procedure of estimating values of properties...."

Let us replace the word 'properties' with 'attributes'. In our example the attribute we are interested in is mean daily wind speed.

"at unsampled sites...."
This indicates that we have a spatial reference which allows us to locate the position of our airport, and also implies that we have a conceptual spatial model, in this case an area entity which represents the potential airport site.

"within the area...."

This implies that we have a geographical region of interest or study in which we must perform our interpolation. The term 'area' implies that our area of study will have a set of boundaries. In certain cases these boundaries may be physical, for example, the coastline of Britain. On the other hand, our boundaries may be political, such as administrative boundaries. One of the problems with all interpolation techniques is what to do about boundaries, because although we are rarely concerned about what happens outside our area of interest, we should be concerned about how the properties of the 'outside world' influence our interpolation process. Unfortunately, in many cases we simply turn a blind eye. We will revisit the boundary issue in relation to individual interpolation techniques a little later in this section.

"covered by existing observations."

This implies that we have another set of spatial features (in the case of our example, weather stations) for which we have a known set of attribute values, here mean daily wind speed.

Therefore, in the language of GIS, the problem which spatial interpolation procedures are designed to address is how to estimate the attribute values for one set of spatial entities from those of a second set of spatial entities (which may or may not be of the same type) for which a set of observed attribute values exists. Or, in the case of our example, how to make the 'best guess' at what the wind speed is likely to be at each of our potential airport sites.

If we return to the definition by Lam (1983), two further concepts are introduced: function and surface. Lam's statement of the interpolation problem stresses that before we can estimate the values for our airport sites we must first derive a spatial surface from our known data points (weather stations). This surface should be defined by the nature of the relationship (function) which exists between our observed data points. In a spatial context this function is usually based on the observation that:

"points (or spatial entities) close together in space are more likely to have similar values than points (entities) far apart" (Tobler's First Law of Geography; Tobler, 1970).

Other spatial relationships can be envisaged, for example repeating sequences in which similar values are found for points only at regular intervals and not for points in between. However, the vast majority of phenomena are thought to conform to Tobler's 'law'. This 'law' leaves open a wealth of possibilities for the exact nature of the function. If near points have similar values, then we need to consider just how dissimilar they can be and how dissimilarity is a function of distance. Our functions can be simple or complex, sensitive or insensitive to distance and other factors. The mapping of these functions produces surfaces in two dimensions. It is these surfaces which have to be 'fitted' to empirical observations for estimates to be made at sites where values are unknown. It is worth remembering that with such surfaces, as with map overlay, there are serious problems with error in the fitting and estimation processes.

Therefore, to summarise, the spatial interpolation process involves the following steps (a short code sketch follows the list):

1. Establishing the spatial entity type for which observed attribute data are available.
2. Establishing the spatial entity type for which you wish to estimate attribute values.
3. Establishing the scale of measurement of the observed information. For example, is it ordinal, ratio etc.? (Refer to Section 3, Box 2 of this module for a reminder of what these mean.)
4. Determining how much is known about the spatial behaviour of the observed attribute you are trying to interpolate.
5. Establishing the relationship between the sites for which you have observed attribute values and selecting an appropriate interpolation technique. In virtually every case this will result in choosing a mathematical function which fits a surface to the spatial entities for which you have data.
6. Extracting from your interpolated surface, if required, the 'best guess value' for an entity of the same type or a different type at a new location where real world data are unavailable.
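To make these steps concrete, here is a minimal sketch (not part of the original module) of steps 1, 5 and 6. All station coordinates and wind speeds are invented for illustration, and inverse distance weighting is used as one simple choice of distance-based function that respects Tobler's observation.

```python
import math

# Step 1: hypothetical observed data - weather stations as (x, y, mean daily wind speed)
stations = [(2.0, 3.0, 12.1), (7.0, 8.0, 9.4), (5.0, 1.0, 14.6), (9.0, 2.0, 11.0)]

def idw_estimate(x, y, samples, power=2.0):
    """Step 5: inverse distance weighting - nearer stations carry more weight,
    so near points end up with similar values."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:                  # at a sample point, return the observed value
            return sz
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den

# Step 6: extract a 'best guess' wind speed at a hypothetical unsampled airport site
print(idw_estimate(6.0, 4.0, stations))
```

Inverse distance weighting is only one possible function; the sections that follow introduce several alternatives.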
The interpolation process, described above, has also been summarised in Figure 5.1 below.

Figure 5.1 : A conceptual overview of the spatial interpolation process

5.3 Why is spatial interpolation important in GIS?

Spatial interpolation techniques have wide application in the design, construction and application of a GIS. This is because we are rarely in the fortunate position of having access to all the data required for every location in our study area. It is much more likely that we will have partial, and very often patchy, data available. Figure 5.2 presents a conceptual view of the more common situations in which GIS designers will find themselves with respect to data availability (the figure makes reference to a point data set by way of example).

Figure 5.2 : Four different views of data availability in the real world

The basic role for interpolation in GIS is to fill in the missing data for those areas where real world observations are not available: in the case of Figure 5.2, the blank spaces or regions between data points. The important issue about interpolation, from a design perspective, is that any map layer we create through this process will contain information which is only a representation of what we believe to be true. It is all too easy to forget this as we integrate and overlay data sets in the GIS environment. Therefore, understanding the quality of a map, or any data for that matter, derived from a set of GIS operations relies on your appreciation of the limitations of the techniques that you have selected to generate the output data.

The field of spatial interpolation is an area of current research interest. Many of the techniques that are described below have been adapted from the one-dimensional methods originally developed for the analysis of time series data. Research is being undertaken into the methods and problems of transferring these methods to two-dimensional data with different scales, sampling distributions and temporal properties. At the present time few, if any, of the commercial GIS on the market will have even 50% of the techniques described below as part of their GIS toolbox. Many of these missing techniques will gradually appear in the next generation of GIS, so keep your eyes and ears open!

Continuous Self Assessed Exercise 12 (time: 20 minutes)

Choose two examples of GIS applications with which you are familiar. These could be from your work, examples provided with the Diploma or any additional reading you may have done. Make a list of all the instances where you think it has been necessary to use an interpolation technique to generate a new data layer.
Indicate what the original data layer consisted of (for example, vegetation sample points), what the new data layer produced was (for example, vegetation zones) and what the entity types involved were (for example, point to area).

The list of possible uses of interpolation techniques within GIS is vast. Waters (1989) provides the following list of potential uses:

- to provide contours for displaying data graphically
- to calculate some property of the surface at a given point
- to change the unit of comparison when using different data models in different layers
- to aid in the spatial decision making process, both in physical and human geography and in related disciplines such as mineral prospecting and resource exploration

One of the most common methods used to classify the many different methods of interpolation is to group them according to the spatial entities on which they can operate: points, lines or areas. This approach will be used to structure the discussion which follows.

5.4 Point interpolation methods

Point interpolation methods may be categorised under a number of different headings:

- local or global
- exact or approximate
- gradual or abrupt
- deterministic or stochastic.

An individual interpolation method will fit several of these categories; for instance, it may be local, exact, abrupt and deterministic (such as Thiessen polygons). Before we can expect you to be able to recognise what type of interpolation method you are using, we had better look at the ideas behind the terminology.

5.4.1 Local or global

The terms local and global describe how an interpolation technique uses the observed data collected for a set of geographically spread sampling points. Global interpolation techniques apply a single function to all the points in a study area. In such cases, as Waters (1989) indicates, a change in one observed value would affect the entire interpolation process. Global interpolators tend to produce a surface with few abrupt changes because, in general, they employ the principle of averaging, which reduces the influence of extreme values. Such methods are most appropriate when the surface being modelled is known to have an overall trend (for example, a pollutant may be known to decrease in occurrence in proportion to distance from a source).

Where there is little or no knowledge about the overall trend in the surface you are trying to model, local interpolators are more appropriate. Local interpolation techniques apply the same function repeatedly to a small portion of the total set of sample points for which data have been observed. A surface is then constructed by linking these regional observations together. Figure 5.3 illustrates the distinction between local and global methods of interpolation. A useful analogy is to think of global interpolators as a telescope, where everything you wish to observe is present in the field of view, and local interpolators as a microscope, where you build up a complete picture by observing each part in turn.

One question debated in the literature is when a local interpolation technique ceases to be local and becomes global. This is an interesting question since, in theory, the neighbourhood of points used in a local interpolation procedure could be extended to cover the entire study area. For our purposes we will consider a local interpolator to be any technique which makes more than a single pass over the data before constructing a surface.
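The distinction can be illustrated with a small sketch (made-up one-dimensional data, not from the module): the global estimator fits a single trend line to all the samples, so changing any one observed value shifts the estimate everywhere, while the local estimator averages only the samples inside a neighbourhood and is untouched by a change at a distant point.

```python
import numpy as np

# Made-up 1-D samples: positions and observed values
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
zs = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

def global_estimate(x, xs, zs):
    """Global: one straight-line trend fitted to ALL samples by least squares."""
    slope, intercept = np.polyfit(xs, zs, 1)
    return slope * x + intercept

def local_estimate(x, xs, zs, radius=1.5):
    """Local: mean of only those samples within `radius` of the target point."""
    near = np.abs(xs - x) <= radius
    return zs[near].mean()

print(global_estimate(2.5, xs, zs), local_estimate(2.5, xs, zs))

# Changing one distant observation shifts the global estimate
# but leaves the local estimate unchanged
zs[5] = 20.0
print(global_estimate(2.5, xs, zs), local_estimate(2.5, xs, zs))
```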
Examples of local interpolation techniques we will review in the following sections include Thiessen polygons, Delaunay triangles, moving averages and kriging. Global interpolators to be considered include line threading, trend surface analysis and Fourier analysis.

Figure 5.3 : Global and local interpolators

5.4.2 Exact or approximate

Exact interpolation techniques must honour all the points for which observed data are available. They are most appropriate when there is a high degree of certainty attached to measurements. Altitude, for example, is usually known with a high degree of certainty. You would not want to use an interpolation technique which did not honour an observed altitude value but instead replaced it with a smoother approximation. Approximate interpolators, on the other hand, do not have to honour all the points for which observed data are available. These techniques are most appropriate when there is some uncertainty about the ability to reproduce the observed value at a given sample point. Figure 5.4 shows a surface which has been fitted to a set of points using both techniques. Note how the surface produced by the approximate interpolator does not honour all of the data points.

Figure 5.4 : Two surfaces produced by exact and approximate interpolators

Examples of exact interpolation techniques discussed later include line threading, Thiessen polygons and kriging. Examples of approximate interpolators include spatial moving averages, trend surface analysis and a variety of the filtering operations used in remote sensing.

5.4.3 Gradual or abrupt

Gradual and abrupt interpolators are distinguished by the continuity of the surface they produce. Gradual interpolation techniques will produce a smooth surface with gradual changes occurring between observed data points. Abrupt interpolators will produce a surface which is stepped in appearance. An example of an abrupt interpolation method is buffering, which you looked at in an earlier section. Another technique to be introduced later is Thiessen polygons, which defines discrete territories associated with specific points. Waters (1989) suggests that it may be necessary to include barriers in a gradual interpolation process, in which case an abrupt interpolator is appropriate. In terrain modelling, for example, you may wish to represent the presence of a cliff, or in climatic modelling the presence of a weather front. For such features there has to be a definable step function which can be used with a gradual surface.

5.4.4 Deterministic or stochastic

The most desirable situation to be in when faced with interpolating new values from a known data set is where there is sufficient knowledge about the phenomenon to allow its behaviour to be described by a mathematical function. This is known as deterministic modelling. Figure 5.5 shows how knowledge of the laws of physics and measurements of actual velocity would allow the calculation, from seven sample points, of the trajectory of a bouncing ball.

Figure 5.5 : A deterministic model of the trajectory of a bouncing ball

In addition, deterministic modelling allows extrapolation beyond the range of our sample data set. However, this is only possible if the context of the data values is understood. Unfortunately, few geographical phenomena are understood in sufficient detail, or obey such precise rules, to permit a deterministic approach to interpolation or extrapolation.
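The idea behind Figure 5.5 can be expressed as a short sketch, assuming idealised projectile physics, a fixed coefficient of restitution and invented launch values: because the behaviour is fully described by equations, a height can be computed deterministically for any time, including times beyond the last sample (extrapolation).

```python
def ball_height(t, v0=10.0, g=9.81, restitution=0.7):
    """Height of an idealised bouncing ball at time t, launched upwards at
    speed v0. Deterministic: the same t always yields the same height."""
    v = v0
    while True:
        flight = 2.0 * v / g              # duration of the current bounce
        if t <= flight:
            return v * t - 0.5 * g * t * t
        if flight < 1e-6:                 # ball has effectively come to rest
            return 0.0
        t -= flight                       # move on to the next, damped bounce
        v *= restitution

# The equations interpolate between sampled instants and extrapolate beyond them
for t in (0.5, 1.0, 2.5, 4.0):
    print(t, round(ball_height(t), 3))
```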
There is a great deal of uncertainty about what happens at unsampled locations. To handle this problem, stochastic models are frequently used. Stochastic models incorporate the concept of randomness, suggesting that the interpolated surface, point, line or area values are only one of an infinite number that might have been produced from the known data points. Figure 5.6 shows a series of random trajectories for our bouncing ball, all of which could in theory occur if the behaviour of our ball was assumed to be random. Kriging, trend surface analysis and Fourier analysis are all techniques which assume some degree of random behaviour in the observed data.

Figure 5.6 : Two possible random trajectories for the bouncing ball

The remainder of this discussion of interpolation techniques will focus on methods that you are most likely to find in GIS. The discussion will focus on the interpolation of attribute data associated with point, line and area features. For the discussion of point interpolation methods, a clear distinction has been made between exact and approximate methods.

Continuous Self Assessed Exercise 13 (time: 30 minutes)

In the above discussion we have introduced a range of concepts, terms and techniques which we are certain will be new to many of you. So far we have:

- local interpolator
- global interpolator
- exact interpolator
- approximate interpolator
- gradual interpolator
- abrupt interpolator
- deterministic interpolator
- stochastic interpolator
- Thiessen polygons
- kriging
- Fourier analysis
- trend surface analysis
- moving average
- line threading

Before you proceed we suggest that you cross off the list those that you understand. Keep the list close to hand as you proceed through the following sections and cross off terms and concepts as they are explained.

5.5 Exact point interpolation methods

5.5.1 Line threading or eye balling

This is the original, and still considered by some to be the best, method of interpolating a surface from a set of observed data. The process involves threading a line, by eye, between data points with the objective of enclosing points of similar value. In most cases, the person performing the line threading will use their local knowledge about the geographical area, and their scientific knowledge about the phenomena they are studying, to enhance the interpolation process. For example, in geological mapping the presence or absence of a fault line may be known to the geologist but not be evident in the point sampled data set. The geologist may be able to build this expert knowledge into the interpolation process, where the non-specialist would be unable to do so.

Three basic limitations of the manual line threading approach have led to the computerisation of the method. These are:

- the problem of handling a large number of points,
- the subjective nature of the procedure, and
- the time consuming nature of the process.

Many automated line threading methods now exist which perform the task with varying degrees of intelligence. Over the last few years knowledge engineering techniques have been used to capture the knowledge of 'local experts' and build expert systems for this type of interpolation.

Continuous Self Assessed Exercise 14 (time: 45 minutes)

Use the manual line threading or eye balling technique to interpolate a surface for the point data set in Figure 5.7. Use an interval of 10 units to produce a contour map. Don't throw this diagram away as you will be using it again later. To check how good your own eye balling is, compare your map with the figure under the 'SAE Answers' button.
Do not worry if your interpolated surface does not exactly match ours. The line threading approach is very subjective, and though the main trends of the lines you have threaded should match ours, it is certain that the exact locations of the lines will differ.

Figure 5.7 : Observed values for a set of irregularly spaced points

5.5.2 Thiessen polygons

Thiessen polygons are an exact method of interpolation which assumes that the unknown value at any point on a surface is equal to the value of the nearest known point. It can be considered a local interpolator because the characteristics of the global data set have no influence on the interpolation process. The Thiessen polygon technique was originally used for computing areal estimates from rainfall data. The approach is graphically represented in Figure 5.8 and involves spreading the territory associated with a point until the boundary associated with the territory of a neighbouring point is encountered. If you like, you can imagine a bubble being blown outwards from each point simultaneously until they meet. Where they meet, the boundaries of the interpolated areas are drawn. If the data points are irregularly spaced, a surface made up of a mosaic of irregular polygons will result. The attribute value of each point is then adopted by its associated polygon.

Figure 5.8 : Construction of Thiessen polygons for a network of five points

The method of Thiessen polygons is a robust technique and will always produce the same surface from the same set of data points. This is a disadvantage in that the technique is unintelligent and unable to respond to external knowledge about factors which may influence the values recorded at the observed data points.

One of the most common uses for this technique is the generation of area territories from a set of points. For example, in the UK the smallest level of census administrative region is the Enumeration District (ED). For each ED a geographical centroid (an (x, y) coordinate location) is recorded. The precise boundary of each ED could be captured by digitising the appropriate polygon boundary. However, an approximate boundary can be generated by using the Thiessen polygon method to interpolate the area of influence associated with each ED centroid. The area polygons produced will not accurately represent the true boundary of the administrative area but may well be sufficient for the graphical presentation of data.
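Because the Thiessen rule is simply "take the value of the nearest known point", it is easy to sketch in code. The fragment below (hypothetical coordinates and values, not the Figure 5.7 data) rasterises Thiessen polygons by assigning each grid cell the value of its nearest sample; cells sharing a value form one polygon's territory.

```python
import math

# Hypothetical sample points: (x, y, observed value)
samples = [(1.0, 1.0, 30.0), (4.0, 2.0, 50.0), (2.5, 4.5, 40.0), (5.0, 5.0, 70.0)]

def thiessen_value(x, y, samples):
    """Value of the nearest sample point: the Thiessen polygon rule."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]

# Rasterise a 6 x 6 grid; runs of equal values trace out the polygon territories
for row in range(6):
    print([thiessen_value(col + 0.5, row + 0.5, samples) for col in range(6)])
```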
Continuous Self Assessed Exercise 15 (time: 2 hours)

For the point data set in Figure 5.7 generate a surface using the Thiessen polygon approach. This is not as easy as it may appear! Work with a pencil and rubber so that you can change your mind about where the boundary between pairs of points should be as you proceed. A hint: start with the points on the boundary and work towards the points in the centre of the data set. When you think you have solved the problem (or if you are having problems) you can check your answer against ours under the 'SAE Answers' button. When you have completed your mosaic of Thiessen polygons, assign the observed attribute associated with each data point to its surrounding territory. Now, using the same classification interval of 10 units as in the previous exercise, produce a new map removing the boundary lines between the territories of points which fall within the same classification band. Then use either different colours or some form of graded shading to distinguish between the areas (to check you are on the right lines you should compare your map with our example under the 'SAE Answers' button). Compare this map with the one you produced using the threading approach. What similarities and differences do you note between the two images?

What your comparison of the two maps should reveal is that both maps have a rough similarity. The greatest discrepancy occurs at the boundaries between class intervals and at the edge of the study area.

5.5.3 The Triangulated Irregular Network (TIN)

You have already been introduced to the Triangulated Irregular Network (TIN) model in Spatial Data Modelling. Developed in the early 1970s, the TIN is a simple way to construct a surface from a set of known points. It is a particularly useful technique for irregularly spaced points. The TIN approach is an exact interpolation method. In the TIN model, the known data points are connected by lines to form a series of triangles. Figure 5.9 shows a TIN model for a set of 10 points. Because the value at each node of a triangle is known and the distance between nodes can be calculated, a simple linear equation can be used to calculate an interpolated value for any position within the boundary of the TIN.

Figure 5.9 : A simple TIN for a set of 10 points

To construct contour isolines from a TIN model you must first choose a contour interval. Then, identify all the arcs which would be intersected by a contour of a given value. To do this the computer must compare the values of the points which make up the nodes of a particular triangle. For example, in Figure 5.9 above, 9 of the 19 arcs could be intersected by a contour with a value of 100. The next stage is for the computer to calculate the (x, y) coordinates of the points of intersection. Finally, the intersections are joined (see Figure 5.10). As you can see from Figure 5.10, one of the problems with producing contours from the TIN model is the angular character of the isolines produced. To overcome this, some TIN models represent the surface in each triangle using a non-linear mathematical function chosen to ensure that the isoline changes continuously, not abruptly, at the edge of the triangle. The TIN model is also used to generate digital terrain models, which we consider in the next section of this module.

Figure 5.10 : Contouring a TIN

What should be apparent is that the isolines produced from the TIN method are much more angular than those produced by threading a line through the data points by hand. In addition, in the TIN model it is usual to interpolate a surface only within what is known as the 'hull' of the data set. The hull is defined as the area enclosed by the arcs joining the sampling points which are on the periphery of the data set.
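Both TIN operations described above come down to a few lines of arithmetic, sketched below with assumed node coordinates and values: linear interpolation at a point inside one triangle (using barycentric weights), and the (x, y) position at which a contour of a chosen value crosses an arc.

```python
def interp_in_triangle(p, a, b, c):
    """Linear interpolation at point p inside triangle abc.
    Each node is (x, y, z); the weights are barycentric coordinates."""
    (x, y) = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3

def contour_crossing(n1, n2, value):
    """(x, y) where a contour of `value` intersects the arc n1-n2,
    or None if the contour does not cross this arc."""
    (x1, y1, z1), (x2, y2, z2) = n1, n2
    if z1 == z2 or not (min(z1, z2) <= value <= max(z1, z2)):
        return None
    t = (value - z1) / (z2 - z1)           # fraction of the way along the arc
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

a, b, c = (0, 0, 90), (4, 0, 110), (0, 4, 120)
print(interp_in_triangle((1.0, 1.0), a, b, c))   # value inside the triangle
print(contour_crossing(a, b, 100))               # where the 100 contour cuts arc a-b
```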
5.5.4 Kriging

Kriging, also known as the "theory of regionalised variables", was developed by G. Matheron and D. G. Krige as an optimal method of interpolation for use in the mining industry. Burrough (1986) notes that the origins of the method lie in the recognition that the spatial variation of many geographical properties is too irregular to be modelled by a smooth mathematical function. The basis of kriging lies in estimating the average rate at which the difference between values at points changes with the distance between points. This is the most complex of all the exact interpolation methods and until recently was absent from virtually all commercially available GIS software. It has found wide application in the earth and environmental sciences, such as geology, where it is essential both to honour the known data points and to take into consideration local variability. The complex nature of the technique, and the prerequisites in terms of statistical knowledge which are necessary to understand its application, are too detailed for us to pursue here. If you are interested in learning more about kriging, a comprehensive and readable account is provided in Oliver and Webster (1990).

5.6 Approximate point interpolation methods

Approximate interpolation techniques are applicable when there is uncertainty associated with reproducing the measured values for the observed data points, but where it is known that there are global trends overlain by local fluctuations (Waters, 1989). Approximate interpolation techniques, therefore, employ smoothing which reduces the effects of local variability on the interpolated surface. An important characteristic of all approximate point interpolation techniques is that the interpolated surface does not have to pass through the known data points.

5.6.1 Spatial moving average

The most common method used in GIS is the spatial moving average. This process involves calculating a new value for each known location based upon the range of values associated with neighbouring points. The new value will usually be either an average or a weighted average of all the points within a predefined neighbourhood. In some textbooks you will find the term 'moving window' used instead.

Applying a moving average involves making a number of decisions about the shape, size and character of the neighbourhood about a point. The most common shape for a neighbourhood is a circle, since points in all directions then have an equal chance of falling within a given radius of the point. However, in the raster world a square or rectangular window of cells is often used. The size of the neighbourhood is determined by the user and is usually based upon assumptions about the distance over which local variability in the data is important.

One weighting function which may be used to enhance the technique is the distance of points from the centre of the neighbourhood. For example, a point which is only 100 m away may be considered to be twice as important as another point 200 m away. This is described as a linear weighting. It is also possible to have a non-linear weighting, where points closer to the centre of the neighbourhood are considered to be one or two orders of magnitude more important than those towards the edge of the neighbourhood.

The number of values used in the calculation is also important. Burrough (1986), for example, suggests a spatial moving average should use between 4 and 12 data points, with the norm being between 6 and 8. Including too many points creates a very smooth surface, ultimately resulting in a single mean value for the entire study area. Conversely, too few points can result in individual points with extreme values dominating the interpolated surface.

Once a mathematical model has been constructed to describe the character of the neighbourhood, the next stage is to calculate a new value for each of the known data points. Figure 5.11 summarises this process for a neighbourhood in which each point is equally weighted.

Figure 5.11 : Calculating a spatial moving average in the raster and vector worlds
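Here is a minimal sketch of the equally weighted case in Figure 5.11, using made-up points and a circular neighbourhood: each point's value is replaced by the mean of all observed values (its own included) lying within the chosen radius. Swapping the equal weights for a function of distance would give the linear or non-linear weightings described above.

```python
import math

# Hypothetical sample points: (x, y, observed value)
points = [(1, 1, 20.0), (2, 3, 26.0), (4, 2, 31.0), (5, 5, 48.0), (3, 4, 29.0)]

def moving_average(points, radius):
    """Replace each value by the equally weighted mean of all values
    (including its own) within `radius` of the point."""
    smoothed = []
    for (x, y, _) in points:
        neighbours = [z for (px, py, z) in points
                      if math.hypot(px - x, py - y) <= radius]
        smoothed.append((x, y, sum(neighbours) / len(neighbours)))
    return smoothed

for p in moving_average(points, radius=2.5):
    print(p)
```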
Continuous Self Assessed Exercise 16 (time: 1 and a half hours)

Use the spatial moving average technique described above to calculate new values for the sampling locations in Figure 5.7. For this exercise you will need to hunt out, borrow or buy a pair of compasses. Or, if you are clever, invent one from a piece of string and a drawing pin! Set your compasses to a radius of 3 cm, centre them on each of the sample points in turn and use them to identify the neighbourhood of a given point. Before you begin, make sure you have a separate copy of Figure 5.7 to hand. For each point in turn calculate a new value from the average of all the points within the neighbourhood. Use this value to replace the original observed value in Figure 5.7. You can check how you are doing by comparing your new values for the sample data points with the answers under the 'SAE Answers' button. Now use the TIN method described in Section 5.5.3 to produce a contour map for the new surface. Compare this surface with the ones produced using the TIN and line threading approaches. What differences do you note?

The basic problem with spatial moving average techniques is the number of variations that exist. It is, therefore, highly probable that the spatial moving average functions available in one proprietary GIS will produce significantly different results from those in another GIS. This is because different systems will allow you to select different distance decay models, select the number of points used in the model, and apply different procedures for handling those points which fall on the edge of a neighbourhood. In addition, in the raster world it is more common for the neighbourhood to be defined as a group of cells (usually 9), compared with the circular neighbourhood used in the vector world. These problems do not arise if you have a complete understanding of the phenomenon which you are trying to interpolate, because you can then design a 'perfect' moving window.

The spatial moving average approach is best suited to those situations where:

1. observations are not considered to be exact representations of a true surface, but rather singular observations within a statistical sample;
2. the measurement techniques used to measure values at discrete locations on the true surface are known to be subject to error; and
3. it is known that the true surface exhibits a general pattern within which there is local variability.

Examples include demographic data, survey or questionnaire responses and pollution monitoring observations. A spatial moving average will produce a surface which suppresses individual values and reveals broad structures in the data. The nature of those structures depends to some extent on the filter size and the weightings, which you determine based on the known objectives of the analysis.

5.6.2 Trend surface analysis

To introduce the concept of trend surface analysis, let us take a look at what it says next to the appropriate command in the IDRISI help:

'TREND calculates linear, quadratic and cubic trend surface polynomial equations for spatial data sets and interpolates surfaces based on these equations'.

If this sounds like a statistical nightmare, do not worry: the ideas are simple. All we have time to do here is introduce the basic principles. In addition, because the statistical assumptions of the model are rarely met in practice, the technique is not a very robust interpolation method.
Therefore, it is much safer to use one of the simpler methods described elsewhere in this section. For those of you interested in getting to grips with trend surface analysis in detail, Unwin (1981) provides an excellent introduction which includes a worked example based around a set of five points. For those of you who have done some basic statistics, the concepts which underpin trend surface analysis should be immediately apparent. If you have not done any statistics then the following discussion should help.

The trend surface method fits a surface to the known observation points by a method known as least squares. The shape of the surface is determined by the mathematical equation, or polynomial, which is used to describe it. A polynomial surface has some very important characteristics; most importantly, it changes smoothly and predictably from one map edge to the other. The polynomial equation may be linear, quadratic or even cubic. A linear equation would be used to describe a tilted plane, a quadratic equation would define a simple hill or hollow, and a cubic equation would define a more complex surface (Figure 5.12). These concepts should be familiar to anyone who has calculated regression equations ('lines of best fit').

Figure 5.12 : Linear, quadratic and cubic trend surfaces

To fit a simple tilted plane to a set of observed points, all you need to do is solve a simple linear regression equation in which the spatial coordinates (x, y) are the independent variables and the z value, the attribute of interest, is the dependent variable. For those of you who are not familiar with regression, what this statistical procedure does is fit what is often termed a 'best fit line' through observed data values. It is easiest to understand by way of a two-dimensional example. Graph A in Figure 5.13 is known as a scatter plot and shows how height varies with distance along a transect. If we were certain that these height values were exact, then we might join the points together, as in Graph B, to interpolate the height value at unsampled locations. However, if we are uncertain of the quality of our data, or we are more interested in the general trend, then it might be safer to place a line of best fit through our points (see Graph C).

Figure 5.13 : The best fit concept

Conceptually, all trend surface analysis does is fit a similar 'best fit surface' to the known data points based on a mathematical definition of fit, i.e. least squares. Trend surfaces are, therefore, smoothing functions which do not necessarily pass through the original data points. The main limitation of trend surface analysis is that it makes adjustments for all points simultaneously. I would suggest that these techniques are more appropriate for exploratory data analysis than for interpolation.
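A first-order trend surface is easy to sketch with a standard least squares solver. In the fragment below (made-up observations, not from the module), the plane z = b0 + b1·x + b2·y is fitted to all points simultaneously; adding x², xy and y² columns to the design matrix would give the quadratic surface.

```python
import numpy as np

# Hypothetical observations: coordinates (x, y) and attribute z
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 2.0])
y = np.array([0.0, 2.0, 1.0, 3.0, 0.0, 4.0])
z = np.array([10.0, 14.0, 13.0, 18.0, 14.0, 19.0])

# Design matrix for a first-order (tilted plane) surface: z = b0 + b1*x + b2*y
A = np.column_stack([np.ones_like(x), x, y])
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)

def trend(xp, yp):
    """Value of the fitted plane at (xp, yp): a smoothing, approximate
    interpolator that need not pass through any observed point."""
    b0, b1, b2 = coeffs
    return b0 + b1 * xp + b2 * yp

print(coeffs)           # least squares coefficients of the plane
print(trend(2.5, 2.5))  # interpolated value at an unsampled location
```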
5.6.3 Fourier analysis

Fourier analysis is another global interpolation technique, one which approximates a surface by overlaying a series of sine and cosine waves. The technique was developed for signal analysis and has been widely applied in climatic change modelling. At present it has received little attention as a tool for interpolating geographical phenomena. Burrough (1986) suggests this is probably because it is best suited to data sets which exhibit periodicity (such as ocean waves and sand dunes), which is rare in most geographical phenomena. Therefore, because of the lack of practical applications using this technique and the complex mathematical nature of the approach, we will only mention it here. If your inquisitive nature gets the better of you, then both Burrough (1986) and Waters (1989) suggest Davis (1986) as a readable introduction.

5.7 Line interpolation methods

You have already examined the most common method used to interpolate observed data associated with a linear feature: the technique known as buffering or corridor analysis, described in Section 2.4.1 of this module. Buffering is an exact, abrupt and deterministic interpolation technique which produces a surface to which an attribute can be applied which is some derivative of that associated with the linear feature. Buffering is an exact interpolation technique because the procedure honours the observed values associated with the linear entity from which the buffer has been derived. It is abrupt because there are exact boundaries between the surface territories associated with the line features used to produce the buffer (though these will be less marked if a graded buffering approach is used to generate a surface). The technique is deterministic in that it assumes a linear feature will have a known, fixed influence on its neighbourhood. The interpolated surface, and the values subsequently produced using buffering, hold true only when the character of the linear feature is known to be the dominant influence in the area of interest.

Another common interpolation operation found in many GIS is estimating point values from isolines. The most common instance of this application is the conversion of height contour lines into an elevation grid for the purpose of constructing a digital terrain model. Since digital terrain models are now an integral part of most GIS, we will leave our discussion of the interpolation of point data from linear features until later in this module.

5.8 Area based interpolation methods

Most socio-economic and environmental data are reported by reference to an areal unit. Examples include population density by country, the number of employed or unemployed by administrative district, crime statistics by police beat and water quality measures by water district. Within the UK alone there are some 70 different areal units which are used for reporting socio-economic data. Area based interpolation techniques are necessary when we wish to allocate the attributes associated with one set of area features to another. For example, in the UK we might wish to report crime statistics collected by police beat at the electoral ward level. As Laurini and Thompson (1992) indicate, assigning a value is easy if either:

1. the area units are an identical match, or
2. the source areas are a nested subset of the target areas.

It is when the areas do not match that interpolation is necessary and problematic. Waters (1989) summarises the areal interpolation problem as follows:

"Given a set of data mapped on one set of source zones determine the values of the data for a different set of target zones."

Burrough (1986), Waters (1989) and Lam (1983) identify two different approaches to solving the problem:

- non-volume preserving
- volume preserving

which will be considered in turn.

5.8.1 Non-volume preserving

The non-volume preserving method is best described by way of example. The following example is based loosely around that described by Waters (1989). Consider the two maps in Figure 5.14. Map A shows the total population for four administrative districts on the planet Mars. Map B shows the flood corridor which has been established for one of the Martian-built canals on the planet.
What is the total population within the flood zone?

Figure 5.14 : Non-volume preserving areal interpolation method

The first step is to calculate the population density for each district by dividing population by area. Assume that the area of each district is known to be 100 map units. This gives population densities of 2, 4, 6 and 8 Martians per map unit for zones A, B, C and D respectively. The next stage is to identify a centroid for each region (the point at the geometric centre of the area) and then assign to this location the population density value associated with its enclosing area. This gives us Map C in Figure 5.14. The next stage would be to interpolate a gridded population density surface using any of the point methods described earlier. Map D provides one possible surface which may result from such a process. Each grid cell value can then be converted to a population by multiplying the estimated density by the cell area (see Map E). Now, overlay the target map: in this case our flood corridor goes on top of the interpolated grid, and each grid value can be assigned to either the flood or the non-flood zone. From this map an estimate of the total number of Martians at risk in the flood zone can be calculated (see Map F).

Continuous Self Assessed Exercise 17 (time: 20 minutes)

Taking a careful look at the above example, can you identify the weaknesses in this approach? Hint: how does the total population across all zones compare between Map A and Map F?

The most important point which we hope you will have identified from the exercise above is that the total population of the whole region is not preserved. In our example, the total population has dropped from 2000 in Figure 5.14A to 1950 in Figure 5.14F. Another criticism of this method is that the choice of the centroid as the point location on which to base the interpolation may be a poor assumption. In our example there may in fact be nobody living at this central location, and all the Martians in zone A might live at its edge. A further problem is that this approach suffers from all the inadequacies associated with the point interpolation methods described earlier.

5.8.2 Volume preserving

Volume preserving procedures involve overlaying the target zones on top of the source zones and determining the proportion of each source zone which falls within the target zone. The total value of the attribute for each source zone is then allocated to the target zone according to these areal proportions. Figure 5.15 summarises this approach for estimating the total Martian population in our flood zone. The main problem is that the method assumes a uniform density of the attribute within each zone. An enhanced version of this approach is the pycnophylactic method developed by Tobler in 1979.

Figure 5.15 : Volume preserving areal interpolation methods

Continuous Self Assessed Exercise 18 (time: 45 minutes)

The procedure for the pycnophylactic method is summarised below:

1. Overlay a dense grid of cells on the source map.
2. Divide each zone's total value equally among the raster cells that overlap the zone.
3. Smooth the values by replacing the value of each cell with the average of its neighbours.
4. Sum the values of the cells in each zone.
5. Adjust the values of all the cells within each zone proportionally so that the zone's total is the same as the original. For example, if a zone's total has come out low, scale every cell in the zone up by the same factor to restore it.

Follow this procedure to calculate the total population for the flood zone. Use a raster of 16 cells. To check you are on the right track, the answer and procedure are summarised under the 'SAE Answers' button.
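Translated into code, the five steps might look like the following illustrative sketch: a tiny 4 x 4 raster with two invented zones and made-up totals, smoothing each cell with the mean of itself and its in-grid neighbours (cells outside the study area are simply ignored), then rescaling each zone to preserve its original total.

```python
import numpy as np

# Steps 1-2: a 4 x 4 raster; zone membership and each zone's known total
zones  = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [1, 1, 1, 1]])
totals = {0: 600.0, 1: 1000.0}          # made-up zone populations

# Divide each zone's total equally among its cells
values = np.zeros_like(zones, dtype=float)
for z, total in totals.items():
    values[zones == z] = total / (zones == z).sum()

for _ in range(20):                     # iterate until the surface settles
    # Step 3: mean of each cell and its 4 in-grid neighbours;
    # zero padding plus a weight grid ignores cells outside the study area
    padded = np.pad(values, 1)
    weight = np.pad(np.ones_like(values), 1)
    nsum = (padded[1:-1, 1:-1] + padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:])
    ncnt = (weight[1:-1, 1:-1] + weight[:-2, 1:-1] + weight[2:, 1:-1]
            + weight[1:-1, :-2] + weight[1:-1, 2:])
    values = nsum / ncnt
    # Steps 4-5: rescale so every zone again sums to its original total
    for z, total in totals.items():
        mask = zones == z
        values[mask] *= total / values[mask].sum()

print(values.round(1))                  # a smooth surface preserving zone totals
```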
This particular interpolation procedure raises an interesting question: what do you do about the cells whose neighbours are external to the study area and therefore unknown? Do you:

- assume the population is equal to zero,
- assume the population density is equal to that of the last neighbour, or
- simply ignore them, as we did above?

There is no simple answer to this question. In this case it is necessary to rely on your own judgement, or external information, to help you assess the situation and determine the appropriate action. Here another interesting problem arises, since when we are performing this operation by hand it is possible for us to assign boundary values as we see fit. For example, if we know there is a desert to the top of our study area we can build zero values into the cell averaging procedure. However, if this process were being performed by a computer algorithm this information would have to come from another source. What other source? Artificial intelligence, or another map layer, perhaps? At the present time most commercial GIS have no simple answer to this problem.

5.9 Discussion

In this section we have provided an overview of the objectives, importance and applications of a range of point, line and area interpolation techniques. As with so many other topics in this diploma, we could quite easily have written a whole module on this topic. However, our purpose is to provide you with a flavour of the subject and a rationale for why, where and when you might use some of the more common methods. You probably now feel that selecting and applying the appropriate interpolation technique for a given GIS problem is much more complex than you had first considered. Unfortunately, there are few hard and fast rules that can be applied to selecting the appropriate interpolation procedure. Where possible we have tried to identify and isolate these in the above discussion (they are also summarised in Table 4, which is reproduced from Burrough (1986)). Perhaps the best piece of advice we can give is that you should try to ensure that you understand the data you are using and that you are aware of its characteristics and limitations. Remember, though, that even the most rigorous procedure is still only providing you with a 'best guess' - the only exact method is to go out there and sample each and every location!

Table 4 : Methods of interpolation (Burrough, 1986)

If you are intrigued and enthralled by the complexities surrounding spatial interpolation procedures and wish to delve further, then the bibliography at the end of this module should provide you with an appropriate starting point. If you have access to a GIS other than the one you are using as part of this course, an interesting exercise would be to undertake a review of the spatial interpolation procedures which it contains. At the present time it will probably make use of only one or a few of the methods described in this section. The more analytically intensive techniques are often absent from proprietary GIS because there has been no demand for them. It may be that until somebody is able to develop a usable interface to these more sophisticated procedures they will remain in the domain of the spatial scientists. So, after all that, why have we spent a large part of this module discussing interpolation techniques?
The answer lies in the fact that even if you aren't required to interpolate data in your job, it is still highly likely that you will be using data that have been interpolated by someone else. If this section has achieved nothing else, it should have warned you of the need to understand the methods which have been applied to those data.

Before completing the final exercise, I would like to draw your attention to the difference between interpolation and extrapolation. All of the interpolation techniques that we have described have been developed to estimate values at locations within the area covered by the source data. Estimating values beyond those boundaries (extrapolating) is more difficult and prone to greater uncertainty because there is less 'control' on the interpolation technique. You will also appreciate that not all techniques can be used for extrapolation. Strictly (statistically) speaking, none of them should be used in this way, but we live in a world where sometimes we are asked to perform analysis against our better judgement.

We will return to some of these issues in the Data Quality module when we look at error propagation.

Periodic Self Assessed Exercise 5 (time: 30 minutes)

To help you summarise this section, try to define and describe what is meant by the following terms in the context of interpolation:

1. Local/Global
2. Deterministic
3. Thiessen polygons
4. Volume preserving area interpolation
5. Exact/Approximate
6. TIN
7. Eye balling
8. Neighbourhood