# User manual | Figure 5-6 Alternative routes between two sites in London Distance, Path Measurement and Path Models

Figure 5-6 Alternative routes between two sites in London (A. Fastest; B. Shortest)

The Bromley-to-Sunbury example uses two kinds of primary or A-road (two speed variants) and one Motorway (another speed variant) and the function to minimise is actually a sum based on network distances and route speeds. Each road type is assumed to have a fixed average road speed (predefined, but in many systems this is amendable).

If all road types vary in their traffic carrying characteristics (and thus traffic speed) in the same way by time of day then this amounts to a constant multiplier and has no effect on the route selection (although the journey duration will alter based on the factor).

However, if road speeds change in different ways across the day then time of day becomes important – for example, city roads may improve by 30% during the daytime outside of rush hours but motorways might improve by 100%.

The shortest route in this case is seeking to minimise

$$z = \sum_i d_i$$

where the $d_i$ are the lengths of route segments. For the fastest journey this changes to

$$z = \sum_i t_i$$

where the $t_i = d_i / v_i$ are the times taken to travel along each segment with speed, or velocity, $v_i$. The integral formulation of this expression, which we use at various points in this study, is:

$$\min z = \int \frac{1}{v}\, ds$$
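The two objective functions can be compared on a toy network. The following is a minimal sketch only: the network, node names, lengths and speeds are hypothetical (they are not the Bromley-to-Sunbury data), and Dijkstra's algorithm is used simply as one standard way of minimising $z$; the function name `best_route` is an assumption introduced for illustration.

```python
import heapq

def best_route(graph, source, target, cost):
    """Dijkstra's algorithm; cost(d, v) is a segment's contribution to z."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        z, node = heapq.heappop(heap)
        if node == target:
            break
        if z > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, d, v in graph.get(node, []):
            z_new = z + cost(d, v)
            if z_new < dist.get(nxt, float("inf")):
                dist[nxt] = z_new
                prev[nxt] = node
                heapq.heappush(heap, (z_new, nxt))
    # Walk the predecessor links back from the target to recover the path
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical network: node -> [(neighbour, length d_i in km, speed v_i in km/h)]
graph = {
    "A": [("B", 10.0, 30.0), ("M", 12.0, 100.0)],
    "B": [("Z", 10.0, 30.0)],
    "M": [("Z", 15.0, 100.0)],
}

shortest = best_route(graph, "A", "Z", lambda d, v: d)      # z = sum of d_i
fastest = best_route(graph, "A", "Z", lambda d, v: d / v)   # z = sum of d_i / v_i
```

In this invented network the shortest route uses the slow roads through B (20 km), whereas the fastest route takes the longer but quicker link through M. Multiplying every speed by the same constant changes the journey time but not the route selected.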

Note that different routing software and datasets can yield radically different results. In the example above some online systems suggest that the fastest route is via the London inner ring road (the South Circular) during the rush hour, which will be incorrect in most cases. Furthermore, in the example above a large section of the motorway selected (the M25) is operated with an official variable speed limit system, varied by time of day and traffic conditions.

Computerised routing systems should take account of such variations, incorporating knowledge of actual traffic conditions and, for example, asking users when their journey is to start or (more usefully) when they need to reach their destination. It can be shown that this ‘backward path’ view is equivalent to the conventional forward path model²², thus solution, or failure to find a solution, of either one implies the other for scalar fields.

In the example above journey times in rush hours may actually be longer by the ‘faster’ route. Furthermore, this highlights the issue of solvability – if the objective is to reach Sunbury between 08:45 and 09:15 for a meeting there may be no starting time that achieves this objective; arrival times before 08:15 may be the only solution. This phenomenon is prevalent in modes of transport that run to timetables or schedules (or fail to!). These kinds of ‘time holes’ in solution sets are mirrored by spatial ‘holes’ or

## 5.2 Path measurement theory and practice

At first sight the question of measuring and working with path lengths would appear to benefit from the use of the tools of measurement theory²³ and measure theory²⁴. Indeed, both fields have important contributions to make, but neither possesses the range of tools required. Among the concepts these theories do provide are the notions of invariance²⁵ and measure. Stevens²⁶ has classified various scales of measurement, such as ordinal, nominal and ratio scales, on the basis of the group of transformations that leave the scale form invariant. For example, in the ‘intervening opportunities’ model of trip behaviour the distance to a shopping centre from any consumer is essentially a simple ranking of proximity or accessibility, i.e. an ordinal scale. Such a model would be unaffected by a transformation of proximity using any positive monotonic function (e.g. replacing a proximity measure $x$ by $x^n$, $n>0$). Similarly, the classical notion of length, as measured on a ratio scale, is unaffected by multiplication by a positive constant.
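The invariance properties of ordinal and ratio scales can be checked numerically. This is a small sketch under stated assumptions: the proximity values are invented, and the exponent 3 and the multiplier 1.609 are arbitrary choices of a monotonic transform and a change of unit.

```python
def ranking(values):
    """Indices of `values` ordered from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])

proximity = [2.5, 0.4, 7.1, 1.3]            # hypothetical distances to four centres
transformed = [x ** 3 for x in proximity]   # monotonic transform x -> x^n, n = 3
rescaled = [1.609 * x for x in proximity]   # ratio-scale change of unit

same_order = ranking(proximity) == ranking(transformed)
ratio_before = proximity[2] / proximity[0]
ratio_after = rescaled[2] / rescaled[0]
```

The ranking, which is all that an ordinal model uses, survives the power transform, while the ratio of two lengths survives multiplication by a positive constant.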

The notion of a measure of a set, A, contained in the real line, is a generalisation of the idea of length and is invariant under translations of the set. Hence if $M(A)$ is the measure of set A then $M(A) = M(A+x)$ where $x \in \mathbb{R}$, $A+x = \{y+x : y \in A\}$.

The application of measure theory to the social sciences appears to be limited to one monumental (but rather abstract) work²⁷, in which a range of location and other problems are generalised using the power of measure-theoretic analysis. It is, however, central to Crofton’s development of geometric probability, which is discussed in greater detail in Sections 5.2.7 and 6.3 below.

5.2.1 Linear approximation

The conventional mathematical approach to measuring path length involves distinguishing two classes of curves: rectifiable and non-rectifiable (without length).


The method of determining whether or not a curve, C, is rectifiable, involves replacing it with all possible inscribed polygons (Figure 5-7). If the length of these inscribed polygons, P, P′ etc., is bounded then the curve is called rectifiable, with the upper bound of polygon lengths defining the curve length.

If the curve is not rectifiable it is regarded by classical mathematics as having ‘no length’ or indeterminable length. This approach, whilst corresponding closely to the familiar ‘dividers’ or poly-line method of measuring line-lengths on maps and in vector GIS datasets, has the unfortunate property of assigning no length to non-rectifiable curves. Such curves are not merely confined to the so-called ‘pathological’ functions (such as Dirichlet’s function: $f(x) = 1$ if $x$ is rational and $f(x) = 0$ when $x$ is irrational) but include several apparently well-behaved continuous functions (e.g. $f(x) = x\cos(\pi/(2x))$, $x \neq 0$, $f(0)=0$, $x \in [0,1]$), and arguably, all real-world curves.
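The failure of rectification for the continuous example above can be illustrated numerically. This sketch assumes the function is read as $x\cos(\pi/(2x))$ and inscribes polygons with vertices at $x = 0$ and $x = 1/k$, $k = 1, \dots, n$; the choice of vertices and the function names are assumptions made for illustration.

```python
import math

def f(x):
    """f(x) = x cos(pi / (2x)) for x != 0, with f(0) = 0."""
    return 0.0 if x == 0 else x * math.cos(math.pi / (2 * x))

def inscribed_length(n):
    """Length of the inscribed polygon with vertices at x = 0, 1/n, ..., 1/2, 1."""
    xs = [0.0] + [1.0 / k for k in range(n, 0, -1)]
    return sum(math.hypot(x1 - x0, f(x1) - f(x0))
               for x0, x1 in zip(xs, xs[1:]))

lengths = {n: inscribed_length(n) for n in (10, 100, 1000, 10000)}
```

Each polygon refines the previous one, so the lengths can only increase. Here they keep growing, roughly like the harmonic series, rather than settling towards an upper bound, which is what non-rectifiability means in practice.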

Figure 5-7 Rectifying a curve, C, by polygonal approximation (curve C from A to B; inscribed polygons P and P′; intermediate points $x_1$, $x_2$, $x_3$)

The practical problem of measuring the length of a curve or line, whether mathematical, physical (e.g. a lake shore, a molecule or gene under the microscope) or cartographical, has attracted much attention. In addition to the linear method (rectification) described above, which forms the basis of land surveying methods, several other methods have been developed, each of which involves use of a scale-critical component, η.

If the rectification process above used dividers to ‘walk’ along the line to be measured, then the separation of the divider points, η, would determine the curve length $L_m(\eta)$, where $m$ indicates the method used for measurement. Length is thus not an absolute, but a function of the way in which the underlying data is modelled or represented, together with the measurement method and the resolution used when measuring. Rectification can be regarded as a piecewise linear approximation technique. Invariably, this technique under-estimates line length at a given resolution. For example, within GIS datasets recorded at a scale of 1:100,000, comparison with accurate odometer measurements of road lengths shows errors of up to -15% on sinuous routes²⁸. Fixed interval walks can also be computationally intensive and require separate handling of the inevitable remainder at one end of the line.

In order to reduce computer data storage and/or manage representation at varying scales, some authors²⁹ have developed algorithms which examine the deviation of an approximation such as P′ to C from a coarser approximation such as P. If this deviation is less than a pre-defined value then the coarser link is retained. For example, in Figure 5-7 point $x_2$ might be dropped (or weeded out) in favour of the direct link from $x_1$ to $x_3$.

This approach will exaggerate the under-estimation of length that is inherent in the poly-line approach. Also note that the orientation of the line segment $x_1x_3$ is substantially different from $x_1x_2$ and $x_2x_3$ and from the original curve. Algorithms (such as refraction) that rely on boundary orientation are significantly affected by such approximations.
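The effect of point weeding on measured length can be sketched as follows. The recursive splitting rule used here is in the style of the well-known Douglas-Peucker procedure and is an assumption chosen for illustration; it is not necessarily the algorithm of the authors cited. The test line (a sampled sine wave) and the tolerances are invented.

```python
import math

def polyline_length(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def offset(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    chord = math.hypot(bx - ax, by - ay)
    if chord == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / chord

def weed(pts, tol):
    """Keep the end points; keep an interior point only if the direct link
    would deviate from it by more than tol (applied recursively)."""
    if len(pts) < 3:
        return list(pts)
    k = max(range(1, len(pts) - 1),
            key=lambda i: offset(pts[i], pts[0], pts[-1]))
    if offset(pts[k], pts[0], pts[-1]) <= tol:
        return [pts[0], pts[-1]]          # the coarser link is retained
    return weed(pts[:k + 1], tol)[:-1] + weed(pts[k:], tol)

# Hypothetical digitised line: a sine wave sampled at 201 points
line = [(x / 20.0, math.sin(3.0 * x / 20.0)) for x in range(201)]
full_length = polyline_length(line)
fine = weed(line, 0.05)
coarse = weed(line, 0.5)
```

Because every weeded line visits a subset of the original vertices in the same order, its length can never exceed that of the original poly-line, and it shrinks further as the tolerance is increased.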

Piecewise non-linear approximations could also be used, such as quadratics, cubics and circle arcs. With quadratic approximation (Figure 5-8) the triple of path coordinates $[x_i, y_i]$, $[x_{i+1}, y_{i+1}]$, $[x_{i+2}, y_{i+2}]$ is used to determine the three coefficients of a quadratic of the form $y = a + bx + cx^2$.

Figure 5-8 Quadratic approximations to a map outline (axes x, y; arc segments $S_1$ to $S_4$ over points $x_0$ to $x_3$)

The curve length over the interval $[x_i, x_{i+1}]$ can then be calculated and the estimation process repeated for the next triple (i+1, i+2, i+3) etc. This method can be applied to both 2- and 3-dimensional (space) curves, since any three points will always be coplanar. The separation interval on the x-axis becomes our η and the estimated length is the integral along the piecewise construction.
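The quadratic procedure can be sketched for a planar curve whose x-coordinates are strictly increasing. The function names are hypothetical, the closed-form arc length of a parabola is used for the integral, and the treatment of the final interval (re-using the last fitted quadratic) is an assumption, since the text does not specify it.

```python
import math

def quadratic_through(p0, p1, p2):
    """Coefficients (a, b, c) of y = a + b x + c x^2 through three points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    s01 = (y1 - y0) / (x1 - x0)
    c = ((y2 - y0) / (x2 - x0) - s01) / (x2 - x1)
    b = s01 - c * (x0 + x1)
    a = y0 - b * x0 - c * x0 * x0
    return a, b, c

def arc_length(b, c, xa, xb):
    """Exact length of y = a + b x + c x^2 between xa and xb."""
    if c == 0.0:
        return math.hypot(1.0, b) * (xb - xa)
    def F(x):
        u = b + 2.0 * c * x                      # slope dy/dx
        return (u * math.hypot(1.0, u) + math.asinh(u)) / (4.0 * c)
    return F(xb) - F(xa)

def quadratic_length(points):
    """Sum the arc over [x_i, x_{i+1}] for each triple (i, i+1, i+2)."""
    if len(points) < 3:
        raise ValueError("at least three points are required")
    total = 0.0
    for i in range(len(points) - 2):
        a, b, c = quadratic_through(*points[i:i + 3])
        total += arc_length(b, c, points[i][0], points[i + 1][0])
    # Assumption: the last interval re-uses the final fitted quadratic
    total += arc_length(b, c, points[-2][0], points[-1][0])
    return total

# Points sampled from y = x^2 on [0, 2] with interval eta = 0.5
samples = [(0.5 * k, (0.5 * k) ** 2) for k in range(5)]
estimate = quadratic_length(samples)
polyline = sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))
```

For data that really do lie on a parabola the fit is exact, so the estimate reproduces the true arc length, whereas the poly-line through the same points under-estimates it.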

5.2.3 Circular approximation

A similar method involves piecewise circular approximation (Figure 5-9). For this method a point is selected along the line to be measured and a tangent drawn at this point. The normal to the tangent is then drawn and a circular arc constructed to approximate the interval $[x_i, x_{i+1}]$ by selecting the circle radius, $r_i$ (i.e. $\eta_i$), such that the centre of the circle lies on the normal and the circle passes through $y(x_i)$ and $y(x_{i+1})$. The angle $\theta_i$ is measured and the length of the line estimated as

$$L = \sum_i L_i = \sum_i r_i \theta_i.$$
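The construction above can be sketched using the tangent-chord relation: if α is the angle between the tangent at the first point and the chord to the second, the circle has radius $r = c / (2\sin\alpha)$ for chord length $c$, and the arc subtends $\theta = 2\alpha$ at the centre. The function names and the requirement that tangent directions are supplied as angles are assumptions made for illustration.

```python
import math

def arc_segment(p, q, tangent_angle):
    """Arc length r * theta of the circle through p and q whose centre lies
    on the normal to the tangent at p (tangent direction in radians)."""
    chord = math.hypot(q[0] - p[0], q[1] - p[1])
    chord_angle = math.atan2(q[1] - p[1], q[0] - p[0])
    # Angle between tangent and chord, wrapped into [-pi, pi]
    alpha = abs(math.remainder(chord_angle - tangent_angle, 2.0 * math.pi))
    if alpha < 1e-12:
        return chord                       # tangent and chord coincide
    r = chord / (2.0 * math.sin(alpha))    # radius r_i
    theta = 2.0 * alpha                    # angle theta_i at the centre
    return r * theta

def circular_length(points, tangent_angles):
    """L = sum of r_i * theta_i over successive intervals."""
    return sum(arc_segment(p, q, t)
               for p, q, t in zip(points, points[1:], tangent_angles))

# Check on a quarter of the unit circle, sampled at unequal intervals
ts = [0.0, 0.4, 1.0, math.pi / 2]
pts = [(math.cos(t), math.sin(t)) for t in ts]
tangents = [t + math.pi / 2 for t in ts]   # anticlockwise tangent directions
total = circular_length(pts, tangents)
chords = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
```

When the line being measured is itself circular the method is exact, which is why the quarter circle is recovered regardless of how the intervals are chosen.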

Figure 5-9 Circular approximations to a map outline (axes x, y; points $x_0$ to $x_3$; arc angle θ)

This method is similar to the use of a rolling wheel to measure road lengths except that in this case the circle radii can be variable. There is also an interesting relationship between this method and design guidelines for road construction, which often quote design objectives in terms of the radius of curvature - in some instances such guidelines assume a horizontal road profile to be designed from linear and circular elements.

5.2.4 Tangent approximation

This method is similar to that used in calculus, but instead of assuming that the step interval, η → 0, it is retained as a finite measure (Figure 5-10). In calculus it is usual to specify this interval, η, as ∆x with ∆y = |y(x) - y(x+∆x)| giving:

$$d = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{1 + \tan^2\theta}\,\Delta x = \sec\theta\,\Delta x$$

where θ is the angle between a tangent to the curve at x and the x-axis.

Figure 5-10 Tangent approximations to a map outline (axes x, y; interval x to x+∆x and y to y+∆y; tangent angle θ)

The total length of the line to be measured can then be calculated by summation. If ∆x is fixed, the method simply requires measurement of the angle θ at each point. However, accurate measurement of θ requires accurate construction of tangents, which in many cases may not be possible.
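The summation of $\sec\theta\,\Delta x$ with a fixed, finite step can be sketched as below. It assumes the tangent slope is available as a function, which sidesteps the practical difficulty of constructing tangents noted above; the function name and the test curve $y = x^2$ are illustrative choices.

```python
import math

def tangent_length(slope, x_start, x_end, steps):
    """Sum sec(theta) * dx over equal intervals, with tan(theta) = slope(x)
    evaluated at the left-hand end of each interval."""
    dx = (x_end - x_start) / steps
    total = 0.0
    for k in range(steps):
        theta = math.atan(slope(x_start + k * dx))
        total += dx / math.cos(theta)          # sec(theta) * dx
    return total

# y = x^2 on [0, 2], so the tangent slope is 2x
coarse = tangent_length(lambda x: 2.0 * x, 0.0, 2.0, steps=10)
fine = tangent_length(lambda x: 2.0 * x, 0.0, 2.0, steps=1000)
exact = (4 * math.sqrt(17) + math.asinh(4)) / 4
```

With the step retained as a finite measure the estimate depends on η: the coarse run is noticeably short of the exact arc length, while the fine run is close to it.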

5.2.5 Grid approximation

If a square grid of cell size η is placed over the curve of interest, then both the number of grid lines intersected and the number of cells encountered measure the length of the curve (Figure 5-11).

Figure 5-11 Grid approximations to a map outline

In the example shown there are slightly more grid line intersections than cell intersections, so the two counting methods will differ. Goodchild³⁰ has suggested that for maps the cell counting approach “gives estimates which are more simply related to scale and less to how the map was constructed.”

Both methods are well suited to automation, unlike some of the previous approaches, but will generally require smaller values of η (a larger number of computations) for comparable levels of approximation. Figure 5-12 highlights the degree of approximation involved by replacing the intersected cells in Figure 5-11 with the filled cells:

Figure 5-12 Grid approximations to a map outline – filled cells
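The two grid counting methods can be sketched for a densely sampled curve. Several assumptions are made here: the curve is supplied as vertices spaced much more closely than η, cells are identified from the vertices only (so a cell that is merely clipped between two vertices may be missed), and the conversion from crossings to length, $L \approx \pi\eta n/4$, applies the parallel-line result of Section 5.2.7 to both families of grid lines and presumes the curve has no preferred orientation. The unit circle is an invented test outline.

```python
import math

def grid_counts(points, eta):
    """Return (grid-line crossings, distinct cells visited) for a curve
    given as closely spaced vertices."""
    crossings = 0
    cells = set()
    prev = None
    for x, y in points:
        cell = (math.floor(x / eta), math.floor(y / eta))
        cells.add(cell)
        if prev is not None:
            # Each change of column or row index is one grid line crossed
            crossings += abs(cell[0] - prev[0]) + abs(cell[1] - prev[1])
        prev = cell
    return crossings, len(cells)

def length_from_crossings(crossings, eta):
    """Assumed conversion L ~ pi * eta * n / 4 for an isotropic curve."""
    return math.pi * eta * crossings / 4.0

# Unit circle, centre offset so that it is not tangent to any grid line
circle = [(0.03 + math.cos(2 * math.pi * k / 3600),
           0.07 + math.sin(2 * math.pi * k / 3600)) for k in range(3601)]
crossings, cells = grid_counts(circle, eta=0.1)
estimate = length_from_crossings(crossings, 0.1)
```

For this outline the crossing count converts to a length close to the true circumference, and the number of cells visited does not exceed the number of grid lines crossed, in line with the remark above that the two counts differ slightly.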

5.2.6 Area approximation

In the second grid method described above, the number of cells intersected (the ‘box count’) is also a form of area approximation or ‘coverage’ of line length. A number of other methods have been devised using the area in the neighbourhood of a curve as a measure of curve length. Each of the three methods to be described uses circular regions to approximate the curve.

The first method is the so-called Minkowski or Cantor sausage (Figure 5-13, method A). In this method every point along the curve of interest is replaced by a circle of radius η. The area of the ‘tape’ formed by the trace of these circles is then A, where $A = 2\eta L$, hence $L(\eta) = A/2\eta$.


This method essentially transfers the measurement of the line length to that of calculating the area, A. The surface area equivalent, utilising a sphere, is called the Minkowski comforter. Note that there is an important problem with this method – it assumes that the curve to be measured is everywhere defined and identifiable, and that a circle can be drawn at each point, which may not be the case. It has recently been suggested that the measurement should be of the “ε-neighbourhood” of the original line, where ε is a small but finite number, using a line of length 2ε drawn perpendicular to the path. This revision still leaves the method of defining the neighbourhood open to similar criticisms.
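The Minkowski sausage estimate $L(\eta) = A/2\eta$ can be sketched by measuring the area of the ‘tape’ on a fine sampling grid. This is an illustration under stated assumptions: the curve is a poly-line, the area is approximated by counting the centres of small square cells of side `h` that lie within η of the line, and for an open line the two semicircular ends (total area $\pi\eta^2$) are subtracted as an end-of-line adjustment. The function names are hypothetical.

```python
import math

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0.0 else ((px - ax) * dx + (py - ay) * dy) / seg2
    t = max(0.0, min(1.0, t))                 # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def sausage_area(points, eta, h):
    """Area within eta of the poly-line, by counting sample-cell centres."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs) - eta, min(ys) - eta
    nx = math.ceil((max(xs) - min(xs) + 2 * eta) / h)
    ny = math.ceil((max(ys) - min(ys) + 2 * eta) / h)
    segments = list(zip(points, points[1:]))
    inside = 0
    for i in range(nx):
        for j in range(ny):
            centre = (x0 + (i + 0.5) * h, y0 + (j + 0.5) * h)
            if any(dist_to_segment(centre, a, b) <= eta for a, b in segments):
                inside += 1
    return inside * h * h

def minkowski_length(points, eta, h, end_caps=True):
    area = sausage_area(points, eta, h)
    if end_caps:
        area -= math.pi * eta * eta           # assumed end-of-line adjustment
    return area / (2.0 * eta)

line = [(0.0, 0.0), (2.0, 0.0)]               # straight line of length 2
estimate = minkowski_length(line, eta=0.1, h=0.01)
uncorrected = minkowski_length(line, eta=0.1, h=0.01, end_caps=False)
```

For this straight test line the adjusted estimate is close to the true length, while the unadjusted value $A/2\eta$ is larger because the rounded ends of the tape are counted as if they were line.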

Mandelbrot calls the second method the Pointillist technique³¹ by reference to the Pointillist school of art (Figure 5-13, method B). The curve to be approximated is completely replaced by the smallest possible number of circles (points) of fixed radius η, i.e. the smallest possible covering of the line. The centres of the circles used will not necessarily lie exactly on the line to be measured. As before $L(\eta) = A/2\eta$.

Figure 5-13 Area approximations to a map outline (Method A: Minkowski; Method B: Mandelbrot; Method C: Perkal)

The final method illustrated in Figure 5-13, method C, is due to Perkal³², although it was not initially introduced by him specifically for line measurement. The method uses circles constructed on either side of the line to be measured to define an outer envelope.

As with the first method, the area of the outer envelope divided by its width (in this case 2η) gives the approximate line length (finite lines require an end of line adjustment factor). In a subsequent paper Perkal³³ examines method A alone and in conjunction with method C. He describes the use of square and triangular templates (similar to Figure 5-12) to estimate line length by repeated random sampling. Perkal calls these measuring templates “ε-longimeters”.

None of the above methods is particularly easy to carry out by hand or computer, although method A is probably the most amenable to automation. Perkal tested his techniques and those of Steinhaus (see Section 5.2.7 below) on a variety of lines with varying values of ε with several test users. He found that his methods were faster than those of Steinhaus, but both methods were subject to considerable variation by user in the estimates found.

Line following (tracing) by humans or computers (e.g. optical devices) will inevitably involve selection of points along the line and thus linear approximation will result automatically. The main advantage of the area methods is where the line to be followed is poorly defined (at the scale selected) and/or area separation is required.

5.2.7 Statistical approximation

There are a number of statistical methods for estimating the lengths of a fixed path between two known points, A and B. Some discussion of statistical models of paths has already been given in Section 5.1.3 - a more extensive discussion of the role of statistics in distance studies is provided in the next Chapter, where issues such as trip distributions within defined regions are considered. This Section concentrates on the application of statistics to line length estimation.

In the 1885 edition of the Encyclopædia Britannica (9th Ed.) Crofton’s article on probability³⁴ described several important results of interest to this, and subsequent, sections. Crofton discusses ‘measures’ such as:

$$M(X) = \iint_X dp\, dw$$

where p and w are the polar coordinates of equi-probable random lines in the plane, and X is a region in the plane (Figure 5-14). When the integral above is evaluated over the region X it measures the total density of lines in X. Santaló³⁵ points out that:

“.. up to a constant factor, this measure is the only one which is invariant under the group of motions in the plane.” (i.e. translation, rotation, reflection)

The measure M(X) can be used to determine the length of any curve in the plane which is composed of a finite number of arcs with a tangent at every point. Santaló shows that if a curve, C, has length, L, then:

$$M(X) = \iint_X n\, dp\, dw = 2L$$

where n is the number of intersections each random straight line has with C. This result also holds when C is a curve on a surface other than a plane, and straight lines are replaced with geodesics.

Figure 5-14 Polar coordinates of a random straight line in the plane (axes x, y; line AB with polar coordinates p, w)

If $C_1$ is a closed, convex curve of length, $L_1$, then $n=2$, hence $M(X) = L_1$. Now if C is contained within $C_1$ the mean value of the number of intersections is:

$$\bar{n} = \frac{\iint n\, dp\, dw}{\iint dp\, dw} = \frac{2L}{L_1}$$

hence

$$L = \bar{n} L_1 / 2$$

Crofton produced a similar result to that above, as did Steinhaus. The latter noted that if a curve of length, L, is dropped at random K times on a set of parallel lines d units apart, and n is the number of intersections observed, then

$$L \approx \frac{n d \pi}{2K}$$

a result similar to George de Buffon’s needle method of estimating π, first proposed in 1777 (i.e. using a straight line/needle of known length an estimate of π can be found by re-arranging the above expression). Abeyata and Franklin have used a variant of this approach³⁶ and others to estimate boundary lengths (e.g. of patches of desert scrub and forested tracts) using a series of linear transects. They then compare these estimates with reference and/or alternatively derived data.

Steinhaus also states that if the m-length, $L_m$, of a line is defined using the method above, but limiting the number of intersections per parallel counted to m, then $L_m$ tends to a fixed limit with increasing map quality as K increases and d decreases. The method could be reversed, with a set of parallel lines being randomly dropped onto the curve to be measured, which may prove more amenable to automation.
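The reversed form of the Steinhaus procedure, in which the parallel lines are dropped onto the curve, can be sketched as a small simulation. The assumptions are that the curve is a poly-line, that a random drop is modelled by a uniform rotation in [0, π) and a uniform offset in [0, d), and that crossings are counted for all parallels (no limit m). The L-shaped test line, the spacing and the seed are invented for the example.

```python
import math
import random

def crossings(points, d, angle, shift):
    """Intersections of a poly-line with the family of parallel lines
    obtained by rotating the lines y = k*d by `angle` and shifting them."""
    s, c = math.sin(angle), math.cos(angle)
    n = 0
    prev = None
    for x, y in points:
        band = math.floor((x * s + y * c + shift) / d)   # strip index
        if prev is not None:
            n += abs(band - prev)    # one crossing per line between strips
        prev = band
    return n

def steinhaus_length(points, d, drops, rng):
    """L ~ n * d * pi / (2K), with n the total crossings over K drops."""
    n = sum(crossings(points, d, rng.uniform(0.0, math.pi), rng.uniform(0.0, d))
            for _ in range(drops))
    return n * d * math.pi / (2.0 * drops)

# Hypothetical L-shaped line of true length 3
shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
estimate = steinhaus_length(shape, d=0.25, drops=5000, rng=random.Random(12345))
```

Being a statistical estimate, the result varies with the random drops, but with a few thousand drops it settles close to the true length.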

A related concept, which has been shown to be of considerable practical use, involves the use of projections (Figure 5-15). This has similarities to the tangent method described earlier.

Figure 5-15 The projection method – projected segment into x-domain (axes x, y; segment AB of length L at angle θ; projected length d between $x_1$ and $x_2$)

In this Figure the projected arc length $d(x_1,x_2) = L|\cos\theta|$ (= P(θ) say), hence $L = d/|\cos\theta|$.

But this example covers the case of a single straight-line segment projected onto the x-axis. With a complex curve there will be many projected segments. The mean value of P(θ) taken over all θ is:

$$\overline{P(\theta)} = \frac{1}{2\pi}\int_0^{2\pi} L\,|\cos\theta|\, d\theta = \frac{2L}{\pi}$$

a formula originally due to Cauchy and used by Steinhaus³⁷ as a basis for measuring the length of a curve under a microscope. In Figure 5-16 the projection of the line AB onto the x-axis gives $P(\theta_x) = |x_2 - x_1|$. The projection onto the y-axis involves overlapping line segments, resulting in double or triple counting.

Figure 5-16 Projection method - detail (axes x, y; curve AB; abscissae $x_1$, $x_2$; ordinates $y_1$ to $y_4$)

The y-projection over-estimates the line length:

$$P(\theta_y) = 2|y_2 - y_1| + 3|y_3 - y_2| + \dots$$

but as we have seen earlier, re-arranging terms:

$$L = \overline{P(\theta)}\,\frac{\pi}{2}$$

thus a first estimate of L is to calculate the total length of the projections onto a series of lines at angle θ to the x-axis and divide these by 2. Increasing the number of projections carried out to 2n gives the mean value for P(θ), which yields acceptable results even for small n. Kendall and Moran³⁸ quote a general formula for the accuracy bounds with projection lines at intervals of π/n:

$$\frac{\pi\cos(\pi/2n)}{2n\sin(\pi/2n)} - 1 \;\le\; \frac{P_{2n} - \overline{P}}{\overline{P}} \;\le\; \frac{\pi}{2n\sin(\pi/2n)} - 1$$

For n=6, L is estimated to an accuracy of 3.4%, and for n=10, L is estimated to an accuracy of 1.2%. Improvement in accuracy as n increases is relatively slow, and we would expect systematic or statistical area-based methods to be more effective in many applications. Indeed, the latter have the added advantage of being suitable to complex networks of linear forms, such as fluvial tree structures and transport networks.
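The projection estimate, $L = \overline{P(\theta)}\,\pi/2$, can be sketched for a poly-line using projection lines at intervals of π/n. Overlapping projected segments are counted each time they occur, as in the y-projection example above. The function name and the zig-zag test line are assumptions made for illustration.

```python
import math

def projection_length(points, n):
    """(pi/2) times the mean total projected length over n directions
    spaced pi/n apart; overlaps are counted multiply."""
    total = 0.0
    for k in range(n):
        ux, uy = math.cos(k * math.pi / n), math.sin(k * math.pi / n)
        total += sum(abs((x1 - x0) * ux + (y1 - y0) * uy)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return (math.pi / 2.0) * total / n

# Hypothetical zig-zag line
zigzag = [(0.0, 0.0), (1.0, 2.0), (2.5, 0.5), (3.0, 3.0), (5.0, 2.0)]
true_length = sum(math.dist(p, q) for p, q in zip(zigzag, zigzag[1:]))
ratio_6 = projection_length(zigzag, 6) / true_length
ratio_180 = projection_length(zigzag, 180) / true_length
```

For n = 6 the ratio of estimated to true length stays within the band implied by the bounds quoted above (a total width of about 3.4%), and it approaches 1 as n grows.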

5.2.8 Lattice approximations and distance transforms³⁹

Many spatial datasets and spatial problems are described with reference to regular lattice frameworks rather than continuous space. Examples include raster scan and Digital Elevation Model (DEM) data, pixelated screen images, cost surfaces, cellular automata models, swarm models and many others. This raises the question as to how distances should be measured in such cases and to what extent these relate to continuous space metrics such as $L_1$ and $L_2$. The most common regular lattices used today are square and we limit the discussion below to such arrangements – similar analyses can be carried out on other regular lattices in the plane, such as those of triangular and hexagonal form. All measurements are constrained to lie on the lattice, generally taken as either cell centres or lattice intersections, according to the data type. The methods described can be applied to greater than two dimensions, and three dimensional problems have widespread application in visualisation and medical imaging (e.g. CAT scans, MR scans) and related fields.

For the discussion below we consider centre-to-centre distances, and initially consider the family of local or neighbourhood metrics defined by the 3x3 cell-adjacency matrix. A sample of such metrics is shown in Table 5-2. These are known as chamfer metrics because the locus of the metric generates a figure similar to a piece of wood with chamfered or bevelled edges (e.g. see Figure 5-22D). Chamfer metrics and their associated distance transform (DT) algorithms (see further, below) provide a very simple and extremely fast method for the approximation of Euclidean distances, or a multiple of Euclidean distances, over a square lattice⁴⁰. In this Section we discuss the basic attributes of these transforms, whilst in Sections 7.4.3 and 8.1.2 we extend the DT concept and show how DTs can be used in the fields of optimum location theory and decision support systems.

Distances in the 3x3 cases are calculated in an incremental manner based entirely on the distance to directly adjacent cells. The standard ‘distance transform’ algorithm involves a two-pass scan of a square lattice: a forward scan from top left to bottom right, and then a backwards scan from bottom right to top left (Figure 5-17). The algorithm thus involves in the order of $Mn^2$ computations where n is the maximum dimension of the lattice and M is the number of cells used in the neighbourhood computation, or more formally MN computations where N is the number of cells in the underlying lattice (i.e. linear in the number of cells).

Table 5-2 3x3 Chamfer metrics

| Case | Description | 3x3 mask (rows listed top to bottom) |
|---|---|---|
| A | Distances are determined by the $L_1$ or ‘city-block’ metric and paths correspond to the “rook’s move” in chess parlance | [2 1 2; 1 0 1; 2 1 2] |
| B | As per A but with diagonal distances determined by the Euclidean metric applied locally - sometimes referred to as the Local Euclidean metric | [√2 1 √2; 1 0 1; √2 1 √2] |
| C | Integer Chamfer (3,4)/3 metric. These integer values provide a better estimate of distance than Cases A or B; divide by 3 on completion | [4 3 4; 3 0 3; 4 3 4] |
| D | Fractional Chamfer (Borgefors, 1986) – optimal non-integer values for all directions (values shown are after Butt and Maragos, 1998) | [1.36039 0.96194 1.36039; 0.96194 0 0.96194; 1.36039 0.96194 1.36039] |

Each pass involves adding the values in a mask to cell values in the underlying lattice - see Figure 5-17, where 5 values are used based on the (3,4) chamfer. The value in mask position 0 of the transformed lattice is then set to the minimum of the sums calculated.

The central function of the algorithm (see further, Annex 3 – Sample algorithms) is:

$$d_0 = \min\{d + D(i),\, d_0\}$$

where $d_0$ is the current value at the central point (0) of the mask, $D(i)$ is the local distance to the i-th element of the mask, and $d$ is the current value at the selected row, column position (r,c).

The underlying lattice is normally a binary image, but could be a single source or target point or set of points from which distances are automatically generated. In this case the source point(s) would be initialised to 0 and all other points as a large value, e.g. 9999.

On completion of the two-pass scan each cell in the resulting lattice will contain the distance to the nearest point in the set of source points. In the example above, division of the values by 3 can be made on completion of the scanning process, giving an approximation that will be within 6.1% of the true Euclidean distance.
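The two-pass scan described above can be sketched as follows. This is an illustrative implementation only and is not the algorithm listed in Annex 3: the function name, the list-of-lists lattice and the value used for ‘large’ are assumptions, and no obstacles or varying costs are handled.

```python
import math

def chamfer_dt(sources, rows, cols, a=3, b=4, big=10**9):
    """Two-pass 3x3 chamfer distance transform on a rows x cols lattice.
    `sources` holds (row, col) cells initialised to 0; all others to `big`."""
    dist = [[big] * cols for _ in range(rows)]
    for r, c in sources:
        dist[r][c] = 0
    # Forward mask: cells above and to the left, as (d_row, d_col, D(i))
    forward = [(-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)]
    # Backward mask: the mirror image, cells below and to the right
    backward = [(-dr, -dc, w) for dr, dc, w in forward]

    def scan(mask, row_order, col_order):
        for r in row_order:
            for c in col_order:
                d0 = dist[r][c]
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        d0 = min(dist[rr][cc] + w, d0)   # d0 = min{d + D(i), d0}
                dist[r][c] = d0

    scan(forward, range(rows), range(cols))                           # top left to bottom right
    scan(backward, range(rows - 1, -1, -1), range(cols - 1, -1, -1))  # and back
    return dist

# Single source at the centre of a 21 x 21 lattice, (3,4) chamfer
dt = chamfer_dt([(10, 10)], 21, 21)
worst = max(abs(dt[r][c] / 3.0 - math.hypot(r - 10, c - 10)) / math.hypot(r - 10, c - 10)
            for r in range(21) for c in range(21) if (r, c) != (10, 10))
```

Dividing the integer results by 3 on completion gives values that are exact along rows and columns and that, in this example, stay within the 6.1% of Euclidean distance quoted above.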

Figure 5-17 3x3 Chamfer masks for distance transformation (full mask [4 3 4; 3 0 3; 4 3 4]; forward scan mask [4 3 4; 3 0 ·]; backward scan mask [· 0 3; 4 3 4])

There are a number of highly efficient (very fast) sequential and parallel algorithms for performing this process and a great deal of research into these and the quality of approximation has been conducted for both binary and grey-scale images. Distance Transforms (DTs) are used in a wide variety of image recognition and processing applications (for which they were designed) – example applications include image matching, skeletonisation and 3-D rendering. As we show below, DTs may also be used for the fast computation of distances and multi-level buffer zones from single or multiple objects (points, lines, areas) rather than just single points (see examples in Figure 5-18), for the computation of watersheds and slope lines, for the determination of optimal paths and for the computation of Dirichlet regions (or Voronoi polygons) from lattice/raster data.

Figure 5-18 Raster buffer zones from distance transform using (3,4) chamfer

The examples in Figure 5-18 show distance bands from object sets shown in white, comprised of (a) a single central point, and (b) a cross shape, over a 100x100 lattice, generated with a single forward and backwards pass of the masks. Colours indicate the distance from the nearest point of the object set. There are many applications for such computations, for example in the computation and mapping of noise and environmental pollution impacts associated with major roads and aircraft flight paths.

With multiple objective points the distance transform generates the equivalent of Voronoi polygons, or planar digitised Voronoi diagrams⁴¹. These may be mapped in two dimensions as above, or in three dimensions, where distances are treated as elevations, shortest paths are lines orthogonal to the contours (Figure 5-19), and region boundaries are watersheds. An example application with multiple objective lines is the interpolation of contour data⁴², although results in this case are improved if additional key landscape points are included (notably peaks and pits).

Figure 5-19 3-point distance transform, 2-D and 3-D views

Distance transforms can also be applied with almost no alteration (see further, Annex 3, for details) to cases where obstacles are included. For example, if we introduce a rectangular region as an obstacle in the single point example above, the resulting paths in the region are distorted (Figure 5-20A) – in this example the transform has been generated using modified Borgefors fractional values on a 5x5 mask (see further, below), with iterated passes of each mask to ensure convergence (two or more iterations may be necessary in cases where obstacles or varying costs exist). As before, the shortest paths have been generated by following lines that are orthogonal to the contours shown.

Shortest paths can also be determined by retaining a record of the closest cell in the mask neighbourhood as part of the DT operation. This record can be held in absolute or relative terms, i.e. of the form: “next path point is in position 15” (15), or “next point is in row 23 column 46” (R23C46), or “next point is 2 rows down and 3 across”. Using two arrays (or a multi-dimensional array) for x-components and y-components, this information can be stored as pairs of single relative values, e.g. x(i,j) = -2, y(i,j)=1.

This formulation is convenient for use with the algorithm provided in Annex 3 since the components are simply the values already held in the DX(k) and DY(k) arrays. A vector diagram generated from a pair of arrays corresponding to the previous diagram is shown in Figure 5-20B (note the path distortion generated from the lattice representation of the underlying dataset).

Figure 5-20 Shortest paths by distance transform with constraints (A. Paths constructed orthogonal to contour lines; B. Vector map of path directions obtained from tracking during the transform operation)

It should be noted that the shortest path vectors shown in Figure 5-20B above differ from the steepest descent vectors computed directly from the DT surface shown in Figure 5-20A. The latter do not, in general, indicate the correct shortest paths, as illustrated by comparing the vectors in Figure 5-21 with those in Figure 5-20B and the orthogonal path construction method.

Figure 5-21 Steepest descent (slope vectors) of distance transformed surface

A simple iterative procedure can be devised which utilises the local vector arrays to produce a complete set of shortest paths from each cell to its closest object point. This process can be used to produce digital Voronoi diagrams in inhomogeneous space since it assigns each point to its closest target point. It may also be used in homogeneous space to compute an exact Euclidean distance transform by calculating the Cartesian distance between the solution path start and end points – thus a chamfer transform can be used to create an exact transform in such cases. This result, whilst self-evident, does not appear to have been reported previously. It can be seen to be true since assignment of the closest target point under an accurate chamfer metric will always or almost always be the same as the assignment under an exact Euclidean metric, and thus computations based on this assignment will exactly match an exact Euclidean transform. Sample tests by the author have found no difference between the results obtained in this manner and those generated using an ‘exact’ algorithm.
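The idea of deriving Euclidean distances from a chamfer assignment can be sketched by carrying the identity of the originating source point through the two-pass scan, rather than tracing each path afterwards. This shortcut, the function names and the use of the (3,4) chamfer are assumptions made for illustration; the sketch covers homogeneous space only and is not the procedure of Annex 3.

```python
import math

def chamfer_with_origin(sources, rows, cols, a=3, b=4):
    """Chamfer transform that also records, for every cell, the source
    point from which its winning path started."""
    big = 10**9
    dist = [[big] * cols for _ in range(rows)]
    origin = [[None] * cols for _ in range(rows)]
    for r, c in sources:
        dist[r][c], origin[r][c] = 0, (r, c)
    forward = [(-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)]
    backward = [(-dr, -dc, w) for dr, dc, w in forward]
    passes = [(forward, range(rows), range(cols)),
              (backward, range(rows - 1, -1, -1), range(cols - 1, -1, -1))]
    for mask, row_order, col_order in passes:
        for r in row_order:
            for c in col_order:
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and dist[rr][cc] + w < dist[r][c]):
                        dist[r][c] = dist[rr][cc] + w
                        origin[r][c] = origin[rr][cc]   # inherit the source
    return dist, origin

def euclidean_from_origin(origin):
    """Cartesian distance from each cell to its assigned source point."""
    return [[math.hypot(r - o[0], c - o[1]) for c, o in enumerate(row)]
            for r, row in enumerate(origin)]

sources = [(2, 3), (8, 15), (14, 6)]          # hypothetical target points
_, origin = chamfer_with_origin(sources, 18, 20)
edt = euclidean_from_origin(origin)
```

Each value is the true Euclidean distance to the assigned source, so it can never be smaller than the exact transform, and it equals the exact transform wherever the chamfer assignment picks the genuinely closest point. With a single source the result is exact everywhere.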

For inhomogeneous images, such as that above, distance transforms are frequently based on an algorithm known as the uniform-cost model⁴³, itself a form of A* algorithm (i.e. an heuristic search procedure rather than an image scanning procedure). For variable cost surfaces and combinations of surfaces, scanning distance transform techniques can still be used (with minor modifications), as we demonstrate in Section 8.1.3.


By plotting the locus diagrams for a range of 3x3 chamfer metrics against the optimal Euclidean locus (a circle) the relative merits of different local values or weights can be seen (Figure 5-22). Examining the diagrams it is clear that the last of the four approximations achieves the closest possible match to the circle, with a mix of positive and negative errors at intervals of π/4 (positive) and π/8 (negative). The first two approximations underestimate most distances. The octagonal shape of all but the first example is the result of the 8-cell local neighbourhoods utilised. If a 5x5 local neighbourhood is used (see Figure 5-20 and Figure 5-23) the approximation is 16-sided (a “hexadecagon”). The symmetry displayed and the closeness of the best approximations to the Euclidean metric mean that optimal chamfer metrics are nearly, but not completely, rotationally invariant.

Figure 5-22 Chamfer metric locus diagrams

A. Chamfer (1,2) or Manhattan metric

B. Chamfer (1,√2) or Local Euclidean metric

C. Chamfer (3,4)/3 metric

D. Chamfer (0.96194,1.36039) optimal metric


Integer values are frequently used in distance transforms, but Borgefors recognised that the approximation to Euclidean distance could be improved upon. She used Cartesian coordinate pairs to produce her non-integer results (Table 5-3). The Cartesian model generates a result that is not fully optimal in the propagation of distances around a point in a lattice (although these do provide the basis for the approximation to Euclidean distance we presented in the previous Chapter). In a detailed analysis using polar coordinates, Butt and Maragos⁴⁴ have shown that the values derived by Borgefors can be marginally improved upon. Their results for the 3x3 case are shown in Table 5-3: the (3,4)/3 metric is the best low valued integer solution and yields correct values for horizontal and vertical paths; other choices involving larger integer pairs, such as (24,34)/25, can be used to approximate non-integer variants quite accurately.

Table 5-3 Maximum absolute error for 3x3 chamfer metrics

| Local distances (a,b) | Maximum absolute error | Description |
|---|---|---|
| (1,1) | 41.41% | Chess board “rook’s/bishop’s move” |
| (1,2) | 29.29% | City-block, L₁ |
| (1,√2) | 7.61% | Euclidean local distance |
| (3,4)/3 | 6.07% | |
| (1,1.3507) | 5.63% | Borgefors, with a=1 |
| (1,1.3420) | 5.38% | Butt-Maragos with a=1 |
| (0.95509,1.36930) | 4.69% | |
| (0.96194,1.36039) | 3.96% | |

The lattice neighbourhood can be increased to a 5x5 matrix, as noted above, in which case there are three distance weights to be assigned to the various cells rather than two, and the optimum fractional values in this case are (0.9866, √2, 2.2062). These values provide estimates that are within 1.36% of the direct Euclidean distance but at the cost of slightly increased computation. The optimum integer values are (5, 7, 11)/5, and are remarkably accurate – within 2% of the Euclidean distance. The integer neighbourhood (mask) for the 5x5 model is shown in Figure 5-23 – values not entered are predetermined (e.g. as 5+5 = 10 or 7+7 = 14). The mask is divided into two for forward and backwards scans, as per the 3x3 mask described above. From Figure 5-23 the distance transform of a point using the 5x5 mask can be seen to be a very close approximation to a circle over a square lattice.
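The accuracy of the integer (5, 7, 11)/5 weights can be checked with a short Python sketch. It assumes an unobstructed square lattice, on which the chamfer distance for an offset can be written down directly as a combination of the two nearest mask directions (axial and knight's move, or knight's move and diagonal). The function name and its parameters are illustrative rather than part of any published implementation.

```python
import math

def chamfer_5x5(dx, dy, a=5, b=7, c=11, scale=5):
    """Chamfer estimate of the distance for a lattice offset (dx, dy) using
    axial (a), diagonal (b) and knight's-move (c) weights, divided by scale."""
    long_side = max(abs(dx), abs(dy))
    short_side = min(abs(dx), abs(dy))
    if 2 * short_side <= long_side:
        # Knight's moves use up the short side; the remainder is axial.
        total = c * short_side + a * (long_side - 2 * short_side)
    else:
        # A mixture of knight's moves and diagonal moves.
        total = c * (long_side - short_side) + b * (2 * short_side - long_side)
    return total / scale

# Largest relative departure from the Euclidean distance over a block of offsets.
worst = max(
    abs(chamfer_5x5(x, y) / math.hypot(x, y) - 1.0)
    for x in range(1, 60)
    for y in range(0, x + 1)
)
```

Here `worst` is measured relative to the Euclidean distance; other normalisations of the error are possible and give slightly different figures.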


Figure 5-23 5x5 Distance transform and integer chamfer mask

| | | | | |
|---|---|---|---|---|
| | 11 | | 11 | |
| 11 | 7 | 5 | 7 | 11 |
| | 5 | 0 | 5 | |
| 11 | 7 | 5 | 7 | 11 |
| | 11 | | 11 | |

With neighbourhoods of 7x7 or greater the maximum error falls below 1%. Results can be obtained for triangular and hexagonal lattices⁴⁵, with the latter providing improved results, but again at the cost of increased complexity in both representation and processing.

There is an additional useful result obtained from this table. It relies on the fact that between any two points on a square lattice it is always possible to construct a path consisting of two components, one diagonal path and one horizontal or vertical path. If two points in a square lattice are selected at random, and these are M steps (horizontal/vertical) and N steps (diagonal) apart, then the best estimated distance between them is:

$$d = 0.96194M + 1.36039N$$

The maximum absolute error in this calculation is 3.96% of $\max\{|x_1 - x_2|, |y_1 - y_2|\}$.
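As an illustration, the estimate can be written as a few lines of Python. The diagonal weight is taken from the last row of Table 5-3 (1.36039, which equals 0.96194 × √2 to five decimal places). The function name and the way the offset is decomposed into M and N are our own illustrative choices.

```python
import math

A, B = 0.96194, 1.36039  # 3x3 weights from the last row of Table 5-3

def lattice_distance(p, q, a=A, b=B):
    """Estimate the Euclidean distance between lattice points p and q from
    N diagonal steps and M horizontal/vertical steps."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    n = min(dx, dy)        # diagonal steps
    m = max(dx, dy) - n    # remaining horizontal or vertical steps
    return a * m + b * n

# Compare the estimate with the direct Euclidean distance for one pair of points.
estimate = lattice_distance((2, 3), (10, 6))
exact = math.hypot(10 - 2, 6 - 3)
```

Only the coordinate differences are needed, so the calculation avoids squares and square roots altogether.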

The lack of exact correspondence to Euclidean distances must be recognised and regarded as a form of systematic error or uncertainty, which may be exaggerated with scale changes and/or systematic growth/shrinkage of objects using distance transform methods (e.g. topological inconsistencies may result) - these considerations are in addition to the representational issues associated with the original lattice/raster dataset.

However, algorithms are now available that provide exact Euclidean distances in near linear time, which may prove more suitable for some problems⁴⁶.


## 5.3 Elevation and path

Many of the methods described above ignore elevation and associated cost or effort factors. In practice the direct line distance on a map/within a GIS facility is frequently a good first estimate of the surface distance (i.e. allowing for elevation)⁴⁷ and will often closely approximate the shortest distance across the surface. For example, the map sections below (Figure 5-24) show a transect southwards from East Creech, near Creech Barrow in Dorset, extending for 1km from the road at East Creech to Barneston Manor (now known as Barnston Farm). This value (1km) provides the 2-dimensional (projected plane/map) distance rather than the surface distance.

For a walker, the surface route rises from 260′ at East Creech to 400′ as the transect crosses a complex ridge and narrow valley structure over the chalk, and falls away to 200′ on the far side of the ridge. Despite the fact that the rises and falls are steep, and thus difficult to traverse, the total 3-D surface distance is only slightly further than the projected plane distance (still being under 1.2km). The surface distance is equal to the sum of the incremental map distances, each divided by the cosine of the local slope angle; thus it may be computed directly from a map or database representation by measuring or computing slope values along the transect or path. Only in extremely rough or mountainous terrain are surface distances much greater than simple 2-D estimates. For example, if the top of a 500m hill were located in the middle of a 1km stretch of path, giving a 45° slope (1:1) to the top and down again, the surface distance would still only be 1.4km. Current design standards for UK roads aim to keep all gradients below 5-6% (c.1:20) - above these levels the operating costs for larger vehicles (HGVs) increase quite rapidly.
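The slope-based calculation can be sketched in Python as follows. The profile is assumed to be supplied as matched lists of map distance along the transect and elevation, in the same units and with strictly increasing map distance; the function name and the data layout are illustrative assumptions.

```python
import math

def surface_distance(chainage, elevation):
    """Sum of incremental map distances, each divided by the cosine of the
    slope angle of that increment."""
    total = 0.0
    for i in range(1, len(chainage)):
        run = chainage[i] - chainage[i - 1]      # incremental map distance
        rise = elevation[i] - elevation[i - 1]   # change in height
        slope = math.atan2(rise, run)            # slope angle of the increment
        total += run / math.cos(slope)
    return total

# The hypothetical 500m hill in the middle of a 1km stretch of path.
hill = surface_distance([0.0, 500.0, 1000.0], [0.0, 500.0, 0.0])
```

For this profile `hill` is 2 × 500/cos 45°, or roughly 1414m, which matches the 1.4km quoted above.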

It has been suggested that elevation is a significant factor in the observed differences between distances measured by driving along roads (using accurate odometers) and calculated distances obtained from GIS packages, but the principal causes of these differences are horizontal errors introduced by poly-line approximation and the scales at which GIS datasets have been collected (as noted above) – path length calculation using piecewise polynomial, circular arc, spline or trigonometric functions has been shown to remove much of this error⁴⁸.


The effort of traversing a slope of 1:2 or even 1:5 is quite different, however, and a least cost or least effort route (i.e. avoiding the steep slopes of the ridge and valley structure) might take the route in Figure 5-24B now provided for cars, which has a road distance of c. 2.5km. Least cost/effort paths are thus rather like shortest paths on extremely mountainous terrain: they strenuously seek to avoid the high cost/effort zones even if this means that the path length itself is much greater than one might expect or takes an unexpected route⁴⁹. We return to this question and example in more detail in Chapters 7 and 8.

Comparing the modern map with that from a century earlier, we see that two earlier routes existed across the ridge – one was a path or track, probably suitable for horses and light carts, but not for coaches or heavy carts, and the other was a purpose-built tramway for transporting materials from a small quarry. The current longer road route sweeps away to the left of the historic map, but still provides the basic path of today’s route.

Figure 5-24 Transect south from East Creech in Dorset, 1896 and 2000

A. OS Map, 1896

B. OS Map, 2001 - redrawn


Figure 5-25 Old and new routes near East Creech, Dorset


The blue circle on the first map (Figure 5-24) marks the location of the original route, and the photograph taken looking South-West (Figure 5-25), shows the area today, with the historic cart route still clearly visible in the cattle-field.

## 5.4 Fractals

The practical issues of estimating line length are at the heart of many problems in geographic research (and other sciences) and these have highlighted weaknesses in the underlying mathematical foundations. In order to determine the length of a curve one must first have a curve to measure. In spite of the concept of a curve being apparently intuitively obvious, adequate definitions are lacking. For example, let a plane curve, C, be defined as the set of all points (x,y) represented by the continuous functions:

$$x = f(t), \quad y = g(t) \quad \text{for all } t \in [t_0, t_1]$$

such that there exists only one value of t for each pair (x,y). Then C is called a simple Jordan curve or, when crossing points are permitted, a Jordan curve.

This apparently simple and very general definition of a curve was shown by Peano⁵⁰ to permit certain curves that completely fill the plane (hence are indistinguishable from figures of dimension 2). Netto⁵¹, however, showed that Peano’s result would not hold for simple Jordan curves. Further work on this question was carried out by Sierpinski⁵², who produced a recursive construction demonstrating that a (simple) Jordan curve could fill a square (Figure 5-26):

Figure 5-26 Sierpinski curve construction

Mandelbrot⁵³ observed that the trajectory of a particle exhibiting Brownian motion will also fill the plane, and combined this observation with those of the previous paragraphs and his work on financial modelling into his theory of fractals. He (currently) defines fractal geometry as “the study of scale-invariant roughness”.

Mandelbrot initially defined fractals as sets whose Hausdorff-Besicovitch (H-B) dimension⁵⁴, $D_{HB}$, strictly exceeds the topological dimension, $D_T$. Peano curves and Brownian trajectories are both classified as fractals by this definition, as are Koch⁵⁵ curves (Figure 5-27), which, like Sierpinski curves, are defined by an iterative or recursive construction process. Conventional (rectifiable) curves have topological dimension of 1. Their H-B dimension will also be 1, since they are clearly not fractals.

More recently this definition has been shown to exclude a number of special cases of ‘fractals’ which can be shown to have $D_{HB} = D_T = 1$, and improved definitions based on notions of self-similarity and self-affine sets have been proposed, without a final conclusion being reached⁵⁶. There is even beginning to be a question mark as to the value of the H-B dimension in this context and the possibility of a more general definition of fractals being unachievable.

Determination of fractional values of D for certain well-defined curves can be achieved by analytical methods. For natural fractals, i.e. ‘curves’ such as coastlines, rivers, contours, an alternative method is required. This involves selecting a measurement method, such as any of those described above, and plotting the length L(η) against the measurement factor η on log-log paper. If the graph produces a straight line its equation will be:

$$\log L(\eta) = a + b \log \eta$$

where a and b are constants.

For conventional curves b = (1 − $D_T$) = 0, hence curve length is independent of sampling interval. However, for fractal curves b = (1 − D) < 0, so the measured length grows as the sampling interval is reduced. For example, Richardson (op. cit.), Håkanson⁵⁷ and many others since found values of D varying from >1.0 to c.1.3 for coastlines and lakeshores, whilst Batty and Longley⁵⁸ found similar values for city boundaries (e.g. Cardiff). In reality such measurements have almost always been made from models or representations of coastlines, lakeshores and borders (i.e. using maps or aerial photographs) and not from terrestrial surveys such as those of Mackenzie or Mason and Dixon. What is clear is that real-world boundaries have complex structures and are generally unlike classical curves, being nowhere smooth or differentiable, nowhere sharply discernible, and frequently dynamic (e.g. with tides, traffic flows, plant growth, urban development) – indeed, closely examined they are not strictly linear at all, but more like a narrow band or area.

It is a reasonable question to ask whether fractal-like or ‘self-similar’ behaviour is retained as the scale reduces to a finer and finer degree. It is immediately apparent that measurement becomes impossible in the ‘real-world’ as the scale is reduced – coastlines and lakeshores are dynamic, precisely what is to be measured becomes increasingly unclear and ultimately known to be not measurable.

The mathematical model of fractal lines and surfaces, which assumes an arbitrary fineness of measurement (behaviour of systems as lim ε → 0) is similar to the mathematics of real analysis in this sense, i.e. a model with a set of assumptions, one of which is that limits can always be taken. To this degree, fractal analysis provides models and tools rather than definitive explanations. In addition, it should be noted that for curves such as Brownian motion in the plane and other forms of random walk, such as random lattice walks and the trails left by insect, animal or human walks, self-crossing of paths is normal. Measurement of the length of such paths by the area methods described earlier is not appropriate and incremental measurement must be used (i.e. calculating cumulative path length from the sum of the steps taken).

A major value of fractal analysis is in enabling us to distinguish between classes of curves: D = 1 are in almost all cases simple (conventional mathematical) curves; 1 < D < 2 are distinctly fractal curves; D = 2 are plane-filling curves, or in some senses could be regarded as area measures. A secondary benefit is its descriptive value over varying scales. If D is constant over a range of η values it indicates self-similarity in the curve form; that is, a similarity in the form at various scales. For example, the latter is true for Koch curves (Figure 5-27) where D = 1.2619 (log4/log3) for all η. In a later, detailed study of boundaries, Longley and Batty⁵⁹ found that D appeared to vary over a range of sample scales and raised the question as to whether an alternative log-linear model and/or a multi-fractal interpretation of the data might not be more appropriate. Their study also noted that different measurement methods had substantially different processing overheads and at least one piecewise polygonal approximation method (which involved elimination of selected points) was unsatisfactory – grid-based approximation was found to be simple, computationally fast and not subject to such problems.
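The log-log estimation procedure can be illustrated on the Koch curve, for which the answer is known. The Python sketch below is an idealised stand-in for a divider walk: it measures each stage of the Koch construction with a step equal to that stage's own segment length, fits a straight line to log L(η) against log η by least squares, and assumes L(η) ∝ η^(1−D) so that D = 1 − b. All function names are illustrative.

```python
import math

def koch_polyline(depth):
    """Vertices (as complex numbers) of the Koch curve built on the unit
    interval after `depth` refinements."""
    pts = [0j, 1 + 0j]
    rot = complex(math.cos(math.pi / 3), math.sin(math.pi / 3))  # rotate by 60 degrees
    for _ in range(depth):
        new = [pts[0]]
        for p, q in zip(pts, pts[1:]):
            third = (q - p) / 3
            new += [p + third, p + third + third * rot, p + 2 * third, q]
        pts = new
    return pts

def polyline_length(pts):
    return sum(abs(q - p) for p, q in zip(pts, pts[1:]))

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

etas, lengths = [], []
for depth in range(1, 7):
    pts = koch_polyline(depth)
    etas.append(abs(pts[1] - pts[0]))      # sampling interval at this stage
    lengths.append(polyline_length(pts))   # length measured at that interval

b = fit_slope([math.log(e) for e in etas], [math.log(L) for L in lengths])
D = 1 - b   # assumes L(eta) is proportional to eta**(1 - D)
```

Because the Koch curve is exactly self-similar, the fitted points lie on a straight line and `D` agrees with log4/log3. For real boundaries the points generally scatter, which is the situation Longley and Batty describe.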

The dimension, D, does not describe or imply form or process – the same value for D may apply to completely different fractals and different generation models (e.g. Brownian motion, Diffusion-Limited Aggregation (DLA), Recursive Generation) may result in the same fractional dimension. For example, it has recently been proven that the dimension of the ‘hull’ or boundary of a Brownian process in the plane has H-B dimension of 4/3, but clearly this does not imply that some coastlines or city boundaries are actually generated in this manner⁶⁰.

Figure 5-27 Koch curve construction

The Koch curve or ‘snowflake’ is constructed from an equilateral triangle, where progressively smaller (1/3rd size) triangles are added to each of its three initial sides in place of a segment of the side. The length of the curve is unbounded, it is nowhere differentiable, but the area enclosed by the curve is finite, with the formula based on the initial side length, s, being:

$$\lim_{ds \to 0} \text{Area} = \frac{2s^2}{5}\sqrt{3}$$
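The contrast between an unbounded perimeter and a finite area can be checked numerically. The sketch below assumes the usual snowflake construction, in which each refinement adds one triangle of one-third the current side length to every existing side; the function name is illustrative.

```python
import math

def snowflake(s, n):
    """Perimeter and enclosed area of the Koch snowflake with initial side
    length s after n refinements."""
    area = math.sqrt(3) / 4 * s * s            # initial equilateral triangle
    sides, side_len = 3, s
    for _ in range(n):
        side_len /= 3
        # One new equilateral triangle is added to every existing side.
        area += sides * math.sqrt(3) / 4 * side_len ** 2
        sides *= 4
    return sides * side_len, area

perimeter_10, area_10 = snowflake(1.0, 10)
perimeter_40, area_40 = snowflake(1.0, 40)
limit_area = 2 * math.sqrt(3) / 5              # limiting area for s = 1
```

The perimeter grows by a factor of 4/3 at every refinement, whereas the area settles rapidly towards the limiting value.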

## 5.5 Self-avoiding and self-attracting random walks

Measurement of path length depends upon an agreed set of rules for carrying out the measurement, selection of sampling interval(s) and determination of the embedding space (e.g. 2-D versus 3-D). It also assumes that we have a clear understanding of where the path is and in general terms, what it is like. In the latter case, classical or fractal models are the main alternatives.

A set of models of particular interest to geographic research is known as self-avoiding (random) walks (SAWs). These are random or pseudo-random walks, paths or trees that do not cross themselves, unlike some of those discussed above. There has been extensive analysis of SAWs, especially in the physical and biological sciences such as physics, polymer chemistry and biochemistry, but also in areas such as statistics, economics, financial modelling, engineering and robotics. SAWs have received limited attention in geographic studies. Most analyses focus on lattice-based SAWs using square or triangular lattices, but some studies of unconstrained (free space) SAWs have been carried out.


Random walks typically consist of a given start point and a set of rules which determine their behaviour (e.g. steps, directions, self-avoidance, self-attraction, bifurcation/spawning of children). In the plane it has been demonstrated that both unconstrained random walks (e.g. Brownian motion, which is not self-avoiding) and self-avoiding lattice walks will pass through every point (in the plane, on an arbitrarily fine lattice) as the number of steps, n, tends to infinity. A number of other interesting results have been produced, including analysis of the number of possible self-avoiding paths on a lattice. With a 6 x 6 square lattice the number of possible self-avoiding paths from (0,0) to (6,6) is over 1 million. At first sight this suggests that search methods for optimal paths on lattices may be very difficult unless additional constraints and rules are applied - however, efficient algorithms exist for such problems with solution times related to the square of the lattice size or better (single point to all other nodes). Note also that the specification of a random walk does not include a goal or destination as an explicit input. In general it is not possible to determine how many steps will be required to reach a target point or whether the route taken is optimal. It is often possible, however, to provide statistical estimates (for example of the expected path length or direction after a number of steps) and simulations can provide estimates of means and variances where analytical results are not forthcoming.
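The growth in the number of self-avoiding paths can be seen by brute-force enumeration on very small lattices. The following Python sketch counts corner-to-corner self-avoiding paths by depth-first search. It is intended only as an illustration for lattices of a few nodes per side, since the running time grows extremely quickly, and the function name is our own.

```python
def count_self_avoiding_paths(n):
    """Count self-avoiding paths between opposite corners of a square
    lattice with n x n nodes, moving between adjacent nodes only."""
    target = (n - 1, n - 1)
    visited = {(0, 0)}

    def extend(x, y):
        if (x, y) == target:
            return 1
        total = 0
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in visited:
                visited.add((nx, ny))
                total += extend(nx, ny)
                visited.remove((nx, ny))   # backtrack
        return total

    return extend(0, 0)

counts = [count_self_avoiding_paths(n) for n in range(2, 6)]
```

The counts for 2 x 2 to 5 x 5 nodes are 2, 12, 184 and 8512. The published count for a 6 x 6 array of nodes is 1,262,816, which appears to be the figure of "over 1 million" referred to above if the lattice is counted in nodes.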

For geographic analysis several types of SAW are worthy of closer attention. These include directed (or correlated) random walks, directed dislocation walks and random trees, each of which is described briefly below and some are explored further later in this study. For convenience we shall use Cartesian coordinates (x,y) in the plane, but polar (r,θ) or spherical coordinates could equally well be used, depending upon the requirements:

Directed random walks – this process involves incrementing x by a fixed or positive random value (optionally constrained to lie in a pre-defined range, e.g. x ∈ [0,1]); y is incremented by a random amount in the same manner but positive and negative values are accepted, e.g. y ∈ [-1,1]. The average position of y after a large number of steps will tend to 0. If both x and y steps are fixed the result is essentially a square lattice walk. Note that this process, and the walk described below, generate single y-values for each x-value, and as such are a very restricted subset of possible random walks.

Directed dislocation walks – this process is the same as the above but constrained to a single direction (quadrant) for y, e.g. y ∈ [0,1]. The dislocation is then similar to a random walk down a tilted surface, a sideways impact on a geological structure or a distortion of a populated landscape by a pull towards a dominant conurbation. A more general random walk model of both these types is:

Correlated random walks (CRWs) – this set of models assumes that steps may be of fixed or variable length, but that the direction of travel is determined by the previous direction of travel plus or minus a random angular variable drawn from a range (e.g. ±60°) and distribution specified in advance (typically a Uniform or Normal distribution). CRW models, which have been used widely in the study of insect and animal dispersion, and on a limited basis in the study of serial crime behaviour, are not necessarily self-avoiding (but constraints can be applied to ensure they satisfy this additional requirement).

Random tree walks – the directed walk model can be modified such that both x and y increments are repeated until either an obstacle, a solution space boundary or another random line or branch is reached or approached to within a specified distance. At this point the walk must stop, go back one or more steps and commence again with new x and y values. This kind of walk generates self-avoiding random trees (SARTs). A variety of additional rules and constraints may apply in such cases, such as defining what ‘reached’ means (setting proximity measures) and how new branches or children are to be generated. Use of SARTs is discussed in greater detail in Sections 7.5 and 8.2 where variants of the RRT algorithm (Rapidly-exploring Random Trees) are discussed⁶¹.
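The first three walk types listed above are straightforward to simulate. The Python sketch below is one possible reading of the descriptions, assuming uniformly distributed increments, unit step length for the correlated walk and a ±60° turning range. The function names, parameter names and default values are illustrative, and no self-avoidance constraint is applied.

```python
import math
import random

def directed_walk(steps, rng, y_range=(-1.0, 1.0)):
    """x grows by a random amount in [0, 1]; y changes by a random amount
    drawn from y_range at each step."""
    x = y = 0.0
    path = [(x, y)]
    for _ in range(steps):
        x += rng.uniform(0.0, 1.0)
        y += rng.uniform(*y_range)
        path.append((x, y))
    return path

def dislocation_walk(steps, rng):
    """As the directed walk, but with y increments restricted to [0, 1]."""
    return directed_walk(steps, rng, y_range=(0.0, 1.0))

def correlated_walk(steps, rng, max_turn_deg=60.0, step_len=1.0):
    """Each heading is the previous heading plus a uniform random turn."""
    x = y = 0.0
    heading = rng.uniform(0.0, 2 * math.pi)
    path = [(x, y)]
    for _ in range(steps):
        heading += math.radians(rng.uniform(-max_turn_deg, max_turn_deg))
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
        path.append((x, y))
    return path

rng = random.Random(2024)   # seeded so that a run can be repeated
walk_a = directed_walk(500, rng)
walk_b = dislocation_walk(500, rng)
walk_c = correlated_walk(500, rng)
```

A Normal rather than Uniform turning distribution could be substituted in `correlated_walk` by replacing the call to `rng.uniform` with `rng.gauss`.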

Examples of simple random walks are shown in Figure 5-28. The first has been generated in the manner described by Venn (of Venn diagram fame) in the 19th century, selecting directions from an 8-sided die and random distances in the selected direction. This clearly is not self-avoiding. The second example shows random positive increments in x and random +/- values for y, yielding a curve of market price-like movements (strictly speaking, a one dimensional random walk).

Figure 5-28 Simple random walks

A. Simple random walk – after Venn

B. Directed simple random walk

As has been noted earlier, in 2-space (or n-space) random walks have a start-point but no definitive end-point. However, it is possible to create random trees simultaneously from more than one point (in sequence) and apply proximity rules to avoid obstacles and obtain end-to-end connectivity.

Random trees can be constructed in many ways, but typically will consist of a branching process and self-avoiding walks. Such trees are the subject of analysis later in this study, and provide the basis for quite general and fast solutions to shortest path/least cost problems. A section of a random tree is illustrated in Figure 5-29. In this example a random tree is being used to explore the space of feasible solutions for a path. The brighter red line indicates a boundary constraint for the tree.

Figure 5-29 Random tree walk

There has been a great deal of research into self-avoiding random walks (SAWs) in 2- and 3-space, much of it in chemistry, physics and mathematics rather than geographical studies. 2-D SAWs are often generated by a so-called ‘pivot’ algorithm⁶², which has similarities to some fractal generation models⁶³, but unlike the processes described earlier the pivot algorithm generates a true self-avoiding random walk in the plane or higher dimensions.

Pivot algorithms commence with a straight line of length N steps defined by N nodes (optionally being points on a square lattice). A series of n transformations are made to the line (rotations and reflections) by choosing a node at random and a transformation at random. The line is checked after each transformation to ensure it is not self-crossing.
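A minimal sketch of a pivot procedure on the square lattice is given below in Python. It is not the implementation used for the walks shown in this section: it assumes that the part of the walk beyond the chosen node is transformed, that the seven non-identity symmetries of the square lattice are the candidate transformations, and that a move producing a self-intersection is simply rejected. All names are illustrative.

```python
import random

# Non-identity symmetries of the square lattice, stored as (a, b, c, d)
# meaning (x, y) -> (a*x + b*y, c*x + d*y).
SYMMETRIES = [
    (0, -1, 1, 0), (-1, 0, 0, -1), (0, 1, -1, 0),                 # rotations
    (1, 0, 0, -1), (-1, 0, 0, 1), (0, 1, 1, 0), (0, -1, -1, 0),   # reflections
]

def pivot_saw(n_steps, n_moves, rng):
    """Generate a self-avoiding lattice walk of n_steps steps (n_steps >= 2)
    by attempting n_moves pivot transformations of a straight line."""
    walk = [(i, 0) for i in range(n_steps + 1)]      # straight starting line
    for _ in range(n_moves):
        k = rng.randrange(1, n_steps)                # interior pivot node
        a, b, c, d = rng.choice(SYMMETRIES)
        px, py = walk[k]
        tail = [(px + a * (x - px) + b * (y - py),
                 py + c * (x - px) + d * (y - py)) for x, y in walk[k + 1:]]
        head = walk[:k + 1]
        # The tail is moved rigidly, so only head/tail overlaps need checking.
        if set(head).isdisjoint(tail):
            walk = head + tail
    return walk

saw = pivot_saw(200, 2000, random.Random(5))
```

Rejected moves leave the walk unchanged, so the walk is self-avoiding at every stage and not merely at the end.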

With large n the ‘memory’ of the original configuration is lost and a truly random line is generated. Since it is possible to generate 2 or more such lines and then join their end points, checking that the result does not cross, a random polygon (region, island) of 2N steps can be generated with a familiar geographic look. To avoid the need for re-scaling and sharp corners, 3 or more SAWs or SAW segments would normally be required to create a suitable closed figure. An example is shown in Figure 5-30, which we have created from segments of 3 SAWs, each of 1 million steps, generated by Kennedy⁶⁴:

Figure 5-30 SAW generated island.

Self-attracting random walks also have interesting applications in a range of practical geographic applications. For example, because random walks will eventually pass through all possible points, a subset of paths commencing at any given point (an origin) will reach a second point (destination) in fewer steps/a shorter time than other paths. If the shortest such path is marked or recorded each time walks are simulated, subsequent random walks can be programmatically biased to use all or parts of this path with a self-reinforcing result. Such concepts have been used in a variety of traffic behaviour modelling and, more recently, in modelling crowd behaviour in built-up environments during special events (galleries, London streets)⁶⁵. Similar methods could be used to simulate past (unknown, historic) flows as well as current or future (predicted) flows and congestion. As with all such methods, model assumptions and calibration become the key issues.

## 5.6 Networks and path length

The majority of the analyses in the present study are concerned with distance measurement and path location in inhomogeneous (and often bounded) free-space. Such problems may be static or dynamic (e.g. involving moving objects such as vehicles or robots, or involving varying flows over time) and may involve finding a valid solution, a good solution or an optimum solution (if one exists). In seeking a good or optimal solution the process may involve distance minimisation or generalised cost minimisation with or without additional constraints. In many cases such problems can be re-stated using a range of techniques having first partitioned the sample space into some form of grid or lattice. Solutions may then be sought which are restricted to traversing the edges and vertices of this grid or lattice, or which use this framework as a simplification of the space and allow paths to cross the zones created by partitioning.

Where the sample space can be represented in terms of a (directed) graph, the techniques for identifying specific paths through pre-defined networks become available.

There is a substantial (almost limitless) literature dealing with this subject⁶⁶. The classic problem and algorithm, which originates with Dijkstra⁶⁷, is that of finding the shortest path from a source node to one or more destination nodes. Algorithms that seek solutions to network problems of this type are often compared in terms of their performance by reference to the number of vertices or nodes (n) in the graph and the number of links or edges (e). The simple shortest path problem can now be solved in linear-time, i.e. as a linear multiple of the number of vertices and edges, or O(n + e)⁶⁸.
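For reference, a compact Python sketch of Dijkstra's algorithm is given below. It uses a binary heap, so it runs in roughly O((n + e) log n) rather than the linear time of the more recent algorithms cited above. The small road network, its node labels and its speeds are invented purely for illustration; they show how the same routine yields either the shortest or the fastest route depending on the edge weights supplied.

```python
import heapq

def dijkstra(graph, source):
    """Least-cost value from `source` to every reachable node. `graph` maps
    each node to a list of (neighbour, non-negative edge cost) pairs."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue                       # stale queue entry, already improved
        for nxt, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return best

# Hypothetical network: (neighbour, length in km, average speed in km/h).
roads = {
    "A": [("B", 10.0, 40.0), ("C", 15.0, 100.0)],
    "B": [("D", 10.0, 40.0)],
    "C": [("D", 15.0, 100.0)],
    "D": [],
}
by_length = {u: [(v, d) for v, d, s in out] for u, out in roads.items()}
by_time = {u: [(v, d / s) for v, d, s in out] for u, out in roads.items()}

shortest_km = dijkstra(by_length, "A")["D"]   # minimises the sum of d_i
fastest_h = dijkstra(by_time, "A")["D"]       # minimises the sum of d_i / v_i
```

In this invented example the shortest route runs via B (20km) whereas the fastest runs via C (0.3 hours), mirroring the distinction between shortest and fastest routes drawn at the start of this Chapter.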

In a wide-ranging review of the general least cost path problems (LCPP), Smith and Gahinet⁶⁹ highlight the difficulties involved:

• possible complexity of the surfaces – smooth/analytical to complex natural or man-made, or hybrids of these

• possible complexity of the mobile object – size, nature of motion

• conditions that must be satisfied – e.g. minimum ‘cost’, curvature constraints, continuity, passing through/via a specified location

• computational cost (in memory, processor time) of alternative solution methods

They conclude that it is most unlikely there will ever be a single, unified theory for solving the general LCPP.

An additional body of research addresses distance and path problems in so-called ‘geometric domains’. Mitchell⁷⁰ provides an excellent recent review and summary of many of these problems - in the text below references to solution algorithms and their efficiency derive from Mitchell’s paper, unless otherwise stated.

Mitchell defines a geometric domain as follows:

“In contrast to graphs, where the encoding of edges is explicit, a geometric instance of a shortest path problem is usually specified by giving geometric objects that implicitly encode the graph and its edge weights … the most basic problem is: given a collection of obstacles, find a Euclidean shortest obstacle-avoiding path between two given points”

Most geometric domain problems deal with path finding within closed polygonal areas, including one or more obstacles (high cost or prohibited zones). The majority of such problems relate to path finding in 2-space, with the objective function being simple distance minimisation. Variants include: alternative objective functions (e.g. different metrics); constrained paths; dynamic environments; and known versus unknown terrains. It is known that in simple polygons there is always a unique shortest path (series of rectilinear segments) between any source and destination point, but that in a general polygonal domain (i.e. one including obstacles or holes) there can be any number of optimal paths. The latter class of problems does permit path finding using current algorithms in solution times O(n²) or better.

Geometric domain algorithms typically involve decomposition of the solution space into geometric components, notably triangular regions, and then search the graph comprising the edges and vertices so generated in combination with those of the original polygon and obstacle set. A variant, known as the continuous Dijkstra method, simulates a wavefront, using this to construct a form of geodesic map, rather as described by Huygens and adopted by Warntz. The method has been applied successfully in the so-called weighted region metric problem (WRM) in which different zones within the solution space have different weights or velocities applied to them.

Smith and Gahinet (op. cit.) show that exact solution times for this class of problem are greater than O(n⁸).

Mitchell points out that some constrained problems, notably those involving constraints on the average curvature (important for dynamic problems, such as transport engineering and robotics) may result in problems that are either not solvable or are very difficult to solve. Likewise, solutions that seek to optimise two or more objective function criteria are in general not solvable in a provably optimal manner. A simple example for which no exact solution is known is the problem of finding a path that minimises both path length and the number of steps (links, or edges) in the path within a simple polygonal solution space.

Another interesting class of problems relates to path finding without a map, but assuming knowledge (by observation/sensing) of the solution space as it is searched, and optionally knowledge of the location of a target (e.g. its coordinates). This is rather like journeying to the South Pole from the Weddell Sea, or finding one’s way in London or New York without a street map. Curiously enough, it has been shown that in a rectilinear street pattern when the location of the target is not known in advance but must be sought, prior knowledge of its location does not assist the solution time.

## 5.7 Summary

Measurement of distance from maps and in the field has highlighted both practical and theoretical problems. Central to the measurement process is a clear definition and understanding of the path along which measurement is to be made. A range of models is available, from the classical to statistical, and from the continuous and differentiable to discrete lattice and fractal formulations. Each measurement technique involves sampling and approximation and each is therefore scale-dependent as well as model- or path-dependent. Where sampling takes place at multiple scales it might highlight self-similarity, suggesting that the length is indeterminable and fractal-like. But self-similarity at a range of scales does not imply self-similarity at all scales, nor does it exclude curve models other than fractals from consideration: given a sufficient number of fine steps SAWs exhibit very similar behaviour to fractals – sub-samples of the walks are very similar to broader samples.

From these observations and those of the previous Chapter, we must conclude that regarding distance measures as certain and absolute is frequently unsafe - geographic distance should be viewed in terms of context, measurement method, scale, path model and dynamics as well as metric formulation and the derived numerical results.


Notes and References:

1 Gatrell A C (1983) Distance and space: A geographic perspective, Clarendon Press, Oxford

2 Cliff A D, Haggett P (1998) On complex geographic space: computing frameworks for spatial diffusion processes, p. 254 of Ch.11 in Longley P A, Brooks S M, McDonnell R and Macmillan B (1998) Geocomputation: A Primer, J Wiley, New York

3 Beals R, Krantz D H (1967) Metrics and geodesics induced by order relations, Mathematische Zeitschrift, 101, 285-298

4 Mandelbrot B B (1977) Fractals: Form, chance and dimension, Freeman, San Francisco; Goodchild M F (1980) Approaches to the estimation of geographical measures: A fractal framework, Math. Geol., 12, 85-98

5 Richardson L F (1881-1953). Richardson’s statistical work on patterns of war led him to examine the borders of many countries, which drew attention to problems of measuring their lengths – this work was highlighted after his death by B Mandelbrot in his development of fractals. Although Richardson published some 14 main works during his lifetime (and a total of 137 books, articles and lesser publications) on a variety of subjects, Richardson’s collected works (Vols. 1 and 2) were not published until 1993. Richardson was a mathematical meteorologist by profession and a poem extract for which he is now famous relates to this discipline: "Big whorls have little whorls that feed on their velocity, and little whorls have smaller whorls and so on to viscosity." Richardson also worked extensively in the field of numerical methods, as in his book: Richardson L F (1922) Weather prediction by numerical process, London, Cambridge Univ. Press (also reprinted by Dover Publications, NY, 1965); and Bunge W (1962) Theoretical geography, Lund Studies in Geography, C, 1, Lund, Sweden

6 Poincaré H (1913) Mathematics and Science: Last Essays (Dernières Pensées), trans. J W Bolduc, Dover Edition (1963), New York, pp. 27-28; Poincaré (1854-1912) is regarded as one of the co-discoverers (with Einstein and Lorentz) of the special theory of relativity.

7 see further, Annex 4 - Traffic, teletraffic and statistical self-similarity

8 for an extensive, up-to-date, analysis of these issues see: Zhang J, Goodchild M F (2002) Uncertainty in geographic information, Taylor and Francis, London

9 Veregin H (1999) Data quality parameters, Ch.12, p.180, in Longley et al (1999) Geographic Information Systems, Vol.1, 2nd ed., J Wiley, New York, and Zhang J, Goodchild M F (2002) op. cit., Section 7.3

10 Bouwkamp C J (1977) On the average distance between points in two coplanar non-overlapping circular disks, J. Applied Sci. and Engin., A, 2, 183-186. Bouwkamp originally published this result in 1947 in connection with his work on Bessel functions

11 see for example, Vaughan R J (1987) Urban spatial traffic patterns, Pion, London, pp.223-4

12 Caspary W, Scheuring S (1993) Positional accuracy in spatial databases, Comput. Environ. and Urban Systems, 17, 103-110


13 Roberts F S, Suppes P (1967) Some problems in the geometry of visual perception, Synthese, 17, 173-201

14 Blank A A (1958) Axiomatics of binocular vision: the foundations of metric geometry in relation to space perception, J. Optical Soc. of America, 48, 328-333 and Blank A A (1958) Analysis of experiments in binocular space perception, J. Optical Soc. of America, 48, 911-925

15 Platt J R (1960) How we see straight lines, Scientific American, 202, 6, 121-129

16 Battro A M, Netto S P, Rozestraten R J (1976) Riemannian geometries of variable curvature in visual space: visual alleys, horopters and triangles in big open fields, Perception, 5, 9-23; Todd J T, Oomes A H J, Koenderink J J, Kappers A M L (2002) On the affine structure of perceptual space, Psychological Science (submitted)

17 Hägerstrand T (1957) Migration and area, in “Migration in Sweden”, Lund Studies in Geog., B, 13, 27-158, Lund, Sweden

18 Reichenbach H (1925) Philosophy of space and time, Dover ed. (1958), New York

19 Defossez L (1946) Les savants du XVIIe siècle et la mesure du temps, Editions du J. Suisse d’Horlogerie et de Bijouterie, 258-262, Lausanne

20 Good R (ed.) (1982) Britten’s watch and clock maker’s handbook, dictionary and guide, 16th edition, Bloomsbury Books, London

21 by straight, we mean a line of zero curvature within the space under consideration (i.e. zero intrinsic curvature)

22 Daganzo C F (2002) Reversibility of the time-dependent shortest path problem, Transportation Research, 36, 7, 665-668

23 Churchman C W, Ratoosh P (eds) (1959) Measurement: Definitions and theories, J Wiley, New York

24 De Barra G (1974) Introduction to measure theory, Van Nostrand Reinhold, London

25 Blumenthal L M (1970) Distance geometry, Chelsea, New York. Blumenthal defined distance geometry as the study of “that subgroup of homeomorphisms for which the distance between two points is an invariant”. Such a definition is too restrictive for modern spatial analysis

26 Stevens S S (1959) Measurement, psychophysics and utility, in Churchman and Ratoosh, op. cit., 18-63

27 (1977) Economics of space and time: the measure-theoretic foundations of social sciences, Iowa State Univ. Press, Iowa

28 Noronha V, Church R L (2002) Line referencing and other forms of location expression for transportation, Final Report, Task Order 3021, California Department of Transportation, p 15. Available from: www.ncgia.ucsb.edu/vital

29 Douglas D H, Peucker T K (1973) Algorithms for the reduction of the number of points required to represent a digitised line or its caricature, Can. Cartographer, 10, 2, 112-122

30 Goodchild M F (1980) Approaches to the estimation of geographical measures: A fractal framework, Math. Geol., 12, 85-98


31 Mandelbrot B B (1977) Fractals: Form, chance and dimension, San Francisco, Freeman, p.29

32 Perkal J (1956) On the ε-length, Bull. Polish Acad. Sci. Cl., III, 4, 399-403

33 Perkal J (1958) On the length of empirical curves, Zastosowania Matematyki, III, 3-4, 257-286 (in Polish, trans. by R Jackowski with W Tobler)

34 Crofton M (1885) Probability, Encyclopædia Britannica, 9th ed., Vol XIX, 771-798

35 Santaló L A (1953) An introduction to integral geometry, Hermann, Paris, p.11

36 Abeyta A M, Franklin J (1998) The accuracy of vegetation stand boundaries derived from image segmentation in a desert environment, Photogrammetric Engineering and Remote Sensing, 64, 59-66. This paper draws on the earlier work of Skidmore A K, Turner B J (1992) Map accuracy assessment using line intersect sampling, Photogrammetric Engineering and Remote Sensing, 58, 1453-57, which itself is based upon de Vries P G (1986) Sampling theory for forest inventory, Springer-Verlag, Berlin

37 Steinhaus H (1930) Akad. d. Wiss. Leipzig, Ber., 82, 120-130 (cited in Kendall and Moran, op. cit.)

38 Kendall M G, Moran P A P (1963) Geometrical Probability, Griffin, London, p.60

39 material in this Section and subsequent Sections that deal with Distance Transforms has recently been published as: de Smith (2004) Distance transforms as a new tool in spatial analysis, urban planning and GIS, Environment & Planning B, 31(1), 85-104

40 for a clear summary and comparison of such methods see: Leymarie F, Levine M D (1992) A note on fast raster scan distance propagation on the discrete rectangular lattice, Computer Vision, Graphics, Image Processing, 55, 1, 84-94. MATLAB: General purpose software packages, such as MATLAB (Image Processing Toolbox), include facilities for performing distance transforms with metrics based upon exact Euclidean (EDT), City block ($L_1$), Chessboard (defined here as $L_\infty$), and Quasi-Euclidean (local Euclidean) distances. See www.mathworks.com for more details. The MATLAB Image Toolbox uses the following reference for 2-D exact Euclidean distance transforms: Breu H, Gil J, Kirkpatrick D, Werman M (1995) Linear time Euclidean distance transform algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 529-533

41 Okabe A, Boots B, Sugihara K, Chiu S N (2000) Spatial tessellations: Concepts and applications of Voronoi diagrams, 2nd ed., John Wiley, Chichester, England

42 Gorte B, Koolhoven W (1990) Interpolation between isolines based on the Borgefors distance transform, ITC Journal, 3, 245-247. This paper describes a simple application of the (5,7)/5 chamfer with linear interpolation.

43 Verwer B J H, Verbeek P W, Dekker S T (1989) An efficient uniform cost algorithm applied to distance transforms, IEEE Trans. Pattern Analysis and Machine Intelligence, 11, 4, 425-429

44 Butt M A, Maragos P (1998) Optimal design of chamfer distance transforms, IEEE Transactions on Image Processing, 7, 1477-1484

45 Borgefors G (1989) Distance transformations on hexagonal grids, Pattern Recog. Letters, 9, 97-105


46 Cuisenaire O, Macq B (1999) Fast and exact signed Euclidean distance transformation with linear complexity, Proc. IEEE Intl. Conference on Acoustics, Speech and Signal Processing, 6, 3293-3296. See also MATLAB reference, above, and Dr Cuisenaire’s website pages at École Polytechnique Fédérale de Lausanne: http://ltswww.epfl.ch/~cuisenai/DT/

47 From the earliest ‘accurate’ surveys onwards measurements were always adjusted for slope (reducing the results) to provide ‘true horizontal distance’.

48 Noronha V, Church R L (2002) Line referencing and other forms of location expression for transportation, Final Report, Task Order 3021, California Department of Transportation.

49 for a thorough discussion of least-effort (energy minimising) paths across physical landscapes using all-terrain/off-road vehicles, see: Rowe N C, Ross R S (1990) Optimal grid-free path planning across arbitrarily contoured terrain with anisotropic friction and gravity effects, IEEE Transactions on Robotics and Automation, 6, 540-553; and Rowe N C (1997) Obtaining Optimal Mobile-Robot Paths with Non-Smooth Anisotropic Cost Functions Using Qualitative-State Reasoning, Internat. J. Robot. Res., 16(3), 375-399; additional related papers by Rowe can be found under the Path Planning heading on http://www.cs.nps.navy.mil/people/faculty/rowe/index.html

50 Peano G (1890) Math. Ann., 36, 157-160

51 Netto E E (1879) Jour. für Math., 86, 263-268

52 Sierpinski W (1882-1969). See Sierpinski W (1912) Bull. de l’Acad. des sciences de Cracovie, A, 462-478. Sierpinski triangles, which also exhibit fractional dimension, can be constructed both geometrically (via a recurrence operation) and by the so-called ‘chaos game’ in which they appear as a result of a random process.

53 Mandelbrot B B (1977) Fractals: Form, chance and dimension, San Francisco, Freeman, p.10

54 Hausdorff-Besicovitch dimension $D_{HB}$ (after Mandelbrot op. cit. and others):

Let $\Omega$ be an E-dimensional Euclidean metric space and let $h_k(\rho) = V_k \rho^k$ be the volume of a k-sphere of radius $\rho$, where $V_k = \Gamma(1/2)^k / \Gamma(k/2 + 1)$ and

$$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx$$

is the Gamma function with n>0; $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(n+1) = n\Gamma(n) = n!$.

Let S be a subset of $\Omega$, for example S = a curve in the plane and $\Omega$ = a Euclidean plane. Then S may be covered by a finite number, n, of k-spheres (cf. the Pointillist method described in the Area approximations Section) with total ‘volume’ (e.g. area) given by:

$$V = \sum_{i=1}^{n} h_k(\rho_i)$$

Define $\rho = \sup \rho_i$ and form the most economical covering of S possible, given $\rho$, i.e.

$$V^* = \inf_{\rho_i < \rho} \sum_{i} h_k(\rho_i)$$

then $\lim V^*$ as $\rho$ tends to 0 defines the h-measure of S; if there exists D such that the h-measure is infinite for k<D and zero for k>D then D is called the Hausdorff-Besicovitch dimension of S. It can be readily shown that if S is a self-similar set (e.g. the Koch curve, Figure 5-27) then its self-similarity dimension equals D. HB dimension is often written in the limit form:

$$D_{HB} = -\lim_{\rho \to 0^+} \frac{\ln N(\rho)}{\ln \rho}$$

where $N(\rho)$ is the number of k-spheres required for complete coverage.

Cantor-Minkowski-Bouligand dimension ($D_{MB}$):

Let $\Omega$ be an E-dimensional Euclidean metric space and $V_k$ be the volume of a k-sphere as per the H-B definition above. Let $s(\rho)$ be the smoothed out version of the set S in $\Omega$ (cf. the Minkowski-Cantor sausage described in the Area approximations Section). The Minkowski-Bouligand dimension of S is defined as that value for k for which the upper and lower contents of S both exist and are equal, i.e. where:

$$\lim_{\rho \to 0}\left\{\sup \frac{\mathrm{volume}(s(\rho))}{V_{E-k}\,\rho^{E-k}}\right\} = \lim_{\rho \to 0}\left\{\inf \frac{\mathrm{volume}(s(\rho))}{V_{E-k}\,\rho^{E-k}}\right\}$$

55 due to N F H von Koch (1870-1924). This is an example of a continuous curve which is nowhere differentiable

56 Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, p. 97

57 Håkanson L (1978) The length of closed geomorphic lines, Math. Geol., 10, 141-167

58 Batty M, Longley P (1994) Fractal Cities, Academic Press, London and San Diego; Longley P (2000) Fractal analysis of digital spatial data, Ch.12 in Openshaw S, Abrahart R J (eds.) (2000) GeoComputation, Taylor and Francis, London

59 Longley P, Batty M (1989) On the fractal measurement of geographic boundaries, Geog. Anal., 21, 1, 47-67

60 Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, p. 43

61 Kuffner J J, LaValle S M (2000) RRT Connect – An efficient approach to single query path planning, Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA'2000), San Francisco, CA, April 2000. See also: The “RRT Page” maintained by LaValle at http://msl.cs.uiuc.edu/rrt/ and Kuffner’s automated animated characters page at: http://robotics.stanford.edu/~kuffner/anim/index.html

62 (1988) The pivot algorithm: A highly efficient Monte Carlo method for the self-avoiding walk, J. Stat. Phys., 50, 109-186

63 Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, Chapter *H1

64 Kennedy T (2002) A faster implementation of the pivot algorithm for self-avoiding walks, J. Stat. Phys., 106, 407-429; the three SAW segments are from the set of images provided by Kennedy at: http://hedgehog.math.arizona.edu/~tgk/saw_pictures/index.html. Closed figures with random boundaries may be generated using other random procedures in the plane, but these are not discussed further in this study.


65 Batty M, Jiang B, Thurstain-Goodwin M (1998) Local movement: agent-based models of pedestrian movement, Working Paper 4, CASA, UCL London; Batty M (2000) Geocomputation using cellular automata, Ch.5 in Openshaw S, Abrahart R J (eds.) (2000) GeoComputation, Taylor and Francis, London

66

67 Dijkstra E W (1959) A note on two problems in connexion with graphs, Numerische Mathematik, 1, 269-271

68 O(..) or “big O” notation is used to describe the upper bound for the computational ‘order’ of time and space required for exact or approximate solution of a given network problem. Figures quoted in this Section are based on those provided by Mitchell (op. cit.)

69 Smith T R, Gahinet P M (1988) Least cost paths through space, Ch.11 in Coffey W (ed.) (1988) Geographical systems and systems of geography: Essays in honour of William Warntz, Univ. of W. Ontario, Geography Dept., London, Ontario

70 Mitchell J S B (1998) Geometric shortest paths and network optimisation, Research Rpt, Dept of Applied Math. and Statistics, SUNY, Stony Brook, NY., 62 pages, 393 references; also published in the Handbook of Computational Geometry, Elsevier Science, North Holland, Amsterdam


## 6 Distance Statistics¹

This Chapter (and the associated Annex 1) examines distance statistics: in particular average distances, and tests of complete spatial randomness (CSR) that utilise distance measures. We seek to illustrate both the scope and applicability of statistical distance measures to a variety of practical spatial problems. A wide range of results are derived and presented in a manner that seeks to clarify the underlying assumptions and the logic of the models used.

Although a number of statistical methods have been discussed in earlier Sections (sampling, point-pair uncertainty, random walks), the central focus has been on deterministic problems. We now focus upon statistically defined processes and associated distance measures. Much of the research in this area is concerned with point patterns and processes – in many cases it is the relationship between each point and its nearest neighbour(s) that is of interest from a process perspective, so we start by reviewing this area. We then examine and develop a number of approaches to the distribution of the distance between random pairs of points in bounded regions, subjects that arise from research in the fields of geometrical probability and trip distribution theory.

In recent years there has been a move away from these ‘direct’ statistical measures towards more exploratory approaches based on analysis of spatial intensity² and spatial autoregression³, reflecting the difficulty of extending classical distance statistics models to many real-world situations. These more recent techniques still rely on distance measures, distance decay models and density (intensity) estimation – as such, much of the discussion in the previous two Chapters, which is extended in the present Chapter, applies to these newer approaches.


## 6.1 Introduction

There is an enormous body of literature dealing with distance distributions reflecting their applicability in many disciplines. Much of this research has assumed that the distance metric to be used is Euclidean and that the sample space is unbounded and uniform. A number of results are presented in the following sections, and others derived, where the metric is more general and in which the sample space is bounded.

These factors are shown to alter the expected distance (and squared distance), causing key parameters (such as mean values and ratios) to diverge from the values that conventional models suggest. Statistical analyses of spatial datasets that utilise inter-point distance measures, directly or indirectly, must take these effects into consideration. This can be achieved either by using appropriate measures and modified statistical results, or by seeking to eliminate such effects, e.g. by correction of distance calculations, topological transforms, sampling subsets of points well away from region borders, systematic subdivision of sampled regions, and/or use of functional distance measures and Monte Carlo simulation techniques.

The set of distances defined by the spacing of randomly selected pairs of points within a bounded region represents a distribution whose frequency varies with line length, as measured by some agreed metric. There have been at least four approaches to the study of these ‘finite’ distance distributions:

(i) tests of randomness

(ii) geometric probability studies

(iii) shape analysis, and

(iv) trip distribution analysis

Each approach is discussed in the sub-sections below in which we describe existing work in this field and develop a number of extensions to the theoretical findings. The variety of results and approaches taken illustrate the many ways in which distance distributions and their central moments (e.g. mean, variance) can be applied to practical problems.
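The effect of a bounded region and of the choice of metric on the mean distance between random point pairs can be explored by simulation. The following is a minimal Monte Carlo sketch for the unit square, comparing the Euclidean and city-block metrics; the function name, sample size and seed are illustrative assumptions. For reference, the known exact means for the unit square are approximately 0.5214 (Euclidean) and 2/3 (city block).

```python
import math
import random

def mean_pair_distance(n_pairs=100_000, metric="euclidean", seed=0):
    """Monte Carlo mean distance between random point pairs in the unit square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        dx = abs(rng.random() - rng.random())
        dy = abs(rng.random() - rng.random())
        if metric == "euclidean":
            total += math.hypot(dx, dy)
        else:  # city-block (L1) distance
            total += dx + dy
    return total / n_pairs

print(mean_pair_distance(metric="euclidean"))
print(mean_pair_distance(metric="cityblock"))
```

The same loop can be adapted to other bounded regions by rejecting sample points that fall outside the region, at the cost of more random draws.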


In botanical, biological and geographic research distance-based tests of point pattern randomness have historically been based on nth-order nearest neighbour (nn) statistics (distributions and central moments) in infinite (unbounded) Euclidean spaces. More recently there has been consideration of nearest neighbour relations in bounded regions. The majority of analyses focus on testing mapped point patterns against a hypothesis of “Complete Spatial Randomness” (CSR), but other hypotheses, such as non-stationary Poisson processes, have also received attention. Separately, there has been a lively debate on the treatment of planar point patterns which may exhibit clustering⁴. This latter area is not covered in detail here, especially as a number of the methods used do not rely on explicit distance measures for their analysis. Where such methods do adopt distance measurements they almost exclusively rely on Euclidean measures and often use (edge corrected) circular sampling regions (e.g. techniques such as Ripley’s K-statistic and kernel-based density estimation).

The formal definitions of dimension in the footnotes of the previous Chapter utilised a general expression, $V_k$, in the volume of a k-dimensional hypersphere of radius r, where:

$$V_k r^k = \frac{\pi^{k/2}\, r^k}{\Gamma(k/2 + 1)}$$

The formula yields the familiar results for a line, circle and sphere:

$$V_1 r^1 = 2r; \quad V_2 r^2 = \pi r^2; \quad V_3 r^3 = 4\pi r^3/3$$
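These special cases are easy to check numerically. The sketch below evaluates the general hypersphere volume expression using Python's standard `math.gamma`; the function names are illustrative.

```python
import math

def unit_ball_volume(k):
    """V_k = pi^(k/2) / Gamma(k/2 + 1), the volume of the unit k-sphere."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)

def ball_volume(k, r):
    """Volume V_k * r^k of a k-dimensional hypersphere of radius r."""
    return unit_ball_volume(k) * r ** k

for k in (1, 2, 3):
    print(k, ball_volume(k, 1.0))
```

For k = 1, 2 and 3 with r = 1 the values agree with 2, π and 4π/3 respectively, up to floating-point rounding.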

The results derived in Annex 1 - Nearest neighbour statistics and earlier by Dacey⁵, use this general expression in the formulae for the distribution of distances to the nth-nearest neighbour in k-dimensional space under the CSR hypothesis (which we show is related to the χ²-distribution). From this result we derive the following general expression for the mean distance to the nth-nearest neighbour in k-dimensional space (the general expression for higher crude moments is also provided in the Annex):

$$\bar{r}_{n,k} = \frac{\Gamma(n + 1/k)}{\Gamma(n)} \cdot \frac{1}{(\lambda V_k)^{1/k}}$$

where λ is the point density. In two dimensions this expression can be simplified to⁶:

$$\bar{r}_{n,2} = \frac{n\,(2n)!}{(2^n\, n!)^2 \sqrt{\lambda}}$$

Thus, for example, we have:

$$\bar{r}_{1,1} = \frac{1}{2\lambda}, \quad \bar{r}_{1,2} = \frac{1}{2\sqrt{\lambda}}, \quad \bar{r}_{2,2} = \frac{3}{4\sqrt{\lambda}}$$
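The general and two-dimensional expressions for the mean nth-nearest neighbour distance can be cross-checked numerically. The following sketch implements both (function names are illustrative) and evaluates the three example cases for an arbitrary density of λ = 4.

```python
import math

def mean_nn_distance(n, k, lam):
    """Mean distance to the nth-nearest neighbour in k dimensions under CSR."""
    v_k = math.pi ** (k / 2) / math.gamma(k / 2 + 1)
    return math.gamma(n + 1 / k) / math.gamma(n) / (lam * v_k) ** (1 / k)

def mean_nn_distance_2d(n, lam):
    """Simplified two-dimensional form: n (2n)! / ((2^n n!)^2 sqrt(lam))."""
    return n * math.factorial(2 * n) / (
        (2 ** n * math.factorial(n)) ** 2 * math.sqrt(lam)
    )

lam = 4.0
print(mean_nn_distance(1, 1, lam))   # compare with 1 / (2 * lam)
print(mean_nn_distance(1, 2, lam))   # compare with 1 / (2 * sqrt(lam))
print(mean_nn_distance(2, 2, lam))   # compare with 3 / (4 * sqrt(lam))
```

The two functions should agree for any n when k = 2, since $\Gamma(n + 1/2) = (2n)!\sqrt{\pi}/(4^n n!)$.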

Tests of randomness may then utilise the observed distribution of point-event or event-event distances and compare the observed mean values with those expected under CSR, e.g. as a ratio, or more powerfully, by comparing (transformed) sample distributions with the percentage points of the χ²-distribution, or with the Normal distribution for larger samples/higher order neighbours for which the Normal approximation to χ² is valid.
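The simple ratio form of such a test can be sketched as follows. This is an illustration only: it uses a brute-force nearest-neighbour search, compares against the unbounded CSR expectation $1/(2\sqrt{\lambda})$ with no edge correction, and the function names, point counts and seed are assumptions.

```python
import math
import random

def mean_nearest_neighbour_distance(points):
    """Brute-force mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def csr_ratio(points, area):
    """Observed mean nn distance divided by the CSR expectation 1/(2 sqrt(lam))."""
    lam = len(points) / area
    expected = 1.0 / (2.0 * math.sqrt(lam))
    return mean_nearest_neighbour_distance(points) / expected

rng = random.Random(42)
random_points = [(rng.random(), rng.random()) for _ in range(400)]
grid_points = [(i / 20 + 0.025, j / 20 + 0.025) for i in range(20) for j in range(20)]

print(csr_ratio(random_points, 1.0))  # near 1 for a random pattern
print(csr_ratio(grid_points, 1.0))    # about 2 for a regular square grid
```

A ratio close to 1 is consistent with CSR, values well above 1 suggest a regular pattern, and values well below 1 suggest clustering. A formal test would compare the statistic with its sampling distribution rather than rely on the ratio alone.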

The point density, λ, is typically an unknown and therefore its determination in practical problems is an important issue. In field-based studies (rather than mapped point sets) the CSR hypothesis is sometimes the presumed distribution and nearest-neighbour measurements are used to determine λ. The approach is effective for relatively dense, static objects (e.g. natural forest stands) but of limited use for lower density and dynamic objects (e.g. estimation of populations of animals or fish). In the latter instance density estimation is based on the assumption that not all events are detected and distances to events are regarded as samples from a detection function or distribution, which must be modelled⁷. Work in this latter area is almost exclusively based on Euclidean distance measurement without adjustment for boundary issues.

The issue of point density estimation in mapped studies can be illustrated for the case of the line: if N points are dropped at random on a line of length L, then the expected distance to the nth-nearest neighbours may be calculated. However, because the line is of finite length some nearest neighbours to a base point will lie outside of the boundary and mis-measurements will be made. The error thus induced is equivalent to miscalculating the density, λ, and results in an increase in the measured mean and variance. The formula $\lambda_0 = N^2/((N+n)L)$ (where N ≥ 2), i.e. adjusting the theoretical density formula by N/(N+n), has been found by the present author, using simulation, to provide a better estimate of the point density than N/L. This estimate of density is very good for n small and also for larger n when N>10.
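A small simulation illustrates the adjustment for first nearest neighbours (n = 1). The sketch below is one possible reading of the comparison, not the author's original experiment: it drops N points on a line, measures the mean nearest-neighbour distance, converts it to an implied density via $\bar{r}_{1,1} = 1/(2\lambda)$, and compares that with N/L and with the adjusted formula. Function names, trial count and seed are assumptions.

```python
import random

def adjusted_density(n_points, order, length):
    """Adjusted line density N^2 / ((N + n) L)."""
    return n_points ** 2 / ((n_points + order) * length)

def simulated_mean_nn_on_line(n_points, length=1.0, trials=20_000, seed=1):
    """Mean first-nearest-neighbour distance for points dropped on a finite line."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = sorted(rng.uniform(0.0, length) for _ in range(n_points))
        gaps = [b - a for a, b in zip(xs, xs[1:])]
        # End points have a single neighbour; interior points take the
        # smaller of their two adjacent gaps.
        nn = [gaps[0]] + [min(g, h) for g, h in zip(gaps, gaps[1:])] + [gaps[-1]]
        total += sum(nn) / n_points
    return total / trials

n_points, length = 20, 1.0
implied = 1.0 / (2.0 * simulated_mean_nn_on_line(n_points, length))
print("implied density :", implied)
print("naive N/L       :", n_points / length)
print("adjusted formula:", adjusted_density(n_points, 1, length))
```

Under these assumptions the implied density falls below N/L and sits closer to the adjusted value, which is the direction of correction described in the text.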

In two dimensions similar boundary effects exist and may be more or less serious depending on: the size and shape of the sample area; the event density; and the order of the neighbour measurements being taken. Donnelly⁸ has used simulation techniques to provide adjusted estimates for rectangular sample regions with sides of length a and b, and perimeter length P=2(a+b). Letting λ=N/ab (number of points/sample area) he proposes using the adjusted value:

$$\bar{r}_{1,2} = \frac{1}{2\sqrt{\lambda}} + 0.0514\,\frac{P}{N} + 0.041\,\frac{P}{N^{3/2}}$$

which approximates to

$$\bar{r}_{1,2} \approx \frac{1}{2\sqrt{\lambda}} + \frac{1}{5\lambda}$$

for the unit square with N>10.
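The size of the edge correction is easy to evaluate directly. The sketch below codes the adjusted mean and the unit-square approximation as given above; the function names are illustrative.

```python
import math

def donnelly_mean_nn(n_points, a, b):
    """Edge-adjusted mean first nn distance for an a-by-b rectangle."""
    lam = n_points / (a * b)
    perimeter = 2 * (a + b)
    return (
        0.5 / math.sqrt(lam)
        + 0.0514 * perimeter / n_points
        + 0.041 * perimeter / n_points ** 1.5
    )

def unit_square_approximation(n_points):
    """Approximate form 1/(2 sqrt(lam)) + 1/(5 lam), with lam = N on the unit square."""
    lam = float(n_points)
    return 0.5 / math.sqrt(lam) + 1.0 / (5.0 * lam)

for n_points in (20, 100, 500):
    unadjusted = 0.5 / math.sqrt(n_points)
    print(n_points, unadjusted, donnelly_mean_nn(n_points, 1.0, 1.0),
          unit_square_approximation(n_points))
```

Passing elongated rectangles of the same area (for example a = 10, b = 0.1) to the same function shows how quickly the perimeter terms grow relative to the unadjusted mean.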

Donnelly also provides an edge-corrected estimate for the standard deviation which may then be used to compute a standardised measure (z-score) for significance testing. If P is large in relation to N, the adjustment is substantial. Furthermore, if the sample region is long and thin (e.g. 10:1 or greater) Donnelly’s adjustment becomes unreasonably large and one must question whether alternative approaches and models (e.g. Monte Carlo simulation) are preferable⁹.

Extensions of formal analysis to more complex shapes and alternative distance metrics have been limited. A simple extension to hyper-ellipsoids and hyper-spheroids with the Euclidean metric will have larger values for the mean and variance than hyper-spherical measures, and an adjusted volume measure based on the formula for $V_k$ but where the radius r is replaced by a set of $\rho_i$ values which are the semi-axes of the figures. If all the