Distance, Path Measurement and Path Models

Figure 5-6 Alternative routes between two sites in London: A. Fastest; B. Shortest
The Bromley-to-Sunbury example uses two kinds of primary or A-road (two speed variants) and one motorway (another speed variant), and the function to minimise is a sum based on network distances and route speeds. Each road type is assumed to have a fixed average road speed (predefined, though in many systems amendable).

If all road types vary in their traffic-carrying characteristics (and thus traffic speed) in the same way by time of day, then this amounts to a constant multiplier and has no effect on the route selection (although the journey duration will alter by that factor). However, if road speeds change in different ways across the day then time of day becomes important – for example, city roads may improve by 30% during the daytime outside of rush hours but motorways might improve by 100%.
The shortest route in this case seeks to minimise

z = \sum_i d_i

where the d_i are the lengths of the route segments. For the fastest journey this becomes

z = \sum_i t_i

where the t_i = d_i / v_i are the times taken to travel along each segment at speed (velocity) v_i. The integral formulation of this expression, which we use at various points in this study, is:

\min z = \int \frac{1}{v} \, ds
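The distinction between the two objectives can be made concrete on a toy network. The sketch below is illustrative only: the node names, lengths and speeds are invented rather than taken from the Bromley–Sunbury example, and a minimal Dijkstra routine stands in for whatever shortest-path engine a routing system actually uses.

```python
import heapq

def dijkstra(graph, start, end, weight):
    """Least-cost route from start to end; `weight` maps an edge to its cost."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    done = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node == end:
            return cost
        if node in done:
            continue
        done.add(node)
        for nbr, edge in graph.get(node, {}).items():
            new_cost = cost + weight(edge)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(queue, (new_cost, nbr))
    return float("inf")

# Invented mini-network: each edge has a length (km) and an assumed fixed speed (km/h)
net = {
    "Origin":   {"A-road": {"km": 10, "kmh": 40}, "Motorway": {"km": 18, "kmh": 100}},
    "A-road":   {"Dest": {"km": 12, "kmh": 40}},
    "Motorway": {"Dest": {"km": 20, "kmh": 100}},
}

shortest = dijkstra(net, "Origin", "Dest", lambda e: e["km"])             # min sum d_i
fastest  = dijkstra(net, "Origin", "Dest", lambda e: e["km"] / e["kmh"])  # min sum d_i/v_i
# shortest = 22 km (via the A-road); fastest = 0.38 h (via the motorway)
```

The same search routine serves both objectives; only the edge weight function changes, exactly as the change from d_i to d_i/v_i above suggests.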
Note that different routing software and datasets can yield radically different results. In the example above some online systems suggest that the fastest route is via the London inner ring road (the South Circular) during the rush hour, which will be incorrect in most cases. Furthermore, a large section of the motorway selected (the M25) is operated with an official variable speed limit system, varied by time of day and traffic conditions.

Computerised routing systems should take account of such variations, incorporating knowledge of actual traffic conditions and, for example, asking users when their journey is to start or (more usefully) when they need to reach their destination. It can be shown that this 'backward path' view is equivalent to the conventional forward path model^22, thus solution or failure to find a solution of either one implies the other for scalar fields.
In the example above journey times in rush hours may actually be longer by the 'faster' route. Furthermore, this highlights the issue of solvability – if the objective is to reach Sunbury between 08:45 and 09:15 for a meeting, there may be no starting time that achieves this objective; arrival times before 08:15 may be the only solution. This phenomenon is prevalent in modes of transport that run to timetables or schedules (or fail to!). These kinds of 'time holes' in solution sets are mirrored by spatial 'holes' or 'shadows' in some cases.
5.2 Path measurement theory and practice
At first sight the question of measuring and working with path lengths would appear to benefit from the tools of measurement theory^23 and measure theory^24. Indeed, both fields have important contributions to make, but neither possesses the full range of tools required. Among the concepts these theories do provide are the notions of invariance^25 and measure. Stevens^26 has classified various scales of measurement, such as ordinal, nominal and ratio scales, on the basis of the group of transformations that leave the scale form invariant. For example, in the 'intervening opportunities' model of trip behaviour the distance to a shopping centre from any consumer is essentially a simple ranking of proximity or accessibility, i.e. an ordinal scale. Such a model would be unaffected by a transformation of proximity using any positive monotonic function (e.g. replacing a proximity measure x by x^n, n > 0). Similarly, the classical notion of length, as measured on a ratio scale, is unaffected by multiplication by a positive constant.
The notion of a measure of a set, A, contained in the real line, is a generalisation of the idea of length and is invariant under translations of the set. Hence if M(A) is the measure of set A then M(A) = M(A+x), where x ∈ ℝ and A+x = {y+x : y ∈ A}.
The application of measure theory to the social sciences appears to be limited to Faden's^27 monumental (but rather abstract) work, in which a range of location and other problems are generalised using the power of measure-theoretic analysis. Measure theory is, however, central to Crofton's development of geometric probability, which is discussed in greater detail in Sections 5.2.7 and 6.3 below.
5.2.1 Linear approximation
The conventional mathematical approach to measuring path length involves distinguishing two classes of curves: rectifiable and non-rectifiable (without length).

The method of determining whether or not a curve, C, is rectifiable involves replacing it with all possible inscribed polygons (Figure 5-7). If the lengths of these inscribed polygons, P, P′ etc., are bounded then the curve is called rectifiable, with the supremum of the polygon lengths defining the curve length.

If the curve is not rectifiable it is regarded by classical mathematics as having 'no length', or indeterminable length. This approach, whilst corresponding closely to the familiar 'dividers' or polyline method of measuring line lengths on maps and in vector GIS datasets, has the unfortunate property of assigning no length to non-rectifiable curves. Such curves are not confined to the so-called 'pathological' functions (such as Dirichlet's function: f(x) = 1 if x is rational and f(x) = 0 if x is irrational) but include several apparently well-behaved continuous functions (e.g. f(x) = x cos(π/2x) for x ≠ 0, f(0) = 0, x ∈ [0,1]) and, arguably, all real-world curves.
Figure 5-7 Rectifying a curve, C, by polygonal approximation (inscribed polygons P and P′ through points x_1, x_2, x_3 between endpoints A and B)
The practical problem of measuring the length of a curve or line, whether mathematical, physical (e.g. a lake shore, a molecule or gene under the microscope) or cartographical, has attracted much attention. In addition to the linear method (rectification) described above, which forms the basis of land surveying methods, several other methods have been developed, each of which involves use of a scale-critical component, η.

If the rectification process above used dividers to 'walk' along the line to be measured, then the separation of the divider points, η, would determine the curve length L_m(η), where m indicates the method used for measurement. Length is thus not an absolute, but a function of the way in which the underlying data is modelled or represented, together with the measurement method and the resolution used when measuring. Rectification can be regarded as a piecewise linear approximation technique. Invariably, this technique underestimates line length at a given resolution. For example, within GIS datasets recorded at a scale of 1:100,000, comparison with accurate odometer measurements of road lengths shows errors of up to 15% on sinuous routes^28. Fixed-interval walks can also be computationally intensive and require separate handling of the inevitable remainder at one end of the line.
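The dividers walk described above can be sketched as follows, assuming the curve is supplied as a densely sampled polyline (an assumption of this illustration, not of the method itself). Steps are taken along chords, so divider positions may drift slightly off the original curve.

```python
import math

def divider_length(points, eta):
    """Walk dividers of span eta along a densely sampled polyline and
    return L_m(eta); the remainder at the end of the line is added as a
    final partial step."""
    total = 0.0
    cur = points[0]
    i = 1
    while i < len(points):
        gap = math.dist(cur, points[i])
        if gap >= eta:
            t = eta / gap                      # advance one divider span
            cur = (cur[0] + t * (points[i][0] - cur[0]),
                   cur[1] + t * (points[i][1] - cur[1]))
            total += eta
        else:
            i += 1                             # sample point closer than eta: move on
    return total + math.dist(cur, points[-1])  # remainder at the end of the line

# Unit semicircle (true length pi), densely sampled
arc = [(math.cos(math.pi * k / 10000), math.sin(math.pi * k / 10000))
       for k in range(10001)]
coarse = divider_length(arc, 0.5)   # long chords cut the bends short
fine = divider_length(arc, 0.01)    # approaches pi from below as eta shrinks
```

Running this shows L_m(η) increasing towards the true length as η shrinks, illustrating the systematic underestimation of the polyline method.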
In order to reduce computer data storage and/or manage representation at varying scales, some authors^29 have developed algorithms which examine the deviation of an approximation such as P′ to C against a coarser approximation such as P. If this deviation is less than a predefined value then the coarser link is retained. For example, in Figure 5-7 point x_2 might be dropped (or weeded out) in favour of the direct link from x_1 to x_3.

This approach will exaggerate the underestimation of length that is inherent in the polyline approach. Also note that the orientation of the line segment x_1x_3 is substantially different from x_1x_2 and x_2x_3, and from the original curve. Algorithms (such as refraction) that rely on boundary orientation are significantly affected by such approximations.
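A minimal sketch of such a weeding pass is shown below. It is a simplified, single-pass cousin of algorithms of this family (e.g. Douglas-Peucker), not a reproduction of any specific published routine.

```python
import math

def perp_dev(p, a, b):
    """Perpendicular distance from point p to the line through a and b
    (a and b assumed distinct)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.hypot(bx - ax, by - ay)

def weed(points, tol):
    """Single sequential weeding pass: drop a vertex when it deviates from
    the chord joining the last kept vertex and the next vertex by less
    than tol."""
    kept = [points[0]]
    for j in range(1, len(points) - 1):
        if perp_dev(points[j], kept[-1], points[j + 1]) >= tol:
            kept.append(points[j])
    kept.append(points[-1])
    return kept

def length(points):
    """Polyline length."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))
```

As the text notes, the weeded line is never longer than the original, so generalisation exaggerates the polyline underestimate of length.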
5.2.2 Quadratic approximation
Piecewise non-linear approximations could also be used, such as quadratics, cubics and circle arcs. With quadratic approximation (Figure 5-8) the triple of path coordinates [x_i, y_i], [x_{i+1}, y_{i+1}], [x_{i+2}, y_{i+2}] is used to determine the three coefficients of a quadratic of the form y = a + bx + cx².
Figure 5-8 Quadratic approximations to a map outline (segments S_1 to S_4 over x_0 … x_3)
The curve length over the interval [x_i, x_{i+1}] can then be calculated and the estimation process repeated for the next triple (i+1, i+2, i+3), etc. This method can be applied to both 2- and 3-dimensional (space) curves, since any three points are always coplanar. The separation interval on the x-axis becomes our η, and the estimated length is the integral along the piecewise construction.
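The fitting step can be sketched as follows. The closed-form arc-length integral is replaced here by a trapezium-rule quadrature, which is an implementation choice of this illustration rather than part of the method.

```python
import math

def quad_coeffs(p0, p1, p2):
    """Solve for (a, b, c) in y = a + bx + cx^2 through three points
    with distinct x values."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    c = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    a = y0 - b * x0 - c * x0**2
    return a, b, c

def quad_arc_length(p0, p1, p2, n=1000):
    """Arc length of the fitted quadratic over [x0, x2] by the trapezium rule."""
    a, b, c = quad_coeffs(p0, p1, p2)
    x0, x2 = p0[0], p2[0]
    h = (x2 - x0) / n
    f = lambda x: math.hypot(1.0, b + 2 * c * x)   # sqrt(1 + y'(x)^2)
    return h * (0.5 * f(x0) + sum(f(x0 + k * h) for k in range(1, n)) + 0.5 * f(x2))
```

For collinear triples the fit degenerates gracefully to c = 0, recovering the straight-line (polyline) length for that interval.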
5.2.3 Circular approximation
A similar method involves piecewise circular approximation (Figure 5-9). A point is selected along the line to be measured and a tangent drawn at this point. The normal to the tangent is then drawn and a circular arc constructed to approximate the interval [x_i, x_{i+1}], by selecting the circle radius, r_i (i.e. η_i), such that its centre lies on the normal and the arc passes through y(x_i) and y(x_{i+1}). The angle θ_i subtended is measured and the length of the line estimated as

L = \sum_i L_i = \sum_i r_i \theta_i
Figure 5-9 Circular approximations to a map outline
This method is similar to the use of a rolling wheel to measure road lengths, except that in this case the circle radii can be variable. There is also an interesting relationship between this method and design guidelines for road construction, which often quote design objectives in terms of the radius of curvature – in some instances such guidelines assume a horizontal road profile to be designed from linear and circular elements.
5.2.4 Tangent approximation
This method is similar to that used in calculus but, instead of letting the step interval η → 0, η is retained as a finite measure (Figure 5-10). In calculus it is usual to specify this interval, η, as Δx, with Δy = y(x) − y(x+Δx), giving:

\Delta d = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{1 + \tan^2\theta}\,\Delta x = \sec\theta\,\Delta x

where θ is the angle between the tangent to the curve at x and the x-axis.
Figure 5-10 Tangent approximations to a map outline
The total length of the line to be measured can then be calculated by summation. If Δx is fixed, the method simply requires measurement of the angle θ at each point. However, accurate measurement of θ requires accurate construction of tangents, which in many cases may not be possible.
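A sketch of the tangent rule is given below, with a forward finite difference standing in for the measured tangent angle (an assumption of this illustration; in the manual method θ would be read directly from a constructed tangent).

```python
import math

def tangent_length(y, x0, x1, dx):
    """Tangent-rule length estimate L ~ sum of sec(theta_k) * dx, with
    tan(theta_k) estimated by a forward difference at each step point."""
    n = int(round((x1 - x0) / dx))
    total = 0.0
    for k in range(n):
        x = x0 + k * dx
        tan_theta = (y(x + dx) - y(x)) / dx        # chord slope ~ tangent slope
        total += math.sqrt(1 + tan_theta**2) * dx  # sec(theta) * dx
    return total
```

With a forward difference the rule reduces to a chord (polyline) sum, so it shares the polyline method's mild underestimation; measured tangents would behave slightly differently.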
5.2.5 Grid approximation
If a square grid of cell size η is placed over the curve of interest, then both the number of grid lines intersected and the number of cells encountered provide measures of the length of the curve (Figure 5-11).

Figure 5-11 Grid approximations to a map outline
In the example shown there are slightly more grid-line intersections than cell intersections, so the two counting methods will differ. Goodchild^30 has suggested that for maps the cell-counting approach "gives estimates which are more simply related to scale and less to how the map was constructed."

Both methods are well suited to automation, unlike some of the previous approaches, but will generally require smaller values of η (a larger number of computations) for comparable levels of approximation. Figure 5-12 highlights the degree of approximation involved by replacing the intersected cells in Figure 5-11 with filled cells.
Figure 5-12 Grid approximations to a map outline – filled cells
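Cell counting is straightforward to automate when the curve is supplied as a densely sampled polyline; the sketch below counts cells only, and the counts will of course vary somewhat with the placement of the grid.

```python
import math

def cells_crossed(points, eta):
    """Count the distinct grid cells of size eta entered by a densely
    sampled curve (the 'box count'); sampling must be finer than eta."""
    return len({(math.floor(x / eta), math.floor(y / eta)) for x, y in points})

# A near-horizontal line of true length ~9.99, placed mid-cell
line = [(i / 100.0, 0.5) for i in range(1000)]   # x in [0, 9.99]
n1 = cells_crossed(line, 1.0)    # cell count at eta = 1
n2 = cells_crossed(line, 0.5)    # halving eta roughly doubles the count
```

Here the product of count and cell size (10 cells x 1.0, or 20 cells x 0.5) is stable across resolutions and close to the true length, which is the basis of the box-count length estimate.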
5.2.6 Area approximation
In the second grid method described above, the number of cells intersected (the 'box count') is also a form of area approximation or 'coverage' of line length. A number of other methods have been devised using the area in the neighbourhood of a curve as a measure of curve length. Each of the three methods described below uses circular regions to approximate the curve.

The first method is the so-called Minkowski or Cantor sausage (Figure 5-13, method A). In this method every point along the curve of interest is replaced by a circle of radius η. The area of the 'tape' formed by the trace of these circles is then A, where A = 2ηL, hence L(η) = A/2η.
This method essentially transfers the measurement of the line length to that of calculating the area, A. The surface-area equivalent, utilising a sphere, is called the Minkowski comforter. Note that there is an important problem with this method – it assumes that the curve to be measured is everywhere defined and identifiable, and that drawing a circle at each point is possible, which may not be the case. It has recently been suggested that the measurement should be of the 'ε-neighbourhood' of the original line, where ε is a small but finite number, using a line of length 2ε drawn perpendicular to the path. This revision still leaves the method of defining the neighbourhood open to similar criticisms.
Mandelbrot calls the second method the Pointillist technique^31, by reference to the Pointillist school of art (Figure 5-13, method B). The curve to be approximated is completely replaced by the smallest possible number of circles (points) of fixed radius η, i.e. the smallest possible covering of the line. The centres of the circles used will not necessarily lie exactly on the line to be measured. As before, L(η) = A/2η.
Figure 5-13 Area approximations to a map outline: Method A – Minkowski; Method B – Mandelbrot; Method C – Perkal
The final method illustrated in Figure 5-13, method C, is due to Perkal^32, although it was not initially introduced by him specifically for line measurement. The method uses circles constructed on either side of the line to be measured to define an outer envelope. As with the first method, the area of the outer envelope divided by its width (in this case 2η) gives the approximate line length (finite lines require an end-of-line adjustment factor). In a subsequent paper Perkal^33 examines method A alone and in conjunction with method C. He describes the use of square and triangular templates (similar to Figure 5-12) to estimate line length by repeated random sampling. Perkal calls these measuring templates 'ε-longimeters'.
None of the above methods is particularly easy to carry out by hand or computer, although method A is probably the most amenable to automation. Perkal tested his techniques and those of Steinhaus (see Section 5.2.7 below) on a variety of lines, with varying values of ε and several test users. He found that his methods were faster than those of Steinhaus, but both were subject to considerable variation by user in the estimates found.
Line following (tracing) by humans or computers (e.g. optical devices) will inevitably involve selection of points along the line and thus linear approximation will result automatically. The main advantage of the area methods is where the line to be followed is poorly defined (at the scale selected) and/or area separation is required.
5.2.7 Statistical approximation
There are a number of statistical methods for estimating the length of a fixed path between two known points, A and B. Some discussion of statistical models of paths has already been given in Section 5.1.3; a more extensive discussion of the role of statistics in distance studies is provided in the next Chapter, where issues such as trip distributions within defined regions are considered. This Section concentrates on the application of statistics to line-length estimation.
In the 1885 edition of the Encyclopædia Britannica (9th Ed.), Crofton's article on probability^34 described several important results of interest to this and subsequent sections. Crofton discusses 'measures' such as:

M(X) = \iint_X dp \, dw

where p and w are the polar coordinates of equiprobable random lines in the plane, and X is a region in the plane (Figure 5-14). When the integral above is evaluated over the region X it measures the total density of lines in X. Santaló^35 points out that:
"…up to a constant factor, this measure is the only one which is invariant under the group of motions in the plane." (i.e. translation, rotation, reflection)
The measure M(X) can be used to determine the length of any curve in the plane which is composed of a finite number of arcs with a tangent at every point. Santaló shows that if a curve, C, has length L, then:

M(X) = \iint_X n \, dp \, dw = 2L

where n is the number of intersections each random straight line has with C. This result also holds when C is a curve on a surface other than a plane and straight lines are replaced with geodesics.
Figure 5-14 Polar coordinates (p, w) of a random straight line, AB, in the plane
If C_1 is a closed, convex curve of length L_1, then n = 2, hence M(X) = L_1. Now if C is contained within C_1, the mean value of the number of intersections is:

\bar{n} = \frac{\iint n \, dp \, dw}{\iint dp \, dw} = \frac{2L}{L_1}

hence

L \cong \bar{n} L_1 / 2
Crofton produced a similar result to that above, as did Steinhaus. The latter noted that if a curve of length L is dropped at random K times on a set of parallel lines d units apart, and n is the total number of intersections observed, then

L \cong n d \pi / 2K

a result similar to Buffon's needle method of estimating π, first proposed in 1777 (i.e. using a straight line or needle of known length, an estimate of π can be found by rearranging the above expression). Abeyata and Franklin have used a variant of this approach^36, and others, to estimate boundary lengths (e.g. of patches of desert scrub and forested tracts) using a series of linear transects. They then compare these estimates with reference and/or alternatively derived data.
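The Steinhaus relation can be checked by simulating the drops. The experiment below is synthetic: it assumes a single straight segment no longer than the line spacing (so each drop yields at most one intersection), and the random seed is fixed for reproducibility.

```python
import math
import random

def steinhaus_estimate(true_len, d, K, rng):
    """Drop a segment of length true_len at random K times onto horizontal
    lines d apart, count intersections n, and return n*d*pi/(2K).
    Assumes true_len <= d so each drop crosses at most one line."""
    n = 0
    for _ in range(K):
        y = rng.uniform(0.0, d)               # centre offset within one strip
        theta = rng.uniform(0.0, math.pi)     # orientation of the segment
        half_height = 0.5 * true_len * abs(math.sin(theta))
        if y <= half_height or d - y <= half_height:  # crosses a bounding line
            n += 1
    return n * d * math.pi / (2 * K)

est = steinhaus_estimate(1.0, 2.0, 200_000, random.Random(42))
```

With 200,000 drops the estimate of a unit-length segment is typically within about 1% of the truth; the standard error shrinks only as the square root of K, which is why such statistical estimators converge slowly.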
Steinhaus also states that if the m-length, L_m, of a line is defined using the method above, but limiting the number of intersections counted per parallel line to m, then L_m tends to a fixed limit with increasing map quality as K increases and d decreases. The method could also be reversed, with a set of parallel lines being randomly dropped onto the curve to be measured, which may prove more amenable to automation.
A related concept, which has been shown to be of considerable practical use, involves the use of projections (Figure 5-15). This has similarities to the tangent method described earlier.
Figure 5-15 The projection method – projected segment onto the x-domain
In this Figure the projected arc length d(x_1, x_2) = L cos θ, hence L = d/cos θ (= P(θ), say). But this example covers the case of a single straight-line segment projected onto the x-axis. With a complex curve there will be many projected segments. The mean value of P(θ) taken over all θ is:
\bar{P}(\theta) = \frac{1}{2\pi}\int_0^{2\pi} L \, |\cos\theta| \, d\theta = \frac{2L}{\pi}

a formula originally due to Cauchy and used by Steinhaus^37 as a basis for measuring the length of a curve under a microscope. In Figure 5-16 the projection of the line AB onto the x-axis gives P(θ_x) = x_2 − x_1. The projection onto the y-axis involves overlapping line segments, resulting in double or triple counting.
Figure 5-16 Projection method – detail
The y-projection overestimates the line length:

P(\theta_y) = 2|y_2 - y_1| + 3|y_3 - y_2| + \ldots
but, as we have seen earlier, rearranging terms gives

L = \bar{P}(\theta) \, \frac{\pi}{2}

thus a first estimate of L is obtained by calculating the lengths of the projections onto a series of lines at angles θ to the x-axis, averaging them, and multiplying by π/2. Carrying out 2n projections gives the mean value for P̄(θ), which yields acceptable results even for small n. Kendall and Moran^38 quote a general formula for the accuracy bounds with projection lines at intervals of π/n:
\frac{\pi \cos(\pi/2n)}{2n \sin(\pi/2n)} - 1 \;\le\; \frac{\bar{P}_{2n}}{\bar{P}} - 1 \;\le\; \frac{\pi}{2n \sin(\pi/2n)} - 1
For n = 6, L is estimated to an accuracy of 3.4%, and for n = 10 to an accuracy of 1.2%. Improvement in accuracy as n increases is relatively slow, and we would expect systematic or statistical area-based methods to be more effective in many applications. Indeed, the latter have the added advantage of being suitable for complex networks of linear forms, such as fluvial tree structures and transport networks.
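A sketch of the projection estimator for a polyline follows: projections onto a set of equally spaced directions are summed segment by segment (so overlaps are counted with multiplicity, as in the y-projection example above), averaged, and scaled by π/2.

```python
import math

def projection_estimate(points, n_dirs=12):
    """Cauchy/Steinhaus length estimate for a polyline: average, over
    n_dirs equally spaced directions, the total projected length of all
    segments (overlaps counted with multiplicity), then multiply by pi/2."""
    segs = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(points, points[1:])]
    total = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        total += sum(abs(dx * ux + dy * uy) for dx, dy in segs)
    mean_p = total / n_dirs
    return math.pi * mean_p / 2
```

With only 12 directions (n = 6 in the notation above) a unit square's perimeter is recovered to within about 0.6%, consistent with the accuracy bounds just quoted.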
5.2.8 Lattice approximations and distance transforms^39
Many spatial datasets and spatial problems are described with reference to regular lattice frameworks rather than continuous space. Examples include raster scan and Digital Elevation Model (DEM) data, pixelated screen images, cost surfaces, cellular automata models, swarm models and many others. This raises the question of how distances should be measured in such cases, and to what extent these relate to continuous-space metrics such as L_1 and L_2. The most common regular lattices used today are square, and we limit the discussion below to such arrangements – similar analyses can be carried out on other regular lattices in the plane, such as those of triangular and hexagonal form. All measurements are constrained to lie on the lattice, generally taken as either cell centres or lattice intersections, according to the data type. The methods described can be applied in more than two dimensions, and three-dimensional problems have widespread application in visualisation, medical imaging (e.g. CAT scans, MR scans) and related fields.
For the discussion below we consider centre-to-centre distances, and initially consider the family of local or neighbourhood metrics defined by a 3x3 cell-adjacency matrix. A sample of such metrics is shown in Table 5-2. These are known as chamfer metrics because the locus of the metric generates a figure similar to a piece of wood with chamfered or bevelled edges (e.g. see Figure 5-22D). Chamfer metrics and their associated distance transform (DT) algorithms (see further, below) provide a very simple and extremely fast method for approximating Euclidean distances, or a multiple of Euclidean distances, over a square lattice^40. In this Section we discuss the basic attributes of these transforms, whilst in Sections 7.4.3 and 8.1.2 we extend the DT concept and show how DTs can be used in the fields of optimum location theory and decision support systems.
Distances in the 3x3 case are calculated in an incremental manner, based entirely on the distances to directly adjacent cells. The standard 'distance transform' algorithm involves a two-pass scan of a square lattice: a forward scan from top left to bottom right, and then a backward scan from bottom right to top left (Figure 5-17). The algorithm thus involves of the order of Mn² computations, where n is the maximum dimension of the lattice and M is the number of cells used in the neighbourhood computation – or, more formally, MN computations where N is the number of cells in the underlying lattice (i.e. linear in the number of cells).
Table 5-2 3x3 Chamfer metrics

Case A: Distances are determined by the L_1 or 'city-block' metric; paths correspond to the "rook's move" in chess parlance. Adjacency matrix:
  2 1 2
  1 0 1
  2 1 2

Case B: As per A, but with diagonal distances determined by the Euclidean metric applied locally – sometimes referred to as the Local Euclidean metric. Adjacency matrix:
  √2 1 √2
  1  0  1
  √2 1 √2

Case C: Integer Chamfer (3,4)/3 metric. These integer values provide an improved estimate of distance over Cases A or B; divide by 3 on completion. Adjacency matrix:
  4 3 4
  3 0 3
  4 3 4

Case D: Fractional Chamfer (Borgefors, 1986) – optimal non-integer values for all directions (values shown are after Butt and Maragos, 1998). Adjacency matrix:
  1.36039 0.96194 1.36039
  0.96194 0       0.96194
  1.36039 0.96194 1.36039
Each pass involves adding the values in a mask to cell values in the underlying lattice – see Figure 5-17, where 5 values are used based on the (3,4) chamfer. The value in mask position 0 of the transformed lattice is then set to the minimum of the sums calculated. The central function of the algorithm (see further, Annex 3 – Sample algorithms) is:

d0 = min{d + D(i), d0}

where d0 is the current value at the central point (0) of the mask, D(i) is the local distance to the i-th element of the mask, and d is the current value at the selected row, column position (r,c).

The underlying lattice is normally a binary image, but could be a single source or target point, or a set of points, from which distances are automatically generated. In this case the source point(s) would be initialised to 0 and all other points to a large value, e.g. 9999. On completion of the two-pass scan each cell in the resulting lattice will contain the distance to the nearest point in the set of source points. In the example above, division of the values by 3 can be made on completion of the scanning process, giving an approximation that will be within 6.1% of the true Euclidean distance.
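The two-pass scan can be sketched in a few lines. This is a minimal pure-Python illustration of the standard algorithm using the (3,4) forward and backward half-masks; obstacles or varying costs would require iterated passes, as noted later in the text.

```python
import math

INF = 9999  # 'large value' initialisation, as in the text

def chamfer_dt(grid):
    """Two-pass (3,4)-chamfer distance transform over a square lattice.
    grid holds 0 at source cells and INF elsewhere; results are in chamfer
    units -- divide by 3 for an approximate Euclidean distance."""
    rows, cols = len(grid), len(grid[0])
    d = [row[:] for row in grid]
    fwd = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]  # forward half-mask
    bwd = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]      # backward half-mask
    for r in range(rows):                    # forward scan: top-left to bottom-right
        for c in range(cols):
            for dr, dc, w in fwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r][c] = min(d[r][c], d[rr][cc] + w)   # d0 = min{d + D(i), d0}
    for r in range(rows - 1, -1, -1):        # backward scan: bottom-right to top-left
        for c in range(cols - 1, -1, -1):
            for dr, dc, w in bwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r][c] = min(d[r][c], d[rr][cc] + w)
    return d

# Single source point at the centre of a 7x7 lattice
g = [[INF] * 7 for _ in range(7)]
g[3][3] = 0
d = chamfer_dt(g)
# Corner cell: three diagonal steps = 12 chamfer units; 12/3 = 4 vs sqrt(18)
err = abs(d[0][0] / 3 - 3 * math.sqrt(2)) / (3 * math.sqrt(2))
```

The corner-cell error here is about 5.7%, within the 6.1% bound quoted above for the (3,4)/3 metric.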
Figure 5-17 3x3 Chamfer masks for distance transformation: the forward-scan mask (4 3 4 over 3 0), the backward-scan mask (0 3 over 4 3 4), and the combined masks/adjacency matrix (4 3 4 / 3 0 3 / 4 3 4)
There are a number of highly efficient (very fast) sequential and parallel algorithms for performing this process, and a great deal of research into these, and into the quality of approximation, has been conducted for both binary and greyscale images. Distance Transforms (DTs) are used in a wide variety of image recognition and processing applications (for which they were designed) – example applications include image matching, skeletonisation and 3D rendering. As we show below, DTs may also be used for the fast computation of distances and multi-level buffer zones from single or multiple objects (points, lines, areas) rather than just single points (see examples in Figure 5-18), for the computation of watersheds and slope lines, for the determination of optimal paths, and for the computation of Dirichlet regions (or Voronoi polygons) from lattice/raster data.

Figure 5-18 Raster buffer zones from distance transform using (3,4) chamfer

The examples in Figure 5-18 show distance bands from object sets shown in white, comprised of (a) a single central point, and (b) a cross shape, over a 100x100 lattice, generated with a single forward and backward pass of the masks. Colours indicate the distance from the nearest point of the object set. There are many applications for such computations, for example in the computation and mapping of noise and environmental pollution impacts associated with major roads and aircraft flight paths.
With multiple objective points the distance transform generates the equivalent of Voronoi polygons, or planar digitised Voronoi diagrams^41. These may be mapped in two dimensions as above, or in three dimensions, where distances are treated as elevations, shortest paths are lines orthogonal to the contours (Figure 5-19), and region boundaries are watersheds. An example application with multiple objective lines is the interpolation of contour data^42, although results in this case are improved if additional key landscape points are included (notably peaks and pits).
Figure 5-19 3-point distance transform, 2D and 3D views

Distance transforms can also be applied with almost no alteration (see Annex 3 for details) to cases where obstacles are included. For example, if we introduce a rectangular region as an obstacle in the single-point example above, the resulting paths in the region are distorted (Figure 5-20A) – in this example the transform has been generated using modified Borgefors fractional values on a 5x5 mask (see further, below), with iterated passes of each mask to ensure convergence (two or more iterations may be necessary in cases where obstacles or varying costs exist). As before, lines orthogonal to the contours shown have been used to generate the shortest paths.

Shortest paths can also be determined by retaining a record of the closest cell in the mask neighbourhood as part of the DT operation. This record can be held in absolute or relative terms, i.e. of the form: "next path point is in position 15" (15), or "next point is in row 23, column 46" (R23C46), or "next point is 2 rows down and 3 across". Using two arrays (or a multi-dimensional array) for x-components and y-components, this information can be stored as a pair of single relative values, e.g. x(i,j) = 2, y(i,j) = 1. This formulation is convenient for use with the algorithm provided in Annex 3, since the components are simply the values already held in the DX(k) and DY(k) arrays. A vector diagram generated from a pair of such arrays, corresponding to the previous diagram, is shown in Figure 5-20B (note the path distortion generated by the lattice representation of the underlying dataset).
Figure 5-20 Shortest paths by distance transform with constraints: A. Paths constructed orthogonal to contour lines; B. Vector map of path directions obtained from tracking during the transform operation

It should be noted that the shortest path vectors shown in Figure 5-20B differ from the steepest descent vectors computed directly from the DT surface shown in Figure 5-20A. The latter do not, in general, indicate the correct shortest paths, as can be seen by comparing the vectors in Figure 5-21 with those in Figure 5-20B and with the orthogonal path construction method.
Figure 5-21 Steepest descent (slope) vectors of the distance-transformed surface

A simple iterative procedure can be devised which utilises the local vector arrays to produce a complete set of shortest paths from each cell to its closest object point. This process can be used to produce digital Voronoi diagrams in inhomogeneous space, since it assigns each point to its closest target point. It may also be used in homogeneous space to compute an exact Euclidean distance transform, by calculating the Cartesian distance between the solution path start and end points – thus a chamfer transform can be used to create an exact transform in such cases. This result, whilst self-evident, does not appear to have been reported previously. It holds because assignment of the closest target point under an accurate chamfer metric will always, or almost always, be the same as the assignment under an exact Euclidean metric, and thus computations based on this assignment will exactly match an exact Euclidean transform. Sample tests by the author have found no difference between the results obtained in this manner and those generated using an 'exact' algorithm.
For inhomogeneous images, such as that above, distance transforms are frequently based on an algorithm known as the uniform-cost model^43, itself a form of A* algorithm (i.e. a heuristic search procedure rather than an image-scanning procedure). For variable cost surfaces and combinations of surfaces, scanning distance transform techniques can still be used (with minor modifications), as we demonstrate in Section 8.1.3.
By plotting the locus diagrams for a range of 3x3 chamfer metrics against the optimal Euclidean locus (a circle), the relative merits of different local values or weights can be seen (Figure 5-22). Examining the diagrams, it is clear that the last of the four approximations achieves the closest possible match to the circle, with a mix of positive and negative errors at intervals of π/4 (positive) and π/8 (negative). The first two approximations underestimate most distances. The octagonal shape of all but the first example is the result of the 8-cell local neighbourhoods utilised. If a 5x5 local neighbourhood is used (see Figure 5-20 and Figure 5-23) the approximation is 16-sided (a hexadecagon). The symmetry displayed, and the closeness of the best approximations to the Euclidean metric, mean that optimal chamfer metrics are nearly, but not completely, rotationally invariant.
Figure 5-22 Chamfer metric locus diagrams
A. Chamfer (1,2) or Manhattan metric
B. Chamfer (1,√2) or Local Euclidean metric
C. Chamfer (3,4)/3 metric
D. Chamfer (0.96194,1.36069) optimal metric
Integer values are frequently used in distance transforms, but Borgefors recognised that the approximation to Euclidean distance could be improved upon. She used Cartesian coordinate pairs to produce her non-integer results (Table 5-3). The Cartesian model generates a result that is not fully optimal in the propagation of distances around a point in a lattice (although these values do provide the basis for the approximation to Euclidean distance we presented in the previous Chapter). In a detailed analysis using polar coordinates, Butt and Maragos [44] have shown that the values derived by Borgefors can be marginally improved upon. Their results for the 3x3 case are shown in Table 5-3: the (3,4)/3 metric is the best low-valued integer solution and yields correct values for horizontal and vertical paths; other choices involving larger integer pairs, such as (24,34)/25, can be used to approximate non-integer variants quite accurately.
Table 5-3 Maximum absolute error for 3x3 chamfer metrics

Local distances (a,b)     Maximum absolute error    Comments
(1,1)                     41.41%                    Chess board “rook’s/bishop’s move”
(1,2)                     29.29%                    City-block, L1
(1,√2)                    7.61%                     Euclidean local distance
(3,4)/3                   6.07%
(1,1.3507)                5.63%                     Borgefors, with a=1
(1,1.3420)                5.38%
(0.95509,1.36930)         4.69%
(0.96194,1.36039)         3.96%                     Butt-Maragos with a=1
The lattice neighbourhood can be increased to a 5x5 matrix, as noted above, in which case there are three distance weights to be assigned to the various cells rather than two; the optimum fractional values in this case are (0.9866, √2, 2.2062). These values provide estimates that are within 1.36% of the direct Euclidean distance, but at the cost of slightly increased computation. The optimal integer values are (5,7,11)/5, and are remarkably accurate – within 2% of the Euclidean distance. The integer neighbourhood (mask) for the 5x5 model is shown in Figure 5-23 – values not entered are predetermined (e.g. as 5+5=10 or 7+7=14). The mask is divided into two for forward and backward scans, as per the 3x3 mask described above. From Figure 5-23 the distance transform of a point using the 5x5 mask can be seen to be a very close approximation to a circle over a square lattice.
Figure 5-23 5x5 Distance transform and integer chamfer mask

     .  11   .  11   .
    11   7   5   7  11
     .   5   0   5   .
    11   7   5   7  11
     .  11   .  11   .

(Cells marked “.” are predetermined by combinations of the entered values.)
With neighbourhoods of 7x7 or greater the maximum error falls below 1%. Results can also be obtained for triangular and hexagonal lattices [45], with the latter providing improved results, but again at the cost of increased complexity in both representation and processing.
There is an additional useful result obtained from this table. It relies on the fact that between any two points on a square lattice it is always possible to construct a path consisting of two components: one diagonal path and one horizontal or vertical path. If two points in a square lattice are selected at random, and these are M steps (horizontal/vertical) and N steps (diagonal) apart, then the best estimated distance between them is:

d = 0.96194M + 1.36069N
The maximum absolute error in this calculation is 3.96% of max{|x1 − x2|, |y1 − y2|}.
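This two-component estimate is easily checked numerically. A small sketch (the function name is illustrative), comparing the chamfer estimate with the exact Euclidean value for a few sample displacements:

```python
import math

def chamfer_estimate(x1, y1, x2, y2):
    """Two-weight chamfer estimate of the distance between lattice points:
    M horizontal/vertical steps plus N diagonal steps."""
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    n = min(dx, dy)           # diagonal steps
    m = max(dx, dy) - n       # horizontal/vertical steps
    return 0.96194 * m + 1.36069 * n

for dx, dy in [(3, 4), (10, 1), (7, 7), (12, 5)]:
    est, exact = chamfer_estimate(0, 0, dx, dy), math.hypot(dx, dy)
    assert abs(est - exact) < 0.06 * exact   # small relative error throughout
```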
The lack of exact correspondence to Euclidean distances must be recognised and regarded as a form of systematic error or uncertainty, which may be exaggerated by scale changes and/or systematic growth/shrinkage of objects using distance transform methods (e.g. topological inconsistencies may result); these considerations are in addition to the representational issues associated with the original lattice/raster dataset. However, algorithms are now available that provide exact Euclidean distances in near-linear time, which may prove more suitable for some problems [46].
5.3 Elevation and path
Many of the methods described above ignore elevation and associated cost or effort factors. In practice the direct line distance on a map/within a GIS facility is frequently a good first estimate of the surface distance (i.e. allowing for elevation) [47] and will often closely approximate the shortest distance across the surface. For example, the map sections below (Figure 5-24) show a transect southwards from East Creech, near Creech Barrow in Dorset, extending for 1km from the road at East Creech to Barneston Manor (now known as Barnston Farm). This value (1km) provides the 2-dimensional (projected plane/map) distance rather than the surface distance.
The surface route, as a walker, rises from 260′ at East Creech to 400′ as the transect crosses a complex ridge and narrow valley structure over the chalk, and falls away to 200′ on the far side of the ridge. Despite the fact that the rises and falls are steep, and thus difficult to traverse, the total 3D surface distance is only slightly further than the projected plane distance (still being under 1.2km). The surface distance equals the sum of the incremental map distances divided by the cosine of the slope angle, so it may be computed directly from a map or database representation by measuring or computing slope values along the transect or path. Only in extremely rough or mountainous terrain are surface distances much greater than simple 2D estimates. For example, if the top of a 500m hill were located in the middle of a 1km stretch of path, giving a 45° slope (1:1) to the top and down again, the surface distance would still only be 1.4km. Current design standards for UK roads aim to keep all gradients below 5-6% (c. 1:20); above these levels the operating costs for larger vehicles (HGVs) increase quite rapidly.
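The cosine relationship above is equivalent to summing the 3D lengths of successive profile segments; a minimal sketch (names and the sample profile are illustrative):

```python
import math

def surface_distance(profile):
    """Surface (3D) path length from a profile of
    (plan distance, elevation) samples, both in metres."""
    return sum(math.hypot(d2 - d1, z2 - z1)
               for (d1, z1), (d2, z2) in zip(profile, profile[1:]))

# a 500m hill at the midpoint of a 1km stretch: two 45-degree (1:1) slopes
hill = [(0.0, 0.0), (500.0, 500.0), (1000.0, 0.0)]
print(round(surface_distance(hill)))  # 1414, i.e. c. 1.4km for 1km of map distance
```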
It has been suggested that elevation is a significant factor in the observed differences between distances calculated by driving along roads (using accurate odometers) and distances obtained from GIS packages, but the principal causes of these differences are horizontal errors introduced by polyline approximation and the scales at which GIS datasets have been collected (as noted above) – path length calculations using piecewise polynomial, circular arc, spline or trigonometric functions have been shown to remove much of this error [48].
The effort of traversing a path of 1:2 or even 1:5 is quite different, however, and a least cost or least effort route (i.e. one avoiding the steep slopes of the ridge and valley structure) might take the route in Figure 5-24B now provided for cars, which has a road distance of c. 2.5km. Least cost/effort paths are thus rather like shortest paths across extremely mountainous terrain: they strenuously seek to avoid the high cost/effort zones, even if this means that the path length itself is much greater than one might expect or takes an unexpected route [49]. We return to this question and example in more detail in Chapters 7 and 8.
Comparing the modern map with that from a century earlier, we see that two earlier routes existed across the ridge – one was a path or track, probably suitable for horses and light carts but not for coaches or heavy carts, and the other was a purpose-built tramway for transporting materials from a small quarry. The current, longer road route sweeps away to the left on the historic map, but still provides the basic path of today’s route.
Figure 5-24 Transect south from East Creech in Dorset, 1896 and 2000
A. OS Map, 1896
B. OS Map, 2001 – redrawn
Figure 5-25 Old and new routes near East Creech, Dorset (annotated: original route)

The blue circle on the first map (Figure 5-24) marks the location of the original route, and the photograph, taken looking south-west (Figure 5-25), shows the area today, with the historic cart route still clearly visible in the cattle field.
5.4 Fractals
The practical issues of estimating line length are at the heart of many problems in geographic research (and other sciences), and they have highlighted weaknesses in the underlying mathematical foundations. In order to determine the length of a curve one must first have a curve to measure; yet, although the concept of a curve appears intuitively obvious, adequate definitions are lacking. For example, let a plane curve, C, be defined as the set of all points (x,y) represented by the continuous functions:
x = f(t), y = g(t)   for all t ∈ [t0, t1]

such that there exists only one value of t for each pair (x,y). Then C is called a simple Jordan curve or, when crossing points are permitted, a Jordan curve.
This apparently simple and very general definition of a curve was shown by Peano [50] to permit certain curves that completely fill the plane (and hence are indistinguishable from figures of dimension 2). Netto [51], however, showed that Peano’s result would not hold for simple Jordan curves. Further work on this question was carried out by Sierpinski [52], who produced a recursive construction demonstrating that a (simple) Jordan curve could fill a square (Figure 5-26):
Figure 5-26 Sierpinski curve construction
Mandelbrot [53] observed that the trajectory of a particle exhibiting Brownian motion will also fill the plane, and combined this observation with those of the previous paragraphs, and his work on financial modelling, into his theory of fractals. He (currently) defines fractal geometry as “the study of scale-invariant roughness”.
Mandelbrot initially defined fractals as sets whose Hausdorff-Besicovitch (HB) dimension [54], D_HB, strictly exceeds the topological dimension, D_T. Peano curves and Brownian trajectories are both classified as fractals by this definition, as are Koch [55] curves (Figure 5-27), which, like Sierpinski curves, are defined by an iterative or recursive construction process. Conventional (rectifiable) curves have topological dimension 1 and HB dimension 1, and are therefore not fractals under this definition.
More recently this definition has been shown to exclude a number of special cases of ‘fractals’ which can be shown to have D_HB = D_T = 1, and improved definitions based on notions of self-similarity and self-affine sets have been proposed, without a final conclusion being reached [56]. There is even a question mark over the value of the HB dimension in this context, and the possibility that a more general definition of fractals is unachievable.
Determination of fractional values of D for certain well-defined curves can be achieved by analytical methods. For natural fractals, i.e. ‘curves’ such as coastlines, rivers and contours, an alternative method is required. This involves selecting a measurement method, such as any of those described above, and plotting the length L(η) against the measurement factor η on log-log paper. If the graph produces a straight line its equation will be:

log L(η) = a + b log η

where a and b are constants. For conventional curves b = (1 − D_T) = 0, hence curve length is independent of the sampling interval. For fractal curves, however, b = (1 − D) < 0, so the measured length grows as the sampling interval is reduced. For example, Richardson (op. cit.), Håkanson [57] and many others since have found values of D varying from just over 1.0 to c. 1.3 for coastlines and lakeshores, whilst Batty and Longley [58] found similar values for city boundaries (e.g. Cardiff). In reality such measurements have almost always been made from models or representations of coastlines, lakeshores and borders (i.e. using maps or aerial photographs) and not from terrestrial surveys such as those of Mackenzie or
Mason and Dixon. What is clear is that real-world boundaries have complex structures and are generally unlike classical curves, being nowhere smooth or differentiable, nowhere sharply discernible, and frequently dynamic (e.g. with tides, traffic flows, plant growth, urban development) – indeed, closely examined they are not strictly linear at all, but more like a narrow band or area.
It is reasonable to ask whether fractal-like or ‘self-similar’ behaviour is retained as the scale is reduced to a finer and finer degree. It is immediately apparent that real-world measurement becomes impossible as the scale is reduced – coastlines and lakeshores are dynamic, and precisely what is to be measured becomes increasingly unclear and is ultimately not measurable.
The mathematical model of fractal lines and surfaces, which assumes an arbitrary fineness of measurement (behaviour of systems as lim ε → 0), is similar to the mathematics of real analysis in this sense: it is a model with a set of assumptions, one of which is that limits can always be taken. To this degree, fractal analysis provides models and tools rather than definitive explanations. In addition, it should be noted that for curves such as Brownian motion in the plane and other forms of random walk, such as random lattice walks and the trails left by insect, animal or human walks, self-crossing of paths is normal. Measurement of the length of such paths by the area methods described earlier is not appropriate, and incremental measurement must be used (i.e. calculating cumulative path length from the sum of the steps taken).
A major value of fractal analysis is in enabling us to distinguish between classes of curves: D = 1 curves are in almost all cases simple (conventional mathematical) curves; 1 < D < 2 indicates distinctly fractal curves; D = 2 curves are plane-filling, or in some senses could be regarded as area measures. A secondary benefit is its descriptive value over varying scales. If D is constant over a range of η values it indicates self-similarity in the curve form; that is, a similarity in the form at various scales. The latter is true for Koch curves (Figure 5-27), where D = 1.2619 (log 4/log 3) for all η. In a later, detailed study of boundaries, Longley and Batty [59] found that D appeared to vary over a range of sample scales, raising the question of whether an alternative log-linear model and/or a multifractal interpretation of the data might be more appropriate. Their study also noted that different measurement methods had substantially different processing overheads, and at least one piecewise polygonal approximation method (which involved elimination of selected points) was unsatisfactory; grid-based approximation was found to be simple, computationally fast and not subject to such problems.
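The log-log procedure can be verified against the Koch curve, whose dimension is known exactly: at construction level k the divider length is η = (1/3)^k and the measured length is L = (4/3)^k. A short sketch (purely illustrative) recovering D from the fitted slope:

```python
import math

# measured lengths of the Koch curve at successive construction levels
etas = [(1.0 / 3.0) ** k for k in range(1, 8)]      # divider lengths eta
lengths = [(4.0 / 3.0) ** k for k in range(1, 8)]   # measured lengths L(eta)

# least-squares slope b of log L(eta) = a + b log(eta)
xs = [math.log(e) for e in etas]
ys = [math.log(L) for L in lengths]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))

D = 1 - b                    # b = 1 - D for a fractal curve
print(round(D, 4))           # 1.2619 = log 4 / log 3
```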
The dimension, D, does not describe or imply form or process: the same value of D may apply to completely different fractals, and different generation models (e.g. Brownian motion, Diffusion-Limited Aggregation (DLA), recursive generation) may result in the same fractional dimension. For example, it has recently been proven that the ‘hull’ or boundary of a Brownian process in the plane has HB dimension 4/3, but clearly this does not imply that some coastlines or city boundaries are actually generated in this manner [60].
Figure 5-27 Koch curve construction

The Koch curve or ‘snowflake’ is constructed from an equilateral triangle, where progressively smaller (1/3rd-size) triangles are added to each of its three initial sides in place of a segment of the side. The length of the curve is unbounded and it is nowhere differentiable, but the area enclosed by the curve is finite: in the limit as the construction step ds → 0, based on the initial side length, s:

Area = (2√3/5)s²
5.5 Self-avoiding and self-attracting random walks
Measurement of path length depends upon an agreed set of rules for carrying out the measurement, selection of sampling interval(s) and determination of the embedding space (e.g. 2D versus 3D). It also assumes that we have a clear understanding of where the path is and, in general terms, what it is like. For the latter, classical and fractal models are the main alternatives.
A set of models of particular interest to geographic research is known as self-avoiding (random) walks (SAWs). These are random or pseudo-random walks, paths or trees that do not cross themselves, unlike some of those discussed above. There has been extensive analysis of SAWs, especially in the physical and biological sciences such as physics, polymer chemistry and biochemistry, but also in areas such as statistics, economics, financial modelling, engineering and robotics. SAWs have received limited attention in geographic studies. Most analyses focus on lattice-based SAWs using square or triangular lattices, but some studies of unconstrained (free-space) SAWs have been carried out.
Random walks typically consist of a given start point and a set of rules which determine their behaviour (e.g. steps, directions, self-avoidance, self-attraction, bifurcation/spawning of children). In the plane it has been demonstrated that both unconstrained random walks (e.g. Brownian motion, which is not self-avoiding) and self-avoiding lattice walks will pass through every point (in the plane, on an arbitrarily fine lattice) as the number of steps, n, tends to infinity. A number of other interesting results have been produced, including analysis of the number of possible self-avoiding paths on a lattice. On a 6 x 6 square lattice the number of possible self-avoiding paths from (0,0) to (6,6) is over 1 million. At first sight this suggests that search methods for optimal paths on lattices may be very difficult unless additional constraints and rules are applied; however, efficient algorithms exist for such problems, with solution times related to the square of the lattice size or better (single point to all other nodes). Note also that the specification of a random walk does not include a goal or destination as an explicit input. In general it is not possible to determine how many steps will be required to reach a target point, or whether the route taken is optimal. It is often possible, however, to provide statistical estimates (for example, of the expected path length or direction after a number of steps), and simulations can provide estimates of means and variances where analytical results are not forthcoming.
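As an example of such statistical estimates, the mean end-to-end displacement of a simple (non-self-avoiding) square-lattice walk can be simulated directly; for large n, theory predicts it grows as approximately √(πn)/2 ≈ 0.886√n. A small simulation sketch (names illustrative):

```python
import math
import random

def mean_displacement(n_steps, n_trials, seed=0):
    """Monte Carlo estimate of the mean end-to-end distance of a
    simple square-lattice random walk of n_steps steps."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total = 0.0
    for _ in range(n_trials):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(moves)
            x, y = x + dx, y + dy
        total += math.hypot(x, y)
    return total / n_trials

est = mean_displacement(100, 2000)   # expected to be close to 8.86 for n = 100
```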
For geographic analysis several types of SAW are worthy of closer attention. These include directed (or correlated) random walks, directed dislocation walks and random trees, each of which is described briefly below; some are explored further later in this study. For convenience we shall use Cartesian coordinates (x,y) in the plane, but polar (r,θ) or spherical coordinates could equally well be used, depending upon the requirements:
• Directed random walks – this process involves incrementing x by a fixed or positive random value (optionally constrained to lie in a predefined range, e.g. x ∈ [0,1]); y is incremented by a random amount in the same manner, but both positive and negative values are accepted, e.g. y ∈ [−1,1]. The average position of y after a large number of steps will tend to 0. If both x and y steps are fixed the result is essentially a square lattice walk. Note that this process, and the walk described below, generate single y-values for each x-value, and as such are a very restricted subset of possible random walks.
• Directed dislocation walks – this process is the same as the above but constrained to a single direction (quadrant) for y, e.g. y ∈ [0,1]. The dislocation is then similar to a random walk down a tilted surface, a sideways impact on a geological structure, or a distortion of a populated landscape by a pull towards a dominant conurbation. A more general random walk model of both these types is:
• Correlated random walks (CRWs) – this set of models assumes that steps may be of fixed or variable length, but that the direction of travel is determined by the previous direction of travel plus or minus a random angular variable drawn from a range (e.g. ±60°) and a distribution specified in advance (typically a Uniform or Normal distribution). CRW models, which have been used widely in the study of insect and animal dispersion, and on a limited basis in the study of serial crime behaviour, are not necessarily self-avoiding (but constraints can be applied to ensure they satisfy this additional requirement).
• Random tree walks – the directed walk model can be modified such that both x and y increments are repeated until an obstacle, a solution space boundary or another random line or branch is reached, or approached to within a specified distance. At this point the walk must stop, go back one or more steps and commence again with new x and y values. This kind of walk generates self-avoiding random trees (SARTs). A variety of additional rules and constraints may apply in such cases, such as defining what ‘reached’ means (setting proximity measures) and how new branches or children are to be generated. Use of SARTs is discussed in greater detail in Sections 7.5 and 8.2, where variants of the RRT (Rapidly-exploring Random Tree) algorithm are discussed [61].
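The correlated random walk described above has a direct simulation analogue. A minimal sketch (function and parameter names are illustrative), using fixed-length steps and a Uniform turn distribution of ±60°:

```python
import math
import random

def correlated_random_walk(n_steps, step=1.0, max_turn=math.radians(60), seed=1):
    """Correlated random walk: each new heading is the previous heading
    plus a turn drawn uniformly from [-max_turn, +max_turn]."""
    rng = random.Random(seed)
    x = y = heading = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.uniform(-max_turn, max_turn)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path

path = correlated_random_walk(1000)
```

Self-avoidance is not guaranteed here; as noted above, a constraint rejecting steps that approach the existing path too closely would need to be added.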
Examples of simple random walks are shown in Figure 5-28. The first has been generated in the manner described by Venn (of Venn diagram fame) in the 19th century, selecting directions from an 8-sided die and random distances in the selected direction. This walk is clearly not self-avoiding. The second example shows random positive increments in x and random +/− values for y, yielding a curve of market price-like movements (strictly speaking, a one-dimensional random walk).
Figure 5-28 Simple random walks
A. Simple random walk – after Venn
B. Directed simple random walk
As has been noted earlier, in 2-space (or n-space) random walks have a start point but no definitive end point. However, it is possible to create random trees from more than one point (simultaneously or in sequence) and apply proximity rules to avoid obstacles and obtain end-to-end connectivity.
Random trees can be constructed in many ways, but typically consist of a branching process and self-avoiding walks. Such trees are the subject of analysis later in this study, and provide the basis for quite general and fast solutions to shortest path/least cost problems. A section of a random tree is illustrated in Figure 5-29. In this example a random tree is being used to explore the space of feasible solutions for a path; the brighter red line indicates a boundary constraint for the tree.
Figure 5-29 Random tree walk
There has been a great deal of research into self-avoiding random walks (SAWs) in 2- and 3-space, much of it in chemistry, physics and mathematics rather than geographical studies. 2D SAWs are often generated by a so-called ‘pivot’ algorithm [62], which has similarities to some fractal generation models [63], but unlike the processes described earlier the pivot algorithm generates a true self-avoiding random walk in the plane or higher dimensions.
Pivot algorithms commence with a straight line of length N steps defined by N nodes (optionally being points on a square lattice). A series of n transformations is made to the line (rotations and reflections) by choosing a node at random and a transformation at random. The line is checked after each transformation to ensure it is not self-crossing.
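A lattice version of this procedure can be sketched as follows (all names are illustrative; the symmetry set comprises the seven non-identity rotations and reflections of the square lattice):

```python
import random

# the 7 non-identity lattice symmetries as 2x2 integer matrices (a, b, c, d)
SYMS = [(0, -1, 1, 0), (-1, 0, 0, -1), (0, 1, -1, 0),               # rotations
        (1, 0, 0, -1), (-1, 0, 0, 1), (0, 1, 1, 0), (0, -1, -1, 0)] # reflections

def pivot_saw(n_nodes, n_attempts, seed=0):
    """Pivot algorithm for a self-avoiding walk on the square lattice."""
    rng = random.Random(seed)
    walk = [(i, 0) for i in range(n_nodes)]     # initial straight line
    for _ in range(n_attempts):
        k = rng.randrange(1, n_nodes - 1)       # pivot node
        a, b, c, d = rng.choice(SYMS)
        px, py = walk[k]
        tail = [(px + a * (x - px) + b * (y - py),
                 py + c * (x - px) + d * (y - py)) for x, y in walk[k + 1:]]
        if set(tail).isdisjoint(walk[:k + 1]):  # reject self-crossing results
            walk = walk[:k + 1] + tail
    return walk
```

Each accepted transformation preserves unit lattice steps, so the result is always a valid self-avoiding walk.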
With large n the ‘memory’ of the original configuration is lost and a truly random line is generated. Since it is possible to generate two or more such lines and then join their end points, checking that the result does not cross itself, a random polygon (region, island) of 2N steps can be generated with a familiar geographic look. To avoid the need for rescaling and sharp corners, three or more SAWs or SAW segments would normally be required to create a suitable closed figure. An example is shown in Figure 5-30, which we have created from segments of 3 SAWs, each of 1 million steps, generated by Kennedy [64].
Figure 5-30 SAW-generated island
Self-attracting random walks also have interesting applications in a range of practical geographic settings. For example, because random walks will eventually pass through all possible points, a subset of paths commencing at any given point (an origin) will reach a second point (destination) in fewer steps/a shorter time than other paths. If the shortest such path is marked or recorded each time walks are simulated, subsequent random walks can be programmatically biased to use all or parts of this path, with a self-reinforcing result. Such concepts have been used in a variety of traffic behaviour models and, more recently, in modelling crowd behaviour in built-up environments during special events (galleries, London streets) [65]. Similar methods could be used to simulate past (unknown, historic) flows as well as current or future (predicted) flows and congestion. As with all such methods, model assumptions and calibration become the key issues.
5.6 Networks and path length
The majority of the analyses in the present study are concerned with distance measurement and path location in inhomogeneous (and often bounded) free space. Such problems may be static or dynamic (e.g. involving moving objects such as vehicles or robots, or involving varying flows over time) and may involve finding a valid solution, a good solution or an optimum solution (if one exists). In seeking a good or optimal solution the process may involve distance minimisation or generalised cost minimisation, with or without additional constraints. In many cases such problems can be restated using a range of techniques, having first partitioned the sample space into some form of grid or lattice. Solutions may then be sought which are restricted to traversing the edges and vertices of this grid or lattice, or which use this framework as a simplification of the space and allow paths to cross the zones created by partitioning.
Where the sample space can be represented in terms of a (directed) graph, the techniques for identifying specific paths through predefined networks may be applied.
There is a substantial (almost limitless) literature dealing with this subject [66]. The classic problem and algorithm, which originates with Dijkstra [67], is that of finding the shortest path from a source node to one or more destination nodes. Algorithms that seek solutions to network problems of this type are often compared in terms of their performance by reference to the number of vertices or nodes (n) in the graph and the number of links or edges (e). The simple shortest path problem can now be solved in linear time, i.e. as a linear multiple of the number of vertices and edges, or O(n + e) [68].
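Dijkstra's procedure itself is compact. A minimal priority-queue sketch (the graph encoding and names are illustrative; this heap-based form runs in O((n + e) log n) rather than the linear time cited above):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest path lengths; `graph` maps each node to a
    list of (neighbour, non-negative weight) pairs."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                       # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)],
     "c": [("d", 1)], "d": []}
print(dijkstra(g, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```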
In a wide-ranging review of the general least cost path problem (LCPP), Smith and Gahinet [69] highlight the difficulties involved:
• possible complexity of the surfaces – smooth/analytical to complex natural or man-made, or hybrids of these
• possible complexity of the mobile object – size, nature of motion
• conditions that must be satisfied – e.g. minimum ‘cost’, curvature constraints, continuity, passing through/via a specified location
• computational cost (in memory and processor time) of alternative solution methods
They conclude that it is most unlikely there will ever be a single, unified theory for solving the general LCPP.
An additional body of research addresses distance and path problems in so-called ‘geometric domains’. Mitchell [70] provides an excellent recent review and summary of many of these problems; in the text below, references to solution algorithms and their efficiency derive from Mitchell’s paper unless otherwise stated.
Mitchell defines a geometric domain as follows:

“In contrast to graphs, where the encoding of edges is explicit, a geometric instance of a shortest path problem is usually specified by giving geometric objects that implicitly encode the graph and its edge weights… the most basic problem is: given a collection of obstacles, find a Euclidean shortest obstacle-avoiding path between two given points”
Most geometric domain problems deal with path finding within closed polygonal areas, including one or more obstacles (high-cost or prohibited zones). The majority of such problems relate to path finding in 2-space, with the objective function being simple distance minimisation. Variants include: alternative objective functions (e.g. different metrics); constrained paths; dynamic environments; and known versus unknown terrains. It is known that in simple polygons there is always a unique shortest path (a series of rectilinear segments) between any source and destination point, but that in a general polygonal domain (i.e. one including obstacles or holes) there can be any number of optimal paths. The latter class of problems does permit path finding using current algorithms in solution times of O(n²) or better.
Geometric domain algorithms typically involve decomposition of the solution space into geometric components, notably triangular regions; the graph comprised of the edges and vertices so generated, combined with those of the original polygon and obstacle set, is then searched. A variant, known as the continuous Dijkstra method, simulates a wavefront, using this to construct a form of geodesic map, rather as described by Huygens and adopted by Warntz. The method has been applied successfully to the so-called weighted region metric (WRM) problem, in which different zones within the solution space have different weights or velocities applied to them. Smith and Gahinet (op. cit.) show that exact solution times for this class of problem are greater than O(n⁸).
Mitchell points out that some constrained problems, notably those involving constraints on the average curvature (important for dynamic problems, such as transport engineering and robotics), may result in problems that are either not solvable or very difficult to solve. Likewise, solutions that seek to optimise two or more objective function criteria are in general not solvable in a provably optimal manner. A simple example for which no exact solution is known is the problem of finding a path that minimises both path length and the number of steps (links, or edges) in the path within a simple polygonal solution space.
Another interesting class of problems relates to path finding without a map, but assuming knowledge (by observation/sensing) of the solution space as it is searched, and optionally knowledge of the location of a target (e.g. its coordinates). This is rather like journeying to the South Pole from the Weddell Sea, or finding one’s way in London or New York without a street map. Curiously enough, it has been shown that in a rectilinear street pattern where the route to the target must be discovered by search, prior knowledge of the target’s location does not assist the solution time.
5.7 Summary
Measurement of distance from maps and in the field has highlighted both practical and theoretical problems. Central to the measurement process is a clear definition and understanding of the path along which measurement is to be made. A range of models is available, from the classical to the statistical, and from the continuous and differentiable to discrete lattice and fractal formulations. Each measurement technique involves sampling and approximation, and each is therefore scale-dependent as well as model- or path-dependent. Where sampling takes place at multiple scales it may highlight self-similarity, suggesting that the length is indeterminate and fractal-like. But self-similarity at a range of scales does not imply self-similarity at all scales, nor does it exclude curve models other than fractals from consideration: given a sufficient number of fine steps, SAWs exhibit very similar behaviour to fractals – sub-samples of the walks are very similar to broader samples.
From these observations and those of the previous Chapter, we must conclude that regarding distance measures as certain and absolute is frequently unsafe: geographic distance should be viewed in terms of context, measurement method, scale, path model and dynamics, as well as metric formulation and the derived numerical results.
Notes and References:
1. Gatrell A C (1983) Distance and space: A geographic perspective, Clarendon Press, Oxford
2. Cliff A D, Haggett P (1998) On complex geographic space: computing frameworks for spatial diffusion processes, p.254 of Ch.11 in Longley P A, Brooks S M, McDonnell R and Macmillan B (1998) Geocomputation: A Primer, J Wiley, New York
3. Beals R, Krantz D H (1967) Metrics and geodesics induced by order relations, Mathematische Zeitschrift, 101, 285-298
4. Mandelbrot B B (1977) Fractals: Form, chance and dimension, Freeman, San Francisco; Goodchild M F (1980) Approaches to the estimation of geographical measures: A fractal framework, Math. Geol., 12, 85-98
5. Richardson L F (1881-1953): Richardson's statistical work on patterns of war led him to examine the borders of many countries, which drew attention to problems of measuring their lengths; this work was highlighted after his death by B Mandelbrot in his development of fractals. Although Richardson published some 14 main works during his lifetime (and a total of 137 books, articles and lesser publications) on a variety of subjects, his collected works (Vols. 1 and 2) were not published until 1993. Richardson was a mathematical meteorologist by profession, and a poem extract for which he is now famous relates to this discipline: "Big whorls have little whorls that feed on their velocity, and little whorls have smaller whorls and so on to viscosity." Richardson also worked extensively in the field of numerical methods, as in his book: Richardson L F (1922) Weather prediction by numerical process, Cambridge Univ. Press, London (also reprinted by Dover Publications, NY, 1965); and see Bunge W (1962) Theoretical geography, Lund Studies in Geography, C, 1, Lund, Sweden
6. Poincaré H (1913) Mathematics and Science: Last Essays (Dernières Pensées), trans. J W Bolduc, Dover Edition (1963), New York, pp.27-28; Poincaré (1854-1912) is regarded as one of the co-discoverers (with Einstein and Lorentz) of the special theory of relativity
7. See further, Annex 4: Traffic, teletraffic and statistical self-similarity
8. For an extensive, up-to-date analysis of these issues see: Zhang J, Goodchild M F (2002) Uncertainty in geographic information, Taylor and Francis, London
9. Veregin H (1999) Data quality parameters, Ch.12, p.180, in Longley et al (1999) Geographic Information Systems, Vol.1, 2nd ed., J Wiley, New York; and Zhang J, Goodchild M F (2002) op. cit., Section 7.3
10. Bouwkamp C J (1977) On the average distance between points in two coplanar nonoverlapping circular disks, J. Applied Sci. and Engin., A, 2, 183-186. Bouwkamp originally published this result in 1947 in connection with his work on Bessel functions
11. See for example, Vaughan R J (1987) Urban spatial traffic patterns, Pion, London, pp.22-34
12. Caspary W, Scheuring S (1993) Positional accuracy in spatial databases, Comput., Environ. and Urban Systems, 17, 103-110
13. Roberts F S, Suppes P (1967) Some problems in the geometry of visual perception, Synthese, 17, 173-201
14. Blank A A (1958) Axiomatics of binocular vision: the foundations of metric geometry in relation to space perception, J. Optical Soc. of America, 48, 328-333; and Blank A A (1958) Analysis of experiments in binocular space perception, J. Optical Soc. of America, 48, 911-925
15. Platt J R (1960) How we see straight lines, Scientific American, 202, 6, 121-129
16. Battro A M, Netto S P, Rozestraten R J (1976) Riemannian geometries of variable curvature in visual space: visual alleys, horopters and triangles in big open fields, Perception, 5, 9-23; Todd J T, Oomes A H J, Koenderink J J, Kappers A M L (2002) On the affine structure of perceptual space, Psychological Science (submitted)
17. Hägerstrand T (1957) Migration and area, in "Migration in Sweden", Lund Studies in Geog., B, 13, 27-158, Lund, Sweden
18. Reichenbach H (1925) Philosophy of space and time, Dover ed. (1958), New York
19. Defossez L (1946) Les savants du XVIIe siècle et la mesure du temps, Editions du J. Suisse d'Horlogerie et de Bijouterie, 258-262, Lausanne
20. Good R (ed.) (1982) Britten's watch and clock maker's handbook, dictionary and guide, 16th edition, Bloomsbury Books, London
21. By straight, we mean a line of zero curvature within the space under consideration (i.e. zero intrinsic curvature)
22. Daganzo C F (2002) Reversibility of the time-dependent shortest path problem, Transportation Research, 36, 7, 665-668
23. Churchman C W, Ratoosh P (eds) (1959) Measurement: Definitions and theories, J Wiley, New York
24. De Barra G (1974) Introduction to measure theory, Van Nostrand Reinhold, London
25. Blumenthal L M (1970) Distance geometry, Chelsea, New York. Blumenthal defined distance geometry as the study of "that subgroup of homeomorphisms for which the distance between two points is an invariant". Such a definition is too restrictive for modern spatial analysis
26. Stevens S S (1959) Measurement, psychophysics and utility, in Churchman and Ratoosh, op. cit., 18-63
27. Faden A M (1977) Economics of space and time: the measure-theoretic foundations of social sciences, Iowa State Univ. Press, Iowa
28. Noronha V, Church R L (2002) Line referencing and other forms of location expression for transportation, Final Report, Task Order 3021, California Department of Transportation, p.15. Available from: www.ncgia.ucsb.edu/vital
29. Douglas D H, Peucker T K (1973) Algorithms for the reduction of the number of points required to represent a digitised line or its caricature, Can. Cartographer, 10, 2, 112-122
30. Goodchild M F (1980) Approaches to the estimation of geographical measures: A fractal framework, Math. Geol., 12, 85-98
31. Mandelbrot B B (1977) Fractals: Form, chance and dimension, Freeman, San Francisco, p.29
32. Perkal J (1956) On the ε-length, Bull. Polish Acad. Sci., Cl. III, 4, 399-403
33. Perkal J (1958) On the length of empirical curves, Zastosowania Matematyki, III, 3-4, 257-286 (in Polish; trans. by R Jackowski with W Tobler)
34. Crofton M (1885) Probability, Encyclopædia Britannica, 9th ed., Vol XIX, 771-798
35. Santaló L A (1953) An introduction to integral geometry, Hermann, Paris, p.11
36. Abeyata A M, Franklin J (1998) The accuracy of vegetation stand boundaries derived from image segmentation in a desert environment, Photogrammetric Engineering and Remote Sensing, 64, 59-66. This paper draws on the earlier work of Skidmore A K, Turner B J (1992) Map accuracy assessment using line intersect sampling, Photogrammetric Engineering and Remote Sensing, 58, 1453-1457, which itself is based upon de Vries P G (1986) Sampling theory for forest inventory, Springer-Verlag, Berlin
37. Steinhaus H (1930) Akad. d. Wiss. Leipzig, Ber., 82, 120-130 (cited in Kendall and Moran, op. cit.)
38. Kendall M G, Moran P A P (1963) Geometrical Probability, Griffin, London, p.60
39. Material in this Section and subsequent Sections that deal with Distance Transforms has recently been published as: de Smith (2004) Distance transforms as a new tool in spatial analysis, urban planning and GIS, Environment & Planning B, 31(1), 85-104
40. For a clear summary and comparison of such methods see: Leymarie F, Levine M D (1992) A note on fast raster scan distance propagation on the discrete rectangular lattice, Computer Vision, Graphics, Image Processing, 55, 1, 84-94. MATLAB: general purpose software packages, such as MATLAB (Image Processing Toolbox), include facilities for performing distance transforms with metrics based upon exact Euclidean (EDT), City block (L1), Chessboard (defined here as L∞), and Quasi-Euclidean (local Euclidean) distances. See www.mathworks.com for more details. The MATLAB Image Processing Toolbox uses the following reference for 2D exact Euclidean distance transforms: Breu H, Gil J, Kirkpatrick D, Werman M (1995) Linear time Euclidean distance transform algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 529-533
41. Okabe A, Boots B, Sugihara K, Chiu S N (2000) Spatial tessellations: Concepts and applications of Voronoi diagrams, 2nd ed., John Wiley, Chichester, England
42. Gorte B, Koolhoven W (1990) Interpolation between isolines based on the Borgefors distance transform, ITC Journal, 3, 245-247. This paper describes a simple application of the (5,7)/5 chamfer with linear interpolation
43. Verwer B J H, Verbeek P W, Dekker S T (1989) An efficient uniform cost algorithm applied to distance transforms, IEEE Trans. Pattern Analysis and Machine Intelligence, 11, 4, 425-429
44. Butt M A, Maragos P (1998) Optimal design of chamfer distance transforms, IEEE Transactions on Image Processing, 7, 1477-1484
45. Borgefors G (1989) Distance transformations on hexagonal grids, Pattern Recog. Letters, 9, 97-105
46. Cuisenaire O, Macq B (1999) Fast and exact signed Euclidean distance transformation with linear complexity, Proc. IEEE Intl. Conference on Acoustics, Speech and Signal Processing, 6, 3293-3296. See also the MATLAB reference, above, and Dr Cuisenaire's website pages at École Polytechnique Fédérale de Lausanne: http://ltswww.epfl.ch/~cuisenai/DT/
47. From the earliest 'accurate' surveys onwards, measurements were always adjusted for slope (reducing the results) to provide 'true horizontal distance'
48. Noronha V, Church R L (2002) Line referencing and other forms of location expression for transportation, Final Report, Task Order 3021, California Department of Transportation
49. For a thorough discussion of least-effort (energy minimising) paths across physical landscapes using all-terrain/off-road vehicles, see: Rowe N C, Ross R S (1990) Optimal grid-free path planning across arbitrarily contoured terrain with anisotropic friction and gravity effects, IEEE Transactions on Robotics and Automation, 6, 540-553; and Rowe N C (1997) Obtaining optimal mobile-robot paths with non-smooth anisotropic cost functions using qualitative-state reasoning, Internat. J. Robot. Res., 16(3), 375-399; additional related papers by Rowe can be found under the Path Planning heading on http://www.cs.nps.navy.mil/people/faculty/rowe/index.html
50. Peano G (1890) Math. Ann., 36, 157-160
51. Netto E E (1879) Jour. für Math., 86, 263-268
52. Sierpinski W (1882-1969); see Sierpinski W (1912) Bull. de l'Acad. des sciences de Cracovie, A, 462-478. Sierpinski triangles, which also exhibit fractional dimension, can be constructed both geometrically (via a recurrence operation) and by the so-called 'chaos game', in which they appear as the result of a random process
53. Mandelbrot B B (1977) Fractals: Form, chance and dimension, Freeman, San Francisco, p.10
54. Hausdorff-Besicovitch dimension, D_HB (after Mandelbrot op. cit. and others):
Let Δ be an E-dimensional Euclidean metric space and let h_k(ρ) = V_k ρ^k be the volume of a k-sphere of radius ρ, where

V_k = Γ(½)^k / Γ(k/2 + 1)

and

Γ(n) = ∫_0^∞ x^(n-1) e^(-x) dx

is the Gamma function with n>0; Γ(½) = √π and Γ(n+1) = nΓ(n) (= n! for integer n).
Let S be a subset of Δ, for example S = a curve in the plane and Δ = a Euclidean plane. Then S may be covered by a finite number, n, of k-spheres (cf. the Pointillist method described in the Area approximations Section) with total 'volume' (e.g. area) given by:

V = Σ_{i=1..n} h(ρ_i)

Define ρ = sup ρ_i and form the most economical covering of S possible, given ρ, i.e.

V* = inf_{ρ_i < ρ} Σ h(ρ_i)

then lim V* as ρ tends to 0 defines the h-measure of S; if there exists D such that the h-measure is infinite for k<D and zero for k>D, then D is called the Hausdorff-Besicovitch dimension of S. It can readily be shown that if S is a self-similar set (e.g. the Koch curve, Figure 5-27) then its self-similarity dimension equals D. The HB dimension is often written in the limit form:

D_HB = -lim_{ρ→0+} ln N(ρ) / ln ρ

where N(ρ) is the number of k-spheres required for complete coverage.
Cantor-Minkowski-Bouligand dimension (D_MB):
Let Δ be an E-dimensional Euclidean metric space and V_k be the volume of a k-sphere as per the HB definition above. Let s(ρ) be the smoothed-out version of the set S in Δ (cf. the Minkowski-Cantor sausage described in the Area approximations Section). The Minkowski-Bouligand dimension of S is defined as that value of k for which the upper and lower contents of S both exist and are equal, i.e. where:

lim_{ρ→0} sup [volume(s(ρ)) / (V_{E-k} ρ^(E-k))] = lim_{ρ→0} inf [volume(s(ρ)) / (V_{E-k} ρ^(E-k))]
55. Due to N F H von Koch (1870-1924). This is an example of a continuous curve which is nowhere differentiable
56. Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, p.97
57. Håkanson L (1978) The length of closed geomorphic lines, Math. Geol., 10, 141-167
58. Batty M, Longley P (1994) Fractal Cities, Academic Press, London and San Diego; Longley P (2000) Fractal analysis of digital spatial data, Ch.12 in Openshaw S, Abrahart R J (eds.) (2000) GeoComputation, Taylor and Francis, London
59. Longley P, Batty M (1989) On the fractal measurement of geographic boundaries, Geog. Anal., 21, 1, 47-67
60. Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, p.43
61. Kuffner J J, LaValle S M (2000) RRT-Connect: An efficient approach to single-query path planning, Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA 2000), San Francisco, CA, April 2000. See also the "RRT Page" maintained by LaValle at http://msl.cs.uiuc.edu/rrt/ and Kuffner's automated animated characters page at: http://robotics.stanford.edu/~kuffner/anim/index.html
62. Madras N, Sokal A D (1988) The pivot algorithm: A highly efficient Monte Carlo method for the self-avoiding walk, J. Stat. Phys., 50, 109-186
63. Mandelbrot B B (2002) Gaussian self-affinity and fractals, Springer-Verlag, Chapter *H1
64. Kennedy T (2002) A faster implementation of the pivot algorithm for self-avoiding walks, J. Stat. Phys., 106, 407-429; the three SAW segments are from the set of images provided by Kennedy at: http://hedgehog.math.arizona.edu/~tgk/saw_pictures/index.html. Closed figures with random boundaries may be generated using other random procedures in the plane, but these are not discussed further in this study
65. Batty M, Jiang B, Thurstain-Goodwin M (1998) Local movement: agent-based models of pedestrian movement, Working Paper 4, CASA, UCL, London; see also Batty M (2000) Geocomputation using cellular automata, Ch.5 in Openshaw S, Abrahart R J (eds.) (2000) GeoComputation, Taylor and Francis, London
66. http://www.nada.kth.se/~viggo/wwwcompendium/wwwcompendium.html
67. Dijkstra E W (1959) A note on two problems in connexion with graphs, Numerische Mathematik, 1, 269-271
68. O(..) or "big O" notation is used to describe the upper bound for the computational 'order' of time and space required for exact or approximate solution of a given network problem. Figures quoted in this Section are based on those provided by Mitchell (op. cit.)
69. Smith T R, Gahinet P M (1988) Least cost paths through space, Ch.11 in Coffey W (ed.) (1988) Geographical systems and systems of geography: Essays in honour of William Warntz, Univ. of W. Ontario, Geography Dept., London, Ontario
70. Mitchell J S B (1998) Geometric shortest paths and network optimisation, Research Rpt, Dept of Applied Math. and Statistics, SUNY, Stony Brook, NY, 62 pages, 393 references; also published in the Handbook of Computational Geometry, Elsevier Science, North Holland, Amsterdam
6 Distance Statistics¹

This Chapter (and the associated Annex 1) examines distance statistics: in particular, average distances and tests of complete spatial randomness (CSR) that utilise distance measures. We seek to illustrate both the scope and the applicability of statistical distance measures to a variety of practical spatial problems. A wide range of results is derived and presented in a manner that seeks to clarify the underlying assumptions and the logic of the models used.
Although a number of statistical methods have been discussed in earlier Sections
(sampling, point-pair uncertainty, random walks), the central focus has been on
deterministic problems. We now focus upon statistically defined processes and associated distance measures. Much of the research in this area is concerned with point patterns and processes – in many cases it is the relationship between each point and its
nearest neighbour(s) that is of interest from a process perspective, so we start by
reviewing this area. We then examine and develop a number of approaches to the distribution of the distance between random pairs of points in bounded regions, subjects that arise from research in the fields of geometrical probability and trip distribution theory.
In recent years there has been a move away from these 'direct' statistical measures towards more exploratory approaches based on analysis of spatial intensity² and spatial autoregression³, reflecting the difficulty of extending classical distance statistics models to many real-world situations. These more recent techniques still rely on distance
measures, distance decay models and density (intensity) estimation – as such, much of
the discussion in the previous two Chapters, which is extended in the present Chapter, applies to these newer approaches.
6.1 Introduction
There is an enormous body of literature dealing with distance distributions, reflecting their applicability in many disciplines. Much of this research has assumed that the distance metric to be used is Euclidean and that the sample space is unbounded and uniform. A number of results are presented in the following sections, and others derived, where the metric is more general and in which the sample space is bounded.
These factors are shown to result in alteration of the expected distance (and squared distance), which results in divergence from the values for key parameters (such as mean values and ratios) that conventional models suggest. Statistical analyses of spatial datasets utilising interpoint distance measures, directly or indirectly, must take into consideration the effects described above. This can be achieved either by using appropriate measures and modified statistical results, or by seeking to eliminate such effects, e.g. by correction of distance calculations, topological transforms, sampling subsets of points well away from region borders, systematic subdivision of sampled regions, and/or use of functional distance measures and Monte Carlo simulation techniques.
The set of distances defined by the spacing of randomly selected pairs of points within a bounded region represents a distribution whose frequency varies with line length, as measured by some agreed metric. There have been at least four approaches to the study of these ‘finite’ distance distributions:
(i) tests of randomness
(ii) geometric probability studies
(iii) shape analysis, and
(iv) trip distribution analysis
Each approach is discussed in the subsections below, in which we describe existing work in this field and develop a number of extensions to the theoretical findings. The variety of results and approaches taken illustrates the many ways in which distance distributions and their central moments (e.g. mean, variance) can be applied to practical problems.
In botanical, biological and geographic research, distance-based tests of point pattern randomness have historically been based on nth-order nearest neighbour (nn) statistics (distributions and central moments) in infinite (unbounded) Euclidean spaces. More recently there has been consideration of nearest neighbour relations in bounded regions.
The majority of analyses focus on testing mapped point patterns against a hypothesis of
“Complete Spatial Randomness” (CSR), but other hypotheses, such as nonstationary
Poisson processes, have also received attention. Separately, there has been a lively debate on the treatment of planar point patterns which may exhibit clustering⁴. This latter area is not covered in detail here, especially as a number of the methods used do not rely on explicit distance measures for their analysis. Where such methods do adopt distance measurements they almost exclusively rely on Euclidean measures and often use (edge-corrected) circular sampling regions (e.g. techniques such as Ripley's K-statistic and kernel-based density estimation).
The formal definitions of dimension in the footnotes of the previous Chapter utilised a general expression, V_k, for the volume of a k-dimensional hypersphere of radius r, where:

V_k r^k = π^(k/2) r^k / Γ(k/2 + 1)

The formula yields the familiar results for a line, circle and sphere:

V_1 r = 2r;  V_2 r² = πr²;  V_3 r³ = 4πr³/3
The results derived in Annex 1 (Nearest neighbour statistics), and earlier by Dacey⁵, use this general expression in the formulae for the distribution of distances to the nth nearest neighbour in k-dimensional space under the CSR hypothesis (which we show is related to the χ² distribution). From this result we derive the following general expression for the mean distance to the nth nearest neighbour in k-dimensional space (the general expression for higher crude moments is also provided in the Annex):
r̄_{n,k} = Γ(n + 1/k) / [Γ(n) (λ V_k)^(1/k)]

where λ is the point density. In two dimensions this expression can be simplified to⁶:

r̄_{n,2} = n(2n)! / [(2^n n!)² √λ]

Thus, for example, we have:

r̄_{1,1} = 1/(2λ),  r̄_{1,2} = 1/(2√λ),  r̄_{2,2} = 3/(4√λ)
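These expressions can be verified numerically (a sketch; the function names are illustrative):

```python
import math

def v_k(k):
    # Volume coefficient of a k-sphere: V_k = pi^(k/2) / Gamma(k/2 + 1)
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)

def mean_nn_distance(n, k, density):
    # General CSR result: Gamma(n + 1/k) / (Gamma(n) * (lambda * V_k)^(1/k))
    return math.gamma(n + 1 / k) / (math.gamma(n) * (density * v_k(k)) ** (1 / k))

def mean_nn_distance_2d(n, density):
    # Two-dimensional simplification: n * (2n)! / ((2^n * n!)^2 * sqrt(lambda))
    return (n * math.factorial(2 * n)
            / ((2 ** n * math.factorial(n)) ** 2 * math.sqrt(density)))

lam = 4.0
print(mean_nn_distance(1, 1, lam))   # 1/(2*lambda)
print(mean_nn_distance(1, 2, lam))   # 1/(2*sqrt(lambda))
print(mean_nn_distance(2, 2, lam))   # 3/(4*sqrt(lambda))
print(mean_nn_distance_2d(2, lam))   # agrees with the general form
```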
Tests of randomness may then utilise the observed distribution of point-event or event-event distances and compare the observed mean values with those expected under CSR, e.g. as a ratio or, more powerfully, by comparing (transformed) sample distributions with the percentage points of the χ² distribution, or with the Normal distribution for larger samples/higher-order neighbours for which the Normal approximation to χ² is valid.
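A ratio comparison of this kind can be sketched with a small CSR simulation (illustrative code, not part of the original text; the function name and parameter values are assumptions):

```python
import math
import random

def mean_nn_ratio(n_points, seed=42):
    """Simulate CSR in the unit square and return the ratio of the observed
    mean first-nearest-neighbour distance to its expectation 1/(2*sqrt(lambda)).
    Edge effects push the ratio slightly above 1 for small samples."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    total = 0.0
    for i, (x, y) in enumerate(pts):
        # Brute-force nearest neighbour: O(N^2), adequate for a sketch
        total += min(math.hypot(x - px, y - py)
                     for j, (px, py) in enumerate(pts) if j != i)
    observed = total / n_points
    expected = 1 / (2 * math.sqrt(n_points))  # lambda = N in the unit square
    return observed / expected

print(mean_nn_ratio(500))   # typically close to 1 under CSR
```

A ratio well below 1 suggests clustering, well above 1 suggests regularity, although a formal test would use the edge-corrected variance discussed below.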
The point density, λ, is typically unknown, and its determination in practical problems is therefore an important issue. In field-based studies (rather than mapped point sets) the CSR hypothesis is sometimes the presumed distribution and nearest-neighbour measurements are used to determine λ. The approach is effective for relatively dense, static objects (e.g. natural forest stands) but of limited use for lower-density and dynamic objects (e.g. estimation of populations of animals or fish). In the latter instance density estimation is based on the assumption that not all events are detected, and distances to events are regarded as samples from a detection function or distribution, which must be modelled⁷. Work in this latter area is almost exclusively based on Euclidean distance measurement without adjustment for boundary issues.
The issue of point density estimation in mapped studies can be illustrated for the case of the line: if N points are dropped at random on a line of length L, then the expected distance to the nth nearest neighbour may be calculated. However, because the line is of finite length, some nearest neighbours to a base point will lie outside the boundary and mismeasurements will be made. The error thus induced is equivalent to miscalculating the density λ, and results in an increase in the measured mean and variance. The formula λ₀ = N²/((N+n)L), where N≥2, i.e. adjusting the theoretical density formula by N/(N+n), has been found by the present author (using simulation) to provide a better estimate of the point density than N/L. This estimate of density is very good for small n, and also for larger n when N>10.
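The improvement offered by the adjusted density can be checked by simulation; the sketch below is an illustrative reconstruction of this kind of experiment (function name and trial counts are assumptions), comparing the observed mean first-nearest-neighbour distance with n/(2λ) computed from both N/L and λ₀:

```python
import random

def simulated_mean_nn_on_line(n_points, length, trials, seed=1):
    """Mean distance to the 1st nearest neighbour for n_points dropped
    uniformly at random on a line segment of the given length."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        xs = sorted(rng.uniform(0, length) for _ in range(n_points))
        for i, x in enumerate(xs):
            left = x - xs[i - 1] if i > 0 else float("inf")
            right = xs[i + 1] - x if i < n_points - 1 else float("inf")
            total += min(left, right)
            count += 1
    return total / count

N, L, n = 10, 1.0, 1
observed = simulated_mean_nn_on_line(N, L, trials=5000)
naive = n / (2 * (N / L))                      # unbounded-theory density N/L
adjusted = n / (2 * (N ** 2 / ((N + n) * L)))  # lambda_0 = N^2 / ((N+n) L)
print(observed, naive, adjusted)   # observed lies nearer the adjusted value
```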
In two dimensions similar boundary effects exist and may be more or less serious depending on: the size and shape of the sample area; the event density; and the order-neighbour measurements being taken. Donnelly⁸ has used simulation techniques to provide adjusted estimates for rectangular sample regions with sides of length a and b, and perimeter length P=2(a+b). Letting λ=N/ab (number of points/sample area), he proposes using the adjusted value:

r̄_{1,2} = 1/(2√λ) + 0.0514 P/N + 0.041 P/N^(3/2)

which approximates to

r̄_{1,2} ≈ 1/(2√λ) + 1/(5λ)

for the unit square with N>10.
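Donnelly's adjustment and its unit-square approximation can be written directly (a sketch; the function name is illustrative):

```python
import math

def donnelly_mean_nn(n_points, a, b):
    """Edge-adjusted expected mean 1st-nn distance for a CSR pattern of
    n_points in an a x b rectangle (P = 2(a+b), lambda = N/ab)."""
    density = n_points / (a * b)
    perimeter = 2 * (a + b)
    return (1 / (2 * math.sqrt(density))
            + 0.0514 * perimeter / n_points
            + 0.041 * perimeter / n_points ** 1.5)

# Unit square, N = 100: compare with 1/(2*sqrt(lambda)) + 1/(5*lambda)
N = 100
full = donnelly_mean_nn(N, 1.0, 1.0)
approx = 1 / (2 * math.sqrt(N)) + 1 / (5 * N)
print(full, approx)   # the two agree closely for N > 10
```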
Donnelly also provides an edge-corrected estimate for the standard deviation, which may then be used to compute a standardised measure (z-score) for significance testing. If P is large in relation to N, the adjustment is substantial. Furthermore, if the sample region is long and thin (e.g. 10:1 or greater) Donnelly's adjustment becomes unreasonably large, and one must question whether alternative approaches and models (e.g. Monte Carlo simulation) are preferable⁹.
Extensions of formal analysis to more complex shapes and alternative distance metrics have been limited. A simple extension to hyperellipsoids and hyperspheroids with the Euclidean metric will have larger values for the mean and variance than hyperspherical measures, with an adjusted volume measure based on the formula for V_k but in which the radius r is replaced by a set of ρ_i values, the semi-axes of the figures. If all the