A project report
submitted in partial fulfillment of the requirements
for the degree of
Bachelor of Technology in Electrical Engineering
Mr. Pratik Panchabhaiyye
Roll No. : 10502010
Mr. Ravi Shekhar Chaudhary
Roll No. : 10502051
Guided by
Prof. K.R. Subhashini
Department of Electrical Engineering
NIT Rourkela
National Institute Of Technology Rourkela,
Rourkela, Orissa - 769008
Road Extraction From Satellite Imagery [2008-09]
This is to certify that Mr. Ravi Shekhar Chaudhary of 8th semester Electrical Engineering has completed the B.Tech project on “Automatic Main Road Extraction From High Resolution Satellite Imagery” in a satisfactory manner. This project is submitted in partial fulfillment towards the Bachelor’s degree in Electrical Engineering as prescribed by the institute during the year 2008-2009.

PROF. K.R. Subhashini
Project Guide
Department of Electrical Engineering
This is to certify that Mr. Pratik Panchabhaiyye of 8th semester Electrical Engineering has completed the B.Tech project on “Automatic Main Road Extraction From High Resolution Satellite Imagery” in a satisfactory manner. This project is submitted in partial fulfillment towards the Bachelor’s degree in Electrical Engineering as prescribed by the institute during the year 2008-2009.

PROF. K.R. Subhashini
Project Guide
Department of Electrical Engineering
Road information is essential for automatic GIS (geographical information system) data acquisition, transportation, and urban planning. Automatic road-network detection from high-resolution satellite imagery holds great potential for significantly reducing database development/updating cost and turnaround time. Many algorithms and methodologies have been presented for this purpose, from so-called low-level feature detection to high-level context-supported grouping. Still, there is no practical system that can fully automatically extract a road network from space imagery for the purpose of automatic mapping. This report presents a methodology for automatic main road detection from high-resolution IKONOS satellite imagery. The strategies include the multiresolution or image pyramid method, Gaussian blurring, a line finder using a 1-dimensional template correlation filter, line segment grouping, and multi-layer result integration. The multi-layer or multi-resolution method for road extraction is a very effective strategy for saving processing time and improving robustness. To realize the strategy, the original IKONOS image is compressed into different corresponding image resolutions so that an image pyramid is generated; after that, the line finder of the 1-dimensional template correlation filter, applied after Gaussian blur filtering, detects the road centerline. Extracted centerline segments may or may not belong to roads. There are two ways to identify the attributes of the segments. The first uses segment grouping to form longer line segments and assigns a possibility to each segment depending on its length and other geometric and photometric attributes; for example, a longer segment means a bigger possibility of being a road. A perceptual-grouping based method is used for road segment linking by a possibility model that takes multiple sources of information into account; here the clues existing in the gaps are considered. The second way to identify the segments is feature detection back in a higher-resolution layer of the image pyramid.
1. Introduction
Modern digital technology has made it possible to manipulate multi-dimensional signals with
systems that range from simple digital circuits to advanced parallel computers. The goal of this
manipulation can be divided into three categories:
• Image Processing image in → image out
• Image Analysis image in → measurements out
• Image Understanding image in → high-level description out
Here we will focus on the fundamental concepts of image processing.
Image understanding requires an approach that differs fundamentally from the theme of our
discussion. Further, we will restrict ourselves to two–dimensional (2D) image processing
although most of the concepts and techniques that are to be described can be extended easily
to three or more dimensions.
We begin with certain basic definitions. An image defined in the “real world” is considered to
be a function of two real variables, for example, a(x,y) with a as the amplitude (e.g. brightness)
of the image at the real coordinate position (x,y). An image may be considered to contain sub-images, sometimes referred to as regions-of-interest, ROIs, or simply regions. This concept reflects the fact that images
frequently contain collections of objects each of which can be the basis for a region. In a
sophisticated image processing system it should be possible to apply specific image processing
operations to selected regions. Thus one part of an image (region) might be processed to
suppress motion blur while another part might be processed to improve color rendition. The
amplitudes of a given image will almost always be either real numbers or integer numbers. The
latter is usually a result of a quantization process that converts a continuous range (say,
between 0 and 100%) to a discrete number of levels. In certain image-forming processes,
however, the signal may involve photon counting which implies that the amplitude would be
inherently quantized. In other image forming procedures, such as magnetic resonance imaging,
the direct physical measurement yields a complex number in the form of a real magnitude and
a real phase.
2. Digital Image Definitions
A digital image a[m,n] described in a 2D discrete space is derived from an analog image a(x,y) in
a 2D continuous space through a sampling process that is frequently referred to as digitization.
The mathematics of that sampling process will be described in Section 5. For now we will look at some basic definitions associated with the digital image. The effect of digitization is shown in Figure 2.1.
The 2D continuous image a(x,y) is divided into N rows and M columns. The intersection of a row
and a column is termed a pixel. The value assigned to the
integer coordinates [m,n] with {m=0,1,2,…,M–1} and {n=0,1,2,…,N–1} is a[m,n].
In fact, in most cases a(x,y)—which we might consider to be the physical signal that impinges on the face of a 2D sensor—is actually a function of many variables including depth (z), color (λ), and time (t):

Value = a(x, y, z, λ, t)

Unless otherwise stated, we will consider the case of 2D, monochromatic, static images in this chapter.
Figure 2.1: Digitization of a continuous image. The pixel at coordinates
[m=10, n=3] has the integer brightness value 110.
The image shown in Figure 2.1 has been divided into N = 16 rows and M = 16 columns. The
value assigned to every pixel is the average brightness in the pixel rounded to the nearest
integer value. The process of representing the amplitude of the 2D signal at a given coordinate
as an integer value with L different gray levels is usually referred to as amplitude quantization
or simply quantization.
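The quantization step described above can be sketched in a few lines of NumPy. The 4×4 toy image and the choice B = 8 (so L = 256 levels) are illustrative assumptions, not values taken from the text:

```python
import numpy as np

# Amplitude quantization sketch: map continuous amplitudes in [0, 1]
# onto L = 2**B integer gray levels (here B = 8, so L = 256).
B = 8
L = 2 ** B

def quantize(a, levels=L):
    """Quantize amplitudes in [0.0, 1.0] to integers 0 .. levels-1."""
    a = np.clip(a, 0.0, 1.0)
    # scale, truncate, and keep the top amplitude inside the level range
    return np.minimum((a * levels).astype(int), levels - 1)

# A toy 4x4 "continuous" image a(x,y) with amplitudes in [0, 1].
a_xy = np.array([[0.00, 0.25, 0.50, 1.00],
                 [0.10, 0.30, 0.60, 0.90],
                 [0.20, 0.40, 0.70, 0.80],
                 [0.05, 0.45, 0.55, 0.95]])
a_mn = quantize(a_xy)   # the digital image a[m,n]
```

Each entry of `a_mn` is the integer gray level assigned to the corresponding pixel, exactly the amplitude quantization the text defines.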
There are standard values for the various parameters encountered in digital image processing. These values can be dictated by video standards, by algorithmic requirements, or by the desire to keep digital circuitry simple. Table 1 lists some commonly encountered values.

Parameter      Symbol   Typical values
Rows           N        256, 512, 1024
Columns        M        256, 512, 1024
Gray levels    L        2, 64, 256

Table 1: Common values of digital image parameters
Quite frequently we see cases of M = N = 2^K where K = 8, 9, 10. This can be motivated by digital circuitry or by the use of certain algorithms such as the (fast) Fourier transform. The number of distinct gray levels is usually a power of 2, that is, L = 2^B, where B is the number of bits in the binary representation of the brightness levels. When B > 1 we speak of a gray-level image; when B = 1 we speak of a binary image. In a binary image there are just two gray levels, which can be referred to, for example, as “black” and “white” or “0” and “1”.
There is a variety of ways to classify and characterize image operations. The reason for doing so
is to understand what type of results we might expect to achieve with a given type of operation
or what might be the computational burden associated with a given operation.
2.2.1 Types of operations
The types of operations that can be applied to digital images to transform an input image
a[m,n] into an output image b[m,n] (or another representation) can be classified into three
categories as shown in Table 2.
• Point – the output value at a specific coordinate depends only on the input value at that same coordinate. (Complexity: constant per pixel.)
• Local – the output value at a specific coordinate depends on the input values in a neighborhood of that same coordinate. (Complexity: P² per pixel.)
• Global – the output value at a specific coordinate depends on all the values in the input image. (Complexity: N² per pixel.)

Table 2: Types of image operations. Image size = N × N; neighborhood size = P × P. Note that the complexity is specified in operations per pixel.
2.2.2 Types of neighborhoods
Neighborhood operations play a key role in modern digital image processing. It is therefore
important to understand how images can be sampled and how that relates to the various
neighborhoods that can be used to process an image.
• Rectangular sampling – In most cases, images are sampled by laying a rectangular grid over an
image as illustrated in Figure 1. This results in the type of sampling shown in Figure 3ab.
• Hexagonal sampling – An alternative sampling scheme is shown in Figure 3c and is termed
hexagonal sampling.
Both sampling schemes have been studied extensively [1] and both represent a possible
periodic tiling of the continuous image space. We will restrict our attention, however, to only
rectangular sampling as it remains, due to hardware and software considerations, the method
of choice.
Local operations produce an output pixel value b[m=m₀, n=n₀] based upon the pixel values in the neighborhood of a[m=m₀, n=n₀]. Some of the most common neighborhoods are the 4-connected and 8-connected neighborhoods in the case of rectangular sampling, and the 6-connected neighborhood in the case of hexagonal sampling, illustrated in Figure 3.
Figure 2.2: (a), (b) Rectangular sampling.
3. Tools
Certain tools are central to the processing of digital images. These include mathematical tools such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain codes and run codes. We will present these tools without any specific motivation; the motivation will follow in later sections.
There are several possible notations to indicate the convolution of two (multidimensional)
signals to produce an output signal. The most common are:
c = a ⊗ b = a ∗ b

We shall use the first form, c = a ⊗ b, with the following formal definitions.

In 2D continuous space:

c(x, y) = a(x, y) ∗ b(x, y) = ∫∫ a(χ, ζ) b(x − χ, y − ζ) dχ dζ

In 2D discrete space:

c[m, n] = a[m, n] ⊗ b[m, n] = Σⱼ Σₖ a[j, k] b[m − j, n − k]
There are a number of important mathematical properties associated with convolution:

• Convolution is commutative.

c = a ⊗ b = b ⊗ a
• Convolution is associative.
c = a ⊗ (b ⊗ d) = (a ⊗ b) ⊗ d = a ⊗ b ⊗ d
• Convolution is distributive.
c = a ⊗ (b + d) = (a ⊗ b) + (a ⊗ d)
where a, b, c, and d are all images, either continuous or discrete.
The Fourier transform produces another representation of a signal, specifically a representation as a weighted sum of complex exponentials. Because of Euler’s formula:

e^(jq) = cos(q) + j sin(q)

where j² = −1, we can say that the Fourier transform produces a representation of a (2D) signal as a weighted sum of sines and cosines. The defining formulas for the forward Fourier and the inverse Fourier transforms are as follows. Given an image a and its Fourier transform A, the forward transform goes from the spatial domain (either continuous or discrete) to the frequency domain, which is always continuous:

Forward – A = F{a}

The inverse Fourier transform goes from the frequency domain back to the spatial domain:

Inverse – a = F⁻¹{A}

The Fourier transform is a unique and invertible operation so that:

a = F⁻¹{F{a}} and A = F{F⁻¹{A}}
The specific formulas for transforming back and forth between the spatial domain and the
frequency domain are given below.
In 2D continuous space:

Forward – A(u, v) = ∫∫ a(x, y) e^(−j(ux+vy)) dx dy

Inverse – a(x, y) = (1/4π²) ∫∫ A(u, v) e^(+j(ux+vy)) du dv

In 2D discrete space:

Forward – A(Ω, Ψ) = Σₘ Σₙ a[m, n] e^(−j(Ωm+Ψn))

Inverse – a[m, n] = (1/4π²) ∫₋π^π ∫₋π^π A(Ω, Ψ) e^(+j(Ωm+Ψn)) dΩ dΨ
There are a variety of properties associated with the Fourier transform and the
inverse Fourier transform. The following are some of the most relevant for digital
image processing.
• The Fourier transform is, in general, a complex function of the real frequency variables. As such, the transform can be written in terms of its magnitude and phase:

A(u, v) = |A(u, v)| e^(jφ(u,v))
A(Ω, Ψ) = |A(Ω, Ψ)| e^(jφ(Ω,Ψ))

• A 2D signal can also be complex and thus written in terms of its magnitude and phase:

a(x, y) = |a(x, y)| e^(jθ(x,y))
a[m, n] = |a[m, n]| e^(jθ[m,n])

• If a 2D signal is real, then the Fourier transform has certain symmetries:

A(u, v) = A*(−u, −v)
A(Ω, Ψ) = A*(−Ω, −Ψ)

The symbol (*) indicates complex conjugation. For real signals this leads directly to:

|A(u, v)| = |A(−u, −v)|
|A(Ω, Ψ)| = |A(−Ω, −Ψ)|

• If a 2D signal is real and even, then the Fourier transform is real and even:

A(u, v) = A(−u, −v)
A(Ω, Ψ) = A(−Ω, −Ψ)
• The Fourier and the inverse Fourier transforms are linear operations.
F {w1a + w2b} =F {w1a} + F{w2b} = w1A +w2 B
F-1{w1A + w2B} = F-1 {w1A} + F-1{w2B} = w1a +w2b
where a and b are 2D signals (images) and w1 and w2 are arbitrary complex constants.

• The Fourier transform in discrete space, A(Ω, Ψ), is periodic in both Ω and Ψ. Both periods are 2π:

A(Ω + 2πj, Ψ + 2πk) = A(Ω, Ψ)   for integer j, k
• The energy, E, in a signal can be measured either in the spatial domain or the
frequency domain. For a signal with finite energy:
Parseval’s theorem (2D continuous space):

E = ∫∫ |a(x, y)|² dx dy = (1/4π²) ∫∫ |A(u, v)|² du dv
This “signal energy” is not to be confused with the physical energy in the phenomenon that
produced the signal. If, for example, the value a[m,n] represents a photon count, then the
physical energy is proportional to the amplitude, a, and not the square of the amplitude. This is
generally the case in video imaging.
3.4.1 Importance of phase and magnitude
Equation (15) indicates that the Fourier transform of an image can be complex.
This is illustrated below in Figures 4a-c. Figure 4a shows the original image a[m,n], Figure 4b the magnitude in a scaled form as log(|A(Ω,Ψ)|), and Figure 4c the phase φ(Ω,Ψ).
Both the magnitude and the phase functions are necessary for the complete reconstruction of
an image from its Fourier transform. Figure 5a shows what happens when Figure 4a is restored
solely on the basis of the magnitude information and Figure 5b shows what happens when
Figure 4a is restored solely on the basis of the phase information.
Figure 5a
Φ(Ω,Ψ) = 0
Figure 5b
|A(Ω,Ψ)| = constant
Neither the magnitude information nor the phase information alone is sufficient to restore the image. The magnitude-only image (Figure 5a) is unrecognizable and has severe dynamic-range problems. The phase-only image (Figure 5b) is barely recognizable, that is, severely degraded.
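This decomposition is easy to reproduce numerically. The sketch below, with a random 16×16 array standing in for an image, rebuilds the signal from both parts together, from magnitude alone (phase set to zero, as in Figure 5a), and from phase alone (magnitude set constant, as in Figure 5b):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((16, 16))          # stand-in for an image a[m,n]

A = np.fft.fft2(a)
mag, phase = np.abs(A), np.angle(A)

# Full reconstruction uses both magnitude and phase.
a_full = np.fft.ifft2(mag * np.exp(1j * phase)).real

# Magnitude only: phase set to zero.
a_mag_only = np.fft.ifft2(mag).real

# Phase only: magnitude set to a constant (1).
a_phase_only = np.fft.ifft2(np.exp(1j * phase)).real
```

Only `a_full` recovers the original array; discarding either part loses information, which is the point the figures make.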
3.4.2 Circularly symmetric signals
An arbitrary 2D signal a(x,y) can always be written in a polar coordinate system as a(r,θ). When the 2D signal exhibits circular symmetry this means that:

a(x, y) = a(r, θ) = a(r)

where r² = x² + y² and tan θ = y/x. As a number of physical systems such as lenses exhibit circular symmetry, it is useful to be able to compute an appropriate Fourier representation.
The Fourier transform A(u,v) can be written in polar coordinates A(ω_r, ξ) and then, for a circularly symmetric signal, rewritten as a Hankel transform:

A(u, v) = F{a(x, y)} = 2π ∫₀^∞ a(r) J₀(ω_r r) r dr = A(ω_r)

where ω_r² = u² + v² and tan ξ = v/u, and J₀(•) is a Bessel function of the first kind of order zero.

The inverse Hankel transform is given by:

a(r) = (1/2π) ∫₀^∞ A(ω_r) J₀(ω_r r) ω_r dω_r
The Fourier transform of a circularly symmetric 2D signal is a function of only the radial
frequency, ωr. The dependence on the angular frequency, ξ, has vanished.
Further, if a(x,y) = a(r) is real, then it is automatically even due to the circular symmetry.
According to equation (19), A(ωr) will then be real and even.
3.4.3 Examples of 2D signals and transforms
Table 4 shows some basic and useful signals and their 2D Fourier transforms. In
using the table entries in the remainder of this chapter we will refer to a spatial
domain term as the point spread function (PSF) or the 2D impulse response and its
Fourier transforms as the optical transfer function (OTF) or simply transfer
function. Two standard signals used in this table are u(•), the unit step function, and
J1(•), the Bessel function of the first kind. Circularly symmetric signals are treated
as functions of r as in eq. (23).
As noted in the preceding sections of this chapter, the reverse address computation process
usually results in an address result lying between known pixel values of an input image. Thus it
is necessary to estimate the unknown pixel amplitude from its known neighbors. This process is
related to the image reconstruction task, as described in Chapter 4, in which a space-continuous display is generated from an array of image samples. However, the geometrical
resampling process is usually not spatially regular. Furthermore, the process is discrete to
discrete; only one output pixel is produced for each input address.
In this section, consideration is given to the general geometrical resampling process in which
output pixels are estimated by interpolation of input pixels. The special, but common case, of
image magnification by an integer zooming factor is also discussed. In this case, it is possible to
perform pixel estimation by convolution.
4.1. Interpolation Methods
The simplest form of resampling interpolation is to choose the amplitude of an output image
pixel to be the amplitude of the input pixel nearest to the reverse address. This process, called
nearest-neighbor interpolation, can result in a spatial offset error of as much as one-half pixel unit. The
resampling interpolation error can be significantly reduced by utilizing all four nearest
neighbors in the interpolation. A common approach, called bilinear interpolation, is to
interpolate linearly along each row of an image and then interpolate that result linearly in the
columnar direction. The estimated pixel is easily found to be
F(p2, q2)=(1 – a)[(1 – b)F(p, q)+ bF(p, q + 1)]
+ a[(1 – b)F(p+ 1, q)+ bF(p + 1, q + 1)]
Although the horizontal and vertical interpolation operations are each linear, in general their sequential application results in a nonlinear surface fit between the four neighboring pixels.
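The bilinear formula above can be checked with a short sketch; the 2×2 array `F` and the offsets are illustrative:

```python
import numpy as np

def bilinear(F, p, q, a, b):
    """Bilinear estimate at fractional offsets (a, b) from pixel (p, q):
    F(p',q') = (1-a)[(1-b)F(p,q) + bF(p,q+1)]
             +     a[(1-b)F(p+1,q) + bF(p+1,q+1)]"""
    return ((1 - a) * ((1 - b) * F[p, q] + b * F[p, q + 1])
            + a * ((1 - b) * F[p + 1, q] + b * F[p + 1, q + 1]))

F = np.array([[0., 10.],
              [20., 30.]])
center = bilinear(F, 0, 0, 0.5, 0.5)   # midpoint of the four neighbors
```

At the exact grid positions (a = b = 0 or 1) the formula reduces to the corner pixel values, and at the center it gives their average, confirming the weighting scheme.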
The expression for bilinear interpolation of Eq. 4.5-1 can be generalized for any
interpolation function that is zero-valued outside the range of sample spacing. With this
generalization, interpolation can be considered as the summing of four weighted interpolation
functions as given by
F(p2, q2)= F(p, q)R{–a}R{b}+F(p, q+ 1)R{–a}R{–(1 – b)}
+F(p + 1, q)R{1 – a}R{b}+F(p+ 1, q + 1)R{1 – a}R{–(1 – b)}
In the special case of linear interpolation, R{·} is the linear interpolation function defined in Eq. 4.3-2. Typically, for reasons of computational complexity, resampling interpolation is limited to a 4 × 4 pixel neighborhood.
Figure 13.5-2 defines a generalized bicubic interpolation neighborhood in which the pixel F(p, q) is the nearest neighbor to the pixel to be interpolated. The interpolated pixel may be expressed in the compact form

F(p2, q2) = Σₘ Σₙ F(p + m, q + n) R_C{(m − a)} R_C{−(n − b)}

where R_C denotes a bicubic interpolation function such as a cubic B-spline or cubic interpolation function, as defined in Section 4.3-2.
4.2. Convolution Methods
When an image is to be magnified by an integer zoom factor, pixel estimation can be
implemented efficiently by convolution (12). As an example, consider image magnification by a
factor of 2:1. This operation can be accomplished in two stages. First, the input image is
transferred to an array in which rows and columns of zeros are interleaved with the input
image data as follows:
FIGURE 4-3. Interpolation kernels for 2:1 magnification.

FIGURE 4-4. Image interpolation on the mandrill_mon image for 2:1 magnification: (a) original; (b) zero-interleaved quadrant; (c) peg; (d) pyramid; (e) bell; (f) cubic B-spline.
Next, the zero-interleaved neighborhood image is convolved with one of the discrete
interpolation kernels listed in Figure 4-3. Figure 4-4 presents the magnification results for
several interpolation kernels. The inevitable visual trade-off between the interpolation error
(the jaggy line artifacts) and the loss of high spatial frequency detail in the image is apparent
from the examples.
This discrete convolution operation can easily be extended to higher-order magnification
factors. For N:1 magnification, the core kernel is a peg array. For large kernels it may be more computationally efficient to perform the interpolation indirectly by Fourier domain filtering rather than by convolution (6).
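The two-stage procedure (zero interleaving followed by kernel convolution) can be sketched as below. The pyramid (bilinear) kernel values and the tiny 2×2 input are illustrative assumptions:

```python
import numpy as np

def zoom2x(img, kernel):
    """2:1 magnification: interleave rows/columns of zeros, then
    convolve with an interpolation kernel ('same'-size output)."""
    M, N = img.shape
    z = np.zeros((2 * M, 2 * N))
    z[::2, ::2] = img                       # zero-interleaved image
    kh, kw = kernel.shape
    pad = np.pad(z, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(z)
    k = kernel[::-1, ::-1]                  # flip kernel for convolution
    for i in range(2 * M):
        for j in range(2 * N):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k)
    return out

# Pyramid (bilinear) kernel for 2:1 zoom.
pyramid = np.array([[0.25, 0.5, 0.25],
                    [0.5,  1.0, 0.5],
                    [0.25, 0.5, 0.25]])
img = np.array([[1., 3.],
                [5., 7.]])
big = zoom2x(img, pyramid)
```

Original samples pass through unchanged (kernel center is 1.0), while interleaved zeros are filled with averages of their neighbors, which is exactly the bilinear fill the pyramid kernel implements.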
The use of digital processing techniques for image improvement has received much interest
with the publicity given to applications in space imagery and medical research. Other
applications include image improvement for photographic surveys and industrial radiographic analysis.
Image improvement is a term coined to denote three types of image manipulation processes:
image enhancement, image restoration, and geometrical image modification.
Image enhancement entails operations that improve the appearance to a human viewer, or
operations to convert an image to a format better suited to machine processing. Image
restoration has commonly been defined as the modification of an observed image in order to
compensate for defects in the imaging system that produced the observed image. Geometrical
image modification includes image magnification, minification, rotation, and nonlinear spatial warping.
Chapter 4 describes several techniques of monochrome and color image enhancement. The
chapters that follow develop models for image formation and restoration, and present methods
of point and spatial image restoration. The final chapter of this part considers geometrical
image modification.
Image enhancement processes consist of a collection of techniques that seek to improve the
visual appearance of an image or to convert the image to a form better suited for analysis by a
human or a machine. In an image enhancement system, there is no conscious effort to improve
the fidelity of a reproduced image with regard to some ideal form of the image, as is done in
image restoration. Actually, there is some evidence to indicate that often a distorted image, for
example, an image with amplitude overshoot and undershoot about its object edges, is more
subjectively pleasing than a perfectly reproduced original.
For image analysis purposes, the definition of image enhancement stops short of information
extraction. As an example, an image enhancement system might emphasize the edge outline of
objects in an image by high-frequency filtering. This edge-enhanced image would then serve as
an input to a machine that would trace the outline of the edges, and perhaps make
measurements of the shape and size of the outline. In this application, the image enhancement
processor would emphasize salient features of the original image and simplify the processing
task of a data extraction machine.
There is no general unifying theory of image enhancement at present because there is no
general standard of image quality that can serve as a design criterion for an image
enhancement processor. Consideration is given here to a variety of techniques that have
proved useful for human observation improvement and image analysis.
One of the most common defects of photographic or electronic images is poor contrast resulting
from a reduced, and perhaps nonlinear, image amplitude range. Image contrast can often be
improved by amplitude rescaling of each pixel (1,2).
Figure 5.1-1a illustrates a transfer function for contrast enhancement of a typical continuous
amplitude low-contrast image. For continuous amplitude images, the transfer function operator
can be implemented by photographic techniques, but it is often difficult to realize an arbitrary
transfer function accurately. For quantized amplitude images, implementation of the transfer
function is a
FIGURE 5.1-1. Continuous and quantized image contrast enhancement.
relatively simple task. However, in the design of the transfer function operator, consideration must be given to the effects of amplitude quantization. With reference to Figure 5.1-1b, suppose
that an original image is quantized to J levels, but it occupies a smaller range. The output image
is also assumed to be restricted to J levels, and the mapping is linear. In the mapping strategy
indicated in Figure 5.1-1b, the output level chosen is that level closest to the exact mapping of
an input level. It is obvious from the diagram that the output image will have unoccupied levels
within its range, and some of the gray scale transitions will be larger than in the original image.
The latter effect may result in noticeable gray scale contouring. If the output image is quantized
to more levels than the input image, it is possible to approach a linear placement of output
levels, and hence, decrease the gray scale contouring effect.
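The quantized linear mapping described above can be sketched as follows; the input occupying only the range 100–130 of a 256-level scale is an illustrative assumption:

```python
import numpy as np

def stretch(img, J=256):
    """Linearly map the occupied range [img.min(), img.max()] onto the
    full range 0 .. J-1, choosing the nearest output level."""
    lo, hi = int(img.min()), int(img.max())
    out = np.rint((img.astype(float) - lo) * (J - 1) / (hi - lo))
    return out.astype(int)

# Low-contrast quantized image occupying only levels 100..130.
img = np.array([[100, 110],
                [120, 130]])
out = stretch(img)
```

Because the 31 occupied input levels are spread over 256 output levels, many output levels remain unoccupied and adjacent input levels map to transitions of about 8 gray levels, which is the contouring effect the text warns about.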
FIGURE 5.1-2. Image scaling methods: (c) absolute value scaling.
5.1.1. Amplitude Scaling
A digitally processed image may occupy a range different from the range of the original image.
In fact, the numerical range of the processed image may encompass negative values, which
cannot be mapped directly into a light intensity range. Figure 5.1-2 illustrates several
possibilities of scaling an output image back into the domain of values occupied by the original
image. By the first technique, the processed image is linearly mapped over its entire range,
while by the second technique, the extreme amplitude values of the processed image are
clipped to maximum and minimum limits. The second technique is often subjectively
preferable, especially for images in which a relatively small number of pixels exceed the limits.
Contrast enhancement algorithms often possess an option to clip a fixed percentage of the
amplitude values on each end of the amplitude scale. In medical image enhancement
applications, the contrast modification operation shown in Figure 5.1-2b, for b ≥ a, is called a window-level transformation. The window value is the width of the linear slope, b − a; the level is located at the midpoint c of the slope line. The third technique of amplitude scaling, shown in Figure 5.1-2c, utilizes an absolute value transformation for visualizing an image with negatively valued pixels. This is a useful transformation for systems that utilize two's complement arithmetic.
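The three scaling techniques can be sketched as point operations on a small signed array; the array values and clip limits below are illustrative:

```python
import numpy as np

def full_range(img):
    """First technique: linear map of the entire range onto [0, 1]."""
    return (img - img.min()) / (img.max() - img.min())

def clipped(img, lo, hi):
    """Second technique: clip to [lo, hi], then map onto [0, 1]."""
    c = np.clip(img, lo, hi)
    return (c - lo) / (hi - lo)

def absolute(img):
    """Third technique: absolute-value scaling for signed images."""
    m = np.abs(img)
    return m / m.max()

# A processed image with negative values (e.g. output of a filter).
img = np.array([-4.0, -1.0, 0.0, 2.0, 8.0])
```

With a few outliers, `clipped` devotes the display range to the bulk of the pixels, which is why it is often subjectively preferable to `full_range`.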
FIGURE 5.1-4. Window-level contrast stretching of an earth satellite image: (e) min. clip = 0.24, max. clip = 0.35; (f) enhancement histogram.
5.1.2. Contrast Modification
Section 5.1.1 dealt with amplitude scaling of images that do not properly utilize the dynamic
range of a display; they may lie partly outside the dynamic range or occupy only a portion of the
dynamic range. In this section, attention is directed to point transformations that modify the
contrast of an image within a display's dynamic range.
Figure 10.1-5a contains an original image of a jet aircraft that has been digitized to 256
gray levels and numerically scaled over the range of 0.0 (black) to 1.0 (white).
(a) Original; (b) original histogram; (c) cube function; (d) cube output.

FIGURE 5.1-6. Square and cube contrast modification of the jet_mon image.
The power-law point transformation is

G(j, k) = [F(j, k)]^p

where 0.0 ≤ F(j, k) ≤ 1.0 represents the original image and p is the power law variable. It is important that the amplitude limits of Eq. 5.1-1 be observed; processing of the integer code (e.g., 0 to 255) by Eq. 5.1-1 will give erroneous results. The square function provides the best
visual result. The rubber band transfer function shown in Figure 5.1-8a provides a simple
piecewise linear approximation to the power law curves. It is often useful in interactive
enhancement machines in which the inflection point is interactively placed.
The Gaussian error function behaves like a square function for low-amplitude pixels and like a square root function for high-amplitude pixels. It is defined as

G(j, k) = [erf{(F(j, k) − 0.5)/√2} + erf{0.5/√2}] / [2 erf{0.5/√2}]
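The caution about amplitude limits can be demonstrated with a short sketch of the power-law transformation; the sample gray codes are illustrative:

```python
import numpy as np

def power_law(F, p):
    """G = F**p on amplitudes normalized to [0.0, 1.0] (Eq. 5.1-1).
    Applying this directly to integer codes 0..255 would be wrong,
    so normalize first and rescale afterwards if needed."""
    F = np.asarray(F, dtype=float)
    return np.clip(F, 0.0, 1.0) ** p

codes = np.array([0, 64, 128, 255])
F = codes / 255.0              # normalize BEFORE the point transform
G_square = power_law(F, 2.0)   # square contrast modification
G_cube = power_law(F, 3.0)     # cube contrast modification
```

Both transforms fix the endpoints 0.0 and 1.0 and darken midtones, with the cube darkening more strongly than the square, matching Figure 5.1-6.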
Image analysis is concerned with the extraction of measurements, data or information from an
image by automatic or semiautomatic methods. In the literature, this field has been called
image data extraction, scene analysis, image description, automatic photo interpretation,
image understanding, and a variety of other names.
Image analysis is distinguished from other types of image processing, such as coding,
restoration, and enhancement, in that the ultimate product of an image analysis system is
usually numerical output rather than a picture.
Most discrete image processing computational algorithms are linear in nature; an output image
array is produced by a weighted linear combination of elements of an input array. The
popularity of linear operations stems from the relative simplicity of spatial linear processing as
opposed to spatial nonlinear processing. However, for image processing operations,
conventional linear processing is often computationally infeasible without efficient
computational algorithms because of the large image arrays. This chapter considers indirect
computational techniques that permit more efficient linear processing than by conventional methods.
Two-dimensional linear transformations have been defined in Section 5.4 in series form as
P(m1, m2) = Σ Σ F(n1, n2) T(n1, n2; m1, m2)
and defined in vector form as

p = T f
It will now be demonstrated that such linear transformations can often be computed more
efficiently by an indirect computational procedure utilizing two-dimensional unitary transforms
than by the direct computation indicated by Eq. 6.1-1 or 6.1-2.
FIGURE 6.1-1. Direct processing and generalized linear filtering; series formulation.
Figure 6.1-1 is a block diagram of the indirect computation technique called generalized linear
filtering (1). In the process, the input array undergoes a two-dimensional unitary
transformation, resulting in an array of transform coefficients F̃(u1, u2). Next, a linear combination of these coefficients is taken according to the general relation

F̃̃(w1, w2) = Σ Σ F̃(u1, u2) T(u1, u2; w1, w2)

where T(u1, u2; w1, w2) represents the linear filtering transformation function. Finally, an inverse unitary transformation is performed to reconstruct the processed array P(m1, m2). If this computational procedure is to be more efficient than direct computation by Eq. 6.1-1, it is necessary that fast computational algorithms exist for the unitary transformation, and also the kernel T must be reasonably sparse; that is, it must contain many zero elements.
The generalized linear filtering process can also be defined in terms of vector-space computations, as shown in Figure 6.1-2. For notational simplicity, let N1 = N2 = N and M1 = M2 = M. Then the generalized linear filtering process can be described by the equations

f̃ = AN2 f
g̃ = T f̃
p = [BM2]^-1 g̃
FIGURE 6.1-2. Direct processing and generalized linear filtering; vector formulation.
where AN2 is an N2 x N2 unitary transform matrix, T is an M2 x N2 linear filtering transform operation, and BM2 is an M2 x M2 unitary transform matrix. Combining these equations, the input and output vectors are related by

p = [BM2]^-1 T AN2 f
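The transform, filter, inverse-transform chain above can be sketched numerically. The following is a minimal one-dimensional sketch (sizes, the impulse response, and variable names are illustrative assumptions), using the unitary DFT as the transform pair and a diagonal T, which corresponds to circulant convolution in the data domain:

```python
import numpy as np

# Minimal 1-D sketch of generalized linear filtering, p = B^(-1) T A f,
# with the unitary DFT as the transform pair (names are illustrative).
N = 8
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)  # unitary DFT matrix
B = A                                                      # same transform on the output side

# A diagonal T corresponds to circulant convolution in the data domain;
# its diagonal holds the unnormalized DFT of the impulse response h.
h = np.array([1.0, 2.0, 1.0] + [0.0] * (N - 3))
T = np.diag(np.sqrt(N) * (A @ h))

f = np.random.default_rng(0).standard_normal(N)

# Indirect (transform domain) computation ...
p_indirect = np.linalg.inv(B) @ (T @ (A @ f))

# ... agrees with direct circulant convolution by the matrix C.
C = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])
p_direct = C @ f
assert np.allclose(p_indirect, p_direct)
```

The efficiency argument in the text is visible here: T is diagonal (maximally sparse), so the transform-domain filtering step costs only N multiplications.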
The superposition operations can often be performed more efficiently by transform domain
processing rather than by direct processing. Figure 6.2-1a and b illustrate block diagrams of the
computational steps involved in direct finite area or sampled image superposition. In Figure
6.2-1d and e, an alternative form of processing is illustrated in which a unitary transformation
operation is performed on the data vector f before multiplication by a finite area filter matrix D
or sampled image filter matrix B . An inverse transform reconstructs the output vector. From
Figure 6.2-1, for finite-area superposition, because

q = D f and q = [AM2]^-1 D̃ AN2 f

then clearly the finite-area filter matrix may be expressed as

D̃ = AM2 D [AN2]^-1

where D̃ denotes the transform domain filter matrix.
(a) Finite length convolution
(b) Sampled data convolution
(c) Circulant convolution
FIGURE 6.2-2. One-dimensional Fourier and Hadamard domain convolution matrices.
Figure 6.2-2 shows the Fourier and Hadamard domain filter matrices for the three forms of
convolution for a one-dimensional input vector and a Gaussian-shaped impulse response (6). As
expected, the transform domain representations are much more sparse than the data domain
representations. Also, the Fourier domain circulant convolution filter is seen to be of diagonal
form. Figure 6.2-3 illustrates the structure of the three convolution matrices for two-dimensional convolution (4).
As noted previously, the equivalent output vector for either finite-area or sampled image
convolution can be obtained by an element selection operation on the extended output vector
kE for circulant convolution or its matrix counterpart KE.
Spatial domain
Fourier domain
(a) Finite-area convolution
(b) Sampled image convolution
(c) Circulant convolution
FIGURE 6.2-3. Two-dimensional Fourier domain convolution matrices.
This result, combined with Eq. 6.2-13, leads to a particularly efficient means of convolution
computation indicated by the following steps:
1. Embed the impulse response matrix in the upper left corner of an all-zero J x J matrix, for J ≥ N for finite-area convolution or J ≥ M for sampled infinite-area convolution, and take the two-dimensional Fourier transform of the extended impulse response matrix, giving HE(m, n).
2. Embed the input data array in the upper left corner of an all-zero J x J matrix, and take the Fourier transform of the extended input data matrix to obtain FE(m, n).
3. Perform the scalar multiplication

KE(m, n) = J HE(m, n) FE(m, n)
4. Take the inverse Fourier transform of KE(m, n).
5. Extract the desired output matrix.
It is important that the size of the extended arrays in steps 1 and 2 be chosen large enough to
satisfy the inequalities indicated. If the computational steps are performed with J = N, the
resulting output array, shown in Figure 6.3-1, will contain erroneous terms in a boundary region
of width L – 1 elements, on the top and left-hand side of the output field. This is the
wraparound error associated with incorrect use of the Fourier domain convolution method. In
addition, for finite area (D-type) convolution, the bottom and right-hand-side strip of output
elements will be missing. If the computation is performed with J = M, the output array will be
completely filled with the correct terms for D-type convolution. To force J = M for B-type
convolution, it is necessary to truncate the bottom and right-hand side of the input array. As a
consequence, the top and left-hand-side elements of the output array are erroneous.
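The five steps above can be sketched with NumPy's FFT (the array sizes are illustrative assumptions); choosing J ≥ N + L − 1 avoids wraparound error entirely. Note that the factor of J in step 3 stems from the unitary transform normalization assumed in the text; with NumPy's unnormalized FFT convention no extra factor is needed:

```python
import numpy as np

# Sketch of steps 1-5 for an N x N input F and an L x L impulse response H;
# J >= N + L - 1 avoids wraparound error entirely.
rng = np.random.default_rng(1)
N, L = 6, 3
F = rng.standard_normal((N, N))
H = rng.standard_normal((L, L))
J = N + L - 1

# Steps 1-2: embed H and F in the upper left corner of all-zero J x J arrays
# and take their two-dimensional Fourier transforms.
HE = np.zeros((J, J)); HE[:L, :L] = H
FE = np.zeros((J, J)); FE[:N, :N] = F
HEf, FEf = np.fft.fft2(HE), np.fft.fft2(FE)

# Step 3: scalar (element-by-element) multiplication.
KEf = HEf * FEf

# Step 4: inverse Fourier transform; step 5: the J x J result KE is the
# full finite-area (D-type) convolution output.
KE = np.fft.ifft2(KEf).real
```

With J = N instead, the top and left border of KE would exhibit exactly the wraparound error described above.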
Fourier domain processing is more computationally efficient than direct processing for image convolution if the impulse response is sufficiently large. However, if the image to be processed is large, the relative computational advantage of Fourier domain processing diminishes. There are also attendant problems of computational accuracy with large Fourier transforms. Both difficulties can be alleviated by a block-mode filtering technique in which a large image is separately processed in adjacent overlapped blocks (2, 7–9).

FIGURE 6.3-5. Comparison of direct and Fourier domain processing for finite-area convolution.
Figure 6.3-6a illustrates the extraction of a pixel block from the upper left corner of a large image array. After convolution with an impulse response, the resulting pixel block is placed in the upper left corner of an output
data array as indicated in Figure 6.3-6a. Next, a second block of pixels is extracted from the
input array to produce a second block of output pixels that will lie adjacent to the first block. As
indicated in Figure 6.3-6b, this second input block must be overlapped by (L – 1) pixels in order
to generate an adjacent output block. The computational process then proceeds until all input
blocks are filled along the first row. If a partial input block remains along the row, zero-value
elements can be added to complete the block. Next, an input block, overlapped by (L –1) pixels
with the first row blocks, is extracted to produce the first block of the second output row. The
algorithm continues in this fashion until all output points are computed.
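The block-mode idea can be sketched in one dimension with the closely related overlap-add variant, in which output blocks rather than input blocks are overlapped by L − 1 samples (the block length and signal are illustrative assumptions):

```python
import numpy as np

# Sketch of block-mode filtering, overlap-add form, in 1-D: each input
# block is convolved via the FFT and the (L-1)-sample tails of adjacent
# output blocks are overlapped and summed.
rng = np.random.default_rng(2)
x = rng.standard_normal(50)      # long input signal
h = np.array([1.0, -2.0, 1.0])   # impulse response of length L
L, B = len(h), 16                # block length B, FFT length B + L - 1

y = np.zeros(len(x) + B + L)     # accumulator (trimmed at the end)
Hf = np.fft.fft(h, B + L - 1)    # transfer function, computed once
for start in range(0, len(x), B):
    block = x[start:start + B]   # a partial final block is handled naturally
    yb = np.fft.ifft(np.fft.fft(block, B + L - 1) * Hf).real
    y[start:start + B + L - 1] += yb
y = y[:len(x) + L - 1]

assert np.allclose(y, np.convolve(x, h))
```

Because each block transform has fixed, modest length, the accuracy and memory problems of one very large Fourier transform are avoided, as the text notes.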
Image restoration may be viewed as an estimation process in which operations are performed
on an observed or measured image field to estimate the ideal image field that would be
observed if no image degradation were present in an imaging system. Mathematical models are
described in this chapter for image degradation in general classes of imaging systems. These
models are then utilized in subsequent chapters as a basis for the development of image
restoration techniques.
In order to design a digital image restoration system effectively, it is necessary to quantitatively
characterize the image degradation effects of the physical imaging system, the image digitizer,
and the image display. Basically, the procedure is to model the image degradation effects and
then perform operations to undo the model to obtain a restored image. It should be
emphasized that accurate image modeling is often the key to effective image restoration. There
are two basic approaches to the modeling of image degradation effects: a priori modeling and a
posteriori modeling.
In the former case, measurements are made on the physical imaging system, digitizer, and
display to determine their response for an arbitrary image field. In some instances it will be
possible to model the system response deterministically, while in other situations it will only be
possible to determine the system response in a stochastic sense. The a posteriori modeling
approach is to develop the model for the image degradations based on measurements of a
particular image to be restored.
Basically, these two approaches differ only in the manner in which information is gathered to
describe the character of the image degradation.
FIGURE 7.1-1. Digital image restoration model.
Figure 7.1-1 shows a general model of a digital imaging system and restoration process. In the
model, a continuous image light distribution C(x, y, t, λ), dependent on spatial coordinates (x, y), time (t), and spectral wavelength (λ), is assumed to exist as the driving force of a physical imaging system
subject to point and spatial degradation effects and corrupted by deterministic and stochastic
disturbances. Potential degradations include diffraction in the optical system, sensor
nonlinearities, optical system aberrations, film nonlinearities, atmospheric turbulence effects,
image motion blur, and geometric distortion. Noise disturbances may be caused by electronic
imaging sensors or film granularity. In this model, the physical imaging system produces a set of output image fields at time instant tj described by the general relation
FO(i)(x, y, tj) = OP{C(x, y, t, λ)}
where OP{·} represents a general operator that is dependent on the space coordinates (x, y), the time history (t), the wavelength (λ), and the amplitude of the light distribution C. For a monochrome imaging system, there will only be a single output field, while for a natural color imaging system, FO(i) may denote the red, green, and blue tristimulus bands for i = 1, 2, 3, respectively.
Multispectral imagery may also involve several output bands of data.
In the general model of Figure 7.1-1, each observed image field FO(i) is digitized, following the techniques outlined in Part 3, to produce an array of image samples FS(i) at each time instant tj. The output samples of the digitizer are related to the input observed field by

FS(i)(m1, m2, tj) = OG{FO(i)(x, y, tj)}

where OG{·} is an operator modeling the image digitization process. A digital image restoration system that follows produces an output array FK(i)(k1, k2, tj) by the transformation

FK(i)(k1, k2, tj) = OR{FS(i)(m1, m2, tj)}

where OR{·} models the restoration operation. The restored image field presented for display is then
FˆI(i)(x, y, tj) = OD{FK(i)(k1, k2, tj)}

where OD{·} models the display transformation.
One of the major advances in the field of optics during the past 40 years has been the
application of system concepts to optical imaging. Imaging devices consisting of lenses, mirrors,
prisms, and so on, can be considered to provide a deterministic transformation of an input
spatial light distribution to some output spatial light distribution.
Also, the system concept can be extended to encompass the spatial propagation of light
through free space or some dielectric medium.
In the study of geometric optics, it is assumed that light rays always travel in a straight-line path
in a homogeneous medium. By this assumption, a bundle of rays passing through a clear
aperture onto a screen produces a geometric light projection of the aperture. However, if the
light distribution at the region between the light and dark areas on the screen is examined in detail, it is found that the boundary is not sharp. This effect is more pronounced as the aperture size is decreased. For a pinhole aperture, the entire screen appears diffusely illuminated. From a simplistic viewpoint, the aperture causes a bending of rays called diffraction. Diffraction of light can be quantitatively characterized by considering light as electromagnetic radiation that satisfies Maxwell's equations. The formulation of a complete theory of optical imaging from the basic electromagnetic principles of diffraction theory is a complex and lengthy task. In the following, only the key points of the formulation are presented; details may be found in References 1 to 3.

FIGURE 7.2-1. Generalized optical imaging system.
Figure 7.2-1 is a diagram of a generalized optical imaging system. A point in the object plane at coordinate (xo, yo) of intensity Io(xo, yo) radiates energy toward an imaging system characterized by an entrance pupil, exit pupil, and intervening system transformation. Electromagnetic waves emanating from the optical system are focused to a point on the image
plane, producing an intensity Ii(xi, yi). The imaging system is said to be diffraction limited if the
light distribution at the image plane produced by a point-source object consists of a converging
spherical wave whose extent is limited only by the exit pupil. If the wavefront of the
electromagnetic radiation emanating from the exit pupil is not spherical, the optical system is
said to possess aberrations.
This chapter began with an introduction to a general model of an imaging system and a digital
restoration process. Next, typical components of the imaging system were described and
modeled within the context of the general model. Now, the discussion turns to the
development of several discrete image restoration models. In the development of these
models, it is assumed that the spectral wavelength response and temporal response
characteristics of the physical imaging system can be separated from the spatial and point
characteristics. The following discussion considers only spatial and point characteristics.
After each element of the digital image restoration system of Figure 7.1-1 is modeled, following
the techniques described previously, the restoration system may be conceptually distilled to
three equations:
Observed image:
FS(m1, m2) = OM{FI(n1, n2), N1(m1, m2), ..., NN(m1, m2)}
Compensated image:
FK(k1,k2) = OR{FS(m1,m2)}
Restored image:
FˆI(n1, n2) = OD{FK(k1, k2)}
where FS represents an array of observed image samples, FI and FˆI are arrays of ideal image points and estimates, respectively, FK is an array of compensated image points from the digital restoration system, Ni denotes arrays of noise samples from various system elements, and OM{·}, OR{·}, and OD{·} represent general transfer functions of the imaging system, restoration processor, and display system, respectively.
Vector-space equivalents of Eq. 7.3-1 can be formed for purposes of analysis by column scanning of its arrays; denote these vector-space relations Eq. 7.3-2.
Several estimation approaches to the solution of 7.3-1 or 7.3-2 are described in the following
chapters. Unfortunately, general solutions have not been found; recourse must be made to
specific solutions for less general models.
The most common digital restoration model is that of Figure 7.3-1a, in which a continuous
image field is subjected to a linear blur, the electrical sensor responds nonlinearly to its input
intensity, and the sensor amplifier introduces additive Gaussian noise independent of the image field. The point nonlinearity is modeled as

fP(i) = OP{fB(i)}

which leads to an equation for the observed physical image samples in terms of points on the ideal image. Several special cases of this equation will now be defined. First, if the point nonlinearity is absent, the model reduces to

fS = B fI + nB
FIGURE 7.3-3. (a) Original; (b) impulse response; (c) observation.
where B = BPBB and nB = BPn. This is the classical discrete model consisting of a set of linear
equations with measurement uncertainty. Another case that will be defined for later discussion
occurs when the spatial blur of the physical image digitizer is negligible.
Chapter 8 contains results for several image restoration experiments based on the
restoration model defined by Eq. 7.3-6. An artificial image has been generated for these
computer simulation experiments (9). The original image used for the analysis of
underdetermined restoration techniques, shown in Figure 7.3-3a, consists of a pixel square of
intensity 245 placed against an extended background of intensity 10 referenced to an intensity
scale of 0 to 255. All images are zoomed for display purposes.
In the computer simulation restoration experiments, the observed blurred image model
has been obtained by multiplying the column-scanned original image of Figure 7.3-3a by the
blur matrix B. Next, additive white Gaussian observation noise has been simulated by adding
output variables from an appropriate random number generator to the blurred images. For
display, all image points restored are clipped to the intensity range 0 to 255.
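The simulated observation model described above can be sketched as follows; the square size and placement, the blur kernel, the boundary handling, and the noise variance are illustrative assumptions (the blur matrix B acting on the column-scanned image is represented here, equivalently, by direct convolution with a small kernel):

```python
import numpy as np

# Sketch of the simulated observation: a bright square on a dark
# background is blurred and corrupted by additive white Gaussian noise.
rng = np.random.default_rng(3)
f = np.full((32, 32), 10.0)              # background of intensity 10
f[12:20, 12:20] = 245.0                  # square of intensity 245

h = np.ones((3, 3)) / 9.0                # illustrative uniform blur kernel
pad = np.pad(f, 1, mode="edge")          # boundary handling (an assumption)
g = np.zeros_like(f)
for i in range(3):                       # direct 3x3 convolution (the role of B)
    for j in range(3):
        g += h[i, j] * pad[i:i + 32, j:j + 32]

g_noisy = g + rng.normal(0.0, 2.0, g.shape)  # additive white Gaussian noise
g_disp = np.clip(g_noisy, 0, 255)            # clip to 0-255 for display
```

The final clipping step mirrors the display convention stated in the text; restoration experiments would then attempt to recover f from g_noisy.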
Image detection is concerned with the determination of the presence or absence of objects
suspected of being in an image.
One of the most fundamental means of object detection within an image field is by template
matching, in which a replica of an object of interest is compared to all unknown objects in the
image field (1–4). If the template match between an unknown object and the template is
sufficiently close, the unknown object is labeled as the template object.
As a simple example of the template-matching process, consider the set of binary black line
figures against a white background as shown in Figure 8.1-1a. In this example, the objective is
to detect the presence and location of right triangles in the image field. Figure 8.1-1b contains a
simple template for localization of right triangles that possesses unit value in the triangular
region and zero elsewhere. The width of the legs of the triangle template is chosen as a
compromise between localization accuracy and size invariance of the template. In operation,
the template is sequentially scanned over the image field and the common region between the
template and image field is compared for similarity.
A template match is rarely ever exact because of image noise, spatial and amplitude
quantization effects, and a priori uncertainty as to the exact shape and structure of an object to
be detected. Consequently, a common procedure is to produce a difference measure D(m, n) between the template and the image field at all points of the image field, where m and n denote the trial template offset. An object is deemed to be matched wherever the difference is smaller than some established level LD(m, n). Normally, the threshold level is constant over the image field. The usual difference measure is the mean-square difference or error as defined by
D(m, n) = ∑j ∑k [F(j, k) − T(j − m, k − n)]²
where F(j,k) denotes the image field to be searched and T(j,k) is the template. The search, of
course, is restricted to the overlap region between the translated template and the image field. A
template match is then said to exist at coordinate (m,n) if
D(m, n) < LD(m, n)

Now, let the difference measure be expanded to yield

D(m, n) = D1(m, n) − 2D2(m, n) + D3(m, n)
D1(m, n) = ∑j ∑k [F(j, k)]²
D2(m, n) = ∑j ∑k F(j, k) T(j − m, k − n)
D3(m, n) = ∑j ∑k [T(j − m, k − n)]²
The term D3 (m,n) represents a summation of the template energy. It is constant valued and
independent of the coordinate (m,n) . The image energy over the window area represented by the
first term D1 (m,n) generally varies rather slowly over the image field. The second term should
be recognized as the cross correlation RFT (m,n) between the image field and the template. At
the coordinate location of a template match, the cross correlation should become large to yield a
small difference.
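The expansion above can be exercised directly. The following is a minimal sketch of template matching by the mean-square difference D(m, n); the template values, image size, and object placement are illustrative assumptions:

```python
import numpy as np

# Sketch of template matching by the mean-square difference D(m,n).
T = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])            # illustrative 3x3 template
F = np.zeros((10, 10))
F[4:7, 5:8] = T                            # image containing one exact copy

D = np.full((8, 8), np.inf)
for m in range(8):                         # scan the template over the image
    for n in range(8):
        win = F[m:m + 3, n:n + 3]
        # Equivalent to D1 - 2*D2 + D3 over the overlap window.
        D[m, n] = np.sum((win - T) ** 2)

m, n = np.unravel_index(np.argmin(D), D.shape)
assert (m, n) == (4, 5) and D[m, n] == 0.0   # exact match gives D = 0
```

At the true match location the cross-correlation term D2 is maximal and the difference measure falls to zero, as the derivation in the text indicates.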
However, the magnitude of the cross correlation is not always an adequate measure of the
template difference because the image energy term D1 (m,n) is position variant. For example,
the cross correlation can become large, even under a condition of template mismatch, if the
image amplitude over the template region is high about a particular coordinate (m,n) . This
difficulty can be avoided by comparison of the normalized cross correlation
R̃FT(m, n) = D2(m, n) / D1(m, n) = [∑j ∑k F(j, k) T(j − m, k − n)] / [∑j ∑k [F(j, k)]²]

to a threshold level LR(m, n). A template match is said to exist if
R̃FT(m, n) > LR(m, n)
The normalized cross correlation has a maximum value of unity that occurs if and only if the
image function under the template exactly matches the template.
One of the major limitations of template matching is that an enormous number of templates must
often be test matched against an image field to account for changes in rotation and magnification
of template objects. For this reason, template matching is usually limited to smaller local
features, which are more invariant to size and shape variations of an object. Such features, for
example, include edges joined in a Y or T arrangement.
Morphological image processing is a type of processing in which the spatial form or structure of objects within an image is modified. Dilation, erosion, and skeletonization are three
fundamental morphological operations. With dilation, an object grows uniformly in spatial
extent, whereas with erosion an object shrinks uniformly. Skeletonization results in a stick
figure representation of an object. The basic concepts of morphological image processing trace
back to the research on spatial set algebra by Minkowski (1) and the studies of Matheron (2) on
topology. Serra (3–5) developed much of the early foundation of the subject. Steinberg (6,7)
was a pioneer in applying morphological methods to medical and industrial vision applications.
This research work led to the development of the cytocomputer for high-speed morphological
image processing (8,9). In the following sections, morphological techniques are first described
for binary images. Then these morphological concepts are extended to gray scale images.
Binary image morphological operations are based on the geometrical relationship or
connectivity of pixels that are deemed to be of the same class (10,11). In the binary image of
Figure 9.1-1a, the ring of black pixels, by all reasonable definitions of connectivity, divides the
image into three segments: the white pixels exterior to the ring, the white pixels interior to the
ring, and the black pixels of the ring itself. The pixels within each segment are said to be
connected to one another. This concept of connectivity is easily understood for Figure 9.1-1a,
but ambiguity arises when considering Figure 9.1-1b. Do the black pixels still define a ring, or do
they instead form four disconnected lines? The answers to these questions depend on the
definition of connectivity.
FIGURE 9.1-1. Connectivity.
Consider the following neighborhood pixel pattern:
X3 X2 X1
X4 X X0
X5 X6 X7
in which a binary-valued pixel X, where X = 0 (white) or X = 1 (black), is surrounded by its eight nearest neighbors X0, X1, ..., X7. An alternative nomenclature is to label the neighbors by compass directions: north, northeast, and so on. Pixel X is said to be four-connected to a neighbor if it is a logical 1 and if its east, north, west, or south neighbor is a logical 1. Pixel X is said to be eight-connected if it is a logical 1 and if its north, northeast, etc. neighbor is a logical 1.
The connectivity relationship between a center pixel and its eight neighbors can be quantified by the concept of a pixel bond, the sum of the bond weights between the center pixel and each of its neighbors. Each four-connected neighbor has a bond of two, and each eight-connected neighbor has a bond of one. In the following example, the pixel bond is seven.
1 1 1
0 X 0
1 1 0
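The bond computation can be sketched as a weighted sum over the neighborhood; the pattern below is the example above, which has a bond of seven:

```python
import numpy as np

# Pixel-bond computation: four-connected neighbors weigh 2,
# eight-connected (diagonal) neighbors weigh 1.
neighborhood = np.array([[1, 1, 1],
                         [0, 0, 0],   # center entry is ignored (weight 0)
                         [1, 1, 0]])
weights = np.array([[1, 2, 1],
                    [2, 0, 2],
                    [1, 2, 1]])
bond = int(np.sum(neighborhood * weights))
assert bond == 7
```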
FIGURE 9.1-2. Pixel neighborhood connectivity definitions.
Under the definition of four-connectivity, Figure 9.1-1b has four disconnected black line
segments, but with the eight-connectivity definition, Figure 9.1-1b has a ring of connected black
pixels. Note, however, that under eight-connectivity, all white pixels are connected together.
Thus a paradox exists. If the black pixels are to be eight-connected together in a ring, one would
expect a division of the white pixels into pixels that are interior and exterior to the ring. To
eliminate this dilemma, eight-connectivity can be defined for the black pixels of the object, and
four-connectivity can be established for the white pixels of the background. Under this
definition, a string of black pixels is said to be minimally connected if elimination of any black
pixel results in a loss of connectivity of the remaining black pixels. Figure 9.1-2 provides
definitions of several other neighborhood connectivity relationships between a center black
pixel and its neighboring black and white pixels.
The preceding definitions concerning connectivity have been based on a discrete image model
in which a continuous image field is sampled over a rectangular array of points. Golay (12) has
utilized a hexagonal grid structure. With such a structure, many of the connectivity problems
associated with a rectangular grid are eliminated.
In a hexagonal grid, neighboring pixels are said to be six-connected if they are in the same set
and share a common edge boundary. Algorithms have been developed for the linking of
boundary points for many feature extraction tasks (13). However, two major drawbacks have
hindered wide acceptance of the hexagonal grid. First, most image scanners are inherently
limited to rectangular scanning. The second problem is that the hexagonal grid is not well
suited to many spatial processing operations, such as convolution and Fourier transformation.
The two basic morphological operations, dilation and erosion, plus many variants can be
defined and implemented by hit-or-miss transformations (3). The concept is quite simple.
Conceptually, a small odd-sized mask, typically 3 x 3, is scanned over a binary image. If the
binary-valued pattern of the mask matches the state of the pixels under the mask (hit), an
output pixel in spatial correspondence to the center pixel of the mask is set to some desired
binary state. For a pattern mismatch (miss), the output pixel is set to the opposite binary state.
For example, to perform simple binary noise cleaning, if the isolated 3 x 3 pixel pattern

0 0 0
0 1 0
0 0 0

is encountered, the output pixel is set to zero; otherwise, the output pixel is set to the state of the input center pixel. In more complicated morphological algorithms, a large number of the 2^9 = 512 possible mask patterns may cause hits.
It is often possible to establish simple neighborhood logical
relationships that define the conditions for a hit. In the isolated pixel removal example, the
defining equation for the output pixel G(j, k) becomes

G(j, k) = X ∩ (X0 ∪ X1 ∪ ... ∪ X7)

where ∩ denotes the intersection operation (logical AND) and ∪ denotes the union operation (logical OR). For complicated algorithms, the logical equation method of definition can be
cumbersome. It is often simpler to regard the hit masks as a collection of binary patterns.
Hit-or-miss morphological algorithms are often implemented in digital image processing
hardware by a pixel stacker followed by a look-up table (LUT), as shown in Figure 9.2-1 (14).
Each pixel of the input image is a positive integer, represented by a conventional binary code,
whose most significant bit is a 1 (black) or a 0 (white). The pixel stacker extracts the bits of the
center pixel X and its eight neighbors and puts them in a neighborhood pixel stack. Pixel
stacking can be performed by convolution with the 3 x 3 pixel kernel

2^3 2^2 2^1
2^4 2^8 2^0
2^5 2^6 2^7

The binary number state of the neighborhood pixel stack becomes the numeric input address of the LUT, whose entry is the output pixel Y. For isolated pixel removal, integer entry 256, corresponding to the neighborhood pixel stack state 100000000, contains Y = 0; all other entries contain Y = X.
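The stacker-plus-LUT scheme can be sketched as follows for isolated pixel removal; the bit assignment (center = 2^8, neighbor Xi = 2^i) is an assumption chosen so that stack state 100000000 addresses LUT entry 256, consistent with the text:

```python
import numpy as np

# Sketch of the pixel-stacker / look-up-table implementation of
# isolated pixel removal on a binary image.
def stack_address(img, j, k):
    # Neighbors X0..X7 counterclockwise from east, as in the text.
    offs = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
            (0, -1), (1, -1), (1, 0), (1, 1)]
    addr = img[j, k] << 8                  # center pixel is the MSB
    for i, (dj, dk) in enumerate(offs):
        addr |= img[j + dj, k + dk] << i   # neighbor Xi contributes bit 2^i
    return addr

lut = np.arange(512) >> 8                  # default entry: Y = center pixel X
lut[256] = 0                               # state 100000000 (isolated pixel) -> erase

img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1                              # an isolated black pixel
out = np.zeros_like(img)
for j in range(1, 4):
    for k in range(1, 4):
        out[j, k] = lut[stack_address(img, j, k)]
assert out.sum() == 0                      # the isolated pixel is removed
```

Any other hit-or-miss operator is obtained by reprogramming the 512-entry table rather than changing the scanning hardware, which is the appeal of the LUT architecture.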
FIGURE 9.2-1. Look-up table flowchart for binary unconditional operations.
Several other 3 x 3 hit-or-miss operators are described in the following subsections.
9.2.1. Additive Operators
Additive hit-or-miss morphological operators cause the center pixel of a 3 x 3 pixel window to be converted from a logical 0 state to a logical 1 state if the neighboring pixels meet certain predetermined conditions. The basic operators are now defined.
Interior Fill. Create a black pixel if all four-connected neighbor pixels are black.
G(j, k) =X ∪ [X0 ∩X2 ∩X4 ∩X6 ]
Diagonal Fill. Create a black pixel if creation eliminates the eight-connectivity of the background.

G(j, k) = X ∪ P1 ∪ P2 ∪ P3 ∪ P4

where

P1 = X̄ ∩ X0 ∩ X̄1 ∩ X2
P2 = X̄ ∩ X2 ∩ X̄3 ∩ X4
P3 = X̄ ∩ X4 ∩ X̄5 ∩ X6
P4 = X̄ ∩ X6 ∩ X̄7 ∩ X0

and X̄ denotes the logical complement of X.
There are 132 such qualifying patterns. This stratagem will not prevent connection of two objects separated by two rows or columns of white pixels. A solution to this problem is considered in Section 9.3. Figure 9.2-3 provides an example of fattening.
(a) Original image
(b) One iteration
(c) Three iterations
FIGURE 9.2-2. Dilation of a binary image.
9.2.2. Subtractive Operators
Subtractive hit-or-miss morphological operators cause the center pixel of a 3 x 3 window to be
converted from black to white if its neighboring pixels meet predetermined conditions. The
basic subtractive operators are defined below.
Isolated Pixel Remove: Erase a black pixel with eight white neighbors.
G(j, k) = X ∩ (X0 ∪ X1 ∪ X2 ∪ ... ∪ X7)
Spur Remove: Erase a black pixel with a single eight-connected neighbor.
FIGURE 9.2-3. Fattening of a binary image
The following is one of four qualifying patterns:
0 0 0
0 1 0
1 0 0
Interior Pixel Remove: Erase a black pixel if all four-connected neighbors are black.
G(j, k) = X ∩ (X̄0 ∪ X̄2 ∪ X̄4 ∪ X̄6)
There are 16 qualifying patterns.
H-Break: Erase a black pixel that is H-connected. There are two qualifying patterns.
1 1 1
0 1 0
1 1 1

1 0 1
1 1 1
1 0 1
Eight-Neighbor Erode: Erase a black pixel if at least one eight-connected neighbor pixel is white.

G(j, k) = X ∩ X0 ∩ X1 ∩ ... ∩ X7
(a) Original image
(b) One iteration
(c) Three iterations
FIGURE 9.2-4. Erosion of a binary image.
A generalized erosion operator is defined in Section 9.4. Recursive application of the erosion
operator will eventually erase all black pixels. Figure 9.2-4 shows results for one and three
iterations of the erode operator. The eroded pixels are midgray. It should be noted that after
three iterations, the ring is totally eroded.
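The eight-neighbor erode can be sketched directly from its definition (a black pixel survives only if all eight neighbors are black); as noted above, repeated application eventually erases everything. The image sizes here are illustrative assumptions:

```python
import numpy as np

# Eight-neighbor erode, G = X AND X0 AND ... AND X7: a black pixel
# survives only if its entire 3x3 neighborhood is black.
def erode8(img):
    out = np.zeros_like(img)
    for j in range(1, img.shape[0] - 1):
        for k in range(1, img.shape[1] - 1):
            out[j, k] = img[j - 1:j + 2, k - 1:k + 2].all()
    return out

img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 1                        # a 5x5 black square
once = erode8(img)
assert once.sum() == 9                   # one iteration leaves a 3x3 square
assert erode8(erode8(once)).sum() == 0   # repeated erosion erases everything
```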
9.2.3. Majority Black Operator
The following is the definition of the majority black operator:
Majority Black. Create a black pixel if five or more pixels in a 3 x 3 window are black; otherwise, set the output pixel to white. The majority black operator is useful for filling small holes in objects and closing short gaps in strokes.
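A sketch of the majority black operator, showing how it fills a one-pixel hole in an object (the image size is an illustrative assumption):

```python
import numpy as np

# Majority black: output is black when five or more pixels of the
# 3x3 window (center included) are black.
def majority_black(img):
    out = np.zeros_like(img)
    for j in range(1, img.shape[0] - 1):
        for k in range(1, img.shape[1] - 1):
            out[j, k] = int(img[j - 1:j + 2, k - 1:k + 2].sum() >= 5)
    return out

img = np.ones((6, 6), dtype=int)
img[3, 3] = 0                            # a one-pixel hole in a solid object
assert majority_black(img)[3, 3] == 1    # the hole is filled
```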
Shrinking, thinning, skeletonizing, and thickening are forms of conditional erosion in which the
erosion process is controlled to prevent total erasure and to ensure connectivity.
9.3.1. Binary Image Shrinking
The following is a definition of shrinking:
Shrink. Erase black pixels such that an object without holes erodes to a single pixel at or near its
center of mass, and an object with holes erodes to a connected ring lying midway between
each hole and its nearest outer boundary.
A 3 x 3 pixel object will be shrunk to a single pixel at its center. A 2 x 2 pixel object will be
arbitrarily shrunk, by definition, to a single pixel at its lower right corner.
It is not possible to perform shrinking using single-stage 3 x 3 pixel hit-or-miss transforms
of the type described in the previous section. The 3 x 3 window does not provide enough
information to prevent total erasure and to ensure connectivity. A 5 x 5 hit-or-miss transform
could provide sufficient information to perform proper shrinking. But such an approach would result in excessive computational complexity (i.e., 2^25 possible patterns to be examined!). References 15 and 16 describe two-stage shrinking and thinning algorithms that perform a
conditional marking of pixels for erasure in a first stage, and then examine neighboring marked
pixels in a second stage to determine which ones can be unconditionally erased without total
erasure or loss of connectivity. The following algorithm developed by Pratt and Kabir (17) is a
pipeline processor version of the conditional marking scheme.
In the algorithm, two concatenated 3 x 3 hit-or-miss transformations are performed to obtain
indirect information about pixel patterns within a 5 x 5 window. Figure 9.3-1 is a flowchart for
the look-up table implementation of this algorithm.
In the first stage, the states of nine neighboring pixels are gathered together by a pixel stacker,
and a following look-up table generates a conditional mark M for possible erasures. Table 9.3-1
lists all patterns, as indicated by the letter S in the table column, which will be conditionally
marked for erasure. In the second stage of the algorithm, the center pixel X and the conditional
marks in a 3x3 neighborhood centered about X are examined to create an output pixel. The
shrinking operation can be expressed logically as
G(j, k) = X ∩ [M̄ ∪ P(M, M0, …, M7)]
where P(M, M0, …, M7) is an erasure inhibiting logical variable, as defined in Table 9.3-2. The first
four patterns of the table prevent strokes of single pixel width from being totally erased. The
remaining patterns inhibit erasure that would break object connectivity. There are a total of
157 inhibiting patterns. This two-stage process must be performed iteratively until there are no
further erasures.
As an example, the square 2 x 2 pixel object
results in the following intermediate array of conditional marks
The corner cluster pattern of Table 9.3-2 gives a hit only for the lower right corner mark. The resulting output is
FIGURE 9.3-1. Look-up table flowchart for binary conditional mark operations.
9.3.2. Binary Image Thinning
The following is a definition of thinning:
Thin. Erase black pixels such that an object without holes erodes to a minimally connected
stroke located equidistant from its nearest outer boundaries, and an object with holes erodes
to a minimally connected ring midway between each hole and its nearest outer boundary.
TABLE 9.3-2. Shrink and Thin Unconditional Mark Patterns
[P(M, M0, M1, M2, M3, M4, M5, M6, M7) = 1 if hit]
(Pattern arrays not reproduced; the table groups its 3 x 3 mark patterns into the categories
single 4-connection, L cluster (thin only), 4-connected offset, spur corner cluster, corner
cluster, tee branch, vee branch, and diagonal branch.)
Table 9.3-1 lists the conditional mark patterns, as indicated by the letter T in the table column,
for thinning by the conditional mark algorithm of Figure 9.3-1. The shrink and thin
unconditional patterns are identical, as shown in Table 9.3-2.
Figure 9.3-3 contains an example of the thinning of a binary image for four and eight iterations.
Figure 9.3-4 provides an example of the thinning of an image of a printed circuit board in order
to locate solder pads that have been deposited improperly and that do not have holes for
component leads. The pads with holes erode to a minimally connected ring, while the pads
without holes erode to a point.
Thinning can be applied to the background of an image containing several objects as a means of
separating the objects. Figure 9.3-5 provides an example of the process. The original image
appears in Figure 9.3-5a, and the background-reversed image is Figure 9.3-5b. Figure 9.3-5c
shows the effect of thinning the background. The thinned strokes that separate the original
objects are minimally connected, and therefore the background of the separating strokes is
eight-connected throughout the image. This is an example of the connectivity ambiguity
discussed in Section 9.1. To resolve this ambiguity, a diagonal fill operation can be applied to the thinned
strokes. The result, shown in Figure 9.3-5d, is called the exothin of the original image. The name
derives from the exoskeleton, discussed in the following section.
FIGURE 9.3-3. Thinning of a binary image: (a) four iterations; (b) eight iterations.
Changes or discontinuities in an image amplitude attribute such as luminance or tristimulus
value are fundamentally important primitive characteristics of an image because they often
provide an indication of the physical extent of objects within the image. Local discontinuities in
image luminance from one level to another are called luminance edges. Global luminance
discontinuities, called luminance boundary segments, are considered in Section 12.4. In this
chapter the definition of a luminance edge is limited to image amplitude discontinuities
between reasonably smooth regions. Discontinuity detection between textured regions is
considered in Section 12.5. This chapter also considers edge detection in color images, as well
as the detection of lines and spots within an image.
Figure 10.1-1a is a sketch of a continuous domain, one-dimensional ramp edge modeled as a
ramp increase in image amplitude from a low to a high level, or vice versa. The edge is
characterized by its height, slope angle, and horizontal coordinate of the slope midpoint. An
edge exists if the edge height is greater than a specified value. An ideal edge detector should
produce an edge indication localized to a single pixel located at the midpoint of the slope. If the
slope angle of Figure 10.1-1a is 90°, the resultant edge is called a step edge, as shown in Figure
10.1-1b. In a digital imaging system, step edges usually exist only for artificially generated
images such as test patterns and bilevel graphics data. Digital images, resulting from digitization
of optical images of real scenes, generally do not possess step edges because the anti-aliasing
low-pass filtering prior to digitization reduces the edge slope in the digital image caused by any
sudden luminance change in the scene. The one-dimensional profile of a line is shown in Figure
10.1-1c. In the limit, as the line width w approaches zero, the resultant amplitude discontinuity
is called a roof edge.
Continuous domain, two-dimensional models of edges and lines assume that the amplitude
discontinuity remains constant in a small neighborhood orthogonal to the edge or line profile.
Figure 10.1-2a is a sketch of a two-dimensional edge. In addition to the edge parameters of a
one-dimensional edge, the orientation of the edge slope with respect to a reference axis is also
important. Figure 10.1-2b defines the edge orientation nomenclature for edges of an
octagonally shaped object whose amplitude is higher than its background.
Figure 10.1-3 contains step and unit width ramp edge models in the discrete domain. The
vertical ramp edge model in the figure contains a single transition pixel whose amplitude is at
the midvalue of its neighbors. This edge model can be obtained by performing a pixel moving
window average on the vertical step edge model. The figure also contains two versions of a
diagonal ramp edge. The single-pixel transition model contains a single midvalue transition pixel
between the regions of high and low amplitude; the smoothed transition model is generated by
a 2x2 pixel moving window average of the diagonal step edge model. Figure 10.1-3 also
presents models for a discrete step and ramp corner edge. The edge location for discrete step
edges is usually marked at the higher-amplitude side of an edge transition.
For the single-pixel transition model and the smoothed transition vertical and corner edge
models, the proper edge location is at the transition pixel. The smoothed transition diagonal
ramp edge model has a pair of adjacent pixels in its transition zone. The edge is usually marked
at the higher-amplitude pixel of the pair. In Figure 10.1-3 the edge pixels are italicized.
Discrete two-dimensional single-pixel line models are presented in Figure 10.1-4 for step lines
and unit width ramp lines. The single-pixel transition model has a mid-value transition pixel
inserted between the high value of the line plateau and the low-value background. The
smoothed transition model is obtained by performing a 2x2 pixel moving window average on
the step line model.
FIGURE 10.1-3. Two-dimensional, discrete domain edge models.
A spot, which can only be defined in two dimensions, consists of a plateau of high amplitude
against a lower amplitude background, or vice versa. Figure 10.1-5 presents single-pixel spot
models in the discrete domain.
There are two generic approaches to the detection of edges, lines, and spots in a luminance
image: differential detection and model fitting. With the differential detection approach, as
illustrated in Figure 10.1-6, spatial processing is performed on an original image F(j,k) to
produce a differential image G(j,k) with accentuated spatial amplitude changes. Next, a
differential detection operation is executed to determine the pixel locations of significant
differentials. The second general approach to edge, line, or spot detection involves fitting of a
local region of pixel values to a model of the edge, line, or spot, as represented in Figures 10.1-1
to 10.1-5. If the fit is sufficiently close, an edge, line, or spot is said to exist, and its assigned
parameters are those of the appropriate model. A binary indicator map E(j,k) is often
generated to indicate the position of edges, lines, or spots within an image. Typically, edge, line,
and spot locations are specified by black pixels against a white background.
There are two fundamental methods for generating first-order derivative edge gradients. One
method involves generation of gradients in two orthogonal directions in an image; the second
utilizes a set of directional derivatives.
FIGURE 10.1-6. Differential edge, line, and spot detection.
10.2.1. Orthogonal Gradient Generation
An edge in a continuous domain edge segment F(x, y) such as the one depicted in Figure 10.1-2a can be detected by forming the continuous one-dimensional gradient G(x, y) along a line
normal to the edge slope, which is at an angle θ with respect to the horizontal axis. If the
gradient is sufficiently large (i.e., above some threshold value), an edge is deemed present. The
gradient along the line normal to the edge slope can be computed in terms of the derivatives
along orthogonal axes according to the following (1, p. 106)
G(x, y) = (∂F/∂x) cos θ + (∂F/∂y) sin θ
Figure 10.2-1 describes the generation of an edge gradient in the discrete domain in
terms of a row gradient GR(j, k) and a column gradient GC(j, k). The spatial gradient amplitude is given by
G(j, k) = [GR(j, k)^2 + GC(j, k)^2]^(1/2)
For computational efficiency, the gradient amplitude is sometimes approximated by the
magnitude combination
G(j, k) = |GR(j, k)| + |GC(j, k)|
FIGURE 10.2-1. Orthogonal gradient generation.
FIGURE 10.2-2. Horizontal and vertical differencing gradients of the peppers_mon image: (a) original; (b) horizontal magnitude; (c) vertical magnitude.
Diagonal edge gradients can be obtained by forming running differences of diagonal pairs of
pixels. This is the basis of the Roberts (2) cross-difference operator, which is defined in
magnitude form as
G(j, k) = |G1(j, k)| + |G2(j, k)|
and in square-root form as
G(j, k) = [G1(j, k)^2 + G2(j, k)^2]^(1/2)
where
G1(j, k) = F(j, k) − F(j + 1, k + 1)
G2(j, k) = F(j, k + 1) − F(j + 1, k)
Figure 10.2-3 presents the edge gradients of the peppers image for the Roberts operators.
Visually, the objects in the image appear to be slightly better distinguished with the Roberts
square-root gradient than with the magnitude gradient. In Section 10.5, a quantitative evaluation
of edge detectors confirms the superiority of the square-root combination technique.
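The two Roberts combinations can be sketched directly from the cross differences (pure Python; the 4 x 4 step-edge test image is illustrative):

```python
import math

def roberts(F):
    """Roberts cross-difference gradient in magnitude and square-root
    forms; the last row and column are left at zero."""
    rows, cols = len(F), len(F[0])
    mag = [[0.0] * cols for _ in range(rows)]
    sq = [[0.0] * cols for _ in range(rows)]
    for j in range(rows - 1):
        for k in range(cols - 1):
            g1 = F[j][k] - F[j + 1][k + 1]       # one diagonal pair
            g2 = F[j][k + 1] - F[j + 1][k]       # the other diagonal pair
            mag[j][k] = abs(g1) + abs(g2)
            sq[j][k] = math.hypot(g1, g2)
    return mag, sq

# Vertical step edge of height 10: columns 0-1 dark, columns 2-3 bright.
F = [[0, 0, 10, 10] for _ in range(4)]
mag, sq = roberts(F)
```

At the edge the magnitude form gives |−10| + |10| = 20 while the square-root form gives 10√2 ≈ 14.1, illustrating why the two combinations score edges differently.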
The pixel difference method of gradient generation can be modified to localize the edge
center of the ramp edge model of Figure 10.1-3 by forming the pixel difference separated by a
null value. The row and column gradients then become
GR(j, k) = F(j, k + 1) − F(j, k − 1)
GC(j, k) = F(j − 1, k) − F(j + 1, k)
The row gradient response for a vertical ramp edge model is then
0 h/2 h h/2 0
FIGURE 10.2-3. Roberts gradients of the peppers_mon image: (a) magnitude; (b) square root.
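The separated pixel difference and its ramp-edge response can be checked with a short sketch (h = 4 is an arbitrary edge height; the one-row image is illustrative):

```python
def row_gradient(F):
    """Separated pixel difference G_R(k) = F(k+1) - F(k-1) along one
    image row; the two endpoints are omitted."""
    return [F[k + 1] - F[k - 1] for k in range(1, len(F) - 1)]

h = 4
ramp = [0, 0, 0, h // 2, h, h, h]   # single mid-value transition pixel
print(row_gradient(ramp))           # -> [0, 2, 4, 2, 0], i.e. 0 h/2 h h/2 0
```

The peak response h falls exactly on the transition pixel, which is the localization property the separated difference is designed for.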
FIGURE 10.2-4. Numbering convention for 3 x 3 edge detection operators.
Although the ramp edge is properly localized, the separated pixel difference gradient
generation method remains highly sensitive to small luminance fluctuations in the image. This
problem can be alleviated by using two-dimensional gradient formation operators that perform
differentiation in one coordinate direction and spatial averaging in the orthogonal direction.
Prewitt has introduced a 3x3 pixel edge gradient operator described by the pixel numbering
convention of Figure 10.2-4. The Prewitt operator square-root edge gradient is defined as
G(j, k) = [GR(j, k)^2 + GC(j, k)^2]^(1/2)
where
GR(j, k) = [1/(K + 2)] [(A2 + K·A3 + A4) − (A0 + K·A7 + A6)]
GC(j, k) = [1/(K + 2)] [(A0 + K·A1 + A2) − (A6 + K·A5 + A4)]
with K = 1. In this formulation, the row and column gradients are normalized to provide unit-gain positive and negative weighted averages about a separated edge position. The Sobel
operator edge detector (3, p. 271) differs from the Prewitt edge detector in that the values of
the north, south, east, and west pixels are doubled (i.e., K = 2). The motivation for this
weighting is to give equal importance to each pixel in terms of its
contribution to the spatial gradient. Frei and Chen (4) have proposed north, south, east, and
west weightings by K = √2 so that the gradient is the same for horizontal, vertical, and diagonal
edges. The edge gradient G(j,k) for these three operators along a row through the single pixel
transition vertical ramp edge model of Figure 10.1-3 is
0 h/2 h h/2 0
Along a row through the single transition pixel diagonal ramp edge model, the gradient is
0  h/(2 + K)  √2(1 + K)h/(2 + K)  h/(2 + K)  0
In the Frei–Chen operator with K = √2, the edge gradient is the same at the edge center for
the single-pixel transition vertical and diagonal ramp edge models. The Prewitt gradient for a
diagonal edge is 0.94 times that of a vertical edge. The corresponding factor for a Sobel edge
detector is 1.06. Consequently, the Prewitt operator is more sensitive to horizontal and vertical
edges than to diagonal edges; the reverse is true for the Sobel operator. The gradients along a
row through the smoothed transition diagonal ramp edge model are different for vertical and
diagonal edges for all three of the edge detectors. None of them are able to localize the edge to
a single pixel.
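The common K parameterization of the three operators can be sketched as follows; the diagonal-to-vertical center-response ratio √2(1 + K)/(2 + K) reproduces the 0.94 and 1.06 factors quoted above (the kernel helper is an illustrative correlation-mask layout):

```python
import math

def row_kernel(K):
    """3 x 3 row gradient mask family: 1/(K+2) * [[1,0,-1],[K,0,-K],[1,0,-1]].
    K = 1 gives Prewitt, K = 2 Sobel, K = sqrt(2) Frei-Chen."""
    return [[c / (K + 2) for c in row]
            for row in [[1, 0, -1], [K, 0, -K], [1, 0, -1]]]

def diag_to_vert_ratio(K):
    """Center gradient for the diagonal ramp edge divided by that for
    the vertical ramp edge: sqrt(2) * (1 + K) / (2 + K)."""
    return math.sqrt(2) * (1 + K) / (2 + K)

for name, K in [("Prewitt", 1), ("Sobel", 2), ("Frei-Chen", math.sqrt(2))]:
    print(name, round(diag_to_vert_ratio(K), 2))
```

Running the loop prints 0.94 for Prewitt, 1.06 for Sobel, and 1.0 for Frei-Chen, matching the sensitivities discussed in the text.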
FIGURE 10.2-5. Prewitt, Sobel, and Frei–Chen gradients of the peppers_mon image: (a) Prewitt; (b) Sobel; (c) Frei–Chen.
Figure 10.2-5 shows examples of the Prewitt, Sobel, and Frei–Chen gradients of the peppers
image. The reason that these operators visually appear to better delineate object edges than
the Roberts operator is attributable to their larger size, which provides averaging of small
luminance fluctuations.
The row and column gradients for all the edge detectors mentioned previously in this
subsection involve a linear combination of pixels within a small neighborhood. Consequently,
the row and column gradients can be computed by the convolution relationships
GR(j, k) = F(j, k) ⊛ HR(j, k)
GC(j, k) = F(j, k) ⊛ HC(j, k)
where HR(j, k) and HC(j, k) are 3 x 3 row and column impulse response arrays, respectively,
as defined in Figure 10.2-6. It should be noted that this specification of the gradient impulse
response arrays takes into account the 180° rotation of an impulse response array inherent to the
definition of convolution in Eq. 7.1-14.
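A minimal convolution sketch makes the 180° rotation of the impulse response array explicit (unnormalized integer Prewitt-style kernel; "valid" output region only, no border padding):

```python
def conv2_valid(F, H):
    """2D convolution: rotate H by 180 degrees, then slide and sum.
    The rotation is what distinguishes convolution from correlation."""
    Hr = [row[::-1] for row in H[::-1]]          # 180 degree rotation
    m, n = len(Hr), len(Hr[0])
    return [[sum(Hr[a][b] * F[j + a][k + b]
                 for a in range(m) for b in range(n))
             for k in range(len(F[0]) - n + 1)]
            for j in range(len(F) - m + 1)]

HR = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]        # row gradient kernel
F = [[0, 0, 9, 9],
     [0, 0, 9, 9],
     [0, 0, 9, 9]]                               # vertical step edge
GR = conv2_valid(F, HR)
print(GR)                                        # -> [[27, 27]]
```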
A limitation common to the edge gradient generation operators previously defined is their
inability to detect accurately edges in high-noise environments. This problem can be alleviated
by properly extending the size of the neighborhood operators over which the differential
gradients are computed. As an example, a Prewitt-type 7 x 7 operator has a row gradient impulse
response of the form
HR = (1/21) ×
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
An operator of this type is called a boxcar operator. Figure 10.2-7 presents the boxcar
gradient of a 7 x 7 array.
FIGURE 10.2-6. Impulse response arrays for 3 x 3 orthogonal differential gradient edge operators.
Abdou (5) has suggested a truncated pyramid operator that gives a linearly decreasing weighting to pixels away
from the center of an edge. The row gradient impulse response array for a 7 x 7 truncated pyramid operator is given by
HR = (1/34) ×
[ 1 1 1 0 −1 −1 −1 ]
[ 1 2 2 0 −2 −2 −1 ]
[ 1 2 3 0 −3 −2 −1 ]
[ 1 2 3 0 −3 −2 −1 ]
[ 1 2 3 0 −3 −2 −1 ]
[ 1 2 2 0 −2 −2 −1 ]
[ 1 1 1 0 −1 −1 −1 ]
Argyle (6) and Macleod (7,8) have proposed large neighborhood Gaussian-shaped weighting
functions as a means of noise suppression. Let
g(x, s) = (2πs^2)^(−1/2) exp{−(1/2)(x/s)^2}
denote a continuous domain Gaussian function with standard deviation s. Utilizing this notation,
the Argyle operator horizontal coordinate impulse response array can be expressed as a sampled
version of the continuous domain impulse response
HR(x, y) = −g(x, s) g(y, t)   for x < 0
HR(x, y) = g(x, s) g(y, t)    for x ≥ 0
where s and t are spread parameters. The vertical impulse response function can be expressed
similarly. The Macleod operator horizontal gradient impulse response function is given by
HR(x, y) = [g(x + s, s) − g(x − s, s)] g(y, t)
The Argyle and Macleod operators, unlike the boxcar operator, give decreasing importance to
pixels far removed from the center of the neighborhood. Figure 10.2-7 provides examples of the
Argyle and Macleod gradients.
Extended-size differential gradient operators can be considered to be compound
operators in which a smoothing operation is performed on a noisy image followed by a
differentiation operation. The compound gradient impulse response can be written as
H(j, k) = HG(j, k) ⊛ HS(j, k)
where HG(j, k) is one of the gradient impulse response operators of Figure 10.2-6 and HS(j, k) is a
low-pass filter impulse response. For example, if HG(j, k) is the 3 x 3 Prewitt row gradient operator
and HS(j, k) is a 3 x 3 uniform smoothing operator, the resultant 5 x 5 row gradient
operator, after normalization to unit positive and negative gain, becomes
HR = (1/18) ×
[ 1 1 0 −1 −1 ]
[ 2 2 0 −2 −2 ]
[ 3 3 0 −3 −3 ]
[ 2 2 0 −2 −2 ]
[ 1 1 0 −1 −1 ]
This decomposition applies in both directions. By applying the SVD/SGK
decomposition of Section 9.6, it is possible, for example, to decompose a 5 x 5 boxcar operator
into the sequential convolution of a 3 x 3 smoothing kernel and a 3 x 3 differentiating kernel.
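The compound construction can be verified with a small sketch: a full 2D convolution of the 3 x 3 Prewitt row gradient with a 3 x 3 uniform smoothing kernel reproduces the 5 x 5 operator's weight pattern (unnormalized integers here; normalization is applied separately):

```python
def full_conv2(A, B):
    """Full 2D convolution of two small integer arrays."""
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    out = [[0] * (ca + cb - 1) for _ in range(ra + rb - 1)]
    for i in range(ra):
        for j in range(ca):
            for u in range(rb):
                for v in range(cb):
                    out[i + u][j + v] += A[i][j] * B[u][v]
    return out

prewitt_row = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]   # differentiator
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]           # uniform smoother
H5 = full_conv2(prewitt_row, smooth)
for row in H5:
    print(row)
```

The result is the outer product of the vertical weights [1 2 3 2 1] with the horizontal pattern [1 1 0 −1 −1]: smoothing across the edge, differencing along it.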
A well-known example of a compound gradient operator is the first derivative of
Gaussian (FDOG) operator, in which Gaussian-shaped smoothing is followed by differentiation
(9). The FDOG continuous domain horizontal impulse response is
HR(x, y) = −∂[g(x, s) g(y, t)]/∂x
which upon differentiation yields
HR(x, y) = (x/s^2) g(x, s) g(y, t)
Figure 10.2-7 presents an example of the FDOG gradient. All of the differential edge
enhancement operators presented previously in this subsection have been derived heuristically.
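As a sketch, a discrete FDOG profile can be obtained by sampling the first derivative of the Gaussian, proportional to (x/s^2) g(x, s), on an integer grid (the spread s = 2 and window half-width are illustrative assumptions):

```python
import math

def gauss(x, s):
    """Continuous domain Gaussian g(x, s) with standard deviation s."""
    return math.exp(-0.5 * (x / s) ** 2) / math.sqrt(2 * math.pi * s * s)

def fdog_profile(s, half_width):
    """Sampled FDOG profile (x / s^2) * g(x, s) on integers [-W, W]."""
    return [(x / (s * s)) * gauss(x, s)
            for x in range(-half_width, half_width + 1)]

h = fdog_profile(s=2.0, half_width=5)
```

The sampled profile is antisymmetric with a zero at the center, as any odd derivative-of-smoothing edge operator must be.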
Canny (9) has taken an analytic approach to the design of such operators. Canny's development
is based on a one-dimensional continuous domain model of a step edge of amplitude hE plus
additive white Gaussian noise with standard deviation σn. It is assumed that edge detection is
performed by convolving a one-dimensional continuous domain noisy edge signal f(x) with an
antisymmetric impulse response function h(x), which is of zero amplitude outside the range [−W, W].
An edge is marked at the local maximum of the convolved gradient f(x) ⊛ h(x). The Canny
operator impulse response h(x) is chosen to satisfy the following three criteria.
1. Good detection. The amplitude signal-to-noise ratio (SNR) of the gradient is
maximized to obtain a low probability of failure to mark real edge points and a low
probability of falsely marking nonedge points. The SNR for the model is
S(h) = (hE/σn) |∫_{−W}^{0} h(x) dx| / [∫_{−W}^{W} h(x)^2 dx]^(1/2)
2. Good localization. Edge points marked by the operator should be as close to the center
of the edge as possible. The localization factor is defined as
L(h) = (hE/σn) |h′(0)| / [∫_{−W}^{W} h′(x)^2 dx]^(1/2)
3. Single response. There should be only a single response to a true edge. The distance
between peaks of the gradient when only noise is present, denoted as xm, is set to some
fraction k of the operator width factor W. Thus
xm = kW
Canny has combined these three criteria by maximizing the product S(h)L(h) subject to the
constraint xm = kW. Because of the complexity of the formulation, no analytic solution has been
found, but a variational approach has been developed. Figure 10.2-8 contains plots of the Canny
impulse response functions in terms of xm.
FIGURE 10.2-8. Comparison of Canny and first derivative of Gaussian impulse response functions
As noted from the figure, for low values of xm, the Canny function resembles a boxcar function,
while for large xm, the Canny function is closely approximated by an FDOG impulse response.
Discrete domain versions of the large operators defined in the continuous domain can be
obtained by sampling their continuous impulse response functions over some window. The
window size should be chosen sufficiently large that truncation of the impulse response
function does not cause high-frequency artifacts. Demigny and Kamie (10) have developed a
discrete version of Canny’s criteria, which lead to the computation of discrete domain edge
detector impulse response arrays.
10.2.2. Edge Template Gradient Generation
With the orthogonal differential edge enhancement techniques discussed previously, edge
gradients are computed in two orthogonal directions, usually along rows and columns, and then
the edge direction is inferred by computing the vector sum of the gradients. Another approach
is to compute gradients in a large number of directions by convolution of an image with a set of
template gradient impulse response arrays. The edge template gradient is defined as
G(j, k) = MAX{|G1(j, k)|, …, |Gm(j, k)|, …, |GM(j, k)|}
where Gm(j, k) denotes the gradient obtained by convolution with the mth template.
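Template gradient generation can be sketched with a small compass set; the 8 ring-rotated 3 x 3 templates below are an illustrative choice, not the specific set used in the text:

```python
def correlate_at(F, H, j, k):
    """Correlate a 3 x 3 template H with image F, centered at (j, k)."""
    return sum(H[a][b] * F[j + a - 1][k + b - 1]
               for a in range(3) for b in range(3))

def rot45(H):
    """One 45-degree step: rotate the outer ring of a 3 x 3 mask."""
    (a, b, c), (d, e, f), (g, h, i) = H
    return [[b, c, f], [a, e, i], [d, g, h]]

base = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]      # "north" template
templates = [base]
for _ in range(7):                                # 8 compass directions
    templates.append(rot45(templates[-1]))

def template_gradient(F, j, k):
    """Edge template gradient: maximum response over all templates."""
    return max(correlate_at(F, t, j, k) for t in templates)

# Horizontal edge, bright side on top: the north-facing templates win.
F = [[9, 9, 9],
     [9, 9, 9],
     [0, 0, 0]]
print(template_gradient(F, 1, 1))                 # -> 27
```

Because each template is antisymmetric, the maximizing template index also gives a quantized estimate of the edge direction.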
Segmentation of an image entails the division or separation of the image into regions of similar
attribute. The most basic attribute for segmentation is image luminance amplitude for a
monochrome image and color components for a color image. Image edges and texture are also
useful attributes for segmentation.
The definition of segmentation adopted in this chapter is deliberately restrictive; no contextual
information is utilized in the segmentation. Furthermore, segmentation does not involve
classifying each segment. The segmenter only subdivides an image; it does not attempt to
recognize the individual segments or their relationships to one another.
There is no theory of image segmentation. As a consequence, no single standard method of
image segmentation has emerged. Rather, there is a collection of ad hoc methods that have
received some degree of popularity. Because the methods are ad hoc, it would be useful to
have some means of assessing their performance. Haralick and Shapiro (1) have established the
following qualitative guideline for a good image segmentation: “Regions of an image
segmentation should be uniform and homogeneous with respect to some characteristic such as
gray tone or texture.
Region interiors should be simple and without many small holes. Adjacent regions of a
segmentation should have significantly different values with respect to the characteristic on
which they are uniform. Boundaries of each segment should be simple, not ragged, and must
be spatially accurate.” Unfortunately, no quantitative image segmentation performance metric
has been developed.
Several generic methods of image segmentation are described in the following sections.
Because of their complexity, it is not feasible to describe all the details of the various
algorithms. Surveys of image segmentation methods are given in References 1 to 6.
This section considers several image segmentation methods based on the thresholding of
luminance or color components of an image. An amplitude projection segmentation technique
is also discussed.
11.1.1. Bilevel Luminance Thresholding
Many images can be characterized as containing some object of interest of reasonably uniform
brightness placed against a background of differing brightness. Typical examples include
handwritten and typewritten text, microscope biomedical samples, and airplanes on a runway.
For such images, luminance is a distinguishing feature that can be utilized to segment the
object from its background. If an object of interest is white against a black background, or vice
versa, it is a trivial task to set a midgray threshold to segment the object from the background.
Practical problems occur, however, when the observed image is subject to noise and when both
the object and background assume some broad range of gray scales. Another frequent difficulty
is that the background may be nonuniform. Figure 11.1-1a shows a digitized typewritten text
consisting of dark letters against a lighter background. A gray scale histogram of the text is
presented in Figure 11.1-1b. The expected bimodality of the histogram is masked by the
relatively large percentage of background pixels. Figures 11.1-1c to e are threshold displays in
which all pixels brighter than the threshold are mapped to unity display luminance and all the
remaining pixels below the threshold are mapped to the zero level of display luminance. The
photographs illustrate a common problem associated with image thresholding. If the threshold
is set too low, portions of the letters are deleted (the stem of the letter “p” is fragmented).
Conversely, if the threshold is set too high, object artifacts result (the loop of the letter “e” is
filled in).
Several analytic approaches to the setting of a luminance threshold have been proposed (7,8).
One method is to set the gray scale threshold at a level such that the cumulative gray scale
count matches an a priori assumption of the gray scale probability distribution (9). For example,
it may be known that black characters cover 25% of the area of a typewritten page. Thus, the
threshold level on the image might be set such that the quartile of pixels with the lowest
luminance are judged to be black. Another approach to luminance threshold selection is to set
the threshold at the minimum point of the histogram between its bimodal peaks (10).
Determination of the minimum is often difficult because of the jaggedness of the histogram. A
solution to this problem is to fit the histogram values between the peaks with some analytic
function and then obtain its minimum by differentiation. For example, let y and x represent the
histogram ordinate and abscissa, respectively. Then the quadratic curve
y = ax^2 + bx + c
can be fit to the histogram samples in the valley, and the threshold set at its minimum x = −b/(2a).
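The two selection rules discussed above can be sketched together: the p-tile rule on a cumulative histogram, and a quadratic valley fit whose vertex −b/(2a) gives the threshold. The 8-bin histogram and valley samples are toy data, not measurements from the text.

```python
def ptile_threshold(hist, fraction):
    """Smallest gray level whose cumulative count reaches the given
    fraction of all pixels (p-tile rule)."""
    total = sum(hist)
    cum = 0
    for level, count in enumerate(hist):
        cum += count
        if cum >= fraction * total:
            return level
    return len(hist) - 1

def quadratic_valley(points):
    """Fit y = a x^2 + b x + c exactly through three (x, y) samples and
    return the vertex x = -b / (2a) as the threshold."""
    (x0, y0), (x1, y1), (x2, y2) = points
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    return -b / (2 * a)

hist = [2, 9, 3, 1, 3, 9, 8, 2]                   # bimodal toy histogram
t1 = ptile_threshold(hist, 0.25)                  # darkest 25% judged black
t2 = quadratic_valley([(2, 3), (3, 1), (4, 3)])   # symmetric valley
```

Fitting the valley smooths the jaggedness that makes the raw histogram minimum unreliable.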
FIGURE 11.1-1. (a) gray scale text; (b) histogram.
FIGURE 11.1-4. Multilevel luminance thresholding image segmentation of the peppers_mon image; second-level segmentation, 1 branch: (c) segment 11; (d) segment 11 histogram.
stage of separation under manual surveillance. Ohlander's segmentation technique using
multidimensional thresholding aided by texture discrimination has proved quite effective in
simulation tests. However, a large part of the segmentation control has been performed by a
human operator; human judgment, predicated on trial threshold setting results, is required for guidance.
In Ohlander's segmentation method, the nine property values are obviously interdependent.
The YIQ and intensity components are linear combinations of RGB; the hue and saturation
measurements are nonlinear functions of RGB. This observation raises several questions. What
types of linear and nonlinear transformations of RGB are best for segmentation? Ohta et al. (19)
suggest an approximation to the spectral Karhunen–Loeve transform. How many property
values should be used? What is the best form of property thresholding? Perhaps answers to
these last two questions may be forthcoming from a study of clustering techniques in pattern
recognition (20).
Property value histograms are really the marginal histograms of a joint histogram of property
values. Clustering methods can be utilized to specify multidimensional decision boundaries for
segmentation. This approach permits utilization of all the property values for segmentation and
inherently recognizes their respective cross correlation. The following section discusses
clustering methods of image segmentation.
FIGURE 11.1-5. Typical property histograms for color image segmentation.
11.1.2. Amplitude Projection
Image segments can sometimes be effectively isolated by forming the average amplitude
projections of an image along its rows and columns (21,22).
Figure 11.1-6 illustrates an application of gray scale projection segmentation of an image. The
rectangularly shaped segment can be further delimited by taking projections over oblique angles.
FIGURE 11.1-6. Gray scale projection image segmentation of a toy tank image: (a) row projection; (b) original; (c) segmentation; (d) column projection.
One of the earliest examples of image segmentation, by Haralick and Kelly (23) using data
clustering, was the subdivision of multispectral aerial images of agricultural land into regions
containing the same type of land cover. The clustering segmentation concept is simple;
however, it is usually computationally intensive.
Consider a vector of measurements at each pixel coordinate (j, k) in an image. The
measurements could be point multispectral values, point color components, and derived color
components, as in the Ohlander approach described previously, or they could be neighborhood
feature measurements such as the moving window mean, standard deviation, and mode, as
discussed in Section 16.2. If the measurement set is to be effective for image segmentation,
data collected at various pixels within a segment of common attribute should be similar. That is,
the data should be tightly clustered in an N-dimensional measurement space. If this condition
holds, the segmenter design task becomes one of subdividing the N-dimensional measurement
space into mutually exclusive compartments, each of which envelopes typical data clusters for
each image segment. Figure 11.2-1 illustrates the concept for two features. In the segmentation
process, if a measurement vector for a pixel falls within a measurement space compartment,
the pixel is assigned the segment name or label of that compartment.
Coleman and Andrews (24) have developed a robust and relatively efficient image
segmentation clustering algorithm. Figure 11.2-2 is a flowchart that describes a simplified
version of the algorithm for segmentation of monochrome images. The first stage of the
algorithm involves feature computation. In one set of experiments,
Coleman and Andrews used 12 mode measurements in square windows of size 1, 3, 7, and 15
pixels. The next step in the algorithm is the clustering stage, in which the optimum number of
clusters is determined along with the feature space center of each cluster. In the segmenter, a
given feature vector is assigned to its closest cluster center.
FIGURE 11.2-1. Data clustering for two feature measurements.
FIGURE 11.2-2. Simplified version of Coleman–Andrews clustering image segmentation method.
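The clustering stage can be sketched with a plain k-means loop. This is a hypothetical simplification, not the actual Coleman–Andrews procedure (which also estimates the optimum number of clusters); the two-feature pixel data below are illustrative.

```python
# Minimal k-means sketch of clustering-based segmentation: each pixel is
# described by a feature vector and receives the label of its nearest
# cluster centre.
import random

def kmeans(features, k, iters=20, seed=0):
    """Cluster N-dimensional feature vectors; return centres and labels."""
    rng = random.Random(seed)
    centres = rng.sample(features, k)
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each vector goes to its closest centre.
        for i, f in enumerate(features):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(f, centres[c])))
        # Update step: recompute each centre as the mean of its members.
        for c in range(k):
            members = [f for f, l in zip(features, labels) if l == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels

# Two well-separated pixel populations (e.g. window mean and std-dev).
dark = [[10 + i, 2] for i in range(5)]
bright = [[200 + i, 3] for i in range(5)]
centres, labels = kmeans(dark + bright, k=2)
print(labels)  # dark pixels share one label, bright pixels the other
```

In the segmenter proper, each pixel's feature vector would then be assigned the label of its closest cluster centre, exactly as the assignment step above does.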
The amplitude and clustering methods described in the preceding sections are based on point
properties of an image. The logical extension, as first suggested by Muerle and Allen (25), is to
utilize spatial properties of an image for segmentation.
11.3.1. Region Growing
Region growing is one of the conceptually simplest approaches to image segmentation;
neighboring pixels of similar amplitude are grouped together to form a segmented region.
However, in practice, constraints, some of which are reasonably complex, must be placed on
the growth pattern to achieve acceptable results.
Brice and Fenema (26) have developed a region-growing method based on a set of simple
growth rules. In the first stage of the process, pairs of quantized pixels are combined together
in groups called atomic regions if they are of the same amplitude and are four-connected. Two
heuristic rules are next invoked to dissolve weak boundaries between atomic regions.
Referring to Figure 11.3-1, let R1 and R2 be two adjacent regions with perimeters P1 and P2,
respectively, which have previously been merged. After the initial stages of region growing, a
region may contain previously merged subregions of different amplitude values. Also, let C
denote the length of the common boundary and let D represent the length of that portion of C
for which the amplitude difference across the boundary is smaller than a significance factor ε1.
The regions R1 and R2 are then merged if

D / MIN{P1, P2} > ε2

where ε2 is a constant typically set at 1/2. This heuristic prevents merger of adjacent regions of
the same approximate size, but permits smaller regions to be absorbed into larger regions. The
second rule merges weak common boundaries remaining after application of the first rule.
Adjacent regions are merged if

D / C > ε3

where ε3 is a constant set at about 3/4.

FIGURE 11.3-1. Region-growing geometry.
The Brice and Fenema region growing method provides reasonably accurate
segmentation of simple scenes with few objects and little texture (26, 27) but does not perform
well on more complex scenes. Yakimovsky (28) has attempted to improve the region-growing
concept by establishing merging constraints based on estimated Bayesian probability densities
of feature measurements of each region.
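The core grow-from-a-seed idea, stripped of the Brice–Fenema perimeter rules, can be sketched as follows; the tolerance value and the tiny test image are illustrative.

```python
# Minimal region-growing sketch: four-connected pixels whose amplitudes
# differ by at most `tol` from the region seed are merged into one
# labelled region.
from collections import deque

def region_grow(img, tol=10):
    h, w = len(img), len(img[0])
    label = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if label[sy][sx]:
                continue
            next_label += 1
            seed = img[sy][sx]
            q = deque([(sy, sx)])
            label[sy][sx] = next_label
            while q:  # breadth-first growth from the seed pixel
                y, x = q.popleft()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < h and 0 <= nx < w and not label[ny][nx]
                            and abs(img[ny][nx] - seed) <= tol):
                        label[ny][nx] = next_label
                        q.append((ny, nx))
    return label

img = [[10, 12, 200],
       [11, 13, 205],
       [10, 11, 198]]
labels = region_grow(img)
print(labels)  # left two columns form one region, right column another
```

The constraints discussed above (perimeter ratios, boundary-strength tests) would be added on top of this basic growth loop to prevent regions from leaking across weak boundaries.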
11.3.2. Split and Merge
Split and merge image segmentation techniques (29) are based on a quad tree data
representation whereby a square image segment is broken (split) into four quadrants if the
original image segment is nonuniform in attribute. If four neighboring squares are found to be
uniform, they are replaced (merged) by a single square composed of the four adjacent squares.
In principle, the split and merge process could start at the full image level and initiate split
operations. This approach tends to be computationally intensive. Conversely, beginning at the
individual pixel level and making initial merges has the drawback that region uniformity
measures are limited at the single pixel level. Initializing the split and merge process at an
intermediate level enables the use of more powerful uniformity tests without excessive
computational effort.
The simplest uniformity measure is to compute the difference between the largest and smallest
pixels of a segment. Fukada (30) has proposed the segment variance as a uniformity measure.
Chen and Pavlidis (31) suggest more complex statistical measures of uniformity. The basic split
and merge process tends to produce rather blocky segments because of the rule that square
blocks are either split or merged. Horowitz and Pavlidis (32) have proposed a modification of
the basic process whereby adjacent pairs of regions are merged if they are sufficiently uniform.
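The split phase, using the simplest uniformity test mentioned above (difference between the largest and smallest pixels of a block), can be sketched as:

```python
# Quad-tree split sketch: a square block is a uniform leaf when the
# max-min pixel difference is below a threshold; otherwise it is split
# into four quadrants and the test recurses.
def split(img, x, y, size, thresh, out):
    block = [img[y + j][x + i] for j in range(size) for i in range(size)]
    if max(block) - min(block) <= thresh or size == 1:
        out.append((x, y, size))          # uniform leaf of the quad tree
    else:
        half = size // 2                  # split into four quadrants
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            split(img, x + dx, y + dy, half, thresh, out)

# 4x4 image: mostly uniform, with one bright pixel in one quadrant.
img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 90, 10],
       [10, 10, 10, 10]]
leaves = []
split(img, 0, 0, 4, thresh=5, out=leaves)
print(leaves)  # three 2x2 leaves plus four 1x1 leaves around the bright pixel
```

A full split-and-merge implementation would follow this with a merge pass that recombines adjacent uniform leaves, which is what produces the blocky segments described in the text.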
11.3.3. Watershed
Topographic and hydrology concepts have proved useful in the development of region
segmentation methods (33–36). In this context, a monochrome image is considered to be an
altitude surface in which high-amplitude pixels correspond to ridge points, and low-amplitude
pixels correspond to valley points. If a drop of water were to fall on any point of the altitude
surface, it would move to a lower altitude until it reached a local altitude minimum. The
accumulation of water in the vicinity of a local minimum is called a catchment basin. All points
that drain into a common catchment basin are part of the same watershed. A valley is a region
that is surrounded by a ridge. A ridge is the locus of maximum gradient of the altitude surface.
There are two basic algorithmic approaches to the computation of the watershed of an image:
rainfall and flooding.
In the rainfall approach, local minima are found throughout the image. Each local minimum is
given a unique tag, and adjacent local minima are combined under a single tag. Next, a conceptual
water drop is placed at each untagged pixel. The drop moves to its lower-amplitude neighbor
until it reaches a tagged pixel, at which time it assumes the tag value. Figure 11.3-2 illustrates a
section of a digital image encompassing a watershed in which the local minimum pixel is black
and the dashed line indicates the path of a water drop to the local minimum.
In the flooding approach, conceptual single pixel holes are pierced at each local minimum, and
the amplitude surface is lowered into a large body of water. The water enters the holes and
proceeds to fill each catchment basin. If a basin is about to overflow, a conceptual dam is built
on its surrounding ridge line to a height equal to the highest-altitude ridge point. Figure 11.3-3
shows a profile of the filling process of a catchment basin (37). Figure 11.3-4 is an example of
watershed segmentation provided by Moga and Gabbouj (38).
FIGURE 11.3-4. Watershed image segmentation of the peppers_mon image. Courtesy of Alina
N. Moga and M. Gabbouj,
Tampere University of Technology, Finland.
FIGURE 11.4-1. Boundary detection image segmentation of the projectile image: (a) original, (b) edge map, (c) thinned edge map.
Simple watershed algorithms tend to produce results that are oversegmented (39). Najman and
Schmitt (37) have applied morphological methods in their watershed algorithm to reduce
oversegmentation. Wright and Acton (40) have performed watershed segmentation on a pyramid of
different spatial resolutions to avoid oversegmentation.
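The rainfall idea is easy to demonstrate on a one-dimensional profile. The sketch below is a hypothetical simplification of the 2-D algorithm: a drop at each position slides to its lower neighbour until it reaches a local minimum, and positions draining to the same minimum belong to the same catchment basin.

```python
# Rainfall watershed on a 1-D amplitude profile: tag each position with
# the index of the local minimum its water drop drains into.
def watershed_1d(profile):
    n = len(profile)
    basin = [None] * n
    for start in range(n):
        i = start
        while True:                       # follow the steepest descent
            lower = [j for j in (i - 1, i + 1)
                     if 0 <= j < n and profile[j] < profile[i]]
            if not lower:
                break                     # reached a local minimum
            i = min(lower, key=lambda j: profile[j])
        basin[start] = i                  # tag with the minimum's index
    return basin

# Two valleys (minima at indices 1 and 5) separated by a ridge at index 3.
profile = [5, 1, 4, 9, 3, 0, 6]
basins = watershed_1d(profile)
print(basins)  # -> [1, 1, 1, 5, 5, 5, 5]
```

The 2-D version examines all neighbours of each pixel and, as noted above, typically needs post-processing to curb oversegmentation.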
11.4. Boundary Detection
It is possible to segment an image into regions of common attribute by detecting the boundary
of each region for which there is a significant change in attribute across the boundary.
Boundary detection can be accomplished by means of edge detection as described in Chapter
10. Figure 11.4-1 illustrates the segmentation of a projectile from its background. In this
example a derivative of Gaussian edge detector is used to generate the edge map of Figure
11.4-1b. Morphological thinning of this edge map results in Figure 11.4-1c. The resulting
boundary appears visually to be correct when overlaid on the original image. If an image is
noisy or if its region attributes differ by only a small amount between regions, a detected
boundary may often be broken. Edge linking techniques can be employed to bridge short gaps
in such a region boundary.
11.4.1. Curve-Fitting Edge Linking
In some instances, edge map points of a broken segment boundary can be linked together to
form a closed contour by curve-fitting methods. If a priori information is available as to the
expected shape of a region in an image (e.g., a rectangle or a circle), the fit may be made
directly to that closed contour. For more complex-shaped regions, as illustrated in Figure 11.4-2,
it is usually necessary to break up the supposed closed contour into chains with broken links.
One such chain, shown in Figure 11.4-2 starting at point A and ending at point B, contains a
single broken link. Classical curve-fitting methods (29) such as Bezier polynomial or spline fitting
can be used to fit the broken chain.
In their book, Duda and Hart (41) credit Forsen as being the developer of a simple piecewise
linear curve-fitting procedure called the iterative endpoint fit. In the first stage of the algorithm,
illustrated in Figure 11.4-3, data endpoints A and B are connected by a straight line. The point
of greatest departure from the straight-line (point C) is examined. If the separation of this point
is too large, the point becomes an anchor point for two straight-line segments (A to C and C to
B). The procedure then continues until the data points are well fitted by line segments. The
principal advantage of the algorithm is its simplicity; its disadvantage is error caused by
incorrect data points. Ramer (42) has used a technique similar to the iterated endpoint
procedure to determine a polynomial approximation to an arbitrary-shaped closed curve.
Pavlidis and Horowitz (43) have developed related algorithms for polygonal curve fitting. The
curve-fitting approach is reasonably effective for simply structured objects. Difficulties occur
when an image contains many overlapping objects and its corresponding edge map contains
branch structures.
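The iterative endpoint fit described above can be sketched directly; the L-shaped test chain and tolerance are illustrative.

```python
# Iterative endpoint fit: connect the endpoints with a straight line,
# find the point of greatest departure, and recurse on both halves when
# that departure exceeds a tolerance.
def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    return num / den

def endpoint_fit(points, tol):
    if len(points) < 3:
        return list(points)
    dists = [perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[worst - 1] <= tol:
        return [points[0], points[-1]]    # well fitted by one segment
    left = endpoint_fit(points[:worst + 1], tol)   # A to C
    right = endpoint_fit(points[worst:], tol)      # C to B
    return left[:-1] + right              # anchor point C appears once

# An L-shaped chain: the corner (2, 0) must survive as an anchor point.
chain = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(endpoint_fit(chain, tol=0.1))  # -> [(0, 0), (2, 0), (2, 2)]
```

This recursive form is essentially the polygonal approximation that Ramer's technique applies to arbitrary closed curves.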
FIGURE 11.4-2. Region boundary with missing links indicated by dashed lines.
FIGURE 11.4-3. Iterative endpoint curve fitting.
11.4.2. Heuristic Edge-Linking Methods
The edge segmentation technique developed by Roberts (44) is typical of the philosophy of
many heuristic edge-linking methods. In Roberts' method, edge gradients are examined in
pixel blocks. The pixel whose magnitude gradient is largest is declared a tentative edge point if
its magnitude is greater than a threshold value.
Then north-, east-, south-, and west-oriented lines of length 5 are fitted to the gradient data
about the tentative edge point. If the ratio of the best fit to the worst fit, measured in terms of
the fit correlation, is greater than a second threshold, the tentative edge point is declared valid,
and it is assigned the direction of the best fit. Next, straight lines are fitted between pairs of
edge points if they are in adjacent blocks and if the line direction is within a fixed angular
tolerance of the edge direction of either edge point. Those points failing to meet the linking criteria are discarded. A
typical boundary at this stage, shown in Figure 11.4-4a, will contain gaps and multiply
connected edge points. Small triangles are eliminated by deleting the longest side; small
rectangles are replaced by their longest diagonal, as indicated in Figure 11.4-4b. Short spur lines
are also deleted. At this stage, short gaps are bridged by straight-line connection. This form of
edge linking can be used with a wide variety of edge detectors. Nevatia (45) has used a similar
method for edge linking of edges produced by a Hueckel edge detector.

FIGURE 11.4-4. Roberts edge linking.
Robinson (46) has suggested a simple but effective edge-linking algorithm in which edge points
from an edge detector providing eight edge compass directions are examined in blocks as
indicated in Figure 11.4-5. The edge point in the center of the block is declared a valid edge if it
possesses directional neighbors in the proper orientation. Extensions to larger windows should
be beneficial, but the number of potential valid edge connections will grow rapidly with window size.
11.4.3. Hough Transform Edge Linking
The Hough transform (47–49) can be used as a means of edge linking. The Hough transform
involves the transformation of a line in Cartesian coordinate space to a
point in polar coordinate space. With reference to Figure 11.4-6a, a straight line can be
described parametrically as

ρ = x cos θ + y sin θ

where ρ is the normal distance of the line from the origin and θ is the angle of the normal with
respect to the x axis. The Hough transform of the line is simply a point at coordinate (ρ, θ) in the
polar domain, as shown in Figure 11.4-6b. A family of lines passing through a common point, as shown
in Figure 11.4-6c, maps into the connected set of points of Figure 11.4-6d. Now consider the
three collinear points of Figure 11.4-6e. The Hough transform of the family of curves passing
through the three points results in the set of three parametric curves in the (ρ, θ) space of Figure
11.4-6f. These three curves cross at a single point corresponding to the dashed line passing
through the collinear points.
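The voting behaviour described above can be demonstrated with a minimal accumulator; the point set and bin sizes below are illustrative.

```python
# Hough-transform line detection sketch: each edge point votes for all
# (rho, theta) pairs consistent with it; collinear points produce a
# common peak in the accumulator.
import math
from collections import Counter

def hough(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in quantized (rho, theta) space."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(rho / rho_step), t)] += 1
    return acc

# Three collinear points on the line y = x, plus one outlier.
points = [(0, 0), (1, 1), (2, 2), (5, 0)]
(rho_bin, t_bin), votes = hough(points).most_common(1)[0]
print(votes, rho_bin, t_bin)  # peak of 3 votes at rho = 0
```

The peak's angle bin lies near 135°, the normal direction of the line y = x through the origin; the outlier never accumulates enough votes to compete, which is why the transform bridges gaps between collinear edge fragments.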
12.Literature Review
Because of its significant applicability, automated road information extraction from satellite and
aerial imagery has been an important subject of research in remote sensing. The current effort
is particularly interested in road network extraction using pattern recognition approaches. The
reason is that most applications involved in safety, hazards, and disaster assessment require
detailed and accurate representations of road networks, which can only be obtained from high
resolution images. However, these types of images are mostly panchromatic, such as Digital
Orthophoto Quadrangles (DOQ) from U.S. Geological Survey (USGS), and one-meter resolution
IKONOS images from Space Imaging. Their spectral signatures are weak and cannot be
effectively utilized to identify road features. Instead, image interpretation has to rely on image
texture, shapes, patterns and changes of local image intensity.
In the last twenty years or so, a variety of extraction methods have been developed. Some of
them can be generally applied for linear feature extractions, while others are particularly
designed for the road extraction task. The report classifies these methods into five categories:
ridge finding, heuristic reasoning, dynamic programming, statistical tracking, and map matching.
It is worth noting that such a classification is mainly for generalization and convenience of
description. Actually, it is very difficult to classify some of these methods into specific
categories. For instance, some of the existing algorithms may use a combination of different
methods. And in some other cases, a method may fall into several method categories.
12.1. Ridge Detection
Intuitively roads are linear features that are shown as ridges or valleys on an image. Therefore
road finding can also be considered as a task of ridge finding. The procedure developed by
Nevatia and Babu (1980) is a classic one, which starts with the convolution of an image with a
number of edge filters to obtain edge magnitude and direction, then goes through a
thresholding and thinning process in order to delineate edge pixels, and finally connects the
delineated edge pixels to form linked line segments. Gradient Direction Profile Analysis (GDPA)
(Wang et al.1992; Gong and Wang, 1997) is another representative method used for ridge
finding. This method first calculates the gradient direction for each pixel, which is defined as the
direction of maximum slope among the four defined directions near the pixel. As the ridge
direction is the same as the direction of a road segment or a linear segment, and it is perpendicular
to the gradient directions of the pixels within the ridge, an analysis of the gradient profile will
generate the ridge pixels, which correspond to the highest points of the profile. Linking the
ridge points then produces the ridgelines, or in the road detection case, the road segments.
The concept of differential geometry has also been applied to ridge detection (Steger, 1996).
This method uses curve or surface fitting techniques to derive ridge locations on an image.
Once the image intensity surface is represented with a mathematical equation, the first and the
second derivatives of the equation can be analyzed to locate ridgelines. Image filtering methods
have also been extensively discussed in the literature for edge or ridge delineation. Most
frequently referenced methods in this category include Canny’s edge detector (Canny, 1986)
and Marr and Hildreth’s zero crossing operator (Marr and Hildreth, 1980). These methods filter
out the low frequency information in the image and preserve the high-frequency structures,
such as roads, which are particularly useful for high-level recognition of road networks because
through image filtering, a large amount of information that is irrelevant to road recognition can
be thrown away at the very beginning, and then image analysis can focus on these structures
that contain road networks.
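The profile-analysis idea can be sketched in one dimension. This is our own minimal illustration, not the GDPA or Steger algorithms themselves: after smoothing, a pixel is a ridge point when it is brighter than both neighbours, i.e. the first derivative changes sign with a negative second derivative.

```python
# 1-D ridge finding sketch: smooth a row, then mark positions that are
# local maxima of the smoothed profile as ridge (road centre) points.
def smooth(row):
    return [row[0]] + [(row[i - 1] + row[i] + row[i + 1]) / 3
                       for i in range(1, len(row) - 1)] + [row[-1]]

def ridge_points(row):
    s = smooth(row)
    return [i for i in range(1, len(s) - 1)
            if s[i] > s[i - 1] and s[i] >= s[i + 1]]

# A bright road profile centred at index 4 on a dark background.
row = [10, 10, 30, 80, 120, 80, 30, 10, 10]
print(ridge_points(row))  # -> [4]
```

A 2-D implementation would perform this analysis along the gradient direction at each pixel and then link the detected ridge points into ridgelines, as the text describes.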
12.2. Heuristic Method
The heuristic method makes use of a reasoning process similar to the human vision system.
Sometimes it is also referred to as a rule- or knowledge-based method. Meisels and Mintz (1990)
considered a three-stage reasoning concept and applied the concept to road network
extraction: a low-level stage that deals mainly with pixel segmentation, a high-level stage that
models and labels feature objects, and an intermediate level of representation that interfaces
between low level and high-level processing. Image primitives that are considered as building
blocks of roads are first identified with pixel value checking at neighborhood levels.
Intermediate level analysis combines image primitives into line segments with the use of
some reasoning mechanism (e.g., how far apart the combined primitives are; what directions
the primitives follow; whether they are parallel to each other; etc.). High level processing
allows further gap filling and segment grouping by considering distances, brightness, and
uniformity among the grouped tokens and gaps between them.
One of the advantages of the heuristic approach is in its flexibility in dealing with problems such
as linear feature alignment and fragmentation. For example, McKeown and Pane (1985)
suggested that trend and the relative distances between fragments can be utilized as factors
while fragmented primitives are aligned and connected. With some extension, McKeown and
Denlinger (1985) also developed a road tracking system that relies on road texture correlation
and road edge following alternatively: when one fails, the other will be utilized. Later, McKeown
et al. (1992) and Zlotnick (1993) introduced methods that track roads by searching for
antiparallel edges as starting points for road tracking and linking.
12.3. Dynamic Programming
Dynamic Programming (DP) has been frequently utilized in road network extraction and has
been described in detail by Gruen and Li (1996). The most appealing aspect of the method is
that the road recognition problem can be formulated as an optimization program, and the
solution to this program will result in the delineation of the road pixels. According to Gruen and
Li, roads can be modeled with a set of mathematical equations. That is, derivatives of the gray
values in the direction normal to the roads tend to be maximized, while derivatives along the
road direction tend to be minimized. At the same time, roads tend to be straight lines or
smooth curves, normally circular arcs, and their local curvature has an upper bound. These
properties, then, can be characterized with a merit function, which can be solved using the DP technique.
The introduction of the DP technique in pattern recognition dates back to the late 1960s. Since
then, the technique has been used on raw images for pixel based processing (e.g., Montanari,
1970), and also used on tokens that are based upon higher level representations (Fischler et al,
1981; Geman and Jedunak, 1996). The technique is considered to be a good approach to finding
curves in noisy pictures, because it can bridge weakly connected feature elements
automatically while the program searches for optimal solutions. This property is also preserved
when applied to the grouping of feature elements with higher level of representations. In either
case, regulatory constraints can be utilized to derive curves that will meet certain predefined
geometric requirements (e.g., smoothness or curvature). More importantly, DP provides a way
to serialize the optimization procedure to allow computationally attainable solutions.
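The serialized optimization can be sketched with a toy merit image. This is a deliberate simplification of the Gruen–Li formulation: each column holds a road-likeness merit per row, and the path may move at most one row per column, a crude stand-in for the curvature bound.

```python
# Dynamic programming curve extraction sketch: find the maximum-merit
# path through a column-by-column merit image, where each step may move
# at most one row up or down (a simple smoothness constraint).
def best_path(merit):
    rows, cols = len(merit), len(merit[0])
    score = [[merit[r][0]] + [0] * (cols - 1) for r in range(rows)]
    back = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        for r in range(rows):
            prev = [(score[pr][c - 1], pr) for pr in (r - 1, r, r + 1)
                    if 0 <= pr < rows]
            best, back[r][c] = max(prev)
            score[r][c] = merit[r][c] + best
    # Backtrack from the best final-column score.
    r = max(range(rows), key=lambda q: score[q][cols - 1])
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    return path[::-1]

# High merit (bright, road-like response) along a gently bending line.
merit = [[9, 1, 1, 1],
         [1, 9, 9, 1],
         [1, 1, 1, 9]]
print(best_path(merit))  # path follows the high-merit bend
```

Because each column only needs the previous column's scores, the optimization is serialized exactly as the text notes, giving a computationally attainable solution even on long roads.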
12.4. Statistical Inference
Due to complexities of road images, an ability to handle uncertainties (e.g., bridge crossing,
road width changing, cars and shadows on the roads, image noises, etc.) is essential. For this
reason, statistical models are particularly attractive for road image representation. Cooper
(1979) first came up with the idea of modeling a blob boundary or a linear feature as a Markov
process. Based upon this modeling scheme, he was able to use maximum likelihood estimation
to derive the boundary representation that has the same formulation as other deterministic
methods. The underlying significance of Cooper’s scheme is that it provides an elegant and
effective methodology for incorporating uncertainties into the linear feature recognition
process. Barzohar and Cooper (1996) explored the idea further and developed a stochastic
approach that can be more sophisticatedly applied to automatic road finding. This approach
makes use of the so-called geometric-stochastic model that formulates road width, direction,
gray level intensity, and background intensity as a stochastic process using the Gibbs
Distribution. Then road detection is achieved through the maximum a posteriori probability
(MAP) estimation. The method demonstrated promising results and can be further extended to
consider possibly many other types of uncertainties in road extraction. A more recent study by
Yuille and Coughlan (2000) provides additional enhancements to Barzohar and Cooper’s
approach. This study not only allows the analysis of the time and memory complexity in the
MAP estimation, but also the determination of whether roads are detectable in a given image.
The work of Geman and Jedynak (1996) is also based upon a statistical model that tracks roads
through hypothesis testing. Their approach uses the testing rule that is computed from the
empirical joint distributions of tests (matched filters for short road segments) to determine
whether the hypothesis (road position) is true or not. The tests are performed sequentially and
an uncertainty or entropy minimization procedure is devised to facilitate testing decisions, so
that new tests can be analytically identified. The method appears to work reasonably well on
low resolution images, but is likely adaptable to high resolution images as well.
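The flavour of sequential hypothesis testing can be sketched as a running log-likelihood ratio. This is loosely inspired by, and much simpler than, the Geman–Jedynak formulation; the hit probabilities and decision thresholds are illustrative.

```python
# Sequential hypothesis testing sketch for road tracking: matched-filter
# responses accumulate a log-likelihood ratio of "road" versus
# "background"; the test stops once the evidence crosses a threshold.
import math

def sequential_test(responses, p_road=0.8, p_bg=0.3,
                    accept=2.0, reject=-2.0):
    """Each response is 1 (filter fired) or 0; return the decision and
    the final log-likelihood ratio."""
    llr = 0.0
    for r in responses:
        if r:
            llr += math.log(p_road / p_bg)
        else:
            llr += math.log((1 - p_road) / (1 - p_bg))
        if llr >= accept:
            return "road", llr
        if llr <= reject:
            return "background", llr
    return "undecided", llr

print(sequential_test([1, 1, 1]))  # consistent hits -> "road"
print(sequential_test([0, 0, 0]))  # consistent misses -> "background"
```

The entropy-minimization step in the original work chooses which filter to apply next; the sketch above only shows how the accumulated evidence drives the accept/reject decision.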
12.5. Map Matching
The rationale behind map matching in road extraction is that there already exists a large
amount of data on road systems in many parts of the world, especially in the United States.
Once in house, this data can serve as a starting point for road network extraction. There are
many advantages to combining remotely sensed data with existing databases (Wang et al.,
1992). That is, existing data can be used as the interpretation key at the beginning, then
verified, or updated, and the attributes of existing network can be also transferred to the newly
updated network.
The map matching approach has been explored by several researchers (Maillard
and Cavayas, 1989; Stilla, 1995; Fiset et al., 1998; Zafiropoulos, 1998). Maillard and Cavayas first
introduced the map-matching approach. Using this approach, they established a road updating
procedure that consists of two major algorithms. The first algorithm focuses on image-map
matching to identify roads that can be found on the map and the image. The second algorithm
is then to search new roads based upon the assumption that these new roads are connected to
the old ones.
This approach was studied further and applied in different geographic settings (Fiset
and Cavayas, 1997). To overcome some of the map matching problems of this approach, Fiset
et al (1998) used a multi-layer perceptron based upon template matching to improve its
performance further.
Stilla developed a syntax-oriented method to use the map knowledge as a
supportive aid for image interpretation. In this study, representations of road network
structures are first obtained through map analysis. Then image object models are defined and
utilized to search for objects that fulfil model expectations within a given tolerance. Assessment
of image objects with respect to their correspondence to the map representations results in road
object identification for a given image scene. Zafiropoulos (1998) considered the concepts of
deformable contours, B-Splines, Ribbon Snakes and presented a mathematical-modeling-based
road verification and revision framework. With this framework, road centerlines are localized
using a global least square adjustment.
12.6. Summary
From the above review of the existing methods on road network extraction, several
observations can be made. First, the concept for road network extraction is relatively simple.
That is, roads are shown as ridges and valleys; image intensity changes little along a road, but
changes drastically when cutting across a road; road direction changes tend to be smooth and
have an upper bound; and roads follow some topological regularities in terms of connectivity
among themselves.
Basically, many of the existing methods make use of one or several of these characteristics.
Second, reliable road network extraction remains a difficult task, and there exists no
algorithm sufficiently reliable for practical use (Geman and Jedynak, 1996). This is primarily due
to the fact that the real world is too complex, and many of the existing tools can only handle
very specific cases. Some may be able to handle more complex situations, but still cannot be
compared with a trained human operator. Mathematical models such as statistical models and
dynamic programming have enhanced the ability to deal with this complexity significantly, but
factors such as computational performance, implementation difficulty, and limitation of
sophistication make more ambitious efforts difficult. Theoretically, a road recognition algorithm
can consider all possible listed road characteristics. Practically, a road recognition algorithm can
only consider a limited set of characteristics, and when these characteristics change beyond a
limit, the algorithm may fail.
Finally, even though automated road extraction is a difficult problem to solve,
improvement has been constantly made and there are still opportunities to be explored to
further improve its performance. It appears that the semi-automated and the map-matching-based approaches tend to be more practical for short-term implementations. In the long run,
studies of road image characteristics, their changes with respect to geographic background,
image types, image resolutions, and development of mathematical models to represent these
characteristics are critical in order to make substantive progress in this area.
13.1 Strategy
We first define the main roads for a particular image resolution. The definition depends on
geometric attributes: a road width limit and a road length limit. A road whose width and length
are greater than these limits is considered a 'main road'. For 1 m/pixel high resolution IKONOS
imagery, as a case study, we set the width and length limits at 12 m and 400 m. The main road
extraction is based on a hierarchical strategy. First, an image pyramid is generated according to
the image resolution and the main road definition. In practice, a 2-layer image pyramid is
generated, in which the low-resolution layer is re-sampled to ¼ resolution so that the main
roads can be extracted in this layer.
Figure 1. Workflow of automatic main road extraction: generate image pyramid, line detection, segment grouping, road verification, precise road extraction.
13.2 Line Detection
A 2-direction (horizontal and vertical) template matching is applied to detect road central point
candidates in low resolution layer of the image pyramid. The procedure is shown in figure 2.
Figure 2. Line detection based on template matching in the horizontal and vertical directions.
Figure 3. Binary detection from an IKONOS image after thresholding (19): (a) IKONOS image.
This is a very simple and easy-to-use detector. First, Gaussian filtering smooths the image to
reduce noise, and then the corresponding template is generated from the road profile. The road
profile template is matched against the image signal by correlating along the x (horizontal) and
y (vertical) directions; a local peak of the correlation coefficient indicates the existence of a
candidate road centre point. Figure 3 is an example of the detection in the ¼-resolution layer of
the image pyramid. Detection in the low-resolution layer is much more effective, but it cannot
obtain an accurate geometric location of the road points. However, precise delineation is finally
applied to adjust the road central lines based on the original resolution image. After line
detection, candidate lines should be traced into vector form for the next step of line segment grouping.
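The profile-correlation step can be sketched in one dimension. This is our own minimal version, not the exact detector used in the project: an idealized bright-bar road profile template is correlated along an image row, and the correlation peak marks a candidate road centre point.

```python
# Road profile template matching sketch: slide the template along a row
# and score each position with the Pearson correlation coefficient.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def correlate(row, template):
    half = len(template) // 2
    scores = []
    for i in range(half, len(row) - half):
        window = row[i - half:i + half + 1]
        scores.append(pearson(window, template))
    return scores, half

# Road of width 3 centred at index 6; template is an idealized profile.
row = [20, 22, 19, 21, 20, 90, 95, 92, 21, 20, 22, 19]
template = [0, 0, 1, 1, 1, 0, 0]
scores, half = correlate(row, template)
centre = max(range(len(scores)), key=scores.__getitem__) + half
print(centre)  # -> 6, the road centre
```

In the full detector the same correlation is run along both the x and y directions, and every local correlation peak (not just the global maximum) becomes a road centre candidate.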
13.3 Lines Grouping
Grouping plays a very important role in road extraction. Due to the alignment and
fragmentation of the candidate segments obtained from the line detection step, grouping only by
geometric attributes is often unreliable, so we should take more information into account. On
the other hand, if one tries to group all detected candidate segments, the problem becomes too
complex and the computational cost too high because of the great number of segments. Now the
question is: how to guarantee the 'reliability' of road grouping? In this paper, grouping is
performed at multiple levels on segments sorted by labeling each segment with a 'possibility of
being a road segment'. Figure 4 shows the grouping strategy.
Figure 4. Grouping procedure: segments are sorted by possibility and grouped in order (long straight lines, then short straight lines, then curve segments) to produce the grouped lines.
Grouping starts from the segments with a higher possibility of being road segments. Long,
straight, and bright line segments have higher scores, while short, curved, and dark lines have
lower scores. From the viewpoint of vision, one tries to find salient, bigger and more globally
structuralized features when one recognizes objects in an image. This is the ability of so-called
'perceptual organization or grouping', which can be defined as imposing structural regularity on
sensory data. To evaluate the possibility of each candidate segment, an evaluation function is
built depending on geometric and photometric attributes, and then segments are sorted by
their possibilities. Grouping starts from the longer straight line segments; subsequently, shorter
and curved segments are grouped.

Figure 5. Link clue in the gap.
Grouping takes advantage of clues from the detailed line detection result. The link clue
between grouped longer segments mainly refers to short lines. As shown in figure 5, shorter
lines in the gap between segments strongly indicate the linkage. This strategy not only reduces
the computational complexity by starting grouping from segments having a higher possibility of
being road centrelines, but also enhances the reliability of grouping by using the clues in the
segment gaps detected by the line detector.
Grouping is based on the Gestalt principles of similarity, continuity, and proximity. The salient
linear feature in the image, the main road, can easily be perceived by the human vision system
even when the feature is locally occluded; perceptual grouping plays the core role in segment
linking.
The implementation of grouping depends on a global link matrix for extracting collinear
segments (Daniel C 1999). The segment connectivity matrix A is formed from the valid collinear
junctions extracted in the previous steps. If n is the number of segments, A is of dimension
n x n, with entry a(i, j) equal to the link possibility if segments i and j are collinear, and 0
otherwise. The candidate collinear chains are extracted from matrix A by tracing the route
whose sum of link possibilities is maximal. Figure 6 shows the collinear grouping work flow.
It is an iterative procedure that ends when no new link is generated.
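A minimal sketch of the chain extraction, assuming a symmetric link matrix A holding link
possibilities. The greedy strongest-link trace used here is an illustrative approximation of the
maximal-possibility route, not the exact probabilistic algorithm of Daniel C (1999):

```python
import numpy as np

def collinear_chains(A):
    """Extract candidate collinear chains from a link matrix A.

    A[i, j] holds the link possibility (0..1) of segments i and j being
    collinear, and 0 when they are not linked.  Chains are grown from
    each unused segment by repeatedly following the strongest remaining
    link, approximating the route with maximal summed link possibility.
    """
    n = A.shape[0]
    used = set()
    chains = []
    for start in range(n):
        if start in used:
            continue
        chain = [start]
        used.add(start)
        current = start
        while True:
            # strongest link from `current` to a not-yet-used segment
            best, best_p = -1, 0.0
            for j in range(n):
                if j not in used and A[current, j] > best_p:
                    best, best_p = j, A[current, j]
            if best < 0:
                break
            chain.append(best)
            used.add(best)
            current = best
        chains.append(chain)
    return chains

# Toy example: segments 0-1-2 form a chain, segment 3 is isolated.
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 0.9   # segments 0 and 1 are collinear
A[1, 2] = A[2, 1] = 0.8   # segments 1 and 2 are collinear
```

Running the trace repeatedly after each linkage update, until no new link appears, mirrors the
iterative work flow of figure 6.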
Figure 6. Collinear grouping work flow (link matrix -> collinear chains -> new collinear
segments -> new links -> updating linkage)
3.4. Verification and Precise Delineation of Extracted Road Centerline
To verify the extracted centerlines, the original-resolution image is used to evaluate the 'score'
of each line. First, a length threshold is set and a connection analyser is applied to remove all
short, isolated lines, because a main road by definition has appreciable length or is connected
with other road parts. The remaining longer lines are then evaluated by the peak value of
profile template matching. The correlation mean along each segment indicates the photometric
possibility of being road, while the length and curvature of the centerline indicate its geometric
possibility. An evaluation function combines the geometric and photometric attributes by a
weighted sum of the two items, so each retained centerline carries an assessment value
indicating the reliability, or possibility, of its being a road.
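The verification step can be sketched as below. The weights, the length threshold, and the
particular geometric term are illustrative assumptions, not the report's actual values:

```python
def centerline_assessment(corr_mean, length, curvature,
                          w_photo=0.6, w_geom=0.4, min_length=50):
    """Combine photometric and geometric evidence into one assessment.

    corr_mean: mean peak correlation of profile template matching along
    the centerline (photometric possibility of being road).
    length and curvature give the geometric possibility.  Weights and
    the length threshold are illustrative placeholders.
    """
    if length < min_length:
        return 0.0  # short isolated lines are rejected outright
    geom = (1.0 - min(curvature, 1.0)) * min(length / 200.0, 1.0)
    return w_photo * corr_mean + w_geom * geom
```

A long, nearly straight centerline with high mean correlation thus receives an assessment close
to 1, while short isolated fragments are removed before scoring.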
To obtain a precise delineation of the road centerline, least squares template matching is
applied to correct the centerline curve in the original-resolution image. The algorithm is similar
to semi-automatic road extraction from initial seed points and the image (Gruen, A. and H. Li, 1997).
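The core idea of least squares template matching can be illustrated in one dimension: linearise
the observed cross-road grey-value profile and solve for the sub-pixel shift that best aligns it
with the template. This is a sketch of the principle, not the report's actual two-dimensional
implementation:

```python
import numpy as np

def lsq_profile_shift(template, profile):
    """Estimate the sub-pixel shift of an observed cross-road grey-value
    profile relative to a template by a single least-squares step.

    Linearising profile(x + dx) ~ profile(x) + dx * profile'(x), the
    shift minimising sum((template - profile - dx * g)**2) is
    dx = g . (template - profile) / (g . g), with g the profile
    gradient.  Applied along the centerline, such shifts pull the
    extracted curve onto the true road centre.
    """
    template = np.asarray(template, dtype=float)
    profile = np.asarray(profile, dtype=float)
    g = np.gradient(profile)            # finite-difference derivative
    denom = g @ g
    if denom == 0:
        return 0.0                      # flat profile: no shift estimable
    return float(g @ (template - profile) / denom)
```

For a Gaussian-like road profile displaced by a fraction of a pixel, one such step already
recovers the displacement to within a few hundredths of a pixel; in practice the step is iterated.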
Figure 7 (a) is the extraction result from the image shown in figure 3(a); the black lines are the
identified road centrelines. Figure 7 (b) shows the collinear grouping procedure: it starts from
the segments with higher possibilities and a higher link threshold, as mentioned in section 2.3,
and then shorter and curved segments are grouped with a lower link threshold and by
referencing link clues. In (b), the starting segment number is 393, which grouping reduces to
296; in the second stage, more segments with lower possibility are added into the grouping
procedure and the segment number varies from 719 to 637. As final processing, a simple
topology analysis removes the isolated and short segments, so the number of remaining main
road centrelines reduces to 31.
Figure 7 (a). Extraction result
Checking the result, we find the extracted main road is quite correct in the open areas, while it
is less correct and less complete in the complex built-up site. In that case the extraction
strategy and algorithm based on segment grouping are less reliable: even for the human vision
system, identifying the complicated structure of a road or street network remains difficult
because it depends on contextual and additional information.
Main road (centerline) extraction from high resolution satellite images is very useful for GIS
database revision, change detection, and applications in which transportation infrastructure
information plays an important role, such as disaster management and urban planning. The
method presented in this paper attempts to automatically extract the previously defined main
road centerline. The hierarchical grouping strategy applied could reliably extract most main
road centerlines in open rural areas even where there are occlusions (caused by clouds, trees,
etc.). The method attempts to simulate the perceptual capability of the human vision system in
finding salient linear features in an image.
The experimental result indicates the potential utility of the method. For complex
scenes, more information should be integrated in order to obtain a more robust result. To
acquire a more reliable result (with no wrong extractions), more road types (that is, more
profile templates) and a more reliable verification algorithm should be taken into account; this
can be done by combining several template types and more segment attributes for line
detection and the final assessment.
Future work also includes extending the main road extraction to full road network
extraction, because the main road network is a very strong indication and provides contextual
information for extracting and identifying smaller roads.
1. Baumgartner, A., C. Steger, H. Mayer, and W. Eckstein, 1999. Automatic road extraction based on
multi-scale, grouping and context, Photogrammetric Engineering & Remote Sensing, Vol. 65 (7),
pp. 777-785.
2. Daniel C, 1999. A probabilistic method for extracting chains of collinear segments, Computer
Vision and Image Understanding, Vol. 76 (1), pp. 36-53.
3. Gruen, A., E. Baltsavias, and O. Henricsson (editors), 1997. Automatic Extraction of Man-Made
Objects from Aerial and Space Images II, Birkhaeuser Verlag, Basel, Switzerland.
4. Gruen, A. and H. Li, 1995. Road extraction from aerial and satellite images by dynamic
programming, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 50 (4), pp. 11-21.
5. Heipke, C., C. Steger and R. Multhammer, 1996. A hierarchical approach to automatic road
extraction from aerial imagery, In: Integrating Photogrammetric Techniques with Scene Analysis
and Machine Vision II, Proceedings of SPIE (D.M. McKeown and I.J. Dowman, editors), (2486):
222-231.
6. Mayer, H. and C. Steger, 1998. Scale-space events and their link to abstraction, ISPRS Journal of
Photogrammetry and Remote Sensing, Vol. 53, pp. 62-75.
7. McKeown, D.M., Harvey, and J. McDermott, 1985. Rule-based interpretation of aerial imagery,
IEEE Transactions on PAMI-7, No. 5.
8. Treash, K. and K. Amaratunga, 2000. Automatic road detection in grayscale aerial images,
Journal of Computing in Civil Engineering, Vol. 14 (1), pp. 60-69.
9. Trinder, J.C. and Y.D. Wang, 1998. Automatic road extraction from aerial images, Digital Signal
Processing, Vol. 8 (4), pp. 215-224.