Earth Science Research; Vol. 1, No. 2; 2012
ISSN 1927-0542 E-ISSN 1927-0550
Published by Canadian Center of Science and Education
Unmixing and Target Recognition in Airborne HyperSpectral Images
Amir Averbuch¹, Michael Zheludev¹ & Valery Zheludev¹
¹ School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Correspondence: Amir Averbuch, School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Tel:
972545694455. Email: [email protected]
Received: May 25, 2012 Accepted: June 16, 2012 Online Published: July 6, 2012 doi:10.5539/esr.v1n2p200 URL: http://dx.doi.org/10.5539/esr.v1n2p200
Abstract
We present two new linear algorithms that perform unmixing in hyperspectral images and then recognize targets whose spectral signatures are given. The first algorithm is based on the ordered topology of spectral signatures. The second algorithm is based on a linear decomposition of each pixel's neighborhood. The sought-after target can occupy a subpixel or more than one pixel. These algorithms combine ideas from algebra and probability theory as well as statistical data mining. Experimental results demonstrate their robustness. This paper is a complementary extension to Averbuch & Zheludev (2012).
Keywords: hyperspectral processing, target recognition, sub and above pixel, unmixing, dimensionality reduction, diffusion maps
1. Introduction
1.1 Data Representation and Extraction of Spectral Information
We assume that a hyperspectral signature of a sought-after material is given. According to Winter (1999), a fundamental processing task in many applications is to automatically identify pixels whose spectra coincide with the given spectral shape (signature). This problem raises the following issues: how is the measured spectrum of a ground material related to a given “pure” spectrum, and how should the two be compared to determine whether they are the same?
Spatial and spectral sampling produce a 3D data structure referred to as a data cube. A data cube can be visualized as a stack of images where each plane on the stack represents a single spectral channel (wavelength). The observed spectral radiance data, or the derived surface reflectance data, can be viewed as a scattering of points in a $K$-dimensional Euclidean space $\mathbb{R}^K$, where $K$ is the number of spectral bands (wavelengths). Each spectral band is assigned to one axis. All the axes are mutually orthogonal. Therefore, the spectrum of each pixel can be viewed as a vector $x = (x_1, \ldots, x_K)$ whose Cartesian coordinates $x_i$ are either radiance or reflectance values at each spectral band. Since $x_i \ge 0$, $i = 1, \ldots, K$, the spectral vectors lie inside a positive cone in $\mathbb{R}^K$. Changes in the illumination level can change the length of the spectral vector but not its orientation, which is related to the shape of the spectrum. When targets are too small to be resolved spatially or when they are partially obscured or of an unknown shape, as shown in Winter (1999), then the detection has to rely on the available spectral information.
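The geometric remark above (illumination changes the length of a spectral vector but not its direction) can be illustrated with a minimal sketch. The spectra below are hypothetical five-band reflectance values chosen only for illustration, and the angle between vectors is used here as one possible measure of spectral shape; it is not a quantity defined in this paper.

```python
import numpy as np

def spectral_angle(x, y):
    """Angle (in radians) between two spectra viewed as vectors in R^K."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical reflectance spectra with K = 5 bands (illustrative values only).
spectrum = np.array([0.12, 0.30, 0.45, 0.40, 0.22])
brighter = 1.7 * spectrum                          # same shape, stronger illumination
other = np.array([0.40, 0.35, 0.20, 0.15, 0.10])   # a differently shaped spectrum

angle_same = spectral_angle(spectrum, brighter)    # essentially zero
angle_other = spectral_angle(spectrum, other)      # clearly non-zero
```

Scaling a spectrum leaves the angle at (numerically) zero, while a spectrum of a different shape yields a clearly positive angle.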
Unfortunately, a perfect fixed spectrum for any given material does not exist.
In agreement with Winter (1999), spectra of the same material are probably never identical even in laboratory experiments. This is due to variations in the material surface. The variability amount is even more profound in remote sensing applications because of the variations in atmospheric conditions, sensor noise, material composition, location, surrounding materials and other contributing factors. As a result, the measured spectra, which correspond to pixels with the same surface type, exhibit an inherent spectral variability that prevents the characterization of homogeneous surface materials by unique spectral signatures.
Another significant complication arises from the interplay between the spatial resolution of the sensor and the spatial variability present in the observed ground scene. According to Winter (1999), a sensor integrates the radiance from all the materials within the ground surface that are “seen” by the sensor as a single image pixel.
Therefore, depending on the spatial resolution of the sensor and the distribution of surface materials within each ground resolution cell, the result is a hyperspectral data cube comprised of “pure” and “mixed” pixels, where a pure pixel contains a single surface material and a mixed pixel contains multiple (superposition of) materials.
www.ccsenet.org/esr Earth Science Research Vol. 1, No. 2; 2012
A linear mixing model is the most widely used spectral mixing model. It assumes that the observed reflectance spectrum of a given pixel is generated by a linear combination of a small number of unique constituents known as endmembers. This model is defined with constraints in the following way (Harsanyi & Chang, 1994):
$$x = \sum_{k=1}^{M} a_k s_k + w, \qquad \sum_{k=1}^{M} a_k = 1 \ \text{(additivity constraint)}, \qquad a_k \ge 0 \ \text{(positivity constraint)} \qquad (1)$$
where $s_1, \ldots, s_M$ are the $M$ endmember spectra that are assumed to be linearly independent, $a_1, \ldots, a_M$ are the corresponding abundances (cover material fractions) and $w$ is an additive-noise vector.
1.2 Outline of the Algorithms to Identify Target with Known Spectra
The new methods in this paper identify targets with known spectra. Target identification in hyperspectral images has the following consecutive steps:
1) Finding suspicious points: these are points whose spectra differ, in any norm, from the spectra of the points in their neighborhood. This is also called anomaly detection;
2) Extracting from the suspicious points the spectra of the independent components (unmixing), where one of them is the target whose spectrum fits the given (sought-after) spectrum.
We assume that spectra of different materials are statistically dependent and that the difference between them stems from the behavior of the first and second derivatives in some sections of the spectrum. If they are statistically independent, then the related methods such as Maximum Likelihood (ML) and the geometrical ones (MVT, PPI and N-FINDR) work well.
The experiments in this paper were performed on three real hyperspectral datasets, measured as reflectance and titled “desert”, “city” and “field”, which were acquired by the Specim camera (SPECIM camera, 2006) located on a plane. Their properties, with a display of one waveband per dataset, are given in Figures 1-3.
Figure 1. The dataset “desert” is a hyperspectral image of a desert place taken from an airplane flying 10,000 feet above sea level. The resolution is 1.3 meter/pixel, 286 × 2640 pixels per waveband with 168 wavebands
Figure 2. The dataset “city” is a hyperspectral image of a city taken from an airplane flying 10,000 feet above sea level. The resolution is 1.5 meter/pixel, 294 × 501 pixels per waveband with 28 wavebands
Figure 3. The dataset “field” is a hyperspectral image of a field taken from an airplane flying 9,500 feet above sea level. The resolution is 1.2 meter/pixel, 286 × 300 pixels per waveband with 50 wavebands
The paper has the following structure: Section 2 describes the related work. The two algorithms, which are described in this paper, are compared with the performance of the orthogonal subspace projection (OSP) algorithm. Section 3 presents an algorithm that identifies the target's spectrum where the target occupies at least a whole pixel. This method assumes that the target's spectrum is distorted by atmospheric conditions and noise. Section 4 presents an unmixing method that is based on neighborhood analysis of each pixel. This method can also be used for detecting a subpixel target. This algorithm contains two parts. In the first part, suspicious points are discovered. The algorithm is based on the properties of neighborhood morphology and on the properties of the Diffusion Maps (DM) algorithm Coifman & Lafon (2006). The second part unmixes the suspicious point. It is based on the application of DM to the linear span of the neighboring background spectra. The appendix describes the Diffusion Maps algorithm for dimensionality reduction.
2. Related Work
An up-to-date overview of hyperspectral unmixing is given in Bioucas-Dias & Plaza (2010; 2011). The challenges related to target detection, which is the main focus of this paper, are described in the survey papers Manolakis, Marden, & Shaw (2001) and Manolakis & Shaw (2002). They provide a tutorial review of state-of-the-art target detection algorithms for hyperspectral imaging (HSI) applications. The main obstacle to effective detection algorithms is the inherent variability of target and background spectra. Adaptive algorithms are effective in solving some of these problems. The solution provided in this paper meets some of the challenges mentioned in Manolakis & Shaw (2002).
In the rest of this section, we divide the many existing algorithms into several groups. We wish to show some trends but do not attempt to cover the avalanche of related work on unmixing and target detection.
Linear approach: Under the linear mixing model, where the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, by the ML setup Settle (1996) and by the constrained least squares approach Chang (2003). These methods do not supply sufficiently accurate estimates and do not reflect the physical behavior. The distinction between the spectra of different materials is generally determined by the behavior of the first and second derivatives and not by a trend.
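To make the constrained least squares formulation concrete, the following is a minimal sketch of unmixing under the linear mixing model with known endmembers. It is not the method of Chang (2003); it is a generic projected-gradient solver, and the endmember matrix `S`, the abundances `a_true` and the iteration count are hypothetical values chosen for illustration. The abundances are kept nonnegative and summing to one by projecting onto the simplex after every gradient step.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_k >= 0, sum_k a_k = 1}."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def unmix(x, S, n_iter=2000):
    """Minimize ||S a - x||^2 over the simplex by projected gradient descent.

    S is a (bands x endmembers) matrix, x is the observed spectrum.
    """
    n_end = S.shape[1]
    a = np.full(n_end, 1.0 / n_end)           # start from equal abundances
    step = 1.0 / np.linalg.norm(S.T @ S, 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = S.T @ (S @ a - x)
        a = project_to_simplex(a - step * grad)
    return a

# Hypothetical endmember spectra (6 bands, 3 endmembers) and a noise-free mixture.
S = np.array([[0.1, 0.8, 0.9],
              [0.2, 0.7, 0.1],
              [0.4, 0.5, 0.8],
              [0.6, 0.3, 0.2],
              [0.8, 0.2, 0.7],
              [0.9, 0.1, 0.3]])
a_true = np.array([0.6, 0.3, 0.1])
x = S @ a_true
a_hat = unmix(x, S)
```

In this noise-free example the recovered abundances coincide with `a_true`; with real data the estimate depends on how well the given endmembers describe the scene, which is the limitation discussed above.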
Independent component analysis (ICA) is an unsupervised source separation process that finds a linear decomposition of the observed data yielding statistically independent components Common (1994), Hyvarinen, Karhunen, & Oja (2001). It has been applied successfully to blind source separation, to feature extraction and to unsupervised recognition such as in Bayliss, Gualtieri, & Cromp (1997), where the endmember signatures are treated as sources and the mixing matrix is composed by the abundance fractions. Numerous works including Nascimento & Bioucas-Dias (2005) show that ICA cannot be used to unmix hyperspectral data.
Geometric approach: Assume a linear mixing scenario where each observed spectral vector is given by
$$r = x + n = M s + n,$$
where $r$ is an $L$-vector ($L$ is the number of bands), $M = [m_1, \ldots, m_p]$ is the mixing matrix ($m_i$ denotes the $i$-th endmember signature and $p$ is the number of endmembers present in the sensed area), $s = \gamma a$ ($\gamma$ is a scale factor that models illumination variability due to surface topography), $a = [a_1, \ldots, a_p]^T$ is the abundance vector that contains the fractions of each endmember ($T$ denotes a transposed vector) and $n$ is the system's additive noise. Owing to physical constraints, abundance fractions are nonnegative and satisfy the additivity constraint $\sum_{k=1}^{p} a_k = 1$. Each pixel can be viewed as a vector in an $L$-dimensional Euclidean space, where each channel is assigned to one axis. Since the set $\{a \in \mathbb{R}^p : \sum_{k=1}^{p} a_k = 1,\ a_k \ge 0\ \forall k\}$ is a simplex, the set $S_x = \{x \in \mathbb{R}^L : x = Ma,\ \sum_{k=1}^{p} a_k = 1,\ a_k \ge 0\ \forall k\}$ is also a simplex whose vertices correspond to endmembers.
Several approaches Ifarraguerri & Chang (1999), Boardman (1993), Craig (1994) exploited this geometric feature of hyperspectral mixtures. The minimum volume transform (MVT) algorithm Craig (1994) determines the simplex of a minimal volume that contains the data. The method presented in Bateson, Asner, & Wessman (2000) is also of MVT type, but by introducing the notion of bundles, it takes into account the endmember variability that is usually present in hyperspectral mixtures.
The MVT type approaches are complex from a computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) Boardman (1993) and the N-FINDR Winter (1999) still find the maximum volume simplex that contains the data cloud. They assume the presence of at least one pure pixel of each endmember in the data. This is a strong assumption that may not be true in general. In any case, these algorithms find the set of most of the pure pixels in the data.
Extending subspace approach: A fast unmixing algorithm, termed vertex component analysis (VCA), is described in Nascimento & Bioucas-Dias (2005). The algorithm is unsupervised and utilizes two facts: 1) The endmembers are the vertices of a simplex; 2) The affine transformation of a simplex is also a simplex. It works with projected and unprojected data. Like the PPI and N-FINDR algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already detected. The new endmember's signature corresponds to the extreme projection. The algorithm iterates until all the endmembers are exhausted. VCA performs much better than PPI and better than or comparable to N-FINDR. Yet, its computational complexity is between one and two orders of magnitude lower than that of N-FINDR.
If the image is of size approximately 300 × 2000 pixels, then this method, which builds a linear span in each step, is too computationally expensive. In addition, it relies on “pure” spectra, which are not always available.
Statistical methods: In the statistical framework, spectral unmixing is formulated as a statistical inference problem by adopting a Bayesian methodology where the inference engine is the posterior density of the random objects to be estimated as described for example in Dobigeon, Moussaoui, Coulon, Tourneret, & Hero (2009), Moussaoui, Carteret, Brie, & Mohammad-Djafari (2006), Arngren, Schmidt, & Larsen (2009).
2.1 Orthogonal Subspace Projection (OSP)
The method of orthogonal subspace projection (OSP) for unmixing and target detection is described in Ahmad & Ul Haq (2011), Ahmad, Ul Haq, & Mushtaq (2011), Ren & Chang (2003). We compare our method with the method in Ahmad & Ul Haq (2011), which is currently considered to be very effective. According to the notation in Ahmad & Ul Haq (2011), we are given the dataset $X = SA + W$, where $S$ is the set of pure signatures, $A$ is the corresponding abundance fractions and $W$ is a white noise matrix. According to the OSP method in Ahmad & Ul Haq (2011), the mixing matrix $A$ is found from a singular matrix $U$ and an eigenvalues matrix of the projection matrix onto the subspace $L$ of the pure signatures, together with the pseudo-inverse of $U$. The creation of the subspace $L$ is described in Ren & Chang (2003), p. 1236.
We present the results of target detection by the application of the OSP method with a given target signature $s$ and compare them to our method. The targets in the scene are detected via the application of the OSP method to multipixels, which contain the dominant coefficient from the matrix $A$, corresponding to the target signature $s$.
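As a point of reference for the comparison, the sketch below shows the classical OSP operator in the sense of Harsanyi & Chang (1994): the data are projected onto the orthocomplement of the undesired (background) signatures and then correlated with the target signature. It is not the specific variant of Ahmad & Ul Haq (2011). The function name `osp_scores` and all signature values are hypothetical and serve only as an illustration.

```python
import numpy as np

def osp_scores(X, target, background):
    """Classical OSP detector.

    X          : (bands x pixels) matrix, one spectrum per column
    target     : (bands,) target signature
    background : (bands x q) matrix of undesired signatures
    Returns one score per pixel.
    """
    n_bands = background.shape[0]
    # Projector onto the orthocomplement of span(background).
    P = np.eye(n_bands) - background @ np.linalg.pinv(background)
    return target @ P @ X

# Hypothetical signatures with 4 bands.
u1 = np.array([0.2, 0.4, 0.6, 0.8])
u2 = np.array([0.9, 0.7, 0.5, 0.3])
d = np.array([0.1, 0.9, 0.1, 0.9])
U = np.column_stack([u1, u2])

pixels = np.column_stack([
    0.5 * u1 + 0.5 * u2,     # pure background mixture
    0.3 * d + 0.7 * u1,      # 30% target mixed with background
])
scores = osp_scores(pixels, d, U)
reference = osp_scores(d[:, None], d, U)[0]   # score of a pure target pixel
```

The background-only pixel receives a score of (numerically) zero, and the mixed pixel receives 0.3 times the score of a pure target pixel, which is how the dominant coefficient for the target signature shows up in the output.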
2.2 Linear Classification for Threshold Optimization
According to Cristianini & Shawe-Taylor (2000), a binary classification is frequently performed by using a real-valued function $f : X \subseteq \mathbb{R}^n \to \mathbb{R}$ in the following way: the input $x = (x_1, \ldots, x_n)^T$ is assigned to a positive class if $f(x) \ge 0$, otherwise, to a negative class. We consider the case where $f(x)$ is a linear function of $x$ with the parameters $w$ and $b$ such that
$$f(x) = \langle w \cdot x \rangle + b = \sum_{i=1}^{n} w_i x_i + b \qquad (2)$$
where $(w, b) \in \mathbb{R}^n \times \mathbb{R}$ are the parameters that control the function. The decision rule is given by $\operatorname{sgn}(f(x))$. $w$ is assumed to be the weight vector and $b$ is the threshold.
Definition 2.1. (Cristianini & Shawe-Taylor, 2000)
A training set is a collection of training examples (data)
$$S = ((x_1, y_1), \ldots, (x_l, y_l)) \subseteq (X \times Y)^l \qquad (3)$$
where $l$ is the number of examples, $X \subseteq \mathbb{R}^n$, and $Y = \{-1, 1\}$ is the output domain.
Rosenblatt's Perceptron algorithm (Cristianini & Shawe-Taylor, 2000; Burges, 1998; pages 12 and 8, respectively) creates a hyperplane $\langle w \cdot x \rangle + b = 0$ for the training set $S$. It creates the best linear separation between positive and negative examples via minimization of a measurement function of the “margin” distribution $\gamma_i = y_i (\langle w \cdot x_i \rangle + b)$. $\gamma_i > 0$ implies the correct classification for $(x_i, y_i)$.
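The role of the margin $\gamma_i = y_i(\langle w \cdot x_i \rangle + b)$ can be seen in a minimal perceptron sketch. This is the textbook mistake-driven update in a simplified form (the bias is updated by the label only), not necessarily the exact variant given in the cited references; the learning rate, epoch limit and the toy data are assumptions made for illustration.

```python
import numpy as np

def perceptron(X, y, lr=1.0, max_epochs=100):
    """Simplified Rosenblatt perceptron for labels y in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # margin not positive: misclassified
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:                        # every margin is positive
            break
    return w, b

# Hypothetical linearly separable training set in R^2.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-1.0, -1.0], [-2.0, -0.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = perceptron(X, y)
margins = y * (X @ w + b)
```

On separable data the loop stops once all margins are positive; on inseparable data it only stops at `max_epochs`, which is the limitation that motivates Fisher's discriminant below.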
The perceptron algorithm is guaranteed to converge only if the training data are linearly separable. A procedure that does not suffer from this limitation is the Linear Discriminant Analysis (LDA) via Fisher's discriminant functional Cristianini & Shawe-Taylor (2000). The aim is to find the hyperplane $(w, b)$ on which the projection of the data is maximally separated. The cost function (the Fisher's function) to be optimized is:
$$F = \frac{(m_1 - m_{-1})^2}{\sigma_1^2 + \sigma_{-1}^2} \qquad (4)$$
where $m_i$ and $\sigma_i$ are the mean and the standard deviation, respectively, of the function output values $P_i = \{\langle w \cdot x_j \rangle + b : y_j = i\}$ for the two classes $i = 1, -1$.
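A short sketch can make Equation 4 concrete. The function `fisher_score` evaluates $F$ for a given $(w, b)$, and `fisher_direction` returns the standard closed-form maximizer, $w \propto (\Sigma_1 + \Sigma_{-1})^{-1}(\mu_1 - \mu_{-1})$, where $\mu_i$ and $\Sigma_i$ are the class means and covariances of the inputs. Both function names and the toy data are assumptions made for illustration, and the closed form is the textbook result rather than a procedure stated in this paper.

```python
import numpy as np

def fisher_score(w, b, X, y):
    """Fisher's function F from Equation 4 for the outputs <w, x> + b."""
    out = X @ w + b
    pos, neg = out[y == 1], out[y == -1]
    return (pos.mean() - neg.mean()) ** 2 / (pos.var() + neg.var())

def fisher_direction(X, y):
    """Direction maximizing F: (Sigma_1 + Sigma_-1)^{-1} (mu_1 - mu_-1)."""
    Xp, Xn = X[y == 1], X[y == -1]
    Sw = np.cov(Xp, rowvar=False, bias=True) + np.cov(Xn, rowvar=False, bias=True)
    return np.linalg.solve(Sw, Xp.mean(axis=0) - Xn.mean(axis=0))

# Hypothetical two-class data in R^2.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0], [4.0, 2.5],
              [-1.0, -1.0], [-2.0, -0.5], [-1.5, -2.0], [0.0, -1.5]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

w_f = fisher_direction(X, y)
best = fisher_score(w_f, 0.0, X, y)
```

The threshold $b$ shifts both class means equally, so it does not change $F$; it only has to be chosen afterwards to place the boundary between the projected classes.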
Definition 2.2. (Cristianini & Shawe-Taylor, 2000)
The dataset $S$ from Equation 3 is linearly separable if the hyperplane $\langle w \cdot x \rangle + b = 0$ correctly classifies the training data. It means that $\gamma_i = y_i (\langle w \cdot x_i \rangle + b) > 0$, $i = 1, \ldots, l$. In this case, $b$ is the separation threshold. If $\gamma_i \le 0$ for some $i$, then the dataset is linearly inseparable.
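Definition 2.3.
A vector $x \in \mathbb{R}^n$ is isolated from the set $P = \{p_1, \ldots, p_k\} \subset \mathbb{R}^n$ if the training set $S$, in which $x$ forms one class and $p_1, \ldots, p_k$ form the other class, is linearly separable according to definition 2.2. In this case, the absolute value $|b|$ is the separation threshold.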
Suppose that we have a set $S = \{x_1, \ldots, x_n\}$ of $n$ samples. First, we want to partition the data into exactly two disjoint subsets $S_1$ and $S_{-1}$. Each subset represents a cluster. The solution is based on the K-means algorithm (Duda, Hart, & Stork, 2001). K-means maximizes the function $J(e)$, where $e$ is a partition. The value of $J(e)$ depends on how the samples are grouped into clusters and on the number of clusters (see Duda, Hart, & Stork, 2001)
B
(5) where
$S_W = \sum_{i=1}^{l} \sum_{x \in S_i} (x - m_i)(x - m_i)^T$ is a “within-cluster scatter matrix” (Duda, Hart, & Stork, 2001), $l$ is the number of classes, $S_i$ are the classes and $m_i$ are the centers of each class. $S_B = \sum_{i=1}^{l} n_i (m_i - m)(m_i - m)^T$ is called the “between-cluster scatter matrix” (Duda, Hart, & Stork, 2001), where $n_i$ is the cardinality of a class and $m$ is the center of the whole dataset.
Definition 2.4.
Let $e$ be the best separation for the set $S = \{x_1, \ldots, x_n\} \subset \mathbb{R}^n$ via K-means and Fisher's discriminant analysis Cristianini & Shawe-Taylor (2000), Burges (1998). $e$ is called the Fisher's separation and $b$ the Fisher's threshold for the data.
When is a dataset separable? One criterion compares the means $m_1$ and $m_{-1}$ with the diameters of the two classes, where the notation in Equation 4 is used and $\operatorname{diam}$ is defined as the maximal $L_2$ distance between two points of a set, $\operatorname{diam}(P) = \max \{\|x - y\|_{L_2} : x, y \in P\}$.
Another criterion is:
Definition 2.5. (Duda, Hart, & Stork, 2001)
A dataset is separable if $J(e_2) > J(e_1)$, where $J$ is from Equation 5, $e_1$ is the partition in which the number of classes is 1 and $e_2$ is the best partition into two classes. If $J(e_2) \le J(e_1)$, then the dataset is inseparable and Fisher's separation is incorrect.
3. Method I: Weak Dependency Recognition (WDR) of Targets That Occupy One or More Pixels
We assume that a target occupies one or more pixels. The process, which determines whether a given target's spectrum and the spectrum of the current pixel are dependent, is described next.
Definition 3.1.
Two discrete functions $Y_1$ and $Y_2$ are weakly dependent if there exists a permutation $\pi$ of the coordinates that provides monotonic order for the values of $Y_1$ and $Y_2$.
Let $T$ be a given target's spectrum and $P$ be the pixel's spectrum. We assume that the spectra of $T$ and $P$ are discrete vectors. In general, we assume that $T$ and $P$ are normalized and centralized. The following hypotheses are assumed:
$H_0$: $T$ and $P$ are weakly dependent.
$H_1$: $T$ and $P$ are not weakly dependent.
3.1 Hypotheses Check
We find an orthogonal transformation $\pi$ that permutes the coordinates of $T$ into a decreasing order. This permutation $\pi$ is applied to $P$ and $T$. We get $P_1 = \pi(P)$ and $T_1 = \pi(T)$, where $T_1$ is monotonic. If $H_0$ holds, which means that $T$ and $P$ are weakly dependent, then the values of $P_1$ are either monotonic decreasing or increasing and the first and second derivatives of $P_1$ are close to zero - see Figure 4 (left). Otherwise, $H_1$ holds and $P_1$ has an oscillatory behavior - see Figure 4 (right). In addition, $P_1$ has a subset of coordinates whose first and second derivatives have an oscillatory behavior - see Figure 4 (right).
Figure 4. The $x$- and the $y$-axes are the wavebands and their reflectance values, respectively. The spectra are represented after the application of the permutation to the coordinates, which permutes $T$ into a monotonic decreasing order. Left: Weak dependency between $T$ and $P$. Right: No weak dependency between $T$ and $P$
If the permutation of the coordinates of $P$ makes their values either decreasing or increasing monotonically, then the first and second derivatives of $P$ have a minimal norm. This is another criterion for deciding whether weak dependency holds.
Definition 3.2.
Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. The norm is defined as $\|x\| = \max_i |x_i|$.
Let $\pi$ be an orthogonal transformation that permutes the coordinates of $T$ into a decreasing order. Denote the second derivative of a vector $X$ by $X^{(2)}$. Let $X_1, \ldots, X_N$ be a dataset of spectra from all the pixels in the scene. Define the mapping $X_i \mapsto Y_i$ such that $Y_i = \|(\pi(X_i))^{(2)}\|$. The dataset $\{Y_1, \ldots, Y_N\}$ can be classified as:
1) The set $\{Y_1, \ldots, Y_N\}$ is separable according to definition 2.5.
2) The set $\{Y_1, \ldots, Y_N\}$ is inseparable according to definition 2.5.
In the first case, we take the best separation for the set $\{Y_1, \ldots, Y_N\}$ according to definition 2.4, and $b$ is the Fisher's threshold for this separation. Then, the set $S = \{X_i : Y_i < b\}$ is the set of targets. In the other case, there are no targets in the scene.
The flow of the WDR algorithm is given in Figure 5.
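The hypothesis check of this section can be sketched in a few lines. The function below sorts the target spectrum into decreasing order, applies the same permutation to a pixel spectrum, centralizes and normalizes it, and returns the maximum absolute second difference as a score. The function name `wdr_score`, the use of discrete second differences for the second derivative, and the test spectra are assumptions made for illustration; the thresholding by Fisher's separation is not reproduced here.

```python
import numpy as np

def wdr_score(T, P):
    """Max-norm of the second difference of P after sorting by the target T."""
    T = np.asarray(T, dtype=float)
    P = np.asarray(P, dtype=float)
    order = np.argsort(-T, kind="stable")      # permutation putting T in decreasing order
    P1 = P[order]
    P1 = P1 - P1.mean()                        # centralize
    P1 = P1 / np.linalg.norm(P1)               # normalize
    second_diff = np.diff(P1, n=2)             # discrete second derivative
    return float(np.max(np.abs(second_diff)))

# Hypothetical target spectrum with 10 bands.
T = np.array([0.3, 0.1, 0.5, 0.9, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
P_dependent = 0.5 * T + 0.1                    # monotone function of T
P_unrelated = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.55])

score_dependent = wdr_score(T, P_dependent)
score_unrelated = wdr_score(T, P_unrelated)
```

A pixel whose spectrum is weakly dependent on the target yields a score near zero, while an unrelated spectrum oscillates after the permutation and yields a large score; separating the two groups of scores is the role of the Fisher's threshold $b$.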
Figure 5. The flow of the WDR algorithm
3.2 Experimental Results
Figures 6-8 display the results after the application of the algorithm in section 3.1 to the “desert” image (Figure 1).
The yellow lines mark the neighborhood of the detected targets.
Figure 6. Left: One wavelength part from the original “desert” image (Figure 1). Right: The white points mark the detected targets. The intensity of each pixel in the right side corresponds to the value
where X is the spectrum in the current pixel
Figure 7. Left: One wavelength part from the original “desert” image (Figure 1). Right: The white points mark the detected targets. The intensity of each pixel in the right side corresponds to the value
where
X is the spectrum in the current pixel
Figure 8. Left: One wavelength part from the original “desert” image (Figure 1). Right: The white points mark the detected targets. The intensity of each pixel in the right side corresponds to the value
where X is the spectrum in the current pixel
The desert image contains documented targets. The detection of the suspicious points in Figures 6-8 matches exactly the known targets.
The point $P_1$ in Figure 8 is the pattern of the known target's material. Its spectrum is displayed in Figure 4 as a plot of the “target”. Other spectra plots, which were detected by the WDR algorithm in the scenes of Figures 6-8, are classified as “spectra of suspicious points”.
[Figure: the target and the suspicious points' spectra. The x- and the y-axes [...], respectively.]
[...] and compared next [...] algorithm.
[...] Ahmad & Ul Haq (2011) algorithms. In this section, we compare [...]
[Figure: Left: One wavelength (multipixel) from the original “desert” image (Figure 1). Right: The [...] points mark the detected targets [...] algorithm Ahmad & Ul Haq (2011). Red circle marks [...]]
Figure 12 shows the ROC curves while comparing two [...] by varying the threshold. The red line corresponds to [...] method. The green line corresponds [...] method.
[Figure: Left: One wavelength (multipixel) from the original “desert” image (Figure 1). Right: The [...] targets by the WDR [...]. The intensity of each pixel [...] where X is the spectrum in the [...] pixel.]
[Figure: Left: One wavelength (multipixel) from the original “desert” image (Figure 1). Right: The [...] by the OSP algorithm Ahmad [...]. Red circle marks the [...]]
[...] by Examining the Neighborhood [...] Point (UNSP)
In this section, we provide [...] projection. They project the data into linear subspace [...] local model of the background [...] structure of the pixel's neighborhood. This yields [...] (suspicious point). [...] introduces the [...] square of [...] with a center at the pixel X, which is the neighborhood's [...], where $m_1$ is the radius of this neighborhood. [...] displayed in Figure 17.
A connected component means that there exists a [...] any two pixels [...] the next pixel is adjacent [...]. This connection [...] Figure 18.
Figure 18. The morphology [...] neighborhood is represented by [...]
Consider the spectra from [...] hyperspectral image. Usually there is high [...] these spectra in real situations. For example, Figure 19 displays the spectra from three [...] of three different materials.
To reduce the correlation [...] pixel X by $d$ [...] and it is called $d$-spectrum [...] derivative [...] less than half [...] $m$ [...] neighborhood (as a subpixel or as a whole [...]). $T$ is the given target [...] spectrum. [...] two steps: Detection of suspicious [...] of the target [...]
4.1 [...] Detection of Suspicious Points via Neighborhood Morphology
[...]
$H_0$: $Y$ is a suspicious point.
$H_1$: $Y$ is not a suspicious point.
[...] $m_1$ is the neighborhood's radius. The indices of [...] is denoted by [...]. For example, the center [...] located in row $i$ and column $j$ [...]
Figure 20. The indices of a pixel
Denote by $S_m$ the set of multipixels (multipixel means all the wavelengths that belong to this pixel) in the current neighborhood. Consider the mapping $\varphi : S_m \to [-1, 1]$ such that $\varphi(p) = \operatorname{corr}(d_p, d_Y)$ is the correlation coefficient between the vectors $d_p$ and $d_Y$, where $p \in S_m$. Denote $\hat{S} = \varphi(S_m)$.
The set $\hat{S}$ can be in one of two cases:
1) $\hat{S}$ is inseparable according to definition 2.5. This means that the pixels, which are correlated with the target, are inseparable from the other pixels;
2) $\hat{S}$ is separable according to definition 2.5. This means that the pixels, which are correlated with the target, are separated from the other pixels.
If we are in case 1, then $Y$ is not a suspicious point. If we are in case 2, assume that the first cluster is the one closest to 1. According to definition 2.4, Fisher's separation provides the best separation. It separates the set of correlated points $\{p_{ij} : \operatorname{corr}(d_{p_{ij}}, d_Y) > b\}$ from the other points, where $b$ is the Fisher's threshold.
If the set of correlated points represents two or more connected components, then $Y$ is also not a suspicious point. If $Y$ does not belong to this set, then $Y$ is also not a suspicious point. Therefore, $H_1$ holds. In other words, if $Y$ is a suspicious point, then the set of correlated points is a set of pixels that intersects with the target and this set of correlated points is concentrated around the central point $Y$.
Here and below, we assume that a correlated point is a pixel whose $d$-spectrum and $d_Y$ are correlated with a correlation coefficient that is greater than Fisher's threshold $b$.
Let $N_1$ be the neighborhood of radius $m_2$. $N_1$ is called the internal square. Let $N_2$ be the whole neighborhood without $N_1$. $N_2$ is called the external square. They are visualized in Figure 21.
Figure 21. $N_1$ is the internal square and $N_2$ is the external square
Consider the set of all pixels $p_{ij}$ that are bounded by the external square and whose correlation coefficients, which are associated with the current neighborhood, are less than the Fisher's threshold $b$. Each pixel in this set is treated as a vector (multipixel) where its entries are spread all over the wavelengths. The $d$-spectrum of such a vector is denoted by $v_k$, where $k$ is one of the indices $1, \ldots, s$. The set of all these vectors, which is the set of all the $d$-spectra that belong to these pixels, is denoted by $V = \{v_1, \ldots, v_s\}$.
In order to derive the $d$-spectrum of some material in the central pixel, the background around the central pixel has to be removed. For that, we construct an orthogonal projection, which projects all the $d$-spectra onto the orthocomplement of the linear span where the background of the $d$-spectra is located. If the $d$-spectrum $d_Y$ of the central pixel does not belong to this linear span, then this projection extracts an orthogonal component of $d_Y$ which does not mix with the background of the $d$-spectrum. For example, if $d_Y = d_1 + d_2$, where $d_1$ belongs to the linear span of the background $d$-spectrum and $d_2$ belongs to the orthocomplement of this span, then after projection we obtain $d_2$, which does not correlate with the background of the $d$-spectrum. Hence, the background influence is removed by this projection.
Now, we formalize the above. Assume the matrix $E$ is associated with the vectors $v_1, \ldots, v_s$. Assume that $T_e$ is the Fisher's threshold, which separates between the big and small absolute values of the eigenvalues of the matrix $E$. In some cases, $T_e$ can separate between zero and nonzero eigenvalues.
The eigenvectors associated with the eigenvalues that are smaller than $T_e$ generate the eigen-subspace, which is the orthocomplement of the linear span of the principal directions of the set $V$. Denote this orthocomplement by $C$.
Throughout this paper, we assume that in our model the spectrum of any pixel $X$ consists of three components:
1) The spectrum of the material $M$, which is different from its background;
2) The spectrum of the background, which was generated from a linear combination of spectra of pixels from the neighborhood of $X$;
3) Random noise.
Accordingly, the $d$-spectrum of $Y$ is $d_Y = \alpha M' + P' + N$, where $M'$ is the $d$-spectrum of the material $M$, $\alpha$ is the portion of the material $M$ in $Y$, $N$ is a random noise and $P'$ is a linear combination of the vectors $v_1, \ldots, v_s$.
If the correlated points concentrate around $Y$, then these points consist of the same material as $Y$. Consider the orthogonal projection operator that projects vectors onto the orthocomplement $C$. The projection of $P'$, which is a linear combination of $v_1, \ldots, v_s$, is approximated to be a zero vector. Thus, this orthogonal projection removes the influence of the background from the $d$-spectrum of $Y$.
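The background removal step can be sketched as follows. The sketch assumes that $E$ is the second-moment matrix $V^T V$ of the background $d$-spectra stacked as rows of $V$ (the text only says that $E$ is associated with these vectors), and it uses a fixed numerical threshold in place of the Fisher's threshold $T_e$. The function name `background_projector` and all vectors are hypothetical.

```python
import numpy as np

def background_projector(V, threshold):
    """Projector onto the orthocomplement C of the span of the rows of V.

    V         : (s x n) matrix, one background d-spectrum per row
    threshold : eigenvalues of E with absolute value below it are treated as small
    """
    E = V.T @ V                                   # assumed form of the matrix E
    eigvals, eigvecs = np.linalg.eigh(E)
    C = eigvecs[:, np.abs(eigvals) < threshold]   # basis of the orthocomplement
    return C @ C.T

# Hypothetical background d-spectra spanned by two vectors in R^5.
b1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
V = np.stack([b1, b2, b1 + b2, 2.0 * b1 - b2])

m_prime = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # d-spectrum of the material
d_Y = 0.4 * m_prime + 0.6 * b1 + 0.2 * b2         # mixed central pixel, no noise

Pc = background_projector(V, threshold=1e-8)
cleaned = Pc @ d_Y
```

After the projection the background part of `d_Y` vanishes and only the material component remains, scaled by its portion (0.4 in this example), so that it can be correlated with the target's $d$-spectrum.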
Let
T
' be the given
d
spectrum of the target. If the correlation coefficient of than the correlation coefficient of holds.
P
'
and
T
'
and
, then
Y is a suspicious point, M is the target,
T
'
M
'
the
is greater
and
H
0
4.2 Detection of Outliers within a Single Testing Cube
In section 4.1, we presented how to detect suspicious points. There is another way to do it. An alternative detection method uses dimensionality reduction by the application of the Diffusion Maps (DM) algorithm Coifman & Lafon (2006) and a nearest-neighbor scheme. DM is a nonlinear algorithm for dimensionality reduction.
Assume we are given a data cube $D$ of size $X \times Y$ with $Z$ wavebands. We define a small testing cube $d$ of size $v \times h \times Z$, $v \le X$, $h \le Y$, which is included in the hyperspectral data cube $D$.
4.2.1 Dimensionality Reduction by DM Application
Assume that a sliding testing cube d, pointed to by the arrows in Figure 22, is moved so that each time it covers a different fragment of the data cube D shown in Figure 2. Section 4.3 describes in detail how the testing cube d moves.
Figure 22. An urban scene of size 294 × 501 (from the “city” in Figure 2) with different locations of the sliding testing cube d. The arrows point to these locations
The sliding testing cube $d$ contains $N = vh$ multipixels, each of which comprises $Z$ wavebands. Typically, $v$ and $h$ are in the range 30-50, $Z$ is in the range 30-100 and $Y$ is about 290. Thus, each of the $N$ data points is a vector of length $Z$. We arrange these data points into a matrix $M$ of size $N \times Z$.
The next step applies the DM algorithm (see the appendix for its description) to the matrix $M$. It reduces the dimensionality of the data vectors by embedding them into the main eigenvectors of the covariance matrix of the data $M$. This projection reveals the geometrical structure of the data and facilitates a search for singular (abnormal) data points. The data matrix $M$ of size $N \times Z$ is mapped onto the eigenvectors of the matrix $P$ of size $N \times R$, $R < Z$. Typically, $R$ is in the range 3-5, which is determined by the magnitudes of the corresponding eigenvalues. $R$ is the number of essential eigenvalues of the covariance matrix and it is determined as explained in Coifman & Lafon (2006). Figure 23 displays the embedding onto three major eigenvectors of the data from four positions of the sliding testing cube. These are the embeddings onto three major eigenvectors of the covariance matrices.
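For orientation, the following is a minimal sketch of a standard diffusion-map embedding of the rows of a data matrix. It follows the common construction (Gaussian kernel, row-normalized Markov matrix, leading non-trivial eigenvectors scaled by their eigenvalues) and may differ in details from the version in the appendix. The kernel scale `eps`, the function name `diffusion_map` and the synthetic data (a tight cloud plus one distant point) are assumptions made for illustration.

```python
import numpy as np

def diffusion_map(M, eps, n_components=3):
    """Embed the rows of M (N x Z) into n_components diffusion coordinates."""
    sq_dist = np.sum((M[:, None, :] - M[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dist / eps)                   # Gaussian affinities
    deg = W.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the Markov matrix D^{-1} W.
    A = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]               # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of D^{-1} W; the first one becomes constant.
    psi = vecs / vecs[:, [0]]
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1]

# Synthetic data: 20 similar spectra and one abnormal spectrum (5 bands).
rng = np.random.default_rng(0)
cloud = 0.1 * rng.standard_normal((20, 5))
outlier = np.full((1, 5), 0.9)
M = np.vstack([cloud, outlier])

embedding = diffusion_map(M, eps=2.0)
first = embedding[:, 0]
farthest = int(np.argmax(np.abs(first - np.median(first))))
```

In this toy case the abnormal spectrum (row 20) is the point that lies farthest from the dense cloud along the first diffusion coordinate, which is the kind of structure the outlier search relies on.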
Figure 23. Embedding of the data from different positions of the sliding testing cube on the image in Figure 2 onto three major eigenvectors of the diffusion matrix
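One simple way to turn "the magnitudes of the corresponding eigenvalues" into a number $R$ is to count the eigenvalues that are not negligible relative to the largest one. The sketch below illustrates this idea only; the relative threshold `rel_tol` and the function name `essential_count` are assumptions and are not taken from the paper or from Coifman & Lafon (2006).

```python
import numpy as np

def essential_count(eigenvalues, rel_tol=0.1):
    """Count eigenvalues whose magnitude is at least rel_tol times the largest magnitude.

    Assumption: 'essential' is decided by a relative threshold on the magnitudes.
    """
    vals = np.sort(np.abs(np.asarray(eigenvalues, dtype=float)))[::-1]
    return int(np.sum(vals >= rel_tol * vals[0]))

R = essential_count([1.0, 0.6, 0.3, 0.05, 0.01])
```

A faster decay of the spectrum gives a smaller $R$, and therefore a lower-dimensional embedding.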
We observe that the overwhelming majority of the embedded data points form a dense cloud, while a few outliers are present. An outlier can be a single point that lies far away from the rest of the points or, more frequently, a small group of points that are located close to each other but far away from the majority of the cloud. This reflects the situation when a possible target can occupy an area of size from one to several pixels (or even a subpixel). These single or grouped outliers are detected as explained in Section 4.2.2.
4.2.2 Detection of Grouped Outliers
Assume we are looking for groups of outliers that consist of no more than $K$ members. It is done by the following steps:
1) For each row $d_i$, $i = 1, \ldots, N$, of the DM matrix $P$ (see Appendix), calculate its Euclidean distances to all other rows and sort them in ascending order $s_{i,1} \le s_{i,2} \le \ldots \le s_{i,N-1}$. Thus, $s_{i,k}$ is the distance from $d_i$ to its $k$-th nearest neighbor $d_{j_{i,k}}$.
2) Form the matrix $S = \{s_{i,j}\}$, $i = 1, \ldots, N$, $j = 1, \ldots, N-1$, of the sorted distances and the matrix $J = \{j_{i,k}\}$, $i = 1, \ldots, N$, $k = 1, \ldots, N-1$, of the corresponding indices.
3) For each row $d_i$, $i = 1, \ldots, N$, determine its $K$ nearest neighbors. For this, take the first $K$ columns $\{j_{i,1}, \ldots, j_{i,K}\}$, $i = 1, \ldots, N$, of the index matrix $J$. The corresponding distances are presented in the first $K$ columns $\{s_{i,1}, \ldots, s_{i,K}\}$ of the matrix $S$. Thus, we have the nearest-neighbor index matrix $J^K$ and the distances matrix $S^K$, where both are of size $N \times K$. First, the simplest case $K = 2$, which means that we are looking for groups of outliers consisting of no more than two points, is handled.
4) Assume that $\max_i s_{i,2}$ is achieved by $i = i_2$. It means that the distance to the second nearest neighbor of the $i_2$-th data point is the largest among the distances of all the data points to their second nearest neighbors. Restore the coordinates $x_2$ and $y_2$ of the data point $p_{i_2}$ (multipixel $m_{i_2}$) in the data cube $D$. Store the point $P_2 = p_{i_2}$.
5) Find $\max_i s_{i,1}$. Two alternatives are possible:
a) $P_2$ is an isolated outlier. It takes place when the maximum $\max_i s_{i,1}$ is also achieved by $i = i_2$. It means that the distances from the point $P_2$ to its first two nearest neighbors are greater than the respective distances of all the other points.
b) However, it may happen that some point lies close to $P_2$ while all the others are far apart. It can be interpreted as a pairwise outlier. An indicator of this situation is the fact that the maximum $\max_i s_{i,1}$ is achieved by some $i \ne i_2$. In this case, we add the point $P_1$ closest to the point $P_2$ and regard $\{P_1, P_2\}$ as a pairwise outlier. The index of the point $P_1$ is $i_1 = j_{i_2,1}$.
6) While looking for grouped outliers that may contain up to $K > 2$ members, assume that $\max_i s_{i,K}$ is achieved by $i = i_K$. Restore the coordinates $x_K$ and $y_K$ of the data point $p_{i_K}$ (multipixel $m_{i_K}$) in the data cube $D$. Store the point $P_K = p_{i_K}$.
7) Find the maximal values $\max_i s_{i,k}$, $k = 1, \ldots, K-1$, in the first $K-1$ columns of the distance matrix $S$. The following alternatives are possible:
a) $P_K$ is an isolated outlier. It takes place when all the maxima $\max_i s_{i,k}$, $k = 1, \ldots, K-1$, are achieved by $i = i_K$. It means that the distances from the point $P_K$ to its first $K$ nearest neighbors are greater than the respective distances for all the other points.
b) If the maxima in the columns $k = 2, \ldots, K-1$ are achieved by $i = i_K$, while the maximum in the first column is achieved by some other $i$, then we add the point $P_1$ that is the closest to the point $P_K$ and regard $\{P_1, P_K\}$ as a pairwise outlier. The index of the point $P_1$ is $i_1 = j_{i_K,1}$.
c) If the maxima in the columns $k = L+1, \ldots, K-1$ are achieved by $i = i_K$, while the maxima in the columns $k = 1, \ldots, L$ are achieved by other indices, then we have grouped outliers. These outliers consist of $P_K$ and of the $L$ points $P_1, \ldots, P_L$ closest to $P_K$. The indices of the points $P_1, \ldots, P_L$ are $i_1 = j_{i_K,1}, \ldots, i_L = j_{i_K,L}$, respectively.
We emphasize that, once the upper limit $K$ is given, the number $L + 1$ of points in the group is determined automatically depending on the data within the sliding testing cube $d$. Figure 24 illustrates the detected grouped outliers in the 3-dimensional space of eigenvectors of the data from four positions of the sliding testing cube.
Figure 24. Detection of grouped outliers in data from different positions of the sliding testing cube embedded in the diffusion space
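The grouped-outlier search of Section 4.2.2 can be sketched in a few lines. The code below is one reading of the steps rather than the authors' implementation: the function names, the brute-force distance computation and the rule "take $L$ as the largest column index $k < K$ whose maximum is not achieved by $i_K$" are assumptions made for the example. It requires $K \le N - 1$.

```python
import numpy as np

def sorted_neighbors(E, K):
    """Return (S, J): distances to, and indices of, the K nearest neighbors of each row of E."""
    diff = E[:, None, :] - E[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    J = np.argsort(dist, axis=1)[:, :K]       # neighbor indices, nearest first
    S = np.take_along_axis(dist, J, axis=1)   # matching distances, ascending per row
    return S, J

def grouped_outlier(E, K):
    """Return the indices of an outlier group of at most K points (assumes K <= N - 1)."""
    S, J = sorted_neighbors(E, K)
    i_K = int(np.argmax(S[:, K - 1]))         # point with the largest K-th neighbor distance
    L = 0
    for k in range(K - 1, 0, -1):             # scan columns K-1, ..., 1 (1-based)
        if int(np.argmax(S[:, k - 1])) != i_K:
            L = k                             # columns 1..L are dominated by other points
            break
    return [i_K] + [int(j) for j in J[i_K, :L]]

# Toy embedding: a 5 x 5 grid "cloud" plus three tightly grouped distant points.
g = np.linspace(0.0, 1.0, 5)
cloud = np.array([(x, y) for x in g for y in g])
far = np.array([[10.0, 10.0], [10.05, 10.05], [10.1, 10.1]])
group = grouped_outlier(np.vstack([cloud, far]), K=3)
```

In this toy data the three distant points are much closer to each other than the grid spacing of the cloud, so the first two columns of the distance matrix are dominated by cloud points and the whole triple is returned as one group.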
4.3 Detection of Singular Points within the Whole Data Cube
In Section 4.2.2, we described how to find a group of data points (multipixels) within one sliding testing cube, whose geometry differs from the geometry of the majority of the data points. Let $\mathcal{P}^1 = \{P^1_1, \ldots, P^1_{L_1}\}$ be the list of such data points in the sliding testing cube $d_1$ of size $v \times h \times Z$ located in the upper left corner of the data cube $D$, as illustrated by the arrow in Figure 22. The next testing cube $d_2$ is obtained by a right shift of $d_1$ by $h/4$. Let $\mathcal{P}^2 = \{P^2_1, \ldots, P^2_{L_2}\}$ be the list of outliers in the cube $d_2$. We append $\mathcal{P}^2$ to $\mathcal{P}^1$. Because of the vast overlap between the cubes $d_1$ and $d_2$, some points can be common to $\mathcal{P}^2$ and $\mathcal{P}^1$. In the united list, these points gain the weight 2. The next right shift produces the sliding testing cube $d_3$, whose outliers list $\mathcal{P}^3$ is appended to the combined list $\mathcal{P}^1 \cup \mathcal{P}^2$. Again, the common points gain weights. We proceed with the right shifts till the right edge of the data cube $D$. Then, the sliding testing cube slides down by $v/4$ and starts shifts to the left, and so on. As a result, we get a combined list $\mathcal{P} = \bigcup_{i=1}^{R} \mathcal{P}^i$ of outliers, where $R$ is the number of jumps of the testing cube $d$ within the data cube $D$. Figure 22 illustrates a route of the cube $d$ on the data cube $D$.
It is important that each point $P_i$ in the list $\mathcal{P}$ is supplied with the weight $w_i$, which can range from 1 to more than 40. The weight $w_i$ can serve as a singularity measure for the point $P_i$. A large weight $w_i$ reflects the fact that the point $P_i$ is singular for a big number of overlaps between sliding testing cubes. Thus, it can be regarded as a strong singular point in the data cube $D$, and vice versa. Figure 25 illustrates the distribution of the weighted singular points around the data cube $D_U$ of size $500 \times 294 \times 64$ from the urban scene displayed in Figure 22, whose source is Figure 2.
Figure 25. Distribution of the weighted singular points around the data cube $D_U$. Left: All the singular points. Right: Singular points whose weights exceed 12
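The accumulation of weights over overlapping positions of the testing cube can be sketched as follows. This is an illustration under stated assumptions: the callback `detect` (which returns the global coordinates of the outliers found in the window at a given offset), the extra final window placed flush with the edge, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def window_origins(size, win, step):
    """Offsets of a window of length `win` sliding with stride `step` along an axis of length `size`."""
    last = size - win
    origins = list(range(0, last + 1, step))
    if origins[-1] != last:
        origins.append(last)   # assumption: add one final window flush with the edge
    return origins

def accumulate_weights(shape, v, h, detect):
    """Slide a v x h window in steps of v/4 and h/4 and count how often each point is an outlier."""
    n_rows, n_cols = shape
    rows = window_origins(n_rows, v, max(1, v // 4))
    cols = window_origins(n_cols, h, max(1, h // 4))
    weights = Counter()
    for r_idx, r in enumerate(rows):
        # Alternate the horizontal direction on every pass: right, then left, and so on.
        ordered = cols if r_idx % 2 == 0 else cols[::-1]
        for c in ordered:
            for point in detect(r, c):
                weights[point] += 1
    return weights

def toy_detect(r, c, v=8, h=8):
    """Hypothetical detector: reports two fixed points whenever the window covers them."""
    candidates = [(7, 7), (0, 0)]
    return [(x, y) for (x, y) in candidates if r <= x < r + v and c <= y < c + h]

weights = accumulate_weights((16, 16), 8, 8, toy_detect)
```

A point in the interior is covered by many overlapping windows and can therefore collect a large weight, whereas a corner point is covered by a single window; thresholding the weights (for example, keeping only weights above 12 as in Figure 25) then retains only the persistent singular points.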
4.3.1 Examples of Detected Singular Points
We applied the above algorithm to find singular points in different data cubes. The following figures display a few singular points detected in the data cube $D_U$.
Figure 26. A group of singular points centered around the point $P = (329, 85)$. Left: Vicinity of the point $P$. Right: Multipixel spectra at the point $P = (399, 85)$ and the surrounding points. The weight of the data point $P$ is 19
Figure 27. A strong singular point $P = (352, 90)$. Left: Vicinity of the point $P$. Right: Multipixel spectra of the point $P = (352, 90)$ and the surrounding points. The weight of the data point $P$ is 32
Figure 28. A strong singular point $P = (117, 182)$. Left: Vicinity of the point $P$. Right: Multipixel spectra at the point $P = (117, 182)$ and the surrounding points. The weight of the data point $P$ is 32
By comparing Figures 27 and 28, we observe that the spectra of the singular multipixels located at the points $P = (117, 182)$ and $P = (352, 90)$ are similar to each other. Supposedly, they correspond to the same material. A different singular multipixel is displayed in Figure 29.
Figure 29. A singular point $P = (242, 202)$. Left: Vicinity of the point $P$. Right: Multipixel spectra at the point $P = (242, 202)$ and the surrounding points. The weight of the data point $P$ is 32
4.4 Extraction of the Target's Spectrum from a Suspicious Point
Let $Y$ be a suspicious point and let $T$ be the given target's spectrum. What portion of the target is contained in $Y$?
We consider a simplified version of Equation 1 via the definition of a simple mixing model that describes the relation between a target and its background. Assume $P$ is a pixel of mixed spectrum (a spectrum that contains the background influence and the target) and $T$ is the given target's spectrum. Consider three spectra: an average background spectrum $B = \sum_{k=1}^{M} a_k B_k$, a mixed pixel spectrum (the spectrum of a suspicious point) $P$ and the target's spectrum $T$. They are related by the following model
$$P = tT + (1 - t)B = tT + \sum_{k=1}^{M} c_k B_k, \qquad (6)$$
which is a modified version of Eq. 1, where $a_1 = t$ and $s_1 = T$. Each $B_k$, $k = 1, \ldots, M$, was taken from the neighborhood pixels. Therefore, all of them are close to each other and have similar features.
We are given the target's spectrum $T$ and the mixed pixel spectrum $P$. Our goal is to estimate $t$, denoted by $\hat{t}$, which will satisfy Equation 6 provided that $B$ and $T$ have some independent features. Once $\hat{t}$ is found, the estimate of the unknown background spectrum $B$, denoted by $\hat{B}$, is calculated by $\hat{B} = (P - \hat{t}T)/(1 - \hat{t})$.
Estimating the parameter $t$ in Equation 6 is called linear unmixing.
In Step 2 from Section 4.1, we calculated the following: $V$ is the d-spectra set, which is uncorrelated with pixels from the $m$-neighborhood of $Y$, and $\Pi$ is the projection operator onto the orthocomplement of the linear span of $V$. Let $P_2 = \Pi P$ and $T_2 = \Pi T$. Then $P_2 = t T_2 + N$, where $t$ is an unknown parameter and $N$ is a random noise that is independent of $T_2$. The parameter $t$ is estimated by the value $t'$ that maximizes the independency between the two d-spectra $T_2$ and $P_2 - t' T_2$.
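The effect of projecting onto the orthocomplement of the linear span of $V$ can be checked numerically. The sketch below is illustrative only: it assumes that the spectra in $V$ are stored as the columns of a matrix with full column rank, and the function name `orthocomplement_projector` is hypothetical.

```python
import numpy as np

def orthocomplement_projector(V):
    """Projector onto the orthogonal complement of span(V).

    V has shape (Z, d): each column is one spectrum of length Z.
    Assumption: V has full column rank, so the reduced QR factor Q spans the same subspace.
    """
    Q, _ = np.linalg.qr(V)                    # orthonormal basis of span(V)
    return np.eye(V.shape[0]) - Q @ Q.T

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 3))                   # three synthetic background spectra
T = rng.normal(size=8)                        # synthetic target spectrum
t = 0.4
P = t * T + V @ np.array([0.5, -1.0, 2.0])    # mixed pixel: target plus background
Pi = orthocomplement_projector(V)
P2, T2 = Pi @ P, Pi @ T
```

Because the projector annihilates every vector in the span of $V$, the background part of the synthetic pixel disappears and $P_2 = t T_2$ holds exactly in this noise-free example.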
The fact that two vectors $X_1$ and $X_2$ are independent is equivalent to $\mathrm{corr}(\varphi(X_1), \varphi(X_2)) = 0$ for any analytical function $\varphi$ (Hyvarinen, Karhunen, & Oja, 2001). An analytical function can be represented by a Taylor expansion in the powers of its argument. Then, the condition $\mathrm{corr}(\varphi(X_1), \varphi(X_2)) = 0$ is equivalent to $\mathrm{corr}(X_1^n, X_2^n) = 0$ for any positive integer $n$, where $n$ denotes a power. In our algorithm, we limit ourselves to $n = 1, 2, 3, 4$. From the independency criterion between the two vectors $X_1$ and $X_2$ we obtain the function
$$f(X_1, X_2) = \sum_{n=1}^{4} \left| \mathrm{corr}(X_1^n, X_2^n) \right|, \qquad (7)$$
which equals zero in case $X_1$ and $X_2$ are independent. If $t'$ minimizes $f(T_2, P_2 - tT_2)$ over $t$, then the background estimate is $B' = (P - t'T)/(1 - t')$, where $P$ is the spectrum of the suspicious point and $B'$ is a mix of the background's spectra from the neighborhood that is affected by noise.
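The estimation of $t$ by maximal independency can be sketched as a one-dimensional search. The code is a minimal illustration under several assumptions that are not stated in the paper: the dependence measure is taken as the sum of absolute correlations of the first four element-wise powers, the search is a plain grid search over $[0, 1]$, the background is recovered from the model $P = tT + (1 - t)B$, and all function names are hypothetical.

```python
import numpy as np

def dependence(x1, x2, max_power=4):
    """Sum of |corr(x1**n, x2**n)| for n = 1..max_power (assumed form of the criterion)."""
    return sum(abs(np.corrcoef(x1 ** n, x2 ** n)[0, 1]) for n in range(1, max_power + 1))

def estimate_t(P2, T2, grid=None):
    """Grid search for the t that makes T2 and the residual P2 - t*T2 least dependent."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    scores = [dependence(T2, P2 - t * T2) for t in grid]
    return float(grid[int(np.argmin(scores))])

def background_estimate(P, T, t):
    """Background spectrum under the assumed model P = t*T + (1 - t)*B, valid for t < 1."""
    return (P - t * T) / (1.0 - t)

# Synthetic check: projected target, independent noise, true portion t0 = 0.3.
rng = np.random.default_rng(0)
T2 = rng.normal(size=4000)
noise = 0.05 * rng.normal(size=4000)
t0 = 0.3
t_hat = estimate_t(t0 * T2 + noise, T2)
```

A grid search is used here only because it is easy to verify; any scalar minimizer could replace it. With real spectra the number of wavebands is far smaller than in this synthetic check, so the sample correlations are noisier and the estimate is correspondingly less sharp.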
The flow of the UNSP algorithm is given in Figure 30.
Figure 30. The flow of the UNSP algorithm
4.5 Experimental Results
In this section, we consider two scenes, “field” (Figure 3) and “city” (Figure 2), that contain sub-pixel targets.
As a first step, we find all the suspicious points by applying the anomaly detection process (Section 4.2). The next step checks each anomaly with the “morphological-filter”, which was described in Section 4.1. If a pixel passes the “morphological-filter”, then the target is present in it.
Figures 31 and 32 present the outputs from the application of the “morphological-filter” algorithm to two different hyperspectral scenarios.
Figure 31. Left: The source image (Figure 2). Right: The white points are the suspicious points in the neighborhood of diameter $m = 10$
Figure 32. Left: [...]. Right: The white points are the suspicious points in the neighborhood of diameter $m = 10$
The suspicious points in Figures 31 and 32 are the singular points detected as anomalies [...] (Section 4.1). The estimation of the parameter $t$ from Equation 6 is done by minimization of the function $f$ for the vectors $T_2$ and $P_2 - tT_2$ (Section 4.4). Now we present an unmixing [...]. In Figures 33 and 34, the $x$- and $y$-axes are [...] and their values, respectively.
Figure 33. The output from [...]. This suspicious point is the suspicious point in Figure 31
Figure 34. The output from [...]. This suspicious point is the suspicious point in Figure 32
[...] between UNSP and [...] algorithms
In this section, we compare [...] algorithm to the [...] (2011) algorithms [...] Figures 35 to 41.
Figure 35. Left: The [...] image. Right: The white points [...] detected targets
Figure 36. Left: The original [...] algorithm
Figure 37. The detected targets by the OSP [...] over the original “city” image. The yellow circles [...] targets. The other points are [...]. The OSP algorithm [...] more [...]
Figure 38. [...] Figure 37. The red line corresponds to [...] method [...]
Figure 39. Left: The original [...] points mark the detected targets
Figure 40. Left: The original [...] Ahmad & Ul Haq (2011) algorithm [...] detected targets [...] OSP
Figure 41. The detected targets by the OSP algorithm [...]. The yellow circles mark known places [...]. [...] produces more [...]
Figure 42. The “ROC-curve” for the scene in Figure 41. The red line corresponds to the OSP (Ahmad & Ul Haq, 2011) method. The green line corresponds to the WDR method
5. Conclusions
We presented two algorithms for linear unmixing. The WDR algorithm reliably detects targets that occupy at least one pixel but fails to detect sub-pixel targets. The UNSP algorithm reliably detects sub-pixel targets, but it is computationally expensive because it has to search for the spectral decomposition in each pixel's neighborhood by sliding the “morphological-filter”. In the future, we plan to add to these algorithms a classification method based on machine learning methodologies.
References
Ahmad, M., & Ul Haq, I. (2011). Linear Unmixing and Target Detection of Hyperspectral Imagery. 2011 International Conference on Modeling, Simulation and Control IPCSIT, IACSIT Press, 10, 179-183.
Ahmad, M., Ul Haq, I., & Mushtaq, Q. (2011). AIK Method for Band Clustering Using Statistics of Correlation and Dispersion Matrix. 2011 International Conference on Information Communication and Management, IACSIT Press, 10, 114-118.
Arngren, M., Schmidt, M. N., & Larsen, J. (2009). Bayesian nonnegative matrix factorization with volume prior for unmixing of hyperspectral images. IEEE Workshop on Machine Learning for Signal Processing (MLSP), Grenoble, 1-6. http://dx.doi.org/10.1109/MLSP.2009.5306262
Averbuch, A. Z., & Zheludev, M. V. (2012). Two Linear Unmixing Algorithms to Recognize Targets using Supervised Classification and Orthogonal Rotation in Airborne Hyperspectral Images. Remote Sensing, 4, 532-560. http://dx.doi.org/10.3390/rs4020532
Bateson, C., Asner, G., & Wessman, C. (2000). Endmember bundles: A new approach to incorporating endmember variability into spectral mixture analysis. IEEE Trans. Geoscience Remote Sensing, 38, 1083-1094. http://dx.doi.org/10.1109/36.841987
Bayliss, J. D., Gualtieri, J. A., & Cromp, R. F. (1997). Analysing hyperspectral data with independent component analysis. Proceeding of the SPIE conference, 3240, 133-143.
Bioucas-Dias, J., & Plaza, A. (2010). Hyperspectral unmixing: Geometrical, statistical and sparse regression-based approaches. SPIE Remote Sensing Europe, Image and Signal Processing for Remote Sensing Conference, SPIE.
Bioucas-Dias, J., & Plaza, A. (2011). An overview on hyperspectral unmixing: geometrical, statistical, and sparse regression based approaches. Proceeding IEEE Int. Conf. Geosci. and Remote Sensing (IGARSS), IEEE International, 1135-1138.
Boardman, J. (1993). Automating spectral unmixing of AVIRIS data using convex geometry concepts. Summaries of the Fourth Annual Airborne Geoscience Workshop, TIMS Workshop, Jet Propulsion Laboratory, Pasadena, CA, 2, 11-14.
Burges, C. J. C. (1998). A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-167. http://dx.doi.org/10.1023/A:1009715923555
Chang, C. I. (2003). Hyperspectral Imaging: Techniques for spectral detection and classification. Kluwer Academic, New York.
Chang, C. I., Zhao, X., Althouse, M. L. G., & Pan, J. J. (1998). Least squares subspace projection approach to mixed pixel classification for hyperspectral images. IEEE Trans. Geoscience Remote Sensing, 36(3), 898-912. http://dx.doi.org/10.1109/36.673681
Coifman, R. R., & Lafon, S. (2006). Diffusion Maps. Applied and Computational Harmonic Analysis, 21(1), 5-30. http://dx.doi.org/10.1016/j.acha.2006.04.006
Comon, P. (1994). Independent component analysis: A new concept. Signal Processing, 36, 287-314. http://dx.doi.org/10.1016/0165-1684(94)90029-9
Craig, M. D. (1994). Minimum-volume transforms for remotely sensed data. IEEE Trans. Geoscience Remote Sensing, 99-109.
Cristianini, N., & Shawe-Taylor, J. (2000). Support Vector Machines and other kernel-based learning methods. Cambridge University Press.
Dobigeon, N., Moussaoui, S., Coulon, M., Tourneret, J. Y., & Hero, A. O. (2009). Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery. IEEE Trans. Signal Processing, 57(11), 4355-4368. http://dx.doi.org/10.1109/TSP.2009.2025797
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. John Wiley & Sons Inc.
Harsanyi, J. C., & Chang, C. I. (1994). Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach. IEEE Trans. Geoscience Remote Sensing, 32(4), 779-785. http://dx.doi.org/10.1109/36.298007
Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. John Wiley & Sons Inc.
Ifarraguerri, A., & Chang, C. I. (1999). Multispectral and hyperspectral image analysis with convex cones. IEEE Trans. Geoscience Remote Sensing, 37, 756-770. http://dx.doi.org/10.1109/36.752192
Manolakis, D., & Shaw, G. (2002). Detection algorithms for hyperspectral imaging applications. IEEE Signal Processing Magazine, 19(1), 29-43. http://dx.doi.org/10.1109/79.974724
Manolakis, D., Marden, D., & Shaw, G. (2001). Hyperspectral image processing for automatic target detection applications. Lincoln Lab Journal, 14(1), 79-114.
Moussaoui, S., Carteret, C., Brie, D., & Mohammad-Djafari, A. (2006). Bayesian analysis of spectral mixture data using Markov chain Monte Carlo methods. Chemometrics and Intelligent Laboratory Systems, 81(2), 137-148. http://dx.doi.org/10.1016/j.chemolab.2005.11.004
Nascimento, M. P., & Bioucas-Dias, M. (2005a). Does independent component analysis play a role in unmixing hyperspectral data? IEEE Trans. Geoscience Remote Sensing, 43(1), 175-187. http://dx.doi.org/10.1109/TGRS.2004.839806
Nascimento, M. P., & Bioucas-Dias, M. (2005b). Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Trans. Geoscience Remote Sensing, 43(4), 898-910. http://dx.doi.org/10.1109/TGRS.2005.844293
Ren, H., & Chang, C. I. (2003). Automatic Spectral Target Recognition in Hyperspectral imagery. IEEE Trans. on Aerospace and Electronic Systems, 39(4), 1232-1249. http://dx.doi.org/10.1109/TAES.2003.1261124
Settle, J. J. (1996). On the relationship between spectral unmixing and subspace projection. IEEE Trans. Geoscience Remote Sensing, 34, 1045-1046. http://dx.doi.org/10.1109/36.508422
SPECIM camera. (2006). Retrieved from http://www.specim.fi/
Winter, M. E. (1999). N-FINDR: An algorithm for fast autonomous spectral endmember determination in hyperspectral data. Proceeding SPIE Conf. Imaging Spectrometry V, SPIE, 266-275.
Appendix: Diffusion Maps
Diffusion Maps (DM) (Coifman & Lafon, 2006) analyzes a dataset $M$ by exploring the geometry of the manifold $\mathcal{M}$ from which it is sampled. It is based on defining an isotropic kernel $K$ whose elements are defined by $k(x, y) = e^{-\|x - y\|^2 / \varepsilon}$, $x, y \in M$, where $\varepsilon$ is a meta-parameter of the algorithm. This kernel represents the affinities between data points in the manifold. The kernel can be viewed as a construction of a weighted graph over the dataset $M$. The data points in $M$ are the vertices and the weights of the edges are defined by the kernel $K$.
The degree of each data point (i.e., vertex) $x \in M$ in this graph is $q(x) = \sum_{y \in M} k(x, y)$. Normalization of the kernel by this degree produces an $n \times n$ row stochastic transition matrix $P$ whose elements are $p(x, y) = k(x, y) / q(x)$, $x, y \in M$, which defines a Markov process (i.e., a diffusion process) over the data points in $M$. A symmetric conjugate $\bar{P}$ of the transition operator $P$ defines the diffusion affinities between data points by $\bar{p}(x, y) = \sqrt{q(x)}\, p(x, y)\, \frac{1}{\sqrt{q(y)}}$.
DM embeds the manifold into a Euclidean space whose dimensionality is usually significantly lower than the original dimensionality. This embedding results from the spectral analysis of the diffusion affinity kernel $\bar{P}$.
The eigenvalues $1 = \sigma_0 \ge \sigma_1 \ge \ldots$ of $\bar{P}$ and their corresponding eigenvectors $\bar{\phi}_0, \bar{\phi}_1, \ldots$ are used to construct the desired map, which embeds each data point $x \in M$ into the data point $\bar{\Phi}(x) = (\sigma_i \bar{\phi}_i(x))_{i=0}^{\delta}$ for a sufficiently small $\delta$, which is the dimension of the embedded space. $\delta$ depends on the decay of the spectrum of $\bar{P}$.
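The construction described in this appendix can be sketched compactly with a dense eigen-decomposition. The code is an illustrative sketch, not the authors' implementation: the squared Euclidean norm in the kernel, the brute-force pairwise distances and the function name `diffusion_map` are assumptions, and the approach is only practical for the small number of points inside one testing cube.

```python
import numpy as np

def diffusion_map(M, eps, dim):
    """Embed the rows of M (shape (n, Z)) into `dim` diffusion coordinates.

    Returns (vals, emb): all eigenvalues of the symmetric conjugate kernel in
    descending order, and the coordinates sigma_i * phi_i(x) for i = 0..dim-1.
    """
    sq = ((M[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)   # squared pairwise distances
    K = np.exp(-sq / eps)                                     # isotropic Gaussian kernel (assumed form)
    q = K.sum(axis=1)                                         # degree of every data point
    A = K / np.sqrt(np.outer(q, q))                           # symmetric conjugate of P = K / q
    vals, vecs = np.linalg.eigh(A)                            # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs[:, :dim] * vals[:dim]

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
vals, emb = diffusion_map(X, eps=2.0, dim=3)
```

The symmetric conjugate is used instead of the transition matrix itself because it has the same eigenvalues and can be handled by a symmetric eigen-solver. Its leading eigenvalue equals 1 and the corresponding eigenvector is proportional to $\sqrt{q(x)}$, which is why some implementations drop the first coordinate before searching for outliers.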