Sparse Representations for Radar with MATLAB® Examples
Peter Knee
Synthesis Lectures on Algorithms and Software in Engineering
Andreas Spanias, Series Editor
Morgan & Claypool Publishers
Synthesis Lectures on Algorithms and Software in Engineering
Editor
Andreas Spanias, Arizona State University
Sparse Representations for Radar with MATLAB® Examples
Peter Knee
2012
Analysis of the MPEG-1 Layer III (MP3) Algorithm Using MATLAB
Jayaraman J. Thiagarajan and Andreas Spanias
2011
Theory and Applications of Gaussian Quadrature Methods
Narayan Kovvali
2011
Algorithms and Software for Predictive and Perceptual Modeling of Speech
Venkatraman Atti
2011
Adaptive High-Resolution Sensor Waveform Design for Tracking
Ioannis Kyriakides, Darryl Morrell, and Antonia Papandreou-Suppappola
2010
MATLAB® Software for the Code Excited Linear Prediction Algorithm: The Federal
Standard-1016
Karthikeyan N. Ramamurthy and Andreas S. Spanias
2010
OFDM Systems for Wireless Communications
Adarsh B. Narasimhamurthy, Mahesh K. Banavar, and Cihan Tepedelenliouglu
2010
Advances in Modern Blind Signal Separation Algorithms: Theory and Applications
Kostas Kokkinakis and Philipos C. Loizou
2010
Advances in Waveform-Agile Sensing for Tracking
Sandeep Prasad Sira, Antonia Papandreou-Suppappola, and Darryl Morrell
2008
Despeckle Filtering Algorithms and Software for Ultrasound Imaging
Christos P. Loizou and Constantinos S. Pattichis
2008
Copyright © 2012 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.
Sparse Representations for Radar with MATLAB® Examples
Peter Knee
www.morganclaypool.com
ISBN: 9781627050340 paperback
ISBN: 9781627050357 ebook
DOI 10.2200/S00445ED1V01Y201208ASE010
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON ALGORITHMS AND SOFTWARE IN ENGINEERING
Lecture #10
Series Editor: Andreas Spanias, Arizona State University
Series ISSN
Synthesis Lectures on Algorithms and Software in Engineering
Print 1938-1727 Electronic 1938-1735
Sparse Representations for Radar with MATLAB® Examples
Peter Knee
Sandia National Laboratories, Albuquerque, New Mexico
SYNTHESIS LECTURES ON ALGORITHMS AND SOFTWARE IN ENGINEERING #10
Morgan & Claypool Publishers
ABSTRACT
Although the field of sparse representations is relatively new, research activities in academic and
industrial research labs are already producing encouraging results. The sparse signal or parameter
model motivated several researchers and practitioners to explore high complexity/wide bandwidth
applications such as Digital TV, MRI processing, and certain defense applications. The potential
signal processing advancements in this area may influence radar technologies. This book presents
the basic mathematical concepts along with a number of useful MATLAB® examples to emphasize
the practical implementations both inside and outside the radar field.
KEYWORDS
radar, sparse representations, compressive sensing, MATLAB®
Contents
List of Symbols
List of Acronyms
Acknowledgments
1 Radar Systems: A Signal Processing Perspective
  1.1 History of Radar
  1.2 Current Radar Applications
  1.3 Basic Organization
2 Introduction to Sparse Representations
  2.1 Signal Coding Using Sparse Representations
  2.2 Geometric Interpretation
  2.3 Sparse Recovery Algorithms
    2.3.1 Convex Optimization
    2.3.2 Greedy Approach
  2.4 Examples
    2.4.1 Non-Uniform Sampling
    2.4.2 Image Reconstruction from Fourier Sampling
3 Dimensionality Reduction
  3.1 Linear Dimensionality Reduction Techniques
    3.1.1 Principal Component Analysis (PCA) and Multidimensional Scaling (MDS)
    3.1.2 Linear Discriminant Analysis (LDA)
  3.2 Nonlinear Dimensionality Reduction Techniques
    3.2.1 ISOMAP
    3.2.2 Local Linear Embedding (LLE)
    3.2.3 Linear Model Alignment
  3.3 Random Projections
4 Radar Signal Processing Fundamentals
  4.1 Elements of a Pulsed Radar
  4.2 Range and Angular Resolution
  4.3 Imaging
  4.4 Detection
5 Sparse Representations in Radar
  5.1 Echo Signal Detection and Image Formation
  5.2 Angle-Doppler-Range Estimation
  5.3 Image Registration (Matching) and Change Detection for SAR
  5.4 Automatic Target Classification
    5.4.1 Sparse Representation for Target Classification
    5.4.2 Sparse Representation-Based Spatial Pyramids
A Code Sample
  A.1 Non-Uniform Sampling and Signal Reconstruction Code
  A.2 Long-Shepp Phantom Test Image Reconstruction Code
  A.3 Signal Bandwidth Code
Bibliography
Author’s Biography
List of Symbols
The following definitions and relations between symbols are used throughout this text unless specifically noted.
$\ast$    Convolution operator
$x(t)$    Continuous variable
$x[t]$    Discrete variable
$x$    Vector variable
$\hat{x}$    Estimate for vector variable $x$
$\hat{x}_n$    Estimate for vector variable $x$ at iteration $n$
$X$    Matrix variable
$X(i, j)$    Element from row $i$, column $j$ of matrix $X$
$X^\dagger$    Pseudo-inverse of matrix $X$
$\mathbb{R}$    Set of real numbers
$\mathrm{Re}\{x\}$    Real portion of complex variable $x$
$\mathbb{R}^{n \times m}$    Euclidean space for real matrices of size $n \times m$
$\emptyset$    Null set
$0$    Vector of zeros
$1$    Vector of ones
$(\cdot)^T$    Transposition
$I$    Identity matrix
$\|\cdot\|_n$    $l_n$-vector norm
$P_D$    Probability of detection
$P_{FA}$    Probability of false alarm
$P_M$    Probability of miss
$P_{CC}$    Probability of correct classification
$\Delta_{CR}$    Cross range resolution
$\Delta_R$    Range resolution
$T_p$    Pulse repetition period
$\tau_b$    Pulse width
$\lambda$    Wavelength
$B$    Bandwidth
List of Acronyms
The following acronyms are used throughout this text.
ADC    Analog-to-Digital Converter
ATR    Automatic Target Recognition
CFAR    Constant False Alarm Rate
CW    Continuous Wave
DSP    Digital Signal Processing
DTM    Digital Terrain Map
EM    Expectation Maximization
FFT    Fast Fourier Transform
FT    Fourier Transform
IFSAR    Interferometric Synthetic Aperture Radar
IRE    Institute of Radio Engineers
ISOMAP    Isometric Mapping
LDA    Linear Discriminant Analysis
LLC    Local Linear Coordination
LLE    Local Linear Embedding
LP    Linear Programming
LTSA    Local Tangent Space Alignment
MDL    Minimum Description Length
MDS    Multi-Dimensional Scaling
MIMO    Multi-Input Multi-Output
MP    Matching Pursuit
MTI    Moving Target Indication
MVU    Maximum Variance Unfolding
OMP    Orthogonal Matching Pursuit
PCA    Principal Component Analysis
PPI    Plan-Position Indicator
RP    Random Projection
SAR    Synthetic Aperture Radar
SFR    Step Frequency Radar
SIFT    Scale-Invariant Feature Transform
SNR    Signal-to-Noise Ratio
TV    Total Variation
Acknowledgments
Portions of this work were supported by Raytheon Missile Systems (Tucson) through the NSF ASU
SenSIP/Net-Centric Center and I/UCRC. Special thanks to Jayaraman Thiagarajan, Karthikeyan
Ramamurthy, and Tom Taylor of Arizona State University, as well as Visar Berisha, Nitesh Shah
and Al Coit of Raytheon Missile Systems for their support of this research.
Peter Knee
September 2012
CHAPTER 1
Radar Systems: A Signal Processing Perspective
In the late 1800s, Heinrich Hertz demonstrated a remarkable phenomenon: radio waves are reflected
and refracted in much the same manner as light waves [1]. This demonstration proved the validity
of Maxwell’s electromagnetic theory and laid the groundwork for the development of modern
radar systems. In fact, the term “radar” has become so predominant in our vernacular that its original
development as an acronym for “radio detection and ranging” has given way to its use as a standard
English noun [2].
Advances in radar systems and the identification of new applications are driven by enhancements in digital signal processing algorithms. Consider the basic block diagram for a conventional
pulsed radar system with a superheterodyne receiver in Figure 1.1. Analog-to-digital (A/D) converters that must sample at rates on the order of tens or hundreds of megahertz, together with the large amounts of
data that must be processed, necessitate advanced signal processing algorithms. For
this reason, radar engineers pay particular attention to advancements in the field of digital signal
and image processing. Typically, DSP techniques have concentrated on improving a single aspect
of a radar system such as range resolution or Doppler estimation. The objective of this book is to
highlight the potential of sparse representations in radar signal processing.
Early work in sparse representations by mathematicians and engineers concentrated on the
potential for finding approximately sparse solutions to underdetermined systems of linear equations.
The recent guarantees for exact solutions computed using convex optimization techniques continue
to generate interest within academia. Sparse representations have found applications across a range of
disciplines [3].
1.1 HISTORY OF RADAR
The demonstration of the reflection of radio waves by Hertz in 1886 paved the way for the German
engineer Christian Hülsmeyer [4] to develop the first monostatic pulsed radar for collision avoidance
in maritime vessels. Surprisingly, radar systems did not receive attention again until 1922 when S.G.
Marconi [5] emphasized its importance to the Institute of Radio Engineers (IRE). Independently,
work also began at that time at the U.S. Naval Research Laboratory on a system that utilized widely
separated transmitters and receivers for ship detection. Nearly ten years later, this work received a
patent as the first bistatic (separate receiver and transmitter) continuous wave (CW) radar system.
Figure 1.1: Block diagram of a conventional pulsed monostatic radar, with the potential impacts of
sparse representations highlighted in gray.
Numerous other reports of ship detections using similar systems occurred throughout the 1920s;
however, this early technology never saw widespread commercial utilization.
The bistatic CW radar was actually a result of the accidental detection of an aircraft as it
passed between the transmitter and receiver of a radio system [2]. This setup was burdensome because it
required the target to pass between the transmitter and the receiver. The emergence of
long-range, high-altitude military aircraft after World War I ignited the development of colocated
transmitter and receiver radar systems utilizing pulsed waveforms. Work by A. Hoyt Taylor and Leo
C. Young began at the U.S. Naval Research Laboratory in 1934 on single site, pulsed radar
systems [4]. Similar systems, operating in the 100–200 MHz range, were developed throughout the
world. It was not until after the end of World War II, with the release of the microwave power tube,
that radar systems began to operate at the high-end microwave frequencies used today.
The end of the war did not signal the end of development. In addition to continuous enhancements for military radar capabilities, modern commercial systems continue to see expansive
development. Signal processing techniques including pulse compression, synthetic aperture radar
imaging, and phased array antennas have provided tremendous capabilities both in the commercial
and military domains. The advancements across the radar industry continue to this day, with military applications still driving research and development. Going forward, advances in digital signal
processing technology will be required for the future state-of-the-art radar systems.
1.2 CURRENT RADAR APPLICATIONS
Most scientists and practitioners are well aware of the capabilities of radar systems to detect and track
targets, likely stemming from their familiarity with displays such as the plan-position indicator (PPI)
display shown in Figure 1.2. The 2-D display shows scatterer reflection magnitudes with range and
azimuth displayed in polar coordinates. These displays, along with their numerous variants, were vital
in presenting accurate information to a user for manual extraction of target speed and direction. The
development of enhanced digital signal and data processing techniques has allowed for the automatic
extraction of this information, relegating the user to a bystander. Perhaps a lesser-known function of
radar, however, is that of target scene imaging. The development of both 2-D and 3-D images of an
area allows for analysis for a variety of purposes, including surveillance and intelligence, topological
mapping, and Earth resource monitoring. The benefit of radar imaging lies in its ability to operate
effectively in adverse weather conditions, including rain, fog and cloud cover. This capability is a
result of the use of signal frequencies that reduce atmospheric attenuation of the transmitted waves.
Additionally, radar imaging systems have the capability to operate at night when optical imaging
systems cannot operate at all.
Figure 1.2: Example of a PPI (plan position indicator) display. (From Middleton and Mair, Radar
Technical Overview. Copyright © National Research Council of Canada, http://www.ieee.ca/millennium/radar/radar_technical.html.)
Some of the more popular radar technologies include [4]:
• meteorological radars;
• military surveillance, including target detection, recognition and tracking;
• interferometric SAR, for 3-D scene images;
• ground-penetrating radar;
• ballistic missile defense;
• air-traffic control;
• environmental remote sensing;
• law-enforcement and highway safety; and
• planetary exploration.
Figure 1.3 gives a slight indication of the diversity in commercial and military radar systems.
Many readers may be familiar with the systems found at local airports but these images show the
vast differences in currently available radar applications.
1.3 BASIC ORGANIZATION
This book will present a practical approach to the inclusion of the sparse representation framework
into the field of radar. The relevant mathematical framework for the theory of sparse representations
will be discussed but emphasis will be placed on providing examples using the included MATLAB®
scripts. The ability to immediately manipulate operating parameters is intended not only to facilitate
learning but also provide the opportunity to quickly integrate emerging signal processing techniques
into radar applications.
Chapter 2 presents the basics in sparse representation theory. Emphasis is again not focused
on the mathematics but on the application of the theory. Simple signal processing examples are
included to demonstrate the utility of the sparse representation framework. A discussion on reduced
representations for radar would not be complete without at least a cursory discussion on the assumption of sparse representations for radar signals. To this end, Chapter 3 presents a brief review
on common dimensionality reduction techniques. Examples of reduced dimensional representations
for radar, specifically in the imaging domain, are included. The final two chapters of this text serve
to introduce radar signal processing fundamentals and the radar applications in which sparse representations have already been explored. We also take the opportunity to present our current work in
the area of automatic target recognition (ATR).
It should be noted that compressed sensing is a recent branch of sparse representation theory
that is quickly attracting a lot of attention. The exploitation of signal sparsity allows a signal to be acquired
from far fewer measurements than conventional Nyquist-rate sampling requires. As the work on this topic alone is immense, the theory will not be considered in
this book. However, seeing as it is a branch of sparse representations, the applications in the field of
radar will be presented to illustrate the effect sparse representations have had on radar technologies.
Figure 1.3: (a) airport surveillance radar for vicinity surrounding airport, from the Federal Aviation
Administration, http://en.wikipedia.org/wiki/File:ASR-9_Radar_Antenna.jpg; (b) speed
monitoring radar gun, copyright © Applied Concepts, Inc., http://www.stalkerradar.com/law_basic.shtml;
(c) ground-penetrating radar used to locate buried cables, pipes, power lines, or
other objects, copyright © SPX Corporation, http://www.radiodetection.com/; and (d) phased
array radar that allows for scanning without radar motion, from the U.S. Army Corps of Engineers,
http://en.wikipedia.org/wiki/Radar.
As the two are intertwined, we hope that the use of sparse representation and compressed sensing
interchangeably does not cause any confusion.
CHAPTER 2
Introduction to Sparse Representations
Be it physical or theoretical, solutions to complex problems can often be determined by following
the age-old mantra: “Work smart, not hard.” Scientifically, this principle is referred to as Occam’s
razor, or the law of parsimony. This guiding principle generally recommends selecting the hypothesis
that makes the fewest new assumptions when two competing hypotheses are equivalent in all other
regards. An interesting example would be the consideration of the orbit of the planets; do they
revolve around the Earth or around the sun? Both are possible but the explanation for the planets
revolving around the sun requires far less complexity and is thus preferred.
For decision-making tasks, the application of Occam’s razor has led to the development of the
principle of minimum description length (MDL) [6]. MDL is used as a heuristic, or general rule, to
guide designers towards the model that provides the most concise system representation. For high-dimensional signals or systems, this implies that small or even single subsets of features that accurately
describe their complexity are preferred. The process of extracting meaningful lower-dimensional data
from high-dimensional representations occurs naturally within the human visual system. In a single
instant, the human brain is confronted with $10^6$ optic nerve fiber inputs, from which it must extract
a meaningful lower-dimensional representation [7]. Theoretically, scientists encounter this same
problem for high-dimensional data sets such as 2-D imagery. Numerous compression algorithms,
such as the JPEG 2000 image coding standard, utilize wavelet coefficients to extract salient features
from the high-dimensional data, negating the need to retain all the image pixels. Image storage and
transmission are easier using lower-dimensional representations. Across the application spectrum,
the ability to develop a low-dimensional understanding of high-dimensional data sets can drastically
improve system performance.
Throughout the signal processing community, there has been large-scale interest in estimating
lower-dimensional signal representations with respect to a dictionary of base signal elements. When
considering an overcomplete dictionary, the process of computing sparse linear representations has
seen a surge of interest, particularly in the areas of classification [8] and signal acquisition and
reconstruction [9, 10]. This interest continues to grow based on recent results reported in the sparse
representation literature: If a signal is sufficiently sparse, the optimal signal representation can be
formulated and solved as a general convex optimization problem [10]. The specifics of these two
technical arguments will be discussed in more detail in Section 2.1.
As mentioned, compressive sensing is a research area associated with sparse representations
tailored to signal acquisition and multimodal sensing applications. Compressive sensing [11] was
designed to facilitate the process of transform coding, a typically lossy data compression technique.
The common procedure, typical in applications such as JPEG compression and MPEG encoding, is
to: (1) acquire the desired N-length signal x at or above the Nyquist sampling rate; (2) calculate the
transform coefficients; and (3) retain and encode the K largest or the K most perceptually meaningful coefficients and their locations. By directly acquiring a compressed signal representation, which
essentially bypasses steps (1) and (2), compressive sensing has been shown to be capable of significantly reducing the signal acquisition overhead. More importantly, the theoretical developments
from the area of sparse representations have provided the ability to accurately reconstruct the signal
using basic optimization processes. Compressive sensing without the ability to reconstruct the signal
is merely a means of dimensionality reduction.
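The three-step transform coding procedure described above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the test signal, the use of the FFT as the transform, and the values of `N` and `K` are choices made here for the sketch and are not taken from any coding standard.

```matlab
% Transform coding sketch (assumed test signal, transform, and parameters)
N = 256;                          % signal length (assumed)
K = 8;                            % number of coefficients retained (assumed)
n = (0:N-1)';
% Step (1): acquire the full N-length signal (two sinusoids on FFT bins)
x = cos(2*pi*10*n/N) + 0.5*sin(2*pi*25*n/N);
% Step (2): compute the transform coefficients
c = fft(x);
% Step (3): keep the K largest-magnitude coefficients and their locations
[~, idx] = sort(abs(c), 'descend');
loc = idx(1:K);
cK = zeros(N, 1);
cK(loc) = c(loc);
% Decode using only the retained coefficients
xr = real(ifft(cK));
relErr = norm(x - xr) / norm(x);
fprintf('Kept %d of %d coefficients, relative error %.2e\n', K, N, relErr);
```

Because this particular signal occupies only four FFT bins, discarding all but `K` coefficients loses essentially nothing. Note that all `N` samples were still acquired in step (1), which is the overhead compressive sensing aims to avoid.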
While early signs of sparse representation theory appeared in the early 1990s, this field is still
relatively young. This chapter presents the basic mathematics along with intuitive and simple examples in an effort to provide the background needed to understand the utility of sparse representations.
As such, the core concepts presented in this chapter provide the basics of signal sparsity required
to cover the radar topics presented later in this book. For more detailed discussions on both the
theory and applications of sparse representations, we encourage the reader to consult the numerous
references provided throughout the chapter.
2.1 SIGNAL CODING USING SPARSE REPRESENTATIONS
The basic aim in finding a sparse representation for a signal is to determine a linear combination
of elementary elements that are able to adequately (according to some metric) represent the signal.
Consider a set of unit-norm column vector elements, $[d_1, \ldots, d_N]$, stacked into a matrix $D \in \mathbb{R}^{M \times N}$, known as an $N$-element dictionary. The linear combination of all elements in the dictionary
can be written as
$$y = x_1 d_1 + \ldots + x_N d_N, \tag{2.1}$$
where $x_n$ are scalar coefficients. In matrix notation this is equivalent to
$$y = D x_0 \in \mathbb{R}^M, \tag{2.2}$$
where $x_0$ is a coefficient vector whose entries are the scalar coefficients of (2.1). A sparse representation for the signal $y$ indicates that the number of non-zero coefficients in the representation vector
$x_0$ is less than $M$. Typical situations result in the percentage of non-zero coefficients being between
0 and 30% with algorithm breakdowns occurring with up to 70% non-zeros [8].
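As a concrete illustration of (2.1) and (2.2), the following sketch builds a random dictionary with unit-norm columns and synthesizes a signal from a few of its elements. The sizes `M`, `N`, the sparsity level `K`, and the Gaussian dictionary are assumptions made for this example only.

```matlab
% Sparse synthesis sketch: y = D*x0 with a random unit-norm dictionary
rng(0);                                   % fixed seed for repeatability
M = 20; N = 50; K = 3;                    % assumed sizes and sparsity level
D = randn(M, N);
D = D * diag(1 ./ sqrt(sum(D.^2, 1)));    % scale each column to unit norm
x0 = zeros(N, 1);
support = randperm(N, K);                 % positions of the non-zero coefficients
x0(support) = randn(K, 1);
y = D * x0;                               % matrix form, as in (2.2)
% Element-wise form, as in (2.1): only the K active elements contribute
ySum = zeros(M, 1);
for n = support
    ySum = ySum + x0(n) * D(:, n);
end
fprintf('non-zeros: %d of %d (%.0f%%), mismatch: %.2e\n', ...
    nnz(x0), N, 100 * nnz(x0) / N, norm(y - ySum));
```

The two forms agree to rounding error, and the coefficient vector has far fewer non-zero entries than the signal has samples.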
For $M > N$, the system of equations described by $y = D x_0$ contains more equations than
unknowns. When a non-trivial solution does exist for this overdetermined system of equations, the
solution can be approximated using the method of least squares [12]. Referred to also as the method
of frames [13], this is equivalent to finding a solution that minimizes the Euclidean distance between
the true and reconstructed signals, i.e.,
$$\hat{x}_0 = \min_{x_0} \|y - D x_0\|_2^2. \tag{2.3}$$
The approximation for $x_0$ can be found using the pseudo-inverse of the dictionary matrix, $D^\dagger = (D^T D)^{-1} D^T$, i.e., $\hat{x}_0 = D^\dagger y$. More often, however, we are concerned with the case where we have
more unknowns than equations (i.e., $M < N$). The problem in (2.3) then typically has an infinite
number of solutions. Using the Euclidean norm as the representation metric, the minimum norm
solution can be found by minimizing the length of the coefficient vector, i.e.,
$$\hat{x}_0 = \min_{x_0} \|x_0\|_2 \text{ subject to } y = D x_0. \tag{2.4}$$
The pseudo-inverse $D^\dagger = D^T (D D^T)^{-1}$ again provides the minimum norm solution. Unfortunately, whether the system is over- or underdetermined, the solution $\hat{x}_0$ is typically not informative,
especially in the sparse representation framework, as it contains a large number of non-zero elements.
The relatively large percentage of non-zero elements is a result of the minimum energy constraint,
which tends to prefer numerous smaller elements to a few larger elements. For a geometric interpretation of this fact, see Section 2.2.
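The dense nature of the minimum norm solution is easy to observe numerically. The sketch below is a toy example with assumed sizes and an assumed Gaussian dictionary; it computes the solution of (2.4) through $D^T (D D^T)^{-1} y$ and compares it with the sparse vector that generated $y$.

```matlab
% Minimum-norm solution of an underdetermined system (assumed toy sizes)
rng(1);                                   % fixed seed for repeatability
M = 10; N = 30; K = 2;                    % M < N, K-sparse ground truth
D = randn(M, N);
D = D * diag(1 ./ sqrt(sum(D.^2, 1)));    % unit-norm columns
x0 = zeros(N, 1);
x0(randperm(N, K)) = [1; -2];             % sparse coefficient vector
y = D * x0;

xMinNorm = D' * ((D * D') \ y);           % D^T (D D^T)^{-1} y, as in the text
xPinv = pinv(D) * y;                      % built-in pseudo-inverse, for comparison

nzMinNorm = nnz(abs(xMinNorm) > 1e-6);    % count entries that are not negligible
fprintf('residual: %.2e, pinv mismatch: %.2e\n', ...
    norm(D * xMinNorm - y), norm(xMinNorm - xPinv));
fprintf('non-zeros: true %d, minimum-norm %d of %d\n', K, nzMinNorm, N);
fprintf('norms: true %.3f, minimum-norm %.3f\n', norm(x0), norm(xMinNorm));
```

The minimum norm solution reproduces $y$ and has a Euclidean norm no larger than that of the true sparse vector, but its energy is spread over many more coefficients.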
Instead, we can explicitly seek a sparse solution to $y = D x_0$ by formulating the linear system
as an $l_0$-norm minimization problem
$$\hat{x}_0 = \min_{x_0} \|x_0\|_0 \text{ subject to } y = D x_0, \tag{2.5}$$
where $\|\cdot\|_0$, referred to as the $l_0$-norm, counts the number of non-zero elements of its argument. The coefficient vector $\hat{x}_0$ that contains the fewest
non-zero elements, also known as the sparsest vector, is now the preferred solution. Unfortunately, finding this minimum weight solution [14] is NP-hard, and the solution has been shown to be difficult to even
approximate [15].
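To give a feel for why (2.5) is hard, the sketch below solves it by exhaustive search over supports of increasing size. It is a toy illustration only: the dictionary, the sizes, the chosen support, and the residual threshold are all assumptions, and the number of candidate supports grows combinatorially with `N`, so this approach is not usable beyond very small problems.

```matlab
% Brute-force l0 search sketch (illustrative only; cost grows combinatorially)
rng(2);                                   % fixed seed for repeatability
M = 6; N = 12;                            % assumed toy sizes
D = randn(M, N);
D = D * diag(1 ./ sqrt(sum(D.^2, 1)));    % unit-norm columns
x0 = zeros(N, 1);
x0([3 9]) = [1.5; -0.7];                  % assumed 2-sparse ground truth
y = D * x0;

xHat = [];
for k = 1:M
    S = nchoosek(1:N, k);                 % every support of size k
    for r = 1:size(S, 1)
        cols = S(r, :);
        c = D(:, cols) \ y;               % least-squares fit on this support
        if norm(D(:, cols) * c - y) < 1e-10
            xHat = zeros(N, 1);
            xHat(cols) = c;               % exact fit found on this support
            break;
        end
    end
    if ~isempty(xHat), break; end         % stop at the smallest support size
end
fprintf('sparsest solution uses %d elements; %d supports of size %d exist\n', ...
    nnz(xHat), nchoosek(N, k), k);
```

Even for this tiny dictionary the search visits dozens of candidate supports, which motivates the tractable alternatives discussed next.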
Despite the difficulty in computing the minimum weight solution, sparse representations
continue to generate a lot of interest. The pursuit of a tractable, minimum weight solution has been
made possible due to one important result: if the solution is sparse enough, the solution to the
$l_0$-minimization of (2.5) can be unique and equal to the solution of the convex $l_1$-minimization
problem [10]
$$\hat{x}_1 = \min_{x_0} \|x_0\|_1 \text{ subject to } y = D x_0, \tag{2.6}$$
where $\|\cdot\|_1$ is the $l_1$-norm given by the sum of the magnitudes of all vector elements. Various approaches exist for finding solutions to (2.6) and will be the subject of Section 2.3. The formulation
of the sparse representation problem in (2.6) as a convex optimization problem allows for the efficient (polynomial time) calculation of the solution using interior-point programming methods [16].
Additionally, emerging greedy pursuit methods have been shown to be extremely versatile in providing
approximate solutions to the non-convex $l_0$-minimization problem.
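One common way to solve (2.6) is to rewrite it as a linear program by splitting the coefficient vector into non-negative parts, $x = u - v$ with $u, v \ge 0$, so that the $l_1$-norm becomes the linear objective $\sum_j (u_j + v_j)$. The sketch below assumes that MATLAB's Optimization Toolbox is available for `linprog`; the dictionary, sizes, and coefficient values are assumptions for illustration, and this is not necessarily the solver used elsewhere in this book.

```matlab
% Basis pursuit as a linear program (assumes the Optimization Toolbox)
rng(3);                                   % fixed seed for repeatability
M = 20; N = 40; K = 3;                    % assumed toy sizes
D = randn(M, N);
D = D * diag(1 ./ sqrt(sum(D.^2, 1)));    % unit-norm columns
x0 = zeros(N, 1);
x0(randperm(N, K)) = [1; -1; 2];          % assumed sparse ground truth
y = D * x0;

% Split x = u - v with u, v >= 0, so that ||x||_1 = sum(u) + sum(v)
f   = ones(2 * N, 1);                     % objective: sum of all entries of [u; v]
Aeq = [D, -D];                            % equality constraint D*(u - v) = y
lb  = zeros(2 * N, 1);                    % non-negativity of u and v
uv  = linprog(f, [], [], Aeq, y, lb, []);
x1  = uv(1:N) - uv(N+1:end);              % recombine into the l1 solution

fprintf('l1 norm: %.4f, recovery error: %.2e\n', norm(x1, 1), norm(x1 - x0));
```

When the generating vector is sufficiently sparse relative to `M`, as is expected in this setting, the recovered `x1` should match `x0` up to solver tolerance.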
One noticeable assumption at this point is that the signal $y$ can be expressed exactly as a
sparse superposition of the dictionary elements. In practice, noise may make this a strong assumption,
although it can be accounted for by assuming that $y = D x_0 + z$, where $z \in \mathbb{R}^M$ is small, possibly
dense noise. The $l_1$-minimization problem of (2.6) becomes a convex optimization problem of the
form
$$\hat{x}_1 = \min_{x_0} \|x_0\|_1 \text{ subject to } \|D x_0 - y\|_2 \le \epsilon, \tag{2.7}$$
where $\epsilon$ sets the allowed residual error.
2.2 GEOMETRIC INTERPRETATION
The distributive nature of the $l_2$-norm in developing a sparse representation is important and will
be addressed here. A simple geometric interpretation will clearly show that as we move from $l_2$
regularization towards $l_0$, we promote sparser and more informative solutions. We will consider
$l_p$-“norms” for $p < 1$ although the formal “norm” is not defined since the triangle inequality is no
longer satisfied.
To see this “promotion” of sparsity, consider the following generic problem:
$$\min_x \|x\|_p^p \text{ subject to } y = D x. \tag{2.8}$$
The solution set for an underdetermined linear system of equations $y = Dx$, with $D \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$, is an affine subspace of
$\mathbb{R}^n$. If we consider a particular solution, $x_0$, the feasible set is the sum of $x_0$ and any
vector from the null-space of $D$. Geometrically, this solution set appears as a hyperplane of dimension
$n - m$ embedded in $\mathbb{R}^n$ [3].
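%lpNorm.m
% Plot the lp-norm ball that just touches the solution line x_2 = m*x_1 + b.
x = -1.25:.01:1.25;          % x_1 values along the solution line
m = -.66; b = 1;             % slope and intercept of the solution line
yl = m*x+b;                  % x_2 values along the solution line
p = [.4 .9 1 1.5 2];         % norm orders to illustrate
figure;
for ii = 1:length(p)
    % The smallest lp-"norm" along the line is the radius of the touching ball
    Np = min((abs(yl).^p(ii)+abs(x).^p(ii)).^(1/p(ii)));
    xp = linspace(-Np,Np,1000);
    yp = (Np^p(ii)-abs(xp).^p(ii)).^(1/p(ii));   % upper half of the ball
    % One panel per value of p: norm ball in black, solution line in red
    subplot(1,length(p),ii);
    plot(xp,yp,'-k',xp,-yp,'-k');hold on;
    plot(x,yl,'-r'); xlim([-1.25 1.25]);ylim([-1.5 1.5]);axis square;
    title(sprintf('P = %.1f',p(ii)));xlabel('x_1');ylabel('x_2');
end
Program 2.1: Computation of lp-norm balls.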
2.3. SPARSE RECOVERY ALGORITHMS
11
R2×3
For the purposes of illustration, we will consider a 2-D example, in which D ∈
so that
the solution set is a line in the two-dimensional subspace. The solution to (2.8) is then found by
“blowing” an lp -ball centered around the origin until it touches the hyperplane solution set. This
methodology is demonstrated using script Program 2.1. Results for values ranging from p = .4 to
p = 2 are shown in Figure 2.1. The norm-ball for p ≤ 1 contains sharp corners whose points lie on
the coordinate axes. It is the solutions at the corners of the norm-ball where 1 or more coefficients
are zero that promote sparsity in a representation. Conversely, the intersection of the solution set
with the lp -norm balls for p > 1 occurs off a cardinal axis. The solution thus contains all non-zero
coefficient values. Stated more explicitly, the intersection of the solution subspace with a norm ball
of p ≤ 1 is expected to occur on the cardinal axes, forcing coefficient values to be zero in the sparse
solution.
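The contrast between p = 1 and p = 2 can also be read off numerically. The sketch below reuses the line from Program 2.1 and searches the same grid for the feasible point of smallest l1- and l2-norm; the variable names sol1 and sol2 are introduced here for illustration only.

```matlab
% Sketch: minimum l1- and l2-norm points on the line x2 = m*x1 + b of Program 2.1.
m = -.66; b = 1;
x1 = -1.25:.01:1.25;                 % same grid as Program 2.1
x2 = m*x1 + b;
[~, i1] = min(abs(x1) + abs(x2));    % l1: first contact with the diamond
[~, i2] = min(sqrt(x1.^2 + x2.^2));  % l2: first contact with the circle
sol1 = [x1(i1) x2(i1)];              % approximately [0 1]: one coefficient is zero
sol2 = [x1(i2) x2(i2)];              % approximately [0.46 0.70]: both non-zero
```

The l1 solution sits on the x_2 axis, while the l2 solution spreads the energy over both coordinates.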
[Figure 2.1: five panels, P = 0.4, 0.9, 1.0, 1.5, and 2.0, each plotting x_2 against x_1.]
Figure 2.1: The intersection between the lp-ball and the feasible solution set for the linear system y = Dx. Values of p ≤ 1 force a sparse solution that lies on the “corners” of the respective norm-balls, whereas larger values of p yield a non-sparse solution, namely the feasible point closest to the origin in the corresponding norm.
2.3 SPARSE RECOVERY ALGORITHMS
Theoretically, it has been noted that it is possible, under certain circumstances, to recover the solution to the l0 -minimization problem using the convex l1 -formulation. The burden now falls on
algorithmic designs for the solution of the l1 -minimization problem in (2.6). Under the compressive sensing framework, this implies that it is possible to design non-adaptive measurements that
contain the information necessary to reconstruct virtually any signal. The recent advances in sparse
representations and compressive sensing have driven rapid development of sparse reconstruction
algorithms. A survey of all the approaches is beyond the scope of this book; however, the approaches divide naturally into two basic families, which allows for a brief introduction that provides the background needed for the examples given in the subsequent section.
2. INTRODUCTION TO SPARSE REPRESENTATIONS
2.3.1 CONVEX OPTIMIZATION
The highly discontinuous l0-norm often makes the computation of the ideal sparse solution intractable. Numerical approximations require that the l0-norm be replaced by a continuous or even smooth approximation. Examples of such functions include the lp-norms for p ∈ (0, 1] or smooth functions such as $\sum_j \log(1 + \alpha x_j^2)$ or $\sum_j (1 - \exp(-\alpha x_j^2))$. The l1-norm problem specifically has been addressed using basic convex optimization techniques, including conversion to a linear programming problem that can be solved using modern interior point methods, simplex methods, homotopy methods, or other techniques [3].
Recall the l1-optimization problem
$$\hat{x}_1 = \min_{x_0} \|x_0\|_1 \quad \text{subject to} \quad \|D x_0 - y\| \le \epsilon . \tag{2.9}$$
Suppose the unknown x_0 is replaced by x_0 = u − v, where u, v ∈ R^n are both non-negative vectors such that u takes all the positive entries in x_0 and v takes the magnitudes of all the negative entries. Since we can write that ∥x_0∥_1 = 1^T (u + v) = 1^T z, where 1 is a vector of ones and z = [u^T, v^T]^T, the l1-optimization problem can be written as
$$\hat{x}_1 = \min_{z} 1^T z \quad \text{subject to} \quad \|[D, -D] z - y\| \le \epsilon \ \text{and} \ z \ge 0 , \tag{2.10}$$
which now has the classical form for a linear-programming (LP) problem. Simplex algorithms like
the Dantzig selector [17] solve LP problems by walking along a path on the edge of the polytope
solution region until an optimum is reached. Interior point methods, albeit slower, solve the problem
in polynomial time [16] and are another option when considering approximations for (2.10).
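The variable split behind (2.10) can be verified on a small example. The sketch below assumes a hypothetical sparse vector x0 and a hypothetical 2 × 6 dictionary D; it only checks the identities used in the reformulation (∥x_0∥_1 = 1^T z and [D, −D]z = Dx_0) and does not call an LP solver.

```matlab
% Sketch: split x0 into non-negative parts u and v so that x0 = u - v.
x0 = [0; -1.5; 0; 2; 0; -0.25];   % hypothetical sparse vector
u  = max(x0, 0);                  % positive entries of x0
v  = max(-x0, 0);                 % magnitudes of the negative entries
z  = [u; v];                      % stacked LP variable, z >= 0
l1_direct = norm(x0, 1);          % 3.75
l1_lp     = ones(1, numel(z))*z;  % 1'*z gives the same value
D  = [1 0 2 0 1 0; 0 1 0 1 0 3];  % hypothetical 2 x 6 dictionary
lhs = [D, -D]*z;                  % equals D*x0
```

Because the objective and the constraint are both linear in z, any LP solver that accepts non-negativity bounds could in principle be applied to the stacked variable.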
2.3.2 GREEDY APPROACH
Greedy algorithms are iterative approaches that seek to locate the dictionary atoms that best approximate the properties of a given signal. As the name implies, the algorithm is greedy at each stage, selecting the best possible approximation at the time in the hope of achieving a satisfactory final solution. There are a number of greedy methods; however, the most popular continue to be the basic matching pursuit (MP) [18] and orthogonal matching pursuit (OMP) [19] algorithms. MP continues to see throughput increases, with very efficient O(N log M) (per iteration) implementations that have led to practical large-scale applications [20]. Current implementations of the OMP algorithm are much more computationally intensive; however, the algorithm has shown superior approximation performance.
Consider the iterative approximation of the signal y using the dictionary D, where each column vector d_i will be referred to as an atom. The greedy algorithms approximate the signal in iteration n as
$$\hat{y}^n = D_{\Gamma^n} x_{\Gamma^n} , \tag{2.11}$$
where the vector x is the sparse representation for y and Γ^n denotes the set containing the indices of the atoms selected up to and including iteration n. The approximation error or residual is calculated as
$$r^n = y - \hat{y}^n . \tag{2.12}$$
The residual at each iteration is used to determine the new atoms that will be selected in subsequent iterations to achieve a better approximation. While the selection strategy for the new atom does differ slightly across greedy algorithms, MP and OMP choose the index for the selected atom at iteration n as the index of the largest-magnitude entry of D^T r^{n−1}:
$$i^n = \arg\max_{i} \left| d_i^T r^{n-1} \right| . \tag{2.13}$$
MP and OMP fall into a class of greedy algorithms referred to as directional updates [20], all of which
share a common algorithm structure, differing only in the method in which the update direction at
each iteration is computed. As given in [20], the algorithm can be summarized as follows.
Algorithm 2.2: Greedy directional update algorithm.
As mentioned, the choice of the directional update at step (4) in the iterative loop determines the type of greedy algorithm. The original matching pursuit algorithm chose a directional update equivalent to $p_{\Gamma^n} = \epsilon_{i^n}$, where $\epsilon_k$ is the k-th vector of the Dirac basis of R^N [20]. This process is a special case of a technique called projection pursuit, common throughout the statistics literature [21]. The asymptotic convergence is based on the orthogonality of the residual to the previously selected dictionary atom. Orthogonal matching pursuit improves performance by guaranteeing projection onto the span of the dictionary elements in no more than N steps by ensuring full backward orthogonality of the error at each iteration. The directional update for the OMP algorithm is achieved by setting $p_{\Gamma^n} = D^{\dagger} r^{n-1}$.
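To make the MP update concrete, the following is a minimal sketch of basic matching pursuit for unit-norm atoms. It is an illustration written for this passage, not a reproduction of Algorithm 2.2; the dictionary, the signal, and the iteration count are hypothetical.

```matlab
% Sketch of matching pursuit (MP) with unit-norm atoms.
D = [1 0 0 1/sqrt(2); 0 1 0 1/sqrt(2); 0 0 1 0];  % hypothetical 3 x 4 dictionary
y = 2*D(:,4) + 0.5*D(:,3);        % signal built from atoms 4 and 3
x = zeros(size(D,2),1);           % sparse coefficient vector
r = y;                            % residual r^0 = y
nIter = 10; resNorm = zeros(nIter+1,1); resNorm(1) = norm(r);
for n = 1:nIter
    c = D'*r;                     % correlations of all atoms with the residual
    [~, i] = max(abs(c));         % atom selection as in (2.13)
    x(i) = x(i) + c(i);           % MP update: change only the selected coefficient
    r = r - c(i)*D(:,i);          % new residual is orthogonal to atom i
    resNorm(n+1) = norm(r);
end
```

Only the coefficient of the selected atom changes in each pass, which is what distinguishes MP from OMP, where all coefficients of the selected atoms are refitted by least squares.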
function [A]=OMPnorm(D,X,L,eps)
%OMPNORM  Sparse coding of a group of signals based on a
%         dictionary and specified number of atoms
%
% [USAGE]
%   A = OMPnorm(D,X,L,eps)
%
% [INPUTS]
%   D    M x K overcomplete dictionary
%   X    Data vector or matrix. Must be of size M x P, where P is
%        number of signals to code
%   L    Maximum number of coefficients for each signal
%   eps  Stopping criterion (stop when the squared l2 norm of the
%        residual is less than eps)
%
% [OUTPUTS]
%   A    Sparse coefficient matrix of size K x P
%
P=size(X,2);
[M K]=size(D);
%Ensure that the dictionary elements have been normalized
Dnorm = D./repmat(sqrt(sum(D.^2)),[M 1]);
A = zeros(K,P);
for k=1:1:P
    x        = X(:,k);
    residual = x;
    indx     = zeros(L,1);
    resids   = zeros(K+1,1);   %squared residual norm after each iteration
    for j = 1:1:L
        %Select the atom most correlated with the current residual
        proj         = Dnorm'*residual;
        [maxVal,pos] = max(abs(proj));
        pos          = pos(1);
        indx(j)      = pos;
        %Least-squares fit of x on all atoms selected so far
        a            = pinv(D(:,indx(1:j)))*x;
        residual     = x-D(:,indx(1:j))*a;
        resids(j+1)  = sum(residual.^2);
        %Break if error has reached sufficient value
        if sum(residual.^2) < eps
            break;
        end
    end
    A(indx(1:j),k)=a;
end
Program 2.3: Orthogonal matching pursuit function.
The simplicity of the MATLAB® implementation for the OMP algorithm is shown in script
Program 2.3. Unfortunately, while implementation is simple, computational complexity can be quite
high for the OMP algorithm. A recent study [22] detailing the processing and memory requirements
for the MP and OMP algorithms shows the dramatic increase in required resources for the OMP
algorithm. The improvement in approximation estimation is only seen when the ratio of non-zero
elements to the number of observations increases above 20%. Regardless, both algorithms have
been shown extensively to provide adequate sparse approximations while significantly reducing the
theoretical performance gap that exists between the greedy approaches and their linear programming
counterparts [23].
2.4 EXAMPLES
This section is devoted to demonstrating that the theory of sparse representations and the development of increasingly efficient and accurate l1 -minimization solvers could have ramifications in
nearly every aspect of everyday life. We present two examples: one detailing the development of a
non-uniform sampling theorem [24] for a 1-D signal that reduces the required sampling rate, and
the other, an image reconstruction technique that has proven to be advantageous for fields ranging
from medicine to radar.
2.4.1 NON-UNIFORM SAMPLING
Consider a generic 1-D analog signal x(t) and its corresponding discrete-time representation x[n]. N samples of the analog signal are computed by setting t = nTs, where n = 0, . . . , N − 1 and Ts is known as the sampling period, or the time between samples. Shorthand notation throughout the book will omit the discrete-time index n and emphasize the 1-D nature of the discrete signal by using the vector notation x. The development of the compressive sensing framework arose from the desire to circumvent the well-known Shannon-Nyquist sampling criterion, an increasingly limiting design requirement as signal processing demands continue to grow. The theorem states that a signal strictly bandlimited to a bandwidth of B rad/s can be uniquely represented using sampled values that are spaced at uniform intervals no more than π/B seconds apart. Noting that the sampling period Ts = 1/fs, where fs is referred to as the ordinary sampling frequency (measured in Hertz) and is related to the angular frequency ω (measured in radians per second) by ω = 2πf, the sampling criterion requires that the signal be sampled at a rate greater than twice the maximum frequency component of the signal ($\omega_{max} = B \rightarrow f_s > B/\pi = 2\pi f_{max}/\pi = 2 f_{max}$).
The sum of three sinusoids at varying frequencies will be used to display the utility of the
sparse representation framework. Our goal is to demonstrate the ability to sample at a rate much less
than 2fmax while retaining the ability to accurately reconstruct the signal. Consider the discrete-time
signal:
$$x = x[n] = \sin(2\pi \cdot 1000\, nT_s) + \sin(2\pi \cdot 2000\, nT_s) + \sin(2\pi \cdot 4000\, nT_s) . \tag{2.14}$$
As shown in the upper-left plot of Figure 2.2, the signal is clearly not sparse in the time domain. Recalling the definition of the Fourier transform (FT), $X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t}\,dt$, and the linearity of the transform, it is easily verified that the frequency domain signal consists of 6 sharp peaks at ± the principal frequencies of each sinusoid. More importantly, the sparse signal f can be obtained by projecting the signal onto a set of sinusoidal basis functions stacked row-wise into a linear transformation matrix Ψ (i.e., f = Ψx). The matrix Ψ is known as the discrete Fourier transform (DFT) matrix and can be generated in MATLAB® using either the dftmtx function or fft(eye(N)). To capture the underlying properties of the signal, we must sample at M non-uniform locations. Randomly sampling the signal x is equivalent to multiplication by a sampling matrix S ∈ R^{M×N}, where the i-th row of the matrix contains a single unit element in the column corresponding to the i-th sample of the signal x to be extracted. In keeping with the compressive sensing convention, we thus have
$$\hat{x} = \frac{1}{N} S \Psi^T f , \tag{2.15}$$
where (·)^T refers to matrix transposition. The reconstructed signal x̂ can be determined by finding the sparse representation for f according to the l1-minimization technique presented in (2.6).
For the purposes of this example, we have sampled the signal x in (2.14) at a rate of 10 kHz
for a duration of .1 s (N = 1028 samples). It is assumed that a large number of non-zero elements
exist in the sparse coefficient vector f , so it is expected that many of the coefficients returned by
the OMP algorithm will be very near zero. Additionally, we have randomly selected approximately
15% of the available signal samples for reconstruction. This is in accordance with the requirements
for the OMP algorithm described in [20]. The script used to generate the results is included in
Appendix A.1.
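The setup can be sketched in a few lines. This is not the script of Appendix A.1; it assumes N = 1000 samples (0.1 s at 10 kHz) so that the three tones fall exactly on DFT bins, and the names nPeaks, idx, and ySamp are introduced here for illustration.

```matlab
% Sketch: the three-tone signal of (2.14) is sparse in the DFT domain.
fs = 10e3; N = 1000;               % assumed: 0.1 s at 10 kHz gives 1000 samples
n  = (0:N-1)';  Ts = 1/fs;
x  = sin(2*pi*1000*n*Ts) + sin(2*pi*2000*n*Ts) + sin(2*pi*4000*n*Ts);
f  = fft(x);                       % same as fft(eye(N))*x
nPeaks = sum(abs(f) > 1);          % six peaks: +/- 1, 2, and 4 kHz
% Random sampling: keep M of the N samples
M   = round(0.15*N);
idx = sort(randperm(N, M));        % M distinct sample locations
I   = eye(N);
S   = I(idx,:);                    % M x N, one unit entry per row
ySamp = S*x;                       % identical to x(idx)
```

In practice the matrix S never needs to be formed; indexing with idx extracts the same samples at far lower cost.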
Figure 2.2 details the results, with the time domain signals in the left column and the frequency domain signals (coefficients) on the right. The figure at the top left is the fully sampled N-length signal with the randomly selected M samples shown using blue circles. The frequency domain signal verifies the presence of the three sinusoids. The second row of figures shows the time-domain reconstruction (left) obtained from the frequency domain coefficients (right) of the minimum norm solution for our underdetermined linear system, $\hat{f} = (S\Psi^T)^{\dagger} x = \Psi S^T (S S^T)^{-1} x$, where Ψ is the DFT matrix. The distribution of energy across all the coefficients using this naïve solution is evident. The third row shows the performance of the sparse representation framework in reconstructing the signal and estimating the Fourier coefficients. The reconstruction is not exact (due to the estimation of M/4 non-zero coefficient values), but the accuracy that the iterative OMP algorithm achieves at the reduced sampling rate is very promising.
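The spreading of energy in the minimum-norm solution can be reproduced with a short sketch. It assumes a shorter record (N = 200) and 30 random samples, both chosen here for illustration. Because S S^T is the identity, the minimum-norm coefficients reduce, up to the scaling convention of the DFT, to the FFT of the zero-filled sample vector.

```matlab
% Sketch: minimum-norm coefficients versus the six-coefficient truth.
fs = 10e3; N = 200; n = (0:N-1)';         % shorter record than in the text (assumption)
x  = sin(2*pi*1000*n/fs) + sin(2*pi*2000*n/fs) + sin(2*pi*4000*n/fs);
idx = sort(randperm(N, 30));              % hypothetical random sample locations
xz = zeros(N,1); xz(idx) = x(idx);        % S'*(S*x): zero-filled samples
fMin = fft(xz);                           % minimum-norm coefficients (S*S' = I)
xMin = real(ifft(fMin));                  % time-domain signal implied by fMin
nActiveTrue = sum(abs(fft(x)) > 1e-6);    % 6
nActiveMin  = sum(abs(fMin)  > 1e-6);     % many more than 6
```

The implied time-domain signal matches the data at the sampled locations and is zero elsewhere, which is why the naïve reconstruction looks poor even though it satisfies the measurement constraints.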
It is important to emphasize the utility of this approach. Rather than acquiring all samples of the signal, transforming and coding the coefficients for storage, a reduced number of random samples can be extracted and stored directly. The nearly 85% reduction in the number of acquired samples will prove to be very advantageous in a society focused on technological improvements. For further implementation possibilities, we urge the reader to consider using the l1-MAGIC package [25] to compute the sparse Fourier solutions using the basis pursuit methods described in Section 2.3.1. An alternative and very useful example is presented using the discrete cosine transform [26].
[Figure 2.2: three rows of plots under the title “Sum of sinusoids”; left column: time-domain signals versus Time (s); right column: coefficient magnitudes versus Frequency (Hz).]
Figure 2.2: Top: (left) original time-domain signal with random samples in blue and (right) frequency domain coefficients. Middle: (left) time-domain signal of the naïve reconstruction and (right) its transform coefficients. Bottom: (left) l1-minimization reconstruction computed using OMPnorm and (right) Fourier coefficients of the reconstructed signal. All plots were generated using the script in Appendix A.1.
2.4.2 IMAGE RECONSTRUCTION FROM FOURIER SAMPLING
The utility of the sparse representation framework and the improvements to existing signal processing algorithms become evident when we consider the classical tomography problem: reconstruct a 2-D image f(t1, t2) from samples $\hat{f}|_{\Omega}$ of its discrete Fourier transform on the domain Ω. This problem has presented itself across numerous disciplines, including star-shaped domains in medical imaging [27] and synthetic aperture radar (SAR) [28], in which the demodulated SAR return signals from the moving antenna are approximately samples of 1-D Fourier transforms of projections of the scene reflectivity function. In this case, these projections are polar-grid samples of the 2-D FT of the scene. Figure 2.3(b) illustrates a typical case of a high-resolution imaging scheme that collects samples of the Fourier transform along radial lines at relatively few angles (256 samples along each of 22 radial lines) for the Logan-Shepp phantom test image of Figure 2.3(a).
Extensive work in both the medical imaging and radar image processing communities has focused on reconstruction of an object from polar frequency samples using filtered back-projection algorithms [9]. The caveat of the algorithm is the assumption that all unobserved Fourier coefficients are zero, in essence reconstructing an image of minimal energy subject to the observation constraints. Figure 2.3(c) details the severe deficiencies of this approach, which produces non-local artifacts that make basic image processing techniques difficult. Accurately interpolating the missing values proves even more difficult due to the oscillatory nature of the Fourier transform.
[Figure 2.3: four image panels, (a) through (d).]
Figure 2.3: Example of image reconstruction using the Logan-Shepp phantom image: (a) original
256×256 image; (b) 22 radial samples in Fourier domain; (c) minimum-energy reconstruction; and
(d) total variation reconstruction (nearly exact replica).
In their ground-breaking work on compressive sensing, Candes et al. [29], however, propose the use of convex optimization under the assumption that the gradient is sparse. Letting x_{i,j} denote the pixel in the i-th row and j-th column of an n × n image X, the total variation of the image can be defined as the sum of the magnitudes of the discrete gradient at every point:
$$\mathrm{TV}(x) = \sum_{i,j} \sqrt{ (x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2 } . \tag{2.16}$$
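As an illustration of (2.16), the sketch below evaluates the total variation of a small test image. The image is hypothetical, and setting the differences on the last row and column to zero is an assumption about boundary handling that the text does not specify.

```matlab
% Sketch: discrete total variation of an image, following (2.16).
X  = zeros(4); X(:,3:4) = 1;                 % hypothetical 4 x 4 image with one vertical edge
dv = [diff(X,1,1); zeros(1,size(X,2))];      % x(i+1,j) - x(i,j), zero on the last row (assumption)
dh = [diff(X,1,2), zeros(size(X,1),1)];      % x(i,j+1) - x(i,j), zero on the last column (assumption)
tv = sum(sum(sqrt(dv.^2 + dh.^2)));          % 4: one unit jump in each of the 4 rows
tvFlat = sum(sum(abs(diff(ones(4),1,1))));   % 0 for a constant image
```

A piecewise-constant image such as the phantom has a gradient that is non-zero only along edges, which is why its total variation is small and its gradient is sparse.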
Similar to the way we defined the sparse representation reconstruction in terms of a tractable l1-minimization problem, the problem of image recovery can be recast as a second-order cone problem (SOCP),
$$\min \mathrm{TV}(x) \quad \text{subject to} \quad y = Dx , \tag{2.17}$$
where again D is the measurement matrix (a.k.a. dictionary) and y is a small set of linear measurements. The SOCP convex optimization problem can be solved with great efficiency using interior point methods. Figure 2.3(d) shows the exact replica of the test image obtained using the convex optimization solvers contained in the l1-MAGIC [25] package. The package additionally contains a demo script to compute the reconstructed image, which we have modified to generate the images in Figure 2.3 and included in Appendix A.2.
CHAPTER 3
Dimensionality Reduction
Dimensionality reduction refers to methods in which high-dimensional data is transformed into
more compact lower-dimensional representations. These techniques have been proven to be particularly useful in signal compression, discovering the underlying structure of the high-dimensional
data and in providing the necessary understanding of patterns for data analysis and visualization.
Ideally, the compact representation has a dimensionality that corresponds to the intrinsic dimensionality of the data, i.e., the dimensionality corresponding to the minimum number of free parameters
of the data. In developing an in-depth and detailed understanding of the underlying structure of the
data, methods to facilitate classification, visualization, and compression of high-dimensional data
can be developed.
The problems encountered with dimensionality reduction have also been addressed using a
technique known as manifold learning. When considering data points in a high-dimensional space, it
is often expected that these data points will lie on or close to a low-dimensional non-linear manifold
[30]. The discovery of the underlying structure of the manifold from the data points, assumed to
be noisy samples from the manifold, is a very challenging unsupervised learning problem. The nonlinear techniques developed attempt to preserve either the global or local properties of the manifold
in the low-dimensional embedding. Consistent across all techniques however is the dependence on
noise mitigation, needed to correct for or restrict data point outliers that are sufficiently outside the
manifold.
The traditional linear dimensionality reduction methods, such as principal component analysis (PCA) [31, 32] and linear discriminant analysis (LDA) [33], have given way to non-linear techniques that have been shown to be more successful in discovering the underlying structure of highly non-linear data sets, e.g., the Swiss roll data set. While the linear techniques were unsuccessful in determining the true embedding dimension for artificial data sets, they have been shown to outperform the non-linear techniques for real-world data sets [34]. The reasons for the poor performance of the non-linear algorithms include difficulties in model parameter selection, local minima, and manifold overfitting. Going forward, there seems to be increased interest in generating non-linear models that account for the geometry of the manifold using linear models [35].
In this chapter, the popular dimensionality reduction/manifold learning methods are briefly reviewed. For the purposes of illustration, suppose we have an n × D matrix X consisting of n data vectors x_i, each of dimensionality D. Assume that the intrinsic dimensionality of the data is d << D. The dimensionality reduction techniques discussed will transform the dataset X into a reduced form Y of dimensionality n × d, while retaining the geometry of the embedded manifold as much as possible. In practice, neither the geometry of the original or embedded manifolds nor the intrinsic dimensionality of the datasets is known. Therefore, the problem of dimensionality reduction is typically an ill-posed problem that can only be solved by assuming some properties of the data [34].
The techniques for dimensionality reduction can roughly be classified into two groups: linear
and non-linear. The linear techniques discussed in Section 3.1 assume the data lie on or near a linear
subspace of the high-dimensional space. The non-linear techniques discussed in Section 3.2 do not
make such an assumption and can thus form more complex embeddings of the high-dimensional
data. Section 3.3 will present an emerging dimensionality reduction technique known as random projections that has become vital in compressive sensing. The use of the techniques is presented
along with the discussion on sparse representations to show the correspondence between the two
approaches. Examples of the lower-dimensional representations for radar imagery are presented in
the final section as motivation for the use of sparse representations in radar image classification to
be presented in Chapter 5.
3.1 LINEAR DIMENSIONALITY REDUCTION TECHNIQUES
The two well-known techniques for linear dimensionality reduction are principal component analysis
(PCA) and linear discriminant analysis (LDA). As mentioned, both techniques assume a linear
subspace of the high-dimensional data input space and as such, perform well when the underlying
manifold is a linear or affine subspace of the input space. Note that a vector in the D-dimensional
input space will be denoted by x i and its low-dimensional counterpart by y i , where the subscript i
denotes the i th row of the input data matrix X.
3.1.1 PRINCIPAL COMPONENT ANALYSIS (PCA) AND MULTIDIMENSIONAL SCALING (MDS)
PCA is an extremely common and useful technique often used not only for dimensionality reduction but also for data visualization and feature extraction. Bishop [36] describes two different definitions for PCA: (1) maximization of the variance of the projected data using an orthogonal projection onto a lower-dimensional subspace (the principal subspace) and (2) minimization of the average projection cost, defined as the mean squared distance between the data points and their projections [37].
Mathematically, if we define the matrix S to be the covariance matrix for the data set X, PCA
involves finding the d eigenvectors (i.e., principal components) of the covariance matrix. In other
words, PCA solves the eigenproblem
$$SM = \lambda M , \tag{3.1}$$
where M is the sorted matrix of the d column eigenvectors corresponding to the d largest eigenvalues.
The low-dimensional data representations are then computed by mapping them onto the linear basis,
i.e., Y = XM.
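The eigenproblem (3.1) and the mapping Y = XM can be sketched directly with eig. The data below mimic the noisy line used later in Program 3.1, except that a deterministic cosine term stands in for random noise so that the result is repeatable; that substitution and the variable names are assumptions made for this illustration.

```matlab
% Sketch: PCA by eigendecomposition of the covariance matrix, as in (3.1).
x = linspace(0,10,100)';
y = 2*x + cos(7*(1:100)');            % slope-2 line plus a deterministic stand-in for noise
X = [x y];                            % n x D data matrix (n = 100, D = 2)
Xc = X - repmat(mean(X,1), size(X,1), 1);   % remove the mean of each column
S  = cov(Xc);                         % D x D covariance matrix
[V, L] = eig(S);                      % columns of V are eigenvectors
[lambda, order] = sort(diag(L), 'descend');
d = 1;
M = V(:, order(1:d));                 % d principal eigenvectors
Y = Xc*M;                             % n x d low-dimensional representation
slopePC1 = M(2)/M(1);                 % close to the generating slope of 2
```

Sorting the eigenvalues explicitly is needed because eig does not guarantee a descending order.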
The principal components can be determined in an incremental fashion to mitigate the large computational costs required for eigendecomposition of large covariance matrices. Efficient techniques such as the power method or EM algorithm can reduce PCA computation time by a factor of D (the dimensionality of the data set) [36]. Additionally, iterative techniques such as simple PCA [38] and probabilistic PCA [39] may be employed for approximation of the principal eigenvectors.
To see the usefulness of PCA, consider the data set and results shown in Figure 3.1, generated using script Program 3.1. A linear trend clearly exists between the random variable x and the random variable y. Using the supplied script, the principal eigenvectors are computed using the MATLAB® function princomp() and have been overlaid onto the data set for easy analysis. Although not specifically shown, it can easily be verified using the ‘latent’ variable that the first column returned in the ‘coeff’ matrix is the first principal eigenvector and is aligned with the linear trend of the data set. Similarly, the slope of this eigenvector is nearly identical to that used to generate the line. Intuitively, to maximize the variance between the projected points, we would like to see an eigenvector with the exact slope of our designed line so that the distance between the projected points is maximized.
Figure 3.1: Principal component analysis for noisy points sampled from a line. Left: principal eigenvectors superimposed onto data set. Right: projection onto eigenvectors.
Similarly, multidimensional scaling (MDS) allows for efficient dimensionality reduction by attempting to maintain the original pairwise distances between points in the data set after reduction. Numerous stress functions exist for MDS which take into account the desire to maintain distances between differing sets of points. The raw stress function (3.2), for example, weights the projected distance error the same for all data points. The Sammon stress function, however, places more emphasis on maintaining the smaller distances between points in the input space. All stress functions, however, rely on simple and efficient techniques for minimization of the stress, including popular techniques such as eigendecomposition, the conjugate gradient method, and the pseudo-Newton method [34]. The raw stress function is
$$\phi(Y) = \sum_{ij} \left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2 . \tag{3.2}$$
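The raw stress can be evaluated for any candidate embedding. The sketch below assumes hypothetical points on a line in 2-D and compares two 1-D embeddings; the helper pd, which returns all pairwise Euclidean distances, is introduced here and is not part of the text.

```matlab
% Sketch: raw stress (3.2) between a data set X and a candidate embedding Y.
pd = @(Z) sqrt(max(bsxfun(@plus, sum(Z.^2,2), sum(Z.^2,2)') - 2*(Z*Z'), 0));  % pairwise distances
t  = linspace(0,5,20)';
X  = [t 2*t];                          % hypothetical points on a line in 2-D
Ygood = sqrt(5)*t;                     % 1-D coordinate along the line
Ybad  = t;                             % drops the second coordinate
stressGood = sum(sum((pd(X) - pd(Ygood)).^2));   % ~0: all distances preserved
stressBad  = sum(sum((pd(X) - pd(Ybad )).^2));   % > 0: distances shrink
```

An embedding that preserves every pairwise distance has zero raw stress, so the stress value gives a direct measure of how much geometry a reduction discards.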
clear all; close all; clc
%Global variables
var = 1; slope = 2;
%Generate data set
x = linspace(0,10,100)';
y = slope*x + sqrt(var)*randn(length(x),1);
X = [x y];
%Normalize data (subtract mean across each dimension)
Y = X - repmat(mean(X),size(X,1),1);
%Compute principal components (princomp centers X internally;
%newer MATLAB releases provide pca with the same three outputs)
[coeff,score,latent] = princomp(X);
%Plot the results and overlay the principal components
figure;
plot(Y(:,1),Y(:,2),'kx');hold on;
xlabel('x');ylabel('y');title('Principal Component Analysis');
%Determine the two endpoints for each line
m = coeff(2,:)./coeff(1,:);   %Slope of line
xhat = mean(x);               %Mean of x
yl = m'*[xhat -xhat];
plot([xhat xhat; -xhat -xhat], yl','k-');
%Project onto the principal components
Yproj = coeff'*Y';
figure;
plot(Yproj(1,:),Yproj(2,:),'kx');axis([-15 15 -xhat xhat]);
xlabel('x_p');ylabel('y_p');title('PCA Transformation');
Program 3.1: PCA analysis.
3.1.2 LINEAR DISCRIMINANT ANALYSIS (LDA)
Originally developed in 1936 by R.A. Fisher, discriminant analysis was designed primarily as a
classification tool but has since been used in dimensionality reduction. The aim of the algorithm
was to find a linear mapping M that maximized the linear class separability in the low-dimensional
representation of the data. In other words, the basic idea was to maximize a function that gives a
large separation between projected class means while also giving a small variance within each class
to consequently minimize class overlap [36].
As with all classification architectures, perfect class separability is typically not feasible. The goal of Fisher’s algorithm was to optimize the ratio of the between-class variance to the within-class variance. Defining the between-class variance as S_b and the within-class variance as S_w, the optimum is the linear mapping M that maximizes the Fisher criterion:
$$\phi(M) = \frac{M^T S_b M}{M^T S_w M} . \tag{3.3}$$
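For two classes and a single projection direction, the maximizer of (3.3) has a closed form. The sketch below assumes two small hypothetical classes and uses scatter matrices for S_w and S_b; it is an illustration of the criterion, not the implementation behind the ‘classify’ function.

```matlab
% Sketch: two-class Fisher criterion (3.3) and the direction that maximizes it.
X1 = [1 2; 2 3; 3 3; 4 5];  X2 = [4 1; 5 2; 6 2; 7 4];   % hypothetical classes, one sample per row
m1 = mean(X1,1)';  m2 = mean(X2,1)';
X1c = X1 - repmat(m1', size(X1,1), 1);
X2c = X2 - repmat(m2', size(X2,1), 1);
Sw = X1c'*X1c + X2c'*X2c;             % within-class scatter
Sb = (m1 - m2)*(m1 - m2)';            % between-class scatter
J  = @(w) (w'*Sb*w)/(w'*Sw*w);        % Fisher criterion for a single direction w
wFisher = Sw\(m1 - m2);               % maximizer for two classes (up to scale)
wNaive  = m1 - m2;                    % difference of class means, for comparison
```

Projecting onto wFisher rather than wNaive accounts for the shape of the within-class scatter, which is what reduces class overlap in the projection.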
For the purposes of dimensionality reduction, the low-dimensional data representation for
the datapoints X are computed by mapping them onto the linear basis M. It is important to note,
however, that unlike PCA, LDA is a supervised learning algorithm and requires knowledge of the class labels. For this reason, as a dimensionality reduction tool, LDA is not as common
as its linear cousin, PCA. For MATLAB® implementations of the LDA algorithm, refer to the
documentation for the ‘classify’ function.
3.2 NONLINEAR DIMENSIONALITY REDUCTION TECHNIQUES
The linear techniques discussed in the preceding section are established and well-understood techniques for dimensionality reduction. These techniques are able to discover the true structure of
high-dimensional data if it lies on or near a linear subspace of the input space. Consider, however,
a highly non-linear case, such as data sampled from the “Swiss Roll” shown in Figure 3.2 and
generated using the algorithm in Program 3.2. Linear techniques effectively use Euclidean distance between points to determine the embedding. Unfortunately, points lying far apart on the non-linear, lower-dimensional manifold may be much closer in the higher-dimensional input space, as measured by their straight-line Euclidean distance.
Non-linear techniques attempt to mitigate the issues encountered when using linear dimensionality reduction algorithms on non-linear manifolds. These techniques continue to see an increase in development as additional avenues for research are explored, including studies in human vision, speech, and motor control [7]. Technique variations are numerous, but Maaten [34] classifies non-linear dimensionality reduction techniques into three broad categories: (1) global preservation of original data; (2) local preservation of original data; and (3) global alignment of a number of linear models. Recent research [40] has also focused on methods that attempt to mitigate the effect of noise, which can drastically change the manifold structure. Known as neighborhood smoothing embedding, the method can be used as a preprocessing technique to improve the performance of non-linear dimensionality reduction techniques. For the purposes of brevity and demonstration, only a single technique from each category will be discussed, while references for similar techniques will be given.
[Figure 3.2: two 3-D surface plots of the Swiss roll manifold, noiseless (left) and noisy (right).]
Figure 3.2: “Swiss roll” manifold with noisy version on right (N = 1000, ht = 100, var = 2).
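function data = swissRoll(N, ht, var)
%SWISSROLL  Generate and plot roughly N points on a "Swiss roll" of height ht,
%           with zero-mean Gaussian noise of variance var on the third coordinate.
n   = floor(sqrt(N));                      %points per grid dimension
tt0 = (3*pi/2)*(1+2*linspace(0,1,n));      %angle along the roll
hh  = linspace(0,1,n)*ht;                  %position along the height
xx  = (tt0.*cos(tt0))'*ones(size(hh));
yy  = ones(size(tt0))'*hh;
zz  = sqrt(var)*randn(size(yy))+((tt0.*sin(tt0))'*ones(size(hh)));
cc  = tt0'*ones(size(hh));                 %color encodes position along the roll
surf(xx,yy,zz,cc);
axis tight;
data = [xx(:) yy(:) zz(:) cc(:)];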
Program 3.2: “Swiss roll” data generation.
3.2.1 ISOMAP
The ability of the human nervous system to extract relevant features from over 30,000 auditory or 10^6 optic nerve fibers is an excellent demonstration of dimensionality reduction. Consider pictures of a human face observed under different pose and lighting conditions. The high-dimensional images lie on an intrinsically three-dimensional manifold that can be parameterized using just two pose variations and a single lighting angle [7].
While the linear techniques are guaranteed to recover the true structure of the data given that
it lies on or near a linear subspace, ISOMAP provides improvements by allowing for the flexibility
to learn the intrinsic geometry of nonlinear manifolds. Additionally, ISOMAP retains the advantageous algorithmic features of the linear techniques, such as implementation ability, computational
tractability, and the convergence guarantees of the linear learning methods.
The basic tenet of the algorithm is the use of geodesic distances, or shortest path distances
along the manifold, rather than Euclidean distances. As mentioned with the Swiss roll example, these
can be markedly different from Euclidean distances for nonlinear manifolds. The difficulty in the
algorithm lies in estimating the geodesic distance given only input-space distances. For neighboring
points, the geodesic distance can be estimated using input-space distance. For faraway points, it can
be approximated by adding up a sequence of “short hops” between neighboring points [7].
Shortest-path algorithms from graph theory allow for the efficient computation of these distances
by computing the shortest paths in a graph with edges connecting neighboring data points.
The weighted graph G is computed by identifying the neighbors of each point (either all points
within a fixed radius ϵ or the point's K nearest neighbors) based on input-space distance. Edge
weights between neighboring points are determined by the input-space distance. The geodesic
distances are then estimated by computing the shortest path distances in the graph G from one point
to another. Classical MDS applied to these geodesic distance estimates can then be used to construct
an embedding of the data in d-dimensional Euclidean space.
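The steps just described (neighborhood graph, shortest paths, classical MDS) can be sketched in a few lines of MATLAB. This is a minimal illustration rather than the reference implementation of [7]: the half-circle test data, the choice K = 2, and the use of the Floyd-Warshall recursion for the shortest paths are all assumptions made for this example.

```matlab
% Minimal ISOMAP sketch (illustrative; not the reference implementation of [7]).
% Toy data: N points on a half circle, a 1-D manifold embedded in 2-D.
N = 50;  K = 2;  d = 1;
t = linspace(0, pi, N);
X = [cos(t); sin(t)];                       % D-by-N, one point per column

% Step 1: input-space (Euclidean) distances between all pairs of points
X2   = sum(X.^2, 1);
dist = sqrt(max(repmat(X2, N, 1) + repmat(X2', 1, N) - 2*(X'*X), 0));

% Step 2: K-nearest-neighbor graph; Inf marks "no edge"
[~, index] = sort(dist);                    % each column sorted in ascending order
G = inf(N);
for ii = 1:N
    jj = index(2:K+1, ii);                  % skip the point itself
    G(ii, jj) = dist(ii, jj);
    G(jj, ii) = dist(jj, ii);               % keep the graph symmetric
end
G(1:N+1:end) = 0;                           % zero distance from a point to itself

% Step 3: geodesic estimates = shortest paths in G (Floyd-Warshall)
for kk = 1:N
    G = min(G, repmat(G(:, kk), 1, N) + repmat(G(kk, :), N, 1));
end

% Step 4: classical MDS on the geodesic distance matrix
J = eye(N) - ones(N)/N;                     % centering matrix
B = -0.5 * J * (G.^2) * J;
B = (B + B')/2;                             % remove round-off asymmetry
[V, L] = eig(B);
[lam, order] = sort(diag(L), 'descend');
Y = (V(:, order(1:d)) * diag(sqrt(max(lam(1:d), 0))))';   % d-by-N embedding

fprintf('Euclidean end-to-end distance: %.3f\n', dist(1, N));
fprintf('Geodesic estimate:             %.3f (arc length is pi)\n', G(1, N));
```

The two endpoints of the half circle are 2 units apart in the input space, while the graph-based estimate approaches the arc length π, which is the distance the embedding Y preserves. The dense O(N^3) recursion is chosen for brevity; for larger data sets a sparse graph with Dijkstra's algorithm would be the more usual choice.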
ISOMAP is a non-iterative, polynomial time procedure that retains the global characteristics
of the manifold, i.e., the geodesic distances between points on the manifold. As with PCA or MDS,
the algorithm allows for the discovery of the true dimensionality of the data set by estimating the
“elbow” in the error decrease as the embedding dimension is increased. Additionally, in the limit
of infinite data, the geodesic distance approximations become arbitrarily accurate, guaranteeing
asymptotic convergence.
Similar to ISOMAP, maximum variance unfolding (MVU) defines a neighborhood graph on
the data and retains the pairwise distances [41]. By maximizing the Euclidean distance between the
datapoints in the embedding, while retaining the distances in the neighborhood graph, MVU can
efficiently “unfold” the manifold, again without altering the local geometry, using basic semidefinite
programming techniques. Other methods that retain the local manifold structure include diffusion
maps [42] and kernel PCA [43], which is an extension of the original linear PCA algorithm
using kernel functions. Reformulation of PCA in kernel space (essentially the inner product of the
datapoints in a high-dimensional space) allows for the construction of nonlinear mappings.
3.2.2 LOCAL LINEAR EMBEDDING (LLE)
Local linear embedding is an unsupervised learning algorithm that preserves neighborhood
relationships for high-dimensional inputs. Unlike clustering methods, however, LLE maps its inputs
into a single global coordinate system of lower dimensionality, and its optimizations do not involve
local minima [44]. The algorithm is seen as an improvement over techniques like multidimensional
scaling (MDS) [45] and the previously mentioned ISOMAP, which attempt to preserve the pairwise
distances or geodesic distances between all points in the data set. By preserving only local
neighborhood relationships, LLE becomes less sensitive to short-circuiting caused by the estimation
of Euclidean or geodesic distances between widely separated points.
The basic assumption of the LLE algorithm is that the data, consisting of N real-valued
vectors x_i, i = 1, ..., N, each of dimensionality D, are sampled from some underlying manifold.
Each data point and its neighbors are assumed to lie on or close to a locally linear patch of the
manifold. The local properties of the manifold can then be estimated by treating each data point as
a linear combination of its nearest neighbors, essentially fitting a hyperplane to each data point and
its neighbors [34].
The reconstruction weights for each data point are computed using the K nearest neighbors of
that data point while requiring that the weights for each point (the rows of the weight matrix W)
sum to one, i.e., $\sum_j w_{ij} = 1$. When formulated as such, the solution for the reconstruction weights is found by
solving a least squares problem [44]. Moreover, since the reconstruction weights reflect the intrinsic
properties of the data and are invariant to translation, rotation, and scaling transformations, the
reconstruction weights for each data point in D dimensions also reconstruct its embedded manifold
coordinates in d dimensions. Each high-dimensional observation x_i is mapped to a low-dimensional
vector y_i by choosing the d-dimensional coordinates ŷ_i that minimize the embedding cost function
$$ \Phi(Y) = \sum_i \Big| \hat{y}_i - \sum_j w_{ij}\,\hat{y}_j \Big|^2 . \tag{3.4} $$
The coordinates of the low-dimensional representations y_i that minimize this cost function
can be found by computing the eigenvectors corresponding to the smallest d non-zero eigenvalues
of the matrix M = (I − W)^T (I − W), where I is the N × N identity matrix.
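The step from the cost function (3.4) to an eigenvalue problem can be written out explicitly. The following is a short sketch that assumes the embedded coordinates ŷ_i are stacked as the rows of an N × d matrix Y and that the embedding is normalized so that Y^T Y / N is the identity (the normalization used in [44]); the symbol M denotes the matrix assembled in the embedding loop of Program 3.3.

```latex
\begin{aligned}
\Phi(Y) &= \sum_i \Big| \hat{y}_i - \sum_j w_{ij}\,\hat{y}_j \Big|^2
         = \big\| (I - W)\,Y \big\|_F^2
         = \operatorname{tr}\!\big( Y^T M\, Y \big),
         \qquad M = (I - W)^T (I - W), \\
\sum_j w_{ij} = 1 \;&\Longrightarrow\; (I - W)\,\mathbf{1} = \mathbf{0}
         \;\Longrightarrow\; M\,\mathbf{1} = \mathbf{0}.
\end{aligned}
```

Because the rows of W sum to one, the constant vector is always an eigenvector of M with eigenvalue zero; it corresponds to a free translation of the embedding and carries no information. Minimizing tr(Y^T M Y) under the normalization constraint therefore selects the next d eigenvectors, which is why Program 3.3 requests d + 1 eigenvectors and discards the one associated with the zero eigenvalue.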
Sample results for the Swiss roll dataset, generated using the script in Program 3.4 and the
function in Program 3.3, are shown in Figure 3.3. Using an embedding dimension of
2 and 12 nearest neighbors, the intrinsic 2-D manifold for the 2000-point data set can be fairly
well estimated. As with ISOMAP, the true embedding dimension can be estimated by locating the
dimensionality d beyond which no appreciable variance increase is noticed.
Figure 3.3: 2-D embedding of Swiss roll dataset using LLE.
Additional techniques that maintain local properties of the data include Laplacian eigenmaps [46],
wherein cost function weights are inversely proportional to the distance between neighbors in the
input space; Hessian LLE [47], which minimizes the “curviness” of the high-dimensional
manifold in the low-dimensional embedding; and local tangent space alignment (LTSA), which
attempts to align the linear mappings from the low- and high-dimensional spaces to a local tangent
space [48].
function Y = lle(X,K,d)
% X: D-by-N data matrix (one point per column)
% K: number of nearest neighbors, d: embedding dimension
[D,N] = size(X);
% Compute pairwise (squared) distances
X2 = sum(X.^2,1);
dist = repmat(X2,N,1)+repmat(X2',1,N)-2*(X'*X);
% Compute neighbors
[sorted,index] = sort(dist);                   % sort along columns
nhbd = index(2:(1+K),:);                       % ignore index to itself
% Compute reconstruction weights
if(K>D)
    tol=1e-3;                                  % regularization will be used for ill fits
else
    tol=0;
end
W = zeros(K,N);
for ii=1:N
    z = X(:,nhbd(:,ii))-repmat(X(:,ii),1,K);   % shift ith pt to origin
    C = z'*z;                                  % local covariance
    C = C + eye(K,K)*tol*trace(C);             % regularization (K>D)
    W(:,ii) = C\ones(K,1);                     % solve Cw=1
    W(:,ii) = W(:,ii)/sum(W(:,ii));            % enforce sum(w)=1
end
% Compute embedding: accumulate the cost matrix M = (I-W)'*(I-W)
M = sparse(1:N,1:N,ones(1,N),N,N,4*K*N);
for ii=1:N
    w = W(:,ii);
    jj = nhbd(:,ii);
    M(ii,jj) = M(ii,jj) - w';
    M(jj,ii) = M(jj,ii) - w;
    M(jj,jj) = M(jj,jj) + w*w';
end
% Calculate embedding
options.disp = 0; options.isreal = 1; options.issym = 1;
[Y,eigenvals] = eigs(M,d+1,0,options);
% The order returned by eigs can differ between MATLAB releases, so sort
% explicitly and drop the constant eigenvector (eigenvalue close to zero)
[~,order] = sort(diag(eigenvals));
Y = Y(:,order(2:d+1))'*sqrt(N);
Program 3.3: LLE algorithm.
% LLE generation code
X = swissRoll(2000,21,10);                        % columns of X: x, y, z, color
Y = lle(X(:,1:3)',12,2);                          % 12 neighbors, 2-D embedding
figure;scatter(Y(1,:),Y(2,:),12,X(:,4),'filled')  % color by position along the roll
Program 3.4: LLE for Swiss roll dataset code.
3.2.3 LINEAR MODEL ALIGNMENT
The final approach to non-linear dimensionality reduction involves the use of global alignment for a
number of linear models. These methods have been designed to address the shortcomings of global
and local data preservation techniques. These methods include local linear coordination (LLC)
[49] and manifold charting [50] among others. These approaches have been successfully applied in
manifold analysis for facial recognition and handwritten digits.
LLC is employed by first computing a mixture of factor analyzers or probabilistic PCA
components using the expectation-maximization (EM) algorithm. The local linear models are used
to develop independent representations for which each datapoint has an associated responsibility. It is
shown in [49] that, using the same approach adopted in LLE (incorporating the identity matrix and
the defined weight matrix W), the alignment of the linear models can be performed by solving
a generalized eigenproblem. The eigenvectors found are a linear mapping from the
responsibility-weighted representation to the low-dimensional data representation.
3.3 RANDOM PROJECTIONS
High-dimensional data sets continue to emerge with new applications and with computer capabilities
on the increase. These data sets, namely text and images, can be represented as points in a
high-dimensional space. The dimensionality often imposes limitations on conventional data processing
methods such as those discussed in the preceding sections. The statistically optimal (at least in a
mean-squared error sense) reduction is accomplished using PCA. Capturing as much of the variation
in the data as possible comes at an increasingly high computational cost; the cost of determining the
projection onto the lower-dimensional orthogonal subspace scales as the cube of the dimension of
the data set [51].
Random projection (RP) operators project high-dimensional data onto a lower-dimensional
subspace using a purely random matrix. The Johnson-Lindenstrauss lemma provides the basis for
the use of random projections for dimensionality reduction: a set of n points in a high-dimensional
Euclidean space can be mapped down into an $O(\log n/\epsilon^2)$-dimensional Euclidean space such that the
distance between any two points changes by only a factor of (1 ± ϵ) [52]. The importance of this
result is that interpoint distances are preserved when projected onto a randomly selected subspace
of suitable dimension.
clear; clc; close all;
figure;
% Number of DCT coefficients (and random projections) to use
k = 2000;
% imageRead is not a built-in MATLAB function; it is assumed here to be a
% user-supplied reader for MSTAR image files
im = imageRead('C:\DATA\MSTAR\TARGETS\TRAIN\17_DEG\T72\SN_132\HB03814.015');
subplot(1,3,1);imagesc(db(im));colormap(gray);axis('image');axis off;
im = imresize(im,[64 64]);
[m, n] = size(im);
% Keep the k largest-magnitude DCT coefficients and invert
X = dct2(im);
[~, inds] = sort(abs(X(:)),'descend');
Xhat = zeros(size(X));
Xhat(inds(1:k)) = X(inds(1:k));
imhat = idct2(Xhat);
subplot(1,3,2);imagesc(db(imhat));colormap(gray);axis('image');axis off;
% Now use random projections
rp = randn(k,m*n);
rp = rp./repmat(sqrt(sum(rp.^2)),k,1);   % normalize each column to unit length
Xrp = rp*double(im(:));                  % project to k dimensions
irphat = rp'*Xrp;                        % approximate inverse using the transpose
irphat = reshape(irphat,[m n]);
subplot(1,3,3);imagesc(db(irphat));colormap(gray);axis('image');axis off;
Program 3.5: Image compression and reconstruction using DCT and RP.
Mathematically, random projection of the original d-dimensional data $X \in \mathbb{R}^{d \times N}$ to a
k-dimensional ($k \ll d$) subspace through the origin, using a random k × d matrix R, is given by
$$ \hat{X} = RX . \tag{3.5} $$
It should be noted that (3.5) is a linear mapping and not necessarily a projection, as in general R
is not orthogonal and can introduce significant distortions. Orthogonalizing R is very expensive but,
as noted in [53], in a high-dimensional space there exists a much larger number of almost orthogonal
than orthogonal directions. Thus, vectors generated randomly in high-dimensional spaces can be
considered approximately orthogonal, and the mapping can be treated as approximately a projection.
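Both claims, approximate distance preservation and near-orthogonality of random directions, are easy to check numerically. The sketch below uses synthetic Gaussian data, and the sizes d, N, and k are illustrative assumptions rather than values from the text. The columns of R are scaled to unit length, which is one common convention and the one assumed here.

```matlab
% Random projection sketch: check distance preservation and near-orthogonality.
% Sizes below are illustrative assumptions, not values from the text.
d = 4096;  N = 50;  k = 500;
X = randn(d, N);                                  % N synthetic d-dimensional points

R = randn(k, d);                                  % Gaussian random matrix
R = R ./ repmat(sqrt(sum(R.^2, 1)), k, 1);        % scale each column to unit length
Xhat = R * X;                                     % equation (3.5)

% Pairwise Euclidean distances between the columns of a matrix
pdistCols = @(A) sqrt(max(bsxfun(@plus, sum(A.^2, 1)', sum(A.^2, 1)) - 2*(A'*A), 0));
mask  = triu(true(N), 1);                         % each pair counted once
DX    = pdistCols(X);
DXhat = pdistCols(Xhat);
ratio = DXhat(mask) ./ DX(mask);                  % near 1 when distances are preserved

% Inner products between distinct columns of R (exactly 0 if orthogonal)
Gram    = R(:, 1:100)' * R(:, 1:100);
offDiag = abs(Gram(~eye(100)));

fprintf('distance ratio: min %.3f, max %.3f\n', min(ratio), max(ratio));
fprintf('largest |<r_i, r_j>| among 100 columns: %.3f\n', max(offDiag));
```

The distance ratios cluster around one even though the dimension has been reduced by roughly a factor of eight, and the off-diagonal inner products are small but not zero, which is the sense in which R is "almost" orthogonal.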
Bingham and Mannila [53] present distance distortion performances for text and image data
using RP and other popular dimensionality reduction techniques. In addition to being significantly
faster than its conventional counterparts, RP proved to not distort data significantly more than
PCA. More importantly, RP proved adept at providing greater accuracies for significantly lower
dimensions.
It should be noted that using random projections as a dimensionality reduction technique
is primarily, although not exclusively, beneficial in applications where the distances between the
original high-dimensional data points are meaningful. As an illustration, consider the alternative
use of random projections as an image compression technique, akin to the discrete cosine transform
(DCT). Bingham and Mannila [53] showed the superior performance of RP over the DCT for
dimensionality reduction of text and image data, but Figure 3.4 clearly shows that the DCT is able to
capture and reconstruct at least some vital information necessary for interpretation of a radar image
by the human eye. (Note: The pseudoinverse of the random projection matrix R needed for
reconstruction is expensive to compute, but since R is almost orthogonal, the transpose of R is a
sufficient approximation.) Random projections maintain interpoint distances in the lower-dimensional
subspace at the expense of discarding vital signal information. Distance preservation, however,
becomes important for tasks such as training neural networks using clustering or k nearest neighbors,
or for other techniques that rely heavily on interpoint distances. As such, the use of random
projections for dimensionality reduction should be approached with care.
Figure 3.4: Left: Original MSTAR [54] SAR dB domain image. Middle: image reconstruction after
using largest 2500 DCT components. Right: image reconstruction after random projection to a
2500-dimensional subspace.
CHAPTER 4
Radar Signal Processing Fundamentals
By radiating energy into space and analyzing the echo reflection signals, radar systems have the
ability to detect all types of objects as well as determine their distance and speed relative to the radar
system. The ability to do so is rooted in basic signal processing fundamentals.
Consider the block diagram for a pulsed monostatic radar shown in Figure 4.1. The electromagnetic signal transmitted by the antenna is reradiated by the target(s) in the scene back to the
radar. The received signal is then processed by both hardware and software modules to extract the
information that is useful to the system. For early radar systems, this included manual extraction
of target presence along with information about its range, angular location and relative velocity.
The range, or distance, to a target is proportional to the total propagation time for the transmitted
and reflected signal to return to the receiver. Angular location can be computed by considering the
angular direction of the receiver corresponding to the maximal amplitude of the return signal at the
instant a target is detected. Doppler frequency shifts in the echo signal are a result of the relative
motion between the radar platform and the target, indicating the target's radial velocity. These three
fundamental measurements have allowed for application extensions that early radar engineers may
never have imagined.
This chapter provides the basics of radar signal processing needed to understand
the emerging radar technologies that have incorporated both sparse representations and compressive
sensing. For advanced details of radar signal processing techniques, consider the texts by
Skolnik [4] and Richards [2].
4.1 ELEMENTS OF A PULSED RADAR
A sample block diagram for a conventional monostatic pulsed radar is shown in Figure 4.2. While
this layout is by no means unique, it does allow for the identification of areas within the system in
which sparse representations have the potential to be included. The location for the digitization of
the analog signal in current digital radar systems has been purposefully left out. Early digital systems
had embedded the A/D converter within the signal processing unit, but advancements have allowed
for digitization of the signal at the IF stage, moving the A/D converter closer to the radar front end.
Sparse representations have seen implementations both before and after the A/D conversion. The
highlighted system areas show this capability to operate either directly on an analog IF signal (signal
processing) or on the converted and perhaps already extracted radar data itself (data processing).
Examples of each technique will be discussed in Chapter 5.
Figure 4.1: Basic principle of radar. Diagram labels: Antenna, Transmitter, Receiver, Transmitted
Signal, Echo Signal, Target, Range to Target. (Based on Jakowatz, et al., Spotlight-Mode Synthetic
Aperture Radar: A Signal Processing Approach. New York: Springer Science + Business Media, 1996.
Copyright © 1996, Springer Science + Business Media [28].)
Figure 4.2: Block diagram for conventional pulsed monostatic radar.
There are essentially three key areas in a pulsed radar system. The transmitter and waveform
generator properties, such as transmission frequency, are vital in tuning the sensitivity and range
resolution of the radar. Most radars operate in the microwave frequency region from 200 MHz
to about 95 GHz. Transmission frequencies are selected based on the system requirements for
transmission power, atmospheric attenuation and antenna size. The transmitted signal is typically
a pulse train (hence the name pulsed radar) and contains a series of shaped pulses modulating a
sinewave carrier, as shown in Figure 4.3. The pulse width τb = 1 µs, pulse repetition period Tp = 1 ms
and peak power Pt = 1 MW represent values comparable to those seen in a medium-range
air-surveillance radar [4]. These systems can operate at peak powers up to 10 MW with average
powers below 10–20 kW. As with any system, there are important design trade-offs that must be
considered. Detection performance increases with transmission power, but range resolution degrades
as the pulse length increases. This is discussed later in this chapter.
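The relationship between peak and average power follows directly from the pulse-train parameters quoted above. The short calculation below is only a worked example for the values τb = 1 µs, Tp = 1 ms, and Pt = 1 MW; the variable names are chosen for this illustration.

```matlab
% Pulse-train parameters quoted for the medium-range air-surveillance example
tau_b = 1e-6;      % pulse width, s
T_p   = 1e-3;      % pulse repetition period, s
P_t   = 1e6;       % peak power, W

dutyCycle = tau_b / T_p;          % fraction of time the transmitter is on
P_avg     = P_t * dutyCycle;      % average transmitted power, W
PRF       = 1 / T_p;              % pulse repetition frequency, Hz

fprintf('duty cycle = %.1e, average power = %.0f W, PRF = %.0f Hz\n', ...
        dutyCycle, P_avg, PRF);
```

With these values the transmitter is on for one thousandth of the time, so a 1 MW peak corresponds to an average power of about 1 kW. Lengthening the pulse raises the average power, which is the trade-off against range resolution noted above.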
Figure 4.3: Typical pulse waveform values for a medium-range air-surveillance radar. The rectangular
pulses represent pulse-modulated sinewaves.
The second element of a pulsed radar system is the antenna itself. Angular resolution is a
function of the main lobe width for the signal transmitted by the antenna. Smaller beamwidths
require larger apertures or shorter transmission wavelengths. In addition, side lobe levels must be
kept at a minimum to reduce the effect echoes from nearby scatterers have on one another. Synthetic
aperture radar, presented in Section 4.3, overcomes these obstacles by synthesizing a larger antenna
aperture to improve angular resolution for the formation of a 2-D image.
The third and final element, the radar receiver located in the bottom portion of Figure 4.2, is
responsible for the demodulation of the return signal to baseband and information extraction for
further manual or automated processing. Typically, the received signal is split into two channels,
the in-phase and quadrature-phase channels, so as to remove any ambiguity in the received signal's
phase. Once the signal has been successfully demodulated, signal processing algorithms like moving
target indicators (MTIs) can be implemented to locate moving targets. Similarly, data processing
algorithms like automatic target recognition systems (ATRs) attempt to identify a target using a 2-D
radar image.
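The role of the two receiver channels can be illustrated with a single constant-amplitude echo. In the sketch below the carrier frequency, sampling rate, amplitude, and phase are all assumed values chosen for illustration, and averaging over a whole number of carrier periods stands in for the receiver's low-pass filter.

```matlab
% I/Q demodulation sketch for a constant-amplitude echo (illustrative values)
f0  = 1e3;                        % carrier frequency, Hz (assumed)
fs  = 1e5;                        % sampling rate, Hz (assumed)
t   = (0:999)/fs;                 % exactly 10 carrier periods
A   = 0.7;  phi = -2.0;           % echo amplitude and phase to be recovered
r   = A*cos(2*pi*f0*t + phi);     % received real-valued signal

% Mix with quadrature carriers; averaging over whole periods acts as the
% low-pass filter that removes the double-frequency terms
I = mean( 2*r.*cos(2*pi*f0*t));   % in-phase channel,   A*cos(phi)
Q = mean(-2*r.*sin(2*pi*f0*t));   % quadrature channel, A*sin(phi)

A_hat      = hypot(I, Q);
phi_hat    = atan2(Q, I);         % unambiguous over (-pi, pi]
phi_I_only = acos(I/A);           % I alone cannot separate +phi from -phi

fprintf('A_hat = %.4f, phi_hat = %.4f rad, acos(I/A) = %.4f rad\n', ...
        A_hat, phi_hat, phi_I_only);
```

Using both channels recovers the phase with its sign, whereas the in-phase channel alone yields only cos(φ) and therefore the wrong sign for this example.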
4.2 RANGE AND ANGULAR RESOLUTION
Resolution refers to the ability of the radar to distinguish between targets. Range resolution specifically
is the ability to identify two or more targets on the same bearing but at different ranges (straight-line
distance from the radar). Similarly, angular resolution concerns target identification in both the azimuth (2nd) and elevation (3rd) dimensions. Resolution capabilities are controlled by the transmitted
waveform properties and antenna beamwidths, respectively. A simple, intuitive introduction to the
two concepts follows in this section. For more complete discussions, see Richards [2], Skolnik [4],
or Jakowatz [28].
To analyze range resolution, consider a standard pulse waveform element for the train in
Figure 4.3:
$$ s(t) = b(t)\cos(\omega_0 t) . \tag{4.1} $$
The envelope function b(t) (usually rectangular, Hamming, raised cosine, etc.), with a duration
of τb seconds, modulates a carrier wave at a frequency ω0. An example waveform generated using
the raised cosine envelope function $b(t) = 0.5\,[1 + \cos(2\pi t/\tau_b)]$ is shown in Figure 4.5(a). The time tR
it takes to receive the return signal for a target at range R is
$$ t_R = \frac{2R}{c} , \tag{4.2} $$
where c ≈ 3 × 10^8 m/s is the speed of light, i.e., the rate at which electromagnetic energy travels
through free space. Assuming a constant-frequency pulse is transmitted at time t = 0 for τb seconds, for two
targets at ranges R1 and R2, the leading edge of the return pulse for each signal will be received at
times t1 and t2 respectively. For the constant-frequency pulse, the system will only be able to discern
between the two echo signals if there is no overlap in the return echoes. Thus, the range resolution
ΔR = R2 − R1 is determined by finding the minimum distance by which the targets must be separated to
prevent signal return overlap:
$$
\begin{aligned}
t_2 - t_1 &= \tau_b \\
\frac{2R_2}{c} - \frac{2R_1}{c} &= \tau_b \\
\frac{2}{c}(R_2 - R_1) = \frac{2}{c}\,\Delta R &= \tau_b \\
\Delta R &= \frac{\tau_b c}{2} .
\end{aligned}
\tag{4.3}
$$
This point is illustrated in Figure 4.4. We can similarly discuss the range resolution from
the signal bandwidth perspective. For rectangular and shaped pulse waveforms the time-bandwidth
product is always equal to unity in cycle measure (τb B = 1) [28]. Recall that the bandwidth of the
transmitted pulse is inversely proportional to the pulse length. This implies that as the pulse length
increases, the bandwidth of the transmitted signal decreases as shown in Figure 4.5(c). Substituting
τb = 1/B into (4.3), we see that the resolvable separation ΔR decreases, i.e., range resolution
improves, as the bandwidth increases:
$$ \Delta R = \frac{c}{2B} . \tag{4.4} $$
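A few numerical values make the scaling in (4.3) and (4.4) concrete. The pulse widths below are illustrative choices; the first matches the 1 µs pulse of Figure 4.3.

```matlab
c     = 3e8;                           % propagation speed, m/s
tau_b = [1e-6 0.1e-6 10e-9];           % pulse widths, s (illustrative)
B     = 1 ./ tau_b;                    % bandwidth from tau_b*B = 1, Hz
dR_t  = c * tau_b / 2;                 % equation (4.3)
dR_B  = c ./ (2*B);                    % equation (4.4)
for n = 1:numel(tau_b)
    fprintf('tau_b = %6.1f ns, B = %6.1f MHz, dR = %6.1f m\n', ...
            tau_b(n)*1e9, B(n)/1e6, dR_t(n));
end
```

A 1 µs pulse separates targets no closer than about 150 m, and each tenfold reduction in pulse width (tenfold increase in bandwidth) tightens this by a factor of ten.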
Figure 4.4: Geometry for describing range resolution. For constant frequency pulses, there must be no
overlap between the two return signals. (Based on Richards, Fundamentals of Radar Signal Processing.
New York: McGraw-Hill, 2005. Copyright © 2005, McGraw-Hill [2].)
Improved range resolution thus requires either a shorter pulse (reducing the average transmit
power) or increased bandwidth. Luckily, there are alternative ways to encode radar signals to increase
bandwidth to improve range resolution while not suffering from the low average power levels that
plague short continuous-wave (CW) bursts. The most common of these stretched or dispersed [28]
waveforms is the linear FM chirp
$$ s(t) = \mathrm{Re}\{\exp[j(\omega_0 t + \alpha t^2)]\} . \tag{4.5} $$
The FM chirp contains a linearly increasing frequency component based on the so-called chirp
rate of 2α. As shown in Figure 4.5(b), the frequencies encoded by the chirp extend from ω0 − ατb
to ω0 + ατb , resulting in a much larger effective bandwidth of approximately
$$ B_c = \frac{\alpha \tau_b}{\pi} . \tag{4.6} $$
The increase in bandwidth is obvious from the chirp frequency spectrum shown in red in
Figures 4.5(c) and 4.5(d). The FM chirp has the capability to transmit higher average power signals
using longer pulse lengths while maintaining the large bandwidths needed for high range resolution.
Known as pulse compression, the FM chirp produces range resolution properties of a pulse with a
duration that is equivalent to the inverse of its bandwidth [28].
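The connection between the chirp rate, the swept bandwidth in (4.6), and the resulting range resolution can be checked with a short script. All numerical values below are assumptions chosen to resemble the axes of Figure 4.5 and are not necessarily those used in Appendix A.3; the carrier is specified in Hz as f0, so that ω0 = 2πf0.

```matlab
% Linear FM chirp sketch with illustrative values (not necessarily those of Appendix A.3)
f0    = 500;                         % carrier frequency, Hz
tau_b = 0.05;                        % pulse length, s
Bc    = 200;                         % desired swept bandwidth, Hz
alpha = pi*Bc/tau_b;                 % from (4.6): Bc = alpha*tau_b/pi

fs = 1e4;                            % sampling rate, Hz
t  = (-tau_b/2 : 1/fs : tau_b/2);    % time axis centered on the pulse
phase = 2*pi*f0*t + alpha*t.^2;      % argument of the exponential in (4.5)
s     = real(exp(1j*phase));         % chirp waveform

% Instantaneous frequency = (1/(2*pi)) * d(phase)/dt = f0 + alpha*t/pi
fInst = diff(phase) * fs / (2*pi);
sweep = max(fInst) - min(fInst);     % approaches Bc as the sampling gets finer

c = 3e8;
dR_cw    = c*tau_b/2;                % unmodulated pulse of the same length, (4.3)
dR_chirp = c/(2*Bc);                 % chirp after pulse compression, (4.4)
fprintf('sweep = %.1f Hz, tau_b*Bc = %.0f, dR: %.3g m -> %.3g m\n', ...
        sweep, tau_b*Bc, dR_cw, dR_chirp);
```

The product τb Bc is the factor by which the chirp improves range resolution over an unmodulated pulse of the same length, so the long pulse keeps its average power while resolving like a much shorter one.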
Figure 4.5: (a) Continuous-wave pulse with raised cosine envelope function. (b) Linear FM chirp waveform.
(c) Frequency spectrum for CW and linear FM signal using rectangular pulse. (d) Frequency spectrum
for CW and linear FM signal using raised cosine envelope. Panels (c) and (d) plot dB(Xf) against
frequency f, with curves labeled CW Pulse and Chirp. The script used to generate all figures is
contained in Appendix A.3.
As mentioned, angular resolution is the ability to distinguish between two targets at the
same range but at different azimuth or elevation angles. Scatterer echoes will be combined at the
receiver if the targets simultaneously lie in the main lobe of the illumination beam of the antenna.
While the discussion on antenna properties is outside the scope of this text, it is worth noting that
the angular resolution is estimated from the 3-dB beamwidth θ3 of the
antenna. The cross-range resolution, or the resolution in the dimension orthogonal to the range, can
be estimated as
$$ \Delta CR = 2R \sin\left(\frac{\theta_3}{2}\right) \approx R\,\theta_3 , \tag{4.7} $$
where the approximation holds when the 3-dB beamwidth is small, typical for pencil beam antennas [2]. This approximation is shown in Figure 4.6.
Figure 4.6: Angular resolution as a function of the 3-dB beamwidth of the antenna.
4.3 IMAGING
The idea of radar imaging brings to mind the 2-dimensional target indication screens known as
plan-position indicators shown previously in Chapter 1. Radar systems also have the capability
to produce high-resolution imagery, such as that created using a synthetic aperture radar (SAR)
imaging system. A SAR image of Washington, D.C. is shown in Figure 4.7. The image itself is
very similar to an optical image; however, some immediate differences can be noted. The
monochromatic nature of the image makes object differentiation and detail extraction difficult. The
image noise, or speckle, appears to significantly reduce the resolution of the image. These
difficulties may seem to hinder the use of radar imagery, but with current resolutions that approach
those of optical imagery, radar images have the unique advantage of being formed not only
at long distances, but also in inclement weather and under the cover of night.
Of the radar imaging technologies available, of particular importance in this book is SAR
imagery. The notion of range resolution was previously discussed and can be improved using various
signal processing techniques. The cross-range resolution, however, for a real aperture radar is determined by the width of the antenna beam. For even nominal stand-off distances, conventional radar
imaging systems would produce images with cross-range resolutions on the order of 100 meters,
far too coarse for real-world use. Synthetic aperture radars synthesize the larger antenna aperture
sizes needed to reduce the beamwidth by having the antenna move in relation to the area being
imaged. These airborne or space-based radars transmit pulses along a desired path and after proper
processing, produce extremely detailed imagery.
Figure 4.7: 1-m resolution SAR image of Washington, D.C. (Image courtesy of Sandia National Labs,
http://www.sandia.gov/radar/imageryku.html.)
Consider the real-aperture radar system shown in Figure 4.8(a). For a moving imaging system
transmitting and processing a single pulse, the width of the cross-range beam on the ground, and
subsequently the cross-range resolution, is given by
$$ \Delta CR = \beta R = \frac{R\lambda}{D} , \tag{4.8} $$
where λ is the transmitted signal wavelength, D is the diameter of the physical aperture, and β is
the angular beamwidth. The cross-range resolution is thus proportional to the range and inversely
proportional to the size of the aperture. For an X-band radar operating at 10 GHz (λ = 3 cm) from a
stand-off range of 10 km using a 1-m antenna, the cross-range resolution is only 300 m. Since
decreasing the stand-off range is undesirable for many military scenarios and wavelength selection is
a function of the desired electromagnetic properties for the transmitted signal, the only feasible way
to improve cross-range resolution is to increase the aperture size. Since imaging antennas are often
mounted on aircraft, the physical size is often limited by the imaging platform's payload capabilities.
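The X-band example can be reproduced directly from (4.8), together with the small-angle check from (4.7). The last line, the physical aperture that would be needed for 1-m cross-range resolution at this range, is an added illustration of why a real aperture is impractical.

```matlab
c      = 3e8;                    % propagation speed, m/s
f      = 10e9;                   % X-band carrier frequency, Hz
R      = 10e3;                   % stand-off range, m
D      = 1;                      % antenna aperture, m

lambda    = c / f;               % wavelength, m
beta      = lambda / D;          % angular beamwidth, rad
dCR       = R * beta;            % equation (4.8)
dCR_exact = 2*R*sin(beta/2);     % equation (4.7) without the small-angle approximation
D_needed  = R * lambda / 1;      % aperture for 1-m cross-range resolution, m

fprintf('lambda = %.3f m, dCR = %.1f m (exact %.3f m), D for 1 m: %.0f m\n', ...
        lambda, dCR, dCR_exact, D_needed);
```

The small-angle approximation changes the result by only about a centimeter here, while matching a 1-m resolution with a real aperture would require an antenna roughly 300 m across.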
Figure 4.8: (a) Real aperture radar system. Angular beamwidth is determined by the ratio of the
wavelength to the diameter of the physical aperture. (b) Synthetic aperture radar imaging concept. Multiple
pulses along the flight path are processed to improve cross-range resolution. (Based on Jakowatz, et al.,
Spotlight-Mode Synthetic Aperture Radar: A Signal Processing Approach. New York: Springer Science +
Business Media, 1996. Copyright © 1996, Springer Science + Business Media [28].)
The goal of the radar imaging system is to produce an estimate of the two-dimensional
reflectivity function, |g(x, y)|. If we consider transmitting multiple pulses, as shown in Figure 4.8(b),
echoes from all targets lying along the same constant-range contour will be received at the same instant. The
received signal thus cannot be related to any particular cross-range/range position (x, y). Instead,
the received signal is the integration of the reflectivity values from all targets that lie along an
approximately straight constant range line. Given that the transmitted signal is a linear FM chirp,
the received signal (after deramp processing) is of the form
$$ r_c(t) = C \int_{-L}^{L} \left[ \int_{-L}^{L} g(x, y)\,dx \right] e^{-j\phi}\,dy , \tag{4.9} $$
where the inner integral indicates the estimate of the line integrals of g(x, y) taken along the
cross-range direction. The radar system processes many echoes over some interval of viewing angles, Δθ.
The corresponding deramped return at each viewing angle θ is of the form
$$ r_c(t) = C \int_{-L}^{L} \left[ \int_{-L}^{L} g(\bar{x}\cos\theta - \bar{y}\sin\theta,\ \bar{x}\sin\theta + \bar{y}\cos\theta)\,d\bar{x} \right] e^{-j\phi}\,d\bar{y} . \tag{4.10} $$
With the antenna steered to point at the same ground patch center, at each viewing angle θ,
information about the scene reflectivity is found along different line integrals. This range
averaging is a projection, at least in the tomographic sense, of the two-dimensional scene into a
one-dimensional function [2]. For the purposes of image construction, according to the projection
slice theorem [55], the one-dimensional Fourier transform of the projection function
$p_\theta(\bar{y}) = \int_{-L}^{L} g(\bar{x}\cos\theta - \bar{y}\sin\theta,\ \bar{x}\sin\theta + \bar{y}\cos\theta)\,d\bar{x}$
appearing in (4.10) is equal to the two-dimensional Fourier
transform of the image to be reconstructed, evaluated along a line in the Fourier plane that lies at the
same angle θ. In other words, each slice provides a measurement of the two-dimensional Fourier
transform of the scene along the same angle. These slices can then be interpolated to build up a
complete Fourier transform of the scene reflectivity function.
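The projection slice theorem has an exact discrete counterpart for a viewing angle of θ = 0, which can be verified in a few lines. The scene below is a random stand-in for the reflectivity function, and the convention that rows index y and columns index x is an assumption of this sketch.

```matlab
% Discrete check of the projection-slice theorem at theta = 0 (synthetic scene)
Ny = 64;  Nx = 48;
g  = rand(Ny, Nx);                 % stand-in reflectivity; rows index y, columns index x

p  = sum(g, 2);                    % projection: integrate along x for each y
P  = fft(p);                       % 1-D Fourier transform of the projection

F     = fft2(g);                   % 2-D Fourier transform of the scene
slice = F(:, 1);                   % line through the origin at zero x-frequency

err = max(abs(P - slice));
fprintf('max |FFT(projection) - slice of FFT2| = %.2e\n', err);
```

For other viewing angles the slice falls between the samples of the rectangular FFT grid, which is the reason the polar-to-rectangular interpolation step is needed before an image can be formed.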
Additional processing is required to produce a useful image including the use of polar formatting to transform the data from a polar to a rectangular grid. As it turns out, the degree of cross-range
resolution in the reconstructions depends only on the transmission wavelength λ and on the size of
Δθ. The ability to sweep out measurements to synthesize a larger aperture and improve cross-range
resolution has afforded SAR systems amazing capabilities that continue to see use in applications
similar to those for their optical counterparts.
4.4 DETECTION
Target detection was one of the primary goals when engineers considered the first radar system.
Detection refers to the process by which a decision is made as to whether the received signal is
the result of an echo from a target or simply represents the effects of interference. The complexity
of radar signals requires the use of statistical models due to the multitude of interference signals.
Comprehensive models have been developed both for interference and echoes from complex targets.
The processing of detection decisions then is a problem in statistical hypothesis testing. When testing
for the presence of a target, one of two hypotheses can be assumed to be true: the null hypothesis H0
which assumes the signal is the result of interference only and the second hypothesis H1 where the
signal consists of both interference and target echo returns. Once the two conditional probability
density functions
$p_y(y|H_0)$ = pdf of y given that a target was not present
$p_y(y|H_1)$ = pdf of y given that a target was present
have been defined, metrics such as the probability of detection, PD , probability of false alarm, PFA ,
and the probability of miss, PM = 1 − PD can be defined. The Neyman-Pearson criterion can then
be used to maximize the probability of detection under the constraint that the probability of false
alarm does not exceed a set value [2]. Using this technique, the method of threshold detection, where
a signal’s amplitude above a set threshold indicates target presence, can be used for detection.
Figure 4.9 illustrates threshold detection for a 1-D range signal return. This threshold is
determined based on a desired operating point, set for a specific false alarm or detection rate. There are
of course numerous details in implementing a threshold detector not discussed here. Various designs utilize
the magnitude, squared-magnitude, or even log-magnitude of the complex signal samples. Constant
false alarm rate (CFAR) detectors serve to provide an estimate for the threshold when interference
statistics are not known. For a detailed discussion on detection theory with implementation examples,
see Kay [56].
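As a rough illustration of threshold detection, the sketch below fixes the false alarm probability and derives the threshold for a squared-magnitude detector. It assumes the interference is complex Gaussian with known power, so that the squared magnitude is exponentially distributed; the sample count, the target positions and the 15 dB signal-to-noise ratio are hypothetical values chosen only for the example.

```matlab
% Square-law threshold detection sketch (assumed noise model: complex Gaussian)
rng(0);                          % reproducible noise
N      = 1e5;                    % number of range samples (assumed)
sigma2 = 1;                      % interference power, assumed known
Pfa    = 1e-3;                   % desired probability of false alarm

% Interference-only samples with E|n|^2 = sigma2
n = sqrt(sigma2/2)*(randn(N,1) + 1i*randn(N,1));

% |n|^2 is exponential with mean sigma2, so P(|n|^2 > T) = exp(-T/sigma2)
T = -sigma2*log(Pfa);

% Add a few hypothetical targets at fixed range bins (15 dB SNR)
y         = n;
tgtIdx    = [1000 25000 70000];
snr       = 10^(15/10);
y(tgtIdx) = y(tgtIdx) + sqrt(snr*sigma2);

detections = abs(y).^2 > T;      % threshold test on the squared magnitude
PfaHat     = mean(abs(n).^2 > T);% empirical false alarm rate, noise only
fprintf('Threshold T = %.3f, empirical Pfa = %.2e\n', T, PfaHat);
fprintf('Targets detected: %d of %d\n', sum(detections(tgtIdx)), numel(tgtIdx));
```

When the interference power is not known, the fixed value of sigma2 would be replaced by a local estimate, which is the role of the CFAR detectors mentioned above.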
Figure 4.9: Threshold detection for range signal trace.
CHAPTER 5
Sparse Representations in Radar
While military applications such as surveillance, navigation, and weapons guidance dominate the
current radar landscape and drive radar technology development, the number of civilian radar applications continues to rise. In addition to the technologies previously mentioned, radar has found use
in pedestrian and vehicular collision avoidance systems, environmental mapping and even in the
study of the movement of insects and birds. All these technologies rely on a continually evolving
signal processing methodology. As such, the potential for radar to impact new civilian and military
applications relies heavily on the use of innovative and efficient signal processing algorithms. Advanced algorithm architectures and new or improved radar technologies often go hand in hand. The
emergence of sparse representation theory in the field of radar systems is no different. This chapter
will serve as a direct example of the utility of the sparse representation framework throughout the
field of radar.
Each section of this chapter will focus on a recently developed radar application that uses
sparse representations and elements of compressive sensing in its signal processing architecture. The
ability of sparse representations to find accurate and concise signal models allows for improvements
in signal representation, compression and inference, as evidenced by this growing list of radar technologies. This of course is not an exhaustive list, as the number of available radar applications and
implementations of the sparse representation framework seems to grow daily. The varied application
techniques for sparse representations throughout the radar community should serve to display the
power of the approach and potentially spawn ideas for future radar applications.
5.1 ECHO SIGNAL DETECTION AND IMAGE FORMATION
As mentioned in Section 4.2, to simultaneously achieve the energy of a long pulse and the resolution
of a short pulse, pulse compression technology takes advantage of the fact that a long pulse can have
the same spectral bandwidth as a short pulse if modulated in frequency or phase [4]. Unfortunately,
the strain to process the ultra-wideband signal falls on the Analog-to-Digital Converters (ADCs),
whose hardware capabilities remain the largest limiting factor in radar system developments [57].
Both existing and advanced radar systems can benefit from the ability to lower the required sampling
rate. The reduced requirements for the ADC have the potential to not only improve hardware
reliability but reduce development cost.
Consider the standard linear filter model for a received radar signal [4]:
\[ y(t) = \int_{-\infty}^{\infty} x(\tau)\, s(t - \tau)\, d\tau + \eta(t) , \tag{5.1} \]
where x(t) is the transmitted signal, s(t) is the radar reflectivity response for the target, and η(t) is
noise. The received signal y(t) is the convolution of the transmitted signal with the target impulse
response. When discretized using N samples this becomes [58]
\[ y[n] = \sum_{k=1}^{N} x[k]\, s[n-k] + \eta[n] , \quad \text{i.e.,} \quad y = Xs + \eta , \tag{5.2} \]
where s, η ∈ R^{N×1}. The transmission signal matrix X ∈ R^{N×N} contains time-shifted versions of
the transmitted signal in each column. It is important to note that the matrix X is Toeplitz since
each descending diagonal from left to right is constant.
The objective of the radar system is to accurately recover s from y. A simple solution is to
estimate the signal that produces the least-squared error, i.e., compute s by minimizing the l2-norm
of the estimation error. Recall, this is found by computing the pseudo-inverse of the transmission
signal matrix X. If it is known that the target scene is sparse, i.e., |supp(s)| = S ≪ N, the least squares
solution will be uninformative, as energy will be distributed among all coefficients in s. Recall that
the exact sparse solution can be recovered using convex optimization techniques; however, the large
sampling rates result in extremely large data vectors and matrices, making solution times lengthy if
not impractical.
Compressive sensing [59] provides the capability of measuring y using M < N measurements
by projecting the signal onto a second set of basis vectors {ψ_m}, m = 1, . . . , M, as
\[ z(m) = \langle y, \psi_m^T \rangle , \tag{5.3} \]
where ψ_m^T denotes the transpose of ψ_m and ⟨·, ·⟩ refers to the inner product. In matrix notation,
this is equivalent to
\[ z = \Phi y = \Phi X s + \Phi \eta , \tag{5.4} \]
where Φ ∈ R^{M×N} is the measurement matrix with each row corresponding to a basis vector ψ_m
and z is an M × 1 column vector. Compressive sensing relies on the near orthonormality of matrices
satisfying the restricted isometry property when operating on sparse vectors. Mathematically, a matrix
A = ΦX ∈ R^{M×N} is said to satisfy the restricted isometry property if, for an integer s < p, there exists
a constant δ_s such that, for every submatrix A_s formed from s columns of A and every vector u,
\[ (1 - \delta_s)\|u\|_2 \le \|A_s u\|_2 \le (1 + \delta_s)\|u\|_2 . \tag{5.5} \]
This basic principle states that the embedding of the p-dimensional vector in a random m-dimensional space does not severely distort the norm of the vector u (for A as defined above, p = N and m = M).
The restricted isometry property holds for many pairs of bases, including delta spikes and
Fourier sinusoids. Interestingly, for a random, noise-like matrix Φ (e.g., Gaussian random entries generated
via randn in MATLAB®), with overwhelmingly high probability, the matrix will be incoherent
with any fixed basis, thus satisfying the restricted isometry property [57]. For signal-sensing applications, including radar imaging, this random sampling allows for accurate data compression at a
rate much lower than the Nyquist rate.
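The norm-preserving behavior described by the restricted isometry property can be observed empirically. The following sketch draws a Gaussian measurement matrix with randn and applies it to randomly generated sparse vectors. The dimensions, the sparsity level and the 1/sqrt(M) scaling are assumptions made for the example, and the sparsity basis is taken to be the identity for simplicity. Sampling random supports in this way only illustrates the property; it does not certify it, since that would require checking every support.

```matlab
% Empirical look at norm preservation by a random Gaussian matrix (illustrative only)
rng(1);
N = 512;  M = 128;  S = 10;        % ambient dimension, measurements, sparsity (assumed)
Phi = randn(M,N)/sqrt(M);          % scaled so that E||Phi*u||^2 = ||u||^2

nTrials = 2000;
ratio   = zeros(nTrials,1);
for t = 1:nTrials
    u      = zeros(N,1);
    idx    = randperm(N,S);        % random support of size S
    u(idx) = randn(S,1);
    ratio(t) = norm(Phi*u)^2 / norm(u)^2;
end
fprintf('||Phi*u||^2/||u||^2 over %d sparse vectors: min %.2f, mean %.2f, max %.2f\n', ...
        nTrials, min(ratio), mean(ratio), max(ratio));
```

Ratios that stay close to one indicate that the M-dimensional projection roughly preserves the energy of the sparse vectors that were tried.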
For the received radar signal in (5.1) or (5.2), the sparse target reflectivity function s(t) or
s[n] can thus be estimated using the l1-minimization problem and the random sensing matrix Φ as
\[ \hat{s} = \arg\min_{s} \|s\|_1 \quad \text{subject to} \quad \|z - \Phi X s\|_2 \le \epsilon . \tag{5.6} \]
The use of compressive sensing for both radar imaging and target echo signal detection provides the possibility of significantly lowering the number of samples/measurements below that required by
the Nyquist sampling criterion [60]. Unfortunately, practical implementations utilizing Gaussian
and sub-Gaussian random variables require the use of M correlation channels in the hardware, allowing for time-domain signal correlations. For even moderately large values of M, this becomes
infeasible.
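The chain from the discretized model (5.2) to the compressive measurements (5.4) and a sparse estimate can be sketched in a few lines. In the example below the transmit waveform is a hypothetical noise-like sequence, the problem sizes and noise level are assumed, and orthogonal matching pursuit (OMP) is used as a greedy stand-in for the l1 program in (5.6). The minimum-norm least squares estimate is computed alongside it for comparison.

```matlab
% Compressive measurement of a sparse range profile (illustrative sketch)
rng(2);
N = 256;  M = 100;  S = 3;                 % samples, measurements, targets (assumed)
x = randn(N,1);                            % hypothetical noise-like transmit waveform
X = toeplitz(x, [x(1); zeros(N-1,1)]);     % column k holds x delayed by k-1 samples

s       = zeros(N,1);                      % sparse target reflectivity
supp    = sort(randperm(N,S));
s(supp) = 1 + rand(S,1);

y   = X*s + 0.01*randn(N,1);               % received signal, model (5.2)
Phi = randn(M,N)/sqrt(M);                  % random measurement matrix
z   = Phi*y;                               % compressive measurements, (5.4)
A   = Phi*X;

sL2 = pinv(A)*z;                           % minimum-norm least squares estimate

% OMP: greedy substitute for the l1 problem (5.6)
r = z;  sel = [];
colNorm = sqrt(sum(A.^2,1))';
for it = 1:S
    c = abs(A'*r)./colNorm;                % normalized correlation with the residual
    c(sel) = 0;
    [~,j] = max(c);
    sel  = [sel j];
    coef = A(:,sel)\z;                     % least squares on the selected columns
    r    = z - A(:,sel)*coef;
end
sOMP = zeros(N,1);  sOMP(sel) = coef;

fprintf('Nonzeros: least squares %d, OMP %d\n', nnz(sL2), nnz(sOMP));
disp('True support:');      disp(supp);
disp('OMP support:');       disp(sort(sel));
```

The least squares estimate spreads energy over essentially all N coefficients, while the greedy estimate is restricted to S entries; whether those entries coincide with the true support depends on the waveform, the noise and the number of measurements.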
Applications of compressive sensing for echo signal detection and image formation include
work by Baraniuk [57], who demonstrated the capability to eliminate the matched filter in the receiver as well as to reduce the required A/D conversion bandwidth. Sampling rate reductions have
additionally been shown for 1-D echo signal detection [61] and 2-D image formation [58, 62]. Shastry [58] proposed transmitting stochastic waveforms allowing for the reflected time-domain signal
to be sampled at a lower rate while still retaining the properties necessary for the signal reflectivity
to be accurately restored. Carin [62] demonstrated that with prior knowledge of the environment,
using a two-way Green’s function, the scattered fields can be inverted for image generation using
a relatively small number of compressive sensing measurements. Gao [61] proposed the use of a
waveform-matched dictionary, consisting of time-domain shifted versions of the transmitted signal,
for 1-D target echo signal detection. The return signal clearly has a sparse representation using
the waveform-matched basis, allowing for a simple approach that can reduce the sampling rate
while providing adequate target signal detection among clutter and noise. Chi [63] analyzed the
performance degradations as a result of mismatches in the assumed basis for sparsity and the actual
sparsity basis. The special care that must be taken to account for the mismatch was shown to be
particularly important for the problem of image inversion common to radar and sonar systems.
5.2 ANGLE-DOPPLER-RANGE ESTIMATION
Early work with antenna arrays focused on directive radiation patterns. Known primarily as phased
array radar systems, the relative phase of a signal to be transmitted was varied for each antenna,
allowing for both reinforcement and suppression of the radiation pattern in certain directions. More
recently, considerable attention has been paid to the exploitation of independent transmission and
reception of signals at antenna arrays [64]. Whereas beamforming presumes a high correlation
between signals either transmitted or received by an array, multi-input multi-output (MIMO) radar
systems utilizing separate signals at each antenna have been shown to have the ability to improve
target resolution for both widely separated [65] and co-located antennas [66].
A distributed MIMO radar system introduced in 2008 [67] utilized a small-scale
network of randomly located nodes linked by a fusion center. Under the assumption that target
presence was sparse in the angle-Doppler space, a compressive sensing based MIMO radar system
was developed to extract target angle and Doppler (velocity) information. The result was not only superior resolution over a conventional pulsed radar system but the acquisition of target characteristics
using far fewer samples than required by Nyquist.
Subsequent work has focused on improving target resolution, including range estimates,
through the use of step-frequency radar (SFR). Recalling from Section 4.2 that the range resolution for a pulsed radar system is inversely proportional to the bandwidth of the transmitted pulse,
wideband signals required for enhanced range resolution suffer not only from low signal-to-noise
ratio (SNR) but carry additional computational burden due to the need for high-speed ADCs and
processors [68]. SFR systems transmit several narrowband pulses at different frequencies. The frequency remains constant during each pulse resulting in a narrow instantaneous bandwidth. The
range in transmitted frequencies allows for a large effective bandwidth, enhancing range resolution.
Decoupled schemes using two separate pulse trains, one with a constant carrier frequency and the
other that varies, have been introduced to reduce the complexity of jointly estimating angle, Doppler
and range [69]. Compressive sensing was introduced to reduce the large number of pulses needing
to be transmitted for an inverse discrete Fourier transform (IDFT) detector.
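A short numerical example may help fix the orders of magnitude involved in step-frequency operation. It uses the standard relation between range resolution and bandwidth, ΔR = c/(2B), with the effective bandwidth taken as the number of pulses times the frequency step. The pulse count and step size are assumed values, not parameters from the cited work.

```matlab
% Effective bandwidth and range resolution of a step-frequency pulse train (assumed values)
c      = 3e8;            % propagation speed (m/s)
N      = 64;             % number of stepped-frequency pulses (assumed)
deltaF = 5e6;            % frequency step between pulses in Hz (assumed)

Beff = N*deltaF;         % effective bandwidth synthesized by the pulse train
dR   = c/(2*Beff);       % range resolution from the effective bandwidth
Ru   = c/(2*deltaF);     % unambiguous range window set by the frequency step
fprintf('B_eff = %.0f MHz, range resolution = %.3f m, unambiguous window = %.1f m\n', ...
        Beff/1e6, dR, Ru);
```

Each pulse occupies only a narrow instantaneous bandwidth, yet the train as a whole resolves targets as a single pulse of bandwidth Beff would.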
As a simple example, consider the early range and speed estimation approach presented by
Shah [68]. Using co-located transmitters and receivers and N transmitted pulses, the target scene
can be decomposed into an M × L range-speed plane with discretized range spaces of [R1 , . . . , RM ]
and speed spaces of [v1 , . . . , vL ]. Representing the target scene as a matrix S of size M × L, the
output of the phase detector for the reflected signal at distance Rm moving at speed vl is [68]
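\[ y[k] = \sum_{m=1}^{M} \sum_{l=1}^{L} e^{\, i 2\pi f_k \frac{2}{c} (R_m + v_l k T)} \cdot S(m, l) + w[k] , \tag{5.7} \]
where
\[ S(m, l) = \begin{cases} \alpha , & \text{reflectivity of target present at } (R_m, v_l) \\ 0 , & \text{target is absent at the } ((m-1)L + l)\text{th grid point} \end{cases} \tag{5.8} \]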
and w[k] represents zero-mean white noise. In matrix notation, this takes the familiar form
y = Ds + w ,
(5.9)
where s = [s_1, s_2, . . . , s_{ML}] ∈ R^{ML×1} is the rasterisation of the scene reflectivity matrix S. The
basis matrix D consists of column vectors {d_i}, i = 0, . . . , ML − 1, of size N × 1. The elements of the matrix can
be shown to be
\[ \phi(k, (m-1)L + l) = e^{\, i 2\pi (f_0 + k \Delta f) \frac{2}{c} (R_m + v_l k T)} , \tag{5.10} \]
where k = 0, 1, . . . , N − 1 and N again is the number of transmitted pulses. The column vectors
of the basis matrix D correspond exactly to the phase detector outputs for all N pulses with a phase
shift equivalent to that for a target located at the ith grid point. Assuming the measurement matrix
Φ in (5.6) is the identity matrix, simple convex optimization techniques such as the Dantzig selector
can be used to recover the signal s with surprisingly high probability [17]. More importantly, since
the matrix D consists of all possible range-speed combinations, the compressive sensing approach to
range-speed estimation does not suffer from the shifting and spreading effects common for moving
targets using the IDFT [68]. Compressive sensing techniques have also led to the use of non-identity
measurement matrices to reduce data sampling rates and allow for the decoupling of range, speed
and angle estimation [69].
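The construction of the range-speed dictionary in (5.10) can be sketched as follows. All radar parameters and both grids are assumptions chosen for the example. In particular, the range and speed spacings are picked so that the N pulses can separate the M·L grid cells, because range and speed both contribute a phase term that is linear in the pulse index k and other grid choices may be ambiguous. A single correlation step, which is the first iteration of a matching pursuit, stands in for the convex solver mentioned in the text, and the measurement matrix is the identity.

```matlab
% Range-speed dictionary following (5.10) and a one-step greedy estimate (illustrative)
rng(3);
c = 3e8;  f0 = 10e9;  deltaF = 5e6;  T = 1e-4;    % assumed radar parameters
N = 64;   k = (0:N-1)';                           % pulse index
M = 8;    L = 8;                                  % range and speed grid sizes (assumed)
dR = c/(2*deltaF*N);                              % range step: one resolution cell
dv = (M/N)*c/(2*f0*T);                            % speed step chosen to avoid range-speed overlap
R  = 100 + (0:M-1)*dR;                            % range grid (m)
v  = (0:L-1)*dv;                                  % speed grid (m/s)

D = zeros(N, M*L);
for m = 1:M
    for l = 1:L
        D(:,(m-1)*L + l) = exp(1i*2*pi*(f0 + k*deltaF)*(2/c).*(R(m) + v(l)*k*T));
    end
end

% One hypothetical target on the grid plus white noise, as in (5.9)
mT = 6;  lT = 3;
s  = zeros(M*L,1);  s((mT-1)*L + lT) = 1;
w  = 0.05*(randn(N,1) + 1i*randn(N,1));
y  = D*s + w;

% Pick the dictionary column most correlated with the measurements
[~, iHat] = max(abs(D'*y));
mHat = floor((iHat-1)/L) + 1;
lHat = mod(iHat-1, L) + 1;
fprintf('Estimated range %.2f m, speed %.2f m/s\n', R(mHat), v(lHat));
```

Because each column already contains the quadratic phase produced by target motion, the correlation peak is not smeared in the way an IDFT over the pulses would be.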
5.3 IMAGE REGISTRATION (MATCHING) AND CHANGE DETECTION FOR SAR
Synthetic aperture radar (SAR) uses airborne or spaceborne mounted radar sensors to synthesize the
large antenna aperture sizes needed for adequate cross range resolution. This technique is required in
addition to the pulse compression technologies used to improve range resolution. Figure 5.1 shows
the difference between SAR and optical imagery. Current SAR techniques produce near optical
quality imagery with the added benefits of being able to be formed at long stand-off distances,
under the cover of night and even in adverse weather conditions [28]. The uses of such
a technology are too numerous to list, but we will focus on two topics that continue
to see development throughout the literature. The automatic generation of scene height profiles,
known as digital terrain map (DTM) generation, will be discussed in this section and the ability to
automatically classify objects within the scene will be discussed in the ensuing section.
One benefit of SAR imagery is the coherent nature in which the images are formed. The
process allows for coherent signal processing techniques such as interferometry [70] in which 3-D
maps of the Earth’s surface can be generated. The ability to do so is a result of the collection of the
complex reflectivity of the illuminated scene. When the collection geometries for two separate SAR
images are very similar, the two images can be interfered with each other so as to cancel the common
reflectivities allowing for the inference of the scene geometries [70]. A typical scenario, known
as one-pass interferometric SAR (IFSAR), utilizes two separate antennas on a single platform,
differing only in depression angle geometry, to achieve extremely precise height estimation. The
estimation precision is a direct consequence of the use of relatively short (on the order of centimeters)
transmission wavelengths within the SAR sensing system.
An alternative approach to DTM generation that continues to see interest is the use of
stereo radar image pairs. Since the electromagnetic reflectivities of all objects in a scene are laid over
(projected) into a two-dimensional (range and cross range) imaging plane as a function of their height,
different collection geometries produce different height-dependent placements, creating parallax
between targets with height relief [71]. The potential for greater estimation accuracy increases with
the parallax, or the angular difference between the SAR collection geometries. The benefit of the
stereo approach is the known exact analytical solution for target heights in the scene based on the
target disparity between image pairs.
Common to both approaches is the requirement for accurate feature correspondence between
the image pairs. Image registration for natural imagery has always been a hot-bed of research, but the
problem proves to be even more difficult for SAR imagery. In addition to the sometimes debilitating
Figure 5.1: Ku-band (15 GHz) SAR image (left) and optical image (right) of Albuquerque International
Airport. (Image courtesy of Sandia National Labs, http://www.sandia.gov/radar/imageryku.html.)
multiplicative noise known as speckle, large translations, rotations, and image distortions are common
for both imaging scenarios. For one-pass IFSAR, translations and rotations are mitigated by the use
of multiple sensors aboard a single platform, but registration between image pairs is still an issue
due to the height-dependent projection on the different imaging planes [28]. For stereo SAR, the
large crossing angles required for greater parallax increase the effect of distortion and target layover
as a result of the different collection geometries. An extreme example would be the different target
signatures that can be expected when viewing a complex target from two completely opposite sides.
An example of a digital elevation map using IFSAR imagery for the Capitol building in
Washington, D.C. is shown in Figure 5.2. While the technique is amazingly accurate, the spatial
distortions still prove difficult for registration and can severely impact the accuracy of the height
estimates. Previous approaches for SAR image registration have relied upon both the coherent
nature of the collection process, as well as the complex reflectivity (magnitude and phase) of each
image pixel. These traditional translation-only complex correlation searches subdivide the image
and find the translation-dependence required for maximal correlation between image pairs. Recent
approaches [71] have used a shift-scale complex correlation approach to mitigate the effects of
substantial height relief.
The problem of image registration, specifically for SAR imagery, was recently addressed by
Nguyen and Tran [72]. Their approach to identifying the correspondence between local patches in
the reference and test images was to create a dictionary using all overlapping patches of the reference
image within a specified search region and find the best sparse approximation for each test image
patch using a linear combination of the reference patches stored in the dictionary. Similar to the
approach by Wright [8] used to account for image occlusion, an additional scaled identity matrix is
appended to the dictionary to account for image noise.
Figure 5.2: Digital elevation map for U.S. Capitol area generated using IFSAR imagery. (Image courtesy
of Sandia National Labs, http://www.sandia.gov/radar/imageryku.html.)
Using image patches of size M × N, a d-element dictionary D ∈ R^{MN×d} and a test image
patch y ∈ R^{MN×1}, the problem of determining a sparse representation becomes
\[ y = [\, D \;\; \mu I \,] \begin{bmatrix} \alpha_D \\ \alpha_I \end{bmatrix} = A\alpha , \tag{5.11} \]
where I is the MN × MN identity matrix, µ is a positive scalar (mean of reference search area), α_D
are the coefficients associated with the dictionary elements and α_I are the coefficients associated
with the identity matrix. Again, the addition of a scaled identity matrix helps to account for pixel
irregularities in the approximation caused by image noise. The sparse solution for α can be computed
using any of the techniques discussed in Section 2.3. The final approximation for the test image patch
is given by
\[ \hat{y} = D \hat{\alpha}_D . \tag{5.12} \]
The approach does not determine the spatial displacement of the pixels between the two
images, although it is easy to imagine a simple solution to determine such a measure needed for
DTM generation. Instead, the authors considered the additional task of identifying changes that exist
between the two images. Known as change detection, the second challenge relies on the suppression
of the target signatures from both images. The authors pose a novel solution that includes the use
of both the estimation accuracy and the degree of sparsity required for the estimation. The basic
hypothesis is that when a test image patch can be represented easily and accurately using only a
few non-zero coefficients, no change has occurred. By considering both the estimation accuracy and
the degree of sparsity, a general measure of the amount of change observed can be estimated.
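The patch-dictionary formulation in (5.11) and (5.12) can be sketched on synthetic data. In the example below the reference image is random, the test image is a displaced and slightly noisy copy of it, and the patch size, search half-width and sparsity level K are assumed values. OMP stands in for the sparse solvers of Section 2.3, and the relative residual and number of non-zero coefficients are printed only as simple stand-ins for a change measure; they are not the statistic used by the authors.

```matlab
% Sparse patch matching against a reference search region (illustrative sketch)
rng(4);
Iref  = rand(32);                                 % hypothetical reference magnitude image
shift = [2 -1];                                   % true (row, col) displacement
Itest = circshift(Iref, shift) + 0.01*randn(32);  % displaced reference plus noise

p = 5;  h = 3;                                    % patch size and search half-width (assumed)
r0 = 14;  c0 = 14;                                % top-left corner of the test patch
y  = reshape(Itest(r0:r0+p-1, c0:c0+p-1), [], 1);

% Dictionary of all overlapping reference patches in the search region
offs = -h:h;  nAtoms = numel(offs)^2;
D  = zeros(p*p, nAtoms);  dr = zeros(1,nAtoms);  dc = zeros(1,nAtoms);  n = 0;
for a = offs
    for b = offs
        n = n + 1;
        D(:,n) = reshape(Iref(r0+a:r0+a+p-1, c0+b:c0+b+p-1), [], 1);
        dr(n) = a;  dc(n) = b;
    end
end
mu = mean(mean(Iref(r0-h:r0+h+p-1, c0-h:c0+h+p-1)));  % mean of the reference search area
A  = [D, mu*eye(p*p)];                                % augmented dictionary of (5.11)

% OMP with at most K atoms
K = 3;  res = y;  sel = [];
colNorm = sqrt(sum(A.^2,1))';
for it = 1:K
    c = abs(A'*res)./colNorm;  c(sel) = 0;
    [~,j] = max(c);  sel = [sel j];
    coef = A(:,sel)\y;
    res  = y - A(:,sel)*coef;
end
alpha  = zeros(size(A,2),1);  alpha(sel) = coef;
alphaD = alpha(1:nAtoms);
yHat   = D*alphaD;                                % patch approximation, (5.12)

[~,best] = max(abs(alphaD));
relErr   = norm(y - yHat)/norm(y);
fprintf('Best-matching reference offset: (%d, %d)\n', dr(best), dc(best));
fprintf('Relative error %.3f using %d non-zero coefficients\n', relErr, nnz(alpha));
```

The offset of the dominant dictionary atom gives one simple estimate of the local displacement between the two images, while a large residual or a need for many atoms would suggest that the patch has changed.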
Change detection is only a single application once accurate image registration and pixel
correspondence have been achieved. While the others are extremely interesting, not all applications can
be considered here. We refer the interested reader to the voluminous literature on stereo SAR [71],
environmental monitoring [73], and moving target indication [74], among others.
5.4 AUTOMATIC TARGET CLASSIFICATION
As mentioned in Chapter 2, a byproduct of the computation of a compact and accurate signal representation within
the sparse representation framework is the ability to infer further details about the signal itself. A
significant amount of time is being devoted to the exploration of the ability for sparse representations
to aid in object classification in areas including handwritten digits [75], facial recognition [8], and
of course radar imagery [76].
A strong desire exists, particularly in military scenarios, to automatically detect and classify a
wide variety of objects using high-resolution imagery. SAR imagery, with its multitude of benefits
mentioned in the previous section, is a perfect candidate for object recognition. Early versions of
automatic target recognition (ATR) systems utilizing SAR imagery have predominantly consisted of
three stages [77]: an early detection stage that identifies local regions of interest using a constant false
alarm rate (CFAR) detector, a one-class discrimination stage that aims to eliminate false-alarms while
passing targets and the final classification stage responsible for classifying the remaining detections
against a set of known target types.
In identifying regions of interest, the detection stage generates false alarms. No matter the
classification system, subsequent stages are responsible for rejecting natural and man-made false
alarms commonly referred to as clutter. Template-matching based algorithms [78], which are intuitive
and easy to implement, correlate image detections with training templates for each target type. These
templates are typically created at 5° increments in aspect or rotation angle of the target. The
obvious drawback of the approach is the computationally intensive nature of correlating each detection with each template,
particularly as more targets become known. Feature-based approaches, which use either computed
image features or the training images directly, have been shown to be highly effective in target
generalization and confuser rejection. Implementations such as the multilayer perceptron and support
vector machine (SVM) have typically relied upon target pose estimation techniques which have
proven to be particularly difficult in the presence of image speckle [79]. Recent studies have focused
on finding a single discrimination/classification solution that is both efficient and accurate both in
target classification and clutter rejection.
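The cost of template matching is easy to see in a small sketch. Below, each class has one template per 5° of aspect angle, giving 72 templates per class, and a detection is assigned to the class whose best template has the highest normalized correlation. The templates are random stand-ins, the chip size and noise level are assumed, and no search over translations is included. The code relies on implicit expansion, which is available in MATLAB R2016b and later.

```matlab
% Template matching by normalized correlation (illustrative, random stand-in templates)
rng(5);
nClass = 3;  nAspect = 360/5;  nPix = 32*32;     % 72 aspect templates per class
templates = randn(nPix, nAspect, nClass);        % hypothetical template set

trueClass = 2;  trueAspect = 40;
chip = templates(:,trueAspect,trueClass) + 0.5*randn(nPix,1);   % noisy detection chip

z     = chip - mean(chip);
score = zeros(nAspect, nClass);
for c = 1:nClass
    Tc = templates(:,:,c);
    Tc = Tc - mean(Tc,1);                        % remove each template's mean
    Tc = Tc ./ sqrt(sum(Tc.^2,1));               % unit-norm templates
    score(:,c) = (Tc'*z)/norm(z);                % normalized correlation coefficients
end
[bestPerClass, aspectIdx] = max(score, [], 1);
[~, classHat] = max(bestPerClass);
aspectHat = (aspectIdx(classHat) - 1)*5;         % aspect angle of the best template (degrees)
fprintf('Class %d at %d degrees after %d correlations\n', classHat, aspectHat, numel(score));
```

Every detection requires nAspect correlations per class, so the work grows linearly with the number of known target types.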
Building on both feature-based (training images used directly) and template matching (nearest
neighbor approximation) algorithms, the use of sparse representations for SAR target classification
was recently explored using two separate image processing techniques. In the next two sections,
we present our work on target classification in SAR imagery using sparse representations. These
algorithms have been shown to be effective at classifying targets in SAR imagery, however, they
continue to be refined in hopes of developing the next state-of-the-art SAR target classification
scheme.
5.4.1 SPARSE REPRESENTATION FOR TARGET CLASSIFICATION
Emulating the approach presented by Wright [8] for facial recognition, we recently presented the
extension of sparse representation based target classification for SAR imagery [76]. The approach
was motivated from the perspective of manifold approximation, wherein test images for a given class
were assumed to be sampled from an underlying manifold composed of training images of the same
type. Under the assumption that the test images do not lie far from the class manifold, a reasonable
linear approximation based on the training images would then be sufficient to represent the test
image. Given two manifolds (M1, M2), as shown in Figure 5.3, the linear approximations for test
image x onto the jth class manifold, j = 1, 2, are given by
\[ \hat{x}_j = \sum_{k \in \Delta_j} \alpha_{k,j}\, d_{k,j} , \tag{5.13} \]
where Δ_j is the set containing the indices k of the training vectors from class j, so that d_{k,j} is the kth
dictionary element for class j. The training images for all known classes are combined into a single
dictionary, D, so that the approximation coefficient vector α can be computed by solving x = Dα
using one of the l1-norm solution methods discussed in Section 2.3. Classification decisions are
made by selecting the target class j that minimizes the projected image residual, i.e.,
\[ c = \arg\min_j \|r_j\|_2 = \arg\min_j \|x - \hat{x}_j\|_2 . \tag{5.14} \]
Figure 5.3: Classification of test data (x) using local linear projections (x̂1, x̂2) and residuals (r1, r2) on
the manifolds (M1, M2).
Figure 5.4: Classification performance in full and reduced dimensional cases [76]. (Plot of classification performance versus sparsity level for full dimensional and reduced dimensional data.) Target images collected at a 17° depression angle were used for training and images collected at a 15° depression angle were used for
testing.
Results for 3 targets (T72, BMP2, BTR70) from the public Moving and Stationary Target
Acquisition and Recognition (MSTAR) database [54] are shown in Figure 5.4. The solid line
indicates classification accuracy using the full 128 × 128 power-domain imagery while the dashed
line indicates performance results when using random projections to reduce the image dimensionality
by almost 90% to 1792 pixels. Results above a 90% probability of correct classification (Pcc) are virtually equivalent to those reported in recent work
done using machine learning algorithms. More important, however, is the lack of performance
degradation when using random projections for dimensionality reduction in conjunction with the
sparse learning architecture.
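The residual-based decision rule of (5.13) and (5.14), combined with a random projection, can be sketched on synthetic data. In the example each class is represented by training vectors lying in its own low-dimensional subspace, which is only a crude stand-in for a class manifold. The sizes, the projected dimension of 60 and the noise level are assumed, and OMP replaces the l1 solvers of Section 2.3. Implicit expansion (MATLAB R2016b or later) is used to normalize the dictionary.

```matlab
% Sparse-representation classification with a random projection (illustrative sketch)
rng(6);
nClass = 3;  nTrain = 30;  nPix = 400;  S = 5;   % assumed sizes; S is the sparsity level
D = [];  label = [];  B = cell(nClass,1);
for j = 1:nClass
    B{j}  = randn(nPix,4);                       % basis of the class-j subspace (stand-in)
    D     = [D, B{j}*randn(4,nTrain)];           % class-j training vectors
    label = [label, j*ones(1,nTrain)];
end
trueClass = 3;
x = B{trueClass}*randn(4,1) + 0.05*randn(nPix,1);   % test sample near the class-3 subspace

P  = randn(60,nPix)/sqrt(60);                    % random projection to 60 dimensions
Dp = P*D;  xp = P*x;
Dp = Dp ./ sqrt(sum(Dp.^2,1));                   % unit-norm dictionary columns

% OMP over the combined dictionary
res = xp;  sel = [];
for it = 1:S
    c = abs(Dp'*res);  c(sel) = 0;
    [~,k] = max(c);  sel = [sel k];
    coef = Dp(:,sel)\xp;
    res  = xp - Dp(:,sel)*coef;
end
alpha = zeros(size(Dp,2),1);  alpha(sel) = coef;

% Class-wise approximations (5.13) and minimum-residual decision (5.14)
r = zeros(nClass,1);
for j = 1:nClass
    aj   = alpha .* (label(:) == j);             % keep only the class-j coefficients
    r(j) = norm(xp - Dp*aj);
end
[~, cHat] = min(r);
fprintf('Class residuals: %s -> decided class %d\n', mat2str(r',3), cHat);
```

The class whose training vectors carry most of the representation leaves the smallest residual, and in this synthetic setting the decision is made entirely in the reduced 60-dimensional space.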
5.4.2 SPARSE REPRESENTATION-BASED SPATIAL PYRAMIDS
We have recently been working to extend the sparse representation framework [80] to work in
conjunction with an emerging and highly useful classification/recognition architecture known as
spatial pyramid matching [81]. Initially designed to address the issue of scene categorization, the
spatial pyramid matching scheme was shown to be useful for recognition of objects within the
scene itself. The ability to do so lies primarily in the approximate geometric correspondence that is
established when systematically aggregating image features over fixed sub-regions in the image. The
integration of the sparse representation framework into the feature pooling portion of the algorithm
improves the spatial pyramid framework by identifying only the strongest dictionary elements within
each sub-region.
Consider the generation of a three-level spatial pyramid as shown in Figure 5.5. Using a generic
dictionary, D ∈ Rd×F , where d is the number of elements in the dictionary and F is the length of
an arbitrary image feature vector, local image features such as scale-invariant feature transforms
(SIFTs) or even basic FFTs can be encoded to provide a feature descriptor for interesting points in
the image. Currently, dictionary generation is accomplished using clustering of all feature descriptors
from the training imagery, although more sophisticated dictionary generation algorithms continue
to be explored.
Figure 5.5: Generation of a three-level spatial pyramid (Level 0, Level 1, and Level 2).
For each sub-region at each level L, sparse approximations are used to encode the local image
features using the dictionary D. Given fl feature vectors for a sub-region, the sparse representation
framework generates a coefficient matrix C ∈ Rd×fl where each element cij corresponds to the
representation coefficient for the i th dictionary element for feature vector j . Contrast this to the
vector quantization approach presented by Lazebnik, where each column of C contains a single
unit element indicating the most similar (l2 -norm) dictionary atom for that feature vector. A single
feature descriptor f for each sub-region is then generated from C using either average (histogram
binning) or max pooling. Mathematically, these are equivalent to
\[ \text{Average Pooling: } f_i = \sum_{j} c_{ij} \]
\[ \text{Max Pooling: } f_i = \max_{j} c_{ij} . \]
Average pooling identifies the total number of dictionary elements present in each sub-region
while max pooling identifies the maximum component for each dictionary element (if present at all)
in the sub-region.
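The two pooling rules are simple row-wise operations on the coefficient matrix C. The small matrix below is made up for illustration, with rows indexing dictionary elements and columns indexing feature vectors. For comparison, a hard-assignment version of C is also formed by keeping a single unit entry per column at the largest coefficient; this is only a proxy for the nearest-atom vector quantization described above.

```matlab
% Pooling of a coefficient matrix C (d x f_l); values are made up for illustration
C = [0.9 0   0.2 0  ;
     0   0.7 0   0  ;
     0   0   0   0  ;
     0.1 0   0.8 0.5];

fAvg = sum(C, 2);            % "average" (histogram-style) pooling: sum over feature vectors
fMax = max(C, [], 2);        % max pooling: strongest response per dictionary element

% Hard-assignment proxy: one unit entry per column, then a histogram over atoms
[~, idx] = max(C, [], 1);
Cvq    = full(sparse(idx, 1:size(C,2), 1, size(C,1), size(C,2)));
vqHist = sum(Cvq, 2);

disp([fAvg fMax vqHist]);
```

The third dictionary element is never used, so all three descriptors are zero in that row, while the fourth element dominates the sub-region under every rule.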
The final feature vector for the entire image is the concatenation of the sub-region feature
vectors f. The resulting vector, of length d(4^L − 1)/3, is itself very sparse, a property which we seek to exploit
in further research. For the time being, the long feature vector is used as the input to a one-versus-one linear SVM, whose raw output score for each class is used to make the classification
decision. Preliminary results using SIFT features and 16 × 16 FFT patch features for the same
three targets from the MSTAR data set from Section 5.4.1 are shown in Table 5.1. The use of
sparse representations and max pooling clearly improves classification performance for SAR imagery.
The nearly 10% increase in classification performance when using FFT features rather than SIFT
features indicates the potential for significant improvement when more advanced image features,
tuned specifically for SAR imagery, are used.
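The length of the concatenated pyramid descriptor follows from counting sub-regions: a pyramid with L levels has 1 + 4 + ... + 4^(L−1) = (4^L − 1)/3 of them, each contributing d pooled values. The sketch below assembles such a vector with max pooling. The feature locations and sparse codes are random stand-ins, and the dictionary size, image size and number of features are assumed.

```matlab
% Assembling a spatial pyramid descriptor with max pooling (illustrative sketch)
rng(7);
d = 50;  L = 3;  imgSize = 128;                  % dictionary size, levels, image side (assumed)
nFeat = 300;
pos = rand(nFeat,2)*imgSize;                     % hypothetical feature locations
C   = abs(randn(d,nFeat)) .* (rand(d,nFeat) < 0.05);   % stand-in sparse codes, one column per feature

f = [];
for lev = 0:L-1
    g   = 2^lev;                                 % g x g grid of sub-regions at this level
    bin = min(floor(pos/(imgSize/g)), g-1);      % zero-based grid cell of each feature
    for a = 0:g-1
        for b = 0:g-1
            in = bin(:,1) == a & bin(:,2) == b;
            if any(in)
                f = [f; max(C(:,in), [], 2)];    % max pooling inside the sub-region
            else
                f = [f; zeros(d,1)];             % empty sub-region
            end
        end
    end
end
expectedLen = d*(4^L - 1)/3;
fprintf('Descriptor length %d (expected %d), %.1f%% non-zero\n', ...
        numel(f), expectedLen, 100*nnz(f)/numel(f));
```

A vector built this way would then be passed to the classifier; the SVM stage is omitted here.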
Table 5.1: MSTAR classification results using spatial pyramid matching
SIFT
w/Histograms
SIFT w/SR
FFT
w/Histograms
FFT w/SR
L=1
M = 100
L=5
L = 10
L=1
73.19%
77.73%
78.32%
87.25%
L = 10
L=1
77.14%
77.14%
80.29%
84.91%
88.94%
M = 200
L=5
79.27%
90.26%
86.81%
L = 10
82.78%
78.39%
87.33%
80.88%
M = 400
L=5
81.54%
79.27%
79.27%
87.91%
81.69%
88.57% 88.13%
82.56%
Moving forward, it is expected that the utility of the approach will be exemplified by its ability
to handle target occlusion. The ability to maintain geometric correspondence while segmenting the
target image into parts could prove to be extremely useful when the target is partially hidden, either
intentionally or unintentionally. In addition, if indeed geometric correspondence between target
features is maintained, it is expected that the algorithm will prove to be robust to target translations.
Translation and occlusion independence have proven to be difficult tasks for even the most advanced
SAR detection algorithms throughout the literature. An algorithm that is able to successfully handle
these and other unexpected non-benign operating conditions could very well be the next state-of-the-art SAR automatic target recognition system.
APPENDIX A
Code Sample
A.1 NON-UNIFORM SAMPLING AND SIGNAL RECONSTRUCTION CODE
%nonUniform_Sampling.m - Non-Uniform Sampling
clear; clc; close all
f1 = 1e3; f2 = 2e3; f3 = 4e3;
%Hz
fs = 10e3;
%Hz
T = 1/fs;
%Period in seconds
%Number of samples
N = 1028;
t = [0:N-1]'*T;
%For .1s of data
%Full-length and sampled signals
x = sin(2*pi*f1*t) + sin(2*pi*f2*t) + sin(2*pi*f3*t);
%Generate random samples
M = 151;
k = randperm(N);
m = k(1:M);
b = x(sort(m));
%Approximately 10% of total signal
%Plot time-domain signal
figure;subplot(3,2,1);plot(t,x,'r-',t(m),b,'bo');
xlabel('Time (s)');ylabel('x(t)');title('Sum of sinusoids');
%Plot (sparse) Fourier domain signal
X = fft(x);
subplot(3,2,2);plot(linspace(0,fs,length(X)),abs(X));
xlabel('Frequency (Hz)');ylabel('|X(f)|');
%Generate linear transform matrix
PHI = fft(eye(N));
S = zeros(M,N);
S(sub2ind(size(S),1:M,sort(m))) = 1;
A = S*(1/N)*PHI';
%Naive l2 minimization solution
l2 = pinv(A)*b;
subplot(3,2,4);plot(linspace(0,fs,length(X)),abs(l2));
xlabel('Frequency (Hz)');ylabel('|X_l_2(f)|');
%Reconstruct signal
xl2 = (1/N)*real(PHI'*l2);
subplot(3,2,3);plot(t,xl2);
xlabel('Time (s)');ylabel('x_l_2(t)');
%Approximate l1 solution using OMP
%(OMPnorm is a user-supplied routine, not a MATLAB built-in)
l1 = OMPnorm(A,b,floor(M/4),0);
subplot(3,2,6);plot(linspace(0,fs,length(X)),abs(l1));
xlabel('Frequency (Hz)');ylabel('|X_l_1(f)|');
%Reconstruct signal
xl1 = (1/N)*real(PHI'*l1);
subplot(3,2,5);plot(t,xl1);
xlabel('Time (s)');ylabel('x_l_1(t)');
A1: 1-D Signal Reconstruction Analysis
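The listing above calls OMPnorm, whose source is not reproduced in this appendix. As a rough illustration of what an orthogonal matching pursuit loop does, the following is a minimal sketch under stated assumptions. The variable names (x0, xhat, support, colNorms), the random Gaussian dictionary, and the problem sizes are all hypothetical choices for illustration. It is not the author's OMPnorm and does not model that routine's fourth argument.

```matlab
% Sketch of orthogonal matching pursuit (OMP); a hypothetical stand-in for
% the OMPnorm call above, NOT the author's implementation.
M = 32; N = 64; K = 3;            % measurements, dictionary size, sparsity (illustrative)
A = randn(M, N);                  % assumed dictionary: random Gaussian matrix
x0 = zeros(N, 1);
x0([5 20 41]) = [1; -2; 1.5];     % K-sparse ground truth
b = A * x0;                       % noiseless measurements

colNorms = sqrt(sum(abs(A).^2, 1)).';   % column norms, for a fair atom comparison
r = b;                            % residual starts as the measurement vector
support = [];                     % indices of the selected atoms
for k = 1:K
    c = abs(A' * r) ./ colNorms;  % normalized correlation with the residual
    c(support) = 0;               % never reselect an atom
    [~, idx] = max(c);            % best-matching atom
    support = [support, idx];     %#ok<AGROW>
    coef = A(:, support) \ b;     % least-squares fit on the current support
    r = b - A(:, support) * coef; % update the residual
end
xhat = zeros(N, 1);
xhat(support) = coef;             % embed coefficients in the full-length vector
fprintf('Relative recovery error: %.2e\n', norm(xhat - x0) / norm(x0));
```

The least-squares refit on the whole support at every iteration is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every atom already chosen, so no atom is picked twice. With the A and b of the listing above, the same loop could be run with K = floor(M/4), assuming that is what the third argument of OMPnorm means.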
A.2 LOGAN-SHEPP PHANTOM TEST IMAGE RECONSTRUCTION CODE
%********************************************************************
% DISCLAIMER: This code has been modified from the original
% tveq_phantom.m script supplied with l1-MAGIC package. It has been
% updated to generate a single figure containing the test image,
% Fourier sampling mask, minimum energy solution, and total
% variation solution.
%********************************************************************
clear; clc; close all;
figure;
% Phantom
n = 256;             %Size of image
N = n*n;             %Number of pixels
X = phantom(n);
x = X(:);            %Rasterize image
% Number of radial lines in the Fourier domain
L = 22;
% Fourier samples we are given
[M,Mh,mh,mhi] = LineMask(L,n);
OMEGA = mhi;
A = @(z) A_fhp(z, OMEGA);
At = @(z) At_fhp(z, OMEGA, n);
% measurements
y = A(x);
% min l2 reconstruction (backprojection)
xbp = At(y);
Xbp = reshape(xbp,n,n);
% recovery
tvI = sum(sum(sqrt([diff(X,1,2) zeros(n,1)].^2 + ...
[diff(X,1,1); zeros(1,n)].^2 )));
disp(sprintf('Original TV = %8.3f', tvI));
xp = tveq_logbarrier(xbp, A, At, y, 1e-1, 2, 1e-8, 600);
Xtv = reshape(xp, n, n);
%Plot everything
subplot(2,2,1);imagesc(X);colormap gray;axis image;axis off;
xlabel('(a)');
subplot(2,2,2);imagesc(fftshift(M));colormap gray;axis image;axis off;
xlabel('(b)');
subplot(2,2,3);imagesc(Xbp);colormap gray;axis image;axis off;
xlabel('(c)');
subplot(2,2,4);imagesc(Xtv);colormap gray;axis image;axis off;
xlabel('(d)');
A2: 2-D Phantom Image Reconstruction Using Total Variation
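The tvI line in the listing evaluates the discrete isotropic total variation of the image, with the horizontal and vertical differences zero-padded at the last column and row. Written out, using the assumed index convention that X_{i,j} is the pixel in row i and column j (this notation is introduced here, not taken from the listing), the quantity and the equality-constrained problem that tveq_logbarrier is documented to solve are:

```latex
\mathrm{TV}(X) \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n}
  \sqrt{\,(D_h X)_{i,j}^{2} + (D_v X)_{i,j}^{2}\,},
\qquad
(D_h X)_{i,j} =
  \begin{cases} X_{i,j+1} - X_{i,j}, & j < n \\ 0, & j = n \end{cases}
\qquad
(D_v X)_{i,j} =
  \begin{cases} X_{i+1,j} - X_{i,j}, & i < n \\ 0, & i = n \end{cases}

\hat{x} \;=\; \arg\min_{x} \ \mathrm{TV}(x)
  \quad \text{subject to} \quad A x = y
```

Here D_h corresponds to diff(X,1,2), D_v to diff(X,1,1), A to the partial Fourier operator A_fhp restricted to OMEGA, and y to the measured Fourier samples.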
A.3 SIGNAL BANDWIDTH CODE
%Ex - 4.1 - Pulse radar signal and bandwidth examples
clear; clc; close all
%Operating parameters
n = 1;                 %Length of signal (in seconds)
DC = .05;              %Duty cycle
tau_b = DC*n;          %Pulse width in seconds (5% duty cycle)
f = 500; fs = 5000;    %Carrier and sampling frequencies (Hz)
t = 0:1/fs:n;
% Generate envelope functions
cenv = .5*(1+cos(2*pi*(t-tau_b/2)/tau_b));
cenv(t>tau_b) = 0;
sqp = ones(1,length(cenv));
sqp(t>tau_b) = 0;
%Part (a) - CW Pulse
x_ncenv = cenv.*sin(2*pi*f*t);
x_nsqp = sqp.*sin(2*pi*f*t);
figure;plot(t,x_ncenv);xlim([0 tau_b]);
%Part (b) - Chirp Pulse
x_c = chirp(0:1/fs:tau_b,f-.1*f,tau_b,f+.1*f);
x_c = [x_c zeros(1,length(t)-length(x_c))];
x_ccenv = cenv.*x_c;
x_csqp = sqp.*x_c;
figure;plot(t,x_csqp);xlim([0 tau_b]);
% Part (c) - Frequency Spectrum for Rectangular Pulse
fn = linspace(-fs/2,fs/2,length(t));
figure;
plot(fn,db(abs(fftshift(fft(x_nsqp)))),'-k');hold on;
plot(fn,db(abs(fftshift(fft(x_csqp)))),'-r');
xlim([f-.5*f f+.5*f]);legend('CW Pulse','Chirp');
xlabel('f');ylabel('dB(X_f)');
% Part (d) - Frequency Spectrum for Raised Cosine Pulse
fn = linspace(-fs/2,fs/2,length(t));
figure;
plot(fn,db(abs(fftshift(fft(x_ncenv)))),'-k');hold on;
plot(fn,db(abs(fftshift(fft(x_ccenv)))),'-r');
xlim([f-.5*f f+.5*f]);legend('CW Pulse','Chirp');
xlabel('f');ylabel('dB(X_f)');
A3: Signal bandwidths for CW and compressed waveforms
Bibliography
[1] H. Hertz, Electric Waves. New York: Dover Publications, 1962. (Republication of the work first
published in 1983 by Macmillan and Company.) Cited on page(s) 1
[2] M. A. Richards, Fundamentals of Radar Signal Processing. New York: McGraw-Hill, 2005.
Cited on page(s) 1, 2, 33, 36, 37, 39, 41, 42
[3] M. Elad, Sparse and Redundant Representations. New York: Springer, 2010.
DOI: 10.1007/978-1-4419-7011-4 Cited on page(s) 1, 10, 12
[4] M. Skolnik, Introduction to Radar Systems, 3rd ed. New York: McGraw-Hill, 2007. Cited on
page(s) 1, 2, 3, 33, 35, 36, 45
[5] S. G. Marconi, “Radio Telegraphy,” in Proc. IRE, vol. 10, 1992, p. 237. Cited on page(s) 1
[6] J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, pp. 465–471, 1978.
DOI: 10.1016/0005-1098(78)90005-5 Cited on page(s) 7
[7] J. B. Tenenbaum, V. de Silva and J. C. Langford, “A global geometric framework for nonlinear
dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
DOI: 10.1126/science.290.5500.2319 Cited on page(s) 7, 25, 26, 27
[8] J. Wright et. al., “Robust face recognition via sparse representation,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
DOI: 10.1109/TPAMI.2008.79 Cited on page(s) 7, 8, 51, 52, 53
[9] E. Candès, J. Romberg and T. Tao, “Stable signal recovery from incomplete and inaccurate
measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–
1223, 2005. DOI: 10.1002/cpa.20124 Cited on page(s) 7, 18
[10] D. L. Donoho, “For most large undeterdetermined systems of linear equations the minimal l_1norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics,
vol. 59, no. 6, pp. 797–829, 2006. DOI: 10.1002/cpa.20132 Cited on page(s) 7, 9
[11] E. Candes, J. Romberg and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transaction on Information Theory,
vol. 52, no. 2, pp. 489–509, February 2006. DOI: 10.1109/TIT.2005.862083 Cited on page(s)
8
64
BIBLIOGRAPHY
[12] G. Strang, Introduction to Linear Algebra, 4th ed. Wesseley, MA: Wesseley-Cambridge Press,
2009. Cited on page(s) 8
[13] I. Daubechies, “Time-frequency localization operators: a geometric phase space approach,”
IEEE Transaction on Information Theory, vol. 34, pp. 605–612, 1988. DOI: 10.1109/18.9761
Cited on page(s) 8
[14] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to Theory of NPcompleteness. San Francisco: W. H. Freeman and Company, 1979. Cited on page(s) 9
[15] E. Amaldi and V. Kann, “On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems,” Theoretical Computer Science, vol. 209, pp. 237–260, 1998.
DOI: 10.1016/S0304-3975(97)00115-1 Cited on page(s) 9
[16] S. Chen, D. Donoho, and M. Saunders,“Atomic decomposition by basis pursuit,” SIAM Review,
vol. 43, no. 1, pp. 129–159, 2001. DOI: 10.1137/S003614450037906X Cited on page(s) 9, 12
[17] E. Candes and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than
n,” Annals of Stastistics, vol. 35, pp. 2313–2351, 2007. DOI: 10.1214/009053606000001523
Cited on page(s) 12, 48
[18] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993. DOI: 10.1109/78.258082
Cited on page(s) 12
[19] Y. C. Pati, R. Rezaifar and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in 27th Asilomar Conf. on Signals, Systems and Comput., Nov. 1993. DOI: 10.1109/ACSSC.1993.342465 Cited on page(s)
12
[20] M. E. Davies and T. Blumesath, “Faster and greedier: algorithms for sparse reconstruction of
large datasets,” in Proceedings of ISCCSP 2008, 2008, pp. 774–779.
DOI: 10.1109/ISCCSP.2008.4537327 Cited on page(s) 12, 13, 16
[21] J. H. Friedman and W. Steutzle, “Projection pursuit regression,” American Statistics Association,
vol. 76, pp. 817–823, 1981. DOI: 10.1080/01621459.1981.10477729 Cited on page(s) 13
[22] T. Blumensath and M. E. Davies, “Gradient pursuits,” IEEE Transactions on Signal Processing,
vol. 56, no. 6, pp. 2370–2382, 2008. DOI: 10.1109/TSP.2007.916124 Cited on page(s) 15
[23] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal
matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666,
December 2007. DOI: 10.1109/TIT.2007.909108 Cited on page(s) 15
BIBLIOGRAPHY
65
[24] E. Candès, “Compressive Sampling,” in International Congress of Mathematicians, Madrid,
Spain, 2006, pp. 1433–1452. Cited on page(s) 15
[25] E. Candès. (2005, October) l1 Magic. [Online]. http://www.acm.caltech.edu/l1magic/
Cited on page(s) 17, 19
[26] C. Moler. (2010) ‘Magic’ Reconstruction: Compressed Sensing. [Online]. http://www.
mathworks.com/company/newsletters/articles/clevescorner-compressedsensing.html?issue=nn2010 Cited on page(s) 17
[27] A. H. Delaney and Y. Bresler, “A fast and accurate iterative reconstruction algorithm,” IEEE
Transactions on Image Processing, vol. 5, pp. 740–753, 1996. DOI: 10.1109/83.495957 Cited
on page(s) 17
[28] C. V. Jakowatz, D. E., Eichel, P. H. Wahl, D. C. Ghiglia, and P. A. Thompson, Spotlight-Mode
Synthetic Aperture Radar: A Signal Processing Approach. New York: Springer Science + Business
Media, 1996. DOI: 10.1007/978-1-4613-1333-5 Cited on page(s) 17, 34, 36, 37, 41, 49, 50
[29] E. J. Candès, J. Romberg and T. Tao, “Robust uncertainty principles: exact signal reconstruction
from highly incomplete frequency information,” IEEE Transactions on Information Theory,
vol. 52, pp. 489–509, 2006. DOI: 10.1109/TIT.2005.862083 Cited on page(s) 18
[30] Z. Zhang and H. Zha, Local linear smoothing for nonlinear manifold learning, 2003,Technical
Report, Zhejjang University. Cited on page(s) 21
[31] H. Hotelling, “Analysis of a complex of statistical variables into principal components,” Journal
of Educational Psychology, vol. 24, pp. 417–441, 1933. DOI: 10.1037/h0070888 Cited on page(s)
21
[32] I. T. Jollife, Principal Component Analysis. New York: Springer-Verlag, 2002. Cited on page(s)
21
[33] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics,
vol. 7, pp. 179–188, 1936. DOI: 10.1111/j.1469-1809.1936.tb02137.x Cited on page(s) 21
[34] L. J. P. Van Der Maaten, E. O., Van Den Herik and H. J. Postma, “Dimensionality reduction:
A comparative overview,” Submitted for publication to Elsevier, 2007. Cited on page(s) 21,
22, 23, 25, 28
[35] Y. Bengio and M. Monperrus, “Non-local manifold tangent learning,” Advances in Neural
Information Processing Systems, vol. 17, pp. 129–136, 2005. Cited on page(s) 21
[36] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer Science + Business Media, 2006. Cited on page(s) 22, 23, 25
66
BIBLIOGRAPHY
[37] K. Pearson, “On lines and planes of closest fit to systems of points in space,” The London,
Edinburgh and Dublin Philosophical Magazine and Journal of Science, Sixth Series, vol. 2, pp. 559–
572, 1901. Cited on page(s) 22
[38] M. G. Partridge and R. A. Calvo, “Fast dimensionality reduction and simple PCA,” Intelligent
Data Analysis, vol. 2, no. 1, pp. 203–214, 1998. DOI: 10.1016/S1088-467X(98)00024-9 Cited
on page(s) 23
[39] S. Roweis, “EM Algorithms for PCA and SPCA,” in Advances in Neural Information Processing
Systems, 1998, pp. 626–632. Cited on page(s) 23
[40] J. Yin, D. Hu and Z. Zhou, “Noisy manifold learning using neighborhood smoothing embedding,” Pattern Recognition Letters, vol. 29, no. 11, pp. 1613–1620, 2008.
DOI: 10.1016/j.patrec.2008.04.002 Cited on page(s) 25
[41] K. Q. Weinberger and L. K. Saul, “An introduction to nonlinear dimensionality reduction maximum variance unfolding,” in Proceedings of the 21st National Conference on Artificial Intelligence,
2006. Cited on page(s) 27
[42] S. Lagon and A. B. Lee, “Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization,” IEEE Trans. Pattern
Analysis and Machine Intelligence, vol. 28, pp. 1393–1403, 2006.
DOI: 10.1109/TPAMI.2006.184 Cited on page(s) 27
[43] B Scholkopf, A. J. Smola and K. R. Muller, “Nonlinear componenet analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
DOI: 10.1162/089976698300017467 Cited on page(s) 27
[44] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,”
Science, vol. 290, pp. 2323–2326, 2000. DOI: 10.1126/science.290.5500.2323 Cited on page(s)
27, 28
[45] T. F. Cox and M. A. Cox, Multidimensional Scaling. London: Chapman and Hall, 2001. Cited
on page(s) 27
[46] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and
clustering,” Advances in Neural Information Processing Systems, vol. 14, pp. 585–591, 2002. Cited
on page(s) 28
[47] D. L. Donoho and C. Grimes, “Hessian eigenmaps: New locally linear embedding techniques
for high-dimensional data,” in Proceedings of the National Academy of Sciences, vol. 102, 2005,
pp. 7426–7431. DOI: 10.1073/pnas.1031596100 Cited on page(s) 28
BIBLIOGRAPHY
67
[48] Z. Zhang and H. Zha,“Principal manifolds and nonlinear dimension reduction via local tangent
space alignment,” SIAM Journal of Scientific Computing, vol. 26, no. 1, pp. 313–338, 2004.
DOI: 10.1137/S1064827502419154 Cited on page(s) 30
[49] Y. W. Teh and S. T. Roweis, “Automatic alignment of local representations,” in Advances in
Neural Information Processing Systems, vol. 15, 2003, pp. 841–848. Cited on page(s) 30
[50] M. Brand, “Charting a manifold,” in Advances in Neural Information Processing Systems, vol. 15,
2002, pp. 985–992. Cited on page(s) 30
[51] G. H. Golub and C. F. van Loan, Matrix Computations. Oxford, UK: North Oxford Academic,
1983. Cited on page(s) 30
[52] S. Dasgupta and A. Gupta, “An elementary proof of the Johnson-Lindenstrauss lemma,” U.
C. Berkeley, Technical Report 99–006 Mar. 1999. Cited on page(s) 30
[53] E. Bingham and H. Mannila, “Random projection in dimensionality reduction: applications
to image and text data,” in Knolwedge Discovery and Data Mining, 2001, pp. 245–250.
DOI: 10.1145/502512.502546 Cited on page(s) 31, 32
[54] E. R. Keydel, “MSTAR extended operating conditions,” in Proceedings of SPIE, vol. 2757, 1996,
pp. 228–242. DOI: 10.1117/12.242059 Cited on page(s) 32, 54
[55] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood
Cliffs, NJ: Prentice Hall, 1984. Cited on page(s) 41
[56] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, 2nd ed. Upper Saddle
River, NJ: Prentice Hall, 1993. Cited on page(s) 42
[57] R. Baraniuk and P. Steeghs,“Compressive radar imaging,” in IEEE Radar Conference, Waltham,
MA, Apr. 2007, pp. 128–133. DOI: 10.1109/RADAR.2007.374203 Cited on page(s) 45, 46,
47
[58] M. C. Shastry, R. M. Narayanan and M. Rangaswamy, “Compressive radar imaging using
white stochastic waveforms,” in Proceedings of the 5th IEEE International Waveform Diversity
and Design, Aug. 2010, pp. 90–94. DOI: 10.1109/WDD.2010.5592367 Cited on page(s) 46,
47
[59] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4,
pp. 1289–1306, April 2006. DOI: 10.1109/TIT.2006.871582 Cited on page(s) 46
[60] H. Nyquist, “Certain topics in telegraph transmission theory,” Proceedings of the IEEE, vol. 90,
no. 2, pp. 280–305, Feb. 2002. DOI: 10.1109/5.989875 Cited on page(s) 47
68
BIBLIOGRAPHY
[61] D. Gao, D. Liu, Y. Feng, Q. An and F. Yu, “Radar echo signal detection with sparse representations,” in Proceedings of the 2nd International Conference on Signal Processing Systems (ICSPS),
July 2010, pp. 495–498. DOI: 10.1109/ICSPS.2010.5555846 Cited on page(s) 47
[62] L. Carin, D. Liu and B. Guo, “In situ compressive sensing multi-static scattering: Imaging and
the restricted isometry property,” preprint, 2008. Cited on page(s) 47
[63] Y. Chi, L. Scharf, A. Pezeshki and R. A. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2182–2195, May
2011. DOI: 10.1109/TSP.2011.2112650 Cited on page(s) 47
[64] E. Fishler et al., “MIMO radar: An idea whose time has come,” in Proc. IEEE Radar Conf,
Philadelphia, PA, Apr. 2004, pp. 71–78. DOI: 10.1109/NRC.2004.1316398 Cited on page(s)
47
[65] A. M. Haimovich, R. S. Blum and L. J. Cimini, “MIMO radar with widely separated antennas,”
IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 116–129, 2008.
DOI: 10.1109/MSP.2008.4408448 Cited on page(s) 47
[66] P. Stoica and J. Li, “MIMO radar with colocated antennas,” IEEE Signal Processing Magazine,
vol. 24, no. 5, pp. 106–114, 2007. DOI: 10.1109/MSP.2007.904812 Cited on page(s) 47
[67] A. P. Petropulu, Y Yu and H. V. Poor, “Distributed MIMO radar using compressive sampling,”
in Proc. 42nd Asilmoar Conf. Signals, Syst. Comput., Pacific Grove, CA, Nov. 2008, pp. 203–207.
Cited on page(s) 47
[68] S. Shah, Y. Yu and A. P. Petropulu, “Step-frequency radar with compressive sampling (SFRCS),” in Proc. ICASSP 2010, Dallas, TX, Mar. 2010, pp. 1686–1689.
DOI: 10.1109/ICASSP.2010.5495497 Cited on page(s) 48, 49
[69] Y. Yu, A. P. Petropulu and H. V. Poor, “Reduced complexity angle-Doppler-range estimation
for MIMO radar that employs compressive sensing,” in Proceedings of the Forty-Third Asilomar
Conference on Signals, Systems and Computers, Nov. 2009, pp. 1196–1200.
DOI: 10.1109/ACSSC.2009.5469995 Cited on page(s) 48, 49
[70] P. A. Rosen, “Synthetic aperture radar interferometry,” Proceedings of the IEEE, vol. 88, no. 3,
pp. 333–382, March 2000. DOI: 10.1109/5.838084 Cited on page(s) 49
[71] D. A. Yocky and C. V. Jakowatz,“Shift-scale complex correlation for wide-angle coherent crosstrack SAR stereo processing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no.
3, pp. 576–583, March 2007. DOI: 10.1109/TGRS.2006.886193 Cited on page(s) 49, 50, 52
[72] L. H. Nguyen and T. D. Tran, “A sparsity-driven joint image registration and change detection
technique for SAR imagery,” in IEEE International Conference on Acoustics, Speech and Signal
BIBLIOGRAPHY
69
Processing, Mar. 2010, pp. 2798–2801. DOI: 10.1109/ICASSP.2010.5496197 Cited on page(s)
51
[73] C.T. Wang et al.,“Disaster monitoring and environmental alert in Taiwan by repeat-pass spaceborne SAR,” in International Geoscience and Remote Sensing Symposium, Jul. 2007, pp. 2628–
2631. DOI: 10.1109/IGARSS.2007.4423384 Cited on page(s) 52
[74] X. Wang, Y. Liu and Y. Huang, “The application of image registration based on genetic
algorithm with real data,” in 2nd Asian-Pacific Conference on Synthetic Aperture Radar, Oct. 2009,
pp. 844–847. DOI: 10.1109/APSAR.2009.5374187 Cited on page(s) 52
[75] K. Huang and S. Aviyente, “Sparse representation for signal classification,” in Advances in
Neural Information Processing Systems, 2006, pp. 609–617. Cited on page(s) 52
[76] J. Thiagarajan, K. Ramamurthy, P. Knee and A. Spanias, “Sparse representations for automatic
target classification in SAR images,” in 4th International Symposium on Communications, Control
and Signal Processing (ISCCSP), Mar. 2010, pp. 1–4. DOI: 10.1109/ISCCSP.2010.5463416
Cited on page(s) 52, 53, 54
[77] D. E. Kreithen, S. D. Halversen and G. J. Owirka,“Discriminating targets from clutter,” Lincoln
Laboratory Journal, vol. 6, no. 1, pp. 25–52, 1993. Cited on page(s) 52
[78] G. J. Owirka, S. M. Verbout and L. M. Novak, “Template-based SAR ATR performance using
different image enhancement techniques,” in Proceedings of SPIE, vol. 3721, 1999, pp. 302–319.
DOI: 10.1117/12.357648 Cited on page(s) 52
[79] Q. Zhao et al., “Support vector machines for SAR automatic target recognition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 2, pp. 643–653, 2001.
DOI: 10.1109/7.937475 Cited on page(s) 52
[80] P. Knee, J. Thiagarajan, K. Ramamurthy and A. Spanias, “SAR target classification using sparse
representations and spatial pyramids,” in IEEE Internation Radar Conference, Kansas City, MO,
2011. DOI: 10.1109/RADAR.2011.5960546 Cited on page(s) 54
[81] S. Lazebnik, C. Schmid and J. Ponce, “Beyond bags of features: Spatial pyramid matching for
recognizing natural scene categories,” in IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR), vol. 2, 2006, pp. 2169–2178.
DOI: 10.1109/CVPR.2006.68 Cited on page(s) 54
Author’s Biography
PETER A. KNEE
Peter A. Knee received a B.S. (with honors) in electrical engineering from the University of New
Mexico, Albuquerque, New Mexico, in 2006, and an M.S. degree in electrical engineering from
Arizona State University in 2010. While at Arizona State University, his research included the
analysis of high-dimensional Synthetic Aperture Radar (SAR) imagery for use with Automatic
Target Recognition (ATR) systems as well as dictionary learning and data classification using sparse
representations. He is currently an employee at Sandia National Laboratories in Albuquerque, New
Mexico, focusing on SAR image analysis and software-defined radios.