Mathematics of Signal Processing:
A First Course
Charles L. Byrne
Department of Mathematical Sciences
University of Massachusetts Lowell
Lowell, MA 01854
March 31, 2013
(Text for 92.548 Mathematics of Signal Processing)
(The most recent version is available as a pdf file at
http://faculty.uml.edu/cbyrne/cbyrne.html)
Contents

Part I: Introduction

1 Preface
   1.1 Chapter Summary
   1.2 Course Aims and Topics
       1.2.1 Some Examples of Remote Sensing
       1.2.2 A Role for Mathematics
       1.2.3 Limited Data
       1.2.4 Course Emphasis
       1.2.5 Course Topics
   1.3 Applications of Interest
   1.4 Sensing Modalities
       1.4.1 Active and Passive Sensing
       1.4.2 A Variety of Modalities
   1.5 Inverse Problems
   1.6 Using Prior Knowledge

2 Urn Models in Remote Sensing
   2.1 Chapter Summary
   2.2 The Urn Model
   2.3 Some Mathematical Notation
   2.4 An Application to SPECT Imaging
   2.5 Hidden Markov Models

Part II: Fundamental Examples

3 Transmission and Remote Sensing- I
   3.1 Chapter Summary
   3.2 Fourier Series and Fourier Coefficients
   3.3 The Unknown Strength Problem
       3.3.1 Measurement in the Far-Field
       3.3.2 Limited Data
       3.3.3 Can We Get More Data?
       3.3.4 The Fourier Cosine and Sine Transforms
       3.3.5 Over-Sampling
       3.3.6 Other Forms of Prior Knowledge
   3.4 Estimating the Size of Distant Objects
   3.5 The Transmission Problem
       3.5.1 Directionality
       3.5.2 The Case of Uniform Strength
   3.6 Remote Sensing
   3.7 One-Dimensional Arrays
       3.7.1 Measuring Fourier Coefficients
       3.7.2 Over-sampling
       3.7.3 Under-sampling

Part III: Signal Models

4 Undetermined-Parameter Models
   4.1 Chapter Summary
   4.2 Fundamental Calculations
       4.2.1 Evaluating a Trigonometric Polynomial
       4.2.2 Determining the Coefficients
   4.3 Two Examples
       4.3.1 The Unknown Strength Problem
       4.3.2 Sampling in Time
       4.3.3 The Issue of Units
   4.4 Estimation and Models
   4.5 A Polynomial Model
   4.6 Linear Trigonometric Models
       4.6.1 Equi-Spaced Frequencies
       4.6.2 Equi-Spaced Sampling
   4.7 Recalling Fourier Series
       4.7.1 Fourier Coefficients
       4.7.2 Riemann Sums
   4.8 Simplifying the Calculations
       4.8.1 The Main Theorem
       4.8.2 The Proofs as Exercises
       4.8.3 More Computational Issues
   4.9 Approximation, Models, or Truth?
       4.9.1 Approximating the Truth
       4.9.2 Modeling the Data
   4.10 From Real to Complex

5 Complex Numbers
   5.1 Chapter Summary
   5.2 Definition and Basics
   5.3 Complex Numbers as Matrices

6 Complex Exponential Functions
   6.1 Chapter Summary
   6.2 The Complex Exponential Function
       6.2.1 Real Exponential Functions
       6.2.2 Why is h(x) an Exponential Function?
       6.2.3 What is e^z, for z complex?
   6.3 Complex Exponential Signal Models
   6.4 Coherent and Incoherent Summation
   6.5 Uses in Quantum Electrodynamics
   6.6 Using Coherence and Incoherence
       6.6.1 The Discrete Fourier Transform
   6.7 Some Exercises on Coherent Summation
   6.8 Complications
       6.8.1 Multiple Signal Components
       6.8.2 Resolution
       6.8.3 Unequal Amplitudes and Complex Amplitudes
       6.8.4 Phase Errors
   6.9 Undetermined Exponential Models
       6.9.1 Prony’s Problem
       6.9.2 Prony’s Method

7 Transmission and Remote Sensing- II
   7.1 Chapter Summary
   7.2 Directional Transmission
   7.3 Multiple-Antenna Arrays
       7.3.1 The Array of Equi-Spaced Antennas
       7.3.2 The Far-Field Strength Pattern
       7.3.3 Can the Strength be Zero?
       7.3.4 Diffraction Gratings
   7.4 Phase and Amplitude Modulation
   7.5 Steering the Array
   7.6 Maximal Concentration in a Sector
   7.7 Higher Dimensional Arrays
       7.7.1 The Wave Equation
       7.7.2 Planewave Solutions
       7.7.3 Superposition and the Fourier Transform
       7.7.4 The Spherical Model
       7.7.5 The Two-Dimensional Array
       7.7.6 The One-Dimensional Array
       7.7.7 Limited Aperture
       7.7.8 Other Limitations on Resolution
   7.8 An Example: The Solar-Emission Problem
   7.9 Another Example: Scattering in Crystallography

Part IV: Fourier Methods

8 Fourier Analysis
   8.1 Chapter Summary
   8.2 The Fourier Transform
   8.3 The Unknown Strength Problem Again
   8.4 Two-Dimensional Fourier Transforms
       8.4.1 Two-Dimensional Fourier Inversion
   8.5 Fourier Series and Fourier Transforms
       8.5.1 Support-Limited F(ω)
       8.5.2 Shannon’s Sampling Theorem
       8.5.3 Sampling Terminology
       8.5.4 What Shannon Does Not Say
       8.5.5 Sampling from a Limited Interval
   8.6 The Problem of Finite Data
   8.7 Best Approximation
       8.7.1 The Orthogonality Principle
       8.7.2 An Example
       8.7.3 The DFT as Best Approximation
       8.7.4 The Modified DFT (MDFT)
       8.7.5 The PDFT
   8.8 The Vector DFT
   8.9 Using the Vector DFT
   8.10 A Special Case of the Vector DFT
   8.11 Plotting the DFT
   8.12 The Vector DFT in Two Dimensions

9 Properties of the Fourier Transform
   9.1 Chapter Summary
   9.2 Fourier-Transform Pairs
       9.2.1 Decomposing f(x)
   9.3 Basic Properties of the Fourier Transform
   9.4 Some Fourier-Transform Pairs
   9.5 Dirac Deltas
   9.6 More Properties of the Fourier Transform
   9.7 Convolution Filters
       9.7.1 Blurring and Convolution Filtering
       9.7.2 Low-Pass Filtering
   9.8 Functions in the Schwartz Class
       9.8.1 The Schwartz Class
       9.8.2 A Discontinuous Function

10 The Fourier Transform and Convolution Filtering
   10.1 Chapter Summary
   10.2 Linear Filters
   10.3 Shift-Invariant Filters
   10.4 Some Properties of a SILO
   10.5 The Dirac Delta
   10.6 The Impulse Response Function
   10.7 Using the Impulse-Response Function
   10.8 The Filter Transfer Function
   10.9 The Multiplication Theorem for Convolution
   10.10 Summing Up
   10.11 A Project
   10.12 Band-Limiting

11 Infinite Sequences and Discrete Filters
   11.1 Chapter Summary
   11.2 Shifting
   11.3 Shift-Invariant Discrete Linear Systems
   11.4 The Delta Sequence
   11.5 The Discrete Impulse Response
   11.6 The Discrete Transfer Function
   11.7 Using Fourier Series
   11.8 The Multiplication Theorem for Convolution
   11.9 The Three-Point Moving Average
   11.10 Autocorrelation
   11.11 Stable Systems
   11.12 Causal Filters

12 Convolution and the Vector DFT
   12.1 Chapter Summary
   12.2 Non-periodic Convolution
   12.3 The DFT as a Polynomial
   12.4 The Vector DFT and Periodic Convolution
       12.4.1 The Vector DFT
       12.4.2 Periodic Convolution
   12.5 The vDFT of Sampled Data
       12.5.1 Superposition of Sinusoids
       12.5.2 Rescaling
       12.5.3 The Aliasing Problem
       12.5.4 The Discrete Fourier Transform
       12.5.5 Calculating Values of the DFT
       12.5.6 Zero-Padding
       12.5.7 What the vDFT Achieves
       12.5.8 Terminology
   12.6 Understanding the Vector DFT

13 The Fast Fourier Transform (FFT)
   13.1 Chapter Summary
   13.2 Evaluating a Polynomial
   13.3 The DFT and Vector DFT
   13.4 Exploiting Redundancy
   13.5 The Two-Dimensional Case

14 Plane-wave Propagation
   14.1 Chapter Summary
   14.2 The Bobbing Boats
   14.3 Transmission and Remote-Sensing
   14.4 The Transmission Problem
   14.5 Reciprocity
   14.6 Remote Sensing
   14.7 The Wave Equation
   14.8 Planewave Solutions
   14.9 Superposition and the Fourier Transform
       14.9.1 The Spherical Model
   14.10 Sensor Arrays
       14.10.1 The Two-Dimensional Array
       14.10.2 The One-Dimensional Array
       14.10.3 Limited Aperture
   14.11 The Remote-Sensing Problem
       14.11.1 The Solar-Emission Problem
   14.12 Sampling
   14.13 The Limited-Aperture Problem
   14.14 Resolution
       14.14.1 The Solar-Emission Problem Revisited
   14.15 Discrete Data
       14.15.1 Reconstruction from Samples
   14.16 The Finite-Data Problem
   14.17 Functions of Several Variables
       14.17.1 Two-Dimensional Farfield Object
       14.17.2 Limited Apertures in Two Dimensions
   14.18 Broadband Signals

Part V: Nonlinear Models

15 Random Sequences
   15.1 Chapter Summary
   15.2 What is a Random Variable?
   15.3 The Coin-Flip Random Sequence
   15.4 Correlation
   15.5 Filtering Random Sequences
   15.6 An Example
   15.7 Correlation Functions and Power Spectra
   15.8 The Dirac Delta in Frequency Space
   15.9 Random Sinusoidal Sequences
   15.10 Random Noise Sequences
   15.11 Increasing the SNR
   15.12 Colored Noise
   15.13 Spread-Spectrum Communication
   15.14 Stochastic Difference Equations
   15.15 Random Vectors and Correlation Matrices

16 Classical and Modern Methods
   16.1 Chapter Summary
   16.2 The Classical Methods
   16.3 Modern Signal Processing and Entropy
   16.4 Related Methods

17 Entropy Maximization
   17.1 Chapter Summary
   17.2 Estimating Non-Negative Functions
   17.3 Philosophical Issues
   17.4 The Autocorrelation Sequence {r(n)}
   17.5 Minimum-Phase Vectors
   17.6 Burg’s MEM
       17.6.1 The Minimum-Phase Property
       17.6.2 Solving Ra = δ Using Levinson’s Algorithm
   17.7 A Sufficient Condition for Positive-definiteness

18 Eigenvector Methods in Estimation
   18.1 Chapter Summary
   18.2 Some Eigenvector Methods
   18.3 The Sinusoids-in-Noise Model
   18.4 Autocorrelation
   18.5 Determining the Frequencies
   18.6 The Case of Non-White Noise
   18.7 Sensitivity

19 The IPDFT
   19.1 Chapter Summary
   19.2 The Need for Prior Information in Non-Linear Estimation
   19.3 What Wiener Filtering Suggests
   19.4 Using a Prior Estimate
   19.5 Properties of the IPDFT
   19.6 Illustrations

Part VI: Wavelets

20 Analysis and Synthesis
   20.1 Chapter Summary
   20.2 The Basic Idea
   20.3 Polynomial Approximation
   20.4 Signal Analysis
   20.5 Practical Considerations in Signal Analysis
       20.5.1 The Finite Data Problem
   20.6 Frames
   20.7 Bases, Riesz Bases and Orthonormal Bases

21 Ambiguity Functions
   21.1 Chapter Summary
   21.2 Radar Problems
   21.3 The Wideband Cross-Ambiguity Function
   21.4 The Narrowband Cross-Ambiguity Function
   21.5 Range Estimation

22 Time-Frequency Analysis
   22.1 Chapter Summary
   22.2 Non-stationary Signals
   22.3 The Short-Time Fourier Transform
   22.4 The Wigner-Ville Distribution

23 Wavelets
   23.1 Chapter Summary
   23.2 Background
   23.3 A Simple Example
   23.4 The Integral Wavelet Transform
   23.5 Wavelet Series Expansions
   23.6 Multiresolution Analysis
       23.6.1 The Shannon Multiresolution Analysis
       23.6.2 The Haar Multiresolution Analysis
       23.6.3 Wavelets and Multiresolution Analysis
   23.7 Signal Processing Using Wavelets
       23.7.1 Decomposition and Reconstruction
   23.8 Generating the Scaling Function
   23.9 Generating the Two-scale Sequence
   23.10 Wavelets and Filter Banks
   23.11 Using Wavelets

Part VII: Estimation and Detection

24 The BLUE and The Kalman Filter
   24.1 Chapter Summary
   24.2 The Simplest Case
   24.3 A More General Case
   24.4 Some Useful Matrix Identities
   24.5 The BLUE with a Prior Estimate
   24.6 Adaptive BLUE
   24.7 The Kalman Filter
   24.8 Kalman Filtering and the BLUE
   24.9 Adaptive Kalman Filtering

25 Signal Detection and Estimation
   25.1 Chapter Summary
   25.2 The Model of Signal in Additive Noise
   25.3 Optimal Linear Filtering for Detection
   25.4 The Case of White Noise
       25.4.1 Constant Signal
       25.4.2 Sinusoidal Signal, Frequency Known
       25.4.3 Sinusoidal Signal, Frequency Unknown
   25.5 The Case of Correlated Noise
       25.5.1 Constant Signal with Unequal-Variance Uncorrelated Noise
       25.5.2 Sinusoidal Signal, Frequency Known, in Correlated Noise
       25.5.3 Sinusoidal Signal, Frequency Unknown, in Correlated Noise
   25.6 Capon’s Data-Adaptive Method

Part VIII: Appendices

26 Appendix: Inner Products
   26.1 Chapter Summary
   26.2 Cauchy’s Inequality
   26.3 The Complex Vector Dot Product
   26.4 Orthogonality
   26.5 Generalizing the Dot Product: Inner Products
   26.6 The Orthogonality Principle

27 Appendix: Reverberation and Echo Cancellation
   27.1 Chapter Summary
   27.2 The Echo Model
   27.3 Finding the Inverse Filter
   27.4 Using the Fourier Transform
   27.5 The Teleconferencing Problem

28 Appendix: Using Prior Knowledge to Estimate the Fourier Transform
   28.1 Chapter Summary
   28.2 Over-sampling
   28.3 Using Other Prior Information
   28.4 Analysis of the MDFT
       28.4.1 Eigenvector Analysis of the MDFT
       28.4.2 The Eigenfunctions of SΩ

29 Appendix: The Vector Wiener Filter
   29.1 Chapter Summary
   29.2 The Vector Wiener Filter in Estimation
   29.3 The Simplest Case
   29.4 A More General Case
   29.5 The Stochastic Case
   29.6 The VWF and the BLUE
   29.7 Wiener Filtering of Functions

30 Appendix: Wiener Filter Approximation
   30.1 Chapter Summary
   30.2 The Discrete Stationary Case
   30.3 Approximating the Wiener Filter
   30.4 Adaptive Wiener Filters
       30.4.1 An Adaptive Least-Mean-Square Approach
       30.4.2 Adaptive Interference Cancellation (AIC)
       30.4.3 Recursive Least Squares (RLS)

31 Appendix: Fourier Series and Analytic Functions
   31.1 Chapter Summary
   31.2 Laurent Series
   31.3 An Example
   31.4 Fejér-Riesz Factorization
   31.5 Burg Entropy

32 Appendix: Inverse Problems and the Laplace Transform
   32.1 Chapter Summary
   32.2 The Laplace Transform and the Ozone Layer
       32.2.1 The Laplace Transform
       32.2.2 Scattering of Ultraviolet Radiation
       32.2.3 Measuring the Scattered Intensity
       32.2.4 The Laplace Transform Data
   32.3 The Laplace Transform and Energy Spectral Estimation
       32.3.1 The Attenuation Coefficient Function
       32.3.2 The Absorption Function as a Laplace Transform

33 Appendix: Matrix Theory
   33.1 Chapter Summary
   33.2 Matrix Inverses
   33.3 Basic Linear Algebra
       33.3.1 Bases and Dimension
       33.3.2 Systems of Linear Equations
       33.3.3 Real and Complex Systems of Linear Equations
   33.4 Solutions of Under-determined Systems of Linear Equations
   33.5 Eigenvalues and Eigenvectors
   33.6 Vectorization of a Matrix
   33.7 The Singular Value Decomposition (SVD)
       33.7.1 The SVD
       33.7.2 Using the SVD in Image Compression
       33.7.3 An Application in Space Exploration
       33.7.4 Pseudo-Inversion
   33.8 Singular Values of Sparse Matrices

34 Appendix: Matrix and Vector Differentiation
   34.1 Chapter Summary
   34.2 Functions of Vectors and Matrices
   34.3 Differentiation with Respect to a Vector
   34.4 Differentiation with Respect to a Matrix
   34.5 Eigenvectors and Optimization

35 Appendix: Compressed Sensing
   35.1 Chapter Summary
   35.2 Compressed Sensing
   35.3 Sparse Solutions
       35.3.1 Maximally Sparse Solutions
       35.3.2 Minimum One-Norm Solutions
       35.3.3 Minimum One-Norm as an LP Problem
       35.3.4 Why the One-Norm?
       35.3.5 Comparison with the PDFT
       35.3.6 Iterative Reweighting
   35.4 Why Sparseness?
       35.4.1 Signal Analysis
       35.4.2 Locally Constant Signals
       35.4.3 Tomographic Imaging
   35.5 Compressed Sampling

36 Appendix: Transmission Tomography I
   36.1 Chapter Summary
   36.2 X-ray Transmission Tomography
   36.3 The Exponential-Decay Model
   36.4 Difficulties to be Overcome
   36.5 Reconstruction from Line Integrals
       36.5.1 The Radon Transform
       36.5.2 The Central Slice Theorem

37 Appendix: Transmission Tomography II
   37.1 Chapter Summary
   37.2 Inverting the Fourier Transform
       37.2.1 Back-Projection
       37.2.2 Ramp Filter, then Back-project
       37.2.3 Back-project, then Ramp Filter
       37.2.4 Radon’s Inversion Formula
   37.3 From Theory to Practice
       37.3.1 The Practical Problems
       37.3.2 A Practical Solution: Filtered Back-Projection
   37.4 Some Practical Concerns
   37.5 Summary

Bibliography

Index
Part I
Introduction
Chapter 1
Preface
1.1 Chapter Summary
In a course in signal processing it is easy to get lost in the details and lose
sight of the big picture. The main goals of this first course are to present
the most important ideas, techniques and methods, to describe how they
relate to one another, and to illustrate their uses in several applications.
For signal processing, the most important mathematical tools are Fourier
series and related notions, matrices, and probability and statistics. Most
students with a solid mathematical background have probably encountered
each of these topics in previous courses, and therefore already know some
signal processing, without realizing it.
Our discussion here will involve primarily functions of a single real variable, although most of the concepts will have multi-dimensional versions.
It is not our objective to treat each topic with the utmost mathematical
rigor, and we shall seek to avoid issues that are primarily of mathematical
concern.
1.2 Course Aims and Topics
The term signal processing has broad meaning and covers a wide variety of
applications. In this course we focus on those applications of signal processing that can loosely be called remote sensing, although the mathematics
we shall study is fundamental to all areas of signal processing.
There are a wide variety of problems in which what we want to know
about is not directly available to us and we need to obtain information by
more indirect methods.
1.2.1 Some Examples of Remote Sensing
Here are several examples of remote sensing.
Full-Body Scanners
Recently there has been much discussion about the use of full-body scanners
in airports. What we really want to know about the passenger can only be determined with certainty by methods that are completely impractical,
particularly if we want to discover explosive material that may be carried
within the body. Instead, we use these low-energy back-scatter scanners
that penetrate only clothing.
CAT Scans and MRI
Someone who has been hit in the head may have a concussion or a fractured
skull. To know with perfect confidence is impossible. Instead, we perform
an x-ray CAT scan or take a magnetic-resonance image (MRI).
Cosmic Ray Tomography
Because of their ability to penetrate granite, cosmic rays are being used to
obtain transmission-tomographic three-dimensional images of the interiors
of active volcanos. Where magma has replaced granite there is less attenuation of the rays, so the image can reveal the size and shape of the magma
column. It is hoped that this will help to predict the size and occurrence
of eruptions.
Spectral Analysis
Scientists want to know what elements are in the outer layers of the sun
and other stars. We cannot travel there to find out, but we can perform
spectral analysis on the electro-magnetic radiation coming from the sun and
look for spectral lines that indicate the presence of particular elements.
Seismic Exploration
Oil companies want to know if it is worth their while drilling in a particular
place. If they go ahead and drill, they will find out, but they would like to
know what is the chance of finding oil without actually drilling. Instead,
they set off explosions and analyze the signals produced by the seismic
waves, which will tell them something about the materials the waves encountered.
Astronomy
Astronomers know that there are radio waves, visible-light waves, and other
forms of electro-magnetic radiation coming from distant regions of space,
and they would like to know precisely what is coming from which regions.
They cannot go there to find out, so they set up large telescopes and
antenna arrays and process the signals that they are able to measure.
Radar
Those who predict the weather use radar to help them see what is going on
in the atmosphere. Radio waves are sent out and the returns are analyzed
and turned into images. The location of airplanes is also determined by
radar. The radar returns from different materials are different from one
another and can be analyzed to determine what materials are present.
Synthetic-aperture radar is used to obtain high-resolution images of regions
of the earth’s surface. The radar returns from different geometric shapes
also differ in strength; by avoiding right angles in airplane design, stealth technology attempts to make the plane invisible to radar.
Sonar
Features on the bottom of the ocean are imaged with sonar, in which
sound waves are sent down to the bottom and the returning waves are
analyzed. Sometimes near or distant objects of interest in the ocean emit
their own sound, which is measured by sensors. The signals received by the
sensors are processed to determine the nature and location of the objects.
Even changes in the temperature at different places in the ocean can be
determined by sending sound waves through the region of interest and
measuring the travel times.
Gravity Maps
The pull of gravity varies with the density of the material. Features on the
surface of the earth, such as craters from ancient asteroid impacts, can be
imaged by mapping the variations in the pull of gravity, as measured by
satellites.
Echo Cancellation
In a conference call between locations A and B, what is transmitted from A
to B can get picked up by microphones in B, transmitted back to speakers
in A and then retransmitted to B, producing an echo of the original transmission. Signal processing performed at the transmitter in A can reduce
the strength of the second version of the transmission and decrease the
echo effect.
Hearing Aids
Makers of digital hearing aids include signal processing to enhance the
quality of the received sounds, as well as to improve localization, that is,
the ability of the hearer to tell where the sound is coming from. When a
hearing aid is used, sounds reach the ear in two ways: first, the usual route
directly into the ear, and second, through the hearing aid. Because that
part that passes through the hearing aid is processed, there is a slight delay.
In order for the delay to go unnoticed, the processing must be very fast.
When hearing aids are used in both ears, more sophisticated processing
can be used.
1.2.2 A Role for Mathematics
The examples just presented look quite different from one another, but the
differences are often more superficial than real. As we begin to use mathematics to model these various situations we often discover a common core
of mathematical tools and ideas at the heart of each of these applications.
1.2.3 Limited Data
As we shall see, it is often the case that the data we measure is not sufficient
to provide a single unique answer to our problem. There may be many,
often quite different, answers that are consistent with what we have measured. In the absence of prior information about what the answer should
look like, we do not know how to select one solution from the many possibilities. For that reason, I believe that to get information out we must put
information in. How to do this is one of the main topics of the course. The
example at the end of this chapter will illustrate this point.
1.2.4 Course Emphasis
This text is designed to provide the necessary mathematical background
to understand and employ signal processing techniques in an applied environment. The emphasis is on a small number of fundamental problems
and essential tools, as well as on applications. Certain topics that are commonly included in textbooks are touched on only briefly or in exercises or
not mentioned at all. Other topics not usually considered to be part of
signal processing, but which are becoming increasingly important, such as
matrix theory and linear algebra, are included.
The term signal is not meant to imply a specific context or a restriction
to functions of time, or even to functions of a single variable; indeed, most
of what we discuss in this text applies equally to functions of one and
several variables and therefore to image processing. However, there are
special problems that arise in image processing, such as edge detection,
and special techniques to deal with such problems; we shall not consider
such techniques in this text.
1.2.5 Course Topics
Topics discussed include the following: Fourier series and transforms in
one and several variables; applications to acoustic and EM propagation
models, transmission and emission tomography, and image reconstruction;
sampling and the limited data problem; matrix methods, singular value decomposition, and data compression; optimization techniques in signal and
image reconstruction from projections; autocorrelations and power spectra;
high-resolution methods; detection and optimal filtering; eigenvector-based
methods for array processing and statistical filtering.
1.3 Applications of Interest
The applications of interest to us here can be summarized as follows: the
data has been obtained through some form of sensing; physical models,
often simplified, describe how the data we have obtained relates to the
information we seek; there usually isn’t enough data and what we have
is corrupted by noise and other distortions. Although applications differ
from one another in their details they often make use of a common core
of mathematical ideas; for example, the Fourier transform and its variants
play an important role in many areas of signal and image processing, as
do the language and theory of matrix analysis, iterative optimization and
approximation techniques, and the basics of probability and statistics. This
common core provides the subject matter for this course. Applications of
the core material to tomographic medical imaging, optical imaging, and
acoustic signal processing are included.
1.4 Sensing Modalities

1.4.1 Active and Passive Sensing
In some signal and image processing applications the sensing is active,
meaning that we have initiated the process, by, say, sending an x-ray
through the body of a patient, injecting a patient with a radionuclide,
transmitting an acoustic signal through the ocean, as in sonar, or transmitting a radio wave, as in radar. In such cases, we are interested in
measuring how the system, the patient, the quiet submarine, the ocean
floor, the rain cloud, will respond to our probing. In many other applications, the sensing is passive, which means that the object of interest to us
provides its own signal of some sort, which we then detect, analyze, image,
or process in some way. Certain sonar systems operate passively, listening
for sounds made by the object of interest. Optical and radio telescopes
are passive, relying on the object of interest to emit or reflect light, or
other electromagnetic radiation. Night-vision instruments are sensitive to
lower-frequency, infrared radiation.
From Aristotle and Euclid until the middle ages there was an ongoing
debate concerning the active or passive nature of human sight [162]. Those,
like Euclid, whose interests were largely mathematical, believed that the
eye emitted rays, the extramission theory. Aristotle and others, more interested in the physiology and anatomy of the eye than in mathematics,
believed that the eye received rays from observed objects outside the body,
the intromission theory. Finally, around 1000 AD, the Arabic mathematician and natural philosopher Alhazen demolished the extramission theory
by noting the potential for bright light to hurt the eye, and combined the
mathematics of the extramission theorists with a refined theory of intromission.
1.4.2 A Variety of Modalities
Although acoustic and electromagnetic sensing are the most commonly
used methods, there are other modalities employed in remote sensing.
Radiation
In transmission tomography x-rays are transmitted along line segments
through the object and the drop in intensity along each line is recorded.
In emission tomography radioactive material is injected into the body of
the living subject and the photons resulting from the radioactive decay are
detected and recorded outside the body.
Cosmic-Ray Scattering
In addition to mapping the interior of volcanos, cosmic rays can also be
used to detect the presence of shielding around nuclear material in a cargo
container. The shielding can be sensed by the characteristic scattering by
it of muons from cosmic rays; here neither we nor the objects of interest
are the sources of the probing. This is about as “remote” as sensing can
be.
Variations in Gravity
Gravity, or better, changes in the pull of gravity from one location to
another, was used in the discovery of the crater left behind by the asteroid
strike in the Yucatan that led to the extinction of the dinosaurs. The rocks
and other debris that eventually filled the crater differ in density from the
surrounding material, thereby exerting a slightly different gravitational pull
on other masses. This slight change in pull can be detected by sensitive
instruments placed in satellites in earth orbit. When the intensity of the
pull, as a function of position on the earth’s surface, is displayed as a
two-dimensional image, the presence of the crater is evident.
Seismic Exploration
In seismic oil exploration, explosive charges create waves that travel through
the ground and are picked up by sensors. The waves travel at different
speeds through different materials. Information about the location of different materials in the ground is then extracted from the received signals.
Spectral Analysis
In our detailed discussion of transmission and remote sensing we shall, for
simplicity, concentrate on signals consisting of a single frequency. Nevertheless, there are many important applications of signal processing in which
the signal being studied has a broad spectrum, indicative of the presence
of many different frequencies. The purpose of the processing is often to
determine which frequencies are present, or not present, and to determine
their relative strengths. The hotter inner body of the sun emits radiation
consisting of a continuum of frequencies. The cooler outer layer absorbs
the radiation whose frequencies correspond to the elements present in that
outer layer. Processing these signals reveals a spectrum with a number
of missing frequencies, the so-called Fraunhofer lines, and provides information about the makeup of the sun’s outer layers. This sort of spectral
analysis can be used to identify the components of different materials, making it an important tool in many applications, from astronomy to forensics.
Back-Scatter Detectors
There is considerable debate at the moment about the use of so-called
full-body scanners at airports. These are not scanners in the sense of a
CAT-scan; indeed, if the images were skeletons there would probably be
less controversy. These are images created by the returns, or backscatter, of
millimeter-wavelength (MMW) radio-frequency waves, or sometimes low-energy x-rays, that penetrate only the clothing and then reflect back to the
machine. The controversies are not really about safety to the passenger
being imaged. The MMW imaging devices use about 10,000 times less
energy than a cell phone, and the x-ray exposure is equivalent to two minutes of flying in an airplane. At present, the images are fuzzy and faces
are intentionally blurred, but there is some concern that the images will
get sharper, will be permanently stored, and eventually end up on the net.
Given what is already available on the net, the market for these images
will almost certainly be non-existent.
Near-Earth Asteroids
An area of growing importance is the search for potentially damaging near-earth asteroids. These objects are initially detected by passive optical
observation, as small dots of reflected sunlight; once detected, they are
then imaged by active radar to determine their size, shape, rotation, path,
and other important parameters.
1.5 Inverse Problems
Many of the problems we study in applied mathematics are direct problems.
For example, we imagine a ball dropped from a building of known height h
and we calculate the time it takes for it to hit the ground and the impact
velocity. Once we make certain simplifying assumptions about gravity and
air resistance, we are able to solve this problem easily. Using his inverse-square law of universal gravitation, Newton was able to show that planets
move in ellipses, with the sun at one focal point. Generally, direct problems
conform to the usual flow of time and seek the effects due to known causes.
Problems we call inverse problems go the other way, seeking the causes of
observed effects; we measure the impact velocity to determine the height h
of the building. Newton solved an inverse problem when he determined that
Kepler’s empirical laws of planetary motion follow from an inverse-square
law of universal gravitation.
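To make the contrast concrete, the falling-ball problem can be written out explicitly. Under the usual simplifying assumptions (constant gravitational acceleration g, no air resistance), these are the standard kinematic formulas, stated here only as an illustration:

$$t = \sqrt{\frac{2h}{g}}, \qquad v = gt = \sqrt{2gh} \qquad \text{(direct problem: from } h \text{ to } t \text{ and } v),$$

$$h = \frac{v^2}{2g} \qquad \text{(inverse problem: from the measured } v \text{ back to } h).$$

Here both directions happen to be well behaved: v depends continuously on h and vice versa, and each determines the other uniquely. The inverse problems of interest in this course are usually not so forgiving.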
In each of the examples of remote sensing just presented, we have measured some of the effects and want to know the causes. In x-ray tomography,
for example, we observe that the x-rays that passed through the body of
the patient come out weaker than when they went in. We know that they
were weakened, or attenuated, because they were partially absorbed by the
material they had to pass through; we want to know precisely where the
attenuation took place. This is an inverse problem; we are trying to go
back in time, to uncover the causes of the observed effects.
Direct problems have been studied for a long time, while the theory
of inverse problems is still being developed. Generally speaking, direct
problems are easier than inverse problems. Direct problems, at least those
corresponding to actual physical situations, tend to be well-posed in the
sense of Hadamard, while inverse problems are often ill-posed. A problem is said to be well-posed if there is a unique solution for each input to
the problem and the solution varies continuously with the input; roughly
speaking, small changes in the input lead to small changes in the solution.
If we vary the height of the building slightly, the time until the ball hits the
ground and its impact velocity will change only slightly. For inverse problems, there may be many solutions, or none, and slight changes in the data
can cause the solutions to differ greatly. In [14] Bertero and Boccacci give
a nice illustration of the difference between direct and inverse problems,
using the heat equation.
Suppose that u(x, t) is the temperature distribution for x in the interval
[0, a] and t ≥ 0. The function u(x, t) satisfies the heat equation
$$\frac{\partial^2 u}{\partial x^2} = \frac{1}{D}\,\frac{\partial u}{\partial t},$$

where D > 0 is the thermal conductivity. In addition, we adopt the boundary conditions u(x, 0) = f(x), and u(0, t) = u(a, t) = 0, for all t. By separating the variables, and using Fourier series, we find that, if

$$f(x) = \sum_{n=1}^{\infty} f_n \sin\Big(\frac{n\pi x}{a}\Big),$$

where

$$f_n = \frac{2}{a}\int_0^a f(x)\sin\Big(\frac{n\pi x}{a}\Big)\,dx,$$

then

$$u(x,t) = \sum_{n=1}^{\infty} f_n\, e^{-D\left(\frac{\pi n}{a}\right)^2 t}\,\sin\Big(\frac{n\pi x}{a}\Big).$$
The direct problem is to find u(x, t), given f (x). Suppose that we know
f (x) with some finite precision, that is, we know those Fourier coefficients
fn for which |fn| ≥ ε > 0. Because of the decaying exponential factor,
fewer Fourier coefficients in the expansion of u(x, t) will be above this
threshold, and we can determine u(x, t) with the same precision or better.
The solution to the heat equation tends to be smoother than the input
distribution.
The inverse problem is to determine the initial distribution f (x) from
knowledge of u(x, t) at one or more times t > 0. As we just saw, for any
fixed time t > 0, the Fourier coefficients of u(x, t) will die off faster than
the fn do, leaving fewer coefficients above the threshold of ε. This means
we can determine fewer and fewer of the fn as t grows larger. For t beyond
some point, it will be nearly impossible to say anything about f (x).
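A quick numerical sketch makes the point. The values of D, a, and t below are arbitrary illustrative choices, not taken from the text; the script simply evaluates the exponential factor that multiplies each Fourier coefficient fn when passing from f(x) = u(x, 0) to u(x, t):

import math

# Factor multiplying the nth Fourier coefficient f_n at time t:
# exp(-D * (pi * n / a)**2 * t).
D, a, t = 1.0, 1.0, 0.1   # arbitrary illustrative values

for n in range(1, 6):
    factor = math.exp(-D * (math.pi * n / a) ** 2 * t)
    print(f"n = {n}: decay factor = {factor:.3e}")

Even for this modest value of t, the factor falls by an order of magnitude or more with each step in n, so the high-order coefficients of u(x, t) drop below any fixed threshold ε almost immediately and the corresponding fn cannot be recovered from the measured data.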
1.6 Using Prior Knowledge
An important point to keep in mind when doing signal processing is that,
while the data is usually limited, the information we seek may not be lost.
Although processing the data in a reasonable way may suggest otherwise,
other processing methods may reveal that the desired information is still
available in the data. Figure 1.1 illustrates this point.
The original image on the upper right of Figure 1.1 is a discrete rectangular array of intensity values simulating a slice of a head. The data
was obtained by taking the two-dimensional discrete Fourier transform of
the original image, and then discarding, that is, setting to zero, all these
spatial frequency values, except for those in a smaller rectangular region
around the origin. The problem then is under-determined. A minimum-norm solution would seem to be a reasonable reconstruction method.
The minimum-norm solution is shown on the lower right. It is calculated simply by performing an inverse discrete Fourier transform on the
array of modified discrete Fourier transform values. The original image
has relatively large values where the skull is located, but the minimum-norm reconstruction does not want such high values; the norm involves the
sum of squares of intensities, and high values contribute disproportionately
to the norm. Consequently, the minimum-norm reconstruction chooses instead to conform to the measured data by spreading what should be the
skull intensities throughout the interior of the skull. The minimum-norm
reconstruction does tell us something about the original; it tells us about
the existence of the skull itself, which, of course, is indeed a prominent
feature of the original. However, in all likelihood, we would already know
about the skull; it would be the interior that we want to know about.
Using our knowledge of the presence of a skull, which we might have obtained from the minimum-norm reconstruction itself, we construct the prior
estimate shown in the upper left. Now we use the same data as before, and
calculate a minimum-weighted-norm reconstruction, using as the weight
vector the reciprocals of the values of the prior image. This minimum-weighted-norm reconstruction is shown on the lower left; it is clearly almost the same as the original image. The calculation of the minimum-weighted-norm solution can be done iteratively using the ART algorithm [204].
When we weight the skull area with the inverse of the prior image,
we allow the reconstruction to place higher values there without having
much of an effect on the overall weighted norm. In addition, the reciprocal
weighting in the interior makes spreading intensity into that region costly,
so the interior remains relatively clear, allowing us to see what is really
present there.
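The following one-dimensional sketch imitates the experiment of Figure 1.1 on a toy scale. The phantom, the set of retained frequencies, and the prior are invented for illustration, and the weighted solution is computed in closed form rather than iteratively with ART; it is meant only to show the mechanism, not to reproduce the figure.

import numpy as np

# Toy 1-D analogue of Figure 1.1: reconstruct a "skull-like" signal from a
# few low-frequency DFT values, with and without a weighted norm.
N = 64
F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary DFT matrix
keep = list(range(9)) + list(range(N - 8, N))   # retained (low) frequencies
A = F[keep, :]                                   # measurement operator

x_true = np.zeros(N)
x_true[8:12] = x_true[52:56] = 10.0              # bright "skull"
x_true[28:36] = 1.0                              # faint interior feature
b = A @ x_true                                   # the limited data

# Minimum-norm solution: x = A^H (A A^H)^{-1} b.
x_mn = A.conj().T @ np.linalg.solve(A @ A.conj().T, b)

# Minimum-weighted-norm solution: minimize sum |x_j|^2 / p_j subject to Ax = b,
# where the prior p allows large values near the skull.
p = np.full(N, 0.5)
p[6:14] = p[50:58] = 10.0
PAH = p[:, None] * A.conj().T                    # diag(p) A^H
x_wmn = PAH @ np.linalg.solve(A @ PAH, b)

interior = slice(16, 48)
print("interior error, minimum-norm:         ",
      np.linalg.norm(x_mn[interior] - x_true[interior]))
print("interior error, weighted minimum-norm:",
      np.linalg.norm(x_wmn[interior] - x_true[interior]))

The weighted norm makes large values cheap where the prior expects the skull, so less of the skull energy needs to be spread into the interior; this mirrors the behavior described above for the two-dimensional reconstruction.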
When we try to reconstruct an image from limited data, it is easy to
assume that the information we seek has been lost, particularly when a
reasonable reconstruction method fails to reveal what we want to know.
As this example, and many others, show, the information we seek is often
still in the data, but needs to be brought out in a more subtle way.
Figure 1.1: Extracting information in image reconstruction.
Chapter 2
Urn Models in Remote Sensing
2.1 Chapter Summary
Most of the signal processing that we shall discuss in this book is related
to the problem of remote sensing, which we might also call indirect measurement. In such problems we do not have direct access to what we are
really interested in, and must be content to measure something else that is
related to, but not the same as, what interests us. For example, we want
to know what is in the suitcases of airline passengers, but, for practical
reasons, we cannot open every suitcase. Instead, we x-ray the suitcases. A
recent paper [197] describes progress in detecting nuclear material in cargo
containers by measuring the scattering, by the shielding, of cosmic rays;
you can’t get much more remote than that. Before we get into the mathematics of signal processing, it is probably a good idea to consider a model
that, although quite simple, manages to capture many of the important
features of remote sensing applications. To convince the reader that this is
indeed a useful model, we relate it to the problem of image reconstruction
in single-photon computed emission tomography (SPECT).
2.2 The Urn Model
There seems to be a tradition in physics of using simple models or examples
involving urns and marbles to illustrate important principles. In keeping
with that tradition, we have here two examples, to illustrate various aspects
of remote sensing.
Suppose that we have J urns numbered j = 1, ..., J, each containing
marbles of various colors. Suppose that there are I colors, numbered i =
1, ..., I. Suppose also that there is a box containing a large number of small
pieces of paper, and on each piece is written the number of one of the J
urns. Assume that I know the precise contents of each urn. My objective is
to determine the precise contents of the box, that is, to estimate, for each
j = 1, ..., J, the probability of selecting the jth urn, which is the relative
number of pieces of paper containing the number j.
Out of my view, my assistant removes one piece of paper from the box,
takes one marble from the indicated urn, announces to me the color of the
marble, and then replaces both the piece of paper and the marble. This
action is repeated N times, at the end of which I have a long list of colors,
i = {i1 , i2 , ..., iN }, where in denotes the color of the nth marble drawn.
This list i is my data, from which I must determine the contents of the
box.
This is a form of remote sensing; what we have access to is related to,
but not equal to, what we are interested in. What I wish I had is the list of
urns used, j = {j1 , j2 , ..., jN }; instead I have i, the list of colors. Sometimes
data such as the list of colors is called “incomplete data”, in contrast to the “complete data”, which would be the list j of the actual urn numbers
drawn from the box.
Using our urn model, we can begin to get a feel for the resolution
problem. If all the marbles of one color are in a single urn, the problem is
trivial; when I hear a color, I know immediately which urn contained that
marble. My list of colors is then a list of urn numbers; I have the complete
data now. My estimate of the number of pieces of paper containing the
urn number j is then simply the proportion of draws that resulted in urn
j being selected.
At the other extreme, suppose two urns have identical contents. Then I
cannot distinguish one urn from the other and I am unable to estimate more
than the total number of pieces of paper containing either of the two urn
numbers. If the two urns have nearly the same contents, we can distinguish
them only by using a very large N . This is the resolution problem.
Generally, the more the contents of the urns differ, the easier the task
of estimating the contents of the box. In remote sensing applications, these
issues affect our ability to resolve individual components contributing to
the data.
2.3 Some Mathematical Notation
To introduce some mathematical notation, let us denote by $x_j$ the proportion of the pieces of paper that have the number $j$ written on them. Let $P_{ij}$ be the proportion of the marbles in urn $j$ that have the color $i$. Let $y_i$ be the proportion of times the color $i$ occurs in the list of colors. The expected proportion of times $i$ occurs in the list is
$$E(y_i) = \sum_{j=1}^{J} P_{ij}x_j = (Px)_i,$$
where $P$ is the $I$ by $J$ matrix with entries $P_{ij}$ and $x$ is the $J$ by $1$ column vector with entries $x_j$. A reasonable way to estimate $x$ is to replace $E(y_i)$ with the actual $y_i$ and solve the system of linear equations
$$y_i = \sum_{j=1}^{J} P_{ij}x_j, \quad i = 1, ..., I.$$
Of course, we require that the $x_j$ be nonnegative and sum to one, so special algorithms may be needed to find such solutions. In a number of applications that fit this model, such as medical tomography, the values $x_j$ are taken to be parameters, the data $y_i$ are statistics, and the $x_j$ are estimated by adopting a probabilistic model and maximizing the likelihood function. Iterative algorithms, such as the expectation maximization (EMML) algorithm, are often used for such problems.
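To make the estimation step concrete, here is a minimal Python sketch of one iterative scheme of the kind just mentioned: a multiplicative EM-type update for the mixing proportions, applied to simulated urn data. The particular urn contents, the number of draws, and the use of NumPy are assumptions made for the illustration, not values taken from the text.

```python
import numpy as np

# Simulated urn model: P[i, j] = proportion of color i in urn j (columns sum to 1).
P = np.array([[0.8, 0.1],
              [0.2, 0.9]])
x_true = np.array([0.3, 0.7])          # true mixing proportions (contents of the box)

rng = np.random.default_rng(0)
N = 10000
urns = rng.choice(2, size=N, p=x_true)                 # the hidden "complete data"
colors = np.array([rng.choice(2, p=P[:, j]) for j in urns])
y = np.bincount(colors, minlength=2) / N               # observed color frequencies

# EM-type multiplicative update: x_j <- x_j * sum_i P_ij * y_i / (Px)_i.
x = np.full(2, 0.5)
for _ in range(200):
    x = x * (P.T @ (y / (P @ x)))

print("estimate:", x, "truth:", x_true)
```

Because the columns of P sum to one, each update keeps the estimate nonnegative and summing to one, which is exactly the constraint mentioned above.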
2.4 An Application to SPECT Imaging
In single-photon computed emission tomography (SPECT) the patient is
injected with a chemical to which a radioactive tracer has been attached.
Once the chemical reaches its destination within the body the photons
emitted by the radioactive tracer are detected by gamma cameras outside
the body. The objective is to use the information from the detected photons
to infer the relative concentrations of the radioactivity within the patient.
We discretize the problem and assume that the body of the patient
consists of J small volume elements, called voxels, analogous to pixels in
digitized images. We let xj ≥ 0 be the unknown amount of the radioactivity that is present in the jth voxel, for j = 1, ..., J. There are I detectors,
denoted {i = 1, 2, ..., I}. For each i and j we let Pij be the known probability that a photon that is emitted from voxel j is detected at detector i.
We denote by in the detector at which the nth emitted photon is detected.
This photon was emitted at some voxel, denoted jn ; we wish that we had
some way of learning what each jn is, but we must be content with knowing
only the in . After N photons have been emitted, we have as our data the
list i = {i1 , i2 , ..., iN }; this is our incomplete data. We wish we had the
complete data, that is, the list j = {j1 , j2 , ..., jN }, but we do not. Our goal
is to estimate the frequency with which each voxel emitted a photon, which
we assume, reasonably, to be proportional to the unknown amounts xj , for
j = 1, ..., J.
This problem is completely analogous to the urn problem previously
discussed. Any mathematical method that solves one of these problems
will solve the other one. In the urn problem, the colors were announced;
here the detector numbers are announced. There, I wanted to know the
urn numbers; here I want to know the voxel numbers. There, I wanted to
estimate the frequency with which the jth urn was used; here, I want to
estimate the frequency with which the jth voxel is the site of an emission.
In the urn model, two urns with nearly the same contents are hard to
distinguish unless N is very large; here, two neighboring voxels will be
very hard to distinguish (i.e., to resolve) unless N is very large. But in the
SPECT case, a large N means a high dosage, which will be prohibited by
safety considerations. Therefore, we have a built-in resolution problem in
the SPECT case.
Both problems are examples of probabilistic mixtures, in which the
mixing probabilities are the xj that we seek. The maximum likelihood
(ML) method of statistical parameter estimation can be used to solve such
problems. The interested reader should consult the text [48].
2.5 Hidden Markov Models
In the urn model we just discussed, the order of the colors in the list is
unimportant; we could randomly rearrange the colors on the list without
affecting the nature of the problem. The probability that a green marble
will be chosen next is the same, whether a blue or a red marble was just
chosen the last time. This independence from one selection to another is
fine for modeling certain physical situations, such as emission tomography.
However, there are other situations in which this independence does not
conform to reality.
In written English, for example, knowing the current letter helps us,
sometimes more, sometimes less, to predict what the next letter will be.
We know that if the current letter is a “q”, then there is a high probability
that the next one will be a “u” . So what the current letter is affects the
probabilities associated with the selection of the next one.
Spoken English is even tougher. There are many examples in which
the pronunciation of a certain sound is affected, not only by the sound or
sounds that preceded it, but by the sound or sounds that will follow. For
example, the sound of the “e” in the word “bellow” is different from the
sound of the “e” in the word “below” ; the sound changes, depending on
whether there is a double “l” or a single “l” following the “e” . Here the
entire context of the letter affects its sound.
Hidden Markov models (HMM) are increasingly important in speech
processing, optical character recognition and DNA sequence analysis. They
allow us to incorporate dependence on the past into our model. In this
section we illustrate HMM using a modification of the urn model.
Suppose, once again, that we have J urns, indexed by j = 1, ..., J and
I colors of marbles, indexed by i = 1, ..., I. Associated with each of the
J urns is a box, containing a large number of pieces of paper, with the
number of one urn written on each piece. My assistant selects one box,
say the j0 th box, to start the experiment. He draws a piece of paper from
that box, reads the number written on it, call it j1 , goes to the urn with
the number j1 and draws out a marble. He then announces the color. He
then draws a piece of paper from box number j1 , reads the next number,
say j2 , proceeds to urn number j2 , etc. After N marbles have been drawn,
the only data I have is a list of colors, i = {i1 , i2 , ..., iN }.
The transition probability that my assistant will proceed from the urn numbered $k$ to the urn numbered $j$ is $b_{jk}$, with $\sum_{j=1}^{J} b_{jk} = 1$. The number of the current urn is the current state. In an ordinary Markov chain model, we observe directly a sequence of states governed by the transition probabilities. The Markov chain model provides a simple formalism for describing a system that moves from one state into another, as time goes on. In the hidden Markov model we are not able to observe the states directly; they are hidden from us. Instead, we have indirect observations, the colors of the marbles in our urn example.

The probability that the color numbered $i$ will be drawn from the urn numbered $j$ is $a_{ij}$, with $\sum_{i=1}^{I} a_{ij} = 1$, for all $j$. The colors announced are the visible states, while the unannounced urn numbers are the hidden states.
There are several distinct objectives one can have, when using HMM.
We assume that the data is the list of colors, i.
• Evaluation: For given probabilities aij and bjk , what is the probability that the list i was generated according to the HMM? Here, the
objective is to see if the model is a good description of the data.
• Decoding: Given the model, the probabilities and the list i, what
list j = {j1 , j2 , ..., jN } of urns is most likely to be the list of urns
actually visited? Now, we want to infer the hidden states from the
visible ones.
• Learning: We are told that there are J urns and I colors, but are not
told the probabilities aij and bjk . We are given several data vectors
i generated by the HMM; these are the training sets. The objective
is to learn the probabilities.
Once again, the ML approach can play a role in solving these problems
[102]. The Viterbi algorithm is an important tool used for the decoding
phase (see [209]).
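As an illustration of the evaluation objective, here is a small Python sketch of the standard forward recursion for computing the probability of an observed color list under given $a_{ij}$, $b_{jk}$, and an initial state distribution. The particular matrices and the initial distribution are invented for the example and are not taken from the text.

```python
import numpy as np

def forward_probability(colors, A, B, pi0):
    """P(color list | model) via the forward recursion.

    A[i, j] = probability of color i from urn j.
    B[j, k] = probability of moving from urn k to urn j.
    pi0[j]  = probability that the first urn used is j.
    """
    alpha = pi0 * A[colors[0], :]          # joint prob. of the first color and first urn
    for c in colors[1:]:
        alpha = A[c, :] * (B @ alpha)      # propagate one step, then weight by emission
    return alpha.sum()

# Toy model: 2 urns, 2 colors (all numbers are illustrative).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
B = np.array([[0.7, 0.4],
              [0.3, 0.6]])
pi0 = np.array([0.5, 0.5])
print(forward_probability([0, 1, 1, 0], A, B, pi0))
```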
Part II
Fundamental Examples
Chapter 3
Transmission and Remote Sensing - I
3.1 Chapter Summary
In this chapter we illustrate the roles played by Fourier series and Fourier
coefficients in the analysis of signal transmission and remote sensing, and
use these examples to motivate several of the problems we shall consider
in detail later in the text.
3.2 Fourier Series and Fourier Coefficients
We suppose that f(x) is defined for −L ≤ x ≤ L, with Fourier series representation
$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\Big[a_n\cos\Big(\frac{n\pi}{L}x\Big) + b_n\sin\Big(\frac{n\pi}{L}x\Big)\Big]. \qquad (3.1)$$
To find the Fourier coefficients $a_n$ and $b_n$ we make use of orthogonality. For any m and n we have
$$\int_{-L}^{L}\cos\Big(\frac{m\pi}{L}x\Big)\sin\Big(\frac{n\pi}{L}x\Big)dx = 0,$$
and for m ≠ n we have
$$\int_{-L}^{L}\cos\Big(\frac{m\pi}{L}x\Big)\cos\Big(\frac{n\pi}{L}x\Big)dx = 0,$$
and
$$\int_{-L}^{L}\sin\Big(\frac{m\pi}{L}x\Big)\sin\Big(\frac{n\pi}{L}x\Big)dx = 0.$$
Therefore, to find the $a_n$ and $b_n$ we multiply both sides of Equation (3.1) by $\cos(\frac{m\pi}{L}x)$, or $\sin(\frac{m\pi}{L}x)$, and integrate. We find that the Fourier coefficients are
$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\Big(\frac{n\pi}{L}x\Big)dx, \qquad (3.2)$$
and
$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\Big(\frac{n\pi}{L}x\Big)dx. \qquad (3.3)$$
In the examples in this chapter, we shall see how Fourier coefficients
can arise as data obtained through measurements. However, we shall be
able to measure only a finite number of the Fourier coefficients. One issue
that will concern us is the effect on the representation of f (x) if we use
some, but not all, of its Fourier coefficients.
Suppose that we have $a_n$ and $b_n$ for n = 0, 1, 2, ..., N. It is not unreasonable to try to estimate the function f(x) using the discrete Fourier transform (DFT) estimate, which is
$$f_{DFT}(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N}\Big[a_n\cos\Big(\frac{n\pi}{L}x\Big) + b_n\sin\Big(\frac{n\pi}{L}x\Big)\Big]. \qquad (3.4)$$
In Figure 3.1 below, the function f (x) is the solid-line figure in both graphs.
In the bottom graph, we see the true f (x) and a DFT estimate. The top
graph is the result of band-limited extrapolation, a technique for predicting
missing Fourier coefficients that we shall discuss later.
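The following Python sketch shows the kind of calculation Equation (3.4) calls for: it computes a handful of Fourier coefficients of a chosen test function by numerical quadrature and then evaluates the DFT estimate on a grid. The test function, the value of N, and the use of NumPy are assumptions made for the illustration.

```python
import numpy as np

L, N = 1.0, 8
f = lambda x: np.where(np.abs(x) < 0.3, 1.0, 0.0)   # assumed test strength function

# Fourier coefficients a_n, b_n on [-L, L], computed by a simple quadrature.
xs = np.linspace(-L, L, 4001)
a = [np.trapz(f(xs) * np.cos(n * np.pi * xs / L), xs) / L for n in range(N + 1)]
b = [np.trapz(f(xs) * np.sin(n * np.pi * xs / L), xs) / L for n in range(N + 1)]

def f_dft(x):
    """DFT estimate built from the finitely many measured coefficients."""
    total = a[0] / 2
    for n in range(1, N + 1):
        total += a[n] * np.cos(n * np.pi * x / L) + b[n] * np.sin(n * np.pi * x / L)
    return total

grid = np.linspace(-L, L, 9)
print(np.round(f_dft(grid), 3))
```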
3.3 The Unknown Strength Problem
In this example, we imagine that each point x in the interval [−L, L] is
sending a sine function signal at the frequency ω, each with its own strength
f (x); that is, the signal sent by the point x is
$$f(x)\sin(\omega t). \qquad (3.5)$$
In our first example, we imagine that the strength function f (x) is unknown
and we want to determine it. It could be the case that the signals originate
at the points x, as with light or radio waves from the sun, or are simply
reflected from the points x, as is sunlight from the moon or radio waves
in radar. Later in this chapter, we shall investigate a related example, in
which the points x transmit known signals and we want to determine what
is received elsewhere.
3.3.1 Measurement in the Far-Field
Now let us consider what is received by a point P on the circumference
of a circle centered at the origin and having large radius D. The point P
corresponds to the angle θ as shown in Figure 3.2; we use θ in the interval
[0, π]. It takes a finite time for the signal sent from x at time t to reach P ,
so there is a delay.
We assume that c is the speed at which the signal propagates. Because
D is large relative to L, we make the far-field assumption, which allows us
to approximate the distance from x to P by D − x cos(θ). Therefore, what
P receives at time t from x is approximately what was sent from x at time
t − 1c (D − x cos(θ)).
Exercise 3.1 Show that, for any point P on the circle of radius D and any x ≠ 0, the distance from x to P is always greater than or equal to the far-field approximation D − x cos(θ), with equality if and only if θ = 0 or θ = π.
At time t, the point P receives from x the signal
$$f(x)\sin\Big(\omega\Big(t - \frac{D}{c}\Big) + \frac{\omega\cos\theta}{c}x\Big) =$$
$$f(x)\Big[\sin\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big) + \cos\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)\Big], \qquad (3.6)$$
and the point Q corresponding to the angle θ + π receives
$$f(x)\Big[\sin\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big) - \cos\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)\Big]. \qquad (3.7)$$
Because P and Q receive signals from all the x, not just from one x, what P and Q receive at time t involves integrating over all x. Therefore, from our measurements at P and Q, we obtain the quantities
$$\int_{-L}^{L} f(x)\Big[\sin\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big) + \cos\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)\Big]dx, \qquad (3.8)$$
and
$$\int_{-L}^{L} f(x)\Big[\sin\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big) - \cos\Big(\omega\Big(t - \frac{D}{c}\Big)\Big)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)\Big]dx. \qquad (3.9)$$
Adding the quantities in (3.8) and (3.9), we obtain
$$2\Big(\int_{-L}^{L} f(x)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx\Big)\sin\Big(\omega\Big(t - \frac{D}{c}\Big)\Big), \qquad (3.10)$$
while subtracting the latter from the former, we get
$$2\Big(\int_{-L}^{L} f(x)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx\Big)\cos\Big(\omega\Big(t - \frac{D}{c}\Big)\Big). \qquad (3.11)$$
Evaluating the signal in Equation (3.10) at the time when
$$\omega\Big(t - \frac{D}{c}\Big) = \frac{\pi}{2},$$
and dividing by 2, we get
$$\int_{-L}^{L} f(x)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx,$$
while evaluating the signal in Equation (3.11) at the time when
$$\omega\Big(t - \frac{D}{c}\Big) = 2\pi$$
and dividing by 2 gives us
$$\int_{-L}^{L} f(x)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx.$$
If we can select an angle θ for which
$$\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{L}, \qquad (3.12)$$
then we have $a_n$ and $b_n$.
3.3.2 Limited Data
Note that we will be able to solve Equation (3.12) for θ only if we have
$$n \le \frac{L\omega}{\pi c}. \qquad (3.13)$$
This tells us that we can measure only finitely many of the Fourier coefficients of f(x). It is common in signal processing to speak of the wavelength of a sinusoidal signal; the wavelength associated with a given ω and c is
$$\lambda = \frac{2\pi c}{\omega}. \qquad (3.14)$$
Therefore the number N of Fourier coefficients we can measure is the largest integer not greater than 2L/λ, which is the length of the interval [−L, L], measured in units of wavelength λ. We get more Fourier coefficients when the product Lω is larger; this means that when L is small, we want ω to be large, so that λ is small and N is large. As we saw previously, using these finitely many Fourier coefficients to calculate the DFT reconstruction of f(x) can lead to a poor estimate of f(x), particularly when N is small.
3.3.3 Can We Get More Data?
As we just saw, we can make measurements at any points P and Q in the
far-field; perhaps we do not need to limit ourselves to just those angles that
lead to the an and bn . It may come as somewhat of a surprise, but from
the theory of complex analytic functions we can prove that there is enough
data available to us here to reconstruct f (x) perfectly, at least in principle.
The drawback, in practice, is that the measurements would have to be free
of noise and impossibly accurate. All is not lost, however.
3.3.4 The Fourier Cosine and Sine Transforms
As we just saw, if θ is chosen so that
$$\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{L}, \qquad (3.15)$$
then our measurements give us the Fourier coefficients $a_n$ and $b_n$. But we can select any angle θ and use any P and Q we want. In other words, we can obtain the values
$$\int_{-L}^{L} f(x)\cos\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx, \qquad (3.16)$$
and
$$\int_{-L}^{L} f(x)\sin\Big(\frac{\omega\cos(\theta)}{c}x\Big)dx \qquad (3.17)$$
for any angle θ. With the change of variable
$$\gamma = \frac{\omega\cos(\theta)}{c},$$
we can obtain the values of the functions
$$F_c(\gamma) = \int_{-L}^{L} f(x)\cos(\gamma x)dx \qquad (3.18)$$
and
$$F_s(\gamma) = \int_{-L}^{L} f(x)\sin(\gamma x)dx, \qquad (3.19)$$
for any γ in the interval [−ω/c, ω/c]. The functions $F_c(\gamma)$ and $F_s(\gamma)$ are the Fourier cosine transform and Fourier sine transform of f(x), respectively.

We are free to measure at any P and Q and therefore to obtain values of $F_c(\gamma)$ and $F_s(\gamma)$ for any value of γ in the interval [−ω/c, ω/c]. We need to be careful how we process the resulting data, however.
3.3.5 Over-Sampling
Suppose, for the sake of illustration, that we measure the far-field signals at points P and Q corresponding to angles θ that satisfy
$$\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{2L}, \qquad (3.20)$$
instead of
$$\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{L}.$$
Now we have twice as many data points and from our new measurements we can obtain
$$c_n = \int_{-L}^{L} f(x)\cos\Big(\frac{n\pi}{2L}x\Big)dx,$$
and
$$d_n = \int_{-L}^{L} f(x)\sin\Big(\frac{n\pi}{2L}x\Big)dx,$$
for n = 0, 1, ..., 2N. We say now that our data is twice over-sampled. Note that we call it over-sampled because the rate at which we are sampling is higher, even though the distance between samples is lower.
Since f(x) = 0 for L < |x| ≤ 2L, we can say that we have
$$A_n = \frac{1}{2L}c_n = \frac{1}{4L}\int_{-2L}^{2L} g(x)\cos\Big(\frac{n\pi}{2L}x\Big)dx, \qquad (3.21)$$
and
$$B_n = \frac{1}{2L}d_n = \frac{1}{4L}\int_{-2L}^{2L} g(x)\sin\Big(\frac{n\pi}{2L}x\Big)dx, \qquad (3.22)$$
for n = 0, 1, ..., 2N , which are Fourier coefficients for the function g(x) that
equals f (x) for |x| ≤ L, and equals zero for L < |x| ≤ 2L.
We have twice the number of Fourier coefficients that we had previously,
but for the function g(x). A DFT reconstruction using this larger set of
Fourier coefficients will reconstruct g(x) on the interval [−2L, 2L]. This
will give us a reconstruction of f (x) itself over the interval [−L, L], but will
also give us a reconstruction of the rest of g(x), which we already know
to be zero. So we are wasting the additional data by reconstructing g(x)
instead of f (x). We need to use our prior knowledge that g(x) = 0 for
L < |x| ≤ 2L.
Later, we shall describe in detail the use of prior knowledge about f (x)
to obtain reconstructions that are better than the DFT. In the example
we are now considering, we have prior knowledge that f (x) = 0 for L <
|x| ≤ 2L. We can use this prior knowledge to improve our reconstruction.
Suppose that we take as our reconstruction the modified DFT (MDFT), which is a function defined only for |x| ≤ L and having the form
$$f_{MDFT}(x) = \frac{1}{2}u_0 + \sum_{n=1}^{2N}\Big[u_n\cos\Big(\frac{n\pi}{2L}x\Big) + v_n\sin\Big(\frac{n\pi}{2L}x\Big)\Big], \qquad (3.23)$$
where the $u_n$ and $v_n$ are unknowns to be determined. Then we calculate the $u_n$ and $v_n$ by requiring that it be possible for the function $f_{MDFT}(x)$ to be the correct answer; that is, we require that $f_{MDFT}(x)$ be consistent with the measured data. Therefore, we must have
$$\int_{-L}^{L} f_{MDFT}(x)\cos\Big(\frac{n\pi}{2L}x\Big)dx = c_n, \qquad (3.24)$$
and
$$\int_{-L}^{L} f_{MDFT}(x)\sin\Big(\frac{n\pi}{2L}x\Big)dx = d_n, \qquad (3.25)$$
for n = 0, 1, ..., 2N. It is important to note now that the $u_n$ and $v_n$ are not the $A_n$ and $B_n$; this is because we no longer have orthogonality. For example, when we calculate the integrals
$$\int_{-L}^{L}\cos\Big(\frac{m\pi}{2L}x\Big)\cos\Big(\frac{n\pi}{2L}x\Big)dx, \qquad (3.26)$$
for m ≠ n, we do not get zero. To find the $u_n$ and $v_n$ we need to solve a system of linear equations in these unknowns.
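A minimal Python sketch of that linear system is given below: it assembles the (no longer diagonal) matrix of inner products of the half-spaced cosines and sines restricted to [−L, L], forms the over-sampled data $c_n$, $d_n$ for a chosen test function, and solves for the $u_n$ and $v_n$. The test function and the problem sizes are assumptions made for the illustration, not values taken from the text.

```python
import numpy as np

L, N = 1.0, 4
f = lambda x: np.exp(-20 * x**2)        # assumed smooth test strength, near zero at |x| = L
xs = np.linspace(-L, L, 4001)

def quad(g):
    return np.trapz(g, xs)

# Basis for the MDFT on [-L, L]: 1/2, cos(n pi x / 2L), sin(n pi x / 2L), n = 1..2N.
basis = [0.5 * np.ones_like(xs)]
basis += [np.cos(n * np.pi * xs / (2 * L)) for n in range(1, 2 * N + 1)]
basis += [np.sin(n * np.pi * xs / (2 * L)) for n in range(1, 2 * N + 1)]

# Test functions appearing in the data-consistency constraints (3.24)-(3.25).
tests = [np.cos(n * np.pi * xs / (2 * L)) for n in range(0, 2 * N + 1)]
tests += [np.sin(n * np.pi * xs / (2 * L)) for n in range(1, 2 * N + 1)]

data = np.array([quad(f(xs) * t) for t in tests])            # the c_n and d_n
M = np.array([[quad(t * b) for b in basis] for t in tests])  # not diagonal: no orthogonality
coef = np.linalg.solve(M, data)                              # the u_n and v_n

f_mdft = sum(c * b for c, b in zip(coef, basis))
print("max error on [-L, L]:", np.max(np.abs(f_mdft - f(xs))))
```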
The top graph in Figure (3.1) illustrates the improvement over the DFT
that can be had using the MDFT. In that figure, we took data that was
thirty times over-sampled, not just twice over-sampled, as in our previous
discussion. Consequently, we had thirty times the number of Fourier coefficients we would have had otherwise, but for an interval thirty times longer.
To get the top graph, we used the MDFT, with the prior knowledge that
f (x) was non-zero only within the central thirtieth of the long interval. The
bottom graph shows the DFT reconstruction using the larger data set, but
only for the central thirtieth of the full period, which is where the original
f (x) is non-zero.
3.3.6 Other Forms of Prior Knowledge
As we just showed, knowing that we have over-sampled in our measurements can help us improve the resolution in our estimate of f (x). We
may have other forms of prior knowledge about f (x) that we can use. If
we know something about large-scale features of f (x), but not about finer
details, we can use the PDFT estimate, which is a generalization of the
MDFT. In an earlier chapter, the PDFT was compared to the DFT in a
two-dimensional example of simulated head slices. There are other things
we may know about f (x).
For example, we may know that f (x) is non-negative, which we have
not assumed explicitly previously in this chapter. Or, we may know that
f (x) is approximately zero for most x, but contains very sharp peaks at
a few places. In more formal language, we may be willing to assume that
f (x) contains a few Dirac delta functions in a flat background. There are
non-linear methods, such as the maximum entropy method, the indirect
PDFT (IPDFT), and eigenvector methods that can be used to advantage
in such cases; these methods are often called high-resolution methods.
3.4 Estimating the Size of Distant Objects
Suppose, in the previous example of the unknown strength problem, we assume that f(x) = B, for all x in the interval [−L, L], where B > 0 is the unknown brightness constant, and we don’t know L. More realistic, two-dimensional versions of this problem arise in astronomy, when we want to estimate the diameter of a distant star.
In this case, the measurement of the signal at the point P gives us
$$\int_{-L}^{L} f(x)\cos\Big(\frac{\omega\cos\theta}{c}x\Big)dx = B\int_{-L}^{L}\cos\Big(\frac{\omega\cos\theta}{c}x\Big)dx = \frac{2Bc}{\omega\cos(\theta)}\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big), \qquad (3.27)$$
when cos θ ≠ 0, whose absolute value is then the strength of the signal at P. Notice that we have zero signal strength at P when the angle θ associated with P satisfies the equation
$$\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big) = 0,$$
without
$$\cos(\theta) = 0.$$
But we know that the first positive zero of the sine function is at π, so the signal strength at P is zero when θ is such that
$$\frac{L\omega\cos(\theta)}{c} = \pi.$$
If
$$\frac{L\omega}{c} \ge \pi,$$
then we can solve for L and get
$$L = \frac{\pi c}{\omega\cos(\theta)}.$$
When Lω is too small, there will be no angle θ for which the received signal
strength at P is zero. If the signals being sent are actually broadband,
meaning that the signals are made up of components at many different
frequencies, not just one ω, which is usually the case, then we might be
able to filter our measured data, keep only the component at a sufficiently
high frequency, and then proceed as before.
But even when we have only a single frequency ω and Lω is too small, there is something we can do. The received strength at θ = π/2 is
$$F_c(0) = B\int_{-L}^{L}dx = 2BL.$$
If we knew B, this measurement alone would give us L, but we do not assume that we know B. At any other angle, the received strength is
$$F_c(\gamma) = \frac{2Bc}{\omega\cos(\theta)}\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big).$$
Therefore,
$$F_c(\gamma)/F_c(0) = \frac{\sin(A)}{A},$$
where
$$A = \frac{L\omega\cos(\theta)}{c}.$$
From the measured value $F_c(\gamma)/F_c(0)$ we can solve for A and then for L.
In actual optical astronomy, atmospheric distortions make these measurements noisy and the estimates have to be performed more carefully. This
issue is discussed in more detail in a later chapter, in the section on the
Two-Dimensional Fourier Transform.
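As a small numerical illustration of that last step, the Python sketch below recovers A, and then L, from a simulated ratio measurement by solving sin(A)/A = r with a bracketed root finder; the chosen values of L, ω, c, and θ are invented for the example and are not taken from the text.

```python
import numpy as np
from scipy.optimize import brentq

c, omega, theta = 3.0e8, 2.0e9, 1.2        # assumed propagation speed, frequency, angle
L_true = 0.4                               # the size we pretend not to know

A_true = L_true * omega * np.cos(theta) / c
r = np.sin(A_true) / A_true                # the "measured" ratio F_c(gamma) / F_c(0)

# Solve sin(A)/A = r for A on (0, pi), where sin(A)/A is strictly decreasing.
A = brentq(lambda a: np.sin(a) / a - r, 1e-9, np.pi - 1e-9)
L_est = A * c / (omega * np.cos(theta))
print(L_est, L_true)
```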
There is a wonderful article by Eddington [104], in which he discusses
the use of signal processing methods to discover the properties of the star
Algol. This star, formally Algol (Beta Persei) in the constellation Perseus,
turns out to be three stars, two revolving around the third, with both of
the first two taking turns eclipsing the other. The stars rotate around
their own axes, as our star, the sun, does, and the speed of rotation can
be estimated by calculating the Doppler shift in frequency, as one side of
the star comes toward us and the other side moves away. It is possible to
measure one side at a time only because of the eclipse caused by the other
revolving star.
3.5 The Transmission Problem
3.5.1 Directionality
Now we turn the table around and suppose that we are designing a broadcasting system, using transmitters at each x in the interval [−L, L]. At
each x we will transmit f (x) sin(ωt), where both f (x) and ω are chosen by
us. We now want to calculate what will be received at each point P in the
far-field. We may wish to design the system so that the strengths of the
signals received at the various P are not all the same. For example, if we
are broadcasting from Los Angeles, we may well want a strong signal in the
north and south directions, but weak signals east and west, where there are
fewer people to receive the signal. Clearly, our model of a single-frequency
signal is too simple, but it does allow us to illustrate several important
points about directionality in array processing.
3.5.2 The Case of Uniform Strength
For concreteness, we investigate the case in which f(x) = 1 for |x| ≤ L. Since this function is even, we need only the $a_n$. In this case, the measurement of the signal at the point P gives us
$$F(P) = \int_{-L}^{L} f(x)\cos\Big(\frac{\omega\cos\theta}{c}x\Big)dx = \int_{-L}^{L}\cos\Big(\frac{\omega\cos\theta}{c}x\Big)dx = \frac{2c}{\omega\cos(\theta)}\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big), \qquad (3.28)$$
when cos θ ≠ 0. The absolute value of F(P) is then the strength of the signal at P.

In the figures below we see the plots of the function $\frac{1}{2L}F(P)$, for various values of the aperture
$$A = \frac{L\omega}{\pi c} = \frac{2L}{\lambda}.$$
Beam-Pattern Nulls
Is it possible for the strength of the signal received at some P to be zero? As we saw in the previous section, to have zero signal strength, that is, to have F(P) = 0, we need
$$\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big) = 0,$$
without
$$\cos(\theta) = 0.$$
Therefore, we need
$$\frac{L\omega\cos(\theta)}{c} = n\pi, \qquad (3.29)$$
for some positive integer n ≥ 1. Notice that this can happen only if
$$n \le \frac{L\omega}{\pi c} = \frac{2L}{\lambda}. \qquad (3.30)$$
Therefore, if 2L < λ, there can be no P with signal strength zero. The
larger 2L is, with respect to the wavelength λ, the more angles at which
the signal strength is zero.
Local Maxima
Is it possible for the strength of the signal received at some P to be a local maximum, relative to nearby points in the far-field? We write
$$F(P) = \frac{2c}{\omega\cos(\theta)}\sin\Big(\frac{L\omega\cos(\theta)}{c}\Big) = 2L\,\mathrm{sinc}(A(\theta)),$$
where
$$A(\theta) = \frac{L\omega\cos(\theta)}{c}$$
and
$$\mathrm{sinc}(A(\theta)) = \frac{\sin A(\theta)}{A(\theta)},$$
for A(θ) ≠ 0, and equals one for A(θ) = 0. The value of A used previously is then A = A(0).
Local maxima or minima of F (P ) occur when the derivative of sinc (A(θ))
equals zero, which means that
A(θ) cos A(θ) − sin A(θ) = 0,
or
tan A(θ) = A(θ).
If we can solve this equation for A(θ) and then for θ, we will have found
angles corresponding to local maxima of the received signal strength. The
largest value of F (P ) occurs when θ = π2 , and the peak in the plot of F (P )
centered at θ = π2 is called the main lobe. The smaller peaks on either side
are called the grating lobes. We can see grating lobes in some of the polar
plots.
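To see the main lobe, the nulls, and the smaller peaks numerically, here is a short Python sketch that evaluates the normalized pattern sinc(A(θ)) over θ, lists the null angles from Equation (3.29), and locates the first solution of tan A = A next to the main lobe. The aperture value is an assumption made for the illustration.

```python
import numpy as np
from scipy.optimize import brentq

A0 = 6.5 * np.pi          # A(0) = L*omega/c, chosen so that several nulls are visible
theta = np.linspace(1e-3, np.pi - 1e-3, 2001)
A = A0 * np.cos(theta)
pattern = np.where(np.abs(A) < 1e-12, 1.0, np.sin(A) / A)   # F(P) / (2L)
print("main-lobe peak:", pattern.max())

# Nulls occur where A(theta) = n*pi with n >= 1 (Equation (3.29)).
nulls = [np.arccos(n * np.pi / A0) for n in range(1, int(A0 / np.pi) + 1)]
print("null angles (radians):", np.round(nulls, 3))

# First nonzero solution of tan A = A, locating the first smaller peak beside the main lobe.
A_peak = brentq(lambda a: np.tan(a) - a, np.pi + 0.1, 1.5 * np.pi - 1e-6)
print("level of first smaller peak:", abs(np.sin(A_peak) / A_peak))
```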
3.6 Remote Sensing
A basic problem in remote sensing is to determine the nature of a distant
object by measuring signals transmitted by or reflected from that object.
If the object of interest is sufficiently remote, that is, is in the farfield, the
data we obtain by sampling the propagating spatio-temporal field is related,
approximately, to what we want by Fourier transformation. The problem
is then to estimate a function from finitely many (usually noisy) values
of its Fourier transform. The application we consider here is a common one: remote sensing of transmitted or reflected waves propagating from distant sources. Examples include optical imaging of planets and asteroids using reflected sunlight, radio-astronomy imaging of distant sources of radio waves, active and passive sonar, radar imaging using micro-waves, and infra-red (IR) imaging to monitor the ocean temperature.
3.7 One-Dimensional Arrays
Now we imagine that the points P are the sources of the signals and we
are able to measure the transmissions at points x in [−L, L]. The P corresponding to the angle θ sends F (θ) sin(ωt), where the absolute value of
F (θ) is the strength of the signal coming from P . In narrow-band passive sonar, for example, we may have hydrophone sensors placed at various
points x and our goal is to determine how much acoustic energy at a specified frequency is coming from different directions. There may be only a
few directions contributing significant energy at the frequency of interest.
3.7.1 Measuring Fourier Coefficients
To simplify notation, we shall introduce the variable u = cos(θ). We then have
$$\frac{du}{d\theta} = -\sin(\theta) = -\sqrt{1 - u^2},$$
so that
$$d\theta = -\frac{1}{\sqrt{1 - u^2}}du.$$
Now let G(u) be the function
$$G(u) = \frac{F(\arccos(u))}{\sqrt{1 - u^2}},$$
defined for u in the interval [−1, 1].

Measuring the signals received at x and −x, we can obtain the integrals
$$\int_{-1}^{1} G(u)\cos\Big(\frac{x\omega}{c}u\Big)du, \qquad (3.31)$$
and
$$\int_{-1}^{1} G(u)\sin\Big(\frac{x\omega}{c}u\Big)du. \qquad (3.32)$$
The Fourier coefficients of G(u) are
$$\frac{1}{2}\int_{-1}^{1} G(u)\cos(n\pi u)du, \qquad (3.33)$$
and
$$\frac{1}{2}\int_{-1}^{1} G(u)\sin(n\pi u)du. \qquad (3.34)$$
Therefore, in order to have our measurements match Fourier coefficients of G(u) we need
$$\frac{x\omega}{c} = n\pi, \qquad (3.35)$$
for some positive integer n. Therefore, we need to take measurements at the points x and −x, where
$$x = n\frac{\pi c}{\omega} = n\frac{\lambda}{2} = n\Delta, \qquad (3.36)$$
where ∆ = λ/2 is the Nyquist spacing. Since x is restricted to [−L, L], there is an upper limit to the n we can use; we must have
$$n \le \frac{L}{\lambda/2} = \frac{2L}{\lambda}. \qquad (3.37)$$
The upper bound 2L/λ, which is the length of our array of sensors, in units of wavelength, is often called the aperture of the array.
Once we have some of the Fourier coefficients of the function G(u), we
can estimate G(u) for |u| ≤ 1 and, from that estimate, obtain an estimate
of the original F (θ).
As we just saw, the number of Fourier coefficients of G(u) that we
can measure, and therefore the resolution of the resulting reconstruction
of F (θ), is limited by the aperture, that is, the length 2L of the array of
sensors, divided by the wavelength λ. One way to improve resolution is
to make the array of sensors longer, which is more easily said than done.
However, synthetic-aperture radar (SAR) effectively does this. The idea of
SAR is to mount the array of sensors on a moving airplane. As the plane
moves, it effectively creates a longer array of sensors, a virtual array if you
will. The one drawback is that the sensors in this virtual array are not
all present at the same time, as in a normal array. Consequently, the data
must be modified to approximate what would have been received at other
times.
As in the examples discussed previously, we do have more measurements
we can take, if we use values of x other than those described by Equation
(3.36). The issue will be what to do with these over-sampled measurements.
3.7.2 Over-sampling
One situation in which over-sampling arises naturally occurs in sonar array processing. Suppose that an array of sensors has been built to operate at a design frequency of $\omega_0$, which means that we have placed sensors at the points x in [−L, L] that satisfy the equation
$$x = n\frac{\pi c}{\omega_0} = n\frac{\lambda_0}{2} = n\Delta_0, \qquad (3.38)$$
where $\lambda_0$ is the wavelength corresponding to the frequency $\omega_0$ and $\Delta_0 = \lambda_0/2$ is the Nyquist spacing for frequency $\omega_0$. Now suppose that we want to operate the sensing at another frequency, say ω. The sensors cannot be moved, so we must make do with sensors at the points x determined by the design frequency.

Consider, first, the case in which the second frequency ω is less than the design frequency $\omega_0$. Then its wavelength λ is larger than $\lambda_0$, and the Nyquist spacing ∆ = λ/2 for ω is larger than $\Delta_0$. So we have over-sampled.
The measurements taken at the sensors provide us with the integrals
$$\frac{1}{2K}\int_{-1}^{1} G(u)\cos\Big(\frac{n\pi}{K}u\Big)du, \qquad (3.39)$$
and
$$\frac{1}{2K}\int_{-1}^{1} G(u)\sin\Big(\frac{n\pi}{K}u\Big)du, \qquad (3.40)$$
where $K = \omega_0/\omega > 1$. These are Fourier coefficients of the function G(u),
viewed as defined on the interval [−K, K], which is larger than [−1, 1], and
taking the value zero outside [−1, 1]. If we then use the DFT estimate of
G(u), it will estimate G(u) for the values of u within [−1, 1], which is what
we want, as well as for the values of u outside [−1, 1], where we already
know G(u) to be zero. Once again, we can use the modified DFT, the
MDFT, to include the prior knowledge that G(u) = 0 for u outside [−1, 1]
to improve our reconstruction of G(u) and F (θ). In the over-sampled case
the interval [−1, 1] is called the visible region (although audible region seems
more appropriate for sonar), since it contains all the values of u that can
correspond to actual angles of arrival of acoustic energy.
3.7.3 Under-sampling
Now suppose that the frequency ω that we want to consider is greater than the design frequency $\omega_0$. This means that the spacing between the sensors is too large; we have under-sampled. Once again, however, we cannot move the sensors and must make do with what we have.

Now the measurements at the sensors provide us with the integrals
$$\frac{1}{2K}\int_{-1}^{1} G(u)\cos\Big(\frac{n\pi}{K}u\Big)du, \qquad (3.41)$$
and
$$\frac{1}{2K}\int_{-1}^{1} G(u)\sin\Big(\frac{n\pi}{K}u\Big)du, \qquad (3.42)$$
where $K = \omega_0/\omega < 1$. These are Fourier coefficients of the function G(u),
viewed as defined on the interval [−K, K], which is smaller than [−1, 1],
and taking the value zero outside [−K, K]. Since G(u) is not necessarily
zero outside [−K, K], treating it as if it were zero there results in a type
of error known as aliasing, in which energy corresponding to angles whose
u lies outside [−K, K] is mistakenly assigned to values of u that lie within
[−K, K]. Aliasing is a common phenomenon; the strobe-light effect is
aliasing, as is the apparent backward motion of the wheels of stage-coaches
in cowboy movies. In the case of the strobe light, we are permitted to view
the scene at times too far apart for us to sense continuous, smooth motion.
In the case of the wagon wheels, the frames of the film capture instants of
time too far apart for us to see the true rotation of the wheels.
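The aliasing just described is easy to reproduce numerically. The Python sketch below samples two different sinusoids at the same rate; because one frequency lies beyond what the spacing can support, its samples coincide with those of a lower frequency. The specific frequencies and sampling rate are assumptions made for the illustration.

```python
import numpy as np

fs = 10.0                          # samples per second
t = np.arange(0, 1, 1 / fs)        # one second of samples

f_low, f_high = 3.0, 13.0          # 13 Hz exceeds the Nyquist limit fs/2 = 5 Hz
s_low = np.cos(2 * np.pi * f_low * t)
s_high = np.cos(2 * np.pi * f_high * t)

# The 13 Hz signal aliases to |13 - fs| = 3 Hz: the two sample sequences agree.
print(np.allclose(s_low, s_high))   # True
```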
Figure 3.1: The non-iterative band-limited extrapolation method (MDFT) (top) and the DFT (bottom) for N = 64, 30 times over-sampled data.
Figure 3.2: Farfield Measurements.
Figure 3.3: Relative strength at P for A = 0.5.
Figure 3.4: Relative strength at P for A = 1.0.
Figure 3.5: Relative strength at P for A = 1.5.
Figure 3.6: Relative strength at P for A = 1.8.
Figure 3.7: Relative strength at P for A = 3.2.
Figure 3.8: Relative strength at P for A = 6.5.
Part III
Signal Models
Chapter 4
Undetermined-Parameter Models
4.1 Chapter Summary
All of the techniques discussed in this book deal, in one way or another,
with one fundamental problem: estimate the values of a function f (x) from
finitely many (usually noisy) measurements related to f (x); here x can be
a multi-dimensional vector, so that f can be a function of more than one
variable. To keep the notation relatively simple here, we shall assume,
throughout this chapter, that x is a real variable, but all of what we shall
say applies to multi-variate functions as well.
4.2 Fundamental Calculations
In this section we present the two most basic calculational problems in
signal processing. Both problems concern a real trigonometric polynomial
f (x), with
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos(kx) + b_k\sin(kx)\Big]. \qquad (4.1)$$
After we have discussed the complex exponential functions, we shall revisit
the material in this section, using complex numbers. Then it will become
clear why we call such functions trigonometric polynomials.
4.2.1 Evaluating a Trigonometric Polynomial
This function f(x) is 2π-periodic, so we need to study it only over one period. For that reason, we shall restrict the variable x to the interval [0, 2π]. Now let N = 2K + 1, and
$$x_n = \frac{2\pi}{N}n,$$
for n = 0, 1, ..., N − 1. We define $f_n = f(x_n)$. The computational problem is to calculate the N real numbers $f_n$, knowing the N real numbers $a_0$ and $a_k$ and $b_k$, for k = 1, ..., K.
This problem may seem trivial, and it is, in a sense. All we need to do
is to write
$$f_n = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos\Big(\frac{2\pi}{N}nk\Big) + b_k\sin\Big(\frac{2\pi}{N}nk\Big)\Big], \qquad (4.2)$$
and compute the sum of the right side, for each n = 0, 1, ..., N − 1. The problem is that, in most practical applications, the N is very large, calculating each sum requires N multiplications, and there are N such sums to be evaluated. So this is an “N-squared problem”. As we shall see later, the fast Fourier transform (FFT) can be used to accelerate these calculations.
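Here is a minimal Python sketch of this direct, N-squared evaluation, written just as Equation (4.2) reads; the particular coefficients are random placeholders chosen for the illustration.

```python
import numpy as np

K = 50
N = 2 * K + 1
rng = np.random.default_rng(1)
a = rng.standard_normal(K + 1)     # a_0, a_1, ..., a_K
b = rng.standard_normal(K + 1)     # b_0 unused; b_1, ..., b_K

def evaluate(n):
    """Direct evaluation of f_n from Equation (4.2): about N multiplications per sample."""
    total = a[0] / 2
    for k in range(1, K + 1):
        total += a[k] * np.cos(2 * np.pi * n * k / N) + b[k] * np.sin(2 * np.pi * n * k / N)
    return total

f = np.array([evaluate(n) for n in range(N)])   # N sums of length N: the "N-squared problem"
print(f[:5])
```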
4.2.2 Determining the Coefficients
Now we reverse the problem. Suppose that we have determined the values
fn , say from measurements, and we want to find the coefficients a0 and ak
and bk , for k = 1, ..., K. Again we have
$$f_n = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos\Big(\frac{2\pi}{N}nk\Big) + b_k\sin\Big(\frac{2\pi}{N}nk\Big)\Big], \qquad (4.3)$$
only now it is the left side of each equation that we know. This problem
is also trivial, in a sense; all we need to do is to solve this system of linear
equations. Again, it is the size of N that is the problem, and again the
FFT comes to the rescue.
In the next section we discuss two examples that lead to these calculational problems. Then we show how trigonometric identities can be used
to obtain a type of orthogonality for finite sums of trig functions. This
orthogonality will provide us with a quicker way to determine the coefficients. It will reduce the problem of solving the N by N system of linear
equations to the simpler problem of evaluation discussed in the previous
section. But we can simplify even further, as we shall see in our discussion
of the FFT.
4.3 Two Examples
Signal processing begins with measurements. The next step is to use these
measurements to perform various calculations. We consider two examples.
4.3.1 The Unknown Strength Problem
In our discussion of remote sensing we saw that, if each point x in the
interval [−L, L] is emitting a signal f (x) sin ωt, and f (x) has the Fourier
series expansion
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}\Big[a_k\cos\Big(\frac{k\pi}{L}x\Big) + b_k\sin\Big(\frac{k\pi}{L}x\Big)\Big], \qquad (4.4)$$
then, by measuring the propagating signals in the far-field, we can determine the Fourier coefficients $a_k$ and $b_k$, for k = 0, 1, 2, ..., K, where K is the largest positive integer such that
$$K \le \frac{L\omega}{\pi c}.$$
Once we have these ak and bk , we can approximate f (x) by calculating the
finite sum
$$f_{DFT}(x) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos\Big(\frac{k\pi}{L}x\Big) + b_k\sin\Big(\frac{k\pi}{L}x\Big)\Big]. \qquad (4.5)$$
To plot this approximation or to make use of it in some way, we need to evaluate $f_{DFT}(x)$ for some finite set of values of x.

To evaluate this function at a single x requires 2K + 1 multiplications. If K is large, and there are many x at which we wish to evaluate $f_{DFT}(x)$, then we must perform quite a few multiplications. The fast Fourier transform (FFT) algorithm, which we shall study later, is a fast method for obtaining these evaluations.
Suppose, for example, that we choose to evaluate fDF T (x) at N =
2K + 1 values of x, equi-spaced within the interval [−L, L]; in other words,
we evaluate fDF T (x) at the points
$$x_n = -L + \frac{2L}{N}n,$$
for n = 0, 1, ..., N − 1. Using trig identities, we can easily show that
$$f_{DFT}(x_n) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k(-1)^k\cos\Big(\frac{2\pi}{N}kn\Big) + b_k(-1)^k\sin\Big(\frac{2\pi}{N}kn\Big)\Big]. \qquad (4.6)$$
4.3.2 Sampling in Time
Much of signal processing begins with taking samples, or evaluations, of a
function of time. Let f (t) be the function we are interested in, with the
variable t denoting time. To learn about f (t), we evaluate it at, say, the
points t = tn , for n = 1, 2, ..., N , so that our data are the N numbers f (tn ).
Our ultimate objective may be to estimate a value of f (t) that we
haven’t measured, perhaps to predict a future value of the function, or to
fill in values of f (t) for t between the tn at which we have measurements.
It may be the case that the function f (t) represents sound, someone
singing or speaking, perhaps, and contains noise that we want to remove,
if we can. In such cases, we think of f (t) as f (t) = s(t) + v(t), where v(t) is
the noise function, and s(t) is the clear signal that we want. Then we may
want to use all the values f (tn ) to estimate s(t) at some finite number of
values of t, not necessarily the same tn at which we have measured f (t).
To estimate f (t) from the sampled values, we often use signal models.
These models are functions with finitely many unknown parameters, which
are to be determined from the samples. For example, we may wish to think
of the function f (t) as made up of some finite number of sines and cosines;
then
$$f(t) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos(\omega_k t) + b_k\sin(\omega_k t)\Big], \qquad (4.7)$$
where the ωk are chosen by us and, therefore, known, but the ak and bk are
not known. Now the goal is to use the N data points f (tn ) to determine the
ak and bk . Once again, if N and K are large, this can be computationally
costly. As with the previous problem, the FFT can help us here.
4.3.3 The Issue of Units
When we write cos π = −1, it is with the understanding that π is a measure
of angle, in radians; the function cos will always have an independent variable in units of radians. Therefore, when we write cos(xω), we understand
the product xω to be in units of radians. If x is measured in seconds, then
ω is in units of radians per second; if x is in meters, then ω is in units of
radians per meter. When x is in seconds, we sometimes use the variable ω/2π; since 2π is then in units of radians per cycle, the variable ω/2π is in units of cycles per second, or Hertz. When we sample f(x) at values of x spaced ∆ apart, the ∆ is in units of x-units per sample, and the reciprocal, 1/∆, which is called the sampling frequency, is in units of samples per x-units. If x is in seconds, then ∆ is in units of seconds per sample, and 1/∆ is in units of samples per second.
4.4 Estimation and Models
Our measurements, call them $d_m$, for m = 1, ..., M, can be actual values of f(x) measured at several different values of x, or the measurements can take the form of linear functional values:
$$d_m = \int f(x)g_m(x)dx,$$
for known functions $g_m(x)$. For example, we could have Fourier cosine transform values of f(x),
$$d_m = \int_{-\infty}^{\infty} f(x)\cos(\omega_m x)dx,$$
or Fourier sine transform values of f(x),
$$d_m = \int_{-\infty}^{\infty} f(x)\sin(\omega_m x)dx,$$
where the $\omega_m$ are known real constants, or Laplace transform values
$$d_m = \int_{0}^{\infty} f(x)e^{-s_m x}dx,$$
where the sm > 0 are known constants. The point to keep in mind is
that the number of measurements is finite, so, even in the absence of measurement error or noise, the data are not usually sufficient to single out
precisely one function f (x). For this reason, we think of the problem as
approximating or estimating f (x), rather than finding f (x).
The process of approximating or estimating the function f (x) often
involves making simplifying assumptions about the algebraic form of f (x).
For example, we may assume that f (x) is a polynomial, or a finite sum of
trigonometric functions. In such cases, we are said to be adopting a model
for f (x). The models involve finitely many as yet unknown parameters,
which we can determine from the data by solving systems of equations.
In the next section we discuss briefly the polynomial model, and then
turn to a more detailed treatment of trigonometric models. In subsequent
chapters we focus on the important topic of complex exponential-function
models, which combine features of polynomial models and trigonometric
models.
4.5 A Polynomial Model
A fundamental problem in signal processing is to extract information about
a function f (x) from finitely many values of that function. One way to solve
the problem is to model the function f(x) as a member of a parametric family of functions. For example, suppose we have the measurements $f(x_n)$, for n = 1, ..., N, and we model f(x) as a polynomial of degree N − 1, so that
$$f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_{N-1}x^{N-1} = \sum_{k=0}^{N-1} a_k x^k,$$
for some coefficients $a_k$ to be determined. Inserting the known values, we find that we must solve the system of N equations in N unknowns given by
$$f(x_n) = a_0 + a_1 x_n + a_2 x_n^2 + ... + a_{N-1}x_n^{N-1} = \sum_{k=0}^{N-1} a_k x_n^k,$$
for n = 1, ..., N. In theory, this is simple; all we need to do is to use MATLAB or some similar software that includes routines to solve such systems.
In practice, the situation is usually more complicated, in that the system
may be ill-conditioned and the solution highly sensitive to errors in the
measurements f (xn ); this will be the case if the xn are not well separated.
It is unwise, in such cases, to use as many parameters as we have data. For
example, if we have reason to suspect that the function f (x) is actually
linear, we can do linear regression. When there are fewer parameters than
measurements, we usually calculate a least-squares solution for the system
of equations.
At this stage in our discussion, however, we shall ignore these practical
problems and focus on the use of finite-parameter models.
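The following Python sketch sets up and solves the Vandermonde system just described, and also shows the least-squares alternative mentioned for the case of fewer parameters than data; the sample points and the function being sampled are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 8))                    # assumed sample locations
y = np.sin(3 * x) + 0.01 * rng.standard_normal(8)     # noisy samples of an assumed function

# Interpolation: as many parameters as data points (a degree N-1 polynomial).
V = np.vander(x, increasing=True)                     # V[n, k] = x_n ** k
a_full = np.linalg.solve(V, y)

# Alternative with fewer parameters than measurements, solved by least squares.
deg = 3
V_small = np.vander(x, deg + 1, increasing=True)
a_ls, *_ = np.linalg.lstsq(V_small, y, rcond=None)

print("interpolating coefficients:", np.round(a_full, 3))
print("least-squares coefficients:", np.round(a_ls, 3))
```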
4.6 Linear Trigonometric Models
Another popular finite-parameter model is to consider f (x) as a finite sum
of trigonometric functions.
Suppose that we have the values f (xn ), for N values x = xn , n =
1, ..., N , where, for convenience, we shall assume that N = 2K + 1 is odd.
It is not uncommon to assume that f (x) is a function of the form
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos(\omega_k x) + b_k\sin(\omega_k x)\Big], \qquad (4.8)$$
where the ωk are chosen by us and, therefore, known, but the ak and bk
are not known. It is sometimes the case that the data values f (xn ) are
used to help us select the values of ωk prior to using the model for f (x)
given by Equation (4.8); the problem of determining the ωk from data will
be discussed later, when we consider Prony’s method.
Once again, we find the unknown ak and bk by fitting the model to the
data. We insert the data f (xn ) corresponding to the N points xn , and we
solve the system of N linear equations in N unknowns,
$$f(x_n) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos(\omega_k x_n) + b_k\sin(\omega_k x_n)\Big],$$
for n = 0, ..., N − 1, to find the ak and bk . When K is large, calculating
the coefficients can be time-consuming. One particular choice for the xn
and ωk reduces the computation time significantly.
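A small Python sketch of this fitting step is given below: it builds the N by N coefficient matrix from the chosen frequencies and sample points and solves for the $a_k$ and $b_k$. The frequencies, the sample points, and the underlying test signal are assumptions made for the illustration.

```python
import numpy as np

K = 3
N = 2 * K + 1
omega = np.arange(1, K + 1, dtype=float)              # chosen (known) frequencies
x = np.linspace(0, 2 * np.pi, N, endpoint=False)      # chosen sample points

# Assumed "measured" data: a signal of exactly this form, so the fit should be exact.
a_true = np.array([0.5, 1.0, -2.0, 0.3])               # a_0, a_1, a_2, a_3
b_true = np.array([0.0, 0.7, 0.1, -1.1])               # (b_0 unused)
def signal(t):
    return a_true[0] / 2 + sum(a_true[k] * np.cos(omega[k - 1] * t) +
                               b_true[k] * np.sin(omega[k - 1] * t) for k in range(1, K + 1))
y = signal(x)

# One row per sample: [1/2, cos(w_1 x_n), ..., cos(w_K x_n), sin(w_1 x_n), ..., sin(w_K x_n)].
M = np.column_stack([0.5 * np.ones(N)] +
                    [np.cos(w * x) for w in omega] +
                    [np.sin(w * x) for w in omega])
coef = np.linalg.solve(M, y)
print("a:", np.round(coef[:K + 1], 3), "b:", np.round(coef[K + 1:], 3))
```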
4.6.1 Equi-Spaced Frequencies
It is often the case in signal processing that the variable x is time, in which
case we usually replace the letter x with the letter t. The variables ωk are
then frequencies. When the variable x represents distance along its axis,
the ωk are called spatial frequencies. Here, for convenience, we shall refer to
the ωk as frequencies, without making any assumptions about the nature
of the variable x.
Unless we have determined the frequencies ωk from our data, or have
prior knowledge of which frequencies ωk are involved in the problem, it is
convenient to select the ωk equi-spaced within some interval. The simplest
choice, from an algebraic stand-point, is ωk = k, with appropriately chosen
units. Then our model becomes
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos(kx) + b_k\sin(kx)\Big]. \qquad (4.9)$$
The function f (x) is then 2π-periodic, so we restrict the variable x to the
interval [0, 2π], which is one full period. The goal is still the same: calculate
the coefficients from the values f (xn ), n = 0, 1, ..., N −1, where N = 2K +1;
this involves solving a system of N linear equations in N unknowns, which
is computationally expensive when N is large. For particular choices of the
xn the computational cost can be considerably reduced.
4.6.2 Equi-Spaced Sampling
It is often the case that we can choose the $x_n$ at which we evaluate the function f(x). We suppose now that we have selected $x_n = n\Delta$, for ∆ = 2π/N and n = 0, ..., N − 1. In keeping with the common notation, we write $f_n = f(n\Delta)$ for n = 0, ..., N − 1. Then we have to solve the system
$$f_n = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos\Big(\frac{2\pi}{N}kn\Big) + b_k\sin\Big(\frac{2\pi}{N}kn\Big)\Big], \qquad (4.10)$$
for n = 0, ..., N − 1, to find the N coefficients $a_0$ and $a_k$ and $b_k$, for k = 1, ..., K.
4.7 Recalling Fourier Series
4.7.1 Fourier Coefficients
In the study of Fourier series we encounter models having the form in
Equation (4.9). The function f (x) in that equation is 2π-periodic, and
when we want to determine the coefficients, we integrate:
$$a_k = \frac{1}{\pi}\int_{0}^{2\pi} f(x)\cos(kx)dx, \qquad (4.11)$$
and
$$b_k = \frac{1}{\pi}\int_{0}^{2\pi} f(x)\sin(kx)dx. \qquad (4.12)$$
It is the mutual orthogonality of the functions cos(kx) and sin(kx) over the
interval [0, 2π] that enables us to write the values of the coefficients in such
a simple way.
To determine the coefficients this way, we need to know the function
f (x) ahead of time, since we have to be able to calculate the integrals, or
these integrals must be among the measurements we have taken. When
all we know about f (x) are its values at finitely many values of x, we
cannot find the coefficients this way. As we shall see shortly, we can still
exploit a type of orthogonality to obtain a relatively simple expression for
the coefficients in terms of the sampled values of f (x).
4.7.2 Riemann Sums
Suppose that we have obtained the values of the function f(x) at the N points $\frac{2\pi n}{N}$, for n = 0, 1, ..., N − 1. We can get at least approximate values of the $a_k$ and $b_k$ by replacing the integrals in Equations (4.11) and (4.12) with Riemann sums. Then these integrals are replaced by the sums
$$\frac{1}{\pi}\int_{0}^{2\pi} f(x)\cos(kx)dx \approx \frac{2}{N}\sum_{n=0}^{N-1} f\Big(\frac{2\pi n}{N}\Big)\cos\Big(\frac{2\pi}{N}nk\Big), \qquad (4.13)$$
and
$$\frac{1}{\pi}\int_{0}^{2\pi} f(x)\sin(kx)dx \approx \frac{2}{N}\sum_{n=0}^{N-1} f\Big(\frac{2\pi n}{N}\Big)\sin\Big(\frac{2\pi}{N}nk\Big). \qquad (4.14)$$
What is remarkable here is that these sums give us the ak and bk exactly,
not just approximately, when we select N = 2K + 1, so that the number
of values of f (x) is the same as the number of unknown coefficients we are
attempting to find. This happens because there is a type of orthogonality
for these finite sums of trigonometric functions that is analogous to the
integral orthogonality of the trig functions. The details are in the next
section.
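The exactness claim is easy to check numerically. The Python sketch below builds a random trigonometric polynomial with N = 2K + 1, forms the Riemann sums of Equations (4.13) and (4.14) from its samples, and compares them with the true coefficients; the random coefficients are placeholders chosen for the illustration.

```python
import numpy as np

K = 5
N = 2 * K + 1
rng = np.random.default_rng(3)
a = rng.standard_normal(K + 1)
b = np.concatenate(([0.0], rng.standard_normal(K)))

n = np.arange(N)
x = 2 * np.pi * n / N
f = a[0] / 2 + sum(a[k] * np.cos(k * x) + b[k] * np.sin(k * x) for k in range(1, K + 1))

# Riemann sums (4.13) and (4.14): with N = 2K + 1 samples they are exact, not approximate.
a_hat = [(2 / N) * np.sum(f * np.cos(k * x)) for k in range(K + 1)]
b_hat = [(2 / N) * np.sum(f * np.sin(k * x)) for k in range(K + 1)]

print(np.allclose(a_hat[1:], a[1:]), np.allclose(b_hat[1:], b[1:]), np.isclose(a_hat[0], a[0]))
```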
4.8 Simplifying the Calculations
As we shall see in this section, choosing N = 2K + 1, $\omega_k = k$ and ∆ = 2π/N leads to a form of orthogonality that will allow us to calculate the parameters in a relatively simple manner. Because the function in Equation (4.9) is 2π-periodic, the measurements f(n∆), n = 0, 1, ..., N − 1 will be repeated if we continue to sample f(x) at points n∆, for n > N − 1.
4.8.1 The Main Theorem
As we remarked earlier, when we replace the integrals in Equations (4.11)
and (4.12) with the particular Riemann sum approximations in Equations
(4.13) and (4.14), we do not get approximate values of the ak and bk ; we
get the exact values.
We can view the Riemann sums another way. To calculate the Fourier
coefficients in Equation (4.4), we multiply both sides of the equation by a
sine or cosine function and integrate over x; orthogonality does the rest.
Now we multiply each side of Equation (4.10) by a sine or cosine and sum
over n; orthogonality does the rest once again.
For fixed j = 1, ..., K consider the sums
$$\sum_{n=0}^{N-1} f_n\cos\Big(\frac{2\pi}{N}jn\Big)$$
and
$$\sum_{n=0}^{N-1} f_n\sin\Big(\frac{2\pi}{N}jn\Big).$$
Replacing $f_n$ with the right side of Equation (4.10), we get
$$\sum_{n=0}^{N-1} f_n\cos\Big(\frac{2\pi}{N}jn\Big) = a_0\frac{1}{2}\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi}{N}jn\Big)$$
$$+ \sum_{k=1}^{K}\Big(a_k\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big) + b_k\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big)\Big), \qquad (4.15)$$
and
$$\sum_{n=0}^{N-1} f_n\sin\Big(\frac{2\pi}{N}jn\Big) = a_0\frac{1}{2}\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi}{N}jn\Big)$$
$$+ \sum_{k=1}^{K}\Big(a_k\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi}{N}kn\Big)\sin\Big(\frac{2\pi}{N}jn\Big) + b_k\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi}{N}kn\Big)\sin\Big(\frac{2\pi}{N}jn\Big)\Big). \qquad (4.16)$$
Our main goal is the proof of the next theorem, which will follow immediately from Lemma 4.1.
Theorem 4.1 The trigonometric coefficients can be found using the following formulas:
$$\sum_{n=0}^{N-1} f_n = \frac{N}{2}a_0,$$
$$\sum_{n=0}^{N-1} f_n\cos\Big(\frac{2\pi}{N}nj\Big) = \frac{N}{2}a_j,$$
and
$$\sum_{n=0}^{N-1} f_n\sin\Big(\frac{2\pi}{N}nj\Big) = \frac{N}{2}b_j,$$
for j = 1, ..., K.
Lemma 4.1 For N = 2K + 1 and j, k = 0, 1, 2, ..., K, we have
$$\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big) = 0,$$
$$\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big) = \begin{cases} 0, & \text{if } j \ne k; \\ \frac{N}{2}, & \text{if } j = k \ne 0; \\ N, & \text{if } j = k = 0; \end{cases}$$
and
$$\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi}{N}kn\Big)\sin\Big(\frac{2\pi}{N}jn\Big) = \begin{cases} 0, & \text{if } j \ne k, \text{ or } j = k = 0; \\ \frac{N}{2}, & \text{if } j = k \ne 0. \end{cases}$$
The proof of this lemma is contained in the following sequence of exercises.
4.8.2 The Proofs as Exercises
Exercise 4.1 Using trigonometric identities, show that
$$\cos\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big) = \frac{1}{2}\Big[\cos\Big(\frac{2\pi}{N}(k+j)n\Big) + \cos\Big(\frac{2\pi}{N}(k-j)n\Big)\Big],$$
$$\sin\Big(\frac{2\pi}{N}kn\Big)\cos\Big(\frac{2\pi}{N}jn\Big) = \frac{1}{2}\Big[\sin\Big(\frac{2\pi}{N}(k+j)n\Big) + \sin\Big(\frac{2\pi}{N}(k-j)n\Big)\Big],$$
and
$$\sin\Big(\frac{2\pi}{N}kn\Big)\sin\Big(\frac{2\pi}{N}jn\Big) = -\frac{1}{2}\Big[\cos\Big(\frac{2\pi}{N}(k+j)n\Big) - \cos\Big(\frac{2\pi}{N}(k-j)n\Big)\Big].$$
Exercise 4.2 Use trigonometric identities to show that
$$\sin\Big(\Big(n + \frac{1}{2}\Big)x\Big) - \sin\Big(\Big(n - \frac{1}{2}\Big)x\Big) = 2\sin\Big(\frac{x}{2}\Big)\cos(nx),$$
and
$$\cos\Big(\Big(n + \frac{1}{2}\Big)x\Big) - \cos\Big(\Big(n - \frac{1}{2}\Big)x\Big) = -2\sin\Big(\frac{x}{2}\Big)\sin(nx).$$
Exercise 4.3 Use the previous exercise to show that
$$2\sin\Big(\frac{x}{2}\Big)\sum_{n=0}^{N-1}\cos(nx) = \sin\Big(\Big(N - \frac{1}{2}\Big)x\Big) + \sin\Big(\frac{x}{2}\Big),$$
and
$$2\sin\Big(\frac{x}{2}\Big)\sum_{n=0}^{N-1}\sin(nx) = \cos\Big(\frac{x}{2}\Big) - \cos\Big(\Big(N - \frac{1}{2}\Big)x\Big).$$
Hints: sum over n = 0, 1, ..., N − 1 on both sides and note that
$$\sin\Big(\frac{x}{2}\Big) = -\sin\Big(-\frac{x}{2}\Big).$$
Exercise 4.4 Use trigonometric identities to show that
$$\sin\Big(\Big(N - \frac{1}{2}\Big)x\Big) + \sin\Big(\frac{x}{2}\Big) = 2\cos\Big(\frac{N-1}{2}x\Big)\sin\Big(\frac{N}{2}x\Big),$$
and
$$\cos\Big(\frac{x}{2}\Big) - \cos\Big(\Big(N - \frac{1}{2}\Big)x\Big) = 2\sin\Big(\frac{N}{2}x\Big)\sin\Big(\frac{N-1}{2}x\Big).$$
Hints: Use
$$N - \frac{1}{2} = \frac{N}{2} + \frac{N-1}{2},$$
and
$$\frac{1}{2} = \frac{N}{2} - \frac{N-1}{2}.$$
Exercise 4.5 Use the previous exercises to show that
$$\sin\Big(\frac{x}{2}\Big)\sum_{n=0}^{N-1}\cos(nx) = \sin\Big(\frac{N}{2}x\Big)\cos\Big(\frac{N-1}{2}x\Big),$$
and
$$\sin\Big(\frac{x}{2}\Big)\sum_{n=0}^{N-1}\sin(nx) = \sin\Big(\frac{N}{2}x\Big)\sin\Big(\frac{N-1}{2}x\Big).$$
Let m be any integer. Substituting $x = \frac{2\pi m}{N}$ in the equations in the previous exercise, we obtain
$$\sin\Big(\frac{\pi}{N}m\Big)\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi mn}{N}\Big) = \sin(\pi m)\cos\Big(\frac{N-1}{N}\pi m\Big), \qquad (4.17)$$
and
$$\sin\Big(\frac{\pi}{N}m\Big)\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi mn}{N}\Big) = \sin(\pi m)\sin\Big(\frac{N-1}{N}\pi m\Big). \qquad (4.18)$$
With m = k + j, we have
$$\sin\Big(\frac{\pi}{N}(k+j)\Big)\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi(k+j)n}{N}\Big) = \sin(\pi(k+j))\cos\Big(\frac{N-1}{N}\pi(k+j)\Big), \qquad (4.19)$$
and
$$\sin\Big(\frac{\pi}{N}(k+j)\Big)\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi(k+j)n}{N}\Big) = \sin(\pi(k+j))\sin\Big(\frac{N-1}{N}\pi(k+j)\Big). \qquad (4.20)$$
Similarly, with m = k − j, we obtain
$$\sin\Big(\frac{\pi}{N}(k-j)\Big)\sum_{n=0}^{N-1}\cos\Big(\frac{2\pi(k-j)n}{N}\Big) = \sin(\pi(k-j))\cos\Big(\frac{N-1}{N}\pi(k-j)\Big), \qquad (4.21)$$
and
$$\sin\Big(\frac{\pi}{N}(k-j)\Big)\sum_{n=0}^{N-1}\sin\Big(\frac{2\pi(k-j)n}{N}\Big) = \sin(\pi(k-j))\sin\Big(\frac{N-1}{N}\pi(k-j)\Big). \qquad (4.22)$$
Exercise 4.6 Prove Lemma 4.1.
4.8.3 More Computational Issues
In many applications of signal processing N , the number of measurements
of the function f (x), can be quite large. In the previous subsection, we
found a relatively inexpensive way to find the undetermined parameters
of the trigonometric model, but even this way poses computational problems when N is large. The computation of a single aj or bj requires N
multiplications and we have to calculate N − 1 of these parameters. Thus,
the complexity of the problem is on the order of N squared. Fortunately,
there is a fast algorithm, known as the fast Fourier transform (FFT), that
enables us to perform these calculations in far fewer multiplications. We
shall investigate the FFT in a later chapter, after we have discussed the
complex exponential functions.
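As a preview of the speedup mentioned here, the Python sketch below obtains the same $a_j$ and $b_j$ from the samples with a fast Fourier transform instead of the order-N-squared sums; the random test coefficients are placeholders, and the use of NumPy's FFT routine is an assumption made for the illustration.

```python
import numpy as np

K = 500
N = 2 * K + 1
rng = np.random.default_rng(4)
a = rng.standard_normal(K + 1)
b = np.concatenate(([0.0], rng.standard_normal(K)))

x = 2 * np.pi * np.arange(N) / N
f = a[0] / 2 + sum(a[k] * np.cos(k * x) + b[k] * np.sin(k * x) for k in range(1, K + 1))

# One FFT of length N replaces the N-squared direct sums of Theorem 4.1.
F = np.fft.rfft(f)                      # F[j] = sum_n f_n exp(-2*pi*i*j*n/N)
a_hat = (2 / N) * F.real                # real parts give the cosine sums
b_hat = -(2 / N) * F.imag               # minus imaginary parts give the sine sums

print(np.allclose(a_hat[1:K + 1], a[1:]), np.allclose(b_hat[1:K + 1], b[1:]))
```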
4.9 Approximation, Models, or Truth?
4.9.1 Approximating the Truth
In the unknown strength problem we are interested in the unknown function
f (x), with the Fourier series
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}\Big[a_k\cos\Big(\frac{k\pi}{L}x\Big) + b_k\sin\Big(\frac{k\pi}{L}x\Big)\Big]. \qquad (4.23)$$
Because our far-field measurements only give us finitely many of its Fourier
coefficients, we cannot obtain a perfect description of f (x). Instead, we can
try to approximate f (x). One way to do this is to use the DFT:
$$f_{DFT}(x) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big[a_k\cos\Big(\frac{k\pi}{L}x\Big) + b_k\sin\Big(\frac{k\pi}{L}x\Big)\Big]. \qquad (4.24)$$
Once we have decided to use fDF T (x) as our approximation, we probably
want to evaluate this approximation at some number of values of x, in order
to plot fDF T (x), for example. This step is purely a calculation problem.
4.9.2 Modeling the Data
In the problem of sampling in time, we have some unknown function of
time, f (t), and we measure its values f (tn ) at the N sampling points t = tn ,
n = 1, ..., N . There are several different possible objectives that we may
have at this point.
Extrapolation
We may want to estimate values of f (t) at points t at which we do not
have measurements; these other points may represent time in the future,
for example, and we are trying to predict future values of f (t). In such
cases, it is common to adopt a model for f (t), which is typically some
function of t with finitely many as yet undetermined parameters, such as
a polynomial or a sum of trig functions. We must select our model with
care, particularly if the data is assumed to be noisy, as most data is. Even
though we may have a large number of measurements, it may be a mistake
to model f (t) with as many parameters as we have data.
We do not really believe that f (t) is a polynomial or a finite sum of trig
functions. We may not even believe that the model is a good approximation
of f (t) for all values of t. We do believe, however, that adopting such
a model will enable us to carry out our prediction task in a reasonably
accurate way. The task may be something like predicting the temperature
at noon tomorrow, on the basis of noon-time temperatures for the previous
five days.
Filtering the Data
Suppose that the values f (tn ) are sampled data from an old recording of
a singer. We may want to clean up this digitized data, in order to be able
to recapture the original sound. Now we may only desire to modify each
of the values f (tn ) in some way, to improve the quality. To perform this
restoring task, we may model the data as samples of a finite sum of trig
functions
$$f(t_n) = \frac{1}{2}a_0 + \sum_{k=1}^{K}\Big(a_k\cos(\omega_k t_n) + b_k\sin(\omega_k t_n)\Big), \qquad (4.25)$$
where the frequencies ωk are chosen by us. We then solve for the parameters
ak and bk .
To clean up the sound, we may modify the values of the ak and the bk .
For example, we may believe that certain of the frequencies come primarily
from a noise component in the recording. To remove, or at least diminish,
this component, we can reduce the associated ak and bk . We may feel
that the original recording technology failed to capture some of the higher
notes sung by the soprano. Then we can increase the values of the ak and
bk associated with those frequencies that need to be restored. Obviously,
restoring old recordings of opera singers is more involved than this, but
you get the idea.
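A minimal sketch of this restoring idea (the frequencies, samples, and scaling factors are assumptions): fit the model of Equation (4.25) by least squares, then damp the coefficients associated with a frequency believed to carry mostly noise.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)                       # sampling times t_n
omegas = 2 * np.pi * np.array([3.0, 7.0, 60.0])      # chosen frequencies omega_k
signal = np.cos(omegas[0] * t) + 0.5 * np.sin(omegas[1] * t)
data = signal + 0.8 * np.sin(omegas[2] * t)          # high-frequency term treated as noise

# Build the design matrix [1/2, cos(w_k t), sin(w_k t)] and solve for a_k, b_k.
columns = [0.5 * np.ones_like(t)]
for w in omegas:
    columns += [np.cos(w * t), np.sin(w * t)]
A = np.column_stack(columns)
params, *_ = np.linalg.lstsq(A, data, rcond=None)

# "Clean up" the data by reducing the coefficients of the noisy frequency.
params_clean = params.copy()
params_clean[-2:] *= 0.0                             # zero out a_K, b_K for the noisy term
restored = A @ params_clean                          # the modified (filtered) data values
```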
The point here is that we need not believe that the entire recording
can be accurately described, or even approximated, by a finite sum of trig
functions. The sum of trig functions in Equation (4.7) does give another
way to describe the measured data, and as such, another way to modify
this data, namely by modifying the ak and bk . We do not need to believe
that the entire opera can be accurately approximated by such a sum in
order for this restoring procedure to be helpful.
Note that if our goal is to recapture a high note sung by the soprano,
we do not really need to use samples of the function f (t) that correspond
to times when only the tenor was on stage singing. It would make more
sense to process only those measurements taken right around the time the
high note was sung by the soprano. This is short-time Fourier analysis, an
issue that we deal with in the appendix on wavelets.
4.10
From Real to Complex
Throughout this chapter we have limited the discussion to real data and
models involving only real coefficients and real-valued functions. Beginning
with the next chapter, we shall turn to complex data and complex-valued
models. Limiting the discussion to the real numbers comes at a price.
Although complex variables may not be as familiar to the reader as real
variables, there is some advantage in allowing the data and the models to
be complex, as is the common practice in signal processing. The algebra
is a bit simpler, in that we will no longer need to involve trigonometric
identities at every turn, and the results that we shall obtain are, in some
respects, better than those we obtained in this chapter.
Chapter 5
Complex Numbers
5.1
Chapter Summary
It is standard practice in signal processing to employ complex numbers
whenever possible. One of the main reasons for doing this is that it enables
us to represent the important sine and cosine functions in terms of complex exponential functions and to replace trigonometric identities with the
somewhat simpler rules for the manipulation of exponents. In this chapter
we review the basic algebra of complex numbers.
5.2
Definition and Basics
The complex numbers are the points in the x, y-plane: the complex number
z = (a, b) is identified with the point in the plane having a = Re(z), the
real part of z, for its x-coordinate and b = Im(z), the imaginary part of
z, for its y-coordinate. We call (a, b) the rectangular form of the complex
number z. The conjugate of the complex number z is z̄ = (a, −b). We can also represent z in its polar form: let the magnitude of z be |z| = \sqrt{a^2 + b^2} and the phase angle of z, denoted θ(z), be the angle in [0, 2π) with cos θ(z) = a/|z|. Then the polar form for z is

$$z = (|z|\cos\theta(z), |z|\sin\theta(z)).$$
Any complex number z = (a, b) for which the imaginary part Im(z) = b
is zero is identified with (treated the same as) its real part Re(z) = a;
that is, we identify a and z = (a, 0). These real complex numbers lie
along the x-axis in the plane, the so-called real line. If this were the whole
story complex numbers would be unimportant; but they are not. It is the
arithmetic associated with complex numbers that makes them important.
We add two complex numbers using their rectangular representations:
(a, b) + (c, d) = (a + c, b + d).
This is the same formula used to add two-dimensional vectors. We multiply
complex numbers more easily when they are in their polar representations:
the product of z and w has |z||w| for its magnitude and θ(z) + θ(w) modulo
2π for its phase angle. Notice that the complex number z = (0, 1) has θ(z) = π/2 and |z| = 1, so z² = (−1, 0), which we identify with the real number −1. This tells us that within the realm of complex numbers the real number −1 has a square root, i = (0, 1); note that −i = (0, −1) is also a square root of −1.
To multiply z = (a, b) = a + ib by w = (c, d) = c + id in rectangular form, we simply multiply the binomials

$$(a + ib)(c + id) = ac + ibc + iad + i^2 bd$$

and recall that i² = −1 to get

$$zw = (ac - bd, bc + ad).$$
If (a, b) is real, that is, if b = 0, then (a, b)(c, d) = (a, 0)(c, d) = (ac, ad),
which we also write as a(c, d). Therefore, we can rewrite the polar form for
z as
z = |z|(cos θ(z), sin θ(z)) = |z|(cos θ(z) + i sin θ(z)).
We will have yet another way to write the polar form of z when we consider
the complex exponential function.
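A quick numerical check (values chosen arbitrarily) of the rectangular product formula and of the polar rule that magnitudes multiply and phase angles add:

```python
import cmath

a, b = 2.0, 3.0      # z = a + ib
c, d = -1.0, 4.0     # w = c + id

z, w = complex(a, b), complex(c, d)

# Rectangular form: zw = (ac - bd, bc + ad).
rect_product = complex(a * c - b * d, b * c + a * d)
print(rect_product == z * w)                                   # True

# Polar form: |zw| = |z||w| and the phase angles add.
r1, th1 = cmath.polar(z)
r2, th2 = cmath.polar(w)
print(cmath.isclose(cmath.rect(r1 * r2, th1 + th2), z * w))    # True
```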
Exercise 5.1 Derive the formula for dividing one complex number in rectangular form by another (nonzero) one.
Exercise 5.2 Show that for any two complex numbers z and w we have

$$|zw| \geq \frac{1}{2}(z\bar{w} + \bar{z}w). \qquad (5.1)$$

Hint: Write |zw| as |z\bar{w}| and \bar{z}w as the conjugate of z\bar{w}.
Exercise 5.3 Show that, for any constant a with |a| ≠ 1, the function

$$G(z) = \frac{z - a}{1 - \bar{a}z}$$

has |G(z)| = 1 whenever |z| = 1.
5.3 Complex Numbers as Matrices
The rules for multiplying and dividing two complex numbers may seem
a bit ad hoc; everything works out in the end, but there seems to be a
lack of motivation for the definitions. In this section we take a different approach to complex numbers, thinking of them as special two-by-two
matrices. From this perspective, multiplication and division of complex
numbers become the usual matrix multiplication and multiplication by the
inverse, respectively.
Let K be the set of all two-by-two real matrices having the form
$$Z = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad (5.2)$$
where a and b are any real numbers. Let R be the subset of K consisting of
those matrices for which b = 0. Clearly, if we make the natural association
between the real numbers a and c and the matrices
$$A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix},$$
respectively, then the product AC of the two matrices is in R and is naturally associated with the real number ac. In fact, the set R, with the
usual matrix operations, is isomorphic to the set of real numbers, which
means that any differences between the two sets are merely superficial. In
the exercises that follow, we shall study the isomorphism between the set
K and the set of complex numbers.
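The isomorphism can be checked numerically; in the following sketch the helper name to_matrix is my own, not part of the text.

```python
import numpy as np

def to_matrix(z):
    """Return the member of K naturally associated with the complex number z."""
    a, b = z.real, z.imag
    return np.array([[a, -b],
                     [b,  a]])

z = 2.0 + 3.0j
w = -1.0 + 4.0j

ZW = to_matrix(z) @ to_matrix(w)           # ordinary matrix multiplication
print(np.allclose(ZW, to_matrix(z * w)))   # True: ZW is associated with zw
```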
Exercise 5.4
• a. Show that multiplying a matrix Z by a matrix of the form

$$D = \begin{pmatrix} d & 0 \\ 0 & d \end{pmatrix}$$

gives the matrix dZ.

• b. Let z = a + bi be the complex number naturally associated with the matrix Z, and w = c + di the complex number associated with the matrix

$$W = \begin{pmatrix} c & -d \\ d & c \end{pmatrix}.$$

Show that the matrix ZW is a member of K and is associated with the complex number zw.
Exercise 5.5 The matrix naturally associated with the real number 1 is
the identity matrix
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

since a = 1 and b = 0. Show that the matrix naturally associated with the purely imaginary number i = 0 + 1i, the matrix

$$E = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

has the property that E² = −I, so E is the square root of the matrix −I, just as i is the square root of −1.
Exercise 5.6 Relate the formula for the inverse of Z to the formula for
dividing a non-zero complex number by z. Note that the non-zero z are
naturally associated with the invertible matrices Z in K.
Exercise 5.7 Show that multiplying a two-dimensional column vector (x, y)^T by the matrix

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

rotates the vector (x, y)^T counter-clockwise through an angle θ, so that multiplying a complex number z = a + bi by the complex number cos θ + i sin θ rotates z the same way.
Chapter 6
Complex Exponential Functions
6.1 Chapter Summary
In signal processing, we are concerned with extracting information from
measured data. Often, the data are values of some underlying function of
one or several real variables. This function of interest may be the sum of
several simpler component functions from parameterized families and the
information we seek pertains to the number of these components and the
values of their parameters. For example, the function may be the sum of
trigonometric functions, each with an amplitude, a frequency and a phase.
For reasons of notational and computational convenience, such trigonometric functions are often replaced by complex exponential functions, the main
topic of this chapter.
6.2
The Complex Exponential Function
The most important function in signal processing is the complex-valued
function of the real variable x defined by
$$h(x) = \cos(x) + i\sin(x). \qquad (6.1)$$

For reasons that will become clear shortly, this function is called the complex exponential function. Notice that the magnitude of the complex number h(x) is always equal to one, since cos²(x) + sin²(x) = 1 for all real x.
Since the functions cos(x) and sin(x) are 2π-periodic, that is, cos(x+2π) =
cos(x) and sin(x + 2π) = sin(x) for all x, the complex exponential function
h(x) is also 2π-periodic.
6.2.1 Real Exponential Functions
In calculus we encounter functions of the form g(x) = a^x, where a > 0 is an arbitrary constant. These functions are the exponential functions, the most well-known of which is the function g(x) = e^x. Exponential functions are those with the property

$$g(u + v) = g(u)g(v) \qquad (6.2)$$

for every u and v. Recall from calculus that for exponential functions g(x) = a^x with a > 0 the derivative g'(x) is

$$g'(x) = a^x\ln(a) = g(x)\ln(a). \qquad (6.3)$$
Now we consider the function h(x) in light of these ideas.
6.2.2
Why is h(x) an Exponential Function?
We show now that the function h(x) in Equation (6.1) has the property
given in Equation (6.2), so we have a right to call it an exponential function;
that is, h(x) = c^x for some constant c. Since h(x) has complex values, the
constant c cannot be a real number, however.
Calculating h(u)h(v), we find
h(u)h(v) = (cos(u) cos(v) − sin(u) sin(v)) + i(cos(u) sin(v) + sin(u) cos(v))
= cos(u + v) + i sin(u + v) = h(u + v).
So h(x) is an exponential function; h(x) = c^x for some complex constant
c. Inserting x = 1, we find that c is
c = cos(1) + i sin(1).
Let’s find another way to express c, using Equation (6.3). Since
$$h'(x) = -\sin(x) + i\cos(x) = i(\cos(x) + i\sin(x)) = ih(x),$$

we conjecture that ln(c) = i; but what does this mean?
For a > 0 we know that b = ln(a) means that a = e^b. Therefore, we say that ln(c) = i means c = e^i; but what does it mean to take e to a complex power? To define e^i we turn to the Taylor series representation for the exponential function g(x) = e^x, defined for real x:

$$e^x = 1 + x + x^2/2! + x^3/3! + \cdots.$$

Inserting i in place of x and using the fact that i² = −1, we find that

$$e^i = (1 - 1/2! + 1/4! - \cdots) + i(1 - 1/3! + 1/5! - \cdots);$$
note that the two series are the Taylor series for cos(1) and sin(1), respectively, so e^i = cos(1) + i sin(1). Then the complex exponential function in Equation (6.1) is

$$h(x) = (e^i)^x = e^{ix}.$$

Inserting x = π, we get

$$h(\pi) = e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1,$$

or

$$e^{i\pi} + 1 = 0,$$

which is the remarkable relation discovered by Euler that combines the five most important constants in mathematics, e, π, i, 1, and 0, in a single equation.
Note that e^{2πi} = e^{0i} = e^0 = 1, so

$$e^{(2\pi + x)i} = e^{2\pi i}e^{ix} = e^{ix}$$

for all x.
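A small numerical check (not part of the text) that the Taylor series for e^i reproduces cos(1) + i sin(1), and that e^{iπ} + 1 vanishes to machine precision:

```python
import cmath
import math

terms = 30
e_i = sum((1j) ** n / math.factorial(n) for n in range(terms))   # partial Taylor sum for e^i
print(abs(e_i - complex(math.cos(1.0), math.sin(1.0))))          # ~1e-16

print(abs(cmath.exp(1j * math.pi) + 1.0))                        # ~1e-16
```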
6.2.3
What is e^z, for z complex?
We know from calculus what e^x means for real x, and now we also know what e^{ix} means. Using these we can define e^z for any complex number z = a + ib by e^z = e^{a+ib} = e^a e^{ib}.
We know from calculus how to define ln(x) for x > 0, and we have just defined ln(c) = i to mean c = e^i. But we could also say that ln(c) = i(1 + 2πk) for any integer k; that is, the periodicity of the complex exponential function forces the function ln(x) to be multi-valued.
For any nonzero complex number z = |z|e^{iθ(z)}, we have

$$\ln(z) = \ln(|z|) + \ln(e^{i\theta(z)}) = \ln(|z|) + i(\theta(z) + 2\pi k),$$

for any integer k. If z = a > 0 then θ(z) = 0 and ln(z) = ln(a) + i(kπ) for any even integer k; in calculus class we just take the value associated with k = 0. If z = a < 0 then θ(z) = π and ln(z) = ln(−a) + i(kπ) for any odd integer k. So we can define the logarithm of a negative number; it just turns out not to be a real number. If z = ib with b > 0, then θ(z) = π/2 and ln(z) = ln(b) + i(π/2 + 2πk) for any integer k; if z = ib with b < 0, then θ(z) = 3π/2 and ln(z) = ln(−b) + i(3π/2 + 2πk) for any integer k.
Adding e^{−ix} = cos(x) − i sin(x) to e^{ix} given by Equation (6.1), we get

$$\cos(x) = \frac{1}{2}(e^{ix} + e^{-ix});$$

subtracting, we obtain

$$\sin(x) = \frac{1}{2i}(e^{ix} - e^{-ix}).$$
These formulas allow us to extend the definition of cos and sin to complex arguments z:

$$\cos(z) = \frac{1}{2}(e^{iz} + e^{-iz})$$

and

$$\sin(z) = \frac{1}{2i}(e^{iz} - e^{-iz}).$$

In signal processing the complex exponential function is often used to describe functions of time that exhibit periodic behavior:

$$h(\omega t + \theta) = e^{i(\omega t + \theta)} = \cos(\omega t + \theta) + i\sin(\omega t + \theta),$$

where the frequency ω and phase angle θ are real constants and t denotes time. We can alter the magnitude by multiplying h(ωt + θ) by a positive constant |A|, called the amplitude, to get |A|h(ωt + θ). More generally, we can combine the amplitude and the phase, writing

$$|A|h(\omega t + \theta) = |A|e^{i\theta}e^{i\omega t} = Ae^{i\omega t},$$

where A is the complex amplitude A = |A|e^{iθ}. Many of the functions encountered in signal processing can be modeled as linear combinations of such complex exponential functions or sinusoids, as they are often called.
6.3
Complex Exponential Signal Models
In a previous chapter we considered signal models f(x) that are sums of trigonometric functions,

$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{L}\Big(a_k\cos(\omega_k x) + b_k\sin(\omega_k x)\Big), \qquad (6.4)$$

where the ω_k are known, but the a_k and b_k are not. Now that we see how to convert sines and cosines to complex exponential functions, using

$$\cos(\omega_k x) = \frac{1}{2}\Big(\exp(i\omega_k x) + \exp(-i\omega_k x)\Big) \qquad (6.5)$$

and

$$\sin(\omega_k x) = \frac{1}{2i}\Big(\exp(i\omega_k x) - \exp(-i\omega_k x)\Big), \qquad (6.6)$$

we can write f(x) as

$$f(x) = \sum_{m=-L}^{L} c_m \exp(i\omega_m x), \qquad (6.7)$$

where c_0 = \frac{1}{2}a_0,

$$c_k = \frac{1}{2}(a_k - ib_k), \qquad (6.8)$$

and

$$c_{-k} = \frac{1}{2}(a_k + ib_k), \qquad (6.9)$$

for k = 1, ..., L. Note that if the original coefficients a_k and b_k are real numbers, then c_{-m} = \overline{c_m}, the conjugate of c_m.
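A minimal sketch (frequencies and coefficients are illustrative) verifying that the real model (6.4) and the complex exponential model (6.7), with the c_k of Equations (6.8) and (6.9), agree:

```python
import numpy as np

omega = np.array([1.0, 2.5, 4.0])          # omega_1, ..., omega_L
a = np.array([0.7, -0.2, 0.1])             # a_1, ..., a_L
b = np.array([0.3, 0.4, -0.5])             # b_1, ..., b_L
a0 = 1.2

x = np.linspace(0.0, 2.0, 50)

# Real form (6.4).
f_real = 0.5 * a0 + sum(a[k] * np.cos(omega[k] * x) + b[k] * np.sin(omega[k] * x)
                        for k in range(len(omega)))

# Complex form (6.7): c_0 = a0/2, c_k = (a_k - i b_k)/2, c_{-k} = conj(c_k).
f_complex = 0.5 * a0 * np.ones_like(x, dtype=complex)
for k in range(len(omega)):
    c_k = 0.5 * (a[k] - 1j * b[k])
    f_complex += c_k * np.exp(1j * omega[k] * x) + np.conj(c_k) * np.exp(-1j * omega[k] * x)

print(np.allclose(f_real, f_complex.real), np.max(np.abs(f_complex.imag)))  # True, ~0
```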
6.4
Coherent and Incoherent Summation
We begin this section with an exercise.
Exercise 6.1 On a blank sheet of paper, draw a horizontal and vertical
axis. Starting at the origin, draw a vector with length one unit (a unit can
be, say, one inch), in an arbitrary direction. Now, from the tip of the first
vector, draw another vector of length one, again in an arbitrary direction.
Repeat this process several times, using M vectors in all. Now measure the
distance from the origin to the tip of the last vector drawn. Compare this
length with the number M , which would be the distance from the origin to
the tip of the last vector, if all the vectors had had the same direction.
This exercise reveals the important difference between coherent and incoherent summation, or, if you will, between constructive and destructive interference. Each of the unit vectors drawn can be thought of as a complex number e^{iθ_m}, where θ_m is its arbitrary angle. The distance from the origin to the tip of the last vector drawn is then

$$|e^{i\theta_1} + e^{i\theta_2} + \cdots + e^{i\theta_M}|. \qquad (6.10)$$

If all the angles θ_m are equal, then this distance is M; in all other cases the distance is quite a bit less than M. The distinction between coherent and incoherent summation plays a central role in signal processing, as well as in quantum physics, as we discuss briefly in the next section.
6.5
Uses in Quantum Electrodynamics
In his experiments with light, Newton discovered the phenomenon of partial
reflection. The proportion of the light incident on a glass surface that is
reflected varies with the thickness of the glass, but the proportion oscillates
between zero and about sixteen percent as the glass thickens. He tried to
explain this puzzling behavior, but realized that he had not obtained a
satisfactory explanation. In his beautiful small book “QED: The Strange
Theory of Light and Matter” [108], the physicist Richard Feynman illustrates how the quantum theory applied to light, quantum electrodynamics
or QED, can be used to unravel many phenomena involving the interaction
of light with matter, including the partial reflection observed by Newton,
the least time principle, the array of colors we see on the surface of an oily
mud puddle, and so on. He is addressing an audience of non-physicists,
including even some non-scientists, and avoids mathematics as much as
possible. The one mathematical notion that he uses repeatedly is the addition of two-dimensional vectors pointing in a variety of directions, that
is, coherent and incoherent summation. The vector sum is the probability
amplitude of the event being discussed, and the square of its length is the
probability of the event.
6.6
Using Coherence and Incoherence
Suppose we are given as data the M complex numbers d_m = e^{imγ}, for m = 1, ..., M, and we are asked to find the real number γ. We can exploit the ideas of the previous section to get our answer.
First of all, from the data we have been given, we cannot distinguish γ from γ + 2π, since, for all integers m,

$$e^{im(\gamma+2\pi)} = e^{im\gamma}e^{2m\pi i} = e^{im\gamma}(1) = e^{im\gamma}.$$

Therefore, we assume, from the beginning, that the γ we want to find lies in the interval [−π, π). Note that we could have selected any interval of length 2π, not necessarily [−π, π); if we have no prior knowledge of where γ is located, the intervals [−π, π) or [0, 2π) are the most obvious choices.
6.6.1
The Discrete Fourier Transform
Now we take any value ω in the interval [−π, π), multiply each of the numbers d_m by e^{−imω}, and sum over m to get

$$DFT_d(\omega) = \sum_{m=1}^{M} d_m e^{-im\omega}. \qquad (6.11)$$

The sum we denote by DFT_d will be called the discrete Fourier transform (DFT) of the data (column) vector d = (d_1, ..., d_M)^T. We define the column vector e_ω to be

$$e_\omega = (e^{i\omega}, e^{2i\omega}, ..., e^{iM\omega})^T, \qquad (6.12)$$

which allows us to write DFT_d = e_ω^† d, where the dagger denotes conjugate transformation of a matrix or vector.
Rewriting the exponential terms in the sum in Equation (6.11), we obtain

$$DFT_d(\omega) = \sum_{m=1}^{M} d_m e^{-im\omega} = \sum_{m=1}^{M} e^{im(\gamma-\omega)}. \qquad (6.13)$$
Performing this calculation for each ω in the interval [−π, π), we obtain the function DFT_d(ω). For each ω, the complex number DFT_d(ω) is the sum of M complex numbers, each having length one and angle θ_m = m(γ − ω). So long as ω is not equal to γ, these θ_m are all different, and DFT_d(ω) is an incoherent sum; consequently, |DFT_d(ω)| will be smaller than M. However, when ω = γ, each θ_m equals zero, and DFT_d(ω) = |DFT_d(ω)| = M; the reason for putting the minus sign in the exponent e^{−imω} is so that we get the term γ − ω, which is zero when γ = ω. We find the true γ by computing the value of |DFT_d(ω)| for finitely many values of ω, plotting the result, and looking for the highest value. Of course, it may well happen that
result and look for the highest value. Of course, it may well happen that
the true value ω = γ is not exactly one of the points we choose to plot;
it may happen that the true γ is half way between two of the plot’s grid
points, for example. Nevertheless, if we know in advance that there is only
one true γ, this approach will give us a good idea of its value.
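A minimal sketch of this procedure (the value of γ, the number of samples M, and the grid size are assumptions): compute |DFT_d(ω)| on a grid in [−π, π) and take the location of the largest value as the estimate of γ.

```python
import numpy as np

gamma = 0.7
M = 64
m = np.arange(1, M + 1)
d = np.exp(1j * m * gamma)                          # the data d_m = e^{i m gamma}

omegas = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
# DFT_d(omega) = sum_m d_m e^{-i m omega}, evaluated at every grid point.
dft = np.array([np.sum(d * np.exp(-1j * m * w)) for w in omegas])

estimate = omegas[np.argmax(np.abs(dft))]
print(estimate)                                      # close to gamma = 0.7
```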
In many applications, the number M will be quite large, as will be the
number of grid points we wish to use for the plot. This means that the
number DF Td (ω) is a sum of a large number of terms, and that we must
calculate this sum for many values of ω. Fortunately, there is a wonderful
algorithm, called the fast Fourier transform (FFT), that we can use for
this purpose.
6.7
Some Exercises on Coherent Summation
The exercises in this section are designed to make a bit more quantitative
the ideas of the previous sections pertaining to coherent and incoherent
summation. The formulas obtained in these exercises will be used repeatedly throughout the text.
Exercise 6.2 Show that if sin(x/2) ≠ 0 then

$$E_M(x) = \sum_{m=1}^{M} e^{imx} = e^{ix\left(\frac{M+1}{2}\right)}\frac{\sin(Mx/2)}{\sin(x/2)}. \qquad (6.14)$$

Hint: Note that E_M(x) is the sum of terms in a geometric progression;

$$E_M(x) = e^{ix} + (e^{ix})^2 + (e^{ix})^3 + \cdots + (e^{ix})^M = e^{ix}(1 - e^{iMx})/(1 - e^{ix}).$$
Now use the fact that, for any t, we have
$$1 - e^{it} = e^{it/2}(e^{-it/2} - e^{it/2}) = e^{it/2}(-2i)\sin(t/2).$$
Exercise 6.3 The Dirichlet kernel of size M is defined as

$$D_M(x) = \sum_{m=-M}^{M} e^{imx}.$$

Use Equation (6.14) to obtain the closed-form expression

$$D_M(x) = \frac{\sin((M + \frac{1}{2})x)}{\sin(\frac{x}{2})};$$

note that D_M(x) is real-valued.
Hint: Reduce the problem to that of Exercise 6.2 by factoring appropriately.
Exercise 6.4 Use the result in Equation (6.14) to obtain the closed-form expressions

$$\sum_{m=N}^{M} \cos(mx) = \cos\Big(\frac{M+N}{2}x\Big)\frac{\sin\big(\frac{M-N+1}{2}x\big)}{\sin\frac{x}{2}} \qquad (6.15)$$

and

$$\sum_{m=N}^{M} \sin(mx) = \sin\Big(\frac{M+N}{2}x\Big)\frac{\sin\big(\frac{M-N+1}{2}x\big)}{\sin\frac{x}{2}}. \qquad (6.16)$$

Hint: Recall that cos(mx) and sin(mx) are the real and imaginary parts of e^{imx}.
Exercise 6.5 Obtain the formulas in the previous exercise using the trigonometric identity

$$\sin\Big(\big(n + \frac{1}{2}\big)x\Big) - \sin\Big(\big(n - \frac{1}{2}\big)x\Big) = 2\sin\Big(\frac{x}{2}\Big)\cos(nx).$$
Exercise 6.6 Graph the function EM (x) for various values of M .
We note in passing that the function EM (x) equals M for x = 0 and
equals zero for the first time at x = 2π/M . This means that the main
lobe of EM (x), the inverted parabola-like portion of the graph centered at
x = 0, crosses the x-axis at x = 2π/M and x = −2π/M , so its height is M
and its width is 4π/M . As M grows larger the main lobe of EM (x) gets
higher and thinner.
In the exercise that follows we examine the resolving ability of the DFT.
Suppose we have M equi-spaced samples of a function f(x) having the form

$$f(x) = e^{ix\gamma_1} + e^{ix\gamma_2},$$

where γ_1 and γ_2 are in the interval (−π, π). If M is sufficiently large, the
DFT should show two peaks, at roughly the values ω = γ1 and ω = γ2 . As
the distance |γ2 − γ1 | grows smaller, it will require a larger value of M for
the DFT to show two peaks.
Exercise 6.7 For this exercise, we take γ1 = −α and γ2 = α, for some
α in the interval (0, π). Select a value of M that is greater than two and
calculate the values f (m) for m = 1, ..., M . Plot the graph of the function
|DF Td (ω)| on (−π, π). Repeat the exercise for various values of M and
values of α closer to zero. Notice how DF Td (0) behaves as α goes to zero.
For each fixed value of M there will be a critical value of α such that, for
any smaller values of α, DF Td (0) will be larger than DF Td (α). This is
loss of resolution.
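A hedged sketch of this resolution experiment (M, the values of α, and the grid are my own choices):

```python
import numpy as np

def dft_magnitude(M, alpha, n_grid=1024):
    m = np.arange(1, M + 1)
    d = np.exp(-1j * m * alpha) + np.exp(1j * m * alpha)   # the samples f(m)
    omegas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    dft = np.array([np.sum(d * np.exp(-1j * m * w)) for w in omegas])
    return omegas, np.abs(dft)

for alpha in [1.0, 0.3, 0.05]:
    omegas, mag = dft_magnitude(M=20, alpha=alpha)
    at_zero = mag[np.argmin(np.abs(omegas))]
    at_alpha = mag[np.argmin(np.abs(omegas - alpha))]
    resolved = at_alpha > at_zero            # are the two peaks still distinguishable?
    print(f"alpha = {alpha:5.2f}: |DFT(0)| = {at_zero:6.1f}, "
          f"|DFT(alpha)| = {at_alpha:6.1f}, resolved: {resolved}")
```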
6.8
Complications
In the real world, of course, things are not so simple. In most applications,
the data comes from measurements, and so contains errors, also called
noise. The noise terms that appear in each dm are usually viewed as
random variables, and they may or may not be independent. If the noise
terms are not independent, we say that we have correlated noise. If we know
something about the statistics of the noises, we may wish to process the
data using statistical estimation methods, such as the best linear unbiased
estimator (BLUE).
6.8.1
Multiple Signal Components
It sometimes happens that there are two or more distinct values of ω that
we seek. For example, suppose the data is
$$d_m = e^{im\alpha} + e^{im\beta},$$

for m = 1, ..., M, where α and β are two distinct numbers in the interval [0, 2π), and we need to find both α and β. Now the function DFT_d(ω) will be

$$DFT_d(\omega) = \sum_{m=1}^{M}\big(e^{im\alpha} + e^{im\beta}\big)e^{-im\omega} = \sum_{m=1}^{M} e^{im\alpha}e^{-im\omega} + \sum_{m=1}^{M} e^{im\beta}e^{-im\omega},$$

so that

$$DFT_d(\omega) = \sum_{m=1}^{M} e^{im(\alpha-\omega)} + \sum_{m=1}^{M} e^{im(\beta-\omega)}.$$

So the function DFT_d(ω) is the sum of the DFT_d(ω) that we would have obtained separately if we had had only α and only β.
6.8.2
Resolution
If the numbers α and β are well separated in the interval [0, 2π) or M is very large, the plot of |DFT_d(ω)| will show two high values, one near ω = α and one near ω = β. However, if M is smaller or α and β are too close together, the plot of |DFT_d(ω)| may show only one broader high bump, centered between α and β; this is loss of resolution. How close is too close, and where the loss of resolution occurs, will depend on the value of M.
6.8.3
Unequal Amplitudes and Complex Amplitudes
It is also often the case that the two signal components, the one from α and the one from β, are not equally strong. We could have

$$d_m = Ae^{im\alpha} + Be^{im\beta},$$

where A > B > 0. In fact, both A and B could be complex numbers, that is, A = |A|e^{iθ_1} and B = |B|e^{iθ_2}, so that

$$d_m = |A|e^{i(m\alpha+\theta_1)} + |B|e^{i(m\beta+\theta_2)}.$$

In stochastic signal processing, the A and B are viewed as random variables; A and B may or may not be mutually independent.
6.8.4
Phase Errors
It sometimes happens that the hardware that provides the measured data is imperfect and instead of giving us the values d_m = e^{imα}, we get d_m = e^{i(mα+φ_m)}. Now each phase error φ_m depends on m, which makes matters worse than when we had θ_1 and θ_2 previously, neither depending on the index m.
6.9
Undetermined Exponential Models
In our previous discussion, we assumed that the frequencies were known
and only the coefficients needed to be determined. The problem was then
a linear one. It is sometimes the case that we also want to estimate the
frequencies from the data. This is computationally more difficult and is a
nonlinear problem. Prony’s method is one approach to this problem.
The date of publication of [190] is often taken by editors to be a typographical error and is replaced by 1995; or, since it is not written in English,
perhaps 1895. But the 1795 date is the correct one. The mathematical
problem Prony solved arises also in signal processing, and his method for
solving it is still used today. Prony’s method is also the inspiration for the
eigenvector methods described in a later chapter.
6.9.1
Prony’s Problem
Prony considers a function of the form

$$f(x) = \sum_{n=1}^{N} a_n e^{\gamma_n x}, \qquad (6.17)$$

where we allow the a_n and the γ_n to be complex. If we take the γ_n = iω_n to be imaginary, f(x) becomes the sum of complex exponentials, which we discuss later; if we take γ_n to be real, then f(x) is the sum of real exponentials, either increasing or decreasing. The problem is to determine from samples of f(x) the number N, the γ_n, and the a_n.
6.9.2
Prony’s Method
Suppose that we have data f_m = f(m∆), for some ∆ > 0 and for m = 1, ..., M, where we assume that M = 2N. We seek a vector c with entries c_j, j = 0, ..., N, such that

$$c_0 f_{k+1} + c_1 f_{k+2} + c_2 f_{k+3} + \cdots + c_N f_{k+N+1} = 0, \qquad (6.18)$$

for k = 0, 1, ..., M − N − 1. So, we want a complex vector c in C^{N+1} orthogonal to M − N = N other vectors. In matrix-vector notation we are solving the linear system
$$\begin{pmatrix} f_1 & f_2 & \ldots & f_{N+1} \\ f_2 & f_3 & \ldots & f_{N+2} \\ \vdots & & & \vdots \\ f_N & f_{N+1} & \ldots & f_M \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$

which we write as Fc = 0. Since F^†Fc = 0 also, we see that c is an eigenvector associated with the eigenvalue zero of the hermitian nonnegative-definite matrix F^†F; here F^† denotes the conjugate transpose of the matrix F.
Fix a value of k and replace each of the f_{k+j} in Equation (6.18) with the value given by Equation (6.17) to get

$$0 = \sum_{n=1}^{N} a_n\Big[\sum_{j=0}^{N} c_j e^{\gamma_n(k+j+1)\Delta}\Big] = \sum_{n=1}^{N} a_n e^{\gamma_n(k+1)\Delta}\Big[\sum_{j=0}^{N} c_j (e^{\gamma_n\Delta})^j\Big].$$

Since this is true for each of the N fixed values of k, we conclude that the inner sum is zero for each n; that is,

$$\sum_{j=0}^{N} c_j (e^{\gamma_n\Delta})^j = 0,$$

for each n. Therefore, the polynomial

$$C(z) = \sum_{j=0}^{N} c_j z^j$$

has for its roots the N values z = e^{γ_n∆}. Once we find the roots of this polynomial we have the values of e^{γ_n∆}. If the γ_n are real, they are uniquely determined from the values e^{γ_n∆}, whereas, for non-real γ_n, this is not the case, as we saw when we studied the complex exponential functions.
Then, we obtain the a_n by solving a linear system of equations. In practice we would not know N, so we would overestimate N somewhat in selecting M. As a result, some of the a_n would be zero.
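A minimal sketch of Prony's method as just described (the exponents, coefficients, and ∆ are invented, and real γ_n are used so that they are uniquely determined from e^{γ_n∆}):

```python
import numpy as np

Delta = 0.1
N = 2
gammas_true = np.array([-0.5, -2.0])           # real exponents gamma_n
a_true = np.array([3.0, 1.5])                  # coefficients a_n

M = 2 * N
m = np.arange(1, M + 1)
f = np.array([np.sum(a_true * np.exp(gammas_true * t)) for t in m * Delta])

# Build the N x (N+1) system of Equation (6.18) and find a nonzero c with Fc = 0.
F = np.array([[f[k + j] for j in range(N + 1)] for k in range(M - N)])
_, _, Vh = np.linalg.svd(F)
c = Vh[-1]                                     # right singular vector for the smallest singular value

# The roots of C(z) = sum_j c_j z^j are z_n = exp(gamma_n * Delta).
roots = np.roots(c[::-1])                      # np.roots wants the highest degree first
gammas_est = np.log(roots.real) / Delta

# Recover the a_n from the linear system f_m = sum_n a_n exp(gamma_n * m * Delta).
E = np.exp(np.outer(m * Delta, gammas_est))
a_est = np.linalg.lstsq(E, f, rcond=None)[0]

print(np.sort(gammas_est), np.sort(a_est))     # close to the true values
```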
If we believe that the number N is considerably smaller than M , we do
not assume that 2N = M . Instead, we select L somewhat larger than we
believe N is and then solve the linear system

$$\begin{pmatrix} f_1 & f_2 & \ldots & f_{L+1} \\ f_2 & f_3 & \ldots & f_{L+2} \\ \vdots & & & \vdots \\ f_{M-L} & f_{M-L+1} & \ldots & f_M \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_L \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
This system has M − L equations and L + 1 unknowns, so is quite overdetermined. We would then use the least-squares approach to obtain the
vector c. Again writing the system as F c = 0, we note that the matrix
F † F is L+1 by L+1 and has λ = 0 for its lowest eigenvalue; therefore, it is
not invertible. When there is noise in the measurements, this matrix may
become invertible, but will still have at least one very small eigenvalue.
Finding the vector c in either case can be tricky because we are looking for a nonzero solution of a homogeneous system of linear equations.
For a discussion of the numerical issues involved in these calculations, the
interested reader should consult the book by Therrien [214].
Chapter 7
Transmission and Remote Sensing- II
7.1 Chapter Summary
An important example of the use of the DFT is the design of directional
transmitting or receiving arrays of antennas. In this chapter we revisit
transmission and remote sensing, this time with emphasis on the roles
played by complex exponential functions and the DFT.
7.2
Directional Transmission
Parabolic mirrors behind car headlamps reflect the light from the bulb, concentrating it directly ahead. Whispering at one focal point of an elliptical
room can be heard clearly at the other focal point. When I call to someone
across the street, I cup my hands in the form of a megaphone to concentrate the sound in that direction. In all these cases the transmitted signal
has acquired directionality. In the case of the elliptical room, not only does
the soft whispering reflect off the walls toward the opposite focal point,
but the travel times are independent of where on the wall the reflections
occur; otherwise, the differences in time would make the received sound
unintelligible. Parabolic satellite dishes perform much the same function,
concentrating incoming signals coherently. In this chapter we discuss the
use of amplitude and phase modulation of transmitted signals to concentrate the signal power in certain directions. Following the lead of Richard
Feynman in [109], we use radio broadcasting as a concrete example of the
use of directional transmission.
Radio broadcasts are meant to be received and the amount of energy
that reaches the receiver depends on the amount of energy put into the
transmission as well as on the distance from the transmitter to the receiver.
If the transmitter broadcasts a spherical wave front, with equal power in
all directions, the energy in the signal is the same over the spherical wavefronts, so that the energy per unit area is proportional to the reciprocal
of the surface area of the front. This means that, for omni-directional
broadcasting, the energy per unit area, that is, the energy supplied to any
receiver, falls off as the distance squared. The amplitude of the received
signal is then proportional to the reciprocal of the distance.
Returning to the example we studied previously, suppose that you own
a radio station in Los Angeles. Most of the population resides along the
north-south coast, with fewer to the east, in the desert, and fewer still to
the west, in the Pacific Ocean. You might well want to transmit the radio
signal in a way that concentrates most of the power north and south. But
how can you do this? The answer is to broadcast directionally. By shaping
the wavefront to have most of its surface area north and south you will
enable to have the broadcast heard by more people without increasing the
total energy in the transmission. To achieve this shaping you can use an
array of multiple antennas.
7.3 Multiple-Antenna Arrays
7.3.1 The Array of Equi-Spaced Antennas
We place 2N + 1 transmitting antennas a distance ∆ > 0 apart along an
east-west axis, as shown in Figure 7.1. For convenience, let the locations
of the antennas be n∆, n = −N, ..., N . To begin with, let us suppose that
we have a fixed frequency ω and each of the transmitting antennas sends out the same signal

$$f_n(t) = \frac{1}{\sqrt{2N+1}}\cos(\omega t).$$

With this normalization the total energy is independent of N. Let (x, y) be an arbitrary location on
the ground, and let s be the vector from the origin to the point (x, y).
Let θ be the angle measured clockwise from the positive horizontal axis
to the vector s. Let D be the distance from (x, y) to the origin. Then,
if (x, y) is sufficiently distant from the antennas, the distance from n∆ on
the horizontal axis to (x, y) is approximately D − n∆ cos(θ). The signals
arriving at (x, y) from the various antennas will have traveled for different
times and so will be out of phase with one another to a degree that depends
on the location of (x, y).
7.3.2
The Far-Field Strength Pattern
Since we are concerned only with wavefront shape, we omit for now the
distance-dependence in the amplitude of the received signal. The signal
received at (x, y) is proportional to

$$f(s, t) = \frac{1}{\sqrt{2N+1}}\sum_{n=-N}^{N}\cos(\omega(t - t_n)),$$

where

$$t_n = \frac{1}{c}(D - n\Delta\cos(\theta))$$

and c is the speed of propagation of the signal. Writing

$$\cos(\omega(t - t_n)) = \cos\Big(\omega\big(t - \frac{D}{c}\big) + n\gamma\cos(\theta)\Big)$$

for γ = ωΔ/c, we have

$$\cos(\omega(t - t_n)) = \cos\Big(\omega\big(t - \frac{D}{c}\big)\Big)\cos(n\gamma\cos(\theta)) - \sin\Big(\omega\big(t - \frac{D}{c}\big)\Big)\sin(n\gamma\cos(\theta)).$$

Using Equations (6.15) and (6.16), we find that the signal received at (x, y) is

$$f(s, t) = \frac{1}{\sqrt{2N+1}}A(\theta)\cos\Big(\omega\big(t - \frac{D}{c}\big)\Big) \qquad (7.1)$$

for

$$A(\theta) = \frac{\sin\big((N + \frac{1}{2})\gamma\cos(\theta)\big)}{\sin\big(\frac{1}{2}\gamma\cos(\theta)\big)};$$

when the denominator equals zero the signal equals \sqrt{2N+1}\,\cos(\omega(t - \frac{D}{c})).
7.3.3
Can the Strength be Zero?
We see from Equation (7.1) that the maximum power is in the north-south
direction. What about the east-west direction? In order to have negligible
signal power wasted in the east-west direction, we want the numerator, but
not the denominator, in Equation (7.1) to be zero when θ = 0. This means
that ∆ = mλ/(2N + 1), where λ = 2πc/ω is the wavelength and m is some
positive integer less than 2N + 1. Recall that the wavelength for broadcast
radio is tens to hundreds of meters.
Exercise 7.1 Graph the function A(θ) in polar coordinates for various
choices of N and ∆.
Figures at the end of this chapter show the transmission pattern A(θ)
for various choices of m and N . In Figure 7.2 N = 5 for each plot and
the m changes, illustrating the effect of changing the spacing of the array
elements. The plots in Figure 7.3 differ from those in Figure 7.2 only in
that N = 21 now. In Figure 7.4 we allow the m to be less than one, showing
the loss of the nulls in the east and west directions.
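A hedged sketch of how such plots can be produced (parameter choices are my own, and matplotlib is used for the polar plots):

```python
import numpy as np
import matplotlib.pyplot as plt

def pattern(theta, N, m):
    # With Delta = m * lambda / (2N+1), we have gamma*cos(theta) = 2*pi*m/(2N+1) * cos(theta).
    gamma_cos = 2 * np.pi * m / (2 * N + 1) * np.cos(theta)
    num = np.sin((N + 0.5) * gamma_cos)
    den = np.sin(0.5 * gamma_cos)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = num / den
    # Where the denominator vanishes, A(theta) takes its maximum value 2N+1.
    return np.where(np.abs(den) < 1e-12, 2.0 * N + 1.0, ratio)

theta = np.linspace(0, 2 * np.pi, 2000)
for m in [1, 2, 4, 8]:
    plt.polar(theta, np.abs(pattern(theta, N=5, m=m)), label=f"m = {m}")
plt.legend()
plt.show()
```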
7.3.4
Diffraction Gratings
I have just placed on the table next to me a CD, with the shinier side
up. Beyond it is a lamp. The CD acts as a mirror, and I see in the CD
the reflection of the lamp. Every point of the lamp seems to be copied in
a particular point on the surface of the CD, as if the ambient light that
illuminates a particular point of the lamp travels only to a single point on
the CD and then is reflected on into my eye. Each point of the lamp has its
own special point on the CD. We know from basic optics that that point
is such that the angle of incidence equals the angle of reflection, and the
path (apparently) taken by the light beam is the shortest path the light
can take to get from the lamp to the CD and then on to my eye. But how
does the light know where to go?
In fact, what happens is that light beams take many paths from each
particular point on the lamp to the CD and on to my eye. The reason I see
only the one path is that all the other paths require different travel times,
and so light beams on different paths arrive at my eye out of phase with
one another. Only those paths very close to the one I see have travel times
sufficiently similar to avoid this destructive interference. Speaking a bit
more mathematically, if we define the function that associates with each
path the time to travel along that path, then, at the shortest path, the
first derivative of this function, in the sense of the calculus of variations,
is zero. Therefore deviations from the shortest path correspond only to
second-order changes in travel time, not first-order ones, which reduces the
destructive interference.
But, as I look at the CD on the table, I see more than the reflection of
the lamp. I see streaks of color also. There is a window off to the side and
the sun is shining into the room through this window. When I place my
hand between the CD and the window, some of the colored streaks disappear, and other colored streaks seem to appear. I am not seeing a direct
reflection of the sun; it is off to the side. What is happening is that the
grooves on the surface of the CD are each reflecting sunlight and acting as
little transmitters. Each color in the spectrum corresponds to a particular
frequency ω of light and at just the proper angle the spacing between the
grooves on the CD leads to coherent transmission of the reflected light in
the direction of my eye. The combination of frequency and spacing between
the grooves determines what color I see and at what angle. When I reach
over and tilt the CD off the table, the colors of the streaks change, because
I have changed the spacing of the little transmitters, relative to my eye.
An arrangement like this is called a diffraction grating and has many uses
in physics. For a wonderful, and largely math-free, introduction to these
ideas, see the book by Feynman [108].
7.4 Phase and Amplitude Modulation
In the previous section the signal broadcast from each of the antennas was
the same. Now we look at what directionality can be obtained by using
different amplitudes and phases at each of the antennas. Let the signal
broadcast from the antenna at n∆ be
$$f_n(t) = |A_n|\cos(\omega t - \phi_n) = |A_n|\cos(\omega(t - \tau_n)),$$

for some amplitude |A_n| > 0 and phase φ_n = ωτ_n. Now the signal received at s is proportional to

$$f(s, t) = \sum_{n=-N}^{N} |A_n|\cos(\omega(t - t_n - \tau_n)). \qquad (7.2)$$
If we wish, we can repeat the calculations done earlier to see what the effect
of the amplitude and phase changes is. Using complex notation simplifies
things somewhat.
Let us consider a complex signal; suppose that the signal transmitted from the antenna at n∆ is g_n(t) = |A_n|e^{iω(t−τ_n)}. Then, the signal received at location s is proportional to

$$g(s, t) = \sum_{n=-N}^{N} |A_n|e^{i\omega(t - t_n - \tau_n)}.$$

Then we have

$$g(s, t) = B(\theta)e^{i\omega(t - \frac{D}{c})}$$

for A_n = |A_n|e^{−iφ_n}, x = \frac{ωΔ}{c}\cos(θ), and

$$B(\theta) = \sum_{n=-N}^{N} A_n e^{inx}.$$
Note that the complex amplitude function B(θ) depends on our choices of
N and ∆ and takes the form of a finite Fourier series or DFT. We can design
B(θ) to approximate the desired directionality by choosing the appropriate
complex coefficients An and selecting the amplitudes |An | and phases φn
accordingly. We can generalize further by allowing the antennas to be
spaced irregularly along the east-west axis, or even distributed irregularly
over a two-dimensional area on the ground.
7.5
Steering the Array
In our previous discussion, we selected An = 1 and φn = 0 for all n and
saw that the maximum transmitted power was along the north-to-south
axis. Suppose that we want to design a transmitting array that maximally
concentrates signal power in another direction. Theoretically, we could
physically rotate or steer the array until it ran along a different axis, and
then proceed as before, with An = 1 and φn = 0. This is not practical, in
most cases. There is an alternative, fortunately. We can “steer” the array
mathematically.
If A_n = 1, and

$$\phi_n = -\frac{n\Delta\omega}{c}\cos\alpha,$$

for some angle α, then, for x = \frac{ωΔ}{c}\cos(θ), we have

$$B(\theta) = \sum_{n=-N}^{N} e^{inx}e^{i\phi_n} = \sum_{n=-N}^{N} e^{in\frac{\omega\Delta}{c}(\cos\theta - \cos\alpha)}.$$
The maximum absolute value of B(θ) occurs when cos θ = cos α, or when
θ = α or θ = −α. Now the greatest power is concentrated in these directions. The point here is that we have altered the directionality of the transmission, not by physically moving the array of antennas, but by changing
the phases of the transmitted signals. This approach is sometimes called
phase steering. The same basic idea applies when we are receiving signals, rather than sending them. In radar and sonar, the array of sensors is
steered mathematically, by modifying the phases of the measured data, to
focus the sensitivity of the detecting array in a particular direction.
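A minimal sketch of phase steering (N, the spacing, and the steering angle α are assumptions): the peak of |B(θ)| moves to θ = α without physically moving the array.

```python
import numpy as np

N = 10
gamma = np.pi                      # omega*Delta/c, i.e. half-wavelength spacing
alpha = np.deg2rad(60.0)           # desired steering direction

theta = np.linspace(0, np.pi, 1801)
n = np.arange(-N, N + 1)

# B(theta) = sum_n exp(i * n * gamma * (cos(theta) - cos(alpha)))
B = np.exp(1j * np.outer(gamma * (np.cos(theta) - np.cos(alpha)), n)).sum(axis=1)

peak_direction = theta[np.argmax(np.abs(B))]
print(np.rad2deg(peak_direction))  # approximately 60 degrees
```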
7.6
Maximal Concentration in a Sector
In this section we take ∆ = πc/ω, so that ωΔ/c = π. Suppose that we want to concentrate the transmitted power in the directions θ corresponding to x = \frac{ωΔ}{c}\cos(θ) in the sub-interval [a, b] of the interval [−\frac{ωΔ}{c}, \frac{ωΔ}{c}]. Let u = (A_{−N}, ..., A_N)^T be the vector of coefficients for the function

$$B(x) = \sum_{n=-N}^{N} A_n e^{-inx}.$$
We want |B(x)| to be concentrated in the interval a ≤ x ≤ b.
Exercise 7.2 Show that

$$\frac{1}{2\pi}\int_{-\omega\Delta/c}^{\omega\Delta/c} |B(x)|^2\, dx = u^\dagger u,$$

and

$$\frac{1}{2\pi}\int_{a}^{b} |B(x)|^2\, dx = u^\dagger Qu,$$

where Q is the matrix with entries

$$Q_{mn} = \frac{1}{2\pi}\int_{a}^{b} \exp(i(m-n)x)\, dx.$$
Maximizing the concentration of power within the interval [a, b] is then
equivalent to finding the vector u that maximizes the ratio u† Qu/u† u.
The matrix Q is positive-definite, all its eigenvalues are positive, and the
optimal u is the eigenvector of Q associated with the largest eigenvalue.
This largest eigenvalue is the desired ratio and is always less than one. As
N increases this ratio approaches one, for any fixed sub-interval [a, b].
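A minimal sketch of this eigenvector calculation (the sector [a, b] and N are assumptions): build Q in closed form and take the eigenvector belonging to its largest eigenvalue.

```python
import numpy as np

N = 10
a, b = 0.5, 1.5                        # the sub-interval of [-pi, pi]

n = np.arange(-N, N + 1)
diff = n[:, None] - n[None, :]         # m - n for every pair of indices

# Q_{mn} = (1/2pi) * integral_a^b exp(i (m-n) x) dx, evaluated in closed form.
Q = np.where(diff == 0,
             (b - a) / (2 * np.pi),
             (np.exp(1j * diff * b) - np.exp(1j * diff * a))
             / (2j * np.pi * np.where(diff == 0, 1, diff)))

eigvals, eigvecs = np.linalg.eigh(Q)   # Q is hermitian
u = eigvecs[:, -1]                     # eigenvector for the largest eigenvalue
print(eigvals[-1])                     # fraction of power concentrated in [a, b]; less than one
```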
7.7
Higher Dimensional Arrays
Up to now, we have considered sensors placed within a one-dimensional
interval [−L, L] and signals propagating within a plane containing [−L, L].
In such an arrangement there is a bit of ambiguity; we cannot tell if a
signal is coming from the angle θ or the angle θ + π. When propagating
signals can come to the array from any direction in three-dimensional space,
there is greater ambiguity. To resolve the ambiguities, we can employ two- and three-dimensional arrays of sensors. To analyze the higher-dimensional
cases, it is helpful to use the wave equation.
7.7.1
The Wave Equation
In many areas of remote sensing, what we measure are the fluctuations
in time of an electromagnetic or acoustic field. Such fields are described
mathematically as solutions of certain partial differential equations, such
as the wave equation. A function u(x, y, z, t) is said to satisfy the threedimensional wave equation if
utt = c2 (uxx + uyy + uzz ) = c2 ∇2 u,
(7.3)
where utt denotes the second partial derivative of u with respect to the time
variable t twice and c > 0 is the (constant) speed of propagation. More
complicated versions of the wave equation permit the speed of propagation
c to vary with the spatial variables x, y, z, but we shall not consider that
here.
We use the method of separation of variables at this point, to get some
idea about the nature of solutions of the wave equation. Assume, for the
moment, that the solution u(t, x, y, z) has the simple form
$$u(t, x, y, z) = f(t)g(x, y, z). \qquad (7.4)$$
Inserting this separated form into the wave equation, we get

$$f''(t)g(x, y, z) = c^2 f(t)\nabla^2 g(x, y, z) \qquad (7.5)$$

or

$$f''(t)/f(t) = c^2\nabla^2 g(x, y, z)/g(x, y, z). \qquad (7.6)$$

The function on the left is independent of the spatial variables, while the one on the right is independent of the time variable; consequently, they must both equal the same constant, which we denote −ω². From this we have two separate equations,

$$f''(t) + \omega^2 f(t) = 0, \qquad (7.7)$$

and

$$\nabla^2 g(x, y, z) + \frac{\omega^2}{c^2}g(x, y, z) = 0. \qquad (7.8)$$
Equation (7.8) is the Helmholtz equation.
Equation (7.7) has for its solutions the functions f (t) = cos(ωt) and
sin(ωt). Functions u(t, x, y, z) = f (t)g(x, y, z) with such time dependence
are called time-harmonic solutions.
7.7.2
Planewave Solutions
Suppose that, beginning at time t = 0, there is a localized disturbance.
As time passes, that disturbance spreads out spherically. When the radius
of the sphere is very large, the surface of the sphere appears planar, to
an observer on that surface, who is said then to be in the far field. This
motivates the study of solutions of the wave equation that are constant on
planes; the so-called planewave solutions.
Let s = (x, y, z) and u(s, t) = u(x, y, z, t) = e^{iωt}e^{ik·s}. Then we can show that u satisfies the wave equation u_{tt} = c²∇²u for any real vector k, so long as ||k||² = ω²/c². This solution is a planewave associated with frequency
ω and wavevector k; at any fixed time the function u(s, t) is constant on
any plane in three-dimensional space having k as a normal vector.
In radar and sonar, the field u(s, t) being sampled is usually viewed as
a discrete or continuous superposition of planewave solutions with various
amplitudes, frequencies, and wavevectors. We sample the field at various
spatial locations s, for various times t. Here we simplify the situation a
bit by assuming that all the planewave solutions are associated with the
same frequency, ω. If not, we can perform an FFT on the functions of time
received at each sensor location s and keep only the value associated with
the desired frequency ω.
7.7.3 Superposition and the Fourier Transform
It is notationally convenient now to use the complex exponential functions

$$e^{i\omega t} = \cos(\omega t) + i\sin(\omega t)$$

instead of cos(ωt) and sin(ωt).
In the continuous superposition model, the field is

$$u(s, t) = e^{i\omega t}\int F(k)e^{ik\cdot s}\, dk. \qquad (7.9)$$

Our measurements at the sensor locations s give us the values

$$f(s) = \int F(k)e^{ik\cdot s}\, dk. \qquad (7.10)$$
The data are then Fourier transform values of the complex function F (k);
F (k) is defined for all three-dimensional real vectors k, but is zero, in
theory, at least, for those k whose squared length ||k||² is not equal to ω²/c². Our goal is then to estimate F(k) from measured values of its
Fourier transform. Since each k is a normal vector for its planewave field
component, determining the value of F (k) will tell us the strength of the
planewave component coming from the direction k.
7.7.4
The Spherical Model
We can imagine that the sources of the planewave fields are the points P
that lie on the surface of a large sphere centered at the origin. For each
P , the ray from the origin to P is parallel to some wavevector k. The
function F (k) can then be viewed as a function F (P ) of the points P . Our
measurements will be taken at points s inside this sphere. The radius of
the sphere is assumed to be orders of magnitude larger than the distance
between sensors. The situation is that of astronomical observation of the
heavens using ground-based antennas. The sources of the optical or electromagnetic signals reaching the antennas are viewed as lying on a large sphere
surrounding the earth. Distance to the sources is not considered now, and
all we are interested in are the amplitudes F (k) of the fields associated
with each direction k.
7.7.5
The Two-Dimensional Array
In some applications the sensor locations are essentially arbitrary, while
in others their locations are carefully chosen. Sometimes, the sensors are
collinear, as in sonar towed arrays. Figure 14.1 illustrates a line array.
Suppose now that the sensors are in locations s = (x, y, 0), for various
x and y; then we have a planar array of sensors. Then the dot product s · k
that occurs in Equation (7.10) is
$$s \cdot k = xk_1 + yk_2; \qquad (7.11)$$
we cannot see the third component, k3 . However, since we know the size
of the vector k, we can determine |k3 |. The only ambiguity that remains
is that we cannot distinguish sources on the upper hemisphere from those
on the lower one. In most cases, such as astronomy, it is obvious in which
hemisphere the sources lie, so the ambiguity is resolved.
The function F (k) can then be viewed as F (k1 , k2 ), a function of the
two variables k1 and k2 . Our measurements give us values of f (x, y), the
two-dimensional Fourier transform of F (k1 , k2 ). Because of the limitation
||k|| = ω/c, the function F(k_1, k_2) has bounded support. Consequently, its
Fourier transform cannot have bounded support. As a result, we can never
have all the values of f (x, y), and so cannot hope to reconstruct F (k1 , k2 )
exactly, even for noise-free data.
7.7.6
The One-Dimensional Array
If the sensors are located at points s having the form s = (x, 0, 0), then we
have a line array of sensors, as we discussed previously. The dot product
in Equation (7.10) becomes
$$s \cdot k = xk_1. \qquad (7.12)$$
Now the ambiguity is greater than in the planar array case. Once we have
k_1, we know that

$$k_2^2 + k_3^2 = \Big(\frac{\omega}{c}\Big)^2 - k_1^2, \qquad (7.13)$$
which describes points P lying on a circle on the surface of the distant
sphere, with the vector (k1 , 0, 0) pointing at the center of the circle. It
is said then that we have a cone of ambiguity. One way to resolve the
situation is to assume k3 = 0; then |k2 | can be determined and we have
remaining only the ambiguity involving the sign of k2 . Once again, in many
applications, this remaining ambiguity can be resolved by other means.
Once we have resolved any ambiguity, we can view the function F (k)
as F (k1 ), a function of the single variable k1 . Our measurements give us
values of f (x), the Fourier transform of F (k1 ). As in the two-dimensional
case, the restriction on the size of the vectors k means that the function
F (k1 ) has bounded support. Consequently, its Fourier transform, f (x),
cannot have bounded support. Therefore, we shall never have all of f (x),
and so cannot hope to reconstruct F (k1 ) exactly, even for noise-free data.
7.7.7 Limited Aperture
In both the one- and two-dimensional problems, the sensors will be placed within some bounded region, such as |x| ≤ A, |y| ≤ B for the two-dimensional problem, or |x| ≤ L for the one-dimensional case. The sizes of these bounded regions, in units of wavelength, are the apertures of the arrays. The larger these apertures are, the better the resolution of the reconstructions. In digital array processing there are only finitely many sensors, which then places added limitations on our ability to reconstruct the field amplitude function F(k).
7.7.8
Other Limitations on Resolution
In imaging regions of the earth from satellites in orbit there is a trade-off
between resolution and the time available to image a given site. Satellites
in geostationary orbit, such as weather and TV satellites, remain stationary,
relative to a fixed position on the earth’s surface, but to do so must orbit
22,000 miles above the earth. If we tried to image the earth from that
height, a telescope like the Hubble Space Telescope would have a resolution
of about 21 feet, due to the unavoidable blurring caused by the optics of
the lens itself. The Hubble orbits 353 miles above the earth, but because
it looks out into space, not down to earth, it only needs to be high enough
to avoid atmospheric distortions. Spy satellites operate in low Earth orbit
(LEO), about 200 miles above the earth, and achieve a resolution of about
2 or 3 inches, at the cost of spending only about 1 or 2 minutes over their
target. The satellites used in the GPS system maintain a medium Earth
orbit (MEO) at a height of about 12,000 miles, high enough to be seen over the horizon most of the time, but not so high as to require great power to
send their signals.
In the February 2003 issue of Harper’s Magazine there is an article on
“scientific apocalypse”, dealing with the search for near-earth asteroids.
These objects are initially detected by passive optical observation, as small
dots of reflected sunlight; once detected, they are then imaged by active
radar to determine their size, shape, rotation and such. Some Russian
astronomers are concerned about the near-earth asteroid Apophis 2004
MN4, which, they say, will pass within 30,000 km of earth in 2029, and
come even closer in 2036. This is closer to earth than the satellites in
geostationary orbit. As they say, “Stay tuned for further developments.”
7.8 An Example: The Solar-Emission Problem
In [23] Bracewell discusses the solar-emission problem. In 1942, it was
observed that radio-wave emissions in the one-meter wavelength range were
arriving from the sun. Were they coming from the entire disk of the sun
or were the sources more localized, in sunspots, for example? The problem
then was to view each location on the sun’s surface as a potential source of
these radio waves and to determine the intensity of emission corresponding
to each location.
For electromagnetic waves the propagation speed is the speed of light
in a vacuum, which we shall take here to be c = 3 × 10^8 meters per second. The wavelength λ for gamma rays is around one Angstrom, that is, 10^{−10} meters, which is about the diameter of an atom; for x-rays it is about one millimicron, or 10^{−9} meters. The visible spectrum has wavelengths that are a little less than one micron, that is, 10^{−6} meters, while infrared radiation (IR), predominantly associated with heat, has a wavelength somewhat
longer. Infrared radiation with a wavelength around 6 or 7 microns can be
used to detect water vapor; we use near IR, with a wavelength near that
of visible light, to change the channels on our TV sets. Shortwave radio
has a wavelength around one millimeter. Microwaves have wavelengths
between one centimeter and one meter; those used in radar imaging have
a wavelength about one inch and can penetrate clouds and thin layers of
leaves. Broadcast radio has a λ running from about 10 meters to 1000 meters. The so-called long radio waves can have wavelengths several thousand
meters long, necessitating clever methods of large-antenna design for radio
astronomy.
The sun has an angular diameter of 30 min. of arc, or one-half of a
degree, when viewed from earth, but the needed resolution was more like
3 min. of arc. Such resolution requires a radio telescope 1000 wavelengths
across, which means a diameter of 1km at a wavelength of 1 meter; in
1942 the largest military radar antennas were less than 5 meters across.
A solution was found, using the method of reconstructing an object from
line-integral data, a technique that surfaced again in tomography.
7.9
Another Example: Scattering in Crystallography
In [150] Körner reveals how surprised he was when he heard that large
amounts of computer time are spent by crystallographers computing Fourier
transforms numerically. He goes on to describe this application.
The structure to be analyzed consists of some finite number of particles
that will retransmit (scatter) in all directions any electromagnetic radiation
that hits them. A beam of monochromatic light with unit strength and
frequency ω is sent into the structure and the resulting scattered beams
are measured at some number of observation points.
We say that the scattering particles are located in space at the points r_m, m = 1, ..., M, and that the incoming light arrives as a planewave with wavevector k_0. Then the planewave field generated by the incoming light is

$$g(s, t) = e^{i\omega t}e^{ik_0\cdot s}.$$

What is received at each r_m is then

$$g(r_m, t) = e^{i\omega t}e^{ik_0\cdot r_m}.$$

We observe the scattered signals at s, where the retransmitted signal coming from r_m is

$$f(s, t) = e^{i\omega t}e^{ik_0\cdot r_m}e^{i\|s - r_m\|}.$$

When s is sufficiently remote from the scattering particles, the retransmitted signal from r_m arrives at s as a planewave with wavevector

$$k_m = \frac{\omega}{c}(s - r_m)/\|s - r_m\|.$$

Therefore, at s we receive

$$u(s, t) = e^{i\omega t}\sum_{m=1}^{M} e^{ik_m\cdot s}.$$
The objective is to determine the km , which will then tell us the locations rm of the scattering particles. To do this, we imagine an infinity
of possible locations r for the particles and define a(r) = 1 if r = rm for
some m, and a(r) = 0 otherwise. More precisely, we define a(r) as a sum
of unit-strength Dirac delta functions supported at the rm , a topic we shall
deal with later. At each r we obtain (in theory) a value of the function
A(k), the Fourier transform of the function a(r).
In practice, the crystallographers cannot measure the complex numbers
A(k), but only the magnitudes |A(k)|; the phase angle of A(k) is lost. This
presents the crystallographers with the phase problem, in which we must
estimate a function from values of the magnitude of its Fourier transform.
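As a rough numerical illustration (not from the text), the Python sketch below builds A(k) for a handful of hypothetical scatterer positions r_m and shows that the measured data |A(k)| discard the phase; the positions, grid, and units here are made-up assumptions.

import numpy as np

# Hypothetical scatterer positions r_m (arbitrary units); these are assumptions.
r = np.array([[0.0, 0.0], [1.3, 0.4], [0.7, 2.1]])

# A grid of wavevectors k at which we (in theory) sample A(k).
kx = np.linspace(-np.pi, np.pi, 64)
ky = np.linspace(-np.pi, np.pi, 64)
KX, KY = np.meshgrid(kx, ky, indexing="ij")

# A(k) = sum_m exp(i k . r_m): the Fourier transform of a sum of unit deltas at the r_m.
A = np.zeros_like(KX, dtype=complex)
for rm in r:
    A += np.exp(1j * (KX * rm[0] + KY * rm[1]))

magnitude = np.abs(A)   # what the crystallographers can measure
phase = np.angle(A)     # what is lost -- the "phase problem"
print(magnitude.max(), magnitude.min())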
In 1985, Hauptman and Karle won the Nobel Prize in Chemistry for
developing a new method for finding a(r) from such magnitude-only measurements. Their technique is highly mathematical. It is comforting to know that, although there
is no Nobel Prize in Mathematics, it is still possible to win the prize for
doing mathematics.
Figure 7.1: Antenna array and far-field receiver.

Figure 7.2: Transmission Pattern A(θ): m = 1, 2, 4, 8 and N = 5.

Figure 7.3: Transmission Pattern A(θ): m = 1, 2, 4, 8 and N = 21.

Figure 7.4: Transmission Pattern A(θ): m = 0.9, 0.5, 0.25, 0.125 and N = 21.
Part IV
Fourier Methods
Chapter 8
Fourier Analysis
8.1 Chapter Summary
The Fourier transform and Fourier series play major roles in signal and
image processing. They are useful in understanding the workings of a broad
class of linear systems. In transmission tomography, magnetic-resonance
imaging, radar, sonar and array processing in general, what we are able to
measure is related by the Fourier transform to what we are interested in.
8.2 The Fourier Transform
Let f (x) be a complex-valued function of the real variable x. The Fourier
transform (FT) of f (x), also called the Fourier integral, is the function
F (ω) defined for all real ω by
F(ω) = ∫_{−∞}^{∞} f(x) e^{ixω} dx.   (8.1)

If we know F(ω), we can recapture f(x) using the formula for the Inverse Fourier Transform (IFT)

f(x) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{−ixω} dω.   (8.2)
The Fourier transform is related to Fourier series, a topic that may be more
familiar.
In particular applications the variables x and ω will take on actual
physical meaning. If x is time, in which case we usually replace x with t,
the variable ω becomes frequency. If x is spatial, that is, position along
the x-axis, then ω is spatial frequency. Spatial frequencies become more
97
98
CHAPTER 8. FOURIER ANALYSIS
important when we consider functions of more than one variable, as in
image processing. In our theoretical discussions of Fourier transformation,
however, the variables x and ω have no physical significance.
There is one situation, which we encounter in the next section, in which
the use of the variable ω may cause some confusion and the reader is cautioned to be careful. In the unknown strength problem, we have both
a temporal frequency and a spatial frequency, so we need two different
variables. We are interested in what is received at various locations in the
far-field when a single-frequency signal is broadcast from the various points
of the interval [−L, L]. By convention, we denote by ω the fixed frequency
of signal that is broadcast. The strength of the signal broadcast at x is
f (x), and its Fourier transform, which we shall denote by F (γ), will then
be evaluated at points on a circle in the far-field. The variable γ will be
proportional to the cosine of the angle determined by the far-field point
and the x-axis. We use γ, and not ω, in this case, because ω is already
being used to denote the temporal frequency of the broadcast signal. In the
later sections and in the chapters that follow, we shall return to the original
choice of the variables.
As an example of Fourier transformation, consider the function F (ω) =
χΩ (ω) that is one for |ω| ≤ Ω, and zero otherwise. Inserting this function
into Equation (8.2), we get
f(x) = (1/2π) ∫_{−Ω}^{Ω} e^{−ixω} dω = (1/2π) ∫_{−Ω}^{Ω} cos(xω) dω,

since the sine function is odd and its integral is therefore zero. We can see easily that

f(0) = Ω/π.
For x ≠ 0, we perform the integration, and obtain

f(x) = (1/2π)(1/x)[sin(Ωx) − sin(−Ωx)] = sin(Ωx)/(πx).   (8.3)
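As a quick numerical check (not part of the text), the following Python sketch approximates the inverse-FT integral in Equation (8.2) for F(ω) = χΩ(ω) by a Riemann sum and compares it with sin(Ωx)/(πx); Ω and the grid sizes are arbitrary assumptions.

import numpy as np

Omega = 5.0
omega = np.linspace(-Omega, Omega, 4001)     # integration grid over [-Omega, Omega]
domega = omega[1] - omega[0]

for xv in np.linspace(-3.0, 3.0, 7):
    # Riemann-sum approximation of (1/2pi) * integral of exp(-i x w) dw over [-Omega, Omega]
    f_num = np.sum(np.exp(-1j * xv * omega)) * domega / (2 * np.pi)
    f_formula = Omega / np.pi if xv == 0 else np.sin(Omega * xv) / (np.pi * xv)
    print(xv, f_num.real, f_formula)          # the last two columns should agree closely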
8.3 The Unknown Strength Problem Again
To help us appreciate the role of the Fourier transform in remote sensing,
we revisit the unknown strength problem discussed earlier.
In our previous discussion, we assumed that each point x in the interval
[−L, L] was sending the signal f (x) sin ωt, where the value of f (x) was
the strength of the signal sent from x. Because we had not yet introduced
complex exponential functions, it was necessary to rely on sines and cosines
throughout. As you may recall, this required the use of trigonometric
identities and led to somewhat involved calculations. In addition, to obtain
the Fourier coefficients, it was necessary to combine the readings at two
different locations 180 degrees apart. Now we want to make use of complex
exponential functions to simplify the calculations.
Note that in the discussion of the transmission problems the variable ω
is the frequency of the signal transmitted, not the argument of the Fourier
transform, as it is elsewhere in this chapter.
Because sin ωt can be written as
sin ωt = (1/2i)(e^{iωt} − e^{−iωt}),
we shall consider the purely theoretical problem of finding what each point
P in the far-field would receive if each point x is sending only the signal
f (x)eiωt , where
f(x) = |f(x)| e^{iφ(x)},
with |f (x)| ≥ 0 the strength of the signal, and φ(x) its phase. We shall
return to the original problem at the end.
The same far-field assumption we used previously tells us that the point
P receives from x a delayed version of what x sent; the point P receives
f(x) exp(iω(t − (D − x cos θ)/c)) = f(x) exp(i (ω cos θ/c) x) exp(iωt) exp(−i ωD/c).
What P receives from all the points x in [−L, L] is then
exp(iωt) exp(−i ωD/c) ∫_{−L}^{L} f(x) exp(i (ω cos θ/c) x) dx.

Ignoring the first two factors, which do not depend on what is coming from the points x, we see that what P receives is F(ω cos θ/c), which we can write as F(γ), where γ = ω cos θ/c.

So, by measuring what each point P in the far-field receives, we obtain values of F(γ), the Fourier transform of the function f(x), for any value of the variable γ in the interval [−ω/c, ω/c].
To get back to the original problem, in which the point x sends f (x) sin ωt,
we simply repeat the derivation in the previous paragraphs, but imagine
that the point x now sends the signal f (x)e−iωt . Then P receives
exp(−iωt) exp(i ωD/c) ∫_{−L}^{L} f(x) exp(−i (ω cos θ/c) x) dx.
Combining what P receives in the two cases, we get back what we found
in our earlier discussion.
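To make the relationship concrete, here is a small Python sketch (an illustration, not from the text) that forms the far-field value received at angle θ for a made-up strength function f(x), then removes the two known factors to recover F(γ) at γ = ω cos θ/c; the choices of f, L, D, ω, and c are assumptions.

import numpy as np

L, D, t = 1.0, 100.0, 0.0
omega, c = 2 * np.pi * 3.0, 3.0              # broadcast frequency and speed (toy units)
x = np.linspace(-L, L, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2) * (1 + 0.5 * np.cos(3 * x))    # made-up strength function on [-L, L]

def F(gamma):
    # F(gamma) = integral of f(x) exp(i gamma x) dx over [-L, L], by Riemann sum
    return np.sum(f * np.exp(1j * gamma * x)) * dx

for theta in (0.0, np.pi / 3, np.pi / 2):
    gamma = omega * np.cos(theta) / c
    # What P receives: the superposition of delayed signals from every x in [-L, L].
    received = np.exp(1j * omega * t) * np.exp(-1j * omega * D / c) * F(gamma)
    # Removing the two known factors recovers F(gamma).
    recovered = received / (np.exp(1j * omega * t) * np.exp(-1j * omega * D / c))
    print(theta, F(gamma), recovered)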
The point here is that we can simplify our calculations by using complex exponential signals and complex exponential functions in the definition of the Fourier transform, without losing anything. While it is true that
what is actually sent and received involves only real-valued functions, not
complex-valued ones, we can always return to the real case by expressing
the complex exponential functions in terms of sines and cosines. We are
simply replacing the more complicated calculations of trigonometric identities with the simpler algebra of exponential functions. This is standard
practice throughout signal processing.
8.4 Two-Dimensional Fourier Transforms
More generally, we consider a function f (x, y) of two real variables. Its
Fourier transformation is
F(α, β) = ∫∫ f(x, y) e^{i(xα+yβ)} dx dy.   (8.4)

For example, suppose that f(x, y) = 1 for √(x² + y²) ≤ R, and zero otherwise. Then we have

F(α, β) = ∫_{−π}^{π} ∫_0^R e^{i(αr cos θ + βr sin θ)} r dr dθ.   (8.5)
In polar coordinates, with α = ρ cos φ and β = ρ sin φ, we have
F(ρ, φ) = ∫_0^R ∫_{−π}^{π} e^{irρ cos(θ−φ)} dθ r dr.   (8.6)

The inner integral is well known;

∫_{−π}^{π} e^{irρ cos(θ−φ)} dθ = 2π J_0(rρ),   (8.7)

where J_0 and J_n denote the 0th order and nth order Bessel functions, respectively. Using the following identity

∫_0^z t^n J_{n−1}(t) dt = z^n J_n(z),   (8.8)

we have

F(ρ, φ) = (2πR/ρ) J_1(ρR).   (8.9)
Notice that, since f (x, y) is a radial function, that is, dependent only on
the distance from (0, 0) to (x, y), its Fourier transform is also radial.
The first positive zero of J1 (t) is around t = 4, so when we measure
F at various locations and find F (ρ, φ) = 0 for a particular (ρ, φ), we can
estimate R ≈ 4/ρ. So, even when a distant spherical object, like a star,
is too far away to be imaged well, we can sometimes estimate its size by
finding where the intensity of the received signal is zero [150].
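A brief numerical sanity check (an illustration, not from the text): the sketch below evaluates the double integral for the disk of radius R and compares it with 2πR J_1(ρR)/ρ, then estimates R from the first zero of J_1 (near t ≈ 3.83, the "around t = 4" of the text). R, ρ, and the grids are assumptions, and SciPy's Bessel routines are used.

import numpy as np
from scipy.special import j1, jn_zeros

R, rho = 2.0, 1.5

# Direct evaluation of F(rho, phi) as an integral over the disk (phi = 0 by radial symmetry).
r = np.linspace(0.0, R, 400)
theta = np.linspace(-np.pi, np.pi, 800)
dr, dtheta = r[1] - r[0], theta[1] - theta[0]
rr, tt = np.meshgrid(r, theta, indexing="ij")
F_numeric = np.sum(np.exp(1j * rr * rho * np.cos(tt)) * rr) * dr * dtheta

F_formula = 2 * np.pi * R * j1(rho * R) / rho
print(F_numeric.real, F_formula)              # should agree closely

# Estimating R from the first zero: F vanishes first at rho0 = (first zero of J1) / R.
rho0 = jn_zeros(1, 1)[0] / R
print("first zero at rho =", rho0, " estimated R =", jn_zeros(1, 1)[0] / rho0)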
8.4.1 Two-Dimensional Fourier Inversion
Just as in the one-dimensional case, the Fourier transformation that produced F (α, β) can be inverted to recover the original f (x, y). The Fourier
Inversion Formula in this case is
f(x, y) = (1/4π²) ∫∫ F(α, β) e^{−i(αx+βy)} dα dβ.   (8.10)

It is important to note that this procedure can be viewed as two one-dimensional Fourier inversions: first, we invert F(α, β), as a function of, say, β only, to get the function of α and y

g(α, y) = (1/2π) ∫ F(α, β) e^{−iβy} dβ;   (8.11)

second, we invert g(α, y), as a function of α, to get

f(x, y) = (1/2π) ∫ g(α, y) e^{−iαx} dα.   (8.12)
If we write the functions f (x, y) and F (α, β) in polar coordinates, we obtain
alternative ways to implement the two-dimensional Fourier inversion. We
shall consider these other ways when we discuss the tomography problem
of reconstructing a function f (x, y) from line-integral data.
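For the Cartesian version, this separation into two one-dimensional inversions is exactly how discrete two-dimensional transforms are usually computed. A small Python sketch (an illustration with a made-up array, using the discrete FFT as a stand-in for the continuous integrals):

import numpy as np

# A made-up 2-D array standing in for samples of F(alpha, beta).
rng = np.random.default_rng(0)
F = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

# Full 2-D inverse transform in one call ...
f_direct = np.fft.ifft2(F)

# ... equals an inverse transform along beta (axis 1) followed by one along alpha (axis 0).
g = np.fft.ifft(F, axis=1)            # analogue of Equation (8.11)
f_two_pass = np.fft.ifft(g, axis=0)   # analogue of Equation (8.12)

print(np.allclose(f_direct, f_two_pass))   # True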
8.5 Fourier Series and Fourier Transforms
When the function F (ω) is zero outside of some finite interval, there is a
useful relationship between the Fourier coefficients of F (ω) and its inverse
Fourier transform, f (x).
8.5.1 Support-Limited F (ω)
Suppose now that F (ω) is zero, except for ω in the interval [−Ω, Ω]. We
then say that F (ω) is support-limited to the band [−Ω, Ω]. Then F (ω) has
a Fourier series expansion
F(ω) = Σ_{n=−∞}^{+∞} a_n e^{i(π/Ω)nω},   (8.13)

where the Fourier coefficients a_n are given by

a_n = (1/2Ω) ∫_{−Ω}^{Ω} F(ω) e^{−i(π/Ω)nω} dω.   (8.14)
Comparing Equations (8.2) and (8.14), we see that a_n = (π/Ω) f(nπ/Ω). With ∆ = π/Ω, we can write

F(ω) = ∆ Σ_{n=−∞}^{+∞} f(n∆) e^{iωn∆}.   (8.15)
8.5.2 Shannon’s Sampling Theorem
This tells us that if F(ω) is zero outside the interval [−Ω, Ω], then F(ω) can be completely determined by the values of its inverse Fourier transform f(x) at the infinite discrete set of points x = nπ/Ω. Once we have determined F(ω) from these discrete samples, as they are called, we can also determine all of the function f(x), by applying the inversion formula in Equation (8.2). Inserting F(ω) as given in Equation (8.15) into the integral in Equation (8.2), and using Equation (8.3), we get
f(x) = Σ_{n=−∞}^{+∞} f(n∆) sin(Ω(n∆ − x))/(Ω(n∆ − x)).   (8.16)
This result is known as Shannon’s Sampling Theorem.
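The following Python sketch (an illustration with made-up parameters, and with the infinite sum in Equation (8.16) necessarily truncated) reconstructs a band-limited test signal from its samples f(n∆) with ∆ = π/Ω:

import numpy as np

Omega = 10.0
Delta = np.pi / Omega

def f(x):
    # A band-limited test function: its FT is supported in [-Omega, Omega].
    return np.sinc(Omega * x / np.pi) + 0.5 * np.sinc(Omega * (x - 1.0) / np.pi)

n = np.arange(-200, 201)                 # truncation of the doubly infinite sum
samples = f(n * Delta)

def reconstruct(x):
    # Shannon interpolation: samples weighted by sin(Omega(n Delta - x)) / (Omega(n Delta - x)).
    return np.sum(samples * np.sinc(Omega * (n * Delta - x) / np.pi))

for x in (0.13, 0.77, 2.5):
    print(x, f(x), reconstruct(x))       # the two columns should agree closely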
8.5.3 Sampling Terminology
In electrical engineering it is common to consider frequency in units of cycles per second, or Hertz, and to denote frequency by the variable f, not to be confused with the function f(x), where 2πf = ω. When we say that ω lies in the interval [−Ω, Ω], we are also saying that f lies in the interval [−Ω/2π, Ω/2π]. Then

∆ = π/Ω = 1/(2f_max),

where f_max is the largest value of f involved. For this reason, we sometimes speak of the sampling rate as

1/∆ = 2f_max,

and say that the appropriate sampling rate is twice the highest frequency involved.
It is important to remember that this rule of thumb that the appropriate sampling rate is twice the highest frequency, measured in Hertz, has
meaning only in the context of Shannon’s Sampling Theorem, which deals
with infinite sequences of data.
8.5.4 What Shannon Does Not Say
It is important to remember that Shannon’s Sampling Theorem tells us that
the doubly infinite sequence of values {f(n∆)}_{n=−∞}^{∞} is sufficient to recover exactly the function F(ω) and, thereby, the function f(x). Therefore, sampling at the rate of twice the highest frequency (in Hertz) is sufficient only
when we have the complete doubly infinite sequence of samples. Of course,
in practice, we never have an infinite number of values of anything, so
the rule of thumb expressed by Shannon’s Sampling Theorem is not valid.
Since we know that we will end up with only finitely many samples, each
additional data value is additional information. There is no reason to stick
to the sampling rate of twice the highest frequency.
Exercise 8.1 Let ∆ = π, fm = f (m), and gm = g(m). Use the orthogonality of the functions eimω on [−π, π] to establish Parseval’s equation:
⟨f, g⟩ = Σ_{m=−∞}^{∞} f_m \overline{g_m} = (1/2π) ∫_{−π}^{π} F(ω) \overline{G(ω)} dω,

from which it follows that

⟨f, f⟩ = (1/2π) ∫_{−∞}^{∞} |F(ω)|² dω.
Exercise 8.2 Let f (x) be defined for all real x and let F (ω) be its FT. Let
g(x) = Σ_{k=−∞}^{∞} f(x + 2πk),

assuming the sum exists. Show that g is a 2π-periodic function. Compute its Fourier series and use it to derive the Poisson summation formula:

Σ_{k=−∞}^{∞} f(2πk) = (1/2π) Σ_{n=−∞}^{∞} F(n).

8.5.5 Sampling from a Limited Interval
It is often the case that we have the opportunity to extract as many values
of f (x) as we desire, provided we take x within some fixed interval. If x = t
is time, for example, the signal f (t) may die out rapidly, so that we can
take measurements of f (t) only for t in an interval [0, T ], say. Do we limit
ourselves to a sampling rate of twice the highest frequency, if by doing that
we obtain only a small number of values of f (t)? No! We should oversample, and take data at a faster rate, to get more values of f (t). How we
then process this over-sampled data becomes an important issue, and noise
is ultimately the limiting factor in how much information we can extract
from over-sampled data.
In the next section we take a closer look at the problems presented by
the finiteness of the data.
8.6 The Problem of Finite Data
In practice, of course, we never have infinite sequences; we have finitely
many data points. In a number of important applications, such as sonar,
radar, and medical tomography, the object of interest will be represented
by the function F (ω), or a multi-dimensional version, and the data will be
finitely many values of f (x). Our goal is then to estimate F (ω) from the
data.
Suppose, for example, that F(ω) = 0 for |ω| > Ω, that ∆ = π/Ω, and that we have the values f(n∆), for n = 0, 1, ..., N − 1. Motivated by Equation (8.15), we may take as an estimate of the function F(ω) the discrete Fourier transform (DFT) of the data from the function f(x), which is the finite sum

DFT(ω) = ∆ Σ_{n=0}^{N−1} f(n∆) e^{in∆ω},   (8.17)
defined for |ω| ≤ Ω. It is good to note that the DFT is consistent with
the data, meaning that, if we insert DF T (ω) into the integral in Equation
(8.2) and set x = n∆, for any n = 0, 1, ..., N − 1 the result is exactly the
data value f (n∆).
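A small Python sketch (an illustration with made-up samples) forms the DFT estimate of Equation (8.17) and checks this consistency property numerically, i.e. that inserting DFT(ω) into Equation (8.2) and setting x = n∆ returns the data; Ω, N, and the data values are assumptions.

import numpy as np

Omega = np.pi
Delta = np.pi / Omega
N = 8
rng = np.random.default_rng(1)
data = rng.standard_normal(N)    # made-up values of f(n * Delta)
n = np.arange(N)

# DFT(omega) = Delta * sum_n f(n Delta) exp(i n Delta omega), Equation (8.17)
w = np.linspace(-Omega, Omega, 20001)
dw = w[1] - w[0]
DFT_w = np.array([Delta * np.sum(data * np.exp(1j * n * Delta * wi)) for wi in w])

# Consistency: (1/2pi) * integral of DFT(w) exp(-i x w) dw over [-Omega, Omega] at x = m*Delta
for m in range(N):
    recovered = np.sum(DFT_w * np.exp(-1j * m * Delta * w)) * dw / (2 * np.pi)
    print(m, data[m], recovered.real)   # the last two columns should nearly match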
8.7 Best Approximation
The basic problem here is to estimate F (ω) from finitely many values of
f (x), under the assumption that F (ω) = 0 for |ω| > Ω, for some Ω > 0.
Since we do not have all of f (x), the best we can hope to do is to approximate F (ω) in some sense. To help us understand how best approximation
works, we consider the orthogonality principle.
8.7.1 The Orthogonality Principle
Imagine that you are standing and looking down at the floor. The point
B on the floor that is closest to the tip of your nose, which we label F ,
is the unique point on the floor such that the vector from B to any other
point A on the floor is perpendicular to the vector from B to F ; that is,
F B · AB = 0. This is a simple illustration of the orthogonality principle.
When two vectors are perpendicular to one another, their dot product
is zero. This idea can be extended to functions. We say that two functions
F (ω) and G(ω) defined on the interval [−Ω, Ω] are orthogonal if
∫_{−Ω}^{Ω} F(ω)G(ω) dω = 0.   (8.18)
Suppose that Gn (ω), n = 0, ..., N − 1, are known functions, and
A(ω) = Σ_{n=0}^{N−1} a_n G_n(ω),
for any coefficients an . We want to minimize the approximation error
∫_{−Ω}^{Ω} |F(ω) − A(ω)|² dω,   (8.19)
over all coefficients an . Suppose that the best choices are an = bn . The
orthogonality principle tells us that the best approximation
B(ω) = Σ_{n=0}^{N−1} b_n G_n(ω)
is such that the function F (ω) − B(ω) is orthogonal to A(ω) − B(ω) for
every choice of the an .
Suppose that we fix m and select a_n = b_n, for n ≠ m, and a_m = b_m + 1.
Then we have
∫_{−Ω}^{Ω} (F(ω) − B(ω)) G_m(ω) dω = 0.   (8.20)
We can use Equation (8.20) to help us find the best bn .
From Equation (8.20) we have
∫_{−Ω}^{Ω} F(ω) G_m(ω) dω = Σ_{n=0}^{N−1} b_n ∫_{−Ω}^{Ω} G_n(ω) G_m(ω) dω.

Since we know the G_n(ω), we know the integrals

∫_{−Ω}^{Ω} G_n(ω) G_m(ω) dω.

If we can learn the values

∫_{−Ω}^{Ω} F(ω) G_m(ω) dω
from measurements, then we simply solve a system of linear equations to
find the bn .
8.7.2 An Example
Suppose that we have measured the values f (xn ), for n = 0, ..., N − 1,
where the xn are arbitrary real numbers. Then, from these measurements,
we can find the best approximation of F (ω) of the form
A(ω) = Σ_{n=0}^{N−1} a_n G_n(ω),

if we select G_n(ω) = e^{iωx_n}.
8.7.3 The DFT as Best Approximation
Suppose now that our data values are f(n∆), for n = 0, 1, ..., N − 1, where we have chosen ∆ = π/Ω. We can view the DFT as a best approximation of
the function F (ω) over the interval [−Ω, Ω], in the following sense. Consider
all functions of the form
A(ω) =
N
−1
X
an ein∆ω ,
(8.21)
n=0
where the best coefficients an = bn are to be determined. Now select those
bn for which the approximation error
∫_{−Ω}^{Ω} |F(ω) − A(ω)|² dω   (8.22)
is minimized. Then it is easily shown that these optimal bn are precisely
bn = ∆f (n∆),
for n = 0, 1, ..., N − 1.
Exercise 8.3 Show that bn = ∆f (n∆), for n = 0, 1, ..., N − 1, are the
optimal coefficients.
The DFT estimate is reasonably accurate when N is large, but when
N is not large there are usually better ways to estimate F (ω), as we shall
see.
8.7.4 The Modified DFT (MDFT)
We suppose, as in the previous subsection, that F(ω) = 0, for |ω| > Ω, and that our data values are f(n∆), for n = 0, 1, ..., N − 1. It is often convenient to use a sampling interval ∆ that is smaller than π/Ω in order to obtain more data values. Therefore, we assume now that ∆ < π/Ω. Once again, we seek the function of the form

A(ω) = Σ_{n=0}^{N−1} a_n e^{in∆ω},   (8.23)
defined for |ω| ≤ Ω, for which the error measurement

∫_{−Ω}^{Ω} |F(ω) − A(ω)|² dω

is minimized.
In the previous example, for which ∆ = π/Ω, we have

∫_{−Ω}^{Ω} e^{i(n−m)∆ω} dω = 0,

for m ≠ n. As the reader will discover in doing Exercise 8.3, this greatly simplifies the system of linear equations that we need to solve to get the optimal b_n. Now, because ∆ ≠ π/Ω, we have

∫_{−Ω}^{Ω} e^{i(n−m)∆ω} dω = 2 sin((n − m)∆Ω)/((n − m)∆),
which is not zero when n ≠ m. This means that we have to solve a
more complicated system of linear equations in order to find the bn . It is
important to note that the optimal bn are not equal to ∆f (n∆) now, so
the DFT is not the optimal approximation. The best approximation in this
case we call the modified DFT (MDFT).
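A sketch of how the MDFT coefficients might be computed in practice (an illustration, not the author's code): with G_n(ω) = e^{in∆ω}, the normal equations from the orthogonality principle have the integrals above as matrix entries, and the right-hand sides are 2π f(m∆), which follows from Equation (8.2). The test function F, Ω, N, and ∆ are assumptions.

import numpy as np

Omega = np.pi
Delta = 0.5 * np.pi / Omega          # oversampled: Delta < pi / Omega
N = 16
n = np.arange(N)

w = np.linspace(-Omega, Omega, 4001)
dw = w[1] - w[0]
F_true = 1.0 - np.abs(w) / Omega     # a made-up band-limited F(omega): triangle on [-Omega, Omega]

def f_sample(x):
    # f(x) = (1/2pi) * integral of F(w) exp(-i x w) dw (Equation (8.2)); F_true is real and even.
    return np.real(np.sum(F_true * np.exp(-1j * x * w)) * dw / (2 * np.pi))

data = np.array([f_sample(m * Delta) for m in n])

# Normal equations: sum_n b_n * integral exp(i(n-m)Delta w) dw = 2 pi f(m Delta)
diff = n[None, :] - n[:, None]
gram = np.where(diff == 0, 2.0 * Omega,
                2.0 * np.sin(diff * Delta * Omega) / (diff * Delta + (diff == 0)))
b = np.linalg.solve(gram, 2.0 * np.pi * data)

def mdft(omega):
    return np.sum(b * np.exp(1j * n * Delta * omega))

def dft(omega):
    return Delta * np.sum(data * np.exp(1j * n * Delta * omega))

for om in (0.0, 0.5, 2.0):
    print(om, 1.0 - abs(om) / Omega, mdft(om).real, dft(om).real)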
8.7.5 The PDFT
In the previous subsection, the functions A(ω) were defined for |ω| ≤ Ω.
Therefore, we could have written them as
A(ω) = χ_Ω(ω) Σ_{n=0}^{N−1} a_n e^{in∆ω},
where χΩ (ω) is the function that is one for |ω| ≤ Ω and zero otherwise.
The factor χΩ (ω) serves to incorporate into our approximating function
our prior knowledge that F (ω) = 0 outside the interval [−Ω, Ω]. What can
we do if we have additional prior knowledge about the broad features of
F (ω) that we wish to include?
Suppose that P (ω) ≥ 0 is a prior estimate of |F (ω)|. Now we approximate F (ω) with functions of the form
C(ω) = P(ω) Σ_{n=0}^{N−1} c_n e^{in∆ω}.   (8.24)

As we shall see later in the text, the best choice of the c_n are the ones that satisfy the equations

f(m∆) = Σ_{n=0}^{N−1} c_n p((n − m)∆),   (8.25)

for m = 0, 1, ..., N − 1, where

p(x) = (1/2π) ∫_{−Ω}^{Ω} P(ω) e^{−ixω} dω
is the inverse Fourier transform of the function P (ω). This best approximation we call the PDFT.
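As a sketch (an illustration only; the prior, the grid, and the data values are assumptions), Equation (8.25) is a small linear system whose matrix entries are p((n − m)∆), so the PDFT coefficients can be found with a standard solver:

import numpy as np

Omega = np.pi
Delta = 0.5 * np.pi / Omega
N = 16
n = np.arange(N)

w = np.linspace(-Omega, Omega, 4001)
dw = w[1] - w[0]
P = np.exp(-(w / (0.5 * Omega))**2)        # made-up prior estimate of |F(omega)|

def p(x):
    # p(x) = (1/2pi) * integral of P(w) exp(-i x w) dw  (inverse FT of the prior)
    return np.sum(P * np.exp(-1j * x * w)) * dw / (2 * np.pi)

rng = np.random.default_rng(3)
data = rng.standard_normal(N)              # stand-in for the measured f(m * Delta)

# Equation (8.25): f(m Delta) = sum_n c_n p((n - m) Delta) -- solve for the c_n.
A = np.array([[p((nn - mm) * Delta) for nn in n] for mm in n])
c = np.linalg.solve(A, data.astype(complex))

def pdft(omega):
    # C(omega) = P(omega) * sum_n c_n exp(i n Delta omega), Equation (8.24)
    Pw = np.exp(-(omega / (0.5 * Omega))**2)
    return Pw * np.sum(c * np.exp(1j * n * Delta * omega))

print(pdft(0.3))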
8.8 The Vector DFT
We turn now to the vector DFT, which may appear, initially, to be unrelated to the Fourier transform and Fourier series.
Let f = (f0 , f1 , ..., fN −1 )T be a column vector with complex entries;
here the superscript T denotes transposition. For k = 0, 1, ..., N − 1, define
the complex number Fk by
F_k = Σ_{n=0}^{N−1} f_n e^{i(2π/N)kn},   (8.26)
and let F = (F0 , F1 , ..., FN −1 )T . We shall call the vector F the vector DFT
(vDFT) of the vector f . For the moment we attach no specific significance
to the entries of f or F.
Exercise 8.4 Let G be the N by N matrix with entries
G_{jm} = e^{i(2π/N)(j−1)(m−1)}.

Show that

F = Gf.

Exercise 8.5 Show that the inverse of G is (1/N)G†, where the superscript † denotes conjugate transposition. Therefore,

f = (1/N) G† F.
Exercise 8.6 Suppose that the function f (x) of interest is known to have
the form
f(x) = Σ_{k=0}^{N−1} a_k e^{i(2π/N)kx},

for some coefficients a_k, and suppose also that we have sampled f(x) to obtain the values f(n), for n = 0, 1, ..., N − 1. Use the results of the previous exercises to show that a_k = (1/N) F_{N−k}, for k = 0, 1, ..., N − 1. If, once we have
found the ak , we insert these values into the sum above and set x = n, for
each n = 0, 1, ..., N − 1, do we get back the original values f (n)? Compare
these results with those obtained previously for the function given by the
trigonometric polynomial in Equation (4.9).
Later, we shall study the fast Fourier transform (FFT) algorithm, which
provides an efficient way to calculate F from f . Now, we relate the vector
DFT to the DFT.
8.9 Using the Vector DFT
Suppose now that the function we want to estimate is F(ω) and that F(ω) = 0 for |ω| > Ω. We take ∆ = π/Ω and sample the function f(x) to get our data f(n∆), for n = 0, 1, ..., N − 1. Note that we could have used any N sample points with spacing ∆ and our choice here is simply for notational convenience.
Let us take N equi-spaced values of ω in the interval [−Ω, Ω), with
ω_0 = −Ω, ω_1 = −Ω + 2Ω/N, and so on, that is, with

ω_k = −Ω + (2Ω/N)k,
for k = 0, 1, ..., N − 1. Now we evaluate the function
DFT(ω) = ∆ Σ_{n=0}^{N−1} f(n∆) e^{in∆ω}

at the points ω = ω_k. We get

DFT(ω_k) = ∆ Σ_{n=0}^{N−1} f(n∆) e^{in∆(−Ω + (2Ω/N)k)},

or

DFT(ω_k) = ∆ Σ_{n=0}^{N−1} f(n∆) e^{−inπ} e^{i(2π/N)kn}.
If we let f_n = ∆ f(n∆) e^{−inπ} in the definition of the vector DFT, we find that

DFT(ω_k) = F_k = Σ_{n=0}^{N−1} f_n e^{i(2π/N)kn},
for k = 0, 1, ..., N − 1.
What we have just seen is that the vector DFT, applied to the fn
obtained from the sampled data f (n∆), has for its entries the values of
the DF T (ω) at the N points ωk . So, when the vector DFT is used on
data consisting of sampled values of the function f (x), what we get are
not values of F (ω) itself, but rather values of the DFT estimate of F (ω).
How useful or accurate the vector DFT is in such cases depends entirely
on how useful or accurate the DFT is as an estimator of the true F (ω) in
each case.
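A short Python check of this identity (an illustration; the data are made up). Note that np.fft.fft uses the e^{−i2πkn/N} sign convention, so the text's vector DFT, which uses e^{+i2πkn/N}, corresponds to np.fft.ifft without its 1/N factor; that correspondence is an implementation detail assumed here.

import numpy as np

Omega = np.pi
Delta = np.pi / Omega
N = 8
rng = np.random.default_rng(2)
data = rng.standard_normal(N)               # made-up samples f(n * Delta)
n = np.arange(N)
k = np.arange(N)
omega_k = -Omega + 2 * Omega * k / N

# Direct evaluation of DFT(omega_k) = Delta * sum_n f(n Delta) exp(i n Delta omega_k)
direct = np.array([Delta * np.sum(data * np.exp(1j * n * Delta * wk)) for wk in omega_k])

# Vector DFT of f_n = Delta * f(n Delta) * exp(-i n pi), using exp(+i 2pi kn / N):
fn = Delta * data * np.exp(-1j * n * np.pi)
vdft = np.array([np.sum(fn * np.exp(2j * np.pi * kk * n / N)) for kk in k])
# (equivalently: vdft = np.fft.ifft(fn) * N)

print(np.allclose(direct, vdft))            # True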
There is one case, which we shall discuss in the next section, in which
the vector DFT gives us more than merely an approximation. This case,
although highly unrealistic, is frequently employed to motivate the use of
the vector DFT.
8.10 A Special Case of the Vector DFT
For concreteness, in this section we shall replace the variable x with the
time variable t and speak of the variable ω as frequency.
Suppose that we have sampled the function f(t) at the times t = n∆, and that F(ω) = 0 for |ω| > Ω = π/∆. In addition, we assume that f(t) has the special form

f(t) = Σ_{k=0}^{N−1} c_k e^{−i(−Ω + (2Ω/N)k)t},   (8.27)

for some coefficients c_k. Inserting t = n∆, we get

f(n∆) = Σ_{k=0}^{N−1} c_k e^{−i(−Ω + (2Ω/N)k)n∆} = Σ_{k=0}^{N−1} c_k e^{inπ} e^{−i(2π/N)kn}.

Therefore, we can write

f(n∆) e^{−inπ} = Σ_{k=0}^{N−1} c_k e^{−i(2π/N)kn}.

It follows that

c_k = (1/N) F_k,
for f_n = f(n∆) e^{−inπ}.
So, in this special case, the vector DFT formed by using fn provides us with
exact values of ck , and so allows us to recapture f (t) completely. However,
this special case is not at all realistic and gives a misleading impression of
what the vector DFT is doing.
First of all, the complex exponential functions e^{−i(−Ω + (2Ω/N)k)t} are periodic, with period N∆. This means that, if we were to observe more values of
the function f (t), at the spacing ∆, we would see merely an endless string
of the N values already observed. How convenient that we stopped our
measurements of f (t) precisely when taking more of them would have been
unnecessary anyway. Besides, how would we ever know that a real-world
function of time was actually periodic? Second, the number of periodic
components in f (t) happens to be N , precisely the number of data values
we have taken. Third, the frequency of each component is an integer multiple of the fundamental frequency 2Ω
N , which just happens to involve N ,
the number of data points. It should be obvious by now that this special
case serves no practical purpose and only misleads us into thinking that the
vector DFT is doing more than it really is. In general, the vector DFT is
simply giving us N values of the DFT estimate of the true function F (ω).
8.11 Plotting the DFT
Once we have decided to use the DFT as an estimate of the function F (ω),
we may wish to plot it. Then we need to evaluate the DFT at some finite
number of ω points. There is no particular reason why we must let the
number of grid points be N ; we can take any number.
As we noted previously, the FFT is a fast algorithm for calculating
the vector DFT of any vector f . When we have as our data f (n∆), for
n = 0, 1, ..., N − 1, we can use the FFT to evaluate the DFT of the data
at N equi-spaced values of ω. The FFT is most efficient when the number
of entries in f is a power of two. Therefore, it is common to augment the
data by including some number of zero values, to make a vector with the
number of its entries a power of two. For example, suppose we have six
data points, f (0), f (∆), ..., f (5∆). We form the vector
f = (∆f (0), ∆f (∆), ∆f (2∆), ..., ∆f (5∆), 0, 0)T ,
which has eight entries. The vector DFT has for its entries eight equispaced values of the DFT estimator in the interval [−Ω, Ω).
Appending zero values to make the vector f longer is called zero-padding.
We can also use it to obtain the values of the DFT on a grid with any
number of points. Suppose, for example, that we have 400 samples of f (t),
that is, f (n∆), for n = 0, 1, ..., 399. If we want to evaluate the DFT at, say,
512 grid points, for the purpose of graphing, we make the first 400 entries
of f the data, and make the remaining 112 entries all zero. The DFT, as a
function of ω, is unchanged by this zero-padding, but the vector DFT now
produces 512 evaluations.
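A Python sketch of this zero-padding step (an illustration; the 400 data values are made up, and the modulation by e^{−inπ} follows the previous section so that the grid covers [−Ω, Ω)):

import numpy as np

Delta = 1.0                      # then Omega = pi / Delta = pi
N, M = 400, 512                  # 400 data values, 512 grid points for plotting
rng = np.random.default_rng(4)
data = rng.standard_normal(N)    # stand-in for the measured f(n * Delta)

# Form f_n = Delta * f(n Delta) * exp(-i n pi), then zero-pad to length 512.
n = np.arange(N)
fn = Delta * data * np.exp(-1j * n * np.pi)
padded = np.concatenate([fn, np.zeros(M - N, dtype=complex)])

# The vector DFT of the padded vector gives the DFT estimate at 512 equi-spaced
# omega values in [-Omega, Omega); the +i exponent convention is obtained via ifft * M.
dft_on_grid = np.fft.ifft(padded) * M
print(dft_on_grid.shape)         # (512,)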
In a later chapter we consider how we can use prior knowledge to improve the DFT estimate.
8.12 The Vector DFT in Two Dimensions
We consider now a complex-valued function f (x, y) of two real variables,
with Fourier transformation
F(α, β) = ∫∫ f(x, y) e^{i(xα+yβ)} dx dy.   (8.28)
Suppose that F (α, β) = 0, except for α and β in the interval [0, 2π]; this
means that the function F (α, β) represents a two-dimensional object with
bounded support, such as a picture. Then F (α, β) has a Fourier series
expansion
F(α, β) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} f(m, n) e^{imα} e^{inβ}   (8.29)
for 0 ≤ α ≤ 2π and 0 ≤ β ≤ 2π.
In image processing, F (α, β) is our two-dimensional analogue image,
where α and β are continuous variables. The first step in digital image
processing is to digitize the image, which means forming a two-dimensional
array of numbers Fj,k , for j, k = 0, 1, ..., N − 1. For concreteness, we let
the F_{j,k} be the values F((2π/N)j, (2π/N)k).
From Equation (8.29) we can write
F_{j,k} = F((2π/N)j, (2π/N)k) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} f(m, n) e^{i(2π/N)jm} e^{i(2π/N)kn},   (8.30)
for j, k = 0, 1, ..., N − 1.
We can also find coefficients fm,n , for m, n = 0, 1, ..., N − 1, such that
F_{j,k} = F((2π/N)j, (2π/N)k) = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f_{m,n} e^{i(2π/N)jm} e^{i(2π/N)kn},   (8.31)
for j, k = 0, 1, ..., N − 1. These fm,n are only approximations of the values
f (m, n), as we shall see.
Just as in the one-dimensional case, we can make use of orthogonality
to find the coefficients fm,n . We have
f_{m,n} = (1/N²) Σ_{j=0}^{N−1} Σ_{k=0}^{N−1} F((2π/N)j, (2π/N)k) e^{−i(2π/N)jm} e^{−i(2π/N)kn},   (8.32)
for m, n = 0, 1, ..., N − 1. Now we show how the fm,n can be thought of as
approximations of the f (m, n).
We know from the Fourier Inversion Formula in two dimensions, Equation (8.10), that
f(m, n) = (1/4π²) ∫_0^{2π} ∫_0^{2π} F(α, β) e^{−i(αm+βn)} dα dβ.   (8.33)
When we replace the right side of Equation (8.33) with a Riemann sum,
we get
f(m, n) ≈ (1/N²) Σ_{j=0}^{N−1} Σ_{k=0}^{N−1} F((2π/N)j, (2π/N)k) e^{−i(2π/N)jm} e^{−i(2π/N)kn};   (8.34)
the right side is precisely fm,n , according to Equation (8.32).
Notice that we can compute the fm,n from the Fj,k using one-dimensional
vDFTs. For each fixed j we compute the one-dimensional vDFT
G_{j,n} = (1/N) Σ_{k=0}^{N−1} F_{j,k} e^{−i(2π/N)kn},

for n = 0, 1, ..., N − 1. Then for each fixed n we compute the one-dimensional vDFT

f_{m,n} = (1/N) Σ_{j=0}^{N−1} G_{j,n} e^{−i(2π/N)jm},
for m = 0, 1, ..., N − 1. From this, we see that estimating f (x, y) by calculating the two-dimensional vDFT of the values from F (α, β) requires us to
obtain 2N one-dimensional vector DFTs.
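A small Python sketch of this row-column computation (an illustration; the N-by-N array of pixel values is made up, and the 1/N factor in each pass matches the overall 1/N² in Equation (8.32)):

import numpy as np

N = 8
rng = np.random.default_rng(5)
F_pixels = rng.standard_normal((N, N))          # made-up array of pixel values F_{j,k}

# Row pass: for each fixed j, G[j, n] = (1/N) * sum_k F[j, k] exp(-i 2pi k n / N)
G = np.fft.fft(F_pixels, axis=1) / N

# Column pass: for each fixed n, f[m, n] = (1/N) * sum_j G[j, n] exp(-i 2pi j m / N)
f_mn = np.fft.fft(G, axis=0) / N

# The same result in one call, for comparison.
print(np.allclose(f_mn, np.fft.fft2(F_pixels) / N**2))   # True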
Calculating the fm,n from the pixel values Fj,k is the main operation
in digital image processing. The fm,n approximate the spatial frequencies
in the image and modifications to the image, such as smoothing or edge
enhancement, can be made by modifying the values fm,n . Improving the
resolution of the image can be done by extrapolating the fm,n , that is, by
approximating values of f (x, y) other than x = m and y = n. Once we
have modified the fm,n , we return to the new values of Fj,k , so calculating
Fj,k from the fm,n is also an important step in image processing.
In some areas of medical imaging, such as transmission tomography
and magnetic-resonance imaging, the scanners provide the fm,n . Then the
desired digitized image of the patient is the array Fj,k . In such cases, the
fmn are considered to be approximate values of f (m, n). For more on the
role of the two-dimensional Fourier transform in medical imaging, see the
appendices on transmission tomography.
Even if we managed to have the true values, that is, even if f_{m,n} = f(m, n), the values F_{j,k} are not the true values F((2π/N)j, (2π/N)k). The number
Fj,k is a value of the DFT approximation of F (α, β). This DFT approximation is the function given by
DFT(α, β) = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f_{m,n} e^{iαm} e^{iβn}.   (8.35)
The number F_{j,k} is the value of this approximation at the point α = (2π/N)j and β = (2π/N)k. In other words,

F_{j,k} = DFT((2π/N)j, (2π/N)k),
for j, k = 0, 1, ..., N − 1. How good this discrete image is as an approximation of the true F (α, β) depends primarily on two things: first, how
accurate an approximation of the numbers f (m, n) the numbers fm,n are;
and second, how good an approximation of the function F (α, β) the function DF T (α, β) is.
We can easily see now how important the fast Fourier transform algorithm is. Without the fast Fourier transform to accelerate the calculations,
obtaining a two-dimensional vDFT would be prohibitively expensive.
Exercise 8.7 Show that if f(x, y) is radial then its FT F is also radial. Find the FT of the radial function f(x, y) = 1/√(x² + y²).
Chapter 9

Properties of the Fourier Transform

9.1 Chapter Summary
In this chapter we review the basic properties of the Fourier transform.
9.2 Fourier-Transform Pairs
Let f (x) be defined for the real variable x in (−∞, ∞). The Fourier transform (FT) of f (x) is the function of the real variable ω given by
F(ω) = ∫_{−∞}^{∞} f(x) e^{iωx} dx.   (9.1)
Precisely how we interpret the infinite integrals that arise in the discussion
of the Fourier transform will depend on the properties of the function f (x).
A detailed treatment of this issue, which is beyond the scope of this book,
can be found in almost any text on the Fourier transform (see, for example,
[116]).
If we have F (ω) for all real ω, then we can recover the function f (x)
using the Fourier Inversion Formula:
f(x) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{−iωx} dω.   (9.2)
The functions f (x) and F (ω) are called a Fourier-transform pair, and f (x)
is sometimes called the inverse Fourier transform (IFT) of F (ω).
Note that the definitions of the FT and IFT just given may differ slightly
from the ones found elsewhere; our definitions are those of Bochner and
Chandrasekharan [20] and Twomey [218]. The differences are minor and
involve only the placement of the quantity 2π and of the minus sign in
the exponent. One sometimes sees the Fourier transform of the function f
denoted f̂ ; here we shall reserve the symbol f̂ for estimates of the function
f.
Once again, the proper interpretation of Equation (9.2) will depend
on the properties of the functions involved. It may happen that one or
both of these integrals will fail to be defined in the usual way and will be
interpreted as the principal value of the integral [116].
9.2.1 Decomposing f (x)
One way to view Equation (9.2) is that it shows us the function f (x)
as a superposition of complex exponential functions e−iωx , where ω runs
over the entire real line. The use of the minus sign here is simply for
notational convenience later. For each fixed value of ω, the complex number
F (ω) = |F (ω)|eiθ(ω) tells us that the amount of eiωx in f (x) is |F (ω)|, and
that eiωx involves a phase shift by θ(ω).
9.3 Basic Properties of the Fourier Transform
In this section we present the basic properties of the Fourier transform.
Proofs of these assertions are left as exercises.
Exercise 9.1 Let F (ω) be the FT of the function f (x). Use the definitions
of the FT and IFT given in Equations (9.1) and (9.2) to establish the
following basic properties of the Fourier transform operation:
• Symmetry: The FT of the function F(x) is 2πf(−ω). For example, the FT of the function f(x) = sin(Ωx)/(πx) is χ_Ω(ω), so the FT of g(x) = χ_Ω(x) is G(ω) = 2π sin(Ωω)/(πω).
• Conjugation: The FT of \overline{f(x)} is \overline{F(−ω)}.
• Scaling: The FT of f(ax) is (1/|a|) F(ω/a) for any nonzero constant a.
• Shifting: The FT of f (x − a) is eiaω F (ω).
• Modulation: The FT of f(x) cos(ω_0 x) is (1/2)[F(ω + ω_0) + F(ω − ω_0)].
• Differentiation: The FT of the nth derivative, f (n) (x) is (−iω)n F (ω).
The IFT of F (n) (ω) is (ix)n f (x).
• Convolution in x: Let f, F, g, G and h, H be FT pairs, with

h(x) = ∫ f(y) g(x − y) dy,

so that h(x) = (f ∗ g)(x) is the convolution of f(x) and g(x). Then H(ω) = F(ω)G(ω). For example, if we take g(x) = \overline{f(−x)}, then

h(x) = ∫ f(x + y) \overline{f(y)} dy = ∫ f(y) \overline{f(y − x)} dy = r_f(x)

is the autocorrelation function associated with f(x) and

H(ω) = |F(ω)|² = R_f(ω) ≥ 0

is the power spectrum of f(x).
• Convolution in ω: Let f, F, g, G and h, H be FT pairs, with h(x) = f(x)g(x). Then H(ω) = (1/2π)(F ∗ G)(ω).
9.4 Some Fourier-Transform Pairs
In this section we present several Fourier-transform pairs.
Exercise 9.2 Show that the Fourier transform of f(x) = e^{−α²x²} is F(ω) = (√π/α) e^{−(ω/2α)²}.
Hint: Calculate the derivative F′(ω) by differentiating under the integral
sign in the definition of F and integrating by parts. Then solve the resulting
differential equation. Alternatively, perform the integration by completing
the square.
Let u(x) be the Heaviside function that is +1 if x ≥ 0 and 0 otherwise.
Let χA (x) be the characteristic function of the interval [−A, A] that is +1
for x in [−A, A] and 0 otherwise. Let sgn(x) be the sign function that is
+1 if x > 0, −1 if x < 0 and zero for x = 0.
Exercise 9.3 Show that the FT of the function f(x) = u(x)e^{−ax} is F(ω) = 1/(a − iω), for every positive constant a, where u(x) is the Heaviside function.
Exercise 9.4 Show that the FT of f(x) = χ_A(x) is F(ω) = 2 sin(Aω)/ω.
Exercise 9.5 Show that the IFT of the function F (ω) = 2i/ω is f (x) =
sgn(x).
Hints: Write the formula for the inverse Fourier transform of F (ω) as
f(x) = (1/2π) ∫_{−∞}^{+∞} (2i/ω) cos ωx dω − (i/2π) ∫_{−∞}^{+∞} (2i/ω) sin ωx dω,

which reduces to

f(x) = (1/π) ∫_{−∞}^{+∞} (1/ω) sin ωx dω,
since the integrand of the first integral is odd. For x > 0 consider the
Fourier transform of the function χx (t). For x < 0 perform the change of
variables u = −x.
Generally, the functions f (x) and F (ω) are complex-valued, so that we
may speak about their real and imaginary parts. The next exercise explores
the connections that hold among these real-valued functions.
Exercise 9.6 Let f (x) be arbitrary and F (ω) its Fourier transform. Let
F (ω) = R(ω) + iX(ω), where R and X are real-valued functions, and
similarly, let f (x) = f1 (x) + if2 (x), where f1 and f2 are real-valued. Find
relationships between the pairs R,X and f1 ,f2 .
Definition 9.1 We define the even part of f (x) to be the function
f_e(x) = (f(x) + f(−x))/2,

and the odd part of f(x) to be

f_o(x) = (f(x) − f(−x))/2;
define Fe and Fo similarly for F the FT of f .
Exercise 9.7 Show that F (ω) is real-valued and even if and only if f (x)
is real-valued and even.
Exercise 9.8 Let F (ω) = R(ω) + iX(ω) be the decomposition of F into its
real and imaginary parts. We say that f is a causal function if f (x) = 0 for
all x < 0. Show that, if f is causal, then R and X are related; specifically,
show that X is the Hilbert transform of R, that is,
X(ω) = (1/π) ∫_{−∞}^{∞} R(α)/(ω − α) dα.
Hint: If f (x) = 0 for x < 0 then f (x)sgn(x) = f (x). Apply the convolution
theorem, then compare real and imaginary parts.
9.5 Dirac Deltas
We saw earlier that F(ω) = χ_Ω(ω) has for its inverse Fourier transform the function f(x) = sin(Ωx)/(πx); note that f(0) = Ω/π and f(x) = 0 for the first time when Ωx = π, or x = π/Ω. For any Ω-band-limited function g(x) we have G(ω) = G(ω)χ_Ω(ω), so that, for any x_0, we have

g(x_0) = ∫_{−∞}^{∞} g(x) sin(Ω(x − x_0))/(π(x − x_0)) dx.

We describe this by saying that the function f(x) = sin(Ωx)/(πx) has the sifting property for all Ω-band-limited functions g(x).
As Ω grows larger, f(0) approaches +∞, while f(x) goes to zero for x ≠ 0. The limit is therefore not a function; it is a generalized function called the Dirac delta function at zero, denoted δ(x). For this reason the function f(x) = sin(Ωx)/(πx) is called an approximate delta function. The FT of δ(x) is the function F(ω) = 1 for all ω. The Dirac delta function δ(x) enjoys the sifting property for all g(x); that is,

g(x_0) = ∫_{−∞}^{∞} g(x) δ(x − x_0) dx.
It follows from the sifting and shifting properties that the FT of δ(x − x0 )
is the function eix0 ω .
The formula for the inverse FT now says
δ(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixω} dω.   (9.3)
If we try to make sense of this integral according to the rules of calculus we
get stuck quickly. The problem is that the integral formula doesn’t mean
quite what it does ordinarily and the δ(x) is not really a function, but
an operator on functions; it is sometimes called a distribution. The Dirac
deltas are mathematical fictions, not in the bad sense of being lies or fakes,
but in the sense of being made up for some purpose. They provide helpful
descriptions of impulsive forces, probability densities in which a discrete
point has nonzero probability, or, in array processing, objects far enough
away to be viewed as occupying a discrete point in space.
We shall treat the relationship expressed by Equation (9.3) as a formal
statement, rather than attempt to explain the use of the integral in what
is surely an unconventional manner.
If we move the discussion into the ω domain and define the Dirac delta function δ(ω) to be the FT of the function that has the value 1/2π for all x, then the FT of the complex exponential function (1/2π) e^{−iω_0 x} is δ(ω − ω_0), visualized as a "spike" at ω_0, that is, a generalized function that has the
visualized as a ”spike” at ω0 , that is, a generalized function that has the
value +∞ at ω = ω0 and zero elsewhere. This is a useful result, in that
it provides the motivation for considering the Fourier transform of a signal
s(t) containing hidden periodicities. If s(t) is a sum of complex exponentials
with frequencies −ωn , then its Fourier transform will consist of Dirac delta
functions δ(ω − ωn ). If we then estimate the Fourier transform of s(t) from
sampled data, we are looking for the peaks in the Fourier transform that
approximate the infinitely high spikes of these delta functions.
Exercise 9.9 Use the fact that sgn(x) = 2u(x) − 1 and Exercise 9.5 to
show that f (x) = u(x) has the FT F (ω) = i/ω + πδ(ω).
Exercise 9.10 Let f, F be a FT pair. Let g(x) = ∫_{−∞}^{x} f(y) dy. Show that the FT of g(x) is G(ω) = πF(0)δ(ω) + iF(ω)/ω.

Hint: For u(x) the Heaviside function we have

∫_{−∞}^{x} f(y) dy = ∫_{−∞}^{∞} f(y) u(x − y) dy.

9.6 More Properties of the Fourier Transform
We can use properties of the Dirac delta functions to extend the Parseval
Equation in Fourier series to Fourier transforms, where it is usually called
the Parseval-Plancherel Equation.
Exercise 9.11 Let f (x), F (ω) and g(x), G(ω) be Fourier transform pairs.
Use Equation (9.3) to establish the Parseval-Plancherel equation
⟨f, g⟩ = ∫ f(x) \overline{g(x)} dx = (1/2π) ∫ F(ω) \overline{G(ω)} dω,

from which it follows that

||f||² = ⟨f, f⟩ = ∫ |f(x)|² dx = (1/2π) ∫ |F(ω)|² dω.
Exercise 9.12 The one-sided Laplace transform (LT) of f is F given by
F(z) = ∫_0^∞ f(x) e^{−zx} dx.
Compute F(z) for f (x) = u(x), the Heaviside function. Compare F(−iω)
with the FT of u.
9.7 Convolution Filters
Let h(x) and H(ω) be a Fourier-transform pair. We have mentioned several
times the basic problem of estimating the function H(ω) from finitely many
values of h(x); for convenience now we use the symbols h and H, rather
than f and F , as we did previously. Sometimes it is H(ω) that we really
want. Other times it is the unmeasured values of h(x) that we want, and
we try to estimate them by first estimating H(ω). Sometimes, neither
of these functions is our main interest; it may be the case that what we
want is another function, f (x), and h(x) is a distorted version of f (x).
For example, suppose that x is time and f (x) represents what a speaker
says into a telephone. The phone line distorts the signal somewhat, often
diminishing the higher frequencies. What the person at the other end
hears is not f (x), but a related signal function, h(x). For another example,
suppose that f (x, y) is a two-dimensional picture viewed by someone with
poor eyesight. What that person sees is not f (x, y) but a related function,
h(x, y), that is a distorted version of the true f (x, y). In both examples,
our goal is to recover the original undistorted signal or image. To do this,
it helps to model the distortion. Convolution filters are commonly used for
this purpose.
9.7.1 Blurring and Convolution Filtering
We suppose that what we measure are not values of f (x), but values of
h(x), where the Fourier transform of h(x) is
H(ω) = F (ω)G(ω).
The function G(ω) describes the effects of the system, the telephone line in
our first example, or the weak eyes in the second example, or the refraction
of light as it passes through the atmosphere, in optical imaging. If we
can use our measurements of h(x) to estimate H(ω) and if we have some
knowledge of the system distortion function, that is, some knowledge of
G(ω) itself, then there is a chance that we can estimate F (ω), and thereby
estimate f (x).
If we apply the Fourier Inversion Formula to H(ω) = F (ω)G(ω), we get
h(x) = (1/2π) ∫ F(ω) G(ω) e^{−iωx} dω.   (9.4)

The function h(x) that results is h(x) = (f ∗ g)(x), the convolution of the functions f(x) and g(x), with the latter given by

g(x) = (1/2π) ∫ G(ω) e^{−iωx} dω.   (9.5)
Note that, if f (x) = δ(x), then h(x) = g(x). In the image processing
example, this says that if the true picture f is a single bright spot, the
blurred image h is g itself. For that reason, the function g is called the
point-spread function of the distorting system.
Convolution filtering refers to the process of converting any given function, say f (x), into a different function, say h(x), by convolving f (x) with a
fixed function g(x). Since this process can be achieved by multiplying F (ω)
by G(ω) and then inverse Fourier transforming, such convolution filters are
studied in terms of the properties of the function G(ω), known in this context as the system transfer function, or the optical transfer function (OTF);
when ω is a frequency, rather than a spatial frequency, G(ω) is called the
frequency-response function of the filter. The magnitude of G(ω), |G(ω)|,
is called the modulation transfer function (MTF). The study of convolution filters is a major part of signal processing. Such filters provide both
reasonable models for the degradation signals undergo, and useful tools
for reconstruction. For an important example of the use of filtering, see
Appendix: Reverberation and Echo-Cancellation.
Let us rewrite Equation (9.4), replacing F (ω) with its definition, as
given by Equation (9.1). Then we have
h(x) = (1/2π) ∫ (∫ f(t) e^{iωt} dt) G(ω) e^{−iωx} dω.   (9.6)

Interchanging the order of integration, we get

h(x) = ∫ f(t) ((1/2π) ∫ G(ω) e^{iω(t−x)} dω) dt.   (9.7)

The inner integral is g(x − t), so we have

h(x) = ∫ f(t) g(x − t) dt;   (9.8)

this is the definition of the convolution of the functions f and g.
9.7.2 Low-Pass Filtering
If we know the nature of the blurring, then we know G(ω), at least to some
degree of precision. We can try to remove the blurring by taking measurements of h(x), then estimating H(ω) = F (ω)G(ω), then dividing these
numbers by the value of G(ω), and then inverse Fourier transforming. The
problem is that our measurements are always noisy, and typical functions
G(ω) have many zeros and small values, making division by G(ω) dangerous, except where the values of G(ω) are not too small. These values of ω
tend to be the smaller ones, centered around zero, so that we end up with
estimates of F (ω) itself only for the smaller values of ω. The result is a
low-pass filtering of the object f (x).
To investigate such low-pass filtering, we suppose that G(ω) = 1, for
|ω| ≤ Ω, and is zero, otherwise. Then the filter is called the ideal Ω-lowpass filter. In the far-field propagation model, the variable x is spatial,
and the variable ω is spatial frequency, related to how the function f (x)
changes spatially, as we move x. Rapid changes in f (x) are associated with
values of F (ω) for large ω. For the case in which the variable x is time, the
variable ω becomes frequency, and the effect of the low-pass filter on f (x)
is to remove its higher-frequency components.
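As a discrete illustration (a sketch with a made-up signal and cutoff, using the FFT as a stand-in for the continuous transform), the ideal low-pass filter simply zeroes the transform outside the band and inverts:

import numpy as np

# Made-up discrete signal: a slow oscillation plus a rapid one plus noise.
M = 256
t = np.arange(M)
rng = np.random.default_rng(6)
f = np.cos(2 * np.pi * 3 * t / M) + 0.5 * np.cos(2 * np.pi * 40 * t / M) + 0.1 * rng.standard_normal(M)

# Ideal low-pass filter: keep only frequency indices with |k| <= cutoff, zero the rest.
cutoff = 10
F = np.fft.fft(f)
k = np.fft.fftfreq(M, d=1.0 / M)        # integer frequency index for each FFT bin
G = (np.abs(k) <= cutoff).astype(float) # the ideal low-pass transfer function
h = np.fft.ifft(F * G).real             # rapid component and most of the noise removed

print(np.abs(h - np.cos(2 * np.pi * 3 * t / M)).max())   # small residual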
One effect of low-pass filtering in image processing is to smooth out
the more rapidly changing features of an image. This can be useful if
these features are simply unwanted oscillations, but if they are important
detail, such as edges, the smoothing presents a problem. Restoring such
wanted detail is often viewed as removing the unwanted effects of the low-pass filtering; in other words, we try to recapture the missing high-spatial-frequency values that have been zeroed out. Such an approach to image restoration is called frequency-domain extrapolation. How can we hope
to recover these missing spatial frequencies, when they could have been
anything? To have some chance of estimating these missing values we need
to have some prior information about the image being reconstructed.
9.8 Functions in the Schwartz Class
As we noted previously, the integrals in Equations (9.1) and (9.2) may have
to be interpreted carefully if they are to be applied to fairly general classes
of functions f (x) and F (ω). In this section we describe a class of functions
for which these integrals can be defined. This section may be skipped with
no great loss.
If both f (x) and F (ω) are measurable and absolutely integrable then
both functions are continuous. To illustrate some of the issues involved, we
consider the functions in the Schwartz class [116].
9.8.1 The Schwartz Class
A function f (x) is said to be in the Schwartz class, or to be a Schwartz
function, if f (x) is infinitely differentiable and
|x|m f (n) (x) → 0
(9.9)
as x goes to −∞ and +∞, for all nonnegative integers m and n. Here f^{(n)}(x) denotes the nth derivative of f(x).
An example of a Schwartz function is f(x) = e^{−x²}, with Fourier transform F(ω) = √π e^{−ω²/4}. The following proposition tells us that Schwartz functions are absolutely integrable on the real line, and so the Fourier transform is well defined.
Proposition 9.1 If f(x) is a Schwartz function, then

∫_{−∞}^{∞} |f(x)| dx < +∞.

Proof: There is a constant M > 0 such that |x|²|f(x)| ≤ 1, for |x| ≥ M. Then

∫_{−∞}^{∞} |f(x)| dx ≤ ∫_{−M}^{M} |f(x)| dx + ∫_{|x|≥M} |x|^{−2} dx < +∞.
If f (x) is a Schwartz function, then so is its Fourier transform. To prove
the Fourier Inversion Formula it is sufficient to show that
f(0) = ∫_{−∞}^{∞} F(ω) dω/2π.   (9.10)
Write

f(x) = f(0)e^{−x²} + (f(x) − f(0)e^{−x²}) = f(0)e^{−x²} + g(x).   (9.11)
Then g(0) = 0, so g(x) = xh(x), where h(x) = g(x)/x is also a Schwartz function. Then the Fourier transform of g(x) is, up to a constant factor, the derivative of the Fourier transform of h(x); that is,

G(ω) = −iH′(ω).   (9.12)

The function H(ω) is a Schwartz function, so it goes to zero at the infinities. Computing the Fourier transform of both sides of Equation (9.11), we obtain

F(ω) = f(0)√π e^{−ω²/4} − iH′(ω).   (9.13)
Therefore,

∫_{−∞}^{∞} F(ω) dω = 2πf(0) − i(H(+∞) − H(−∞)) = 2πf(0).   (9.14)
To prove the Fourier Inversion Formula, we let K(ω) = F (ω)e−ix0 ω , for
fixed x0 . Then the inverse Fourier transform of K(ω) is k(x) = f (x + x0 ),
and therefore
∫_{−∞}^{∞} K(ω) dω = 2πk(0) = 2πf(x_0).   (9.15)
In the next subsection we consider a discontinuous f (x).
9.8.2 A Discontinuous Function
Consider the function f(x) = 1/(2A), for |x| ≤ A, and f(x) = 0, otherwise. The Fourier transform of this f(x) is

F(ω) = sin(Aω)/(Aω),   (9.16)

for all real ω ≠ 0, and F(0) = 1. Note that F(ω) is nonzero throughout the real line, except for isolated zeros, but that it goes to zero as we go to the infinities. This is typical behavior. Notice also that the smaller the A, the slower F(ω) dies out; the first zeros of F(ω) are at |ω| = π/A, so the main lobe widens as A goes to zero. The function f(x) is not continuous, so its Fourier transform cannot be absolutely integrable. In this case, the Fourier Inversion Formula must be interpreted as involving convergence in the L2 norm.
Chapter 10

The Fourier Transform and Convolution Filtering

10.1 Chapter Summary
A major application of the Fourier transform is in the study of systems.
We may think of a system as a device that accepts functions as input
and produces functions as output. For example, the differentiation system
accepts a differentiable function f (x) as input and produces its derivative
function f′(x) as output. If the input is the function f(x) = 5f1(x) + 3f2(x), then the output is 5f1′(x) + 3f2′(x); the differentiation system is linear.
We shall describe systems algebraically by h = T f , where f is any input
function, h is the resulting output function from the system, and T denotes
the operator induced by the system itself. For the differentiation system
we would write the differentiation operator as T f = f′.
10.2 Linear Filters
The system operator T is linear if
T (af1 + bf2 ) = aT (f1 ) + bT (f2 ),
for any scalars a and b and functions f1 and f2 . We shall be interested
only in linear systems.
10.3 Shift-Invariant Filters
We denote by Sa the system that shifts an input function by a; that is,
if f (x) is the input to system Sa , then f (x − a) is the output. A system
operator T is said to be shift-invariant if
T (Sa (f )) = Sa (T (f )),
which means that, if input f (x) leads to output h(x), then input f (x − a)
leads to output h(x − a); shifting the input just shifts the output. When
the variable x is time, we speak of time-invariant systems. When T is a
shift-invariant linear system operator we say that T is a SILO.
10.4 Some Properties of a SILO
We show first that (T f)′ = T f′. Suppose that h(x) = (T f)(x). For any
∆x we can write
f (x + ∆x) = (S−∆x f )(x)
and
(T S−∆x f )(x) = (S−∆x T f )(x) = (S−∆x h)(x) = h(x + ∆x).
When the input to the system is

(1/∆x)(f(x + ∆x) − f(x)),

the output is

(1/∆x)(h(x + ∆x) − h(x)).

Now we take limits, as ∆x → 0, so that, assuming continuity, we can conclude that T f′ = h′. We apply this now to the case in which f(x) = e^{−ixω} for some real constant ω.

Since f′(x) = −iωf(x) and f(x) = (i/ω) f′(x) in this case, we have

h(x) = (T f)(x) = (i/ω)(T f′)(x) = (i/ω) h′(x),

so that

h′(x) = −iωh(x).

Solving this differential equation, we obtain

h(x) = c e^{−ixω},
for some constant c. Note that since the c may vary when we vary the
selected ω, we must write c = c(ω). The main point here is that, when T is
a SILO and the input function is a complex exponential with frequency ω,
then the output is again a complex exponential with the same frequency
ω, multiplied by a complex number c(ω). This multiplication by c(ω) only
modifies the amplitude and phase of the exponential function; it does not
alter its frequency. So SILOs do not change the input frequencies, but only
modify their strengths and phases.
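A discrete Python illustration of this eigenfunction property (a sketch with a made-up impulse response; circular convolution stands in for the shift-invariant system):

import numpy as np

M = 128
n = np.arange(M)
rng = np.random.default_rng(7)
g = rng.standard_normal(8)                       # made-up impulse response of a SILO
g_padded = np.concatenate([g, np.zeros(M - g.size)])

def apply_system(f):
    # Circular convolution with g: a discrete shift-invariant linear operator.
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(g_padded))

k = 5                                            # pick one discrete frequency
f = np.exp(-1j * 2 * np.pi * k * n / M)          # complex exponential input
h = apply_system(f)

c = h[0] / f[0]                                  # the multiplier c(omega) for this frequency
print(np.allclose(h, c * f))                     # True: output is the same exponential, rescaled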
Exercise 10.1 Let T be a SILO. Show that T is a convolution operator by
showing that, for each input function f , the output function h = T f is the
convolution of f with g, where g(x) is the inverse FT of the function c(ω)
obtained above. Hint: write the input function f (x) as
f(x) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{−ixω} dω,

and assume that

(T f)(x) = (1/2π) ∫_{−∞}^{∞} F(ω) (T e^{−ixω}) dω.
Now that we know that a SILO is a convolution filter, the obvious
question to ask is What is g(x)? This is the system identification problem.
One way to solve this problem is to consider what the output is when the
input is the Heaviside function u(x). In that case, we have
h(x) = ∫_{−∞}^{∞} u(y) g(x − y) dy = ∫_0^{∞} g(x − y) dy = ∫_{−∞}^{x} g(t) dt.

Therefore, h′(x) = g(x).
10.5 The Dirac Delta
The Dirac delta, denoted δ(x), is not truly a function. Its job is best
described by its sifting property: for any fixed value of x,
f(x) = ∫ f(y) δ(x − y) dy.

In order for the Dirac delta to perform this sifting operation on any f(x) it
would have to be zero, except at x = 0, where it would have to be infinitely
large. It is possible to give a rigorous treatment of the Dirac delta, using
generalized functions, but that is beyond the scope of this course. The
Dirac delta is useful in our discussion of filters, which is why it is used.
10.6 The Impulse Response Function
We can solve the system identification problem by seeing what the output
is when the input is the Dirac delta; as we shall see, the output is g(x);
that is, T δ = g. Since the SILO T is a convolution operator, we know that
h(x) = ∫_{−∞}^{∞} δ(y) g(x − y) dy = g(x).
For this reason, the function g(x) is called the impulse-response function
of the system.
10.7 Using the Impulse-Response Function
Suppose now that we take as our input the function f (x), but write it as
f(x) = ∫ f(y) δ(x − y) dy.

Then, since T is linear, and the integral is more or less a big sum, we have

T(f)(x) = ∫ f(y) T(δ(x − y)) dy = ∫ f(y) g(x − y) dy.
The function on the right side of this equation is the convolution of the
functions f and g, written f ∗ g. This shows, as we have seen, that T
does its job by convolving any input function f with its impulse-response
function g, to get the output function h = T f = f ∗ g. It is useful to
remember that order does not matter in convolution:
∫ f(y) g(x − y) dy = ∫ g(y) f(x − y) dy.
10.8 The Filter Transfer Function
Now let us take as input the complex exponential f (x) = e−ixω , where ω
is fixed. Then the output is
h(x) = T(f)(x) = ∫ e^{−iyω} g(x − y) dy = ∫ g(y) e^{−i(x−y)ω} dy = e^{−ixω} G(ω),
where G(ω) is the Fourier transform of the impulse-response function g(x);
note that G(ω) = c(ω) from Exercise 10.1. This tells us that when the input
to T is a complex exponential function with “frequency” ω, the output is
the same complex exponential function, the “frequency” is unchanged, but
multiplied by a complex number G(ω). This multiplication by G(ω) can
change both the amplitude and phase of the complex exponential, but the
“frequency” ω does not change. In filtering, this function G(ω) is called the
transfer function of the filter, or sometimes the frequency-response function.
10.9 The Multiplication Theorem for Convolution
Now let’s take as input a function f (x), but now write it using Equation
(8.2),
Z
1
f (x) =
F (ω)e−ixω dω.
2π
Then, taking the operator inside the integral, we find that the output is

h(x) = T(f)(x) = (1/2π) ∫ F(ω) T(e^{−ixω}) dω = (1/2π) ∫ e^{−ixω} F(ω) G(ω) dω.

But, from Equation (8.2), we know that

h(x) = (1/2π) ∫ e^{−ixω} H(ω) dω.

This tells us that the Fourier transform H(ω) of the function h = f ∗ g is simply the product of F(ω) and G(ω); this is the most important property of convolution.
10.10 Summing Up

It is helpful to take stock of what we have just discovered:

• 1. if h = T(f), then h′ = T(f′);

• 2. T(e^{−iωx}) = G(ω) e^{−iωx};

• 3. writing

f(x) = (1/2π) ∫ F(ω) e^{−iωx} dω,

we obtain

h(x) = (Tf)(x) = (1/2π) ∫ F(ω) T(e^{−iωx}) dω,

so that

h(x) = (1/2π) ∫ F(ω) G(ω) e^{−iωx} dω;

• 4. since we also have

h(x) = (1/2π) ∫ H(ω) e^{−iωx} dω,

we can conclude that H(ω) = F(ω) G(ω);

• 5. if we define g(x) to be (Tδ)(x), then g(x − y) = (Tδ)(x − y). Writing

f(x) = ∫ f(y) δ(x − y) dy,

we get

h(x) = (Tf)(x) = ∫ f(y) (Tδ)(x − y) dy = ∫ f(y) g(x − y) dy,

so that h is the convolution of f and g;

• 6. g(x) is the inverse Fourier transform of G(ω).
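As a quick numerical check of these facts (this sketch is not part of the text; the sampled signal and impulse response below are hypothetical), one can convolve two finite sampled sequences periodically and confirm that the Fourier transform of the output is the product of the transforms of the inputs, the discrete counterpart of item 4 above.

import numpy as np

N = 256
n = np.arange(N)
f = np.exp(-0.5 * ((n - 100) / 10.0) ** 2)          # a sampled input signal (hypothetical)
g = np.exp(-0.1 * n) * (n < 50)                     # a sampled impulse response (hypothetical)

# periodic (circular) convolution, computed directly from the definition
h = np.array([sum(f[m] * g[(k - m) % N] for m in range(N)) for k in range(N)])

# the multiplication theorem: the DFT of h equals the product of the DFTs of f and g
lhs = np.fft.fft(h)
rhs = np.fft.fft(f) * np.fft.fft(g)
print(np.max(np.abs(lhs - rhs)))                    # close to machine precision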
10.11 A Project
Previously, we allowed the operator T to move inside the integral. We know,
however, that this is not always permissible. The differentiation operator
T = D, with D(f ) = f 0 , cannot always be moved inside the integral;
as we learn in advanced calculus, we cannot always differentiate under
the integral sign. This raises the interesting issue of how to represent the
differentiation operator as a shift-invariant linear filter. In particular, what
is the impulse-response function? The exercise is to investigate this issue.
Pay some attention to the problem of differentiating the delta function,
to the Green’s Function method for representing the inversion of linear
differential operators, and to generalized functions or distributions.
10.12 Band-Limiting

Suppose that G(ω) = χ_Ω(ω). Then if F(ω) is the Fourier transform of the input function, the Fourier transform of the output function h(t) will be

H(ω) = F(ω), if |ω| ≤ Ω; H(ω) = 0, if |ω| > Ω.

The effect of the filter is to leave the values F(ω) unchanged, if |ω| ≤ Ω, and to replace F(ω) with zero, if |ω| > Ω. This is called band-limiting. Since the inverse Fourier transform of G(ω) is

g(t) = sin(Ωt)/(πt),

the band-limiting system can be described using convolution:

h(t) = ∫ f(s) [sin(Ω(t − s))/(π(t − s))] ds.
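The following minimal sketch (not from the text) imitates band-limiting on a finite sampled signal by zeroing Fourier coefficients outside a chosen band; the test signal and cutoff are hypothetical, and the discrete Fourier transform stands in for the continuous one used above.

import numpy as np

N = 512
t = np.arange(N)
f = np.cos(2 * np.pi * 5 * t / N) + 0.5 * np.cos(2 * np.pi * 60 * t / N)   # two components (hypothetical)

F = np.fft.fft(f)
freqs = np.fft.fftfreq(N)                       # digital frequencies, in cycles per sample
cutoff = 20.0 / N                               # hypothetical band edge

H = np.where(np.abs(freqs) <= cutoff, F, 0.0)   # leave F unchanged in the band, zero it outside
h = np.fft.ifft(H).real

# the 60-cycle component has been removed; only the slow cosine remains
print(np.max(np.abs(h - np.cos(2 * np.pi * 5 * t / N))))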
Chapter 11
Infinite Sequences and Discrete Filters

11.1 Chapter Summary
Many textbooks on signal processing present filters in the context of infinite
sequences. Although infinite sequences are no more realistic than functions
f (t) defined for all times t, they do simplify somewhat the discussion of
filtering, particularly when it comes to the impulse response and to random
signals. Systems that have as input and output infinite sequences are called
discrete systems.
11.2 Shifting

We denote by f = {f_n}_{n=−∞}^{∞} an infinite sequence. For a fixed integer k, the system that accepts f as input and produces as output the shifted sequence h = {h_n = f_{n−k}} is denoted S_k; therefore, we write h = S_k f.
11.3 Shift-Invariant Discrete Linear Systems

A discrete system T is linear if

T(af^1 + bf^2) = aT(f^1) + bT(f^2),

for any infinite sequences f^1 and f^2 and scalars a and b. As previously, a system T is shift-invariant if TS_k = S_kT. This means that if input f has output h, then input S_k f has output S_k h; shifting the input by k just shifts the output by k.
11.4 The Delta Sequence

The delta sequence δ = {δ_n} has δ_0 = 1 and δ_n = 0, for n not equal to zero. Then S_k(δ) is the sequence S_k(δ) = {δ_{n−k}}. For any sequence f we have

f_n = Σ_{m=−∞}^{∞} f_m δ_{n−m} = Σ_{m=−∞}^{∞} δ_m f_{n−m}.   (11.1)

This means that we can write the sequence f as an infinite sum of the sequences S_m δ:

f = Σ_{m=−∞}^{∞} f_m S_m(δ).   (11.2)
As in the continuous case, we use the delta sequence to understand better
how a shift-invariant discrete linear system T works.
11.5 The Discrete Impulse Response
We let δ be the input to the shift-invariant discrete linear system T , and
denote the output sequence by g = T (δ). Now, for any input sequence f
with h = T (f ), we write f using Equation (11.2), so that
h = T(f) = T( Σ_{m=−∞}^{∞} f_m S_m δ ) = Σ_{m=−∞}^{∞} f_m T S_m(δ) = Σ_{m=−∞}^{∞} f_m S_m T(δ) = Σ_{m=−∞}^{∞} f_m S_m(g).

Therefore, we have

h_n = Σ_{m=−∞}^{∞} f_m g_{n−m},   (11.3)

for each n. Equation (11.3) is the definition of discrete convolution, or the convolution of sequences. This tells us that the output sequence h = T(f) is the convolution of the input sequence f with the impulse-response sequence g; that is, h = T(f) = f ∗ g.
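A short sketch (not from the text) may help: it implements Equation (11.3) directly for two finitely supported sequences and compares the result with numpy's built-in convolution. The particular sequences are hypothetical and are indexed from 0 for simplicity.

import numpy as np

f = np.array([1.0, 2.0, 3.0, 0.0, -1.0])      # a finitely supported input sequence (hypothetical)
g = np.array([0.5, 0.25, 0.25])               # a finitely supported impulse-response sequence (hypothetical)

def convolve_sequences(f, g):
    # Equation (11.3): h_n = sum over m of f_m g_{n-m}
    h = np.zeros(len(f) + len(g) - 1)
    for n in range(len(h)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):
                h[n] += f[m] * g[n - m]
    return h

print(convolve_sequences(f, g))
print(np.convolve(f, g))                       # the two outputs agree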
11.6 The Discrete Transfer Function

Associated with each ω in the interval [0, 2π) we have the sequence e_ω = {e^{−inω}}_{n=−∞}^{∞}; the minus sign in the exponent is just for notational convenience later. What happens when we let f = e_ω be the input to the system T?
The output sequence h will be the convolution of the sequence e_ω with the sequence g; that is,

h_n = Σ_{m=−∞}^{∞} e^{−imω} g_{n−m} = Σ_{m=−∞}^{∞} g_m e^{−i(n−m)ω} = e^{−inω} Σ_{m=−∞}^{∞} g_m e^{imω}.

Defining

G(ω) = Σ_{m=−∞}^{∞} g_m e^{imω}   (11.4)

for 0 ≤ ω < 2π, we can write

h_n = e^{−inω} G(ω),

or

h = T(e_ω) = G(ω) e_ω.

This tells us that when e_ω is the input, the output is a multiple of the input; the “frequency” ω has not changed, but the multiplication by G(ω) can alter the amplitude and phase of the complex-exponential sequence.

Notice that Equation (11.4) is the definition of the Fourier series associated with the sequence g, viewed as a sequence of Fourier coefficients. It follows that, once we have the function G(ω), we can recapture the original g_n from the formula for Fourier coefficients:

g_n = (1/2π) ∫_0^{2π} G(ω) e^{−inω} dω.   (11.5)
11.7 Using Fourier Series

For any sequence f = {f_n}, we can define the function

F(ω) = Σ_{n=−∞}^{∞} f_n e^{inω},   (11.6)

for ω in the interval [0, 2π). Then each f_n is a Fourier coefficient of F(ω) and we have

f_n = (1/2π) ∫_0^{2π} F(ω) e^{−inω} dω.   (11.7)

It follows that we can write

f = (1/2π) ∫_0^{2π} F(ω) e_ω dω.   (11.8)

We interpret this as saying that the sequence f is a superposition of the individual sequences e_ω, with coefficients F(ω).
11.8 The Multiplication Theorem for Convolution
Now consider f as the input to the system T , with h = T (f ) as output.
Using Equation (11.8), we can write

h = T(f) = T( (1/2π) ∫_0^{2π} F(ω) e_ω dω ) = (1/2π) ∫_0^{2π} F(ω) T(e_ω) dω = (1/2π) ∫_0^{2π} F(ω) G(ω) e_ω dω.

But, applying Equation (11.8) to h, we have

h = (1/2π) ∫_0^{2π} H(ω) e_ω dω.

It follows that H(ω) = F(ω)G(ω), which is analogous to what we found in the case of continuous systems. This tells us that the system T works by multiplying the function F(ω) associated with the input by the transfer function G(ω), to get the function H(ω) associated with the output h = T(f). In the next section we give an example.
11.9 The Three-Point Moving Average
We consider now the linear, shift-invariant system T that performs the
three-point moving average operation on any input sequence. Let f be any
input sequence. Then the output sequence is h with
h_n = (1/3)(f_{n−1} + f_n + f_{n+1}).

The impulse-response sequence is g with g_{−1} = g_0 = g_1 = 1/3, and g_n = 0, otherwise.
To illustrate, for the input sequence with fn = 1 for all n, the output
is hn = 1 for all n. For the input sequence
f = {..., 3, 0, 0, 3, 0, 0, ...},
the output h is again the sequence hn = 1 for all n. If our input is
the difference of the previous two input sequences, that is, the input is
{..., 2, −1, −1, 2, −1, −1, ...}, then the output is the sequence with all entries equal to zero.
The transfer function G(ω) is

G(ω) = (1/3)(e^{iω} + 1 + e^{−iω}) = (1/3)(1 + 2 cos ω).
The function G(ω) has a zero when cos ω = −1/2, that is, when ω = 2π/3 or ω = 4π/3. Notice that the sequence given by

f_n = e^{i(2π/3)n} + e^{−i(2π/3)n} = 2 cos(2πn/3)

is the sequence {..., 2, −1, −1, 2, −1, −1, ...}, which, as we have just seen, has as its output the zero sequence. We can say that the reason the output is zero is that the transfer function has a zero at ω = 2π/3 and at ω = 4π/3 = −2π/3. Those complex-exponential components of the input sequence that correspond to values of ω where G(ω) = 0 will be removed in the output. This is a useful role that filtering can play; we can null out undesired complex-exponential components of an input signal by designing G(ω) to have a root at those values of ω.
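A two-line numerical check (not from the text) makes the nulling visible: convolving the constant sequence with the moving-average impulse response returns ones, while convolving the sequence 2 cos(2πn/3) returns zeros, because G(2π/3) = 0.

import numpy as np

n = np.arange(30)
g = np.array([1/3, 1/3, 1/3])                  # impulse-response sequence of the three-point average

constant = np.ones_like(n, dtype=float)
nulled = 2 * np.cos(2 * np.pi * n / 3)         # the sequence {..., 2, -1, -1, 2, -1, -1, ...}

print(np.convolve(constant, g, mode="valid"))  # every entry is (essentially) 1
print(np.convolve(nulled, g, mode="valid"))    # every entry is (essentially) 0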
11.10 Autocorrelation

If we take the input to our convolution filter to be the sequence f related to the impulse-response sequence by

f_n = \overline{g_{−n}},

then the output sequence is h with entries

h_n = Σ_{k=−∞}^{+∞} g_k \overline{g_{k−n}}

and H(ω) = |G(ω)|². The sequence h is called the autocorrelation sequence for g and |G(ω)|² is the power spectrum of g.

Autocorrelation sequences have special properties not shared with ordinary sequences, as the exercise below shows. The Cauchy inequality is valid for infinite sequences: with the length of g defined by

||g|| = ( Σ_{n=−∞}^{+∞} |g_n|² )^{1/2}

and the inner product of any sequences f and g given by

⟨f, g⟩ = Σ_{n=−∞}^{+∞} f_n \overline{g_n},

we have

|⟨f, g⟩| ≤ ||f|| ||g||,

with equality if and only if g is a constant multiple of f.

Exercise 11.1 Let h be the autocorrelation sequence for g. Show that h_{−n} = \overline{h_n} and h_0 ≥ |h_n| for all n.
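For a concrete check (not from the text; the short real sequence below is hypothetical), the next sketch computes the autocorrelation of a finitely supported sequence and verifies that its Fourier series is the power spectrum |G(ω)|², and that h_0 dominates every other entry.

import numpy as np

g = np.array([1.0, -0.5, 0.25, 0.1])            # a short real impulse-response sequence (hypothetical)

# autocorrelation h_n = sum over k of g_k * conj(g_{k-n}); np.correlate computes exactly this
h = np.correlate(g, g, mode="full")             # entries for n = -(len(g)-1), ..., len(g)-1

omega = np.linspace(0, 2 * np.pi, 200)
G = sum(gm * np.exp(1j * m * omega) for m, gm in enumerate(g))        # G(omega), Equation (11.4)
n_idx = np.arange(-(len(g) - 1), len(g))
H = sum(hn * np.exp(1j * n * omega) for n, hn in zip(n_idx, h))       # Fourier series of h

print(np.max(np.abs(H - np.abs(G) ** 2)))       # close to zero: H(omega) = |G(omega)|^2
print(h[len(g) - 1] >= np.abs(h).max() - 1e-12) # True: h_0 is the largest entry in magnitude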
11.11 Stable Systems
An infinite sequence f = {fn } is called bounded if there is a constant
A > 0 such that |fn | ≤ A, for all n. The shift-invariant linear system with
impulse-response sequence g = T (δ) is said to be stable [179] if the output
sequence h = {hn } is bounded whenever the input sequence f = {fn } is.
In Exercise 11.2 below we ask the reader to prove that, in order for the
system to be stable, it is both necessary and sufficient that
Σ_{n=−∞}^{∞} |g_n| < +∞.

Given a doubly infinite sequence g = {g_n}_{n=−∞}^{+∞}, we associate with g its z-transform, the function of the complex variable z given by

G(z) = Σ_{n=−∞}^{+∞} g_n z^{−n}.
Doubly infinite series of this form are called Laurent series and occur in
the representation of functions analytic in an annulus. Note that if we take
z = e−iω then G(z) becomes G(ω) as defined by Equation (11.4). The
z-transform is a somewhat more flexible tool in that we are not restricted
to those sequences g for which the z-transform is defined for z = e−iω .
Exercise 11.2 Show that the shift-invariant linear system with impulse-response sequence g is stable if and only if

Σ_{n=−∞}^{+∞} |g_n| < +∞.

Hint: If, on the contrary,

Σ_{n=−∞}^{+∞} |g_n| = +∞,

consider as input the bounded sequence f with

f_n = \overline{g_{−n}} / |g_{−n}|

and show that h_0 = +∞.
Exercise 11.3 Consider the linear system determined by the sequence g_0 = 2, g_n = (1/2)^{|n|}, for n ≠ 0. Show that this system is stable. Calculate the z-transform of {g_n} and determine its region of convergence.
11.12 Causal Filters
The shift-invariant linear system with impulse-response sequence g is said
to be a causal system if the sequence {gn } is itself causal; that is, gn = 0
for n < 0.
Exercise 11.4 Show that the function G(z) = (z − z_0)^{−1} is the z-transform of a causal sequence g, where z_0 is a fixed complex number. What is the region of convergence? Show that the resulting linear system is stable if and only if |z_0| < 1.
Chapter 12
Convolution and the Vector DFT

12.1 Chapter Summary
Convolution is an important concept in signal processing and occurs in several distinct contexts. In previous chapters, we considered the convolution
of functions of a continuous variable and of infinite sequences. The reader
may also recall an earlier encounter with convolution in a course on differential equations. In this chapter we shall discuss non-periodic convolution
and periodic convolution of vectors.
The simplest example of convolution is the non-periodic convolution of
finite vectors, which is what we do to the coefficients when we multiply two
polynomials together.
12.2 Non-periodic Convolution
Recall the algebra problem of multiplying one polynomial by another. Suppose
A(x) = a0 + a1 x + ... + aM xM
and
B(x) = b0 + b1 x + ... + bN xN .
Let C(x) = A(x)B(x). With
C(x) = c0 + c1 x + ... + cM +N xM +N ,
each of the coefficients cj , j = 0, ..., M +N, can be expressed in terms of the
am and bn (an easy exercise!). The vector c = (c0 , ..., cM +N ) is called the
non-periodic convolution of the vectors a = (a0 , ..., aM ) and b = (b0 , ..., bN ).
Non-periodic convolution can be viewed as a particular case of periodic
convolution, as we shall see.
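A one-line numerical illustration (not from the text; the coefficients are hypothetical): multiplying two polynomials is the same as convolving their coefficient vectors.

import numpy as np

a = np.array([1, 2, 3])        # A(x) = 1 + 2x + 3x^2   (hypothetical coefficients)
b = np.array([4, 5])           # B(x) = 4 + 5x

c = np.convolve(a, b)          # non-periodic convolution of the coefficient vectors
print(c)                       # [ 4 13 22 15 ], i.e. C(x) = 4 + 13x + 22x^2 + 15x^3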
12.3 The DFT as a Polynomial
Given the complex numbers f0 , f1 , ..., fN −1 , we form the vector f = (f0 , f1 , ..., fN −1 )T .
The DFT of the vector f is the function

DFT_f(ω) = Σ_{n=0}^{N−1} f_n e^{inω},

defined for ω in the interval [0, 2π). Because e^{inω} = (e^{iω})^n, we can write the DFT as a polynomial

DFT_f(ω) = Σ_{n=0}^{N−1} f_n (e^{iω})^n.
If we have a second vector, say d = (d0 , d1 , ..., dN −1 )T , then we define
DF Td (ω) similarly. When we multiply DF Tf (ω) by DF Td (ω), we are
multiplying two polynomials together, so the result is a sum of powers of
the form
c_0 + c_1 e^{iω} + c_2 (e^{iω})^2 + ... + c_{2N−2} (e^{iω})^{2N−2},   (12.1)

for

c_j = f_0 d_j + f_1 d_{j−1} + ... + f_j d_0.
This is non-periodic convolution again. In the next section, we consider what happens when, instead of using arbitrary values of ω, we consider only the N special values ω_k = (2π/N)k, k = 0, 1, ..., N − 1. Because of the
periodicity of the complex exponential function, we have
(e^{iω_k})^{N+j} = (e^{iω_k})^j,
for each k. As a result, all the powers higher than N − 1 that showed
up in the previous multiplication in Equation (12.1) now become equal
to lower powers, and the product now only has N terms, instead of the
2N − 1 terms we got previously. When we calculate the coefficients of
these powers, we find that we get more than we got when we did the nonperiodic convolution. Now what we get is called periodic convolution.
12.4 The Vector DFT and Periodic Convolution
As we just discussed, non-periodic convolution is another way of looking
at the multiplication of two polynomials. This relationship between convolution on the one hand and multiplication on the other is a fundamental
aspect of convolution. Whenever we have a convolution we should ask what
related mathematical objects are being multiplied. We ask this question
now with regard to periodic convolution; the answer turns out to be the
vector discrete Fourier transform (vDFT).
12.4.1 The Vector DFT
Let f = (f0 , f1 , ..., fN −1 )T be a column vector whose entries are N arbitrary
complex numbers. For k = 0, 1, ..., N − 1, we let
F_k = Σ_{n=0}^{N−1} f_n e^{2πikn/N} = DFT_f(ω_k).   (12.2)
Then we let F = (F0 , F1 , ..., FN −1 )T be the column vector with the N
complex entries Fk . The vector F is called the vector discrete Fourier
transform of the vector f , and we denote it by F = vDF Tf .
The entries of the vector F = vDF Tf are N equi-spaced values of the
function DF Tf (ω). If the Fourier transform F (ω) is zero for ω outside the
interval [0, 2π], and fn = f (n), for n = 0, 1, ..., N − 1, then the entries of
the vector F are N estimated values of F (ω).
Exercise 12.1 Let f_n be real, for each n. Show that F_{N−k} is the complex conjugate of F_k, for each k.
As we can see from Equation (12.2), there are N multiplications involved in the calculation of each Fk , and there are N values of k, so it
would seem that, in order to calculate the vector DFT of f , we need N 2
multiplications. In many applications, N is quite large and calculating the
vector F using the definition would be unrealistically time-consuming. The
fast Fourier transform algorithm (FFT), to be discussed later, gives a quick
way to calculate the vector F from the vector f . The FFT, usually credited
to Cooley and Tukey, was discovered in the mid-1960’s and revolutionized
signal and image processing.
12.4.2 Periodic Convolution
Given the N by 1 vectors f and d with complex entries fn and dn , respectively, we define a third N by 1 vector f ∗ d, the periodic convolution of f
and d, to have the entries

(f ∗ d)_n = f_0 d_n + f_1 d_{n−1} + ... + f_n d_0 + f_{n+1} d_{N−1} + ... + f_{N−1} d_{n+1},   (12.3)

for n = 0, 1, ..., N − 1.
Notice that the term on the right side of Equation (12.3) is the sum of
all products of entries, one from f and one from d, where the sum of their
respective indices is either n or n + N . Periodic convolution is illustrated
in Figure 12.1. The first exercise relates the periodic convolution to the
vector DFT.
In the exercises that follow we investigate properties of the vector DFT
and relate it to periodic convolution. It is not an exaggeration to say that
these two exercises are the most important ones in signal processing. The
first exercise establishes for finite vectors and periodic convolution a version
of the multiplication theorems we saw earlier for continuous and discrete
convolution.
Exercise 12.2 Let F = vDF Tf and D = vDF Td . Define a third vector
E having for its kth entry Ek = Fk Dk , for k = 0, ..., N − 1. Show that E
is the vDFT of the vector f ∗ d.
The vector vDF Tf can be obtained from the vector f by means of
matrix multiplication by a certain matrix G, called the DFT matrix. The
matrix G has an inverse that is easily computed and can be used to go
from F = vDF Tf back to the original f . The details are in Exercise 12.3.
Exercise 12.3 Let G be the N by N matrix whose entries are G_{jk} = e^{i(j−1)(k−1)2π/N}. The matrix G is sometimes called the DFT matrix. Show that the inverse of G is G^{−1} = (1/N)G†, where G† is the conjugate transpose of the matrix G. Then f ∗ d = G^{−1}E = (1/N)G†E.
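The following sketch (not from the text; the test vectors are random and hypothetical) builds the DFT matrix with 0-based indices, checks the inverse formula of Exercise 12.3, and verifies numerically the relation of Exercise 12.2 between periodic convolution and the vector DFT.

import numpy as np

N = 8
jj, kk = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
G = np.exp(2j * np.pi * jj * kk / N)            # DFT matrix, entries e^{i jk 2 pi / N}

# the inverse of G is (1/N) times its conjugate transpose
print(np.max(np.abs(np.linalg.inv(G) - G.conj().T / N)))

rng = np.random.default_rng(0)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)    # hypothetical vectors
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)

F, D = G @ f, G @ d                             # vector DFTs
conv = np.array([sum(f[m] * d[(n - m) % N] for m in range(N)) for n in range(N)])  # Equation (12.3)
print(np.max(np.abs(G @ conv - F * D)))         # the vDFT of f*d equals the entrywise product F_k D_k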
12.5 The vDFT of Sampled Data

For a doubly infinite sequence {f_n | −∞ < n < ∞}, the function F(γ) given by the infinite series

F(γ) = Σ_{n=−∞}^{∞} f_n e^{inγ}   (12.4)
is sometimes called the discrete-time Fourier transform (DTFT) of the
sequence, and the fn are called its Fourier coefficients. The function F (γ)
is 2π-periodic, so we restrict our attention to the interval 0 ≤ γ ≤ 2π. If
we start with a function F(γ), for 0 ≤ γ ≤ 2π, we can find the Fourier coefficients by

f_n = (1/2π) ∫_0^{2π} F(γ) e^{−iγn} dγ.   (12.5)
12.5.1 Superposition of Sinusoids

This equation suggests a model for a function of a continuous variable x:

f(x) = (1/2π) ∫_0^{2π} F(γ) e^{−iγx} dγ.   (12.6)

The values f_n then can be viewed as f_n = f(n); that is, the f_n are sampled values of the function f(x), sampled at the points x = n. The function F(γ) is now said to be the spectrum of the function f(x). The function f(x) is then viewed as a superposition of infinitely many simple functions, namely the complex exponentials or sinusoidal functions e^{−iγx}, for values of γ that lie in the interval [0, 2π]. The relative contribution of each e^{−iγx} to f(x) is given by the complex number (1/2π)F(γ).
12.5.2 Rescaling
In the model just discussed, we sampled the function f (x) at the points x =
n. In applications, the variable x can have many meanings. In particular, x
is often time, denoted by the variable t. Then the variable γ will be related
to frequency. Depending on the application, the frequencies involved in
the function f (t) may be quite large numbers, or quite small ones; there is
no reason to assume that they will all be in the interval [0, 2π]. For this
reason, we have to modify our formulas.
Suppose that the function g(t) is known to involve only frequencies in the interval [0, 2π/∆]. Define f(x) = g(x∆), so that

g(t) = f(t/∆) = (1/2π) ∫_0^{2π} F(γ) e^{−iγt/∆} dγ.   (12.7)

Introducing the variable ω = γ/∆, and writing G(ω) = ∆F(ω∆), we get

g(t) = (1/2π) ∫_0^{2π/∆} G(ω) e^{−iωt} dω.   (12.8)

Now the typical problem is to estimate G(ω) from measurements of g(t). Note that, using Equation (12.4), the function G(ω) can be written as follows:

G(ω) = ∆F(ω∆) = ∆ Σ_{n=−∞}^{∞} f_n e^{inω∆},

so that

G(ω) = ∆ Σ_{n=−∞}^{∞} g(n∆) e^{i(n∆)ω}.   (12.9)

Note that this is the same result as in Equation (8.15) and shows that the functions G(ω) and g(t) can be completely recovered from the infinite sequence of samples {g(n∆)}, whenever G(ω) is zero outside an interval of total length 2π/∆.
12.5.3 The Aliasing Problem

In the previous subsection, we assumed that we knew that the only frequencies involved in g(t) were in the interval [0, 2π/∆], and that ∆ was our sampling spacing. Notice that, given our data g(n∆), it is impossible for us to distinguish a frequency ω from ω + 2πk/∆, for any integer k: for any integers k and n we have

e^{i(ω + 2πk/∆)n∆} = e^{iωn∆} e^{2πikn} = e^{iωn∆}.

12.5.4 The Discrete Fourier Transform
In practice, we will have only finitely many measurements g(n∆); even these will typically be noisy, but we shall overlook this for now. Suppose our data is g(n∆), for n = 0, 1, ..., N − 1. For notational simplicity, we let f_n = g(n∆). It seems reasonable, in this case, to base our estimate Ĝ(ω) of G(ω) on Equation (12.9) and write

Ĝ(ω) = ∆ Σ_{n=0}^{N−1} g(n∆) e^{i(n∆)ω}.   (12.10)
We shall call Ĝ(ω) the DFT estimate of the function G(ω) and write
DF T (ω) = Ĝ(ω);
it will be clear from the context that the DFT uses samples of g(t) and
estimates G(ω).
12.5.5 Calculating Values of the DFT

Suppose that we want to evaluate this estimate of G(ω) at the N points ω_k = 2πk/(N∆), for k = 0, 1, ..., N − 1. Then we have

Ĝ(ω_k) = ∆ Σ_{n=0}^{N−1} g(n∆) e^{i(n∆)2πk/(N∆)} = Σ_{n=0}^{N−1} ∆g(n∆) e^{2πikn/N}.   (12.11)
Notice that this is the vector DFT entry F_k for the choices f_n = ∆g(n∆). To summarize, given the samples g(n∆), for n = 0, 1, ..., N − 1, we can get the N values Ĝ(2πk/(N∆)) by taking the vector DFT of the vector f = (∆g(0), ∆g(∆), ..., ∆g((N − 1)∆))^T. We would normally use the FFT algorithm to perform these calculations.
12.5.6 Zero-Padding

Suppose we simply want to graph the DFT estimate DFT(ω) = Ĝ(ω) on some uniform grid in the interval [0, 2π/∆], but want to use more than N points in the grid. The FFT algorithm always gives us back a vector with the same number of entries as the one we begin with, so if we want to get, say, M > N points in the grid, we need to give the FFT algorithm a vector with M entries. We do this by zero-padding, that is, by taking as our input to the FFT algorithm the M by 1 column vector

f = (∆g(0), ∆g(∆), ..., ∆g((N − 1)∆), 0, 0, ..., 0)^T.

The resulting vector DFT F then has the entries

F_k = ∆ Σ_{n=0}^{N−1} g(n∆) e^{2πikn/M},

for k = 0, 1, ..., M − 1; therefore, we have F_k = Ĝ(2πk/(M∆)).
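A short sketch (not from the text; the sample values and sizes are hypothetical) shows that zero-padding does exactly this: the padded FFT returns values of the same DFT estimate Ĝ(ω), only on a finer grid. It uses numpy's inverse FFT, scaled by the length, because that routine uses the positive-exponent convention of Equation (12.2).

import numpy as np

delta = 0.5
N, M = 16, 64                                   # want M > N grid points (hypothetical sizes)
t = delta * np.arange(N)
g = np.cos(2.0 * t) + 0.3 * np.sin(5.0 * t)     # hypothetical samples g(n*delta)

f = delta * g                                   # entries f_n = delta * g(n*delta)
f_padded = np.r_[f, np.zeros(M - N)]            # zero-padding to length M

F_N = N * np.fft.ifft(f)                        # Ghat at the N points 2*pi*k/(N*delta)
F_M = M * np.fft.ifft(f_padded)                 # Ghat at the finer grid 2*pi*k/(M*delta)

def Ghat(omega):                                # direct evaluation of Equation (12.10)
    return delta * np.sum(g * np.exp(1j * t * omega))

print(abs(F_N[3] - Ghat(2 * np.pi * 3 / (N * delta))))   # both differences are near zero:
print(abs(F_M[3] - Ghat(2 * np.pi * 3 / (M * delta))))   # same function, different grids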
12.5.7 What the vDFT Achieves
It is important to note that the values Fk we calculate by applying the
FFT algorithm to the sampled data g(n∆) are not values of the function
G(ω), but of the estimate, Ĝ(ω). Zero-padding allows us to use the FFT to
see more of the values of Ĝ(ω). It does not improve resolution, but simply
shows us what is already present in the function Ĝ(ω), which we may not
have seen without the zero-padding. The FFT algorithm is most efficient
when N is a power of two, so it is common practice to zero-pad f using as
M the smallest power of two not less than N .
12.5.8 Terminology
In the signal processing literature no special name is given to what we call
here DF T (ω), and the vector DFT of the data vector is called the DFT
of the data. This is unfortunate, because the function of the continuous
variable given in Equation (12.10) is the more fundamental entity, the
vector DFT being merely the evaluation of that function at N equi-spaced
points. If we should wish to evaluate the DF T (ω) at M > N equi-spaced
points, say, for example, for the purpose of graphing the function, we would
zero-pad the data vector, as we just discussed. The resulting vector DFT
is not the same vector as the one obtained prior to zero-padding; it is not
even the same size. But both of these vectors have, as their entries, values
of the same function, DF T (ω).
12.6 Understanding the Vector DFT
Let g(t) be the signal we are interested in. We sample the signal at the
points t = n∆, for n = 0, 1, ..., N − 1, to get our data values, which we
label fn = g(n∆). To illustrate the significance of the vector DFT, we
consider the simplest case, in which the signal g(t) we are sampling is a
single sinusoid.
Suppose that g(t) is a complex exponential function with frequency the
negative of ωm = 2πm/N ∆; the reason for the negative is a technical one
that we can safely ignore at this stage. Then
g(t) = e−i(2πm/N ∆)t ,
(12.12)
for some non-negative integer 0 ≤ m ≤ N − 1. Our data is then
fn = ∆g(n∆) = ∆e−i(2πm/N ∆)n∆ = ∆e−2πimn/N .
Now we calculate the components F_k of the vector DFT. We have

F_k = Σ_{n=0}^{N−1} f_n e^{2πikn/N} = ∆ Σ_{n=0}^{N−1} e^{2πi(k−m)n/N}.
If k = m, then Fm = N ∆, while, according to Exercise 6.14, Fk = 0, for k
not equal to m. Let’s try this on a more complicated signal.
Suppose now that our signal has the form

f(t) = Σ_{m=0}^{N−1} A_m e^{−2πimt/(N∆)}.   (12.13)

The data vector is now

f_n = ∆ Σ_{m=0}^{N−1} A_m e^{−2πimn/N}.
The entry Fm of the vector DFT is now the sum of the values it would have
if the signal had consisted only of the single sinusoid e−i(2πm/N ∆)t . As we
just saw, all but one of these values would be zero, and so Fm = N ∆Am ,
and this holds for each m = 0, 1, ..., N − 1.
Summarizing, when the signal f (t) is a sum of N sinusoids, with the
frequencies ωk = 2πk/N ∆, for k = 0, 1, ..., N −1, and we sample at t = n∆,
for n = 0, 1, ..., N − 1, the entries Fk of the vector DFT are precisely N ∆
times the corresponding amplitudes Ak . For this particular situation, calculating the vector DFT gives us the amplitudes of the different sinusoidal
components of f (t). We must remember, however, that this applies only
to the case in which f (t) has the form in Equation (12.13). In general, the
entries of the vector DFT are to be understood as approximations, in the
sense discussed above.
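A brief numerical confirmation (not from the text; the amplitudes are randomly chosen and hypothetical): sampling the model of Equation (12.13) and taking the vector DFT returns N∆ times the amplitudes.

import numpy as np

N, delta = 16, 0.1
rng = np.random.default_rng(1)
A = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # hypothetical amplitudes A_m

t = delta * np.arange(N)
m = np.arange(N)
# the signal of Equation (12.13) sampled at t = n*delta, then multiplied by delta
f = delta * np.array([np.sum(A * np.exp(-2j * np.pi * m * tn / (N * delta))) for tn in t])

F = N * np.fft.ifft(f)                   # the book's vector DFT, F_k = sum_n f_n e^{2 pi i k n / N}
print(np.max(np.abs(F - N * delta * A))) # close to zero: F_k = N * delta * A_k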
As mentioned previously, non-periodic convolution is really a special
case of periodic convolution. Extend the M +1 by 1 vector a to an M +N +1
by 1 vector by appending N zero entries; similarly, extend the vector b to
an M + N + 1 by 1 vector by appending zeros. The vector c is now the
periodic convolution of these extended vectors. Therefore, since we have
an efficient algorithm for performing periodic convolution, namely the Fast
Fourier Transform algorithm (FFT), we have a fast way to do the periodic
(and thereby non-periodic) convolution and polynomial multiplication.
Figure 12.1: Periodic convolution of vectors a = (a(0), a(1), a(2), a(3)) and
b = (b(0), b(1), b(2), b(3)).
Chapter 13
The Fast Fourier Transform (FFT)

13.1 Chapter Summary
A fundamental problem in signal processing is to estimate finitely many
values of the function F (ω) from finitely many values of its (inverse) Fourier
transform, f (t). As we have seen, the DFT arises in several ways in that
estimation effort. The fast Fourier transform (FFT), discovered in 1965 by
Cooley and Tukey, is an important and efficient algorithm for calculating
the vector DFT [86]. John Tukey has been quoted as saying that his main
contribution to this discovery was the firm and often voiced belief that such
an algorithm must exist.
13.2 Evaluating a Polynomial
To illustrate the main idea underlying the FFT, consider the problem of
evaluating a real polynomial P (x) at a point, say x = c. Let the polynomial
be
P (x) = a0 + a1 x + a2 x2 + ... + a2K x2K ,
where a2K might be zero. Performing the evaluation efficiently by Horner’s
method,
P (c) = (((a2K c + a2K−1 )c + a2K−2 )c + a2K−3 )c + ...,
requires 2K multiplications, so the complexity is on the order of the degree
of the polynomial being evaluated. But suppose we also want P (−c). We
can write
P (x) = (a0 + a2 x2 + ... + a2K x2K ) + x(a1 + a3 x2 + ... + a2K−1 x2K−2 )
or
P (x) = Q(x2 ) + xR(x2 ).
Therefore, we have P (c) = Q(c2 ) + cR(c2 ) and P (−c) = Q(c2 ) − cR(c2 ).
If we evaluate P (c) by evaluating Q(c2 ) and R(c2 ) separately, one more
multiplication gives us P (−c) as well. The FFT is based on repeated use
of this idea, which turns out to be more powerful when we are using complex
exponentials, because of their periodicity.
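A small sketch (not from the text; the coefficients and the point c are hypothetical) carries out the even/odd splitting just described, getting P(−c) almost for free once Q(c²) and R(c²) are known.

import numpy as np

a = np.array([3.0, 1.0, -2.0, 0.5, 4.0])     # hypothetical coefficients a_0, ..., a_4
c = 1.7

Q = a[0::2]                                  # even-indexed coefficients: Q(y) = a0 + a2*y + a4*y^2
R = a[1::2]                                  # odd-indexed coefficients:  R(y) = a1 + a3*y

qc = np.polyval(Q[::-1], c * c)              # Q(c^2)   (np.polyval expects highest power first)
rc = np.polyval(R[::-1], c * c)              # R(c^2)

print(qc + c * rc, np.polyval(a[::-1], c))   # P(c), computed two ways
print(qc - c * rc, np.polyval(a[::-1], -c))  # P(-c), reusing Q(c^2) and R(c^2)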
13.3 The DFT and Vector DFT

Suppose that the data are the samples {f(n∆), n = 1, ..., N}, where ∆ > 0 is the sampling increment or sampling spacing.

The DFT estimate of F(ω) is the function F_DFT(ω), defined for ω in [−π/∆, π/∆], and given by

F_DFT(ω) = ∆ Σ_{n=1}^{N} f(n∆) e^{in∆ω}.
The DFT estimate FDF T (ω) is data consistent; its inverse Fourier-transform
value at t = n∆ is f (n∆) for n = 1, ..., N . The DFT is sometimes used in
a slightly more general context in which the coefficients are not necessarily
viewed as samples of a function f (t).
Given the complex N-dimensional column vector f = (f_0, f_1, ..., f_{N−1})^T, define the DFT of the vector f to be the function DFT_f(ω), defined for ω in [0, 2π), given by

DFT_f(ω) = Σ_{n=0}^{N−1} f_n e^{inω}.
Let F be the complex N -dimensional vector F = (F0 , F1 , ..., FN −1 )T , where
Fk = DF Tf (2πk/N ), k = 0, 1, ..., N −1. So the vector F consists of N values
of the function DF Tf , taken at N equispaced points 2π/N apart in [0, 2π).
From the formula for DFT_f we have, for k = 0, 1, ..., N − 1,

F_k = DFT_f(2πk/N) = Σ_{n=0}^{N−1} f_n e^{2πink/N}.   (13.1)
To calculate a single Fk requires N multiplications; it would seem that to
calculate all N of them would require N 2 multiplications. However, using
the FFT algorithm, we can calculate vector F in approximately N log2 (N )
multiplications.
13.4 Exploiting Redundancy

Suppose that N = 2M is even. We can rewrite Equation (13.1) as follows:

F_k = Σ_{m=0}^{M−1} f_{2m} e^{2πi(2m)k/N} + Σ_{m=0}^{M−1} f_{2m+1} e^{2πi(2m+1)k/N},

or, equivalently,

F_k = Σ_{m=0}^{M−1} f_{2m} e^{2πimk/M} + e^{2πik/N} Σ_{m=0}^{M−1} f_{2m+1} e^{2πimk/M}.   (13.2)

Note that if 0 ≤ k ≤ M − 1 then

F_{k+M} = Σ_{m=0}^{M−1} f_{2m} e^{2πimk/M} − e^{2πik/N} Σ_{m=0}^{M−1} f_{2m+1} e^{2πimk/M},   (13.3)
so there is no additional computational cost in calculating the second half
of the entries of F, once we have calculated the first half. The FFT is the
algorithm that results when we take full advantage of the savings obtainable
by splitting a DFT calculation into two similar calculations of half the size.
We assume now that N = 2L . Notice that if we use Equations (13.2)
and (13.3) to calculate vector F, the problem reduces to the calculation of
two similar DFT evaluations, both involving half as many entries, followed
by one multiplication for each of the k between 0 and M − 1. We can split
these in half as well. The FFT algorithm involves repeated splitting of the
calculations of DFTs at each step into two similar DFTs, but with half the
number of entries, followed by as many multiplications as there are entries
in either one of these smaller DFTs. We use recursion to calculate the cost
C(N ) of computing F using this FFT method. From Equation (13.2) we
see that C(N ) = 2C(N/2) + (N/2). Applying the same reasoning to get
C(N/2) = 2C(N/4) + (N/4), we obtain
C(N) = 2C(N/2) + (N/2) = 4C(N/4) + 2(N/2) = ... = 2^L C(N/2^L) + L(N/2) = N + L(N/2).
Therefore, the cost required to calculate F is approximately N log2 N .
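For readers who would like to see the splitting as code, here is a minimal recursive radix-2 sketch (not from the text): it follows Equations (13.2) and (13.3) directly, uses the book's positive-exponent convention, and assumes the length N is a power of two. It is meant to expose the idea, not to replace an optimized library routine.

import numpy as np

def vdft_fft(f):
    # vector DFT F_k = sum_n f_n e^{2 pi i k n / N}, by repeated even/odd splitting
    f = np.asarray(f, dtype=complex)
    N = len(f)
    if N == 1:
        return f.copy()
    M = N // 2
    even = vdft_fft(f[0::2])                 # DFT of the even-indexed entries, length M
    odd = vdft_fft(f[1::2])                  # DFT of the odd-indexed entries, length M
    twiddle = np.exp(2j * np.pi * np.arange(M) / N) * odd
    return np.concatenate([even + twiddle,   # Equation (13.2), for k = 0,...,M-1
                           even - twiddle])  # Equation (13.3), for k = M,...,N-1

f = np.random.default_rng(2).standard_normal(64)
print(np.max(np.abs(vdft_fft(f) - len(f) * np.fft.ifft(f))))   # agrees with numpy's result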
From our earlier discussion of discrete linear filters and convolution, we
see that the FFT can be used to calculate the periodic convolution (or even
the nonperiodic convolution) of finite length vectors.
Finally, let’s return to the original context of estimating the Fourier
transform F (ω) of function f (t) from finitely many samples of f (t). If we
have N equispaced samples, we can use them to form the vector f and
perform the FFT algorithm to get vector F consisting of N values of the
DFT estimate of F (ω). It may happen that we wish to calculate more
than N values of the DFT estimate, perhaps to produce a smooth looking
graph. We can still use the FFT, but we must trick it into thinking we have
more data than the N samples we really have. We do this by zero-padding.
Instead of creating the N -dimensional vector f , we make a longer vector by
appending, say, J zeros to the data, to make a vector that has dimension
N + J. The DFT estimate is still the same function of ω, since we have
only included new zero coefficients as fake data; but, the FFT thinks we
have N + J data values, so it returns N + J values of the DFT, at N + J
equispaced values of ω in [0, 2π).
13.5 The Two-Dimensional Case
Suppose now that we have the data {f (m∆x , n∆y )}, for m = 1, ..., M and
n = 1, ..., N , where ∆x > 0 and ∆y > 0 are the sample spacings in the
x and y directions, respectively. The DFT of this data is the function
FDF T (α, β) defined by
F_DFT(α, β) = ∆_x ∆_y Σ_{m=1}^{M} Σ_{n=1}^{N} f(m∆_x, n∆_y) e^{i(αm∆_x + βn∆_y)},
for |α| ≤ π/∆x and |β| ≤ π/∆y . The two-dimensional FFT produces M N
values of FDF T (α, β) on a rectangular grid of M equi-spaced values of α
and N equi-spaced values of β. This calculation proceeds as follows. First,
for each fixed value of n, an FFT of the M data points {f(m∆_x, n∆_y)}, m =
1, ..., M is calculated, producing a function, say G(αm , n∆y ), of M equispaced values of α and the N equispaced values n∆y . Then, for each
of the M equi-spaced values of α, the FFT is applied to the N values
G(αm , n∆y ), n = 1, ..., N , to produce the final result.
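The row-column procedure just described can be checked with a few lines (not from the text; the array of samples is random and hypothetical): transforming along one axis and then the other gives the same result as a single two-dimensional transform.

import numpy as np

M, N = 8, 16
data = np.random.default_rng(3).standard_normal((M, N))   # hypothetical samples f(m*dx, n*dy)

step1 = M * np.fft.ifft(data, axis=0)     # transform the M samples for each fixed n
step2 = N * np.fft.ifft(step1, axis=1)    # then transform the N values for each of the M frequencies

direct = M * N * np.fft.ifft2(data)       # the same thing, done in one call
print(np.max(np.abs(step2 - direct)))     # close to machine precision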
Chapter 14
Plane-wave Propagation
14.1 Chapter Summary
In this chapter we demonstrate how the Fourier transform arises naturally
as we study the signals received in the far-field from an array of transmitters
or reflectors. We restrict our attention to single-frequency, or narrow-band,
signals. We begin with a simple illustration of some of the issues we deal
with in greater detail later in this chapter.
14.2 The Bobbing Boats
Imagine a large swimming pool in which there are several toy boats arrayed
in a straight line. Although we use Figure 14.1 for a slightly different
purpose elsewhere, for now we can imagine that the black dots in that
figure represent our toy boats. Far across the pool, someone is slapping the
water repeatedly, generating waves that proceed outward, in essentially
concentric circles, across the pool. By the time the waves reach the boats,
the circular shape has flattened out so that the wavefronts are essentially
straight lines. The straight lines in Figure 14.1 at the end of this chapter
can represent these wavefronts.
As the wavefronts reach the boats, the boats bob up and down. If the
lines of the wavefronts were oriented parallel to the line of the boats, then
the boats would bob up and down in unison. When the wavefronts come
in at some angle, as shown in the figure, the boats will bob up and down
out of sync with one another, generally. By measuring the time it takes for
the peak to travel from one boat to the next, we can estimate the angle of
arrival of the wavefronts.
This leads to two questions:
• 1. Is it possible to get the boats to bob up and down in unison, even
though the wavefronts arrive at an angle, as shown in the figure?
• 2. Is it possible for wavefronts corresponding to two different angles
of arrival to affect the boats in the same way, so that we cannot tell
which of the two angles is the real one?
We need a bit of mathematical notation. We let the distance from each
boat to the ones on both sides be a constant distance ∆. We assume that
the water is slapped f times per second, so f is the frequency, in units of
cycles per second. As the wavefronts move out across the pool, the distance
from one peak to the next is called the wavelength, denoted λ. The product
λf is the speed of propagation c; so λf = c. As the frequency changes, so
does the wavelength, while the speed of propagation, which depends solely
on the depth of the pool, remains constant. The angle θ measures the tilt
between the line of the wavefronts and the line of the boats, so that θ = 0
indicates that these wavefront lines are parallel to the line of the boats,
while θ = π/2 indicates that the wavefront lines are perpendicular to the line
of the boats.
Exercise 14.1 Let the angle θ be arbitrary, but fixed, and let ∆ be fixed.
Can we select the frequency f in such a way that we can make all the boats
bob up and down in unison?
Exercise 14.2 Suppose now that the frequency f is fixed, but we are free
to alter the spacing ∆. Can we choose ∆ so that we can always determine
the true angle of arrival?
14.3 Transmission and Remote-Sensing
For pedagogical reasons, we shall discuss separately what we shall call the
transmission and the remote-sensing problems, although the two problems
are opposite sides of the same coin, in a sense. In the one-dimensional
transmission problem, it is convenient to imagine the transmitters located
at points (x, 0) within a bounded interval [−A, A] of the x-axis, and the
measurements taken at points P lying on a circle of radius D, centered
at the origin. The radius D is large, with respect to A. It may well be
the case that no actual sensing is to be performed, but rather, we are
simply interested in what the received signal pattern is at points P distant
from the transmitters. Such would be the case, for example, if we were
analyzing or constructing a transmission pattern of radio broadcasts. In the
remote-sensing problem, in contrast, we imagine, in the one-dimensional
case, that our sensors occupy a bounded interval of the x-axis, and the
transmitters or reflectors are points of a circle whose radius is large, with
respect to the size of the bounded interval. The actual size of the radius
does not matter and we are interested in determining the amplitudes of the
transmitted or reflected signals, as a function of angle only. Such is the case
in astronomy, farfield sonar or radar, and the like. Both the transmission
and remote-sensing problems illustrate the important role played by the
Fourier transform.
14.4 The Transmission Problem
We identify two distinct transmission problems: the direct problem and
the inverse problem. In the direct transmission problem, we wish to determine the farfield pattern, given the complex amplitudes of the transmitted
signals. In the inverse transmission problem, the array of transmitters or
reflectors is the object of interest; we are given, or we measure, the farfield
pattern and wish to determine the amplitudes. For simplicity, we consider
only single-frequency signals.
We suppose that each point x in the interval [−A, A] transmits the
signal f (x)eiωt , where f (x) is the complex amplitude of the signal and
ω > 0 is the common fixed frequency of the signals. Let D > 0 be large,
with respect to A, and consider the signal received at each point P given
in polar coordinates by P = (D, θ). The distance from (x, 0) to P is
approximately D − x cos θ, so that, at time t, the point P receives from
(x, 0) the signal f (x)eiω(t−(D−x cos θ)/c) , where c is the propagation speed.
Therefore, the combined signal received at P is

B(P, t) = e^{iωt} e^{−iωD/c} ∫_{−A}^{A} f(x) e^{ix(ω cos θ)/c} dx.   (14.1)

The integral term, which gives the farfield pattern of the transmission, is

F((ω cos θ)/c) = ∫_{−A}^{A} f(x) e^{ix(ω cos θ)/c} dx,   (14.2)

where F(γ) is the Fourier transform of f(x), given by

F(γ) = ∫_{−A}^{A} f(x) e^{ixγ} dx.   (14.3)

How F((ω cos θ)/c) behaves, as a function of θ, as we change A and ω, is discussed in some detail in the chapter on direct transmission.
Consider, for example, the function f (x) = 1, for |x| ≤ A, and f (x) = 0,
otherwise. The Fourier transform of f (x) is
F(γ) = 2A sinc(Aγ),   (14.4)

where sinc(t) is defined to be

sinc(t) = sin(t)/t,   (14.5)

for t ≠ 0, and sinc(0) = 1. Then F((ω cos θ)/c) = 2A when cos θ = 0, so when θ = π/2 and θ = 3π/2. We will have F((ω cos θ)/c) = 0 when A(ω cos θ)/c = π, or cos θ = πc/(Aω). Therefore, the transmission pattern has no nulls if πc/(Aω) > 1. In order for the transmission pattern to have nulls, we need A > λ/2, where λ = 2πc/ω is the wavelength. This rather counterintuitive fact, namely that we need more signals transmitted in order to receive less at certain locations, illustrates the phenomenon of destructive interference.
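To see the appearance of nulls numerically, here is a small sketch (not from the text; the speed, frequency, and apertures are hypothetical values): it evaluates the far-field pattern 2A sinc(A ω cos θ / c) over all angles for an aperture below and above half a wavelength.

import numpy as np

c, omega = 1500.0, 2 * np.pi * 1000.0          # hypothetical propagation speed and frequency
lam = 2 * np.pi * c / omega                    # wavelength lambda = 2 pi c / omega
theta = np.linspace(0, np.pi, 20001)

def pattern(A):
    gamma = omega * np.cos(theta) / c
    return 2 * A * np.sinc(A * gamma / np.pi)  # np.sinc(x) = sin(pi x)/(pi x), hence the rescaling

for A in (0.3 * lam, 2.0 * lam):               # aperture below and above lambda/2
    p = np.abs(pattern(A))
    print(A / lam, p.min() / p.max())          # the ratio is essentially zero only when A > lambda/2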
14.5 Reciprocity
For certain remote-sensing applications, such as sonar and radar array processing and astronomy, it is convenient to switch the roles of sender and
receiver. Imagine that superimposed planewave fields are sensed at points
within some bounded region of the interior of the sphere, having been
transmitted or reflected from the points P on the surface of a sphere whose
radius D is large with respect to the bounded region. The reciprocity principle tells us that the same mathematical relation holds between points P
and (x, 0), regardless of which is the sender and which the receiver. Consequently, the data obtained at the points (x, 0) are then values of the
inverse Fourier transform of the function describing the amplitude of the
signal sent from each point P .
14.6 Remote Sensing
A basic problem in remote sensing is to determine the nature of a distant
object by measuring signals transmitted by or reflected from that object.
If the object of interest is sufficiently remote, that is, is in the farfield, the
data we obtain by sampling the propagating spatio-temporal field is related,
approximately, to what we want by Fourier transformation. The problem
is then to estimate a function from finitely many (usually noisy) values
of its Fourier transform. The application we consider here is a common
one of remote-sensing of transmitted or reflected waves propagating from
distant sources. Examples include optical imaging of planets and asteroids
using reflected sunlight, radio-astronomy imaging of distant sources of radio
waves, active and passive sonar, and radar imaging.
14.7 The Wave Equation
In many areas of remote sensing, what we measure are the fluctuations in time of an electromagnetic or acoustic field. Such fields are described mathematically as solutions of certain partial differential equations, such as the wave equation. A function u(x, y, z, t) is said to satisfy the three-dimensional wave equation if

u_tt = c²(u_xx + u_yy + u_zz) = c²∇²u,   (14.6)

where u_tt denotes the second partial derivative of u with respect to the time variable t and c > 0 is the (constant) speed of propagation. More complicated versions of the wave equation permit the speed of propagation c to vary with the spatial variables x, y, z, but we shall not consider that here.
We use the method of separation of variables at this point, to get some
idea about the nature of solutions of the wave equation. Assume, for the
moment, that the solution u(t, x, y, z) has the simple form
u(t, x, y, z) = g(t)f (x, y, z).
(14.7)
Inserting this separated form into the wave equation, we get

g″(t) f(x, y, z) = c² g(t) ∇²f(x, y, z)   (14.8)

or

g″(t)/g(t) = c² ∇²f(x, y, z)/f(x, y, z).   (14.9)

The function on the left is independent of the spatial variables, while the one on the right is independent of the time variable; consequently, they must both equal the same constant, which we denote −ω². From this we have two separate equations,

g″(t) + ω² g(t) = 0,   (14.10)

and

∇²f(x, y, z) + (ω²/c²) f(x, y, z) = 0.   (14.11)
Equation (14.11) is the Helmholtz equation.
Equation (14.10) has for its solutions the functions g(t) = cos(ωt) and
sin(ωt), or, in complex form, the complex exponential functions g(t) = eiωt
and g(t) = e−iωt . Functions u(t, x, y, z) = g(t)f (x, y, z) with such time
dependence are called time-harmonic solutions.
14.8 Planewave Solutions
Suppose that, beginning at time t = 0, there is a localized disturbance.
As time passes, that disturbance spreads out spherically. When the radius
of the sphere is very large, the surface of the sphere appears planar, to
an observer on that surface, who is said then to be in the far field. This
motivates the study of solutions of the wave equation that are constant on
planes; the so-called planewave solutions.
Let s = (x, y, z) and u(s, t) = u(x, y, z, t) = e^{iωt} e^{ik·s}. Then we can show that u satisfies the wave equation u_tt = c²∇²u for any real vector k, so long as ||k||² = ω²/c². This solution is a planewave associated with frequency
ω and wavevector k; at any fixed time the function u(s, t) is constant on
any plane in three-dimensional space having k as a normal vector.
In radar and sonar, the field u(s, t) being sampled is usually viewed as
a discrete or continuous superposition of planewave solutions with various
amplitudes, frequencies, and wavevectors. We sample the field at various
spatial locations s, for various times t. Here we simplify the situation a
bit by assuming that all the planewave solutions are associated with the
same frequency, ω. If not, we can perform an FFT on the functions of time
received at each sensor location s and keep only the value associated with
the desired frequency ω.
14.9 Superposition and the Fourier Transform

In the continuous superposition model, the field is

u(s, t) = e^{iωt} ∫ F(k) e^{ik·s} dk.   (14.12)

Our measurements at the sensor locations s give us the values

f(s) = ∫ F(k) e^{ik·s} dk.   (14.13)
The data are then Fourier transform values of the complex function F(k); F(k) is defined for all three-dimensional real vectors k, but is zero, in theory at least, for those k whose squared length ||k||² is not equal to ω²/c². Our goal is then to estimate F(k) from measured values of its
Fourier transform. Since each k is a normal vector for its planewave field
component, determining the value of F (k) will tell us the strength of the
planewave component coming from the direction k.
14.9.1 The Spherical Model
We can imagine that the sources of the planewave fields are the points P
that lie on the surface of a large sphere centered at the origin. For each
P , the ray from the origin to P is parallel to some wavevector k. The
function F (k) can then be viewed as a function F (P ) of the points P . Our
measurements will be taken at points s inside this sphere. The radius of
the sphere is assumed to be orders of magnitude larger than the distance
between sensors. The situation is that of astronomical observation of the
heavens using ground-based antennas. The sources of the optical or electromagnetic signals reaching the antennas are viewed as lying on a large sphere
surrounding the earth. Distance to the sources is not considered now, and
all we are interested in are the amplitudes F (k) of the fields associated
with each direction k.
14.10 Sensor Arrays
In some applications the sensor locations are essentially arbitrary, while
in others their locations are carefully chosen. Sometimes, the sensors are
collinear, as in sonar towed arrays. Figure 14.1 illustrates a line array.
14.10.1 The Two-Dimensional Array
Suppose now that the sensors are in locations s = (x, y, 0), for various x
and y; then we have a planar array of sensors. Then the dot product s · k
that occurs in Equation (14.13) is
s · k = xk_1 + yk_2;   (14.14)
we cannot see the third component, k3 . However, since we know the size
of the vector k, we can determine |k3 |. The only ambiguity that remains
is that we cannot distinguish sources on the upper hemisphere from those
on the lower one. In most cases, such as astronomy, it is obvious in which
hemisphere the sources lie, so the ambiguity is resolved.
The function F (k) can then be viewed as F (k1 , k2 ), a function of the
two variables k1 and k2 . Our measurements give us values of f (x, y), the
two-dimensional Fourier transform of F (k1 , k2 ). Because of the limitation
||k|| = ω/c, the function F(k_1, k_2) has bounded support. Consequently, its
Fourier transform cannot have bounded support. As a result, we can never
have all the values of f (x, y), and so cannot hope to reconstruct F (k1 , k2 )
exactly, even for noise-free data.
14.10.2 The One-Dimensional Array
If the sensors are located at points s having the form s = (x, 0, 0), then we
have a line array of sensors. The dot product in Equation (14.13) becomes
s · k = xk_1.   (14.15)
Now the ambiguity is greater than in the planar array case. Once we have
k_1, we know that

k_2² + k_3² = (ω/c)² − k_1²,   (14.16)
which describes points P lying on a circle on the surface of the distant
sphere, with the vector (k1 , 0, 0) pointing at the center of the circle. It
is said then that we have a cone of ambiguity. One way to resolve the
situation is to assume k3 = 0; then |k2 | can be determined and we have
remaining only the ambiguity involving the sign of k2 . Once again, in many
applications, this remaining ambiguity can be resolved by other means.
Once we have resolved any ambiguity, we can view the function F (k)
as F (k1 ), a function of the single variable k1 . Our measurements give us
values of f (x), the Fourier transform of F (k1 ). As in the two-dimensional
case, the restriction on the size of the vectors k means that the function
F (k1 ) has bounded support. Consequently, its Fourier transform, f (x),
cannot have bounded support. Therefore, we shall never have all of f (x),
and so cannot hope to reconstruct F (k1 ) exactly, even for noise-free data.
14.10.3 Limited Aperture

In both the one- and two-dimensional problems, the sensors will be placed within some bounded region, such as |x| ≤ A, |y| ≤ B for the two-dimensional problem, or |x| ≤ A for the one-dimensional case. These bounded regions are the apertures of the arrays. The larger these apertures are, in units of the wavelength, the better the resolution of the reconstructions.

In digital array processing there are only finitely many sensors, which then places added limitations on our ability to reconstruct the field-amplitude function F(k).
14.11 The Remote-Sensing Problem
We shall begin our discussion of the remote-sensing problem by considering an extended object transmitting or reflecting a single-frequency, or
narrowband, signal. The narrowband, extended-object case is a good place
to begin, since a point object is simply a limiting case of an extended object, and broadband received signals can always be filtered to reduce their
frequency band.
14.11.1 The Solar-Emission Problem
In [23] Bracewell discusses the solar-emission problem. In 1942, it was
observed that radio-wave emissions in the one-meter wavelength range were
arriving from the sun. Were they coming from the entire disk of the sun
or were the sources more localized, in sunspots, for example? The problem
then was to view each location on the sun’s surface as a potential source of
these radio waves and to determine the intensity of emission corresponding
to each location.
For electromagnetic waves the propagation speed is the speed of light
in a vacuum, which we shall take here to be c = 3 × 10^8 meters per second.
The wavelength λ for gamma rays is around one Angstrom, which is 10^{−10} meters; for x-rays it is about one millimicron, or 10^{−9} meters. The visible spectrum has wavelengths that are a little less than one micron, that is, 10^{−6} meters. Shortwave radio has a wavelength around one millimeter; microwaves have wavelengths between one centimeter and one meter.
Broadcast radio has a λ running from about 10 meters to 1000 meters,
while the so-called long radio waves can have wavelengths several thousand
meters long.
The sun has an angular diameter of 30 min. of arc, or one-half of a
degree, when viewed from earth, but the needed resolution was more like
3 min. of arc. As we shall see shortly, such resolution requires a radio
telescope 1000 wavelengths across, which means a diameter of 1km at a
wavelength of 1 meter; in 1942 the largest military radar antennas were
less than 5 meters across. A solution was found, using the method of
reconstructing an object from line-integral data, a technique that surfaced
again in tomography. The problem here is inherently two-dimensional, but,
for simplicity, we shall begin with the one-dimensional case.
14.12 Sampling

In the one-dimensional case, the signal received at the point (x, 0, 0) is essentially the inverse Fourier transform f(x) of the function F(k_1); for notational simplicity, we write k = k_1. The function F(k) is supported on a bounded interval |k| ≤ ω/c, so f(x) cannot have bounded support. As we noted earlier, to determine F(k) exactly, we would need measurements of f(x) on an unbounded set. But, which unbounded set?

Because the function F(k) is zero outside the interval [−ω/c, ω/c], the function f(x) is band-limited. The Nyquist spacing in the variable x is therefore

∆_x = πc/ω.   (14.17)

The wavelength λ associated with the frequency ω is defined to be

λ = 2πc/ω,   (14.18)

so that

∆_x = λ/2.   (14.19)
The significance of the Nyquist spacing comes from Shannon’s Sampling
Theorem, which says that if we have the values f (m∆x ), for all integers m,
then we have enough information to recover F (k) exactly. In practice, of
course, this is never the case.
14.13 The Limited-Aperture Problem
In the remote-sensing problem, our measurements at points (x, 0, 0) in the
farfield give us the values f (x). Suppose now that we are able to take
measurements only for limited values of x, say for |x| ≤ A; then 2A is the
aperture of our antenna or array of sensors. We describe this by saying that
we have available measurements of f (x)h(x), where h(x) = χA (x) = 1, for
|x| ≤ A, and zero otherwise. So, in addition to describing blurring and
low-pass filtering, the convolution-filter model can also be used to model
the limited-aperture problem. As in the low-pass case, the limited-aperture
problem can be attacked using extrapolation, but with the same sort of risks
described for the low-pass case. A much different approach is to increase
the aperture by physically moving the array of sensors, as in synthetic
aperture radar (SAR).
Returning to the farfield remote-sensing model, if we have Fourier transform data only for |x| ≤ A, then we have f (x) for |x| ≤ A. Using
h(x) = χ_A(x) to describe the limited aperture of the system, the point-spread function is H(γ) = 2A sinc(γA), the Fourier transform of h(x). The first zeros of the numerator occur at |γ| = π/A, so the main lobe of the point-spread function has width 2π/A. For this reason, the resolution of such a limited-aperture imaging system is said to be on the order of 1/A. Since |k| ≤ ω/c, we can write k = (ω/c) sin θ, where θ denotes the angle between the positive y-axis and the vector k = (k_1, k_2, 0); that is, θ points in the direction of the point P associated with the wavevector k. The resolution, as measured by the width of the main lobe of the point-spread function H(γ), in units of k, is 2π/A, but the angular resolution will depend also on the frequency ω. Since k = (2π/λ) sin θ, a distance of one unit in k may correspond to a large change in θ when ω is large, but only to a relatively small change
to a large change in θ when ω is large, but only to a relatively small change
in θ when ω is small. For this reason, the aperture of the array is usually
measured in units of the wavelength; an aperture of A = 5 meters may be
acceptable if the frequency is high, so that the wavelength is small, but not
if the radiation is in the one-meter-wavelength range.
14.14 Resolution
If F (k) = δ(k) and h(x) = χA (x) describes the aperture-limitation of the
imaging system, then the point-spread function is H(γ) = 2Asinc(γA).
The maximum of H(γ) still occurs at γ = 0, but the main lobe of H(γ)
extends from −π/A to π/A; the point source has been spread out. If the point-source object shifts, so that F(k) = δ(k − a), then the reconstructed image
of the object is H(k −a), so the peak is still in the proper place. If we know
a priori that the object is a single point source, but we do not know its
location, the spreading of the point poses no problem; we simply look for
the maximum in the reconstructed image. Problems arise when the object
contains several point sources, or when we do not know a priori what we
are looking at, or when the object contains no point sources, but is just a
continuous distribution.
Suppose that F (k) = δ(k − a) + δ(k − b); that is, the object consists
of two point sources. Then Fourier transformation of the aperture-limited
data leads to the reconstructed image
R(k) = 2A[sinc(A(k − a)) + sinc(A(k − b))].   (14.20)
If |b − a| is large enough, R(k) will have two distinct maxima, at approximately k = a and k = b, respectively. For this to happen, we need π/A,
half the width of the main lobe of the function sinc(Ak), to be less than
|b − a|. In other words, to resolve the two point sources a distance |b − a|
apart, we need A ≥ π/|b − a|. However, if |b − a| is too small, the distinct
maxima merge into one, at k = a+b
2 and resolution will be lost. How small
is too small will depend on both A and ω.
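As a rough numerical illustration of Equation (14.20) (a sketch only, with a made-up half-aperture A and made-up separations), we can test whether the reconstructed image of two point sources shows a dip between the peaks:

```python
import numpy as np

A = 10.0                                   # half-aperture; illustrative value only

def sinc(x):
    # unnormalized sinc, sin(x)/x, matching the text's convention
    return np.sinc(x / np.pi)

def has_two_peaks(a, b):
    """Crude test of Eq. (14.20): is there a dip midway between the two sources?"""
    R = lambda k: 2 * A * (sinc(A * (k - a)) + sinc(A * (k - b)))
    return R(0.5 * (a + b)) < min(R(a), R(b))

for sep in (2 * np.pi / A, 0.3 * np.pi / A):   # separations above and below pi/A
    print(f"|b - a| = {sep:.3f}:",
          "resolved" if has_two_peaks(-sep / 2, sep / 2) else "merged")
```

For the separation larger than π/A the midpoint value drops below the two peak values, while for the smaller separation the two maxima have merged into one.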
Suppose now that F (k) = δ(k − a), but we do not know a priori that
the object is a single point source. We calculate
R(k) = H(k − a) = 2A sinc(A(k − a))   (14.21)
and use this function as our reconstructed image of the object, for all k.
What we see when we look at R(k) for some k = b ≠ a is R(b), which is
the same thing we see when the point source is at k = b and we look at
k = a. Point-spreading is, therefore, more than a cosmetic problem. When
the object is a point source at k = a, but we do not know a priori that it
is a point source, the spreading of the point causes us to believe that the
object function F (k) is nonzero at values of k other than k = a. When we
look at, say, k = b, we see a nonzero value that is caused by the presence
of the point source at k = a.
Suppose now that the object function F (k) contains no point sources,
but is simply an ordinary function of k. If the aperture A is very small, then
the function H(k) is nearly constant over the entire extent of the object.
The convolution of F (k) and H(k) is essentially the integral of F (k), so the reconstructed object is R(k) = ∫ F (k)dk, for all k.
Let’s see what this means for the solar-emission problem discussed earlier.
14.14.1 The Solar-Emission Problem Revisited
The wavelength of the radiation is λ = 1 meter. Therefore, ω/c = 2π, and k in the interval [−2π, 2π] corresponds to the angle θ in [0, π]. The sun has an angular diameter of 30 minutes of arc, which is about 10⁻² radians. Therefore, the sun subtends the angles θ in [π/2 − (0.5)·10⁻², π/2 + (0.5)·10⁻²], which corresponds roughly to the variable k in the interval [−3·10⁻², 3·10⁻²]. Resolution of 3 minutes of arc means resolution in the variable k of 3·10⁻³. If the aperture is 2A, then to achieve this resolution, we need
π/A ≤ 3·10⁻³,   (14.22)
or
A ≥ (π/3)·10³   (14.23)
meters, or A not less than about 1000 meters.
The radio-wave signals emitted by the sun are focused, using a parabolic
radio-telescope. The telescope is pointed at the center of the sun. Because
the sun is a great distance from the earth and the subtended arc is small
(30 min.), the signals from each point on the sun’s surface arrive at the
parabola nearly head-on, that is, parallel to the line from the vertex to the
focal point, and are reflected to the receiver located at the focal point of
the parabola. The effect of the parabolic antenna is not to discriminate
against signals coming from other directions, since there are none, but to
effect a summation of the signals received at points (x, 0, 0), for |x| ≤ A,
where 2A is the diameter of the parabola. When the aperture is large, the
function h(x) is nearly one for all x and the signal received at the focal
point is essentially
∫ f (x)dx = F (0);   (14.24)
we are now able to distinguish between F (0) and other values F (k). When the aperture is small, h(x) is essentially δ(x) and the signal received at the focal point is essentially
∫ f (x)δ(x)dx = f (0) = ∫ F (k)dk;   (14.25)
now all we get is the contribution from all the k, superimposed, and all
resolution is lost.
Since the solar emission problem is clearly two-dimensional, and we need
3 min. resolution in both dimensions, it would seem that we would need a
circular antenna with a diameter of about one kilometer, or a rectangular
antenna roughly one kilometer on a side. Eventually, this problem was
solved by converting it into essentially a tomography problem and applying
the same techniques that are today used in CAT scan imaging.
14.15 Discrete Data
A familiar topic in signal processing is the passage from functions of continuous variables to discrete sequences. This transition is achieved by sampling, that is, extracting values of the continuous-variable function at discrete points in its domain. Our example of farfield propagation can be used
to explore some of the issues involved in sampling.
Imagine an infinite uniform line array of sensors formed by placing
receivers at the points (n∆, 0, 0), for some ∆ > 0 and all integers n. Then
our data are the values f (n∆). Because we defined k = (ω/c) cos θ, it is clear that the function F (k) is zero for k outside the interval [−ω/c, ω/c].
Our discrete array of sensors cannot distinguish between the signal arriving from θ and a signal with the same amplitude, coming from an angle α with
(ω/c) cos α = (ω/c) cos θ + (2π/∆)m,   (14.26)
where m is an integer. To resolve this ambiguity, we select ∆ > 0 so that
−ω/c + 2π/∆ ≥ ω/c,   (14.27)
or
∆ ≤ πc/ω = λ/2.   (14.28)
The sensor spacing ∆s = λ/2 is the Nyquist spacing.
In the sunspot example, the object function F (k) is zero for k outside of an interval much smaller than [−ω/c, ω/c]. Knowing that F (k) = 0 for |k| > K, for some 0 < K < ω/c, we can accept ambiguities that confuse θ with another angle that lies outside the angular diameter of the object. Consequently, we can redefine the Nyquist spacing to be
∆s = π/K.   (14.29)
This tells us that when we are imaging a distant object with a small angular diameter, the Nyquist spacing is greater than λ/2. If our sensor spacing has been chosen to be λ/2, then we have oversampled. In the oversampled case, band-limited extrapolation methods can be used to improve resolution.
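A small numerical sketch of Equations (14.28) and (14.29); the wavelength matches the solar example, while the band limit K and the propagation speed are assumed values:

```python
import numpy as np

c = 3.0e8                              # propagation speed, m/s (assumed)
wavelength = 1.0                       # meters, as in the solar example
omega = 2 * np.pi * c / wavelength     # angular frequency
K = 0.03 * (omega / c)                 # assumed band limit, 0 < K < omega/c

delta_full = np.pi * c / omega         # Eq. (14.28): lambda/2, no prior knowledge
delta_narrow = np.pi / K               # Eq. (14.29): allowed spacing for a narrow object

print(f"lambda/2 spacing      : {delta_full:.2f} m")
print(f"narrow-object spacing : {delta_narrow:.2f} m")   # larger, so lambda/2 oversamples
```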
14.15.1 Reconstruction from Samples
From the data gathered at our infinite array we have extracted the Fourier
transform values f (n∆), for all integers n. The obvious question is whether
or not the data is sufficient to reconstruct F (k). We know that, to avoid
ambiguity, we must have ∆ ≤ πc/ω. The good news is that, provided this
condition holds, F (k) is uniquely determined by this data and formulas
exist for reconstructing F (k) from the data; this is the content of Shannon's Sampling Theorem. Of course, this is only of theoretical interest,
since we never have infinite data. Nevertheless, a considerable amount of
traditional signal-processing exposition makes use of this infinite-sequence
model. The real problem, of course, is that our data is always finite.
14.16 The Finite-Data Problem
Suppose that we build a uniform line array of sensors by placing receivers
at the points (n∆, 0, 0), for some ∆ > 0 and n = −N, ..., N . Then our data
are the values f (n∆), for n = −N, ..., N . Suppose, as previously, that the
object of interest, the function F (k), is nonzero only for values of k in the
interval [−K, K], for some 0 < K < ω/c. Once again, we must have ∆ ≤ πc/ω to avoid ambiguity; but this is no longer enough. The finite Fourier data
is no longer sufficient to determine a unique F (k). The best we can hope
to do is to estimate the true F (k), using both our measured Fourier data
and whatever prior knowledge we may have about the function F (k), such as where it is nonzero, whether it consists of Dirac delta point sources, or whether it is
nonnegative. The data is also noisy, and that must be accounted for in the
reconstruction process.
In certain applications, such as sonar array processing, the sensors are
not necessarily arrayed at equal intervals along a line, or even at the grid
points of a rectangle, but in an essentially arbitrary pattern in two, or even
three, dimensions. In such cases, we have values of the Fourier transform
of the object function, but at essentially arbitrary values of the variable.
How best to reconstruct the object function in such cases is not obvious.
14.17 Functions of Several Variables
Fourier transformation applies, as well, to functions of several variables. As
in the one-dimensional case, we can motivate the multi-dimensional Fourier
transform using the farfield propagation model. As we noted earlier, the
solar emission problem is inherently a two-dimensional problem.
14.17.1 Two-Dimensional Farfield Object
Assume that our sensors are located at points s = (x, y, 0) in the x,y-plane.
As discussed previously, we assume that the function F (k) can be viewed
as a function F (k1 , k2 ). Since, in most applications, the distant object has
a small angular diameter when viewed from a great distance - the sun’s is
only 30 minutes of arc - the function F (k1 , k2 ) will be supported on a small
subset of vectors (k1 , k2 ).
14.17.2 Limited Apertures in Two Dimensions
Suppose we have the values of the Fourier transform, f (x, y), for |x| ≤ A and |y| ≤ B. We describe this limited-data problem using the function h(x, y) that is one for |x| ≤ A and |y| ≤ B, and zero, otherwise. Then the point-spread function is the Fourier transform of this h(x, y), given by
H(α, β) = 4AB sinc(Aα) sinc(Bβ).   (14.30)
The resolution in the horizontal (x) direction is on the order of 1/A, and 1/B in the vertical, where, as in the one-dimensional case, aperture is best measured in units of wavelength.
Suppose our aperture is circular, with radius A. Then we have Fourier transform values f (x, y) for √(x² + y²) ≤ A. Let h(x, y) equal one, for √(x² + y²) ≤ A, and zero, otherwise. Then the point-spread function of this limited-aperture system is the Fourier transform of h(x, y), given by H(α, β) = (2πA/r) J₁(rA), with r = √(α² + β²). The resolution of this system is roughly the distance from the origin to the first null of the function J₁(rA), which means that rA = 4, roughly.
For the solar emission problem, this says that we would need a circular
aperture with radius approximately one kilometer to achieve 3 minutes of
arc resolution. But this holds only if the antenna is stationary; a moving
antenna is different! The solar emission problem was solved by using a
rectangular antenna with a large A, but a small B, and exploiting the
rotation of the earth. The resolution is then good in the horizontal, but bad
in the vertical, so that the imaging system discriminates well between two
distinct vertical lines, but cannot resolve sources within the same vertical
line. Because B is small, what we end up with is essentially the integral
of the function f (x, z) along each vertical line. By tilting the antenna, and
waiting for the earth to rotate enough, we can get these integrals along
any set of parallel lines. The problem then is to reconstruct F (k1 , k2 ) from
such line integrals. This is also the main problem in tomography.
14.18 Broadband Signals
We have spent considerable time discussing the case of a distant point
source or an extended object transmitting or reflecting a single-frequency
signal. If the signal consists of many frequencies, the so-called broadband
case, we can still analyze the received signals at the sensors in terms of
time delays, but we cannot easily convert the delays to phase differences,
and thereby make good use of the Fourier transform. One approach is
to filter each received signal, to remove components at all but a single
frequency, and then to proceed as previously discussed. In this way we can
process one frequency at a time. The object now is described in terms of a
function of both k and ω, with F (k, ω) the complex amplitude associated
with the wave vector k and the frequency ω. In the case of radar, the
function F (k, ω) tells us how the material at P reflects the radio waves at
the various frequencies ω, and thereby gives information about the nature
of the material making up the object near the point P .
There are times, of course, when we do not want to decompose a broadband signal into single-frequency components. A satellite reflecting a TV
signal is a broadband point source. All we are interested in is receiving the
broadband signal clearly, free of any other interfering sources. The direction of the satellite is known and the antenna is turned to face the satellite.
Each location on the parabolic dish reflects the same signal. Because of its
parabolic shape, the signals reflected off the dish and picked up at the focal
point have exactly the same travel time from the satellite, so they combine
coherently, to give us the desired TV signal.
Figure 14.1: A uniform line array sensing a planewave field.
Part V
Nonlinear Models
Chapter 15
Random Sequences
15.1 Chapter Summary
When we sample a function f (x) we usually make some error, and the
data we get is not precisely f (n∆), but contains additive noise, that is, our
data value is really f (n∆) + noise. Noise is best viewed as random, so it
becomes necessary to treat random sequences f = {fn } in which each fn
is a random variable. The random variables fn and fm may or may not be
statistically independent.
15.2 What is a Random Variable?
The simplest answer to the question What is a random variable? is A
random variable is a mathematical model. Imagine that we repeatedly
drop a baseball from eye-level to the floor. Each time, the baseball behaves
the same. If we were asked to describe this behavior with a mathematical
model, we probably would choose to use a differential equation as our
model. Ignoring everything except the force of gravity, we would write
h″(t) = −32
as the equation describing the downward acceleration due to gravity. Integrating, we have
h′(t) = −32t + h′(0)
as the velocity of the baseball at time t ≥ 0, and integrating once more,
h(t) = −16t² + h′(0)t + h(0)
as the equation of position of the baseball at time t ≥ 0, up to the moment when it hits the floor. Knowing h(0), the distance from eye-level to the floor, and knowing that, since we dropped the ball, h′(0) = 0, we can
determine how long it will take the baseball to hit the floor, and the speed
with which it will hit. This analysis will apply every time we drop the
baseball. There will, of course, be slight differences from one drop to the
next, depending, perhaps, on how the ball was held, but these will be so
small as to be insignificant.
Now imagine that, instead of a baseball, we drop a feather. A few
repetitions are all that is necessary to convince us that the model used
for the baseball no longer suffices. The factors such as air resistance, air
currents and how the object was held that we safely ignored with regard
to the baseball, now become important. The feather does not always land
in the same place, it doesn’t always take the same amount of time to reach
the floor, and doesn’t always land with the same velocity. It doesn’t even
fall in straight vertical line. How can we possibly model such behavior?
Must we try to describe accurately the air resistance encountered by the
feather? The answer is that we use random variables as our model.
While we cannot say precisely where the feather will land, and, of
course, we must be careful to specify how we are to determine “the place” ,
we can learn, from a number of trials, where it tends to land, and we can
postulate the probability that it will land within any given region of the
floor. In this way, the place where the feather will land becomes a random
variable with associated probability density function. Similarly, we can
postulate the probability that the time for the fall will lie within any interval of elapsed time, making the elapsed time a random variable. Finally,
we can postulate the probability that its velocity vector upon hitting the
ground will lie within any given set of three-dimensional vectors, making
the velocity a random vector. On the basis of these probabilistic models
we can proceed to predict the outcome of the next drop.
It is important to remember that the random variable is the model that
we set up prior to the dropping of the feather, not the outcome of any
particular drop.
15.3 The Coin-Flip Random Sequence
The simplest example of a random sequence is the coin-flip sequence, which
we denote by c = {cn }, −∞ < n < ∞. We imagine that, at each “time” n, a coin is
flipped, and cn = 1 if the coin shows heads, and cn = −1 if the coin shows
tails. When we speak of this coin-flip sequence, we refer to this random
model, not to any specific sequence of ones and minus ones; the random
coin-flip sequence is not, therefore, a particular sequence, just as a random
variable is not actually a specific number. Any particular sequence of ones
and minus ones can be thought of as having resulted from such an infinite
number of flips of the coin, and is called a realization of the random coin-flip
sequence.
It will be convenient to allow for the coin to be biased, that is, for
the probabilities of heads and tails to be unequal. We denote by p the
probability that heads occurs and 1 − p the probability of tails; the coin is
called unbiased or fair if p = 1/2. To find the expected value of cn , written
E(cn ), we multiply each possible value of cn by its probability and sum;
that is,
E(cn ) = (+1)p + (−1)(1 − p) = 2p − 1.
If the coin is fair then E(cn ) = 0. The variance of the random variable
cn , measuring its tendency to deviate from its expected value, is var(cn ) =
E([cn − E(cn )]2 ). We have
var(cn ) = [+1 − (2p − 1)]2 p + [−1 − (2p − 1)]2 (1 − p) = 4p − 4p2 .
If the coin is fair then var(cn ) = 1. It is important to note that we do not
change the coin at any time during the generation of a realization of the
random sequence c; in particular, the p does not depend on n. Also, we
assume that the random variables cn are statistically independent.
15.4 Correlation
Let u and v be (possibly complex-valued) random variables with expected
values E(u) and E(v), respectively. The covariance between u and v is
defined to be
cov(u, v) = E((u − E(u))(v − E(v))),
and the cross-correlation between u and v is
corr(u, v) = E(uv).
It is easily shown that cov(u, v) = corr(u, v) − E(u)E(v). When u = v
we get cov(u, u) = var(u) and corr(u, u) = E(|u|2 ). If E(u) = E(v) = 0
then cov(u, v) = corr(u, v). In statistics the “correlation coefficient” is the
quantity cov(u, v) divided by the standard deviations of u and v.
When u and v are independent, we have
E(uv) = E(u)E(v),
and
E((u − E(u))(v − E(v))) = E(u − E(u))E(v − E(v)) = 0.
To illustrate, let u = cn and v = cn−m . Then, if the coin is fair,
E(cn ) = E(cn−m ) = 0 and
cov(cn , cn−m ) = corr(cn , cn−m ) = E(cn cn−m ).
Because the cn are independent, E(cn cn−m ) = 0 for m not equal to 0, and
E(|cn |2 ) = var(cn ) = 1. Therefore
cov(cn , cn−m ) = corr(cn , cn−m ) = 0, for m ≠ 0,
and
cov(cn , cn ) = corr(cn , cn ) = 1.
In the next section we shall use the random coin-flip sequence to generate a wide class of random sequences, obtained by viewing c = {cn } as
the input into a shift-invariant discrete linear filter.
15.5 Filtering Random Sequences
Suppose, once again, that T is a shift-invariant discrete linear filter with
impulse-response sequence g. Now let us take as input, not a particular sequence, but the random coin-flip sequence c, with p = 0.5. The output will
therefore not be a particular sequence either, but will be another random
sequence, say d. Then, for each n the random variable dn is
dn = Σ_{m=−∞}^{∞} cm gn−m = Σ_{m=−∞}^{∞} gm cn−m .   (15.1)
We compute the correlation corr(dn , dn−m ) = E(dn dn−m ). Using the convolution formula Equation (15.1), we find that
corr(dn , dn−m ) = Σ_{k=−∞}^{∞} Σ_{j=−∞}^{∞} gk gj corr(cn−k , cn−m−j ).
Since
corr(cn−k , cn−m−j ) = 0, for k ≠ m + j,
we have
corr(dn , dn−m ) = Σ_{k=−∞}^{∞} gk gk−m .   (15.2)
The expression on the right side of Equation (15.2) is the definition of the autocorrelation of the non-random sequence g, denoted ρg = {ρg (m)}; that is,
ρg (m) = Σ_{k=−∞}^{∞} gk gk−m .   (15.3)
It is important to note that the expected value of dn is
E(dn ) = Σ_{k=−∞}^{∞} gk E(cn−k ) = 0
and the correlation corr(dn , dn−m ) depends only on m; neither quantity
depends on n and the sequence d is therefore called weak-sense stationary.
Let’s consider an example.
15.6 An Example
Take g0 = g1 = 0.5 and gk = 0 otherwise. Then the system is the two-point
moving-average, with
dn = 0.5cn + 0.5cn−1 .
In the case of the random-coin-flip sequence c each cn is unrelated to all
other cm ; the coin flips are independent. This is no longer the case for the
dn ; one effect of the filter g is to introduce correlation into the output. To
illustrate, since d0 and d1 both depend, to some degree, on the value c0 ,
they are related. Using Equation (15.3) we have
corr(dn , dn ) = ρg (0) = g0 g0 + g1 g1 = 0.25 + 0.25 = 0.5,
corr(dn , dn+1 ) = ρg (−1) = g0 g1 = 0.25,
corr(dn , dn−1 ) = ρg (+1) = g1 g0 = 0.25,
and
corr(dn , dn−m ) = ρg (m) = 0, otherwise.
So we see that dn and dn−m are related, for m = −1, 0, +1, but not otherwise.
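As a quick empirical check of these values (a sketch only; the length of the simulated realization and the seed are arbitrary choices), one can filter a long realization of the fair coin-flip sequence and form sample correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
c = rng.choice([-1.0, 1.0], size=N)        # realization of the fair coin-flip sequence

d = 0.5 * c[1:] + 0.5 * c[:-1]             # two-point moving average d_n = 0.5 c_n + 0.5 c_{n-1}

def corr(x, m):
    # sample estimate of corr(d_n, d_{n-m}) = E(d_n d_{n-m})
    return np.mean(x[m:] * x[:len(x) - m])

for m in range(4):
    print(f"corr(d_n, d_(n-{m})) ~ {corr(d, m):+.3f}")
# expected values from Eq. (15.3): 0.5, 0.25, 0.0, 0.0
```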
15.7 Correlation Functions and Power Spectra
As we have seen, any non-random sequence g = {gn } has its autocorrelation
function defined, for each integer m, by
ρg (m) = Σ_{k=−∞}^{∞} gk gk−m .
For a random sequence dn that is wide-sense stationary, its correlation
function is defined to be
ρd (m) = E(dn dn−m ).
The power spectrum of g is defined for ω in [−π, π] by
Rg (ω) = Σ_{m=−∞}^{∞} ρg (m)e^{imω} .
It is easy to see that
Rg (ω) = |G(ω)|² ,
where
G(ω) = Σ_{n=−∞}^{∞} gn e^{inω} ,
so that Rg (ω) ≥ 0. The power spectrum of the random sequence d = {dn }
is defined as
Rd (ω) = Σ_{m=−∞}^{∞} ρd (m)e^{imω} .
Although it is not immediately obvious, we also have Rd (ω) ≥ 0. One way
to see this is to consider
D(ω) = Σ_{n=−∞}^{∞} dn e^{inω}
and to calculate
E(|D(ω)|²) = Σ_{m=−∞}^{∞} E(dn dn−m )e^{imω} = Rd (ω).
Given any power spectrum Rd (ω) ≥ 0 we can construct G(ω) by selecting
an arbitrary phase angle θ and letting
G(ω) = √(Rd (ω)) e^{iθ} .
We then obtain the non-random sequence g associated with G(ω) using
gn = (1/2π) ∫_{−π}^{π} G(ω)e^{−inω} dω.
It follows that ρg (m) = ρd (m) for each m and Rg (ω) = Rd (ω) for each ω.
What we have discovered is that, when the input to the system is the
random-coin-flip sequence c, the output sequence d has a correlation function ρd (m) that is equal to the autocorrelation of the sequence g. As we just
saw, for any weak-sense stationary random sequence d with expected value
E(dn ) constant and correlation function corr(dn , dn−m ) independent of n,
there is a shift-invariant discrete linear system T with impulse-response
sequence g, such that ρg (m) = ρd (m) for each m. Therefore, any weak-sense stationary random sequence d can be viewed as the output of a shift-invariant discrete linear system, when the input is the random-coin-flip
sequence c = {cn }.
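The relation Rg (ω) = |G(ω)|² can be verified numerically for the two-point moving average of the previous section; this is a small sketch, with an arbitrary frequency grid:

```python
import numpy as np

g = {0: 0.5, 1: 0.5}                       # impulse response g_0 = g_1 = 0.5
rho = {-1: 0.25, 0: 0.5, 1: 0.25}          # its autocorrelation, from Eq. (15.3)

omega = np.linspace(-np.pi, np.pi, 9)

# power spectrum from the autocorrelation: R_g(w) = sum_m rho_g(m) e^{i m w}
R_from_rho = sum(r * np.exp(1j * m * omega) for m, r in rho.items())

# the same thing as |G(w)|^2 with G(w) = sum_n g_n e^{i n w}
G = sum(gn * np.exp(1j * n * omega) for n, gn in g.items())
R_from_G = np.abs(G) ** 2

print(np.allclose(R_from_rho.real, R_from_G))   # True; imaginary parts are ~0
```

Both computations give 0.5 + 0.5 cos ω, which is nonnegative, as it must be.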
15.8 The Dirac Delta in Frequency Space
Consider the “function” defined by the infinite sum
δ(ω) = (1/2π) Σ_{n=−∞}^{∞} e^{inω} = (1/2π) Σ_{n=−∞}^{∞} e^{−inω} .   (15.4)
This is a Fourier series in which all the Fourier coefficients are one. The
series doesn’t converge in the usual sense, but still has some uses. In
particular, look what happens when we take
F (ω) = Σ_{n=−∞}^{∞} f (n)e^{−inω} ,
for −π ≤ ω ≤ π, and calculate
∫_{−π}^{π} F (ω)δ(ω)dω = (1/2π) Σ_{n=−∞}^{∞} ∫_{−π}^{π} F (ω)e^{−inω} dω.
We have
∫_{−π}^{π} F (ω)δ(ω)dω = Σ_{n=−∞}^{∞} f (n) = F (0),
where the f (n) are the Fourier coefficients of F (ω). This means that δ(ω)
has the sifting property, just like we saw with the Dirac delta δ(x); that is
why we call it δ(ω). When we shift δ(ω) to get δ(ω − α), we find that
∫_{−π}^{π} F (ω)δ(ω − α)dω = F (α).
The “function” δ(ω) is the Dirac delta for ω space.
15.9 Random Sinusoidal Sequences
Consider A = |A|eiθ , with amplitude |A| a positive-valued random variable
and phase angle θ a random variable taking values in the interval [−π, π];
then A is a complex-valued random variable. For a fixed frequency ω0 we
define a random sinusoidal sequence s = {sn } by sn = Ae−inω0 . We assume
that θ has the uniform distribution over [−π, π] so that the expected value
of sn is zero. The correlation function for s is
ρs (m) = E(sn sn−m ) = E(|A|2 )e−imω0
and the power spectrum of s is
Rs (ω) = E(|A|²) Σ_{m=−∞}^{∞} e^{−im(ω0 −ω)} ,
so that, by Equation (15.4), we have
Rs (ω) = 2πE(|A|2 )δ(ω − ω0 ).
We generalize this example to the case of multiple independent sinusoids.
Suppose that, for j = 1, ..., J, we have fixed frequencies ωj and independent complex-valued random variables Aj . We let our random sequence be
defined by
sn = Σ_{j=1}^{J} Aj e^{−inωj} .
Then the correlation function for s is
ρs (m) = Σ_{j=1}^{J} E(|Aj |²)e^{−imωj}
and the power spectrum for s is
Rs (ω) = 2π Σ_{j=1}^{J} E(|Aj |²)δ(ω − ωj ).
This is the commonly used model of independent sinusoids. The problem
of power spectrum estimation is to determine the values J, the frequencies
ωj and the variances E(|Aj |2 ) from finitely many samples from one or more
realizations of the random sequence s.
15.10 Random Noise Sequences
Let q = {qn } be an arbitrary weak-sense stationary discrete random sequence, with correlation function ρq (m) and power spectrum Rq (ω). We
say that q is white noise if ρq (m) = 0 for m not equal to zero, or, equivalently, if the power spectrum Rq (ω) is constant over the interval [−π, π].
The independent sinusoids in additive white noise model is a random sequence of the form
xn = Σ_{j=1}^{J} Aj e^{−inωj} + qn .
The signal power is defined to be ρs (0), which is the sum of the E(|Aj |2 ),
while the noise power is ρq (0). The signal-to-noise ratio (SNR) is the ratio
of signal power to noise power.
15.11 Increasing the SNR
It is often the case that the SNR is quite low and it is desirable to process
the data from x to enhance this ratio. The data we have is typically finitely
many values of one realization of x. We say we have fn for n = 1, 2, ..., N ;
we don’t say we have xn because xn is the random variable, not one value
of the random variable. One way to process the data is to estimate ρx (m)
for some small number of integers m around zero, using, for example, the
lag products estimate
ρ̂x (m) = (1/(N − m)) Σ_{n=m+1}^{N} fn fn−m ,
for m = 0, 1, ..., M < N and ρ̂x (−m) = ρ̂x (m). Because ρq (m) = 0 for m
not equal to zero, we will have ρ̂x (m) approximating ρs (m) for nonzero values of m, thereby reducing the effect of the noise. Therefore, our estimates
of ρs (m) are relatively noise-free for m ≠ 0.
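A small sketch of the lag-products estimate; the signal, noise level, and record length below are illustrative choices, and since the test data here is complex-valued, the second factor is conjugated (a common convention assumed for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
N, omega0, amp = 400, 1.3, 1.0
n = np.arange(1, N + 1)
f = amp * np.exp(-1j * omega0 * n) + rng.normal(scale=2.0, size=N)  # one noisy realization

def lag_product(f, m):
    # rho_hat_x(m) = (1/(N-m)) * sum over n of f_n * conj(f_{n-m})
    return np.mean(f[m:] * np.conj(f[:len(f) - m]))

for m in range(4):
    print(f"m={m}: |rho_hat| = {abs(lag_product(f, m)):.2f}")
# m = 0 includes the noise power (about 5 here); for m != 0 the estimates are roughly |A|^2 = 1
```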
15.12 Colored Noise
The additive noise is said to be correlated or non-white if it is not the case
that ρx (m) = 0 for all nonzero m. In this case the noise power spectrum is
not constant, and so may be concentrated in certain regions of the interval
[−π, π].
The next few sections deal with applications of random sequences.
15.13 Spread-Spectrum Communication
In this section we return to the random-coin-flip model, this time allowing
the coin to be biased, that is, p need not be 0.5. Let s = {sn } be a random
sequence, such as sn = Ae^{inω0} , with E(sn ) = µ and correlation function
ρs (m). Define a second random sequence x by
xn = sn cn .
The random sequence x is generated from the random signal s by randomly
changing its signs. We can show that
E(xn ) = µ(2p − 1)
and, for m not equal to zero,
ρx (m) = ρs (m)(2p − 1)²,
with
ρx (0) = ρs (0) + 4p(1 − p)µ².
Therefore, if p = 1 or p = 0 we get ρx (m) = ρs (m) for all m, but for
p = 0.5 we get ρx (m) = 0 for m not equal to zero. If the coin is unbiased,
then the random sign changes convert the original signal s into white noise.
Generally, we have
Rx (ω) = (2p − 1)²Rs (ω) + (1 − (2p − 1)²)(µ² + ρs (0)),
which says that the power spectrum of x is a combination of the signal
power spectrum and a white-noise power spectrum, approaching the whitenoise power spectrum as p approaches 0.5. If the original signal power
spectrum is concentrated within a small interval, then the effect of the
random sign changes is to spread that spectrum. Once we know what
the particular realization of the random sequence c is that has been used,
we can recapture the original signal from sn = xn cn . The use of such
a spread spectrum permits the sending of multiple narrow-band signals,
without confusion, as well as protecting against any narrow-band additive
interference.
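A sketch of the spreading and despreading just described; the carrier frequency, sequence length, and seed are arbitrary, and the peak-to-mean ratio of the squared FFT magnitudes is used here as a crude measure of how concentrated a spectrum is:

```python
import numpy as np

rng = np.random.default_rng(2)
N, omega0 = 4096, 0.8
n = np.arange(N)

s = np.exp(1j * omega0 * n)                  # narrow-band signal s_n = A e^{i n omega0}, A = 1
c = rng.choice([-1.0, 1.0], size=N)          # realization of the fair coin-flip sequence
x = s * c                                    # transmitted, spread signal x_n = s_n c_n

def power_spectrum(y):
    return np.abs(np.fft.fft(y)) ** 2 / len(y)

print("peak/mean, original:", power_spectrum(s).max() / power_spectrum(s).mean())
print("peak/mean, spread  :", power_spectrum(x).max() / power_spectrum(x).mean())

recovered = x * c                            # despread with the known realization of c
print("recovered equals original:", np.allclose(recovered, s))
```

The spread signal's spectrum is nearly flat, while multiplying again by the same realization of c recovers the narrow-band signal exactly.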
15.14 Stochastic Difference Equations
The ordinary first-order differential equation y′(t) + ay(t) = f (t), with initial condition y(0) = 0, has for its solution y(t) = e^{−at} ∫_0^t e^{as} f (s)ds.
One way to look at such differential equations is to consider f (t) to be
the input to a system having y(t) as its output. The system determines
which terms will occur on the left side of the differential equation. In many
applications the input f (t) is viewed as random noise and the output is then
a continuous-time random process. Here we want to consider the discrete
analog of such differential equations.
We replace the first derivative with the first difference, yn+1 −yn and we
replace the input with the random-coin-flip sequence c = {cn }, to obtain
the random difference equation
yn+1 − yn + ayn = cn .
(15.5)
With b = 1 − a and 0 < b < 1 we have
yn+1 − byn = cn .
(15.6)
The solution is y = {yn } given by
yn = b^{n−1} Σ_{k=−∞}^{n−1} b^{−k} ck .   (15.7)
Comparing this with the solution of the differential equation, we see that the term b^{n−1} plays the role of e^{−at} = (e^{−a})^t , so that b = 1 − a is substituting for e^{−a} . The infinite sum replaces the infinite integral, with b^{−k} ck replacing the integrand e^{as} f (s).
The solution sequence y given by Equation (15.7) is a weak-sense stationary random sequence and its correlation function is
ρy (m) = b^m /(1 − b²).
Since
b^{n−1} Σ_{k=−∞}^{n−1} b^{−k} = 1/(1 − b),
the random sequence (1 − b)yn = ayn is an infinite moving-average random
sequence formed from the random sequence c.
We can derive the solution in Equation (15.7) using z-transforms. We
write
Y (z) = Σ_{n=−∞}^{∞} yn z^{−n} ,
and
C(z) = Σ_{n=−∞}^{∞} cn z^{−n} .
From Equation (15.6) we have
zY (z) − bY (z) = C(z),
or
Y (z) = C(z)(z − b)−1 .
Expanding in a geometric series, we get
Y (z) = C(z)z^{−1} (1 + bz^{−1} + b²z^{−2} + ...),
from which the solution given in Equation (15.7) follows immediately.
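A sketch that simulates Equation (15.6) driven by a realization of the coin-flip sequence and compares sample correlations with ρy (m) = b^m /(1 − b²); the value of b, the length, and the seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
b, N = 0.8, 200_000
c = rng.choice([-1.0, 1.0], size=N)

# y_{n+1} = b y_n + c_n, started at zero; early samples are discarded as burn-in
y = np.zeros(N)
for n in range(N - 1):
    y[n + 1] = b * y[n] + c[n]
y = y[1000:]

for m in range(4):
    sample = np.mean(y[m:] * y[:len(y) - m])
    print(f"m={m}: sample {sample:.3f}   theory {b**m / (1 - b**2):.3f}")
```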
15.15 Random Vectors and Correlation Matrices
In estimation and detection theory, the task is to distinguish signal vectors
from noise vectors. In order to perform such a task, we need to know how
signal vectors differ from noise vectors. Most frequently, what we have is
statistical information. The signal vectors of interest, which we denote by
s = (s1 , ..., sN )T , typically exhibit some patterns of behavior among their
entries. For example, a constant signal, such as s = (1, 1, ..., 1)T , has all its
entries identical. A sinusoidal signal, such as s = (1, −1, 1, −1, ..., 1, −1)T ,
exhibits a periodicity in its entries. If the signal is a vectorization of a two-dimensional image, then the patterns will be more difficult to describe, but
will be there, nevertheless. In contrast, a typical noise vector, denoted
q = (q1 , ..., qN )T , may have entries that are statistically unrelated to each
other, as in white noise. Of course, what is signal and what is noise depends
on the context; unwanted interference in radio may be viewed as noise, even
though it may be a weather report or a song.
To deal with these notions mathematically, we adopt statistical models.
The entries of s and q are taken to be random variables, so that s and
q are random vectors. Often we assume that the mean values, E(s) and
E(q), are both equal to the zero vector. Then patterns that may exist
among the entries of these vectors are described in terms of correlations.
The noise covariance matrix, which we denote by Q, has for its entries Qmn = E((qm − E(qm ))(qn − E(qn ))), for m, n = 1, ..., N . The signal
covariance matrix is defined similarly. If E(qn ) = 0 and E(|qn |2 ) = 1
for each n, then Q is the noise correlation matrix. Such matrices Q are
Hermitian and non-negative definite, that is, x† Qx is non-negative, for
every vector x. If Q is a positive multiple of the identity matrix, then the
noise vector q is said to be a white noise random vector.
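A sketch of estimating such a covariance matrix from repeated realizations of a noise vector; the dimension, the number of realizations, and the white-noise model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 5, 10_000
q = rng.normal(size=(trials, N))            # rows are realizations of a white-noise vector

q_centered = q - q.mean(axis=0)             # subtract the estimated mean E(q)
Q = (q_centered.T @ q_centered) / trials    # Q_mn ~ E[(q_m - E q_m)(q_n - E q_n)]

print(np.round(Q, 2))                       # close to the identity matrix: white noise
print("symmetric:", np.allclose(Q, Q.T))
print("nonnegative-definite:", np.all(np.linalg.eigvalsh(Q) >= 0))
```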
Chapter 16
Classical and Modern Methods
16.1 Chapter Summary
It is common to speak of classical, as opposed to modern, signal processing
methods. In this chapter we describe briefly the distinction.
16.2 The Classical Methods
In [66] Candy locates the beginning of the classical period of spectral estimation in Schuster’s use of Fourier techniques in 1898 to analyze sun-spot
data [198]. The role of Fourier techniques grew with the discovery, by
Wiener in the USA and Khintchine in the USSR, of the relation between
the power spectrum and the autocorrelation function. Much of Wiener’s
important work on control and communication remained classified and became known only with the publication of his classic text Time Series in
1949 [225]. The book by Blackman and Tukey, Measurement of Power
Spectra [17], provides perhaps the best description of the classical methods. With the discovery of the FFT by Cooley and Tukey in 1965, all the
pieces were in place for the rapid development of this DFT-based approach
to spectral estimation.
16.3 Modern Signal Processing and Entropy
Until about the middle of the 1970s most signal processing depended almost
exclusively on the DFT, as implemented using the FFT. Algorithms such as
the Gerchberg-Papoulis bandlimited extrapolation method were performed
as iterative operations on finite vectors, using the FFT at every step. Linear
filters and related windowing methods involving the FFT were also used
to enhance the resolution of the reconstructed objects. The proper design
of these filters was an area of interest to quite a number of researchers,
John Tukey among them. Then, around the end of that decade, interest
in entropy maximization began to grow, as researchers began to wonder
if high-resolution methods developed for seismic oil exploration could be
applied successfully in other areas.
John Burg had developed his maximum entropy method (MEM) while
working in the oil industry in the 1960s. He then went to Stanford as a
mature graduate student and received his doctorate in 1975 for a thesis
based largely on his earlier work on MEM [32]. This thesis and a handful
of earlier presentations at meetings [30, 31] fueled the interest in entropy.
It was not only the effectiveness of Burg’s techniques that attracted the
attention of members of the signal-processing community. The classical
methods seemed to some to be ad hoc, and they sought a more intellectually
satisfying basis for spectral estimation. Classical methods start with the
time series data, say xn , for n = 1, ..., N . In the direct approach, slightly
simplified, the data is windowed; that is, xn is replaced with xn wn for
some choice of constants wn . Then, the vDFT is computed, using the
FFT, and the squared magnitudes of the entries of the vDFT provide the
desired estimate of the power spectrum. In the more indirect approach,
autocorrelation values rx (m) are first estimated, for m = 0, 1, ..., M , where
M is some fraction of the data length N . Then, these estimates of rx (m)
are windowed and the vDFT calculated, again using the FFT.
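A sketch of the direct classical estimate just described: window the data, compute the vDFT with the FFT, and take squared magnitudes. The test signal, the choice of a Hanning window, and the lengths are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 256
n = np.arange(N)
w1, w2 = 2 * np.pi * 40 / N, 2 * np.pi * 50 / N       # two on-grid test frequencies
x = (np.exp(1j * w1 * n) + np.exp(1j * w2 * n)
     + 0.5 * (rng.normal(size=N) + 1j * rng.normal(size=N)))

win = np.hanning(N)                   # one possible choice of window w_n
X = np.fft.fft(x * win)               # vDFT of the windowed data, via the FFT
P = np.abs(X) ** 2                    # direct classical estimate of the power spectrum

omega = 2 * np.pi * np.arange(N) / N
top_bins = np.argsort(P)[-2:]
print("strongest bins:", np.sort(omega[top_bins]), " true:", (w1, w2))
```

A different window changes the estimate, which is exactly the objection discussed in the next paragraph.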
What some people objected to was the use of these windows. After
all, the measured data was xn , not xn wn , so why corrupt the data at the
first step? The classical methods produced answers that depended to some
extent on which window function one used; there had to be a better way.
Entropy maximization was the answer to their prayers.
In 1981 the first of several international workshops on entropy maximization was held at the University of Wyoming, bringing together most of
the people working in this area. The books [205] and [206] contain the
papers presented at those workshops. As one can see from reading those
papers, the general theme is that a new day has dawned.
16.4 Related Methods
It was soon recognized that maximum entropy methods were closely related
to model-based techniques that had been part of statistical time series
for decades. This realization led to a broader use of autoregressive (AR)
and autoregressive moving average (ARMA) models for spectral estimation
[189], as well as of eigenvector methods, such as Pisarenko’s method [186].
What Candy describes as the modern approach to spectral estimation is
one based on explicit parametric models, in contrast to the classical nonparametric approach. The book edited by Don Childers [76] is a collection
of journal articles that captures the state-of-the-art at the end of the 1970s.
In a sense the transition from the classical ways to the modern methods
solved little; the choice of models is as ad hoc as the choice of windows was
before. On the other hand, we do have a wider collection of techniques
from which to choose and we can examine these techniques to see when
they perform well and when they do not. We do not expect one approach
to work in all cases. High-speed computation permits the use of more
complicated parametric models tailored to the physics of a given situation.
Our estimates will, eventually, be used for some purpose. In medical
imaging a doctor is going to make a diagnosis based in part on what the
image reveals. How good the image needs to be depends on the purpose
for which it is made. Judging the quality of a reconstructed image based
on somewhat subjective criteria, such as how useful it is to a doctor, is a
problem that is not yet solved. Human-observer studies are one way to
obtain this nonmathematical evaluation of reconstruction and estimation
methods. The next step beyond that is to develop computer software that
judges the images or spectra as a human would.
Chapter 17
Entropy Maximization
17.1 Chapter Summary
The problem of estimating the nonnegative function R(ω), for |ω| ≤ π,
from the finitely many Fourier-transform values
r(n) = ∫_{−π}^{π} R(ω) exp(−inω)dω/2π, n = −N, ..., N
is an under-determined problem, meaning that the data alone is insufficient
to determine a unique answer. In such situations we must select one solution out of the infinitely many that are mathematically possible. The
obvious questions we need to answer are: What criteria do we use in this
selection? How do we find algorithms that meet our chosen criteria? In
this chapter we look at some of the answers people have offered and at one
particular algorithm, Burg’s maximum entropy method (MEM) [30, 31].
17.2 Estimating Non-Negative Functions
The values r(n) are autocorrelation function values associated with a random process having R(ω) for its power spectrum. In many applications,
such as seismic remote sensing, these autocorrelation values are estimates
obtained from relatively few samples of the underlying random process, so
that N is not large. The DFT estimate,
RDFT (ω) = Σ_{n=−N}^{N} r(n) exp(inω),
is real-valued and consistent with the data, but is not necessarily nonnegative. For small values of N , the DFT may not be sufficiently resolving
to be useful. This suggests that one criterion we can use to perform our
selection process is to require that the method provide better resolution
than the DFT for relatively small values of N , when reconstructing power
spectra that consist mainly of delta functions.
17.3 Philosophical Issues
Generally speaking, we would expect to do a better job of estimating a
function from data pertaining to that function if we also possess additional
prior information about the function to be estimated and are able to employ estimation techniques that make use of that additional information.
There is the danger, however, that we may end up with an answer that
is influenced more by our prior guesses than by the actual measured data.
Striking a balance between including prior knowledge and letting the data
speak for itself is a noble goal; how to achieve that is the question. At this
stage, we begin to suspect that the problem is as much philosophical as it
is mathematical.
We are essentially looking for principles of induction that enable us to
extrapolate from what we have measured to what we have not. Unwilling to
turn the problem over entirely to the philosophers, a number of mathematicians and physicists have sought mathematical solutions to this inference
problem, framed in terms of what the most likely answer is, or which answer
involves the smallest amount of additional prior information [90]. This is
not, of course, a new issue; it has been argued for centuries with regard to
the use of what we now call Bayesian statistics; objective Bayesians allow
the use of prior information, but only if it is the right prior information.
The interested reader should consult the books [205] and [206], containing papers by Ed Jaynes, Roy Frieden, and others originally presented at
workshops on this topic held in the early 1980s.
The maximum entropy method is a general approach to such problems
that includes Burg’s algorithm as a particular case. It is argued that by
maximizing entropy we are, in some sense, being maximally noncommittal
about what we do not know and thereby introducing a minimum of prior
knowledge (some would say prior guesswork) into the solution. In the case
of Burg’s MEM, a somewhat more mathematical argument is available.
Let {xn }, −∞ < n < ∞, be a stationary random process with autocorrelation
sequence r(m) and power spectrum R(ω), |ω| ≤ π. The prediction problem
is the following: suppose we have measured the values of the process prior
to time n and we want to predict the value of the process at time n.
On average, how much error do we expect to make in predicting xn from
knowledge of the infinite past? The answer, according to Szegö’s theorem
[135], is
exp[∫_{−π}^{π} log R(ω)dω];
the integral
∫_{−π}^{π} log R(ω)dω
is the Burg entropy of the random process [189]. Processes that are very
predictable have low entropy, while those that are quite unpredictable, or,
like white noise, completely unpredictable, have high entropy; to make
entropies comparable, we assume a fixed value of r(0). Given the data
r(n), |n| ≤ N , Burg’s method selects that power spectrum consistent with
these autocorrelation values that corresponds to the most unpredictable
random process.
Other similar procedures are also based on selection through optimization. We have seen the minimum norm approach to finding a solution
to an underdetermined system of linear equations, and the minimum expected squared error approach in statistical filtering, and later we shall
see the maximum likelihood method used in detection. We must keep in
mind that, however comforting it may be to know that we are on solid
philosophical ground (if such exists) in choosing our selection criteria, if
the method does not work well, we must use something else. As we shall
see, the MEM, like every other reasonable method, works well sometimes
and not so well other times. There is certainly philosophical precedent for
considering the consequences of our choices, as Blaise Pascal’s famous wager about the existence of God nicely illustrates. As an attentive reader of
the books [205] and [206] will surely note, there is a certain theological tone
to some of the arguments offered in support of entropy maximization. One
group of authors (reference omitted) went so far as to declare that entropy
maximization was what one did if one cared what happened to one’s data.
The objective of Burg’s MEM for estimating a power spectrum is to
seek better resolution by combining nonnegativity and data-consistency in
a single closed-form estimate. The MEM is remarkable in that it is the only
closed-form (that is, noniterative) estimation method that is guaranteed
to produce an estimate that is both nonnegative and consistent with the
autocorrelation samples. Later we shall consider a more general method,
the inverse PDFT (IPDFT), that is both data-consistent and positive in
most cases.
17.4 The Autocorrelation Sequence {r(n)}
We begin our discussion with important properties of the sequence {r(n)}.
Because R(ω) ≥ 0, the values r(n) are often called autocorrelation values.
Since R(ω) ≥ 0, it follows immediately that r(0) ≥ 0. In addition, r(0) ≥ |r(n)| for all n:
|r(n)| = |∫_{−π}^{π} R(ω) exp(−inω)dω/2π| ≤ ∫_{−π}^{π} R(ω)| exp(−inω)|dω/2π = r(0).
In fact, if r(0) = |r(n)| > 0 for some n > 0, then R is a sum of at most
n + 1 delta functions with nonnegative amplitudes. To see this, suppose
that r(n) = |r(n)| exp(iθ) = r(0) exp(iθ). Then,
∫_{−π}^{π} R(ω)|1 − exp(i(θ + nω))|² dω/2π
= ∫_{−π}^{π} R(ω)(1 − exp(i(θ + nω)))(1 − exp(−i(θ + nω))) dω/2π
= ∫_{−π}^{π} R(ω)[2 − exp(i(θ + nω)) − exp(−i(θ + nω))] dω/2π
= 2r(0) − exp(iθ)r(n) − exp(−iθ)r(n) = 2r(0) − r(0) − r(0) = 0.
Therefore, R(ω) > 0 only at the values of ω where |1−exp(i(θ +nω))|2 = 0;
that is, only at ω = (2πk − θ)/n for some integer k. Since |ω| ≤ π, there
are only finitely many such k.
This result is important in any discussion of resolution limits. It is
natural to feel that if we have only the Fourier coefficients r(n) for |n| ≤ N
then we have only the low frequency information about the function R(ω).
How is it possible to achieve higher resolution? Notice, however, that
in the case just considered, the infinite sequence of Fourier coefficients is
periodic. Of course, we do not know this a priori, necessarily. The fact
that |r(N )| = r(0) does not, by itself, tell us that R(ω) consists solely of
delta functions and that the sequence of Fourier coefficients is periodic.
But, under the added assumption that R(ω) ≥ 0, it does! When we put
in this prior information about R(ω) we find that the data now tells us
more than it did before. This is a good example of the point made in the
Introduction; to get information out we need to put information in.
In discussing the Burg MEM estimate, we shall need to refer to the
concept of minimum-phase vectors. We consider that briefly now.
17.5 Minimum-Phase Vectors
We say that the finite column vector with complex entries (a0 , a1 , ..., aN )T
is a minimum-phase vector if the complex polynomial
A(z) = a0 + a1 z + ... + aN z^N
has the property that A(z) = 0 implies that |z| > 1; that is, all roots of
A(z) are outside the unit circle. Consequently, the function B(z) given by
B(z) = 1/A(z) is analytic in a disk centered at the origin and including
the unit circle. Therefore, we can write
B(z) = b0 + b1 z + b2 z² + ...,
and taking z = exp(iω), we get
B(exp(iω)) = b0 + b1 exp(iω) + b2 exp(2iω) + ... .
The point here is that B(exp(iω)) is a one-sided trigonometric series, with
only terms corresponding to exp(inω) for nonnegative n.
17.6 Burg's MEM
The approach is to estimate R(ω) by the function S(ω) > 0 that maximizes the so-called Burg entropy, ∫_{−π}^{π} log S(ω)dω, subject to the data constraints.
The Euler-Lagrange equation from the calculus of variations allows us
to conclude that S(ω) has the form
S(ω) = 1/H(ω)
for
H(ω) = Σ_{n=−N}^{N} hn e^{inω} > 0.
From the Fejér-Riesz Theorem 31.1 we know that H(ω) = |A(e^{iω})|² for
minimum phase A(z). As we now show, the coefficients an satisfy a system
of linear equations formed using the data r(n).
Given the data r(n), |n| ≤ N , we form the autocorrelation matrix R
with entries Rmn = r(m − n), for −N ≤ m, n ≤ N . Let δ be the column
vector δ = (1, 0, ..., 0)T . Let a = (a0 , a1 , ..., aN )T be the solution of the
system Ra = δ. Then, Burg's MEM estimate is the function S(ω) = RMEM (ω) given by
RMEM (ω) = a0 /|A(exp(iω))|² , |ω| ≤ π.
Once we show that a0 ≥ 0, it will be obvious that RMEM (ω) ≥ 0. We also must show that RMEM is data-consistent; that is,
r(n) = ∫_{−π}^{π} RMEM (ω) exp(−inω)dω/2π, n = −N, ..., N.
Let us write RMEM (ω) as a Fourier series; that is,
RMEM (ω) = Σ_{n=−∞}^{∞} q(n) exp(inω), |ω| ≤ π.
From the form of RMEM (ω), we have
RMEM (ω)A(exp(iω)) = a0 B(exp(iω)).   (17.1)
Suppose, as we shall see shortly, that A(z) has all its roots outside the
unit circle, so B(exp(iω)) is a one-sided trigonometric series, with only
terms corresponding to exp(inω) for nonnegative n. Then, multiplying out the left side of Equation (17.1) and equating coefficients corresponding to
n = 0, −1, −2, ..., we find that, provided q(n) = r(n), for |n| ≤ N , we must
have Ra = δ. Notice that these are precisely the same equations we solve
in calculating the coefficients of an AR process. For that reason the MEM
is sometimes called an autoregressive method for spectral estimation.
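A sketch of the estimate described above: build the matrix R from the data r(n), solve Ra = δ, and evaluate RMEM (ω) = a0 /|A(exp(iω))|². The test data, two delta functions on a flat background, is an assumed example:

```python
import numpy as np

N = 10
spikes = [(1.0, 0.95 * np.pi), (1.0, 1.05 * np.pi)]   # assumed object: two delta functions
n = np.arange(-N, N + 1)
r = 0.5 * (n == 0).astype(complex)                    # flat background contributes only to r(0)
for amp, w0 in spikes:
    r = r + amp * np.exp(-1j * w0 * n) / (2 * np.pi)  # r(n) = int R(w) e^{-inw} dw / 2pi

R = np.array([[r[(m - k) + N] for k in range(N + 1)] for m in range(N + 1)])  # R_{mk} = r(m - k)
delta = np.zeros(N + 1)
delta[0] = 1.0
a = np.linalg.solve(R, delta)                         # the system R a = delta

omega = np.linspace(-np.pi, np.pi, 1024)
A = np.polyval(a[::-1], np.exp(1j * omega))           # A(e^{iw}) = a_0 + a_1 e^{iw} + ... + a_N e^{iNw}
R_mem = a[0].real / np.abs(A) ** 2                    # Burg MEM estimate

# the spikes at 0.95 pi and 1.05 pi appear at +/- 0.95 pi on the [-pi, pi] grid
print("peaks near:", omega[np.argsort(R_mem)[-2:]])
```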
17.6.1 The Minimum-Phase Property
We now show that if Ra = δ then A(z) has all its roots outside the unit
circle. Let r exp(iθ) be a root of A(z). Then, write
A(z) = (z − r exp(iθ))C(z),
where
C(z) = c0 + c1 z + c2 z² + ... + c_{N−1} z^{N−1} .
The vector a = (a0 , a1 , ..., aN )T can be written as a = −r exp(iθ)c + d,
where c = (c0 , c1 , ..., cN −1 , 0)T and d = (0, c0 , c1 , ..., cN −1 )T . So, δ = Ra =
−r exp(iθ)Rc + Rd and
0 = d† δ = −r exp(iθ)d† Rc + d† Rd,
so that
r exp(iθ)d† Rc = d† Rd.
From the Cauchy inequality we know that
|d† Rc|² ≤ (d† Rd)(c† Rc) = (d† Rd)² ,   (17.2)
where the last equality comes from the special form of the matrix R and
the similarity between c and d.
With
D(ω) = c0 e^{iω} + c1 e^{2iω} + ... + c_{N−1} e^{iNω}
and
C(ω) = c0 + c1 e^{iω} + ... + c_{N−1} e^{i(N−1)ω} ,
we can easily show that
d† Rd = c† Rc = (1/2π) ∫_{−π}^{π} R(ω)|D(ω)|² dω
and
d† Rc = (1/2π) ∫_{−π}^{π} R(ω)D(ω)C(ω)dω.
If there is equality in the Cauchy Inequality (17.2), then r = 1 and we
would have
exp(iθ) (1/2π) ∫_{−π}^{π} R(ω)D(ω)C(ω)dω = (1/2π) ∫_{−π}^{π} R(ω)|D(ω)|² dω.
From the Cauchy Inequality for integrals, we can conclude that
exp(iθ)D(ω)C(ω) = |D(ω)|2
for all ω for which R(ω) > 0. But,
exp(iω)C(ω) = D(ω).
Therefore, we cannot have r = 1 unless R(ω) consists of a single delta
function; that is, R(ω) = δ(ω − θ). In all other cases we have
|d† Rc|2 < |r|2 |d† Rc|2 ,
from which we conclude that |r| > 1.
17.6.2
Solving Ra = δ Using Levinson’s Algorithm
Because the matrix R is Toeplitz, that is, constant on diagonals, and positive definite, there is a fast algorithm for solving Ra = δ for a. Instead of
a single R, we let RM be the matrix defined for M = 0, 1, ..., N by
r(0)
r(−1)
...
r(0)
...
 r(1)

 .
=
 .

.
r(M ) r(M − 1) ...

RM
M

r(−M )
r(−M + 1) 





r(0)
so that R = RN . We also let δ be the (M + 1)-dimensional column
vector δ M = (1, 0, ..., 0)T . We want to find the column vector aM =
M
M T
(aM
that satisfies the equation RM aM = δ M . The point
0 , a1 , ..., aM )
of Levinson’s algorithm is to calculate aM +1 quickly from aM .
For fixed M find constants α and β so that
δ^M = R_M ( α (a^{M−1}_0 , a^{M−1}_1 , ..., a^{M−1}_{M−1} , 0)^T + β (0, a^{M−1}_{M−1} , ..., a^{M−1}_1 , a^{M−1}_0 )^T )
= α (1, 0, ..., 0, γ^M )^T + β (γ^M , 0, ..., 0, 1)^T ,
where
γ^M = r(M)a^{M−1}_0 + r(M − 1)a^{M−1}_1 + ... + r(1)a^{M−1}_{M−1} .
We then have
α + βγ^M = 1, αγ^M + β = 0,
or
β = −αγ^M , α − α|γ^M|² = 1,
so
α = 1/(1 − |γ^M|²), β = −γ^M /(1 − |γ^M|²).
Therefore, the algorithm begins with M = 0, R_0 = [r(0)], a^0_0 = r(0)^{−1}. At each step calculate γ^M, solve for α and β, and form the next a^M.
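A sketch of the recursion, written for real-valued autocorrelation data so that no conjugates are needed, and checked by verifying that Ra = δ; the test data reuses the assumed two-spike example from the MEM sketch above:

```python
import numpy as np

def levinson(r):
    """Solve R a = delta, where R_{mk} = r[m - k] (real symmetric Toeplitz).

    r is the array [r(0), r(1), ..., r(N)]; this is a sketch of the recursion
    in the text, restricted to real-valued autocorrelations.
    """
    a = np.array([1.0 / r[0]])                               # a^0
    for M in range(1, len(r)):
        gamma = sum(r[M - k] * a[k] for k in range(M))       # gamma^M
        alpha = 1.0 / (1.0 - gamma ** 2)
        beta = -gamma * alpha
        a = alpha * np.append(a, 0.0) + beta * np.append(0.0, a[::-1])   # next a^M
    return a

# assumed test data: two spikes on a flat background, as in the MEM sketch
N = 10
n = np.arange(N + 1)
r = 0.5 * (n == 0) + np.cos(0.95 * np.pi * n) / np.pi

a = levinson(r)
R = np.array([[r[abs(m - k)] for k in range(N + 1)] for m in range(N + 1)])
print(np.allclose(R @ a, np.eye(N + 1)[0]))                  # True: R a = delta
```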
The MEM resolves better than the DFT when the true power spectrum
being reconstructed is a sum of delta functions plus a flat background.
When the background itself is not flat, performance of the MEM degrades
rapidly; the MEM tends to interpret any nonflat background in terms of
additional delta functions. In the next chapter we consider an extension of
the MEM, called the indirect PDFT (IPDFT), that corrects this flaw.
Why Burg’s MEM and the IPDFT are able to resolve closely spaced
sinusoidal components better than the DFT is best answered by studying
the eigenvalues and eigenvectors of the matrix R; we turn to this topic in
a later chapter.
17.7 A Sufficient Condition for Positive-definiteness
If the function
R(ω) = Σ_{n=−∞}^{∞} r(n)e^{inω}
is nonnegative on the interval [−π, π], then the matrices RM are nonnegative-definite for every M . Theorems by Herglotz and by Bochner go in the
reverse direction [4]. Katznelson [148] gives the following result.
Theorem 17.1 Let {f (n)}, −∞ < n < ∞, be a sequence of nonnegative real numbers converging to zero, with f (−n) = f (n) for each n. If, for each n > 0,
we have
(f (n − 1) − f (n)) − (f (n) − f (n + 1)) > 0,
then there is a nonnegative function R(ω) on the interval [−π, π] with
f (n) = r(n) for each n.
The following figures illustrate the behavior of the MEM. In Figures 17.1,
17.2, and 17.3, the true object has two delta functions at 0.95π and 1.05π.
The data is f (n) for |n| ≤ 10. The DFT cannot resolve the two spikes. The
SNR is high in Figure 17.1, and the MEM easily resolves them. In Figure
17.2 the SNR is much lower and MEM no longer resolves the spikes.
Exercise 17.1 In Figure 17.3 the SNR is much higher than in Figure 17.1.
Explain why the graph looks as it does.
In Figure 17.4 the true object is a box supported between 0.75π and
1.25π. Here N = 10, again. The MEM does a poor job reconstructing the
box. This weakness in MEM will become a problem in the last two figures,
in which the true object consists of the box with the two spikes added. In
Figure 17.5 we have N = 10, while, in Figure 17.6, N = 25.
Figure 17.1: The DFT and MEM, N = 10, high SNR.
Figure 17.2: The DFT and MEM, N = 10, low SNR.
Figure 17.3: The DFT and MEM, N = 10, very high SNR. What happened?
Figure 17.4: MEM and DFT for a box object; N = 10.
Figure 17.5: The DFT and MEM: two spikes on a large box; N = 10.
Figure 17.6: The DFT and MEM: two spikes on a large box; N = 25.
Chapter 18
Eigenvector Methods in Estimation
18.1 Chapter Summary
Prony’s method showed that information about the signal can sometimes
be obtained from the roots of certain polynomials formed from the data.
Eigenvector methods are similar, as we shall see.
18.2 Some Eigenvector Methods
Eigenvector methods assume the data are correlation values and involve
polynomials formed from the eigenvectors of the correlation matrix. Schmidt’s
multiple signal classification (MUSIC) algorithm is one such method [196].
A related technique used in direction-of-arrival array processing is the estimation of signal parameters by rotational invariance techniques (ESPRIT)
of Paulraj, Roy, and Kailath [183].
18.3 The Sinusoids-in-Noise Model
We suppose now that the function f (t) being measured is signal plus noise,
with the form
f (t) = Σ_{j=1}^{J} |Aj |e^{iθj} e^{−iωj t} + n(t) = s(t) + n(t),
where the phases θj are random variables, independent and uniformly distributed in the interval [0, 2π), and n(t) denotes the random complex stationary noise component. Assume that E(n(t)) = 0 for all t and that
the noise is independent of the signal components. We want to estimate
J, the number of sinusoidal components, their magnitudes |Aj | and their
frequencies ωj .
18.4 Autocorrelation
The autocorrelation function associated with s(t) is
rs (τ ) = Σ_{j=1}^{J} |Aj |² e^{−iωj τ} ,
and the signal power spectrum is the Fourier transform of rs (τ ),
Rs (ω) = Σ_{j=1}^{J} |Aj |² δ(ω − ωj ).
The noise autocorrelation is denoted rn (τ ) and the noise power spectrum
is denoted Rn (ω). For the remainder of this section we shall assume that
the noise is white noise; that is, Rn (ω) is constant and rn (τ ) = 0 for τ ≠ 0.
We collect samples of the function f (t) and use them to estimate some
of the values of rs (τ ). From these values of rs (τ ), we estimate Rs (ω),
primarily looking for the locations ωj at which there are delta functions.
We assume that the samples of f (t) have been taken over an interval
of time sufficiently long to take advantage of the independent nature of
the phase angles θj and the noise. This means that when we estimate the
rs (τ ) from products of the form f (t + τ )f (t), the cross terms between one
signal component and another, as well as between a signal component and
the noise, are nearly zero, due to destructive interference coming from the
random phases.
Suppose now that we have the values rf (m) for m = −(M −1), ..., M −1,
where M > J, rf (m) = rs (m) for m ≠ 0, and rf (0) = rs (0) + σ², for σ²
the variance (or power) of the noise. We form the M by M autocorrelation
matrix R with entries Rm,k = rf (m − k).
Exercise 18.1 Show that the matrix R has the following form:
R = Σ_{j=1}^{J} |Aj |² ej e†j + σ² I,
where ej is the column vector with entries e−iωj n , for n = 0, 1, ..., M − 1.
Let u be an eigenvector of R with ‖u‖ = 1 and associated eigenvalue λ. Then we have

\lambda = u^{\dagger} R u = \sum_{j=1}^{J} |A_j|^2 |e_j^{\dagger} u|^2 + \sigma^2 \geq \sigma^2.
Therefore, the smallest eigenvalue of R is σ².
Because M > J, there must be non-zero M -dimensional vectors v that
are orthogonal to all of the ej ; in fact, we can say that there are M − J
linearly independent such v. For each such vector v we have
Rv = \sum_{j=1}^{J} |A_j|^2 (e_j^{\dagger} v) e_j + \sigma^2 v = \sigma^2 v;

consequently, v is an eigenvector of R with associated eigenvalue σ².
Let λ1 ≥ λ2 ≥ ... ≥ λM > 0 be the eigenvalues of R and let um be a norm-one eigenvector associated with λm. It follows from the previous paragraph that λm = σ² for m = J + 1, ..., M, while λm > σ² for m = 1, ..., J. This leads to the MUSIC method for determining the ωj.
18.5 Determining the Frequencies
By calculating the eigenvalues of R and noting how many of them are
greater than the smallest one, we find J. Now we seek the ωj .
For each ω, we let eω have the entries e^{-iωn}, for n = 0, 1, ..., M − 1, and form the function

T(\omega) = \sum_{m=J+1}^{M} |e_\omega^{\dagger} u_m|^2.
This function T (ω) will have zeros at precisely the values ω = ωj , for
j = 1, ..., J. Once we have determined J and the ωj , we estimate the magnitudes |Aj | using Fourier transform estimation techniques already discussed.
This is basically Schmidt’s MUSIC method.
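To make the calculation concrete, here is a minimal numerical sketch of the MUSIC null spectrum T(ω) in Python. It assumes the M-by-M autocorrelation matrix R is already in hand; the function name music_null_spectrum and the test values (two unit-amplitude sinusoids in white noise) are illustrative only and not part of the development above.

import numpy as np

def music_null_spectrum(R, J, omegas):
    # Eigendecomposition of the Hermitian matrix R; eigh returns eigenvalues
    # in ascending order, so the first M - J eigenvectors span the
    # (approximate) noise subspace.
    M = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    noise = vecs[:, :M - J]
    T = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        e = np.exp(-1j * w * np.arange(M))          # the vector e_omega
        T[i] = np.sum(np.abs(noise.conj().T @ e) ** 2)
    return T                                        # near-zeros locate the omega_j

# Illustrative data: two unit-amplitude sinusoids plus white noise, sigma^2 = 0.1.
M, true_omegas, sigma2 = 8, [0.9 * np.pi, 1.1 * np.pi], 0.1
r = lambda m: sum(np.exp(-1j * w * m) for w in true_omegas) + (sigma2 if m == 0 else 0.0)
R = np.array([[r(m - k) for k in range(M)] for m in range(M)])
grid = np.linspace(0.0, 2.0 * np.pi, 512)
T = music_null_spectrum(R, J=2, omegas=grid)        # dips near 0.9*pi and 1.1*pi

With exact autocorrelation values, as here, the dips in T(ω) sit essentially at the true frequencies; with estimated values they are only approximate, for the reasons discussed next.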
We have made several assumptions here that may not hold in practice
and we must modify this eigenvector approach somewhat. First, the time
over which we are able to measure the function f (t) may not be long enough
to give good estimates of the rf (τ ). In that case we may work directly with
the samples of f (t). Second, the smallest eigenvalues will not be exactly
equal to σ 2 and some will be larger than others. If the ωj are not well
separated, or if some of the |Aj | are quite small, it may be hard to tell
what the value of J is. Third, we often have measurements of f (t) that
have errors other than those due to background noise; inexpensive sensors
can introduce their own random phases that can complicate the estimation
process. Finally, the noise may not be white, so that the estimated rf (τ )
will not equal rs (τ ) for τ 6= 0, as before. If we know the noise power
spectrum or have a decent idea what it is, we can perform a pre-whitening
to R, which will then return us to the case considered above, although this
can be a tricky procedure.
18.6 The Case of Non-White Noise
When the noise power spectrum has a component that is not white the
eigenvalues and eigenvectors of R behave somewhat differently from the
white-noise case. The eigenvectors tend to separate into three groups.
Those in the first group correspond to the smallest eigenvalues and are
approximately orthogonal to both the signal components and the nonwhite
noise component. Those in the second group, whose eigenvalues are somewhat larger than those in the previous group, tend to be orthogonal to the
signal components but to have a sizable projection onto the nonwhite-noise
component. Those in the third group, with the largest eigenvalues, have sizable projection onto both the signal and nonwhite noise components. Since
the DFT estimate uses R, as opposed to R−1 , the DFT spectrum is determined largely by the eigenvectors in the third group. The MEM estimator,
which uses R−1 , makes most use of the eigenvectors in the first group, but
in the formation of the denominator. In the presence of a nonwhite-noise
component, the orthogonality of those eigenvectors to both the signals and
the nonwhite noise shows up as peaks throughout the region of interest,
masking or distorting the signal peaks we wish to see.
There is a second problem exacerbated by the nonwhite component: the sensitivity of nonlinear and eigenvector methods to phase errors. We have
assumed up to now that the data we have obtained is accurate, but there
isn’t enough of it. In some cases the machinery used to obtain the measured
data may not be of the highest quality; certain applications of SONAR
make use of relatively inexpensive hydrophones that will sink into the ocean
after they have been used briefly. In such cases the complex numbers r(n)
will be distorted. Errors in the measurement of their phases are particularly
damaging. The following figures illustrate these issues.
18.7 Sensitivity
In the following figures the true power spectrum is the box and spikes
object used earlier in our discussion of the MEM and IPDFT. It consists
of two delta functions at ω = 0.95π and 1.05π, along with a box extending
from 0.75π to 1.25π. There is also a small white-noise component that is
flat across [0, 2π], contributing only to the r(0) value. The data, in the
absence of phase errors, is r(n), |n| ≤ N = 25. Three different amounts of
phase perturbation are introduced in the other cases.
Figure 18.1 shows the function T (ω) for the two eigenvectors in the
second group; here, J = 18 and M = 21. The approximate zeros at
0.95π and 1.05π are clearly seen in the error-free case and remain fairly
stable as the phase errors are introduced. Figure 18.2 uses the eigenvectors
in the first group, with J = 0 and M = 18. The approximate nulls at
0.95π and 1.05π are hard to distinguish even in the error-free case and
get progressively worse as phase errors are introduced. Stable nonlinear
methods, such as the IPDFT, rely most on the eigenvectors in the second
group.
Figure 18.1: T (ω) for J = 18, M = 21, varying degrees of phase errors.
Figure 18.2: T (ω) for J = 0, M = 18, varying degrees of phase errors.
Chapter 19

The IPDFT

19.1 Chapter Summary
Experience with Burg’s MEM shows that it is capable of resolving closely
spaced delta functions better than the DFT, provided that the background
is flat. When the background is not flat, MEM tends to interpret the nonflat background as additional delta functions to be resolved. In this chapter
we consider an extension of MEM based on the PDFT that can resolve in
the presence of non-flat background. This method is called the indirect
PDFT (IPDFT) [56].
19.2 The Need for Prior Information in Nonlinear Estimation
As we saw previously, the PDFT is a linear method for incorporating prior
knowledge into the estimation of the Fourier transform. Burg’s MEM is a
nonlinear method for estimating a non-negative Fourier transform.
The IPDFT applies to the reconstruction of one-dimensional power
spectra, but the main idea can be used to generate high-resolution methods
for multi-dimensional spectra as well. The IPDFT method is suggested by
considering the MEM equations Ra = δ as a particular case of the equations that arise in Wiener filter approximation. As in the previous chapter,
we assume that we have the autocorrelation values r(n) for |n| ≤ N , from
which we wish to estimate the power spectrum
R(\omega) = \sum_{n=-\infty}^{+\infty} r(n) e^{in\omega}, \quad |\omega| \leq \pi.
19.3 What Wiener Filtering Suggests
In the appendix on Wiener filter approximation, we show that the best
finite length filter approximation of the Wiener filter is obtained by minimizing the integral in Equation (30.4)
\int_{-\pi}^{\pi} \Big| H(\omega) - \sum_{k=-K}^{L} f_k e^{ik\omega} \Big|^2 \, (R_s(\omega) + R_u(\omega)) \, d\omega.

The optimal coefficients then must satisfy Equation (30.5):

r_s(m) = \sum_{k=-K}^{L} f_k (r_s(m-k) + r_u(m-k)),    (19.1)
for −K ≤ m ≤ L.
Consider the case in which the power spectrum we wish to estimate
consists of a signal component that is the sum of delta functions and a noise
component that is white noise. If we construct a finite-length Wiener filter
that filters out the signal component and leaves only the noise, then that
filter should be able to zero out the delta function components. By finding
the locations of those zeros, we can find the supports of the delta functions.
So the approach is to reverse the roles of signal and noise, viewing the
signal as the component called u and the noise as the component called s
in the discussion of the Wiener filter. The autocorrelation function rs (n)
corresponds to the white noise now and so rs(n) = 0 for n ≠ 0. The terms
rs (n) + ru (n) are the data values r(n), for |n| ≤ N . Taking K = 0 and
L = N in Equation (19.1), we obtain
\sum_{k=0}^{N} f_k r(m-k) = 0,

for m = 1, 2, ..., N, and

\sum_{k=0}^{N} f_k r(0-k) = r(0),

which is precisely the same system Ra = δ that occurs in MEM.
This approach reveals that the vector a = (a0, ..., aN)^T we find in MEM can be viewed as a finite-length approximation of the Wiener filter designed to remove the delta-function component and to leave the remaining flat white-noise component untouched. The polynomial

A(\omega) = \sum_{n=0}^{N} a_n e^{in\omega}
will then have zeros near the supports of the delta functions. What happens
to MEM when the background is not flat is that the filter tries to eliminate
any component that is not white noise and so places the zeros of A(ω) in
the wrong places.
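As a small illustration (the helper name mem_polynomial is ours, and the autocorrelation values would in practice come from data), the MEM computation just described is one linear solve followed by an evaluation of A(ω) on a frequency grid:

import numpy as np

def mem_polynomial(r, omegas):
    # r holds r(0), ..., r(N); build the Hermitian Toeplitz matrix with
    # entries R[m, k] = r(m - k), using r(-n) = conj(r(n)).
    r = np.asarray(r, dtype=complex)
    N = len(r) - 1
    full = np.concatenate([np.conj(r[1:])[::-1], r])     # r(-N), ..., r(N)
    R = np.array([[full[N + m - k] for k in range(N + 1)] for m in range(N + 1)])
    delta = np.zeros(N + 1); delta[0] = 1.0
    a = np.linalg.solve(R, delta)                        # the MEM vector a
    E = np.exp(1j * np.outer(omegas, np.arange(N + 1)))
    return E @ a                                         # A(omega) = sum_n a_n e^{i n omega}

When the background really is flat white noise, the near-zeros of |A(ω)| sit close to the delta-function frequencies; with a non-flat background they migrate, which is the difficulty the IPDFT addresses.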
19.4 Using a Prior Estimate
Suppose we take P (ω) ≥ 0 to be our estimate of the background component
of R(ω); that is, we believe that R(ω) equals a multiple of P (ω) plus a sum
of delta functions. We now ask for the finite length approximation of the
Wiener filter that removes the delta functions and leaves any background
component that looks like P (ω) untouched. We then take rs (n) = p(n),
where
P(\omega) = \sum_{n=-\infty}^{+\infty} p(n) e^{in\omega}, \quad |\omega| \leq \pi.
The desired filter is f = (f0 , ..., fN )T satisfying the equations
p(m) = \sum_{k=0}^{N} f_k r(m-k).    (19.2)
Once we have found f we form the polynomial
F(\omega) = \sum_{k=0}^{N} f_k e^{ik\omega}, \quad |\omega| \leq \pi.
The zeros of F (ω) should then be near the supports of the delta function components of the power spectrum R(ω), provided that our original
estimate of the background is not too inaccurate.
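The corresponding computation, again offered only as an illustrative sketch (the name ipdft_filter and the argument layout are ours), solves Rf = p and evaluates F(ω):

import numpy as np

def ipdft_filter(r, p, omegas):
    # r: data r(0), ..., r(N); p: prior coefficients p(0), ..., p(N).
    r = np.asarray(r, dtype=complex)
    N = len(r) - 1
    full = np.concatenate([np.conj(r[1:])[::-1], r])     # r(-N), ..., r(N)
    R = np.array([[full[N + m - k] for k in range(N + 1)] for m in range(N + 1)])
    f = np.linalg.solve(R, p)                            # the system R f = p
    E = np.exp(1j * np.outer(omegas, np.arange(N + 1)))
    return E @ f     # F(omega); its near-zeros mark the delta-function supports

With P(ω) constant, p is a multiple of δ and this reduces to the MEM sketch given earlier.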
In the PDFT it is important to select the prior estimate P (ω) nonzero
wherever the function being reconstructed is nonzero; for the IPDFT the
situation is different. Comparing Equation (19.2) with Equation (28.5), we
see that in the IPDFT the true R(ω) is playing the role previously given to
P (ω), while P (ω) is in the role previously played by the function we wished
to estimate, which, in the IPDFT, is R(ω). It is important, therefore, that
R(ω) not be zero where P(ω) ≠ 0; that is, we should choose P(ω) = 0
wherever R(ω) = 0. Of course, we usually do not know the support of R(ω)
a priori. The point is simply that it is better to make P (ω) = 0 than to
make it nonzero, if we have any doubt as to the value of R(ω).
19.5 Properties of the IPDFT
In our discussion of the MEM, we obtained an estimate for the function
R(ω), not simply a way of locating the delta-function components. As
we shall show, the IPDFT can also be used to estimate R(ω). Although
the resulting estimate is not guaranteed to be either nonnegative or data
consistent, it usually is both of these.
For any function G(ω) on [−π, π] with Fourier series
G(\omega) = \sum_{n=-\infty}^{\infty} g(n) e^{in\omega},

the additive causal part of the function G(ω) is

G_+(\omega) = \sum_{n=0}^{\infty} g(n) e^{in\omega}.
Any function such as G+ that has Fourier coefficients that are zero for
negative indices is called a causal function. Equation (19.2) then says
that the two causal functions P+ and (F R)+ have Fourier coefficients that
agree for m = 0, 1, ..., N .
Because F (ω) is a finite causal trigonometric polynomial, we can write
(F R)+ (ω) = R+ (ω)F (ω) + J(ω),
where
J(\omega) = \sum_{m=0}^{N-1} \Big[ \sum_{k=1}^{N-m} r(-k) f(m+k) \Big] e^{im\omega}.
Treating P+ as approximately equal to (F R)+ = R+ F + J, we obtain as
an estimate of R+ the function Q = (P+ − J)/F . In order for this estimate
of R+ to be causal, it is sufficient that the function 1/F be causal. This
means that the trigonometric polynomial F (ω) must be minimum phase;
that is, all its roots lie outside the unit circle. In the chapter on MEM, we
saw that this is always the case for MEM. It is not always the case for the
IPDFT, but it is usually the case in practice; in fact, it was difficult (but
possible) to construct a counterexample. We then construct our IPDFT
estimate of R(ω), which is
R_{IPDFT}(\omega) = 2 \, \mathrm{Re}(Q(\omega)) - r(0).
The IPDFT estimate is real-valued and, when 1/F is causal, guaranteed
to be data consistent. Although this estimate is not guaranteed to be
nonnegative, it usually is.
We showed in the chapter on entropy maximization that the vector a
that solves Ra = δ corresponds to a polynomial A(z) having all its roots on
or outside the unit circle; that is, it is minimum phase. The IPDFT involves
the solution of the system Rf = p, where p = (p(0), ..., p(N ))T is the
vector of initial Fourier coefficients of another power spectrum, P (ω) ≥ 0
on [−π, π]. When P (ω) is constant, we get p = δ. For the IPDFT to be
data-consistent, it is sufficient that the polynomial F(z) = f_0 + ... + f_N z^N be
minimum phase. Although this need not be the case, it is usually observed
in practice.
Exercise 19.1 Find conditions on the power spectra R(ω) and P (ω) that
cause F (z) to be minimum phase.
Warning: This is probably not an easy exercise.
19.6 Illustrations
The following figures illustrate the IPDFT. The prior function in each case
is the box object supported on the central fourth of the interval [0, 2π]. The
value r(0) has been increased slightly to regularize the matrix inversion.
Figure 19.1 shows the behavior of the IPDFT when the object is only the
box. Contrast this with the behavior of MEM in this case, as seen in Figure
17.4. Figures 19.2 and 19.3 show the ability of the IPDFT to resolve the two
spikes at 0.95π and 1.05π against the box background. Again, contrast this
with the MEM reconstructions in Figures 17.5 and 17.6. To show that the
IPDFT is actually indicating the presence of the spikes and not just rolling
across the top of the box, we reconstruct two unequal spikes in Figure 19.4.
Figure 19.5 shows how the IPDFT behaves when we increase the number
of data points; now, N = 25 and the SNR is very low.
Figure 19.1: The DFT and IPDFT: box only, N = 1.

Figure 19.2: The DFT and IPDFT, box and two spikes, N = 10, high SNR.

Figure 19.3: The DFT and IPDFT, box and two spikes, N = 10, moderate SNR.

Figure 19.4: The DFT and IPDFT, box and unequal spikes, N = 10, high SNR.

Figure 19.5: The DFT and IPDFT, box and unequal spikes, N = 25, very low SNR.
Part VI

Wavelets
Chapter 20

Analysis and Synthesis

20.1 Chapter Summary
Analysis and synthesis in signal processing refers to the effort to study
complicated functions in terms of simpler ones. The basic building blocks
are orthogonal bases and frames.
20.2 The Basic Idea
An important theme that runs through most of mathematics, from the
geometry of the early Greeks to modern signal processing, is analysis and
synthesis, or, less formally, breaking up and putting back together. The
Greeks estimated the area of a circle by breaking it up into sectors that
approximated triangles. The Riemann approach to integration involves
breaking up the area under a curve into pieces that approximate rectangles
or other simple shapes. Viewed differently, the Riemann approach is first
to approximate the function to be integrated by a step function and then
to integrate the step function.
Along with geometry, Euclid includes a good deal of number theory,
in which we find analysis and synthesis. His theorem that every positive
integer is divisible by a prime is analysis; division does the breaking up
and the simple pieces are the primes. The fundamental theorem of arithmetic, which asserts that every positive integer can be written in an essentially unique way as the product of powers of primes, is synthesis, with the
putting back together done by multiplication.
20.3 Polynomial Approximation
The individual power functions, x^n, are not particularly interesting by
themselves, but when finitely many of them are scaled and added to form a
polynomial, interesting functions can result, as the famous approximation
theorem of Weierstrass confirms [150]:
Theorem 20.1 If f : [a, b] → R is continuous and ε > 0 is given, we can find a polynomial P such that |f(x) − P(x)| ≤ ε for every x in [a, b].
The idea of building complicated functions from powers is carried a
step further with the use of infinite series, such as Taylor series. The sine
function, for example, can be represented for all real x by the infinite power
series
\sin x = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \frac{1}{7!} x^7 + \cdots.
The most interesting thing to note about this is that the sine function has
properties that none of the individual power functions possess; for example, it is bounded and periodic. So we see that an infinite sum of simple
functions can be qualitatively different from the components in the sum. If
we take the sum of only finitely many terms in the Taylor series for the sine
function we get a polynomial, which cannot provide a good approximation
of the sine function for all x; that is, the finite sum does not approximate
the sine function uniformly over the real line. The approximation is better
for x near zero and poorer as we move away from zero. However, for any selected x and for any ε > 0, there is a positive integer N, depending on the x and on the ε, with the sum of the first n terms of the series within ε of
sin x for n ≥ N ; that is, the series converges pointwise to sin x for each real
x. In Fourier analysis the trigonometric functions themselves are viewed
as the simple functions, and we try to build more complicated functions as
(possibly infinite) sums of trig functions. In wavelet analysis we have more
freedom to design the simple functions to fit the problem at hand.
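A quick numerical illustration of the pointwise-but-not-uniform convergence of the Taylor partial sums noted above; the helper sin_partial is ours and the sample points are arbitrary:

import numpy as np
from math import factorial

def sin_partial(x, n_terms):
    # Sum of the first n_terms terms of the Taylor series for sin x.
    return sum((-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
               for k in range(n_terms))

for x in (0.5, 3.0, 10.0):
    print(x, abs(sin_partial(x, 8) - np.sin(x)))   # the error grows as |x| grows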
20.4 Signal Analysis
When we speak of signal analysis, we often mean that we believe the signal
to be a superposition of simpler signals of a known type and we wish to
know which of these simpler signals are involved and to what extent. For
example, received sonar or radar data may be the superposition of individual components corresponding to spatially localized targets of interest. As
we shall see in our discussion of the ambiguity function and of wavelets,
we want to tailor the family of simpler signals to fit the physical problem
being considered.
Sometimes it is not the individual components that are significant by
themselves, but groupings of these components. For example, if our received signal is believed to consist of a lower frequency signal of interest
plus a noise component employing both low and high frequencies, we can remove some of the noise by performing a low-pass filtering. This amounts to
analyzing the received signal to determine what its low-pass and high-pass
components are. We formulate this operation mathematically using the
Fourier transform, which decomposes the received signal f (t) into complex
exponential function components corresponding to different frequencies.
More generally, we may analyze a signal f(t) by calculating certain inner products ⟨f, g_n⟩, n = 1, ..., N. We may wish to encode the signal using
these N numbers, or to make a decision about the signal, such as recognizing a voice. If the signal is a two-dimensional image, say a fingerprint,
we may want to construct a data-base of these N -dimensional vectors, for
identification. In such a case we are not necessarily claiming that the signal
f (t) is a superposition of the gn (t) in any sense, nor do we necessarily expect to reconstruct f (t) at some later date from the stored inner products.
For example, one might identify a piece of music using only the upward or
downward progression of the first few notes.
There are many cases, on the other hand, in which we do wish to reconstruct the signal f (t) from measurements or stored compressed versions.
In such cases we need to consider this when we design the measuring or
compression procedures. For example, we may have values of the signal or
its Fourier transform at some finite number of points and want to recapture
f (t) itself. Even in those cases mentioned previously in which reconstruction is not desired, such as the fingerprint case, we do wish to be reasonably
sure that similar vectors of inner products correspond to similar signals and
distinct vectors of inner products correspond to distinct signals, within the
obvious limitations imposed by the finiteness of the stored inner products.
The twin processes of analysis and synthesis are dealt with mathematically
using the notions of frames and bases.
20.5 Practical Considerations in Signal Analysis
Perhaps the most basic problem in signal analysis is determining which
sinusoidal components make up a given signal. Let the analog signal f (t)
be given for all real t by
f(t) = \sum_{j=1}^{J} A_j e^{i\omega_j t},    (20.1)
where the Aj are complex amplitudes and the ωj are real numbers. If we
view the variable t as time, then the ωj are frequencies. In theory, we can
determine J, the ωj , and the Aj simply by calculating the Fourier transform
F (ω) of f (t). The function F (ω) will have Dirac delta components at ω =
ωj for each j, and will be zero elsewhere. Obviously, this is not a practical
solution to the problem. The first step in developing a practical approach is
to pass from analog signals, which are functions of the continuous variable
t, to digital signals or sequences, which are functions of the integers.
In theoretical discussions of digital signal processing, analog signals
are converted to discrete signals or sequences by sampling. We begin by
choosing a positive sampling spacing ∆ > 0 and define the nth entry of the
sequence x = {x(n)} by
x(n) = f(n\Delta),    (20.2)

for all integers n. Notice that, since
e^{i\omega_j n\Delta} = e^{i(\omega_j + \frac{2\pi}{\Delta}) n\Delta}

for all n, we cannot distinguish frequency ωj from ωj + 2π/∆. We try to select ∆ small enough so that each of the ωj we seek lies in the interval (−π/∆, π/∆).
If we fail to make ∆ small enough we under-sample, with the result that
some of the ωj will be mistaken for lower frequencies; this is aliasing. Our
goal now is to process the sequence x to determine J, the ωj , and the Aj .
We do this with matched filtering.
Every linear shift-invariant system operates through convolution; associated with the system is a sequence h, such that, when x is the input
sequence, the output sequence is y, with
y(n) = \sum_{k=-\infty}^{\infty} h(k) x(n-k),    (20.3)
for each integer n. In theoretical matched filtering we design a whole family
of such systems or filters, one for each frequency ω in the interval (−π/∆, π/∆).
We then use our sequence x as input to each of these filters and use the
outputs of each to solve our signal-analysis problem.
For each ω in the interval (−π/∆, π/∆) and each positive integer K, we consider the shift-invariant linear filter with h = e_{K,ω}, where

e_{K,\omega}(k) = \frac{1}{2K+1} e^{i\omega k\Delta},    (20.4)

for |k| ≤ K and e_{K,ω}(k) = 0 otherwise. Using x as input to this system, we find that the output value y(0) is

y(0) = \sum_{j=1}^{J} A_j \Big[ \frac{1}{2K+1} \sum_{k=-K}^{K} e^{i(\omega - \omega_j)k\Delta} \Big].    (20.5)
Recall the following identity for the Dirichlet kernel:
\sum_{k=-K}^{K} e^{ik\omega} = \frac{\sin((K + \frac{1}{2})\omega)}{\sin(\omega/2)},    (20.6)

for sin(ω/2) ≠ 0. As K → +∞, the inner sum in equation (20.5) goes to zero
for every ω except ω = ωj . Therefore the limit, as K → +∞, of y(0) is
zero, if ω is not equal to any of the ωj , and equals Aj , if ω = ωj . Therefore,
in theory, at least, we can successfully decompose the digital signal into its
constituent parts and distinguish one frequency component from another,
no matter how close together the two frequencies may be.
It is important to note that, to achieve the perfect analysis described
above, we require noise-free values x(n) and we need to take K to infinity;
in practice, of course, neither of these conditions is realistic. We consider
next the practical matter of having only finitely many values of x(n); we
leave the noisy case for another chapter.
20.5.1 The Finite Data Problem
In reality we have only finitely many values of x(n), say for n = −N, ..., N .
In matched filtering we can only take K ≤ N . For the choice of K = N ,
we get
y(0) = \sum_{j=1}^{J} A_j \Big[ \frac{1}{2N+1} \sum_{k=-N}^{N} e^{i(\omega - \omega_j)k\Delta} \Big],    (20.7)

for each fixed ω different from the ωj, and y(0) = Aj for ω = ωj. We can then write

y(0) = \sum_{j=1}^{J} A_j \Big[ \frac{1}{2N+1} \frac{\sin((\omega - \omega_j)(N + \frac{1}{2})\Delta)}{\sin((\omega - \omega_j)\frac{\Delta}{2})} \Big],    (20.8)
for ω not equal to ωj . The problem we face for finite data is that the y(0)
is not necessarily zero when ω is not one of the ωj .
In our earlier discussion of signal analysis it was shown that, if we are
willing to make a simplifying assumption, we can continue as in the infinite-data case. The simplifying assumption is that the ωj we seek are J of the 2N + 1 frequencies equally spaced in the interval (−π/∆, π/∆), beginning with α_1 = −π/∆ + 2π/((2N+1)∆) and ending with α_{2N+1} = π/∆. Therefore,

\alpha_m = -\frac{\pi}{\Delta} + \frac{2\pi m}{(2N+1)\Delta},

for m = 1, ..., 2N + 1.
Having made this simplifying assumption, we then design the matched
filters corresponding to the frequencies αn , for n = 1, ..., 2N + 1. Because
\sum_{k=-N}^{N} e^{i(\alpha_m - \alpha_n)k\Delta} = \sum_{k=-N}^{N} e^{2\pi i \frac{m-n}{2N+1} k} = \frac{\sin\big(2\pi \frac{m-n}{2N+1}(N + \frac{1}{2})\big)}{\sin\big(\pi \frac{m-n}{2N+1}\big)},    (20.9)
it follows that
\sum_{k=-N}^{N} e^{i(\alpha_m - \alpha_n)k\Delta} = 0,

for m ≠ n and it is equal to 2N + 1 when m = n. We conclude that,
provided the frequencies we seek are among the αm , we can determine J
and the ωj . Once we have these pieces of information, we find the Aj
simply by solving a system of linear equations.
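A short Python sketch of this grid-based matched filtering, which is essentially a DFT; the function name grid_matched_filter and the sample values below are illustrative only:

import numpy as np

def grid_matched_filter(x, Delta):
    # x holds the samples x(-N), ..., x(N); the output entry for alpha_m is
    # the matched-filter value (1/(2N+1)) * sum_n x(n) e^{-i alpha_m n Delta}.
    N = (len(x) - 1) // 2
    n = np.arange(-N, N + 1)
    m = np.arange(1, 2 * N + 2)
    alphas = -np.pi / Delta + 2 * np.pi * m / ((2 * N + 1) * Delta)
    E = np.exp(-1j * Delta * np.outer(alphas, n))
    return alphas, (E @ x) / (2 * N + 1)

# If an omega_j lies on the grid, the corresponding output equals A_j exactly.
Delta, N = 0.5, 16
n = np.arange(-N, N + 1)
w10 = -np.pi / Delta + 2 * np.pi * 10 / ((2 * N + 1) * Delta)   # alpha_10
x = 2.0 * np.exp(1j * w10 * n * Delta)                          # one component, A = 2
alphas, y = grid_matched_filter(x, Delta)
print(np.round(y[9], 6))                                        # approximately (2+0j)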
20.6 Frames
Although in practice we deal with finitely many measurements or inner
product values, it is convenient, in theoretical discussions, to imagine that
the signal f(t) has been associated with an infinite sequence of inner products {⟨f, g_n⟩, n = 1, 2, ...}. It is also convenient to assume that ||f||² = \int_{-\infty}^{\infty} |f(t)|^2 dt < +∞; that is, we assume that f is in the Hilbert space H = L². The sequence {g_n | n = 1, 2, ...} in any Hilbert space H is called a frame for H if there are positive constants A ≤ B such that, for all f in H,

A \|f\|^2 \leq \sum_{n=1}^{\infty} |\langle f, g_n \rangle|^2 \leq B \|f\|^2.    (20.10)
The inequalities in (20.10) define the frame property. A frame is said to be
tight if A = B.
To motivate this definition, suppose that f = g − h. If g and h are
nearly equal, then f is near zero, so that ||f ||2 is near zero. Consequently,
the numbers |⟨f, g_n⟩|² are all small, meaning that ⟨g, g_n⟩ is nearly equal to ⟨h, g_n⟩ for each n. Conversely, if ⟨g, g_n⟩ is nearly equal to ⟨h, g_n⟩ for each n, then the numbers |⟨f, g_n⟩|² are all small. Therefore, ||f||² is small, from which we conclude that g is close to h. The analysis operator is the one that takes us from f to the sequence {⟨f, g_n⟩}, while the synthesis operator takes us from the sequence {⟨f, g_n⟩} to f. This discussion of frames and
related notions is based on the treatment in Christensen’s book [77].
In the case of finite dimensional space, any finite set {gn , n = 1, ..., N }
is a frame for the space H of all f that are linear combinations of the gn .
Exercise 20.1 An interesting example of a frame in H = R² is the so-called Mercedes frame: let g1 = (0, 1), g2 = (−√3/2, −1/2) and g3 = (√3/2, −1/2). Show that for this frame A = B = 3/2, so the Mercedes frame is tight.
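A quick numerical check of the tight-frame bound, offered only as an illustration and not as a substitute for the proof asked for in the exercise:

import numpy as np

# Rows are the three Mercedes-frame vectors in R^2.
g = np.array([[0.0, 1.0],
              [-np.sqrt(3) / 2, -0.5],
              [ np.sqrt(3) / 2, -0.5]])
S = g.T @ g                      # the frame operator S f = sum_n <f, g_n> g_n
print(np.linalg.eigvalsh(S))     # both eigenvalues are 1.5, so A = B = 3/2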
The frame property in (20.10) provides a necessary condition for stable
application of the decomposition and reconstruction operators. But it does
more than that; it actually provides a reconstruction algorithm. The frame
operator S is given by
Sf = \sum_{n=1}^{\infty} \langle f, g_n \rangle g_n.
The frame property implies that the frame operator is invertible. The dual
frame is the sequence {S −1 gn , n = 1, 2, ...}.
Exercise 20.2 Use the definitions of the frame operator S and the dual
frame to obtain the following reconstruction formulas:
f = \sum_{n=1}^{\infty} \langle f, g_n \rangle S^{-1} g_n;

and

f = \sum_{n=1}^{\infty} \langle f, S^{-1} g_n \rangle g_n.

If the frame is tight, then the dual frame is {(1/A) g_n, n = 1, 2, ...}; if the frame
is not tight, inversion of the frame operator is done only approximately.
20.7 Bases, Riesz Bases and Orthonormal Bases
The sequence {gn , n = 1, 2, ...} in H is a basis for H if, for every f in H,
there is a unique sequence {cn , n = 1, 2, ...} with
f = \sum_{n=1}^{\infty} c_n g_n.
A basis is called a Riesz basis if it is also a frame for H. It can be shown
that a frame is a Riesz basis if the removal of any one element causes the
loss of the frame property; since the second inequality in Inequality (20.10)
is not lost, it follows that it is the first inequality that can now be violated
for some f . A basis is an orthonormal basis for H if ||gn || = 1 for all n and
⟨g_n, g_m⟩ = 0 for distinct m and n.
We know that the complex exponentials
\{ e_n(t) = \frac{1}{\sqrt{2\pi}} e^{int}, \ -\infty < n < \infty \}

form an orthonormal basis for the Hilbert space L²(−π, π) consisting of all f supported on (−π, π) with \int_{-\pi}^{\pi} |f(t)|^2 dt < +∞. Every such f can be written as

f(t) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{+\infty} a_n e^{int},

for

a_n = \langle f, e_n \rangle = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt.

Consequently, this is true for every f in L²(−π/2, π/2), although the set of functions {gn} formed by restricting the {en} to the interval (−π/2, π/2) is no longer a basis for H = L²(−π/2, π/2). It is still a tight frame with A = 1, but is no longer normalized, since the norm of gn in L²(−π/2, π/2) is 1/√2. An orthonormal basis can be characterized as any sequence with ||gn|| = 1 for all n that is a tight frame with A = 1. The sequence {√2 g_{2k}, k = −∞, ..., ∞} is an orthonormal basis for L²(−π/2, π/2), as is the sequence {√2 g_{2k+1}, k = −∞, ..., ∞}. The sequence {⟨f, gn⟩, n = −∞, ..., ∞} is redundant; the half corresponding either to the odd n or to the even n suffices
to recover f . Because of this redundancy we can tolerate more inaccuracy
in measuring these values; indeed, this is one of the main attractions of
frames in signal processing.
Chapter 21

Ambiguity Functions

21.1 Chapter Summary
We turn now to signal-processing problems arising in radar. Not only does
radar provide an important illustration of the application of the theory
of Fourier transforms and matched filters, but it also serves to motivate
several of the mathematical concepts we shall encounter in our discussion
of wavelets. The connection between radar signal processing and wavelets
is discussed in some detail in Kaiser’s book [145].
21.2 Radar Problems
In radar a real-valued function ψ(t) representing a time-varying voltage is
converted by an antenna in transmission mode into a propagating electromagnetic wave. When this wave encounters a reflecting target an echo is
produced. The antenna, now in receiving mode, picks up the echo f (t),
which is related to the original signal by
f (t) = Aψ(t − d(t)),
where d(t) is the time required for the original signal to make the round trip
from the antenna to the target and return back at time t. The amplitude A
incorporates the reflectivity of the target as well as attenuation suffered by
the signal. As we shall see shortly, the delay d(t) depends on the distance
from the antenna to the target and, if the target is moving, on its radial
velocity. The main signal-processing problem here is to determine target
range and radial velocity from knowledge of f (t) and ψ(t).
If the target is stationary, at a distance r0 from the antenna, then
d(t) = 2r0 /c, where c is the speed of light. In this case the original signal
and the received echo are related simply by
f (t) = Aψ(t − b),
for b = 2r0 /c. When the target is moving so that its distance to the
antenna, r(t), is time-dependent, the relationship between f and ψ is more
complicated.
Exercise 21.1 Suppose the target is at a distance r0 > 0 from the antenna
at time t = 0, and has radial velocity v, with v > 0 indicating away from
the antenna. Show that the delay function d(t) is now
d(t) = 2 \, \frac{r_0 + vt}{c + v}

and f(t) is related to ψ(t) according to

f(t) = A \psi\Big(\frac{t-b}{a}\Big),    (21.1)

for

a = \frac{c+v}{c-v}

and

b = \frac{2 r_0}{c-v}.

Show also that if we select A = \big(\frac{c-v}{c+v}\big)^{1/2} then energy is preserved; that is, ||f|| = ||ψ||.
Exercise 21.2 Let Ψ(ω) be the Fourier transform of the signal ψ(t). Show
that the Fourier transform of the echo f (t) in Equation (21.1) is then
F(\omega) = A a e^{ib\omega} \Psi(a\omega).    (21.2)
The basic problem is to determine a and b, and therefore the range and
radial velocity of the target, from knowledge of f (t) and ψ(t). An obvious
approach is to use a matched filter.
21.3 The Wideband Cross-Ambiguity Function
Note that the received echo f (t) is related to the original signal by the
operations of rescaling and shifting. We therefore match the received echo
with all the shifted and rescaled versions of the original signal. For each a > 0 and real b, let

\psi_{a,b}(t) = \psi\Big(\frac{t-b}{a}\Big).

The wideband cross-ambiguity function (WCAF) is

(W_\psi f)(b, a) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t) \psi_{a,b}(t) \, dt.    (21.3)
In the ideal case the values of a and b for which the WCAF takes on its
largest absolute value should be the true values of a and b.
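A discretized sketch of this matching procedure in Python; the function wcaf and the sampling grid are ours, and real-valued signals are assumed so that no conjugation is needed:

import numpy as np

def wcaf(f, psi, t, a_vals, b_vals):
    # Riemann-sum approximation to (W_psi f)(b, a) on grids of scales a_vals
    # and delays b_vals; f and psi are callables, t the sample times.
    dt = t[1] - t[0]
    W = np.zeros((len(b_vals), len(a_vals)))
    for i, b in enumerate(b_vals):
        for j, a in enumerate(a_vals):
            W[i, j] = np.sum(f(t) * psi((t - b) / a)) * dt / np.sqrt(a)
    return W     # the (b, a) giving the largest |W| estimate the delay and scale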
More generally, there will be many individual targets or sources of echos,
each having their own values of a, b, and A. The resulting received echo
function f (t) is a superposition of the individual functions ψa,b (t), which,
for technical reasons, we write as
f(t) = \int_{-\infty}^{\infty} \int_{0}^{\infty} D(b, a) \, \psi_{a,b}(t) \, \frac{da \, db}{a^2}.    (21.4)
We then have the inverse problem of determining D(b, a) from f (t).
Equation (21.4) provides a representation of the echo f (t) as a superposition of rescaled translates of a single function, namely the original signal ψ(t). We shall encounter this representation again in our discussion of
wavelets, where the signal ψ(t) is called the mother wavelet and the WCAF
is called the integral wavelet transform. One reason for discussing radar and
ambiguity functions now is to motivate some of the wavelet theory. Our
discussion here follows closely the treatment in [145], where Kaiser emphasizes the important connections between wavelets and radar ambiguity
functions.
As we shall see in the chapter on wavelets, we can recover the signal
f (t) from the WCAF using the following inversion formula: at points t
where f (t) is continuous we have
f(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (W_\psi f)(b, a) \, \psi\Big(\frac{t-b}{a}\Big) \frac{da \, db}{a^2},

with

C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|} \, d\omega
for Ψ(ω) the Fourier transform of ψ(t). The obvious conjecture is then that
the distribution function D(b, a) is

D(b, a) = \frac{1}{C_\psi} (W_\psi f)(b, a).
However, this is not generally the case. Indeed, there is no particular
reason why the physically meaningful function D(b, a) must have the form
(Wψ g)(b, a) for some function g. So the inverse problem of estimating
D(b, a) from f (t) is more complicated. One approach mentioned in [145]
involves transmitting more than one signal ψ(t) and estimating D(b, a)
from the echos corresponding to each of the several different transmitted
signals.
21.4 The Narrowband Cross-Ambiguity Function
The real signal ψ(t) with Fourier transform Ψ(ω) is said to be a narrowband
signal if there are constants α and γ such that the conjugate-symmetric
function Ψ(ω) is concentrated on α ≤ |ω| ≤ γ and (γ−α)/(γ+α) is nearly equal to zero, which means that α is very much greater than β = (γ−α)/2. The center frequency is ωc = (γ+α)/2.
Exercise 21.3 Let φ = 2ωc v/c. Show that aωc is approximately equal to
ωc + φ.
It follows then that, for ω > 0, F (ω), the Fourier transform of the echo
f (t), is approximately Aaeibω Ψ(ω + φ). Because the Doppler shift affects
positive and negative frequencies differently, it is convenient to construct a
related signal having only positive frequency components.
Let G(ω) = 2F (ω) for ω > 0 and G(ω) = 0 otherwise. Let g(t) be
the inverse Fourier transform of G(ω). Then, the complex-valued function
g(t) is called the analytic signal associated with f (t). The function f (t) is
the real part of g(t); the imaginary part of g(t) is the Hilbert transform of
f (t). Then, the demodulated analytic signal associated with f (t) is h(t) with
Fourier transform H(ω) = G(ω+ωc ). Similarly, let γ(t) be the demodulated
analytic signal associated with ψ(t).
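For a sampled signal, the analytic signal can be formed with an FFT by doubling the positive-frequency bins and zeroing the negative ones; this is a standard construction and the sketch below is only illustrative:

import numpy as np

def analytic_signal(f):
    # f: real-valued samples. The result g satisfies Re(g) = f and
    # Im(g) = (Hilbert transform of f).
    n = len(f)
    F = np.fft.fft(f)
    G = np.zeros(n, dtype=complex)
    G[0] = F[0]                              # keep the DC term
    G[1:(n + 1) // 2] = 2 * F[1:(n + 1) // 2]   # double strictly positive frequencies
    if n % 2 == 0:
        G[n // 2] = F[n // 2]                # keep the Nyquist bin for even n
    return np.fft.ifft(G)

# The demodulated analytic signal is then g(t) * exp(-1j * omega_c * t).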
Exercise 21.4 Show that the demodulated analytic signals h(t) and γ(t)
are related by
h(t) = B e^{i\phi t} \gamma(t-b) = B \gamma_{\phi,b}(t),
for B a time-independent constant.
Hint: Use the fact that Ψ(ω) = 0 for 0 ≤ ω < α and φ < α.
To determine the range and radial velocity in the narrowband case
we again use the matched filter, forming the narrowband cross-ambiguity
function (NCAF)
N_h(\phi, b) = \langle h, \gamma_{\phi,b} \rangle = \int_{-\infty}^{\infty} h(t) e^{-i\phi t} \gamma(t-b) \, dt.    (21.5)
Ideally, the values of φ and b corresponding to the largest absolute value
of Nh (φ, b) will be the true ones, from which the range and radial velocity
can be determined. For each fixed value of b, the NCAF is the Fourier
transform of the function h(t)γ(t − b), evaluated at ω = −φ; so the NCAF
contains complete information about the function h(t). In the chapter on
wavelets we shall consider the NCAF in a different light, with γ playing the
role of a window function and the NCAF the short-time Fourier transform
of h(t), describing the frequency content of h(t) near the time b.
In the more general case in which the narrowband echo function f (t) is
a superposition of narrowband reflections,
f(t) = \int_{-\infty}^{\infty} \int_{0}^{\infty} D(b, a) \, \psi_{a,b}(t) \, \frac{da \, db}{a^2},

we have

h(t) = \int_{-\infty}^{\infty} \int_{0}^{\infty} D_{NB}(b, \phi) \, e^{i\phi t} \gamma(t-b) \, d\phi \, db,

where D_{NB}(b, φ) is the narrowband distribution of reflecting target points,
as a function of b and φ = 2ωc v/c. The inverse problem now is to estimate
this distribution, given h(t).
21.5 Range Estimation
If the transmitted signal is ψ(t) = eiωt and the target is stationary at
range r, then the echo received is f (t) = Aeiω(t−b) , where b = 2r/c. So
our information about r is that we know the value e2iωr/c . Because of
the periodicity of the complex exponential function, this is not enough
information to determine r; we need e2iωr/c for a variety of values of ω. To
obtain these values we can transmit a signal whose frequency changes with
time, such as a chirp of the form
\psi(t) = e^{i\omega t^2}

with the frequency 2ωt at time t.
Chapter 22

Time-Frequency Analysis

22.1 Chapter Summary
There are applications in which the frequency composition of the signal of
interest will change over time. A good analogy is a piece of music, where
notes at certain frequencies are heard for a while and then are replaced by
notes at other frequencies. We do not usually care what the overall contribution of, say, middle C is to the song, but do want to know which notes are
to be sounded when and for how long. Analyzing such non-stationary signals requires tools other than the Fourier transform: the short-time Fourier
transform is one such tool; wavelet expansion is another.
22.2 Non-stationary Signals
The inverse Fourier transform formula
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{-i\omega t} \, d\omega

provides a representation of the function of time f(t) as a superposition of sinusoids e^{-iωt} with frequencies ω. The value at ω of the Fourier transform

F(\omega) = \int_{-\infty}^{\infty} f(t) e^{i\omega t} \, dt
is the complex amplitude associated with the sinusoidal component e−iωt .
It quantifies the contribution to f (t) made by that sinusoid, over all of t.
To determine each individual number F (ω) we need f (t) for all t. It is
implicit that the frequency content has not changed over time.
22.3 The Short-Time Fourier Transform
To estimate the frequency content of the signal f (t) around the time t = b,
we could proceed as follows. Multiply f(t) by the function that is equal to 1/(2ε) on the interval [b − ε, b + ε] and zero otherwise. Then take the Fourier transform. The multiplication step is called windowing.
To see how well this works, consider the case in which f(t) = exp(−iω0 t) for all t. The Fourier transform of the windowed signal is then

\exp(i(\omega - \omega_0)b) \, \frac{\sin((\omega - \omega_0)\varepsilon)}{(\omega - \omega_0)\varepsilon}.

This function attains its maximum value of one at ω = ω0. But, the first zeros of the function are at |ω − ω0| = π/ε, which says that as ε gets smaller
the windowed Fourier transform spreads out more and more around ω =
ω0 ; that is, better time localization comes at the price of worse frequency
localization. To achieve a somewhat better result we can change the window
function.
The standard normal (or Gaussian) curve is
g(t) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{1}{2} t^2\Big),

which has its peak at t = 0 and falls off to zero symmetrically on either side. For σ > 0, let

g_\sigma(t) = \frac{1}{\sigma} g(t/\sigma).

Then the function gσ(t − b) is centered at t = b and falls off on either side, more slowly for large σ, faster for smaller σ. Also we have

\int_{-\infty}^{\infty} g_\sigma(t-b) \, dt = 1
for each b and σ > 0. Such functions were used by Gabor [115] for windowing signals and are called Gabor windows.
Gabor’s idea was to multiply f (t), the signal of interest, by the window
gσ (t − b) and then to take the Fourier transform, obtaining the short-time
Fourier transform (STFT)
G_b^\sigma(\omega) = \int_{-\infty}^{\infty} f(t) \, g_\sigma(t-b) \, e^{i\omega t} \, dt.
Since gσ (t − b) falls off to zero on either side of t = b, multiplying by
this window essentially restricts the signal to a neighborhood of t = b.
The STFT then measures the frequency content of the signal, near the
time t = b. The STFT therefore performs a time-frequency analysis of the
signal.
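A small Python sketch of the Gabor-windowed STFT just defined, again as an illustration (the name gabor_stft and the grids are ours):

import numpy as np

def gabor_stft(f, t, b_vals, omegas, sigma):
    # Riemann-sum approximation to G_b^sigma(omega) for samples f on the grid t.
    dt = t[1] - t[0]
    G = np.zeros((len(b_vals), len(omegas)), dtype=complex)
    for i, b in enumerate(b_vals):
        window = np.exp(-0.5 * ((t - b) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        for j, w in enumerate(omegas):
            G[i, j] = np.sum(f * window * np.exp(1j * w * t)) * dt
    return G   # |G[i, j]| describes the frequency content of f near t = b_vals[i]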
We focus more tightly around the time t = b by choosing a small value
for σ. Because of the uncertainty principle, the Fourier transform of the
window gσ (t − b) grows wider as σ gets smaller; the time-frequency window
remains constant [78]. This causes the STFT to involve greater blurring
in the frequency domain. In short, to get good resolution in frequency, we
need to observe for a longer time; if we focus on a small time interval, we
pay the price of reduced frequency resolution. This is unfortunate because
when we focus on a short interval of time, it is to uncover a part of the signal
that is changing within that short interval, which means it must have high
frequency components within that interval. There is no reason to believe
that the spacing is larger between those high frequencies we wish to resolve
than between lower frequencies associated with longer time intervals. We
would like to have the same resolving capability when focusing on a short
time interval that we have when focusing on a longer one.
22.4 The Wigner-Ville Distribution
In [171] Meyer describes Ville’s approach to determining the instantaneous
power spectrum of the signal, that is, the energy in the signal f (t) that
corresponds to time t and frequency ω. The goal is to find a function
Wf (t, ω) having the properties
\int W_f(t, \omega) \, d\omega / 2\pi = |f(t)|^2,

which is the total energy in the signal at time t, and

\int W_f(t, \omega) \, dt = |F(\omega)|^2,

which is the total energy in the Fourier transform at frequency ω. Because these two properties do not specify a unique Wf(t, ω), two additional properties are usually required:

\int\!\!\int W_f(t, \omega) W_g(t, \omega) \, dt \, d\omega / 2\pi = \Big| \int f(t) g(t) \, dt \Big|^2

and, for f(t) = gσ(t − b) exp(iαt),

W_f(t, \omega) = 2 \exp(-\sigma^{-2}(t-b)^2) \exp(-\sigma^2(\omega - \alpha)^2).
The Wigner-Ville distribution of f (t), given by
WV_f(t, \omega) = \int_{-\infty}^{\infty} f\Big(t + \frac{\tau}{2}\Big) f\Big(t - \frac{\tau}{2}\Big) \exp(-i\omega\tau) \, d\tau,
has all four of the desired properties. The Wigner-Ville distribution is
always real-valued, but its values need not be nonnegative.
In [95] De Bruijn defines the score of a signal f (t) to be H(x, y; f, f ),
where
H(x, y; f_1, f_2) = 2 \int_{-\infty}^{\infty} f_1(x+t) f_2(x-t) e^{-4\pi i y t} \, dt.
Exercise 22.1 Relate the narrowband cross-ambiguity function to De Bruijn's score and the Wigner-Ville distribution.
Chapter 23

Wavelets

23.1 Chapter Summary
In this chapter we present a short overview of wavelet signal processing.
23.2 Background
The fantastic increase in computer power over the last few decades has
made possible, even routine, the use of digital procedures for solving problems that were believed earlier to be intractable, such as the modeling of
large-scale systems. At the same time, it has created new applications
unimagined previously, such as medical imaging. In some cases the mathematical formulation of the problem is known and progress has come with
the introduction of efficient computational algorithms, as with the Fast
Fourier Transform. In other cases, the mathematics is developed, or perhaps rediscovered, as needed by the people involved in the applications.
Only later is it realized that the theory already existed, as with the development of computerized tomography without knowledge of Radon's earlier work on
reconstruction of functions from their line integrals.
It can happen that applications give a theoretical field of mathematics
a rebirth; such seems to be the case with wavelets [138]. Sometime in the
1980s researchers working on various problems in electrical engineering,
quantum mechanics, image processing, and other areas became aware that
what the others were doing was related to their own work. As connections became established, similarities with the earlier mathematical theory
of approximation in functional analysis were noticed. Meetings began to
take place, and a common language began to emerge around this reborn
area, now called wavelets. One of the most significant meetings took place
in June of 1990, at the University of Massachusetts Lowell. The keynote
speaker was Ingrid Daubechies; the lectures she gave that week were subsequently published in the book [94].
There are a number of good books on wavelets, such as [145], [18], and
[222]. A recent issue of the IEEE Signal Processing Magazine has an interesting article on using wavelet analysis of paintings for artist identification
[143].
Fourier analysis and synthesis concerns the decomposition, filtering,
compressing, and reconstruction of signals using complex exponential functions as the building blocks; wavelet theory provides a framework in which
other building blocks, better suited to the problem at hand, can be used.
As always, efficient algorithms provide the bridge between theory and practice.
Since their development in the 1980s wavelets have been used for many
purposes. In the discussion to follow, we focus on the problem of analyzing a
signal whose frequency composition is changing over time. As we saw in our
discussion of the narrowband cross-ambiguity function in radar, the need
for such time-frequency analysis has been known for quite a while. Other
methods, such as Gabor's short-time Fourier transform and the Wigner-Ville distribution, have also been considered for this purpose.
23.3 A Simple Example
Imagine that f (t) is defined for all real t and we have sampled f (t) every
half-second. We focus on the time interval [0, 2). Suppose that f (0) = 1,
f (0.5) = −3, f (1) = 2 and f (1.5) = 4. We approximate f (t) within the
interval [0, 2) by replacing f (t) with the step function that is 1 on [0, 0.5),
−3 on [0.5, 1), 2 on [1, 1.5), and 4 on [1.5, 2); for notational convenience, we
represent this step function by (1, −3, 2, 4). We can decompose (1, −3, 2, 4)
into a sum of step functions
(1, −3, 2, 4) = 1(1, 1, 1, 1) − 2(1, 1, −1, −1) + 2(1, −1, 0, 0) − 1(0, 0, 1, −1).
The first basis element, (1, 1, 1, 1), does not vary over a two-second interval.
The second one, (1, 1, −1, −1), is orthogonal to the first, and does not vary
over a one-second interval. The other two, both orthogonal to the previous
two and to each other, vary over half-second intervals. We can think of these
basis functions as corresponding to different frequency components and
time locations; that is, they are giving us a time-frequency decomposition.
Suppose we let φ0 (t) be the function that is 1 on the interval [0, 1) and
0 elsewhere, and ψ0 (t) the function that is 1 on the interval [0, 0.5) and −1
on the interval [0.5, 1). Then we say that
φ0 (t) = (1, 1, 0, 0),
and
ψ0 (t) = (1, −1, 0, 0).
Then we write
φ−1 (t) = (1, 1, 1, 1) = φ0 (0.5t),
ψ0 (t − 1) = (0, 0, 1, −1),
and
ψ−1 (t) = (1, 1, −1, −1) = ψ0 (0.5t).
So we have the decomposition of (1, −3, 2, 4) as
(1, −3, 2, 4) = 1φ−1 (t) − 2ψ−1 (t) + 2ψ0 (t) − 1ψ0 (t − 1).
In what follows we shall be interested in extending these ideas, to find other functions φ0(t) and ψ0(t) that lead to bases consisting of functions of the form

\psi_{j,k}(t) = \psi_0(2^j t - k).
These will be our wavelet bases.
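The coefficients in the four-point decomposition above can be checked numerically; since the four step functions are mutually orthogonal, each coefficient is an inner product divided by a squared norm. The short Python check below is only an illustration:

import numpy as np

f = np.array([1.0, -3.0, 2.0, 4.0])
basis = np.array([[1, 1, 1, 1],     # phi_{-1}(t)
                  [1, 1, -1, -1],   # psi_{-1}(t)
                  [1, -1, 0, 0],    # psi_0(t)
                  [0, 0, 1, -1]])   # psi_0(t - 1)
coeffs = basis @ f / np.sum(basis ** 2, axis=1)
print(coeffs)                       # [ 1. -2.  2. -1.], matching the text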
23.4 The Integral Wavelet Transform
For real numbers b and a ≠ 0, the integral wavelet transform (IWT) of the signal f(t) relative to the basic wavelet (or mother wavelet) ψ(t) is

(W_\psi f)(b, a) = |a|^{-\frac{1}{2}} \int_{-\infty}^{\infty} f(t) \, \psi\Big(\frac{t-b}{a}\Big) \, dt.
This function is also the wideband cross-ambiguity function in radar. The
function ψ(t) is also called a window function and, like Gaussian functions,
it will be relatively localized in time. However, it must also have properties
quite different from those of Gabor’s Gaussian windows; in particular, we
want
\int_{-\infty}^{\infty} \psi(t) \, dt = 0.

An example is the Haar wavelet ψHaar(t) that has the value +1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and zero otherwise.
As the scaling parameter a grows larger the wavelet ψ(t) grows wider,
so choosing a small value of the scaling parameter permits us to focus on a
neighborhood of the time t = b. The IWT then registers the contribution
to f (t) made by components with features on the scale determined by
a, in the neighborhood of t = b. Calculations involving the uncertainty
principle reveal that the IWT provides a flexible time-frequency window
that narrows when we observe high frequency components and widens for
lower frequencies [78].
Given the integral wavelet transform (Wψ f )(b, a), it is natural to ask
how we might recover the signal f (t). The following inversion formula
answers that question: at points t where f (t) is continuous we have
f(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (W_\psi f)(b, a) \, \psi\Big(\frac{t-b}{a}\Big) \frac{da}{a^2} \, db,

with

C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|} \, d\omega
for Ψ(ω) the Fourier transform of ψ(t).
23.5 Wavelet Series Expansions
The Fourier series expansion of a function f (t) on a finite interval is a
representation of f (t) as a sum of orthogonal complex exponentials. Localized alterations in f (t) affect every one of the components of this sum.
Wavelets, on the other hand, can be used to represent f (t) so that localized alterations in f (t) affect only a few of the components of the wavelet
expansion. The simplest example of a wavelet expansion is with respect to
the Haar wavelets.
Exercise 23.1 Let w(t) = ψHaar(t). Show that the functions w_{jk}(t) = w(2^j t − k) are mutually orthogonal on the interval [0, 1], where j = 0, 1, ... and k = 0, 1, ..., 2^j − 1.
These functions wjk (t) are the Haar wavelets. Every continuous function f (t) defined on [0, 1] can be written as
f(t) = c_0 + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} c_{jk} w_{jk}(t)
for some choice of c0 and cjk . Notice that the support of the function wjk (t),
the interval on which it is nonzero, gets smaller as j increases. Therefore,
the components corresponding to higher values of j in the Haar expansion
of f (t) come from features that are localized in the variable t; such features
are transients that live for only a short time. Such transient components
affect all of the Fourier coefficients but only those Haar wavelet coefficients
corresponding to terms supported in the region of the disturbance. This
ability to isolate localized features is the main reason for the popularity of
wavelet expansions.
The orthogonal functions used in the Haar wavelet expansion are themselves discontinuous, which presents a bit of a problem when we represent
continuous functions. Wavelets that are themselves continuous, or better
still, differentiable, should do a better job representing smooth functions.
We can obtain other wavelet series expansions by selecting a basic
wavelet ψ(t) and defining ψjk (t) = 2j/2 ψ(2j t − k), for integers j and k.
We then say that the function ψ(t) is an orthogonal wavelet if the family
{ψjk } is an orthonormal basis for the space of square-integrable functions
on the real line, the Hilbert space L2 (R). This implies that for every such
f (t) there are coefficients cjk so that
f(t) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} c_{jk} \psi_{jk}(t),

with convergence in the mean-square sense. The coefficients cjk are found using the IWT:

c_{jk} = (W_\psi f)\Big(\frac{k}{2^j}, \frac{1}{2^j}\Big).
It is also of interest to consider wavelets ψ for which {ψjk } form a basis,
but not an orthogonal one, or, more generally, form a frame, in which the
series representations of f (t) need not be unique.
As with Fourier series, wavelet series expansion permits the filtering of
certain components, as well as signal compression. In the case of Fourier
series, we might attribute high frequency components to noise and achieve
a smoothing by setting to zero the coefficients associated with these high
frequencies. In the case of wavelet series expansions, we might attribute to
noise localized small-scale disturbances and remove them by setting to zero
the coefficients corresponding to the appropriate j and k. For both Fourier
and wavelet series expansions we can achieve compression by ignoring those
components whose coefficients are below some chosen level.
23.6 Multiresolution Analysis
One way to study wavelet series expansions is through multiresolution analysis (MRA) [166]. Let us begin with an example involving band-limited
functions. This example is called the Shannon MRA.
23.6.1 The Shannon Multiresolution Analysis
Let V0 be the collection of functions f (t) whose Fourier transform F (ω)
is zero for |ω| > π; so V0 is the collection of π-band-limited functions.
Let V1 be the collection of functions f (t) whose Fourier transform F (ω) is
zero for |ω| > 2π; so V1 is the collection of 2π-band-limited functions. In
general, for each integer j, let Vj be the collection of functions f (t) whose
Fourier transform F (ω) is zero for |ω| > 2j π; so Vj is the collection of
2j π-band-limited functions.
Exercise 23.2 Show that if the function f (t) is in Vj then the function
g(t) = f (2t) is in Vj+1 .
We then have a nested sequence of sets of functions {Vj }, with Vj ⊆ Vj+1
for each integer j. The intersection of all the Vj is the set containing only
the zero function. Every function in L2 (R) is arbitrarily close to a function
in at least one of the sets Vj ; more mathematically, we say that the union
of the Vj is dense in L2 (R). In addition, we have f (t) in Vj if and only if
g(t) = f (2t) is in Vj+1 . In general, such a collection of sets of functions
is called a multiresolution analysis for L2 (R). Once we have a MRA for
L2 (R), how do we get a wavelet series expansion?
A function φ(t) is called a scaling function or sometimes the father
wavelet for the MRA if the collection of integer translates {φ(t − k)} forms
a basis for V0 (more precisely, a Riesz basis). Then, for each fixed j, the
functions φjk (t) = φ(2j t − k), for integer k, will form a basis for Vj . In the
case of the Shannon MRA, the scaling function is φ(t) = sin(πt)/(πt). But how
do we get a basis for all of L2 (R)?
23.6.2 The Haar Multiresolution Analysis
To see how to proceed, it is helpful to return to the Haar wavelets. Let
φHaar (t) be the function that has the value +1 for 0 ≤ t < 1 and zero
elsewhere. Let V0 be the collection of all functions in L2 (R) that are linear
combinations of integer translates of φ(t); that is, all functions f (t) that
are constant on intervals of the form [k, k + 1), for all integers k. Now V1
is the collection of all functions g(t) of the form g(t) = f (2t), for some f (t)
in V0 . Therefore, V1 consists of all functions in L2 (R) that are constant on
intervals of the form [k/2, (k + 1)/2).
Every function in V0 is also in V1 and every function g(t) in V1 can be
written uniquely as a sum of a function f (t) in V0 and a function h(t) in
V1 that is orthogonal to every function in V0 . For example, the function
g(t) that takes the value +3 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and zero
elsewhere can be written as g(t) = f (t) + h(t), where h(t) has the value +2
for 0 ≤ t < 1/2, −2 for 1/2 ≤ t < 1, and zero elsewhere, and f (t) takes the
value +1 for 0 ≤ t < 1 and zero elsewhere. Clearly, h(t), which is twice the
Haar wavelet function, is orthogonal to all functions in V0 .
Exercise 23.3 Show that the function f (t) can be written uniquely as
f(t) = d(t) + e(t), where d(t) is in V−1 and e(t) is in V0 and is orthogonal to every function in V−1. Relate the function e(t) to the Haar wavelet
function.
23.6.3 Wavelets and Multiresolution Analysis
To get an orthogonal wavelet expansion from a general MRA, we write the
set V1 as the direct sum V1 = V0 ⊕ W0 , so every function g(t) in V1 can be
uniquely written as g(t) = f (t) + h(t), where f (t) is a function in V0 and
h(t) is a function in W0 , with f (t) and h(t) orthogonal. Since the scaling
function or father wavelet φ(t) is in V1 , it can be written as
φ(t) = Σ_{k=−∞}^{∞} p_k φ(2t − k),    (23.1)
for some sequence {pk } called the two-scale sequence for φ(t). This most
important identity is the scaling relation for the father wavelet. The mother
wavelet is defined using a similar expression:
ψ(t) = Σ_k (−1)^k p_{1−k} φ(2t − k).    (23.2)
We define
φ_{jk}(t) = 2^{j/2} φ(2^j t − k)    (23.3)
and
ψ_{jk}(t) = 2^{j/2} ψ(2^j t − k).    (23.4)
The collection {ψjk (t), −∞ < j, k < ∞} then forms an orthogonal wavelet
basis for L2 (R). For the Haar MRA, the two-scale sequence is p0 = p1 = 1
and pk = 0 for the rest.
Exercise 23.4 Show that the two-scale sequence {pk } has the properties
p_k = 2 ∫ φ(t) φ(2t − k) dt;
and
Σ_{k=−∞}^{∞} p_{k−2m} p_k = 0
for m ≠ 0, with the sum equal to two when m = 0.
23.7
Signal Processing Using Wavelets
Once we have an orthogonal wavelet basis for L2 (R), we can use the basis
to represent and process a signal f (t). Suppose, for example, that f (t) is
band-limited but essentially zero for t not in [0, 1] and we have samples
f(k/M), k = 0, ..., M. We assume that the sampling rate ∆ = 1/M is faster than the Nyquist rate, so that the Fourier transform of f(t) is zero outside, say, the interval [0, 2πM]. Roughly speaking, the Wj component of f(t), given by
g_j(t) = Σ_{k=0}^{2^j − 1} β^j_k ψ_{jk}(t),
with β^j_k = ⟨f(t), ψ_{jk}(t)⟩, corresponds to the components of f(t) with frequencies ω between 2^{j−1} and 2^j. For 2^j > 2πM we have β^j_k = 0, so g_j(t) = 0. Let J be the smallest integer greater than log_2(2π) + log_2(M). Then f(t) is in the space VJ and has the expansion
f(t) = Σ_{k=0}^{2^J − 1} α^J_k φ_{Jk}(t),
for α^J_k = ⟨f(t), φ_{Jk}(t)⟩. It is common practice, but not universally approved, to take M = 2^J and to estimate the α^J_k by the samples f(k/M). Once we have the sequence {α^J_k}, we can begin the decomposition of f(t) into components in Vj and Wj for j < J. As we shall see, the algorithms for the decomposition and subsequent reconstruction of the signal are quite similar to the FFT.
23.7.1
Decomposition and Reconstruction
The decomposition and reconstruction algorithms both involve the equation
Σ_k a^j_k φ_{jk} = Σ_m [a^{j−1}_m φ_{(j−1),m} + b^{j−1}_m ψ_{(j−1),m}];    (23.5)
in the decomposition step we know the {a^j_k} and want the {a^{j−1}_m} and {b^{j−1}_m}, while in the reconstruction step we know the {a^{j−1}_m} and {b^{j−1}_m} and want the {a^j_k}.
Using Equations (23.1) and (23.3), we obtain
φ_{(j−1),l} = 2^{−1/2} Σ_k p_k φ_{j,(k+2l)} = 2^{−1/2} Σ_k p_{k−2l} φ_{jk};    (23.6)
using Equations (23.2), (23.3) and (23.4), we get
ψ_{(j−1),l} = 2^{−1/2} Σ_k (−1)^k p_{1−k+2l} φ_{jk}.    (23.7)
Therefore,
⟨φ_{jk}, φ_{(j−1),l}⟩ = 2^{−1/2} p_{k−2l};    (23.8)
this comes from substituting φ_{(j−1),l} as in Equation (23.6) into the second term in the inner product. Similarly, we have
⟨φ_{jk}, ψ_{(j−1),l}⟩ = 2^{−1/2} (−1)^k p_{1−k+2l}.    (23.9)
These relationships are then used to derive the decomposition and reconstruction algorithms.
The decomposition step:
To find a^{j−1}_l we take the inner product of both sides of Equation (23.5) with the function φ_{(j−1),l}. Using Equation (23.8) and the fact that φ_{(j−1),l} is orthogonal to all the φ_{(j−1),m} except for m = l and is orthogonal to all the ψ_{(j−1),m}, we obtain
2^{−1/2} Σ_k a^j_k p_{k−2l} = a^{j−1}_l;
similarly, using Equation (23.9), we get
2^{−1/2} Σ_k a^j_k (−1)^k p_{1−k+2l} = b^{j−1}_l.
The decomposition step is to apply these two equations to get the {a^{j−1}_l} and {b^{j−1}_l} from the {a^j_k}.
The reconstruction step:
Now we use Equations (23.6) and (23.7) to substitute into the right-hand side of Equation (23.5). Combining terms, we get
a^j_k = 2^{−1/2} Σ_l [a^{j−1}_l p_{k−2l} + b^{j−1}_l (−1)^k p_{1−k+2l}].
This takes us from the {a^{j−1}_l} and {b^{j−1}_l} to the {a^j_k}.
We have assumed that we have already obtained the scaling function φ(t) with the property that {φ(t − k)} is an orthogonal basis for V0. But how do we actually obtain such functions?
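Before turning to that question, the following short Python sketch (not part of the text; the periodic wrap-around at the ends is an assumption of this illustration) implements one decomposition step and one reconstruction step exactly as in the two formulas above, and checks perfect reconstruction for the Haar two-scale sequence p0 = p1 = 1. The Daubechies sequence derived in Section 23.9 can be used in the same way.

import numpy as np

def decompose(a, p):
    # one step: a^{j-1}_l = 2^{-1/2} sum_k a^j_k p_{k-2l},
    #           b^{j-1}_l = 2^{-1/2} sum_k a^j_k (-1)^k p_{1-k+2l}
    n = len(a)                                   # assumed even
    a_new = np.zeros(n // 2)
    b_new = np.zeros(n // 2)
    for l in range(n // 2):
        for i, pk in enumerate(p):
            k = (2 * l + i) % n                  # index with p_{k-2l} = p_i
            a_new[l] += a[k] * pk
            k2 = (2 * l + 1 - i) % n             # index with p_{1-k+2l} = p_i
            b_new[l] += a[k2] * (-1) ** k2 * pk
    return a_new / np.sqrt(2), b_new / np.sqrt(2)

def reconstruct(a_new, b_new, p):
    # one step: a^j_k = 2^{-1/2} sum_l [a^{j-1}_l p_{k-2l} + b^{j-1}_l (-1)^k p_{1-k+2l}]
    n = 2 * len(a_new)
    a = np.zeros(n)
    for l in range(n // 2):
        for i, pk in enumerate(p):
            k = (2 * l + i) % n
            a[k] += a_new[l] * pk
            k2 = (2 * l + 1 - i) % n
            a[k2] += b_new[l] * (-1) ** k2 * pk
    return a / np.sqrt(2)

p_haar = np.array([1.0, 1.0])                    # Haar two-scale sequence
a = np.random.randn(8)
ca, cb = decompose(a, p_haar)
assert np.allclose(reconstruct(ca, cb, p_haar), a)   # perfect reconstruction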
23.8
Generating the Scaling Function
The scaling function φ(t) is generated from the two-scale sequence {pk }
using the following iterative procedure. Start with φ0 (t) = φHaar (t), the
Haar scaling function that is one on [0, 1] and zero elsewhere. Now, for
each n = 1, 2, ..., define
φ_n(t) = Σ_{k=−∞}^{∞} p_k φ_{n−1}(2t − k).
Provided that the sequence {pk } has certain properties to be discussed
below, this sequence of functions converges and the limit is the desired
scaling function.
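As a concrete illustration, here is a small Python sketch of this iteration (not part of the text; the grid resolution and the number of iterations are arbitrary choices). It uses the Daubechies two-scale sequence derived in Section 23.9 below; the iterates settle down numerically to the scaling function shown in Figure 23.1.

import numpy as np

# two-scale sequence for the Daubechies N = 2 wavelet (see Section 23.9)
p = np.array([1 + np.sqrt(3), 3 + np.sqrt(3), 3 - np.sqrt(3), 1 - np.sqrt(3)]) / 4.0

J = 8                                      # dyadic grid with step 2**(-J)
h = 2.0 ** (-J)
t = np.arange(0.0, len(p) - 1 + h, h)      # the limit is supported on [0, 3]
phi = np.where(t < 1.0, 1.0, 0.0)          # phi_0 = the Haar scaling function

def cascade_step(phi_old):
    phi_new = np.zeros_like(phi_old)
    for k, pk in enumerate(p):
        # phi_old evaluated at 2t - k; on this grid that is index 2m - k*2**J
        idx = 2 * np.arange(len(phi_old)) - k * 2 ** J
        ok = (idx >= 0) & (idx < len(phi_old))
        phi_new[ok] += pk * phi_old[idx[ok]]
    return phi_new

for _ in range(12):
    phi = cascade_step(phi)

print(phi.sum() * h)                       # approximately 1: the scaling function integrates to one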
The properties of {pk } that are needed can be expressed in terms of
properties of the function
P(z) = (1/2) Σ_{k=−∞}^{∞} p_k z^k.
For the Haar MRA, this function is P(z) = (1/2)(1 + z). We require that
• 1. P(1) = 1,
• 2. |P(e^{iθ})|^2 + |P(e^{i(θ+π)})|^2 = 1, for 0 ≤ θ ≤ π, and
• 3. |P(e^{iθ})| > 0 for −π/2 ≤ θ ≤ π/2.
23.9
Generating the Two-scale Sequence
The final piece of the puzzle is the generation of the sequence {pk } itself, or,
equivalently, finding a function P (z) with the properties listed above. The
following example, also used in [18], illustrates Ingrid Daubechies’ method
[93].
We begin with the identity
cos^2(θ/2) + sin^2(θ/2) = 1
and then raise both sides to an odd power n = 2N − 1. Here we use N = 2, obtaining
1 = cos^6(θ/2) + 3 cos^4(θ/2) sin^2(θ/2) + cos^6((θ + π)/2) + 3 cos^4((θ + π)/2) sin^2((θ + π)/2).
We then let
|P(e^{iθ})|^2 = cos^6(θ/2) + 3 cos^4(θ/2) sin^2(θ/2),
so that
|P(e^{iθ})|^2 + |P(e^{i(θ+π)})|^2 = 1
for 0 ≤ θ ≤ π. Now we have to find P(e^{iθ}). Writing
|P(e^{iθ})|^2 = cos^4(θ/2) [cos^2(θ/2) + 3 sin^2(θ/2)],
we have
P(e^{iθ}) = cos^2(θ/2) [cos(θ/2) + √3 i sin(θ/2)] e^{iα(θ)},
where the real function α(θ) is arbitrary. Selecting α(θ) = 3θ/2, we get
P(e^{iθ}) = (1/2)(p_0 + p_1 e^{iθ} + p_2 e^{2iθ} + p_3 e^{3iθ}),
for
p_0 = (1 + √3)/4,
p_1 = (3 + √3)/4,
p_2 = (3 − √3)/4,
p_3 = (1 − √3)/4,
and all the other coefficients are zero. The resulting Daubechies’ wavelet is
compactly supported and continuous, but not differentiable [18, 93]. Figure
23.1 shows the scaling function and mother wavelet for N = 2. When larger
values of N are used, the resulting wavelet, often denoted ψN (t), which is
again compactly supported, has approximately N/5 continuous derivatives.
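As a quick numerical check (not part of the text), the following Python snippet verifies that these coefficients satisfy properties 1 and 2 of Section 23.8, as well as the conditions of Exercise 23.4.

import numpy as np

p = np.array([1 + np.sqrt(3), 3 + np.sqrt(3), 3 - np.sqrt(3), 1 - np.sqrt(3)]) / 4.0
P = lambda z: 0.5 * sum(pk * z ** k for k, pk in enumerate(p))

print(P(1.0))                                            # property 1: P(1) = 1
theta = np.linspace(0.0, np.pi, 5)
print(np.abs(P(np.exp(1j * theta))) ** 2
      + np.abs(P(np.exp(1j * (theta + np.pi)))) ** 2)    # property 2: all ones

# Exercise 23.4: sum_k p_{k-2m} p_k is 2 for m = 0 and 0 otherwise
print(np.dot(p, p), np.dot(p[2:], p[:-2]))               # 2.0 and 0.0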
These notions extend to nonorthogonal wavelet bases and to frames.
Algorithms similar to the fast Fourier transform provide the wavelet decomposition and reconstruction of signals. The recent text by Boggess and
Narcowich [18] is a nice introduction to this fast-growing area; the more
advanced book by Chui [78] is also a good source. Wavelets in the context
of Riesz bases and frames are discussed in Christensen’s book [77]. Applications of wavelets to medical imaging are found in [187], as well as in the
other papers in that special issue.
23.10
Wavelets and Filter Banks
In [212] Strang and Nguyen take a somewhat different approach to wavelets,
emphasizing the role of filters and matrices. To illustrate one of their main
points, we consider the two-point moving average filter.
The two-point moving average filter transforms an input sequence x = {x(n)} to an output y = {y(n)}, with y(n) = (1/2)x(n) + (1/2)x(n − 1). The filter h = {h(k)} has h(0) = h(1) = 1/2 and all the remaining h(n) are zero. This filter is a finite impulse response (FIR) low-pass filter and is not invertible; the input sequence with x(n) = (−1)^n has output zero. Similarly, the two-point moving difference filter g = {g(k)}, with g(0) = 1/2, g(1) = −1/2, and the rest zero, is an FIR high-pass filter, also not invertible. However, if we
perform these filters in parallel, as a filter bank, no information is lost and
the input can be completely reconstructed, with a unit delay. In addition,
the outputs of the two filters contain redundancy that can be removed by
decimation, which is taken here to mean downsampling, that is, throwing
away every other term of a sequence.
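A brief Python sketch of this two-channel filter bank (not from [212]; the treatment of the left endpoint, x(−1) = 0, is an assumption of this example) shows the decimated outputs reconstructing the input with a unit delay.

import numpy as np

def analysis(x):
    xm1 = np.concatenate(([0.0], x[:-1]))     # x(n-1), taking x(-1) = 0
    low = 0.5 * (x + xm1)                     # two-point moving average
    high = 0.5 * (x - xm1)                    # two-point moving difference
    return low[::2], high[::2]                # downsample: keep every other term

def synthesis(low, high):
    n = 2 * len(low)
    xhat = np.zeros(n)                        # xhat(n) = x(n-1): reconstruction with a unit delay
    xhat[0::2] = low - high                   # x(2m - 1) = low(2m) - high(2m)
    xhat[1::2] = low + high                   # x(2m)     = low(2m) + high(2m)
    return xhat

x = np.random.randn(16)
low, high = analysis(x)
xhat = synthesis(low, high)
assert np.allclose(xhat, np.concatenate(([0.0], x[:-1])))   # the delayed input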
The authors treat the more general problem of obtaining perfect reconstruction of the input from the output of a filter bank of low- and high-pass
filters followed by downsampling. The properties that must be required of
the filters are those we encountered earlier with regard to the two-scale sequences for the father and mother wavelets. When the filter operations are
construed as matrix multiplications, the decomposition and reconstruction
algorithms become matrix factorizations.
23.11
Using Wavelets
We consider the Daubechies mother wavelet ψN (t), for N = 1, 2, ..., and
n = 2N −1. The two-scale sequence {pk } then has nonzero terms p0 , ..., pn .
For example, when N = 1, we get the Haar wavelet, with p0 = p1 = 1/2,
and all the other pk = 0.
The wavelet signal analysis usually begins by sampling the signal f (t)
closely enough so that we can approximate the a^{j+1}_k by the samples f(k/2^{j+1}).
An important aspect of the Daubechies wavelets is the vanishing of
moments. For k = 0, 1, ..., N − 1 we have
∫ t^k ψ_N(t) dt = 0;
for the Haar case we have only that ∫ ψ_1(t) dt = 0. We consider now the significance of vanishing moments for detection.
For an arbitrary signal f(t) the wavelet coefficients b^j_k are given by
b^j_k = ∫ f(t) 2^{j/2} ψ_N(2^j t − k) dt.
We focus on N = 2. The function ψ_2(2^j t − k) is supported on the interval [k/2^j, (k + 3)/2^j], so we have
b^j_k = ∫_0^{3/2^j} f(t + k/2^j) 2^{j/2} ψ_2(2^j t) dt.
If f(t) is smooth near t = k/2^j, and j is large enough, then
f(t + k/2^j) = f(k/2^j) + f′(k/2^j) t + (1/2!) f″(k/2^j) t^2 + · · ·,
and so
b^j_k ≈ 2^{j/2} [ f(k/2^j) ∫_0^{3/2^j} ψ_2(2^j t) dt + f′(k/2^j) ∫_0^{3/2^j} t ψ_2(2^j t) dt + (1/2) f″(k/2^j) ∫_0^{3/2^j} t^2 ψ_2(2^j t) dt ].
Since
∫ ψ_2(t) dt = ∫ t ψ_2(t) dt = 0
and
∫ t^2 ψ_2(t) dt ≈ −(1/8)√(3/(2π)),
we have
b^j_k ≈ −(1/16)√(3/(2π)) 2^{−5j/2} f″(k/2^j).
On the other hand, if f(t) is not smooth near t = k/2^j, we expect the b^j_k to have a larger magnitude.
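The following small Python experiment (not part of the text; it uses the Haar wavelet ψ1 rather than ψ2, purely to keep the code short) illustrates the point: the coefficients are small where f is smooth and large on the interval containing a discontinuity.

import numpy as np

def haar_psi(t):
    return (np.where((0 <= t) & (t < 0.5), 1.0, 0.0)
            - np.where((0.5 <= t) & (t < 1.0), 1.0, 0.0))

def coeffs(f, j, n=4096):
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    return np.array([np.sum(f(t) * 2 ** (j / 2.0) * haar_psi(2 ** j * t - k)) / n
                     for k in range(2 ** j)])

f = lambda t: np.sin(2 * np.pi * t) + (t > 0.55)   # smooth except for a jump at t = 0.55
b = coeffs(f, j=5)
k_star = np.argmax(np.abs(b))
print(k_star / 2 ** 5)                             # 0.53125: the dyadic interval containing the jump
print(np.abs(b[k_star]) / np.median(np.abs(b)))    # the coefficient at the jump dominates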
Example 1 Suppose that f(t) is piecewise linear. Then f″(t) = 0, except at the places where the lines meet. So we expect the b^j_k to be zero, except at the nodes.
Example 2 Let f(t) = t(1 − t), for t ∈ [0, 1], and zero elsewhere. We might begin with the sample values f(k/2^7) and then consider b^6_k. Again using N = 2, we find that b^6_k is proportional to f″(k/2^6) = −2, and so is essentially independent of k, except near the endpoints t = 0 and t = 1. The discontinuity of f′(t) at the ends will make the b^6_k there larger.
Example 3 Now let g(t) = t^2(1 − t)^2, for t ∈ [0, 1], and zero elsewhere. The first derivative is continuous at the endpoints t = 0 and t = 1, but the second derivative is discontinuous there. Using N = 2, we won’t be able to detect this discontinuity, but using N = 3 we will.
Example 4 Suppose that f(t) = e^{iωt}. Then we have
b^j_k = 2^{−j/2} e^{iωk/2^j} ΨN(ω/2^j),
independent of k, where ΨN denotes the Fourier transform of ψN. If we plot these values for various j, the maximum is reached when
ω/2^j = argmax ΨN,
from which we can find ω.
Figure 23.1: Daubechies’ scaling function and mother wavelet for N = 2.
Part VII
Estimation and Detection
Chapter 24
The BLUE and The Kalman Filter
24.1
Chapter Summary
In most signal- and image-processing applications the measured data includes (or may include) a signal component we want and unwanted components called noise. Estimation involves determining the precise nature
and strength of the signal component; deciding if that strength is zero or
not is detection.
Noise often appears as an additive term, which we then try to remove. If
we knew precisely the noisy part added to each data value we would simply
subtract it; of course, we never have such information. How then do we
remove something when we don’t know what it is? Statistics provides a
way out.
The basic idea in statistics is to use procedures that perform well on
average, when applied to a class of problems. The procedures are built
using properties of that class, usually involving probabilistic notions, and
are evaluated by examining how they would have performed had they been
applied to every problem in the class. To use such methods to remove
additive noise, we need a description of the class of noises we expect to
encounter, not specific values of the noise component in any one particular
instance. We also need some idea about what signal components look like.
In this chapter we discuss solving this noise removal problem using the best
linear unbiased estimation (BLUE). We begin with the simplest case and
then proceed to discuss increasingly complex scenarios.
An important application of the BLUE is in Kalman filtering. The
connection between the BLUE and Kalman filtering is best understood by
considering the case of the BLUE with a prior estimate of the signal com-
ponent, and mastering the various matrix manipulations that are involved
in this problem. These calculations then carry over, almost unchanged, to
the Kalman filtering.
Kalman filtering is usually presented in the context of estimating a
sequence of vectors evolving in time. Kalman filtering for image processing
is derived by analogy with the temporal case, with certain parts of the
image considered to be in the “past” of a fixed pixel.
24.2
The Simplest Case
Suppose our data is zj = c + vj , for j = 1, ..., J, where c is an unknown
constant to be estimated and the vj are additive noise. We assume that E(vj) = 0, E(vj vk) = 0 for j ≠ k, and E(|vj|^2) = σ_j^2. So, the additive noises are assumed to have mean zero and to be independent (or at least uncorrelated). In order to estimate c, we adopt the following rules:
• 1. The estimate ĉ is linear in the data z = (z1, ..., zJ)^T; that is, ĉ = k†z, for some vector k = (k1, ..., kJ)^T.
• 2. The estimate is unbiased; E(ĉ) = c. This means Σ_{j=1}^J kj = 1.
• 3. The estimate is best in the sense that it minimizes the expected error squared; that is, E(|ĉ − c|^2) is minimized.
Exercise 24.1 Show that the resulting vector k has entries
k_i = σ_i^{−2} / (Σ_{j=1}^J σ_j^{−2}),
and the BLUE estimator of c is then
ĉ = Σ_{i=1}^J z_i σ_i^{−2} / (Σ_{j=1}^J σ_j^{−2}).
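A small numerical check of Exercise 24.1 (not part of the text; the variances are made up):

import numpy as np

rng = np.random.default_rng(0)
c = 3.0
sigma2 = np.array([0.1, 0.5, 2.0, 4.0])          # the variances sigma_j^2
z = c + rng.normal(scale=np.sqrt(sigma2))        # one realization of z_j = c + v_j

w = (1.0 / sigma2) / np.sum(1.0 / sigma2)        # k_i = sigma_i^{-2} / sum_j sigma_j^{-2}
c_hat = np.dot(w, z)                             # the BLUE estimate of c
print(c_hat, np.sum(w))                          # the weights sum to one (unbiasedness)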
Exercise 24.2 Suppose we have data z1 = c + v1 and z2 = c + v2 and
we want to estimate the constant c. Assume that E(v1 ) = E(v2 ) = 0 and
E(v1 v2 ) = ρ, with 0 < |ρ| < 1. Find the BLUE estimate of c.
Exercise 24.3 The concentration of a substance in solution decreases exponentially during an experiment. Noisy measurements of the concentration are made at times t1 and t2 , giving the data
zi = x0 e−ti + vi , i = 1, 2,
where the vi have mean zero, and are uncorrelated. Find the BLUE for the
initial concentration x0 .
24.3
A More General Case
Suppose now that our data vector is z = Hx + v. Here, x is an unknown
vector whose value is to be estimated, the random vector v is additive
noise whose mean is E(v) = 0 and whose known covariance matrix is
Q = E(vv† ), not necessarily diagonal, and the known matrix H is J by N ,
with J > N . Now we seek an estimate of the vector x. We now use the
following rules:
• 1. The estimate x̂ must have the form x̂ = K † z, where the matrix
K is to be determined.
• 2. The estimate is unbiased; that is, E(x̂) = x.
• 3. The K is determined as the minimizer of the expected squared
error; that is, once again we minimize E(|x̂ − x|2 ).
Exercise 24.4 Show that for the estimator to be unbiased we need K † H =
I, the identity matrix.
Exercise 24.5 Show that
E(|x̂ − x|2 ) = trace K † QK.
Hints: Write the left side as
E(trace ((x̂ − x)(x̂ − x)† )).
Also use the fact that the trace and expected-value operations commute.
The problem then is to minimize trace K † QK subject to the constraint
equation K † H = I. We solve this problem using a technique known as
prewhitening.
Since the noise covariance matrix Q is Hermitian and nonnegative definite, we have Q = U DU † , where the columns of U are the (mutually
orthogonal) eigenvectors of Q and D is a diagonal matrix whose diagonal entries are the (necessarily nonnegative) eigenvalues of Q; therefore,
U † U = I. We call C = U D1/2 U † the Hermitian square root of Q, since
C † = C and C 2 = Q. We assume that Q is invertible, so that C is also.
Given the system of equations
z = Hx + v,
as before, we obtain a new system
y = Gx + w
by multiplying both sides by C −1 = Q−1/2 ; here, G = C −1 H and w =
C −1 v. The new noise correlation matrix is
E(ww† ) = C −1 QC −1 = I,
so the new noise is white. For this reason the step of multiplying by C −1
is called prewhitening.
With J = CK and M = C −1 H, we have
K † QK = J † J
and
K † H = J † M.
Our problem then is to minimize trace J † J, subject to J † M = I. Recall
that the trace of the matrix A† A is simply the square of the 2-norm of the
vectorization of A.
Our solution method is to transform the original problem into a simpler
problem, where the answer is obvious.
First, for any given matrices L and M such that J and M L have the
same dimensions, the minimum value of
f (J) = trace[(J † − L† M † )(J − M L)]
is zero and occurs when J = M L.
Now let L = L† = (M † M )−1 . The solution is again J = M L, but now
this choice for J has the additional property that J † M = I. So, minimizing
f (J) is equivalent to minimizing f (J) subject to the constraint J † M = I
and both problems have the solution J = M L.
Now using J † M = I, we expand f (J) to get
f (J) = trace[J † J − J † M L − L† M † J + L† M † M L]
= trace[J † J − L − L† + L† M † M L].
The only term here that involves the unknown matrix J is the first one.
Therefore, minimizing f (J) subject to J † M = I is equivalent to minimizing
trace J † J subject to J † M = I, which is our original problem. Therefore,
the optimal choice for J is J = M L. Consequently, the optimal choice for
K is
K = Q−1 HL = Q−1 H(H † Q−1 H)−1 ,
and the BLUE estimate of x is
xBLU E = x̂ = K † z = (H † Q−1 H)−1 H † Q−1 z.
The simplest case can be obtained from this more general formula by taking
N = 1, H = (1, 1, ..., 1)T and x = c.
Note that if the noise is white, that is, Q = σ 2 I, then x̂ = (H † H)−1 H † z,
which is the least-squares solution of the equation z = Hx. The effect of
requiring that the estimate be unbiased is that, in this case, we simply
ignore the presence of the noise and calculate the least squares solution of
the noise-free equation z = Hx.
The BLUE estimator involves nested inversion, making it difficult to
calculate, especially for large matrices. In the exercise that follows, we
discover an approximation of the BLUE that is easier to calculate.
Exercise 24.6 Show that for ε > 0 we have
(H†Q^{−1}H + εI)^{−1} H†Q^{−1} = H†(HH† + εQ)^{−1}.    (24.1)
Hint: Use the identity
H†Q^{−1}(HH† + εQ) = (H†Q^{−1}H + εI)H†.
It follows from Equation (24.1) that
x_BLUE = lim_{ε→0} H†(HH† + εQ)^{−1} z.    (24.2)
Therefore, we can get an approximation of the BLUE estimate by selecting ε > 0 near zero, solving the system of linear equations
(HH† + εQ)a = z
for a and taking x̂ = H†a.
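The following Python sketch (not part of the text; H, Q and x are made up) computes the BLUE directly and through the approximation of Equation (24.2).

import numpy as np

rng = np.random.default_rng(1)
J, N = 20, 3
H = rng.normal(size=(J, N))
Q = np.diag(rng.uniform(0.5, 2.0, size=J))               # known noise covariance
x_true = np.array([1.0, -2.0, 0.5])
z = H @ x_true + rng.multivariate_normal(np.zeros(J), Q)

Qinv = np.linalg.inv(Q)
x_blue = np.linalg.solve(H.T @ Qinv @ H, H.T @ Qinv @ z)  # (H^T Q^{-1} H)^{-1} H^T Q^{-1} z

eps = 1e-6
a = np.linalg.solve(H @ H.T + eps * Q, z)                 # solve (H H^T + eps Q) a = z
x_approx = H.T @ a
print(np.linalg.norm(x_blue - x_approx))                  # small, and goes to zero with eps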
24.4
Some Useful Matrix Identities
In the exercise that follows we consider several matrix identities that are
useful in developing the Kalman filter.
Exercise 24.7 Establish the following identities, assuming that all the
products and inverses involved are defined:
CDA−1 B(C −1 − DA−1 B)−1 = (C −1 − DA−1 B)−1 − C;
(24.3)
(A − BCD)−1 = A−1 + A−1 B(C −1 − DA−1 B)−1 DA−1 ;
(24.4)
A−1 B(C −1 − DA−1 B)−1 = (A − BCD)−1 BC;
(24.5)
(A − BCD)−1 = (I + GD)A−1 ,
(24.6)
for
G = A−1 B(C −1 − DA−1 B)−1 .
Hints: To get Equation (24.3) use
C(C −1 − DA−1 B) = I − CDA−1 B.
For the second identity, multiply both sides of Equation (24.4) on the left
by A−BCD and at the appropriate step use Equation (24.3). For Equation
(24.5) show that
BC(C −1 − DA−1 B) = B − BCDA−1 B = (A − BCD)A−1 B.
For Equation (24.6), substitute what G is and use Equation (24.4).
24.5
The BLUE with a Prior Estimate
In Kalman filtering we have the situation in which we want to estimate
an unknown vector x given measurements z = Hx + v, but also given a
prior estimate y of x. It is the case there that E(y) = E(x), so we write
y = x + w, with w independent of both x and v and E(w) = 0. The
covariance matrix for w we denote by E(ww† ) = R. We now require that
the estimate x̂ be linear in both z and y; that is, the estimate has the form
x̂ = C † z + D† y,
for matrices C and D to be determined.
The approach is to apply the BLUE to the combined system of linear
equations
z = Hx + v and
y = x + w.
In matrix language this combined system becomes u = Jx + n, with u^T = [z^T y^T], J^T = [H^T I^T], and n^T = [v^T w^T]. The noise covariance matrix becomes the block-diagonal matrix
P = [ Q 0 ; 0 R ].
The BLUE estimate is K † u, with K † J = I. Minimizing the variance, we
find that the optimal K † is
K † = (J † P −1 J)−1 J † P −1 .
The optimal estimate is then
x̂ = (H † Q−1 H + R−1 )−1 (H † Q−1 z + R−1 y).
Therefore,
C † = (H † Q−1 H + R−1 )−1 H † Q−1
and
D† = (H † Q−1 H + R−1 )−1 R−1 .
Using the matrix identities in Equations (24.4) and (24.5) we can rewrite
this estimate in the more useful form
x̂ = y + G(z − Hy),
for
G = RH † (Q + HRH † )−1 .
(24.7)
The covariance matrix of the optimal estimator is K † P K, which can be
written as
K † P K = (R−1 + H † Q−1 H)−1 = (I − GH)R.
In the context of the Kalman filter, R is the covariance of the prior estimate
of the current state, G is the Kalman gain matrix, and K † P K is the posterior covariance of the current state. The algorithm proceeds recursively
from one state to the next in time.
24.6
Adaptive BLUE
We have assumed so far that we know the covariance matrix Q corresponding to the measurement noise. If we do not, then we may attempt
to estimate Q from the measurements themselves; such methods are called
noise-adaptive. To illustrate, let the innovations vector be e = z − Hy.
Then the covariance matrix of e is S = HRH † + Q. Having obtained an
estimate Ŝ of S from the data, we use Ŝ − HRH † in place of Q in Equation
(24.7).
24.7
The Kalman Filter
So far in this chapter we have focused on the filtering problem: given the
data vector z, estimate x, assuming that z consists of noisy measurements
of Hx; that is, z = Hx + v. An important extension of this problem is
that of stochastic prediction. Shortly, we discuss the Kalman-filter method
for solving this more general problem. One area in which prediction plays
an important role is the tracking of moving targets, such as ballistic missiles, using radar. The range to the target, its angle of elevation, and its
azimuthal angle are all functions of time governed by linear differential
equations. The state vector of the system at time t might then be a vector with nine components, the three functions just mentioned, along with
their first and second derivatives. In theory, if we knew the initial state
perfectly and our differential equations model of the physics was perfect,
that would be enough to determine the future states. In practice neither
of these is true, and we need to assist the differential equation by taking
radar measurements of the state at various times. The problem then is to
estimate the state at time t using both the measurements taken prior to
time t and the estimate based on the physics.
When such tracking is performed digitally, the functions of time are
replaced by discrete sequences. Let the state vector at time k∆t be denoted by xk , for k an integer and ∆t > 0. Then, with the derivatives in
the differential equation approximated by divided differences, the physical
model for the evolution of the system in time becomes
xk = Ak−1 xk−1 + mk−1 .
The matrix Ak−1 , which we assume is known, is obtained from the differential equation, which may have nonconstant coefficients, as well as from the
divided difference approximations to the derivatives. The random vector
sequence mk−1 represents the error in the physical model due to the discretization and necessary simplification inherent in the original differential
equation itself. We assume that the expected value of mk is zero for each
k. The covariance matrix is E(mk m†k ) = Mk .
At time k∆t we have the measurements
zk = Hk xk + vk ,
where Hk is a known matrix describing the nature of the linear measurements of the state vector and the random vector vk is the noise in these
measurements. We assume that the mean value of vk is zero for each k.
The covariance matrix is E(vk vk† ) = Qk . We assume that the initial state
vector x0 is arbitrary.
Given an unbiased estimate x̂k−1 of the state vector xk−1 , our prior
estimate of xk based solely on the physics is
yk = Ak−1 x̂k−1 .
Exercise 24.8 Show that E(yk − xk ) = 0, so the prior estimate of xk is
unbiased. We can then write yk = xk + wk , with E(wk ) = 0.
24.8
Kalman Filtering and the BLUE
The Kalman filter [147, 117, 79] is a recursive algorithm to estimate the
state vector xk at time k∆t as a linear combination of the vectors zk and
yk . The estimate x̂k will have the form
x̂k = Ck† zk + Dk† yk ,
(24.8)
for matrices Ck and Dk to be determined. As we shall see, this estimate
can also be written as
x̂k = yk + Gk (zk − Hk yk ),
(24.9)
which shows that the estimate involves a prior prediction step, the yk ,
followed by a correction step, in which Hk yk is compared to the measured
data vector zk; such estimation methods are sometimes called predictor-corrector methods.
In our discussion of the BLUE, we saw how to incorporate a prior
estimate of the vector to be estimated. The trick was to form a larger
matrix equation and then to apply the BLUE to that system. The Kalman
filter does just that.
The correction step in the Kalman filter uses the BLUE to solve the
combined linear system
zk = Hk xk + vk
and
yk = xk + wk .
The covariance matrix of x̂k−1 − xk−1 is denoted by Pk−1. The covariance matrix of yk − xk = wk is
cov(yk − xk) = Rk = Mk−1 + Ak−1 Pk−1 A†k−1.
It follows from our earlier discussion of the BLUE that the estimate of xk is
x̂k = yk + Gk(zk − Hk yk),
with
Gk = Rk Hk† (Qk + Hk Rk Hk† )−1 .
Then, the covariance matrix of x̂k − xk is
Pk = (I − Gk Hk )Rk .
The recursive procedure is to go from Pk−1 and Mk−1 to Rk , then to Gk ,
from which x̂k is formed, and finally to Pk , which, along with the known
matrix Mk , provides the input to the next step. The time-consuming part
of this recursive algorithm is the matrix inversion in the calculation of Gk .
Simpler versions of the algorithm are based on the assumption that the
matrices Qk are diagonal, or on the convergence of the matrices Gk to a
limiting matrix G [79].
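A compact Python sketch of the recursion (not part of the text; the constant-velocity model, the covariances, and all numerical values are assumptions of this example):

import numpy as np

rng = np.random.default_rng(2)
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])            # state: position and velocity
H = np.array([[1.0, 0.0]])                       # we measure position only
M = 0.01 * np.eye(2)                             # model-error covariance M_k
Q = np.array([[0.25]])                           # measurement-noise covariance Q_k

x = np.array([0.0, 1.0])                         # true initial state
x_hat = np.array([0.0, 0.0])                     # initial estimate
P = np.eye(2)                                    # its covariance

for k in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(2), M)       # true state evolves
    z = H @ x + rng.normal(scale=np.sqrt(Q[0, 0]), size=1)    # measurement z_k
    y = A @ x_hat                                 # prediction y_k
    R = M + A @ P @ A.T                           # R_k = M_{k-1} + A_{k-1} P_{k-1} A_{k-1}^T
    G = R @ H.T @ np.linalg.inv(Q + H @ R @ H.T)  # Kalman gain G_k
    x_hat = y + G @ (z - H @ y)                   # correction step
    P = (np.eye(2) - G @ H) @ R                   # posterior covariance P_k

print(x_hat, x)                                   # the estimate tracks the true state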
There are many variants of the Kalman filter, corresponding to variations in the physical model, as well as in the statistical assumptions. The
differential equation may be nonlinear, so that the matrices Ak depend on
xk . The system noise sequence {wk } and the measurement noise sequence
{vk } may be correlated. For computational convenience the various functions that describe the state may be treated separately. The model may
include known external inputs to drive the differential system, as in the
tracking of spacecraft capable of firing booster rockets. Finally, the noise
covariance matrices may not be known a priori and adaptive filtering may
be needed. We discuss this last issue briefly in the next section.
24.9
Adaptive Kalman Filtering
As in [79] we consider only the case in which the covariance matrix Qk of the
measurement noise vk is unknown. As we saw in the discussion of adaptive
BLUE, the covariance matrix of the innovations vector ek = zk − Hk yk is
Sk = Hk Rk Hk† + Qk .
Once we have an estimate for Sk , we estimate Qk using
Q̂k = Ŝk − Hk Rk Hk† .
We might assume that Sk is independent of k and estimate Sk = S using
past and present innovations; for example, we could use
Ŝ = (1/(k − 1)) Σ_{j=1}^k (zj − Hj yj)(zj − Hj yj)†.
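In code (a sketch only, not from [79]; it assumes Hj = H and Rj = R do not change over the window), with the innovations stored as the rows of an array E:

import numpy as np

def estimate_Q(E, H, R):
    # E: k-by-J array whose rows are the innovations e_j = z_j - H y_j
    k = E.shape[0]
    S_hat = (E.T @ E) / (k - 1)        # sample estimate of S
    return S_hat - H @ R @ H.T         # Q_hat = S_hat - H R H^T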
Chapter 25
Signal Detection and Estimation
25.1
Chapter Summary
In this chapter we consider the problem of deciding whether or not a particular signal is present in the measured data; this is the detection problem.
The underlying framework for the detection problem is optimal estimation
and statistical hypothesis testing [117].
25.2
The Model of Signal in Additive Noise
The basic model used in detection is that of a signal in additive noise. The
complex data vector is x = (x1 , x2 , ..., xN )T . We assume that there are two
possibilities:
Case 1: Noise only
xn = zn , n = 1, ..., N,
or
Case 2: Signal in noise
xn = γsn + zn ,
where z = (z1 , z2 , ..., zN )T is a complex vector whose entries zn are values
of random variables that we call noise, about which we have only statistical
information (that is to say, information about the average behavior), s =
(s1, s2, ..., sN)^T is a complex signal vector that we may know exactly, or at
least for which we have a specific parametric model, and γ is a scalar that
may be viewed either as deterministic or random (but unknown, in either
case). Unless otherwise stated, we shall assume that γ is deterministic.
The detection problem is to decide which case we are in, based on some
calculation performed on the data x. Since Case 1 can be viewed as a
special case of Case 2 in which the value of γ is zero, the detection problem
is closely related to the problem of estimating γ, which we discussed in the
chapter dealing with the best linear unbiased estimator, the BLUE.
We shall assume throughout that the entries of z correspond to random
variables with means equal to zero. What the variances are and whether or
not these random variables are mutually correlated will be discussed next.
In all cases we shall assume that this information has been determined
previously and is available to us in the form of the covariance matrix Q =
E(zz† ) of the vector z; the symbol E denotes expected value, so the entries
of Q are the quantities Qmn = E(zm z̄n). The diagonal entries of Q are Qnn = σn^2, the variance of zn.
Note that we have adopted the common practice of using the same
symbols, zn , when speaking about the random variables and about the
specific values of these random variables that are present in our data. The
context should make it clear to which we are referring.
In Case 2 we say that the signal power is equal to |γ|^2 (1/N) Σ_{n=1}^N |sn|^2 = (1/N)|γ|^2 s†s and the noise power is (1/N) Σ_{n=1}^N σn^2 = (1/N) tr(Q), where tr(Q) is the trace of the matrix Q, that is, the sum of its diagonal terms; therefore, the noise power is the average of the variances σn^2. The input signal-to-noise ratio (SNRin) is the ratio of the signal power to that of the noise, prior to processing the data; that is,
SNRin = ((1/N)|γ|^2 s†s) / ((1/N) tr(Q)) = |γ|^2 s†s / tr(Q).
25.3
Optimal Linear Filtering for Detection
In each case to be considered next, our detector will take the form of a
linear estimate of γ; that is, we shall compute the estimate γ̂ given by
γ̂ = Σ_{n=1}^N bn xn = b†x,
where b = (b1 , b2 , ..., bN )T is a vector to be determined. The objective is
to use what we know about the situation to select the optimal b, which
will depend on s and Q.
For any given vector b, the quantity
γ̂ = b† x = γb† s + b† z
is a random variable whose mean value is equal to γb† s and whose variance
is
var(γ̂) = E(|b† z|2 ) = E(b† zz† b) = b† E(zz† )b = b† Qb.
Therefore, the output signal-to-noise ratio (SNRout ) is defined as
SNRout = |γb† s|2 /b† Qb.
The advantage we obtain from processing the data is called the gain associated with b and is defined to be the ratio of the SNRout to SNRin; that is,
gain(b) = (|γ b†s|^2 / (b†Qb)) / (|γ|^2 (s†s) / tr(Q)) = (|b†s|^2 tr(Q)) / ((b†Qb)(s†s)).
The best b to use will be the one for which gain(b) is the largest. So, ignoring the terms in the gain formula that do not involve b, we see that the problem becomes: maximize |b†s|^2 / (b†Qb), for fixed signal vector s and fixed noise covariance matrix Q.
The Cauchy inequality plays a major role in optimal filtering and detection:
Cauchy’s inequality: For any vectors a and b we have
|a† b|2 ≤ (a† a)(b† b),
with equality if and only if a is proportional to b; that is, there is a scalar
β such that b = βa.
Exercise 25.1 Use Cauchy’s inequality to show that, for any fixed vector
a, the choice b = βa maximizes the quantity |b† a|2 /b† b, for any constant
β.
Exercise 25.2 Use the definition of the covariance matrix Q to show that
Q is Hermitian and that, for any vector y, y† Qy ≥ 0. Therefore, Q is a
nonnegative definite matrix and, using its eigenvector decomposition, can
be written as Q = CC † , for some invertible square matrix C.
Exercise 25.3 Consider now the problem of maximizing |b† s|2 /b† Qb. Using the two previous exercises, show that the solution is b = βQ−1 s, for
some arbitrary constant β.
We can now use the results of these exercises to continue our discussion.
We choose the constant β = 1/(s† Q−1 s) so that the optimal b has b† s = 1;
that is, the optimal filter b is
b = (1/(s† Q−1 s))Q−1 s,
and the optimal estimate of γ is
γ̂ = b† x = (1/(s† Q−1 s))(s† Q−1 x).
The mean of the random variable γ̂ is equal to γb† s = γ, and the variance
is equal to 1/(s† Q−1 s). Therefore, the output signal power is |γ|2 , the
output noise power is 1/(s† Q−1 s), and so the output signal-to-noise ratio
(SNRout ) is
SNRout = |γ|2 (s† Q−1 s).
The gain associated with the optimal vector b is then
maximum gain = (s†Q^{−1}s) tr(Q) / (s†s).
The calculation of the vector C −1 x is sometimes called prewhitening since
C −1 x = γC −1 s + C −1 z and the new noise vector, C −1 z, has the identity
matrix for its covariance matrix. The new signal vector is C −1 s. The
filtering operation that gives γ̂ = b† x can be written as
γ̂ = (1/(s† Q−1 s))(C −1 s)† C −1 x;
the term (C −1 s)† C −1 x is described by saying that we prewhiten, then do
a matched filter. Now we consider some special cases of noise.
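Before turning to those special cases, here is a brief Python sketch of the optimal filter (not part of the text; the signal vector, the covariance Q, and the value of γ are all made up for the illustration):

import numpy as np

rng = np.random.default_rng(3)
N = 16
n = np.arange(1, N + 1)
s = np.exp(-1j * 0.7 * n)                          # an assumed signal vector
C = rng.normal(size=(N, N)) / np.sqrt(N)
Q = 0.1 * np.eye(N) + 0.1 * (C @ C.T)              # a positive-definite noise covariance

Qinv = np.linalg.inv(Q)
b = (Qinv @ s) / (s.conj() @ Qinv @ s)             # the optimal filter; note b† s = 1
gamma = 0.5 + 0.2j
x = gamma * s + np.linalg.cholesky(Q) @ rng.normal(size=N)   # data: signal in noise

gamma_hat = b.conj() @ x                           # unbiased estimate, variance 1/(s† Q^{-1} s)
gain = (s.conj() @ Qinv @ s).real * np.trace(Q) / N   # the maximum gain (here s† s = N)
print(gamma_hat, gain)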
25.4
The Case of White Noise
We say that the noise is white noise if the covariance matrix is Q = σ 2 I,
where I denotes the identity matrix that is one on the main diagonal and
zero elsewhere and σ > 0 is the common standard deviation of the zn . This
means that the zn are mutually uncorrelated (independent, in the Gaussian
case) and share a common variance.
In this case the optimal vector b is b = (1/(s†s)) s and the gain is N. Notice that γ̂ now involves only a matched filter. We consider now some special cases of the signal vectors s.
25.4.1
Constant Signal
Suppose that the vector s is constant; that is, s = 1 = (1, 1, ..., 1)^T. Then, we have
γ̂ = (1/N) Σ_{n=1}^N xn.
This is the same result we found in our discussion of the BLUE, when we
estimated the mean value and the noise was white.
25.4.2
Sinusoidal Signal, Frequency Known
Suppose that
s = e(ω0) = (exp(−iω0), exp(−2iω0), ..., exp(−N iω0))^T,
where ω0 denotes a known frequency in [−π, π). Then, b = (1/N) e(ω0) and
γ̂ = (1/N) Σ_{n=1}^N xn exp(inω0);
so, we see yet another occurrence of the DFT.
25.4.3
Sinusoidal Signal, Frequency Unknown
If we do not know the value of the signal frequency ω0 , a reasonable thing
to do is to calculate the γ̂ for each (actually, finitely many) of the possible
frequencies within [−π, π) and base the detection decision on the largest
value; that is, we calculate the DFT as a function of the variable ω. If there
is only a single ω0 for which there is a sinusoidal signal present in the data,
the values of γ̂ obtained at frequencies other than ω0 provide estimates of
the noise power σ 2 , against which the value of γ̂ for ω0 can be compared.
25.5
The Case of Correlated Noise
We say that the noise is correlated if the covariance matrix Q is not a
multiple of the identity matrix. This means either that the zn are mutually
correlated (dependent, in the Gaussian case) or that they are uncorrelated,
but have different variances.
In this case, as we saw previously, the optimal vector b is
b = (1/(s†Q^{−1}s)) Q^{−1}s
and the gain is
maximum gain = (s†Q^{−1}s) tr(Q) / (s†s).
How large or small the gain is depends on how the signal vector s relates to the matrix Q.
For sinusoidal signals, the quantity s† s is the same, for all values of the
parameter ω; this is not always the case, however. In passive detection of
sources in acoustic array processing, for example, the signal vectors arise
from models of the acoustic medium involved. For far-field sources in an
(acoustically) isotropic deep ocean, plane-wave models for s will have the property that s†s does not change with source location. However, for near-field or shallow-water environments, this is usually no longer the case.
It follows from Exercise 25.3 that the quantity s†Q^{−1}s / (s†s) achieves its maximum value when s is an eigenvector of Q associated with its smallest eigenvalue, λN; in this case, we are saying that the signal vector does not look very much like a typical noise vector. The maximum gain is then λN^{−1} tr(Q). Since tr(Q) equals the sum of its eigenvalues, multiplying by
tr(Q) serves to normalize the gain, so that we cannot get larger gain simply
by having all the eigenvalues of Q small.
On the other hand, if s should be an eigenvector of Q associated with
its largest eigenvalue, say λ1, then the maximum gain is λ1^{−1} tr(Q). If
the noise is signal-like, that is, has one dominant eigenvalue, then tr(Q)
is approximately λ1 and the maximum gain is around one, so we have
lost the maximum gain of N we were able to get in the white-noise case.
This makes sense, in that it says that we cannot significantly improve our
ability to discriminate between signal and noise by taking more samples, if
the signal and noise are very similar.
25.5.1
Constant Signal with Unequal-Variance Uncorrelated Noise
Suppose that the vector s is constant; that is, s = 1 = (1, 1, ..., 1)^T. Suppose also that the noise covariance matrix is Q = diag{σ1, ..., σN}.
In this case the optimal vector b has entries
bm = σm^{−1} / (Σ_{n=1}^N σn^{−1}),
for m = 1, ..., N, and we have
γ̂ = (1/(Σ_{n=1}^N σn^{−1})) Σ_{m=1}^N σm^{−1} xm.
This is the BLUE estimate of γ in this case.
25.5.2
Sinusoidal Signal, Frequency Known, in Correlated Noise
Suppose that
s = e(ω0) = (exp(−iω0), exp(−2iω0), ..., exp(−N iω0))^T,
where ω0 denotes a known frequency in [−π, π). In this case the optimal vector b is
b = (1/(e(ω0)†Q^{−1}e(ω0))) Q^{−1}e(ω0)
and the gain is
maximum gain = (1/N) [e(ω0)†Q^{−1}e(ω0)] tr(Q).
How large or small the gain is depends on the quantity q(ω0 ), where
q(ω) = e(ω)† Q−1 e(ω).
The function 1/q(ω) can be viewed as a sort of noise power spectrum,
describing how the noise power appears when decomposed over the various
frequencies in [−π, π). The maximum gain will be large if this noise power
spectrum is relatively small near ω = ω0 ; however, when the noise is similar
to the signal, that is, when the noise power spectrum is relatively large
near ω = ω0 , the maximum gain can be small. In this case the noise power
spectrum plays a role analogous to that played by the eigenvalues of Q
earlier.
To see more clearly why it is that the function 1/q(ω) can be viewed
as a sort of noise power spectrum, consider what we get when we apply
the optimal filter associated with ω to data containing only noise. The
average output should tell us how much power there is in the component of
the noise that resembles e(ω); this is essentially what is meant by a noise
power spectrum. The result is b† z = (1/q(ω))e(ω)† Q−1 z. The expected
value of |b† z|2 is then 1/q(ω).
25.5.3
Sinusoidal Signal, Frequency Unknown, in Correlated Noise
Again, if we do not know the value of the signal frequency ω0 , a reasonable
thing to do is to calculate the γ̂ for each (actually, finitely many) of the
possible frequencies within [−π, π) and base the detection decision on the
largest value. For each ω the corresponding value of γ̂ is
γ̂(ω) = [1/(e(ω)†Q^{−1}e(ω))] Σ_{n=1}^N an exp(inω),
where a = (a1 , a2 , ..., aN )T satisfies the linear system Qa = x or a = Q−1 x.
It is interesting to note the similarity between this estimation procedure and
the PDFT discussed earlier; to see the connection, view [1/(e(ω)† Q−1 e(ω))]
in the role of P (ω) and Q its corresponding matrix of Fourier-transform
values. The analogy breaks down when we notice that Q need not be
Toeplitz, as in the PDFT case; however, the similarity is intriguing.
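The following Python sketch (not part of the text; the diagonal Q and all numerical values are assumptions of the example) carries out this frequency scan and locates the peak of |γ̂(ω)|.

import numpy as np

rng = np.random.default_rng(4)
N = 64
n = np.arange(1, N + 1)
e = lambda w: np.exp(-1j * w * n)                  # e(omega), as in the text

omega0 = 1.1                                       # the "unknown" frequency
Q = np.diag(0.1 + 0.05 * n / N)                    # uncorrelated noise, unequal variances
x = 2.0 * e(omega0) + np.sqrt(np.diag(Q)) * rng.normal(size=N)

a = np.linalg.solve(Q, x)                          # a = Q^{-1} x
Qinv = np.linalg.inv(Q)
omegas = np.linspace(-np.pi, np.pi, 512, endpoint=False)
gamma_hat = np.array([(a * np.exp(1j * w * n)).sum() / (e(w).conj() @ Qinv @ e(w)).real
                      for w in omegas])
print(omegas[np.argmax(np.abs(gamma_hat))])        # a grid frequency close to omega0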
25.6
Capon’s Data-Adaptive Method
When the noise covariance matrix Q is not available, perhaps because we
cannot observe the background noise in the absence of any signals that may
also be present, we may use the signal-plus-noise covariance matrix R in
place of Q.
Exercise 25.4 Show that for
R = |γ|2 ss† + Q
maximizing the ratio
|b† s|2 /b† Rb
is equivalent to maximizing the ratio
|b† s|2 /b† Qb.
In [67] Capon offered a high-resolution method for detecting and resolving sinusoidal signals with unknown frequencies in noise. His estimator
has the form
1/e(ω)† R−1 e(ω).
(25.1)
The idea here is to fix an arbitrary ω, and then to find the vector b(ω) that
minimizes b(ω)† Rb(ω), subject to b(ω)† e(ω) = 1. The vector b(ω) turns
out to be
b(ω) = (1/(e(ω)†R^{−1}e(ω))) R^{−1}e(ω).    (25.2)
Now we allow ω to vary and compute the expected output of the filter b(ω),
operating on the signal plus noise input. This expected output is then
1/e(ω)† R−1 e(ω).
(25.3)
The reason that this estimator resolves closely spaced delta functions better
than linear methods such as the DFT is that, when ω is fixed, we obtain an
optimal filter using R as the noise covariance matrix, which then includes
all sinusoids not at the frequency ω in the noise component. This is actually a good thing, since, when we are looking at a frequency ω that does
not correspond to a frequency actually present in the data, we want the
sinusoidal components present at nearby frequencies to be filtered out.
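A Python sketch of Capon's estimator (not part of the text; here R is estimated from simulated snapshots, and all numerical values are made up):

import numpy as np

rng = np.random.default_rng(5)
N, n_snap = 24, 200
n = np.arange(1, N + 1)
e = lambda w: np.exp(-1j * w * n)

w1, w2 = 0.9, 1.05                                  # two closely spaced frequencies
X = np.zeros((n_snap, N), dtype=complex)
for m in range(n_snap):
    ph = rng.uniform(0.0, 2 * np.pi, size=2)
    noise = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
    X[m] = np.exp(1j * ph[0]) * e(w1) + np.exp(1j * ph[1]) * e(w2) + 0.3 * noise

R = (X.conj().T @ X) / n_snap                       # signal-plus-noise covariance estimate
Rinv = np.linalg.inv(R)
omegas = np.linspace(0.5, 1.5, 400)
capon = np.array([1.0 / (e(w).conj() @ Rinv @ e(w)).real for w in omegas])
print(omegas[np.argmax(capon)])                     # near one of the two frequencies
# plotting capon against omegas displays the estimator as a function of omega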
Part VIII
Appendices
Chapter 26
Appendix: Inner Products
26.1
Chapter Summary
Many methods for analyzing measured signals are based on the idea of
matching the data against various potential signals to see which ones match
best. The role of inner products in this matching approach is the topic of
this chapter.
26.2
Cauchy’s Inequality
The matching is done using the complex dot product, e_ω†d. In the ideal case this dot product is large for those values of ω that correspond to an actual component of the signal; otherwise it is small. The reason this is the case is the Cauchy-Schwarz inequality (or sometimes, depending
on the context, just Cauchy’s inequality, just Schwarz’s inequality, or, in
the Russian literature, Bunyakovsky’s inequality). The proof of Cauchy’s
inequality rests on four basic properties of the complex dot product. These
properties can then be used to obtain the more general notion of an inner
product.
26.3
The Complex Vector Dot Product
Let u = (a, b) and v = (c, d) be two vectors in two-dimensional space. Let u make the angle α > 0 with the positive x-axis and v the angle β > 0. Let ||u|| = √(a^2 + b^2) denote the length of the vector u. Then a = ||u|| cos α, b = ||u|| sin α, c = ||v|| cos β and d = ||v|| sin β. So u · v = ac + bd = ||u|| ||v||(cos α cos β + sin α sin β) = ||u|| ||v|| cos(α − β). Therefore, we have
u · v = ||u|| ||v|| cos θ,
(26.1)
where θ = α − β is the angle between u and v. Cauchy’s inequality is
|u · v| ≤ ||u|| ||v||,
with equality if and only if u and v are parallel.
Cauchy’s inequality extends to vectors of any size with complex entries. For example, the complex M-dimensional vectors e_ω and e_θ defined earlier both have length equal to √M and
|e_ω† e_θ| ≤ M,
with equality if and only if ω and θ differ by an integer multiple of π.
From Equation (26.1) we know that the dot product u · v is zero if and
only if the angle between these two vectors is a right angle; we say then
that u and v are mutually orthogonal. Orthogonality was at the core of our
first approach to signal analysis: the vectors ej and ek are orthogonal if
k ≠ j. The notion of orthogonality is fundamental in signal processing, and
we shall return to it repeatedly in what follows. The idea of using the dot
product to measure how similar two vectors are is called matched filtering;
it is a popular method in signal detection and estimation of parameters.
Proof of Cauchy’s inequality: To prove Cauchy’s inequality for the
complex vector dot product, we write u · v = |u · v|eiθ . Let t be a real
variable and consider
0 ≤ ||e^{−iθ}u − tv||^2 = (e^{−iθ}u − tv) · (e^{−iθ}u − tv)
= ||u||^2 − t[(e^{−iθ}u) · v + v · (e^{−iθ}u)] + t^2 ||v||^2;
since the two middle terms are complex conjugates of one another, this equals
||u||^2 − 2Re(t e^{−iθ}(u · v)) + t^2 ||v||^2 = ||u||^2 − 2t|u · v| + t^2 ||v||^2.
This is a nonnegative quadratic polynomial in the variable t, so it cannot have two distinct real roots. Therefore, the discriminant 4|u · v|2 −
4||v||2 ||u||2 must be non-positive; that is, |u · v|2 ≤ ||u||2 ||v||2 . This is
Cauchy’s inequality.
Exercise 26.1 Use Cauchy’s inequality to show that
||u + v|| ≤ ||u|| + ||v||;
this is called the triangle inequality.
A careful examination of the proof just presented shows that we did not
explicitly use the definition of the complex vector dot product, but only
some of its properties. This suggested to mathematicians the possibility of
abstracting these properties and using them to define a more general concept, an inner product, between objects more general than complex vectors,
such as infinite sequences, random variables, and matrices. Such an inner
product can then be used to define the norm of these objects and thereby a
distance between such objects. Once we have an inner product defined, we
also have available the notions of orthogonality and best approximation.
We shall address all of these topics in a later chapter.
26.4
Orthogonality
Consider the problem of writing the two-dimensional real vector (3, −2) as
a linear combination of the vectors (1, 1) and (1, −1); that is, we want to
find constants a and b so that (3, −2) = a(1, 1) + b(1, −1). One way to do
this, of course, is to compare the components: 3 = a + b and −2 = a − b;
we can then solve this simple system for the a and b. In higher dimensions
this way of doing it becomes harder, however. A second way is to make
use of the dot product and orthogonality.
The dot product of two vectors (x, y) and (w, z) in R2 is (x, y) · (w, z) =
xw+yz. If the dot product is zero then the vectors are said to be orthogonal;
the two vectors (1, 1) and (1, −1) are orthogonal. We take the dot product
of both sides of (3, −2) = a(1, 1) + b(1, −1) with (1, 1) to get
1 = (3, −2) · (1, 1) = a(1, 1) · (1, 1) + b(1, −1) · (1, 1) = a(1, 1) · (1, 1) + 0 = 2a,
so we see that a = 1/2. Similarly, taking the dot product of both sides with (1, −1) gives
5 = (3, −2) · (1, −1) = a(1, 1) · (1, −1) + b(1, −1) · (1, −1) = 2b,
so b = 5/2. Therefore, (3, −2) = (1/2)(1, 1) + (5/2)(1, −1). The beauty of this
approach is that it does not get much harder as we go to higher dimensions.
Since the cosine of the angle θ between vectors u and v is
cos θ = u · v/||u|| ||v||,
where ||u||^2 = u · u, the projection of vector v onto the line through the origin parallel to u is
Proj_u(v) = ((u · v)/(u · u)) u.
Therefore, the vector v can be written as
v = Proju (v) + (v − Proju (v)),
where the first term on the right is parallel to u and the second one is
orthogonal to u.
How do we find vectors that are mutually orthogonal? Suppose we
begin with (1, 1). Take a second vector, say (1, 2), that is not parallel to
(1, 1) and write it as we did v earlier, that is, as a sum of two vectors,
one parallel to (1, 1) and the second orthogonal to (1, 1). The projection
of (1, 2) onto the line parallel to (1, 1) passing through the origin is
((1, 1) · (1, 2))/((1, 1) · (1, 1)) (1, 1) = (3/2)(1, 1) = (3/2, 3/2),
so
(1, 2) = (3/2, 3/2) + ((1, 2) − (3/2, 3/2)) = (3/2, 3/2) + (−1/2, 1/2).
The vectors (−1/2, 1/2) = −(1/2)(1, −1) and, therefore, (1, −1) are then orthogonal to (1, 1). This approach is the basis for the Gram-Schmidt method for constructing a set of mutually orthogonal vectors.
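A short Python version of this procedure (not part of the text):

import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w = w - (np.dot(u, w) / np.dot(u, u)) * u   # subtract the projection onto u
        if np.linalg.norm(w) > 1e-12:                   # skip vectors already in the span
            basis.append(w)
    return basis

print(gram_schmidt([[1.0, 1.0], [1.0, 2.0]]))
# [array([1., 1.]), array([-0.5,  0.5])] -- the worked example above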
Exercise 26.2 Use the Gram-Schmidt approach to find a third vector in
R3 orthogonal to both (1, 1, 1) and (1, 0, −1).
Orthogonality is a convenient tool that can be exploited whenever we
have an inner product defined.
26.5
Generalizing the Dot Product: Inner Products
The proof of Cauchy’s inequality rests not on the actual definition of the
complex vector dot product, but rather on four of its most basic properties.
We use these properties to extend the concept of the complex vector dot
product to that of inner product. Later in this chapter we shall give several
examples of inner products, applied to a variety of mathematical objects,
including infinite sequences, functions, random variables, and matrices.
For now, let us denote our mathematical objects by u and v and the inner
product between them as ⟨u, v⟩. The objects will then be said to be
members of an inner-product space. We are interested in inner products
because they provide a notion of orthogonality, which is fundamental to
best approximation and optimal estimation.
Defining an inner product: The four basic properties that will serve to
define an inner product are:
1. ⟨u, u⟩ ≥ 0, with equality if and only if u = 0;
2. ⟨v, u⟩ is the complex conjugate of ⟨u, v⟩;
3. ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩;
4. ⟨cu, v⟩ = c⟨u, v⟩ for any complex number c.
The inner product is the basic ingredient in Hilbert space theory. Using
the inner product, we define the norm of u to be
||u|| = √⟨u, u⟩
and the distance between u and v to be ||u − v||.
The Cauchy-Schwarz inequality: Because these four properties were
all we needed to prove the Cauchy inequality for the complex vector dot
product, we obtain the same inequality whenever we have an inner product.
This more general inequality is the Cauchy-Schwarz inequality:
|⟨u, v⟩| ≤ √⟨u, u⟩ √⟨v, v⟩,
or
|⟨u, v⟩| ≤ ||u|| ||v||,
with equality if and only if there is a scalar c such that v = cu. We say that the vectors u and v are orthogonal if ⟨u, v⟩ = 0. We turn now to some examples.
Inner product of infinite sequences: Let u = {un } and v = {vn } be
infinite sequences of complex numbers. The inner product is then
⟨u, v⟩ = Σ_n un v̄n,
and
||u|| = √(Σ_n |un|^2).
The sums are assumed to be finite; the index of summation n is singly or doubly infinite, depending on the context. The Cauchy-Schwarz inequality says that
|Σ_n un v̄n| ≤ √(Σ_n |un|^2) √(Σ_n |vn|^2).
Inner product of functions: Now suppose that u = f (x) and v = g(x).
Then,
⟨u, v⟩ = ∫ f(x) ḡ(x) dx
and
||u|| = √(∫ |f(x)|^2 dx).
The integrals are assumed to be finite; the limits of integration depend on the support of the functions involved. The Cauchy-Schwarz inequality now says that
|∫ f(x) ḡ(x) dx| ≤ √(∫ |f(x)|^2 dx) √(∫ |g(x)|^2 dx).
Inner product of random variables: Now suppose that u = X and
v = Y are random variables. Then,
⟨u, v⟩ = E(XȲ)
and
||u|| = √(E(|X|^2)),
which is the standard deviation of X if the mean of X is zero. The expected values are assumed to be finite. The Cauchy-Schwarz inequality now says that
|E(XȲ)| ≤ √(E(|X|^2)) √(E(|Y|^2)).
If E(X) = 0 and E(Y) = 0, the random variables X and Y are orthogonal if and only if they are uncorrelated.
Inner product of complex matrices: Now suppose that u = A and
v = B are complex matrices. Then,
⟨u, v⟩ = trace(B†A)
and
||u|| = √(trace(A†A)),
where the trace of a square matrix is the sum of the entries on the main diagonal. As we shall see later, this inner product is simply the complex vector dot product of the vectorized versions of the matrices involved. The Cauchy-Schwarz inequality now says that
|trace(B†A)| ≤ √(trace(A†A)) √(trace(B†B)).
Weighted inner product of complex vectors: Let u and v be complex
vectors and let Q be a Hermitian positive-definite matrix; that is, Q† = Q
and u† Qu > 0 for all nonzero vectors u. The inner product is then
⟨u, v⟩ = v†Qu
and
||u|| = √(u†Qu).
We know from the eigenvector decomposition of Q that Q = C†C for some matrix C. Therefore, the inner product is simply the complex vector dot product of the vectors Cu and Cv. The Cauchy-Schwarz inequality says that
|v†Qu| ≤ √(u†Qu) √(v†Qv).
Weighted inner product of functions: Now suppose that u = f (x)
and v = g(x) and w(x) > 0. Then define
⟨u, v⟩ = ∫ f(x) ḡ(x) w(x) dx
and
||u|| = √(∫ |f(x)|^2 w(x) dx).
The integrals are assumed to be finite; the limits of integration depend on the support of the functions involved. This inner product is simply the inner product of the functions f(x)√(w(x)) and g(x)√(w(x)). The Cauchy-Schwarz inequality now says that
|∫ f(x) ḡ(x) w(x) dx| ≤ √(∫ |f(x)|^2 w(x) dx) √(∫ |g(x)|^2 w(x) dx).
Once we have an inner product defined, we can speak about orthogonality
and best approximation. Important in that regard is the orthogonality
principle.
26.6
The Orthogonality Principle
Imagine that you are standing and looking down at the floor. The point
B on the floor that is closest to N , the tip of your nose, is the unique
point on the floor such that the vector from B to any other point A on the
floor is perpendicular to the vector from N to B; that is, hBN, BAi = 0.
This is a simple illustration of the orthogonality principle. Whenever we
have an inner product defined we can speak of orthogonality and apply the
orthogonality principle to find best approximations.
The orthogonality principle: Let u and v1, ..., vN be members of an inner-product space. For all choices of scalars a1, ..., aN, we can compute the distance from u to the member a1v1 + ... + aNvN. Then, we minimize this distance over all choices of the scalars; let b1, ..., bN be this best choice. The orthogonality principle tells us that the member u − (b1v1 + ... + bNvN) is orthogonal to the member (a1v1 + ... + aNvN) − (b1v1 + ... + bNvN), that is,
⟨u − (b1v1 + ... + bNvN), (a1v1 + ... + aNvN) − (b1v1 + ... + bNvN)⟩ = 0,
for every choice of scalars an. We can then use the orthogonality principle to find the best choice b1, ..., bN.
For each fixed index value j in the set {1, ..., N}, let an = bn if j is not equal to n and aj = bj + 1. Then we have
0 = ⟨u − (b1v1 + ... + bNvN), vj⟩,
or
⟨u, vj⟩ = Σ_{n=1}^N bn ⟨vn, vj⟩,
for each j. The vn are known, so we can calculate the inner products ⟨vn, vj⟩ and solve this system of equations for the best bn.
We shall encounter a number of particular cases of the orthogonality
principle in subsequent chapters. The example of the least-squares solution
of a system of linear equations provides a good example of the use of this
principle.
The least-squares solution: Let V a = u be a system of M linear equations in N unknowns. For n = 1, ..., N let vn be the nth column of the
matrix V . For any choice of the vector a with entries an , n = 1, ..., N , the
vector V a is
V a = Σ_{n=1}^N an vn.
Solving V a = u amounts to representing the vector u as a linear combination of the columns of V.
If there is no solution of V a = u then we can look for the best choice of coefficients so as to minimize the distance ||u − (a1v1 + ... + aNvN)||. The matrix with entries ⟨vn, vj⟩ is V†V, and the vector with entries ⟨u, vj⟩ is V†u. According to the orthogonality principle, we must solve the system of equations V†u = V†V a, which leads to the least-squares solution.
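A small Python sketch of this calculation (not part of the text; V and u are made up):

import numpy as np

rng = np.random.default_rng(6)
M, N = 10, 3
V = rng.normal(size=(M, N))
u = rng.normal(size=M)                      # in general not in the column space of V

a = np.linalg.solve(V.T @ V, V.T @ u)       # the normal equations V† u = V† V a
residual = u - V @ a
print(np.abs(V.T @ residual).max())         # essentially zero: the residual is orthogonal to each column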
Exercise 26.3 Find polynomial functions f (x), g(x) and h(x) that are
orthogonal on the interval [0, 1] and have the property that every polynomial
of degree two or less can be written as a linear combination of these three
functions.
Exercise 26.4 Show that the functions e^{inx}, n an integer, are orthogonal on the interval [−π, π]. Let f(x) have the Fourier expansion
f(x) = Σ_{n=−∞}^{∞} an e^{inx}, |x| ≤ π.
Use orthogonality to find the coefficients an.
We have seen that orthogonality can be used to determine the coefficients in the Fourier series representation of a function. There are other
useful representations in which orthogonality also plays a role; wavelets is
one example. Let f (x) be defined on the closed interval [0, X]. Suppose
that we change the function f (x) to a new function g(x) by altering the
values for x within a small interval, keeping the remaining values the same:
then all of the Fourier coefficients change. Looked at another way, a localized disturbance in the function f (x) affects all of its Fourier coefficients.
It would be helpful to be able to represent f (x) as a sum of orthogonal
functions in such a way that localized changes in f (x) affect only a small
number of the components in the sum. One way to do this is with wavelets,
as we shall see shortly.
Chapter 27
Appendix: Reverberation
and Echo Cancellation
27.1
Chapter Summary
A nice application of Dirac delta function models is the problem of reverberation and echo cancellation, as discussed in [168]. The received signal
is viewed as a filtered version of the original and we want to remove the
effects of the filter, thereby removing the echo. This leads to the problem of
finding the inverse filter. A version of the echo cancellation problem arises
in telecommunications, as discussed in [208] and [207].
27.2
The Echo Model
Suppose that x(t) is the original transmitted signal and the received signal
is
y(t) = x(t) + αx(t − d),   (27.1)
where d > 0 is the delay present in the echo term. We assume that the echo term is weaker than the original signal, so we make 0 < α < 1. With the filter function h(t) defined by
h(t) = δ(t) + αδ(t − d) = δ(t) + αδd(t),   (27.2)
where δd(t) = δ(t − d), we can write y(t) as the convolution of x(t) and h(t); that is,
y(t) = x(t) ∗ h(t).   (27.3)
A more general model is used to describe reverberation:
h(t) = Σ_{k=0}^{K} αk δ(t − dk),   (27.4)
with α0 = 1, d0 = 0, and dk > 0 and 0 < αk < 1 for k = 1, 2, ..., K.
Our goal is to find a second filter, denoted hi (t), the inverse of h(t) in
Equation (27.2), such that
h(t) ∗ hi(t) = δ(t),   (27.5)
and therefore
x(t) = y(t) ∗ hi(t).   (27.6)
For now, we use trial and error to find hi (t); later we shall use the Fourier
transform.
27.3
Finding the Inverse Filter
As a first guess, let us try
g1(t) = δ(t) − αδd(t).   (27.7)
Convolving g1(t) with h(t), we get
h(t) ∗ g1(t) = δ(t) ∗ δ(t) − α^2 δd(t) ∗ δd(t).   (27.8)
We need to find out what δd(t) ∗ δd(t) is.
Exercise 27.1 Use the sifting property of the Dirac delta and the definition of convolution to show that
δd(t) ∗ δd(t) = δ2d(t).
The Fourier transform of δd(t) is the function exp(idω), so that the Fourier transform of the convolution of δd(t) with itself is the square of exp(idω), or exp(i(2d)ω). This tells us again that the convolution of δd(t) with itself is δ2d(t). Therefore,
h(t) ∗ g1(t) = δ(t) − α^2 δ2d(t).   (27.9)
We do not quite have what we want, but since 0 < α < 1, the α^2 is much smaller than α.
Suppose that we continue down this path, and take for our next guess the filter function g2(t) given by
g2(t) = δ(t) − αδd(t) + α^2 δ2d(t).   (27.10)
We then find that
h(t) ∗ g2(t) = δ(t) + α^3 δ3d(t);   (27.11)
the coefficient is α^3 now, which is even smaller, and the delay in the echo term has moved to 3d. We could continue along this path, but a final
solution is beginning to suggest itself.
Suppose that we define
gN(t) = Σ_{n=0}^{N} (−1)^n α^n δnd(t).   (27.12)
It would then follow that
h(t) ∗ gN(t) = δ(t) − (−1)^{N+1} α^{N+1} δ_{(N+1)d}(t).   (27.13)
The coefficient α^{N+1} goes to zero and the delay goes to infinity, as N → ∞. This suggests that the inverse filter should be the infinite sum
hi(t) = Σ_{n=0}^{∞} (−1)^n α^n δnd(t).   (27.14)
Then Equation (27.6) becomes
x(t) = y(t) − αy(t − d) + α^2 y(t − 2d) − α^3 y(t − 3d) + ... .   (27.15)
Obviously, to remove the echo completely in this manner we need infinite
memory.
Exercise 27.2 Assume that x(t) = 0 for t < 0. Show that the problem of
removing the echo is simpler now.
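As a small illustration (not in the original text): if the signals are sampled so that the delay d corresponds to an integer number of samples, and x(t) = 0 for t < 0, then Equation (27.15) collapses to the causal recursion x[n] = y[n] − α x[n − d], which needs only finite memory. A minimal Python sketch, with the test signal, α and d invented for illustration:

    import numpy as np

    alpha, d = 0.5, 3                    # echo strength and delay in samples (assumed values)
    x = np.zeros(20)
    x[0], x[5] = 1.0, -2.0               # a hypothetical original signal, zero before time 0
    y = x.copy()
    y[d:] += alpha * x[:-d]              # received signal y[n] = x[n] + alpha * x[n - d]

    x_hat = np.zeros_like(y)             # echo removal by the causal recursion
    for n in range(len(y)):
        x_hat[n] = y[n] - (alpha * x_hat[n - d] if n >= d else 0.0)

    print(np.allclose(x_hat, x))         # True: the echo is removed exactly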
27.4
Using the Fourier Transform
The Fourier transform of the filter function h(t) in Equation (27.2) is
H(ω) = 1 + α exp(idω).   (27.16)
If we are to have
h(t) ∗ hi(t) = δ(t),   (27.17)
we must have
H(ω)Hi(ω) = 1,   (27.18)
where Hi (ω) is the Fourier transform of the inverse filter function hi (t) that
we seek. It follows that
Hi(ω) = (1 + α exp(idω))^{−1}.   (27.19)
Recalling the formula for the sum of a geometric progression,
1 − r + r^2 − r^3 + ... = 1/(1 + r),   (27.20)
for |r| < 1, we find that we can write
Hi(ω) = 1 − α exp(idω) + α^2 exp(i(2d)ω) − α^3 exp(i(3d)ω) + ...,   (27.21)
which tells us that hi (t) is precisely as given in Equation (27.14).
27.5
The Teleconferencing Problem
In teleconferencing, each separate room is equipped with microphones for
transmitting to the other rooms and loudspeakers for broadcasting what the
people in the other rooms are saying. For simplicity, consider two rooms,
the transmitting room (TR), in which people are currently speaking, and
the receiving room (RR), where the people are currently listening to the
broadcast from the TR. The RR also has microphones and the problem
arises when the signal broadcast into the RR from the TR reaches the
microphones in the RR and is broadcast back into the TR. If it reaches
the microphones in the TR, it will be re-broadcast to the RR, creating an
echo, or worse.
The signal that reaches a microphone in the RR will depend on the
signals broadcast into the RR from the TR, as well as on the acoustics of
the RR and on the placement of the microphone in the RR; that is, it will
be a filtered version of what is broadcast into the RR. The hope is to be
able to estimate the filter, generate an approximation of what is about to be
re-broadcast, and subtract the estimate prior to re-broadcasting, thereby
reducing to near zero what is re-broadcast back to the TR.
In practice, all signals are viewed as discrete time series, and all filters
are taken to be finite impulse response (FIR) filters. Because the acoustics
of the RR are not known a priori, the filter that the RR imposes must
be estimated. This is done adaptively, by comparing vectors of samples
of the original transmissions with the filtered version that is about to be
re-broadcast, as described in [208].
Chapter 28
Appendix: Using Prior
Knowledge to Estimate
the Fourier Transform
28.1
Chapter Summary
A basic problem in signal processing is the estimation of the function F (ω)
from finitely many values of its inverse Fourier transform f (x). The DFT
is one such estimator. As we shall see in this chapter, there are other
estimators that are able to make better use of prior information about
F (ω) and thereby provide a better estimate.
28.2
Over-sampling
In our discussions above, we assumed that F(ω) = 0 for |ω| > Ω and that ∆ = π/Ω. In Figure 28.1 below, we show the DFT estimate for F(ω) for a case in which Ω = π/30. This would tell us that the proper sampling spacing
is ∆ = 30. However, it is not uncommon to have situations in which x is
time and we can take as many samples of f (x) as we wish, but must take
the samples at points x within some limited time interval, say [0, A]. In the
case considered in the figure, A = 130. If we had used ∆ = 30, we would
have obtained only four data points, which is not sufficient information.
Instead, we used ∆ = 1 and took N = 129 data points; we over-sampled.
There is a price to be paid for over-sampling, however.
The DFT estimation procedure does not “know” about the true value
of Ω; it only “sees” ∆. It “assumes” incorrectly that Ω must be π, since
∆ = 1. Consequently, it “thinks” that we want it to estimate F (ω) on
the interval [−π, π]. It doesn’t “know” that we know that F (ω) is zero on
most of this interval. Therefore, the DFT spends a lot of its energy trying
to describe the part of the graph of F (ω) where it is zero, and relatively
little of its energy describing what is happening within the interval [−Ω, Ω],
which is all that we are interested in. This is why the bottom graph in the
figure shows the DFT to be poor within [−Ω, Ω]. There is a second graph
in the figure. It looks quite a bit better. How was that graph obtained?
Figure 28.1: The non-iterative band-limited extrapolation method
(MDFT) (top) and the DFT (bottom) for N = 129, ∆ = 1 and Ω = π/30.
We know that F (ω) = 0 outside the interval [−Ω, Ω]. Can we somehow
let the estimation process know that we know this, so that it doesn’t waste
its energy outside this interval? Yes, we can.
The characteristic function of the interval [−Ω, Ω] is
χΩ(ω) = 1, if |ω| ≤ Ω, and χΩ(ω) = 0, if |ω| > Ω.
We take as our estimator of F(ω) a function called the modified DFT (MDFT), having the form
MDFT(ω) = χΩ(ω) Σ_{m=0}^{N−1} am e^{im∆ω}.   (28.1)
We determine the coefficients am by making M DF T (ω) consistent with the
data. Inserting M DF T (ω) into the integral in Equation (8.2) and setting
x = n∆, for each n = 0, 1, ..., N − 1, in turn, we find that we must have
f(n∆) = (1/2π) Σ_{m=0}^{N−1} am ∫_{−Ω}^{Ω} e^{i(m−n)∆ω} dω.
Performing the integration, we find that we need
f(n∆) = Σ_{m=0}^{N−1} am sin(Ω(n − m)∆)/(π(n − m)∆),   (28.2)
for n = 0, 1, ..., N − 1. We solve for the am and insert these coefficients into
the formula for the MDFT. The graph of the MDFT is the top graph in
the figure.
The main idea in the MDFT is to use a form of the estimator that already includes whatever important features of F (ω) we may know a priori.
In the case of the MDFT, we knew that F (ω) = 0 outside the interval
[−Ω, Ω], so we introduced a factor of χΩ (ω) in the estimator. Now, whatever coefficients we use, any estimator of the form given in Equation (28.1)
will automatically be zero outside [−Ω, Ω]. We are then free to select the
coefficients so as to make the MDFT consistent with the data. This involves
solving the system of linear equations in (28.2).
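To make the construction concrete, here is a small numerical sketch (not from the text) that builds the sinc matrix of Equation (28.2), solves for the coefficients a_m, and evaluates the MDFT on a grid. The test function F(ω) and the values of N, ∆ and Ω are invented for illustration; a least-squares solve is used because the sinc system can be badly conditioned when the data are heavily over-sampled.

    import numpy as np

    N, Delta, Omega = 32, 1.0, np.pi / 4
    F = lambda w: np.where(np.abs(w) <= Omega, 1.0 + np.cos(8 * w), 0.0)   # hypothetical F(omega)

    # simulated data f(n*Delta) = (1/2pi) * integral_{-Omega}^{Omega} F(w) exp(-i n Delta w) dw
    w = np.linspace(-Omega, Omega, 4001)
    dw = w[1] - w[0]
    f = np.array([np.sum(F(w) * np.exp(-1j * n * Delta * w)) * dw / (2 * np.pi) for n in range(N)])

    # sinc matrix: entries sin(Omega*(n-m)*Delta)/(pi*(n-m)*Delta), with Omega/pi on the diagonal
    n_idx = np.arange(N)
    diff = (n_idx[:, None] - n_idx[None, :]) * Delta
    S = np.where(diff == 0, Omega / np.pi, np.sin(Omega * diff) / (np.pi * diff + (diff == 0)))

    a = np.linalg.lstsq(S.astype(complex), f, rcond=None)[0]    # coefficients a_m of the MDFT
    print(np.max(np.abs(S @ a - f)))                            # data-consistency check

    grid = np.linspace(-Omega, Omega, 512)                      # MDFT(w) on [-Omega, Omega]
    mdft = np.exp(1j * np.outer(grid, n_idx * Delta)) @ a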
28.3
Using Other Prior Information
The approach that led to the MDFT estimate suggests that we can introduce other prior information besides the support of F (ω). For example,
if we have some idea of the overall shape of the function F (ω), we could
choose P (ω) > 0 to indicate this shape and use it instead of χΩ (ω) in our
estimator. This leads to the PDFT estimator, which has the form
PDFT(ω) = P(ω) Σ_{m=0}^{N−1} bm e^{im∆ω}.   (28.3)
Now we find the bm by forcing the right side of Equation (28.3) to be consistent with the data. Inserting the function PDFT(ω) into the integral in Equation (8.2), we find that we must have
f(n∆) = (1/2π) Σ_{m=0}^{N−1} bm ∫_{−∞}^{∞} P(ω) e^{i(m−n)∆ω} dω.   (28.4)
Using p(x), the inverse Fourier transform of P(ω), given by
p(x) = (1/2π) ∫_{−∞}^{∞} P(ω) e^{−ixω} dω,
we find that we must have
f(n∆) = Σ_{m=0}^{N−1} bm p((n − m)∆),   (28.5)
for n = 0, 1, ..., N − 1. We solve this system of equations for the bm and
insert them into the PDFT estimator in Equation (28.3).
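A minimal sketch of the PDFT computation (not from the text), assuming a Gaussian prior P(ω) = exp(−ω²/(2σ²)), whose inverse Fourier transform under the convention above is p(x) = (σ/√(2π)) exp(−σ²x²/2). The parameters and the placeholder data are invented purely to show the mechanics of Equation (28.5).

    import numpy as np

    N, Delta, sigma = 32, 1.0, 2.0
    p = lambda x: sigma / np.sqrt(2 * np.pi) * np.exp(-0.5 * (sigma * x) ** 2)

    f_data = np.random.default_rng(0).normal(size=N)       # placeholder for the measured f(n*Delta)

    n_idx = np.arange(N)
    P_mat = p((n_idx[:, None] - n_idx[None, :]) * Delta)    # entries p((n - m)*Delta)
    b = np.linalg.solve(P_mat, f_data)                      # coefficients b_m from Equation (28.5)

    # PDFT(omega) = P(omega) * sum_m b_m exp(i m Delta omega), evaluated on a grid
    omega = np.linspace(-np.pi, np.pi, 512)
    pdft = np.exp(-omega**2 / (2 * sigma**2)) * (np.exp(1j * np.outer(omega, n_idx * Delta)) @ b)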
In Figure 28.2 we have the function F (ω) in the upper left corner. It
consists of one large bump in the center and one smaller bump toward the
right side. The DFT on the upper right side gives only slight indication
that the smaller bump exists. The data here is somewhat over-sampled, so
we can try the MDFT. The prior for the MDFT is P (ω) = χΩ (ω), which
is pictured in the center left frame; it is shown only over [−Ω, Ω], where
it is just one. The MDFT estimate is in the center right frame; it shows
only slight improvement over the DFT. Now, suppose we know that there
is a large bump in the center. Both the DFT and the MDFT tell us clearly
that this is the case, so even if we did not know it at the start, we know it
now. Let’s select as our prior a function P (ω) that includes the big bump
in the center, as shown in the lower left. The PDFT on the lower right now
shows the smaller bump more clearly.
A more dramatic illustration of the use of the PDFT is shown in Figure
28.3. The function F (ω) is a function of two variables simulating a slice of a
head. It has been approximated by a discrete image, called here the “original” . The data was obtained by taking the two-dimensional vector DFT
of the discrete image and replacing most of its values with zeros. When
we formed the inverse vector DFT, we obtained the estimate in the lower
right. This is essentially the DFT estimate, and it tells us nothing about
the inside of the head. From prior information, or even from the DFT
estimate itself, we know that the true F (ω) includes a skull. We therefore
select as our prior the (discretized) function of two variables shown in the
upper left. The PDFT estimate is the image in the lower left. The important point to remember here is that the same data was used to generate
both pictures.
We saw previously how the MDFT can improve the estimate of F (ω),
by incorporating the prior information about its support. Precisely why
the improvement occurs is the subject of the next section.
28.4
Analysis of the MDFT
Let our data be f (xm ), m = 1, ..., M , where the xm are arbitrary values of
the variable x. If F (ω) is zero outside [−Ω, Ω], then minimizing the energy
over [−Ω, Ω] subject to data consistency produces an estimate of the form
FΩ(ω) = χΩ(ω) Σ_{m=1}^{M} b_m exp(ix_m ω),
with the b_m satisfying the equations
f(x_n) = Σ_{m=1}^{M} b_m sin(Ω(x_m − x_n))/(π(x_m − x_n)),
for n = 1, ..., M. The matrix SΩ with entries sin(Ω(x_m − x_n))/(π(x_m − x_n)) we call a sinc matrix.
28.4.1  Eigenvector Analysis of the MDFT
Although it seems reasonable that incorporating the additional information
about the support of F (ω) should improve the estimation, it would be more
convincing if we had a more mathematical argument to make. For that we
turn to an analysis of the eigenvectors of the sinc matrix. Throughout this
subsection we make the simplification that xn = n.
Exercise 28.1 The purpose of this exercise is to show that, for an Hermitian nonnegative-definite M by M matrix Q, a norm-one eigenvector u1
of Q associated with its largest eigenvalue, λ1 , maximizes the quadratic
form a† Qa over all vectors a with norm one. Let Q = U LU † be the
eigenvector decomposition of Q, where the columns of U are mutually orthogonal eigenvectors un with norms equal to one, so that U † U = I, and
L = diag{λ1 , ..., λM } is the diagonal matrix with the eigenvalues of Q as
its entries along the main diagonal. Assume that λ1 ≥ λ2 ≥ ... ≥ λM .
Then maximize
a†Qa = Σ_{n=1}^{M} λn |a†u^n|^2,
subject to the constraint
a†a = a†U†U a = Σ_{n=1}^{M} |a†u^n|^2 = 1.
Hint: Show a† Qa is a convex combination of the eigenvalues of Q.
Exercise 28.2 Show that, for the sinc matrix Q = SΩ , the quadratic form
a† Qa in the previous exercise becomes
a†SΩa = (1/2π) ∫_{−Ω}^{Ω} |Σ_{n=1}^{M} an e^{inω}|^2 dω.
Show that the square of the norm of the vector a is the integral
(1/2π) ∫_{−π}^{π} |Σ_{n=1}^{M} an e^{inω}|^2 dω.
Exercise 28.3 For M = 30 compute the eigenvalues of the matrix SΩ for various choices of Ω, such as Ω = π/k, for k = 2, 3, ..., 10. For each k arrange
the set of eigenvalues in decreasing order and note the proportion of them
that are not near zero. The set of eigenvalues of a matrix is sometimes
called its eigenspectrum and the nonnegative function χΩ (ω) is a power
spectrum; here is one time in which different notions of a spectrum are
related.
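A short Python sketch of the computation in Exercise 28.3 (not from the text), using the simplification x_n = n, so that SΩ has entries sin(Ω(m − n))/(π(m − n)) and diagonal entries Ω/π. One expects roughly a fraction Ω/π = 1/k of the eigenvalues to be away from zero.

    import numpy as np

    M = 30
    for k in range(2, 11):
        Omega = np.pi / k
        idx = np.arange(M)
        diff = idx[:, None] - idx[None, :]
        S = np.where(diff == 0, Omega / np.pi,
                     np.sin(Omega * diff) / (np.pi * diff + (diff == 0)))
        eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]       # S is real symmetric
        frac = np.mean(eigvals > 0.5)                        # proportion not near zero
        print(f"k={k:2d}: fraction above 0.5 = {frac:.2f}, compare 1/k = {1/k:.2f}")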
28.4.2
The Eigenfunctions of SΩ
Suppose that the vector u1 = (u11 , ..., u1M )T is an eigenvector of SΩ corresponding to the largest eigenvalue, λ1 . Associate with u1 the eigenfunction
U^1(ω) = Σ_{n=1}^{M} u^1_n e^{inω}.
Then
λ1 = ∫_{−Ω}^{Ω} |U^1(ω)|^2 dω / ∫_{−π}^{π} |U^1(ω)|^2 dω,
and U^1(ω) is the function of its form that is most concentrated within the
interval [−Ω, Ω].
Similarly, if uM is an eigenvector of SΩ associated with the smallest
eigenvalue λM , then the corresponding eigenfunction U M (ω) is the function
of its form least concentrated in the interval [−Ω, Ω].
Exercise 28.4 Plot for |ω| ≤ π the functions |U m (ω)| corresponding to
each of the eigenvectors of the sinc matrix SΩ . Pay particular attention to
the places where each of these functions is zero.
The eigenvectors of SΩ corresponding to different eigenvalues are orthogonal, that is (um )† un = 0 if m is not n. We can write this in terms of
integrals:
∫_{−π}^{π} U^n(ω) \overline{U^m(ω)} dω = 0
if m is not n. The mutual orthogonality of these eigenfunctions is related
to the locations of their roots, which were studied in the previous exercise.
Any Hermitian matrix Q is invertible if and only if none of its eigenvalues is zero. With λm and um , m = 1, ..., M , the eigenvalues and eigenvectors of Q, the inverse of Q can then be written as
Q−1 = (1/λ1 )u1 (u1 )† + ... + (1/λM )uM (uM )† .
Exercise 28.5 Show that the MDFT estimator given by Equation (28.1)
FΩ (ω) can be written as
FΩ(ω) = χΩ(ω) Σ_{m=1}^{M} (1/λm) ((u^m)†d) U^m(ω),
where d = (f (1), f (2), ..., f (M ))T is the data vector.
Exercise 28.6 Show that the DFT estimate of F (ω), restricted to the interval [−Ω, Ω], is
F_DFT(ω) = χΩ(ω) Σ_{m=1}^{M} ((u^m)†d) U^m(ω).
From these two exercises we can learn why it is that the estimate FΩ (ω)
resolves better than the DFT. The former makes more use of the eigenfunctions U m (ω) for higher values of m, since these are the ones for which
λm is closer to zero. Since those eigenfunctions are the ones having most of
their roots within the interval [−Ω, Ω], they have the most flexibility within
that region and are better able to describe those features in F (ω) that are
not resolved by the DFT.
Figure 28.2: The DFT, the MDFT, and the PDFT.
Figure 28.3: The PDFT in image reconstruction.
Chapter 29
Appendix: The Vector
Wiener Filter
29.1
Chapter Summary
The vector Wiener filter (VWF) provides another method for estimating
the vector x given noisy measurements z, where
z = Hx + v,
with x and v independent random vectors and H a known matrix. We
shall assume throughout this chapter that E(v) = 0 and let Q = E(vv† ).
29.2
The Vector Wiener Filter in Estimation
It is common to formulate the VWF in the context of filtering a signal
vector s from signal plus noise. The data is the vector
z = s + v,
and we want to estimate s. Each entry of our estimate of the vector s
will be a linear combination of the data values; that is, our estimate is
ŝ = B † z for some matrix B to be determined. This B will be called the
vector Wiener filter. To extract the signal from the noise, we must know
something about possible signals and possible noises. We consider several
stages of increasing complexity and correspondence with reality.
29.3  The Simplest Case
Suppose, initially, that all signals must have the form s = au, where a is
an unknown scalar and u is a known vector. Suppose that all noises must
have the form v = bw, where b is an unknown scalar and w is a known
vector. Then, to estimate s, we must find a. So long as J ≥ 2, we should
be able to solve for a and b. We form the two equations
u† z = au† u + bu† w
and
w† z = aw† u + bw† w.
This system of two equations in two unknowns will have a unique solution unless u and w are proportional, in which case we cannot expect to
distinguish signal from noise.
29.4
A More General Case
We move now to a somewhat more complicated model. Suppose that all
signals must have the form
s=
N
X
an un ,
n=1
where the an are unknown scalars and the un are known vectors. Suppose
that all noises must have the form
v=
M
X
bm w m ,
m=1
where the bm are unknown scalars and wm are known vectors. Then, to
estimate s, we must find the an . So long as J ≥ N + M , we should be able
to solve for the unique an and bm . However, we usually do not know a great
deal about the signal and the noise, so we find ourselves in the situation
in which the N and M are large. Let U be the J by N matrix whose nth
column is un and W the J by M matrix whose mth column is wm . Let V
be the J by N + M matrix whose first N columns contain U and whose
last M columns contain W ; so, V = [U W ]. Let c be the N + M by 1
column vector whose first N entries are the an and whose last M entries
are the bm . We want to solve z = V c. But this system of linear equations
has too many unknowns when N + M > J, so we seek the minimum norm
solution. In closed form this solution is
ĉ = V † (V V † )−1 z.
The matrix V V † = (U U † + W W † ) involves the signal correlation matrix
U U † and the noise correlation matrix W W † . Consider U U † . The matrix
U U † is J by J and the (i, j) entry of U U † is given by
(U U†)_{ij} = Σ_{n=1}^{N} u^n_i u^n_j,
so the matrix (1/N) U U† has for its entries the average, over all n = 1, ..., N, of the product of the ith and jth entries of the vectors u^n. Therefore, (1/N) U U† is statistical information about the signal; it tells us how these products look, on average, over all members of the family {u^n}, the ensemble, to use the statistical word.
29.5
The Stochastic Case
To pass to a more formal statistical framework, we let the coefficient vectors a = (a1 , a2 , ..., aN )T and b = (b1 , b2 , ..., bM )T be independent random white-noise vectors, both with mean zero and covariance matrices
E(aa† ) = I and E(bb† ) = I. Then,
U U † = E(ss† ) = Rs
and
W W † = E(vv† ) = Q = Rv .
The estimate of s is the result of applying the vector Wiener filter to the
vector z and is given by
ŝ = U U † (U U † + W W † )−1 z.
Exercise 29.1 Apply the vector Wiener filter to the simplest problem discussed earlier in the chapter on the BLUE; let N = 1 and assume that c is
a random variable with mean zero and variance one. It will help to use the
matrix-inversion identity
(Q + uu†)^{−1} = Q^{−1} − (1 + u†Q^{−1}u)^{−1} Q^{−1}uu†Q^{−1}.   (29.1)
29.6  The VWF and the BLUE
To apply the VWF to the problem considered in the discussion of the
BLUE, let the vector s be Hx. We assume, in addition, that the vector x
is a white-noise vector; that is, E(xx† ) = σ 2 I. Then, Rs = σ 2 HH † .
In the VWF approach we estimate s using
ŝ = B † z,
where the matrix B is chosen so as to minimize the mean squared error,
E||ŝ − s||2 . This is equivalent to minimizing
trace E((B † z − s)(B † z − s)† ).
Expanding the matrix products and using the previous definitions, we see
that we must minimize
trace (B † (Rs + Rv )B − Rs B − B † Rs + Rs ).
Differentiating with respect to the matrix B using Equations (34.1) and
(34.3), we find
(Rs + Rv )B − Rs = 0,
so that
B = (Rs + Rv )−1 Rs .
Our estimate of the signal component is then
ŝ = Rs (Rs + Rv )−1 z.
With s = Hx, our estimate of s is
ŝ = σ 2 HH † (σ 2 HH † + Q)−1 z,
and the VWF estimate of x is
x̂ = σ 2 H † (σ 2 HH † + Q)−1 z.
How does this estimate relate to the one we got from the BLUE?
The BLUE estimate of x is
x̂ = (H † Q−1 H)−1 H † Q−1 z.
From the matrix identity in Equation (24.5), we know that
(H † Q−1 H + σ −2 I)−1 H † Q−1 = σ 2 H † (σ 2 HH † + Q)−1 .
Therefore, the VWF estimate of x is
x̂ = (H † Q−1 H + σ −2 I)−1 H † Q−1 z.
Note that the BLUE estimate is unbiased and unaffected by changes in
the signal strength or the noise strength. In contrast, the VWF is not
unbiased and does depend on the signal-to-noise ratio; that is, it depends
on the ratio σ 2 /trace (Q). The BLUE estimate is the limiting case of the
VWF estimate, as the signal-to-noise ratio goes to infinity.
The BLUE estimates s = Hx by first finding the BLUE estimate of x
and then multiplying it by H to get the estimate of the signal s.
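The following Python sketch (not from the text) applies the two estimates to simulated data; the matrix H, the noise covariance Q, the value of σ and the dimensions are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    J, N, sigma = 8, 3, 2.0
    H = rng.normal(size=(J, N))
    Q = 0.5 * np.eye(J)                                   # noise covariance E(v v^dagger)

    x = sigma * rng.normal(size=N)                        # E(x x^dagger) = sigma^2 I
    v = rng.multivariate_normal(np.zeros(J), Q)
    z = H @ x + v

    Rs = sigma**2 * H @ H.T                               # signal correlation matrix for s = Hx
    x_vwf = sigma**2 * H.T @ np.linalg.solve(Rs + Q, z)   # VWF estimate of x
    x_blue = np.linalg.solve(H.T @ np.linalg.inv(Q) @ H, H.T @ np.linalg.inv(Q) @ z)   # BLUE
    print(x, x_vwf, x_blue, sep="\n")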
Exercise 29.2 Show that the mean-squared error in the estimation of s is
E(||ŝ − s||2 ) = trace (H(H † Q−1 H)−1 H † ).
The VWF finds the linear estimate of s = Hx that minimizes the meansquared error E(||ŝ − s||2 ). Consequently, the mean squared error in the
VWF is less than that in the BLUE.
Exercise 29.3 Assume that E(xx† ) = σ 2 I. Show that the mean squared
error for the VWF estimate is
E(||ŝ − s||2 ) = trace (H(H † Q−1 H + σ −2 I)−1 H † ).
29.7
Wiener Filtering of Functions
The Wiener filter is often presented in the context of random functions of,
say, time. In this model the signal is s(t) and the noise is q(t), where these
functions of time are viewed as random functions (stochastic processes).
The data is taken to be z(t), a function of t, so that the matrices U U †
and W W † are now infinite matrices; the discrete index j = 1, ..., J is now
replaced by the continuous index variable t. Instead of the finite family
{u^n, n = 1, ..., N}, we now have an infinite family of functions u(t) in U. The
entries of U U † are essentially the average values of the products u(t1 )u(t2 )
over all the members of U. It is often assumed that this average of products
is a function not of t1 and t2 separately, but only of their difference t1 − t2 ;
this is called stationarity. So, aver{u(t1 )u(t2 )} = rs (t1 − t2 ) comes from a
function rs (τ ) of a single variable. The Fourier transform of rs (τ ) is Rs (ω),
the signal power spectrum. The matrix U U † is then an infinite Toeplitz
matrix, constant on each diagonal. The Wiener filtering can actually be
achieved by taking Fourier transforms and multiplying and dividing by
power spectra, instead of inverting infinite matrices. It is also common to
discretize the time variable and to consider the Wiener filter operating on
infinite sequences, as we see in the next chapter.
Chapter 30
Appendix: Wiener Filter
Approximation
30.1
Chapter Summary
As we saw in the chapter on the vector Wiener filter, when the data is
a finite vector composed of signal plus noise the vector Wiener filter can
be used to estimate the signal component, provided we know something
about the possible signals and possible noises. In theoretical discussion
of filtering signal from signal plus noise, it is traditional to assume that
both components are doubly infinite sequences of random variables. In
this case the Wiener filter is a convolution filter that operates on the input
signal plus noise sequence to produce the output estimate of the signal-only
sequence. The derivation of the Wiener filter is in terms of the autocorrelation sequences of the two components, as well as their respective power
spectra.
30.2
The Discrete Stationary Case
Suppose now that the discrete stationary random process to be filtered is
the doubly infinite sequence {z_n = s_n + q_n}_{n=−∞}^{∞}, where {s_n} is the signal
component with autocorrelation function rs (k) = E(sn+k sn ) and power
spectrum Rs (ω) defined for ω in the interval [−π, π], and {qn } is the noise
component with autocorrelation function rq (k) and power spectrum Rq (ω)
defined for ω in [−π, π]. We assume that for each n the random variables
sn and qn have mean zero and that the signal and noise are independent
of one another. Then the autocorrelation function for the signal-plus-noise
sequence {zn } is
rz(n) = rs(n) + rq(n)
for all n, and
Rz(ω) = Rs(ω) + Rq(ω)
is the signal-plus-noise power spectrum.
Let h = {h_k}_{k=−∞}^{∞} be a linear filter with transfer function
H(ω) = Σ_{k=−∞}^{∞} h_k e^{ikω},
for ω in [−π, π]. Given the sequence {z_n} as input to this filter, the output is the sequence
y_n = Σ_{k=−∞}^{∞} h_k z_{n−k}.   (30.1)
The goal of Wiener filtering is to select the filter h so that the output sequence yn approximates the signal sn sequence as well as possible. Specifically, we seek h so as to minimize the expected squared error, E(|yn −sn |2 ),
which, because of stationarity, is independent of n. We have
E(|y_n|^2) = Σ_{k=−∞}^{∞} h_k ( Σ_{j=−∞}^{∞} h_j (r_s(j − k) + r_q(j − k)) ) = Σ_{k=−∞}^{∞} h_k (r_z ∗ h)_k,
which, by the Parseval equation, equals
(1/2π) ∫ H(ω) R_z(ω) H(ω) dω = (1/2π) ∫ |H(ω)|^2 R_z(ω) dω.
Similarly,
E(s_n y_n) = Σ_{j=−∞}^{∞} h_j r_s(j),
which equals
(1/2π) ∫ R_s(ω) H(ω) dω,
and
E(|s_n|^2) = (1/2π) ∫ R_s(ω) dω.
Therefore,
E(|y_n − s_n|^2) = (1/2π) ∫ |H(ω)|^2 R_z(ω) dω − (1/2π) ∫ R_s(ω) H(ω) dω
− (1/2π) ∫ R_s(ω) H(ω) dω + (1/2π) ∫ R_s(ω) dω.
2
As we shall see shortly, minimizing E(|yn − sn |2 ) with respect to the function H(ω) leads to the equation
Rz (ω)H(ω) = Rs (ω),
so that the transfer function of the optimal filter is
H(ω) = Rs (ω)/Rz (ω).
The Wiener filter is then the sequence {hk } of the Fourier coefficients of
this function H(ω).
To prove that this choice of H(ω) minimizes E(|yn − sn |2 ), we note that
|H(ω)|^2 Rz(ω) − Rs(ω)H(ω) − Rs(ω)H(ω) + Rs(ω)
= Rz(ω)|H(ω) − Rs(ω)/Rz(ω)|^2 + Rs(ω) − Rs(ω)^2/Rz(ω).
Only the first term involves the function H(ω).
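As a numerical sketch (not from the text), the Wiener filter coefficients h_k can be obtained by sampling H(ω) = Rs(ω)/Rz(ω) on a frequency grid and taking an inverse FFT; the power spectra below are invented for illustration, and both are even in ω, so the resulting taps are real and symmetric.

    import numpy as np

    K = 256
    omega = 2 * np.pi * np.fft.fftfreq(K)        # frequency grid (radians), in FFT ordering
    Rs = 1.0 / (1.1 - np.cos(omega))             # hypothetical signal power spectrum
    Rq = np.full(K, 0.5)                         # hypothetical (white) noise power spectrum
    H = Rs / (Rs + Rq)                           # H(omega) = Rs(omega) / Rz(omega)

    h = np.real(np.fft.ifft(H))                  # approximate Fourier coefficients h_k
    h = np.fft.fftshift(h)                       # center the filter around k = 0
    print(h[K // 2 - 3: K // 2 + 4])             # a few taps near k = 0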
30.3
Approximating the Wiener Filter
Since H(ω) is a nonnegative function of ω, therefore real-valued, its Fourier
coefficients h_k will be conjugate symmetric; that is, h_{−k} = h̄_k. This poses
a problem when the random process zn is a discrete time series, with zn
denoting the measurement recorded at time n. From Equation (30.1) we
see that to produce the output yn corresponding to time n we need the
input for every time, past and future. To remedy this we can obtain the
best causal approximation of the Wiener filter h.
A filter g = {g_k}_{k=−∞}^{∞} is said to be causal if g_k = 0 for k < 0; this means that, given the input sequence {z_n}, the output
w_n = Σ_{k=−∞}^{∞} g_k z_{n−k} = Σ_{k=0}^{∞} g_k z_{n−k}
requires only values of zm up to m = n. To obtain the causal filter g
that best approximates the Wiener filter, we find the coefficients gk that
minimize the quantity E(|yn − wn |2 ), or, equivalently,
∫_{−π}^{π} |H(ω) − Σ_{k=0}^{∞} g_k e^{ikω}|^2 R_z(ω) dω.   (30.2)
The orthogonality principle tells us that the optimal coefficients must satisfy the equations
r_s(m) = Σ_{k=0}^{∞} g_k r_z(m − k),   (30.3)
for all m ≥ 0. These are the Wiener-Hopf equations [181].
Even having a causal filter does not completely solve the problem, since
we would have to record and store the infinite past. Instead, we can decide
to use a filter f = {f_k}_{k=−∞}^{∞} for which f_k = 0 unless −K ≤ k ≤ L for
some positive integers K and L. This means we must store L values and
wait until time n + K to obtain the output for time n. Such a linear filter
is a finite memory, finite delay filter, also called a finite impulse response
(FIR) filter. Given the input sequence {zn } the output of the FIR filter is
v_n = Σ_{k=−K}^{L} f_k z_{n−k}.
To obtain such an FIR filter f that best approximates the Wiener filter,
we find the coefficients fk that minimize the quantity E(|yn − vn |2 ), or,
equivalently,
∫_{−π}^{π} |H(ω) − Σ_{k=−K}^{L} f_k e^{ikω}|^2 R_z(ω) dω.   (30.4)
The orthogonality principle tells us that the optimal coefficients must satisfy the equations
r_s(m) = Σ_{k=−K}^{L} f_k r_z(m − k),   (30.5)
for −K ≤ m ≤ L.
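A small Python sketch of Equation (30.5) (not part of the text): given autocorrelation values r_s and r_z, here taken from simple assumed models, build the Toeplitz system and solve for the FIR coefficients f_k, k = −K, ..., L.

    import numpy as np

    K, L = 2, 4                                   # assumed filter extent
    rs = lambda m: 0.9 ** abs(m)                  # hypothetical signal autocorrelation
    rq = lambda m: 0.5 if m == 0 else 0.0         # hypothetical white-noise autocorrelation
    rz = lambda m: rs(m) + rq(m)

    ks = np.arange(-K, L + 1)                     # indices -K, ..., L (used for both m and k)
    A = np.array([[rz(m - k) for k in ks] for m in ks])   # Toeplitz matrix of rz(m - k)
    b = np.array([rs(m) for m in ks])
    f = np.linalg.solve(A, b)                     # FIR Wiener coefficients f_k
    print(dict(zip(ks.tolist(), np.round(f, 4))))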
In [52] it was pointed out that the linear equations that arise in Wiener-filter approximation also occur in image reconstruction from projections,
with the image to be reconstructed playing the role of the power spectrum
to be approximated. The methods of Wiener-filter approximation were
then used to derive linear and nonlinear image-reconstruction procedures.
30.4
Adaptive Wiener Filters
Once again, we consider a stationary random process zn = sn + vn with
autocorrelation function E(zn zn−m ) = rz (m) = rs (m) + rv (m). The finite
causal Wiener filter (FCWF) f = (f0 , f1 , ..., fL )T is convolved with {zn } to
produce an estimate of sn given by
ŝ_n = Σ_{k=0}^{L} f_k z_{n−k}.
With yn† = (zn , zn−1 , ..., zn−L ) we can write ŝn = yn† f . The FCWF f
minimizes the expected squared error
J(f ) = E(|sn − ŝn |2 )
and is obtained as the solution of the equations
r_s(m) = Σ_{k=0}^{L} f_k r_z(m − k),
for 0 ≤ m ≤ L. Therefore, to use the FCWF we need the values rs (m) and
rz (m − k) for m and k in the set {0, 1, ..., L}. When these autocorrelation
values are not known, we can use adaptive methods to approximate the
FCWF.
30.4.1
An Adaptive Least-Mean-Square Approach
We assume now that we have z0 , z1 , ..., zN and p0 , p1 , ..., pN , where pn is a
prior estimate of sn , but that we do not know the correlation functions rz
and rs .
The gradient of the function J(f ) is
∇J(f ) = Rzz f − rs ,
where Rzz is the square matrix with entries rz (m − n) and rs is the vector
with entries rs (m). An iterative gradient descent method for solving the
system of equations Rzz f = rs is
fτ = fτ −1 − µτ ∇J(fτ −1 ),
for some step-size parameters µτ > 0.
The adaptive least-mean-square (LMS) approach [66] replaces the gradient of J(f ) with an approximation of the gradient of the function G(f ) =
|sn − ŝn |2 , which is −2(sn − ŝn )yn . Since we do not know sn , we replace
that term with the estimate pn . The iterative step of the LMS method is
fτ = fτ −1 + µτ (pτ − yτ† fτ −1 )yτ ,
(30.6)
for L ≤ τ ≤ N . Notice that it is the approximate gradient of the function
|sτ − ŝτ |2 that is used at this step, in order to involve all the data z0 , ..., zN
as we iterate from τ = L to τ = N . We illustrate the use of this method
in adaptive interference cancellation.
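A minimal LMS sketch (not from the text), following the iterative step (30.6). The signal, the noisy data and the prior estimates p_n are simulated placeholders, and the step size µ is a fixed assumed constant rather than the varying µ_τ of the text.

    import numpy as np

    rng = np.random.default_rng(0)
    N, L, mu = 2000, 8, 0.01
    s = np.sin(0.2 * np.arange(N + 1))            # hypothetical signal s_n
    z = s + 0.5 * rng.normal(size=N + 1)          # data z_n = s_n + noise
    p = s + 0.1 * rng.normal(size=N + 1)          # prior estimates p_n of s_n (assumed available)

    f = np.zeros(L + 1)
    for tau in range(L, N + 1):
        y = z[tau - L: tau + 1][::-1]             # y_tau = (z_tau, z_{tau-1}, ..., z_{tau-L})
        f = f + mu * (p[tau] - y @ f) * y         # LMS update, Equation (30.6)

    s_hat = np.convolve(z, f)[L: N + 1]           # filtered estimates for n = L, ..., N
    print(np.mean((s_hat - s[L:]) ** 2))          # empirical squared error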
30.4.2
Adaptive Interference Cancellation (AIC)
Adaptive interference cancellation (AIC) [224] is used to suppress a dominant noise component vn in the discrete sequence zn = sn + vn . It is
assumed that we have available a good estimate qn of vn . The main idea
is to switch the roles of signal and noise in the adaptive LMS method and
design a filter to estimate vn . Once we have that estimate, we subtract it
from zn to get our estimate of sn .
In the role of zn we use
qn = vn + εn,
where εn denotes a low-level error component. In the role of pn , we take
zn , which is approximately vn , since the signal sn is much lower than the
noise vn . Then, yn† = (qn , qn−1 , ..., qn−L ). The iterative step used to find
the filter f is then
fτ = fτ −1 + µτ (zτ − yτ† fτ −1 )yτ ,
for L ≤ τ ≤ N . When the iterative process has converged to f , we take as
our estimate of sn
ŝ_n = z_n − Σ_{k=0}^{L} f_k q_{n−k}.
It has been suggested that this procedure be used in computerized tomography to correct artifacts due to patient motion [99].
30.4.3
Recursive Least Squares (RLS)
An alternative to the LMS method is to find the least squares solution of
the system of N − L + 1 linear equations
p_n = Σ_{k=0}^{L} f_k z_{n−k},
for L ≤ n ≤ N . The recursive least squares (RLS) method is a recursive
approach to solving this system.
For L ≤ τ ≤ N let Z_τ be the matrix whose rows are y_n†, for n = L, ..., τ, let p_τ = (p_L, p_{L+1}, ..., p_τ)^T, and let Q_τ = Z_τ†Z_τ. The least squares solution we seek is
f = Q_N^{−1} Z_N† p_N.
Exercise 30.1 Show that Qτ = Qτ −1 + yτ yτ† , for L < τ ≤ N .
Exercise 30.2 Use the matrix-inversion identity in Equation (29.1) to
write Q_τ^{−1} in terms of Q_{τ−1}^{−1}.
Exercise 30.3 Using the previous exercise, show that the desired least
squares solution f is f = fN , where, for L ≤ τ ≤ N we let
f_τ = f_{τ−1} + ( (p_τ − y_τ†f_{τ−1}) / (1 + y_τ†Q_{τ−1}^{−1}y_τ) ) Q_{τ−1}^{−1} y_τ.
Comparing this iterative step with that given by Equation (30.6), we see
that the former gives an explicit value for µτ and uses Q_{τ−1}^{−1}y_τ instead of y_τ as the direction vector for the iterative step. The RLS iteration produces
a more accurate estimate of the FCWF than does the LMS method, but
requires more computation.
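A sketch of the RLS recursion of Exercise 30.3 (not from the text), maintaining Q_τ^{−1} with the matrix-inversion identity (29.1). The data and prior estimates are the same kind of simulated placeholders used in the LMS sketch, and initializing Q^{−1} as a large multiple of the identity is a common convention assumed here, not something specified in the text.

    import numpy as np

    rng = np.random.default_rng(0)
    N, L = 2000, 8
    s = np.sin(0.2 * np.arange(N + 1))
    z = s + 0.5 * rng.normal(size=N + 1)
    p = s + 0.1 * rng.normal(size=N + 1)          # prior estimates p_n of s_n

    f = np.zeros(L + 1)
    Qinv = 1e6 * np.eye(L + 1)                    # stands in for Q_{tau-1}^{-1} at the start
    for tau in range(L, N + 1):
        y = z[tau - L: tau + 1][::-1]             # y_tau = (z_tau, ..., z_{tau-L})
        Qy = Qinv @ y
        denom = 1.0 + y @ Qy
        f = f + ((p[tau] - y @ f) / denom) * Qy   # RLS update from Exercise 30.3
        Qinv = Qinv - np.outer(Qy, Qy) / denom    # identity (29.1) applied to Q_tau = Q_{tau-1} + y y^T

    print(np.round(f, 4))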
Chapter 31
Appendix: Fourier Series
and Analytic Functions
31.1
Chapter Summary
We first encounter infinite series expansions for functions in calculus when
we study Maclaurin and Taylor series. Fourier series are usually first met
in different contexts, such as partial differential equations and boundary
value problems. Laurent expansions come later when we study functions of
a complex variable. There are, nevertheless, important connections among
these different types of infinite series expansions, which provide the subject
for this chapter.
31.2
Laurent Series
Suppose that f (z) is analytic in an annulus containing the unit circle C =
{z | |z| = 1}. Then f (z) has a Laurent series expansion
f(z) = Σ_{n=−∞}^{∞} f_n z^n
valid for z within that annulus. Substituting z = e^{iθ}, we get f(e^{iθ}), also written as f(θ), defined for θ in the interval [−π, π] by
f(θ) = f(e^{iθ}) = Σ_{n=−∞}^{∞} f_n e^{inθ};
here the Fourier series for f (θ) is derived from the Laurent series for the
analytic function f(z). If f(z) is actually analytic in (1 + ε)D, where
D = {z| |z| < 1} is the open unit disk, then f (z) has a Taylor series
expansion and the Fourier series for f (θ) contains only terms corresponding
to nonnegative n.
31.3
An Example
As an example, consider the rational function
f(z) = 1/(z − 1/2) − 1/(z − 3) = −(5/2)/((z − 1/2)(z − 3)).
In an annulus containing the unit circle this function has the Laurent series expansion
f(z) = Σ_{n=−∞}^{−1} 2^{n+1} z^n + Σ_{n=0}^{∞} (1/3)^{n+1} z^n;
replacing z with e^{iθ}, we obtain the Fourier series for the function f(θ) = f(e^{iθ}) defined for θ in the interval [−π, π].
The function F (z) = 1/f (z) is analytic for all complex z, but because
it has a root inside the unit circle, its reciprocal, f (z), is not analytic in
a disk containing the unit circle. Consequently, the Fourier series for f (θ)
is doubly infinite. We saw in the chapter on complex variables that the function G(z) = (z − a)/(1 − az) has |G(e^{iθ})| = 1. With a = 2 and H(z) = F(z)G(z), we have
H(z) = (1/5)(z − 3)(z − 2),
and its reciprocal has the form
1/H(z) = Σ_{n=0}^{∞} a_n z^n.
Because
G(e^{iθ})/H(e^{iθ}) = 1/F(e^{iθ}),
it follows that
|1/H(e^{iθ})| = |1/F(e^{iθ})| = |f(θ)|,
and so
|f(θ)| = |Σ_{n=0}^{∞} a_n e^{inθ}|.
Multiplication by G(z) permits us to move a root from inside C to outside
C without altering the magnitude of the function’s values on C.
The relationships between functions defined on C and functions analytic (or harmonic) in D form the core of harmonic analysis [135]. The
factorization F (z) = H(z)/G(z) above is a special case of the inner-outer
factorization for functions in Hardy spaces; the function H(z) is an outer
function, and the functions G(z) and 1/G(z) are inner functions.
31.4  Fejér-Riesz Factorization
Sometimes we start with an analytic function and restrict it to the unit
circle. Other times we start with a function f (eiθ ) defined on the unit
circle, or, equivalently, a function of the form f (θ) for θ in [−π, π], and
view this function as the restriction to the unit circle of a function that is
analytic in a region containing the unit circle. One application of this idea
is the Fejér-Riesz factorization theorem:
Theorem 31.1 Let h(eiθ ) be a finite trigonometric polynomial
h(e^{iθ}) = Σ_{n=−N}^{N} h_n e^{inθ},
such that h(e^{iθ}) ≥ 0 for all θ in the interval [−π, π]. Then there is
y(z) = Σ_{n=0}^{N} y_n z^n
with h(e^{iθ}) = |y(e^{iθ})|^2. The function y(z) is unique if we require, in addition, that all its roots be outside D.
To prove this theorem we consider the function
h(z) = Σ_{n=−N}^{N} h_n z^n,
which is analytic in an annulus containing the unit circle. The rest of the
proof is contained in the following exercise.
Exercise 31.1 Use the fact that h_{−n} = \overline{h_n} to show that z_j is a root of h(z) if and only if 1/\overline{z_j} is also a root. From the nonnegativity of h(e^{iθ}), conclude
that if h(z) has a root on the unit circle then it has even multiplicity. Take
y(z) to be proportional to the product of factors z − zj for all the zj outside
D; for roots on C, include them with half their multiplicities.
31.5
Burg Entropy
The Fejér-Riesz theorem is used in the derivation of Burg’s maximum entropy method for spectrum estimation. The problem there is to estimate a
function R(θ) > 0 knowing only the values
r_n = (1/2π) ∫_{−π}^{π} R(θ) e^{−inθ} dθ,
for |n| ≤ N. The approach is to estimate R(θ) by the function S(θ) > 0 that maximizes the so-called Burg entropy, ∫_{−π}^{π} log S(θ) dθ, subject to the data constraints.
The Euler-Lagrange equation from the calculus of variations allows us
to conclude that S(θ) has the form
S(θ) = 1 / Σ_{n=−N}^{N} h_n e^{inθ}.
The function
h(θ) = Σ_{n=−N}^{N} h_n e^{inθ}
is nonnegative, so, by the Fejér-Riesz theorem, it factors as h(θ) = |y(θ)|2 .
We then have S(θ)y(θ) = 1/y(θ). Since all the roots of y(z) lie outside D
and none are on C, the function 1/y(z) is analytic in a region containing C
and D so it has a Taylor series expansion in that region. Restricting this
Taylor series to C, we obtain a one-sided Fourier series having zero terms
for the negative indices.
Exercise 31.2 Show that the coefficients yn in y(z) satisfy a system of
linear equations whose coefficients are the rn .
Hint: Compare the coefficients of the terms on both sides of the equation
S(θ)y(θ) = 1/y(θ) that correspond to negative indices.
Chapter 32
Appendix: Inverse
Problems and the Laplace
Transform
32.1
Chapter Summary
In the farfield propagation examples considered previously, we found the
measured data to be related to the desired object function by a Fourier
transformation. The image reconstruction problem then became one of estimating a function from finitely many noisy values of its Fourier transform.
In this chapter we consider two inverse problems involving the Laplace
transform.
32.2
The Laplace Transform and the Ozone
Layer
The example is taken from Twomey’s book [218].
32.2.1
The Laplace Transform
The Laplace transform of the function f (x) defined for 0 ≤ x < +∞ is the
function
F(s) = ∫_{0}^{+∞} f(x) e^{−sx} dx.   (32.1)
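For a quick numerical check (not from the text), the Laplace transform can be approximated by quadrature after truncating the infinite upper limit; here f(x) = e^{−2x} is an arbitrary test function whose transform F(s) = 1/(s + 2) is known in closed form.

    import numpy as np

    f = lambda x: np.exp(-2 * x)              # test function with F(s) = 1/(s + 2)
    x = np.linspace(0.0, 50.0, 200001)        # truncate the upper limit at x = 50
    dx = x[1] - x[0]

    for s in (0.5, 1.0, 3.0):
        F_num = np.sum(f(x) * np.exp(-s * x)) * dx
        print(s, F_num, 1.0 / (s + 2.0))      # numerical value vs. exact value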
32.2.2
Scattering of Ultraviolet Radiation
The sun emits ultraviolet (UV) radiation that enters the Earth’s atmosphere at an angle θ0 that depends on the sun’s position, and with intensity
I(0). Let the x-axis be vertical, with x = 0 at the top of the atmosphere
and x increasing as we move down to the Earth’s surface, at x = X. The
intensity at x is given by
I(x) = I(0)e−kx/ cos θ0 .
(32.2)
Within the ozone layer, the amount of UV radiation scattered in the direction θ is given by
S(θ, θ0 )I(0)e−kx/ cos θ0 ∆p,
(32.3)
where S(θ, θ0 ) is a known parameter, and ∆p is the change in the pressure
of the ozone within the infinitesimal layer [x, x+∆x], and so is proportional
to the concentration of ozone within that layer.
32.2.3
Measuring the Scattered Intensity
The radiation scattered at the angle θ then travels to the ground, a distance
of X − x, weakened along the way, and reaches the ground with intensity
S(θ, θ0 )I(0)e−kx/ cos θ0 e−k(X−x)/ cos θ ∆p.
(32.4)
The total scattered intensity at angle θ is then a superposition of the intensities due to scattering at each of the thin layers, and is then
S(θ, θ0) I(0) e^{−kX/cos θ0} ∫_{0}^{X} e^{−xβ} dp,   (32.5)
where
β = k[ 1/cos θ0 − 1/cos θ ].   (32.6)
This superposition of intensity can then be written as
S(θ, θ0) I(0) e^{−kX/cos θ0} ∫_{0}^{X} e^{−xβ} p′(x) dx.   (32.7)
32.2.4
The Laplace Transform Data
Using integration by parts, we get
∫_{0}^{X} e^{−xβ} p′(x) dx = p(X) e^{−βX} − p(0) + β ∫_{0}^{X} e^{−βx} p(x) dx.   (32.8)
Since p(0) = 0 and p(X) can be measured, our data is then the Laplace
transform value
∫_{0}^{+∞} e^{−βx} p(x) dx;   (32.9)
note that we can replace the upper limit X with +∞ if we extend p(x) as
zero beyond x = X.
The variable β depends on the two angles θ and θ0 . We can alter θ as
we measure and θ0 changes as the sun moves relative to the earth. In this
way we get values of the Laplace transform of p(x) for various values of β.
The problem then is to recover p(x) from these values. Because the Laplace
transform involves a smoothing of the function p(x), recovering p(x) from
its Laplace transform is more ill-conditioned than is the Fourier transform
inversion problem.
32.3
The Laplace Transform and Energy Spectral Estimation
In x-ray transmission tomography, x-ray beams are sent through the object
and the drop in intensity is measured. These measurements are then used
to estimate the distribution of attenuating material within the object. A
typical x-ray beam contains components with different energy levels. Because components at different energy levels will be attenuated differently,
it is important to know the relative contribution of each energy level to the
entering beam. The energy spectrum is the function f (E) that describes
the intensity of the components at each energy level E > 0.
32.3.1
The Attenuation Coefficient Function
Each specific material, say aluminum, for example, is associated with an attenuation coefficient, which is a function of energy, and which we shall denote
by µ(E). A beam with the single energy E passing through a thickness x of
the material will be weakened by the factor e−µ(E)x . By passing the beam
through various thicknesses x of aluminum and registering the intensity
drops, one obtains values of the absorption function
R(x) = ∫_{0}^{∞} f(E) e^{−µ(E)x} dE.   (32.10)
Using a change of variable, we can write R(x) as a Laplace transform.
32.3.2
The Absorption Function as a Laplace Transform
For each material, the attenuation function µ(E) is a strictly decreasing
function of E, so µ(E) has an inverse, which we denote by g; that is,
g(t) = E, for t = µ(E). Equation (32.10) can then be rewritten as
R(x) = ∫_{0}^{∞} f(g(t)) e^{−tx} g′(t) dt.   (32.11)
We see then that R(x) is the Laplace transform of the function r(t) =
f(g(t)) g′(t). Our measurements of the intensity drops provide values of
R(x), for various values of x, from which we must estimate the functions
r(t), and, ultimately, f (E).
Chapter 33
Appendix: Matrix Theory
33.1
Chapter Summary
Matrices and their algebraic properties play an ever-increasing role in signal processing. In this chapter we outline the most important of these
properties.
33.2
Matrix Inverses
A square matrix A is said to have inverse A−1 provided that
AA−1 = A−1 A = I,
where I is the identity matrix. The 2 by 2 matrix A = [ a  b ; c  d ] has an inverse
A^{−1} = (1/(ad − bc)) [ d  −b ; −c  a ]
whenever the determinant of A, det(A) = ad − bc is not zero. More generally, associated with every complex square matrix is the complex number
called its determinant, which is obtained from the entries of the matrix
using formulas that can be found in any text on linear algebra. The significance of the determinant is that the matrix is invertible if and only
if its determinant is not zero. This is of more theoretical than practical
importance, since no computer can tell when a number is precisely zero.
A matrix A that is not square cannot have an inverse, but does have a
pseudo-inverse, which is found using the singular-value decomposition.
33.3  Basic Linear Algebra
In this section we discuss systems of linear equations, Gaussian elimination,
and the notions of basic and non-basic variables.
33.3.1
Bases and Dimension
The notions of a basis and of linear independence are fundamental in linear
algebra. Let V be a vector space.
Definition 33.1 A collection of vectors {u1 , ..., uN } in V is linearly independent if there is no choice of scalars α1 , ..., αN , not all zero, such that
0 = α1 u1 + ... + αN uN .
(33.1)
Definition 33.2 The span of a collection of vectors {u1 , ..., uN } in V is
the set of all vectors x that can be written as linear combinations of the un ;
that is, for which there are scalars c1 , ..., cN , such that
x = c1 u1 + ... + cN uN .
(33.2)
Definition 33.3 A collection of vectors {w1 , ..., wN } in V is called a spanning set for a subspace S if the set S is their span.
Definition 33.4 A collection of vectors {u1 , ..., uN } in V is called a basis
for a subspace S if the collection is linearly independent and S is their span.
Definition 33.5 A collection of vectors {u1 , ..., uN } in an inner product
space V is called orthonormal if ||un ||2 = 1, for all n, and hum , un i = 0,
for m 6= n.
Suppose that S is a subspace of V, that {w1 , ..., wN } is a spanning set
for S, and {u1 , ..., uM } is a linearly independent subset of S. Beginning
with w1 , we augment the set {u1 , ..., uM } with wj if wj is not in the span of
the um and the wk previously included. At the end of this process, we have
a linearly independent spanning set, and therefore, a basis, for S (Why?).
Similarly, beginning with w1 , we remove wj from the set {w1 , ..., wN } if wj
is a linear combination of the wk , k = 1, ..., j − 1. In this way we obtain
a linearly independent set that spans S, hence another basis for S. The
following lemma will allow us to prove that all bases for a subspace S have
the same number of elements.
Lemma 33.1 Let W = {w1 , ..., wN } be a spanning set for a subspace S
in RI , and V = {v 1 , ..., v M } a linearly independent subset of S. Then
M ≤ N.
Proof: Suppose that M > N . Let B0 = {w1 , ..., wN }. To obtain the set
B1 , form the set C1 = {v 1 , w1 , ..., wN } and remove the first member of C1
that is a linear combination of members of C1 that occur to its left in the
listing; since v 1 has no members to its left, it is not removed. Since W is
a spanning set, v 1 is a linear combination of the members of W , so that
some member of W is a linear combination of v 1 and the members of W
that precede it in the list; remove the first member of W for which this is
true.
We note that the set B1 is a spanning set for S and has N members.
Having obtained the spanning set Bk , with N members and whose first k
members are v k , ..., v 1 , we form the set Ck+1 = Bk ∪ {v k+1 }, listing the
members so that the first k + 1 of them are {v k+1 , v k , ..., v 1 }. To get the set
Bk+1 we remove the first member of Ck+1 that is a linear combination of
the members to its left; there must be one, since Bk is a spanning set, and
so v k+1 is a linear combination of the members of Bk . Since the set V is
linearly independent, the member removed is from the set W . Continuing
in this fashion, we obtain a sequence of spanning sets B1 , ..., BN , each with
N members. The set BN is BN = {v 1 , ..., v N } and v N +1 must then be
a linear combination of the members of BN , which contradicts the linear
independence of V .
Corollary 33.1 Every basis for a subspace S has the same number of elements.
Exercise 33.1 Let W = {w1 , ..., wN } be a spanning set for a subspace S
in RI , and V = {v 1 , ..., v M } a linearly independent subset of S. Let A be
the matrix whose columns are the v m , B the matrix whose columns are the
wn . Show that there is an N by M matrix C such that A = BC. Prove
Lemma 33.1 by showing that, if M > N , then there is a non-zero vector x
with Cx = Ax = 0.
Definition 33.6 The dimension of a subspace S is the number of elements
in any basis.
Lemma 33.2 For any matrix A, the maximum number of linearly independent rows equals the maximum number of linearly independent columns.
Proof: Suppose that A is an I by J matrix, and that K ≤ J is the
maximum number of linearly independent columns of A. Select K linearly
independent columns of A and use them as the K columns of an I by K
matrix U . Since every column of A must be a linear combination of these
K selected ones, there is a K by J matrix M such that A = U M . From
AT = M T U T we conclude that every column of AT is a linear combination
of the K columns of the matrix M T . Therefore, there can be at most K
linearly independent columns of AT .
Definition 33.7 The rank of A is the maximum number of linearly independent rows or of linearly independent columns of A.
33.3.2
Systems of Linear Equations
Consider the system of three linear equations in five unknowns given by
x1 + 2x2 + 2x4 + x5 = 0,
−x1 − x2 + x3 + x4 = 0,
x1 + 2x2 − 3x3 − x4 − 2x5 = 0.   (33.3)
This system can be written in matrix form as Ax = 0, with A the coefficient
matrix
A = [ 1  2  0  2  1 ; −1  −1  1  1  0 ; 1  2  −3  −1  −2 ],   (33.4)
and x = (x1 , x2 , x3 , x4 , x5 )T . Applying Gaussian elimination to this system, we obtain a second, simpler, system with the same solutions:
x1 − 2x4 + x5 = 0,
x2 + 2x4 = 0,
x3 + x4 + x5 = 0.   (33.5)
From this simpler system we see that the variables x4 and x5 can be freely
chosen, with the other three variables then determined by this system of
equations. The variables x4 and x5 are then independent, the others dependent. The variables x1 , x2 and x3 are then called basic variables. To
obtain a basis of solutions we can let x4 = 1 and x5 = 0, obtaining the
solution x = (2, −2, −1, 1, 0)T , and then choose x4 = 0 and x5 = 1 to get
the solution x = (−1, 0, −1, 0, 1)T . Every solution to Ax = 0 is then a
linear combination of these two solutions. Notice that which variables are
basic and which are non-basic is somewhat arbitrary, in that we could have
chosen as the non-basic variables any two whose columns are independent.
Having decided that x4 and x5 are the non-basic variables, we can write
the original matrix A as A = [ B N ], where B is the square invertible
matrix
B = [ 1  2  0 ; −1  −1  1 ; 1  2  −3 ],   (33.6)
and N is the matrix
N = [ 2  1 ; 1  0 ; −1  −2 ].   (33.7)
With xB = (x1 , x2 , x3 )T and xN = (x4 , x5 )T we can write
Ax = BxB + N xN = 0,
(33.8)
xB = −B −1 N xN .
(33.9)
so that
33.3.3
Real and Complex Systems of Linear Equations
A system Ax = b of linear equations is called a complex system, or a real
system if the entries of A, x and b are complex, or real, respectively. For any
matrix A, we denote by AT and A† the transpose and conjugate transpose
of A, respectively.
Any complex system can be converted to a real system in the following
way. A complex matrix A can be written as A = A1 + iA2, where A1 and A2 are real matrices and i = √−1. Similarly, x = x1 + ix2 and b = b1 + ib2, where x1, x2, b1 and b2 are real vectors. Denote by à the real matrix
à = [ A1  −A2 ; A2  A1 ],   (33.10)
by x̃ the real vector
x̃ = [ x1 ; x2 ],   (33.11)
and by b̃ the real vector
b̃ = [ b1 ; b2 ].   (33.12)
Then x satisfies the system Ax = b if and only if x̃ satisfies the system
Ãx̃ = b̃.
Definition 33.8 A square matrix A is symmetric if AT = A and Hermitian if A† = A.
Definition 33.9 A non-zero vector x is said to be an eigenvector of the
square matrix A if there is a scalar λ such that Ax = λx. Then λ is said
to be an eigenvalue of A.
If x is an eigenvector of A with eigenvalue λ, then the matrix A − λI
has no inverse, so its determinant is zero; here I is the identity matrix with
ones on the main diagonal and zeros elsewhere. Solving for the roots of the
determinant is one way to calculate the eigenvalues of A. For example, the
eigenvalues of the Hermitian matrix
B = [ 1  2+i ; 2−i  1 ]   (33.13)
are λ = 1 + √5 and λ = 1 − √5, with corresponding eigenvectors u = (√5, 2 − i)^T and v = (√5, i − 2)^T, respectively. Then B̃ has the same eigenvalues, but both with multiplicity two. Finally, the associated eigenvectors of B̃ are
[ u1 ; u2 ],   (33.14)
and
[ −u2 ; u1 ],   (33.15)
for λ = 1 + √5, and
[ v1 ; v2 ],   (33.16)
and
[ −v2 ; v1 ],   (33.17)
for λ = 1 − √5.
33.4  Solutions of Under-determined Systems of Linear Equations
Suppose that Ax = b is a consistent linear system of M equations in
N unknowns, where M < N . Then there are infinitely many solutions.
A standard procedure in such cases is to find that solution x having the
smallest norm
||x|| = ( Σ_{n=1}^{N} |xn|^2 )^{1/2}.
As we shall see shortly, the minimum norm solution of Ax = b is a vector
of the form x = A† z, where A† denotes the conjugate transpose of the
matrix A. Then Ax = b becomes AA† z = b. Typically, (AA† )−1 will
exist, and we get z = (AA† )−1 b, from which it follows that the minimum
norm solution is x = A† (AA† )−1 b. When M and N are not too large,
forming the matrix AA† and solving for z is not prohibitively expensive
and time-consuming. However, in image processing the vector x is often a
vectorization of a two-dimensional (or even three-dimensional) image and
M and N can be on the order of tens of thousands or more. The ART
algorithm gives us a fast method for finding the minimum norm solution
without computing AA† .
We begin by proving that the minimum norm solution of Ax = b has
the form x = A† z for some M -dimensional complex vector z.
Let the null space of the matrix A be all N -dimensional complex vectors
w with Aw = 0. If Ax = b then A(x + w) = b for all w in the null space
of A. If x = A† z and w is in the null space of A, then
||x + w||2 = ||A† z + w||2 = (A† z + w)† (A† z + w)
= (A† z)† (A† z) + (A† z)† w + w† (A† z) + w† w
= ||A† z||2 + (A† z)† w + w† (A† z) + ||w||2
= ||A† z||2 + ||w||2 ,
since
w† (A† z) = (Aw)† z = 0† z = 0
and
(A† z)† w = z† Aw = z† 0 = 0.
Therefore, ||x + w|| = ||A† z + w|| > ||A† z|| = ||x|| unless w = 0. This
completes the proof.
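A quick numerical sketch (not from the text) of the closed-form minimum-norm solution x = A†(AA†)^{−1}b for a small consistent under-determined system; the matrix and right-hand side are arbitrary illustrations, with real entries so that † is just the transpose.

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0, -1.0]])             # hypothetical 2 by 4 system
    b = np.array([3.0, 1.0])

    x_min = A.T @ np.linalg.solve(A @ A.T, b)         # x = A^T (A A^T)^{-1} b
    print(A @ x_min)                                  # reproduces b
    print(np.linalg.norm(x_min))
    print(np.allclose(x_min, np.linalg.pinv(A) @ b))  # agrees with the pseudo-inverse solution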
Exercise 33.2 Show that if z = (z1 , ..., zN )T is a column vector with complex entries and H = H † is an N by N Hermitian matrix with complex entries then the quadratic form z† Hz is a real number. Show that
the quadratic form z† Hz can be calculated using only real numbers. Let
z = x + iy, with x and y real vectors and let H = A + iB, where A and
B are real matrices. Then show that AT = A, B T = −B, xT Bx = 0 and
finally,
z†Hz = [ x^T  y^T ] [ A  −B ; B  A ] [ x ; y ].
Use the fact that z† Hz is real for every vector z to conclude that the eigenvalues of H are real.
33.5  Eigenvalues and Eigenvectors
Given N by N complex matrix A, we say that a complex number λ is an
eigenvalue of A if there is a nonzero vector u with Au = λu. The column
vector u is then called an eigenvector of A associated with eigenvalue λ;
clearly, if u is an eigenvector of A, then so is cu, for any constant c ≠ 0.
If λ is an eigenvalue of A, then the matrix A − λI fails to have an inverse,
since (A − λI)u = 0 but u ≠ 0. If we treat λ as a variable and compute
the determinant of A − λI, we obtain a polynomial of degree N in λ. Its
roots λ1 , ..., λN are then the eigenvalues of A. If ||u||2 = u† u = 1 then
u† Au = λu† u = λ.
It can be shown that it is possible to find a set of N mutually orthogonal
eigenvectors of the Hermitian matrix H; call them {u1 , ..., uN }. The matrix
H can then be written as
H = \sum_{n=1}^{N} \lambda_n u^n (u^n)^\dagger,
a linear superposition of the dyad matrices un (un )† . We can also write H =
U LU † , where U is the matrix whose nth column is the column vector un
and L is the diagonal matrix with the eigenvalues down the main diagonal
and zero elsewhere.
The matrix H is invertible if and only if none of the λn are zero and its inverse is

H^{-1} = \sum_{n=1}^{N} \lambda_n^{-1} u^n (u^n)^\dagger.

We also have H^{-1} = U L^{-1} U^\dagger.
A Hermitian matrix Q is said to be nonnegative-definite (positive-definite) if all
the eigenvalues of Q are nonnegative (positive). The matrix Q is a nonnegative-definite
matrix if and only if there is another matrix C such that Q = C†C. Since the
eigenvalues of Q are nonnegative, the diagonal matrix L has a square root, \sqrt{L}.
Using the fact that U†U = I, we have

Q = U L U^\dagger = U \sqrt{L}\, U^\dagger U \sqrt{L}\, U^\dagger;

we then take C = U \sqrt{L}\, U^\dagger, so C† = C. Then z†Qz = z†C†Cz = ||Cz||^2,
so that Q is positive-definite if and only if C is invertible.
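A small numerical check of this factorization (a sketch only, using an arbitrary randomly generated Hermitian matrix): build C = U√L U† from the eigendecomposition of Q and verify that Q = C†C.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q = X.conj().T @ X                         # Hermitian, nonnegative-definite test matrix

lam, U = np.linalg.eigh(Q)                 # real eigenvalues, orthonormal eigenvectors
C = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.conj().T   # C = U sqrt(L) U^dagger
print(np.allclose(C.conj().T @ C, Q))      # True: Q = C^dagger C
print(np.allclose(C, C.conj().T))          # True: C is Hermitian, so C^dagger = C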
Exercise 33.3 Let A be an M by N matrix with complex entries. View A as a linear
function with domain C^N, the space of all N-dimensional complex column vectors, and
range contained within C^M, via the expression A(x) = Ax. Suppose that M > N. The
range of A, denoted R(A), cannot be all of C^M. Show that every vector z in C^M can
be written uniquely in the form z = Ax + w, where A†w = 0. Show that
||z||^2 = ||Ax||^2 + ||w||^2, where ||z||^2 denotes the square of the norm of z.
Hint: If z = Ax + w then consider A†z. Assume A†A is invertible.
33.6 Vectorization of a Matrix
When the complex M by N matrix A is stored in the computer it is usually
vectorized; that is, the matrix
A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & & & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{bmatrix}
becomes
vec(A) = (A11 , A21 , ..., AM 1 , A12 , A22 , ..., AM 2 , ..., AM N )T .
Exercise 33.4 (a) Show that the complex dot product vec(A)· vec(B) =
vec(B)† vec(A) can be obtained by
vec(A)· vec(B) = trace (AB † ) = tr(AB † ),
where, for a square matrix C, trace (C) means the sum of the entries along
the main diagonal of C. We can therefore use the trace to define an inner
product between matrices: ⟨A, B⟩ = trace (AB†).
(b) Show that trace (AA† ) ≥ 0 for all A, so that we can use the trace to
define a norm on matrices: ||A||2 = trace (AA† ).
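The identity in Exercise 33.4 is easy to check numerically; the following sketch uses arbitrary random complex matrices.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

vec = lambda X: X.flatten(order='F')       # stack the columns, as in vec(A) above
lhs = np.vdot(vec(B), vec(A))              # vec(B)^dagger vec(A)
rhs = np.trace(A @ B.conj().T)             # trace(A B^dagger)
print(np.allclose(lhs, rhs))               # True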
Exercise 33.5 Let B = U L V † be an M by N matrix in diagonalized form;
that is, L is an M by N diagonal matrix with entries λ1 , ..., λK on its main
diagonal, where K = min(M, N ), and U and V are square matrices. Let
the n-th column of U be denoted un and similarly for the columns of V .
Such a diagonal decomposition occurs in the singular value decomposition
(SVD). Show that we can write
B = λ1 u1 (v1 )† + ... + λK uK (vK )† .
If B is an N by N Hermitian matrix, then we can take U = V and K =
M = N , with the columns of U the eigenvectors of B, normalized to
have Euclidean norm equal to one, and the λn to be the eigenvalues of
B. In this case we may also assume that U is a unitary matrix; that is,
U U † = U † U = I, where I denotes the identity matrix.
33.7 The Singular Value Decomposition (SVD)
We have just seen that an N by N Hermitian matrix H can be written in
terms of its eigenvalues and eigenvectors as H = U LU † or as
H = \sum_{n=1}^{N} \lambda_n u^n (u^n)^\dagger.
The singular value decomposition (SVD) is a similar result that applies to
any rectangular matrix. It is an important tool in image compression and
pseudo-inversion.
33.7.1 The SVD
Let C be any N by K complex matrix. In presenting the SVD of C we
shall assume that K ≥ N ; the SVD of C † will come from that of C. Let
A = C † C and B = CC † ; we assume, reasonably, that B, the smaller of
the two matrices, is invertible, so all the eigenvalues λ1 , ..., λN of B are
positive. Then, write the eigenvalue/eigenvector decomposition of B as
B = U LU † .
Exercise 33.6 Show that the nonzero eigenvalues of A and B are the
same.
Let V be the K by K matrix whose first N columns are those of the
matrix C † U L−1/2 and whose remaining K − N columns are any mutually
orthogonal norm-one vectors that are all orthogonal to each of the first
N columns. Let M be the N by K matrix with diagonal entries M_{nn} = \sqrt{\lambda_n}
for n = 1, ..., N and whose remaining entries are zero. The nonzero entries of M,
the \sqrt{\lambda_n}, are called the singular values of C. The singular value
decomposition (SVD) of C is C = U M V † . The SVD of C † is C † = V M T U † .
Exercise 33.7 Show that U M V † equals C.
Using the SVD of C we can write
C = \sum_{n=1}^{N} \sqrt{\lambda_n}\, u^n (v^n)^\dagger,   (33.18)
where vn denotes the nth column of the matrix V .
33.7.2 Using the SVD in Image Compression
In image processing, matrices such as C are used to represent discrete two-dimensional
images, with the entries of C corresponding to the grey level or color at each pixel.
It is common to find that most of the N singular values of C are nearly zero, so that
C can be written approximately as a sum of far fewer than N dyads; this is SVD image
compression.
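A minimal sketch of this kind of compression with numpy (the "image" below is random test data standing in for the pictures in Figures 33.1 and 33.2): keep only the dyads belonging to the largest singular values in the sum (33.18).

import numpy as np

rng = np.random.default_rng(3)
C = rng.standard_normal((128, 128))        # placeholder for a 128 by 128 image

U, s, Vh = np.linalg.svd(C, full_matrices=False)
for terms in (2, 10, 30):
    approx = (U[:, :terms] * s[:terms]) @ Vh[:terms, :]   # truncated sum of dyads
    rel_err = np.linalg.norm(C - approx) / np.linalg.norm(C)
    print(terms, rel_err)                  # error shrinks as more terms are kept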
Figures 33.1 and 33.2 illustrate what can be achieved with SVD compression. In both
figures the original is in the upper left. It is a 128 by 128 digitized image, so
N = 128. In the images that follow, the number of terms retained in the sum in
Equation (33.18) is, first, 2, then 4, 6, 8, 10, 20 and finally 30. The full sum
has 128 terms, remember. In Figure 33.1 the
text is nearly readable using only 10 terms, and certainly could be made
perfectly readable with suitable software, so storing just this compressed
image would be acceptable. In Figure 33.2, an image of a satellite, we get
a fairly good idea of the general shape of the object from the beginning,
with only two terms.
33.7.3 An Application in Space Exploration
The Galileo spacecraft was deployed from the space shuttle Atlantis on October 18,
1989. After a detour around Venus and back past Earth to pick up gravity-assisted
speed, Galileo headed for Jupiter. Its mission included a study of
Jupiter’s moon Europa, and the plan was to send back one high-resolution
photo per minute, at a rate of 134 KB per second, via a huge high-gain
antenna. When the time came to open the antenna, it stuck. Without the
pictures, the mission would be a failure.
There was a much smaller low-gain antenna on board, but the best
transmission rate was going to be ten bits per second. All that could be
done from earth was to reprogram an old on-board computer to compress
the pictures prior to transmission. The problem was that pictures could
be taken much faster than they could be transmitted to earth; some way
to store them prior to transmission was key. The original designers of
the software had long since retired, but the engineers figured out a way to
introduce state-of-the-art image compression algorithms into the computer.
It happened that there was an ancient reel-to-reel storage device on board
that was there only to serve as a backup for storing atmospheric data.
Using this device and the compression methods, the engineers saved the
mission [12].
33.7.4 Pseudo-Inversion
If N ≠ K then C cannot have an inverse; it does, however, have a pseudo-inverse, C∗ = V M∗ U†, where M∗ is the matrix obtained from M by taking
the inverse of each of its nonzero entries and leaving the remaining zeros
the same. The pseudo-inverse of C † is
(C † )∗ = (C ∗ )† = U (M ∗ )T V † = U (M † )∗ V † .
Some important properties of the pseudo-inverse are the following:
1. CC ∗ C = C,
2. C ∗ CC ∗ = C ∗ ,
3. (C ∗ C)† = C ∗ C,
4. (CC ∗ )† = CC ∗ .
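These four properties are easy to verify numerically; the sketch below uses a random rectangular test matrix and numpy's pinv, which computes the pseudo-inverse from the SVD as described above.

import numpy as np

rng = np.random.default_rng(4)
C = rng.standard_normal((5, 3))            # arbitrary rectangular test matrix
Cstar = np.linalg.pinv(C)

print(np.allclose(C @ Cstar @ C, C))               # C C* C = C
print(np.allclose(Cstar @ C @ Cstar, Cstar))       # C* C C* = C*
print(np.allclose((Cstar @ C).T, Cstar @ C))       # (C* C) is symmetric (Hermitian)
print(np.allclose((C @ Cstar).T, C @ Cstar))       # (C C*) is symmetric (Hermitian)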
The pseudo-inverse of an arbitrary I by J matrix G can be used in much
the same way as the inverse of nonsingular matrices to find approximate or
exact solutions of systems of equations Gx = d. The following examples
illustrate this point.
Exercise 33.8 If I > J the system Gx = d probably has no exact solution.
Show that whenever G† G is invertible the pseudo-inverse of G is G∗ =
(G† G)−1 G† so that the vector x = G∗ d is the least squares approximate
solution.
Exercise 33.9 If I < J the system Gx = d probably has infinitely many
solutions. Show that whenever the matrix GG† is invertible the pseudo-inverse of G is G∗ = G†(GG†)−1, so that the vector x = G∗d is the exact
solution of Gx = d closest to the origin; that is, it is the minimum norm
solution.
33.8 Singular Values of Sparse Matrices
In image reconstruction from projections the M by N matrix A is usually quite large
and often ε-sparse; that is, most of its elements do not exceed ε in absolute value,
where ε denotes a small positive quantity. In transmission
tomography each column of A corresponds to a single pixel in the digitized
image, while each row of A corresponds to a line segment through the
object, along which an x-ray beam has traveled. The entries of a given
row of A are nonzero only for those columns whose associated pixel lies on
that line segment; clearly, most of the entries of any given row of A will
then be zero. In emission tomography the I by J nonnegative matrix P
has entries Pij ≥ 0; for each detector i and pixel j, Pij is the probability
that an emission at the jth pixel will be detected at the ith detector.
When a detection is recorded at the ith detector, we want the likely source
of the emission to be one of only a small number of pixels. For single
photon emission tomography (SPECT), a lead collimator is used to permit
detection of only those photons approaching the detector straight on. In
positron emission tomography (PET), coincidence detection serves much
the same purpose. In both cases the probabilities Pij will be zero (or
nearly zero) for most combinations of i and j. Such matrices are called
sparse (or almost sparse). We discuss now a convenient estimate for the
largest singular value of an almost sparse matrix A, which, for notational
convenience only, we take to be real.
In [44] it was shown that if A is normalized so that each row has length
one, then the spectral radius of AT A, which is the square of the largest
singular value of A itself, does not exceed the maximum number of nonzero
elements in any column of A. A similar upper bound on ρ(AT A) can be
obtained for non-normalized, ε-sparse A.
Let A be an M by N matrix. For each n = 1, ..., N , let sn > 0 be
the number of nonzero entries in the nth column of A, and let s be the
maximum of the sn . Let G be the M by N matrix with entries
G_{mn} = A_{mn} \Big/ \Big(\sum_{l=1}^{N} s_l A_{ml}^2\Big)^{1/2}.
Lent has shown that the eigenvalues of the matrix GT G do not exceed one
[159]. This result suggested the following proposition, whose proof was
given in [44].
Proposition 33.1 Let A be an M by N matrix. For each m = 1, ..., M let
ν_m = \sum_{n=1}^{N} A_{mn}^2 > 0. For each n = 1, ..., N let
σ_n = \sum_{m=1}^{M} e_{mn} ν_m, where e_{mn} = 1 if A_{mn} ≠ 0 and e_{mn} = 0
otherwise. Let σ denote the maximum of the σ_n. Then the eigenvalues of the matrix
A^T A do not exceed σ. If A is normalized so that the Euclidean length of each of its
rows is one, then the eigenvalues of A^T A do not exceed s, the maximum number of
nonzero elements in any column of A.
Proof: For simplicity, we consider only the normalized case; the proof for
the more general case is similar.
Let AT Av = cv for some nonzero vector v. We show that c ≤ s. We
have AAT Av = cAv and so wT AAT w = vT AT AAT Av = cvT AT Av =
cwT w, for w = Av. Then, with emn = 1 if Amn ≠ 0 and emn = 0
otherwise, we have
\Big(\sum_{m=1}^{M} A_{mn} w_m\Big)^2 = \Big(\sum_{m=1}^{M} A_{mn} e_{mn} w_m\Big)^2
\le \Big(\sum_{m=1}^{M} A_{mn}^2 w_m^2\Big)\Big(\sum_{m=1}^{M} e_{mn}^2\Big)
= \Big(\sum_{m=1}^{M} A_{mn}^2 w_m^2\Big) s_n \le \Big(\sum_{m=1}^{M} A_{mn}^2 w_m^2\Big) s.

Therefore,

w^T A A^T w = \sum_{n=1}^{N} \Big(\sum_{m=1}^{M} A_{mn} w_m\Big)^2
\le \sum_{n=1}^{N} \Big(\sum_{m=1}^{M} A_{mn}^2 w_m^2\Big) s,

and

w^T A A^T w = c\, w^T w = c \sum_{m=1}^{M} w_m^2
= c \sum_{m=1}^{M} w_m^2 \Big(\sum_{n=1}^{N} A_{mn}^2\Big)
= c \sum_{m=1}^{M} \sum_{n=1}^{N} w_m^2 A_{mn}^2.
The result follows immediately.
If we normalize A so that its rows have length one, then the trace of
the matrix AAT is tr(AAT ) = M , which is also the sum of the eigenvalues
of AT A. Consequently, the maximum eigenvalue of AT A does not exceed
M ; this result improves that upper bound considerably, if A is sparse and
so s << M .
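The following sketch illustrates Proposition 33.1 on randomly generated sparse test data: after normalizing the rows to unit length, the largest eigenvalue of A^T A stays below s, the maximum number of nonzero entries in any column, which is typically far smaller than the crude bound M.

import numpy as np

rng = np.random.default_rng(5)
M, N = 200, 100
A = rng.standard_normal((M, N)) * (rng.random((M, N)) < 0.1)           # ~10% nonzero
A[np.arange(M), rng.integers(0, N, size=M)] = rng.standard_normal(M)   # avoid zero rows
A = A / np.linalg.norm(A, axis=1, keepdims=True)                       # unit-length rows

s = int((A != 0).sum(axis=0).max())            # max nonzeros in any column
lam_max = np.linalg.eigvalsh(A.T @ A).max()    # largest eigenvalue of A^T A
print(lam_max, "<=", s, "<<", M)               # the bound of Proposition 33.1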
In image reconstruction from projection data that includes scattering we
often encounter matrices A most of whose entries are small, if not exactly
zero. A slight modification of the proof provides us with a useful upper
bound for L, the largest eigenvalue of AT A, in such cases. Assume that
the rows of A have length one. For ε > 0 let s_ε be the largest number of entries
in any column of A whose magnitudes exceed ε. Then we have

L \le s_\varepsilon + MN\varepsilon^2 + 2\varepsilon (MN s_\varepsilon)^{1/2}.
The proof of this result is similar to that for Proposition 33.1.
Figure 33.1: Compressing text with the SVD.

Figure 33.2: Compressing an image with the SVD.
Chapter 34
Appendix: Matrix and Vector Differentiation
34.1 Chapter Summary
The notation associated with matrix and vector algebra is designed to
reduce the number of things we have to think about as we perform our
calculations. This notation can be extended to multi-variable calculus, as
we show in this chapter.
34.2 Functions of Vectors and Matrices
As we saw in the previous chapter, the least squares approximate solution
of Ax = b is a vector x̂ that minimizes the function ||Ax − b||. In our discussion of band-limited extrapolation we showed that, for any nonnegative
definite matrix Q, the vector having norm one that maximizes the quadratic
form x† Qx is an eigenvector of Q associated with the largest eigenvalue.
In the chapter on best linear unbiased optimization we seek a matrix that
minimizes a certain function. All of these examples involve what we can
call matrix-vector differentiation, that is, the differentiation of a function
with respect to a matrix or a vector. The gradient of a function of several
variables is a well-known example and we begin there. Since there is some
possibility of confusion, we adopt the notational convention that boldfaced
symbols, such as x, indicate a column vector, while x denotes a scalar.
34.3 Differentiation with Respect to a Vector
Let x = (x1 , ..., xN )T be an N -dimensional real column vector. Let z =
f (x) be a real-valued function of the entries of x. The derivative of z with
respect to x, also called the gradient of z, is the column vector
\frac{\partial z}{\partial x} = a = (a_1, ..., a_N)^T

with entries

a_n = \frac{\partial z}{\partial x_n}.
Exercise 34.1 Let y be a fixed real column vector and z = f (x) = yT x.
Show that \frac{\partial z}{\partial x} = y.
Exercise 34.2 Let Q be a real symmetric nonnegative definite matrix, and
let z = f (x) = xT Qx. Show that the gradient of this quadratic form is
\frac{\partial z}{\partial x} = 2Qx.
Hint: Write Q as a linear combination of dyads involving the eigenvectors.
Exercise 34.3 Let z = ||Ax − b||2 . Show that
\frac{\partial z}{\partial x} = 2A^T A x - 2A^T b.
Hint: Use z = (Ax − b)T (Ax − b).
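A finite-difference check of the gradient in Exercise 34.3, on arbitrary random test data (a sketch only):

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

grad = 2 * A.T @ A @ x - 2 * A.T @ b           # the formula from Exercise 34.3
f = lambda v: np.linalg.norm(A @ v - b) ** 2
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(grad, numeric, atol=1e-4))   # True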
We can also consider the second derivative of z = f (x), which is the
Hessian matrix of z
H = \frac{\partial^2 z}{\partial x^2} = \nabla^2 f(x)

with entries

H_{mn} = \frac{\partial^2 z}{\partial x_m \partial x_n}.
If the entries of the vector z = (z1 , ..., zM )T are real-valued functions of
the vector x, the derivative of z is the matrix whose mth column is the
derivative of the real-valued function zm . This matrix is usually called the
Jacobian matrix of z. If M = N the determinant of the Jacobian matrix is
the Jacobian.
Exercise 34.4 Suppose (u, v) = (u(x, y), v(x, y)) is a change of variables
from the Cartesian (x, y) coordinate system to some other (u, v) coordinate
system. Let x = (x, y)T and z = (u(x), v(x))T .
• (a) Calculate the Jacobian for the rectangular coordinate system obtained by rotating the (x, y) system through an angle of θ.
• (b) Calculate the Jacobian for the transformation from the (x, y)
system to polar coordinates.
34.4 Differentiation with Respect to a Matrix
Now we consider real-valued functions z = f (A) of a real matrix A. As an
example, for square matrices A we have
z = f(A) = trace (A) = \sum_{n=1}^{N} A_{nn},
the sum of the entries along the main diagonal of A.
The derivative of z = f(A) is the matrix

\frac{\partial z}{\partial A} = B

whose entries are

B_{mn} = \frac{\partial z}{\partial A_{mn}}.
Exercise 34.5 Show that the derivative of trace (A) is B = I, the identity
matrix.
Exercise 34.6 Show that the derivative of z = trace (DAC) with respect
to A is
\frac{\partial z}{\partial A} = D^T C^T.   (34.1)
Consider the function f defined for all J by J positive-definite symmetric matrices by

f(Q) = -\log \det(Q).   (34.2)

Proposition 34.1 The gradient of f(Q) is g(Q) = -Q^{-1}.
Proof: Let ∆Q be symmetric. Let γj , for j = 1, 2, ..., J, be the eigenvalues
of the symmetric matrix Q−1/2 (∆Q)Q−1/2 . These γj are then real and
are also the eigenvalues of the matrix Q^{-1}(∆Q). We shall consider \|\Delta Q\|
small, so we may safely assume that 1 + γj > 0.
Note that

\langle Q^{-1}, \Delta Q \rangle = \sum_{j=1}^{J} \gamma_j,
since the trace of any square matrix is the sum of its eigenvalues. Then we
have
f(Q + \Delta Q) - f(Q) = -\log \det(Q + \Delta Q) + \log \det(Q)
= -\log \det(I + Q^{-1}(\Delta Q)) = -\sum_{j=1}^{J} \log(1 + \gamma_j).
From the submultiplicativity of the Frobenius norm we have

\|Q^{-1}(\Delta Q)\| / \|Q^{-1}\| \le \|\Delta Q\| \le \|Q^{-1}(\Delta Q)\|\, \|Q\|.

Therefore, taking the limit as \|\Delta Q\| goes to zero is equivalent to taking
the limit as \|\gamma\| goes to zero, where γ is the vector whose entries are the γ_j.
To show that g(Q) = -Q^{-1}, note that

\limsup_{\|\Delta Q\| \to 0} \frac{f(Q + \Delta Q) - f(Q) - \langle -Q^{-1}, \Delta Q \rangle}{\|\Delta Q\|}
= \limsup_{\|\Delta Q\| \to 0} \frac{|-\log \det(Q + \Delta Q) + \log \det(Q) + \langle Q^{-1}, \Delta Q \rangle|}{\|\Delta Q\|}
\le \limsup_{\|\gamma\| \to 0} \frac{\sum_{j=1}^{J} |\log(1 + \gamma_j) - \gamma_j|}{\|\gamma\| / \|Q^{-1}\|}
\le \|Q^{-1}\| \sum_{j=1}^{J} \lim_{\gamma_j \to 0} \frac{\gamma_j - \log(1 + \gamma_j)}{|\gamma_j|} = 0.
We note in passing that the derivative of det(DAC) with respect to A
is the matrix det(DAC)(A−1 )T .
Although the trace is not independent of the order of the matrices in a
product, it is independent of cyclic permutation of the factors:
trace (ABC) = trace (CAB) = trace (BCA).
Therefore, the trace is independent of the order for the product of two
matrices:
trace (AB) = trace (BA).
From this fact we conclude that
xT x = trace (xT x) = trace (xxT ).
If x is a random vector with correlation matrix
R = E(xxT ),
then
E(xT x) = E(trace (xxT )) = trace (E(xxT )) = trace (R).
We shall use this trick in the chapter on detection.
Exercise 34.7 Let z = trace (AT CA). Show that the derivative of z with
respect to the matrix A is
\frac{\partial z}{\partial A} = CA + C^T A.   (34.3)
Therefore, if C = Q is symmetric, then the derivative is 2QA.
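The same kind of finite-difference check works for Exercise 34.7; again the matrices below are arbitrary random test data.

import numpy as np

rng = np.random.default_rng(7)
C = rng.standard_normal((4, 4))
A = rng.standard_normal((4, 3))

grad = C @ A + C.T @ A                         # claimed derivative of trace(A^T C A)
z = lambda X: np.trace(X.T @ C @ X)
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (z(A + E) - z(A - E)) / (2 * eps)
print(np.allclose(grad, numeric, atol=1e-4))   # True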
We have restricted the discussion here to real matrices and vectors. It
often happens that we want to optimize a real quantity with respect to a
complex vector. We can rewrite such quantities in terms of the real and
imaginary parts of the complex values involved, to reduce everything to
the real case just considered. For example, let Q be a Hermitian matrix;
then the quadratic form k† Qk is real, for any complex vector k. As we saw
in Exercise 33.2, we can write the quadratic form entirely in terms of real
matrices and vectors.
If w = u + iv is a complex number with real part u and imaginary part
v, the function z = f (w) = |w|2 is real-valued. The derivative of z = f (w)
with respect to the complex variable w does not exist. When we write
z = u2 + v 2 , we consider z as a function of the real vector x = (u, v)T . The
derivative of z with respect to x is the vector (2u, 2v)T .
Similarly, when we consider the real quadratic form k† Qk, we view each
of the complex entries of the N by 1 vector k as two real numbers forming a
two-dimensional real vector. We then differentiate the quadratic form with
respect to the 2N by 1 real vector formed from these real and imaginary
parts. If we turn the resulting 2N by 1 real vector back into an N by
1 complex vector, we get 2Qk as the derivative; so, it appears as if the
formula for differentiating in the real case carries over to the complex case.
34.5 Eigenvectors and Optimization
We can use these results concerning differentiation with respect to a vector
to show that eigenvectors solve certain optimization problems.
Consider the problem of maximizing the quadratic form x† Qx, subject
to x† x = 1; here the matrix Q is Hermitian, positive-definite, so that all
of its eigenvalues are positive. We use the Lagrange-multiplier approach,
with the Lagrangian
L(x, λ) = x† Qx − λx† x,
where the scalar variable λ is the Lagrange multiplier. We differentiate
L(x, λ) with respect to x and set the result equal to zero, obtaining
2Qx − 2λx = 0,
or
Qx = λx.
Therefore, x is an eigenvector of Q and λ is its eigenvalue. Since
x† Qx = λx† x = λ,
we conclude that λ = λ1 , the largest eigenvalue of Q, and x = u1 , a
norm-one eigenvector associated with λ1 .
Now consider the problem of maximizing x† Qx, subject to x† x = 1,
and x† u1 = 0. The Lagrangian is now
L(x, λ, α) = x† Qx − λx† x − αx† u1 .
Differentiating with respect to the vector x and setting the result equal to
zero, we find that
2Qx − 2λx − αu1 = 0,
or
Qx = λx + βu1 ,
for β = α/2. But, we know that
(u1 )† Qx = λ(u1 )† x + β(u1 )† u1 = β,
and
(u1 )† Qx = (Qu1 )† x = λ1 (u1 )† x = 0,
so β = 0 and we have
Qx = λx.
Since
x† Qx = λ,
we conclude that x is a norm-one eigenvector of Q associated with the
second-largest eigenvalue, λ = λ2 .
Continuing in this fashion, we can show that the norm-one eigenvector
of Q associated with the nth largest eigenvalue λn maximizes the quadratic
form x† Qx, subject to the constraints x† x = 1 and x† um = 0, for m =
1, 2, ..., n − 1.
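Numerically, the conclusion is easy to observe (a sketch with a random Hermitian test matrix): no unit vector gives a larger value of x†Qx than the largest eigenvalue, and that value is attained at the corresponding eigenvector.

import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Q = X.conj().T @ X                          # Hermitian, positive-definite test matrix

lam, U = np.linalg.eigh(Q)                  # eigenvalues in ascending order
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
x = x / np.linalg.norm(x)                   # an arbitrary unit vector
print(np.real(x.conj() @ Q @ x) <= lam[-1] + 1e-10)          # True
u1 = U[:, -1]                               # norm-one eigenvector for the largest eigenvalue
print(np.isclose(np.real(u1.conj() @ Q @ u1), lam[-1]))      # True: the maximum is attained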
Chapter 35
Appendix: Compressed Sensing
35.1 Chapter Summary
One area that has attracted much attention lately is compressed sensing or
compressed sampling (CS) [101]. For applications such as medical imaging,
CS may provide a means of reducing radiation dosage to the patient without
sacrificing image quality. An important aspect of CS is finding sparse
solutions of under-determined systems of linear equations, which can often
be accomplished by one-norm minimization. The best reference to date is
probably [27].
35.2 Compressed Sensing
The objective in CS is to exploit sparseness to reconstruct a vector f in R^J
from relatively few linear functional measurements [101].
Let U = {u1 , u2 , ..., uJ } and V = {v 1 , v 2 , ..., v J } be two orthonormal
bases for RJ , with all members of RJ represented as column vectors. For
i = 1, 2, ..., J, let
\mu_i = \max_{1 \le j \le J} |\langle u^i, v^j \rangle|

and

\mu(U, V) = \max_{1 \le i \le J} \mu_i.
We know from Cauchy's Inequality that

|\langle u^i, v^j \rangle| \le 1,

and from Parseval's Equation

\sum_{j=1}^{J} |\langle u^i, v^j \rangle|^2 = \|u^i\|^2 = 1.

Therefore, we have

\frac{1}{\sqrt{J}} \le \mu(U, V) \le 1.
The quantity µ(U, V) is the coherence measure of the two bases; the closer µ(U, V) is
to the lower bound of 1/\sqrt{J}, the more incoherent the two bases are.
Let f be a fixed member of RJ ; we expand f in the V basis as
f = x1 v 1 + x2 v 2 + ... + xJ v J .
We say that the coefficient vector x = (x1 , ..., xJ ) is S-sparse if S is the
number of non-zero xj .
If S is small, most of the xj are zero, but since we do not know which
ones these are, we would have to compute all the linear functional values
x_j = ⟨f, v^j⟩
to recover f exactly. In fact, the smaller S is, the harder it would be to
learn anything from randomly selected xj , since most would be zero. The
idea in CS is to obtain measurements of f with members of a different
orthonormal basis, which we call the U basis. If the members of U are very
much like the members of V , then nothing is gained. But, if the members of
U are quite unlike the members of V , then each inner product measurement
y_i = ⟨f, u^i⟩ = f^T u^i
should tell us something about f . If the two bases are sufficiently incoherent, then relatively few yi values should tell us quite a bit about f .
Specifically, we have the following result due to Candès and Romberg [64]:
suppose the coefficient vector x for representing f in the V basis is S-sparse.
Select uniformly randomly M ≤ J members of the U basis and compute
the measurements y_i = ⟨f, u^i⟩. Then, if M is sufficiently large, it is highly
probable that z = x also solves the problem of minimizing the one-norm
||z||1 = |z1 | + |z2 | + ... + |zJ |,
subject to the conditions
y_i = ⟨g, u^i⟩ = g^T u^i,
for those M randomly selected ui , where
g = z1 v 1 + z2 v 2 + ... + zJ v J .
The smaller µ(U, V ) is, the smaller the M is permitted to be without
reducing the probability of perfect reconstruction.
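As a concrete (if complex-valued) illustration of the coherence measure, the standard basis and the normalized DFT basis attain the lower bound 1/√J; the sketch below simply evaluates µ(U, V) numerically for that hypothetical pair.

import numpy as np

J = 16
U = np.eye(J)                                # the standard (sampling) basis
V = np.fft.fft(np.eye(J)) / np.sqrt(J)       # unitary DFT matrix; columns are orthonormal

mu = np.max(np.abs(U.conj().T @ V))          # max over i, j of |<u^i, v^j>|
print(mu, 1 / np.sqrt(J))                    # both equal 0.25: maximally incoherent bases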
35.3 Sparse Solutions
Suppose that A is a real M by N matrix, with M < N , and that the linear
system Ax = b has infinitely many solutions. For any vector x, we define
the support of x to be the subset S of {1, 2, ..., N } consisting of those n
for which the entries xn ≠ 0. For any under-determined system Ax = b,
there will, of course, be at least one solution of minimum support, that is,
for which |S|, the size of the support set S, is minimum. However, finding
such a maximally sparse solution requires combinatorial optimization, and
is known to be computationally difficult. It is important, therefore, to have
a computationally tractable method for finding maximally sparse solutions.
35.3.1 Maximally Sparse Solutions
Consider the problem P0 : among all solutions x of the consistent system
b = Ax, find one, call it x̂, that is maximally sparse, that is, has the
minimum number of non-zero entries. Obviously, there will be at least one such
solution having minimal support; finding one, however, is a combinatorial
optimization problem and is generally NP-hard.
35.3.2 Minimum One-Norm Solutions
Instead, we can seek a minimum one-norm solution, that is, we can solve
the problem P1 : minimize
||x||_1 = \sum_{n=1}^{N} |x_n|,
subject to Ax = b. Denote the solution by x∗ . Problem P1 can be formulated as a linear programming problem, so is more easily solved. The
big questions are: when does P1 have a unique solution x∗ , and when is
x∗ = x̂? The problem P1 will have a unique solution if and only if A is
such that the one-norm satisfies
||x∗ ||1 < ||x∗ + v||1 ,
for all non-zero v in the null space of A.
35.3.3 Minimum One-Norm as an LP Problem
The entries of x need not be non-negative, so the problem is not yet a linear
programming problem. Let
B = [A \;\; -A],

and consider the linear programming problem of minimizing the function

c^T z = \sum_{j=1}^{2J} z_j,
subject to the constraints z ≥ 0, and Bz = b. Let z ∗ be the solution. We
write
z^* = \begin{bmatrix} u^* \\ v^* \end{bmatrix}.
Then, as we shall see, x∗ = u∗ − v ∗ minimizes the one-norm, subject to
Ax = b.
First, we show that u∗j vj∗ = 0, for each j. If, say, there is a j such that
0 < vj∗ < u∗j , then we can create a new vector z by replacing the old u∗j
with u∗j − vj∗ and the old vj∗ with zero, while maintaining Bz = b. But then,
since u∗j −vj∗ < u∗j +vj∗ , it follows that cT z < cT z ∗ , which is a contradiction.
Consequently, we have ||x^*||_1 = c^T z^*.
Now we select any x with Ax = b. Write uj = xj , if xj ≥ 0, and uj = 0,
otherwise. Let vj = uj − xj , so that x = u − v. Then let
z = \begin{bmatrix} u \\ v \end{bmatrix}.
Then b = Ax = Bz, and c^T z = ||x||_1. Consequently,

||x^*||_1 = c^T z^* \le c^T z = ||x||_1,
and x∗ must be a minimum one-norm solution.
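The construction above translates directly into a few lines of Python; the sketch below uses scipy's linprog on a random system with a planted sparse solution (all the data here is hypothetical test data).

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(9)
M, N = 10, 30
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[2, 11, 25]] = [1.5, -2.0, 0.7]       # a sparse vector, so Ax = b is consistent
b = A @ x_true

B = np.hstack([A, -A])                       # B = [A  -A]
c = np.ones(2 * N)                           # minimize the sum of the entries of z
res = linprog(c, A_eq=B, b_eq=b, bounds=(0, None), method="highs")
x_star = res.x[:N] - res.x[N:]               # x* = u* - v*
print(np.allclose(A @ x_star, b), np.linalg.norm(x_star, 1))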
35.3.4 Why the One-Norm?
When a system of linear equations Ax = b is under-determined, we can
find the minimum-two-norm solution, the one that minimizes the square of the two-norm,

||x||_2^2 = \sum_{n=1}^{N} x_n^2,
subject to Ax = b. One drawback to this approach is that the two-norm
penalizes relatively large values of xn much more than the smaller ones,
so tends to provide non-sparse solutions. Alternatively, we may seek the
solution for which the one-norm,
||x||_1 = \sum_{n=1}^{N} |x_n|,
is minimized. The one-norm still penalizes relatively large entries xn more
than the smaller ones, but much less than the two-norm does. As a result,
it often happens that the minimum one-norm solution actually solves P0
as well.
35.3.5 Comparison with the PDFT
The PDFT approach to solving the under-determined system Ax = b is to
select weights wn > 0 and then to find the solution x̃ that minimizes the
weighted two-norm given by
\sum_{n=1}^{N} |x_n|^2 w_n.
Our intention is to select weights w_n so that w_n^{-1} is reasonably close to
|x^*_n|; consider, therefore, what happens when w_n^{-1} = |x^*_n|. We claim that x̃
is also a minimum-one-norm solution.
To see why this is true, note that, for any x, we have
\sum_{n=1}^{N} |x_n| = \sum_{n=1}^{N} \frac{|x_n|}{\sqrt{|x_n^*|}} \sqrt{|x_n^*|}
\le \sqrt{\sum_{n=1}^{N} \frac{|x_n|^2}{|x_n^*|}}\; \sqrt{\sum_{n=1}^{N} |x_n^*|}.

Therefore,

\sum_{n=1}^{N} |\tilde{x}_n| \le \sqrt{\sum_{n=1}^{N} \frac{|\tilde{x}_n|^2}{|x_n^*|}}\; \sqrt{\sum_{n=1}^{N} |x_n^*|}
\le \sqrt{\sum_{n=1}^{N} \frac{|x_n^*|^2}{|x_n^*|}}\; \sqrt{\sum_{n=1}^{N} |x_n^*|} = \sum_{n=1}^{N} |x_n^*|.
Therefore, x̃ also minimizes the one-norm.
35.3.6 Iterative Reweighting
Let x be the truth. Generally, we want each weight wn to be a good
prior estimate of the reciprocal of |xn |. Because we do not yet know x,
we may take a sequential-optimization approach, beginning with weights
w_n^0 > 0, finding the PDFT solution using these weights, then using this
PDFT solution to get a (we hope!) better choice for the weights, and so
on. This sequential approach was successfully implemented in the early
1980’s by Michael Fiddy and his students [111].
In [65], the same approach is taken, but with respect to the one-norm.
Since the one-norm still penalizes larger values disproportionately, balance
can be achieved by minimizing a weighted-one-norm, with weights close to
the reciprocals of the |xn |. Again, not yet knowing x, they employ a sequential approach, using the previous minimum-weighted-one-norm solution to
obtain the new set of weights for the next minimization. At each step of
the sequential procedure, the previous reconstruction is used to estimate
the true support of the desired solution.
It is interesting to note that an on-going debate among users of the
PDFT concerns the nature of the prior weighting. Does wn approximate
|xn |−1 or |xn |−2 ? This is close to the issue treated in [65], the use of a
weight in the minimum-one-norm approach.
It should be noted again that finding a sparse solution is not usually
the goal in the use of the PDFT, but the use of the weights has much the
same effect as using the one-norm to find sparse solutions: to the extent
that the weights approximate the entries of x̂, their use reduces the penalty
associated with the larger entries of an estimated solution.
35.4 Why Sparseness?
One obvious reason for wanting sparse solutions of Ax = b is that we have
prior knowledge that the desired solution is sparse. Such a problem arises
in signal analysis from Fourier-transform data. In other cases, such as in
the reconstruction of locally constant signals, it is not the signal itself, but
its discrete derivative, that is sparse.
35.4.1 Signal Analysis
Suppose that our signal f (t) is known to consist of a small number of
complex exponentials, so that f (t) has the form
f(t) = \sum_{j=1}^{J} a_j e^{i\omega_j t},
for some small number of frequencies ωj in the interval [0, 2π). For n =
0, 1, ..., N − 1, let fn = f (n), and let f be the N -vector with entries fn ;
we assume that J is much smaller than N . The discrete (vector) Fourier
transform of f is the vector fˆ having the entries
\hat{f}_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f_n e^{2\pi i k n / N},
for k = 0, 1, ..., N−1; we write \hat{f} = Ef, where E is the N by N matrix with
entries E_{kn} = \frac{1}{\sqrt{N}} e^{2\pi i k n / N}. If N is large enough, we may safely assume
that each of the ω_j is equal to one of the frequencies 2πk/N and that the
vector fˆ is J-sparse. The question now is: How many values of f (n) do we
need to calculate in order to be sure that we can recapture f (t) exactly?
We have the following theorem [63]:
Theorem 35.1 Let N be prime. Let S be any subset of {0, 1, ..., N − 1}
with |S| ≥ 2J. Then the vector fˆ can be uniquely determined from the
measurements fn for n in S.
We know that
f = E † fˆ,
where E † is the conjugate transpose of the matrix E. The point here is
that, for any matrix R obtained from the identity matrix I by deleting
N − |S| rows, we can recover the vector fˆ from the measurements Rf .
If N is not prime, then the assertion of the theorem may not hold, since
we can have n = 0 mod N , without n = 0. However, the assertion remains
valid for most sets of J frequencies and most subsets S of indices; therefore,
with high probability, we can recover the vector fˆ from Rf .
Note that the matrix E is unitary, that is, E † E = I, and, equivalently,
the columns of E form an orthonormal basis for C N . The data vector is
b = Rf = RE † fˆ.
In this example, the vector f is not sparse, but can be represented sparsely
in a particular orthonormal basis, namely as f = E † fˆ, using a sparse vector
fˆ of coefficients. The representing basis then consists of the columns of the
matrix E † . The measurements pertaining to the vector f are the values
fn , for n in S. Since fn can be viewed as the inner product of f with δ n ,
the nth column of the identity matrix I, that is,
f_n = ⟨δ^n, f⟩,
the columns of I provide the so-called sampling basis. With A = RE † and
x = fˆ, we then have
Ax = b,
with the vector x sparse. It is important for what follows to note that the
matrix A is random, in the sense that we choose which rows of I to use to
form R.
35.4.2 Locally Constant Signals
Suppose now that the function f (t) is locally constant, consisting of some
number of horizontal lines. We discretize the function f (t) to get the
vector f = (f (0), f (1), ..., f (N ))T . The discrete derivative vector is g =
(g1 , g2 , ..., gN )T , with
gn = f (n) − f (n − 1).
Since f (t) is locally constant, the vector g is sparse. The data we will have
will not typically be values f (n). The goal will be to recover f from M
linear functional values pertaining to f , where M is much smaller than N .
We shall assume, from now on, that we have measured, or can estimate,
the value f (0).
Our M by 1 data vector d consists of measurements pertaining to the
vector f :
d_m = \sum_{n=0}^{N} H_{mn} f_n,
for m = 1, ..., M , where the Hmn are known. We can then write
d_m = f(0) \sum_{n=0}^{N} H_{mn} + \sum_{k=1}^{N} \sum_{j=k}^{N} H_{mj}\, g_k.
Since f (0) is known, we can write
b_m = d_m - f(0) \sum_{n=0}^{N} H_{mn} = \sum_{k=1}^{N} A_{mk} g_k,

where

A_{mk} = \sum_{j=k}^{N} H_{mj}.
The problem is then to find the sparse vector g satisfying Ag = b. As in the previous
example, we often have the freedom to select the linear functions, that is,
the values Hmn , so the matrix A can be viewed as random.
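The change of variables just described is simple to set up numerically; the following sketch (with an arbitrary random H and a made-up locally constant f) builds A from the partial sums of H and confirms that b = Ag.

import numpy as np

rng = np.random.default_rng(10)
M, N = 5, 20
H = rng.standard_normal((M, N + 1))                   # columns indexed n = 0, ..., N

f = np.repeat([1.0, 3.0, -2.0, 0.5], (N + 1) // 4 + 1)[:N + 1]   # locally constant signal
g = np.diff(f)                                        # g_k = f(k) - f(k-1), k = 1, ..., N
d = H @ f

A = np.cumsum(H[:, ::-1], axis=1)[:, ::-1][:, 1:]     # A[:, k-1] = sum over j >= k of H[:, j]
b = d - f[0] * H.sum(axis=1)
print(np.allclose(A @ g, b))                          # True: b = A g, with g sparse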
35.4.3 Tomographic Imaging
The reconstruction of tomographic images is an important aspect of medical diagnosis, and one that combines aspects of both of the previous examples. The data one obtains from the scanning process can often be
interpreted as values of the Fourier transform of the desired image; this is
precisely the case in magnetic-resonance imaging, and approximately true
for x-ray transmission tomography, positron-emission tomography (PET)
and single-photon emission tomography (SPECT). The images one encounters in medical diagnosis are often approximately locally constant, so the
associated array of discrete partial derivatives will be sparse. If this sparse
derivative array can be recovered from relatively few Fourier-transform values, then the scanning time can be reduced.
We turn now to the more general problem of compressed sampling.
35.5 Compressed Sampling
Our goal is to recover the vector f = (f1 , ..., fN )T from M linear functional
values of f , where M is much less than N . In general, this is not possible
without prior information about the vector f . In compressed sampling,
the prior information concerns the sparseness of either f itself, or another
vector linearly related to f .
Let U and V be unitary N by N matrices, so that the column vectors
of both U and V form orthonormal bases for C N . We shall refer to the
bases associated with U and V as the sampling basis and the representing
basis, respectively. The first objective is to find a unitary matrix V so that
f = V x, where x is sparse. Then we want to find a second unitary matrix
U such that, when an M by N matrix R is obtained from U by deleting
rows, the sparse vector x can be determined from the data b = RV x = Ax.
Theorems in compressed sensing describe properties of the matrices U and
V such that, when R is obtained from U by a random selection of the rows
of U , the vector x will be uniquely determined, with high probability, as
the unique solution that minimizes the one-norm.
Chapter 36
Appendix: Transmission Tomography I
36.1 Chapter Summary
Our topic is now transmission tomography. This chapter will provide a
detailed description of how the data is gathered, the mathematical model
of the scanning process, and the problem to be solved. In the next chapter
we shall study the various mathematical techniques needed to solve this
problem and the manner in which these techniques are applied, including
filtering methods for inverting the two-dimensional Fourier transform.
36.2 X-ray Transmission Tomography
Although transmission tomography is not limited to scanning living beings,
we shall concentrate here on the use of x-ray tomography in medical diagnosis and the issues that concern us in that application. The mathematical
formulation will, of course, apply more generally.
In x-ray tomography, x-rays are transmitted through the body along
many lines. In some, but not all, cases, the lines will all lie in the same
plane. The strength of the x-rays upon entering the body is assumed
known, and the strength upon leaving the body is measured. This data can
then be used to estimate the amount of attenuation the x-ray encountered
along that line, which is taken to be the integral, along that line, of the
attenuation function. On the basis of these line integrals, we estimate the
attenuation function. This estimate is presented to the physician as one or
more two-dimensional images.
36.3 The Exponential-Decay Model
As an x-ray beam passes through the body, it encounters various types of
matter, such as soft tissue, bone, ligaments, air, each weakening the beam
to a greater or lesser extent. If the intensity of the beam upon entry is Iin
and Iout is its lower intensity after passing through the body, then
I_{out} = I_{in}\, e^{-\int_L f},

where f = f(x, y) ≥ 0 is the attenuation function describing the two-dimensional
distribution of matter within the slice of the body being scanned and \int_L f is
the integral of the function f over the line L along which the
x-ray beam has passed. To see why this is the case, imagine the line L
parameterized by the variable s and consider the intensity function I(s)
as a function of s. For small ∆s > 0, the drop in intensity from the start
to the end of the interval [s, s + ∆s] is approximately proportional to the
intensity I(s), to the attenuation f (s) and to ∆s, the length of the interval;
that is,
I(s) − I(s + ∆s) ≈ f (s)I(s)∆s.
Dividing by ∆s and letting ∆s approach zero, we get
I 0 (s) = −f (s)I(s).
Exercise 36.1 Show that the solution to this differential equation is
I(s) = I(0) \exp\Big(-\int_{u=0}^{u=s} f(u)\, du\Big).
Hint: Use an integrating factor.
From knowledge of I_{in} and I_{out}, we can determine \int_L f. If we know \int_L f
for every line in the x, y-plane we can reconstruct the attenuation function
f . In the real world we know line integrals only approximately and only
for finitely many lines. The goal in x-ray transmission tomography is to
estimate the attenuation function f (x, y) in the slice, from finitely many
noisy measurements of the line integrals. We usually have prior information about the values that f (x, y) can take on. We also expect to find
sharp boundaries separating regions where the function f (x, y) varies only
slightly. Therefore, we need algorithms capable of providing such images.
36.4 Difficulties to be Overcome
There are several problems associated with this model. X-ray beams are
not exactly straight lines; the beams tend to spread out. The x-rays are not
monochromatic, and their various frequency components are attenuated at
different rates, resulting in beam hardening, that is, changes in the spectrum
of the beam as it passes through the object. The beams consist of photons
obeying statistical laws, so our algorithms probably should be based on
these laws. How we choose the line segments is determined by the nature
of the problem; in certain cases we are somewhat limited in our choice
of these segments. Patients move; they breathe, their hearts beat, and,
occasionally, they shift position during the scan. Compensating for these
motions is an important, and difficult, aspect of the image reconstruction
process. Finally, to be practical in a clinical setting, the processing that
leads to the reconstructed image must be completed in a short time, usually
around fifteen minutes. This time constraint is what motivates viewing
the three-dimensional attenuation function in terms of its two-dimensional
slices.
As we shall see, the Fourier transform and the associated theory of convolution filters play important roles in the reconstruction of transmission
tomographic images.
The data we actually obtain at the detectors are counts of detected
photons. These counts are not the line integrals; they are random quantities whose means, or expected values, are related to the line integrals.
The Fourier inversion methods for solving the problem ignore its statistical
aspects; in contrast, other methods, such as likelihood maximization, are
based on a statistical model that involves Poisson-distributed emissions.
36.5 Reconstruction from Line Integrals
We turn now to the underlying problem of reconstructing attenuation functions from line-integral data.
36.5.1 The Radon Transform
Our goal is to reconstruct the function f (x, y) ≥ 0 from line-integral data.
Let θ be a fixed angle in the interval [0, π). Form the t, s-axis system with
the positive t-axis making the angle θ with the positive x-axis, as shown
in Figure 36.1. Each point (x, y) in the original coordinate system has
coordinates (t, s) in the second system, where the t and s are given by
t = x cos θ + y sin θ,
and
s = −x sin θ + y cos θ.
If we have the new coordinates (t, s) of a point, the old coordinates are
(x, y) given by
x = t cos θ − s sin θ,
and
y = t sin θ + s cos θ.
We can then write the function f as a function of the variables t and s.
For each fixed value of t, we compute the integral
\int_L f(x, y)\, ds = \int f(t\cos\theta - s\sin\theta,\; t\sin\theta + s\cos\theta)\, ds
along the single line L corresponding to the fixed values of θ and t. We
repeat this process for every value of t and then change the angle θ and
repeat again. In this way we obtain the integrals of f over every line L in
the plane. We denote by rf (θ, t) the integral
r_f(\theta, t) = \int_L f(x, y)\, ds = \int f(t\cos\theta - s\sin\theta,\; t\sin\theta + s\cos\theta)\, ds.   (36.1)
The function rf (θ, t) is called the Radon transform of f .
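A rough discrete version of the Radon transform can be computed by rotating the image and summing along one axis; the sketch below (using a simple square phantom, not an example from the text) produces a sinogram whose columns approximate rf(θ, t) for a set of angles.

import numpy as np
from scipy.ndimage import rotate

f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0                        # a simple square phantom

angles = np.linspace(0.0, 180.0, 60, endpoint=False)
sinogram = np.stack(
    [rotate(f, angle, reshape=False, order=1).sum(axis=0) for angle in angles],
    axis=1)                                  # sinogram[t, k] ~ r_f(theta_k, t)
print(sinogram.shape)                        # (64, 60)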
36.5.2 The Central Slice Theorem
For fixed θ the function rf (θ, t) is a function of the single real variable t;
let Rf (θ, ω) be its Fourier transform. Then
R_f(\theta, \omega) = \int r_f(\theta, t)\, e^{i\omega t}\, dt
= \int\!\!\int f(t\cos\theta - s\sin\theta,\; t\sin\theta + s\cos\theta)\, e^{i\omega t}\, ds\, dt
= \int\!\!\int f(x, y)\, e^{i\omega(x\cos\theta + y\sin\theta)}\, dx\, dy = F(\omega\cos\theta, \omega\sin\theta),
where F (ω cos θ, ω sin θ) is the two-dimensional Fourier transform of the
function f (x, y), evaluated at the point (ω cos θ, ω sin θ); this relationship
is called the Central Slice Theorem. For fixed θ, as we change the value
of ω, we obtain the values of the function F along the points of the line
making the angle θ with the horizontal axis. As θ varies in [0, π), we get all
the values of the function F . Once we have F , we can obtain f using the
formula for the two-dimensional inverse Fourier transform. We conclude
that we are able to determine f from its line integrals. As we shall see,
inverting the Fourier transform can be implemented by combinations of
frequency-domain filtering and back-projection.
Figure 36.1: The Radon transform of f at (t, θ) is the line integral of f along line L.
Chapter 37
Appendix: Transmission Tomography II
37.1 Chapter Summary
According to the Central Slice Theorem, if we have all the line integrals
through the attenuation function f (x, y) then we have the two-dimensional
Fourier transform of f (x, y). To get f (x, y) we need to invert the two-dimensional Fourier transform; that is the topic of this chapter.
37.2 Inverting the Fourier Transform
The Fourier-transform inversion formula for two-dimensional functions tells
us that the function f (x, y) can be obtained as
f(x, y) = \frac{1}{4\pi^2} \int\!\!\int F(u, v)\, e^{-i(xu+yv)}\, du\, dv.   (37.1)
We now derive alternative inversion formulas.
37.2.1 Back-Projection
For 0 ≤ θ < π and all real t, let h(θ, t) be any function of the variables θ
and t; for example, it could be the Radon transform. As with the Radon
transform, we imagine that each pair (θ, t) corresponds to one line through
the x, y-plane. For each fixed point (x, y) we assign to this point the average, over all θ, of the quantities h(θ, t) for every pair (θ, t) such that the
point (x, y) lies on the associated line. The summing process is integration
and the back-projection function at (x, y) is
BP_h(x, y) = \int_0^{\pi} h(\theta,\; x\cos\theta + y\sin\theta)\, d\theta.   (37.2)
The operation of back-projection will play an important role in what follows
in this chapter.
37.2.2 Ramp Filter, then Back-project
Expressing the double integral in Equation (37.1) in polar coordinates
(ω, θ), with ω ≥ 0, u = ω cos θ, and v = ω sin θ, we get
f(x, y) = \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{\infty} F(u, v)\, e^{-i(xu+yv)}\, \omega\, d\omega\, d\theta,

or

f(x, y) = \frac{1}{4\pi^2} \int_0^{\pi}\!\!\int_{-\infty}^{\infty} F(u, v)\, e^{-i(xu+yv)}\, |\omega|\, d\omega\, d\theta.
Now write
F (u, v) = F (ω cos θ, ω sin θ) = Rf (θ, ω),
where Rf (θ, ω) is the FT with respect to t of rf (θ, t), so that
\int_{-\infty}^{\infty} F(u, v)\, e^{-i(xu+yv)}\, |\omega|\, d\omega = \int_{-\infty}^{\infty} R_f(\theta, \omega)\, |\omega|\, e^{-i\omega t}\, d\omega.
The function gf (θ, t) defined for t = x cos θ + y sin θ by
g_f(\theta,\; x\cos\theta + y\sin\theta) = \frac{1}{2\pi} \int_{-\infty}^{\infty} R_f(\theta, \omega)\, |\omega|\, e^{-i\omega t}\, d\omega   (37.3)
is the result of a linear filtering of rf (θ, t) using a ramp filter with transfer
function H(ω) = |ω|. Then,
f(x, y) = \frac{1}{2\pi} BP_{g_f}(x, y) = \frac{1}{2\pi} \int_0^{\pi} g_f(\theta,\; x\cos\theta + y\sin\theta)\, d\theta   (37.4)
gives f (x, y) as the result of a back-projection operator; for every fixed value
of (θ, t) add gf (θ, t) to the current value at the point (x, y) for all (x, y)
lying on the straight line determined by θ and t by t = x cos θ + y sin θ.
The final value at a fixed point (x, y) is then the average of all the values
gf (θ, t) for those (θ, t) for which (x, y) is on the line t = x cos θ + y sin θ.
It is therefore said that f (x, y) can be obtained by filtered back-projection
(FBP) of the line-integral data.
Knowing that f (x, y) is related to the complete set of line integrals by
filtered back-projection suggests that, when only finitely many line integrals
are available, a similar ramp filtering and back-projection can be used to
estimate f (x, y); in the clinic this is the most widely used method for the
reconstruction of tomographic images.
37.2.3 Back-project, then Ramp Filter
There is a second way to recover f (x, y) using back-projection and filtering,
this time in the reverse order; that is, we back-project the Radon transform
and then ramp filter the resulting function of two variables. We begin with
the back-projection operation, as applied to the function h(θ, t) = rf (θ, t).
We have
BP_{r_f}(x, y) = \int_0^{\pi} r_f(\theta,\; x\cos\theta + y\sin\theta)\, d\theta.   (37.5)
Replacing rf (θ, t) with
r_f(\theta, t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} R_f(\theta, \omega)\, e^{-i\omega t}\, d\omega,
and inserting
Rf (θ, ω) = F (ω cos θ, ω sin θ),
and
t = x cos θ + y sin θ,
we get

BP_{r_f}(x, y) = \int_0^{\pi} \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega\cos\theta, \omega\sin\theta)\, e^{-i\omega(x\cos\theta + y\sin\theta)}\, d\omega\, d\theta.
With u = ω cos θ and v = ω sin θ, this becomes

BP_{r_f}(x, y) = \int_0^{\pi} \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{F(u, v)}{\sqrt{u^2 + v^2}}\, e^{-i(xu+yv)}\, |\omega|\, d\omega\, d\theta
= \int_0^{\pi} \frac{1}{2\pi} \int_{-\infty}^{\infty} G(u, v)\, e^{-i(xu+yv)}\, |\omega|\, d\omega\, d\theta
= \frac{1}{2\pi} \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} G(u, v)\, e^{-i(xu+yv)}\, du\, dv.
This tells us that the back-projection of rf (θ, t) is the function g(x, y) whose
two-dimensional Fourier transform is
G(u, v) = \frac{1}{2\pi}\, F(u, v) \big/ \sqrt{u^2 + v^2}.
Therefore, we can obtain f (x, y) from rf (θ, t) by first back-projecting rf (θ, t)
to get g(x, y) and then filtering g(x, y) by forming G(u, v), multiplying by
\sqrt{u^2 + v^2}, and taking the inverse Fourier transform.
37.2.4 Radon's Inversion Formula
To get Radon’s inversion formula, we need two basic properties of the
Fourier transform. First, if f (x) has Fourier transform F (γ) then the
derivative f 0 (x) has Fourier transform −iγF (γ). Second, if F (γ) = sgn(γ),
γ
for γ 6= 0, and equal to zero for γ = 0, then its
the function that is |γ|
1
inverse Fourier transform is f (x) = iπx
.
Writing equation (37.3) as

g_f(\theta, t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \omega R_f(\theta, \omega)\, \mathrm{sgn}(\omega)\, e^{-i\omega t}\, d\omega,

we see that g_f is the inverse Fourier transform of the product of the two functions
ωR_f(θ, ω) and sgn(ω). Consequently, g_f is the convolution of their individual
inverse Fourier transforms, i ∂/∂t r_f(θ, t) and 1/(iπt); that is,

g_f(\theta, t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\partial}{\partial t} r_f(\theta, s)\, \frac{1}{t - s}\, ds,

which is the Hilbert transform of the function ∂/∂t r_f(θ, t), with respect to the
variable t. Radon's inversion formula is then

f(x, y) = \frac{1}{2\pi} \int_0^{\pi} HT\Big(\frac{\partial}{\partial t} r_f(\theta, t)\Big)\, d\theta.
37.3 From Theory to Practice
What we have just described is the theory. What happens in practice?
37.3.1 The Practical Problems
Of course, in reality we never have the Radon transform rf (θ, t) for all
values of its variables. Only finitely many angles θ are used, and, for each
θ, we will have (approximate) values of line integrals for only finitely many
t. Therefore, taking the Fourier transform of rf (θ, t), as a function of
the single variable t, is not something we can actually do. At best, we can
approximate Rf (θ, ω) for finitely many θ. From the Central Slice Theorem,
we can then say that we have approximate values of F (ω cos θ, ω sin θ), for
finitely many θ. This means that we have (approximate) Fourier transform
values for f (x, y) along finitely many lines through the origin, like the
spokes of a wheel. The farther from the origin we get, the fewer values we
have, so the coverage in Fourier space is quite uneven. The low spatial frequencies are much better estimated than higher ones, meaning that we
have a low-pass version of the desired f (x, y). The filtered back-projection
approaches we have just discussed both involve ramp filtering, in which the
higher frequencies are increased, relative to the lower ones. This too can
only be implemented approximately, since the data is noisy and careless
ramp filtering will cause the reconstructed image to be unacceptably noisy.
37.3.2 A Practical Solution: Filtered Back-Projection
We assume, to begin with, that we have finitely many line integrals, that
is, we have values rf (θ, t) for finitely many θ and finitely many t. For
each fixed θ we estimate the Fourier transform, Rf (θ, ω). This step can
be performed in various ways, and we can freely choose the values of ω
at which we perform the estimation. The FFT will almost certainly be
involved in calculating the estimates of Rf (θ, ω).
For each fixed θ we multiply our estimated values of Rf (θ, ω) by |ω| and
then use the FFT again to inverse Fourier transform, to achieve a ramp
filtering of rf (θ, t) as a function of t. Note, however, that when |ω| is large,
we may multiply by a smaller quantity, to avoid enhancing noise. We do
this for each angle θ, to get a function of (θ, t), which we then back-project
to get our final image. This is ramp-filtering, followed by back-projection,
as applied to the finite data we have.
It is also possible to mimic the second approach to inversion, that is, to
back-project onto the pixels each rf (θ, t) that we have, and then to perform
a ramp filtering of this two-dimensional array of numbers to obtain the
final image. In this case, the two-dimensional ramp filtering involves many
applications of the FFT.
There is a third approach. Invoking the Central Slice Theorem, we can
say that we have finitely many approximate values of F (u, v), the Fourier
transform of the attenuation function f (x, y), along finitely many lines
through the origin. The first step is to use these values to estimate the
values of F (u, v) at the points of a rectangular grid. This step involves
interpolation [215, 219]. Once we have (approximate) values of F (u, v) on
a rectangular grid, we perform a two-dimensional FFT to obtain our final
estimate of the (discretized) f (x, y).
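A bare-bones sketch of the first approach (ramp filtering with the FFT, followed by back-projection) is given below. It assumes a sinogram laid out as in the Radon-transform sketch of the previous chapter, ignores the finer points of scaling and interpolation, and is meant only to show the structure of the computation.

import numpy as np
from scipy.ndimage import rotate

def fbp(sinogram, angles_deg):
    # sinogram[t, k] holds the line-integral data for angle angles_deg[k]
    n_t, n_ang = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_t))                     # discrete ramp filter |omega|
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * ramp[:, None], axis=0))
    recon = np.zeros((n_t, n_t))
    for k, angle in enumerate(angles_deg):
        smear = np.tile(filtered[:, k], (n_t, 1))          # constant along the beam direction
        recon += rotate(smear, -angle, reshape=False, order=1)   # back-project
    return recon * (np.pi / n_ang)                         # approximate the theta-average

# e.g. recon = fbp(sinogram, np.linspace(0.0, 180.0, 60, endpoint=False))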
37.4 Some Practical Concerns
As computer power increases and scanners become more sophisticated,
there is pressure to include more dimensionality in the scans. This means
going beyond slice-by-slice tomography to fully three-dimensional images,
or even including time as the fourth dimension, to image dynamically. This
increase in dimensionality comes at a cost, however [202]. Besides the increase in radiation to the patient, there are other drawbacks, such as longer
acquisition time, storing large amounts of data, processing and analyzing
this data, displaying the results, reading and understanding the higher-dimensional images, and so on.
37.5 Summary
We have seen how the problem of reconstructing a function from line integrals arises in transmission tomography. The Central Slice Theorem connects the line integrals and the Radon transform to the Fourier transform
of the desired attenuation function. Various approaches to implementing
the Fourier Inversion Formula lead to filtered back-projection algorithms
for the reconstruction. In x-ray tomography, as well as in PET, viewing the
data as line integrals ignores the statistical aspects of the problem, and in
SPECT, it ignores, as well, the important physical effects of attenuation.
To incorporate more of the physics of the problem, iterative algorithms
based on statistical models have been developed. We consider some of
these algorithms in the books [46] and [48].
Bibliography
[1] Agmon, S. (1954) “The relaxation method for linear inequalities.”Canadian Journal of Mathematics 6, pp. 382–392.
[2] Anderson, T. (1972) “Efficient estimation of regression coefficients
in time series.”Proc. of Sixth Berkeley Symposium on Mathematical
Statistics and Probability, Volume 1: The Theory of Statistics University of California Press, Berkeley, CA, pp. 471–482.
[3] Anderson, A. and Kak, A. (1984) “Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm.”Ultrasonic Imaging 6, pp. 81–94.
[4] Ash, R. and Gardner, M. (1975) Topics in Stochastic Processes Boston:
Academic Press.
[5] Axelsson, O. (1994) Iterative Solution Methods. Cambridge, UK:
Cambridge University Press.
[6] Baggeroer, A., Kuperman, W., and Schmidt, H. (1988) “Matched field
processing: source localization in correlated noise as optimum parameter estimation.”Journal of the Acoustical Society of America 83, pp.
571–587.
[7] Baillon, J. and Haddad, G. (1977) “Quelques proprietes des operateurs
angle-bornes et n-cycliquement monotones.”Israel J. of Mathematics
26, pp. 137–150.
[8] Barrett, H., White, T., and Parra, L. (1997) “List-mode likelihood.”J.
Opt. Soc. Am. A 14, pp. 2914–2923.
[9] Bauschke, H. (2001) “Projection algorithms: results and open problems.”in Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, Butnariu, D., Censor, Y., and Reich, S.,
editors, Amsterdam: Elsevier Science. pp. 11–22.
373
374
BIBLIOGRAPHY
[10] Bauschke, H. and Borwein, J. (1996) “On projection algorithms for
solving convex feasibility problems.”SIAM Review 38 (3), pp. 367–
426.
[11] Bauschke, H., Borwein, J., and Lewis, A. (1997) “The method of cyclic
projections for closed convex sets in Hilbert space.”Contemporary
Mathematics: Recent Developments in Optimization Theory and Nonlinear Analysis 204, American Mathematical Society, pp. 1–38.
[12] Benson, M. (2003) “What Galileo Saw.” in The New Yorker; reprinted
in [81].
[13] Bertero, M. (1992) “Sampling theory, resolution limits and inversion
methods.”in [15], pp. 71–94.
[14] Bertero, M., and Boccacci, P. (1998) Introduction to Inverse Problems
in Imaging Bristol, UK: Institute of Physics Publishing.
[15] Bertero, M. and Pike, E.R., editors (1992) Inverse Problems in Scattering and Imaging Malvern Physics Series, Adam Hilger, IOP Publishing, London.
[16] Bertsekas, D.P. (1997) “A new class of incremental gradient methods
for least squares problems.”SIAM J. Optim. 7, pp. 913–926.
[17] Blackman, R. and Tukey, J. (1959) The Measurement of Power Spectra. New York: Dover Publications.
[18] Boggess, A. and Narcowich, F. (2001) A First Course in Wavelets,
with Fourier Analysis. Englewood Cliffs, NJ: Prentice-Hall.
[19] Born, M. and Wolf, E. (1999) Principles of Optics: 7th edition. Cambridge, UK: Cambridge University Press.
[20] Bochner, S. and Chandrasekharan, K. (1949) Fourier Transforms, Annals of Mathematical Studies, No. 19. Princeton, NJ: Princeton University Press.
[21] Bolles, E.B. (1997) Galileo’s Commandment: 2,500 Years of Great
Science Writing. New York: W.H. Freeman.
[22] Borwein, J. and Lewis, A. (2000) Convex Analysis and Nonlinear Optimization. Canadian Mathematical Society Books in Mathematics,
New York: Springer-Verlag.
[23] Bracewell, R.C. (1979) “Image reconstruction in radio astronomy.” in
[128], pp. 81–104.
[24] Bregman, L.M. (1967) “The relaxation method of finding the common
point of convex sets and its application to the solution of problems in
convex programming.”USSR Computational Mathematics and Mathematical Physics 7: pp. 200–217.
[25] Brodzik, A. and Mooney, J. (1999) “Convex projections algorithm
for restoration of limited-angle chromotomographic images.”Journal
of the Optical Society of America A 16 (2), pp. 246–257.
[26] Browne, J. and De Pierro, A. (1996) “A row-action alternative to the EM algorithm for maximizing likelihoods in emission tomography.” IEEE Trans. Med. Imag. 15, pp. 687–699.
[27] Bruckstein, A., Donoho, D., and Elad, M. (2009) “From sparse solutions of systems of equations to sparse modeling of signals and images.”
SIAM Review, 51(1), pp. 34–81.
[28] Bruyant, P., Sau, J., and Mallet, J.J. (1999) “Noise removal using
factor analysis of dynamic structures: application to cardiac gated
studies.”Journal of Nuclear Medicine 40 (10), pp. 1676–1682.
[29] Bucker, H. (1976) “Use of calculated sound fields and matched field
detection to locate sound sources in shallow water.”Journal of the
Acoustical Society of America 59, pp. 368–373.
[30] Burg, J. (1967) “Maximum entropy spectral analysis.”paper presented
at the 37th Annual SEG meeting, Oklahoma City, OK.
[31] Burg, J. (1972) “The relationship between maximum entropy spectra
and maximum likelihood spectra.”Geophysics 37, pp. 375–376.
[32] Burg, J. (1975) Maximum Entropy Spectral Analysis, Ph.D. dissertation, Stanford University.
[33] Byrne, C. (1992) “Effects of modal phase errors on eigenvector and
nonlinear methods for source localization in matched field processing.”Journal of the Acoustical Society of America 92(4), pp. 2159–
2164.
[34] Byrne, C. (1993) “Iterative image reconstruction algorithms based on
cross-entropy minimization.”IEEE Transactions on Image Processing
IP-2, pp. 96–103.
[35] Byrne, C. (1995) “Erratum and addendum to ‘Iterative image reconstruction algorithms based on cross-entropy minimization’.”IEEE
Transactions on Image Processing IP-4, pp. 225–226.
[36] Byrne, C. (1996) “Iterative reconstruction algorithms based on crossentropy minimization.”in Image Models (and their Speech Model
Cousins), S.E. Levinson and L. Shepp, editors, IMA Volumes in
Mathematics and its Applications, Volume 80, pp. 1–11. New York:
Springer-Verlag.
[37] Byrne, C. (1996) “Block-iterative methods for image reconstruction
from projections.”IEEE Transactions on Image Processing IP-5, pp.
792–794.
[38] Byrne, C. (1997) “Convergent block-iterative algorithms for image
reconstruction from inconsistent data.”IEEE Transactions on Image
Processing IP-6, pp. 1296–1304.
[39] Byrne, C. (1998) “Accelerating the EMML algorithm and related iterative algorithms by rescaled block-iterative (RBI) methods.”IEEE
Transactions on Image Processing IP-7, pp. 100–109.
[40] Byrne, C. (1999) “Iterative projection onto convex sets using multiple
Bregman distances.”Inverse Problems 15, pp. 1295–1313.
[41] Byrne, C. (2000) “Block-iterative interior point optimization methods
for image reconstruction from limited data.”Inverse Problems 16, pp.
1405–1419.
[42] Byrne, C. (2001) “Bregman-Legendre multidistance projection algorithms for convex feasibility and optimization.”in Inherently Parallel
Algorithms in Feasibility and Optimization and their Applications,
Butnariu, D., Censor, Y., and Reich, S., editors, pp. 87–100. Amsterdam: Elsevier Publ.
[43] Byrne, C. (2001) “Likelihood maximization for list-mode emission
tomographic image reconstruction.”IEEE Transactions on Medical
Imaging 20(10), pp. 1084–1092.
[44] Byrne, C. (2002) “Iterative oblique projection onto convex sets and
the split feasibility problem.”Inverse Problems 18, pp. 441–453.
[45] Byrne, C. (2004) “A unified treatment of some iterative algorithms in
signal processing and image reconstruction.”Inverse Problems 20, pp.
103–120.
[46] Byrne, C. (2008) Applied Iterative Methods, Wellesley, MA: AK Peters,
Publ.
[47] Byrne, C. (2009) A First Course in Optimization, unpublished text
available at my web site.
[48] Byrne, C. (2009) Applied and Computational Linear Algebra: A First
Course, unpublished text available at my web site.
[49] Byrne, C., Brent, R., Feuillade, C., and DelBalzo, D (1990) “A stable
data-adaptive method for matched-field array processing in acoustic
waveguides.”Journal of the Acoustical Society of America 87(6), pp.
2493–2502.
[50] Byrne, C. and Censor, Y. (2001) “Proximity function minimization
using multiple Bregman projections, with applications to split feasibility and Kullback-Leibler distance minimization.”Annals of Operations
Research 105, pp. 77–98.
[51] Byrne, C. and Fiddy, M. (1987) “Estimation of continuous object
distributions from Fourier magnitude measurements.”JOSA A 4, pp.
412–417.
[52] Byrne, C. and Fiddy, M. (1988) “Images as power spectra; reconstruction as Wiener filter approximation.”Inverse Problems 4, pp. 399–409.
[53] Byrne, C. and Fitzgerald, R. (1979) “A unifying model for spectrum estimation.”in Proceedings of the RADC Workshop on Spectrum
Estimation- October 1979, Griffiss AFB, Rome, NY.
[54] Byrne, C. and Fitzgerald, R. (1982) “Reconstruction from partial information, with applications to tomography.”SIAM J. Applied Math.
42(4), pp. 933–940.
[55] Byrne, C., Fitzgerald, R., Fiddy, M., Hall, T. and Darling, A. (1983)
“Image restoration and resolution enhancement.”J. Opt. Soc. Amer.
73, pp. 1481–1487.
[56] Byrne, C. and Fitzgerald, R. (1984) “Spectral estimators that extend
the maximum entropy and maximum likelihood methods.”SIAM J.
Applied Math. 44(2), pp. 425–442.
[57] Byrne, C., Frichter, G., and Feuillade, C. (1990) “Sector-focused stability methods for robust source localization in matched-field processing.”Journal of the Acoustical Society of America 88(6), pp. 2843–
2851.
[58] Byrne, C., Haughton, D., and Jiang, T. (1993) “High-resolution inversion of the discrete Poisson and binomial transformations.”Inverse
Problems 9, pp. 39–56.
[59] Byrne, C., Levine, B.M., and Dainty, J.C. (1984) “Stable estimation
of the probability density function of intensity from photon frequency
counts.”JOSA Communications 1(11), pp. 1132–1135.
[60] Byrne, C., and Steele, A. (1985) “Stable nonlinear methods for sensor array processing.”IEEE Transactions on Oceanic Engineering OE10(3), pp. 255–259.
[61] Byrne, C., and Wells, D. (1983) “Limit of continuous and discrete
finite-band Gerchberg iterative spectrum extrapolation.”Optics Letters 8 (10), pp. 526–527.
[62] Byrne, C., and Wells, D. (1985) “Optimality of certain iterative and
non-iterative data extrapolation procedures.”Journal of Mathematical
Analysis and Applications 111 (1), pp. 26–34.
[63] Candès, E., Romberg, J., and Tao, T. (2006) “Robust uncertainty principles: Exact signal reconstruction from highly incomplete
frequency information.”IEEE Transactions on Information Theory,
52(2), pp. 489–509.
[64] Candès, E., and Romberg, J. (2007) “Sparsity and incoherence in compressive sampling.”Inverse Problems, 23(3), pp. 969–985.
[65] Candès, E., Wakin, M., and Boyd, S. (2007) “Enhancing
sparsity by reweighted l1 minimization.” preprint available at
http://www.acm.caltech.edu/~emmanuel/publications.html.
[66] Candy, J. (1988) Signal Processing: The Modern Approach New York:
McGraw-Hill Publ.
[67] Capon, J. (1969) “High-resolution frequency-wavenumber spectrum
analysis.”Proc. of the IEEE 57, pp. 1408–1418.
[68] Cederquist, J., Fienup, J., Wackerman, C., Robinson, S., and
Kryskowski, D. (1989) “Wave-front phase estimation from Fourier intensity measurements.”Journal of the Optical Society of America A
6(7), pp. 1020–1026.
[69] Censor, Y. (1981) “Row-action methods for huge and sparse systems
and their applications.”SIAM Review, 23: 444–464.
[70] Censor, Y. and Elfving, T. (1994) “A multiprojection algorithm using
Bregman projections in a product space.”Numerical Algorithms 8, pp.
221–239.
[71] Censor, Y., Eggermont, P.P.B., and Gordon, D. (1983) “Strong
underrelaxation in Kaczmarz’s method for inconsistent systems.”Numerische Mathematik 41, pp. 83–92.
[72] Censor, Y., Iusem, A.N. and Zenios, S.A. (1998) “An interior point
method with Bregman functions for the variational inequality problem
with paramonotone operators.”Mathematical Programming, 81, pp.
373–400.
[73] Censor, Y. and Segman, J. (1987) “On block-iterative maximization.”J. of Information and Optimization Sciences 8, pp. 275–291.
[74] Censor, Y. and Zenios, S.A. (1997) Parallel Optimization: Theory,
Algorithms and Applications. New York: Oxford University Press.
[75] Chang, J.-H., Anderson, J.M.M., and Votaw, J.R. (2004) “Regularized image reconstruction algorithms for positron emission tomography.”IEEE Transactions on Medical Imaging 23(9), pp. 1165–1175.
[76] Childers, D., editor (1978) Modern Spectral Analysis. New York:IEEE
Press.
[77] Christensen, O. (2003) An Introduction to Frames and Riesz Bases.
Boston: Birkhäuser.
[78] Chui, C. (1992) An Introduction to Wavelets. Boston: Academic Press.
[79] Chui, C. and Chen, G. (1991) Kalman Filtering, second edition. Berlin:
Springer-Verlag.
[80] Cimmino, G. (1938) “Calcolo approssimato per soluzioni dei sistemi
di equazioni lineari.”La Ricerca Scientifica XVI, Series II, Anno IX 1,
pp. 326–333.
[81] Cohen, J. (2010) (editor) The Best of The Best American Science
Writing, Harper-Collins Publ.
[82] Combettes, P. (1993) “The foundations of set theoretic estimation.”Proceedings of the IEEE 81 (2), pp. 182–208.
[83] Combettes, P. (1996) “The convex feasibility problem in image recovery.”Advances in Imaging and Electron Physics 95, pp. 155–270.
[84] Combettes, P. (2000) “Fejér monotonicity in convex optimization.”in
Encyclopedia of Optimization, C.A. Floudas and P. M. Pardalos, editors, Boston: Kluwer Publ.
[85] Combettes, P., and Trussell, J. (1990) “Method of successive projections for finding a common point of sets in a metric space.”Journal of
Optimization Theory and Applications 67 (3), pp. 487–507.
[86] Cooley, J. and Tukey, J. (1965) “An algorithm for the machine calculation of complex Fourier series.”Math. Comp., 19, pp. 297–301.
[87] Cox, H. (1973) “Resolving power and sensitivity to mismatch of optimum array processors.”Journal of the Acoustical Society of America
54, pp. 771–785.
[88] Csiszár, I. and Tusnády, G. (1984) “Information geometry and alternating minimization procedures.”Statistics and Decisions Supp. 1,
pp. 205–237.
[89] Csiszár, I. (1989) “A geometric interpretation of Darroch and Ratcliff’s generalized iterative scaling.”The Annals of Statistics 17 (3),
pp. 1409–1413.
[90] Csiszár, I. (1991) “Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems.”The Annals
of Statistics 19 (4), pp. 2032–2066.
[91] Dainty, J. C. and Fiddy, M. (1984) “The essential role of prior knowleldge in phase retrieval.”Optica Acta 31, pp. 325–330.
[92] Darroch, J. and Ratcliff, D. (1972) “Generalized iterative scaling for
log-linear models.”Annals of Mathematical Statistics 43, pp. 1470–
1480.
[93] Daubechies, I. (1988) “Orthogonal bases of compactly supported
wavelets.”Commun. Pure Appl. Math. 41, pp. 909–996.
[94] Daubechies, I. (1992) Ten Lectures on Wavelets. Philadelphia: Society
for Industrial and Applied Mathematics.
[95] De Bruijn, N. (1967) “Uncertainty principles in Fourier analysis.”in
Inequalties, O. Shisha, editor, pp. 57–71, Boston: Academic Press.
[96] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) “Maximum likelihood from incomplete data via the EM algorithm.”Journal of the
Royal Statistical Society, Series B 37, pp. 1–38.
[97] De Pierro, A. (1995) “A modified expectation maximization algorithm
for penalized likelihood estimation in emission tomography.”IEEE
Transactions on Medical Imaging 14, pp. 132–137.
[98] De Pierro, A. and Iusem, A. (1990) “On the asymptotic behaviour of
some alternate smoothing series expansion iterative methods.”Linear
Algebra and its Applications 130, pp. 3–24.
[99] Dhanantwari, A., Stergiopoulos, S., and Iakovidis, I. (2001) “Correcting organ motion artifacts in x-ray CT medical imaging systems by
adaptive processing. I. Theory.”Med. Phys. 28(8), pp. 1562–1576.
[100] Dolidze, Z.O. (1982) “Solution of variational inequalities associated
with a class of monotone maps.”Ekonomika i Matem. Metody 18 (5),
pp. 925–927 (in Russian).
[101] Donoho, D. (2006) “Compressed sampling” IEEE Transactions on
Information Theory, 52 (4). (download preprints at http://www.stat.stanford.edu/~donoho/Reports).
[102] Duda, R., Hart, P., and Stork, D. (2001) Pattern Classification, Wiley.
[103] Dugundji, J. (1970) Topology Boston: Allyn and Bacon, Inc.
[104] Eddington, A. (1927) “The story of Algol.”Stars and Atoms;
reprinted in [21].
[105] Eggermont, P.P.B., Herman, G.T., and Lent, A. (1981) “Iterative
algorithms for large partitioned linear systems, with applications to
image reconstruction.”Linear Algebra and its Applications 40, pp. 37–
67.
[106] Everitt, B. and Hand, D. (1981) Finite Mixture Distributions London:
Chapman and Hall.
[107] Feuillade, C., DelBalzo, D., and Rowe, M. (1989) “Environmental
mismatch in shallow-water matched-field processing: geoacoustic parameter variability.”Journal of the Acoustical Society of America 85,
pp. 2354–2364.
[108] Feynman, R. (1985) QED: The Strange Theory of Light and Matter.
Princeton, NJ: Princeton University Press.
[109] Feynman, R., Leighton, R., and Sands, M. (1963) The Feynman Lectures on Physics, Vol. 1. Boston: Addison-Wesley.
[110] Fiddy, M. (1983) “The phase retrieval problem.”in Inverse Optics,
SPIE Proceedings 413 (A.J. Devaney, editor), pp. 176–181.
[111] Fiddy, M. (2008) private communication.
[112] Fienup, J. (1979) “Space object imaging through the turbulent atmosphere.”Optical Engineering 18, pp. 529–534.
[113] Fienup, J. (1987) “Reconstruction of a complex-valued object
from the modulus of its Fourier transform using a support constraint.”Journal of the Optical Society of America A 4(1), pp. 118–
123.
[114] Frieden, B. R. (1982) Probability, Statistical Optics and Data Testing. Berlin: Springer-Verlag.
[115] Gabor, D. (1946) “Theory of communication.”Journal of the IEE
(London) 93, pp. 429–457.
[116] Gasquet, C. and Witomski, F. (1998) Fourier Analysis and Applications. Berlin: Springer-Verlag.
[117] Gelb, A., editor, (1974) Applied Optimal Estimation, written by the
technical staff of The Analytic Sciences Corporation, MIT Press, Cambridge, MA.
[118] Geman, S., and Geman, D. (1984) “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images.”IEEE Transactions
on Pattern Analysis and Machine Intelligence PAMI-6, pp. 721–741.
[119] Gerchberg, R. W. (1974) “Super-restoration through error energy
reduction.”Optica Acta 21, pp. 709–720.
[120] Golshtein, E., and Tretyakov, N. (1996) Modified Lagrangians and
Monotone Maps in Optimization. New York: John Wiley and Sons,
Inc.
[121] Gordon, R., Bender, R., and Herman, G.T. (1970) “Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy
and x-ray photography.”J. Theoret. Biol. 29, pp. 471–481.
[122] Green, P. (1990) “Bayesian reconstructions from emission tomography data using a modified EM algorithm.”IEEE Transactions on Medical Imaging 9, pp. 84–93.
[123] Groetsch, C. (1999) Inverse Problems: Activities for Undergraduates.
The Mathematical Association of America.
[124] Gubin, L.G., Polyak, B.T. and Raik, E.V. (1967) “The method of
projections for finding the common point of convex sets.”USSR Computational Mathematics and Mathematical Physics 7, pp. 1–24.
[125] Haacke, E., Brown, R., Thompson, M., and Venkatesan, R. (1999)
Magnetic Resonance Imaging. New York: Wiley-Liss.
[126] Haykin, S. (1985) Array Signal Processing. Englewood Cliffs, NJ:
Prentice-Hall.
[127] Hebert, T. and Leahy, R. (1989) “A generalized EM algorithm for 3-D
Bayesian reconstruction from Poisson data using Gibbs priors.”IEEE
Transactions on Medical Imaging 8, pp. 194–202.
[128] Herman, G.T. (ed.) (1979) “Image Reconstruction from Projections” , Topics in Applied Physics, Vol. 32, Springer-Verlag, Berlin.
[129] Herman, G.T. (1999) private communication.
[130] Herman, G. T. and Meyer, L. (1993) “Algebraic reconstruction techniques can be made computationally efficient.”IEEE Transactions on
Medical Imaging 12, pp. 600–609.
[131] Higbee, S. (2004) private communication.
[132] Hildreth, C. (1957) “A quadratic programming procedure.”Naval Research Logistics Quarterly 4, pp. 79–85. Erratum, p. 361.
[133] Hinich, M. (1973) “Maximum likelihood signal processing for a vertical array.”Journal of the Acoustical Society of America 54, pp. 499–
503.
[134] Hinich, M. (1979) “Maximum likelihood estimation of the position of
a radiating source in a waveguide.”Journal of the Acoustical Society
of America 66, pp. 480–483.
[135] Hoffman, K. (1962) Banach Spaces of Analytic Functions Englewood
Cliffs, NJ: Prentice-Hall.
[136] Hogg, R. and Craig, A. (1978) Introduction to Mathematical Statistics MacMillan, New York.
[137] Holte, S., Schmidlin, P., Linden, A., Rosenqvist, G. and Eriksson,
L. (1990) “Iterative image reconstruction for positron emission tomography: a study of convergence and quantitation problems.”IEEE
Transactions on Nuclear Science 37, pp. 629–635.
[138] Hubbard, B. (1998) The World According to Wavelets. Natick, MA:
A K Peters, Inc.
[139] Hudson, H.M. and Larkin, R.S. (1994) “Accelerated image reconstruction using ordered subsets of projection data.”IEEE Transactions
on Medical Imaging 13, pp. 601–609.
[140] Huesman, R., Klein, G., Moses, W., Qi, J., Ruetter, B., and Virador, P. (2000) “List-mode maximum likelihood reconstruction applied to positron emission mammography (PEM) with irregular sampling.”IEEE Transactions on Medical Imaging 19 (5), pp. 532–537.
[141] Hutton, B., Kyme, A., Lau, Y., Skerrett, D., and Fulton, R. (2002)
“A hybrid 3-D reconstruction/registration algorithm for correction of
head motion in emission tomography.”IEEE Transactions on Nuclear
Science 49 (1), pp. 188–194.
[142] Johnson, R. (1960) Advanced Euclidean Geometry. New York: Dover
Publ.
[143] Johnson, C., Hendriks, E., Berezhnoy, I., Brevdo, E., Hughes, S.,
Daubechies, I., Li, J., Postma, E., and Wang, J. (2008) “Image Processing for Artist Identification.” IEEE Signal Processing Magazine,
25(4), pp. 37–48.
[144] Kaczmarz, S. (1937) “Angenäherte Auflösung von Systemen linearer
Gleichungen.”Bulletin de l’Academie Polonaise des Sciences et Lettres
A35, pp. 355–357.
[145] Kaiser, G. (1994) A Friendly Guide to Wavelets. Boston: Birkhäuser.
[146] Kak, A., and Slaney, M. (2001) “Principles of Computerized Tomographic Imaging” , SIAM, Philadelphia, PA.
[147] Kalman, R. (1960) “A new approach to linear filtering and prediction
problems.”Trans. ASME, J. Basic Eng. 82, pp. 35–45.
[148] Katznelson, Y. (1983) An Introduction to Harmonic Analysis. New
York: John Wiley and Sons, Inc.
[149] Kheifets, A. (2004) private communication.
[150] Körner, T. (1988) Fourier Analysis. Cambridge, UK: Cambridge University Press.
[151] Körner, T. (1996) The Pleasures of Counting. Cambridge, UK: Cambridge University Press.
[152] Kullback, S. and Leibler, R. (1951) “On information and sufficiency.”Annals of Mathematical Statistics 22, pp. 79–86.
[153] Landweber, L. (1951) “An iterative formula for Fredholm integral
equations of the first kind.”Amer. J. of Math. 73, pp. 615–624.
[154] Lane, R. (1987) “Recovery of complex images from Fourier magnitude.”Optics Communications 63(1), pp. 6–10.
[155] Lange, K. and Carson, R. (1984) “EM reconstruction algorithms for
emission and transmission tomography.”Journal of Computer Assisted
Tomography 8, pp. 306–316.
[156] Lange, K., Bahn, M. and Little, R. (1987) “A theoretical study of
some maximum likelihood algorithms for emission and transmission
tomography.”IEEE Trans. Med. Imag. MI-6(2), pp. 106–114.
[157] Leahy, R., Hebert, T., and Lee, R. (1989) “Applications of Markov
random field models in medical imaging.”in Proceedings of the Conference on Information Processing in Medical Imaging Lawrence-Berkeley
Laboratory, Berkeley, CA.
[158] Leahy, R. and Byrne, C. (2000) “Guest editorial: Recent development
in iterative image reconstruction for PET and SPECT.”IEEE Trans.
Med. Imag. 19, pp. 257–260.
[159] Lent, A. (1998) private communication.
[160] Levitan, E. and Herman, G. (1987) “A maximum a posteriori probability expectation maximization algorithm for image reconstruction in
emission tomography.”IEEE Transactions on Medical Imaging 6, pp.
185–192.
[161] Liao, C.-W., Fiddy, M., and Byrne, C. (1997) “Imaging from the zero
locations of far-field intensity data.”Journal of the Optical Society of
America -A 14 (12), pp. 3155–3161.
[162] Lindberg, D. (1992) The Beginnings of Western Science, University
of Chicago Press.
[163] Luenberger, D. (1969) Optimization by Vector Space Methods. New
York: John Wiley and Sons, Inc.
[164] Lustig, M., Donoho, D., and Pauly, J. (2008) Magnetic Resonance in
Medicine, to appear.
[165] Magness, T., and McQuire, J. (1962) “Comparison of least squares
and minimum variance estimates of regression parameters.”Annals of
Mathematical Statistics 33, pp. 462–470.
[166] Mallat, S.G. (1989) “A theory of multiresolution signal decomposition: The wavelet representation.”IEEE Transactions on Pattern
Analysis and Machine Intelligence PAMI-11, pp. 674–693.
[167] Mann, W. (1953) “Mean value methods in iteration.”Proc. Amer.
Math. Soc. 4, pp. 506–510.
[168] McClellan, J., Schafer, R., and Yoder, M. (2003) Signal Processing
First. Upper Saddle River, New Jersey: Prentice Hall, Inc.
[169] McLachlan, G.J. and Krishnan, T. (1997) The EM Algorithm and
Extensions. New York: John Wiley and Sons, Inc.
[170] Meidunas, E. (2001) Re-scaled Block Iterative Expectation Maximization Maximum Likelihood (RBI-EMML) Abundance Estimation and Sub-pixel Material Identification in Hyperspectral Imagery,
MS thesis, Department of Electrical Engineering, University of Massachusetts Lowell.
[171] Meyer, Y. (1993) Wavelets: Algorithms and Applications. Philadelphia, PA: SIAM Publ.
[172] Mooney, J., Vickers, V., An, M., and Brodzik, A. (1997) “Highthroughput hyperspectral infrared camera.”Journal of the Optical Society of America, A 14 (11), pp. 2951–2961.
[173] Motzkin, T. and Schoenberg, I. (1954) “The relaxation method for
linear inequalities.”Canadian Journal of Mathematics 6, pp. 393–404.
[174] Narayanan, M., Byrne, C. and King, M. (2001) “An interior point
iterative maximum-likelihood reconstruction algorithm incorporating
upper and lower bounds with application to SPECT transmission
imaging.”IEEE Transactions on Medical Imaging TMI-20 (4), pp.
342–353.
[175] Nash, S. and Sofer, A. (1996) Linear and Nonlinear Programming.
New York: McGraw-Hill.
[176] Natterer, F. (1986) Mathematics of Computed Tomography. New
York: John Wiley and Sons, Inc.
[177] Natterer, F., and Wübbeling, F. (2001) Mathematical Methods in
Image Reconstruction. Philadelphia, PA: SIAM Publ.
[178] Nelson, R. (2001) “Derivation of the missing cone.”unpublished
notes.
[179] Oppenheim, A. and Schafer, R. (1975) Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
[180] Papoulis, A. (1975) “A new algorithm in spectral analysis and bandlimited extrapolation.”IEEE Transactions on Circuits and Systems 22,
pp. 735–742.
[181] Papoulis, A. (1977) Signal Analysis. New York: McGraw-Hill.
[182] Parra, L. and Barrett, H. (1998) “List-mode likelihood: EM algorithm and image quality estimation demonstrated on 2-D PET.”IEEE
Transactions on Medical Imaging 17, pp. 228–235.
[183] Paulraj, A., Roy, R., and Kailath, T. (1986) “A subspace rotation
approach to signal parameter estimation.”Proceedings of the IEEE 74,
pp. 1044–1045.
[184] Peressini, A., Sullivan, F., and Uhl, J. (1988) The Mathematics of
Nonlinear Programming. Berlin: Springer-Verlag.
[185] Pelagotti, A., Del Mastio, A., De Rosa, A., Piva, A. (2008) “Multispectral imaging of paintings.” IEEE Signal Processing Magazine,
25(4), pp. 27–36.
[186] Pisarenko, V. (1973) “The retrieval of harmonics from a covariance
function.”Geoph. J. R. Astrom. Soc., 30.
[187] Pižurica, A., Philips, W., Lemahieu, I., and Acheroy, M. (2003)
“A versatile wavelet domain noise filtration technique for medical
imaging.”IEEE Transactions on Medical Imaging: Special Issue on
Wavelets in Medical Imaging 22, pp. 323–331.
[188] Poggio, T. and Smale, S. (2003) “The mathematics of learning: dealing with data.”Notices of the American Mathematical Society 50 (5),
pp. 537–544.
[189] Priestley, M. B. (1981) Spectral Analysis and Time Series. Boston:
Academic Press.
[190] Prony, G.R.B. (1795) “Essai expérimental et analytique sur les lois
de la dilatabilité de fluides élastiques et sur celles de la force expansion
de la vapeur de l’alcool, à différentes températures.”Journal de l’Ecole
Polytechnique (Paris) 1(2), pp. 24–76.
[191] Qian, H. (1990) “Inverse Poisson transformation and shot noise filtering.”Rev. Sci. Instrum. 61, pp. 2088–2091.
[192] Ribés, A., Pillay, R., Schmitt, F., and Lahanier,C. (2008) “Studying
that smile.” IEEE Signal Processing Magazine, 25(4), pp. 14–26.
[193] Rockafellar, R. (1970) Convex Analysis. Princeton, NJ: Princeton
University Press.
[194] Rockmore, A., and Macovski, A. (1976) “A maximum likelihood
approach to emission image reconstruction from projections.” IEEE
Transactions on Nuclear Science, NS-23, pp. 1428–1432.
[195] Schmidlin, P. (1972) “Iterative separation of sections in tomographic
scintigrams.”Nucl. Med. 15(1).
[196] Schmidt, R. (1981) A Signal Subspace Approach to Multiple Emitter
Location and Spectral Estimation. PhD thesis, Stanford University.
[197] Schultz, L., Blanpied, G., Borozdin, K., et al. (2007) “Statistical
reconstruction for cosmic ray muon tomography.” IEEE Transactions
on Image Processing, 16(8), pp. 1985–1993.
[198] Schuster, A. (1898) “On the investigation of hidden periodicities with
application to a supposed 26 day period of meteorological phenomena.”Terrestrial Magnetism 3, pp. 13–41.
[199] Shang, E. (1985) “Source depth estimation in waveguides.”Journal
of the Acoustical Society of America 77, pp. 1413–1418.
[200] Shang, E. (1985) “Passive harmonic source ranging in waveguides by
using mode filter.”Journal of the Acoustical Society of America 78,
pp. 172–175.
[201] Shang, E., Wang, H., and Huang, Z. (1988) “Waveguide characterization and source localization in shallow water waveguides using Prony’s
method.”Journal of the Acoustical Society of America 83, pp. 103–106.
[202] Shaw, C. (2010) “Dimensions in medical imaging: the more the better?” Proceedings of the IEEE, 98(1), pp. 2–5.
[203] Shepp, L., and Vardi, Y. (1982) “Maximum likelihood reconstruction
for emission tomography.” IEEE Transactions on Medical Imaging,
MI-1, pp. 113–122.
[204] Shieh, M., Byrne, C., Testorf, M., and Fiddy, M. (2006) “Iterative
image reconstruction using prior knowledge.” Journal of the Optical
Society of America, A, 23(6), pp. 1292–1300.
[205] Smith, C. Ray and Grandy, W.T., editors (1985) Maximum-Entropy
and Bayesian Methods in Inverse Problems. Dordrecht: Reidel Publ.
[206] Smith, C. Ray and Erickson, G., editors (1987) Maximum-Entropy
and Bayesian Spectral Analysis and Estimation Problems. Dordrecht:
Reidel Publ.
[207] Sondhi, M. (2006) “The History of Echo Cancellation.” IEEE Signal
Processing Magazine, September 2006, pp. 95–102.
[208] Sondhi, M., Morgan, D., and Hall, J. (1995) “Stereophonic acoustic
echo cancellation- an overview of the fundamental problem.” IEEE
Signal Processing Letters, 2(8), pp. 148–151.
[209] Stark, H., and Woods, J. (2002) Probability and Random Processes, with Applications to Signal Processing. Upper Saddle River,
NJ: Prentice-Hall.
[210] Stark, H. and Yang, Y. (1998) Vector Space Projections: A Numerical
Approach to Signal and Image Processing, Neural Nets and Optics.
New York: John Wiley and Sons, Inc.
[211] Strang, G. (1980) Linear Algebra and its Applications. New York:
Academic Press.
[212] Strang, G. and Nguyen, T. (1997) Wavelets and Filter Banks. Wellesley, MA: Wellesley-Cambridge Press.
[213] Tanabe, K. (1971) “Projection method for solving a singular system
of linear equations and its applications.”Numer. Math. 17, pp. 203–
214.
[214] Therrien, C. (1992) Discrete Random Signals and Statistical Signal
Processing. Englewood Cliffs, NJ: Prentice-Hall.
[215] Thévenaz, P., Blu, T., and Unser, M. (2000) “Interpolation revisited.” IEEE Transactions on Medical Imaging, 19, pp. 739–758.
[216] Tindle, C., Guthrie, K., Bold, G., Johns, M., Jones, D., Dixon, K.,
and Birdsall, T. (1978) “Measurements of the frequency dependence
of normal modes.”Journal of the Acoustical Society of America 64,
pp. 1178–1185.
[217] Tolstoy, A. (1993) Matched Field Processing for Underwater Acoustics. Singapore: World Scientific.
[218] Twomey, S. (1996) Introduction to the Mathematics of Inversion in
Remote Sensing and Indirect Measurement. New York: Dover Publ.
[219] Unser, M. (1999) “Splines: A perfect fit for signal and image processing.” IEEE Signal Processing Magazine, 16, pp. 22–38.
[220] Van Trees, H. (1968) Detection, Estimation and Modulation Theory.
New York: John Wiley and Sons, Inc.
[221] Vardi, Y., Shepp, L.A. and Kaufman, L. (1985) “A statistical model
for positron emission tomography.”Journal of the American Statistical
Association 80, pp. 8–20.
[222] Walnut, D. (2002) An Introduction to Wavelets. Boston: Birkhäuser.
[223] Wernick, M. and Aarsvold, J., editors (2004) Emission Tomography:
The Fundamentals of PET and SPECT. San Diego: Elsevier Academic
Press.
[224] Widrow, B. and Stearns, S. (1985) Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
[225] Wiener, N. (1949) Time Series. Cambridge, MA: MIT Press.
[226] Wright, W., Pridham, R., and Kay, S. (1981) “Digital signal processing for sonar.”Proc. IEEE 69, pp. 1451–1506.
[227] Yang, T.C. (1987) “A method of range and depth estimation by
modal decomposition.”Journal of the Acoustical Society of America
82, pp. 1736–1745.
[228] Yin, W., and Zhang, Y. (2008) “Extracting salient features from
less data via l1 -minimization.”SIAG/OPT Views-and-News, 19(1),
pp. 11–19.
[229] Youla, D. (1978) “Generalized image restoration by the method of
alternating projections.”IEEE Transactions on Circuits and Systems
CAS-25 (9), pp. 694–702.
[230] Youla, D.C. (1987) “Mathematical theory of image restoration by the
method of convex projections.”in Image Recovery: Theory and Applications, pp. 29–78, Stark, H., editor (1987) Orlando FL: Academic
Press.
[231] Young, R. (1980) An Introduction to Nonharmonic Fourier Analysis.
Boston: Academic Press.
[232] Zeidler, E. (1990) Nonlinear Functional Analysis and its Applications:
II/B- Nonlinear Monotone Operators. Berlin: Springer-Verlag.
Index
A^T, 331
A†, 331, 332
χ_Ω(ω), 117
δ(x), 119
-sparse matrix, 339
causal function, 118
causal system, 139
Central Slice Theorem, 364
characteristic function, 296
characteristic function of a set, 117
chirp signal, 237
adaptive filter, 267
coherent summation, 67
adaptive interference cancellation, 316
complex conjugate, 59
aliasing, 35, 228
complex dot product, 281, 335
analytic signal, 236
complex exponential function, 63
aperture, 33
complex numbers, 59
approximate delta function, 119
compressed sampling, 351
array aperture, 87, 162, 164
compressed sensing, 351
ART, 333
conjugate transpose, 332
autocorrelation, 137, 178, 187, 191, 208, 311
convolution, 117, 122, 130, 141, 153
convolution of sequences, 134
autoregressive process, 188
Cooley, 151
correlated noise, 71, 275
back-projection, 368
correlation, 186, 275
backscatter, 7
correlation matrix, 186
band-limited extrapolation, 22
covariance matrix, 186, 272
band-limiting, 132
basic variable, 330
basic wavelet, 245
basis, 227, 328
beam-hardening, 363
best linear unbiased estimator, 261
BLUE, 261, 262, 272
Bochner, 198
bounded sequence, 138
broadband signals, 29
Burg, 191
Cauchy’s inequality, 281
Cauchy-Schwarz inequality, 281, 285
causal filter, 313
data consistency, 193, 299
demodulation, 236
detection, 271
DFT, 22, 68, 104, 143, 153, 187, 198, 213
DFT matrix, 144
diffraction grating, 80
dimension of a subspace, 329
Dirac delta, 119
direct problems, 8
directionality, 77
Dirichlet kernel, 70
discrete convolution, 134
discrete Fourier transform, 22, 68, 104
Helmholtz equation, 84, 159
Herglotz, 198
discrete-time Fourier transform, 144
Hermitian, 287, 334
dot product, 281, 283
Hermitian matrix, 331
DTFT, 144
hertz, 102
dyad, 344
Hessian matrix, 344
Hilbert transform, 118, 370
eigenvalue, 209, 331, 334, 339
Horner’s method, 151
eigenvector, 188, 209, 287, 299, 331, 334
imaginary part, 59
emission tomography, 339
impulse-response function, 129
ESPRIT, 207
incoherent bases, 352
Euler, 65
indirect measurement, 13
even part, 118
inner function, 320
expected squared error, 263, 312
inner product, 281, 283, 284
inner-outer factorization, 320
far-field assumption, 23
inner-product space, 284
fast Fourier transform, 45, 69, 109, 143, 151
integral wavelet transform, 245
interference, 208
father wavelet, 248
Inverse Fourier transform, 97
FFT, 45, 109, 143, 149, 151, 187
inverse Fourier transform, 115
filtered back-projection, 368
inverse problems, 8
finite impulse response filter, 254, 314
IPDFT, 213
FIR filter, 314
Fourier coefficients, 144
Jacobian, 344
Fourier cosine transform, 25
Kalman filter, 269
Fourier integral, 97
Fourier Inversion Formula, 101, 115
Katznelson, 198
Fourier series, 101
Laplace transform, 121
Fourier sine transform, 25
least mean square algorithm, 315
Fourier transform, 32, 97, 115, 158
least squares solution, 265, 338
Fourier-transform pair, 115
least-squares, 288
frame, 227
Levinson’s algorithm, 197
frame operator, 231
line array, 86, 161
Fraunhofer lines, 7
frequency-domain extrapolation, 123
linear filter, 188
frequency-response function, 122, 130
linear independence, 328
logarithm of a complex number, 65
Gabor windows, 240
main lobe, 31
gain, 273
Markov chain, 17
Gram-Schmidt, 284
matched filter, 228, 282
grating lobes, 31
matched filtering, 282
Haar wavelet, 245, 246
matching, 281
Heaviside function, 117
matrix differentiation, 343
matrix inverse, 334
matrix-inversion identity, 307
maximum entropy, 188, 191
maximum entropy method, 188
MDFT, 27, 107, 296
MEM, 188, 191, 213
minimum norm solution, 333, 338
minimum phase, 216
minimum-phase, 194
modified DFT, 27, 107, 296
modulation transfer function, 122
moving average, 188
multiresolution analysis, 247
MUSIC, 207
planar sensor array, 86, 161
planewave, 84, 85, 160
point-spread function, 122
Poisson summation, 103
positive-definite, 287, 334
positive-definite sequence, 198
power spectrum, 137, 180, 187, 191, 277, 312
pre-whitening, 210
prediction error, 192
predictor-corrector methods, 269
prewhitening, 263, 274
Prony, 73
pseudo-inverse, 338
narrowband cross-ambiguity function, 236
narrowband signal, 162, 236
noise power, 272
noise power spectrum, 277
non-iterative band-limited extrapolation, 301
non-periodic convolution, 141, 142
nonnegative-definite, 334
norm, 283, 285
Nyquist spacing, 33, 167
quadratic form, 299, 333, 347
odd part, 118
optical transfer function, 122
optimal filter, 272
orthogonal, 246, 282, 283, 285, 334
orthogonal wavelet, 247
orthogonality principle, 104, 287
orthonormal, 328
outer function, 320
over-sampling, 295
sampling, 167
sampling frequency, 46
sampling rate, 102
SAR, 33
scaling function, 248
scaling relation, 249
Schwartz class, 124
Schwartz function, 124
separation of variables, 83, 159
sgn, 117
Shannon MRA, 247
Shannon’s Sampling Theorem, 102, 164, 168
shift-invariant system, 128
short-time Fourier transform, 240
sign function, 117
signal analysis, 226
Parseval’s equation, 103
Parseval-Plancherel Equation, 120
PDFT, 108, 213, 277, 297
periodic convolution, 141
PET, 339
phase problem, 89
phase steering, 82
radar, 233
radial function, 114
Radon transform, 364
rank of a matrix, 330
real part, 59
reciprocity principle, 158
recursive least squares, 316
remote sensing, 13, 83, 159
resolution, 71
signal power, 272
signal-to-noise ratio, 182, 272
SILO, 128
sinc, 299
sinc function, 158
singular value, 336, 339
singular value decomposition, 336
sinusoid, 66
sinusoidal functions, 145
SNR, 272
span, 328
spanning set, 328
sparse matrix, 339
SPECT, 339
spectral analysis, 7
spectral radius, 339
spectrum, 145
stable, 138
state vector, 268
stationarity, 309
SVD, 336
symmetric matrix, 331
synthetic-aperture radar, 33, 164
system transfer function, 122
Szegö’s theorem, 192
three-point moving average, 136
tight frame, 230
time-frequency analysis, 240
time-frequency window, 241
time-harmonic solutions, 84
trace, 263, 335, 345
transfer function, 130
transition probability, 17
transmission tomography, 339
triangle inequality, 282
trigonometric polynomial, 43
Tukey, 151
unbiased, 262
uncorrelated, 286
uniform line array, 167, 168
vDFT, 108, 143
vector DFT, 108, 143
vector differentiation, 343
vector discrete Fourier transform, 143
vector Wiener filter, 305, 307
visible region, 34
Viterbi algorithm, 17
wave equation, 83, 159
wavelength, 24
wavelet, 246, 289
wavevector, 84, 160
weak-sense stationary, 179
Weierstrass approximation theorem, 226
white noise, 182, 186, 274
wideband cross-ambiguity function, 235
Wiener filter, 213, 309, 312
Wiener-Hopf equations, 314
Wigner-Ville distribution, 241
window, 240
z-transform, 138, 185
zero-padding, 111, 147, 148, 154