Alma Mater Studiorum
Università di Bologna
PhD Programme in
MATHEMATICS
Cycle XXV
Academic recruitment field: 01/A5
Academic discipline: MAT/08
Iterative regularization methods
for ill-posed problems
PhD thesis presented by: Ivan Tomba
PhD Coordinator:
Prof. Alberto Parmeggiani
Supervisor:
Prof. Elena Loli Piccolomini
Final examination year 2013
Contents
Introduction
1 Regularization of ill-posed problems in Hilbert spaces
1.1 Fundamental notations
1.2 Differentiation as an inverse problem
1.3 Abel integral equations
1.4 Radon inversion (X-ray tomography)
1.5 Integral equations of the first kind
1.6 Hadamard’s definition of ill-posed problems
1.7 Fundamental tools in the Hilbert space setting
1.7.1 Basic definitions and notations
1.7.2 The Moore-Penrose generalized inverse
1.8 Compact operators: SVD and the Picard criterion
1.9 Regularization and Bakushinskii’s Theorem
1.10 Construction and convergence of regularization methods
1.11 Order optimality
1.12 Regularization by projection
1.12.1 The Seidman example (revisited)
1.13 Linear regularization: basic results
1.14 The Discrepancy Principle
1.15 The finite dimensional case: discrete ill-posed problems
1.16 Tikhonov regularization
1.17 The Landweber iteration
2 Conjugate gradient type methods
2.1 Finite dimensional introduction
2.2 General definition in Hilbert spaces
2.3 The algorithms
2.3.1 The minimal residual method (MR) and the conjugate gradient method (CG)
2.3.2 CGNE and CGME
2.3.3 Cheap Implementations
2.4 Regularization theory for the conjugate gradient type methods
2.4.1 Regularizing properties of MR and CGNE
2.4.2 Regularizing properties of CG and CGME
2.5 Filter factors
2.6 CGNE, CGME and the Discrepancy Principle
2.6.1 Test 0
2.6.2 Test 1
2.6.3 Test 2
2.7 CGNE vs. CGME
2.8 Conjugate gradient type methods with parameter n=2
2.8.1 Numerical results
3 New stopping rules for CGNE
3.1 Residual norms and regularizing properties of CGNE
3.2 SR1: Approximated Residual L-Curve Criterion
3.3 SR1: numerical experiments
3.3.1 Test 1
3.3.2 Test 2
3.4 SR2: Projected Data Norm Criterion
3.4.1 Computation of the index p of the SR2
3.5 SR2: numerical experiments
3.6 Image deblurring
3.7 SR3: Projected Noise Norm Criterion
3.8 Image deblurring: numerical experiments
3.8.1 Test 1 (gimp test problem)
3.8.2 Test 2 (pirate test problem)
3.8.3 Test 3 (satellite test problem)
3.8.4 The new stopping rules in the Projected Restarted CGNE
4 Tomography
4.1 The classical Radon Transform
4.1.1 The inversion formula
4.1.2 Filtered backprojection
4.2 The Radon Transform over straight lines
4.2.1 The Cone Beam Transform
4.2.2 Katsevich’s inversion formula
4.3 Spectral properties of the integral operator
4.4 Parallel, fan beam and helical scanning
4.4.1 2D scanning geometry
4.4.2 3D scanning geometry
4.5 Relations between Fourier and singular functions
4.5.1 The case of the compact operator
4.5.2 Discrete ill-posed problems
4.6 Numerical experiments
4.6.1 Fanbeamtomo
4.6.2 Seismictomo
4.6.3 Paralleltomo
5 Regularization in Banach spaces
5.1 A parameter identification problem for an elliptic PDE
5.2 Basic tools in the Banach space setting
5.2.1 Basic mathematical tools
5.2.2 Geometry of Banach space norms
5.2.3 The Bregman distance
5.3 Regularization in Banach spaces
5.3.1 Minimum norm solutions
5.3.2 Regularization methods
5.3.3 Source conditions and variational inequalities
5.4 Iterative regularization methods
5.4.1 The Landweber iteration: linear case
5.4.2 The Landweber iteration: nonlinear case
5.4.3 The Iteratively Regularized Landweber method
5.4.4 The Iteratively Regularized Gauss-Newton method
6 A new Iteratively Regularized Newton-Landweber iteration
6.1 Introduction
6.2 Error estimates
6.3 Parameter selection for the method
6.4 Weak convergence
6.5 Convergence rates with an a-priori stopping rule
6.6 Numerical experiments
6.7 A new proposal for the choice of the parameters
6.7.1 Convergence rates in case ν > 0
6.7.2 Convergence as n → ∞ for exact data δ = 0
6.7.3 Convergence with noisy data as δ → 0
6.7.4 Newton-Iteratively Regularized Landweber algorithm
Conclusions
A Spectral theory in Hilbert spaces
B Approximation of a finite set of data with cubic B-splines
B.1 B-splines
B.2 Data approximation
C The algorithms
C.1 Test problems from P. C. Hansen’s Regularization Tools
C.2 Conjugate gradient type methods algorithms
C.3 The routine data_approx
C.4 The routine mod_min_max
C.5 Data and files for image deblurring
C.6 Data and files for the tomographic problems
D CGNE and rounding errors
Bibliography
Acknowledgements
Introduction
Inverse and ill-posed problems are nowadays a very important field of research in applied mathematics and numerical analysis. The main reason for
this large interest is the wide number of applications, ranging from medical imaging, via material testing, seismology, inverse scattering and financial
mathematics, to weather forecasting, just to cite some of the most famous.
Typically, in these problems some fundamental information is not available
and the solution does not depend continuously on the data. As a consequence
of this lack of stability, even very small errors in the data can cause very large
errors in the results. Thus, the problems have to be regularized by inserting
some additional information in the data to obtain reasonable approximations
of the sought-after solution. On the other hand, it is important to keep the
computational cost of the corresponding algorithms as low as possible, since
in practical applications the total amount of data to be processed is usually
very high.
The main topic of this Ph.D. thesis is the regularization of ill-posed problems
by means of iterative regularization techniques. The principal advantage of
iterative methods is that the regularized solutions are obtained by stopping the iteration at an early stage, which often saves computation time. On the other hand, the main difficulty in their use is the
choice of the stopping index of the iteration: stopping too early produces an
over-regularized solution, whereas stopping too late yields a noisy solution.
In particular, we shall focus on the conjugate gradient type methods for
regularizing linear ill-posed problems from the classical Hilbert space setting point of view, and on a new inner-outer Newton-Iteratively Regularized
Landweber method for solving nonlinear ill-posed problems in the Banach
space framework.
Regarding the conjugate gradient type methods, we propose three new automatic¹ stopping rules for the Conjugate Gradient method applied to the
Normal Equation in the discrete setting, based on the regularizing properties
of the method in the continuous setting. These stopping rules are tested in
a series of numerical simulations, including some problems of tomographic
image reconstruction.
Regarding the Newton-Iteratively Regularized Landweber method, we define
both the iteration and the stopping rules, and prove convergence and convergence rate results.
In detail, the thesis consists of six chapters.
• In Chapter 1 we recall the basic notions of the regularization theory in
the Hilbert space framework. Revisiting the regularization theory, we
mainly follow the famous book of Engl, Hanke and Neubauer [17]. Some
examples are added, some others corrected and some proofs completed.
• Chapter 2 is dedicated to the definition of the conjugate gradient type
methods and the analysis of their regularizing properties. A comparison
of these methods is made by means of numerical simulations.
• In Chapter 3 we motivate, define and analyze the new stopping rules
for the Conjugate Gradient applied to the Normal Equation. The stopping rules are tested in many different examples, including some image
deblurring test problems.
• In Chapter 4 we consider some applications in tomographic problems.
Some theoretical properties of the Radon Transform are studied and
then used in the numerical tests to implement the stopping rules defined
in Chapter 3.
¹ i.e., that can be defined precisely by software.
• Chapter 5 is a survey of the regularization theory in the Banach space
framework. The main advantages of working in a Banach space setting
instead of a Hilbert space setting are explained in the introduction.
Then, the fundamental tools and results of this framework are summarized following [82]. The regularizing properties of some important
iterative regularization methods in the Banach space framework, such
as the Landweber and Iteratively Regularized Landweber methods and
the Iteratively Regularized Gauss-Newton method, are described in the
last section.
• The main results about the new inner-outer Newton-Iteratively Regularized Landweber iteration are presented in the final part of the
thesis, in Chapter 6.
Chapter 1
Regularization of ill-posed
problems in Hilbert spaces
The fundamental background of this thesis is the regularization theory for
(linear) ill-posed problems in the Hilbert space setting. In this introductory
chapter we are going to revisit and summarize the basic concepts of this
theory, which is nowadays well-established. To this end, we will mainly
follow the famous book of Engl, Hanke and Neubauer [17].
Starting from some very famous examples of inverse problems (differentiation and integral equations of the first kind) we will review the notions of
regularization method, stopping rule and order optimality. Then we will
consider a class of finite dimensional problems arising from the discretization
of ill-posed problems, the so called discrete ill-posed problems. Finally, in
the last two sections of the chapter we will recall the basic properties of the
Tikhonov and the Landweber methods.
Apart from [17], general references for this chapter are [20], [21], [22], [61],
[62], [90] and [91] and, concerning the part about finite dimensional problems,
[36] and the references therein.
1.1 Fundamental notations
In this section, we fix some important notations that will be used throughout
the thesis.
First of all, we shall denote by $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{C}$ the sets of the integer, real and
complex numbers respectively. In $\mathbb{C}$, the imaginary unit will be denoted by
the symbol $\imath$. The set of strictly positive integers will be denoted by $\mathbb{N}$ or
$\mathbb{Z}^+$, the set of positive real numbers by $\mathbb{R}^+$.
If not stated explicitly otherwise, we shall denote by $\langle \cdot, \cdot \rangle$ and by $\| \cdot \|$ the standard
Euclidean scalar product and Euclidean norm on the Cartesian product $\mathbb{R}^D$,
$D \in \mathbb{N}$, respectively. Moreover, $S^{D-1} := \{x \in \mathbb{R}^D \mid \|x\| = 1\}$ will be the unit
sphere in $\mathbb{R}^D$.
For $i, j \in \mathbb{N}$, we will denote by $M_{i,j}(\mathbb{R})$ (respectively, $M_{i,j}(\mathbb{C})$) the space
of all matrices with $i$ rows and $j$ columns with entries in $\mathbb{R}$ (respectively,
$\mathbb{C}$) and by $GL_j(\mathbb{R})$ (respectively, $GL_j(\mathbb{C})$) the set of all invertible
$j \times j$ matrices over $\mathbb{R}$ ($\mathbb{C}$).
For an appropriate subset $\Omega \subseteq \mathbb{R}^D$, $D \in \mathbb{N}$, $C(\Omega)$ will be the set of continuous functions on $\Omega$. Analogously, $C^k(\Omega)$ will denote the set of
functions with $k$ continuous derivatives on $\Omega$, $k = 1, 2, \ldots, \infty$, and $C_0^k(\Omega)$
will be the corresponding sets of functions with compact support. For
$p \in [1, \infty]$, we will write $L^p(\Omega)$ for the Lebesgue spaces with index $p$ on $\Omega$,
and $W^{p,k}(\Omega)$ for the corresponding Sobolev spaces, with the special cases
$H^k(\Omega) = W^{2,k}(\Omega)$. The space of rapidly decreasing functions on $\mathbb{R}^D$ will be
denoted by $\mathcal{S}(\mathbb{R}^D)$.
1.2 Differentiation as an inverse problem
In this section we present a fundamental example for the study of ill-posed
problems: the computation of the derivative of a given differentiable function.
Let $f$ be any function in $C^1([0, 1])$. For every $\delta \in (0, 1)$ and every $n \in \mathbb{N}$ define
$$f_n^\delta(t) := f(t) + \delta \sin\frac{nt}{\delta}, \quad t \in [0, 1]. \tag{1.1}$$
Then
$$\frac{d}{dt} f_n^\delta(t) = \frac{d}{dt} f(t) + n \cos\frac{nt}{\delta}, \quad t \in [0, 1],$$
hence, in the uniform norm,
$$\|f - f_n^\delta\|_{C([0,1])} \le \delta$$
(with equality whenever $n/\delta \ge \pi/2$) and
$$\left\| \frac{d}{dt} f - \frac{d}{dt} f_n^\delta \right\|_{C([0,1])} = n.$$
Thus, if we consider $f$ and $f_n^\delta$ the exact and perturbed data, respectively, of
the problem "compute the derivative $\frac{df}{dt}$ of the data $f$", for an arbitrarily small
perturbation of the data $\delta$ we can obtain an arbitrarily large error $n$ in the
result. Equivalently, the operator
$$\frac{d}{dt} : \left(C^1([0, 1]), \|\cdot\|_{C([0,1])}\right) \longrightarrow \left(C([0, 1]), \|\cdot\|_{C([0,1])}\right)$$
is not continuous. Of course, it is possible to enforce continuity by measuring
the data in the C 1 -norm, but this would be like cheating, since to calculate
the error in the data one should calculate the derivative, namely the result.
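The instability described by (1.1) can be observed numerically. The following is a minimal Python sketch, assuming NumPy is available; the function name `perturbation_norms` and the chosen pairs $(\delta, n)$ are hypothetical and serve only as an illustration. Since $f_n^\delta - f$ does not depend on $f$, the exact data never have to be specified.

```python
import numpy as np

def perturbation_norms(delta, n, num_points=200001):
    """Approximate sup norms on [0, 1] of f_n^delta - f and of its derivative.

    f_n^delta - f = delta * sin(n t / delta), so its derivative is
    n * cos(n t / delta); both are sampled on a uniform grid.
    """
    t = np.linspace(0.0, 1.0, num_points)
    data_error = np.max(np.abs(delta * np.sin(n * t / delta)))
    derivative_error = np.max(np.abs(n * np.cos(n * t / delta)))
    return data_error, derivative_error

# The data error stays of size delta while the derivative error equals n.
for delta, n in [(1e-1, 1), (1e-1, 100), (1e-2, 100)]:
    data_err, deriv_err = perturbation_norms(delta, n)
    print(f"delta={delta:.0e}  n={n:4d}  "
          f"data error={data_err:.3e}  derivative error={deriv_err:.3e}")
```

The grid must resolve the oscillation of frequency $n/\delta$, which is why only moderate values of $n/\delta$ are used here.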
It is important to notice that $\frac{df}{dt}$ solves the integral equation
$$K[x](s) := \int_0^s x(t)\,dt = f(s) - f(0), \tag{1.2}$$
i.e. the result can be obtained by inverting the operator $K$. More precisely,
we have:
Proposition 1.2.1. The linear operator $K : C([0, 1]) \longrightarrow C([0, 1])$ defined
by (1.2) is continuous, injective and surjective onto the linear subspace of
$C([0, 1])$ denoted by $W := \{x \in C^1([0, 1]) \mid x(0) = 0\}$. The inverse of $K$,
$$\frac{d}{dt} : W \longrightarrow C([0, 1]),$$
is unbounded.
If $K$ is restricted to
$$S_\gamma := \left\{x \in C^1([0, 1]) \;\Big|\; \|x\|_{C([0,1])} + \left\|\frac{dx}{dt}\right\|_{C([0,1])} \le \gamma \right\}, \quad \gamma > 0,$$
then $(K|_{S_\gamma})^{-1}$ is bounded.
Proof. The first part is obvious. For the second part, it is enough to observe
that $\|x\|_{C([0,1])} + \left\|\frac{dx}{dt}\right\|_{C([0,1])} \le \gamma$ ensures that $S_\gamma$ is bounded and equicontinuous in $C([0, 1])$, thus, according to the Ascoli-Arzelà Theorem, $S_\gamma$ is relatively
compact. Hence $(K|_{S_\gamma})^{-1}$ is bounded because it is the inverse of a bijective
and continuous operator defined on a relatively compact set.
The last statement says that we can restore stability by assuming a-priori
bounds for f ′ and f ′′ .
Suppose now we want to calculate the derivative of $f$ via central difference
quotients with step size $\sigma$ and let $f^\delta$ be its noisy version with
$$\|f^\delta - f\|_{C([0,1])} \le \delta. \tag{1.3}$$
If $f \in C^2[0, 1]$, a Taylor expansion gives
$$\frac{f(t + \sigma) - f(t - \sigma)}{2\sigma} = f'(t) + O(\sigma),$$
but if $f \in C^3[0, 1]$ the second derivative can be eliminated, thus
$$\frac{f(t + \sigma) - f(t - \sigma)}{2\sigma} = f'(t) + O(\sigma^2).$$
Remembering that we are dealing with perturbed data,
$$\frac{f^\delta(t + \sigma) - f^\delta(t - \sigma)}{2\sigma} \sim \frac{f(t + \sigma) - f(t - \sigma)}{2\sigma} + \frac{\delta}{\sigma},$$
the total error behaves like
$$O(\sigma^\nu) + \frac{\delta}{\sigma}, \tag{1.4}$$
where $\nu = 1, 2$ if $f \in C^2[0, 1]$ or $f \in C^3[0, 1]$ respectively.
A remarkable consequence of this is that for fixed $\delta$, when $\sigma$ is too small,
the total error is large, because of the term $\frac{\delta}{\sigma}$, the propagated data error.
Moreover, there exists an optimal discretization parameter σ ♯ , which cannot
be computed explicitly, since it depends on unavailable information about
the exact data, i.e. the smoothness.
However, if $\sigma \sim \delta^\mu$ one can search the power $\mu$ that minimizes the total error
with respect to $\delta$, obtaining $\mu = \frac{1}{2}$ if $f \in C^2[0, 1]$ and $\mu = \frac{1}{3}$ if $f \in C^3[0, 1]$,
with a resulting total error of the order of $O(\sqrt{\delta})$ and $O(\delta^{2/3})$ respectively.
[Figure 1.1: The typical behavior of the total error in ill-posed problems (curves shown: approximation error, propagated data error, total error).]
Thus, in the best case, the total error $O(\delta^{2/3})$ tends to 0 slower than the
data error $\delta$ and it can be shown that this result cannot be improved unless
$f$ is a quadratic polynomial: this means that there is an intrinsic loss of
information.
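The trade-off in (1.4) can be reproduced with a short experiment. The following is a minimal Python sketch, assuming NumPy; the function name `central_difference_error`, the test function $f = \sin$, the point $t = 0.5$ and the noise level $\delta = 10^{-6}$ are hypothetical choices. The perturbation used is one admissible perturbation of size $\delta$ (namely $-\delta$ at $t + \sigma$ and $+\delta$ at $t - \sigma$), whose propagated error is exactly $\delta/\sigma$.

```python
import numpy as np

def central_difference_error(f, df, t, sigma, delta):
    """Total error of the central difference quotient computed from noisy data.

    The noisy values are f(t + sigma) - delta and f(t - sigma) + delta,
    an admissible perturbation satisfying (1.3) that contributes
    exactly -delta/sigma to the quotient.
    """
    noisy_plus = f(t + sigma) - delta
    noisy_minus = f(t - sigma) + delta
    quotient = (noisy_plus - noisy_minus) / (2.0 * sigma)
    return abs(quotient - df(t))

delta = 1e-6
# f = sin is smooth, so nu = 2 and the step size sigma ~ delta^(1/3) is expected to be near optimal.
for sigma in [1e-5, 1e-4, 1e-3, delta ** (1.0 / 3.0), 1e-1, 5e-1]:
    err = central_difference_error(np.sin, np.cos, 0.5, sigma, delta)
    print(f"sigma={sigma:.2e}  total error={err:.3e}")
```

For very small $\sigma$ the printed error is dominated by $\delta/\sigma$, for large $\sigma$ by the approximation error, and the smallest value in this list is attained at $\sigma = \delta^{1/3}$.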
Summing up, in this simple example we have seen some important features
concerning ill-posed problems:
• amplification of high-frequency errors;
• restoration of stability by a-priori assumptions;
• two error terms of different nature, adding up to a total error as in
Figure 1.1;
• appearance of an optimal discretization parameter, depending on a-priori information;
• loss of information even under optimal circumstances.
1.3 Abel integral equations
When dealing with inverse problems, one often has to solve an integral equation. In this section we present an example which can be described mathematically by means of the Abel integral equation. The name honors
the famous Norwegian mathematician N. H. Abel, who studied this problem
for the first time.¹
Let a mass element move on the plane $\mathbb{R}^2_{x_1, x_2}$ along a curve $\Gamma$ from a point
$P_1$ on level $h > 0$ to a point $P_0$ on level 0. The only force acting on the mass
element is the gravitational force $mg$.
The direct problem is to determine the time $\tau$ in which the element moves
from $P_1$ to $P_0$ when the curve $\Gamma$ is given. In the inverse problem, one measures the time $\tau = \tau(h)$ for several values of $h$ and tries to determine the
curve $\Gamma$. Let the curve be parametrized by $x_1 = \psi(x_2)$. Then $\Gamma = \Gamma(x_2)$ and
$$\Gamma(x_2) = \begin{pmatrix} \psi(x_2) \\ x_2 \end{pmatrix}, \qquad d\Gamma(x_2) = \sqrt{1 + \psi'(x_2)^2}\,dx_2.$$
According to the conservation of energy,
$$\frac{m}{2} v^2 + m g x_2 = m g h,$$
thus the velocity verifies
$$v(x_2) = \sqrt{2g(h - x_2)}.$$
The total time $\tau$ from $P_1$ to $P_0$ is
$$\tau = \tau(h) = \int_{P_1}^{P_0} \frac{d\Gamma(x_2)}{v(x_2)} = \int_0^h \sqrt{\frac{1 + \psi'(x_2)^2}{2g(h - x_2)}}\,dx_2, \quad h > 0.$$
¹ This example is taken from [61].
Set $\phi(x_2) := \sqrt{1 + \psi'(x_2)^2}$ and let $f(h) := \tau(h)\sqrt{2g}$ be known (measured).
Then the problem is to determine the unknown function $\phi$ from Abel’s integral equation
$$\int_0^h \frac{\phi(x_2)}{\sqrt{h - x_2}}\,dx_2 = f(h), \quad h > 0. \tag{1.5}$$
1.4 Radon inversion (X-ray tomography)
[Figure 1.2: A classical example of computerized tomography.]
We consider another very important example widely studied in medical applications, arising in Computerized Tomography, which can lead to an Abel
integral equation.
Let $\Omega \subseteq \mathbb{R}^2$ be a compact domain with a spatially varying density $f : \Omega \to \mathbb{R}$
(in medical applications, $\Omega$ represents the section of a human body, see Figure 1.2). Let $L$ be any line in the plane and suppose we direct a thin beam of
X-rays into the body along $L$ and measure how much the intensity is attenuated
by going through the body. Let $L$ be parametrized by its unit normal vector $\theta \in S^1$ and its distance $s > 0$ from the origin. If we assume that the decay $-\Delta I$
of an X-ray beam along a distance $\Delta t$ is proportional to the intensity $I$, the
density $f$ and to $\Delta t$, we obtain
$$\Delta I(s\theta + t\theta^\perp) = -I(s\theta + t\theta^\perp) f(s\theta + t\theta^\perp) \Delta t,$$
where $\theta^\perp$ is a unit vector orthogonal to $\theta$. In the limit for $\Delta t \to 0$, we have
$$\frac{d}{dt} I(s\theta + t\theta^\perp) = -I(s\theta + t\theta^\perp) f(s\theta + t\theta^\perp).$$
Thus, if $I_L(s, \theta)$ and $I_0(s, \theta)$ denote the intensity of the X-ray beam measured
at the detector and the emitter, respectively, where the detector and the
emitter are connected by the line parametrized by $s$ and $\theta$ and are located
outside of $\Omega$, an integration along the line yields
$$R[f](s, \theta) := \int_{\mathbb{R}} f(s\theta + t\theta^\perp)\,dt = -\log \frac{I_L(s, \theta)}{I_0(s, \theta)}, \quad \theta \in S^1, \ s > 0, \tag{1.6}$$
where the integration can be made over $\mathbb{R}$, since obviously $f = 0$ outside of
$\Omega$.
The inverse problem of determining the density distribution $f$ from the X-ray measurements is then equivalent to solving the integral equation of the first
kind (1.6). The operator $R$ is called the Radon Transform in honor of the
Austrian mathematician J. Radon, who studied the problem of reconstructing
a function of two variables from its line integrals as early as 1917 (cf. [75]).
The problem simplifies in the following special case (which is of interest, e.g.
in material testing), where $\Omega$ is a disc of radius $R$, $f$ is radially symmetric,
i.e. $f(r\theta) = \psi(r)$, $0 < r \le R$, $\|\theta\| = 1$ for a suitable function $\psi$, and we
choose only horizontal lines. If
$$g(s) := -\log \frac{I_L(s, \theta_0)}{I_0(s, \theta_0)} \tag{1.7}$$
denotes the measurements in this situation, with $\theta_0 = (0, \pm 1)$, for $0 < s \le R$
we have
$$R[f](s, \theta_0) = \int_{\mathbb{R}} f(s\theta_0 + t\theta_0^\perp)\,dt = \int_{\mathbb{R}} \psi\left(\sqrt{s^2 + t^2}\right)dt = \int_{-\sqrt{R^2 - s^2}}^{\sqrt{R^2 - s^2}} \psi\left(\sqrt{s^2 + t^2}\right)dt$$
$$= 2 \int_0^{\sqrt{R^2 - s^2}} \psi\left(\sqrt{s^2 + t^2}\right)dt = 2 \int_s^R \frac{r\psi(r)}{\sqrt{r^2 - s^2}}\,dr. \tag{1.8}$$
Thus we obtain another Abel integral equation of the first kind:
$$\int_s^R \frac{r\psi(r)}{\sqrt{r^2 - s^2}}\,dr = \frac{g(s)}{2}, \quad 0 < s \le R. \tag{1.9}$$
In the case $g(R) = 0$, the Radon Transform can be explicitly inverted and
$$\psi(r) = -\frac{1}{\pi} \int_r^R \frac{g'(s)}{\sqrt{s^2 - r^2}}\,ds. \tag{1.10}$$
We observe from the last equation that the inversion formula involves the
derivative of the data g, which can be considered as an indicator of the ill-posedness of the problem. However, here the data is integrated and thus
smoothed again, but the kernel of this integral operator has a singularity for
s = r, so we expect the regularization effect of integration to be only partial.
This heuristic statement can be made more precise, as we shall see later in
Chapter 4.
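As a simple sanity check of the line integral in (1.8), one can compare a numerical quadrature with a closed form. The following is a minimal Python sketch, assuming NumPy; the function name `radon_radial` and the test profile $\psi(r) = R^2 - r^2$ are hypothetical choices made for illustration. For this profile a direct computation gives $g(s) = \frac{4}{3}(R^2 - s^2)^{3/2}$, which also satisfies $g(R) = 0$.

```python
import numpy as np

def radon_radial(psi, s, R, num_points=20000):
    """Midpoint-rule approximation of 2 * int_0^{sqrt(R^2 - s^2)} psi(sqrt(s^2 + t^2)) dt,
    i.e. the Radon transform of a radially symmetric density along a line
    at distance s from the origin, see (1.8)."""
    half_length = np.sqrt(R**2 - s**2)
    h = half_length / num_points
    t = (np.arange(num_points) + 0.5) * h  # midpoints of the subintervals
    return 2.0 * h * np.sum(psi(np.sqrt(s**2 + t**2)))

R = 1.0

def psi(r):
    # Hypothetical radial density profile, vanishing on the boundary r = R.
    return R**2 - r**2

def g_exact(s):
    # Closed form of the line integral for the profile above.
    return 4.0 / 3.0 * (R**2 - s**2) ** 1.5

for s in [0.0, 0.3, 0.6, 0.9]:
    print(f"s={s:.1f}  quadrature={radon_radial(psi, s, R):.8f}  exact={g_exact(s):.8f}")
```

The quadrature works on the nonsingular form of the integral in the variable $t$; the equivalent form in the variable $r$ has an integrable singularity at $r = s$ and would need a more careful rule.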
1.5 Integral equations of the first kind
In Section 1.3, starting from a physical problem, we have constructed a very
simple mathematical model based on the integral equation (1.5), where we
have to recover the unknown function φ from the data f . Similarly, in Section 1.4 we have seen that an analogous equation is obtained to recover the
function ψ from the measured data g.
As a matter of fact, very often ill-posed problems lead to integral equations.
In particular, Abel integral equations such as (1.5) and (1.9) fall into the
class of the Fredholm equations of the first kind, whose definition is recalled
below.
Definition 1.5.1. Let $s_1 < s_2$ be real numbers and let $\kappa$, $f$ and $\phi$ be real-valued functions defined respectively on $[s_1, s_2]^2$, $[s_1, s_2]$ and $[s_1, s_2]$. A Fredholm
equation of the first kind is an equation of the form
$$\int_{s_1}^{s_2} \kappa(s, t)\phi(t)\,dt = f(s), \quad s \in [s_1, s_2]. \tag{1.11}$$
Fredholm integral equations of the first kind must be treated carefully
(see [22] as a general reference): if $\kappa$ is continuous and $\phi$ is integrable, then
it is easy to see that $f$ is also continuous, thus if the data is not continuous
while the kernel is, then (1.11) has no integrable solution. This means that
the question of existence is not trivial and requires investigation. Concerning
the uniqueness of the solutions, take for example $\kappa(s, t) = s \sin t$, $f(s) = s$
and $[s_1, s_2] = [0, \pi]$: then $\phi(t) = 1/2$ is a solution of (1.11), but so is each of
the functions $\phi_n(t) = 1/2 + \sin nt$, $n \in \mathbb{N}$, $n \ge 2$, since $\int_0^\pi \sin t \sin nt\,dt = 0$ for $n \ge 2$.
Moreover, we also observe that if $\kappa$ is square integrable, as a consequence of
the Riemann-Lebesgue lemma, there holds
$$\int_0^\pi \kappa(s, t) \sin(nt)\,dt \to 0 \quad \text{as } n \to +\infty. \tag{1.12}$$
Thus if $\phi$ is a solution of (1.11) and $C$ is arbitrarily large
$$\int_0^\pi \kappa(s, t)(\phi(t) + C \sin(nt))\,dt \to f(s) \quad \text{as } n \to +\infty \tag{1.13}$$
and for large values of $n$ the slightly perturbed data
$$\tilde{f}(s) := f(s) + C \int_0^\pi \kappa(s, t) \sin(nt)\,dt \tag{1.14}$$
corresponds to a solution $\tilde{\phi}(t) = \phi(t) + C \sin(nt)$ which is arbitrarily distant
from $\phi$. In other words, as in the example considered in Section 1.2, the
solution does not depend continuously on the data.
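The decay in (1.12) is easy to observe for a concrete kernel. The following is a minimal Python sketch, assuming NumPy; the Gaussian kernel $\kappa(s, t) = e^{-(s-t)^2}$, the helper `trapezoid` and the function name `data_perturbation` are hypothetical choices, not taken from the references. It evaluates the maximum over $s \in [0, \pi]$ of the integral in (1.12) for increasing $n$, while the perturbation $\sin(nt)$ of the solution keeps sup norm of order one.

```python
import numpy as np

def trapezoid(y, h):
    """Composite trapezoidal rule along the last axis, uniform spacing h."""
    return h * (np.sum(y, axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))

def data_perturbation(n, num_t=20001, num_s=101):
    """Max over s in [0, pi] of |int_0^pi kappa(s, t) sin(n t) dt|
    for the (hypothetical) smooth kernel kappa(s, t) = exp(-(s - t)^2)."""
    t = np.linspace(0.0, np.pi, num_t)
    s = np.linspace(0.0, np.pi, num_s)
    kernel = np.exp(-(s[:, None] - t[None, :]) ** 2)
    values = trapezoid(kernel * np.sin(n * t), t[1] - t[0])
    return np.max(np.abs(values))

# The data perturbation shrinks with n, although sin(n t) does not.
for n in [1, 10, 100]:
    print(f"n={n:4d}  max_s |int kappa(s,t) sin(nt) dt| = {data_perturbation(n):.4e}")
```

Multiplying both the perturbation of the solution and the printed values by a constant $C$ reproduces the situation of (1.13) and (1.14): for fixed $C$, the data perturbation can be made as small as desired by increasing $n$.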
1.6 Hadamard’s definition of ill-posed problems
Integral equations of the first kind are the most famous example of ill-posed
problems. The definition of ill-posedness goes back to the beginning of the
20th century and was stated by J. Hadamard as follows:
Definition 1.6.1. Let F be a mapping from a topological space X to another
topological space Y and consider the abstract equation
F (x) = y.
The equation is said to be well-posed if
(i) for each y ∈ Y, a solution x ∈ X of F (x) = y exists;
(ii) the solution x is unique in X ;
(iii) the dependence of x upon y is continuous.
The equation is said to be ill-posed if it is not well-posed.
Of course, the definition of well-posedness above is equivalent to the requirement that $F$ is surjective and injective and that the inverse mapping $F^{-1}$ is
continuous.
For example, due to the considerations made in the previous sections, integral equations of the first kind are examples of ill-posed equations. If $\mathcal{X} = \mathcal{Y}$ is
a Hilbert space and $F = A$ is a linear, bounded, self-adjoint operator with its spectrum
contained in $[0, +\infty)$, the equation of the second kind
$$y = Ax + tx$$
is well-posed for every $t > 0$, since the operator $A + tI$ is invertible and its
inverse $(A + tI)^{-1}$ is also continuous.
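In finite dimensions this statement can be checked directly: for a symmetric positive semidefinite matrix $A$, the spectral norm of $(A + tI)^{-1}$ equals $1/(\lambda_{\min}(A) + t) \le 1/t$, even when $A$ itself is singular. The following is a minimal Python sketch, assuming NumPy; the random test matrix and the function name `second_kind_inverse_norm` are hypothetical.

```python
import numpy as np

def second_kind_inverse_norm(A, t):
    """Spectral norm of (A + t I)^{-1} for a square matrix A and t > 0."""
    shifted = A + t * np.eye(A.shape[0])
    return np.linalg.norm(np.linalg.inv(shifted), 2)

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
A = B @ B.T  # symmetric positive semidefinite, rank 3, hence singular

for t in [1.0, 1e-2, 1e-4]:
    norm_inv = second_kind_inverse_norm(A, t)
    print(f"t={t:.0e}  ||(A + tI)^(-1)|| = {norm_inv:.6e}  bound 1/t = {1.0 / t:.6e}")
```

The bound $1/t$ degenerates as $t \to 0$, which reflects the fact that the limiting equation of the first kind $Ax = y$ is not well-posed for this singular $A$.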
1.7 Fundamental tools in the Hilbert space setting
So far, we have seen several examples of ill-posed problems. It is obvious
from Hadamard’s definition of ill-posedness that an exhaustive mathematical
treatment of such problems should be based on a different notion of solution
of the abstract equation F (x) = y to achieve existence and uniqueness. For
linear problems in the Hilbert space setting, this is done with the Moore-Penrose Generalized Inverse.
At first, we fix some standard definitions and notations. General references
for this section are [17] and [21].
1.7.1 Basic definitions and notations
Let $A$ be a linear bounded (continuous) operator between two Hilbert spaces
$\mathcal{X}$ and $\mathcal{Y}$. To simplify the notations, the scalar products in $\mathcal{X}$ and $\mathcal{Y}$ and
their induced norms will be denoted by the same symbols $\langle \cdot, \cdot \rangle$ and $\| \cdot \|$
respectively. For $\bar{x} \in \mathcal{X}$ and $\delta > 0$,
$$B_\delta(\bar{x}) := \{x \in \mathcal{X} \mid \|x - \bar{x}\| < \delta\} \tag{1.15}$$
is the open ball centered in $\bar{x}$ with radius $\delta$ and $\overline{B_\delta(\bar{x})}$ or $\bar{B}_\delta(\bar{x})$ is its closure
with respect to the topology of $\mathcal{X}$.
We denote by R(A) the range of A:
R(A) := {y ∈ Y | ∃ x ∈ X : y = Ax}
(1.16)
and by ker(A) the null-space of A:
ker(A) := {x ∈ X | Ax = 0}.
(1.17)
We recall that R(A) and ker(A) are subspaces of Y and X respectively and
that ker(A) is closed.
We also recall the following basic definitions.
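Definition 1.7.1 (Orthogonal space). Let $\mathcal{M} \subseteq \mathcal{X}$. The orthogonal space
of $\mathcal{M}$ is the closed vector space $\mathcal{M}^\perp$ defined by:
$$\mathcal{M}^\perp := \{x \in \mathcal{X} \mid \langle x, z \rangle = 0, \ \forall z \in \mathcal{M}\}. \tag{1.18}$$
Definition 1.7.2 (Adjoint operator). The bounded operator $A^* : \mathcal{Y} \to \mathcal{X}$,
defined as
$$\langle A^* y, x \rangle = \langle y, Ax \rangle, \quad \forall x \in \mathcal{X}, \ y \in \mathcal{Y}, \tag{1.19}$$
is called the adjoint operator of $A$.
If $A : \mathcal{X} \to \mathcal{X}$ and $A = A^*$, then $A$ is called self-adjoint.
Definition 1.7.3 (Orthogonal projector). Let $\mathcal{W}$ be a closed subspace of $\mathcal{X}$.
For every $x \in \mathcal{X}$, there exists a unique element in $\mathcal{W}$, called the projection of
$x$ onto $\mathcal{W}$, that minimizes the distance $\|x - w\|$, $w \in \mathcal{W}$.
The map $P : \mathcal{X} \to \mathcal{X}$, that associates to an element $x \in \mathcal{X}$ its projection
onto $\mathcal{W}$, is called the orthogonal projector onto $\mathcal{W}$.
This is the unique linear and self-adjoint operator satisfying the relation $P = P^2$ that maps $\mathcal{X}$ onto $\mathcal{W}$.
Definition 1.7.4. Let $\{x_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{X}$ and let $x \in \mathcal{X}$. The
sequence $x_n$ is said to converge weakly to $x$ if, for every $z \in \mathcal{X}$, $\langle x_n, z \rangle$
converges to $\langle x, z \rangle$. In this case, we shall write
$$x_n \rightharpoonup x.$$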
1.7.2 The Moore-Penrose generalized inverse
We are interested in solving the equation
Ax = y,
(1.20)
for x ∈ X , but we suppose we are only given an approximation of the exact
data y ∈ Y, which are assumed to exist and to be fixed, but unknown.
Definition 1.7.5.
(i) A least-squares solution of the equation $Ax = y$ is
an element $x \in \mathcal{X}$ such that
$$\|Ax - y\| = \inf_{z \in \mathcal{X}} \|Az - y\|. \tag{1.21}$$
(ii) An element $x \in \mathcal{X}$ is a best-approximate solution of $Ax = y$ if it is a
least-squares solution of $Ax = y$ and
$$\|x\| = \inf\{\|z\| \mid z \text{ is a least-squares solution of } Ax = y\} \tag{1.22}$$
holds.
(iii) The Moore-Penrose (generalized) inverse $A^\dagger$ of $A$ is the unique linear
extension of $\tilde{A}^{-1}$ to
$$\mathcal{D}(A^\dagger) := \mathcal{R}(A) + \mathcal{R}(A)^\perp \tag{1.23}$$
with
$$\ker(A^\dagger) = \mathcal{R}(A)^\perp, \tag{1.24}$$
where
$$\tilde{A} := A|_{\ker(A)^\perp} : \ker(A)^\perp \to \mathcal{R}(A). \tag{1.25}$$
$A^\dagger$ is well defined: in fact, it is trivial to see that $\ker(\tilde{A}) = \{0\}$ and
$\mathcal{R}(\tilde{A}) = \mathcal{R}(A)$, so $\tilde{A}^{-1}$ exists. Moreover, since $\mathcal{R}(A) \cap \mathcal{R}(A)^\perp = \{0\}$,
every $y \in \mathcal{D}(A^\dagger)$ can be written in a unique way as $y = y_1 + y_2$, with $y_1 \in \mathcal{R}(A)$ and $y_2 \in \mathcal{R}(A)^\perp$, so using (1.24) and the requirement that $A^\dagger$ is linear,
one can easily verify that $A^\dagger y = \tilde{A}^{-1} y_1$.
The Moore-Penrose generalized inverse can be characterized as follows.
Proposition 1.7.1. Let now and below $P$ and $Q$ be the orthogonal projectors
onto $\ker(A)$ and $\overline{\mathcal{R}(A)}$, respectively. Then $\mathcal{R}(A^\dagger) = \ker(A)^\perp$ and the four
Moore-Penrose equations hold:
$$A A^\dagger A = A, \tag{1.26}$$
$$A^\dagger A A^\dagger = A^\dagger, \tag{1.27}$$
$$A^\dagger A = I - P, \tag{1.28}$$
$$A A^\dagger = Q|_{\mathcal{D}(A^\dagger)}. \tag{1.29}$$
Here and below, the symbol $I$ denotes the identity map.
If a linear operator $\check{A} : \mathcal{D}(A^\dagger) \to \mathcal{X}$ verifies (1.28) and (1.29), then $\check{A} = A^\dagger$.
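Proof. For the first part, see [17]. We show the second part.
Since $\check{A} A \check{A} = \check{A} Q|_{\mathcal{D}(A^\dagger)} = \check{A}$ and $A \check{A} A = A(I - P) = A - AP = A$, all
Moore-Penrose equations hold for $\check{A}$. Then, keeping in mind that $I - P$ is
the orthogonal projector onto $\ker(A)^\perp$, for every $y \in \mathcal{D}(A^\dagger)$ we have:
$$\check{A} y = \check{A} A \check{A} y = (I - P)\check{A} y = \tilde{A}^{-1} A (I - P) \check{A} y = \tilde{A}^{-1} A \check{A} y = \tilde{A}^{-1} Q y = A^\dagger y.$$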
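For matrices, the four Moore-Penrose equations (1.26)-(1.29) can be verified numerically. The following is a minimal Python sketch, assuming NumPy; the rank-deficient test matrix and the tolerance `1e-10` are hypothetical choices. The projectors $P$ and $Q$ are built independently from the singular value decomposition, so that (1.28) and (1.29) are genuine checks on the matrix returned by `numpy.linalg.pinv`. In finite dimensions $\mathcal{R}(A)$ is always closed, so $Q$ is simply the projector onto $\mathcal{R}(A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical rank-deficient example: a 5x4 matrix of rank 2.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A_pinv = np.linalg.pinv(A, rcond=1e-10)

# Orthogonal projectors from the full SVD A = U diag(sing) Vt.
U, sing, Vt = np.linalg.svd(A)
rank = int(np.sum(sing > 1e-10 * sing[0]))
Q = U[:, :rank] @ U[:, :rank].T   # projector onto the range R(A)
P = Vt[rank:, :].T @ Vt[rank:, :]  # projector onto the null space ker(A)

checks = {
    "(1.26) A A+ A = A": np.allclose(A @ A_pinv @ A, A),
    "(1.27) A+ A A+ = A+": np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    "(1.28) A+ A = I - P": np.allclose(A_pinv @ A, np.eye(4) - P),
    "(1.29) A A+ = Q": np.allclose(A @ A_pinv, Q),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The explicit threshold passed as `rcond` only makes the numerical rank decision reproducible for this example; it is not part of the definition of $A^\dagger$.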
An application of the Closed Graph Theorem leads to the following important fact.
Proposition 1.7.2. The Moore-Penrose generalized inverse A† has a closed
graph gr(A† ). Furthermore, A† is continuous if and only if R(A) is closed.
We use this to give another characterization of the Moore-Penrose pseudoinverse:
Proposition 1.7.3. There can be only one linear bounded operator
Ǎ : Y → X that verifies (1.26) and (1.27) and such that ǍA and AǍ are self-adjoint. If such an Ǎ exists, then A† is also bounded and Ǎ = A† . Moreover,
in this case, the Moore-Penrose generalized inverse of the adjoint of A, (A∗ )† ,
is bounded too and
(A∗ )† = (A† )∗ .
(1.30)
Proof. Suppose Ǎ, B̌ : Y → X are linear bounded operators that verify (1.26)
and (1.27) and such that ǍA, B̌A, AǍ and AB̌ are self-adjoint.
Then
ǍA = A∗ Ǎ∗ = (A∗ B̌ ∗ A∗ )Ǎ∗ = (A∗ B̌ ∗ )(A∗ Ǎ∗ ) = (B̌A)(ǍA) = B̌(AǍA) = B̌A
and in a similar way AB̌ = AǍ. Thus we obtain
Ǎ = Ǎ(AǍ) = Ǎ(AB̌) = (ǍA)B̌ = (B̌A)B̌ = B̌.
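Suppose now that such an operator $\check{A}$ exists. For every $z \in \mathcal{Y}$ and every $y \in \overline{R(A)}$, $y = \lim_{n\to+\infty} A x_n$, $x_n \in \mathcal{X}$, we have
\[ \langle y, z\rangle = \lim_{n\to+\infty} \langle A x_n, z\rangle = \lim_{n\to+\infty} \langle x_n, A^* z\rangle = \lim_{n\to+\infty} \langle x_n, A^* \check{A}^* A^* z\rangle = \langle A\check{A}y, z\rangle, \]
so $y = A\check{A}y$ and $y$ lies in $R(A)$. This means that $R(A)$ is closed, thus according to Proposition 1.7.2 $A^\dagger$ is bounded and, by the first part of the proof, $A^\dagger = \check{A}$.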
Finally, to prove the last statement it is enough to verify that for the linear
bounded operator (A† )∗ conditions (1.26) and (1.27) hold with A replaced
by A∗ , together with the corresponding self-adjointness conditions, which
requires only straightforward calculations.
The definitions of least-squares solution and best-approximate solution
make sense too: if y ∈ D(A† ), the set S of the least-squares solutions of
Ax = y is non-empty and its best-approximate solution turns out to be
unique and strictly linked to the operator A† . More precisely, we have:
Proposition 1.7.4. Let $y \in D(A^\dagger)$. Then:
(i) $Ax = y$ has a unique best-approximate solution, which is given by
\[ x^\dagger := A^\dagger y. \tag{1.31} \]
(ii) The set $S$ of the least-squares solutions of $Ax = y$ is equal to $x^\dagger + \ker(A)$ and every $x \in \mathcal{X}$ lies in $S$ if and only if the normal equation
\[ A^* A x = A^* y \tag{1.32} \]
holds.
(iii) $D(A^\dagger)$ is the natural domain of definition for $A^\dagger$, in the sense that
\[ y \notin D(A^\dagger) \ \Rightarrow\ S = \emptyset. \tag{1.33} \]
Proof. See [17] for (i) and (ii). Here we show (iii). Suppose x ∈ X is a
least-squares solution of Ax = y. Then Ax is the closest element in R(A) to
y, so Ax − y ∈ R(A)⊥ . This implies that Q(Ax − y) = 0, but QAx = Ax,
thus we deduce that Qy ∈ R(A) and y = Qy + (I − Q)y ∈ D(A† ).
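Parts (i) and (ii) can be illustrated on a very small example. The sketch below assumes the symmetric rank-one matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and the data $y = (1, 3)$, both chosen here only for illustration: every element of $x^\dagger + \ker(A)$ satisfies the normal equation and gives the same misfit, while $x^\dagger$ has the smallest norm.

```python
# Sketch: least-squares solutions of Ax = y for A = [[1, 1], [1, 1]] (an illustrative choice).
# Here R(A) is spanned by (1, 1) and ker(A) by (1, -1).

def apply_A(x):
    s = x[0] + x[1]
    return (s, s)                      # A is symmetric, so A^T acts in the same way

def normal_equation_residual(x, y):
    lhs = apply_A(apply_A(x))          # A^T A x
    rhs = apply_A(y)                   # A^T y
    return max(abs(l - r) for l, r in zip(lhs, rhs))

def norm(v):
    return (v[0] ** 2 + v[1] ** 2) ** 0.5

y = (1.0, 3.0)                         # not in R(A)
x_dagger = (1.0, 1.0)                  # element of ker(A)^perp solving the normal equation

# Elements of x_dagger + ker(A)
candidates = [(x_dagger[0] + t, x_dagger[1] - t) for t in (-2.0, 0.0, 0.5, 3.0)]
residuals = [normal_equation_residual(x, y) for x in candidates]
misfits = [norm((apply_A(x)[0] - y[0], apply_A(x)[1] - y[1])) for x in candidates]
norms = [norm(x) for x in candidates]
print(residuals, misfits, norms)
```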
Thus the introduction of the concept of best-approximate solution, although it enforces uniqueness, does not always lead to a solvable problem
and is no remedy for the lack of continuous dependence on the data in
general.
1.8 Compact operators: SVD and the Picard criterion
Among linear and bounded operators, compact operators are of special interest, since many integral operators are compact.
We recall that a compact operator is a linear operator that maps any bounded
subset of X into a relatively compact subset of Y.
For example, suppose that $\Omega \subseteq \mathbb{R}^D$, $D \ge 1$, is a nonempty compact and Jordan measurable set that coincides with the closure of its interior. It is well known that if the kernel $\kappa$ is either in $L^2(\Omega \times \Omega)$ or weakly singular, i.e. $\kappa$ is continuous on $\{(s, t) \in \Omega \times \Omega \mid s \ne t\}$ and for all $s \ne t \in \Omega$
\[ |\kappa(s, t)| \le \frac{C}{|s - t|^{D-\epsilon}} \tag{1.34} \]
with $C > 0$ and $\epsilon > 0$, then the operator $K : L^2(\Omega) \to L^2(\Omega)$ defined by
\[ K[x](s) := \int_\Omega \kappa(s, t)\, x(t)\, dt \tag{1.35} \]
is compact (see e.g. [20]).²
If a compact linear operator $K$ is also self-adjoint, the notion of eigensystem is well-defined (a proof of the existence of an eigensystem and a more exhaustive treatment of the topic can be found e.g. in [62]): an eigensystem $\{(\lambda_j; v_j)\}_{j\in\mathbb{N}}$ of the operator $K$ consists of all nonzero eigenvalues $\lambda_j \in \mathbb{R}$ of $K$ and a corresponding complete orthonormal set of eigenvectors $v_j$. Then $K$ can be diagonalized by means of the formula
\[ Kx = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle v_j, \qquad \forall x \in \mathcal{X}, \tag{1.36} \]
and the nonzero eigenvalues of $K$ converge to 0.
If $K : \mathcal{X} \to \mathcal{Y}$ is not self-adjoint, then observing that the operators $K^*K : \mathcal{X} \to \mathcal{X}$ and $KK^* : \mathcal{Y} \to \mathcal{Y}$ are positive semi-definite and self-adjoint compact operators with the same set of nonzero eigenvalues written in nonincreasing order with multiplicity
\[ \lambda_1^2 \ge \lambda_2^2 \ge \lambda_3^2 \ge \dots, \qquad \lambda_j > 0 \ \ \forall j \in \mathbb{N}, \]
we define a singular system $\{\lambda_j; v_j, u_j\}_{j\in\mathbb{N}}$. The vectors $v_j$ form a complete orthonormal system of eigenvectors of $K^*K$ and $u_j$, defined as
\[ u_j := \frac{K v_j}{\|K v_j\|}, \tag{1.37} \]
form a complete orthonormal system of eigenvectors of $KK^*$. Thus $\{v_j\}_{j\in\mathbb{N}}$ span $\overline{R(K^*)} = \overline{R(K^*K)}$, $\{u_j\}_{j\in\mathbb{N}}$ span $\overline{R(K)} = \overline{R(KK^*)}$ and the following formulas hold:
\[ K v_j = \lambda_j u_j, \tag{1.38} \]
\[ K^* u_j = \lambda_j v_j, \tag{1.39} \]
\[ Kx = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle u_j, \qquad \forall x \in \mathcal{X}, \tag{1.40} \]
\[ K^* y = \sum_{j=1}^{\infty} \lambda_j \langle y, u_j\rangle v_j, \qquad \forall y \in \mathcal{Y}. \tag{1.41} \]
² Here and below, we shall denote with the symbol $K$ a linear and bounded operator which is also compact.
All the series above converge in the Hilbert space norms of X and Y.
Equations (1.40) and (1.41) are the natural infinite-dimensional extension of
the well known singular value decomposition (SVD) of a matrix.
For compact operators, the condition for the existence of the best-approximate
solution K † y of the equation Kx = y can be written down in terms of the
singular value expansion of K. It is called the Picard Criterion and can be
given by means of the following theorem (see [17] for the proof).
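Theorem 1.8.1. Let $\{\lambda_j; v_j, u_j\}_{j\in\mathbb{N}}$ be a singular system for the compact linear operator $K$ and let $y \in \mathcal{Y}$. Then
\[ y \in D(K^\dagger) \iff \sum_{j=1}^{\infty} \frac{|\langle y, u_j\rangle|^2}{\lambda_j^2} < +\infty \tag{1.42} \]
and whenever $y \in D(K^\dagger)$,
\[ K^\dagger y = \sum_{j=1}^{\infty} \frac{\langle y, u_j\rangle}{\lambda_j}\, v_j. \tag{1.43} \]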
Thus the Picard Criterion states that the best-approximate solution $x^\dagger$ of $Kx = y$ exists only if the SVD coefficients $|\langle y, u_j\rangle|$ decay fast enough with respect to the singular values $\lambda_j$.
In the finite dimensional case, of course the sum in (1.42) is always finite, the best-approximate solution always exists and the Picard Criterion is always satisfied.
From (1.43) we can see that in the case of perturbed data, the error components with respect to the basis $\{u_j\}$ corresponding to the small values of $\lambda_j$ are amplified by the large factors $\lambda_j^{-1}$. For example, if $\dim(R(K)) = +\infty$ and the perturbed data is defined by $y_j^\delta := y + \delta u_j$, then $\|y - y_j^\delta\| = \delta$, but
\[ K^\dagger y_j^\delta - K^\dagger y = \lambda_j^{-1} \langle \delta u_j, u_j\rangle v_j \]
and hence
\[ \|K^\dagger y - K^\dagger y_j^\delta\| = \lambda_j^{-1} \delta \to +\infty \quad \text{as } j \to +\infty. \]
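The Picard sum (1.42) and the amplification factor $\lambda_j^{-1}\delta$ can be made concrete on a diagonal model. The sketch below assumes singular values $\lambda_j = j^{-2}$ and two hypothetical coefficient sequences $\langle y, u_j\rangle$; none of these choices comes from the text.

```python
# Sketch: Picard sum (1.42) and noise amplification for a diagonal model operator
# with singular values lambda_j = j**-2 (an assumption made for illustration).

def lam(j):
    return j ** -2.0

def picard_partial_sum(coeff, n_terms):
    """Partial sum of sum_j |<y, u_j>|^2 / lambda_j^2 for coefficients coeff(j)."""
    return sum((coeff(j) / lam(j)) ** 2 for j in range(1, n_terms + 1))

# Coefficients decaying like j**-4: the terms behave like j**-4 and the series converges.
good = [picard_partial_sum(lambda j: j ** -4.0, n) for n in (10, 100, 1000)]
# Coefficients decaying like j**-2: every term equals 1 and the series diverges.
bad = [picard_partial_sum(lambda j: j ** -2.0, n) for n in (10, 100, 1000)]

# Perturbing y by delta * u_j changes K^dagger y by delta / lambda_j in norm.
delta = 1e-6
amplified = [delta / lam(j) for j in (1, 10, 100, 1000)]

print(good)       # increases towards a finite limit
print(bad)        # grows like the number of terms
print(amplified)  # grows like delta * j**2
```

With these choices a perturbation of size $10^{-6}$ in the direction $u_{1000}$ already produces an error of order one in the solution.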
In the finite dimensional case there are only finitely many singular values, so
these amplification factors stay bounded. However, they might still be very
large: this is the case of the discrete ill-posed problems, for which also a
Discrete Picard Condition can be defined, as we shall see later on.
1.9 Regularization and Bakushinskii’s Theorem
In the previous sections, we started discussing the problem of solving the equation $Ax = y$. In practice, the exact data $y$ is not known precisely, but only an approximation $y^\delta$ with
\[ \|y^\delta - y\| \le \delta \tag{1.44} \]
is available. In the literature, $y^\delta \in \mathcal{Y}$ is called the noisy data and $\delta > 0$ the noise level.
Our purpose is to approximate the best-approximate solution $x^\dagger = A^\dagger y$ of (1.20) from the knowledge of $\delta$, $y^\delta$ and $A$.
According to Proposition 1.7.2, in general the operator $A^\dagger$ is not continuous, so in the ill-posed case $A^\dagger y^\delta$ is in general a very bad approximation of $x^\dagger$ even if it exists. Roughly speaking, regularizing $Ax = y$ means constructing a family of continuous operators $\{R_\sigma\}$, depending on a certain regularization parameter $\sigma$, that approximate $A^\dagger$ (in some sense) and such that $x_\sigma^\delta := R_\sigma y^\delta$ converges to $x^\dagger$ as the noise level tends to 0, for a suitable choice of $\sigma$. We state this more precisely in the following fundamental definition.
Definition 1.9.1. Let $\sigma_0 \in (0, +\infty]$. For every $\sigma \in (0, \sigma_0)$, let $R_\sigma : \mathcal{Y} \to \mathcal{X}$ be a continuous (not necessarily linear) operator.
The family $\{R_\sigma\}$ is called a regularization operator for $A^\dagger$ if, for every $y \in D(A^\dagger)$, there exists a function
\[ \alpha : (0, +\infty) \times \mathcal{Y} \to (0, \sigma_0), \]
called a parameter choice rule for $y$, which associates with each pair $(\delta, y^\delta)$ a specific operator $R_{\alpha(\delta,y^\delta)}$ and a regularized solution $x^\delta_{\alpha(\delta,y^\delta)} := R_{\alpha(\delta,y^\delta)} y^\delta$, and such that
\[ \lim_{\delta\to 0}\ \sup_{y^\delta \in B_\delta(y)} \alpha(\delta, y^\delta) = 0. \tag{1.45} \]
If the parameter choice rule (below, p.c.r.) $\alpha$ satisfies in addition
\[ \lim_{\delta\to 0}\ \sup_{y^\delta \in B_\delta(y)} \|R_{\alpha(\delta,y^\delta)} y^\delta - A^\dagger y\| = 0, \tag{1.46} \]
then it is said to be convergent.
For a specific $y \in D(A^\dagger)$, a pair $(R_\sigma, \alpha)$ is called a (convergent) regularization method for solving $Ax = y$ if $\{R_\sigma\}$ is a regularization for $A^\dagger$ and $\alpha$ is a (convergent) parameter choice rule for $y$.
Remark 1.9.1.
• In the definition above all possible noisy data with noise level $\le \delta$ are considered, so the convergence is intended in a worst-case sense.
However, in a specific problem, a sequence of approximate solutions $x^{\delta_n}_{\alpha(\delta_n, y^{\delta_n})}$ can converge fast to $x^\dagger$ also when (1.46) fails!
• A p.c.r. $\alpha = \alpha(\delta, y^\delta)$ depends explicitly on the noise level and on the perturbed data $y^\delta$.
According to the definition above it should also depend on the exact data $y$, which is unknown, so this dependence can only be on some a priori knowledge about $y$ like smoothness properties.
We distinguish between two types of parameter choice rules:
Definition 1.9.2. Let α be a parameter choice rule according to Definition
1.9.1. If α does not depend on y δ , but only on δ, then we call α an a-priori parameter choice rule and write α = α(δ). Otherwise, we call α an
a-posteriori parameter choice rule. If α does not depend on the noise level δ,
then it is said to be a heuristic parameter choice rule.
If α does not depend on y δ it can be defined before the actual calculations
once and for all: this justifies the terminology a-priori and a-posteriori in the
definition above.
For the choice of the parameter, one can also construct a p.c.r. that depends
explicitly only on the perturbed data y δ and not on the noise level. However,
a famous result due to Bakushinskii shows that such a p.c.r. cannot be
convergent:
Theorem 1.9.1 (Bakushinskii). Suppose {Rσ } is a regularization operator
for A† such that for every y ∈ D(A† ) there exists a convergent p.c.r. α which
depends only on y δ . Then A† is bounded.
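Proof. If $\alpha = \alpha(y^\delta)$, then it follows from (1.46) that for every $y \in D(A^\dagger)$ we have
\[ \lim_{\delta\to 0}\ \sup_{y^\delta \in B_\delta(y)} \|R_{\alpha(y^\delta)} y^\delta - A^\dagger y\| = 0, \tag{1.47} \]
so that, taking $y^\delta = y$, $R_{\alpha(y)} y = A^\dagger y$. Thus, if $\{y_n\}_{n\in\mathbb{N}}$ is a sequence in $D(A^\dagger)$ converging to $y$, then $A^\dagger y_n = R_{\alpha(y_n)} y_n$ converges to $A^\dagger y$. This means that $A^\dagger$ is sequentially continuous, hence it is bounded.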
Thus, in the ill-posed case, no heuristic parameter choice rule can yield a
convergent regularization method. However, this doesn’t mean that such a
p.c.r. cannot give good approximations of x† for a fixed positive δ!
1.10 Construction and convergence of regularization methods
In general terms, regularizing an ill-posed problem leads to three questions:
1. How to construct a regularization operator?
2. How to choose a parameter choice rule to give rise to convergent regularization methods?
3. How can these steps be performed in some optimal way?
This and the following sections will deal with the answers to these questions,
at least in the linear case. The following result provides a characterization
of the regularization operators. Once again, we refer to [17] for more details
and for the proofs.
Proposition 1.10.1. Let {Rσ }σ>0 be a family of continuous operators converging pointwise on D(A† ) to A† as σ → 0.
Then {Rσ } is a regularization for A† and for every y ∈ D(A† ) there exists
an a-priori p.c.r. α such that (Rσ , α) is a convergent regularization method
for solving Ax = y.
Conversely, if (Rσ , α) is a convergent regularization method for solving Ax =
y with y ∈ D(A† ) and α is continuous with respect to δ, then Rσ y converges
to A† y as σ → 0.
Thus the correct approach to understand the meaning of regularization
is pointwise convergence. Furthermore, if {Rσ } is linear and uniformly
bounded, in the ill-posed case we can’t expect convergence in the operator norm, since then A† would have to be bounded.
We consider an example of a regularization operator which fits the definitions
above. Although very similar examples can be found in the literature, cf. e.g.
[61], this is slightly different.
Example 1.10.1. Consider the operator $K : \mathcal{X} := L^2[0, 1] \to \mathcal{Y} := L^2[0, 1]$ defined by
\[ K[x](s) := \int_0^s x(t)\, dt. \]
Then $K$ is linear, bounded and compact and it is easily seen that
\[ R(K) = \{y \in H^1[0, 1] \mid y \in C([0, 1]),\ y(0) = 0\} \tag{1.48} \]
and that the distributional derivative from $R(K)$ to $\mathcal{X}$ is the inverse of $K$.
Since $R(K) \supseteq C_0^\infty[0, 1]$, $R(K)$ is dense in $\mathcal{Y}$ and $R(K)^\perp = \{0\}$.
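For $y \in C([0, 1])$ and for $\sigma > 0$, define
\[ (R_\sigma y)(t) := \begin{cases} \dfrac{1}{\sigma}\,\big(y(t + \sigma) - y(t)\big), & \text{if } 0 \le t \le \tfrac12, \\[2mm] \dfrac{1}{\sigma}\,\big(y(t) - y(t - \sigma)\big), & \text{if } \tfrac12 < t \le 1. \end{cases} \tag{1.49} \]
Then $\{R_\sigma\}$ is a family of linear and bounded operators with
\[ \|R_\sigma y\|_{L^2[0,1]} \le \frac{\sqrt{6}}{\sigma}\, \|y\|_{L^2[0,1]} \tag{1.50} \]
defined on a dense linear subspace of $L^2[0, 1]$, thus it can be extended to the whole $\mathcal{Y}$ and (1.50) is still true.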
Since the measure of $[0, 1]$ is finite, for $y \in R(K)$ the distributional derivative of $y$ lies in $L^1[0, 1]$, so $y$ is a function of bounded variation. Thus, according to Lebesgue’s Theorem, the ordinary derivative $y'$ exists almost everywhere in $[0, 1]$ and it is equal to the distributional derivative of $y$ as an $L^2$-function.
Consequently, we can apply the Dominated Convergence Theorem to show that
\[ \|R_\sigma y - K^\dagger y\|_{L^2[0,1]} \to 0, \quad \text{as } \sigma \to 0, \]
so that, according to Proposition 1.10.1, $R_\sigma$ is a regularization for the distributional derivative $K^\dagger$.
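The behaviour of the difference quotient (1.49) on noisy data can be sketched numerically. The example below assumes the exact data $y(t) = \sin t$ (so that $K^\dagger y = \cos t$) and a hypothetical oscillating perturbation of $L^2$ norm $\delta$; the three step sizes are chosen only to show that a very small and a very large $\sigma$ both give a larger error than an intermediate one.

```python
import math

# Sketch: the one-sided difference quotient R_sigma of (1.49) applied to noisy data.
# Exact data y(t) = sin(t) = K[cos](t); the noise term below is an assumption chosen
# for illustration and has L2[0,1] norm DELTA.

DELTA = 1e-4
FREQ = 5000          # integer noise frequency, so that the noise has L2 norm DELTA

def y_noisy(t):
    noise = DELTA * math.sqrt(2.0) * math.sin(2.0 * math.pi * FREQ * t)
    return math.sin(t) + noise

def r_sigma(y, sigma, t):
    """Forward quotient on [0, 1/2], backward quotient on (1/2, 1]."""
    if t <= 0.5:
        return (y(t + sigma) - y(t)) / sigma
    return (y(t) - y(t - sigma)) / sigma

def l2_error(sigma, n_points=10007):
    """Midpoint-rule approximation of || R_sigma y_delta - y' ||_{L2[0,1]}."""
    total = 0.0
    for i in range(n_points):
        t = (i + 0.5) / n_points
        total += (r_sigma(y_noisy, sigma, t) - math.cos(t)) ** 2
    return math.sqrt(total / n_points)

errors = {sigma: l2_error(sigma) for sigma in (1e-4, 1.01e-2, 0.3)}
for sigma, err in errors.items():
    print(f"sigma = {sigma:g}: error = {err:.4f}")
```

For $\sigma$ comparable to $\delta$ the noise is amplified by a factor of order $\delta/\sigma$, while for large $\sigma$ the quotient is a poor approximation of the derivative; this is the trade-off that a parameter choice rule has to resolve.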
1.11 Order optimality
We concentrate now on how to construct optimal regularization methods. To this aim, we shall make use of some analytical tools from the spectral theory of linear and bounded operators. For the reader who is not familiar with spectral theory, we refer to Appendix A or to the second chapter of [17], where the basic ideas and results we will need are gathered in a few pages; for a more comprehensive treatment, classical references are e.g. [2] and [44].
In principle, a (convergent) regularization method $(\bar{R}_\sigma, \bar{\alpha})$ for solving $Ax = y$ should be optimal if the quantity
\[ \varepsilon_1 = \varepsilon_1(\delta, \bar{R}_\sigma, \bar{\alpha}) := \sup_{y^\delta \in B_\delta(y)} \|\bar{R}_{\bar{\alpha}(\delta,y^\delta)} y^\delta - A^\dagger y\| \tag{1.51} \]
converges to 0 as quickly as
\[ \varepsilon_2 = \varepsilon_2(\delta) := \inf_{(R_\sigma,\alpha)}\ \sup_{y^\delta \in B_\delta(y)} \|R_{\alpha(\delta,y^\delta)} y^\delta - A^\dagger y\|, \tag{1.52} \]
i.e. if there are no regularization methods whose worst-case errors converge to 0 quicker than those of $(\bar{R}_\sigma, \bar{\alpha})$.
Once again, it is not advisable to look for some uniformity in y, as we can
see from the following result.
Proposition 1.11.1. Let {Rσ } be a regularization for A† with Rσ (0) = 0,
let α = α(δ, y δ ) be a p.c.r. and suppose that R(A) is non closed. Then there
can be no function f : (0, +∞) → (0, +∞) with limδ→0 f (δ) = 0 such that
ε1 (δ, Rσ , α) ≤ f (δ)
(1.53)
holds for every y ∈ D(A† ) with kyk ≤ 1 and all δ > 0.
Thus convergence can be arbitrarily slow: in order to study convergence rates of the approximate solutions to $x^\dagger$ it is necessary to restrict to subsets of $D(A^\dagger)$ (or of $\mathcal{X}$), i.e. to formulate some a-priori assumption on the exact data (or equivalently, on the exact solution). This can be done by introducing the so-called source sets, which are defined as follows.
Definition 1.11.1. Let $\mu, \rho > 0$. An element $x \in \mathcal{X}$ is said to have a source representation if it belongs to the source set
\[ \mathcal{X}_{\mu,\rho} := \{x \in \mathcal{X} \mid x = (A^*A)^\mu w,\ \|w\| \le \rho\}. \tag{1.54} \]
The union with respect to $\rho$ of all source sets is denoted by
\[ \mathcal{X}_\mu := \bigcup_{\rho>0} \mathcal{X}_{\mu,\rho} = \{x \in \mathcal{X} \mid x = (A^*A)^\mu w\} = R((A^*A)^\mu). \tag{1.55} \]
Here and below, we use spectral theory to define
\[ (A^*A)^\mu := \int \lambda^\mu\, dE_\lambda, \tag{1.56} \]
where $\{E_\lambda\}$ is the spectral family associated to the self-adjoint $A^*A$ (cf. Appendix A) and since $A^*A \ge 0$ the integration can be restricted to the compact set $[0, \|A^*A\|] = [0, \|A\|^2]$.
Since A is usually a smoothing operator, the requirement for an element
to be in Xµ,ρ can be considered as a smoothness condition.
The notion of optimality is based on the following result about the source
sets, which is stated for compact operators, but can be extended to the non
compact case (see [17] and the references therein).
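Proposition 1.11.2. Let $A$ be compact with $R(A)$ being non closed and let $\{R_\sigma\}$ be a regularization operator for $A^\dagger$. Define also
\[ \Delta(\delta, \mathcal{X}_{\mu,\rho}, R_\sigma, \alpha) := \sup\{\|R_{\alpha(\delta,y^\delta)} y^\delta - x\| \mid x \in \mathcal{X}_{\mu,\rho},\ y^\delta \in B_\delta(Ax)\} \tag{1.57} \]
for any fixed $\mu$, $\rho$ and $\delta$ in $(0, +\infty)$ (and $\alpha$ a p.c.r. relative to $Ax$). Then there exists a sequence $\{\delta_k\}$ converging to 0 such that
\[ \Delta(\delta_k, \mathcal{X}_{\mu,\rho}, R_\sigma, \alpha) \ge \delta_k^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}}. \tag{1.58} \]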
This justifies the following definition.
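Definition 1.11.2. Let $R(A)$ be non closed, let $\{R_\sigma\}$ be a regularization for $A^\dagger$ and let $\mu, \rho > 0$.
Let $\alpha$ be a p.c.r. which is convergent for every $y \in A\mathcal{X}_{\mu,\rho}$.
We call $(R_\sigma, \alpha)$ optimal in $\mathcal{X}_{\mu,\rho}$ if
\[ \Delta(\delta, \mathcal{X}_{\mu,\rho}, R_\sigma, \alpha) = \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \tag{1.59} \]
holds for every $\delta > 0$.
We call $(R_\sigma, \alpha)$ of optimal order in $\mathcal{X}_{\mu,\rho}$ if there exists a constant $C \ge 1$ such that
\[ \Delta(\delta, \mathcal{X}_{\mu,\rho}, R_\sigma, \alpha) \le C \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \tag{1.60} \]
holds for every $\delta > 0$.
The term optimality refers to the fact that if $R(A)$ is non closed, then the worst-case error of a regularization algorithm cannot converge to 0 faster than $\delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}}$ as $\delta \to 0$, under the a-priori assumption $x \in \mathcal{X}_{\mu,\rho}$, or (if we are concerned with the rate only), not faster than $O(\delta^{\frac{2\mu}{2\mu+1}})$ under the a-priori assumption $x \in \mathcal{X}_\mu$. In other words, we prove the following fact: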
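Proposition 1.11.3. With the assumption of Definition 1.11.2, $(R_\sigma, \alpha)$ is of optimal order in $\mathcal{X}_{\mu,\rho}$ if and only if there exists a constant $C \ge 1$ such that for every $y \in A\mathcal{X}_{\mu,\rho}$
\[ \sup_{y^\delta \in B_\delta(y)} \|R_{\alpha(\delta,y^\delta)} y^\delta - x^\dagger\| \le C \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \tag{1.61} \]
holds for every $\delta > 0$. For the optimal case an analogous result is true.
Proof. First, we show that $y \in A\mathcal{X}_{\mu,\rho}$ if and only if $y \in R(A)$ and $x^\dagger \in \mathcal{X}_{\mu,\rho}$.
The sufficiency is obvious because of (1.29). For the necessity, we observe that if $x = (A^*A)^\mu w$, with $\|w\| \le \rho$, then $x$ lies in $(\ker A)^\perp$, since for every $z \in \ker A$ we have:
\[ \langle x, z\rangle = \langle (A^*A)^\mu w, z\rangle = \lim_{\epsilon\to 0} \int_\epsilon^{\|A\|^2} \lambda^\mu\, d\langle w, E_\lambda z\rangle = \lim_{\epsilon\to 0} \int_\epsilon^{\|A\|^2} \lambda^\mu\, d\langle w, z\rangle = 0. \tag{1.62} \]
We obtain that $x^\dagger = A^\dagger y = A^\dagger A x = (I - P)x = x$ is the only element in $\mathcal{X}_{\mu,\rho} \cap A^{-1}(\{y\})$, thus the entire result follows immediately and the proof is complete.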
The following result due to R. Plato assures that, under very weak assumptions, the order-optimality in a source set implies convergence in R(A).
More precisely:
Theorem 1.11.1. Let $\{R_\sigma\}$ be a regularization for $A^\dagger$. For $s > 0$, let $\alpha_s$ be the family of parameter choice rules defined by
\[ \alpha_s(\delta, y^\delta) := \alpha(s\delta, y^\delta), \tag{1.63} \]
where $\alpha$ is a parameter choice rule for $Ax = y$, $y \in R(A)$.
Suppose that for every $\tau > \tau_0$, $\tau_0 \ge 1$, $(R_\sigma, \alpha_\tau)$ is a regularization method of optimal order in $\mathcal{X}_{\mu,\rho}$, for some $\mu, \rho > 0$.
Then, for every $\tau > \tau_0$, $(R_\sigma, \alpha_\tau)$ is convergent for every $y \in R(A)$ and it is of optimal order in every $\mathcal{X}_{\nu,\rho}$, with $0 < \nu \le \mu$.
It is worth mentioning that the source sets Xµ,ρ are not the only possible
choice one can make. They are well-suited for operators A whose spectrum
decays to 0 as a power of λ, but they don’t work very well in the case of
exponentially ill-posed problems in which the spectrum of A decays to 0 exponentially fast. In this case, different source conditions such as logarithmic
source conditions should be used, for which analogous results and definitions
can be stated. In this work logarithmic and other different source conditions
shall not be considered. A deeper treatment of this topic can be found,
e.g., in [47].
1.12 Regularization by projection
In practice, regularization methods must be implemented in finite-dimensional
spaces, thus it is important to see what happens when the data and the
solutions are approximated in finite-dimensional spaces. It turns out that
this important passage from infinite to finite dimensions can be seen as a
regularization method itself. One approach to deal with this problem is
regularization by projection, where the approximation is the projection onto
finite-dimensional spaces alone: many important regularization methods are
included in this category, such as discretization, collocation, Galerkin or Ritz
approximation.
The finite-dimensional approximations can be made both in the spaces X
and Y: here we consider only the first one.
We approximate $x^\dagger$ as follows: given a sequence of finite-dimensional subspaces of $\mathcal{X}$
\[ \mathcal{X}_1 \subseteq \mathcal{X}_2 \subseteq \mathcal{X}_3 \subseteq \dots \tag{1.64} \]
such that
\[ \overline{\bigcup_{n\in\mathbb{N}} \mathcal{X}_n} = \mathcal{X}, \tag{1.65} \]
the $n$-th approximation of $x^\dagger$ is
\[ x_n := A_n^\dagger y, \tag{1.66} \]
where $A_n = A P_n$ and $P_n$ is the orthogonal projector onto $\mathcal{X}_n$. Note that since $A_n$ has finite-dimensional range, $R(A_n)$ is closed and thus $A_n^\dagger$ is bounded (i.e. $x_n$ is a stable approximation of $x^\dagger$). Moreover, it is an easy exercise to show that $x_n$ is the minimum-norm least-squares solution of $Ax = y$ in $\mathcal{X}_n$: for this reason, this method is called least-squares projection.
Note that the iterates xn may not converge to x† in X , as the following
example due to Seidman shows. We reconsider it entirely in order to correct
a small inaccuracy which can be found in the presentation of this example
in [17].
1.12.1 The Seidman example (revisited)
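Suppose $\mathcal{X}$ is infinite-dimensional, let $\{e_n\}$ be an orthonormal basis for $\mathcal{X}$ and let $\mathcal{X}_n := \mathrm{span}\{e_1, \dots, e_n\}$, for every $n \in \mathbb{N}$. Then of course $\{\mathcal{X}_n\}$ satisfies (1.64) and (1.65). Define an operator $A : \mathcal{X} \to \mathcal{Y}$ as follows:
\[ A(x) = A\Big(\sum_{j=1}^{\infty} x_j e_j\Big) := \sum_{j=1}^{\infty} (x_j a_j + b_j x_1)\, e_j, \tag{1.67} \]
with
\[ b_j := \begin{cases} 0 & \text{if } j = 1, \\ j^{-1} & \text{if } j > 1, \end{cases} \tag{1.68} \]
\[ a_j := \begin{cases} j^{-1} & \text{if } j \text{ is odd}, \\ j^{-\frac52} & \text{if } j \text{ is even}. \end{cases} \tag{1.69} \]
Then:
• $A$ is well defined, since $|a_j x_j + b_j x_1|^2 \le 2\,(|x_j|^2 + |x_1|^2 j^{-2})$ for every $j$, and linear.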
• A is injective: Ax = 0 implies
(a1 x1 , a2 x2 + b2 x1 , a3 x3 + b3 x1 , ...) = 0,
thus xj = 0 for every j, i.e. x = 0.
• $A$ is compact: in fact, suppose $\{x_k\}_{k\in\mathbb{N}}$ is a bounded sequence in $\mathcal{X}$. Then also the first components of $x_k$, denoted by $x_{k,1}$, form a bounded sequence in $\mathbb{C}$, which has a convergent subsequence $\{x_{k_l,1}\}$ and, in correspondence, $\sum_{j=1}^{\infty} b_j x_{k_l,1} e_j$ is a subsequence of $\sum_{j=1}^{\infty} b_j x_{k,1} e_j$ converging in $\mathcal{X}$ to $\sum_{j=1}^{\infty} b_j \lim_{l\to\infty} x_{k_l,1}\, e_j$. Consequently, the map $x \mapsto \sum_{j=1}^{\infty} b_j x_1 e_j$ is compact. Moreover, the map $x \mapsto \sum_{j=1}^{\infty} a_j x_j e_j$ is also compact, because it is the limit of the sequence of finite-rank operators defined by $A_n x := \sum_{j=1}^{n} a_j x_j e_j$. Thus, being the sum of two compact operators, $A$ is compact (see, e.g., [62] for the properties of the compact operators used here).
Let $y := Ax^\dagger$ with
\[ x^\dagger := \sum_{j=1}^{\infty} j^{-1} e_j \tag{1.70} \]
and let $x_n := \sum_{j=1}^{n} x_{n,j} e_j$ be the best-approximate solution of $Ax = y$ in $\mathcal{X}_n$.
Then it is readily seen that $x_n := (x_{n,1}, x_{n,2}, \dots, x_{n,n})$ is the vector minimizing
\[ \sum_{j=1}^{n} \Big( a_j (x_j - j^{-1}) + b_j (x_1 - 1) \Big)^2 + \sum_{j=n+1}^{\infty} j^{-2} (1 + a_j - x_1)^2 \tag{1.71} \]
among all $x = (x_1, \dots, x_n) \in \mathbb{C}^n$.
Imposing that the gradient of (1.71) with respect to $x$ is equal to 0 for $x = x_n$, we obtain
\[ 2 a_1^2 (x_{n,1} - 1) - 2 \sum_{j=n+1}^{\infty} j^{-2} (1 + a_j - x_{n,1}) = 0, \]
\[ 2 \sum_{j=1}^{n} \Big( a_j (x_{n,j} - j^{-1}) + b_j (x_{n,1} - 1) \Big)\, a_j \delta_{j,k} = 0, \qquad k = 2, \dots, n. \]
Here, $\delta_{j,k}$ is the Kronecker delta, which is equal to 1 if $k = j$ and equal to 0 otherwise.
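Consequently, for the first variable $x_{n,1}$ we have
\[ x_{n,1} = \Big(1 + \sum_{j=n+1}^{\infty} (1 + a_j)\, j^{-2}\Big) \Big(1 + \sum_{j=n+1}^{\infty} j^{-2}\Big)^{-1} = 1 + \Big(\sum_{j=n+1}^{\infty} a_j j^{-2}\Big) \Big(1 + \sum_{j=n+1}^{\infty} j^{-2}\Big)^{-1} \tag{1.72} \]
and for every $k = 2, \dots, n$ there holds
\[ x_{n,k} = (a_k)^{-1} \big( k^{-1} a_k - b_k (x_{n,1} - 1) \big) = k^{-1} - (a_k k)^{-1} (x_{n,1} - 1). \tag{1.73} \]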
We use this to calculate
\[ \begin{aligned} \|x_n - P_n x^\dagger\|^2 &= \Big\| \sum_{j=1}^{n} (x_{n,j} - j^{-1})\, e_j \Big\|^2 \\ &= (x_{n,1} - 1)^2 + \sum_{j=2}^{n} \Big( j^{-1} - (a_j j)^{-1} (x_{n,1} - 1) - j^{-1} \Big)^2 \\ &= (x_{n,1} - 1)^2 \Big( 1 + \sum_{j=2}^{n} (a_j j)^{-2} \Big) \\ &= \Big( \sum_{j=1}^{n} (a_j j)^{-2} \Big) \Big( \sum_{j=n+1}^{\infty} a_j j^{-2} \Big)^{2} \Big( 1 + \sum_{j=n+1}^{\infty} j^{-2} \Big)^{-2}. \end{aligned} \tag{1.74} \]
Of these three factors, the third one is clearly convergent to 1.
The first one behaves like $n^4$, since, applying Cesàro's rule to
\[ s_n := \frac{\sum_{j=1}^{n} j^3}{n^4}, \]
we obtain
\[ \lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{(n + 1)^3}{(n + 1)^4 - n^4} = \frac14. \]
Similarly,
\[ \sum_{j=n+1}^{\infty} a_j j^{-2} \sim \sum_{j=n+1}^{\infty} j^{-3} \sim n^{-2}, \]
because
\[ \frac{\sum_{j=n+1}^{\infty} j^{-3}}{n^{-2}} \sim \frac{-(n + 1)^{-3}}{(n + 1)^{-2} - n^{-2}} = \frac{n^2 (n + 1)^{-1}}{(n + 1)^2 - n^2} \to \frac12. \]
These calculations show that
\[ \|x_n - P_n x^\dagger\| \to \lambda > 0, \tag{1.75} \]
so $x_n$ doesn’t converge to $x^\dagger$, which was what we wanted to prove.
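The closed form (1.74) can also be evaluated numerically. The following sketch uses the coefficients $a_j$ of (1.69) and truncates the infinite tails at a finite index (the cutoff is an approximation introduced here, not part of the example). Keeping only the leading terms of the first two factors for these coefficients suggests that the squared norm settles near $1/128$, which is what the computed values appear to approach.

```python
# Sketch: evaluate the closed form (1.74) for ||x_n - P_n x_dagger||^2 numerically.
# The infinite tails are truncated at index `cutoff` (an approximation introduced here).

def a(j):
    """Coefficients a_j from (1.69)."""
    return j ** -1.0 if j % 2 == 1 else j ** -2.5

def squared_error(n, cutoff=100_000):
    head = sum((a(j) * j) ** -2.0 for j in range(1, n + 1))
    tail_a = sum(a(j) * j ** -2.0 for j in range(n + 1, cutoff + 1))
    tail_1 = sum(j ** -2.0 for j in range(n + 1, cutoff + 1))
    return head * tail_a ** 2 / (1.0 + tail_1) ** 2

values = {n: squared_error(n) for n in (10, 100, 1000)}
for n, value in values.items():
    print(n, value)
```

The values stay bounded away from zero as $n$ grows, in agreement with (1.75).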
The following result gives sufficient (and necessary) conditions for convergence.
Theorem 1.12.1. For $y \in D(A^\dagger)$, let $x_n$ be defined as above. Then
(i) $x_n \rightharpoonup x^\dagger \iff \{\|x_n\|\}$ is bounded;
(ii) $x_n \to x^\dagger \iff \limsup_{n\to+\infty} \|x_n\| \le \|x^\dagger\|$;
(iii) if
\[ \limsup_{n\to+\infty} \|(A_n^\dagger)^* x_n\| = \limsup_{n\to+\infty} \|(A_n^*)^\dagger x_n\| < \infty, \tag{1.76} \]
then $x_n \to x^\dagger$.
For the proof of this theorem and for further results about the least-squares projection method see [17].
1.13 Linear regularization: basic results
In this section we consider a class of regularization methods based on the
spectral theory for linear self-adjoint operators.
The basic idea is the following one: let $\{E_\lambda\}$ be the spectral family associated to $A^*A$. If $A^*A$ is continuously invertible, then $(A^*A)^{-1} = \int \frac{1}{\lambda}\, dE_\lambda$. Since the best-approximate solution $x^\dagger = A^\dagger y$ can be characterized by the normal equation (1.32), then
\[ x^\dagger = \int \frac{1}{\lambda}\, dE_\lambda A^* y. \tag{1.77} \]
In the ill-posed case the integral in (1.77) does not exist, since the integrand $\frac{1}{\lambda}$ has a pole in 0. The idea is to replace $\frac{1}{\lambda}$ by a parameter-dependent family of functions $g_\sigma(\lambda)$ which are at least piecewise continuous on $[0, \|A\|^2]$ and, for convenience, continuous from the right in the points of discontinuity and to replace (1.77) by
\[ x_\sigma := \int g_\sigma(\lambda)\, dE_\lambda A^* y. \tag{1.78} \]
By construction, the operator on the right-hand side of (1.78) acting on $y$ is continuous, so the approximate solutions
\[ x_\sigma^\delta := \int g_\sigma(\lambda)\, dE_\lambda A^* y^\delta \tag{1.79} \]
can be computed in a stable way.
Of course, in order to obtain convergence as $\sigma \to 0$, it is necessary to require that $\lim_{\sigma\to 0} g_\sigma(\lambda) = \frac{1}{\lambda}$ for every $\lambda \in (0, \|A\|^2]$.
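First, we study the question under which condition the family $\{R_\sigma\}$ with
\[ R_\sigma := \int g_\sigma(\lambda)\, dE_\lambda A^* \tag{1.80} \]
is a regularization operator for $A^\dagger$.
Using the normal equation we have
\[ x^\dagger - x_\sigma = x^\dagger - g_\sigma(A^*A) A^* y = (I - g_\sigma(A^*A) A^*A)\, x^\dagger = \int (1 - \lambda g_\sigma(\lambda))\, dE_\lambda x^\dagger. \tag{1.81} \]
Hence if we set, for all $(\sigma, \lambda)$ for which $g_\sigma(\lambda)$ is defined,
\[ r_\sigma(\lambda) := 1 - \lambda g_\sigma(\lambda), \tag{1.82} \]
so that $r_\sigma(0) = 1$, then
\[ x^\dagger - x_\sigma = r_\sigma(A^*A)\, x^\dagger. \tag{1.83} \]
In these notations, we have the following results.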
Theorem 1.13.1. Let, for all $\sigma > 0$, $g_\sigma : [0, \|A\|^2] \to \mathbb{R}$ fulfill the following assumptions:
• $g_\sigma$ is piecewise continuous;
• there exists a constant $C > 0$ such that
\[ |\lambda g_\sigma(\lambda)| \le C \tag{1.84} \]
for all $\lambda \in (0, \|A\|^2]$;
•
\[ \lim_{\sigma\to 0} g_\sigma(\lambda) = \frac{1}{\lambda} \tag{1.85} \]
for all $\lambda \in (0, \|A\|^2]$.
Then, for all $y \in D(A^\dagger)$,
\[ \lim_{\sigma\to 0} g_\sigma(A^*A) A^* y = x^\dagger \tag{1.86} \]
and if $y \notin D(A^\dagger)$, then
\[ \lim_{\sigma\to 0} \|g_\sigma(A^*A) A^* y\| = +\infty. \tag{1.87} \]
Remark 1.13.1. (i) It is interesting to note that for every $y \in D(A^\dagger)$ the integral $\int g_\sigma(\lambda)\, dE_\lambda A^* y$ is converging in $\mathcal{X}$, even if $\int \frac{1}{\lambda}\, dE_\lambda A^* y$ does not exist and $g_\sigma(\lambda)$ is converging pointwise to $\frac{1}{\lambda}$.
(ii) According to Proposition 1.10.1, in the assumptions of Theorem 1.13.1, $\{R_\sigma\}$ is a regularization operator for $A^\dagger$.
Another important result concerns the so-called propagated data error $\|x_\sigma - x_\sigma^\delta\|$:
Theorem 1.13.2. Let $g_\sigma$ and $C$ be as in Theorem 1.13.1, $x_\sigma$ and $x_\sigma^\delta$ be defined by (1.78) and (1.79) respectively. For $\sigma > 0$, let
\[ G_\sigma := \sup\{|g_\sigma(\lambda)| \mid \lambda \in [0, \|A\|^2]\}. \tag{1.88} \]
Then
\[ \|A x_\sigma - A x_\sigma^\delta\| \le C\delta \tag{1.89} \]
and
\[ \|x_\sigma - x_\sigma^\delta\| \le \delta \sqrt{C G_\sigma} \tag{1.90} \]
hold.
Thus the total error $\|x^\dagger - x_\sigma^\delta\|$ can be estimated by
\[ \|x^\dagger - x_\sigma^\delta\| \le \|x^\dagger - x_\sigma\| + \delta \sqrt{C G_\sigma}. \tag{1.91} \]
Since $g_\sigma(\lambda) \to \frac{1}{\lambda}$ as $\sigma \to 0$, and it can be proved that the estimate (1.90) is sharp in the usual worst-case sense, the propagated data error generally explodes for fixed $\delta > 0$ as $\sigma \to 0$ (cf. [22]).
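The two terms in (1.91) can be observed on a diagonal model problem. The sketch below assumes the filter $g_\sigma(\lambda) = 1/(\lambda + \sigma)$, which corresponds to $R_\sigma = (A^*A + \sigma)^{-1}A^*$ and for which $C = 1$ and $G_\sigma = 1/\sigma$; the singular values, the exact solution and the noise vector are hypothetical choices made only for illustration.

```python
import math

# Sketch: the filter g_sigma(lambda) = 1 / (lambda + sigma) on a diagonal model problem.
# The singular values s_j = 1/j, the exact solution and the noise are assumptions.

N = 200
s = [1.0 / j for j in range(1, N + 1)]            # singular values of A
x_true = [1.0 / j ** 2 for j in range(1, N + 1)]  # exact solution
y = [sj * xj for sj, xj in zip(s, x_true)]        # exact data

delta = 1e-3
noise = [delta * (-1.0) ** j / math.sqrt(N) for j in range(N)]   # ||noise|| = delta
y_delta = [yj + nj for yj, nj in zip(y, noise)]

def g(sigma, lam):
    return 1.0 / (lam + sigma)

def apply_filter(sigma, data):
    """x_sigma = g_sigma(A*A) A* data, componentwise for a diagonal A."""
    return [g(sigma, sj ** 2) * sj * dj for sj, dj in zip(s, data)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

report = []
for sigma in (1e-1, 1e-2, 1e-3, 1e-4):
    x_sigma = apply_filter(sigma, y)
    x_sigma_delta = apply_filter(sigma, y_delta)
    propagated = norm(diff(x_sigma, x_sigma_delta))
    bound = delta * math.sqrt(1.0 / sigma)        # (1.90) with C = 1, G_sigma = 1/sigma
    approximation = norm(diff(x_true, x_sigma))
    report.append((sigma, approximation, propagated, bound))
    print(sigma, approximation, propagated, bound)
```

As $\sigma$ decreases the approximation error shrinks while the bound on the propagated data error grows, which is the behaviour described above.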
We now concentrate on the first term in (1.91), the approximation error.
While the propagated data error can be studied by estimating $g_\sigma(\lambda)$, for the approximation error one has to look at $r_\sigma(\lambda)$:
Theorem 1.13.3. Let $g_\sigma$ fulfill the assumptions of Theorem 1.13.1. Let $\mu, \rho, \sigma_0$ be fixed positive numbers. Suppose there exists a function $\omega_\mu : (0, \sigma_0) \to \mathbb{R}$ such that for every $\sigma \in (0, \sigma_0)$ and every $\lambda \in [0, \|A\|^2]$ the estimate
\[ \lambda^\mu |r_\sigma(\lambda)| \le \omega_\mu(\sigma) \tag{1.92} \]
is true. Then, for $x^\dagger \in \mathcal{X}_{\mu,\rho}$,
\[ \|x_\sigma - x^\dagger\| \le \rho\, \omega_\mu(\sigma) \tag{1.93} \]
and
\[ \|A x_\sigma - A x^\dagger\| \le \rho\, \omega_{\mu+\frac12}(\sigma) \tag{1.94} \]
hold.
A straightforward calculation leads immediately to an important consequence:
Corollary 1.13.1. Let the assumptions of Theorem 1.13.3 hold with
\[ \omega_\mu(\sigma) = c\sigma^\mu \tag{1.95} \]
for some constant $c > 0$ and assume that
\[ G_\sigma = O\Big(\frac{1}{\sigma}\Big), \quad \text{as } \sigma \to 0. \tag{1.96} \]
Then, with the parameter choice rule
\[ \alpha \sim \Big(\frac{\delta}{\rho}\Big)^{\frac{2}{2\mu+1}}, \tag{1.97} \]
the regularization method $(R_\sigma, \alpha)$ is of optimal order in $\mathcal{X}_{\mu,\rho}$.
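The "straightforward calculation" amounts to inserting (1.97) into the bound obtained from (1.91), (1.93), (1.95) and (1.96). A minimal numerical sketch, assuming $G_\sigma \le \tilde{c}/\sigma$ and placeholder values for all constants:

```python
# Sketch: balance the bound rho*c*sigma^mu + delta*sqrt(C*c_tilde/sigma) obtained from
# (1.91), (1.93), (1.95), (1.96) with the a-priori choice (1.97).
# All constants are placeholders chosen for illustration.

def error_bound(sigma, delta, rho, mu, c=1.0, C=1.0, c_tilde=1.0):
    approximation = rho * c * sigma ** mu
    propagated = delta * (C * c_tilde / sigma) ** 0.5
    return approximation + propagated

def a_priori_sigma(delta, rho, mu):
    return (delta / rho) ** (2.0 / (2.0 * mu + 1.0))

def optimal_rate(delta, rho, mu):
    return delta ** (2.0 * mu / (2.0 * mu + 1.0)) * rho ** (1.0 / (2.0 * mu + 1.0))

mu, rho = 0.5, 2.0
ratios = []
for delta in (1e-2, 1e-4, 1e-6):
    sigma = a_priori_sigma(delta, rho, mu)
    ratios.append(error_bound(sigma, delta, rho, mu) / optimal_rate(delta, rho, mu))
print(ratios)   # each ratio equals c + sqrt(C * c_tilde) up to rounding
```

With this choice of $\sigma$ both terms are multiples of $\delta^{\frac{2\mu}{2\mu+1}}\rho^{\frac{1}{2\mu+1}}$, so the ratio to the optimal rate is independent of $\delta$.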
1.14 The Discrepancy Principle
So far, we have considered only a-priori choices for the parameter α = α(δ).
Such a-priori choices should be based on some a-priori knowledge of the true
solution, namely its smoothness, but unfortunately in practice this information is often not available. This motivates the necessity of looking for
a-posteriori parameter choice rules. In this section we will discuss the most
famous a-posteriori choice, the discrepancy principle (introduced for the first
time by Morozov, cf. [67]) and some other important improved choices depending both on the noise level and on the noisy data.
Definition 1.14.1. Let $g_\sigma$ be as in Theorem 1.13.1 and such that, for every $\lambda > 0$, $\sigma \mapsto g_\sigma(\lambda)$ is continuous from the left, and let $r_\sigma$ be defined by (1.82).
Fix a positive number $\tau$ such that
\[ \tau > \sup\{|r_\sigma(\lambda)| \mid \sigma > 0,\ \lambda \in [0, \|A\|^2]\}. \tag{1.98} \]
For $y \in R(A)$, the regularization parameter defined via the Discrepancy Principle is
\[ \alpha(\delta, y^\delta) := \sup\{\sigma > 0 \mid \|A x_\sigma^\delta - y^\delta\| \le \tau\delta\}. \tag{1.99} \]
Remark 1.14.1.
• The idea of the Discrepancy Principle is to choose the biggest parameter for which the corresponding residual is of the same order as the noise level, in order to reduce the propagated data error as much as possible.
• It is fundamental that $y \in R(A)$. Otherwise, $\|A x_\sigma^\delta - y^\delta\|$ can be bounded from below in the following way:
\[ \begin{aligned} \|y - Qy\| - 2\delta &\le \|y - y^\delta\| + \|Q(y^\delta - y)\| + \|y^\delta - Qy^\delta\| - 2\delta \\ &\le \delta + \delta + \|A x_\sigma^\delta - y^\delta\| - 2\delta = \|A x_\sigma^\delta - y^\delta\|. \end{aligned} \]
Thus, if $\delta$ is small enough, the set
\[ \{\sigma > 0 \mid \|A x_\sigma^\delta - y^\delta\| \le \tau\delta\} \tag{1.100} \]
is empty.
• The assumed continuity from the left for $g_\sigma$ assures that the functional $\sigma \mapsto \|A x_\sigma^\delta - y^\delta\|$ is also continuous from the left. Therefore, if $\alpha(\delta, y^\delta)$ satisfies the Discrepancy Principle (1.99), we have
\[ \|A x^\delta_{\alpha(\delta,y^\delta)} - y^\delta\| \le \tau\delta. \tag{1.101} \]
• If $\|A x_\sigma^\delta - y^\delta\| \le \tau\delta$ holds for every $\sigma > 0$, then $\alpha(\delta, y^\delta) = +\infty$ and $x^\delta_{\alpha(\delta,y^\delta)}$ has to be understood in the sense of a limit as $\alpha \to +\infty$.
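A discrete version of the rule (1.99) can be sketched as follows. The example assumes the filter $g_\sigma(\lambda) = 1/(\lambda + \sigma)$ (for which $|r_\sigma| \le 1$), a diagonal model problem, and a finite grid of candidate parameters replacing the supremum; all of these are illustrative choices rather than part of the definition.

```python
import math

# Sketch: Discrepancy Principle (1.99) for R_sigma = (A*A + sigma)^(-1) A* on a diagonal
# model problem, with the supremum taken over a finite grid of sigma values only.
# Singular values, data, noise and the grid are assumptions made for illustration.

N = 200
s = [1.0 / j for j in range(1, N + 1)]
x_true = [1.0 / j ** 2 for j in range(1, N + 1)]
delta = 1e-3
y_delta = [sj * xj + delta * (-1.0) ** j / math.sqrt(N)
           for j, (sj, xj) in enumerate(zip(s, x_true))]

def residual_norm(sigma):
    """|| A x_sigma^delta - y^delta || = || sigma / (s_j^2 + sigma) * y_j^delta ||."""
    return math.sqrt(sum((sigma / (sj ** 2 + sigma) * yj) ** 2
                         for sj, yj in zip(s, y_delta)))

TAU = 1.5                                            # tau > sup |r_sigma| = 1 here
grid = [10.0 ** (-k / 4.0) for k in range(0, 41)]    # from 1 down to 1e-10

def discrepancy_choice(grid, tau, delta):
    """Largest grid value whose residual does not exceed tau * delta (None if none)."""
    for sigma in sorted(grid, reverse=True):
        if residual_norm(sigma) <= tau * delta:
            return sigma
    return None

alpha = discrepancy_choice(grid, TAU, delta)
x_alpha = [sj * yj / (sj ** 2 + alpha) for sj, yj in zip(s, y_delta)]
error = math.sqrt(sum((xa - xt) ** 2 for xa, xt in zip(x_alpha, x_true)))
print(alpha, residual_norm(alpha), error)
```

Since the residual of this filter is nondecreasing in $\sigma$, scanning the grid from the largest value downwards returns the grid analogue of the supremum in (1.99).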
The main convergence properties of the Discrepancy Principle are described in the following important theorem (see [17] for the long proof). A
full understanding of the statement of the theorem requires the notions of
saturation and qualification of a regularization method.
The term saturation is used to describe the behavior of some regularization operators for which
\[ \|x_\sigma^\delta - x^\dagger\| = O(\delta^{\frac{2\mu}{2\mu+1}}) \tag{1.102} \]
does not hold for every $\mu$, but only up to a finite value $\mu_0$, called the qualification of the method.
More precisely, the qualification $\mu_0$ is defined as the largest value such that
\[ \lambda^\mu |r_\sigma(\lambda)| = O(\sigma^\mu) \tag{1.103} \]
holds for every $\mu \in (0, \mu_0]$.
Theorem 1.14.1. In addition to the assumptions made for $g_\sigma$ in Definition 1.14.1, suppose that there exists a constant $\tilde{c}$ such that $G_\sigma$ satisfies
\[ G_\sigma \le \frac{\tilde{c}}{\sigma}, \tag{1.104} \]
for every $\sigma > 0$. Assume also that the regularization method $(R_\sigma, \alpha)$ corresponding to the Discrepancy Principle has qualification $\mu_0 > \frac12$ and that, with $\omega_\mu$ defined as in Theorem 1.13.3,
\[ \omega_\mu(\alpha) \sim \alpha^\mu, \quad \text{for } 0 < \mu \le \mu_0. \tag{1.105} \]
Then $(R_\sigma, \alpha)$ is convergent for every $y \in R(A)$ and of optimal order in $\mathcal{X}_{\mu,\rho}$, for $\mu \in (0, \mu_0 - \frac12]$ and for $\rho > 0$.
Thus, in general, a regularization method $(R_\sigma, \alpha)$ with $\alpha$ defined via the Discrepancy Principle need not be of optimal order in $X_{\mu,\rho}$ for $\mu > \mu_0 - \frac12$, as the following result for the Tikhonov method in the compact case implies:
Theorem 1.14.2 (Groetsch). Let $K = A$ be compact, define $R_\sigma := (K^*K + \sigma)^{-1}K^*$ and choose the Discrepancy Principle (1.99) as a parameter choice rule for $R_\sigma$. If
$$\|x^\delta_{\alpha(\delta,y^\delta)} - x^\dagger\| = o(\delta^{\frac12}) \tag{1.106}$$
holds for every $y \in R(K)$ and $y^\delta \in \mathcal{Y}$ fulfilling $\|y^\delta - y\| \le \delta$, then $R(K)$ is finite-dimensional.
Consequently, since
• $\mu_0 = 1$ for $(R_\sigma, \alpha)$ as in Theorem 1.14.2 (cf. the results in Section 1.16) and
• in the ill-posed case $\|x^\delta_{\alpha(\delta,y^\delta)} - x^\dagger\|$ does not converge faster than $O(\delta^{\frac12})$,
$(R_\sigma, \alpha)$ cannot be of optimal order in $X_{\mu,\rho}$ for $\mu > \mu_0 - \frac12$.
This result is the motivation and the starting point for the introduction
of other (improved) a-posteriori parameter choice rules defined to overcome
the problem of saturation. However, we are interested mainly in iterative methods, where these rules are not needed, so we refer the interested reader to [17] for more details about such rules. A coverage of some of the most important heuristic parameter choice rules can also be found there.
1.15
The finite dimensional case: discrete ill-posed problems
In practice, ill-posed problems like integral equations of the first kind have to be approximated by a finite dimensional problem whose solution can be computed by software.
In the finite dimensional setting, the linear operator $A$ is simply a matrix $A \in M_{m,N}(\mathbb{R})$, the Moore-Penrose generalized inverse $A^\dagger$ is defined for every data $b \in \mathcal{Y} = \mathbb{R}^m$ and, being a linear map from $\mathbb{R}^m$ to $\ker(A)^\perp \subseteq \mathcal{X} = \mathbb{R}^N$, is continuous. Thus, according to Hadamard’s definition, the equation $Ax = b$ cannot be ill-posed. However, from a practical point of view, a theoretically well-posed problem can be very similar to an ill-posed one. To explain this, recall that a linear operator $A$ is bounded if and only if there exists a constant $C > 0$ such that $\|Ax\| \le C\|x\|$ for every $x \in \mathcal{X}$: if the constant $C$ is too big, then this estimate is virtually useless and small perturbations in the data can generate huge errors in the results. This concern becomes even more serious if one also takes into account the rounding errors due to finite-precision arithmetic. Such finite dimensional problems occur very often in practice and they are characterized by very ill-conditioned matrices.
In his book [36], P.C. Hansen distinguishes between two classes of problems where the matrix of the system $Ax = b$ is highly ill-conditioned: rank deficient and discrete ill-posed problems.
In a rank deficient problem, the matrix $A$ has a cluster of small singular values and a well determined gap between its large and small singular values.
Discrete ill-posed problems arise from the discretization of ill-posed problems
such as integral equations of the first kind and their singular values typically
decay gradually to zero. Although of course we shall be more interested
in discrete ill-posed problems, we should keep in mind that regularization
methods can also be applied with success on rank deficient problems and
therefore should be considered also from this point of view.
As we have seen in Example 1.10.1, the process of discretization of an ill-posed problem is indeed a regularization itself, since it can be considered as
a projection method. However, as a matter of fact, usually the regularizing
process of discretization is not enough to obtain a good approximation of the
exact solution and it is necessary to apply other regularization methods.
Here, we will give a very brief survey about the discretization of integral
equations of the first kind. More details can be found for example in [61]
(Chapter 3), in [3] and [14].
There are essentially two classes of methods for discretizing integral equations such as (1.11): quadrature methods and Galerkin methods.
In a quadrature method, one chooses $m$ samples $f(s_i)$, $i = 1, ..., m$, of the function $f(s)$ and uses a quadrature rule with abscissas $t_1, t_2, ..., t_N$ and weights $\omega_1, \omega_2, ..., \omega_N$ to calculate the integrals
$$\int_{s_1}^{s_2} \kappa(s_i, t)\phi(t)\,dt \approx \sum_{j=1}^{N} \omega_j \kappa(s_i, t_j)\phi(t_j), \qquad i = 1, ..., m.$$
The result is a linear system of the type $Ax = b$, where the components of the vector $b$ are the samples of $f$, the elements of the matrix $A \in M_{m,N}(\mathbb{R})$ are defined by $a_{i,j} = \omega_j \kappa(s_i, t_j)$ and the unknowns $x_j$ forming the vector $x$ correspond to the values of $\phi$ at the abscissas $t_j$.
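The quadrature discretization just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: the composite midpoint rule is used as quadrature rule, the kernel is the integration kernel $\kappa(s,t) = 1$ for $t \le s$ and $0$ otherwise on $[0,1]$ (so that $\phi \equiv 1$ gives $f(s) = s$), and the helper names `midpoint_rule` and `discretize` are hypothetical.

```python
def midpoint_rule(a, b, n):
    """Abscissas and weights of the composite midpoint rule on [a, b]."""
    step = (b - a) / n
    nodes = [a + (j + 0.5) * step for j in range(n)]
    weights = [step] * n
    return nodes, weights


def discretize(kernel, f, samples, nodes, weights):
    """Assemble a_ij = w_j * kernel(s_i, t_j) and b_i = f(s_i)."""
    A = [[w * kernel(s, t) for t, w in zip(nodes, weights)] for s in samples]
    b = [f(s) for s in samples]
    return A, b


# Assumed test problem: integration operator on [0, 1], exact solution phi = 1.
def kernel(s, t):
    return 1.0 if t <= s else 0.0


N = 8
nodes, weights = midpoint_rule(0.0, 1.0, N)
samples = [(i + 1) / N for i in range(N)]          # collocation points s_i
A, b = discretize(kernel, lambda s: s, samples, nodes, weights)

# Values of phi at the abscissas; A @ x_exact should reproduce b.
x_exact = [1.0] * N
residual = max(
    abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x_exact)) - b_i)
    for row, b_i in zip(A, b)
)
print("max |(A x)_i - b_i| =", residual)
```

For this particular kernel the midpoint rule integrates exactly, so the residual vanishes; for a smooth kernel one would instead observe the quadrature error of the chosen rule.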
In a Galerkin method, it is necessary to fix two finite dimensional subspaces $\mathcal{X}_N \subseteq \mathcal{X}$ and $\mathcal{Y}_m \subseteq \mathcal{Y}$, with $\dim(\mathcal{X}_N) = N$ and $\dim(\mathcal{Y}_m) = m$, and define two corresponding sets of orthonormal basis functions $\phi_j$, $j = 1, ..., N$, and $\psi_i$, $i = 1, ..., m$. Then the matrix and the right-hand side elements of the system $Ax = b$ are given by
$$a_{i,j} = \iint_{[s_1,s_2]^2} \kappa(s,t)\psi_i(s)\phi_j(t)\,ds\,dt \quad \text{and} \quad b_i = \int_{s_1}^{s_2} f(s)\psi_i(s)\,ds \tag{1.107}$$
and the unknown function $\phi$ depends on the solution of the system via the formula $\phi(t) = \sum_{j=1}^{N} x_j\phi_j(t)$.
If $\kappa$ is symmetric, $\mathcal{X} = \mathcal{Y}$, $m = N$, $\mathcal{X}_N = \mathcal{Y}_N$ and $\phi_i = \psi_i$ for every $i$, then the matrix $A$ is also symmetric and the Galerkin method is called the Rayleigh-Ritz method.
A special case of the Galerkin method is the least-squares collocation or moment discretization: it is defined for integral operators $K$ with continuous kernel and uses the delta functions $\delta(s - s_i)$ as the basis functions $\psi_i$. In [17] it is shown that least-squares collocation is a particular projection method of the type described in Section 1.12 in which the projection is made on the space $\mathcal{Y}$ and is therefore a regularization method itself.
For discrete ill-posed problems, we have already noticed that the Picard
Criterion is always satisfied. However, it is possible to state a Discrete Picard
Criterion as follows (cf. [32] and [36]).
Definition 1.15.1 (Discrete Picard condition). Fix a singular value decomposition $A = U\Lambda V^*$ of the matrix $A$, where the columns of $U$ and $V$ are the singular vectors of $A$. The unperturbed right-hand side $b$ in a discrete ill-posed problem satisfies the discrete Picard condition if the SVD coefficients $|\langle u_j, b\rangle|$ on average decay to zero faster than the singular values $\lambda_j$.
Unfortunately, the SVD coefficients may have a non-monotonic behavior,
thus it is difficult to give a precise definition.
For many discrete ill-posed problems arising from integral equations of the
first kind the Discrete Picard Criterion is satisfied for exact data. In general,
it is not satisfied when the data is perturbed by the noise.
We shall return to this topic later on and we will see how the plot of the SVD coefficients may help to understand the regularizing properties of some regularization methods.
1.16
Tikhonov regularization
The most famous regularization method was introduced by A.N. Tikhonov
in 1963 (cf. [90], [91]).
In the linear case, it fits the general framework of Section 1.13 and fulfills the assumptions of Theorem 1.13.1 with
$$g_\sigma(\lambda) := \frac{1}{\lambda + \sigma}. \tag{1.108}$$
The regularization parameter $\sigma$ stabilizes the computation of the approximate solutions
$$x_\sigma^\delta = (A^*A + \sigma)^{-1}A^*y^\delta, \tag{1.109}$$
which can therefore be defined by the following regularized version of the normal equation:
$$A^*Ax_\sigma^\delta + \sigma x_\sigma^\delta = A^*y^\delta. \tag{1.110}$$
Tikhonov regularization can be studied from a variational point of view, which is the key to extending it to nonlinear problems:
Theorem 1.16.1. Let $x_\sigma^\delta$ be as in (1.109). Then $x_\sigma^\delta$ is the unique minimizer of the Tikhonov functional
$$x \mapsto \|Ax - y^\delta\|^2 + \sigma\|x\|^2. \tag{1.111}$$
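To make the role of $\sigma$ concrete, the following Python sketch solves the regularized normal equation (1.110) in the simplest possible setting. It assumes, purely for illustration, that $A$ is a diagonal matrix with rapidly decaying entries, so that (1.109) can be evaluated componentwise; the data, the noise vector and the helper names are hypothetical.

```python
import math


def tikhonov_diag(d, y, sigma):
    """Solve (A*A + sigma I) x = A* y for the diagonal operator A = diag(d)."""
    return [d_j * y_j / (d_j * d_j + sigma) for d_j, y_j in zip(d, y)]


def tikhonov_functional(d, x, y, sigma):
    """Evaluate ||A x - y||^2 + sigma ||x||^2 for A = diag(d)."""
    misfit = sum((d_j * x_j - y_j) ** 2 for d_j, x_j, y_j in zip(d, x, y))
    return misfit + sigma * sum(x_j * x_j for x_j in x)


def error_norm(x, x_ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_ref)))


d = [1.0, 1e-1, 1e-4]            # assumed singular values decaying towards zero
x_true = [1.0, 1.0, 1.0]
noise = [1e-3, -1e-3, 1e-3]      # assumed fixed data perturbation
y_delta = [d_j * x_j + n_j for d_j, x_j, n_j in zip(d, x_true, noise)]

x_naive = [y_j / d_j for y_j, d_j in zip(y_delta, d)]   # unregularized inverse
x_reg = tikhonov_diag(d, y_delta, sigma=1e-3)

err_naive = error_norm(x_naive, x_true)
err_reg = error_norm(x_reg, x_true)
print("error without regularization:", err_naive)
print("error with sigma = 1e-3:     ", err_reg)
```

In this toy problem the unregularized inverse amplifies the noise in the last component by a factor $10^4$, whereas the Tikhonov solution damps that component at the price of an approximation error.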
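As an illustrative example, we calculate the functions defined in the previous sections in the case of Tikhonov regularization.
• Remembering that $g_\sigma(\lambda) = \frac{1}{\lambda+\sigma}$, we obtain immediately that
$$G_\sigma = \frac{1}{\sigma} \quad \text{and} \quad r_\sigma(\lambda) = 1 - \lambda g_\sigma(\lambda) = \frac{\sigma}{\sigma+\lambda}. \tag{1.112}$$
• The computation of $\omega_\mu(\sigma)$ requires an estimate for the function
$$h_\mu(\lambda) := \lambda^\mu \frac{\sigma}{\lambda+\sigma}. \tag{1.113}$$
Calculating the derivative of $h_\mu$ leads to
$$h_\mu'(\lambda) = r_\sigma(\lambda)\lambda^{\mu-1}\left(\mu - \frac{\lambda}{\lambda+\sigma}\right), \tag{1.114}$$
thus if $\mu < 1$, $h_\mu$ has a maximum for $\lambda = \frac{\sigma\mu}{1-\mu}$ and we obtain
$$h_\mu(\lambda) \le \mu^\mu(1-\mu)^{1-\mu}\sigma^\mu, \tag{1.115}$$
whereas $h_\mu'(\lambda) > 0$ for $\mu \ge 1$, so $h_\mu$ assumes its maximum for $\lambda = \|A\|^2$.
Putting this together, we conclude that for $\omega_\mu$ we can take
$$\omega_\mu(\sigma) = \begin{cases} \sigma^\mu, & \mu \le 1 \\ c\sigma, & \mu > 1 \end{cases} \tag{1.116}$$
with $c = \|A\|^{2\mu-2}$.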
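The estimates (1.115) and (1.116) can be checked numerically. The sketch below is only a sanity check under assumptions of our own: $\|A\|^2 = 1$, a uniform grid on $(0, \|A\|^2]$ with a hypothetical resolution `n`, and a few sample values of $\mu$ and $\sigma$.

```python
def h(mu, lam, sigma):
    """h_mu(lambda) = lambda^mu * sigma / (lambda + sigma), cf. (1.113)."""
    return lam ** mu * sigma / (lam + sigma)


def sup_h(mu, sigma, norm_A_sq=1.0, n=20000):
    """Approximate the supremum of h_mu over (0, ||A||^2] on a uniform grid."""
    return max(h(mu, norm_A_sq * (i + 1) / n, sigma) for i in range(n))


def omega(mu, sigma, norm_A_sq=1.0):
    """The bound omega_mu(sigma) of (1.116), with c = ||A||^(2 mu - 2)."""
    if mu <= 1:
        return sigma ** mu
    return norm_A_sq ** (mu - 1) * sigma


table = [
    (mu, sigma, sup_h(mu, sigma), omega(mu, sigma))
    for mu in (0.25, 0.5, 1.0, 2.0)
    for sigma in (1e-1, 1e-2)
]
for mu, sigma, s, w in table:
    print(f"mu={mu:4.2f}  sigma={sigma:.0e}  sup h={s:.4e}  omega={w:.4e}")
```

For $\mu > 1$ the supremum decays only like $\sigma$, however large $\mu$ is, which is the numerical counterpart of the saturation of the method.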
The results of Section 1.13 can thus be applied to Tikhonov regularization: in particular, according to Corollary 1.13.1, as long as $\mu \le 1$ Tikhonov regularization with the parameter choice rule (1.97) is of optimal order in $X_{\mu,\rho}$ and the best possible convergence rate, obtained when $\mu = 1$ and $\alpha \sim (\delta/\rho)^{\frac23}$, is given by
$$\|x_\alpha^\delta - x^\dagger\| = O(\delta^{\frac23}) \tag{1.117}$$
for $x^\dagger \in X_{1,\rho}$.
Due to the particular form of the function $\omega_\mu$ found in (1.116), the Tikhonov method saturates, since (1.103) holds only for $\mu \le 1$. Thus Tikhonov regularization has finite qualification $\mu_0 = 1$. A result similar to Theorem 1.14.2 can be proved (see [17] or [22]).
Theorem 1.16.2. Let $K$ be compact with infinite-dimensional range and let $x_\sigma^\delta$ be defined by (1.109) with $K$ instead of $A$. Let $\alpha = \alpha(\delta, y^\delta)$ be any parameter choice rule. Then
$$\sup\{\|x_\alpha^\delta - x^\dagger\| : \|Q(y - y^\delta)\| \le \delta\} = o(\delta^{\frac23}) \tag{1.118}$$
implies $x^\dagger = 0$.
The Tikhonov regularization method was also studied on convex subsets of the Hilbert space $\mathcal{X}$. This can be of particular interest in certain applications such as image deblurring, where we can take $\mathcal{X} = L^2([0,1]^2)$ and the solution lies in the convex set $C := \{x \in \mathcal{X} \mid x \ge 0\}$. A quick treatment of the topic can be found in [17] and for details we suggest [71].
The Tikhonov method is now well understood, but has some drawbacks:
1. It is quite expensive from a computational point of view, since it requires an inversion of the operator $A^*A + \sigma$.
2. For every choice of the regularization parameter the operator to be inverted in the formula (1.109) changes, thus if the parameter is chosen badly the computations must be restarted.
3. As a matter of fact, Tikhonov regularization calculates a smooth solution.
4. It has finite qualification.
The third drawback implies that Tikhonov regularization may not work very well if one looks for irregular solutions. For this reason, in certain problems such as image processing nowadays many researchers prefer to rely on other methods based on the minimization of a different functional. More precisely, in the objective function (1.111), $\|x\|^2$ is replaced by a term that takes into account the nature of the sought solution $x^\dagger$. For example, one can choose a version of the total variation of $x$, which often provides very good practical results in imaging (cf. e.g. [96]).
The fourth problem can be overcome by using a variant of the algorithm known as iterative Tikhonov regularization (cf. e.g. [17]).
Points 1 and 2 are the main reasons why we prefer iterative methods to Tikhonov regularization. Nevertheless, the Tikhonov method is still very popular. In fact, it can be combined with different methods, works well in certain applications (e.g. when the sought solution is smooth) and remains one of the most powerful weapons against ill-posed problems in the nonlinear case.
1.17
The Landweber iteration
A different way of regularizing an ill-posed problem is the approach of the iterative methods: consider the direct operator equation (1.20), i.e. the problem of calculating $y$ from $x$ and $A$. If the computation of $y$ is easy and reasonably cheap, the iterative methods form a sequence of iterates $\{x_k\}$, based on the direct solution of (1.20), such that $x_k$ converges to $x^\dagger$.
It turns out that for many iterative methods the iteration index k plays the
role of the regularization parameter σ and that these methods can be studied
from the point of view of the regularization theory developed in the previous
sections.
When dealing with perturbed data $y^\delta$, in order to use regularization techniques, one has to write the iterates $x_k^\delta$ in terms of an operator $R_k$ (and of course of $y^\delta$). Now, $R_k$ may or may not depend on $y^\delta$ itself. If it does not, then the resulting method fits completely the general theory of linear regularization: this is the case of the Landweber iteration, the subject of the present section. Otherwise, some of the basic assumptions of the regularization theory may fail to be true (e.g. the operator mapping $y^\delta$ to $x_k^\delta$ may not be continuous): conjugate gradient type methods, which will be discussed in detail in the next chapter, fit into this category.
In spite of the difficulties arising from this problem, in practice the conjugate
gradient type methods are known as a very powerful weapon against ill-posed
problems, whereas the Landweber iteration has the drawback of being a very
slow method and is mainly used in nonlinear problems. For this reason, in
this section we give only an outline of the main properties of the Landweber
iteration and skip most of the proofs.
The idea of the Landweber method is to approximate $A^\dagger y$ with a sequence of iterates $\{x_k\}_{k\in\mathbb{N}}$, transforming the normal equation (1.32) into equivalent fixed point equations like
$$x = x + A^*(y - Ax) = (I - A^*A)x + A^*y. \tag{1.119}$$
In practice, one is given the perturbed data $y^\delta$ instead of $y$ and defines the Landweber iterates as follows:
Definition 1.17.1 (Landweber Iteration). Fix an initial guess $x_0^\delta \in \mathcal{X}$ and for $k = 1, 2, 3, ...$ compute the Landweber approximations recursively via the formula
$$x_k^\delta = x_{k-1}^\delta + A^*(y^\delta - Ax_{k-1}^\delta). \tag{1.120}$$
We observe that in the definition of the Landweber iterations we can suppose without loss of generality $\|A\| \le 1$. If this were not the case, then we would introduce a relaxation parameter $\omega$ with $0 < \omega \le \|A\|^{-2}$ in front of $A^*$, i.e. we would iterate
$$x_k^\delta = x_{k-1}^\delta + \omega A^*(y^\delta - Ax_{k-1}^\delta), \quad k \in \mathbb{N}. \tag{1.121}$$
In other words, we would multiply the equation $Ax = y^\delta$ by $\omega^{\frac12}$ and iterate with (1.120).
Moreover, if $\{z_k^\delta\}$ is the sequence of the Landweber iterates with initial guess $z_0^\delta = 0$ and data $y^\delta - Ax_0^\delta$, then $x_k^\delta = x_0^\delta + z_k^\delta$, so that supposing $x_0^\delta = 0$ is no restriction either.
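The relaxed iteration (1.121) translates almost literally into code. The following Python sketch is a minimal illustration for a small dense matrix stored as a list of rows; the choice $\omega = \|A\|_F^{-2}$ (which satisfies $\omega \le \|A\|^{-2}$ because the Frobenius norm dominates the operator norm), the test matrix and the helper names are assumptions made here and do not come from the references.

```python
import math


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A* y (the transpose applied to y)."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def landweber(A, y, omega, iterations, x0=None):
    """Relaxed Landweber iteration x_k = x_{k-1} + omega A*(y - A x_{k-1})."""
    x = list(x0) if x0 is not None else [0.0] * len(A[0])
    for _ in range(iterations):
        residual = [y_i - ax_i for y_i, ax_i in zip(y, matvec(A, x))]
        gradient = rmatvec(A, residual)
        x = [x_j + omega * g_j for x_j, g_j in zip(x, gradient)]
    return x


A = [[1.0, 0.5], [0.5, 1.0]]                  # assumed test matrix
x_true = [1.0, -1.0]
y = matvec(A, x_true)                          # exact data

frobenius_sq = sum(a * a for row in A for a in row)
omega = 1.0 / frobenius_sq                     # guarantees omega <= ||A||^(-2)

x_300 = landweber(A, y, omega, iterations=300)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_300, x_true)))
print("error after 300 iterations:", error)
```

Even for this well-conditioned $2 \times 2$ example the error is reduced only by the factor $0.9$ per step, which already hints at the slowness of the method discussed below.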
Since we have assumed $\|A\|^2 \le 1 < 2$, the operator $I - A^*A$ is nonexpansive and one may apply the method of successive approximations. It is important to note that in the ill-posed case $I - A^*A$ is not a contraction, because the spectrum of $A^*A$ clusters at the origin. For example, if $A$ is compact, then there exists a sequence $\{\lambda_n\}$ of eigenvalues of $A^*A$ such that $|\lambda_n| \to 0$ as $n \to +\infty$ and for a corresponding sequence of eigenvectors $\{v_n\}$ one has
$$\frac{\|(I - A^*A)v_n\|}{\|v_n\|} = \frac{\|(1 - \lambda_n)v_n\|}{\|v_n\|} = |1 - \lambda_n| \longrightarrow 1 \quad \text{as } n \to +\infty,$$
i.e. $\|I - A^*A\| \ge 1$.
Despite this, already in 1956 in his work [63], Landweber was able to prove
the following strong convergence result in the case of compact operators (our
proof, taken from [17] makes use of the regularization theory and is valid in
the general case of linear and bounded operators).
Theorem 1.17.1. If $y \in D(A^\dagger)$, then the Landweber approximations $x_k$ corresponding to the exact data $y$ converge to $A^\dagger y$ as $k \to +\infty$. If $y \notin D(A^\dagger)$, then $\|x_k\| \to +\infty$ as $k \to +\infty$.
Proof. By induction, the iterates $x_k$ may be expressed in the form
$$x_k = \sum_{j=0}^{k-1} (I - A^*A)^j A^*y. \tag{1.122}$$
Suppose now $y \in D(A^\dagger)$. Then since $A^*y = A^*Ax^\dagger$, we have
$$x^\dagger - x_k = x^\dagger - \sum_{j=0}^{k-1}(I - A^*A)^jA^*Ax^\dagger = x^\dagger - A^*A\sum_{j=0}^{k-1}(I - A^*A)^jx^\dagger = (I - A^*A)^kx^\dagger.$$
Thus we have found the functions $g$ and $r$ of Section 1.13:
$$g_k(\lambda) = \sum_{j=0}^{k-1}(1 - \lambda)^j, \qquad r_k(\lambda) = (1 - \lambda)^k. \tag{1.123}$$
Since $\|A\| \le 1$ we consider $\lambda \in (0, 1]$: in this interval $\lambda g_k(\lambda) = 1 - r_k(\lambda)$ is uniformly bounded and $g_k(\lambda)$ converges to $\frac{1}{\lambda}$ as $k \to +\infty$ because $r_k(\lambda)$ converges to 0. We can therefore apply Theorem 1.13.1 to prove the assertion, with $k^{-1}$ playing the role of $\sigma$.
The theorem above states that the approximation error of the Landweber iterates converges to zero. Next we examine the behavior of the propagated data error: on the one hand, according to the same theorem, if $y^\delta \notin D(A^\dagger)$ the iterates $x_k^\delta$ must diverge; on the other hand, the operator $R_k$ defined by
$$R_ky^\delta := x_k^\delta \tag{1.124}$$
is equal to $\sum_{j=0}^{k-1}(I - A^*A)^jA^*$. Therefore, for fixed $k$, $x_k^\delta$ depends continuously on the data so that the propagated error cannot be arbitrarily large.
This leads to the following result.
Proposition 1.17.1. Let $y$, $y^\delta$ be a pair of right-hand side data with $\|y - y^\delta\| \le \delta$ and let $\{x_k\}$ and $\{x_k^\delta\}$ be the corresponding two iteration sequences. Then we have
$$\|x_k - x_k^\delta\| \le \sqrt{k}\,\delta, \quad k \ge 0. \tag{1.125}$$
Remark 1.17.1. According to the previous results, the total error has two components, an approximation error converging (slowly) to 0 and the propagated data error of the order at most $\sqrt{k}\,\delta$. For small values of $k$ the data error is negligible and the iterates seem to converge to the exact solution $A^\dagger y$, but when $\sqrt{k}\,\delta$ reaches the order of magnitude of the approximation error, the data error is no longer hidden in $x_k^\delta$ and the total error begins to increase and eventually diverges.
The phenomenon described in Remark 1.17.1 is called semi-convergence
and is very typical of iterative methods for solving inverse problems. Thus the
regularizing properties of iterative methods for ill-posed problems ultimately
depend on a reliable stopping rule for detecting the transition from convergence to divergence: the iteration index k plays the part of the regularization
parameter σ and the stopping rule is the counterpart of the parameter choice
rule for continuous regularization methods.
As an a-posteriori stopping rule, a discrete version of the Discrepancy Principle can be considered:
Definition 1.17.2. Fix $\tau > 1$. For $k = 0, 1, 2, ...$ let $x_k^\delta$ be the $k$-th iterate of an iterative method for solving $Ax = y$ with perturbed data $y^\delta$ such that $\|y - y^\delta\| \le \delta$, $\delta > 0$. The stopping index $k_D = k_D(\delta, y^\delta)$ corresponding to the Discrepancy Principle is the smallest $k$ such that
$$\|y^\delta - Ax_k^\delta\| \le \tau\delta. \tag{1.126}$$
Of course, one should prove that the stopping index $k_D$ is well defined, i.e. that there is a finite index such that the residual $\|y^\delta - Ax_k^\delta\|$ is smaller than the tolerance $\tau\delta$.
In the case of the Landweber iteration, we observe that the residual can be written in the form
$$y^\delta - Ax_k^\delta = y^\delta - Ax_{k-1}^\delta - AA^*(y^\delta - Ax_{k-1}^\delta) = (I - AA^*)(y^\delta - Ax_{k-1}^\delta),$$
hence the non-expansivity of $I - AA^*$ implies that the residual norm is monotonically non-increasing. However, this is not enough to show that the discrepancy principle is well defined. For this, a more precise estimate of the residual norm is needed. This estimate can be found (cf. [17], Proposition 6.4) and leads to the following result.
Proposition 1.17.2. The Discrepancy Principle defines a finite stopping index $k_D(\delta, y^\delta)$ with $k_D(\delta, y^\delta) = O(\delta^{-2})$.
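The interplay between the Landweber iteration and the Discrepancy Principle can be illustrated by the following Python sketch. It is written under simplifying assumptions of our own: the operator is a diagonal matrix $A = \mathrm{diag}(d)$ with $\|A\| \le 1$, so that (1.120) can be applied componentwise, the perturbation is a fixed deterministic vector of norm $\delta$, and $\tau = 2$; the function names are hypothetical.

```python
import math


def landweber_discrepancy(d, y_delta, delta, tau=2.0, max_iter=100000):
    """Landweber iteration for A = diag(d), stopped at the first index k
    with ||y_delta - A x_k|| <= tau * delta (or after max_iter steps)."""
    x = [0.0] * len(d)
    for k in range(max_iter):
        residual = [y_j - d_j * x_j for d_j, y_j, x_j in zip(d, y_delta, x)]
        if math.sqrt(sum(r * r for r in residual)) <= tau * delta:
            return x, k
        x = [x_j + d_j * r_j for x_j, d_j, r_j in zip(x, d, residual)]
    return x, max_iter


def perturbed(y, delta):
    """Add an assumed deterministic perturbation of Euclidean norm delta."""
    scale = delta / math.sqrt(len(y))
    return [y_j + scale * (-1) ** j for j, y_j in enumerate(y)]


n = 20
d = [1.0 / (j + 1) for j in range(n)]          # assumed decaying spectrum
x_true = [1.0 / (j + 1) for j in range(n)]
y = [d_j * x_j for d_j, x_j in zip(d, x_true)]

results = {}
for delta in (1e-2, 1e-3):
    x_stop, k_stop = landweber_discrepancy(d, perturbed(y, delta), delta)
    error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_stop, x_true)))
    results[delta] = k_stop
    print(f"delta={delta:.0e}  k_D={k_stop}  error={error:.3e}")
```

The stopping index grows as $\delta$ decreases, in agreement with the fact that the bound $O(\delta^{-2})$ deteriorates for small noise levels.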
The regularization theory can be used to prove the order optimality of the Landweber iteration with the Discrepancy Principle as a stopping rule:
Theorem 1.17.2. For fixed $\tau > 1$, the Landweber iteration with the discrepancy principle is convergent for every $y \in R(A)$ and of optimal order in $X_\mu$, for every $\mu > 0$.
Moreover, if $A^\dagger y \in X_\mu$, then $k_D(\delta, y^\delta) = O(\delta^{-\frac{2}{2\mu+1}})$ and the Landweber method has qualification $\mu_0 = +\infty$.
As mentioned earlier, the main problem of the Landweber iteration is that in practice it requires too many iterations until the stopping criterion is met. Another stopping rule has been proposed in [15], but it requires a similar number of iterations. In fact, it can be shown that the exponent $\frac{2}{2\mu+1}$ cannot be improved in general. However, it is possible to reduce it to $\frac{1}{2\mu+1}$ if more sophisticated methods are used. Such accelerated Landweber methods are the so-called semi-iterative methods (with the $\nu$-methods as significant examples): they will not be treated here, since we will focus our attention in greater detail on the conjugate gradient type methods, which are considered quicker (or at least not slower) and more flexible than the accelerated Landweber methods. For a complete coverage of the semi-iterative methods, see [17].
Chapter 2
Conjugate gradient type methods
This chapter is entirely dedicated to the conjugate gradient type methods.
These methods are mostly known for being fast and robust solvers of large
systems of linear equations: for example, the classical Conjugate Gradient
method (CG), introduced for the first time by Hestenes and Stiefel in 1952
(see [45]), finds the exact solution of a linear system with a positive definite
N ×N matrix in at most N iterative steps, cf. Theorem 2.1.1 below. For this
reason, the importance of these methods goes far beyond the regularization
of ill-posed problems, although here they will be studied mainly from this
particular point of view.
One can approach the conjugate gradient type methods in many different
ways: it is possible to see them as optimization methods or as projection
methods. Alternatively, one can study them from the point of view of the
orthogonal polynomials. In each case, the Krylov spaces are fundamental in
the definition of the conjugate gradient type methods, so that they are often
regarded as Krylov methods.
Definition 2.0.3. Let $V$ be a vector space and let $A$ be a linear map from $V$ to itself. For a given vector $x_0 \in V$ and for $k \in \mathbb{N}$, the $k$-th Krylov space (based on $x_0$) is the linear subspace of $V$ defined by
$$\mathcal{K}_{k-1}(A; x_0) := \operatorname{span}\{x_0, Ax_0, A^2x_0, ..., A^{k-1}x_0\}. \tag{2.1}$$
A Krylov method selects the $k$-th iterative approximate solution $x_k$ of $x^\dagger$ as an element of a certain Krylov space depending on $A$ and $x_0$ satisfying certain conditions.
In particular, a conjugate gradient type method chooses the minimizer of a particular function in the shifted space $x_0 + \mathcal{K}_{k-1}(A; y - Ax_0)$ with respect to a particular measure.
We will introduce the subject in a finite dimensional setting with an optimization approach, but in order to understand the regularizing properties
of the algorithms in the general framework of Chapter 1 the main analysis
will be developed in Hilbert spaces using orthogonal polynomials. The main
reference for this chapter is the book of M. Hanke [27]. For the finite dimensional introduction, we will follow [59].
2.1
Finite dimensional introduction
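For $N \in \mathbb{N}$ we denote by
$$\langle\cdot,\cdot\rangle : \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{R} \tag{2.2}$$
the standard scalar product on $\mathbb{R}^N$ inducing the euclidean norm $\|\cdot\|$.
For a matrix $A \in M_{m,N}(\mathbb{R})$, $m \in \mathbb{N}$, $\|A\|$ denotes the norm of $A$ as a linear operator from $\mathbb{R}^N$ to $\mathbb{R}^m$.
For notational convenience, here and below a vector $x \in \mathbb{R}^N$ will be regarded as a column vector $x \in M_{N,1}(\mathbb{R})$, thus $x^*$ will be the row vector obtained by transposing $x$.
We consider the linear system
$$Ax = b, \tag{2.3}$$
with $A \in GL_N(\mathbb{R})$ symmetric and positive definite, $b \in \mathbb{R}^N$, $N \gg 1$.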
Definition 2.1.1. The conjugate gradient method for solving (2.3) generates a sequence $\{x_k\}_{k\in\mathbb{N}}$ in $\mathbb{R}^N$ such that for each $k$ the $k$-th iterate $x_k$ minimizes
$$\phi(x) := \frac12 x^*Ax - x^*b \tag{2.4}$$
on $x_0 + \mathcal{K}_{k-1}(A; r_0)$, with $r_0 := b - Ax_0$.
Of course, when the minimization is made on the whole space, then the
minimizer is the exact solution x† .
Due to the assumptions made on the matrix $A$, there are an orthogonal matrix $U \in O_N(\mathbb{R})$ and a diagonal matrix $\Lambda = \operatorname{diag}\{\lambda_1, ..., \lambda_N\}$, with $\lambda_i > 0$ for every $i = 1, ..., N$, such that
$$A = U\Lambda U^* \tag{2.5}$$
and (2.5) can be used to define on $\mathbb{R}^N$ the so-called $A$-norm
$$\|x\|_A := \sqrt{x^*Ax}. \tag{2.6}$$
It turns out that the minimization property of xk can be read in terms of
this norm:
Proposition 2.1.1. If $\Omega \subseteq \mathbb{R}^N$ and $x_k$ minimizes the function $\phi$ on $\Omega$, then it also minimizes $\|x^\dagger - x\|_A = \|r\|_{A^{-1}}$ on $\Omega$, with $r = b - Ax$.
Proof. Since $Ax^\dagger = b$ and $A$ is symmetric, we have
$$\begin{aligned}\|x^\dagger - x\|_A^2 &= (x^\dagger - x)^*A(x^\dagger - x) = x^*Ax - x^*Ax^\dagger - (x^\dagger)^*Ax + (x^\dagger)^*Ax^\dagger \\ &= x^*Ax - 2x^*b + (x^\dagger)^*Ax^\dagger = 2\phi(x) + (x^\dagger)^*Ax^\dagger.\end{aligned} \tag{2.7}$$
Thus the minimization of $\phi$ is equivalent to the minimization of $\|x^\dagger - x\|_A^2$ (and consequently of $\|x^\dagger - x\|_A$).
Moreover, using again the symmetry of $A$,
$$\|x - x^\dagger\|_A^2 = (A(x - x^\dagger))^*A^{-1}(A(x - x^\dagger)) = (Ax - b)^*A^{-1}(Ax - b) = \|Ax - b\|_{A^{-1}}^2 \tag{2.8}$$
and the proof is complete.
Remark 2.1.1. Proposition 2.1.1 has the following consequences:
1. The $k$-th iterate of CG minimizes the approximation error
$$\varepsilon_k := x_k - x^\dagger$$
in the $A$-norm in the shifted Krylov space $x_0 + \mathcal{K}_{k-1}(A; r_0)$.
Since a generic element $\breve{x}$ of $x_0 + \mathcal{K}_{k-1}(A; r_0)$ can be written in the form
$$\breve{x} = x_0 + \sum_{j=0}^{k-1}\gamma_jA^jr_0 = x_0 + \sum_{j=0}^{k-1}\gamma_jA^{j+1}(x^\dagger - x_0)$$
for some coefficients $\gamma_0, ..., \gamma_{k-1}$, if we define the polynomials
$$q_{k-1}(\lambda) := \sum_{j=0}^{k-1}\gamma_j\lambda^j, \qquad p_k(\lambda) := 1 - \lambda q_{k-1}(\lambda), \tag{2.9}$$
we obtain that
$$x^\dagger - \breve{x} = x^\dagger - x_0 - q_{k-1}(A)r_0 = x^\dagger - x_0 - q_{k-1}(A)A(x^\dagger - x_0) = p_k(A)(x^\dagger - x_0). \tag{2.10}$$
Hence the minimization property of $x_k$ can also be written in the form
$$\|x^\dagger - x_k\|_A = \min_{p\in\Pi_k^0}\|p(A)(x^\dagger - x_0)\|_A, \tag{2.11}$$
where $\Pi_k^0$ is the set of all polynomials $p$ of degree at most $k$ such that $p(0) = 1$.
2. For every $p \in \Pi_k := \{\text{polynomials of degree at most } k\}$ one has
$$p(A) = Up(\Lambda)U^*.$$
Moreover, the square root of $A$ is well defined by $A^{\frac12} := U\Lambda^{\frac12}U^*$, with $\Lambda^{\frac12} := \operatorname{diag}\{\lambda_1^{\frac12}, ..., \lambda_N^{\frac12}\}$ and immediately there follows
$$\|x\|_A^2 = \|A^{\frac12}x\|^2, \quad x \in \mathbb{R}^N. \tag{2.12}$$
Consequently, since the norm of a symmetric, positive definite matrix is equal to its largest eigenvalue, we easily get:
$$\|p(A)x\|_A = \|p(A)A^{\frac12}x\| \le \|p(A)\|\|x\|_A, \quad \forall x \in \mathbb{R}^N,\ \forall p \in \Pi_k, \tag{2.13}$$
$$\|x^\dagger - x_k\|_A \le \|x^\dagger - x_0\|_A \min_{p\in\Pi_k^0}\max_{\lambda\in\operatorname{spec}(A)}|p(\lambda)|, \tag{2.14}$$
where $\operatorname{spec}(A)$ denotes the spectrum of the matrix $A$.
The last inequality can be reinterpreted in terms of the relative error:
Corollary 2.1.1. Let $A$ be symmetric and positive definite and let $\{x_k\}_{k\in\mathbb{N}}$ be the sequence of iterates of the CG method. If $k \ge 0$ is fixed and $p$ is any polynomial in $\Pi_k^0$, then the relative error is bounded as follows:
$$\frac{\|x^\dagger - x_k\|_A}{\|x^\dagger - x_0\|_A} \le \max_{\lambda\in\operatorname{spec}(A)}|p(\lambda)|. \tag{2.15}$$
This leads to the most important result about the Conjugate Gradient
method in RN .
Theorem 2.1.1. If $A \in GL_N(\mathbb{R})$ is a symmetric and positive definite matrix and $b$ is any vector in $\mathbb{R}^N$, then CG will find the solution $x^\dagger$ of (2.3) in at most $N$ iterative steps.
Proof. It is enough to define the polynomial
$$\bar{p}(\lambda) = \prod_{j=1}^{N}\frac{\lambda_j - \lambda}{\lambda_j},$$
observe that $\bar{p}$ belongs to $\Pi_N^0$ and use Corollary 2.1.1: since $\bar{p}$ vanishes on the spectrum of $A$, $\|x^\dagger - x_N\|_A$ must be equal to 0.
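The finite termination property can be observed on a small example. The Python sketch below uses the classical Hestenes-Stiefel recursion, which is assumed here without derivation as one standard way of computing the minimizers of Definition 2.1.1; the $3 \times 3$ test matrix, the right-hand side and the helper names are illustrative choices of our own.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def dot(u, v):
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def conjugate_gradient(A, b, x0, steps):
    """Perform `steps` iterations of the classical CG recursion for A x = b,
    with A symmetric and positive definite."""
    x = list(x0)
    r = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    p = list(r)
    rr = dot(r, r)
    for _ in range(steps):
        if rr == 0.0:                      # exact solution already reached
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [x_j + alpha * p_j for x_j, p_j in zip(x, p)]
        r = [r_j - alpha * ap_j for r_j, ap_j in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [r_j + beta * p_j for r_j, p_j in zip(r, p)]
        rr = rr_new
    return x


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]                      # assumed SPD test matrix
x_true = [1.0, 2.0, 3.0]
b = matvec(A, x_true)

errors = []
for k in range(4):
    x_k = conjugate_gradient(A, b, [0.0, 0.0, 0.0], steps=k)
    errors.append(max(abs(a - c) for a, c in zip(x_k, x_true)))
print("max-norm errors for k = 0, 1, 2, 3:", errors)
```

In exact arithmetic the error after $N = 3$ steps would be zero; in floating point arithmetic it is of the order of the rounding errors.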
This result is of course very pleasant, but not so good as it seems: first,
if N is very large, N iterations can be too many. Then, we should remember
that we usually have to deal with perturbed data and if A is ill-conditioned
finding the exact solution of the perturbed system can lead to very bad
results. The first problem will be considered immediately, whereas for the
second one see the next sections.
A-priori information about the data b and the spectrum of A can be very
useful to improve the result stated in Theorem 2.1.1: we consider two different
situations in which the same improved result can be shown.
Proposition 2.1.2. Let $u_j \in \mathbb{R}^N$, $j = 1, ..., N$, be the columns of a matrix $U$ for which (2.5) holds. Suppose that $b$ is a linear combination of $k$ of these $N$ eigenvectors of $A$:
$$b = \sum_{l=1}^{k}\gamma_lu_{i_l}, \quad \gamma_l \in \mathbb{R}, \quad 1 \le i_1 < ... < i_k \le N. \tag{2.16}$$
Then, if we set $x_0 := 0$, CG will converge in at most $k$ iteration steps.
Proof. For every $l = 1, ..., k$, let $\lambda_{i_l}$ be the eigenvalue corresponding to the eigenvector $u_{i_l}$. Then obviously
$$x^\dagger = \sum_{l=1}^{k}\frac{\gamma_l}{\lambda_{i_l}}u_{i_l}$$
and we proceed as in the proof of Theorem 2.1.1 defining
$$\bar{p}(\lambda) = \prod_{l=1}^{k}\frac{\lambda_{i_l} - \lambda}{\lambda_{i_l}}.$$
Now $\bar{p}$ belongs to $\Pi_k^0$ and vanishes on $\lambda_{i_l}$ for every $l$, so
$$\bar{p}(A)x^\dagger = \sum_{l=1}^{k}\bar{p}(\lambda_{i_l})\frac{\gamma_l}{\lambda_{i_l}}u_{i_l} = 0$$
and we use the minimization property
$$\|x^\dagger - x_k\|_A \le \|\bar{p}(A)x^\dagger\|_A = 0$$
to conclude.
In a similar way it is possible to prove the following statement.
Proposition 2.1.3. Suppose that the spectrum of A consists of exactly k
distinct eigenvalues. Then CG will find the solution of (2.3) in at most k
iterations.
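One can also study the behavior of the relative error measured in the euclidean norm in terms of the condition number of the matrix $A$:
Proposition 2.1.4.
1. Let $\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_N$ be the eigenvalues of $A$. Then for every $x \in \mathbb{R}^N$ we have
$$\|x\|_A\lambda_N^{\frac12} \le \|Ax\| \le \|x\|_A\lambda_1^{\frac12}. \tag{2.17}$$
2. If $\kappa_2(A) := \|A\|\|A^{-1}\|$ is the condition number of $A$, then
$$\frac{\|b - Ax_k\|}{\|b\|} \le \sqrt{\kappa_2(A)}\,\frac{\|r_0\|}{\|b\|}\,\frac{\|x_k - x^\dagger\|_A}{\|x_0 - x^\dagger\|_A}. \tag{2.18}$$
Proof. Let $u_j \in \mathbb{R}^N$ ($j = 1, ..., N$) be the columns of a matrix $U$ as in the proof of Proposition 2.1.2. Then
$$Ax = \sum_{j=1}^{N}\lambda_j(u_j^*x)u_j,$$
so
$$\lambda_N\|x\|_A^2 = \lambda_N\|A^{\frac12}x\|^2 = \lambda_N\sum_{j=1}^{N}\lambda_j(u_j^*x)^2 \le \|Ax\|^2 \le \lambda_1\sum_{j=1}^{N}\lambda_j(u_j^*x)^2 = \lambda_1\|A^{\frac12}x\|^2 = \lambda_1\|x\|_A^2, \tag{2.19}$$
which proves the first part.
For the second statement, recalling that $\|A^{-1}\| = \lambda_N^{-1}$ and using the previous inequalities, we obtain
$$\frac{\|b - Ax_k\|}{\|r_0\|} = \frac{\|A(x^\dagger - x_k)\|}{\|A(x^\dagger - x_0)\|} \le \sqrt{\frac{\lambda_1}{\lambda_N}}\,\frac{\|x^\dagger - x_k\|_A}{\|x^\dagger - x_0\|_A} = \sqrt{\kappa_2(A)}\,\frac{\|x_k - x^\dagger\|_A}{\|x_0 - x^\dagger\|_A}.$$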
Finally, we mention a result of J.W. Daniel (cf. [8]) that provides a bound for the relative error, which is, in some sense, as sharp as possible:
$$\frac{\|x_k - x^\dagger\|_A}{\|x_0 - x^\dagger\|_A} \le 2\left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^k. \tag{2.20}$$
We conclude the section with a couple of examples that show clearly the
efficiency of this method.
Example 2.1.1. Suppose we know that the spectrum of the matrix $A$ is contained in the interval $I_1 := ]9, 11[$. Then, if we put $x_0 := 0$ and
$$\bar{p}_k(\lambda) := \frac{(10 - \lambda)^k}{10^k},$$
since $\bar{p}_k$ lies in $\Pi_k^0$ the minimization property (2.11) gives
$$\|x_k - x^\dagger\|_A \le \|x^\dagger\|_A \max_{9\le\lambda\le11}|\bar{p}_k(\lambda)| = \bar{p}_k(9) = 10^{-k}. \tag{2.21}$$
Thus after $k$ iteration steps, the relative error in the $A$-norm will be reduced by a factor $10^{-3}$ when $10^{-k} \le 10^{-3}$, i.e. when $k \ge 3$.
Observing that $\kappa_2(A) \le \frac{11}{9}$, the estimate (2.18) can be used to deduce that
$$\frac{\|Ax_k - b\|}{\|b\|} \le \frac{\sqrt{11}}{3}10^{-k}, \tag{2.22}$$
so the norm of the residual will be reduced by a factor $10^{-3}$ when $10^{-k} \le \frac{3}{\sqrt{11}}10^{-3}$, i.e. when $k \ge 4$. Moreover, since the function $\lambda \mapsto \frac{\sqrt{\lambda} - 1}{\sqrt{\lambda} + 1}$ is strictly increasing in $]0, +\infty[$, $\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}$ is bounded by $\frac{\sqrt{11} - 3}{\sqrt{11} + 3}$ and Daniel’s inequality provides an improved version of (2.21).
Even Daniel’s estimate can be very pessimistic if we have more precise information about the spectrum of $A$. For instance, if all the eigenvalues cluster in a small number of intervals, the condition number of $A$ can be huge, but CG can perform very well, as the following second example shows.
Example 2.1.2. Suppose that the spectrum of $A$ is contained in the intervals $I_1 := (1, 1.50)$ and $I_2 := (399, 400)$ and put $x_0 := 0$.
The best we can say about the condition number of $A$ is that $\kappa_2(A) \le 400$, which inserted in Daniel’s formula gives
$$\frac{\|x_k - x^\dagger\|_A}{\|x^\dagger\|_A} \le 2\left(\frac{19}{21}\right)^k \approx 2(0.91)^k, \tag{2.23}$$
predicting a slow convergence.
However, if we take
$$\bar{p}_{3k}(\lambda) := \frac{(1.25 - \lambda)^k(400 - \lambda)^{2k}}{(1.25)^k(400)^{2k}}$$
we easily see that
$$\max_{\lambda\in\operatorname{spec}(A)}|\bar{p}_{3k}(\lambda)| \le \left(\frac{0.25}{1.25}\right)^k = (0.2)^k, \tag{2.24}$$
providing a much sharper estimate. More precisely, in order to reduce the relative error in the $A$-norm by the factor $10^{-3}$ Daniel’s estimate predicts 81 iteration steps, since $2(0.91)^k < 10^{-3}$ when $k > -\frac{\log_{10}(2000)}{\log_{10}(0.91)} \approx 80.6$. Instead, according to the estimate based on $\bar{p}_{3k}$ the relative error will be reduced by the factor $10^{-3}$ after $k = 3i$ iterations when $(0.2)^i < 10^{-3}$, i.e. when $i > -\frac{3}{\log_{10}(0.2)} \approx 4.3$, hence it predicts only 15 iterations!
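The iteration counts of this example are easy to recompute. The short Python sketch below simply searches for the smallest exponent satisfying each bound; the helper name `steps_for_reduction` is hypothetical, and both the rounded rate $0.91$ and the unrounded rate $19/21$ are evaluated for comparison.

```python
import math


def steps_for_reduction(rate, factor, constant=1.0):
    """Smallest integer k >= 0 with constant * rate**k < factor."""
    k = 0
    while constant * rate ** k >= factor:
        k += 1
    return k


kappa = 400.0
daniel_rate = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)   # 19/21

k_daniel_rounded = steps_for_reduction(0.91, 1e-3, constant=2.0)    # 81
k_daniel_exact = steps_for_reduction(daniel_rate, 1e-3, constant=2.0)  # 76
i_cluster = steps_for_reduction(0.2, 1e-3)                          # 5
k_cluster = 3 * i_cluster                                           # 15

print("Daniel, rate 0.91 :", k_daniel_rounded)
print("Daniel, rate 19/21:", k_daniel_exact)
print("clustered spectrum:", k_cluster)
```

Whichever rounding is used for Daniel’s rate, the bound based on the clustered spectrum predicts roughly five times fewer iterations.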
In conclusion, in the finite dimensional case we have seen that the Conjugate Gradient method combines certain minimization properties in a very
efficient way and that a-priori information can be used to predict the strength
of its performance. Moreover, the polynomials qk and pk can be used to understand its behavior and can prove to be very useful in particular cases.
2.2 General definition in Hilbert spaces
In this section we define the conjugate gradient type methods in the usual
Hilbert space framework. As a general reference and for the skipped proofs,
we refer to [27].
Unless stated otherwise, here the operator A acting between the Hilbert spaces
X and Y will be self-adjoint and positive semi-definite with its spectrum
contained in [0, 1] (see footnote 1).
For $n \in \mathbb{N}_0 := \mathbb{N} \cup \{0\}$, fix an initial guess $x_0 \in X$ of the solution $A^\dagger y$ of $Ax = y$ and consider the bilinear form defined on the space of all polynomials $\Pi_\infty$ by
\[
[\phi, \psi]_n := \langle \phi(A)(y - Ax_0), A^n \psi(A)(y - Ax_0) \rangle
= \int_0^\infty \phi(\lambda)\psi(\lambda)\lambda^n \, d\|E_\lambda(y - Ax_0)\|^2, \tag{2.25}
\]
where $\{E_\lambda\}$ denotes the spectral family associated to $A$.
Then from the theory of orthogonal polynomials (see, e.g., [88] Chapter II) we know that there is a well defined sequence of orthogonal polynomials $\{p_k^{[n]}\}$ such that $p_k^{[n]} \in \Pi_k$ and
\[
[p_k^{[n]}, p_j^{[n]}]_n = 0, \quad k \ne j. \tag{2.26}
\]
Moreover, if we force these polynomials to belong to $\Pi_k^0$, the sequence is uniquely determined and satisfies a well known three-term recurrence formula, given by
\[
\begin{aligned}
p_0^{[n]} &= 1, \qquad p_1^{[n]} = 1 - \alpha_0^{[n]}\lambda, \\
p_{k+1}^{[n]} &= -\alpha_k^{[n]}\lambda p_k^{[n]} + p_k^{[n]} - \alpha_k^{[n]}\frac{\beta_k^{[n]}}{\alpha_{k-1}^{[n]}}\left(p_{k-1}^{[n]} - p_k^{[n]}\right), \quad k \ge 1,
\end{aligned} \tag{2.27}
\]
where the numbers $\alpha_k^{[n]} \ne 0$ and $\beta_k^{[n]}$, $k \ge 0$, can be computed explicitly (see below).
The $k$-th iterate of a conjugate gradient type method is given by
\[
x_k^{[n]} := x_0 + q_{k-1}^{[n]}(A)(y - Ax_0), \tag{2.28}
\]
where the iteration polynomials $\{q_{k-1}^{[n]}\}$ are related to the residual polynomials $\{p_k^{[n]}\}$ via
\[
q_{k-1}^{[n]}(\lambda) = \frac{1 - p_k^{[n]}(\lambda)}{\lambda} \in \Pi_{k-1}. \tag{2.29}
\]
Footnote 1. Of course, if this is not the case, the equation $Ax = y$ can always be rescaled to guarantee $\|A\| \le 1$.
The expression residual polynomial for $p_k^{[n]}$ is justified by the fact that
\[
\begin{aligned}
y - Ax_k^{[n]} &= y - A\left(x_0 + q_{k-1}^{[n]}(A)(y - Ax_0)\right) \\
&= y - Ax_0 - Aq_{k-1}^{[n]}(A)(y - Ax_0) \\
&= \left(I - Aq_{k-1}^{[n]}(A)\right)(y - Ax_0) \\
&= p_k^{[n]}(A)(y - Ax_0).
\end{aligned} \tag{2.30}
\]
Moreover, if $y \in R(A)$ and $x \in X$ is such that $Ax = y$, then
\[
x - x_k^{[n]} = x - x_0 - q_{k-1}^{[n]}(A)A(x - x_0) = p_k^{[n]}(A)(x - x_0). \tag{2.31}
\]
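As a quick sanity check of (2.28)-(2.31), the first step can be written out explicitly, using only $p_1 = 1 - \alpha_0\lambda$ from (2.27) (superscripts $[n]$ are dropped here for readability):

```latex
\begin{aligned}
q_0(\lambda) &= \frac{1 - p_1(\lambda)}{\lambda} = \alpha_0, \\
x_1 &= x_0 + \alpha_0\,(y - Ax_0), \\
y - Ax_1 &= (I - \alpha_0 A)(y - Ax_0) = p_1(A)(y - Ax_0).
\end{aligned}
```

So the first iterate is always a step along the initial residual, and the choice of $n$ only enters through the step length $\alpha_0$.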
In the following sections, in order to simplify the notations, we will omit
the superscript n and the dependence of pk and qk on y unless strictly
necessary.
2.3 The algorithms
In this section we describe how the algorithms of the conjugate gradient type
methods can be derived from the general framework of the previous section.
We refer basically to [27], adding a few details.
Let n ∈ Z, n ≥ 0.
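Proposition 2.3.1. Due to the recurrence formula (2.27), the iteration polynomials satisfy
\[
\begin{aligned}
q_{-1} &= 0, \qquad q_0 = \alpha_0, \\
q_k &= q_{k-1} + \alpha_k\left(p_k + \frac{\beta_k}{\alpha_{k-1}}(q_{k-1} - q_{k-2})\right), \quad k \ge 1.
\end{aligned} \tag{2.32}
\]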
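Proof. By the definition of the iteration polynomials, we have
\[
\lambda q_{-1}(\lambda) = 1 - 1 = 0, \qquad q_0(\lambda) = \frac{1 - p_1(\lambda)}{\lambda} = \alpha_0 \tag{2.33}
\]
and for $k \ge 1$ the recurrence formula for the $p_k$ gives
\[
\begin{aligned}
q_k(\lambda) &= \frac{1 - p_{k+1}(\lambda)}{\lambda}
= \frac{1 + \alpha_k\lambda p_k(\lambda) - p_k(\lambda) + \alpha_k\alpha_{k-1}^{-1}\beta_k\left(p_{k-1}(\lambda) - p_k(\lambda)\right)}{\lambda} \\
&= \frac{\alpha_k\lambda - \lambda^2\alpha_k q_{k-1}(\lambda) + \lambda q_{k-1}(\lambda) + \alpha_k\alpha_{k-1}^{-1}\beta_k\left(\lambda q_{k-1}(\lambda) - \lambda q_{k-2}(\lambda)\right)}{\lambda} \\
&= \alpha_k p_k(\lambda) + q_{k-1}(\lambda) + \frac{\alpha_k}{\alpha_{k-1}}\beta_k\left(q_{k-1}(\lambda) - q_{k-2}(\lambda)\right).
\end{aligned} \tag{2.34}
\]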
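Proposition 2.3.2. The iterates $x_k$ of the conjugate gradient type methods can be computed with the following recursion:
\[
\begin{aligned}
\Delta x_0 &= y - Ax_0, & x_1 &= x_0 + \alpha_0\Delta x_0, \\
\Delta x_k &= y - Ax_k + \beta_k\Delta x_{k-1}, & x_{k+1} &= x_k + \alpha_k\Delta x_k, \quad k \ge 1.
\end{aligned} \tag{2.35}
\]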
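Proof. Since $q_0 = \alpha_0$, the relation between $x_1$ and $x_0$ is obvious.
We proceed by induction on $k$. From the definitions of $x_k$ and $x_{k+1}$ there follows
\[
x_{k+1} = x_k + (q_k - q_{k-1})(A)(\Delta x_0) \tag{2.36}
\]
and now using Proposition 2.3.1 and the induction hypothesis we have:
\[
\begin{aligned}
(q_k - q_{k-1})(A)(\Delta x_0) &= \alpha_k p_k(A)(\Delta x_0) + \frac{\alpha_k}{\alpha_{k-1}}\beta_k (q_{k-1} - q_{k-2})(A)(\Delta x_0) \\
&= \alpha_k(y - Ax_k) + \alpha_k\beta_k\frac{(q_{k-1} - q_{k-2})(A)(\Delta x_0)}{\alpha_{k-1}} \\
&= \alpha_k(y - Ax_k + \beta_k\Delta x_{k-1}).
\end{aligned} \tag{2.37}
\]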
Proposition 2.3.3. Define
\[
s_0 := 1, \qquad s_k := p_k + \beta_k s_{k-1}, \quad k \ge 1. \tag{2.38}
\]
Then for every $k \ge 0$ the following relations hold:
\[
\Delta x_k = s_k(A)(y - Ax_0), \tag{2.39}
\]
\[
p_{k+1} = p_k - \alpha_k\lambda s_k. \tag{2.40}
\]
Proof. For $k = 0$, the first relation is obviously satisfied. For $k \ge 1$, using induction again we obtain:
\[
\Delta x_k = y - Ax_k + \beta_k\Delta x_{k-1} = p_k(A)(\Delta x_0) + \beta_k s_{k-1}(A)(\Delta x_0) = s_k(A)(\Delta x_0),
\]
which proves (2.39).
To see (2.40), it is enough to consider the relations
\[
\begin{aligned}
\frac{x_{k+1} - x_k}{\alpha_k} &= \Delta x_k = s_k(A)(\Delta x_0), \\
x_{k+1} - x_k &= \left(q_k(A) - q_{k-1}(A)\right)(\Delta x_0), \\
\lambda\left(q_k(\lambda) - q_{k-1}(\lambda)\right) &= p_k(\lambda) - p_{k+1}(\lambda)
\end{aligned}
\]
and link them together.
Proposition 2.3.4. The sequence $\{s_k\} = \{s_k^{[n]}\}_{k \in \mathbb{N}}$ is orthogonal with respect to the inner product $[\cdot, \cdot]_{n+1}$. More precisely, if $\ell$ denotes the number of the nonzero points of increase of the function $\alpha(\lambda) = \|E_\lambda(\Delta x_0)\|^2$ (see footnote 2), then
\[
p_k^{[n+1]} = \frac{1}{\pi_{k,n}}\,\frac{p_k^{[n]} - p_{k+1}^{[n]}}{\lambda}, \quad \text{with } \pi_{k,n} := (p_k^{[n]})'(0) - (p_{k+1}^{[n]})'(0) > 0 \tag{2.41}
\]
for every $0 \le k < \ell$.
Footnote 2. Of course, $\ell$ can be finite or infinite: in the ill-posed case, since the spectrum of $A^*A$ clusters at 0, it is infinite.
Proof. A well known fact from the theory of orthogonal polynomials is that $p_k^{[n]}$ has $k$ simple zeros $\lambda_{j,k}^{[n]}$, $j = 1, ..., k$, with
\[
0 < \lambda_{1,k}^{[n]} < \lambda_{2,k}^{[n]} < ... < \lambda_{k,k}^{[n]} \le \|A\| \le 1.
\]
As a consequence, we obtain
\[
p_k^{[n]}(\lambda) = \prod_{j=1}^{k}\left(1 - \frac{\lambda}{\lambda_{j,k}^{[n]}}\right), \qquad (p_k^{[n]})'(0) = -\sum_{j=1}^{k}\frac{1}{\lambda_{j,k}^{[n]}}. \tag{2.42}
\]
Thus $(p_k^{[n]})'(0) \le -k$. Moreover, the zeros of two consecutive orthogonal polynomials interlace, i.e.
\[
0 < \lambda_{1,k+1}^{[n]} < \lambda_{1,k}^{[n]} < \lambda_{2,k+1}^{[n]} < \lambda_{2,k}^{[n]} < ... < \lambda_{k,k}^{[n]} < \lambda_{k+1,k+1}^{[n]},
\]
so $\pi_{k,n} > 0$ holds true.
Now observe that by the definition of $\pi_{k,n}$ the right-hand side of (2.41) lies in $\Pi_k^0$. Denote this polynomial by $p$. For any other polynomial $q \in \Pi_{k-1}$ we have
\[
[p, q]_{n+1} = \frac{1}{\pi_{k,n}}[p_k^{[n]} - p_{k+1}^{[n]}, q]_n = \frac{1}{\pi_{k,n}}\left([p_k^{[n]}, q]_n - [p_{k+1}^{[n]}, q]_n\right) = 0
\]
and since $p_k^{[n+1]}$ is the only polynomial in $\Pi_k^0$ satisfying this equation for every $q \in \Pi_{k-1}^0$, $p = p_k^{[n+1]}$. The orthogonality of the sequence $\{s_k\}$ follows immediately from Proposition 2.3.3.
Proposition 2.3.5. If the function $\alpha(\lambda)$ defined in Proposition 2.3.4 has $\ell = \infty$ points of increase, the coefficients $\alpha_k$ and $\beta_k$ appearing in the formulas (2.35) of Proposition 2.3.2 can be computed as follows:
\[
\alpha_k = \frac{[p_k, p_k]_n}{[s_k, s_k]_{n+1}}, \quad k \ge 0, \tag{2.43}
\]
\[
\beta_k = \frac{1}{\alpha_{k-1}}\,\frac{[p_k, p_k]_n}{[s_{k-1}, s_{k-1}]_{n+1}} = \frac{[p_k, p_k]_n}{[p_{k-1}, p_{k-1}]_n}, \quad k \ge 1. \tag{2.44}
\]
Otherwise, the formulas above remain valid, but the iteration must be stopped in the course of the $(\ell + 1)$-th step since $[s_\ell, s_\ell]_{n+1} = 0$ and $\alpha_\ell$ is undefined. In this case, we distinguish between the following possibilities:
• if $y$ belongs to $R(A)$, for every $n \in \mathbb{N}_0$ $x_\ell^{[n]} = A^\dagger y$;
• if $y$ has a non-trivial component along $R(A)^\perp$ and $n \ge 1$, then $(I - E_0)x_\ell = A^\dagger y$;
• if $y$ has a non-trivial component along $R(A)^\perp$ and $n = 0$, then the conclusion $(I - E_0)x_\ell = A^\dagger y$ does not hold any more.
Proof. Note that in the ill-posed case (2.43) and (2.44) are well defined, since all inner products are nonzero. By the orthogonality of $\{p_k\}$ and Proposition 2.3.3, for every $k \ge 0$ we have
\[
0 = [p_{k+1}, s_k]_n = [p_k, s_k]_n - \alpha_k[\lambda s_k, s_k]_n = [p_k, p_k]_n - \alpha_k[s_k, s_k]_{n+1},
\]
which gives (2.43).
For every $k \ge 1$, the orthogonality of $\{s_k\}$ with respect to $[\cdot, \cdot]_{n+1}$ yields
\[
\begin{aligned}
0 = [s_k, s_{k-1}]_{n+1} &= [p_k, \lambda s_{k-1}]_n + \beta_k[s_{k-1}, s_{k-1}]_{n+1} \\
&= \frac{1}{\alpha_{k-1}}[p_k, p_{k-1} - p_k]_n + \beta_k[s_{k-1}, s_{k-1}]_{n+1} \\
&= -\frac{1}{\alpha_{k-1}}[p_k, p_k]_n + \beta_k[s_{k-1}, s_{k-1}]_{n+1},
\end{aligned} \tag{2.45}
\]
which leads to (2.44).
Now suppose that $\ell < \infty$. Then the bilinear form (2.25) turns out to be
\[
[\phi, \psi]_n = \int_0^{+\infty}\lambda^n\phi(\lambda)\psi(\lambda)\,d\alpha(\lambda) = \sum_{j=1}^{\ell}\lambda_j^n\phi(\lambda_j)\psi(\lambda_j) \tag{2.46}
\]
if $y \in R(A)$ or if $n \ge 1$, whereas if neither of these two conditions is satisfied $\lambda_0 := 0$ is the $(\ell + 1)$-th point of increase of $\alpha(\lambda)$ and
\[
[\phi, \psi]_n = \sum_{j=0}^{\ell}\lambda_j^n\phi(\lambda_j)\psi(\lambda_j).
\]
If $y \in R(A)$, since there exists a unique polynomial $p_k^{[n]} \in \Pi_k^0$ perpendicular to $\Pi_{k-1}$ such that $p_k^{[n]}(\lambda_j) = 0$ for $j = 1, ..., k$ and consequently satisfying $[p_k, p_k]_n = 0$, then $\|x_\ell^{[n]} - x^\dagger\|^2 = \|p_\ell(A)(x_0 - x^\dagger)\|^2 = 0$. If $y$ does not belong to $R(A)$ and $n \ge 1$, since (2.46) is still valid, due to the same considerations we obtain $(I - E_0)x_\ell^{[n]} = x^\dagger$. Finally, in the case $n = 0$ it is impossible to find $p_k^{[n]}$ as before, thus the same conclusions cannot be deduced.
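For the two cases used in the algorithms below, the inner products in (2.43) and (2.44) reduce to quantities that are available during the iteration. Writing $r_k := y - Ax_k = p_k(A)(y - Ax_0)$ and $d_k := \Delta x_k = s_k(A)(y - Ax_0)$ (the names $r_k$ and $d_k$ are shorthand introduced here), the definition (2.25) gives:

```latex
\begin{aligned}
n = 0:&\quad [p_k, p_k]_0 = \|r_k\|^2, \quad [s_k, s_k]_1 = \langle d_k, A d_k\rangle,
  \quad \alpha_k = \frac{\|r_k\|^2}{\langle d_k, A d_k\rangle},
  \quad \beta_k = \frac{\|r_k\|^2}{\|r_{k-1}\|^2}, \\
n = 1:&\quad [p_k, p_k]_1 = \langle r_k, A r_k\rangle, \quad [s_k, s_k]_2 = \|A d_k\|^2,
  \quad \alpha_k = \frac{\langle r_k, A r_k\rangle}{\|A d_k\|^2},
  \quad \beta_k = \frac{\langle r_k, A r_k\rangle}{\langle r_{k-1}, A r_{k-1}\rangle}.
\end{aligned}
```

These are the step lengths that appear in the CG and MR algorithms, respectively.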
From the orthogonal polynomial point of view, the minimization property
of the conjugate gradient type methods turns out to be an easy consequence
of the previous results.
Proposition 2.3.6. Suppose $n \ge 1$, let $x_k$ be the $k$-th iterate of the corresponding conjugate gradient type method and let $x$ be any other element in the shifted Krylov subspace $x_0 + K_{k-1}(A; y - Ax_0)$. Then
\[
\|A^{\frac{n-1}{2}}(y - Ax_k)\| \le \|A^{\frac{n-1}{2}}(y - Ax)\| \tag{2.47}
\]
and the equality holds if and only if $x = x_k$.
If $n = 0$ and $y \in R(A^{\frac{1}{2}})$, then $\|A^{\frac{n-1}{2}}(y - Ax_k)\|$ is well defined and the same result obtained in the case $n \ge 1$ remains valid.
Proof. Consider the case $n \ge 1$. In terms of the residual polynomials $p_k$, (2.47) reads as follows:
\[
[p_k, p_k]_{n-1} \le [p, p]_{n-1}, \quad \text{for every } p \in \Pi_k^0.
\]
Since for every $p \in \Pi_k^0$ there exists $s \in \Pi_{k-1}$ such that $p - p_k = \lambda s$, by orthogonality we have
\[
[p, p]_{n-1} - [p_k, p_k]_{n-1} = [p - p_k, p + p_k]_{n-1} = [s, \lambda s + 2p_k]_n = [s, s]_{n+1} \ge 0,
\]
and the equality holds if and only if $s = 0$, i.e. if and only if $p = p_k$.
If $n = 0$ and $y \in R(A^{\frac{1}{2}})$, then $[p_k, p_k]_{-1}$ is well defined by
\[
[p_k, p_k]_{-1} := \int_{0^+}^{\infty} p_k^2(\lambda)\lambda^{-1}\,d\|E_\lambda(y - Ax_0)\|^2
\]
and the proof is the same as above.
Note that in the case $n = 0$ this is the same result obtained in the discrete case in Proposition 2.1.1.
The computation of the coefficients αk and βk allows a very easy and cheap
computation of the iterates of the conjugate gradient type methods.
We focus our attention on the cases n = 1 and n = 0, corresponding respectively to the minimal residual method and the classical conjugate gradient
method.
2.3.1 The minimal residual method (MR) and the conjugate gradient method (CG)
• In the case $n = 1$, from Proposition 2.3.6 we see that the corresponding method minimizes, in the shifted Krylov space $x_0 + K_{k-1}(A; y - Ax_0)$, the residual norm. For this reason, this method is called minimal residual method (MR). Propositions 2.3.1-2.3.5 lead to Algorithm 1.
• In the case $n = 0$, using again Propositions 2.3.1-2.3.5 we find (cf. Algorithm 2) the classical Conjugate Gradient method originally proposed by Hestenes and Stiefel in [45] in 1952. If $y \in R(A)$, then according to Proposition 2.3.6, the $k$-th iterate $x_k$ of CG minimizes the error $x^\dagger - x_k$ in $x_0 + K_{k-1}(A; y - Ax_0)$ with respect to the energy norm $\langle x^\dagger - x_k, A(x^\dagger - x_k)\rangle$.
Looking at the algorithms, it is important to note that in every iteration step MR and CG must compute only one product of the type $Av$ with $v \in X$.
Algorithm 1 MR
  r_0 = y − A x_0;
  d = r_0;
  Ad = A r_0;
  k = 0;
  while (not stop) do
    α = ⟨r_k, A r_k⟩ / ‖Ad‖^2;
    x_{k+1} = x_k + α d;
    r_{k+1} = r_k − α Ad;
    β = ⟨r_{k+1}, A r_{k+1}⟩ / ⟨r_k, A r_k⟩;
    d = r_{k+1} + β d;
    Ad = A r_{k+1} + β Ad;
    k = k + 1;
  end while
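Algorithm 1 translates almost line by line into code. The following is a minimal sketch in plain Python, assuming a real symmetric positive semi-definite matrix stored as a list of rows; the function names `matvec`, `dot` and `mr`, the fixed number of steps in place of a stopping rule, and the toy data are choices made here for illustration.

```python
def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mr(A, y, x0, steps):
    """Run `steps` iterations of Algorithm 1 (MR) for symmetric A.

    Returns the final iterate and the list of residual norms.
    """
    x = list(x0)
    r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    d = list(r)
    Ar = matvec(A, r)        # product with A needed to start the loop
    Ad = list(Ar)
    res_norms = [dot(r, r) ** 0.5]
    for _ in range(steps):
        rAr = dot(r, Ar)
        alpha = rAr / dot(Ad, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        Ar = matvec(A, r)    # the only product with A in this step
        beta = dot(r, Ar) / rAr
        d = [ri + beta * di for ri, di in zip(r, d)]
        Ad = [ari + beta * adi for ari, adi in zip(Ar, Ad)]
        res_norms.append(dot(r, r) ** 0.5)
    return x, res_norms

# Toy problem (hypothetical data): A = diag(1, 2), y = (1, 1), x0 = 0.
A = [[1.0, 0.0], [0.0, 2.0]]
x, res_norms = mr(A, [1.0, 1.0], [0.0, 0.0], steps=2)
print(x, res_norms)
```

On this two-dimensional example the exact solution (1, 0.5) is reached after two steps, up to rounding, and the residual norms decrease at every step, in line with the minimization property of Proposition 2.3.6.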
Algorithm 2 CG
  r_0 = y − A x_0;
  d = r_0;
  k = 0;
  while (not stop) do
    α = ‖r_k‖^2 / ⟨d, Ad⟩;
    x_{k+1} = x_k + α d;
    r_{k+1} = r_k − α Ad;
    β = ‖r_{k+1}‖^2 / ‖r_k‖^2;
    d = r_{k+1} + β d;
    k = k + 1;
  end while
2.3.2 CGNE and CGME
Suppose that the operator A fails to be self-adjoint and semi-definite, i.e. it is of the type we discussed in Chapter 1. Then it is still possible to use the conjugate gradient type methods, seeking the (best-approximate) solution of the equation
\[
AA^*\upsilon = y \tag{2.48}
\]
and putting $x = A^*\upsilon$.
In this more general case, we shall denote as usual with $\{E_\lambda\}$ the spectral family of $A^*A$ and with $\{F_\lambda\}$ the spectral family of $AA^*$. All the definitions of the self-adjoint case carry over here, keeping in mind that they will always refer to $AA^*$ instead of $A$ and the corresponding iterates are
\[
\upsilon_k = \upsilon_0 + q_{k-1}(AA^*)(y - Ax_0). \tag{2.49}
\]
The definition of the first iterate $\upsilon_0$ is not important, since we are not interested in calculating $\upsilon_k$, but we are looking for $x_k$. Thus we multiply both sides of the equation (2.49) by $A^*$ and get
\[
x_k = x_0 + A^*q_{k-1}(AA^*)(y - Ax_0) = x_0 + q_{k-1}(A^*A)A^*(y - Ax_0). \tag{2.50}
\]
As in the self-adjoint case, the residual $y - Ax_k$ is expressed in terms of the residual polynomials $p_k$ corresponding to the operator $AA^*$ via the formula
\[
y - Ax_k = p_k(AA^*)(y - Ax_0) \tag{2.51}
\]
and if $y = Ax$ for some $x \in X$, then
\[
x - x_k = p_k(A^*A)(x - x_0). \tag{2.52}
\]
As in the self-adjoint case, we consider the possibilities $n = 1$ and $n = 0$.
• If $n = 1$, according to Proposition 2.3.6, the iterates $x_k$ minimize the residual norm in the shifted Krylov space $x_0 + K_{k-1}(A^*A; A^*(y - Ax_0))$, cf. Algorithm 3. A very important fact concerning this case is that this is equivalent to the direct application of CG to the normal equation
\[
A^*Ax = A^*y,
\]
as one can easily verify by using Proposition 2.3.6 or by comparing the algorithms. This method is by far the best known in the literature and is usually called CGNE, i.e. CG applied to the Normal Equation.
• It is also possible to apply CG to the equation (2.48), obtaining Algorithm 4: this corresponds to the choice $n = 0$ and by Proposition 2.3.6 if $y \in R(A)$ the iterates $x_k$ minimize the error norm $\|x^\dagger - x_k\|$ in the corresponding Krylov space (see footnote 3).
We conclude this section with a remark: forming and solving the equation (2.48) can only lead to the minimal norm solution of $Ax = y$, because the iterates $x_k = A^*\upsilon_k$ lie in $R(A^*) \subseteq \ker(A)^\perp$, which is closed. Thus, if one is looking for solutions different from $x^\dagger$, then one should not rely on these methods.
2.3.3 Cheap Implementations
In [27] M. Hanke suggests an implementation of both gradient type methods
with n = 1 and n = 0 in one scheme, which requires approximately the same
computational effort of implementing only one of them. For this purpose,
further results (gathered in Proposition 2.3.7 below) from the theory of orthogonal polynomials are needed.
Footnote 3. The reader should keep in mind the difference between CG and CGME: the former minimizes $x_k - x^\dagger$ in the energy norm, whereas the latter minimizes exactly the norm of the error $\|x^\dagger - x_k\|$.
Algorithm 3 CGNE
  r_0 = y − A x_0;
  d = A^* r_0;
  k = 0;
  while (not stop) do
    α = ‖A^* r_k‖^2 / ‖Ad‖^2;
    x_{k+1} = x_k + α d;
    r_{k+1} = r_k − α Ad;
    β = ‖A^* r_{k+1}‖^2 / ‖A^* r_k‖^2;
    d = A^* r_{k+1} + β d;
    k = k + 1;
  end while
Algorithm 4 CGME
  r_0 = y − A x_0;
  d = A^* r_0;
  k = 0;
  while (not stop) do
    α = ‖r_k‖^2 / ‖d‖^2;
    x_{k+1} = x_k + α d;
    r_{k+1} = r_k − α Ad;
    β = ‖r_{k+1}‖^2 / ‖r_k‖^2;
    d = A^* r_{k+1} + β d;
    k = k + 1;
  end while
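Algorithms 3 and 4 differ only in the quantities entering $\alpha$ and $\beta$, so they can share one routine. The sketch below is a plain-Python illustration for a real matrix (so that $A^*$ is the transpose); the names `rmatvec`, `cg_normal` and the `variant` switch, as well as the toy matrix, are assumptions made here and not part of the original algorithms.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, v):
    """Product with the transpose A^T (the adjoint for real matrices)."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_normal(A, y, x0, steps, variant="CGNE"):
    """Run Algorithm 3 (variant="CGNE") or Algorithm 4 (variant="CGME")."""
    x = list(x0)
    r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    g = rmatvec(A, r)                  # g = A^T r_k
    d = list(g)
    for _ in range(steps):
        Ad = matvec(A, d)
        # numerator: ||A^T r_k||^2 for CGNE, ||r_k||^2 for CGME
        num = dot(g, g) if variant == "CGNE" else dot(r, r)
        # denominator: ||A d||^2 for CGNE, ||d||^2 for CGME
        den = dot(Ad, Ad) if variant == "CGNE" else dot(d, d)
        alpha = num / den
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        g = rmatvec(A, r)
        new_num = dot(g, g) if variant == "CGNE" else dot(r, r)
        beta = new_num / num
        d = [gi + beta * di for gi, di in zip(g, d)]
    return x

# Toy non-symmetric system (hypothetical data) with exact solution (1, 2).
A = [[1.0, 1.0], [0.0, 1.0]]
y = [3.0, 2.0]
x_cgne = cg_normal(A, y, [0.0, 0.0], steps=2, variant="CGNE")
x_cgme = cg_normal(A, y, [0.0, 0.0], steps=2, variant="CGME")
print(x_cgne, x_cgme)
```

Both variants need one product with $A$ and one with $A^*$ per step. On this two-dimensional example both reach the exact solution after two steps, up to rounding.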
For simplicity, in the remainder we will restrict to the case in which A is a
semi-definite, self-adjoint operator and the initial guess is the origin: x0 = 0.
For the proof of the following facts and for a more exhaustive coverage of
the argument, see [27]. The second statement has already been proved in
Proposition 2.3.4.
Proposition 2.3.7. Fix $k \in \mathbb{N}_0$, $k < \ell$. Then:
1. For $n \in \mathbb{N}$, the corresponding residual polynomial $p_k^{[n]}$ can be written in the form
\[
p_k^{[n]} = [p_k^{[n]}, p_k^{[n]}]_{n-1}\sum_{j=0}^{k}[p_j^{[n-1]}, p_j^{[n-1]}]_{n-1}^{-1}\,p_j^{[n-1]} \tag{2.53}
\]
and
\[
\|A^{\frac{n-1}{2}}(y - Ax_k^{[n]})\|^2 = [p_k^{[n]}, p_k^{[n]}]_{n-1} = \left(\sum_{j=0}^{k}[p_j^{[n-1]}, p_j^{[n-1]}]_{n-1}^{-1}\right)^{-1}. \tag{2.54}
\]
The same is true for $n = 0$ if and only if $E_0 y = 0$, i.e. if and only if the data $y$ has no component along $R(A)^\perp$.
2. For $n \in \mathbb{N}_0$ there holds:
\[
p_k^{[n+1]} = \frac{1}{\pi_{k,n}}\,\frac{p_k^{[n]} - p_{k+1}^{[n]}}{\lambda}. \tag{2.55}
\]
3. For $n \in \mathbb{N}$, $\pi_{k,n} = (p_k^{[n]})'(0) - (p_{k+1}^{[n]})'(0)$ is also equal to
\[
\pi_{k,n} = \frac{[p_k^{[n]}, p_k^{[n]}]_{n-1} - [p_{k+1}^{[n]}, p_{k+1}^{[n]}]_{n-1}}{[p_k^{[n+1]}, p_k^{[n+1]}]_n} = \frac{[p_k^{[n+1]}, p_k^{[n+1]}]_n}{[p_k^{[n+1]}, p_k^{[n+1]}]_{n+1}}. \tag{2.56}
\]
Starting from Algorithm 1 and using Proposition 2.3.7, it is not difficult to
construct an algorithm which implements both MR and CG without further
computational effort. The same can be done starting from CGNE. The
results are summarized in Algorithm 5 (6), where xk and zk are the iterates
corresponding respectively to CG (CGME) and MR (CGNE).
Once again, we refer the reader to [27] for further details.
Algorithm 5 MR+CG
  x_0 = z_0;
  r_0 = y − A z_0;
  d = r_0;
  p_1 = A r_0;
  p_2 = Ad;
  k = 0;
  while (not stop) do
    α = ⟨r_k, p_1⟩ / ‖p_2‖^2;
    z_{k+1} = z_k + α d;
    π = ‖r_k‖^2 / ⟨r_k, p_1⟩;
    x_{k+1} = x_k + π r_k;
    r_{k+1} = r_k − α p_2;
    t = A r_{k+1};
    β = ⟨r_{k+1}, t⟩ / ⟨r_k, p_1⟩;
    d = r_{k+1} + β d;
    p_1 = t;
    p_2 = t + β p_2;
    k = k + 1;
  end while
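The combined scheme of Algorithm 5 can be sketched in the same style. This is an illustrative plain-Python version for a real symmetric positive definite matrix; the function name `mr_cg`, the fixed number of steps and the toy data are choices made here. The point of interest is that the CG iterate `x` is updated along the MR residual `r`, with the step length $\pi$ computed from inner products that MR needs anyway.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mr_cg(A, y, z0, steps):
    """Algorithm 5: z holds the MR iterates, x the CG iterates."""
    z = list(z0)
    x = list(z0)
    r = [yi - ai for yi, ai in zip(y, matvec(A, z))]   # MR residual
    d = list(r)
    p1 = matvec(A, r)        # p1 = A r_k
    p2 = list(p1)            # p2 = A d
    for _ in range(steps):
        r_p1 = dot(r, p1)
        alpha = r_p1 / dot(p2, p2)
        pi = dot(r, r) / r_p1
        z = [zi + alpha * di for zi, di in zip(z, d)]
        x = [xi + pi * ri for xi, ri in zip(x, r)]     # CG update uses the MR residual
        r = [ri - alpha * p2i for ri, p2i in zip(r, p2)]
        t = matvec(A, r)
        beta = dot(r, t) / r_p1
        d = [ri + beta * di for ri, di in zip(r, d)]
        p1 = t
        p2 = [ti + beta * p2i for ti, p2i in zip(t, p2)]
    return z, x

# Toy problem (hypothetical data): A = diag(1, 2), y = (1, 1), z0 = 0.
A = [[1.0, 0.0], [0.0, 2.0]]
z1, x1 = mr_cg(A, [1.0, 1.0], [0.0, 0.0], steps=1)
z2, x2 = mr_cg(A, [1.0, 1.0], [0.0, 0.0], steps=2)
print(z1, x1, z2, x2)
```

After one step the two iterates differ (MR gives (0.6, 0.6), CG gives (2/3, 2/3)); after two steps both coincide with the exact solution (1, 0.5), up to rounding.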
Algorithm 6 CGNE+CGME
  x_0 = z_0;
  r_0 = y − A z_0;
  d = A^* r_0;
  p_1 = d;
  p_2 = Ad;
  k = 0;
  while (not stop) do
    α = ‖p_1‖^2 / ‖p_2‖^2;
    z_{k+1} = z_k + α d;
    π = ‖r_k‖^2 / ‖p_1‖^2;
    x_{k+1} = x_k + π p_1;
    r_{k+1} = r_k − α p_2;
    t = A^* r_{k+1};
    β = ‖t‖^2 / ‖p_1‖^2;
    d = t + β d;
    p_1 = t;
    p_2 = Ad;
    k = k + 1;
  end while
2.4 Regularization theory for the conjugate gradient type methods
This section is entirely devoted to the study of the conjugate gradient methods for ill-posed problems. Although such methods are not regularization methods in the strict sense of Definition 1.9.1, as we will see they preserve the most important regularization properties and for this reason they are usually included in the class of regularization methods. Since the results we are going to state can nowadays be considered classic and are treated in great detail both in [27] and in [17], most of the proofs will be omitted. The non-omitted proofs and calculations will serve us to define new stopping rules later on.
We begin with an apparently very unpleasant result concerning the stability
properties of the conjugate gradient type methods.
Theorem 2.4.1. Let the self-adjoint semi-definite operator $A$ be compact and non-degenerate. Then for any conjugate gradient type method with parameter $n \in \mathbb{N}_0$ and for every $k \in \mathbb{N}$, the operator $R_k = R_k^{[n]}$ that maps the data $y$ onto the $k$-th iterate $x_k = x_k^{[n]}$ is discontinuous in $X$.
Moreover, even in the non compact case, $R_k$ is discontinuous at $y$ if and only if $E_0 y$ belongs to an invariant subspace of $A$ of dimension at most $k - 1$.
Every stopping rule for a conjugate gradient type method must take into account this phenomenon. In particular, no a-priori stopping rule $k(\delta)$ can render a conjugate gradient type method convergent (cf. [27] and [17]). At first, this seems to be discouraging, but the lack of continuity of $R_k$ is not really a big problem, since it is still possible to find reliable a-posteriori stopping rules which preserve the main properties of convergence and order optimality.
Before we proceed with the analysis, we have to underline that the methods
with parameter n ≥ 1 are much easier to treat than those with n = 0. For
this reason, we shall consider the two cases separately.
2.4.1 Regularizing properties of MR and CGNE
As usual, we begin considering the unperturbed case first.
Proposition 2.4.1. Let $y \in R(A)$ and let $n_1$ and $n_2$ be integers with $n_1 < n_2$ and $[1, 1]_{n_1} < +\infty$. Then $[p_k^{[n_2]}, p_k^{[n_2]}]_{n_1}$ is strictly decreasing as $k$ goes from 0 to $\ell$.
This has two important consequences:
Corollary 2.4.1. If $y \in R(A)$ and $x_k = x_k^{[n]}$ are the iterates of a conjugate gradient type method corresponding to a parameter $n \ge 1$ and right-hand side $y$, then
• The residual norm $\|y - Ax_k\|$ is strictly decreasing for $0 \le k \le \ell$.
• The iteration error $\|x^\dagger - x_k\|$ is strictly decreasing for $0 \le k \le \ell$.
To obtain the most important convergence results, the following estimates
play a central role. We have to distinguish between the self-adjoint case and
the more general setting of Section 2.3.2. The proof of the part with the
operator AA∗ , which will turn out to be of great importance later, can be
found entirely in [17], Theorem 7.9.
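Lemma 2.4.1. Let $\lambda_{1,k} < ... < \lambda_{k,k}$ be the zeros of $p_k$. Then:
• In the self-adjoint case, for $y \in X$,
\[
\|y - Ax_k\| \le \|E_{\lambda_{1,k}}\varphi_k(A)y\|, \tag{2.57}
\]
with the function $\varphi_k(\lambda) := p_k(\lambda)\frac{\lambda_{1,k}}{\lambda_{1,k} - \lambda}$ satisfying
\[
0 \le \varphi_k(\lambda) \le 1, \qquad \lambda^2\varphi_k^2(\lambda) \le 4|p_k'(0)|^{-1}, \qquad 0 \le \lambda \le \lambda_{1,k}. \tag{2.58}
\]
• In the general case with $AA^*$, for $y \in Y$,
\[
\|y - Ax_k\| \le \|F_{\lambda_{1,k}}\varphi_k(AA^*)y\|, \tag{2.59}
\]
with the function $\varphi_k(\lambda) := p_k(\lambda)\left(\frac{\lambda_{1,k}}{\lambda_{1,k} - \lambda}\right)^{\frac{1}{2}}$ satisfying
\[
0 \le \varphi_k(\lambda) \le 1, \qquad \lambda\varphi_k^2(\lambda) \le |p_k'(0)|^{-1}, \qquad 0 \le \lambda \le \lambda_{1,k}. \tag{2.60}
\]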
This leads to the following convergence theorem.
Theorem 2.4.2.
• Suppose that $A$ is self-adjoint and semi-definite. If $y \in R(A)$, then the iterates $\{x_k\}$ of a conjugate gradient type method with parameter $n \ge 1$ converge to $A^\dagger y$ as $k \to +\infty$. If $y \notin R(A)$ and $\ell = \infty$, then $\|x_k\| \to +\infty$ as $k \to +\infty$. If $y \notin R(A)$ and $\ell < \infty$ then the iteration terminates after $\ell$ steps, $Ax_\ell = E_0 y$ and $x_\ell = A^\dagger y$ if and only if $\ell = 0$.
• Let $A$ satisfy the assumptions of Section 2.3.2 and let $\{x_k\}$ be the iterates of a conjugate gradient type method with parameter $n \ge 1$ applied with $AA^*$. If $y \in D(A^\dagger)$, then $x_k$ converges to $A^\dagger y$ as $k \to +\infty$, but if $y \notin D(A^\dagger)$, then $\|x_k\| \to +\infty$ as $k \to +\infty$.
Theorem 2.4.2 implies that the iteration must be terminated appropriately when dealing with perturbed data $y^\delta \notin D(A^\dagger)$, due to numerical instabilities.
Another consequence of Lemma 2.4.1 is the following one:
Lemma 2.4.2. Let $x_k^\delta$ be the iterates of a conjugate gradient type method with parameter $n \ge 1$ corresponding to the perturbed right-hand side $y^\delta$ and the self-adjoint semi-definite operator $A$. If the exact right-hand side belongs to $R(A)$ and $\ell = \infty$, then
\[
\limsup_{k \to +\infty}\|y^\delta - Ax_k^\delta\| \le \|y - y^\delta\|. \tag{2.61}
\]
Moreover, if the exact data satisfy the source condition
\[
A^\dagger y \in X_{\frac{\mu}{2},\rho}, \quad \mu > 0, \ \rho > 0, \tag{2.62}
\]
then there exists a constant $C > 0$ such that
\[
\|y^\delta - Ax_k^\delta\| \le \|y - y^\delta\| + C|p_k'(0)|^{-\mu-1}\rho, \quad 1 \le k \le \ell. \tag{2.63}
\]
The same estimate is obtained for the gradient type methods working with $AA^*$ instead of $A$, but the exponent $-\mu - 1$ must be replaced by $-\frac{\mu+1}{2}$.
Assuming the source condition (2.62), it is also possible to give an estimate for the error:
Lemma 2.4.3. Let $x_k^\delta$ be the iterates of a conjugate gradient type method with parameter $n \ge 1$ corresponding to $y^\delta$ and the self-adjoint semi-definite operator $A$. If (2.62) holds, then for $0 \le k \le \ell$,
\[
\|A^\dagger y - x_k^\delta\| \le C\left(\|F_{\lambda_{1,k}}(y - y^\delta)\|\,|p_k'(0)| + \rho^{\frac{1}{\mu+1}}M_k^{\frac{\mu}{\mu+1}}\right), \tag{2.64}
\]
where $C$ is a positive constant depending only on $\mu$, and
\[
M_k := \max\{\|y^\delta - Ax_k^\delta\|, \|y^\delta - y\|\}. \tag{2.65}
\]
In the cases with $AA^*$ instead of $A$, the same is true, but in (2.64) $|p_k'(0)|$ must be replaced by $|p_k'(0)|^{\frac{1}{2}}$.
We underline that in Hanke's statement of Lemma 2.4.3 (cf. Lemma 3.8 in [27]) the term $\|F_{\lambda_{1,k}}(y - y^\delta)\|$ in the inequality (2.64) is replaced by $\|y - y^\delta\|$. This sharper estimate follows directly from the proof of Lemma 3.8 in [27].
Combining Lemma 2.4.2 and Lemma 2.4.3 we obtain:
Theorem 2.4.3. If $y$ satisfies the source condition (2.62) and $\|y^\delta - y\| \le \delta$, then the iteration error of a conjugate gradient type method with parameter $n \ge 1$ associated to a self-adjoint semi-definite operator $A$ is bounded by
\[
\|A^\dagger y - x_k^\delta\| \le C\left(|p_k'(0)|^{-\mu}\rho + |p_k'(0)|\delta\right), \quad 1 \le k \le \ell. \tag{2.66}
\]
In the cases with $AA^*$ instead of $A$, the same estimate holds, but $|p_k'(0)|$ must be replaced by $|p_k'(0)|^{\frac{1}{2}}$.
Theorem 2.4.3 can be seen as the theoretical justification of the well known phenomenon of semi-convergence, which is observed in practical examples: from (2.66), we see that for small values of $k$ the right-hand side is dominated by $|p_k'(0)|^{-\mu}\rho$, but as $k$ increases towards $+\infty$, this term converges to 0, while $|p_k'(0)|\delta$ diverges. Thus, as usual, there is a precise value of $k$ that minimizes the error $\|A^\dagger y - x_k^\delta\|$ and it is necessary to define appropriate stopping rules to obtain satisfactory results. In the case of the conjugate gradient type methods with parameter $n \ge 1$, the Discrepancy Principle proves to be an efficient one.
Definition 2.4.1 (Discrepancy Principle for MR and CGNE). Assume $\|y^\delta - y\| \le \delta$. Fix a number $\tau > 1$ and terminate the iteration when, for the first time, $\|y^\delta - Ax_k^\delta\| \le \tau\delta$. Denote the corresponding stopping index with $k_D = k_D(\delta, y^\delta)$.
A few remarks are necessary:
(i) The Discrepancy Principle is well defined. In fact, due to Lemma 2.4.2, for every $\delta$ and every $y^\delta$ such that $\|y^\delta - y\| \le \delta$ there is always a finite stopping index such that the corresponding residual norm is smaller than $\tau\delta$.
(ii) Since the residual must be computed anyway in the course of the iteration, the Discrepancy Principle requires very little additional computational effort.
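In code, the Discrepancy Principle is a single comparison per iteration. The following sketch assumes that the residual norms of the iterates have been recorded in a list; the function name `discrepancy_index`, the default value of `tau` and the sample history are illustrative choices, not taken from the text.

```python
def discrepancy_index(residual_norms, delta, tau=1.001):
    """Return the first k with residual_norms[k] <= tau * delta, or None.

    residual_norms[k] is assumed to hold ||y^delta - A x_k^delta|| for k = 0, 1, ...
    """
    if tau <= 1.0:
        raise ValueError("tau must be greater than 1")
    for k, res in enumerate(residual_norms):
        if res <= tau * delta:
            return k
    return None

# Hypothetical residual history of an MR or CGNE run with noise level delta = 0.01.
history = [1.0, 0.3, 0.05, 0.012, 0.0099, 0.0097]
print(discrepancy_index(history, delta=0.01))  # prints: 4
```

In an actual implementation the test would sit inside the iteration loop, so that no iterates beyond $k_D$ are computed; returning `None` here only signals that the recorded history was too short.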
The following result is fundamental for the regularization theory of conjugate
gradient type methods. For MR and CGNE it was proved for the first time by
Nemirovsky in [70]; our statement is taken as usual from [27], where a detailed
proof using the orthogonal polynomial and spectral theory framework is also
given.
Theorem 2.4.4. Any conjugate gradient type method with parameter n ≥ 1
with the Discrepancy Principle as a stopping rule is of optimal order, in
the sense that it satisfies the conditions of Definition 1.11.2, except for the
continuity of the operators Rk .
It is not difficult to see from the proof of Plato’s Theorem 1.11.1 that
the discontinuity of Rk does not influence the result. Thus we obtain also a
convergence result for y ∈ R(A):
Corollary 2.4.2. Let $y \in R(A)$ and $\|y^\delta - y\| \le \delta$. If the stopping index for a conjugate gradient type method with parameter $n \ge 1$ is chosen according to the Discrepancy Principle and denoted by $k_D = k_D(\delta, y^\delta)$, then
\[
\lim_{\delta \to 0}\ \sup_{y^\delta \in B_\delta(y)}\|x_{k_D}^\delta - A^\dagger y\| = 0. \tag{2.67}
\]
2.4.2 Regularizing properties of CG and CGME
The case of conjugate gradient type methods with parameter n = 0 is much
harder to study. The first difficulties arise from the fact that the residual
norm is not necessarily decreasing during the iteration, as the following example shows:
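Example 2.4.1. Let $A \in M_2(\mathbb{R})$ be defined by
\[
A = \begin{pmatrix} \tau & 0 \\ 0 & 1 \end{pmatrix}, \tag{2.68}
\]
$\tau > 0$, and let $x_0 = 0$ and $y = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$. Then according to Algorithm 2 we have: $r_0 = y$, $Ar_0 = \begin{pmatrix} 2\tau \\ 1 \end{pmatrix}$, $\alpha = \frac{5}{4\tau+1}$ and $x_1 = \frac{5}{4\tau+1}y$. Therefore, if $\tau$ is sufficiently small, we have
\[
\|y - Ax_1\| = \left\|\begin{pmatrix} 2 - \frac{10\tau}{4\tau+1} \\ 1 - \frac{5}{4\tau+1} \end{pmatrix}\right\| > \sqrt{5} = \|y\| = \|y - Ax_0\|.
\]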
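The example is small enough to check directly. The sketch below carries out the first CG step for a few values of $\tau$; the function name `first_cg_residual` and the chosen values of $\tau$ are illustrative assumptions.

```python
import math

def first_cg_residual(tau):
    """Residual norm after one CG step (Algorithm 2) for A = diag(tau, 1), y = (2, 1), x0 = 0."""
    y = (2.0, 1.0)
    r0 = y
    Ar0 = (tau * r0[0], r0[1])
    # alpha = ||r0||^2 / <r0, A r0> = 5 / (4 tau + 1)
    alpha = (r0[0] ** 2 + r0[1] ** 2) / (r0[0] * Ar0[0] + r0[1] * Ar0[1])
    r1 = (r0[0] - alpha * Ar0[0], r0[1] - alpha * Ar0[1])
    return math.hypot(r1[0], r1[1])

for tau in (0.5, 0.1, 0.01):
    print(tau, first_cg_residual(tau), math.sqrt(5.0))
```

Simplifying the residual vector gives $y - Ax_1 = \frac{1-\tau}{4\tau+1}(2, -4)^T$, so that $\|y - Ax_1\| = 2\sqrt{5}\,\frac{|1-\tau|}{4\tau+1}$. In this example "sufficiently small" therefore means $\tau < 1/6$: the residual grows for $\tau = 0.1$ and $\tau = 0.01$, but not for $\tau = 0.5$.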
Moreover, in the ill-posed case it is necessary to restrict to the case where
the data y belongs to R(A) (and not to D(A† )):
Theorem 2.4.5. If $y \notin R(A)$ and $\{x_k\}$ are the iterates of CG (CGME) then either the iteration breaks down in the course of the $(\ell + 1)$-th step or $\ell = +\infty$ and $\|x_k\| \to +\infty$ as $k \to +\infty$.
However, the main problem is that there are examples showing that the
Discrepancy Principle does not regularize these methods (see [27], Section
4.2). More precisely, CG and CGME with the Discrepancy Principle as a
stopping rule may give rise to a sequence of iterates diverging in norm as δ
goes to 0. Thus, other stopping criteria have to be formulated: one of the
most important is the following.
Definition 2.4.2. Fix $\tau > 1$ and assume $\|y - y^\delta\| \le \delta$. Terminate the CG (CGME) iteration as soon as $\|y^\delta - Ax_k^\delta\| = 0$, or when for the first time
\[
\sum_{j=0}^{k}\|y^\delta - Ax_j^\delta\|^{-2} \ge (\tau\delta)^{-2}. \tag{2.69}
\]
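The rule (2.69) only needs a running sum of inverse squared residual norms. A minimal sketch, assuming the CG (CGME) residual norms are recorded in a list; the function name `hanke_stop_index`, the default `tau` and the sample history are hypothetical.

```python
def hanke_stop_index(residual_norms, delta, tau=1.001):
    """First k satisfying the rule of Definition 2.4.2, or None if never met.

    residual_norms[j] is assumed to hold ||y^delta - A x_j^delta|| for the
    CG (CGME) iterates, j = 0, 1, ...
    """
    threshold = (tau * delta) ** -2
    acc = 0.0
    for k, res in enumerate(residual_norms):
        if res == 0.0:
            return k                 # exact solution of the perturbed problem
        acc += res ** -2             # running sum of ||r_j||^{-2}
        if acc >= threshold:
            return k
    return None

# Hypothetical CG residual history; note that it need not be monotone.
history = [1.0, 0.5, 0.8, 0.2, 0.15]
print(hanke_stop_index(history, delta=0.2, tau=1.1))  # prints: 3
```

Because every term of the sum is positive, the accumulated value is increasing in $k$ even when the CG residual norms themselves are not, which is what makes the rule usable for $n = 0$.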
According to Proposition 2.3.7, the index corresponding to this stopping rule is the smallest integer $k$ such that $[p_k^{[1]}, p_k^{[1]}]_0^{\frac{1}{2}} \le \tau\delta$, i.e. it is exactly the same stopping index defined for MR (CGNE) by the Discrepancy Principle, thus we denote it again by $k_D$. The importance of this stopping criterion lies in the following result.
Theorem 2.4.6. Let $y$ satisfy (2.62) and let $\|y - y^\delta\| \le \delta$. If CG or CGME is applied to $y^\delta$ and terminated after $k_D$ steps according to Definition 2.4.2, then there exists some uniform constant $C > 0$ such that
\[
\|A^\dagger y - x_{k_D}^\delta\| \le C\rho^{\frac{1}{\mu+1}}\delta^{\frac{\mu}{\mu+1}}. \tag{2.70}
\]
Thus, due to Plato's Theorem, except for the continuity of the operator
Rk , also CG and CGME are regularization methods of optimal order when
they are stopped according to Definition 2.4.2.
We continue with the definition of another very important tool for regularizing ill-posed problems that will turn out to be very useful: the filter factors.
2.5 Filter factors
We have seen in Chapter 1 that the regularized solution of the equation (1.20) can be computed via a formula of the type
\[
x_{reg}(\sigma) = \int g_\sigma(\lambda)\,dE_\lambda A^*y^\delta. \tag{2.71}
\]
If the linear operator $A = K$ is compact, then using the singular value expansion of the compact operator the equation above reduces to
\[
x_{reg}(\sigma) = \sum_{j=1}^{\infty} g_\sigma(\lambda_j^2)\lambda_j\langle y^\delta, u_j\rangle v_j \tag{2.72}
\]
and the sum converges if $g_\sigma$ satisfies the basic assumptions of Chapter 1. If we consider the operator $U : Y \to Y$ that maps the elements $e_j$, $j = 1, ..., +\infty$, of an orthonormal Hilbert basis of $Y$ into $u_j$, we see that for $y \in Y$
\[
U^*y = \sum_{j=1}^{+\infty}\langle u_j, y\rangle e_j.
\]
Then, if $V : X \to X$ is defined in a similar way and $\Lambda(e_j) := \lambda_j e_j$, (2.72) can be written in the compact form
\[
x_{reg}(\sigma) = V\Theta_\sigma\Lambda^\dagger U^*y^\delta,
\]
with $\Theta_\sigma(e_j) := g_\sigma(\lambda_j^2)\lambda_j^2 e_j$.
The coefficients
\[
\Phi_\sigma(\lambda_j^2) := g_\sigma(\lambda_j^2)\lambda_j^2, \quad j = 1, ..., +\infty, \tag{2.73}
\]
are known in the literature (cf. e.g. [36]) as the filter factors of the regularization operator, since they attenuate the errors corresponding to the small singular values $\lambda_j$.
Filter factors are very important when dealing with ill-posed and discrete ill-posed problems, because they give an insight into the way a method regularizes the data. Moreover, they can be defined not only for linear regularization methods such as Tikhonov Regularization or Landweber type methods, but also when the solution does not depend linearly on the data, as it happens in the case of Conjugate Gradient type methods, where equation (2.71) does not hold any more. For example, from formula (2.50) we can see that for $n = 0, 1$
\[
x_k^{[n]} = q_{k-1}^{[n]}(A^*A)A^*y^\delta = Vq_{k-1}^{[n]}(\Lambda^2)V^*V\Lambda U^*y^\delta = Vq_{k-1}^{[n]}(\Lambda^2)\Lambda^2\Lambda^\dagger U^*y^\delta, \tag{2.74}
\]
so the filter factors of CGME and CGNE are respectively
\[
\Phi_k^{[0]}(\lambda_j^2) = q_{k-1}^{[0]}(\lambda_j^2)\lambda_j^2 \tag{2.75}
\]
and
\[
\Phi_k^{[1]}(\lambda_j^2) = q_{k-1}^{[1]}(\lambda_j^2)\lambda_j^2. \tag{2.76}
\]
Later on, we shall see how this tool can be used to understand the regularizing properties of the conjugate gradient type methods.
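Combining (2.75)-(2.76) with the relation (2.29) between iteration and residual polynomials gives an equivalent expression that is sometimes easier to read. The product form below uses the zeros $\lambda_{i,k}^{[n]}$ of $p_k^{[n]}$ as in (2.42), here understood for the polynomials associated with $AA^*$:

```latex
\Phi_k^{[n]}(\lambda_j^2)
  = \lambda_j^2\, q_{k-1}^{[n]}(\lambda_j^2)
  = 1 - p_k^{[n]}(\lambda_j^2)
  = 1 - \prod_{i=1}^{k}\left(1 - \frac{\lambda_j^2}{\lambda_{i,k}^{[n]}}\right),
  \qquad n = 0, 1.
```

In particular, $\Phi_k^{[n]}(\lambda_j^2) \approx 1$ wherever the residual polynomial is small, while a first-order expansion at 0 gives $\Phi_k^{[n]}(\lambda_j^2) \approx |(p_k^{[n]})'(0)|\,\lambda_j^2$ as long as $\lambda_j^2\,|(p_k^{[n]})'(0)| \ll 1$, so the components belonging to very small singular values are damped.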
2.6 CGNE, CGME and the Discrepancy Principle
So far, we have given a general overview of the main properties of the conjugate gradient type methods and a stopping rule for every method has been
defined.
In the remainder of this chapter, we shall study the behavior of the conjugate gradient type methods in discrete ill-posed problems. We will proceed as follows.
• Analyze the performances of CGNE and CGME stopped at the step $k_D = k_D(\delta, y^\delta)$, i.e. respectively with the Discrepancy Principle (cf. Definition 2.4.1) and with the a-posteriori stopping rule proposed by Hanke (cf. Definition 2.4.2). This will be the subject of the current section.
• Give an insight into the regularizing properties of CGME and CGNE by means of the filter factors (cf. Section 2.7).
• Analyze the performances obtained by the method with parameter $n = 2$ (cf. Section 2.8).
Discrete ill-posed problems are constructed very easily using P.C. Hansen’s
Regularization Tools [35], cf. the Appendix.
As an illustrative example, we consider the test problem heat(N) in our preliminary test, which will be called Test 0 below.
2.6.1 Test 0
The Matlab command
[A, b, x] = heat(N)
generates the matrix $A \in GL_N(\mathbb{R})$ ($A$ is not symmetric in this case!), the exact solution $x^\dagger$ and the right-hand side vector $b$ of the artificially constructed ill-posed linear system $Ax = b$. More precisely, it provides the discretization of a Volterra integral equation of the first kind related to an inverse heat equation, obtained by simple collocation and midpoint rule with $N$ points (cf. [35] and the references therein). The inverse heat equation is a well known ill-posed problem, see e.g. [17], [61] and [62].
After the construction of the exact underlying problem, we perturb the exact data with additive white noise, by generating a multivariate Gaussian vector $E \sim N(\mathbf{0}, I_N)$, by defining a number $\varrho \in\, ]0, 1[$ representing the percentage of noise on the data and by setting
\[
b^\delta := b + e, \quad \text{with } e = \frac{\varrho\|b\|}{\|E\|}\,E. \tag{2.77}
\]
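The construction (2.77) can be sketched with the standard library alone. This is only an illustration of the scaling: the function name `add_relative_noise`, the seeded generator and the stand-in vector `b` are assumptions made here, whereas in the text `b` comes from `heat(N)`.

```python
import math
import random

def add_relative_noise(b, rho, rng=None):
    """Return (b_delta, e) with e = rho * ||b|| * E / ||E||, E standard Gaussian."""
    rng = rng or random.Random()
    E = [rng.gauss(0.0, 1.0) for _ in b]
    norm_b = math.sqrt(sum(v * v for v in b))
    norm_E = math.sqrt(sum(v * v for v in E))
    e = [rho * norm_b * v / norm_E for v in E]
    b_delta = [bi + ei for bi, ei in zip(b, e)]
    return b_delta, e

# Hypothetical right-hand side; in the text b comes from heat(N).
b = [math.sin(0.01 * j) for j in range(1000)]
b_delta, e = add_relative_noise(b, rho=0.01, rng=random.Random(0))
delta = math.sqrt(sum(v * v for v in e))
print(delta / math.sqrt(sum(v * v for v in b)))   # approximately 0.01 by construction
```

Normalizing by $\|E\|$ fixes the noise level exactly, so $\delta = \|e\| = \varrho\|b\|$ holds for every realization of $E$ and not just on average.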
[Figure 2.1: Test 0: relative errors (on the left) and optimal solutions (on the right). Panel (a), "Relative errors history": relative errors of CGNE and CGME versus the iteration number. Panel (b), "Optimal Solutions": the exact solution compared with the CGNE and CGME iterates at their optimal indices, plotted as x(j) versus j.]
Here and below, 0 is the constant column vector whose components are equal
to 0 and $I_N$ is the identity matrix of dimension $N \times N$.
Of course, from the equation above it follows immediately that $\delta = \varrho\|b\|$
and $e \sim \mathcal{N}(0, \delta I_N)$. In this case, since $\|b\| = 1.4775$ and $\varrho$ is chosen equal
to 1%, $\delta = 1.4775 \times 10^{-2}$.
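The construction of the noisy data in (2.77) can be reproduced in a few lines. What follows is only a minimal sketch in plain Python (the tests in this chapter are run in Matlab): the function name add_white_noise and the toy vector b are hypothetical and are introduced here for illustration.

```python
import math
import random

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def add_white_noise(b, rho, seed=0):
    """Return (b_delta, e, delta) with e = rho * ||b|| * E / ||E||, E standard Gaussian."""
    rng = random.Random(seed)            # fixed seed, so the experiment is repeatable
    E = [rng.gauss(0.0, 1.0) for _ in b]
    scale = rho * norm(b) / norm(E)
    e = [scale * t for t in E]
    b_delta = [bi + ei for bi, ei in zip(b, e)]
    return b_delta, e, norm(e)

b = [1.0, 2.0, 2.0]                      # toy data (hypothetical), ||b|| = 3
b_delta, e, delta = add_white_noise(b, rho=0.01)
print(delta)                             # equals rho * ||b|| = 0.03 up to rounding
```

By construction the norm of e is exactly $\varrho\|b\|$, whatever the realization of E, which is why δ can be read off directly from $\varrho$ and $\|b\|$.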
Next, we solve the linear system with the noisy data $b^\delta$ performing $k_{MAX} =
40$ iteration steps of Algorithm 6, by means of the routine cgne_cgme defined
in the Appendix. The parameter $\tau > 1$ of the Discrepancy Principle is fixed
equal to 1.001. Looking at Figure 2.1 we can compare the relative errors of
CGME (red stars) and CGNE (blue circles) in the first 30 iteration steps.
Denoting by $x_k^\delta$ the CGME iterates and by $z_k^\delta$ the CGNE iterates we
observe:
1. The well-known phenomenon of semi-convergence is present in both algorithms, but is more evident in CGME than in CGNE.
2. If $k_x^\sharp(\delta)$ and $k_z^\sharp(\delta)$ are defined as the iteration indices at which, respectively, CGME and CGNE attain their best approximation of $x^\dagger$, the
numerical results show that $k_x^\sharp(\delta) = 8$ and $k_z^\sharp(\delta) = 24$. The corresponding relative errors are approximately equal to
$$ \varepsilon_x^\sharp = 0.2097, \qquad \varepsilon_z^\sharp = 0.0570, $$
so CGNE achieves a better approximation, although to obtain its best
result it has to perform 16 more iteration steps than CGME.
[Figure 2.2, two panels. (a) Solutions at $k = k_D$: exact solution, (CGNE, $k_D$) and (CGME, $k_D$), plotted as x(j) versus j. (b) Discrepancy and optimal solutions for CGNE: exact solution, (CGNE, $k_z^\sharp$) and (CGNE, $k_D$), plotted as x(j) versus j.]
Figure 2.2: Test 0: comparison between the solutions of CGNE and CGME
at $k = k_D$ (on the left) and between the discrepancy and the optimal solution
of CGNE (on the right).
3. Calculating the iteration index defined by Morozov’s Discrepancy Principle we get $k_D := k_D(\delta, b^\delta) = 15$: the iterates corresponding to this
index are the solutions of the regularization methods in Definition 2.4.2
and Definition 2.4.1 (respectively the a-posteriori rule proposed by
Hanke and Morozov’s Discrepancy Principle) and the corresponding
relative errors are approximately equal to
$$ \varepsilon_{x_D} = 2.2347, \qquad \varepsilon_{z_D} = 0.0794. $$
Therefore, even if the stopping rule proposed by Hanke makes CGME a
regularization method of optimal order, in this case it finds a very unsatisfactory solution (cf. its oscillations in the left picture of Figure 2.2).
Moreover, from the right of Figure 2.2 we can see that CGNE arrested
with the Discrepancy Principle gives a slightly oversmoothed solution
compared to the optimal one, which provides a better reconstruction
of the maximum and of the first components of x† at the price of some
small oscillations in the last components.
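The two indices compared in the observations above can be computed directly from the histories of the residual norms and of the relative errors. The following is a minimal sketch under stated assumptions: the function names and the two short histories are made up for illustration and are not data from Test 0.

```python
def discrepancy_index(residual_norms, delta, tau=1.001):
    """First k (1-based) with ||r_k|| <= tau * delta, or None if it is never reached."""
    for k, rho in enumerate(residual_norms, start=1):
        if rho <= tau * delta:
            return k
    return None

def optimal_index(relative_errors):
    """Index k (1-based) at which the relative error history attains its minimum."""
    best = min(range(len(relative_errors)), key=relative_errors.__getitem__)
    return best + 1

# Made-up histories showing semi-convergence (hypothetical values).
residuals = [0.9, 0.3, 0.05, 0.012, 0.011, 0.0105]
errors = [0.8, 0.4, 0.15, 0.09, 0.07, 0.50]
delta = 0.02

print(discrepancy_index(residuals, delta))  # 4
print(optimal_index(errors))                # 5
```

In this toy history the Discrepancy Principle stops one step before the optimal index, which mimics the slight oversmoothing observed for CGNE.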
Although we chose a very particular case, many of the facts we have described
above hold in other examples as well, as we can see from the next more
significant test.
2.6.2
Test 1
Test 1

1D Test Problems
Problem     N     noise   k_x^♯   k_z^♯   k_D   ε_x^♯    ε_z^♯    ε_{x_D}^♯   ε_{z_D}^♯
Baart       1000  0.1%    3       7       4     0.1659   0.0893   1.8517      0.1148
Deriv2      1000  0.1%    9       27      21    0.2132   0.1401   1.8986      0.1460
Foxgood     1000  0.1%    2       5       3     0.0310   0.0068   0.4716      0.0070
Gravity     1000  0.1%    6       13      11    0.0324   0.0083   1.0639      0.0104
Heat        1000  0.1%    18      37      33    0.0678   0.0174   0.8004      0.0198
I-laplace   1000  0.1%    11      38      19    0.2192   0.1856   1.7867      0.1950
Phillips    1000  0.1%    4       12      9     0.0243   0.0080   0.1385      0.0089
Shaw        1000  0.1%    6       14      8     0.0853   0.0356   0.2386      0.0474

2D Test Problems
Problem     N     noise   k_x^♯   k_z^♯   k_D   ε_x^♯    ε_z^♯    ε_{x_D}^♯   ε_{z_D}^♯
Blur        2500  2.0%    9       12      7     0.1089   0.1016   0.1161      0.1180
Tomo        2500  2.0%    13      22      11    0.2399   0.2117   0.2436      0.2450

Table 2.1: Numerical results for Test 1.
We consider 10 different medium size test problems from [35]. The same
algorithm of Test 0 is used apart from the choices of the test problem and of
the parameters $\varrho$, N and $k_{MAX} = 100$. In all examples the white Gaussian
noise is generated using the Matlab function rand and the seed is chosen equal
to 0. The results are gathered in Table 2.1.
Looking at the data, one can easily notice that the relations
$$ k_x^\sharp < k_z^\sharp, \qquad \varepsilon_x^\sharp > \varepsilon_z^\sharp, \qquad k_D < k_z^\sharp $$
hold true in all the examples considered. Thus it is natural to ask whether they
always hold or whether counterexamples can be found showing opposite results.
Another remark is that very often $k_D > k_x^\sharp$ and in this case the corresponding error is very large.
2.6.3
Test 2
The following experiment allows us to answer the questions asked in Test 1 and
substantially confirms the general remarks we have made so far.
For each of the seven problems of Table 2.2 we choose 10 different values
for each of the parameters $N \in \{100, 200, ..., 1000\}$, $\varrho \in \{0.1\%, 0.2\%, ..., 1\%\}$
and the Matlab seed $\in \{1, ..., 10\}$ for the random components of the noise on
the exact data. In each case we compare the values of $k_x^\sharp$ and $k_z^\sharp$ with $k_D$ and
the values of $\varepsilon_x^\sharp$ with $\varepsilon_z^\sharp$. The left side of the table shows how many times, for
each test problem, $\varepsilon_z^\sharp < \varepsilon_x^\sharp$ and vice versa. The right-hand side shows how
many times, for each problem and for each method, the stopping index $k_D$
is smaller than, equal to or larger than the optimal one. In this case the value of τ
has been chosen equal to $1 + 10^{-15}$. From the results, summarized in Table
2.2, we deduce the following facts.
• It is possible, but very unlikely, that $\varepsilon_x^\sharp < \varepsilon_z^\sharp$ (this event has occurred
only 22 times out of 7000 in Test 2 and only in the very particular test
problem foxgood, which is severely ill-posed).
• The trend that emerged in Test 0 and Test 1 concerning the relation
between $k_D$ and the optimal stopping indices $k_x^\sharp$ and $k_z^\sharp$ is confirmed in
Test 2. The stopping index $k_D$ usually (but not always!) provides a
slightly oversmoothed solution for CGNE and often a noise dominated
solution for CGME.

Test 2
            Best err. perf.            Stopping
Problem     CGNE    CGME     Method    k_D < k^♯   k_D = k^♯   k_D > k^♯
Baart       1000    0        CGNE      564         372         64
                             CGME      0           545         455
Deriv2      1000    0        CGNE      882         112         6
                             CGME      0           0           1000
Foxgood     978     22       CGNE      483         426         91
                             CGME      0           532         468
Gravity     1000    0        CGNE      861         118         21
                             CGME      0           1           999
Heat        1000    0        CGNE      991         9           0
                             CGME      0           1           999
Phillips    1000    0        CGNE      751         207         42
                             CGME      0           48          952
Shaw        1000    0        CGNE      806         185         9
                             CGME      0           4           996
Total       6978    22       CGNE      5338        1429        233
                             CGME      0           1131        5869

Table 2.2: Numerical results for Test 2.
In the problems with a symmetric and positive definite matrix A, it is also
possible to compare the results of CGNE and CGME with those obtained
by MR and CG. This was done for phillips, shaw, deriv2 and gravity and the
outcome was that CGNE attained the best performance 3939 times out of
4000, with 61 successes of MR in the remaining cases.
In conclusion, the numerical tests described above lead us to ask the following
questions:
1. The relations $k_x^\sharp < k_z^\sharp$ and $\varepsilon_x^\sharp > \varepsilon_z^\sharp$ hold very often in the cases considered above. Is there a theoretical justification of this fact?
2. The conjugate gradient methods with parameter n = 1 seem to provide
better results than those with parameter n = 0. What can we say about
other conjugate gradient methods with parameter n > 1?
3. To improve the performance of CGME one can choose a larger τ . This
is not in contrast with the regularization theory above. On the other
hand, arresting CGNE later means stopping the iteration when the
residual norm has become smaller than δ, while the Discrepancy Principle states that τ must be chosen larger than 1. How can this be
justified and implemented in practice by means of a reasonable stopping rule?
We will answer the questions above in detail.
2.7
CGNE vs. CGME
In the finite dimensional setting described in Section 2.6, both iterates of
CGME and CGNE will eventually converge to the vector $\tilde{x} := A^\dagger b^\delta$ as described in Section 2.1, which can be very distant from the exact solution
$x^\dagger = A^\dagger b$ we are looking for, since $A^\dagger$ is ill-conditioned. The problem is to
understand how the iterates converge to $\tilde{x}$ and how they reach an approximation of $x^\dagger$ in their first steps.
First of all, we recall that $x_k^\delta$ minimizes the norm of the error $\|x - \tilde{x}\|$ in
$\mathcal{K}_{k-1}(A^*A; A^*b^\delta)$, whereas $z_k^\delta$ minimizes the residual norm $\|Ax - b^\delta\|$ in the
same Krylov space. Thus the iterates of CGME, converging to $\tilde{x}$ as fast as
possible, will be the better approximations of the exact underlying solution
$x^\dagger$ in the very first steps, when the noise on the data still plays a secondary
role. However, being the greediest approximations of the noisy solution $\tilde{x}$,
they will also be influenced by the noise at an earlier stage than the iterates
of CGNE. This explains the relation $k_x^\sharp < k_z^\sharp$, which is often verified in the
numerical experiments.
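The two minimization properties recalled above can be observed on a tiny example. The sketch below uses the textbook CGLS-type and Craig-type recursions as stand-ins for CGNE and CGME: it is not the routine cgne_cgme of the Appendix, and the 3 × 3 matrix, the helper names and the exact (noise-free) data are assumptions made only for illustration.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, v):
    """Product A^T v for a matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

def cgne(A, b, steps):
    """CGLS-type recursion: each iterate minimizes the residual norm."""
    z, r = [0.0] * len(A[0]), list(b)
    s = rmatvec(A, r)
    d, iterates = list(s), []
    for _ in range(steps):
        q = matvec(A, d)
        alpha = norm(s) ** 2 / norm(q) ** 2
        z, r = axpy(alpha, d, z), axpy(-alpha, q, r)
        s_new = rmatvec(A, r)
        d = axpy(norm(s_new) ** 2 / norm(s) ** 2, d, s_new)
        s = s_new
        iterates.append(z)
    return iterates

def cgme(A, b, steps):
    """Craig-type recursion: each iterate minimizes the error norm."""
    x, r = [0.0] * len(A[0]), list(b)
    d, iterates = rmatvec(A, r), []
    for _ in range(steps):
        alpha = norm(r) ** 2 / norm(d) ** 2
        x = axpy(alpha, d, x)
        r_new = axpy(-alpha, matvec(A, d), r)
        d = axpy(norm(r_new) ** 2 / norm(r) ** 2, d, rmatvec(A, r_new))
        r = r_new
        iterates.append(x)
    return iterates

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # toy matrix (hypothetical)
x_true = [1.0, 1.0, 1.0]
b = matvec(A, x_true)                                      # exact data, no noise
err = lambda v: norm([vi - ti for vi, ti in zip(v, x_true)])
res = lambda v: norm([bi - ai for bi, ai in zip(b, matvec(A, v))])
for z, x in zip(cgne(A, b, 3), cgme(A, b, 3)):
    print(res(z) <= res(x) + 1e-12, err(x) <= err(z) + 1e-12)
```

With exact data the CGME iterate has the smaller error and the CGNE iterate the smaller residual at every step; the discussion above concerns what happens when the same greediness is applied to noisy data.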
[Figure 2.3 plot: "Relative error history", relative errors of CGNE and CGME versus the iteration number.]
Figure 2.3: Relative error history for CGNE and CGME with a perturbation
of the type $\bar{e} = A\bar{w}$, for the problem phillips(1000). CGME achieves the
better approximation ($\varepsilon_x^\sharp = 0.0980$, $\varepsilon_z^\sharp = 0.1101$).
Moreover, expanding the quantities minimized by the methods in terms of
the noise e, we get
$$ \|x - \tilde{x}\| = \|x - x^\dagger - A^\dagger e\| $$
for CGME and
$$ \|Ax - b^\delta\| = \|Ax - b - e\| $$
for CGNE. In the case of CGME, the error is amplified by the multiplication
with the matrix $A^\dagger$. As a consequence, in general CGME will obtain a
poorer reconstruction of the exact solution $x^\dagger$, because its iterates will be
more sensitive to the amplification of the noise along the components relative
to the small singular values of A.
This justifies the relation $\varepsilon_x^\sharp > \varepsilon_z^\sharp$ verified in almost all the numerical
experiments above. We observe that these considerations are based on the
remark that the components of the random vector e are approximately of
the same size. Indeed, things can change significantly if a different kind of
perturbation is chosen (e.g. the SVD components of the noise e decay like
$O(\lambda_j)$). To show this, consider the test problem phillips(1000), take $\varrho = 5\%$,
define $\bar{e} = A\bar{w}$, where $\bar{w}$ is the exact solution of the problem heat(1000) and
put $b^\delta = b + \bar{e}$: from the plot of the relative errors in Figure 2.3 it is clear
that in this case CGME obtains the best performance and it is not difficult
to construct similar examples leading to analogous results.
[Figure 2.4 plot: "Residual polynomials", p(λ) versus λ on [0, 1]; legend: $p_k^{[1]}$, $p_k^{[0]}$, $\lambda_{i,k}^{[1]}$, $\lambda_{i,k}^{[0]}$.]
Figure 2.4: Residual polynomials for CGNE (blue line) and CGME (red line)
This example suggests that it is almost impossible to claim that one method
works better than the other without assuming important restrictions on
A, δ and $b^\delta$ and on the perturbation e. Nevertheless, a general remark can
be made from the analysis of the filter factors of the methods. In [36] P. C.
Hansen describes the regularizing properties of CGNE by means of the filter
factors, showing that in the first steps it tends to reconstruct the components
of the solution related to the low frequency part of the spectrum. The analysis
is based on the convergence of the Ritz values $\lambda_{i,k}^{[1]}$ (the zeros of the residual
polynomial $p_k^{[1]}$) to the singular values of the operator A. From the plot of
the residual polynomials (cf. Figure 2.4) and from the interlacing properties
of their roots we can deduce that the iterates of CGME and CGNE should
behave in a similar way. The main difference is the position of the roots,
which allows us to compare the filter factors of the methods.
Theorem 2.7.1. Let A be a linear compact operator between the Hilbert
spaces X and Y and let $y^\delta$ be the given perturbed data of the underlying exact
equation Ax = y. Let $\{\lambda_j; u_j, v_j\}_{j \in \mathbb{N}}$ be a singular system for A. Denote by
$x_k^\delta$ and $z_k^\delta$ the iterates of CGME and CGNE corresponding to $y^\delta$ respectively,
by $p_k^{[0]}$ and $p_k^{[1]}$ the corresponding residual polynomials and by $\Phi_k^{[0]}(\lambda_j)$ and
$\Phi_k^{[1]}(\lambda_j)$ the filter factors. Let also $\lambda_{i,k}^{[0]}$, i = 1, ..., k and $\lambda_{i,k}^{[1]}$, i = 1, ..., k be the
zeros of $p_k^{[0]}$ and $p_k^{[1]}$ respectively.
Then, for every j such that $\lambda_j^2 < \lambda_{1,k}^{[0]}$,
$$ \Phi_k^{[0]}(\lambda_j) > \Phi_k^{[1]}(\lambda_j). \tag{2.78} $$
Proof. The filter factors of the conjugate gradient type methods are:
$$ \Phi_k^{[n]}(\lambda_j) = q_k^{[n]}(\lambda_j^2)\,\lambda_j^2, \qquad n = 0, 1. $$
We recall from the theory of orthogonal polynomials that the zeros of $p_k^{[0]}$
and of $p_k^{[1]}$ interlace as follows:
$$ \lambda_{1,k}^{[0]} < \lambda_{1,k}^{[1]} < \lambda_{2,k}^{[0]} < \lambda_{2,k}^{[1]} < ... < \lambda_{k,k}^{[0]} < \lambda_{k,k}^{[1]}. \tag{2.79} $$
Thus, writing down the residual polynomials in the form
$$ p_k^{[n]}(\lambda) = \prod_{j=1}^{k} \left( 1 - \frac{\lambda}{\lambda_{j,k}^{[n]}} \right), \qquad n = 0, 1, \tag{2.80} $$
it is very easy to see that $p_k^{[0]} < p_k^{[1]}$ on $]0, \lambda_{1,k}^{[0]}]$ (cf. Figure 2.4) and consequently
$$ q_k^{[0]} > q_k^{[1]} \qquad \text{on } ]0, \lambda_{1,k}^{[0]}]. $$
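The last two steps of the proof can be spelled out as follows. This is only a sketch: it assumes the usual relation $p_k^{[n]}(\lambda) = 1 - \lambda\, q_k^{[n]}(\lambda)$ between the residual polynomials and the iteration polynomials, which is not restated in this section.

```latex
% Assumption: p_k^{[n]}(\lambda) = 1 - \lambda\, q_k^{[n]}(\lambda), n = 0, 1.
% For 0 < \lambda \le \lambda_{1,k}^{[0]} every factor of p_k^{[0]} is nonnegative and,
% by the interlacing of the zeros, strictly smaller than the matching factor of p_k^{[1]}:
0 \le 1 - \frac{\lambda}{\lambda_{j,k}^{[0]}} < 1 - \frac{\lambda}{\lambda_{j,k}^{[1]}},
\qquad j = 1, \dots, k
\quad\Longrightarrow\quad
p_k^{[0]}(\lambda) < p_k^{[1]}(\lambda).
% Dividing the difference of the residual polynomials by \lambda > 0:
q_k^{[0]}(\lambda) - q_k^{[1]}(\lambda)
  = \frac{p_k^{[1]}(\lambda) - p_k^{[0]}(\lambda)}{\lambda} > 0.
% Evaluating at \lambda = \lambda_j^2 < \lambda_{1,k}^{[0]} gives the claim of the theorem:
\Phi_k^{[0]}(\lambda_j) - \Phi_k^{[1]}(\lambda_j)
  = p_k^{[1]}(\lambda_j^2) - p_k^{[0]}(\lambda_j^2) > 0.
```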
This result is a theoretical justification of the heuristic considerations at
the beginning of this section: the iterates of CGNE filter the high frequencies
of the spectrum slightly more than the iterates of CGME. Summing up:
• Thanks to its minimization properties, CGNE works better than CGME
along the high frequency components, keeping the error small for a few
more iterations and usually achieving the better results.
• However, this is not a general rule (see the counterexample of this section
and the results of Test 2), because the performances of the two methods
strongly depend on the matrix A and on the vectors $x^\dagger$, b, e and $x_0$.
• Finding a particular class of problems (or data) in which CGNE always
gets the better results may be possible, but it is rather difficult.
2.8
Conjugate gradient type methods with
parameter n=2
We now turn to the question about the conjugate gradient type methods
with parameter n > 1, restricting to the case n = 2 for $AA^*$.
From the implementation of the corresponding method outlined in Algorithm
7 and performed by the routine cgn2 defined in the Appendix, we can see that
the computation of a new iterate requires 4 matrix-vector multiplications at
each iteration step, against the only 2 needed by CGNE and CGME.

Algorithm 7 CG type method with parameter n = 2 for AA∗
r_0 = y − A x_0;
d = A∗ r_0;
p_2 = A d;
m_1 = p_2;
m_2 = A∗ m_1;
k = 0;
while (not stop) do
  α = ‖p_2‖² / ‖m_2‖²;
  x_{k+1} = x_k + α d;
  r_{k+1} = r_k − α m_1;
  p_1 = A∗ r_{k+1};
  t = A p_1;
  β = ‖t‖² / ‖p_2‖²;
  d = p_1 + β d;
  m_1 = A d;
  m_2 = A∗ m_1;
  p_2 = t;
  k = k + 1;
end while

On the other hand, the same analysis of Section 2.7 suggests that this method
filters the high frequency components of the
spectrum even better than CGNE, because of the relation $\lambda_{i,k}^{[1]} < \lambda_{i,k}^{[2]}$, valid for
all i = 1, ..., k. As a matter of fact, the phenomenon of semi-convergence
appears more attenuated here than in the case of CGNE, as we can see e.g.
from Figure 2.5, where a plot of the relative errors of both methods under the
same assumptions of Test 0 of Section 2.6 is displayed. Thus, the
conjugate gradient type method with parameter n = 2 is more stable than
CGNE with respect to the iteration index k (exactly for the same reasons
why we have seen that CGNE is more stable than CGME). This could be an
advantage especially when the data are largely contaminated by the noise.
[Figure 2.5 plot: "Relative error history", relative errors of CGNE and CGN2 versus the iteration number, logarithmic vertical axis.]
Figure 2.5: Relative error history for CGNE and the conjugate gradient type
method with parameter n = 2 in the assumptions of Test 0 of Section 2.6.
For example, if we consider the test problem blur(500), with $\varrho = 10\%$, we can
see from Figure 2.6 that the optimal reconstructed solutions of both methods
are similar, but the conjugate gradient type method with parameter n = 2
attenuates the oscillations caused by the noise in the background better than
CGNE.
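The steps of Algorithm 7 translate almost line by line into code. The following plain Python transcription is only a sketch and is not the routine cgn2 of the Appendix: the function name cg_n2, the list-based matrix helpers, the fixed number of steps used as stopping rule and the small guard against division by zero are all assumptions introduced here.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, v):
    """Product A^T v for a real matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def sqnorm(v):
    return sum(t * t for t in v)

def cg_n2(A, y, x0, max_steps, tol=1e-14):
    """Follow Algorithm 7 step by step and return the iterates x_1, x_2, ..."""
    x = list(x0)
    r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
    d = rmatvec(A, r)
    p2 = matvec(A, d)
    m1 = list(p2)
    m2 = rmatvec(A, m1)
    iterates = []
    for _ in range(max_steps):
        # Hypothetical guard (not part of Algorithm 7): stop once the method has converged.
        if sqnorm(m2) <= tol or sqnorm(p2) <= tol:
            break
        alpha = sqnorm(p2) / sqnorm(m2)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * mi for ri, mi in zip(r, m1)]
        p1 = rmatvec(A, r)          # matrix-vector product 1
        t = matvec(A, p1)           # matrix-vector product 2
        beta = sqnorm(t) / sqnorm(p2)
        d = [pi + beta * di for pi, di in zip(p1, d)]
        m1 = matvec(A, d)           # matrix-vector product 3
        m2 = rmatvec(A, m1)         # matrix-vector product 4
        p2 = t
        iterates.append(x)
    return iterates

A = [[2.0, 0.0], [0.0, 1.0]]        # toy well-posed system (hypothetical)
y = [2.0, 3.0]                      # exact solution is [1, 3]
iterates = cg_n2(A, y, [0.0, 0.0], max_steps=2)
print(iterates[-1])                 # close to [1.0, 3.0] after two steps
```

The four numbered products inside the loop are the 4 matrix-vector multiplications per step mentioned above.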
2.8.1
Numerical results
We compare the conjugate gradient type methods for the matrix $AA^*$ with
parameters 1 and 2 in the same examples of Test 2 of Section 2.6, adding
the test problem i_laplace. From the results of Table 2.3, we can see that the
methods obtain quite similar results. The conjugate gradient type method
with parameter n = 2 usually performs slightly better, but this advantage
is minimal: the average improvement of the results obtained by the method
with n = 2, namely the difference of the total sums of the relative errors
divided by the total sum of the relative errors of CGNE, is equal to
$$ \frac{|683.74 - 686.74|}{686.74} \cong 0.4\%. $$
[Figure 2.6, two image panels of size 500 × 500. Left: "blur500 CGN2, kN2=2, sigma=10^{-1}, errN2=0.122231". Right: "blur500 CGNE, kZBEST=1, sigma=10^{-1}, errZBEST=0.128101".]
Figure 2.6: Comparison between the conjugate gradient type method with
parameter n = 2 and CGNE for the test problem blur(500), with $\varrho = 10\%$.
Concerning the stopping index, we observe that in both cases the Discrepancy Principle stops the iteration earlier than the optimal stopping index
in the large majority of the considered cases. We shall return to this important topic later.
Our numerical experiments confirm the trend also for a larger noise. Performing the same test with $\varrho \in \{10^{-2}, 2 \times 10^{-2}, ..., 10^{-1}\}$ instead of $\varrho \in
\{10^{-3}, 2 \times 10^{-3}, ..., 10^{-2}\}$, we obtain that the method with parameter n = 2
achieves the better relative error in 4763 cases (59.5% of the times) and the
overall sums of the relative errors are 1101.8 for n = 2 and 1115.5 for n = 1.
Thus the average improvement obtained by the method with n = 2 is about 1% in
this case.
In conclusion, the conjugate gradient type method with parameter n = 2 has
nice regularizing properties: in particular, it filters the high frequency components of the noise even better than CGNE. Consequently, it often achieves
the better results and keeps the phenomenon of semi-convergence less
pronounced, especially for large errors in the data. On the other hand, it is
more expensive than CGNE from a computational point of view, because it
usually performs more steps to reach the optimal solution and in each step
it requires 4 matrix-vector multiplications (against the only 2 required by
CGNE). Despite the possible advantages described above, in our numerical
tests the improvements were minimal, even in the case of a large δ. For this
reason, we believe that it is rarely worthwhile to prefer the method
with parameter n = 2 to CGNE.

Comparison of CG type methods: n = 1 and n = 2
                                                    Discrepancy Stopping
Problem     Best err. perf.   Average rel. err.   n       k_D < k^♯   k_D = k^♯   k_D > k^♯
Baart       598               0.13404             n = 1   564         372         64
            402               0.13482             n = 2   616         321         63
Deriv2      197               0.19916             n = 1   882         112         6
            803               0.19792             n = 2   965         33          2
Foxgood     565               0.01862             n = 1   483         426         91
            435               0.01876             n = 2   547         362         91
Gravity     386               0.02179             n = 1   861         118         21
            614               0.02151             n = 2   838         147         15
Heat        294               0.05709             n = 1   991         9           0
            706               0.05657             n = 2   996         4           0
I-laplace   447               0.18091             n = 1   957         30          13
            553               0.18036             n = 2   971         18          11
Phillips    441               0.01774             n = 1   751         207         42
            559               0.01731             n = 2   775         204         21
Shaw        548               0.05739             n = 1   806         185         9
            452               0.05649             n = 2   837         156         7
Total       3476              0.08584             n = 1   6285        1459        256
            4524              0.08546             n = 2   6545        1245        210

Table 2.3: Comparison between the CG type methods for $AA^*$ with parameters 1 and 2. We emphasize that $k_D$ is not the same stopping index here
for n = 1 and n = 2.
Chapter 3
New stopping rules for CGNE
In the last sections of Chapter 2 we have seen that CGNE, being both efficient
and precise, is one of the most promising conjugate gradient type methods
when dealing with (discrete) ill-posed problems.
The general theory suggests the Discrepancy Principle as a very reliable stopping rule, which makes CGNE a regularization method of optimal order¹. Of
course, the stopping index of the Discrepancy Principle is not necessarily
the best possible for a given noise level δ > 0 and given perturbed data
$y^\delta$. Indeed, as we have seen in the numerical tests of Chapter 2, it usually
provides a slightly oversmoothed solution. Moreover, in practice the noise
level is often unknown: in this case it is necessary to define heuristic stopping
rules. Due to Bakushinskii’s Theorem a method arrested with a heuristic
stopping rule cannot be convergent, but in some cases it can give more satisfactory results than other methods arrested with a sophisticated stopping
rule of optimal order based on the knowledge of the noise level.
When dealing with discrete ill-posed problems (e.g. arising from the discretization of ill-posed problems defined in a Hilbert space setting), it is very
important to rely on many different stopping rules, in order to choose the best
one depending on the particular problem and data: among the most famous
stopping rules that can be found in the literature, apart from the Discrepancy
Principle, we mention the Monotone Error Rule (cf. [24], [23], [25], [36])
and, as heuristic stopping rules, the L-curve ([35], [36]), the Generalized
Cross Validation ([17] and the references therein) and the Hanke-Raus Criterion ([27], [24]).
¹ Except for the continuity of the operator that maps the data into the k-th iterate of
CGNE, cf. Chapter 2.
In this chapter, three new stopping rules for CGNE will be proposed, analyzed and tested. All these rules rely on a general analysis of the residual
norm of the CGNE iterates.
The first one, called the Approximated Residual L-Curve Criterion, is a
heuristic stopping rule based on the global behavior of the residual norms
with respect to the iteration index.
The second one, called the Projected Data Norm Criterion, is another heuristic stopping rule that relates the residual norms of the CGNE iterates to
the residual norms of the truncated singular value decomposition.
The third one, called the Projected Noise Norm Criterion, is an a-posteriori
stopping rule based on a statistical approach, intended to overcome the oversmoothing effect of the Discrepancy Principle and mainly aimed at large
scale problems.
3.1
Residual norms and regularizing properties of CGNE
This section is dedicated to a general analysis of the regularizing properties
of CGNE, by linking the relative error with the residual norm in the case of
perturbed data.
In their paper [60] Kilmer and Stewart related the residual norm of the
minimal residual method to the norm of the relative error.
Theorem 3.1.1 (Kilmer, Stewart). Let the following assumptions hold:
• The matrix $A \in GL_N(\mathbb{R})$ is symmetric and positive definite, and in
the coordinate system of its eigenvectors the exact linear system can be
written in the form
$$ \Lambda x = b, \tag{3.1} $$
where $\Lambda = \mathrm{diag}\{\lambda_1, ..., \lambda_N\}$, $1 = \lambda_1 > \lambda_2 > ... > \lambda_N > 0$.
• The exact data are perturbed by an additive noise $e \in \mathbb{R}^N$ such that its
components $e_i$ are random variables with mean 0 and standard deviation
ν > 0.
• For a given δ > 0, let y be the purported solution with residual norm
δ minimizing the distance from the exact solution x, i.e. y solves the
problem²
$$ \text{minimize } \|x - y\| \quad \text{subject to } \|b^\delta - \Lambda y\|^2 = \delta^2. \tag{3.2} $$
If c > −1 solves the equation
$$ \sum_{i=1}^{N} \frac{e_i^2}{(1 + c\lambda_i^2)^2} = \delta^2, \tag{3.3} $$
then the vector y(c), with components
$$ y_i(c) := x_i + \frac{c\lambda_i e_i}{1 + c\lambda_i^2}, \tag{3.4} $$
is a solution of (3.2) and
$$ \|x - y(c)\|^2 = \sum_{i=1}^{N} \left( \frac{c\lambda_i e_i}{1 + c\lambda_i^2} \right)^2. \tag{3.5} $$
² Here the perturbed data $b^\delta = b + e$ does not necessarily satisfy $\|b^\delta - b\| = \delta$.
Note that the solution of (3.2) is Tikhonov’s regularized solution with
parameter $\delta^2 > 0$, where δ satisfies (3.3).
As c varies from −1 to +∞, the residual norm decreases monotonically from
+∞ to 0 and the error norm $\|x - y(c)\|$ decreases from ∞ to 0 at c =
0 when $\delta = \|e\|$, but then increases rapidly (for further details, see the
considerations after Theorem 3.1 in [60]). As a consequence, choosing a
solution with residual norm smaller than $\|e\|$ would result in large errors, so
the theorem provides a theoretical justification for the discrepancy principle
in the case of the Tikhonov method.
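The link between the components (3.4) and the residual equation (3.3) can be checked numerically on a toy example. The following sketch only verifies that identity: the function names and the three-dimensional data are hypothetical, and the value c = 3 is fixed by hand rather than obtained by solving (3.3) for a prescribed δ.

```python
import math

def kilmer_stewart_solution(lam, x, e, c):
    """Components y_i(c) = x_i + c * lam_i * e_i / (1 + c * lam_i**2), cf. (3.4)."""
    return [xi + c * li * ei / (1.0 + c * li * li) for li, xi, ei in zip(lam, x, e)]

def residual_norm(lam, x, e, y):
    """||b_delta - Lambda y|| with b_delta = Lambda x + e."""
    return math.sqrt(sum((li * xi + ei - li * yi) ** 2
                         for li, xi, ei, yi in zip(lam, x, e, y)))

def residual_from_formula(lam, e, c):
    """Square root of the left-hand side of (3.3)."""
    return math.sqrt(sum(ei * ei / (1.0 + c * li * li) ** 2 for li, ei in zip(lam, e)))

# Toy data (hypothetical, for illustration only).
lam = [1.0, 0.5, 0.1]
x = [1.0, -2.0, 0.5]
e = [0.01, -0.02, 0.015]
c = 3.0

y = kilmer_stewart_solution(lam, x, e, c)
# The two values agree up to rounding: the residual of y(c) is the quantity in (3.3).
print(residual_norm(lam, x, e, y), residual_from_formula(lam, e, c))
```

For c = 0 the same functions return y = x and a residual norm equal to $\|e\|$, in agreement with the remark above on the case $\delta = \|e\|$.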
However, the solution y(c) can differ significantly from the iterates of the
conjugate gradient type methods.
The following simulation shows that the results of Kilmer and Stewart cannot
be applied directly to the CGNE method and introduces the basic ideas
behind the new stopping rules that are going to be proposed.
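Fix N = 1000 and p = 900 and let
$$ \Lambda = \mathrm{diag}\{\lambda_1, ..., \lambda_p, \lambda_{p+1}, ..., \lambda_N\} \tag{3.6} $$
be the diagonal matrix such that
$$ \lambda_1 > ... > \lambda_p \gg \lambda_{p+1} > ... > \lambda_N > 0 \tag{3.7} $$
and
$$ \lambda_i \sim \begin{cases} 10^{-2}, & i = 1, ..., p, \\ 10^{-8}, & i = p + 1, ..., N. \end{cases} \tag{3.8} $$
Let λ be the vector whose components are the $\lambda_i$ and let
$$ \lambda_p := \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_p \end{pmatrix}, \qquad \lambda_{N-p} := \begin{pmatrix} \lambda_{p+1} \\ \vdots \\ \lambda_N \end{pmatrix} \tag{3.9} $$
indicate the vectors of its first p and of its last N − p components.
Accordingly, set also
$$ e = \begin{pmatrix} e_p \\ e_{N-p} \end{pmatrix}, \qquad b^\delta = b + e = \begin{pmatrix} b_p^\delta \\ b_{N-p}^\delta \end{pmatrix}, \tag{3.10} $$
where $b = \Lambda x^\dagger$ and $x^\dagger$ is the exact solution of the test problem gravity from
P.C. Hansen’s Regularization Tools.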
The left picture of Figure 3.1 shows the graph of the residual norm of
the CGNE iterates with respect to the iteration index for this test problem
with noise level $\varrho = 1\%$: we note that this graph has the shape of an L.
[Figure 3.1, two panels. (a) Residual norm history: "Diagonal matrix test: residual norm history", residual norms versus the iteration number, logarithmic vertical axis; legend: CGNE, Discrepancy, optimal error. (b) Relative error norm history: "Diagonal matrix test: relative error history", relative errors versus the iteration number, logarithmic vertical axis; legend: CGNE, Discrepancy, optimal error.]
Figure 3.1: Residual and relative error norm history in a diagonal matrix
test problem with two clusters of singular values and $\varrho = 1\%$.
In general, a similar L-shape is observed in discrete ill-posed problems,
thanks to the rapid decay of the singular values, as described in [76] and [77].
In fact, in the general case of a non-diagonal and non-symmetric matrix A,
for $b \in \mathcal{R}(A)$ and $\|b - b^\delta\| \le \delta$, δ > 0, combining (2.59) and (2.60) we have
$$ \|Az_k^\delta - b^\delta\| \le \|F_{\lambda_{1,k}} e\| + |p_k'(0)|^{-1/2} \|x^\dagger\| \le \delta + |p_k'(0)|^{-1/2} \|x^\dagger\|. \tag{3.11} $$
Since $|p_k'(0)|$ is the sum of the reciprocals of the Ritz values at the k-th step
and $\lambda_{1,k}$ is always smaller than $\lambda_k$, a very rough estimate of the residual norm
is given by $\delta + \|x^\dagger\| \lambda_k^{1/2}$, which has an L-shape if the eigenvalues of A decay
quickly enough. Thus the residual norm curve must lie below this L-shaped
curve and for this reason it is often L-shaped too.
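For the reader's convenience, the rough estimate can be obtained from (3.11) in two lines, using only the facts quoted above (the product form of the residual polynomial with $p_k(0) = 1$ and the inequality $\lambda_{1,k} < \lambda_k$):

```latex
% p_k has the Ritz values \lambda_{1,k} < ... < \lambda_{k,k} as zeros and p_k(0) = 1, hence
|p_k'(0)| = \sum_{i=1}^{k} \frac{1}{\lambda_{i,k}}
  \;\ge\; \frac{1}{\lambda_{1,k}} \;>\; \frac{1}{\lambda_k}
\quad\Longrightarrow\quad
|p_k'(0)|^{-1/2} < \lambda_k^{1/2},
% and inserting this bound into (3.11):
\|A z_k^\delta - b^\delta\|
  \le \delta + |p_k'(0)|^{-1/2}\,\|x^\dagger\|
  < \delta + \lambda_k^{1/2}\,\|x^\dagger\|.
```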
We consider now the numerical results of the simulation. Comparing the
solution obtained by the Discrepancy Principle (denoted by the subscript D)
with the optimal solution (denoted by the superscript ♯) we have:
• $k^\sharp = 5$ and $k_D = 3$ for the stopping indices;
• $\|b^\delta - \Lambda z_{k^\sharp}^\delta\| \sim 1.32 \times 10^{-3}$ and $\|b^\delta - \Lambda z_{k_D}^\delta\| \sim 3.71 \times 10^{-3}$ for the
residual norms;
• $\varepsilon^\sharp \sim 1.09 \times 10^{-2}$ and $\varepsilon_D \sim 1.50 \times 10^{-2}$ for the relative error norms (a
plot of the relative error norm history is shown in the right picture of
Figure 3.1).
Note that $\|e_{N-p}\| \sim 1.29 \times 10^{-3}$ is very close to $\|b^\delta - \Lambda z_{k^\sharp}^\delta\|$: this suggests
stopping the iteration as soon as the residual norm is lower than or equal to $\tau \|e_{N-p}\|$
for a suitable constant τ > 1 (instead of τδ, as in the Discrepancy Principle).
This remark can be extended to the general case of a discrete ill-posed problem. In fact, the stopping index of the Discrepancy Principle is chosen large
enough so that $\|x^\dagger\| |p_k'(0)|^{-1/2}$ is lower than $(\tau - 1)\delta$ in the residual norm
estimate (3.11) and small enough so that the term $\delta |p_k'(0)|^{1/2}$ is as low as
possible in the error norm estimate (2.64) in Chapter 2. However, in the
sharp versions of these estimates with δ replaced by $\|F_{\lambda_{1,k}}(b^\delta - b)\|$, when k
is close to the optimal stopping index $k^\sharp$, $F_{\lambda_{1,k}}$ is the projection of the noise
onto the high frequency part of the spectrum and the quantity $\|F_{\lambda_{1,k}} e\|$ is a
reasonable approximation of the residual norm threshold $\|e_{N-p}\|$ considered in
our simulation with the diagonal matrix.
Summarizing:
• the behavior of the residual norm plays an important role in the choice
of the stopping index: usually its plot with respect to k has the shape
of an L (the so called Residual L-curve);
• the norm of the projection of the noise onto the high frequency part of
the spectrum may be chosen to replace the noise level δ as a residual
norm threshold for stopping the iteration.
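To get a feeling for the size of such a threshold, one can estimate the norm of the projected noise in the diagonal test of Section 3.1. The sketch below relies on an extra assumption that is not stated above, namely that e is a white noise vector normalized as in (2.77), so that $\|e\| = \delta$ and all its components have the same distribution; it gives a typical size, not a bound.

```latex
% Assumption: e = \delta E / \|E\| with E \sim N(0, I_N); by symmetry every e_i^2 has mean \delta^2 / N.
\mathbb{E}\,\|e_{N-p}\|^2 = \sum_{i=p+1}^{N} \mathbb{E}\, e_i^2 = \frac{N-p}{N}\,\delta^2,
\qquad\text{so that typically}\qquad
\|e_{N-p}\| \approx \sqrt{\frac{N-p}{N}}\;\delta .
% For N = 1000 and p = 900 the factor is \sqrt{0.1} \approx 0.32.
```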
3.2
SR1: Approximated Residual L-Curve Criterion
In Section 3.1 we have seen that the residual norms of the iterates of CGNE
tend to form an L-shaped curve. This curve, introduced for the first time
by Reichel and Sadok in [76], differs from the famous Standard L-Curve
considered e.g. in [33], [34] and [36], which is defined by the points
$$ (\eta_k, \rho_k) := (\|z_k^\delta\|, \|b^\delta - Az_k^\delta\|), \qquad k = 1, 2, 3, .... \tag{3.12} $$
Usually a log-log scale is used for the Standard L-Curve, i.e. instead of
(ηk , ρk ) the points (log(ηk ), log(ρk )) are considered. In the case of the Residual L-Curve, different choices are possible, cf. [76] and [77]: we shall plot the
Residual L-Curve in a semi-logarithmic scale, i.e. by connecting the points
(k, log(ρk )), k = 1, 2, 3, ....
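Given the norms of the iterates and of the residuals, the points of the two curves are obtained as follows. This is a minimal sketch: the function names and the three made-up pairs of norms are hypothetical, and the natural logarithm is used (another base only rescales the axes).

```python
import math

def standard_l_curve(solution_norms, residual_norms):
    """Points (log eta_k, log rho_k) of the Standard L-Curve in log-log scale."""
    return [(math.log(eta), math.log(rho))
            for eta, rho in zip(solution_norms, residual_norms)]

def residual_l_curve(residual_norms):
    """Points (k, log rho_k), k = 1, 2, ..., of the Residual L-Curve in semi-log scale."""
    return [(k, math.log(rho)) for k, rho in enumerate(residual_norms, start=1)]

# Made-up norms of three iterates (not data from the thesis).
eta = [1.0, 1.2, 5.0]    # norms of the computed approximate solutions
rho = [1.0, 0.1, 0.09]   # norms of the corresponding residuals

print(residual_l_curve(rho)[0])   # (1, 0.0)
```

Only the Standard L-Curve uses the norms of the iterates; the Residual L-Curve depends on the residual norms and on the iteration index alone.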
In contrast to the Discrepancy Principle and the Residual L-Curve, the Standard L-Curve explicitly takes into account the growth of the norm of the
computed approximate solutions (cf. e.g. [36] and [87]) as k increases. In
his illuminating description of the properties of the Standard L-Curve for
Tikhonov regularization and other methods in [36], P. C. Hansen suggests
to define the stopping index as the integer k corresponding to the corner of
the L-curve, characterized by the point of maximal curvature (the so called
L-Curve Criterion).
Castellanos et al. [9] proposed a scheme for determining the corner of a discrete L-curve by forming a sequence of triangles with vertices at the points of
the curve and then determining the desired vertex of the L from the shape
of these triangles. For obvious reasons, this algorithm is known in literature
as the Triangle method.
Hansen et al. [37] proposed an alternative approach for determining the
vertex of the L: they constructed a sequence of pruned L-curves, removing
an increasing number of points, and considered a list of candidate vertices
produced by two different selection algorithms. The vertex of the L is selected from this list by taking the last point before reaching the part of the
L-curve, where the norm of the computed approximate solution starts to
increase rapidly and the norm of the associated residual vectors stagnates.
This is usually called the Pruning method or the L-Corner method.
The Standard L-Curve has been applied successfully to the solution of many
linear discrete ill-posed problems and is a very popular method for choosing
the regularization parameter, also thanks to its simplicity. However, it
has some well-known drawbacks, as shown by Hanke [28] and Vogel [95]. A
practical difficulty, as pointed out by Reichel et al. in [76] and in the recent
paper [77], is that the discrete points of the Standard L-Curve may be irregularly spaced, the distance between pairs of adjacent points may be small for
some values of k and it can be difficult to define the vertex in a meaningful
way. Moreover, sometimes the L-curve may not be sufficiently pronounced
to define a reasonable vertex (cf., e.g., Figure 3.2).
[Figure 3.2 plot: "Blur(100) noise 3%, Standard L-curve", logarithmic axes; legend: standard L-curve, triangle corner, optimal sol.]
Figure 3.2: L-curves for blur(100), $\varrho = 3\%$. The L-curve is simply not L-shaped.
In their paper [76], Reichel and Sadok defined the Residual L-Curve for
the TSVD in a Hilbert space setting, seeking to circumvent the difficulties
caused by the cluster of points near the corner of the Standard L-Curve and
showing that it often achieved the better numerical results. Among all heuristic methods considered in the numerical tests of [77], the Residual L-Curve
proved to be one of the best heuristic stopping rules for the TSVD, but it
also obtained the worst results in the case of CGNE, providing oversmoothed
solutions.
Two reasons for this oversmoothing effect are the following:
• the Residual L-Curve in the case of CGNE sometimes presents some
kinks before getting flat, thus the corner may be found at an early
stage;
• the residual norm of the solution is often too large at the corner
of the Residual L-Curve: it is preferable to stop the iteration as soon
as the term $\|x^\dagger\|\,|p_k'(0)|^{-1/2}$ is negligible in the residual norm estimate
(3.11), i.e., when the curve begins to be flat.
In Figure 3.3 we show the results of the test problem phillips(1000), with noise
level ̺ = 0.01%. In this example, both L-curve methods fail: as expected by
Hanke in [28], the Standard L-curve stops the iteration too late, giving an
undersmoothed solution; on the other hand, due to a very marked step at an
early stage, the Residual L-Curve provides a very oversmoothed solution.
We propose to approximate the Residual L-Curve by a smoother curve.
More precisely, let npt be the total number of iterations performed by CGNE.
For obvious reasons, to obtain a reasonable plot of the L-curves, we must
perform enough iterations, i.e. npt > k♯. We approximate the data points
$\{(k, \log(\rho_k))\}_{k=1,...,npt}$ with cubic B-splines using the routine data_approx
defined in the Appendix, obtaining a new (smoother) set of data points
$\{(k, \log(\tilde\rho_k))\}_{k=1,...,npt}$.
We call the curve obtained by connecting the points $(k, \log(\tilde\rho_k))$ with straight
lines the Approximated Residual L-Curve and we denote by kL, krL and karL
the indices determined by the triangle method for the Standard L-Curve, the
Residual L-Curve and the Approximated Residual L-Curve respectively.
Figure 3.3: Residual L-Curve and Standard L-Curve for the test problem
phillips(1000), ̺ = 0.01%. Panels: (a) Residual L-curve (the residual L-curve fails, np = 65); (b) Standard L-curve (the standard L-curve fails, np = 65, zoom). (Plot legend: standard L-curve, triangle corner, optimal sol.)
In Figure 3.4 we can see two Approximated Residual L-Curves. Typically, the
approximation has the following properties:
(i) it tends to remove or at least smooth the steps of the Residual L-Curve
when they are present (cf. the picture on the left in Figure 3.4);
(ii) when the Residual L-Curve has a very marked L-shape it tends to have
a minimum in correspondence to the plateau of the Residual L-Curve
(cf. the picture on the right in Figure 3.4);
(iii) when the Residual L-Curve is smooth the shape of both curves is similar.
As a consequence, very often we have krL < karL and karL corresponds to
the plateau of the Residual L-Curve. This should indeed improve the performance,
because it allows the iteration to be pushed a little bit further, overcoming
the oversmoothing effects described above.
We are ready to define the first of the three stopping rules (SR) for CGNE.
Definition 3.2.1 (Approximated Residual L-Curve Criterion). Consider the sequence of points $(k, \log(\rho_k))$ obtained by performing npt steps
of CGNE and let $(k, \log(\tilde\rho_k))$ be the sequence obtained by approximating
$(k, \log(\rho_k))$ by means of the routine data_approx. Compute the corners krL
and karL using the triangle method and let k0 be the first index such that
$\tilde\rho_{k_0} = \min\{\tilde\rho_k \mid k = 1, ..., npt\}$. Then, as a stopping index for the iterations
of CGNE, choose
\[ k_{SR1} := \max\{k_{rL}, \check{k}\}, \qquad \check{k} = \min\{k_{arL}, k_0\}. \tag{3.13} \]
Figure 3.4: Residual L-Curve and Approximated Residual L-Curve for 2
different test problems: (a) Phillips, phillips(1000), np = 65; (b) Baart, baart(1000), np = 10. Fixed values: N = 1000, ̺ = 0.01%, seed = 1. (Plot legend: residual L-curve, appr. res. L-curve, tr. corn. resL, optimal sol, tr. corn. app.resL.)
This somewhat articulated definition of the stopping index avoids possible
errors caused by an undesired approximation of the Residual L-Curve or by
an undesired result in the computation of the corner of the Approximated
Residual L-Curve. Below, we will analyze and compare the stopping rules
defined by kL , krL and kSR1 .
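The combination of indices in (3.13) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corner indices krL and karL are taken as given inputs (the triangle method and the routine data_approx are not reproduced here), and the function name `sr1_stopping_index` and the sample values are hypothetical.

```python
def sr1_stopping_index(k_rl, k_arl, rho_tilde):
    """Return k_SR1 = max(k_rL, min(k_arL, k0)) as in (3.13).

    rho_tilde[k - 1] is assumed to hold the smoothed residual norm at
    iteration k (1-based k); k_rl and k_arl are the corner indices,
    assumed to have been computed elsewhere.
    """
    # k0: first (1-based) index at which the smoothed residual norm is minimal.
    k0 = rho_tilde.index(min(rho_tilde)) + 1
    k_check = min(k_arl, k0)
    return max(k_rl, k_check)


# Hypothetical smoothed residual norms for npt = 8 iterations.
rho_tilde = [1.0, 0.4, 0.15, 0.08, 0.05, 0.05, 0.06, 0.07]
k_sr1 = sr1_stopping_index(k_rl=3, k_arl=6, rho_tilde=rho_tilde)
```

Taking the minimum with k0 guards against a corner of the approximated curve that lies beyond the minimum of the smoothed residuals, while the outer maximum ensures that the rule never stops earlier than the Residual L-Curve corner.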
3.3 SR1: numerical experiments
This section illustrates the performance of the stopping rule SR1.
In all examples below, in order to avoid some problems caused by rounding
errors, the function lsqr_b from [35] has been used with parameter reorth = 1.
For more details on this function and rounding errors in the CGNE algorithm,
see Appendix D.
3.3.1 Test 1
In order to test the stopping rule of Definition 3.2.1, we consider 10 different
test problems from P.C. Hansen’s Regularization Tools [35]. For each test
problem, we fix the number npt in such a way that both the standard and the
residual L-curves can be visualized appropriately and take 2 different values
for the dimension and the noise level. In each of the possible cases, we run
the algorithm with 25 different Matlab seeds, for a total of 1000 different
examples.
In Table 3.1, for each test problem and for each pair (Ni, ̺j), i, j ∈ {1, 2},
we show the average relative errors obtained respectively by the stopping
indices kL (Standard L-Curve), krL (Residual L-Curve) and kSR1 over all possible
seeds. In round brackets we collect the number of failures, i.e. how
many times the relative error obtained by the stopping rule is at least 5
times larger than the relative error obtained by the optimal stopping index
k♯. We can see that the stopping rule associated with the index kSR1 improves the
results of the Residual L-Curve in almost all the cases.
This stopping rule proves to be reliable also when the noise level is smaller
and the Standard L-Curve fails, as we will see below.
3.3.2 Test 2
In this example we test the robustness of the method when the noise level is
small and with respect to the number of points npt . As we have seen, this
is a typical case in which the Standard L-Curve method may fail to obtain
acceptable results.
We consider the test problems gravity, heat and phillips, with N = 1000,
̺ ∈ {1 × 10^{-5}, 5 × 10^{-5}, 1 × 10^{-4}, 5 × 10^{-4}}, seed ∈ {1, 2, ..., 25}. For each case,
we also take 3 different values of npt (the smallest one is only a little bit
larger than the optimal index k♯), in order to analyze the dependence of the
methods on this particular parameter.
The results of Table 3.2 clearly show that the Approximated Residual L-Curve
method is by far the best in this case, not only because it obtains
better results in terms of the relative error (cf. the sums of the relative errors
over all seeds 1, ..., 25), but also because it is more stable with
respect to the parameter npt.
Concerning the number of failures in this example, the Standard L-Curve
fails in 66% of the cases, the Residual L-Curve in 24.7% and the Approximated
Residual L-Curve only in 1% of the cases. We also note that
for the Residual L-Curve and the Approximated Residual L-Curve methods
the results tend to improve for large values of npt.

Test 1: results
Average Rel. Err. (no. of failures) for kL, krL, kSR1

Problem[npt]        N    ̺1                                 ̺2
Baart[8]            N1   0.1216(0), 0.1660(0), 0.1216(0)    0.1688(0), 0.2024(0), 0.1676(0)
                    N2   0.1156(0), 0.1656(0), 0.1156(0)    0.1680(0), 0.1872(0), 0.1660(0)
Deriv2[30]          N1   0.2024(0), 0.1920(0), 0.1872(0)    0.3024(0), 0.2732(0), 0.2632(0)
                    N2   0.1488(0), 0.1924(0), 0.1924(0)    0.2184(0), 0.2772(0), 0.2268(0)
Foxgood[6]          N1   0.0104(0), 0.0308(2), 0.0104(0)    0.0704(0), 0.0324(0), 0.1192(2)
                    N2   0.0076(0), 0.0308(3), 0.0076(0)    0.0300(0), 0.0308(0), 0.0400(0)
Gravity[20]         N1   0.0732(11), 0.0232(0), 0.0104(0)   0.1080(4), 0.0596(0), 0.0388(0)
                    N2   0.0176(0), 0.0220(0), 0.0156(0)    0.0368(0), 0.0580(0), 0.0320(0)
Heat[50]            N1   0.2160(0), 0.0440(0), 0.0436(0)    0.3296(0), 0.1232(0), 0.1300(0)
                    N2   0.0680(0), 0.0404(0), 0.0404(0)    0.0748(0), 0.1124(0), 0.1040(0)
I-Laplace[20]       N1   0.1156(0), 0.1224(0), 0.1028(0)    0.1964(0), 0.1600(0), 0.1584(0)
                    N2   0.1904(0), 0.2164(0), 0.2020(0)    0.2128(0), 0.2488(0), 0.2488(0)
Phillips[30]        N1   0.1036(23), 0.0240(0), 0.0236(0)   0.0908(2), 0.0276(0), 0.0272(0)
                    N2   0.0204(1), 0.0240(0), 0.0240(0)    0.0328(0), 0.0248(0), 0.0244(0)
Shaw[15]            N1   0.1008(1), 0.0596(0), 0.0492(0)    0.1400(0), 0.1680(0), 0.1284(0)
                    N2   0.0536(0), 0.0592(0), 0.0476(0)    0.0636(0), 0.1676(0), 0.0672(0)
Blur(50,3,1)[200]   N1   0.3280(0), 0.2324(0), 0.2308(0)    0.3540(0), 0.3536(0), 0.3536(0)
                    N2   0.2556(0), 0.1980(0), 0.1976(0)    0.3040(0), 0.1776(0), 0.1768(0)
Tomo[200]           N1   0.6292(0), 0.3732(0), 0.2768(0)    0.8228(0), 0.3780(0), 0.3808(0)
                    N2   0.6892(0), 0.3732(0), 0.3748(0)    0.6424(0), 0.1776(0), 0.1768(0)

Table 3.1: General test for the L-curves: numerical results. In the 1D test
problems N1 = 100, N2 = 1000, ̺1 = 0.1%, ̺2 = 1%; in the 2D test problems
N1 = 900, N2 = 2500, ̺1 = 1%, ̺2 = 5%.

Test 2: results
Σ Rel. Err. (no. of failures) for kL, krL, kSR1; each entry has the form npt: kL, krL, kSR1

      Gravity                            Heat                                Phillips
̺1    20: 0.44(19), 0.13(0), 0.09(0)     80: 3.39(25), 0.31(0), 0.30(0)      45: 1.91(25), 0.49(22), 0.04(0)
      30: 0.49(22), 0.13(0), 0.06(0)     120: 1.50(25), 0.31(0), 0.30(0)     60: 0.87(25), 0.40(18), 0.04(0)
      40: 0.85(25), 0.13(0), 0.06(0)     160: 1.73(25), 0.31(0), 0.30(0)     45: 0.76(25), 0.40(18), 0.04(0)
̺2    18: 0.36(10), 0.20(0), 0.13(0)     60: 3.47(25), 0.39(0), 0.36(0)      35: 1.25(25), 0.60(25), 0.05(0)
      24: 0.33(7), 0.20(0), 0.10(0)      90: 1.47(19), 0.39(0), 0.36(0)      45: 0.59(25), 0.60(25), 0.05(0)
      30: 0.54(17), 0.19(0), 0.13(0)     160: 1.61(22), 0.36(0), 0.36(0)     55: 0.63(25), 0.60(25), 0.05(0)
̺3    15: 0.46(6), 0.27(0), 0.19(0)      50: 2.33(25), 0.42(0), 0.39(0)      25: 1.23(25), 0.60(25), 0.07(0)
      20: 0.18(0), 0.27(0), 0.19(0)      70: 1.94(23), 0.40(0), 0.39(0)      32: 0.95(25), 0.60(25), 0.07(0)
      25: 0.43(4), 0.27(0), 0.19(0)      90: 1.46(4), 0.39(0), 0.39(0)       55: 0.51(23), 0.60(25), 0.07(0)
̺4    15: 0.32(0), 0.55(0), 0.28(0)      40: 3.72(25), 0.88(0), 0.75(0)      20: 1.30(25), 0.61(5), 0.60(5)
      20: 0.35(0), 0.55(0), 0.28(0)      60: 1.55(0), 0.88(0), 0.46(0)       27: 0.49(2), 0.61(5), 0.53(2)
      25: 0.60(4), 0.55(0), 0.28(0)      80: 1.61(0), 0.88(0), 0.45(0)       35: 0.56(8), 0.61(5), 0.53(2)

Table 3.2: Second test for the approximated Residual L-Curve Criterion:
numerical results with small values of δ.
3.4 SR2: Projected Data Norm Criterion
The diagonal matrix example and the observation of Section 3.1 suggest
replacing the classic threshold $\|e\|$ of the Discrepancy Principle with the norm
of the projection of the noise onto the high frequency part of the spectrum.
However, in practice a direct computation of this quantity is impossible, because
the noise is unknown (only information about its norm and its stochastic
distribution is usually available) and because the Ritz values are too
expensive to be calculated during the iteration.
To overcome these difficulties, we propose the following strategy, based on
the singular value decomposition of the matrix A.
Let $A \in M_{m,N}(\mathbb{R})$, $m \ge N$, rank(A) = N, let $A = U\Lambda V^*$ be a SVD of
A and suppose that the singular values of A may be divided into a set of
large singular values $\lambda_p \le ... \le \lambda_1$ and a set of N − p small singular values
$\lambda_N \le ... \le \lambda_{p+1}$, with $\lambda_{p+1} < \lambda_p$. If the exact data b satisfy the Discrete
Picard condition, then the SVD coefficients $|u_i^* b|$ are very small for i > p.
Therefore, if
\[ x_p^{TSVD} := \sum_{j=1}^{p} \frac{u_j^* b^\delta}{\lambda_j}\, v_j = V \Lambda_p^\dagger U^* b^\delta, \tag{3.14} \]
with $\Lambda_p^\dagger$ being the pseudo inverse matrix of
\[ \Lambda_p = \begin{pmatrix} \lambda_1 & & & & & \\ & \ddots & & & & \\ & & \lambda_p & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ \vdots & & & & & \vdots \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 \end{pmatrix} \in M_{m,N}(\mathbb{R}), \]
then
\begin{align}
\|b^\delta - A x_p^{TSVD}\| &= \|U^* b^\delta - \Lambda V^* x_p^{TSVD}\| = \|U^* b^\delta - U_p^* b^\delta\| \tag{3.15} \\
&= \|U_{m-p}^* b^\delta\| \sim \|U_{m-p}^* e\|, \tag{3.16}
\end{align}
where $U_p, U_{m-p} \in M_m(\mathbb{R})$, depending on the column vectors $u_i$ of the matrix
U, are defined by $(u_1, .., u_p, 0, .., 0)$ and $(0, .., 0, u_{p+1}, .., u_m)$ respectively.
The right-hand side is exactly the projection of the noise onto the high frequency
part of the spectrum, so we can interpret (3.16) as a relation between
the residual norm of the truncated singular value decomposition and this
quantity.
The equation (3.16) and the considerations of Section 3.1 suggest calculating
the regularized solution $x_p^{TSVD}$ of the perturbed problem $Ax = b^\delta$ using
the truncated singular value decomposition, by stopping the iteration of
CGNE as soon as the residual norm becomes smaller than
$\|b^\delta - A x_p^{TSVD}\| = \|U_{m-p}^* b^\delta\|$.
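As a small illustration of this idea, the following Python sketch treats the simplest possible setting, a diagonal matrix A, for which U = V = I and the SVD coefficients of the data are just its components. It checks that the TSVD residual coincides with the norm of the last m − p data coefficients and then picks the first iterate whose residual falls below that threshold. The function names, the sample data and the list of CGNE residual norms are hypothetical and serve only to show the mechanics.

```python
import math


def projected_data_norm(b_delta, p):
    """Norm of the last m - p SVD coefficients of b_delta (U = I assumed)."""
    return math.sqrt(sum(c * c for c in b_delta[p:]))


def first_index_below(residual_norms, threshold):
    """Smallest 1-based k with residual_norms[k - 1] <= threshold, else None."""
    for k, r in enumerate(residual_norms, start=1):
        if r <= threshold:
            return k
    return None


# Diagonal test problem: singular values and noisy data (hypothetical values).
lam = [1.0, 0.5, 1e-3, 1e-4]
b_delta = [2.0, 1.0, 0.003, -0.004]
p = 2

# TSVD solution: invert only the p largest singular values.
x_tsvd = [b_delta[j] / lam[j] if j < p else 0.0 for j in range(len(lam))]
tsvd_residual = math.sqrt(
    sum((b_delta[j] - lam[j] * x_tsvd[j]) ** 2 for j in range(len(lam)))
)
threshold = projected_data_norm(b_delta, p)

# Hypothetical (decreasing) residual norms produced by an iterative method.
cgne_residuals = [0.9, 0.2, 0.02, 0.004, 0.0035]
tau = 1.001
k_stop = first_index_below(cgne_residuals, tau * threshold)
```

For a general matrix the coefficients would be $u_i^* b^\delta$ rather than the raw components of the data, so an SVD (or, for BCCB matrices, a Fourier transform) would be needed to evaluate the threshold.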
The following numerical simulation on 8 problems of P.C. Hansen's Regularization Tools
confirms the statement above. We fix the dimension N = 1000,
̺ = 0.1% and the constant of the Discrepancy Principle τ = 1.001, run lsqr_b
with reorthogonalization for each problem with 25 different Matlab seeds and
compare the Discrepancy Principle solutions with those obtained by arresting
the iteration of CGNE at the first index such that the residual norm is lower
Residual thresholds for stopping CGNE
Problem
Avg. rel. err. CGNE
τ kU∗m−p♯ bδ k
τ kU∗m−p♯ ek
τδ
Opt. err.
0.1158
0.1158
0.1041
Deriv2
0.1456
0.1517
0.1442
0.1459
Foxgood
0.0079
0.0079
0.0076
0.0079
Gravity
0.0129
0.0144
0.0111
0.0124
Heat
0.0228
0.0281
0.0225
0.0228
I− Laplace
0.1916
0.1952
0.1870
0.1898
Phillips
0.0078
0.0087
0.0075
0.0086
Shaw
0.0476
0.0476
0.0440
0.0480
Baart
0.1158
Table 3.3: Comparison between different residual norm thresholds for stopping CGNE, with p as in (3.17).
or equal to $\tau\|U_{m-p^\sharp}^* b^\delta\|$, with $p^\sharp$ minimizing the error of the truncated singular value decomposition:
\[ \|x_{p^\sharp}^{TSVD} - x^\dagger\| = \min_j \|x_j^{TSVD} - x^\dagger\|. \tag{3.17} \]
The results, summarized in Table 3.3, show that this gives an extremely
precise solution in a very large number of cases. Moreover, the corresponding
stopping index is equal to k♯ (the optimal stopping index of CGNE) in
53% of the considered examples.
In the table, we also consider the results obtained by arresting the iteration
when the residual norm is lower than or equal to $\tau\|U_{m-p^\sharp}^* e\|$. As a matter of
fact, the residual norm corresponding to the optimal stopping index is very
well approximated by this quantity in the large majority of the considered
examples.
Performing the same simulation with the same parameters except for ̺ = 1%
leads to similar results: the method based on the optimal solution of the
TSVD obtains the better performance in 86 cases out of 200 and the worse
performance only 24 times, and in a very large number of examples (45%) its
stopping index is equal to k♯.
These considerations justify the following heuristic stopping rule for CGNE.
Definition 3.4.1 (Projected Data Norm Criterion). Let $z_k^\delta$ be the iterates of CGNE for the perturbed problem
\[ Ax = b^\delta. \tag{3.18} \]
Let $A = U\Lambda V^*$ be a SVD of A. Let p be a regularization index for the
TSVD relative to the data $b^\delta$ and to the matrix A and fix τ > 1. Then stop
CGNE at the index
\[ k_{SR2} := \min\{k \in \mathbb{N} \mid \|b^\delta - A z_k^\delta\| \le \tau \|U_{m-p}^* b^\delta\|\}. \tag{3.19} \]
3.4.1 Computation of the index p of the SR2
Obviously, in practice the optimal index of the truncated singular value decomposition
is not available, since the exact solution is unknown. However,
for discrete ill-posed problems it is well known that a good index can be
chosen by analyzing the plot of the SVD coefficients $|u_i^* b^\delta|$ of the perturbed
problem $Ax = b^\delta$. The behavior of the SVD coefficients in the case of white
Gaussian noise given by $e \sim \mathcal{N}(0, \nu^2 I_m)$, ν > 0, is analyzed by Hansen in
[36]: as long as the unperturbed data b satisfy the discrete Picard condition,
the coefficients $|u_i^* b|$ decay on the average to 0 at least as fast as the singular
values $\lambda_i$. On the other hand, the coefficients $|u_i^* b^\delta|$ decay, on the average,
only for the first small values of i, because for large i the noisy components
$u_i^* e$ dominate, thus after a certain critical index $i_{b^\delta}$ they begin to level off at
the level
E(u∗i e) = ν.
(3.20)
To compute a good index p for the truncated singular value decomposition,
one must also consider machine errors if the last singular values are very
small: as pointed out in [36], pp. 70-71, the number of terms that can be
safely included in the solution is such that:
\[ p \le \min\{i_A, i_{b^\delta}\}, \tag{3.21} \]
where $i_A$ is the index at which the $\lambda_i$ begin to level off and $i_{b^\delta}$ is the index at
which the $|u_i^* b^\delta|$ begin to level off. The value $i_A$ is proportional to the error
present in the matrix A (i.e. model error) while the value $i_{b^\delta}$ is proportional
to the errors present in the data $b^\delta$ (i.e. noise error). Although very often
a visual inspection of the plot of the coefficients in a semi-logarithmic scale
is enough to choose the index p, defining an algorithm to compute the index
p automatically is not easy, because the decay of the SVD coefficients may
not be monotonic and indeed is often affected by outliers (cf. Figure 3.5).
Figure 3.5: Plot of the singular values $\lambda_i$ (blue cross), of the SVD coefficients
$|u_i^* b^\delta|$ (red diamond) and of the ratios $|u_i^* b^\delta|/\lambda_i$ (green circle) of 2 different
test problems with perturbed data: ̺ = 0.1%, seed = 0. Panels: (a) Shaw(200), (b) Phillips(200); Picard plots with gaussian noise 0.1%, horizontal axis i.
A rule based on the moving geometric mean has been proposed in [32] and
implemented in the file picard.m of [35]. Here we suggest to use the Modified
Min− Max Rule, defined in Section C.4 of the Appendix.
3.5 SR2: numerical experiments
We test the stopping rule of Definition 3.4.1 in 4000 different examples. For
each of 8 test problems we choose 2 values of N, 10 values of ̺ and 25 values
of seed. We compare the results obtained by the stopping index kSR2 with
those obtained by the Discrepancy Principle. In Table 3.4 we summarize the
main results of the test.

Numerical test for SR2

                    Avg. rel. err. CGNE          Failures         kSR2 vs. kD
Problem     Dim     kSR2     kD       k♯         εSR2 > 5(10)ε♯   (kSR2 = k♯)
Baart       100     0.1742   0.1750   0.1469     0(0)             20, 201, 29 (135)
Baart       1000    0.1612   0.1586   0.1357     0(0)             8, 234, 8 (89)
Deriv2      100     0.3533   0.2352   0.2242     7(1)             100, 11, 139 (60)
Deriv2      1000    0.2596   0.1992   0.1874     9(0)             146, 32, 72 (57)
Foxgood     100     0.0390   0.0312   0.0232     7(1)             41, 146, 63 (130)
Foxgood     1000    0.0267   0.0270   0.0166     4(0)             14, 226, 10 (96)
Gravity     100     0.0332   0.0349   0.0276     0(0)             146, 24, 80 (115)
Gravity     1000    0.0236   0.0256   0.0192     0(0)             81, 163, 6 (67)
Heat        100     0.1065   0.0921   0.0803     0(0)             148, 4, 98 (62)
Heat        1000    0.0530   0.0598   0.0476     0(0)             215, 14, 21 (79)
I-laplace   100     0.1360   0.1370   0.1242     0(0)             91, 116, 43 (114)
I-laplace   1000    0.2140   0.2134   0.2004     0(0)             84, 122, 44 (13)
Phillips    100     0.0311   0.0244   0.0215     0(0)             97, 15, 138 (74)
Phillips    1000    0.0183   0.0200   0.0156     0(0)             138, 102, 10 (74)
Shaw        100     0.0890   0.0973   0.0760     0(0)             96, 112, 42 (82)
Shaw        1000    0.0724   0.0635   0.0520     0(0)             22, 168, 60 (53)

Table 3.4: Comparison between the numerical results obtained by kSR2, kD
and the optimal index k♯.
The constant τ, chosen equal to 1.001 when N = 100 and equal to 1.005
when N = 1000, is always the same for kSR2 and kD. Columns 3, 4 and 5
of the table collect the average relative errors over all values of ̺ = 10^{-3}, 2 ×
10^{-3}, ..., 10^{-2} and seed = 1, ..., 25 obtained by kSR2, kD and k♯ respectively.
Column 6 contains the number of times the relative error corresponding
to kSR2, denoted by εSR2, is larger than 5 (10) times the optimal error ε♯. The
numbers in the last column count how many times εSR2 > εD, how many
times εSR2 = εD and how many times εSR2 < εD respectively. Finally, the
fourth number in round brackets counts how many times εSR2 = ε♯.
The results clearly show that the stopping rule is very reliable for discrete
ill-posed problems of medium size. It is remarkable that in the 4000 examples
considered it failed (that is, εSR2 > 10ε♯) only twice (cf., e.g., the results
obtained by the heuristic stopping rules in [77], Table 2). Moreover, in many
cases it even improves the results of the Discrepancy Principle, which is based
on the knowledge of the noise level.
3.6 Image deblurring
One of the most famous applications of the theory of ill-posed problems is
the recovery of a sharp image from its blurry observation, i.e. image deblurring.
It frequently arises in imaging sciences and technologies, including optical,
medical, and astronomical applications, and is crucial for detecting
important features and patterns such as those of a distant planet or some
microscopic tissue.
Due to its importance, this subject has been widely studied in the literature:
without any claim to be exhaustive, we point to some books [10], [38],
[48], [100], or chapters of books [4], [96] dedicated to this problem.
In most applications, blurs are introduced by three different types of physical
factors: optical, mechanical, or medium-induced, which could lead to familiar
out-of-focus blurs, motion blurs, or atmospheric blurs respectively. We refer
the reader to [10] for a more detailed account on the associated physical
processes.
Mathematically, a continuous (analog) image is described by a nonnegative
function f = f(x) on $\mathbb{R}^2$ supported on a (rectangular) 2D domain Ω and the
blurring process is either a linear or nonlinear operator K acting on some
function space. Since we shall focus only on linear deblurring problems, K
is assumed to be linear.
Among all linear blurs, the most frequently encountered type is the shift
invariant blur, i.e. a linear blur K = K[f] such that for any shift vector $y \in \mathbb{R}^2$,
\[ g(x) = K[f(x)] \implies g(x - y) = K[f(x - y)], \quad x \in \Omega. \tag{3.22} \]
It is well known in signal processing as well as system theory [72] that a
shift-invariant linear operator must be in the form of convolution:
\[ g(x) = K[f](x) = \kappa * f(x) = \int_\Omega \kappa(x - y) f(y)\, dy, \tag{3.23} \]
for some suitable kernel function κ(x), or the point spread function (PSF).
The function g(x) is the blurred analog image that is converted into a digital
image through a digitalization process (or sampling).
A digital image is typically recorded by means of a CCD (charge-coupled
device), which is an array of tiny detectors (potential wells), arranged in a
rectangular grid, able to record the amount, or intensity, of the light that
hits each detector.
Thus, a digital grayscale image
\[ G = (g_{j,l}), \quad j = 1, ..., J, \quad l = 1, ..., L \tag{3.24} \]
is a rectangular array, whose entries represent the (nonnegative) light intensities captured by each detector.
The PSF is described by a matrix $H = (h_{j,l})$ of the same size as the image,
whose entries are all zero except for a very small set of pixels (j, l) distributed
around a certain pixel $(j_c, l_c)$ which is the center of the blur. Since we are
assuming spatially invariant PSFs, the center of the PSF corresponds to the
center of the 2D array.
In some cases the PSF can be described analytically and H can be constructed from a function, rather than through experimentation (e.g. the
horizontal and vertical motion blurs are constructed in this way).
In other cases, the knowledge of the physical process that causes the blur
provides an explicit formulation of the PSF. In this case, the elements of
the PSF array are given by a precise mathematical expression: e.g. the
out-of-focus blur is given by the formula
\[ h_{j,l} = \begin{cases} \dfrac{1}{\pi r^2} & \text{if } (j - j_c)^2 + (l - l_c)^2 \le r^2, \\ 0 & \text{otherwise,} \end{cases} \tag{3.25} \]
where r > 0 is the radius of the blur.
For other examples, such as the blur caused by atmospheric turbulence or
the PSF associated to an astronomical telescope, we refer to [38] and the
references therein.
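The out-of-focus formula (3.25) translates directly into code. The following Python sketch builds the PSF array as a list of rows; the function name `out_of_focus_psf` and the sample sizes are hypothetical, and the indices j, l are taken 1-based as in the text.

```python
import math


def out_of_focus_psf(J, L, jc, lc, r):
    """Return the J x L PSF array of (3.25) with 1-based centre (jc, lc)."""
    value = 1.0 / (math.pi * r * r)
    return [
        [value if (j - jc) ** 2 + (l - lc) ** 2 <= r * r else 0.0
         for l in range(1, L + 1)]
        for j in range(1, J + 1)
    ]


# Hypothetical example: 9 x 9 array, blur of radius 2 centred in the middle.
H = out_of_focus_psf(J=9, L=9, jc=5, lc=5, r=2.0)
n_inside = sum(1 for row in H for h in row if h > 0.0)
```

Only the pixels whose centre lies inside the disc of radius r are nonzero, so the entries of the discrete array sum to 1 only approximately.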
As a consequence of the digitalization process, the continuous model described
by (3.23) has to be adapted to the discrete setting as well. To do
this, we consider first the 1D case
\[ g(t) = \int \kappa(t - s) f(s)\, ds. \tag{3.26} \]
To fix the ideas, we assume that J is even and that the function f(s) is
defined in the interval $[-\frac{J-1}{2}, \frac{J-1}{2}]$. Let
\[ s_j = -\frac{J-1}{2} + j - 1, \quad j = 1, ..., J \tag{3.27} \]
be the J points in which the interval is subdivided and discretize κ and f
in such a way that $\kappa(s) = \kappa(s_j) = h_j$ if $|s - s_j| < \frac{1}{2}$ or $s = s_j + \frac{1}{2}$, and
analogously for f. Approximating (3.26) with the trapezoidal rule
\[ g(t) \cong \sum_{j'=1}^{J} \kappa(t - s_{j'}) f(s_{j'}) \tag{3.28} \]
and recomputing in the points $s_j$, we obtain the components of the discretized
version g of the function g:
\[ g_j = \sum_{j'=1}^{J} \kappa(s_j - s_{j'}) f(s_{j'}), \quad j = 1, ..., J. \tag{3.29} \]
As a consequence of the assumptions we have made, (3.29) can be rewritten
as
\[ g_j = \sum_{j'=1}^{J} h_{j - j' + \frac{J}{2}}\, f_{j'}, \quad j = 1, ..., J, \tag{3.30} \]
which is the componentwise expression of the discrete convolution between
the column vectors h = (hj ) and f = (fj ). We observe that some terms in
the sum in the right-hand side of (3.30) may be not defined: this happens
because the support of the convolution between κ and f is larger than the
supports of κ and f. The problem is solved by extending the vector h to the
larger vector
\[ \tilde{h} = \begin{pmatrix} h_{-\frac{J}{2}+1} \\ \vdots \\ h_0 \\ h \\ h_{J+1} \\ \vdots \\ h_{J+\frac{J}{2}} \end{pmatrix}, \qquad h_j = h_{j+J}, \quad j = -\frac{J}{2} + 1, ..., \frac{J}{2} \tag{3.31} \]
and substituting h with $\tilde{h}$ in (3.30), which is equivalent to extending κ periodically
on the real line. The convolution (3.30) may also be expressed in the
form
\[ g = Af, \quad A = (a_{i,j}), \quad a_{i,j} = h_{i-j+\frac{J}{2}}, \quad i, j = 1, ..., J. \tag{3.32} \]
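The index bookkeeping in (3.30)-(3.32) is easy to get wrong, so a small Python sketch may help. It builds the matrix A from a PSF vector h using the periodic extension $h_j = h_{j+J}$ and applies it to a signal. The helper names `blur_matrix_1d` and `matvec` and the sample PSF are hypothetical; indices are kept 1-based inside the formula and shifted only when accessing the Python lists.

```python
def blur_matrix_1d(h):
    """Build A = (a_ij), a_ij = h_{i-j+J/2}, with h extended periodically."""
    J = len(h)
    if J % 2 != 0:
        raise ValueError("the construction in the text assumes J even")
    A = []
    for i in range(1, J + 1):
        row = []
        for j in range(1, J + 1):
            k = i - j + J // 2           # 1-based index, may leave 1..J
            row.append(h[(k - 1) % J])   # periodic wrap, 0-based access
        A.append(row)
    return A


def matvec(A, f):
    """Plain matrix-vector product g = A f."""
    return [sum(a * x for a, x in zip(row, f)) for row in A]


# Hypothetical PSF of length J = 6, centred at index J/2 = 3 (1-based).
h = [0.0, 0.25, 0.5, 0.25, 0.0, 0.0]
A = blur_matrix_1d(h)

# A unit impulse at position 3 reproduces the PSF itself.
f = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
g = matvec(A, f)
```

Because each row of A is a cyclic shift of the previous one, A is circulant, which is what makes the Fourier-based diagonalization discussed in this section applicable.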
In the 2D case, proceeding in an analogous way, we get
\[ g_{j,l} = \sum_{j'=1}^{J} \sum_{l'=1}^{L} h_{j-j'+\frac{J}{2},\, l-l'+\frac{L}{2}}\, f_{j',l'}. \tag{3.33} \]
The equation above (3.33) is usually written in the form
\[ G = H * F, \tag{3.34} \]
where G, H and F are here matrices in $M_{J,L}(\mathbb{R})$. If g and f are the column
vectors obtained by concatenating the columns of G and F respectively, then
(3.34) can be rewritten in the form
\[ g = Af. \tag{3.35} \]
In the case of an image f with 1024 × 1024 pixels, the system (3.35) has
then more than one million unknowns. For generic problems of this size,
the computation of the singular value decomposition is not usually possible.
However, if we set N := JL, then $A \in M_N(\mathbb{R})$ is a circulant matrix with
circulant blocks (BCCB). It is well known (cf. e.g. [4] or [38]) that BCCB
matrices are normal (that is, $A^* A = A A^*$) and that they may be diagonalized as
\[ A = \Phi^* \Lambda \Phi, \tag{3.36} \]
where Φ is the two-dimensional unitary Discrete Fourier
Transform (DFT) matrix.
We recall that the DFT of a vector $z \in \mathbb{C}^N$ is the vector $\hat{z}$ whose components
are defined by
\[ \hat{z}_j = \frac{1}{\sqrt{N}} \sum_{j'=1}^{N} z_{j'}\, e^{-2\pi\imath (j'-1)(j-1)/N} \tag{3.37} \]
and the inverse DFT of a vector $w \in \mathbb{C}^N$ is the vector $\tilde{w}$, whose components
are calculated via the formula
\[ \tilde{w}_j = \frac{1}{\sqrt{N}} \sum_{j'=1}^{N} w_{j'}\, e^{2\pi\imath (j'-1)(j-1)/N}. \tag{3.38} \]
The two-dimensional DFT of a 2D array can be obtained by computing the
DFT of its columns, followed by a DFT of its rows. A similar approach is
used for the inverse two-dimensional DFT. The DFT and inverse DFT computations can be written as matrix-vector multiplication operations, which
may be computed by means of the FFT algorithm, see e.g. [12], [94] and
Matlab's documentation on the routines fft, ifft, fft2 and ifft2. In general, the
speed of the algorithms depends on the size of the vector x: they are most
efficient if the dimensions have only small prime factors. In particular, if N
is a power of 2, the cost is of the order of $N \log_2(N)$ operations.
Thus, matrix-vector multiplications with Φ and $\Phi^*$ may be performed quickly
without constructing the matrices explicitly and since the first column of Φ
is the vector of all ones divided by the square root of the dimension, denoting
with $a_1$ and $\phi_1$ the first column of A and Φ respectively, it follows that
\[ \Phi a_1 = \Lambda \phi_1 = \frac{1}{\sqrt{N}}\, \lambda, \tag{3.39} \]
where λ is the column vector of the eigenvalues of A.
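Relation (3.39) can be checked numerically in the one-dimensional analogue, where A is a circulant matrix and Φ is the unitary DFT matrix of (3.37). The Python sketch below is only an illustration under that simplifying assumption (1D instead of BCCB); the function name `unitary_dft` and the sample column are hypothetical, and indices are 0-based, which corresponds to the shifts j − 1 and j′ − 1 in (3.37).

```python
import cmath
import math


def unitary_dft(z):
    """Unitary DFT of (3.37), written with 0-based indices."""
    N = len(z)
    return [
        sum(z[jp] * cmath.exp(-2j * math.pi * jp * j / N) for jp in range(N))
        / math.sqrt(N)
        for j in range(N)
    ]


# Hypothetical first column of a 4 x 4 circulant matrix A.
N = 4
a1 = [4.0, 1.0, 0.0, 1.0]
A = [[a1[(i - j) % N] for j in range(N)] for i in range(N)]

# Eigenvalues from (3.39): lambda = sqrt(N) * Phi a1.
lam = [math.sqrt(N) * c for c in unitary_dft(a1)]

# Check A v_k = lambda_k v_k, with v_k the k-th column of Phi^*.
max_err = 0.0
for k in range(N):
    v = [cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for j in range(N)]
    Av = [sum(A[i][j] * v[j] for j in range(N)) for i in range(N)]
    max_err = max(max_err, max(abs(Av[i] - lam[k] * v[i]) for i in range(N)))
```

In practice the sum in `unitary_dft` would be replaced by an FFT routine; the explicit sum is used here only to keep the sketch self-contained.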
As a consequence, even in the 2D case, the spectral decomposition of the
matrix A can be calculated with a reasonable computational effort, so it is
possible to apply the techniques we have seen in Chapter 3 for this particular
problem. In particular, we shall be able to compute the SVD coefficients
(Fourier coefficients) |φ∗j g| and the ratios |φ∗j g|/λj , and arrest the CGNE
method according to the stopping rule of Definition 3.4.1.
Moreover, when the spectral decomposition of A is known, matrix-vector
multiplications can be performed very efficiently in Matlab using the DFT.
For example, as shown in [38], given the PSF matrix H and an image F:
• to compute the eigenvalues of A, use
S = fft2(fftshift(H)); (3.40)
• to compute the blurred image G = H ∗ F, use³
G = real(ifft2(S ⊚ fft2(F))). (3.41)
³ The operation ⊚ is the componentwise multiplication of two matrices.

3.7 SR3: Projected Noise Norm Criterion
In this section we define an a-posteriori stopping rule, based on a statistical
approach, suited mainly for large scale problems. The aim is again to approximate
the norm of the projection of the noise onto the high frequency
components
\[ \|U_{m-p}^* e\|, \tag{3.42} \]
where p ≥ 0 is the number of the low frequency components of the problem.
We assume that p can be determined by means of an algorithm: for example,
in the case of image deblurring, the algorithm of Section 3.4 can be applied.
We simply consider a modified version of the Discrepancy Principle with δ
replaced by the expected value of $\|U_{m-p}^* e\|$:
Definition 3.7.1 (Projected Noise Norm Criterion). Suppose the matrix $A \in M_{m,N}(\mathbb{R})$ has m − p noise-dominated SVD coefficients. Fix τ > 1
and stop the iteration of CGNE as soon as the norm of the residual is lower than
or equal to $\tau\bar\delta$ with $\bar\delta := \delta\sqrt{(m-p)/m}$:
\[ k_{SR3} := \min\{k \in \mathbb{N} \mid \|A z_k^\delta - b^\delta\| \le \tau\bar\delta\}. \tag{3.43} \]
Note that the definition does not require a SVD of the matrix A, but
only a knowledge about p.
The following result provides a theoretical justification for the definition
above: if m is big, with high probability $\bar\delta$ is not smaller than $\|U_{m-p}^* e\|$.
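Since the rule of Definition 3.7.1 only needs δ, m and p, it is straightforward to evaluate once the residual norms of the iteration are available. The following Python sketch shows one possible way to do so; the function names, the default value of τ and the list of residual norms are hypothetical and are not part of the definition.

```python
import math


def projected_noise_level(delta, m, p):
    """Return delta_bar = delta * sqrt((m - p) / m) as in Definition 3.7.1."""
    return delta * math.sqrt((m - p) / m)


def sr3_stopping_index(residual_norms, delta, m, p, tau=1.001):
    """First 1-based k with residual_norms[k - 1] <= tau * delta_bar, else None."""
    threshold = tau * projected_noise_level(delta, m, p)
    for k, r in enumerate(residual_norms, start=1):
        if r <= threshold:
            return k
    return None


# Hypothetical values: noise norm 2, m = 100 data, p = 36 low-frequency components.
delta_bar = projected_noise_level(delta=2.0, m=100, p=36)
residuals = [5.0, 2.5, 1.7, 1.6, 1.2]
k_sr3 = sr3_stopping_index(residuals, delta=2.0, m=100, p=36)
```

With these numbers the threshold is smaller than the one of the classical Discrepancy Principle (τδ), so the iteration is allowed to run one step further than it would with the unmodified rule.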
Theorem 3.7.1. Let $\epsilon_1 > 0$ and $\epsilon_2 > 0$ be small positive numbers and let
$\alpha \in (0, 1/2)$. Then there exists a positive integer $\bar{m} = \bar{m}(\epsilon_1, \epsilon_2, \alpha)$ such that for
every $m > \bar{m}$ the estimate
\[ P\left( \|U_{m-p}^* e\|^2 - \epsilon_1 > \bar\delta^2 \right) \le \epsilon_2 \tag{3.44} \]
holds whenever the following conditions are satisfied:
(i) $p \le \alpha m$;
(ii) $e \sim \mathcal{N}(0, \nu^2 I_m)$, ν > 0;
(iii) $\delta^2 = \|e\|^2$.
Before proving the theorem, a few remarks:
• The theorem is based on the simple idea that if $e \sim \mathcal{N}(0, \nu^2 I_m)$, then
$U^* e$ has the same distribution: this argument fails if the perturbation
has a different distribution!
• In principle it is possible, but maybe hard, to calculate $\bar{m}$ in terms of
$\epsilon_1$, $\epsilon_2$ and α. For this reason we do not recommend this stopping rule
for the small and medium size problems of Section 3.4.
• The assumption on p restricts the validity of the statement: luckily
enough, in most cases p is small with respect to m. When this does
not happen (p >> m/2), the choice p = m/2 should improve the
performance anyway.
Proof. Fix ǫ1 > 0, ǫ2 > 0 and α ∈ (0, 1/2). Let A = UΛV∗ be an SVD of
A and let p ∈ {1, ..., m − 1}. We observe that if e ∼ N (0, ν 2 Im ), ν > 0,
then U∗ e ∼ N (0, ν 2 Im ), so U∗m−p e ∼ (0, ..., 0, ep+1, ..., em ). Thus, if p, e and
δ satisfy (i), (ii) and (iii) respectively, there holds:
δ 2 (m − p)
∗
2
P kUm−p ek −
> ǫ1 = P
m
m
X
δ 2 (m − p)
> ǫ1
e2i −
m
i=p+1
!
.
(3.45)
2
For 0 < t < min{1/(2pν )}, Markov’s inequality and our assumptions (ii)
and (iii) yield
m
X
δ 2 (m − p)
P
e2i −
> ǫ1
m
i=p+1
=P p
m
X
i=p+1
"
e2i − (m − p)
= P exp t p
m
X
e2i
i=p+1
!
p
X
e2i > mǫ1
i=1
− (m − p)
"
≤ exp[−tmǫ1 ]E exp t p
m
X
i=p+1
2 − m−p
2
= exp[−tmǫ1 ](1 − 2tsν )
p
X
i=1
e2i −
!
e2i
!#
p
X
i=1
e2i
!
> exp[tmǫ1 ]
(3.46)
!#!
p
(1 + 2t(m − p)ν 2 )− 2 ,
where in the last equality we have used the assumption (ii) and the fact that if X is a random variable with Gaussian distribution $X \sim N(0, \nu^2)$, then, for every $a < 1/(2\nu^2)$,
\[
E(\exp[aX^2]) = (1 - 2a\nu^2)^{-\frac{1}{2}}. \tag{3.47}
\]
Putting $w := 2t\nu^2$, the right-hand side of (3.46) can be rewritten as
\[
\exp\left[ -\frac{m \epsilon_1 w}{2\nu^2} \right] (1 - pw)^{-\frac{m-p}{2}} (1 + (m-p)w)^{-\frac{p}{2}}. \tag{3.48}
\]
If $p \le \sqrt{m}$, the expression above can be made $< \epsilon_2$ by choosing $w$ close enough to $1/p$ for all m sufficiently large. On the other hand, if $\sqrt{m} < p \le \alpha m$ and $w = o(1/m)$, a Taylor expansion of the second and the third factor gives
\[
\begin{aligned}
&\exp\left[ -\frac{m \epsilon_1 w}{2\nu^2} + \frac{m-p}{2} \left( pw + \frac{p^2 w^2}{2} + O(p^3 w^3) \right) - \frac{p}{2} \left( (m-p)w + \frac{(m-p)^2 w^2}{2} + O((m-p)^3 w^3) \right) \right] \\
&\qquad = \exp\left[ -\frac{m \epsilon_1 w}{2\nu^2} - \frac{p(m-p)(m-2p) w^2}{4} (1 + o(1)) \right],
\end{aligned}
\tag{3.49}
\]
thus it is possible to choose $w$ (e.g., $w \approx 1/(m p^{1/4})$) in such a way that the last expression can be made arbitrarily small for all m sufficiently large.
Summing up, we have proved that there exists $\bar m \in \mathbb{N}$ such that for all $m > \bar m$ and for all $p < \alpha m$ there holds
\[
\inf_{t > 0} \left\{ \exp[-t m \epsilon_1] \, (1 - 2tp\nu^2)^{-\frac{m-p}{2}} (1 + 2t(m-p)\nu^2)^{-\frac{p}{2}} \right\} \le \epsilon_2 \tag{3.50}
\]
and according to (3.45) and (3.46) this completes the proof.
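The Gaussian moment identity (3.47) used in the last step of (3.46) can be checked numerically. The following Python sketch is not part of the thesis code: the function names, the quadrature grid and the sample values of a and ν are assumptions chosen only for illustration. It compares a plain Riemann-sum quadrature of E(exp[aX²]) with the closed form.

```python
import numpy as np

def mgf_square_closed_form(a, nu):
    """Closed form (1 - 2 a nu^2)^(-1/2) of E(exp[a X^2]), valid for a < 1/(2 nu^2)."""
    if a >= 1.0 / (2.0 * nu**2):
        raise ValueError("the expectation is finite only for a < 1/(2 nu^2)")
    return (1.0 - 2.0 * a * nu**2) ** (-0.5)

def mgf_square_quadrature(a, nu, half_width=40.0, n=400001):
    """Riemann-sum approximation of E(exp[a X^2]) for X ~ N(0, nu^2).

    The two exponentials are merged into one to avoid overflow:
    exp(a x^2) * exp(-x^2 / (2 nu^2)) = exp((a - 1/(2 nu^2)) x^2).
    """
    x = np.linspace(-half_width * nu, half_width * nu, n)
    dx = x[1] - x[0]
    integrand = np.exp((a - 1.0 / (2.0 * nu**2)) * x**2)
    return np.sum(integrand) * dx / (np.sqrt(2.0 * np.pi) * nu)

if __name__ == "__main__":
    for a, nu in [(0.2, 1.0), (-0.3, 2.0)]:
        print(a, nu, mgf_square_quadrature(a, nu), mgf_square_closed_form(a, nu))
```

The quadrature is only reliable when a is not too close to 1/(2ν²), since the integrand then decays slowly and the truncation window would have to be enlarged.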
3.8 Image deblurring: numerical experiments
This section illustrates the performance of the stopping rules SR2 and SR3 (with p as in SR2) for the case of image deblurring.
In all examples below, we are given an exact image F and a PSF H. Assuming periodic boundary conditions on the matrix A corresponding to H and perturbing G := H ∗ F with white Gaussian noise, we get a blurred and noisy image Gδ = G + E, where E ∈ MJ(R) is the noise matrix. The problem fits the framework of Section 3.6: as a consequence, the singular values and the Fourier coefficients are computed by means of the 2D Discrete Fourier Transform fft2 using the routine fou_coeff from the Appendix. The stopping rules SR2 and SR3 can be applied with δ = ‖e‖ = ‖gδ − Af‖, where the vectors e, gδ and f are the columnwise stacked versions of the matrices E, Gδ and F respectively.
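The data generation just described can be sketched in a few lines. The thesis relies on Matlab routines (blurring, fou_coeff) whose code is not reproduced here, so the following NumPy version is only an illustration under stated assumptions: the function names are hypothetical, the PSF is taken to be a normalized Gaussian wrapped periodically around the pixel (0, 0), and the noise level ϱ is assumed to mean ‖E‖/‖G‖ in the Frobenius norm.

```python
import numpy as np

def gaussian_psf(J, stdev):
    """Normalized Gaussian PSF on a J x J grid, centered at pixel (0, 0) with wrap-around."""
    k = np.fft.fftfreq(J, d=1.0 / J)          # integer offsets 0, 1, ..., -2, -1
    X, Y = np.meshgrid(k, k, indexing="ij")
    H = np.exp(-(X**2 + Y**2) / (2.0 * stdev**2))
    return H / H.sum()

def blur_periodic(H, F):
    """Periodic convolution G = H * F: the blurring matrix A is diagonalized by the 2D DFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(H) * np.fft.fft2(F)))

def add_noise(G, rho, rng):
    """Add white Gaussian noise E rescaled so that ||E|| / ||G|| = rho (assumed definition)."""
    E = rng.standard_normal(G.shape)
    E *= rho * np.linalg.norm(G) / np.linalg.norm(E)
    return G + E, E

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.random((64, 64))                   # stand-in for an exact image
    H = gaussian_psf(64, 2.0)
    G_delta, E = add_noise(blur_periodic(H, F), 0.01, rng)
    singular_values = np.abs(np.fft.fft2(H))   # moduli of the eigenvalues of A
    print(np.linalg.norm(E), singular_values.max(), singular_values.min())
```

With periodic boundary conditions the eigenvalues of A are the entries of fft2(H), which is why the singular values and the Fourier coefficients of the data are available at the cost of a few FFTs.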
[Figure 3.6, two panels: (a) Exact image. (b) Perturbed image: stdev = 2.0, ϱ = 1.0%.]
Figure 3.6: The exact data F and a perturbed data Gδ for the gimp test problem.
We fix τ = 1.001, run cgls_deb, a modified version of the function cgls from the Regularization Tools, and compare the results obtained by the Discrepancy Principle, SR2 and SR3 with the optimal solution.
3.8.1 Test 1 (gimp test problem)
The dimension of the square matrix F ∈ MJ (R) corresponding to the image
gimp.png, shown on the left in Figure 3.6, is given by J = 200. The algorithm
blurring, described in the Appendix, generates a Gaussian PSF H and a
blurred and noisy image.
We consider different values for the standard deviation of the Gaussian PSF
and for the noise level, and compare the results obtained by stopping CGNE
with the discrepancy principle, SR2 and SR3.
We underline that in almost all the considered problems the index p of the stopping rule SR2 is computed using only the first N/2 = J²/2 Fourier coefficients, to save time in the minimization process of the function mod_min_max. When stdev = 1.0 and ϱ = 0.1%, 0.5%, all the Fourier coefficients are used, since in these cases they are necessary for calculating a good value of p.

CGNE results for the gimp test problem: relative error (stopping index)
stdev   ϱ      Discr. Princ.   SR2           SR3           Opt. Sol.
3.0     0.1%   0.2104(154)     0.2082(196)   0.2053(313)   0.2048(348)
3.0     0.5%   0.2257(49)      0.2259(48)    0.2194(86)    0.2182(112)
3.0     1.0%   0.2331(34)      0.2286(45)    0.2264(61)    0.2261(68)
3.0     5.0%   0.2733(13)      0.3079(7)     0.2613(17)    0.2583(22)
2.0     0.1%   0.1265(144)     0.1703(61)    0.1196(213)   0.1167(310)
2.0     0.5%   0.1846(36)      0.1762(54)    0.1612(96)    0.1599(109)
2.0     1.0%   0.1962(20)      0.1962(20)    0.1878(36)    0.1849(53)
2.0     5.0%   0.2199(7)       0.2228(6)     0.2160(9)     0.2149(11)
1.0     0.1%   0.0442(45)      0.0419(50)    0.0350(68)    0.0344(92)
1.0     0.5%   0.0725(15)      0.0652(22)    0.0636(28)    0.0636(28)
1.0     1.0%   0.0865(9)       0.0797(13)    0.0781(15)    0.0780(16)
1.0     5.0%   0.1244(4)       0.1435(3)     0.1194(5)     0.1194(5)
Table 3.5: Comparison between different stopping rules of CGNE for the gimp problem.
The numerical results of Table 3.5 show that the a-posteriori stopping rule SR3 always finds a relative error lower than that obtained by the Discrepancy Principle. The heuristic stopping rule SR2 gives very good results apart from some cases where it provides an over-regularized solution. Since the performance of SR3 is excellent here and changing the Matlab seed does not make things significantly different (i.e. the statistical approximation of $\|U_{m-p}^* e\|$ seems to be very solid in such a large-size problem), we deduce that the approximation of the residual norm of the TSVD solution with the norm of the projection of the noise onto the high frequency components
\[
\|g^\delta - A f_p^{TSVD}\| \sim \|U_{m-p}^* e\|
\]
is not very appropriate in these cases.
[Figure 3.7, two panels: (a) Exact image. (b) Perturbed image: stdev = 2.0, ϱ = 1.0%.]
Figure 3.7: The exact data F and a perturbed data Gδ for the pirate test problem.
3.8.2 Test 2 (pirate test problem)
The image pirate.tif, shown on the left in Figure 3.7, has a higher resolution than gimp.png: J = 512.
We proceed as in the gimp test problem, but with a few variations:
• Instead of the values 1.0, 2.0, 3.0 for the stdev we consider the values 3.0, 4.0, 5.0.
• To compute the index p of the stopping rules SR2 and SR3, instead of considering the first N/2 ratios $\varphi_i = |\phi_i^* g^\delta| / \lambda_i$, we take only the first N/4. Moreover, in the computation of the curve that approximates the $\varphi_i$, we use the function data_approx with only ⌊N/200⌋ inner knots (instead of ⌊N/50⌋).
The results, summarized in Table 3.6, show that both SR2 and SR3 give excellent results, finding a relative error lower than the Discrepancy Principle in almost all cases. The phenomenon observed in the gimp test problem concerning SR2 appears to be much more attenuated here.
CGNE results for the pirate test problem: relative error (stopping index)
stdev   ϱ      Discr. Princ.   SR2           SR3           Opt. Sol.
5.0     0.1%   0.1420(171)     0.1420(170)   0.1384(330)   0.1375(484)
5.0     0.5%   0.1524(49)      0.1519(52)    0.1481(90)    0.1472(122)
5.0     1.0%   0.1579(29)      0.1551(40)    0.1529(57)    0.1524(69)
5.0     5.0%   0.1746(9)       0.1746(9)     0.1697(14)    0.1686(18)
3.0     0.1%   0.1076(104)     0.1051(153)   0.1032(233)   0.1029(280)
3.0     0.5%   0.1187(31)      0.1161(42)    0.1141(61)    0.1140(69)
3.0     1.0%   0.1251(18)      0.1251(18)    0.1204(31)    0.1198(39)
3.0     5.0%   0.1425(6)       0.1405(7)     0.1385(9)     0.1382(10)
4.0     0.1%   0.1268(142)     0.1260(158)   0.1226(290)   0.1219(397)
4.0     0.5%   0.1379(40)      0.1408(29)    0.1343(65)    0.1329(98)
4.0     1.0%   0.1432(24)      0.1418(28)    0.1389(43)    0.1385(53)
4.0     5.0%   0.1591(8)       0.1566(10)    0.1549(13)    0.1547(14)
Table 3.6: Comparison between different stopping rules of CGNE for the pirate test problem.
3.8.3 Test 3 (satellite test problem)

CGNE results for the satellite problem
                 Discr. Princ.   SR2      SR3      Opt. Sol.
Stopping index   23              34       28       34
Relative Error   0.3723          0.3545   0.3592   0.3545
Table 3.7: Comparison between different stopping rules of CGNE for the satellite problem.
The data for this example were developed at the US Air Force Phillips
Laboratory and have been used to test the performances of several available
algorithms for computing regularized nonnegative solutions, cf. e.g. [29] and
[5]. The data consist of an (unknown) exact gray-scale image F, a space
invariant point spread function H and a perturbed version Gδ of the blurred
image G = H ∗ F, see Figure 3.8. All images F, H and Gδ are 256 × 256 matrices with nonnegative entries.
[Figure 3.8, two panels: (a) Exact image. (b) Perturbed image.]
Figure 3.8: The exact data F and the perturbed data Gδ for the satellite problem.
The results, gathered in Table 3.7, show that both SR2 and SR3 improve the results of the Discrepancy Principle in this example and the stopping index of SR2 coincides with the optimal stopping index $k^\sharp$.
3.8.4 The new stopping rules in the Projected Restarted CGNE
These stopping rules may work very well also when CGNE is combined with
other regularization methods. To show this, we consider CGNE as the inner
iteration of the Projected Restarted Algorithm from [5]. Using the notations
of Chapter 2, it is a straightforward exercise to prove that Algorithm 1 of [5]
is equivalent to the following scheme:
• Fix $\tilde f^{(0)} = f^{(0)} = 0$ and $i = 0$.
• For every $i = 0, 1, 2, ...$, if $f^{(i)}$ does not satisfy the Discrepancy Principle (respectively, SR2, SR3), compute $\tilde f^{(i+1)}$ as the regularized solution of CGNE applied to the system $Af = g^\delta$ with initial guess $\tilde f^{(i)}$, arrested with the Discrepancy Principle (respectively, SR2, SR3), and define $f^{(i+1)}$ as the projection of $\tilde f^{(i+1)}$ onto the set W of nonnegative vectors
\[
W := \{ f \in \mathbb{R}^N \mid f \ge 0 \}. \tag{3.51}
\]
• Stop the iteration as soon as $f^{(i)}$ satisfies the Discrepancy Principle (respectively, SR2, SR3) or a prescribed number of iterations has been carried out.
[Figure 3.9, two panels at outer step 20: (a) Discrepancy principle: rel. err. 0.3592. (b) SR2: rel. err. 0.3302.]
Figure 3.9: The solutions of the satellite problem reconstructed by the Projected Restarted CGNE algorithm at the 20-th outer step.
An implementation of this scheme for the satellite test problem with τ =
1.001 leads to the results of Table 3.8. The relative errors obtained by arresting CGNE according to SR2 and SR3 are smaller than those obtained
by means of the Discrepancy Principle. We underline that the relative errors of the stopping rule SR2 are even lower than those obtained in [5] with
RRGMRES instead of CGNE. The regularized solutions obtained by this
projected restarted CGNE with the Discrepancy Principle and SR2 as stopping rules at the 20-th outer iteration step are shown in Figure 3.9.
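The restarted scheme can be illustrated with a compact NumPy sketch. It is not the code used for Table 3.8: the function names are hypothetical, only the Discrepancy Principle is implemented as stopping rule (SR2 and SR3 would replace the residual test), A is a dense matrix rather than an FFT-based operator, and each restart is assumed to begin from the projected iterate, since restarting from an iterate that already satisfies the stopping rule would perform no further CGNE steps.

```python
import numpy as np

def cgne(A, g, f0, tau, delta, max_iter=500):
    """CGNE (CGLS form) started at f0, stopped when ||g - A f|| <= tau * delta."""
    f = f0.copy()
    r = g - A @ f                 # residual of the original system
    s = A.T @ r                   # residual of the normal equations
    d = s.copy()                  # search direction
    gamma = s @ s
    steps = 0
    while steps < max_iter and np.linalg.norm(r) > tau * delta and gamma > 0.0:
        q = A @ d
        alpha = gamma / (q @ q)
        f = f + alpha * d
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        d = s + (gamma_new / gamma) * d
        gamma = gamma_new
        steps += 1
    return f, steps

def projected_restarted_cgne(A, g, tau, delta, max_outer=20):
    """Outer loop: run CGNE, project onto the nonnegative vectors, restart."""
    f = np.zeros(A.shape[1])
    total_steps = 0
    for _ in range(max_outer):
        if np.linalg.norm(g - A @ f) <= tau * delta:
            break                                   # projected iterate is accepted
        f_tilde, steps = cgne(A, g, f, tau, delta)  # assumption: restart from f
        total_steps += steps
        f = np.maximum(f_tilde, 0.0)                # projection onto W of (3.51)
    return f, total_steps

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((40, 20))
    f_true = np.abs(rng.standard_normal(20))
    e = 1e-2 * rng.standard_normal(40)
    f, total = projected_restarted_cgne(A, A @ f_true + e, 1.001, np.linalg.norm(e))
    print(total, np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```

The counter total_steps plays the role of the "total CGNE steps" entry of Table 3.8: it accumulates the inner iterations over all outer steps.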
Projected Restarted CGNE results for the satellite problem
Outer steps   Quantity           Discr. Princ.   SR2      SR3
2             total CGNE steps   27              45       33
              Relative Error     0.3644          0.3400   0.3499
5             total CGNE steps   37              74       47
              Relative Error     0.3614          0.3347   0.3465
10            total CGNE steps   46              108      64
              Relative Error     0.3601          0.3321   0.3446
20            total CGNE steps   57              162      86
              Relative Error     0.3592          0.3302   0.3432
50            total CGNE steps   60              274      120
              Relative Error     0.3590          0.3290   0.3422
200           total CGNE steps   60              532      129
              Relative Error     0.3590          0.3286   0.3420
Table 3.8: Comparison between different stopping rules of CGNE as inner iteration of the Projected restarted algorithm for the satellite problem.
Chapter 4
Tomography
The term tomography is derived from the Greek word τόμος, slice. It stands
for a variety of different techniques for imaging two-dimensional cross sections
of three-dimensional objects. The impact of these techniques in diagnostic
medicine has been revolutionary, since it has enabled doctors to view internal
organs with unprecedented precision and safety to the patient. We have
already seen in the first Chapter that these problems can be mathematically
described by means of the Radon Transform. The aim of this chapter is to
analyze the main properties of the Radon Transform and to give an overview
of the most important algorithms.
General references for this chapter are [19], [68], [69] and [43].
4.1 The classical Radon Transform
In this section we provide an outline of the main properties of the Radon Transform over hyperplanes of $\mathbb{R}^D$. A hyperplane H of $\mathbb{R}^D$ can be represented by an element of the unit cylinder
\[
C^D := \{ (\theta, s) \mid \theta \in S^{D-1}, \ s \in \mathbb{R} \}, \tag{4.1}
\]
via the formula
\[
H(\theta, s) = \{ x \in \mathbb{R}^D \mid \langle x, \theta \rangle = s \}, \tag{4.2}
\]
where $\langle \cdot, \cdot \rangle$ denotes the usual Euclidean inner product of $\mathbb{R}^D$ and where we identify $H(-\theta, -s)$ with $H(\theta, s)$. We denote the set of all hyperplanes of $\mathbb{R}^D$ by
\[
\Xi^D := C^D / \mathbb{Z}_2, \tag{4.3}
\]
and define the Radon Transform of a rapidly decreasing function $f \in \mathcal{S}(\mathbb{R}^D)$ as its integral on $H(\theta, s)$:
\[
\mathcal{R}[f](\theta, s) := \int_{H(\theta, s)} f(x) \, d\mu_H(x), \tag{4.4}
\]
where $\mu_H$ is the Lebesgue measure on $H(\theta, s)$.
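Much of the theory of the Radon Transform is based on its behavior under the Fourier Transform and convolution. We recall that the Fourier Transform of a function $f \in L^1(\mathbb{R}^D)$ is given by
\[
\hat f(y) = \mathcal{F}(f)(y) = (2\pi)^{-D/2} \int_{\mathbb{R}^D} f(x) e^{-\imath \langle x, y \rangle} \, dx, \qquad y \in \mathbb{R}^D. \tag{4.5}
\]
Observing that the exponential $e^{-\imath \langle x, y \rangle}$ is constant on hyperplanes orthogonal to y, an important relation between the Fourier and the Radon Transform is obtained by integrating (4.5) along such hyperplanes. Explicitly, we write $y = \xi\theta$ for $\xi \in \mathbb{R}$ and $\theta \in S^{D-1}$ to get the famous Projection-Slice Theorem:
\[
\begin{aligned}
\hat f(\xi\theta) &= (2\pi)^{-D/2} \int_{\mathbb{R}} \int_{H(\theta, s)} f(x) e^{-\imath \xi \langle x, \theta \rangle} \, d\mu_H(x) \, ds \\
&= (2\pi)^{-D/2} \int_{\mathbb{R}} \left( \int_{H(\theta, s)} f(x) \, d\mu_H(x) \right) e^{-\imath s \xi} \, ds \\
&= (2\pi)^{-D/2} \int_{\mathbb{R}} \mathcal{R}[f](\theta, s) e^{-\imath s \xi} \, ds.
\end{aligned}
\tag{4.6}
\]
This immediately implies that the operator $\mathcal{R}$ is injective on $\mathcal{S}(\mathbb{R}^D)$ (but also on larger spaces, e.g. on $L^1(\mathbb{R}^D)$).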
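The Projection-Slice Theorem (4.6) lends itself to a quick numerical check in the case D = 2. The sketch below is an illustration only: the function names, the quadrature grids and the anisotropic Gaussian test function are assumptions, chosen because the Fourier Transform of the Gaussian is known in closed form with the normalization of (4.5).

```python
import numpy as np

def radon_2d(f, theta, s, u):
    """R[f](theta, s) for D = 2: integrate f along the line <x, theta> = s.

    The line is parametrized as x = s*theta + u*theta_perp; the integral in u
    is approximated by a Riemann sum on the uniform grid u.
    """
    theta_perp = np.array([-theta[1], theta[0]])
    du = u[1] - u[0]
    x1 = s[:, None] * theta[0] + u[None, :] * theta_perp[0]
    x2 = s[:, None] * theta[1] + u[None, :] * theta_perp[1]
    return np.sum(f(x1, x2), axis=1) * du

def fourier_from_radon(f, theta, xi, half_width=10.0, n=1001):
    """Right-hand side of (4.6) for D = 2, i.e. with the factor (2 pi)^(-1)."""
    s = np.linspace(-half_width, half_width, n)
    ds = s[1] - s[0]
    Rf = radon_2d(f, theta, s, s)
    return np.sum(Rf * np.exp(-1j * s * xi)) * ds / (2.0 * np.pi)

def gaussian(x1, x2):
    """Test function f(x) = exp(-(x1^2 + 4 x2^2) / 2)."""
    return np.exp(-(x1**2 + 4.0 * x2**2) / 2.0)

def gaussian_hat(y1, y2):
    """Its Fourier Transform under (4.5): 0.5 * exp(-(y1^2 + y2^2 / 4) / 2)."""
    return 0.5 * np.exp(-(y1**2 + y2**2 / 4.0) / 2.0)

if __name__ == "__main__":
    theta = np.array([np.cos(0.7), np.sin(0.7)])
    xi = 1.3
    print(fourier_from_radon(gaussian, theta, xi),
          gaussian_hat(xi * theta[0], xi * theta[1]))
```

For a rapidly decreasing test function the two printed values should agree to many digits, since the Riemann sums converge very quickly on smooth, well-localized integrands.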
Moreover, let us introduce the space of Schwartz-class functions on $\Xi^D$. We say that a function $f \in C^\infty(C^D)$ belongs to $\mathcal{S}(C^D)$ if for every $k_1, k_2 \in \mathbb{N}_0$ we have
\[
\sup_{(\theta, s)} (1 + |s|)^{k_1} \left| \frac{\partial^{k_2}}{\partial s^{k_2}} f(\theta, s) \right| < +\infty.
\]
The space $\mathcal{S}(\Xi^D)$ is the space of the even functions in $\mathcal{S}(C^D)$, i.e. $f(\theta, s) = f(-\theta, -s)$. The Partial Fourier Transform of a function $f \in \mathcal{S}(\Xi^D)$ is its Fourier Transform in the s variable:
\[
f(\theta, s) \mapsto \hat f(\theta, \xi) = \mathcal{F}(s \mapsto f(\theta, s))(\xi) = \int_{\mathbb{R}} f(\theta, s) e^{-\imath \xi s} \, ds. \tag{4.7}
\]
On $\Xi^D$, we understand the convolution as acting on the second variable as well:
\[
(g * h)(\theta, s) = \int_{\mathbb{R}} g(\theta, s - t) h(\theta, t) \, dt. \tag{4.8}
\]
The Partial Fourier Transform maps $\mathcal{S}(C^D)$ into itself and $\mathcal{S}(\Xi^D)$ into itself, and the Projection-Slice Theorem states that the Fourier Transform of $f \in \mathcal{S}(\mathbb{R}^D)$ is the Partial Fourier Transform of its Radon Transform. For $f \in \mathcal{S}(\mathbb{R}^D)$, if we change to polar coordinates $(\theta, s) \mapsto s\theta$ in $\mathbb{R}^D$, a straightforward application of the chain rule shows that its Fourier Transform lies in $\mathcal{S}(C^D)$. By the Projection-Slice Theorem then
\[
\mathcal{R} : \mathcal{S}(\mathbb{R}^D) \longrightarrow \mathcal{S}(\Xi^D). \tag{4.9}
\]
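Another consequence of the Projection-Slice Theorem is that the Radon Transform preserves convolutions, in the sense that for every f and $g \in \mathcal{S}(\mathbb{R}^D)$ the following formula holds:
\[
\mathcal{R}(f * g)(\theta, s) = \int_{\mathbb{R}} \mathcal{R}[f](\theta, s - t) \, \mathcal{R}[g](\theta, t) \, dt. \tag{4.10}
\]
We now introduce the backprojection operator $\mathcal{R}^*$ by
\[
\mathcal{R}^*[g](x) = \int_{S^{D-1}} g(\theta, \langle x, \theta \rangle) \, d\theta, \qquad g \in \mathcal{S}(\Xi^D). \tag{4.11}
\]
For $g = \mathcal{R}f$, $\mathcal{R}^*[g](x)$ is the average of all hyperplane integrals of f through x. Mathematically speaking, $\mathcal{R}^*$ is simply the adjoint of $\mathcal{R}$: for $\phi \in \mathcal{S}(\mathbb{R})$ and $f \in \mathcal{S}(\mathbb{R}^D)$, there holds
\[
\int_{\mathbb{R}} \phi(s) \, \mathcal{R}[f](\theta, s) \, ds = \int_{\mathbb{R}^D} \phi(\langle x, \theta \rangle) f(x) \, dx \tag{4.12}
\]
and consequently for $g \in \mathcal{S}(\Xi^D)$ and $f \in \mathcal{S}(\mathbb{R}^D)$
\[
\int_{S^{D-1}} \int_{\mathbb{R}} g \, (\mathcal{R}f) \, ds \, d\theta = \int_{\mathbb{R}^D} (\mathcal{R}^* g) f \, dx. \tag{4.13}
\]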
A more general approach for studying the Radon Transform leading to the
same result can be found in the first chapters of [19].
The following result is the starting point for the Filtered Backprojection
algorithm, which will be discussed later.
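Theorem 4.1.1. Let $f \in \mathcal{S}(\mathbb{R}^D)$ and $\upsilon \in \mathcal{S}(\Xi^D)$. Then
\[
\mathcal{R}^*(\upsilon * \mathcal{R}f) = (\mathcal{R}^* \upsilon) * f. \tag{4.14}
\]
(Note the different meanings of the symbol ∗ in the formula!)
Proof. For any $x \in \mathbb{R}^D$, we have
\[
\begin{aligned}
\mathcal{R}^*(\upsilon * \mathcal{R}f)(x)
&= \int_{S^{D-1}} \int_{\mathbb{R}} \upsilon(\theta, \langle \theta, x \rangle - s) \, \mathcal{R}[f](\theta, s) \, ds \, d\theta \\
&= \int_{S^{D-1}} \int_{\mathbb{R}} \upsilon(\theta, \langle \theta, x \rangle - s) \int_{H(\theta, s)} f(y) \, d\mu_H(y) \, ds \, d\theta \\
&= \int_{S^{D-1}} \int_{\mathbb{R}^D} \upsilon(\theta, \langle \theta, x - y \rangle) f(y) \, dy \, d\theta \\
&= \int_{\mathbb{R}^D} \left( \int_{S^{D-1}} \upsilon(\theta, \langle \theta, x - y \rangle) \, d\theta \right) f(y) \, dy \\
&= \int_{\mathbb{R}^D} \mathcal{R}^* \upsilon(x - y) f(y) \, dy \\
&= ((\mathcal{R}^* \upsilon) * f)(x).
\end{aligned}
\tag{4.15}
\]
4.1.1 The inversion formula
We are now ready to derive the inversion formula for the Radon Transform. The proof is basically taken from [19], but here, apart from the different notations and definitions, some small errors in the resulting constants have been corrected. We state the general theorem exactly as in [69].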
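We start from the inversion formula of the classical Fourier Transform in $\mathbb{R}^D$
\[
(2\pi)^{D/2} f(x) = \int_{\mathbb{R}^D} \hat f(y) e^{\imath \langle x, y \rangle} \, dy \tag{4.16}
\]
and switch to polar coordinates $y = \xi\theta$, obtaining
\[
\begin{aligned}
(2\pi)^{D/2} f(x)
&= \int_{S^{D-1}} \int_0^{+\infty} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \, d\xi \, d\theta \\
&= \frac{1}{2} \int_{S^{D-1}} \int_0^{+\infty} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \, d\xi \, d\theta
 + \frac{1}{2} \int_{S^{D-1}} \int_0^{+\infty} \hat f(\xi(-\theta)) e^{\imath \xi \langle x, -\theta \rangle} \xi^{D-1} \, d\xi \, d\theta \\
&= \frac{1}{2} \int_{S^{D-1}} \int_0^{+\infty} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \, d\xi \, d\theta
 + \frac{1}{2} \int_{S^{D-1}} \int_{-\infty}^{0} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} (-\xi)^{D-1} \, d\xi \, d\theta \\
&= \frac{1}{2} \int_{S^{D-1}} \int_{-\infty}^{+\infty} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} |\xi|^{D-1} \, d\xi \, d\theta.
\end{aligned}
\tag{4.17}
\]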
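If D is odd, $|\xi|^{D-1} = \xi^{D-1}$, thus the Projection-Slice Theorem and the properties of the Fourier Transform in one variable imply
\[
\begin{aligned}
f(x) &= \frac{1}{2(2\pi)^D} \int_{S^{D-1}} \int_{-\infty}^{+\infty} e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \left( \int_{\mathbb{R}} \mathcal{R}[f](\theta, s) e^{-\imath s \xi} \, ds \right) d\xi \, d\theta \\
&= c_D \int_{S^{D-1}} \mathcal{F}^{-1} \left( \xi \mapsto \mathcal{F} \left( s \mapsto \frac{\partial^{D-1}}{\partial s^{D-1}} \mathcal{R}[f](\theta, s) \right)(\xi) \right) (\langle x, \theta \rangle) \, d\theta \\
&= c_D \, \mathcal{R}^* \left( \frac{\partial^{D-1}}{\partial s^{D-1}} \mathcal{R}f \right)(x),
\end{aligned}
\tag{4.18}
\]
where
\[
c_D := \frac{2\pi}{2 \, \imath^{D-1} (2\pi)^D} = \frac{1}{2} (2\pi)^{1-D} (-1)^{\frac{D-1}{2}}. \tag{4.19}
\]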
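Suppose now D is even. To obtain a complete inversion formula, we recall a few facts from the theory of distributions (cf. [19] and, as a general reference for the theory of distributions, [80]).
1. The linear mapping p.v.[1/t] from $\mathcal{S}(\mathbb{R})$ to $\mathbb{C}$ defined by
\[
\mathrm{p.v.}[1/t] \, h = \lim_{\epsilon \to 0^+} \int_{|t| > \epsilon} \frac{h(t)}{t} \, dt \tag{4.20}
\]
is well defined (although 1/t is not locally integrable) and belongs to the dual space of $\mathcal{S}(\mathbb{R})$; that is to say, it is a tempered distribution on $\mathbb{R}$.
2. The signum function on $\mathbb{R}$ defined by
\[
\mathrm{sgn}(\xi) = \begin{cases} 1 & \text{if } \xi \ge 0, \\ -1 & \text{if } \xi < 0 \end{cases} \tag{4.21}
\]
is a tempered distribution on $\mathbb{R}$ as well, whose (distributional) Fourier Transform is related to p.v.[1/t] via the formula
\[
\mathcal{F}(\mathrm{p.v.}[1/t])(\xi) = -\sqrt{\frac{\pi}{2}} \, \imath \, \mathrm{sgn}(\xi). \tag{4.22}
\]
The Hilbert Transform of a function $\phi \in \mathcal{S}(\mathbb{R})$ is the convolution $\mathcal{H}\phi := \phi * \frac{1}{\pi} \mathrm{p.v.}[1/t]$, i.e.
\[
\mathcal{H}[\phi](p) = \lim_{\epsilon \to 0^+} \int_{|t| > \epsilon} \frac{\phi(p - t)}{\pi t} \, dt, \qquad p \in \mathbb{R}. \tag{4.23}
\]
As a consequence, the Fourier Transform of $\mathcal{H}\phi$ is the function
\[
\mathcal{F}[\mathcal{H}\phi](\xi) = -\imath \, \hat\phi(\xi) \, \mathrm{sgn}(\xi). \tag{4.24}
\]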
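The multiplier form (4.24) is what makes the Hilbert Transform cheap to apply in practice. As an illustration, the following sketch applies the symbol −ı sgn(ξ) with the FFT to samples of a periodic signal; this periodic setting is an assumption of the example (cos and sin are not in S(R)), the function name is hypothetical, and the zero frequency is mapped to zero because np.sign(0) = 0, which differs from the convention sgn(0) = 1 of (4.21) only on the mean value of the signal.

```python
import numpy as np

def hilbert_transform_periodic(phi_samples):
    """Apply the Fourier multiplier -i*sgn(xi) to uniformly sampled periodic data."""
    n = phi_samples.size
    xi = np.fft.fftfreq(n)            # only the sign of each frequency is used
    multiplier = -1j * np.sign(xi)    # symbol of the Hilbert Transform, cf. (4.24)
    return np.real(np.fft.ifft(multiplier * np.fft.fft(phi_samples)))

if __name__ == "__main__":
    n = 64
    t = 2.0 * np.pi * np.arange(n) / n
    # With this sign convention cos is mapped to sin and sin to -cos.
    print(np.max(np.abs(hilbert_transform_periodic(np.cos(t)) - np.sin(t))))
    print(np.max(np.abs(hilbert_transform_periodic(np.sin(t)) + np.cos(t))))
```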
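Now we return to the right-hand side of (4.17), which, for D even, is equal to
\[
\frac{1}{2} \int_{S^{D-1}} \int_{-\infty}^{+\infty} \hat f(\xi\theta) e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \mathrm{sgn}(\xi) \, d\xi \, d\theta.
\]
We proceed similarly to the odd case and using (4.24) we have
\[
\begin{aligned}
f(x) &= \frac{1}{2(2\pi)^D} \int_{S^{D-1}} \int_{-\infty}^{+\infty} e^{\imath \xi \langle x, \theta \rangle} \xi^{D-1} \mathrm{sgn}(\xi) \left( \int_{\mathbb{R}} \mathcal{R}[f](\theta, s) e^{-\imath s \xi} \, ds \right) d\xi \, d\theta \\
&= c_D \int_{S^{D-1}} \mathcal{F}^{-1} \left( \xi \mapsto \mathcal{F} \left( \mathcal{H} \left( s \mapsto \frac{\partial^{D-1}}{\partial s^{D-1}} \mathcal{R}[f](\theta, s) \right) \right)(\xi) \right) (\langle x, \theta \rangle) \, d\theta \\
&= c_D \, \mathcal{R}^* \left( \mathcal{H} \left( \frac{\partial^{D-1}}{\partial s^{D-1}} \mathcal{R}[f] \right) \right)(x),
\end{aligned}
\tag{4.25}
\]
where
\[
c_D := \frac{2\pi}{2 \, \imath^{D-2} (2\pi)^D} = \frac{1}{2} (2\pi)^{1-D} (-1)^{\frac{D-2}{2}}. \tag{4.26}
\]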
Altogether, we have proved the following theorem.
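Theorem 4.1.2. Let $f \in \mathcal{S}(\mathbb{R}^D)$ and let $g = \mathcal{R}f$. Then
\[
f = c_D \begin{cases} \mathcal{R}^* \mathcal{H} g^{(D-1)} & D \text{ even}, \\ \mathcal{R}^* g^{(D-1)} & D \text{ odd}, \end{cases} \tag{4.27}
\]
with
\[
c_D = \frac{1}{2} (2\pi)^{1-D} \begin{cases} (-1)^{\frac{D-2}{2}} & D \text{ even}, \\ (-1)^{\frac{D-1}{2}} & D \text{ odd}, \end{cases} \tag{4.28}
\]
where the derivatives in $\Xi^D$ are intended in the second variable.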
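Since the constants are easy to get wrong, a short evaluation of (4.28) may be useful. The helper below is hypothetical (it does not come from the thesis) and simply encodes the two cases of the theorem; for example it returns 1/(4π) for D = 2 and −1/(8π²) for D = 3.

```python
import math

def radon_inversion_constant(D):
    """Constant c_D of (4.28) for the inversion of the Radon Transform in R^D."""
    if D < 2:
        raise ValueError("the dimension D must be at least 2")
    # Exponent of (-1): (D-2)/2 for even D, (D-1)/2 for odd D.
    exponent = (D - 2) // 2 if D % 2 == 0 else (D - 1) // 2
    return 0.5 * (2.0 * math.pi) ** (1 - D) * (-1) ** exponent

if __name__ == "__main__":
    for D in (2, 3, 4, 5):
        print(D, radon_inversion_constant(D))
```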
We conclude the section with a remark on the inversion formula from [69]. For D odd, equation (4.18) says that f(x) is simply an average of $g^{(D-1)}$ over all hyperplanes through x. Thus, in order to reconstruct f at some point x, one needs only the integrals of f over hyperplanes through x. This is not true for D even. In fact, inserting the definition of the Hilbert Transform into (4.25) and changing the order of integration, we obtain
\[
f(x) = \lim_{\epsilon \to 0^+} c_D \int_{|t| > \epsilon} \frac{1}{t} \int_{S^{D-1}} g^{(D-1)}(\theta, \langle x, \theta \rangle - t) \, d\theta \, dt \tag{4.29}
\]
(the equality with the right-hand side of formula (4.25) is guaranteed because g is a rapidly decreasing function), from which we can see that the computation of f at some point x requires integrals of f also over hyperplanes far away from x.
4.1.2 Filtered backprojection
Rather than using the inversion formula described above, the most common method for reconstructing X-ray images is the method of the Filtered Backprojection. Its main advantage is the ability to cancel out high frequency noise.
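Let $\upsilon \in \mathcal{S}(\Xi^D)$. There is a constant C such that $|\upsilon(\xi)| \le C$ for all $\xi \in \Xi^D$ and so the definition of $\mathcal{R}^*$ implies that also $|\mathcal{R}^* \upsilon(x)| \le C$ for all x. Thus $\mathcal{R}^* \upsilon$ is a tempered distribution. Moreover, in [19] it is shown that the following relation between $\mathcal{R}^* \upsilon$ and $\hat\upsilon$ holds for all $y \ne 0$ in $\mathbb{R}^D$:
\[
\mathcal{F}(\mathcal{R}^* \upsilon)(y) = 2 (2\pi)^{(D-1)/2} \|y\|^{1-D} \, \hat\upsilon\left( \frac{y}{\|y\|}, \|y\| \right). \tag{4.30}
\]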
The starting point of the Filtered Backprojection algorithm is Theorem 4.1.1:
we put in formula (4.14) g = Rf and V = R ∗ υ obtaining
V ∗ f = R ∗ (υ ∗ g).
(4.31)
The idea is to choose V as an approximation of the Dirac δ-function and to
determine υ from V = R ∗ υ. Then V ∗ f is an approximation of the sought
function f that is calculated in the right-hand side by backprojecting the
convolution of υ with the data g.
Usually only radially symmetric functions V are chosen, i.e. $V(y) = V(\|y\|)$. Then $\upsilon = \upsilon(\theta, s)$ can be assumed to be an even function of s only, and formula (4.30) now reads
\[
\mathcal{F}(\mathcal{R}^* \upsilon)(y) = 2 (2\pi)^{(D-1)/2} \|y\|^{1-D} \, \hat\upsilon(\|y\|). \tag{4.32}
\]
Now we choose V as a band-limited function by allowing a filter factor $\hat\phi(y)$ which is close to 1 for $\|y\| \le 1$ and which vanishes for $\|y\| > 1$, putting
\[
\hat V_\Upsilon(y) := (2\pi)^{-D/2} \hat\phi(\|y\| / \Upsilon), \qquad \Upsilon > 0. \tag{4.33}
\]
Then the corresponding function $\upsilon_\Upsilon$ such that $\mathcal{R}^* \upsilon_\Upsilon = V_\Upsilon$ satisfies
\[
\hat\upsilon_\Upsilon(\xi) = \frac{1}{2} (2\pi)^{1/2 - D} \xi^{D-1} \hat\phi(\xi / \Upsilon), \qquad \xi > 0 \tag{4.34}
\]
(note that $\hat\upsilon_\Upsilon$ is an even function, being the Fourier Transform of an even function).
Many choices of $\hat\phi$ can be found in the literature. We mention the choice proposed by Shepp and Logan in [84], where
\[
\hat\phi(\xi) := \begin{cases} \mathrm{sinc}(\xi\pi/2), & 0 \le \xi \le 1, \\ 0, & \xi > 1, \end{cases} \tag{4.35}
\]
where sinc(t) is equal to sin(t)/t if $t \ne 0$, and 1 otherwise.
Once υ has been chosen, the right-hand side of equation (4.14) has to be
calculated to obtain the approximation V ∗ f of f . This has to be done in
a discrete setting, and the discretization of (4.14) depends on the way the
function g is sampled: different samplings lead to different algorithms. Since
we are going to concentrate on iterative methods, we will not proceed in the
description of these algorithms and for a detailed treatment of this argument
we refer to [69].
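Although the discretized algorithms are not developed here, the filter itself is easy to evaluate. The sketch below computes the Shepp-Logan filter factor (4.35) and the corresponding filter (4.34); the function names and the default D = 2 are assumptions made for the example. Note that NumPy's np.sinc is the normalized function sin(πx)/(πx), so that sinc(ξπ/2) in the notation of (4.35) corresponds to np.sinc(ξ/2).

```python
import numpy as np

def shepp_logan_factor(xi):
    """Filter factor (4.35): sinc(xi*pi/2) on [0, 1], zero for xi > 1 (xi >= 0 expected)."""
    xi = np.asarray(xi, dtype=float)
    return np.where((xi >= 0.0) & (xi <= 1.0), np.sinc(xi / 2.0), 0.0)

def filter_hat(xi, cutoff, D=2):
    """Filter (4.34) with cutoff frequency Upsilon = cutoff, extended evenly to xi < 0."""
    a = np.abs(np.asarray(xi, dtype=float))
    return 0.5 * (2.0 * np.pi) ** (0.5 - D) * a ** (D - 1) * shepp_logan_factor(a / cutoff)

if __name__ == "__main__":
    xi = np.linspace(-12.0, 12.0, 7)
    print(filter_hat(xi, cutoff=10.0))
```

For D = 2 the filter grows like |ξ| at low frequencies (the ramp filter) and is damped by the sinc factor up to the cutoff Υ, beyond which it vanishes; this is how the method suppresses high frequency noise.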
4.2 The Radon Transform over straight lines
In the previous section we studied the Radon Transform, which integrates functions on $\mathbb{R}^D$ over hyperplanes. One can also consider an integration over d-planes, with d = 1, ..., D − 1. In this case, $\Xi^D$ is replaced by the set of unoriented affine d-planes in $\mathbb{R}^D$, the affine Grassmannian G(d, D). For simplicity, here we will consider only the case d = 1: the corresponding transform is the so-called X-Ray Transform or just the Ray Transform.
We identify a straight line L of G(1, D) with a direction $\theta \in S^{D-1}$ and a point $s \in \theta^\perp$ as $\{ s + t\theta, \ t \in \mathbb{R} \}$ and define the X-Ray Transform $\mathcal{P}$ by
\[
\mathcal{P}[f](\theta, s) = \int_{\mathbb{R}} f(s + t\theta) \, dt. \tag{4.36}
\]
Similarly to the case d = D − 1 we have a Projection-Slice Theorem as follows. For $f \in L^1(\mathbb{R}^D)$ and $y \in \mathbb{R}^D$ let $L = L(\theta, s) \in G(1, D)$ be a straight line such that y lies in $\theta^\perp$. Then
\[
\begin{aligned}
\hat f(y) &= (2\pi)^{-D/2} \int_{\mathbb{R}^D} f(x) e^{-\imath \langle x, y \rangle} \, dx \\
&= (2\pi)^{-D/2} \int_{\theta^\perp} \int_{\mathbb{R}} f(s + t\theta) e^{-\imath \langle s + t\theta, y \rangle} \, dt \, d\mu_{\theta^\perp}(s) \\
&= (2\pi)^{-D/2} \int_{\theta^\perp} \mathcal{P}f(\theta, s) e^{-\imath \langle s, y \rangle} \, d\mu_{\theta^\perp}(s).
\end{aligned}
\tag{4.37}
\]
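Thus $\mathcal{P}f$ is a function on $T^D := \{ (\theta, s) \mid \theta \in S^{D-1}, \ s \in \theta^\perp \}$ and $f \in \mathcal{S}(\mathbb{R}^D)$ implies $\mathcal{P}f \in \mathcal{S}(T^D)$, where
\[
\mathcal{S}(T^D) = \left\{ g \in C^\infty : \ \sup_{(\theta, s)} (1 + |s|)^{k_1} \left| \frac{\partial^{k_2}}{\partial s^{k_2}} g(\theta, s) \right| < +\infty \right\}. \tag{4.38}
\]
On $T^D$, the convolution and the Partial Fourier Transform are defined by
\[
(g * h)(\theta, s) = \int_{\theta^\perp} g(\theta, s - t) h(\theta, t) \, d\mu_{\theta^\perp}(t), \qquad s \in \theta^\perp, \tag{4.39}
\]
\[
\hat g(\theta, \xi) = (2\pi)^{(1-D)/2} \int_{\theta^\perp} e^{-\imath \langle s, \xi \rangle} g(\theta, s) \, d\mu_{\theta^\perp}(s), \qquad \xi \in \theta^\perp. \tag{4.40}
\]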
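Theorem 4.2.1. For $f, g \in \mathcal{S}(\mathbb{R}^D)$, we have
\[
\mathcal{P}f * \mathcal{P}g = \mathcal{P}(f * g). \tag{4.41}
\]
As in the case d = D − 1, the convolutions on $\mathbb{R}^D$ and on $T^D$ are denoted by the same symbol in the theorem.
The backprojection operator $\mathcal{P}^*$ is now
\[
\mathcal{P}^*[g](x) = \int_{S^{D-1}} g(\theta, E_\theta x) \, d\theta, \tag{4.42}
\]
where $E_\theta$ is the orthogonal projector onto $\theta^\perp$, i.e. $E_\theta x = x - \langle x, \theta \rangle \theta$. Again it is the adjoint of $\mathcal{P}$:
\[
\int_{S^{D-1}} \int_{\theta^\perp} g \, \mathcal{P}f \, d\mu_{\theta^\perp}(s) \, d\theta = \int_{\mathbb{R}^D} f \, \mathcal{P}^* g \, dx. \tag{4.43}
\]
There is also an analogous version of Theorem 4.1.1:
Theorem 4.2.2. Let $f \in \mathcal{S}(\mathbb{R}^D)$ and $g \in \mathcal{S}(T^D)$. Then
\[
\mathcal{P}^*(g * \mathcal{P}f) = (\mathcal{P}^* g) * f. \tag{4.44}
\]
From (4.37) and (4.40) it follows immediately that for $f \in \mathcal{S}(\mathbb{R}^D)$, $\theta \in S^{D-1}$ and $\xi \perp \theta$ there holds
\[
\widehat{(\mathcal{P}f)}(\theta, \xi) = (2\pi)^{1/2} \hat f(\xi). \tag{4.45}
\]
This is already enough to state the following uniqueness result in $\mathbb{R}^3$.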
Theorem 4.2.3. Let $S_0^2$ be a subset of $S^2$ that meets every equatorial circle of $S^2$ (Orlov's condition, 1976). Then the knowledge of $\mathcal{P}f$ for $\theta \in S_0^2$ determines f uniquely.
Proof. For every $\xi \in \mathbb{R}^3$, since $S_0^2$ satisfies Orlov's condition, there exists an element $\theta \in S_0^2$ such that $\theta \perp \xi$. Hence $\hat f(\xi)$ is determined by (4.45).
An explicit inversion formula on $S_0^2$ was found by Orlov himself in 1976 (see Natterer [69] for a proof).
In spherical coordinates,
\[
x = \begin{pmatrix} \cos\chi \cos\vartheta \\ \sin\chi \cos\vartheta \\ \sin\vartheta \end{pmatrix}, \qquad 0 \le \chi < 2\pi, \quad |\vartheta| \le \frac{\pi}{2}, \tag{4.46}
\]
$S_0^2$ is given by $\vartheta_-(\chi) \le \vartheta \le \vartheta_+(\chi)$, $0 \le \chi < 2\pi$, where $\vartheta_\pm$ are functions such that $-\frac{\pi}{2} < \vartheta_-(\chi) < 0 < \vartheta_+(\chi) < \frac{\pi}{2}$, $0 \le \chi < 2\pi$.
Now, let l(x, y) be the length of the intersection of $S_0^2$ with the plane spanned by x and $y \in \mathbb{R}^3$. According to the assumption made on $\vartheta_\pm$, l(x, y) > 0 if x and y are linearly independent.
Theorem 4.2.4 (Orlov's inversion formula). Let $f \in \mathcal{S}(\mathbb{R}^3)$ and $g(\theta, s) = \mathcal{P}[f](\theta, s)$ for $\theta \in S_0^2$ and $s \perp \theta$. Then
\[
f(x) = \triangle \int_{S_0^2} h(\theta, E_\theta x) \, d\theta, \tag{4.47}
\]
where
\[
h(\theta, s) = -\frac{1}{4\pi^2} \int_{\theta^\perp} \frac{g(\theta, s - t)}{\|t\| \, l(\theta, t)} \, d\mu_{\theta^\perp}(t) \tag{4.48}
\]
and $\triangle$ is the Laplace operator on $\mathbb{R}^3$.
4.2.1 The Cone Beam Transform
We define the Cone Beam Transform of a density function $f \in \mathcal{S}(\mathbb{R}^3)$ by
\[
\mathcal{D}[f](x, \theta) = \int_0^{+\infty} f(x + t\theta) \, dt, \qquad x \in \mathbb{R}^3, \ \theta \in S^2. \tag{4.49}
\]
When x is the location of an X-ray source traveling along a curve Γ, the operator $\mathcal{D}$ is usually called the Cone Beam Transform of f along the source curve Γ. We want to invert $\mathcal{D}$ in this particular case, which is of interest in the applications.
We start by considering the case where Γ is the unit circle in the $x_1$-$x_2$ plane. A point x on Γ is expressed as $x = (\cos\phi, \sin\phi, 0)^*$, with $x^\perp = (-\sin\phi, \cos\phi, 0)^*$. An element $\theta \in S^2 \setminus \{x\}$ corresponds to a couple $(y_2, y_3)^* \in \mathbb{R}^2$ by taking the intersection of the beam through θ and x with the plane spanned by $x^\perp$ and $e_3 := (0, 0, 1)^*$ passing through −x (in other words, $y_2$ and $y_3$ are just the coordinates of the stereographic projection of θ from the projection point x). Thus, if f vanishes outside the unit ball of $\mathbb{R}^3$, we have
\[
\mathcal{D}[f](\phi, y_2, y_3) = \int_0^{+\infty} f((1 - t)x + t(y_2 x^\perp + y_3 e_3)) \, dt. \tag{4.50}
\]
We also define the Mellin Transform of a function h on $(0, +\infty)$ by
\[
\mathcal{M}[h](s) = \int_0^{+\infty} t^{s-1} h(t) \, dt. \tag{4.51}
\]
Then in [69] it is shown that performing a Mellin Transform of g and f with respect to $y_3$ and $x_3$ respectively and then expanding the results in Fourier series with respect to φ one obtains the equations
\[
g_l(y_2, s) = \int_0^{+\infty} f_l\left( \sqrt{(1 - t)^2 + t^2 y_2^2}, \ s \right) e^{-\imath l \alpha(t, y_2)} \, dt, \qquad l \in \mathbb{Z}, \tag{4.52}
\]
where $\alpha(t, y_2)$ is the argument of the point $(1 - t, t y_2)$ in the $x_1$-$x_2$ plane. Unfortunately an explicit solution to this equation does not exist and the entire procedure seems rather expensive from a computational point of view. This is one reason why different source paths are usually preferred nowadays.
An explicit inversion formula was found by Tuy in 1983 (cf. [93]). It applies to paths satisfying the following condition:
Definition 4.2.1 (Tuy's condition). Let Γ = Γ(t), t ∈ [0, 1], be a parametric curve in $\mathbb{R}^3$. Γ is said to satisfy Tuy's condition if it intersects each plane hitting supp(f) transversally, i.e. if for each x ∈ supp(f) and each $\theta \in S^2$ there exists t = t(x, θ) ∈ [0, 1] such that
\[
\langle \Gamma(t), \theta \rangle = \langle x, \theta \rangle, \qquad \langle \Gamma'(t), \theta \rangle \ne 0. \tag{4.53}
\]
Theorem 4.2.5. Suppose that the source curve satisfies Tuy's condition. Then
\[
f(x) = (2\pi)^{-3/2} \, \imath^{-1} \int_{S^2} (\langle \Gamma'(t), \theta \rangle)^{-1} \frac{d}{dt} \widehat{(\mathcal{D}f)}(\Gamma(t), \theta) \, d\theta, \tag{4.54}
\]
where t = t(x, θ) and where the Fourier Transform is performed only with respect to the first variable.
Of course, Tuy’s condition doesn’t hold for the circular path described
above since it lies entirely on a plane. In the pursuit of the data sufficiency
condition, various scanning trajectories have been proposed, such as circle
and line, circle plus arc, double orthogonal circles, dual ellipses and helix
(see [50] and the references therein). Of particular interest is the helical
trajectory, for which in a series of papers from 2002 to 2004 ([56], [57] and
[58]) the Russian mathematician A. Katsevich found an inversion formula
strictly related to the Filtered Backprojection algorithm. Although we won’t
investigate the implementation of these algorithms, we dedicate the following section to the basic concepts developed in those papers since they are
considered a breakthrough by many experts in the field.
4.2.2 Katsevich's inversion formula
In the description of Katsevich's inversion formula we follow [97] and [98].
First of all, the source lies on a helical trajectory defined by
\[
\Gamma(t) = \left( R\cos(t), \ R\sin(t), \ P\frac{t}{2\pi} \right)^*, \qquad t \in I, \tag{4.55}
\]
where R > 0 is the radius of the helix, P > 0 is the helical pitch and I := [a, b], b > a. In medical applications, the helical path is obtained by translating the platform where the patient lies through the rotating source-detector gantry. Thus the pitch of the helix is the displacement of the patient table per source turn. We assume that the support Ω of the function $f \in C^\infty(\mathbb{R}^3)$ to be reconstructed lies strictly inside the helix, i.e. there exists a cylinder
\[
U := \{ x \in \mathbb{R}^3 \mid x_1^2 + x_2^2 < r \}, \qquad 0 < r < R, \tag{4.56}
\]
such that Ω ⊆ U.
To understand the statement of Katsevich’s formula we introduce the notions
of π-line and Tam-Danielsson window.
A π-line is any line segment that connects two points on the helix which are separated by less than one helical turn (see Figure 4.1). It can be shown (cf. [11]) that for every point x inside the helix, there is a unique π-line through x. Let $I_\pi(x) = [t_0(x), t_1(x)]$ be the parametric interval corresponding to the unique π-line passing through x. In particular, $\Gamma(t_0)$ and $\Gamma(t_1)$ are the endpoints of the π-line which lie on the helix. By definition, we have $t_1 - t_0 < 2\pi$.
Figure 4.1: The π-line of a helix: in the figure, $y(s_b)$ and $y(s_t)$ correspond to $\Gamma(t_0)$ and $\Gamma(t_1)$ respectively.
The region on the detector plane bounded above and below by the projections of a helix segment onto the detector plane when viewed from Γ(t) is called the Tam-Danielsson window in the literature (cf. Figure 4.2). Now, consider the ray passing through Γ(t) and x. Let the intersection of this ray with the detector plane be denoted by $\bar x$. Tam et al. in [89] and Danielsson et al. in [11] showed that if $\bar x$ lies inside the Tam-Danielsson window for every $t \in I_\pi$, then f(x) may be reconstructed exactly.
Figure 4.2: The Tam-Danielsson window: a(λ) in the figure corresponds to Γ(t) in our notation.
We define a κ-plane to be any plane that has three intersections with the helix such that one intersection is halfway between the two others. Denote the κ-plane which intersects the helix at the three points Γ(t), Γ(t + ψ) and Γ(t + 2ψ) by κ(t, ψ), ψ ∈ (−π/2, π/2). The κ(t, ψ)-plane is spanned by the vectors $\nu_1(t, \psi) = \Gamma(t + \psi) - \Gamma(t)$ and $\nu_2(t, \psi) = \Gamma(t + 2\psi) - \Gamma(t)$ and the unit normal vector to the κ(t, ψ)-plane is
\[
n(t, \psi) := \mathrm{sgn}(\psi) \frac{\nu_1(t, \psi) \times \nu_2(t, \psi)}{\|\nu_1(t, \psi) \times \nu_2(t, \psi)\|}, \qquad \psi \in (-\pi/2, \pi/2), \tag{4.57}
\]
where the symbol × stands for the cross product in $\mathbb{R}^3$. Katsevich [58] proved that for a given x, the κ-plane through x with ψ ∈ (−π/2, π/2) is uniquely determined if the projection $\bar x$ onto the detector plane lies in the Tam-Danielsson window. A κ-line is the line of intersection of the detector plane and a κ-plane, so if $\bar x$ lies in the Tam-Danielsson window, there is a unique κ-line. We denote the unit vector from Γ(t) toward x by
\[
\beta(t, x) = \frac{x - \Gamma(t)}{\|x - \Gamma(t)\|}. \tag{4.58}
\]
For a generic $\alpha \in S^2$, let m(t, α) be the unit normal vector for the plane κ(t, ψ) with the smallest |ψ| value that contains the line of direction α which passes through Γ(t), and put e(t, x) := β(t, x) × m(t, β). Then β(t, x) and e(t, x) span the κ-plane that we will want to use for the reconstruction of f in x. Any direction in the plane may be expressed as
\[
\theta(t, x, \gamma) = \cos(\gamma) \beta(t, x) + \sin(\gamma) e(t, x), \qquad \gamma \in [0, 2\pi). \tag{4.59}
\]
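The geometric objects introduced so far are simple to compute. The following sketch evaluates the helix (4.55), the unit normal (4.57) of a κ-plane and the unit vector (4.58); the function names and the sample values of R, P, t, ψ and x are assumptions for illustration, and the selection of the vector m(t, α) with the smallest |ψ| is deliberately left out.

```python
import numpy as np

def helix(t, R, P):
    """Source position Gamma(t) on the helix (4.55) with radius R and pitch P."""
    return np.array([R * np.cos(t), R * np.sin(t), P * t / (2.0 * np.pi)])

def kappa_normal(t, psi, R, P):
    """Unit normal n(t, psi) of the kappa-plane through Gamma(t), Gamma(t+psi), Gamma(t+2 psi)."""
    if psi == 0.0:
        raise ValueError("psi must be nonzero: the three points would coincide")
    nu1 = helix(t + psi, R, P) - helix(t, R, P)
    nu2 = helix(t + 2.0 * psi, R, P) - helix(t, R, P)
    n = np.cross(nu1, nu2)
    return np.sign(psi) * n / np.linalg.norm(n)

def beta(t, x, R, P):
    """Unit vector (4.58) pointing from the source Gamma(t) toward the point x."""
    d = np.asarray(x, dtype=float) - helix(t, R, P)
    return d / np.linalg.norm(d)

if __name__ == "__main__":
    R, P = 3.0, 1.0                       # hypothetical radius and pitch
    n = kappa_normal(0.4, 0.3, R, P)
    b = beta(0.4, [0.5, -0.2, 0.1], R, P)
    print(n, b)
```

By construction the normal is orthogonal to both spanning vectors ν1 and ν2, which gives a convenient sanity check for an implementation.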
We can now state Katsevich’s result as follows.
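Theorem 4.2.6 (Katsevich). For f ∈ C_0^∞(U),
\[
f(x) = -\frac{1}{2\pi^2} \int_{I_\pi(x)} \frac{1}{\|x - \Gamma(t)\|}\ \mathrm{p.v.}\!\int_0^{2\pi} \frac{\partial}{\partial q} Df(\Gamma(q), \theta(t, x, \gamma))\Big|_{q=t}\, \frac{d\gamma}{\sin\gamma}\, dt, \tag{4.60}
\]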
where p.v. stands for the principal value integral and where all the objects
appearing in the formula are defined as above.
Proof. See [56], [57], [58].
For a fixed x, consider the κ-plane with unit normal m(t, β(t, x)). We
consider a generic line in this plane with direction θ_0(t, x) = cos(γ_0)β(t, x) +
sin(γ_0)e(t, x), γ_0 ∈ [0, 2π) and define
\[
g'(t, \theta_0(t, x)) := \frac{\partial}{\partial q} Df(\Gamma(q), \theta_0(t, x))\Big|_{q=t} \tag{4.61}
\]
and
\[
g^F(t, \theta_0(t, x)) := \mathrm{p.v.}\!\int_0^{2\pi} \frac{1}{\pi \sin\gamma}\, g'\big(t, \cos(\gamma_0 - \gamma)\beta(t, x) - \sin(\gamma_0 - \gamma)e(t, x)\big)\, d\gamma. \tag{4.62}
\]
Thus Katsevich’s formula can be rewritten as
\[
f(x) = -\frac{1}{2\pi} \int_{I_\pi(x)} \frac{1}{\|x - \Gamma(t)\|}\, g^F(t, \beta(t, x))\, dt. \tag{4.63}
\]
Therefore, we see that Katsevich’s formula may be implemented as a derivative, followed by a 1D convolution, and then a back-projection: for this reason
it is usually described as a Filtered Backprojection-type formula. Further details for the implementation of Katsevich’s formula can be found, e.g., in [97]
and [98].
4.3 Spectral properties of the integral operator
In this section we study the spectral properties of the operator R defined by
(4.4).
We consider the polynomials of degree k orthogonal with respect to the weight
function t^l in [0, 1] and denote them by P_{k,l} = P_{k,l}(t). Similarly to what we
have already seen in Chapter 2, the polynomials P_{k,l} are well defined for every
k and l ∈ N_0 and the orthogonality property means that
\[
\int_0^1 t^l P_{k_1,l}(t) P_{k_2,l}(t)\, dt = \delta_{k_1,k_2}, \tag{4.64}
\]
where δ_{k_1,k_2} = 1 if k_1 = k_2 and 0 otherwise. In fact, up to a normalization, they
are the Jacobi polynomials G_k(l + (D − 2)/2, l + (D − 2)/2, t) (cf. Abramowitz
and Stegun [1], formula 22.2.2).
We shall also need the Gegenbauer polynomials C_m^µ, which are defined as
the orthogonal polynomials on [−1, 1] with weight function (1 − t^2)^{µ−1/2},
µ > −1/2. Moreover, we recall that a spherical harmonic of degree l is the
restriction to S^{D−1} of a harmonic polynomial homogeneous of degree l on
R^D (cf. e.g. [66], [83] and [85]). There exist exactly
\[
n(D, l) := \frac{(2l + D - 2)(D + l - 3)!}{l!\,(D - 2)!} \tag{4.65}
\]
linearly independent spherical harmonics of degree l and spherical harmonics
of different degree are orthogonal in L^2(S^{D−1}).
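As a quick sanity check of (4.65), the count n(D, l) can be evaluated for small dimensions. The following sketch uses a hypothetical function name and treats l = 0 separately (one constant harmonic), since the factorial (D + l − 3)! is not defined for D = 2, l = 0; this special case is an assumption of the sketch.

```python
from math import factorial

def num_spherical_harmonics(D, l):
    """Number n(D, l) of linearly independent spherical harmonics of
    degree l on the unit sphere of R^D, following formula (4.65)."""
    if D < 2 or l < 0:
        raise ValueError("need D >= 2 and l >= 0")
    if l == 0:
        # Only the constants; handled separately because (D + l - 3)!
        # is undefined for D = 2, l = 0.
        return 1
    return (2 * l + D - 2) * factorial(D + l - 3) // (factorial(l) * factorial(D - 2))
```

For D = 3 this gives the familiar 2l + 1 harmonics of degree l, and for D = 2 it gives the two functions cos(lφ) and sin(lφ) for every l ≥ 1.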
Now let Y_{l,k}, k = 1, ..., n(D, l) be an orthonormal basis for the spherical
harmonics of degree l. We define, for i ≥ 0, 0 ≤ l ≤ i, 1 ≤ k ≤ n(D, l),
\[
f_{ilk}(x) = \sqrt{2}\, P_{(i-l)/2,\, l+(D-2)/2}(\|x\|^2)\, \|x\|^l\, Y_{l,k}(x/\|x\|) \tag{4.66}
\]
and
\[
g_{ilk}(\theta, s) = c(i)\, w(s)^{D-1}\, C_i^{D/2}(s)\, Y_{l,k}(\theta). \tag{4.67}
\]
Here w(s) := (1 − s^2)^{1/2} and
\[
c(i) = \frac{\pi\, 2^{1-D/2}\, \Gamma(i + D)}{i!\,(i + D/2)\,(\Gamma(D/2))^2}, \tag{4.68}
\]
where Γ stands for Euler’s Gamma function.
Theorem 4.3.1 (Davison (1983) and Louis (1984)). The functions f_{ilk}
and g_{ilk}, i ≥ 0, 0 ≤ l ≤ i, 1 ≤ k ≤ n(D, l), are complete orthonormal families
in the spaces L^2({‖x‖ < 1}) and L^2(Ξ^D, w^{1−D}), respectively. The singular
value decomposition of R as an operator between these spaces is given by
\[
Rf = \sum_{i=0}^{\infty} \lambda_i \sum_{\substack{0 \le l \le i \\ l+i \ \mathrm{even}}} \sum_{k=1}^{n(D,l)} \langle f, f_{ilk} \rangle_{L^2(\{\|x\|<1\})}\, g_{ilk}, \tag{4.69}
\]
where
\[
\lambda_i = \left( \frac{2^D \pi^{D-1}}{(i + 1) \cdots (i + D - 1)} \right)^{1/2} \tag{4.70}
\]
are the singular values of R, each being of multiplicity n(D, l)⌊(i + 2)/2⌋.
Proof. See [13] and [64]. For a proof in a simplified case with D = 2, cf.
[4].
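The singular values (4.70) are easy to tabulate. The short sketch below (the function name is hypothetical) evaluates λ_i for a given dimension D; for D = 2 the formula reduces to λ_i = (4π/(i + 1))^{1/2}.

```python
import math

def radon_singular_value(i, D=2):
    """Singular value lambda_i of the Radon transform from formula (4.70)."""
    denom = 1.0
    for j in range(1, D):
        denom *= (i + j)          # (i + 1) * ... * (i + D - 1)
    return math.sqrt(2.0 ** D * math.pi ** (D - 1) / denom)

# For D = 2 the product lambda_i * sqrt(i + 1) stays constant, equal to sqrt(4 * pi).
decay_check = [radon_singular_value(i, 2) * math.sqrt(i + 1) for i in (0, 10, 1000)]
```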
We observe that in the case D = 2 the singular values decay to zero
rather slowly, namely λ_i = O(i^{−1/2}). This is in accordance with the remark
we have made in the introductory example of Chapter 1, where we have seen
that to compute an inversion of R the data are differentiated and smoothed
again. A more precise statement to explain this can be made by means of
the following theorem (see [69] for the details not specified below).
Theorem 4.3.2. Let Ω be a bounded and sufficiently regular domain in R^D
and let α ∈ R. Then there are positive constants c(α, Ω) and C(α, Ω) such
that for all f ∈ H_0^α(Ω)
\[
c(\alpha, \Omega)\, \|f\|_{H_0^\alpha(\Omega)} \le \|Rf\|_{H^{\alpha+(D-1)/2}(\Xi^D)} \le C(\alpha, \Omega)\, \|f\|_{H_0^\alpha(\Omega)}. \tag{4.71}
\]
Here, H_0^α(Ω) is the closure of C_0^∞(Ω) with respect to the norm of the Sobolev
space H^α(R^D) and H^β(Ξ^D) is the space of even functions g on the cylinder
C^D such that
\[
\|g\|_{H^\beta(C^D)} := \int_{S^{D-1}} \int_{\mathbb{R}} (1 + \xi^2)^{\beta/2}\, \hat{g}(\theta, \xi)^2\, d\xi\, d\theta, \qquad \beta \in \mathbb{R}. \tag{4.72}
\]
Thus, roughly speaking, in the general case we can say that Rf is smoother
than f by an order of (D − 1)/2.
Similar results can be found for the operator P.
Theorem 4.3.3 (Maass (1987)). With the functions f_{ilk} defined above
and a certain complete orthonormal system g_{ilk} on L^2(T^D, w), where w(ξ) =
(1 − ‖ξ‖^2)^{1/2}, there are positive numbers λ_{il} such that
\[
Pf(\theta, s) = \sum_{i=0}^{\infty} \sum_{\substack{0 \le l \le i \\ l+i \ \mathrm{even}}} \lambda_{il} \sum_{k=1}^{n(D,l)} \langle f, f_{ilk} \rangle\, g_{ilk}. \tag{4.73}
\]
The singular values λ_{il}, each of multiplicity n(D, l), satisfy
\[
\lambda_{il} = O(i^{-1/2}) \tag{4.74}
\]
as i → +∞, uniformly in l.
Proof. See [65].
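Theorem 4.3.4. Let Ω be a bounded and sufficiently regular domain in R^D
and let α ∈ R. Then there are positive constants c(α, Ω) and C(α, Ω) such
that for all f ∈ H_0^α(Ω)
\[
c(\alpha, \Omega)\, \|f\|_{H_0^\alpha(\Omega)} \le \|Pf\|_{H^{\alpha+1/2}(T^D)} \le C(\alpha, \Omega)\, \|f\|_{H_0^\alpha(\Omega)}, \tag{4.75}
\]
where
\[
\|g\|_{H^\beta(T^D)} := \int_{S^{D-1}} \int_{\theta^\perp} (1 + \|\xi\|^2)^{\beta/2}\, \hat{g}(\theta, \xi)^2\, d\xi\, d\theta, \qquad \beta \in \mathbb{R}. \tag{4.76}
\]
4.4 Parallel, fan beam and helical scanning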
In this section we give a very brief survey of the different scanning geometries
in computerized tomography. We distinguish between the 2D and the 3D
cases.
Figure 4.3: First and second generation scanners. (a) First generation: parallel, dual motion scanner. (b) Second generation: narrow fan beam (∼ 10°), dual motion scanner.
4.4.1 2D scanning geometry
In 2D geometry, only one slice of the object is scanned at a time and the reconstruction is made slice by slice by means of the classical 2D Radon Transform.
In parallel scanning, the X-rays are emitted along a two-parameter family
of straight lines L_{jl}, j = 0, ..., j̄ − 1, l = −l̄, ..., l̄, where L_{jl} is the straight
line making an angle φ_j = j∆φ with the x_2-axis and having signed distance
s_l = l∆s from the origin, i.e., ⟨x, θ_j⟩ = s_l, θ_j = (cos φ_j, sin φ_j)^*. The measured values g_{jl}^{PAR} are simply
\[
g_{jl}^{PAR} = R[f](\theta_j, s_l), \qquad j = 0, ..., \bar{j} - 1, \quad l = -\bar{l}, ..., \bar{l}. \tag{4.77}
\]
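The sampling pattern in (4.77) can be illustrated on a function whose Radon transform is known in closed form. In the sketch below, f is the indicator function of the unit disc, so that R[f](θ, s) = 2(1 − s^2)^{1/2} for |s| < 1 and 0 otherwise, independently of θ. The choice ∆φ = π/j̄, the test function and all names are assumptions made for the example.

```python
import math

def parallel_geometry(num_angles, l_bar, delta_s):
    """Angles phi_j = j * delta_phi and offsets s_l = l * delta_s.
    delta_phi = pi / num_angles is an assumption of this sketch."""
    delta_phi = math.pi / num_angles
    angles = [j * delta_phi for j in range(num_angles)]
    offsets = [l * delta_s for l in range(-l_bar, l_bar + 1)]
    return angles, offsets

def radon_unit_disc(phi, s):
    """R[f](theta, s) for f = indicator of the unit disc: the chord length.
    The angle phi is unused because the disc is rotationally symmetric."""
    return 2.0 * math.sqrt(1.0 - s * s) if abs(s) < 1.0 else 0.0

def parallel_data(num_angles, l_bar, delta_s):
    """Matrix of measured values g_jl^PAR as in (4.77), one row per angle."""
    angles, offsets = parallel_geometry(num_angles, l_bar, delta_s)
    return [[radon_unit_disc(phi, s) for s in offsets] for phi in angles]
```

Each row of the returned matrix corresponds to one angular position j and each column to one offset l, which is the layout of a sinogram.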
In the first CT scanners (first generation scanners) the source moves along a
straight line. The X-ray is fired at each position sl and the intensity is measured by a detector behind the object which translates simultaneously with
the source. Then the same process is repeated with a new angular direction.
A first improvement on this invasive and rather slow technique came with
the second generation scanners, where more than one detector is used, but
the number of detectors is still small (cf. Figure 4.3).
Fan beam scanning geometry is characterized by the use of many detectors.
In a third generation scanner, the X-ray source and the detectors are mounted
on a common rotating frame (cf. Figure 4.4). During the rotation the detectors are read out in small time intervals, which is equivalent to assuming that
the X-rays are fired from a number of discrete source positions. Let rθ(β_j),
θ(β) := (cos β, sin β)^*, be the j-th source position and let α_l be the angle
the l-th ray in the fan emanating from the source at rθ(β_j) makes with the
central ray. Then the measured values g_{jl}^{FAN} correspond to the 2D Cone Beam
Transform:
\[
g_{jl}^{FAN} = D[f](r\theta(\beta_j), \theta(\alpha_l + \beta_j + \pi)), \qquad j = 0, ..., \bar{j} - 1, \quad l = -\bar{l}, ..., \bar{l}. \tag{4.78}
\]
Figure 4.4: Fan beam geometry: third and fourth generation scanners. (a) Third generation: fan beam, rotating detectors. (b) Fourth generation: fan beam, stationary detectors.
In a fourth generation scanner, the detectors are at rθ(βj ), the source is
rotating continuously on a circle around the origin (cf. Figure 4.4) and the
detectors are read out at discrete time intervals.
4.4.2 3D scanning geometry
In 3D cone beam scanning, the source runs around the object on a curve,
together with a 2D detector array. As we have already seen in the section
dedicated to Katsevich’s formula, in the simplest case the curve is a circle and
the situation can be modeled by the 3D Cone Beam Transform in the same
way as for 2D third generation scanners. When the object is translated continuously in the direction of the axis of symmetry of the fan beam scanner,
we obtain the case of 3D helical scanning.
The number of rays actually measured varies from 100 in industrial tomography to 10^6-10^8 in radiological applications [69]. Thus the number of
data to be processed is extremely large: for this reason we shall concentrate on iterative regularization methods which are usually faster than other
reconstruction techniques.
4.5 Relations between Fourier and singular functions
We have seen that in the case of image deblurring the SVD of the matrix
of the underlying linear system is given by means of the DFT, according
to (3.36). This allows one to study the spectral properties of the problem, even
when the size of the system is large.
In tomography-related problems the exact SVD of the matrix of the system is not available. Nevertheless, it is possible to exploit some a-priori known
qualitative information to obtain good numerical results. There are two important pieces of information that are going to be used in the numerical
experiments below: the first one is the decay of the singular values of the
Radon operator described in Section 4.3; the second one is a general property
of ill-posed problems, which is the subject of the current section.
In the paper [39] Hansen, Kilmer and Kjeldsen developed an important insight into the relationship between the SVD and discrete Fourier bases for
discrete ill-posed problems arising from the discretization of Fredholm integral equations of the first kind. We reconsider their analysis and relate it to
the considerations we have made in Chapter 3 and in particular in Section
3.1 and Section 3.4.
4.5.1 The case of the compact operator
Although we intend to study large scale discrete ill-posed problems, we return
first to the continuous setting of Section 1.8, because the basic ideas draw
upon certain properties of the underlying continuous problem involving a
compact integral operator. As stated already in [39], this material is not
intended as a rigorous analysis but rather as a tool to gain some insight into
non-asymptotic relations between the Fourier and singular functions of the
integral operator.
Consider the Hilbert space X = L^2([−π, π]) with the usual inner product
⟨·, ·⟩ and denote by ‖·‖ the norm induced by the scalar product. Define
\[
K[f](s) := \int_I \kappa(s, t) f(t)\, dt, \qquad I = [-\pi, \pi]. \tag{4.79}
\]
Assume that the kernel κ is real, (piecewise) C^1(I × I) and non-degenerate.
Moreover, assume for simplicity also that ‖κ(π, ·) − κ(−π, ·)‖ = 0. Then,
as we have seen in Section 1.8, K is a compact operator from X to itself
and there exists a singular system {λ_j; v_j, u_j} for K such that (1.38), (1.39),
(1.40) and (1.41) hold.
Define the infinite matrix B whose rows are indexed by k = −∞, ..., +∞ and
whose columns are indexed by j = 1, ..., +∞, with entries
\[
B_{k,j} = \left| \left\langle u_j,\ e^{\imath k s}/\sqrt{2\pi} \right\rangle \right|.
\]
Then the following phenomenon, observed in the discrete setting, can be
shown:
The largest entries in the matrix B form a V -shape, with the V lying on the
side and the tip of the V located at k = 0, j = 1.
This means that the function u_j is well represented only by a small number
of the e^{ıks}/√(2π) for some |k| in a band of contiguous integers depending on j.
Therefore, the singular functions are similar to the Fourier series functions
in the sense that large singular values (small j) and their corresponding
singular functions correspond to low frequencies, and small singular values
(larger j) correspond to high frequencies. The important consequence is
that it is possible to obtain at least some qualitative information about the
spectral properties of an ill-posed problem without calculating the SVD of
the operator but only performing a Fourier Transform.
4.5.2 Discrete ill-posed problems
As a matter of fact, the properties of the integral operator described in
Section 4.5.1 are observed in practice. As already shown in the paper [39],
the Fourier and the SVD coefficients of a discrete ill-posed problem Ax = b
with perturbed data bδ have a similar behavior if they are ordered in the
correct way.
As a consequence of the phenomenon described in Section 4.5.1 and of the
properties of the Discrete Fourier Transform, we can reorder the Fourier
coefficients as follows:
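\[
\varphi_i :=
\begin{cases}
(\Phi^* b^\delta)_{(i+1)/2} & \text{if } i \text{ is odd}, \\
(\Phi^* b^\delta)_{(2m-i+2)/2} & \text{if } i \text{ is even}.
\end{cases} \tag{4.80}
\]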
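The reordering (4.80) interleaves the low positive and low negative frequencies, so that the index i grows with the absolute frequency. A minimal sketch follows; it assumes that Φ is the unitary Fourier matrix with the usual DFT sign convention and that m is the length of b^δ, and the function names are hypothetical.

```python
import cmath
import math

def unitary_dft(b):
    """Coefficients of b in the unitary DFT basis (assumed to be the role
    of Phi^* b in the text); computed directly in O(m^2) operations."""
    m = len(b)
    scale = 1.0 / math.sqrt(m)
    return [scale * sum(b[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
            for k in range(m)]

def reorder_fourier(coeffs):
    """Reorder coefficients c_1, ..., c_m according to (4.80), with 1-based i."""
    m = len(coeffs)
    reordered = []
    for i in range(1, m + 1):
        if i % 2 == 1:
            index = (i + 1) // 2            # i odd
        else:
            index = (2 * m - i + 2) // 2    # i even
        reordered.append(coeffs[index - 1])  # convert to a 0-based position
    return reordered
```

For m = 6 the new order of the original positions is 1, 6, 2, 5, 3, 4: the zero frequency comes first, followed by the pairs of frequencies ±1, ±2 and so on.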
In Figure 4.5 we compare the SVD and the Fourier coefficients of the test
problem phillips(200) with ϱ = 0.1% and Matlab seed = 1. It is evident from
the plot that the Fourier coefficients, reordered according to the formula
(4.80), decay very similarly to the SVD coefficients in this example.
Figure 4.5: SVD (blue full circles) and Fourier (red circles) coefficients of
phillips(200), noise 0.1%.
In [39], only the case of the 1D ill-posed problems was considered in detail.
Here we consider the case of a 2D tomographic test problem, where:
• the matrix A of the system is the discretization of a 2D integral operator acting on a function of two space variables. For example, in the
case of parallel X-rays modeled by the Radon Transform, R[f ](θ, s) is
simply the integral of the density function f multiplied by the Dirac
δ-function supported on the straight line corresponding to (θ, s).
• The exact data, denoted by the vector g, is the columnwise stacked
version of the matrix G with entries given by a formula of the type
(4.77) or (4.78) (the sinogram).
• The exact solution f is the columnwise stacked version of the image
F obtained by computing the density function f on the discretization
points of the domain.
To calculate the Fourier coefficients of the perturbed problem Af = gδ ,
kgδ − gk ≤ δ, we suggest the following strategy:
(i) compute the two-dimensional Discrete Fourier Transform of the (perturbed) sinogram Gδ corresponding to gδ .
(ii) Consider the matrix of the Fourier coefficients obtained at the step
(i) and reorder its columnwise stacked version as in the 1D case (cf.
formula (4.80)).
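Steps (i) and (ii) can be sketched as follows. The sketch assumes a unitary scaling of the 2D DFT, stores the sinogram as a list of rows, and returns the moduli of the reordered coefficients; these choices and all function names are assumptions made for illustration, and in practice a fast Fourier transform routine would replace the direct sums.

```python
import cmath
import math

def dft(v):
    """Unitary 1D DFT, computed directly."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

def dft2(G):
    """Unitary 2D DFT: transform every row, then every column."""
    rows = [dft(row) for row in G]
    cols = [dft(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]     # transpose back to row layout

def stack_columns(M):
    """Columnwise stacked version of the matrix M."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def reorder(c):
    """1D reordering of formula (4.80), with 1-based index i."""
    m = len(c)
    return [c[((i + 1) // 2 if i % 2 == 1 else (2 * m - i + 2) // 2) - 1]
            for i in range(1, m + 1)]

def fourier_coefficients_2d(G_delta):
    """Step (i): 2D DFT of the sinogram; step (ii): stack and reorder."""
    return [abs(z) for z in reorder(stack_columns(dft2(G_delta)))]
```

For a constant sinogram all the energy sits in the first coefficient, which is a convenient check of the scaling and of the ordering.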
Figure 4.6: The computation of the index p for fanbeamtomo(100), with
ϱ = 3% and Matlab seed = 0. (Plot title: Fourier plot, fanbeamtomo(100), noise 3%. Legend: Fou. Coeff., App. Fou. Coeff., Sub. App. Fou. Coeff., p.)
4.6 Numerical experiments
In this section we present some numerical experiments performed on three
different test problems of P.C. Hansen’s AIR Tools (cf. Appendix C.6). In all
the considered examples we denote by:
• A, F, G and Gδ the matrix of the system, the exact solution, the exact
data and the perturbed data respectively;
• f, g and gδ the columnwise stacked versions of the matrices F, G and
Gδ ;
• m and N the number of rows and columns of A respectively (in all the
test problems considered m > N ≥ 10000);
• l0 and j0 the number of rows and columns of the sinogram Gδ (see also
Appendix C.6);
• J the number of rows (and columns) of the exact solution F;
• ϕ the vector of the Fourier coefficients of the perturbed system Af =
gδ , reordered as described in Section 4.5.2;
• ϕ̃ the approximation of the vector ϕ computed by the routine data_approx
with ⌊m/5000⌋ inner knots.
CGNE results for the fanbeamtomo problem (average relative error)
J     ϱ    Discr. Princ.   SR1(np)       SR3      Opt. Sol.
100   1%   0.0750          0.0817(100)   0.0568   0.0427
100   3%   0.1437          0.1347(75)    0.1232   0.1132
100   5%   0.1943          0.1847(50)    0.1715   0.1696
100   7%   0.2273          0.2273(25)    0.2139   0.2127
200   1%   0.1085          0.1049(80)    0.0947   0.0886
200   3%   0.1747          0.1669(60)    0.1693   0.1668
200   5%   0.2226          0.2143(40)    0.2127   0.2123
200   7%   0.2550          0.2550(20)    0.2495   0.2477
Table 4.1: Comparison between different stopping rules of CGNE for the
fanbeamtomo test problem.
Using the algorithm cgls, we compute the regularized solutions of CGNE
stopped according to the Discrepancy Principle and to SR1 and SR3.
The index p of SR3 is calculated as follows (see Figure 4.6):
(i) determine a subsequence of the approximated Fourier coefficients by
choosing the elements ϕ̃1 , ϕ̃1+J , ϕ̃1+2J , ....
(ii) Discard all the elements after the first relative minimum of this subsequence.
(iii) The index p is defined by p := 1 + (i + 1)J where i is the corner of the
discrete curve obtained at the step (ii) determined by the algorithm
triangle.
4.6.1 Fanbeamtomo
We consider the function fanbeamtomo, that generates a tomographic test
problem with fan beam X-rays. We choose two different values for the dimension J, four different percentages ϱ of noise on the data, and 16 different
Matlab seeds, for a total of 128 simulations.
For fixed values of J and ϱ, the average relative errors over all Matlab seeds
are summarized in Table 4.1. The results show that SR3 provides an improvement on the Discrepancy Principle, which is more significant when the noise
level is small. Moreover, SR1 proves to be a reliable heuristic stopping
rule.
CGNE results for the seismictomo problem (average relative error)
dim   ϱ    Discr. Princ.   SR1(np)      SR3      Opt. Sol.
100   1%   0.0985          0.1022(50)   0.0954   0.0929
100   3%   0.1344          0.1367(35)   0.1291   0.1291
100   5%   0.1544          0.1603(25)   0.1549   0.1533
100   7%   0.1713          0.1820(20)   0.1813   0.1713
200   1%   0.1222          0.1260(50)   0.1207   0.1177
200   3%   0.1503          0.1637(35)   0.1484   0.1481
200   5%   0.1699          0.1934(25)   0.1661   0.1660
200   7%   0.1830          0.1960(20)   0.1814   0.1800
Table 4.2: Comparison between different stopping rules of CGNE for the
seismictomo test problem.
4.6.2 Seismictomo
The function seismictomo creates a two-dimensional seismic tomographic test
problem (see Appendix C.6 and [40]).
As for the case of fanbeamtomo, we choose two different values for the
dimension J, four different percentages ϱ of noise on the data, and 16 different
Matlab seeds. For fixed values of J and ϱ, the average relative errors over all
Matlab seeds are summarized in Table 4.2.
The numerical results show again that SR3 improves the results of the Discrepancy Principle. It is interesting to note that in this case the solutions of
SR1 are slightly oversmoothed. In fact, the Approximated Residual L-Curve
is very similar to the Residual L-Curve, so in this example the approximation
helps very little in overcoming the oversmoothing effect of the Residual L-Curve method described in Chapter 3.
Figure 4.7: The computation of the index pλ for paralleltomo(100), with
ϱ = 3% and Matlab seed = 0. (Plot title: paralleltomo(100), noise 3%, computations of the index p. Legend: Fou. Coeff., App. Fou. Coeff., (App. Fou. Coeff.)./λ, Sub. App. Fou. Coeff., pc, pλ. Annotations: huge first singular values; corner ∼ pc; relative minimum ∼ pλ.)
4.6.3 Paralleltomo
The function paralleltomo generates a 2D tomographic test problem with parallel X-rays.
For this test problem we consider also the following variant of SR3 for computing the index p (cf. Figure 4.7). Using the Matlab function svds, we
calculate the largest singular value λ_1 of the matrix A. Assuming that the
singular values of A decay like O(i^{−1/2}), we define a vector of approximated
singular values λ̃_i := λ_1 i^{−1/2} for i ≥ 1 (the approximation is justified by
Theorem 4.3.1). Typically, the plot of the ratios ϕ̃_i/λ̃_i is similar to that
shown in Figure 4.7. Similarly to the 1D examples described in Chapter
3, this plot has a relative minimum when the Fourier coefficients begin
to level off. We denote by p_λ the index corresponding to this minimum
and by p_c the index determined as in the test problems fanbeamtomo and
seismictomo.
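The computation of p_λ described above can be sketched in a few lines. The sketch assumes that the approximated Fourier coefficients ϕ̃_i and the largest singular value λ_1 are already available, and it interprets "relative minimum" as the first interior entry that is strictly smaller than both of its neighbours; this interpretation and the function names are assumptions, not the procedure actually used in the experiments.

```python
def approx_singular_values(lambda_1, n):
    """Approximated singular values lambda_1 * i^(-1/2), i = 1, ..., n."""
    return [lambda_1 * i ** -0.5 for i in range(1, n + 1)]

def first_relative_minimum(values):
    """1-based index of the first interior entry that is strictly smaller
    than both neighbours, or None if no such entry exists."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] < values[k + 1]:
            return k + 1
    return None

def p_lambda(phi_tilde, lambda_1):
    """Index of the first relative minimum of the ratios |phi_i| / lambda_i."""
    lam = approx_singular_values(lambda_1, len(phi_tilde))
    ratios = [abs(p) / l for p, l in zip(phi_tilde, lam)]
    return first_relative_minimum(ratios)
```

When the coefficients level off at the noise floor, the division by the decaying values λ̃_i makes the ratios increase again, which is what produces the minimum.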
As in the previous cases, we choose two different values for the dimension
J, four different percentages ϱ of noise on the data, and 16 different Matlab
seeds. For fixed values of J and ϱ, the average relative errors over all Matlab
seeds are summarized in Table 4.3.
CGNE results for the paralleltomo problem (average relative error)
dim   ϱ    Discr. Princ.   SR1(np)       SR3c     SR3λ     Opt. Sol.
100   1%   0.1247          0.1186(100)   0.1041   0.0868   0.0759
100   3%   0.1992          0.1861(75)    0.1861   0.1759   0.1703
100   5%   0.2378          0.2378(50)    0.2299   0.2272   0.2269
100   7%   0.2724          0.2674(25)    0.2674   0.2670   0.2670
200   1%   0.1568          0.1501(80)    0.1516   0.1507   0.1465
200   3%   0.2140          0.2055(60)    0.2055   0.2051   0.2051
200   5%   0.2593          0.2517(40)    0.2523   0.2517   0.2517
200   7%   0.2962          0.2950(20)    0.2931   0.2931   0.2889
Table 4.3: Comparison between different stopping rules of CGNE for the
paralleltomo test problem.
The numerical results show that the qualitative a-priori information about the
spectrum improves the results. As a matter of fact, SR3 with p = p_λ provides
excellent results, in particular when the noise level is low. The performance
of the heuristic stopping rule SR1 is very good here as well.
Chapter 5
Regularization in Banach spaces
So far, we have considered only linear ill-posed problems in a Hilbert space
setting. We have seen that this theory is well established since the nineties,
so we focused mainly on the applications in the discrete setting.
In the past decade of research, in the area of inverse and ill-posed problems, a
great deal of attention has been devoted to the regularization in the Banach
space setting. The research on regularization methods in Banach spaces was
driven by different mathematical viewpoints. On the one hand, there are
various practical applications where models that use Hilbert spaces are not
realistic or appropriate. Usually, in such applications sparse solutions1 of
linear and nonlinear operator equations are to be determined, and models
working in Lp spaces, non-Hilbertian Sobolev spaces or continuous function
spaces are preferable. On the other hand, mathematical tools and techniques
typical of Banach spaces can help to overcome the limitations of Hilbert
models. In the monograph [82], a series of different applications ranging from
non-destructive testing, such as X-ray diffractometry, via phase retrieval, to
an inverse problem in finance are presented. All these applications can be
modeled by operator equations
\[
F(x) = y, \tag{5.1}
\]
where the so-called forward operator F : D(F ) ⊆ X → Y denotes a continuous linear or nonlinear mapping between Banach spaces X and Y.
Footnote 1. Sparsity means that the searched-for solution has only a few nonzero coefficients with
respect to a specific, given basis.
In the opening section of this chapter we shall describe another important
example, the problem of identifying coefficients or source terms in partial
differential equations (PDEs) from data obtained from the PDE solutions.
Then we will introduce the fundamental notions and tools that are peculiar
to the Banach space setting and discuss the problem of regularization in this
new framework.
At the end of the chapter we shall focus on the properties of some of the most
important regularization methods in Banach spaces for solving nonlinear ill-posed problems, such as the Landweber-type methods and the Iteratively
Regularized Gauss-Newton method.
We point out that the aim of this chapter is the introduction of the Newton-Landweber type iteration that will be discussed in the final chapter of this
thesis. Thus, methods using only Tikhonov-type penalization terms shall not
be considered here.
5.1 A parameter identification problem for an elliptic PDE
The problem of identifying coefficients or source terms in partial differential
equations from data obtained from the PDE solution arises in a variety of
applications ranging from medical imaging, via nondestructive testing, to
material characterization, as well as model calibration.
The following example has been studied repeatedly in the literature (see,
e.g. [7], [16], [31], [52], [78] and [82]) to illustrate theoretical conditions and
numerically test the convergence of regularization methods.
Consider the identification of the space-dependent coefficient c in the elliptic
boundary value problem
\[
\begin{cases}
-\Delta u + cu = f & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega
\end{cases} \tag{5.2}
\]
from measurements of u in Ω. Here, Ω ⊆ R^D, D ∈ N, is a smooth, bounded
domain and △ is the Laplace operator on Ω.
The forward operator F : D(F ) ⊆ X → Y, where X and Y are to be specified
below, and its derivative can be formally written as
\[
F(c) = A(c)^{-1} f, \qquad F'(c)h = -A(c)^{-1}(hF(c)), \tag{5.3}
\]
where A(c) : H^2(Ω) ∩ H_0^1(Ω) → L^2(Ω) is defined by A(c)u = −△u + cu.
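A one-dimensional finite-difference analogue of (5.3) may help to fix ideas. The sketch below is written under assumptions that are not part of the text: Ω = (0, 1), a uniform grid with spacing `step`, the standard second-order difference approximation of −u″, and a tridiagonal solve by the Thomas algorithm. All names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting)."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c_prime[k - 1]
        c_prime[k] = upper[k] / denom
        d_prime[k] = (rhs[k] - lower[k] * d_prime[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d_prime[k] - c_prime[k] * x[k + 1]
    return x

def apply_inverse(c, rhs, step):
    """Discrete A(c)^{-1} rhs for -u'' + c u with zero Dirichlet values;
    c and rhs hold the values at the interior grid points."""
    n = len(c)
    off = -1.0 / step ** 2
    lower = [0.0] + [off] * (n - 1)
    upper = [off] * (n - 1) + [0.0]
    diag = [2.0 / step ** 2 + ck for ck in c]
    return solve_tridiagonal(lower, diag, upper, rhs)

def forward(c, f, step):
    """Discrete forward operator F(c) = A(c)^{-1} f."""
    return apply_inverse(c, f, step)

def forward_derivative(c, f, direction, step):
    """Discrete derivative F'(c)h = -A(c)^{-1}(h F(c)), with h = direction."""
    u = forward(c, f, step)
    product = [hk * uk for hk, uk in zip(direction, u)]
    return [-v for v in apply_inverse(c, product, step)]
```

With c = 0 and f = 1 the discrete solution coincides at the grid points with u(x) = x(1 − x)/2, and a difference quotient of `forward` in a direction h agrees with `forward_derivative` up to terms of first order in the increment.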
In order to preserve ellipticity, a straightforward choice of the domain D(F )
is
\[
D(F) := \{ c \in X \mid c \ge 0 \ \text{a.e.},\ \|c\|_X \le \gamma \}, \tag{5.4}
\]
for some sufficiently small γ > 0. For the situation in which the theory
requires a nonempty interior of D(F ) in X , the choice
\[
D(F) := \{ c \in X \mid \exists\, \hat{\vartheta} \in L^\infty(\Omega),\ \hat{\vartheta} \ge 0 \ \text{a.e.} : \|c - \hat{\vartheta}\|_X \le \gamma \}, \tag{5.5}
\]
for some sufficiently small γ > 0, has been devised in [30].
The preimage and image spaces X and Y are usually both set to L^2(Ω),
in order to fit into the Hilbert space theory. However, as observed in [82],
the choice Y = L^∞(Ω) is the natural topology for the measured data and
in the situation of impulsive noise the choice Y = L^1(Ω) provides a more
robust option than the choice Y = L^2(Ω) (cf. also [6] and [49]). Concerning
the preimage space, one often aims at actually reconstructing a uniformly
bounded coefficient, or a coefficient that is sparse in some sense, suggesting
the use of the L^∞(Ω) or the L^1(Ω)-norm.
This motivates the study of the choice
\[
X = L^p(\Omega), \qquad Y = L^r(\Omega),
\]
with general exponents p, r ∈ [1, +∞] within the context of this example.
Restricting to the choice (5.4) of the domain, it is possible to show the
following results (cf. [82]).
Proposition 5.1.1. Let p, r, s ∈ [1, +∞] and denote by (W^{2,s} ∩ H_0^1)(Ω) the
closure of the space C_0^∞(Ω) with respect to the norm
\[
\|v\|_{W^{2,s} \cap H_0^1} := \|\Delta v\|_{L^s} + \|\nabla v\|_{L^2},
\]
invoking Friedrichs’ inequality.
Let also X = L^p(Ω), Y = L^r(Ω).
(i) If
\[
s \ge D/2 \ \text{ and } \ \{ s = 1 \ \text{ or } \ s > \max\{1, D/2\} \ \text{ or } \ r < +\infty \}
\]
or
\[
s < D/2 \ \text{ and } \ r \le \frac{Ds}{D - 2s},
\]
then (W^{2,s} ∩ H_0^1)(Ω) ⊆ L^r(Ω) and there exists a constant C_{(a)}^r > 0 such
that
\[
\forall v \in (W^{2,s} \cap H_0^1)(\Omega) : \quad \|v\|_{L^r} \le C_{(a)}^r\, \|v\|_{W^{2,s} \cap H_0^1}.
\]
(ii) Assume c ∈ D(F ) with a sufficiently small γ > 0 and let
\[
1 = s \ge \frac{D}{2} \quad \text{or} \quad s > \max\Big\{1, \frac{D}{2}\Big\}. \tag{5.6}
\]
Then the operator A(c)^{−1} : L^s(Ω) → (W^{2,s} ∩ H_0^1)(Ω) is well defined
and bounded by some constant C_{(d)}^s.
(iii) For any f ∈ L^{max{1,D/2}}(Ω), the operator F : D(F ) ⊆ X → Y, F (c) =
A(c)^{−1} f is well defined and bounded on D(F ) as in (5.4) with γ > 0
sufficiently small.
(iv) For any
\[
p, r \in [1, +\infty], \quad f \in L^1(\Omega), \quad D \in \{1, 2\}
\]
or
\[
p \in (D/2, \infty], \quad p \ge 1, \quad r \in [1, +\infty], \quad f \in L^{D/2+\epsilon}(\Omega), \quad \epsilon > 0
\]
and c ∈ D(F ), the operator F′(c) : X → Y, F′(c)h = −A(c)^{−1}(hF (c)),
is well defined and bounded.
5.2 Basic tools in the Banach space setting
The aim of this section is to introduce the basic tools and fix the classical
notations used in the Banach space setting for regularizing ill-posed problems.
For details and proofs, cf. [82] and the references therein.
5.2.1 Basic mathematical tools
Definition 5.2.1 (Conjugate exponents, dual spaces and dual pairings). For p > 1 we denote by p^* > 1 the conjugate exponent of p, satisfying
the equation
\[
\frac{1}{p} + \frac{1}{p^*} = 1. \tag{5.7}
\]
We denote by X^* the dual space of a Banach space X , which is the Banach
space of all bounded (continuous) linear functionals x^* : X → R, equipped
with the norm
\[
\|x^*\|_{X^*} := \sup_{\|x\| = 1} |x^*(x)|. \tag{5.8}
\]
For x^* ∈ X^* and x ∈ X we denote by ⟨x^*, x⟩_{X^*×X} and ⟨x, x^*⟩_{X×X^*} the duality
pairing (duality product) defined as
\[
\langle x^*, x \rangle_{X^* \times X} := \langle x, x^* \rangle_{X \times X^*} := x^*(x). \tag{5.9}
\]
In norms and dual pairings, when clear from the context, we will omit the
indices indicating the spaces.
Definition 5.2.2. Let {x_n}_{n∈N} be a sequence in X and let x ∈ X . The
sequence x_n is said to converge weakly to x if, for every x^* ∈ X^*, ⟨x^*, x_n⟩
converges to ⟨x^*, x⟩.
As in the Hilbert space case, we shall denote by the symbol ⇀ the weak
convergence in X and by → the strong convergence in X .
Definition 5.2.3 (Adjoint operator). Let A be a bounded (continuous)
linear operator between two Banach spaces X and Y. Then the bounded
linear operator A^* : Y^* → X^*, defined as
\[
\langle A^* y^*, x \rangle_{X^* \times X} = \langle y^*, Ax \rangle_{Y^* \times Y}, \qquad \forall x \in X,\ y^* \in Y^*,
\]
is called the adjoint operator of A.
As in the Hilbert space setting, we denote by ker(A) the null-space of A
and by R(A) the range of A.
We recall three important inequalities that are going to be used later.
Theorem 5.2.1 (Cauchy’s inequality). For x ∈ X and x^* ∈ X^* we have:
\[
|\langle x^*, x \rangle_{X^* \times X}| \le \|x^*\|_{X^*}\, \|x\|_X.
\]
Theorem 5.2.2 (Hölder’s inequality). For functions f ∈ L^p(Ω), g ∈
L^{p^*}(Ω), Ω ⊆ R^D as in Section 5.1:
\[
\int_\Omega f(x) g(x)\, dx \le \left( \int_\Omega |f(x)|^p\, dx \right)^{1/p} \left( \int_\Omega |g(x)|^{p^*}\, dx \right)^{1/p^*}.
\]
Theorem 5.2.3 (Young’s inequality). Let a and b denote real numbers
and p, p^* > 1 conjugate exponents. Then
\[
ab \le \frac{1}{p} |a|^p + \frac{1}{p^*} |b|^{p^*}.
\]
5.2.2 Geometry of Banach space norms
Definition 5.2.4 (Subdifferential of a convex functional). A functional
f : X → R ∪ {∞} is called convex if
\[
f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y), \qquad \forall x, y \in X,\ \forall t \in [0, 1].
\]
In this case, an element x^* ∈ X^* is a subgradient of f in x if
\[
f(y) \ge f(x) + \langle x^*, y - x \rangle, \qquad \forall y \in X.
\]
The set ∂f (x) of all subgradients of f in x is called the subdifferential of f
in x.
Theorem 5.2.4 (Optimality conditions). Let f : X → R ∪ {∞} be a convex
functional and let z be such that f (z) < ∞. Then
\[
f(z) = \min_{x \in X} f(x) \iff 0 \in \partial f(z).
\]
This result generalizes the classical optimality condition f′(z) = 0, where
f′ is the Fréchet derivative of f . The subdifferential is also a generalization of the differential in the sense that if f is Gateaux-differentiable, then
∂f (x) = {∇f (x)}.
In general the subdifferential of a function may also be the empty set. However, if the function is Lipschitz-continuous, its subdifferential is not empty.
we recall the following one.
Theorem 5.2.5 (Monotonicity of the subgradient). Assume that the
convex functional f : X → R ∪ {∞} is proper, i.e. the set of all x ∈ X
such that f (x) < ∞ is non-empty. Then the following monotonicity property
holds:
\[
\langle x^* - y^*, x - y \rangle \ge 0, \qquad \forall x^* \in \partial f(x),\ y^* \in \partial f(y),\ x, y \in X.
\]
To understand the geometry of the Banach spaces, it is important to
study the properties of the proper convex functional x ↦ (1/p)‖x‖^p. We start
introducing the so-called duality mapping J_p^X.
Definition 5.2.5 (Duality mapping). The set valued mapping J_p^X : X →
2^{X^*}, with p ≥ 1 defined by
\[
J_p^X(x) := \{ x^* \in X^* \mid \langle x^*, x \rangle = \|x\|\,\|x^*\|,\ \|x^*\| = \|x\|^{p-1} \}, \tag{5.10}
\]
is called the duality mapping of X with gauge function t ↦ t^{p−1}.
By j_p^X we denote a single-valued selection of J_p^X, i.e. j_p^X : X → X^* is a
mapping with j_p^X(x) ∈ J_p^X(x) for all x ∈ X .
The duality mapping is the subgradient of the functional above:
Theorem 5.2.6 (Asplund). Let X be a normed space and p ≥ 1. Then
\[
J_p^X = \partial \left( \frac{1}{p} \| \cdot \|_X^p \right).
\]
Example 5.2.1. For every p ∈ (1, +∞), J_p : L^p(Ω) → L^{p^*}(Ω) is given by
\[
J_p^X(f) = |f|^{p-1} \operatorname{sgn}(f), \qquad f \in L^p(\Omega). \tag{5.11}
\]
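The two defining properties in (5.10) can be checked numerically for the formula (5.11). The sketch below replaces L^p(Ω) by finite sequences with the ℓ^p norm, which is an assumption made only to keep the example computable; the function names are hypothetical.

```python
import math

def p_norm(x, p):
    """Discrete p-norm, used here as a stand-in for the L^p(Omega) norm."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def conjugate_exponent(p):
    """Conjugate exponent p* with 1/p + 1/p* = 1, for p > 1."""
    return p / (p - 1.0)

def duality_map(x, p):
    """Componentwise |x|^(p-1) * sgn(x), the discrete analogue of (5.11)."""
    return [abs(v) ** (p - 1.0) * math.copysign(1.0, v) if v != 0 else 0.0 for v in x]

x = [1.0, -2.0, 0.5]
p = 3.0
j = duality_map(x, p)
pairing = sum(a * b for a, b in zip(j, x))      # <J_p(x), x>
dual_norm = p_norm(j, conjugate_exponent(p))    # norm of J_p(x) in the dual space
```

Here `pairing` equals ‖x‖^p and `dual_norm` equals ‖x‖^{p−1} up to rounding, so that ⟨J_p(x), x⟩ = ‖x‖ ‖J_p(x)‖ as required by (5.10).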
From the monotonicity property of the subgradient and the Asplund
Theorem, we know that for all x, z ∈ X we have
\[
\frac{1}{p} \|z\|^p - \frac{1}{p} \|x\|^p - \langle J_p^X(x), z - x \rangle \ge 0,
\]
and setting y = −(z − x) yields
\[
\frac{1}{p} \|x - y\|^p - \frac{1}{p} \|x\|^p + \langle J_p^X(x), y \rangle \ge 0.
\]
We are interested in the upper and lower bounds of the left-hand side of the
above inequality in terms of the norm of y.
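Definition 5.2.6 (p-convexity and p-smoothness).
• A Banach space $\mathcal{X}$ is said to be convex of power type p or p-convex if there exists a constant $c_p > 0$ such that
\[ \frac{1}{p}\|x - y\|^p \ge \frac{1}{p}\|x\|^p - \langle j_p^{\mathcal{X}}(x), y \rangle + \frac{c_p}{p}\|y\|^p \]
for all $x, y \in \mathcal{X}$ and all $j_p^{\mathcal{X}} \in J_p^{\mathcal{X}}$.
• A Banach space $\mathcal{X}$ is said to be smooth of power type p or p-smooth if there exists a constant $G_p > 0$ such that
\[ \frac{1}{p}\|x - y\|^p \le \frac{1}{p}\|x\|^p - \langle j_p^{\mathcal{X}}(x), y \rangle + \frac{G_p}{p}\|y\|^p \]
for all $x, y \in \mathcal{X}$ and all $j_p^{\mathcal{X}} \in J_p^{\mathcal{X}}$.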
The p-convexity and p-smoothness properties can be regarded as an extension of the polarization identity
\[ \frac{1}{2}\|x - y\|^2 = \frac{1}{2}\|x\|^2 - \langle x, y \rangle + \frac{1}{2}\|y\|^2, \tag{5.12} \]
which ensures that Hilbert spaces are 2-convex and 2-smooth.
The p-convexity and p-smoothness are related to other famous properties of
convexity and smoothness.
Definition 5.2.7.
• A Banach space $\mathcal{X}$ is said to be strictly convex if $\|\frac{1}{2}(x + y)\| < 1$ for all $x, y$ of the unit ball of $\mathcal{X}$ satisfying the condition $x \neq y$.
• A Banach space $\mathcal{X}$ is said to be uniformly convex if, for the modulus of convexity $\delta_{\mathcal{X}} : [0, 2] \to [0, 1]$, defined by
\[ \delta_{\mathcal{X}}(\epsilon) := \inf\left\{ 1 - \left\|\tfrac{1}{2}(x + y)\right\| : \|x\| = \|y\| = 1,\ \|x - y\| \ge \epsilon \right\}, \]
we have
\[ \delta_{\mathcal{X}}(\epsilon) > 0, \qquad 0 < \epsilon \le 2. \]
• A Banach space $\mathcal{X}$ is said to be smooth if, for every $x \in \mathcal{X}$ with $x \neq 0$, there is a unique $x^* \in \mathcal{X}^*$ such that $\|x^*\| = 1$ and $\langle x^*, x \rangle = \|x\|$.
• A Banach space $\mathcal{X}$ is said to be uniformly smooth if, for the modulus of smoothness $\rho_{\mathcal{X}} : [0, +\infty) \to [0, +\infty)$, defined by
\[ \rho_{\mathcal{X}}(\tau) := \frac{1}{2}\sup\{ \|x + y\| + \|x - y\| - 2 \mid \|x\| = 1,\ \|y\| \le \tau \}, \]
there holds:
\[ \lim_{\tau \to 0} \frac{\rho_{\mathcal{X}}(\tau)}{\tau} = 0. \]
If a Banach space is p-smooth for some $p > 1$, then $x \mapsto \|x\|^p$ is Fréchet-differentiable, hence Gateaux-differentiable and therefore $J_p^{\mathcal{X}}$ is single-valued.
In the famous paper [99], Xu and Roach proved a series of important inequalities, some of which will be very useful in the proofs of the following chapter.
Here we recall only the results about uniformly smooth and s-smooth Banach
spaces and refer to [82] and to [99] for the analogous results about uniformly
convex and s-convex Banach spaces.
Theorem 5.2.7 (Xu-Roach inequalities I). Let $\mathcal{X}$ be uniformly smooth, $1 < p < \infty$, and $j_p^{\mathcal{X}} \in J_p^{\mathcal{X}}$. Then, for all $x, y \in \mathcal{X}$, we have
\[ \|x - y\|^p - \|x\|^p + p\langle j_p^{\mathcal{X}}(x), y \rangle \le p\,G_p \int_0^1 \frac{(\|x - ty\| \vee \|x\|)^p}{t}\, \rho_{\mathcal{X}}\!\left( \frac{t\|y\|}{\|x - ty\| \vee \|x\|} \right) dt, \tag{5.13} \]
where $a \vee b = \max\{a, b\}$, $a, b \in \mathbb{R}$, and where $G_p > 0$ is a constant that can be written explicitly (cf. its expression in [82]).
Theorem 5.2.8 (Xu-Roach inequalities II). The following statements are equivalent:
(i) $\mathcal{X}$ is s-smooth.
(ii) For some $1 < p < \infty$ the duality mapping $J_p^{\mathcal{X}}$ is single-valued and for all $x, y \in \mathcal{X}$ we have
\[ \|J_p^{\mathcal{X}}(x) - J_p^{\mathcal{X}}(y)\| \le C(\|x\| \vee \|y\|)^{p-s}\,\|x - y\|^{s-1}. \]
(iii) The statement (ii) holds for all $p \in (1, \infty)$.
(iv) For some $1 < p < \infty$, some $j_p^{\mathcal{X}} \in J_p^{\mathcal{X}}$ and for all $x, y$, the inequality (5.13) holds. Moreover, the right-hand side of (5.13) can be estimated by
\[ C \int_0^1 t^{s-1}(\|x - ty\| \vee \|x\|)^{p-s}\,\|y\|^s\, dt. \]
(v) The statement (iv) holds for all $p \in (1, \infty)$ and all $j_p^{\mathcal{X}} \in J_p^{\mathcal{X}}$.
The generic constant $C$ can be chosen independently of $x$ and $y$.
An important consequence of the Xu-Roach inequalities is the following
result.
Corollary 5.2.1. If $\mathcal{X}$ is p-smooth, then for all $1 < q < p$ the space $\mathcal{X}$ is also q-smooth. If on the other hand $\mathcal{X}$ is p-convex, then for all $q$ such that $p < q < \infty$ the space $\mathcal{X}$ is also q-convex.
If $\mathcal{X}$ is s-smooth and $p > 1$, then the duality mapping $J_p^{\mathcal{X}}$ is single-valued.
The spaces that are convex or smooth of power type share many interesting properties, summarized in the following theorems.
Theorem 5.2.9. If $\mathcal{X}$ is p-convex, then:
• $p \ge 2$;
• $\mathcal{X}$ is uniformly convex and the modulus of convexity satisfies $\delta_{\mathcal{X}}(\epsilon) \ge C\epsilon^p$;
• $\mathcal{X}$ is strictly convex;
• $\mathcal{X}$ is reflexive (i.e. $\mathcal{X}^{**} = \mathcal{X}$).
If $\mathcal{X}$ is p-smooth, then:
• $p \le 2$;
• $\mathcal{X}$ is uniformly smooth and the modulus of smoothness satisfies $\rho_{\mathcal{X}}(\tau) \le C\tau^p$;
• $\mathcal{X}$ is smooth;
• $\mathcal{X}$ is reflexive.
Theorem 5.2.10. There hold:
• $\mathcal{X}$ is p-smooth if and only if $\mathcal{X}^*$ is $p^*$-convex.
• $\mathcal{X}$ is p-convex if and only if $\mathcal{X}^*$ is $p^*$-smooth.
• X is uniformly convex (respectively uniformly smooth) if and only if X ∗
is uniformly smooth (respectively uniformly convex).
• If $\mathcal{X}$ is uniformly convex or uniformly smooth, then $\mathcal{X}$ is reflexive.
Theorem 5.2.11. The duality mappings satisfy the following assertions:
• For every $x \in \mathcal{X}$ the set $J_p^{\mathcal{X}}(x)$ is non-empty and convex.
• $\mathcal{X}$ is smooth if and only if the duality mapping $J_p^{\mathcal{X}}$ is single-valued.
• If $\mathcal{X}$ is uniformly smooth, then $J_p^{\mathcal{X}}$ is single-valued and uniformly continuous on bounded sets.
• If $\mathcal{X}$ is convex of power type and smooth, then $J_p^{\mathcal{X}}$ is single-valued, bijective, and the duality mapping $J_{p^*}^{\mathcal{X}^*}$ is single-valued with
\[ J_{p^*}^{\mathcal{X}^*}(J_p^{\mathcal{X}}(x)) = x. \]
An important consequence of the last statement is that for spaces being
smooth of power type and convex of power type the duality mappings on the
primal and dual spaces can be used to transport all elements from the primal
to the dual space and vice versa. This is crucial to extend the regularization
methods defined in the Hilbert space setting to the Banach space setting, as
we shall see later.
The smoothness and convexity of power type properties have been studied for
some important function spaces. We summarize the results in the theorem
below.
Theorem 5.2.12. Let $\Omega \subseteq \mathbb{R}^D$ be a domain. Then for $1 < r < \infty$ the spaces $\ell^r$ of infinite real sequences, the Lebesgue spaces $L^r(\Omega)$ and the Sobolev spaces $W^{m,r}(\Omega)$, equipped with the usual norms, are
\[ \max\{2, r\}\text{-convex} \quad \text{and} \quad \min\{2, r\}\text{-smooth}. \]
Moreover, it is possible to show that $\ell^1$ cannot be p-convex or p-smooth for any $p$.
5.2.3
The Bregman distance
Due to the geometrical properties of the Banach spaces, it is often more appropriate to exploit the Bregman distance instead of functionals like $\|x - y\|_{\mathcal{X}}^p$ or $\|j_p^{\mathcal{X}}(x) - j_p^{\mathcal{X}}(y)\|_{\mathcal{X}^*}^p$ to prove convergence of the algorithms.
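Definition 5.2.8. Let $j_p^{\mathcal{X}}$ be a single-valued selection of the duality mapping $J_p^{\mathcal{X}}$. Then, the functional
\[ D_p(x, y) := \frac{1}{p}\|x\|^p - \frac{1}{p}\|y\|^p - \langle j_p^{\mathcal{X}}(y), x - y \rangle_{\mathcal{X}^* \times \mathcal{X}}, \qquad x, y \in \mathcal{X}, \tag{5.14} \]
is called the Bregman distance (with respect to the functional $\frac{1}{p}\|\cdot\|^p$).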
The Bregman distance is not a distance in the classical sense, but has
many useful properties.
Theorem 5.2.13. Let $\mathcal{X}$ be a Banach space and $j_p^{\mathcal{X}}$ be a single-valued selection of the duality mapping $J_p^{\mathcal{X}}$. Then:
• $D_p(x, y) \ge 0$ for all $x, y \in \mathcal{X}$.
• $D_p(x, y) = 0$ if and only if $j_p^{\mathcal{X}}(y) \in J_p^{\mathcal{X}}(x)$.
• If $\mathcal{X}$ is smooth and uniformly convex, then a sequence $\{x_n\} \subseteq \mathcal{X}$ remains bounded in $\mathcal{X}$ if $D_p(y, x_n)$ is bounded in $\mathbb{R}$. In particular, this is true if $\mathcal{X}$ is convex of power type.
• $D_p(x, y)$ is continuous in the first argument. If $\mathcal{X}$ is smooth and uniformly convex, then $J_p^{\mathcal{X}}$ is continuous on bounded subsets and $D_p(x, y)$ is continuous in its second argument. In particular, this is true if $\mathcal{X}$ is convex of power type.
• If $\mathcal{X}$ is smooth and uniformly convex, then the following statements are equivalent:
  ▸ $\lim_{n\to\infty} \|x_n - x\| = 0$;
  ▸ $\lim_{n\to\infty} \|x_n\| = \|x\|$ and $\lim_{n\to\infty} \langle J_p^{\mathcal{X}}(x_n), x \rangle = \langle J_p^{\mathcal{X}}(x), x \rangle$;
  ▸ $\lim_{n\to\infty} D_p(x, x_n) = 0$.
  In particular, this is true if $\mathcal{X}$ is convex of power type.
• The sequence $\{x_n\}$ is a Cauchy sequence in $\mathcal{X}$ if it is bounded and for all $\epsilon > 0$ there is an $N(\epsilon) \in \mathbb{N}$ such that $D_p(x_k, x_l) < \epsilon$ for all $k, l \ge N(\epsilon)$.
• $\mathcal{X}$ is p-convex if and only if $D_p(x, y) \ge c_p\|x - y\|^p$.
• $\mathcal{X}$ is p-smooth if and only if $D_p(x, y) \le G_p\|x - y\|^p$.
The following property of the Bregman distance replaces the classical
triangle inequality.
Proposition 5.2.1 (Three-point identity). Let $j_p^{\mathcal{X}}$ be a single-valued selection of the duality mapping $J_p^{\mathcal{X}}$. Then:
\[ D_p(x, y) = D_p(x, z) + D_p(z, y) + \langle j_p^{\mathcal{X}}(z) - j_p^{\mathcal{X}}(y), x - z \rangle. \tag{5.15} \]
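The three-point identity (5.15) can be checked numerically. The sketch below assumes the finite-dimensional space $\mathbb{R}^n$ with the $\ell^p$ norm, where the duality mapping is the componentwise map from (5.11); the helper names and the three sample points are illustrative assumptions.

```python
import math

def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def duality_map(x, p):
    """Componentwise |t|^(p-1) * sgn(t): duality mapping of l^p on R^n."""
    return [math.copysign(abs(t) ** (p - 1), t) for t in x]

def bregman(x, y, p):
    """D_p(x, y) = ||x||^p / p - ||y||^p / p - <J_p(y), x - y>, as in (5.14)."""
    jy = duality_map(y, p)
    pairing = sum(j * (a - b) for j, a, b in zip(jy, x, y))
    return (lp_norm(x, p) ** p - lp_norm(y, p) ** p) / p - pairing

p = 3.0
x = [1.0, -0.5, 2.0]            # arbitrary sample points (assumption)
y = [0.3, 0.7, -1.2]
z = [-0.4, 1.1, 0.6]

jz, jy = duality_map(z, p), duality_map(y, p)
cross_term = sum((a - b) * (u - v) for a, b, u, v in zip(jz, jy, x, z))

lhs = bregman(x, y, p)
rhs = bregman(x, z, p) + bregman(z, y, p) + cross_term
print(lhs, rhs)                 # the two values should agree up to rounding
```

The cross term can have either sign, which is why (5.15) takes the place of a triangle inequality rather than implying one.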
There is a close relationship between the primal Bregman distances and
the related Bregman distances in the dual space.
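Proposition 5.2.2. Let $j_p^{\mathcal{X}}$ be a single-valued selection of the duality mapping $J_p^{\mathcal{X}}$. If there exists a single-valued selection $j_{p^*}^{\mathcal{X}^*}$ of $J_{p^*}^{\mathcal{X}^*}$ such that for fixed $y \in \mathcal{X}$ we have $j_{p^*}^{\mathcal{X}^*}(j_p^{\mathcal{X}}(y)) = y$, then
\[ D_p(y, x) = D_{p^*}(j_p^{\mathcal{X}}(x), j_p^{\mathcal{X}}(y)) \tag{5.16} \]
for all $x \in \mathcal{X}$.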
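A short derivation sketch of (5.16) may help to see where the assumption enters. It uses only the defining properties (5.10), which give $\|j_p(x)\|^{p^*} = \|x\|^{(p-1)p^*} = \|x\|^p$ and $\langle j_p(y), y \rangle = \|y\|^p$, the relation $1/p + 1/p^* = 1$, and the hypothesis $j_{p^*}(j_p(y)) = y$; the superscripts $\mathcal{X}$ and $\mathcal{X}^*$ are dropped for readability.

```latex
\begin{align*}
D_{p^*}(j_p(x), j_p(y))
  &= \tfrac{1}{p^*}\|j_p(x)\|^{p^*} - \tfrac{1}{p^*}\|j_p(y)\|^{p^*}
     - \langle j_{p^*}(j_p(y)),\, j_p(x) - j_p(y) \rangle \\
  &= \tfrac{1}{p^*}\|x\|^{p} - \tfrac{1}{p^*}\|y\|^{p}
     - \langle j_p(x), y \rangle + \|y\|^{p} \\
  &= \tfrac{1}{p^*}\|x\|^{p} + \tfrac{1}{p}\|y\|^{p} - \langle j_p(x), y \rangle, \\[4pt]
D_p(y, x)
  &= \tfrac{1}{p}\|y\|^{p} - \tfrac{1}{p}\|x\|^{p} - \langle j_p(x), y - x \rangle \\
  &= \tfrac{1}{p}\|y\|^{p} + \tfrac{1}{p^*}\|x\|^{p} - \langle j_p(x), y \rangle .
\end{align*}
```

Both expressions coincide, which is the statement of the proposition.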
5.3
Regularization in Banach spaces
In this section we extend the fundamental concepts of the regularization
theory for linear ill-posed operator equations in Hilbert spaces discussed in
Chapter 1 to the more general framework of the present chapter.
5.3.1
Minimum norm solutions
In Section 1.6 we have seen Hadamard’s definition of ill-posed problems. We
recall that an abstract operator equation F (x) = y is well posed (in the sense
of Hadamard) if for all right-hand sides y a solution of the equation exists,
is unique and the solution depends continuously on the data.
For linear problems in the Hilbert space setting, we have defined the Moore-Penrose generalized inverse, which allows one to overcome the problems of existence and uniqueness of the solution by defining the minimum norm (or best-approximate) solution. For general ill-posed problems in Banach spaces, it is also possible to define a minimum norm solution.
In the linear case, the definition is similar to the Hilbert space setting.
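Definition 5.3.1 (Minimum norm solution). Let $A$ be a linear operator between Banach spaces $\mathcal{X}$ and $\mathcal{Y}$. An element $x^\dagger \in \mathcal{X}$ is called a minimum norm solution of the operator equation $Ax = y$ if
\[ Ax^\dagger = y \quad \text{and} \quad \|x^\dagger\| = \inf\{ \|\tilde{x}\| \mid \tilde{x} \in \mathcal{X},\ A\tilde{x} = y \}. \]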
The following result gives a characterization of the minimum norm solution (see [82] for the proof).
Proposition 5.3.1. Let $\mathcal{X}$ be smooth and uniformly convex and let $\mathcal{Y}$ be an arbitrary Banach space. Then, if $y \in R(A)$, the minimum norm solution of $Ax = y$ is unique. Furthermore, it satisfies the condition $J_p^{\mathcal{X}}(x^\dagger) \in R(A^*)$ for $1 < p < \infty$. If additionally there is some $x \in \mathcal{X}$ such that $J_p^{\mathcal{X}}(x) \in R(A^*)$ and $x - x^\dagger \in \ker(A)$, then $x = x^\dagger$.
In the nonlinear case, one has to face nonlinear operator equations of the type
\[ F(x) = y, \qquad x \in D(F) \subseteq \mathcal{X}, \qquad y \in F(D(F)) \subseteq \mathcal{Y}, \tag{5.17} \]
where F : D(F ) ⊆ X → Y is a nonlinear mapping with domain D(F ) and
range F (D(F )).
According to the local character of the solutions in nonlinear equations we
have to focus on some neighborhood of a reference element x0 ∈ X which can
be interpreted as an initial guess for the solution to be determined. Then,
one typically shifts the coordinate system from the zero to x0 and searches
for x0 -minimum norm solutions.
Definition 5.3.2 ($x_0$-minimum norm solution). An element $x^\dagger \in D(F) \subseteq \mathcal{X}$ is called an $x_0$-minimum norm solution of the operator equation $F(x) = y$ if
\[ F(x^\dagger) = y \quad \text{and} \quad \|x^\dagger - x_0\| = \inf\{ \|\tilde{x} - x_0\| \mid \tilde{x} \in D(F),\ F(\tilde{x}) = y \}. \]
To ensure that x0 -minimum norm solutions to the nonlinear operator
equation (5.17) exist, some assumptions have to be made on the Banach
spaces X and Y, on the domain D(F ) and on the operator F .
Proposition 5.3.2. Assume the following conditions hold:
(i) X and Y are infinite dimensional reflexive Banach spaces.
(ii) D(F ) ⊆ X is a convex and closed subset of X .
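(iii) $F : D(F) \subseteq \mathcal{X} \to \mathcal{Y}$ is weak-to-weak sequentially continuous, i.e. $x_n \rightharpoonup \bar{x}$ in $\mathcal{X}$ with $x_n \in D(F)$, $n \in \mathbb{N}$, and $\bar{x} \in D(F)$ implies $F(x_n) \rightharpoonup F(\bar{x})$ in $\mathcal{Y}$.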
Then the nonlinear operator equation (5.17) admits an x0 -minimum norm
solution.
Proof. See [82], Proposition 3.14.
5.3.2
Regularization methods
As usual, we shall assume that the data $y$ of an ill-posed (linear or nonlinear) operator equation are not given exactly, but only elements $y^\delta \in \mathcal{Y}$ satisfying the inequality $\|y - y^\delta\| \le \delta$, with noise level $\delta > 0$, are available.
Consequently, regularization approaches are required for detecting good approximate solutions. Here we give a definition of regularization methods
which is analogous to that given in Chapter 1, but a little more general.
Definition 5.3.3. Let $\sigma_0 \in (0, +\infty]$. For every $\sigma \in (0, \sigma_0)$, let $R_\sigma : \mathcal{Y} \to \mathcal{X}$ be a continuous operator.
The family $\{R_\sigma\}$ is called a regularization operator for $A^\dagger$ if, for every $y \in D(A^\dagger)$, there exists a function
\[ \alpha : (0, +\infty) \times \mathcal{Y} \to (0, \sigma_0), \]
called parameter choice rule for $y$, that associates to each pair $(\delta, y^\delta)$ a specific operator $R_{\alpha(\delta, y^\delta)}$ and a regularized solution $x^\delta_{\alpha(\delta, y^\delta)} := R_{\alpha(\delta, y^\delta)}\, y^\delta$, and such that
\[ \lim_{\delta \to 0}\ \sup_{y^\delta \in B_\delta(y)} \alpha(\delta, y^\delta) = 0. \tag{5.18} \]
If, in addition, for every sequence $\{y^{\delta_n}\}_{n \in \mathbb{N}} \subseteq \mathcal{Y}$ with $\|y^{\delta_n} - y\| \le \delta_n$ and $\delta_n \to 0$ as $n \to \infty$ the regularized solutions $x^{\delta_n}_{\alpha(\delta_n, y^{\delta_n})}$ converge in a well-defined sense to a well-defined solution $x^\dagger$ of (5.17), then $\alpha$ is said to be convergent.
Convergent regularization methods are defined accordingly, analogously to
Definition 1.9.1. If the solution of equation (5.17) is not unique, convergence
to solutions possessing desired properties, e.g. x0 -minimum norm solutions,
is required.
In the linear case, similar definitions hold.
The distinction between a-priori, a-posteriori and heuristic parameter
choice rules is still valid in this context.
We have seen that in the case of linear operator equations in a Hilbert space setting the construction of regularization methods is based in general on the approximation of the Moore-Penrose generalized inverse of the linear operator $A$ by a $\sigma$-dependent family of bounded operators with regularized solutions
\[ x_\sigma^\delta = g_\sigma(A^*A)A^* y^\delta, \qquad \sigma > 0. \]
However, in the Banach space setting neither $A^\dagger$ nor $A^*A$ is available, since the adjoint operator $A^* : \mathcal{Y}^* \to \mathcal{X}^*$ maps between the dual spaces. In the case of nonlinear operator equations, a comparable phenomenon occurs, because the adjoint operator
\[ F'(x^\dagger)^* : \mathcal{Y}^* \to \mathcal{X}^* \]
of a bounded linear derivative operator
\[ F'(x^\dagger) : \mathcal{X} \to \mathcal{Y} \]
of $F$ at the solution point $x^\dagger \in D(F)$ also maps between the dual spaces.
Nevertheless, two large and powerful classes of regularization methods with
prominent applications, for example in imaging, were recently promoted: the
class of Tikhonov-type regularization methods in Banach spaces and the class
of iterative regularization methods in Banach spaces.
Once again, we shall focus our attention on iterative regularization methods:
since Tikhonov-type regularization methods require the computation of a
global minimizer, often the amount of work required for carrying out iterative
regularization methods is much smaller than the comparable amount for a
Tikhonov-type regularization.
For a detailed coverage of the most recent results about Tikhonov-type regularization methods, see [82].
5.3.3
Source conditions and variational inequalities
We have seen in Chapter 1 that the convergence of the regularized solutions
of a regularization method to the minimum norm solution of the ill-posed
operator equation Ax = y can be arbitrarily slow. To obtain convergence
rates
\[ \varepsilon(x_\alpha^\delta, x^\dagger) = O(\varphi(\delta)) \quad \text{as } \delta \to 0 \tag{5.19} \]
for an error measure $\varepsilon$ and an index function $\varphi$, some smoothness of the solution element $x^\dagger$ with respect to $A : \mathcal{X} \to \mathcal{Y}$ is required. In Chapter 1, we have seen a classical tool for expressing the smoothness of $x^\dagger$, the source conditions. In the Hilbert space setting, this allows one to define the concept of (order) optimality of a regularization method.
In the Banach space setting, things are more complicated. The issue of
the order optimality of a regularization method is, at least to the author’s
knowledge, still an open question in the field. The rates depend on the
interplay of intrinsic smoothness of x† and the smoothing properties of the
operator A with non-closed range, but very often proving rates is a difficult
task and it is not easy to find the correct smoothness assumptions on x† to
obtain convergence rates that, at least in the special case of Hilbert spaces,
can be considered optimal.
In this presentation, the extension of the concept of source conditions to the Banach space setting is omitted. We only say that a wide variety of choices has been proposed and analyzed, where either $x^\dagger$ itself or an element $\xi^\dagger$ from the subdifferential of functionals of the form $\frac{1}{p}\|\cdot\|_{\mathcal{X}}^p$ in $x^\dagger$ belongs to the range of a linear operator that interacts with $A$ in an appropriate manner. The source conditions defined in Chapter 1 are only one of these choices.
We will focus our attention on a different strategy for proving convergence rates, the use of variational inequalities. As many authors pointed out (cf. e.g. [46], [82] and the references therein), estimating from above a term of the form
\[ |\langle J_p^{\mathcal{X}}(x^\dagger), x^\dagger - x \rangle_{\mathcal{X}^* \times \mathcal{X}}| \]
is a very powerful tool for proving convergence rates in regularization. The main advantage of this approach is that this term is contained in the Bregman distance with respect to the functional $\frac{1}{p}\|\cdot\|_{\mathcal{X}}^p$.
In the literature, many similar variational inequalities have been proposed. Essentially, the right-hand side of these inequalities contains a term with the Bregman distance between $x$ and $x^\dagger$ and a term that depends on the operator $A$ or, in the nonlinear case, on the forward operator $F$, for example:
\[ |\langle J_p^{\mathcal{X}}(x^\dagger), x^\dagger - x \rangle_{\mathcal{X}^* \times \mathcal{X}}| \le \beta_1 D_p(x, x^\dagger) + \beta_2 \|F(x) - F(x^\dagger)\|, \tag{5.20} \]
for constants $0 \le \beta_1 < 1$ and $\beta_2 \ge 0$. The inequalities must hold for all $x \in \mathcal{M}$, with some set $\mathcal{M}$ which contains all regularized solutions of interest.
We shall not discuss these assumptions here, but we limit ourselves to stating a particular variational inequality in each examined case. For a more detailed treatment of the topic, we refer to [82], where some important results that link the variational inequalities with the source conditions can also be found.
5.4
Iterative regularization methods
In this section we consider iterative regularization methods for nonlinear ill-posed operator equations (5.17). We will assume that the noise level $\delta$ is known, in order to provide convergence and convergence rate results.
In the following, x0 is some initial guess. We will assume that a solution
to (5.17) exists: according to Proposition 5.3.2, this implies the existence
of an x0 -minimum norm solution x† , provided that the assumptions of that
proposition are satisfied.
The iterative methods discussed in this section will be either of gradient
(Landweber and Iteratively Regularized Landweber) or of Newton-type (Iteratively Regularized Gauss-Newton method).
5.4.1
The Landweber iteration: linear case
Before turning to the nonlinear case, we consider the Landweber iteration
for linear ill-posed problems in Banach spaces with noisy data.
In Chapter 1, we have seen that the Landweber iteration for solving linear
ill-posed problems in Hilbert spaces can be expressed in the form
\[ x_{k+1} = x_k - \omega A^*(Ax_k - y), \]
where ω > 0 is the step size of the method. Here, we shall consider a variable
step size ωk > 0 (k ∈ N) in the course of the iteration, since an appropriate
choice of the step size helps to prove convergence.
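For orientation, the classical Hilbert space iteration recalled above can be sketched in a few lines. The example assumes a small, well-conditioned matrix `A`, exact data and a constant step size; these choices, as well as the helper names, are illustrative and only meant to show the update rule, not the behaviour on a genuinely ill-posed problem.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def landweber(A, y, omega, n_iter, x0=None):
    """Iterate x_{k+1} = x_k - omega * A^T (A x_k - y)."""
    x = list(x0) if x0 is not None else [0.0] * len(A[0])
    At = transpose(A)
    for _ in range(n_iter):
        residual = [r - b for r, b in zip(matvec(A, x), y)]
        gradient = matvec(At, residual)
        x = [t - omega * g for t, g in zip(x, gradient)]
    return x

A = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]   # sample operator (assumption)
x_true = [1.0, -1.0]
y = matvec(A, x_true)                       # exact data

# omega = 0.5 satisfies omega * ||A||^2 < 2 for this particular matrix
x_rec = landweber(A, y, omega=0.5, n_iter=500)
print(x_rec)
```

With a constant step size the iteration converges for exact data whenever $0 < \omega < 2/\|A\|^2$.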
The generalization of the Landweber iteration to the Banach space setting
requires the help of the duality mappings. As a consequence, the space X is
assumed to be smooth and uniformly convex, whereas Y can be an arbitrary
Banach space. Note that this implies that $j_p^{\mathcal{X}} = J_p^{\mathcal{X}}$ is single-valued, $\mathcal{X}$ is reflexive and $\mathcal{X}^*$ is strictly convex and uniformly smooth.
The Landweber algorithm
Assume that instead of the exact data $y \in R(A)$ and the exact linear and bounded operator $A : \mathcal{X} \to \mathcal{Y}$, only some approximations $\{y_j\}_j$ in $\mathcal{Y}$ and $\{A_l\}_l$ in the space $\mathcal{L}(\mathcal{X}, \mathcal{Y})$ of linear and bounded operators between $\mathcal{X}$ and $\mathcal{Y}$ are available. Assume also that estimates for the deviations
\[ \|y_j - y\| \le \delta_j, \qquad \delta_j > \delta_{j+1} > 0, \qquad \lim_{j \to \infty} \delta_j = 0, \tag{5.21} \]
and
\[ \|A_l - A\| \le \eta_l, \qquad \eta_l > \eta_{l+1} > 0, \qquad \lim_{l \to \infty} \eta_l = 0, \tag{5.22} \]
are known. Moreover, to properly include the second case (5.22), we need an a-priori estimate for the norm of $x^\dagger$, i.e. there is a constant $R > 0$ such that
\[ \|x^\dagger\| \le R. \tag{5.23} \]
Further, set
\[ S := \sup_{l \in \mathbb{N}} \|A_l\|. \tag{5.24} \]
(i) Fix $p, r \in (1, \infty)$. Choose constants $C, \tilde{C} \in (0, 1)$ and an initial vector $x_0$ such that
\[ j_p^{\mathcal{X}}(x_0) \in R(A^*) \quad \text{and} \quad D_p(x^\dagger, x_0) \le \frac{1}{p}\|x^\dagger\|^p. \tag{5.25} \]
Set $j_{-1} := 0$ and $l_{-1} := 0$. For $k = 0, 1, 2, \dots$ repeat
(ii) If for all $j > j_{k-1}$ and all $l > l_{k-1}$,
\[ \|A_l x_k - y_j\| \le \frac{1}{\tilde{C}}(\delta_j + \eta_l R), \]
stop iterating.
Else, find $j_k > j_{k-1}$ and $l_k > l_{k-1}$ with
\[ \delta_{j_k} + \eta_{l_k} R \le \tilde{C} R_k, \]
where
\[ R_k := \|A_{l_k} x_k - y_{j_k}\|. \]
Choose $\omega_k$ according to:
(a) In case $x_0 = 0$ set
\[ \omega_0 := C(1 - \tilde{C})^{p-1}\, \frac{(p^*)^{p-1}}{S^p}\, R_0^{p-r}. \]
(b) For all $k \ge 0$ (respectively $k \ge 1$ if $x_0 = 0$), set
\[ \gamma_k := \min\left\{ \rho_{\mathcal{X}^*}(1),\ \frac{C(1 - \tilde{C}) R_k}{2^{p^*} G_{p^*} S \|x_k\|} \right\}, \tag{5.26} \]
where $G_{p^*}$ is the constant from the Xu-Roach inequalities (cf. Theorem 5.2.7), and choose $\tau_k \in (0, 1]$ with
\[ \frac{\rho_{\mathcal{X}^*}(\tau_k)}{\tau_k} = \gamma_k \]
and set
\[ \omega_k := \frac{\tau_k \|x_k\|^{p-1}}{S\, R_k^{r-1}}. \tag{5.27} \]
Iterate
\[ x_{k+1} := J_{p^*}^{\mathcal{X}^*}\left( J_p^{\mathcal{X}}(x_k) - \omega_k A_{l_k}^*\, j_r^{\mathcal{Y}}(A_{l_k} x_k - y_{j_k}) \right). \tag{5.28} \]
Theorem 5.4.1. The Landweber algorithm either stops after a finite number
of iterations with the minimum norm solution of Ax = y or the sequence of
iterates {xk } converges strongly to x† .
Proof. See [82], Theorem 6.6.
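The structure of the update (5.28) can be illustrated in finite dimensions. The sketch below is a simplification under explicit assumptions: $\mathcal{X} = \mathbb{R}^n$ with the $\ell^p$ norm, $\mathcal{Y} = \mathbb{R}^m$ with the $\ell^r$ norm, exact operator and data, and a constant step size `omega` in place of the rule (5.26)-(5.27). The convergence statement of Theorem 5.4.1 is therefore not claimed for this variant, and all function names are illustrative.

```python
import math

def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def duality_map(x, p):
    """Componentwise |t|^(p-1) * sgn(t): duality mapping of l^p on R^n."""
    return [math.copysign(abs(t) ** (p - 1), t) for t in x]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def banach_landweber_step(x, A, y, omega, p, r):
    """One step of (5.28) with exact A and y and a constant step size (assumption)."""
    p_star = p / (p - 1)
    residual = [a - b for a, b in zip(matvec(A, x), y)]
    At = [list(col) for col in zip(*A)]
    update = matvec(At, duality_map(residual, r))        # A^* j_r(A x - y)
    dual_iterate = [u - omega * g for u, g in zip(duality_map(x, p), update)]
    return duality_map(dual_iterate, p_star)             # back to the primal space

A = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]                 # sample operator (assumption)
x_true = [1.0, -1.0]
y = matvec(A, x_true)

x = [0.0, 0.0]
for _ in range(200):
    x = banach_landweber_step(x, A, y, omega=0.1, p=1.5, r=2.0)
residual_norm = lp_norm([a - b for a, b in zip(matvec(A, x), y)], 2.0)
print(x, residual_norm)
```

For $p = r = 2$ both duality mappings reduce to the identity and the step coincides with the classical Landweber update $x_k - \omega A^*(Ax_k - y)$.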
Let us consider now the case of noisy data $y^\delta$ and a perturbed operator $A_\eta$, with noise levels
\[ \|y - y^\delta\| \le \delta \quad \text{and} \quad \|A - A_\eta\| \le \eta. \tag{5.29} \]
We apply the Landweber algorithm with $\delta_j = \delta$ and $\eta_l = \eta$ for all $j, l \in \mathbb{N}$ and use the Discrepancy Principle. To that end, condition (5.26) provides us with a stopping rule: we terminate the iteration at $k_D = k_D(\delta)$, where
\[ k_D(\delta) := \min\left\{ k \in \mathbb{N} \;\middle|\; R_k < \frac{1}{\tilde{C}}(\delta + \eta R) \right\}. \tag{5.30} \]
The proof of Theorem 5.4.1 shows that, as long as $R_k \ge \frac{1}{\tilde{C}}(\delta + \eta R)$, $x_{k+1}$ is a better approximation of $x^\dagger$ than $x_k$. A consequence of this fact and of Theorem 5.4.1 is the stability of this method with respect to the noise.
Corollary 5.4.1. Together with the Discrepancy Principle (5.30) as a stopping rule, the Landweber algorithm is a regularization method for Ax = y.
We observe that since the selection $j_r^{\mathcal{Y}}$ need not be continuous, the method is another example of regularization with a non-continuous mapping, exactly like the conjugate gradient type methods.
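The stopping rule can be illustrated in the simplest setting. The sketch below assumes the Hilbert space case $p = r = 2$ (so the duality mappings are the identity), an exact operator ($\eta = 0$), a constant step size, and a parameter `tau` that plays the role of $1/\tilde{C}$ in (5.30); the noise vector and all names are illustrative assumptions.

```python
import math

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def landweber_discrepancy(A, y_delta, delta, tau, omega, max_iter=10000):
    """Landweber iteration stopped at the first k with ||A x_k - y_delta|| <= tau * delta."""
    x = [0.0] * len(A[0])
    At = [list(col) for col in zip(*A)]
    for k in range(max_iter):
        residual = [a - b for a, b in zip(matvec(A, x), y_delta)]
        if norm2(residual) <= tau * delta:
            return x, k                      # stopping index k_D(delta)
        gradient = matvec(At, residual)
        x = [t - omega * g for t, g in zip(x, gradient)]
    return x, max_iter

A = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]     # sample operator (assumption)
x_true = [1.0, -1.0]
y = matvec(A, x_true)
noise = [0.01, -0.02, 0.01]                  # fixed perturbation (assumption)
y_delta = [a + b for a, b in zip(y, noise)]
delta = norm2(noise)

x_stop, k_stop = landweber_discrepancy(A, y_delta, delta, tau=2.0, omega=0.5)
print(k_stop, x_stop)
```

Choosing `tau` larger than one guarantees that the loop terminates here, because the smallest attainable residual cannot exceed the noise level.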
5.4.2
The Landweber iteration: nonlinear case
Analogous to the Landweber method in Hilbert spaces (cf. [30]), we study a
generalization of the Landweber iteration described in Section 5.4.1 to solve
nonlinear problems of the form (5.17):
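\[ \begin{aligned} J_p^{\mathcal{X}}(x_{k+1}^\delta) &= J_p^{\mathcal{X}}(x_k^\delta) - \omega_k A_k^*\, j_r^{\mathcal{Y}}(F(x_k^\delta) - y^\delta), \\ x_{k+1}^\delta &= J_{p^*}^{\mathcal{X}^*}(J_p^{\mathcal{X}}(x_{k+1}^\delta)), \qquad k = 0, 1, \dots \end{aligned} \tag{5.31} \]
where we abbreviate $A_k = F'(x_k^\delta)$.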
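A toy instance of the iteration (5.31) is sketched below. It assumes the Hilbert space case $p = r = 2$, in which all duality mappings are the identity, a constant step size, exact data, and a simple componentwise operator $F(x)_i = x_i + 0.1\,x_i^3$ chosen purely for illustration; none of these choices come from the original text.

```python
def F(x):
    """Illustrative nonlinear forward operator (assumption): x_i + 0.1 * x_i^3."""
    return [t + 0.1 * t ** 3 for t in x]

def dF(x):
    """Jacobian F'(x) of the operator above, stored as a list of rows."""
    n = len(x)
    return [[(1.0 + 0.3 * x[i] ** 2) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def nonlinear_landweber(F, dF, y, x0, omega, n_iter):
    """x_{k+1} = x_k - omega * F'(x_k)^T (F(x_k) - y), i.e. (5.31) with p = r = 2."""
    x = list(x0)
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(F(x), y)]
        Ak_t = [list(col) for col in zip(*dF(x))]          # adjoint of A_k = F'(x_k)
        gradient = [sum(a * r for a, r in zip(row, residual)) for row in Ak_t]
        x = [t - omega * g for t, g in zip(x, gradient)]
    return x

x_true = [1.0, 0.5]
y = F(x_true)                                               # exact data
x_rec = nonlinear_landweber(F, dF, y, x0=[0.0, 0.0], omega=0.5, n_iter=100)
print(x_rec)
```

For noisy data the loop would be combined with a stopping rule such as the Discrepancy Principle discussed below.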
Of course, some assumptions are required on the spaces and on the forward operator $F$ (see the results below). A typical assumption on the forward operator is the so-called $\eta$-condition (or tangential cone condition):
\[ \|F(x) - F(\bar{x}) - F'(x)(x - \bar{x})\| \le \eta\,\|F(x) - F(\bar{x})\|, \qquad \forall\, x, \bar{x} \in B_\rho^D(x^\dagger) \tag{5.32} \]
for some $0 < \eta < 1$, where $B_\rho^D(x^\dagger) := \{ x \in \mathcal{X} \mid D_p(x^\dagger, x) \le \rho^2,\ \rho > 0 \}$.
A key point for proving convergence of the Landweber iteration is showing
the monotonicity of the Bregman distances.
Proposition 5.4.1. Assume that $\mathcal{X}$ is smooth and p-convex, that the initial guess $x_0$ is sufficiently close to $x^\dagger$, i.e. $x_0 \in B_\rho^D(x^\dagger)$, that $F$ satisfies the tangential cone condition with a sufficiently small $\eta$, that $F$ and $F'$ are continuous, and that
\[ B_\rho^D(x^\dagger) \subseteq D(F). \tag{5.33} \]
Let $\tau$ be chosen sufficiently large, so that
\[ c(\eta, \tau) := \eta + \frac{1 + \eta}{\tau} < 1. \tag{5.34} \]
Then, with the choice
\[ \omega_k := \frac{p^*(1 - c(\eta, \tau))^{p-1}}{G_{p^*}^{p-1}}\, \frac{\|F(x_k^\delta) - y^\delta\|^{p-r}}{\|A_k\|^p} \ge 0, \tag{5.35} \]
with $G_{p^*}$ being the constant from the Xu-Roach inequalities (cf. Theorem 5.2.7), monotonicity of the Bregman distances
\[ D_p(x^\dagger, x_{k+1}^\delta) - D_p(x^\dagger, x_k^\delta) \le -\frac{p^*(1 - c(\eta, \tau))^p}{p\,(G_{p^*} p^*)^{p-1}}\, \frac{\|F(x_k^\delta) - y^\delta\|^p}{\|A_k\|^p} \tag{5.36} \]
as well as $x_{k+1}^\delta \in D(F)$ holds for all $k \le k_D(\delta) - 1$, with $k_D(\delta)$ satisfying the Discrepancy Principle:
\[ k_D(\delta) := \min\{ k \in \mathbb{N} \mid \|F(x_k^\delta) - y^\delta\| \le \tau\delta \}. \tag{5.37} \]
This allows us to show the following convergence result for the Landweber iteration. For a proof of this theorem, as well as of Proposition 5.4.1, we refer as usual to [82].
Theorem 5.4.2. Let the assumptions of Proposition 5.4.1 hold, with additionally $\mathcal{Y}$ being uniformly smooth, and let $k_D(\delta)$ be chosen according to the Discrepancy Principle (5.37), with (5.34). Then the Landweber iterates $x^\delta_{k_D(\delta)}$ defined by (5.31) converge to a solution of (5.17) as $\delta \to 0$. If $R(F'(x)) \subseteq R(F'(x^\dagger))$ for all $x \in B_\rho(x_0)$ and $J_p^{\mathcal{X}}(x_0) \in R(F'(x^\dagger))$, then $x^\delta_{k_D(\delta)}$ converge to $x^\dagger$ as $\delta \to 0$.
5.4.3
The Iteratively Regularized Landweber method
In the Hilbert space setting, the proof of convergence rates for the Landweber iteration under source conditions
\[ x^\dagger - x_0 \in R\left( (F'(x^\dagger)^* F'(x^\dagger))^{\nu/2} \right) \tag{5.38} \]
relies on the fact that the iteration errors $x_k^\delta - x^\dagger$ remain in $R\left( (F'(x^\dagger)^* F'(x^\dagger))^{\nu/2} \right)$ and their preimages under $(F'(x^\dagger)^* F'(x^\dagger))^{\nu/2}$ form a bounded sequence (cf. Proposition 2.11 in [53]). In [82] it is stated that this approach can hardly be carried over to the Banach space setting, unless more restrictive assumptions are made on the structure of the spaces than in the proof of convergence only, even in the case $\nu = 1$.
Therefore, an alternative version of the Landweber iteration is considered,
namely the Iteratively Regularized Landweber method.
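The iterates are now defined by
\[ \begin{aligned} J_p^{\mathcal{X}}(x_{k+1}^\delta - x_0) &= (1 - \alpha_k) J_p^{\mathcal{X}}(x_k^\delta - x_0) - \omega_k A_k^*\, j_r^{\mathcal{Y}}(F(x_k^\delta) - y^\delta), \\ x_{k+1}^\delta &= x_0 + J_{p^*}^{\mathcal{X}^*}(J_p^{\mathcal{X}}(x_{k+1}^\delta - x_0)), \qquad k = 0, 1, \dots \end{aligned} \tag{5.39} \]
For an appropriate choice of the sequence $\{\alpha_k\}_{k \in \mathbb{N}} \subseteq [0, 1]$, the method has been shown to be convergent in a Hilbert space setting (with rates under a source condition of the form $\xi^\dagger = (F'(x^\dagger))^* v$, $v \in \mathcal{Y}^*$) in [81].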
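In place of the Hilbert space condition (5.38) we consider the variational inequality
\[ \exists\, \beta > 0 : \forall x \in B_\rho^D(x^\dagger) \qquad |\langle J_p^{\mathcal{X}}(x^\dagger - x_0), x - x^\dagger \rangle_{\mathcal{X}^* \times \mathcal{X}}| \le \beta\, D_p^{x_0}(x^\dagger, x)^{\frac{1-\nu}{2}}\, \|F'(x^\dagger)(x - x^\dagger)\|^\nu \tag{5.40} \]
where
\[ D_p^{x_0}(x^\dagger, x) := D_p(x^\dagger - x_0, x - x_0). \tag{5.41} \]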
According to (5.40), due to the presence of additional regularity, the tangential cone condition can be relaxed to a more general condition on the degree of nonlinearity of the operator $F$:
\[ \|(F'(x^\dagger + v) - F'(x^\dagger))v\| \le K\, \|F'(x^\dagger)v\|^{c_1}\, D_p^{x_0}(x^\dagger, v + x^\dagger)^{c_2}, \qquad v \in \mathcal{X},\ x^\dagger + v \in B_\rho^D(x^\dagger), \tag{5.42} \]
with
\[ c_1 = 1 \ \text{ or } \ c_1 + r c_2 > 1 \ \text{ or } \ (c_1 + r c_2 \ge 1 \text{ and } K > 0 \text{ sufficiently small}) \tag{5.43} \]
and
\[ c_1 + c_2\, \frac{2\nu}{\nu + 1} \ge 1. \tag{5.44} \]
For further details on the degree of nonlinearity conditions, see [82] and the references therein.
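The step size $\omega_k > 0$ is chosen such that
\[ \omega_k\, \frac{1 - 3C(c_1)K}{3(1 - C(c_1)K)}\, \|F(x_k^\delta) - y^\delta\|^r - 2^{p^* + p - 2} G_{p^*}\, \omega_k^{p^*}\, \|A_k^*\, j_r^{\mathcal{Y}}(F(x_k^\delta) - y^\delta)\|^{p^*} \ge 0, \tag{5.45} \]
where $C(c_1) = c_1^{c_1}(1 - c_1)^{1 - c_1}$, $c_1$ and $K$ as in (5.42). This is possible, e.g. by a choice
\[ 0 < \omega_k \le C_\omega\, \frac{\|F(x_k^\delta) - y^\delta\|^{\frac{p-r}{p-1}}}{\|A_k\|^{p^*}} =: \overline{\omega}_k, \qquad \text{with } C_\omega := \frac{2^{2 - p^* - p}}{3}\, \frac{1 - 3C(c_1)K}{(1 - C(c_1)K)\, G_{p^*}}. \]
If $r \ge p$, $F$ and $F'$ are bounded on $B_\rho^D(x^\dagger)$, it is possible to bound $\omega_k$ from above and below, i.e. there exist $\underline{\omega}, \overline{\omega} > 0$, independent of $k$ and $\delta$, such that
\[ 0 < \underline{\omega} \le \omega_k \le \overline{\omega}, \tag{5.46} \]
cf. [82].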
To prove convergence rates, the following a-priori stopping rule has been proposed:
\[ k_*(\delta) := \min\left\{ k \in \mathbb{N} \;\middle|\; \alpha_k^{\frac{\nu + 1}{r(\nu + 1) - 2\nu}} \le \tau\delta \right\}, \tag{5.47} \]
where ν < 1 is the exponent of the variational inequality (5.40).
Theorem 5.4.3. Assume that $\mathcal{X}$ is smooth and p-convex, that $x_0 \in B_\rho^D(x^\dagger)$, that the variational inequality (5.40) holds with $\nu \in (0, 1]$ and $\beta$ sufficiently small, that $F$ satisfies (5.42), with (5.43) and (5.44), that $F$ and $F'$ are continuous and uniformly bounded in $B_\rho^D(x^\dagger)$, that $B_\rho^D(x^\dagger) \subseteq D(F)$ and that
\[ p^* \ge \frac{2\nu}{p(\nu + 1) - 2\nu} + 1. \tag{5.48} \]
Let $k_*(\delta)$ be chosen according to (5.47), with $\tau$ sufficiently large. Moreover, assume that $r \ge p$ and that the sequence $\{\omega_k\}_{k \in \mathbb{N}}$ is chosen such that (5.46) holds. Finally, assume that the sequence $\{\alpha_k\}_{k \in \mathbb{N}} \subseteq [0, 1]$ is chosen such that
\[ \left( \frac{\alpha_{k+1}}{\alpha_k} \right)^{\frac{2\nu}{p(\nu + 1) - 2\nu}} + \frac{1}{3}\alpha_k - 1 \ge c\,\alpha_k \tag{5.49} \]
for some $c \in (0, \frac{1}{3})$ independent of $k$ and $\alpha_{\max} = \max_{k \in \mathbb{N}} \alpha_k$ is sufficiently small.
Then, the iterates $x_{k+1}^\delta$ remain in $B_\rho^D(x^\dagger)$ for all $k \le k_*(\delta) - 1$, with $k_*(\delta)$ according to (5.47). Moreover, we obtain optimal rates
\[ D_p^{x_0}(x^\dagger, x_{k_*}) = O\left( \delta^{\frac{2\nu}{\nu + 1}} \right), \qquad \delta \to 0 \tag{5.50} \]
as well as in the noise free case $\delta = 0$
\[ D_p^{x_0}(x^\dagger, x_{k_*}) = O\left( \alpha_k^{\frac{2\nu}{r(\nu + 1) - 2\nu}} \right), \qquad k \to \infty. \tag{5.51} \]
A possible choice of the parameters $\{\alpha_k\}_{k \in \mathbb{N}}$, satisfying (5.49) and smallness of $\alpha_{\max}$, is given by
\[ \alpha_k = \frac{\alpha_0}{(k + 1)^t} \tag{5.52} \]
with $t \in (0, 1]$ such that $3t\theta < \alpha_0$ sufficiently small, cf. [82].
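The effect of the damping sequence (5.52) can be seen in a small example. The sketch below is a strong simplification: it assumes the Hilbert space case $p = r = 2$, where (5.39) reduces to $x_{k+1} = x_0 + (1 - \alpha_k)(x_k - x_0) - \omega A_k^*(F(x_k) - y)$, a linear operator $F(x) = Ax$, exact data and a constant step size. The matrix, the parameter values and the function names are illustrative assumptions.

```python
def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def iteratively_regularized_landweber(A, y, x0, omega, alpha0, t, n_iter):
    """Hilbert-space sketch of (5.39) with alpha_k = alpha0 / (k + 1)^t as in (5.52)."""
    x = list(x0)
    At = [list(col) for col in zip(*A)]
    for k in range(n_iter):
        alpha_k = alpha0 / (k + 1) ** t
        residual = [a - b for a, b in zip(matvec(A, x), y)]
        gradient = matvec(At, residual)
        x = [c + (1.0 - alpha_k) * (xi - c) - omega * g
             for c, xi, g in zip(x0, x, gradient)]
    return x

A = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]   # sample operator (assumption)
x_true = [1.0, -1.0]
y = matvec(A, x_true)                       # exact data

x_rec = iteratively_regularized_landweber(
    A, y, x0=[0.0, 0.0], omega=0.5, alpha0=0.1, t=1.0, n_iter=2000)
print(x_rec)
```

The extra term pulls every iterate towards the initial guess $x_0$, and since $\alpha_k \to 0$ this pull fades out as the iteration proceeds.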
We emphasize that in the Banach space setting an analogue of Plato's Theorem 1.11.1 is not available. Consequently, convergence rate results under source conditions or variational inequalities like (5.40) cannot be used to prove (strong) convergence results.
5.4.4
The Iteratively Regularized Gauss-Newton method
Among the iterative methods, the Iteratively Regularized Gauss-Newton
(IRGN) method is one of the most important for solving nonlinear ill-posed
problems.
In the Banach space setting, the $(n + 1)$-th iterate of the IRGN method, denoted by $x_{n+1}^\delta = x_{n+1}^\delta(\alpha_n)$, is a minimizer $x_{n+1}^\delta(\alpha)$ of the Tikhonov functional
\[ \|A_n(x - x_n^\delta) + F(x_n^\delta) - y^\delta\|^r + \alpha\|x - x_0\|^p, \qquad x \in D(F), \qquad n = 0, 1, 2, \dots, \tag{5.53} \]
where $p, r \in (1, \infty)$, $\{\alpha_n\}$ is a sequence of regularization parameters, and $A_n = F'(x_n^\delta)$.
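In the Hilbert space case $p = r = 2$ with $D(F) = \mathcal{X}$, the minimizer of (5.53) solves the normal equations $(A_n^* A_n + \alpha I)(x - x_n^\delta) = A_n^*(y^\delta - F(x_n^\delta)) + \alpha(x_0 - x_n^\delta)$. The sketch below implements this special case only; the geometric sequence $\alpha_n = \alpha_0 q^n$, the toy operator and all names are assumptions made for illustration.

```python
def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def irgn(F, dF, y, x0, alpha0, q, n_iter):
    """IRGN sketch for p = r = 2 with alpha_n = alpha0 * q^n (assumption)."""
    x, m = list(x0), len(x0)
    for n in range(n_iter):
        alpha = alpha0 * q ** n
        A, Fx = dF(x), F(x)
        rows = range(len(A))
        lhs = [[sum(A[k][i] * A[k][j] for k in rows) + (alpha if i == j else 0.0)
                for j in range(m)] for i in range(m)]
        rhs = [sum(A[k][i] * (y[k] - Fx[k]) for k in rows) + alpha * (x0[i] - x[i])
               for i in range(m)]
        step = solve(lhs, rhs)
        x = [a + h for a, h in zip(x, step)]
    return x

def F(x):
    return [t + 0.1 * t ** 3 for t in x]        # illustrative operator (assumption)

def dF(x):
    n = len(x)
    return [[(1.0 + 0.3 * x[i] ** 2) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

x_true = [1.0, 0.5]
x_rec = irgn(F, dF, F(x_true), x0=[0.0, 0.0], alpha0=1.0, q=0.5, n_iter=40)
print(x_rec)
```

For general $p$ and $r$ no closed form is available and each step requires the numerical minimization of the functional (5.53).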
The regularizing properties of the IRGN method are now well understood.
If one of the assumptions
\[ F'(x) : \mathcal{X} \to \mathcal{Y} \text{ is weakly closed } \forall x \in D(F), \text{ and } \mathcal{Y} \text{ is reflexive}, \tag{5.54} \]
\[ D(F) \text{ is weakly closed} \tag{5.55} \]
holds, then the method is well defined (cf. Lemma 7.9 in [82]). Moreover, assuming variational inequalities similar to (5.40) and the a-priori choice (5.47) for $\alpha_n$, it is possible to obtain optimal convergence rates, see [82] and the references therein.
Here we concentrate on the a-posteriori choice given by the Discrepancy
Principle. More precisely, we have the following two theorems (see, as usual,
[82] for the proofs).
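Theorem 5.4.4. Assume that $\mathcal{X}$ is smooth and uniformly convex and that $F$ satisfies the tangential cone condition (5.32) with $B_\rho^D(x^\dagger)$ replaced by $D(F) \cap B_\rho(x_0)$ and with $\eta$ sufficiently small. Assume also that
\[ (x_n \rightharpoonup x \ \wedge\ F(x_n) \to f) \ \Rightarrow\ (x \in D(F) \ \wedge\ F(x) = f) \tag{5.56} \]
or
\[ (J_p^{\mathcal{X}}(x_n - x_0) \rightharpoonup x^* \ \wedge\ F(x_n) \to f) \ \Rightarrow\ (x := J_{p^*}^{\mathcal{X}^*}(x^*) + x_0 \in D(F) \ \wedge\ F(x) = f) \tag{5.57} \]
for all $\{x_n\}_{n \in \mathbb{N}} \subseteq \mathcal{X}$, holds, as well as (5.54) or (5.55). Let
\[ \eta < \underline{\sigma} < \overline{\sigma} < 1, \tag{5.58} \]
and let $\tau$ be chosen sufficiently large, so that
\[ \eta + \frac{1 + \eta}{\tau} \le \underline{\sigma} \quad \text{and} \quad \eta < \frac{1 - \overline{\sigma}}{2}. \tag{5.59} \]
Choose the regularization parameters $\alpha_n$ such that
\[ \underline{\sigma}\,\|F(x_n^\delta) - y^\delta\| \le \|A_n(x_{n+1}^\delta(\alpha_n) - x_n^\delta) + F(x_n^\delta) - y^\delta\| \le \overline{\sigma}\,\|F(x_n^\delta) - y^\delta\|, \tag{5.60} \]
if
\[ \|A_n(x_0 - x_n^\delta) + F(x_n^\delta) - y^\delta\| \ge \underline{\sigma}\,\|F(x_n^\delta) - y^\delta\| \tag{5.61} \]
holds. Moreover, assume that
\[ \delta < \frac{\|F(x_0) - y^\delta\|}{\tau} \tag{5.62} \]
and stop the iteration at the iterate $n_D = n_D(\delta)$ according to the Discrepancy Principle (5.37). Then, for all $n \le n_D(\delta) - 1$, the iterates
\[ x_{n+1}^\delta := \begin{cases} x_{n+1}^\delta = x_{n+1}^\delta(\alpha_n), \text{ with } \alpha_n \text{ as in (5.60)}, & \text{if (5.61) holds} \\ x_0, & \text{else} \end{cases} \tag{5.63} \]
are well defined.
Furthermore, there exists a weakly convergent subsequence of
\[ \begin{cases} x^\delta_{n_D(\delta)}, & \text{if (5.56) holds} \\ J_p^{\mathcal{X}}(x^\delta_{n_D(\delta)} - x_0), & \text{if (5.57) holds} \end{cases} \tag{5.64} \]
and along every weakly convergent subsequence $x_{n_D(\delta)}$ converges strongly to a solution of $F(x) = y$ as $\delta \to 0$. If the solution is unique, then $x_{n_D(\delta)}$ converges strongly to this solution as $\delta \to 0$.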
The theorem above provides us with a convergence result. The following
theorem gives convergence rates.
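Theorem 5.4.5. Let the assumptions of Theorem 5.4.4 be satisfied. Then under the variational inequality
\[ \exists\, \beta > 0 : \forall x \in B_\rho^D(x^\dagger) \qquad |\langle J_p^{\mathcal{X}}(x^\dagger - x_0), x - x^\dagger \rangle_{\mathcal{X}^* \times \mathcal{X}}| \le \beta\, D_p^{x_0}(x, x^\dagger)^{\frac{1-\nu}{2}}\, \|F'(x^\dagger)(x - x^\dagger)\|^\nu \tag{5.65} \]
with $0 < \nu < 1$, we obtain optimal convergence rates
\[ D_p^{x_0}(x_{n_D}, x^\dagger) = O\left( \delta^{\frac{2\nu}{\nu + 1}} \right), \qquad \text{as } \delta \to 0. \tag{5.66} \]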
Chapter 6
A new Iteratively Regularized
Newton-Landweber iteration
The final chapter of this thesis is entirely dedicated to a new inner-outer
Newton-Iteratively Regularized Landweber iteration for solving nonlinear
equations of the type (5.17) in Banach spaces.
The reasons for choosing a Banach space framework have already been explained in the previous chapter. We will see the advantages of working in
Banach spaces also in the numerical experiments presented later.
Concerning the method, a combination of inner and outer iterations in a
Newton type framework has already been shown to be highly efficient and
flexible in the Hilbert space context, see, e.g., [78] and [79].
In the recent paper [49], a Newton-Landweber iteration in Banach spaces
has been considered and a weak convergence result for noisy data has been
proved. However, neither convergence rates nor strong convergence results
have been found. The reason for this is that the convergence rates proof
in Hilbert spaces relies on the fact that the iteration errors remain in the
range of the adjoint of the linearized forward operator and their preimages
under this operator form a bounded sequence. Carrying over this proof to
the Banach space setting would require quite restrictive assumptions on the
structure of the spaces, though, which we would like to avoid, to work with
as general Banach spaces as possible.
Therefore, we study here a combination of the outer Newton loop with an
Iteratively Regularized Landweber iteration, which indeed allows us to prove
convergence rates and strong convergence.
From Section 6.1 to Section 6.5 we will study the inner-outer Newton-Iteratively
Regularized Landweber method following [54]. We will see that a strategy
for the stopping indices similar to that proposed in [49] leads to a weak convergence result. Moreover, again following [54], we will show a convergence
rate result based on an a-priori choice of the outer stopping index.
Section 6.6 is dedicated to some numerical experiments for the elliptic PDE
problem presented in Section 5.1.
In Section 6.7 we will consider a different choice of the parameters of the
method that allows us to show both strong convergence and convergence rates.
6.1
Introduction
In order to formulate and later on analyze the method, we recall some basic
notations and concepts. For more details about the concepts appearing below, we refer to Chapter 5.
Consider, for some $p \in (1, \infty)$, the duality mapping
\[ J_p^X(x) := \partial\Big\{ \tfrac{1}{p}\|x\|^p \Big\} \]
from $X$ to its dual $X^*$. To analyze convergence rates we employ the Bregman distance
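\[ D_p(\tilde{x}, x) = \tfrac{1}{p}\|\tilde{x}\|^p - \tfrac{1}{p}\|x\|^p - \langle j_p^X(x), \tilde{x} - x\rangle_{X^*\times X} \]
(where $j_p^X(x)$ denotes a single-valued selection of $J_p^X(x)$) or its shifted version
\[ D_p^{x_0}(\tilde{x}, x) := D_p(\tilde{x} - x_0, x - x_0). \]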
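For intuition, the Bregman distance can be evaluated directly in a discretized setting. The following minimal sketch assumes $X = \mathbb{R}^n$ with the $\ell^p$ norm as a stand-in for $L^p$ (an assumption made only for illustration), in which case the duality mapping acts componentwise; the function names are hypothetical.

```python
import numpy as np

def duality_map(x, p):
    """Componentwise J_p(x) = |x|^(p-1) sign(x), the gradient of (1/p)||x||_p^p."""
    return np.abs(x) ** (p - 1.0) * np.sign(x)

def bregman(x_tilde, x, p):
    """D_p(x_tilde, x) = ||x_tilde||^p/p - ||x||^p/p - <J_p(x), x_tilde - x>."""
    pth_power = lambda v: np.sum(np.abs(v) ** p)
    return (pth_power(x_tilde) / p - pth_power(x) / p
            - duality_map(x, p) @ (x_tilde - x))

def shifted_bregman(x_tilde, x, x0, p):
    """Shifted version D_p^{x0}(x_tilde, x) = D_p(x_tilde - x0, x - x0)."""
    return bregman(x_tilde - x0, x - x0, p)

a = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 0.1, -1.0])
d_hilbert = bregman(a, b, 2.0)   # for p = 2 this is half the squared distance
d_banach = bregman(a, b, 1.5)    # nonnegative by convexity of (1/p)||.||^p
```

For $p = 2$ the Bregman distance reduces to $\frac{1}{2}\|\tilde{x} - x\|^2$, which gives a quick sanity check of the implementation.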
Throughout this chapter we will assume that $X$ is smooth (which implies that
the duality mapping is single-valued, cf. Chapter 5) and moreover, that $X$
is $s$-convex for some $s \in [p, \infty)$, which implies
\[ D_p(x, y) \ge c_{p,s}\, \|x - y\|^s\, (\|x\| + \|y\|)^{p-s} \tag{6.1} \]
for some constant $c_{p,s} > 0$, cf. Chapter 5. As a consequence, $X$ is reflexive
and we also have
\[ D_{p^*}(x^*, y^*) \le C_{p^*,s^*}\, \|x^* - y^*\|^{s^*} \Big( \big(p\, D_p(J_{p^*}^{X^*}(x^*), 0)\big)^{1 - \frac{s^*}{p^*}} + \|x^* - y^*\|^{p^*-s^*} \Big) \tag{6.2} \]
for some $C_{p^*,s^*}$, where $s^*$ denotes the dual index $s^* = \frac{s}{s-1}$. The latter can be
concluded from estimate (2.2) in [49], which is the first line in
\[
\begin{aligned}
D_{p^*}(x^*, y^*) &\le \tilde{C}_{p^*,s^*}\, \|x^* - y^*\|^{s^*} \big( \|y^*\|^{p^*-s^*} + \|x^* - y^*\|^{p^*-s^*} \big) \\
&\le C_{p^*,s^*}\, \|x^* - y^*\|^{s^*} \big( \|x^*\|^{p^*-s^*} + \|x^* - y^*\|^{p^*-s^*} \big) \\
&= C_{p^*,s^*}\, \|x^* - y^*\|^{s^*} \big( \|J_{p^*}^{X^*}(x^*)\|^{(p^*-s^*)(p-1)} + \|x^* - y^*\|^{p^*-s^*} \big) \\
&= C_{p^*,s^*}\, \|x^* - y^*\|^{s^*} \Big( \big(p\, D_p(J_{p^*}^{X^*}(x^*), 0)\big)^{(p^*-s^*)\frac{p-1}{p}} + \|x^* - y^*\|^{p^*-s^*} \Big),
\end{aligned}
\]
where $C_{p^*,s^*}$ is equal to $\tilde{C}_{p^*,s^*}(1 + 2^{p^*-s^*-1})$ if $p^* - s^* > 1$ and is simply $2\tilde{C}_{p^*,s^*}$
otherwise.
Note that the duality mapping is bijective and $(J_p^X)^{-1} = J_{p^*}^{X^*}$, the latter denoting the (by $s$-convexity also single-valued) duality mapping on the dual
$X^*$ of $X$.
We will also make use of the Three-point identity (5.15) and the relation
(5.16), which connects elements of the primal space with the corresponding
elements of the dual space.
We here consider a combination of the Iteratively Regularized Gauss-Newton
method with an Iteratively Regularized Landweber method for approximating
the Newton step, using some initial guess $x_0$ and starting from some $x_0^\delta$ (that
need not necessarily coincide with $x_0$):
\[
\begin{array}{l}
\text{For } n = 0, 1, 2, \ldots \text{ do} \\
\quad u_{n,0} = 0 \\
\quad z_{n,0} = x_n^\delta \\
\quad \text{For } k = 0, 1, 2, \ldots, k_n - 1 \text{ do} \\
\quad\quad u_{n,k+1} = u_{n,k} - \alpha_{n,k} J_p^X(z_{n,k} - x_0) - \omega_{n,k} A_n^* j_r^Y\big(A_n(z_{n,k} - x_n^\delta) - b_n\big) \\
\quad\quad J_p^X(z_{n,k+1} - x_0) = J_p^X(x_n^\delta - x_0) + u_{n,k+1} \\
\quad x_{n+1}^\delta = z_{n,k_n},
\end{array} \tag{6.3}
\]
where we abbreviate
\[ A_n = F'(x_n^\delta), \qquad b_n = y^\delta - F(x_n^\delta). \]
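The structure of the inner-outer iteration can be illustrated in finite dimensions. The sketch below is not the implementation used in this thesis: it assumes $X = \ell^p$ and $Y = \ell^r$ on $\mathbb{R}^n$ with componentwise duality maps, takes the step size as in (6.35), the regularization parameter as the largest value allowed by (6.53), stops the inner loop at the first violation of (6.9) and the outer loop by the Discrepancy Principle (6.51). All function names, default values and the toy linear test problem are hypothetical.

```python
import numpy as np

def duality_map(v, q):
    """Componentwise duality map |v|^(q-1) sign(v) on l^q."""
    return np.abs(v) ** (q - 1.0) * np.sign(v)

def newton_ir_landweber(F, dF, y_delta, x0, delta, p=2.0, r=2.0, s=2.0,
                        mu=0.8, tau=2.0, vartheta=0.2, gamma0=0.01,
                        max_outer=200, max_inner=1000):
    """Sketch of (6.3); F maps R^n -> R^m, dF(x) returns the Jacobian matrix."""
    p_star = p / (p - 1.0)
    x = x0.astype(float).copy()
    for n in range(max_outer):
        b = y_delta - F(x)
        norm_b = np.linalg.norm(b, r)
        if norm_b <= tau * delta:            # outer stop, cf. (6.51)
            break
        A = dF(x)
        u = np.zeros_like(x)
        z = x.copy()
        for k in range(max_inner):
            res = A @ (z - x) - b            # A_n (z - x_n) + F(x_n) - y_delta
            t = np.linalg.norm(res, r)
            if t <= mu * norm_b:             # (6.9) fails: leave inner loop
                break
            g = A.T @ duality_map(res, r)    # A_n^* j_r^Y(res)
            t_tilde = np.linalg.norm(g, p_star)
            if t_tilde == 0.0:
                break
            # step size as in (6.35), using 1/(q^* - 1) = q - 1
            omega = vartheta * min(t ** (r * (p - 1.0)) / t_tilde ** p,
                                   t ** (r * (s - 1.0)) / t_tilde ** s)
            alpha = min(1.0, gamma0 * omega * t ** r)   # cf. (6.53)
            u = u - alpha * duality_map(z - x0, p) - omega * g
            # invert J_p^X via J_{p^*}^{X^*}
            z = x0 + duality_map(duality_map(x - x0, p) + u, p_star)
        x = z
    return x

# Toy linear problem in the Hilbert case p = r = s = 2 (assumed data).
M = np.diag(np.linspace(0.5, 1.0, 6))
x_true = np.array([1.0, -0.5, 0.0, 0.3, 0.0, 0.8])
delta = 1e-3
noise = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y_delta = M @ x_true + delta * noise / np.linalg.norm(noise)
x_rec = newton_ir_landweber(lambda x: M @ x, lambda x: M,
                            y_delta, np.zeros(6), delta)
```

In the Hilbert case the duality maps are the identity, so the inner loop reduces to a Landweber-type iteration with an additional pull towards $x_0$ of strength $\alpha_{n,k}$.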
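For obtaining convergence rates we impose a variational inequality
\[ \exists\, \beta > 0 : \ \forall x \in \mathcal{B}^D_\rho(x^\dagger): \quad |\langle J_p^X(x^\dagger - x_0), x - x^\dagger\rangle_{X^*\times X}| \le \beta\, D_p^{x_0}(x^\dagger, x)^{\frac{1}{2}-\nu}\, \|F(x) - F(x^\dagger)\|^{2\nu}, \tag{6.4} \]
with $\nu \in (0, \frac{1}{2}]$, corresponding to a source condition in the special case of
Hilbert spaces, cf., e.g., [42].
Here
\[ \mathcal{B}^D_\rho(x^\dagger) = \{x \in X \mid D_p^{x_0}(x^\dagger, x) \le \rho^2\} \]
with $\rho > 0$ such that $x_0 \in \mathcal{B}^D_\rho(x^\dagger)$.
By distinction between the cases $\|x - x_0\| < 2\|x^\dagger - x_0\|$ and $\|x - x_0\| \ge 2\|x^\dagger - x_0\|$ and the second triangle inequality we obtain from (6.1) that
\[ \mathcal{B}^D_\rho(x^\dagger) \subseteq \mathcal{B}^{\|\cdot\|}_{\bar\rho}(x_0) = \mathcal{B}_{\bar\rho}(x_0) = \{x \in X \mid \|x - x_0\| \le \bar\rho\} \tag{6.5} \]
with $\bar\rho = \max\Big\{2\|x^\dagger - x_0\|,\ \Big(\frac{2^p\, 3^{s-p}\, \rho^2}{c_{p,s}}\Big)^{1/p}\Big\}$.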
The assumptions on the forward operator besides a condition on the domain
\[ \mathcal{B}^D_\rho(x^\dagger) \subseteq \mathcal{D}(F) \tag{6.6} \]
include a structural condition on its degree of nonlinearity. For simplicity of
exposition we restrict ourselves to the tangential cone condition
\[ \|F(\tilde{x}) - F(x) - F'(x)(\tilde{x} - x)\| \le \eta\, \|F(\tilde{x}) - F(x)\|, \quad \tilde{x}, x \in \mathcal{B}^D_\rho(x^\dagger), \tag{6.7} \]
and mention in passing that most of the results shown here remain valid under
a more general condition on the degree of nonlinearity already encountered
in Chapter 5 (cf. [42])
\[ \|(F'(x^\dagger + v) - F'(x^\dagger))v\| \le K\, \|F'(x^\dagger)v\|^{c_1}\, D_p^{x_0}(x^\dagger, v + x^\dagger)^{c_2}, \quad v \in X,\ x^\dagger + v \in \mathcal{B}^D_\rho(x^\dagger), \tag{6.8} \]
with conditions on $c_1$, $c_2$ depending on the smoothness index $\nu$ in (6.4). Here
$F'$ is not necessarily the Fréchet derivative of $F$, but just a linearization of
$F$ satisfying the Taylor remainder estimate (6.7). Additionally, we assume
that $F'$ and $F$ are uniformly bounded on $\mathcal{B}^D_\rho(x^\dagger)$.
The method contains a number of parameters that have to be chosen appropriately. At this point we only state that at first the inner iteration will be
stopped in the spirit of an inexact Newton method according to
\[ \forall\, 0 \le k \le k_n - 1: \quad \mu\, \|F(x_n^\delta) - y^\delta\| \le \|A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta\| \tag{6.9} \]
for some $\mu \in (\eta, 1)$.
Since zn,0 = xδn and µ < 1, at least one Landweber step can be carried out in
each Newton iteration. By doing several Landweber steps, if allowed by (6.9),
we can improve the numerical performance as compared to the Iteratively
Regularized Landweber iteration from [55].
Concerning the remaining parameters ωn,k , αn,k and the overall stopping
index n∗ , we refer to the sections below for details.
Under the condition (6.9), we shall distinguish between the two cases:
(a) (6.4) holds with some ν > 0;
(b) a condition like (6.4) cannot be made use of, since the exponent ν is
unknown or (6.4) just fails to hold.
The results we will obtain with the choice (6.9) by distinction between a
priori and a posteriori parameter choice are weaker than what one might
expect at a first glance. While the Discrepancy Principle for other methods
can usually be shown to yield (optimal) convergence rates if (6.4) happens
to hold (even if ν > 0 is not available for tuning the method but only for
the convergence analysis) we here only obtain weak convergence. On the
other hand, the a priori choice will only give convergence with rates if (6.4)
holds with ν > 0, otherwise no convergence can be shown. Still there is an
improvement over, e.g., the results in [55] and [81] in the sense that there no
convergence at all can be shown unless (6.4) holds with ν > 0. Of course from
the analysis in [49] it follows that there always exists a choice of αn,k such
that weak convergence without rates holds, namely αn,k = 0 corresponding
to the Newton-Landweber iteration analyzed in [49]. What we are going to
show here is that a choice of positive αn,k is admissible, which we expect to
provide improved speed of convergence for the inner iteration, as compared
to pure Landweber iteration.
Later on, we will analyze a different choice of the stopping indices that leads
to strong convergence.
6.2
Error estimates
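For any $n \in \mathbb{N}$ we have
\[
\begin{aligned}
& D_p^{x_0}(x^\dagger, z_{n,k+1}) - D_p^{x_0}(x^\dagger, z_{n,k}) \\
&= D_p^{x_0}(z_{n,k}, z_{n,k+1}) + \langle \underbrace{J_p^X(z_{n,k+1} - x_0) - J_p^X(z_{n,k} - x_0)}_{= u_{n,k+1} - u_{n,k}},\ z_{n,k} - x^\dagger \rangle_{X^*\times X} \\
&= \underbrace{D_p^{x_0}(z_{n,k}, z_{n,k+1})}_{(I)} \\
&\quad - \underbrace{\omega_{n,k} \langle j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta),\ A_n(z_{n,k} - x^\dagger) \rangle_{Y^*\times Y}}_{(II)} \\
&\quad - \underbrace{\alpha_{n,k} \langle J_p^X(x^\dagger - x_0),\ z_{n,k} - x^\dagger \rangle_{X^*\times X}}_{(III)} \\
&\quad - \underbrace{\alpha_{n,k} \langle J_p^X(z_{n,k} - x_0) - J_p^X(x^\dagger - x_0),\ z_{n,k} - x^\dagger \rangle_{X^*\times X}}_{(IV)}.
\end{aligned} \tag{6.10}
\]
Assuming that $z_{n,k} \in \mathcal{B}^D_\rho(x^\dagger)$, we now estimate each of the terms on the
right-hand side separately, depending on whether in (6.4) $\nu > 0$ is known
(case a) or is not made use of (case b).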
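By (6.2) and (5.16) we have for the term (I)
\[
\begin{aligned}
D_p^{x_0}(z_{n,k}, z_{n,k+1})
&\le C_{p^*,s^*}\, \| \underbrace{J_p^X(z_{n,k+1} - x_0) - J_p^X(z_{n,k} - x_0)}_{= u_{n,k+1} - u_{n,k}} \|^{s^*} \\
&\qquad \cdot \Big( (p\rho^2)^{1 - \frac{s^*}{p^*}} + \|J_p^X(z_{n,k+1} - x_0) - J_p^X(z_{n,k} - x_0)\|^{p^*-s^*} \Big) \\
&= C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}}\, \|\alpha_{n,k} J_p^X(z_{n,k} - x_0) + \omega_{n,k} A_n^* j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)\|^{s^*} \\
&\quad + C_{p^*,s^*}\, \|\alpha_{n,k} J_p^X(z_{n,k} - x_0) + \omega_{n,k} A_n^* j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)\|^{p^*} \\
&\le 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}}\, \alpha_{n,k}^{s^*}\, \|z_{n,k} - x_0\|^{(p-1)s^*} \\
&\quad + 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}}\, \omega_{n,k}^{s^*}\, \|A_n^* j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)\|^{s^*} \\
&\quad + 2^{p^*-1} C_{p^*,s^*}\, \alpha_{n,k}^{p^*}\, \|z_{n,k} - x_0\|^{(p-1)p^*} \\
&\quad + 2^{p^*-1} C_{p^*,s^*}\, \omega_{n,k}^{p^*}\, \|A_n^* j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)\|^{p^*} \\
&\le C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}}\, \bar\rho^{(p-1)s^*}\, 2^{s^*-1}\, \alpha_{n,k}^{s^*} + C_{p^*,s^*}\, \bar\rho^{p}\, 2^{p^*-1}\, \alpha_{n,k}^{p^*} + \varphi(\omega_{n,k} \tilde{t}_{n,k}),
\end{aligned} \tag{6.11}
\]
where we have used the triangle inequality in $X^*$ and $X$, Young's inequality
\[ (a + b)^\lambda \le 2^{\lambda-1}(a^\lambda + b^\lambda) \quad \text{for } a, b \ge 0,\ \lambda \ge 1, \tag{6.12} \]
and (6.5), as well as the abbreviations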
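\[
\begin{aligned}
d_{n,k} &= D_p^{x_0}(x^\dagger, z_{n,k})^{1/2}, \\
t_{n,k} &= \|A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta\|, \\
\tilde{t}_{n,k} &= \|A_n^* j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)\| \le \|A_n\|\, t_{n,k}^{r-1}.
\end{aligned} \tag{6.13}
\]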
Here
\[ \varphi(\lambda) = 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}} \lambda^{s^*} + 2^{p^*-1} C_{p^*,s^*} \lambda^{p^*}, \tag{6.14} \]
which by $p^* \ge s^* > 1$ defines a strictly monotonically increasing and convex
function on $\mathbb{R}^+$.
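For the term (II) in (6.10) we get, using (6.7) and (1.44),
\[
\begin{aligned}
& \omega_{n,k} \langle j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta),\ A_n(z_{n,k} - x^\dagger) \rangle_{Y^*\times Y} \\
&= \omega_{n,k} t_{n,k}^r + \omega_{n,k} \langle j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta),\ A_n(x_n^\delta - x^\dagger) - F(x_n^\delta) + y^\delta \rangle_{Y^*\times Y} \\
&\ge \omega_{n,k} t_{n,k}^r - \omega_{n,k} t_{n,k}^{r-1} \big( \eta\, \|F(x_n^\delta) - y^\delta\| + (1 + \eta)\delta \big).
\end{aligned} \tag{6.15}
\]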
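Together with (6.9) this yields
\[
\begin{aligned}
& \omega_{n,k} \langle j_r^Y(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta),\ A_n(z_{n,k} - x^\dagger) \rangle_{Y^*\times Y} \\
&\ge \Big(1 - \frac{\eta}{\mu}\Big) \omega_{n,k} t_{n,k}^r - (1 + \eta)\, \omega_{n,k} t_{n,k}^{r-1} \delta && (6.16) \\
&\ge \Big(1 - \frac{\eta}{\mu} - C\big(\tfrac{r-1}{r}\big)\epsilon\Big) \omega_{n,k} t_{n,k}^r - C\big(\tfrac{r-1}{r}\big) \frac{(1 + \eta)^r}{\epsilon^{r-1}}\, \omega_{n,k} \delta^r, && (6.17)
\end{aligned}
\]
where we have used the elementary estimate
\[ a^{1-\lambda} b^\lambda \le C(\lambda)(a + b) \quad \text{for } a, b \ge 0,\ \lambda \in (0, 1), \qquad \text{with } C(\lambda) = \lambda^\lambda (1 - \lambda)^{1-\lambda}. \tag{6.18} \]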
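The constant in (6.18) is sharp: for fixed $a + b$ the product $a^{1-\lambda} b^\lambda$ is maximal at $a = 1 - \lambda$, $b = \lambda$ (after normalizing $a + b = 1$), where it equals $C(\lambda)$. A quick numerical check, on randomly drawn values chosen here only for illustration, could look as follows.

```python
import numpy as np

def C(lam):
    """Constant C(lambda) = lambda^lambda (1 - lambda)^(1 - lambda) from (6.18)."""
    return lam ** lam * (1.0 - lam) ** (1.0 - lam)

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 10.0, 1000)
b = rng.uniform(0.0, 10.0, 1000)
lam = rng.uniform(0.01, 0.99, 1000)

# gap >= 0 is exactly the estimate (6.18)
gap = C(lam) * (a + b) - a ** (1.0 - lam) * b ** lam

# equality case: a = 1 - lambda, b = lambda
lam0 = 0.3
equality_gap = C(lam0) * 1.0 - (1.0 - lam0) ** (1.0 - lam0) * lam0 ** lam0
```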
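To make use of the variational inequality (6.4) for estimating (III) in case
a) with $\nu > 0$, we first of all use (6.7) to conclude
\[
\begin{aligned}
\|F(z_{n,k}) - F(x^\dagger)\|
&= \big\| (A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta) + (F(z_{n,k}) - F(x_n^\delta) - A_n(z_{n,k} - x_n^\delta)) + (y^\delta - y) \big\| \\
&\le t_{n,k} + \eta\, \|F(z_{n,k}) - F(x_n^\delta)\| + \delta \\
&\le t_{n,k} + \eta \big( \|F(z_{n,k}) - F(x^\dagger)\| + \|F(x_n^\delta) - y^\delta\| \big) + (1 + \eta)\delta,
\end{aligned}
\]
hence by (6.9)
\[ \|F(z_{n,k}) - F(x^\dagger)\| \le \frac{1}{1 - \eta} \Big( \big(1 + \frac{\eta}{\mu}\big) t_{n,k} + (1 + \eta)\delta \Big). \tag{6.19} \]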
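This together with (6.4) implies
\[
\begin{aligned}
& |\alpha_{n,k} \langle J_p^X(x^\dagger - x_0),\ z_{n,k} - x^\dagger \rangle_{X^*\times X}| \\
&\le \frac{\beta}{(1 - \eta)^{2\nu}} \Big( \big(1 + \frac{\eta}{\mu}\big) t_{n,k} + (1 + \eta)\delta \Big)^{2\nu} \alpha_{n,k}\, d_{n,k}^{1-2\nu} \\
&\le C\big(\tfrac{2\nu}{r}\big) \frac{\beta}{(1 - \eta)^{2\nu}} \Big\{ \omega_{n,k} \Big( \big(1 + \frac{\eta}{\mu}\big) t_{n,k} + (1 + \eta)\delta \Big)^{r} + \Big( \omega_{n,k}^{-\frac{2\nu}{r}}\, \alpha_{n,k}\, d_{n,k}^{1-2\nu} \Big)^{\frac{r}{r-2\nu}} \Big\} \\
&\le C\big(\tfrac{2\nu}{r}\big) \frac{\beta}{(1 - \eta)^{2\nu}} \Big\{ 2^{r-1} \omega_{n,k} \Big( \big(1 + \frac{\eta}{\mu}\big)^r t_{n,k}^r + (1 + \eta)^r \delta^r \Big) \\
&\qquad + C\big(\tfrac{(1-2\nu)r}{2(r-2\nu)}\big) \Big[ \alpha_{n,k} d_{n,k}^2 + \omega_{n,k}^{-\frac{4\nu}{r(1+2\nu)-4\nu}}\, \alpha_{n,k}^{\frac{r(1+2\nu)}{r(1+2\nu)-4\nu}} \Big] \Big\},
\end{aligned} \tag{6.20}
\]
where we have used (6.18) twice. In the case b) we simply estimate
\[ |\alpha_{n,k} \langle J_p^X(x^\dagger - x_0),\ z_{n,k} - x^\dagger \rangle_{X^*\times X}| \le \|x^\dagger - x_0\|^{p-1}\, \alpha_{n,k}\, (p\, d_{n,k}^2)^{\frac{1}{p}}. \tag{6.21} \]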
Finally, for the term (IV) we have that
\[
\begin{aligned}
& \alpha_{n,k} \langle J_p^X(x^\dagger - x_0) - J_p^X(z_{n,k} - x_0),\ x^\dagger - z_{n,k} \rangle_{X^*\times X} \\
&= \alpha_{n,k} \big( D_p^{x_0}(x^\dagger, z_{n,k}) + D_p^{x_0}(z_{n,k}, x^\dagger) \big) \ge \alpha_{n,k} d_{n,k}^2.
\end{aligned} \tag{6.22}
\]
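Altogether in case a) we arrive at the estimate
\[
\begin{aligned}
d_{n,k+1}^2 &\le \big(1 - (1 - c_0)\alpha_{n,k}\big) d_{n,k}^2 + c_1 \alpha_{n,k}^{s^*} + c_2 \alpha_{n,k}^{p^*} + c_3\, \omega_{n,k}^{-\theta} \alpha_{n,k}^{1+\theta} \\
&\quad - (1 - c_4)\, \omega_{n,k} t_{n,k}^r + C_5\, \omega_{n,k} \delta^r + \varphi(\omega_{n,k} \tilde{t}_{n,k}),
\end{aligned} \tag{6.23}
\]
where
\[
\begin{aligned}
c_0 &= \frac{\beta}{(1 - \eta)^{2\nu}}\, C\big(\tfrac{2\nu}{r}\big)\, C\big(\tfrac{(1-2\nu)r}{2(r-2\nu)}\big) && (6.24) \\
c_1 &= C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}}\, \bar\rho^{(p-1)s^*}\, 2^{s^*-1} && (6.25) \\
c_2 &= C_{p^*,s^*}\, \bar\rho^{p}\, 2^{p^*-1} && (6.26) \\
c_3 &= c_0 && (6.27) \\
c_4 &= \frac{\eta}{\mu} + C\big(\tfrac{r-1}{r}\big)\epsilon + \frac{\beta}{(1 - \eta)^{2\nu}}\, C\big(\tfrac{2\nu}{r}\big)\, 2^{r-1} \big(1 + \frac{\eta}{\mu}\big)^r && (6.28) \\
C_5 &= C\big(\tfrac{r-1}{r}\big) \frac{(1 + \eta)^r}{\epsilon^{r-1}} + \frac{\beta}{(1 - \eta)^{2\nu}}\, C\big(\tfrac{2\nu}{r}\big)\, 2^{r-1} (1 + \eta)^r && (6.29) \\
\theta &= \frac{4\nu}{r(1 + 2\nu) - 4\nu} && (6.30)
\end{aligned}
\]
(small $c$ denoting constants that can be made small by assuming $x_0$ to be
sufficiently close to $x^\dagger$ and therewith $\beta$, $\eta$, $\|x_0 - x^\dagger\|$ small).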
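In case b) we use (6.16), (6.21) instead of (6.17), (6.20), which yields
\[
\begin{aligned}
d_{n,k+1}^2 &\le (1 - \alpha_{n,k}) d_{n,k}^2 + \tilde{c}_0\, \alpha_{n,k}\, d_{n,k}^{\frac{2}{p}} + c_1 \alpha_{n,k}^{s^*} + c_2 \alpha_{n,k}^{p^*} \\
&\quad - \Big(1 - \frac{\eta}{\mu}\Big) \omega_{n,k} t_{n,k}^r + (1 + \eta)\, \omega_{n,k} t_{n,k}^{r-1} \delta + \varphi(\omega_{n,k} \tilde{t}_{n,k}),
\end{aligned} \tag{6.31}
\]
where
\[ \tilde{c}_0 = \|x^\dagger - x_0\|^{p-1}\, p^{\frac{1}{p}}. \tag{6.32} \]
6.3
Parameter selection for the method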
To obtain convergence and convergence rates we will have to appropriately
choose
• the step sizes $\omega_{n,k}$,
• the regularization parameters $\alpha_{n,k}$,
• the stopping indices $k_n$ of the inner iteration,
• the outer stopping index $n_*$.
We will now discuss these choices in detail.
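In view of estimates (6.23), (6.31) it makes sense to balance the terms
$\omega_{n,k} t_{n,k}^r$ and $\varphi(\omega_{n,k} \tilde{t}_{n,k})$. Thus for establishing convergence in case b), we will
assume that in each inner Landweber iteration the step size $\omega_{n,k} > 0$ in (6.3)
is chosen such that
\[ \underline{c}_\omega \le \frac{\varphi(\omega_{n,k} \tilde{t}_{n,k})}{\omega_{n,k} t_{n,k}^r} \le \overline{c}_\omega \tag{6.33} \]
with sufficiently small constants $0 < \underline{c}_\omega < \overline{c}_\omega$. In case a) of (6.4) holding true
it will turn out that we do not need the lower bound in (6.33) but have to
make sure that $\omega_{n,k}$ stays bounded away from zero and infinity
\[ \underline{\omega} \le \omega_{n,k} \le \overline{\omega} \quad \text{and} \quad \frac{\varphi(\omega_{n,k} \tilde{t}_{n,k})}{\omega_{n,k} t_{n,k}^r} \le \overline{c}_\omega \tag{6.34} \]
for some $\overline{\omega} > \underline{\omega} > 0$.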
To see that we can indeed satisfy (6.33), we rewrite it as
\[ \underline{c}_\omega \le \frac{\varphi(\omega_{n,k} \tilde{t}_{n,k})}{\omega_{n,k} t_{n,k}^r} = \psi(\omega_{n,k} \tilde{t}_{n,k})\, \frac{\tilde{t}_{n,k}}{t_{n,k}^r} \le \overline{c}_\omega, \]
with $\psi(\lambda) = \frac{\varphi(\lambda)}{\lambda}$, which by
(6.14) and $p^* > 1$, $s^* > 1$ defines a continuous strictly monotonically increasing function on $\mathbb{R}^+$ with $\psi(0) = 0$, $\lim_{\lambda\to+\infty} \psi(\lambda) = +\infty$, so that, after fixing
$t_{n,k}$ and $\tilde{t}_{n,k}$, $\omega_{n,k}$ is well-defined by (6.33). An easy to implement choice of
$\omega_{n,k}$ such that (6.33) holds is given by
\[ \omega_{n,k} = \vartheta \min\Big\{ t_{n,k}^{\frac{r}{p^*-1}}\, \tilde{t}_{n,k}^{-p},\ t_{n,k}^{\frac{r}{s^*-1}}\, \tilde{t}_{n,k}^{-s} \Big\} \tag{6.35} \]
with $\vartheta$ sufficiently small, which is similar to the choice proposed in [49] but
avoids estimating the norm of $A_n$. Indeed, by (6.14), with this choice, the
quantity to be estimated from below and above in (6.33) becomes
\[
\min\Big\{ 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}} \vartheta^{s^*-1} + 2^{p^*-1} C_{p^*,s^*} \vartheta^{p^*-1}\, T^{-(p^*-1)},\ \
2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}} \vartheta^{s^*-1}\, T^{s^*-1} + 2^{p^*-1} C_{p^*,s^*} \vartheta^{p^*-1} \Big\},
\]
where
\[ T = t_{n,k}^{\,r\left(\frac{1}{p^*-1} - \frac{1}{s^*-1}\right)}\, \tilde{t}_{n,k}^{\,s-p}. \]
This immediately implies the lower bound with
\[ \underline{c}_\omega = \min\Big\{ 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}} \vartheta^{s^*-1},\ 2^{p^*-1} C_{p^*,s^*} \vartheta^{p^*-1} \Big\}. \tag{6.36} \]
The upper bound with
\[ \overline{c}_\omega = 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - \frac{s^*}{p^*}} \vartheta^{s^*-1} + 2^{p^*-1} C_{p^*,s^*} \vartheta^{p^*-1} \tag{6.37} \]
follows by distinction between the cases $T \ge 1$ and $T < 1$.
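The two-sided bound (6.36), (6.37) for the step size rule (6.35) can be checked numerically. In the sketch below the exponents $p = 1.5$, $s = r = 2$ and the two coefficients `K1`, `K2` (standing for $2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - s^*/p^*}$ and $2^{p^*-1} C_{p^*,s^*}$) are hypothetical values picked for illustration only.

```python
import numpy as np

p, s, r = 1.5, 2.0, 2.0
p_star, s_star = p / (p - 1.0), s / (s - 1.0)   # dual exponents 3 and 2
K1, K2 = 0.7, 0.4          # assumed coefficients of phi in (6.14)
vartheta = 0.1

def phi(lam):
    """phi(lambda) = K1 lambda^{s*} + K2 lambda^{p*}, cf. (6.14)."""
    return K1 * lam ** s_star + K2 * lam ** p_star

def step_size(t, t_tilde):
    """Step size rule (6.35)."""
    return vartheta * min(t ** (r / (p_star - 1.0)) * t_tilde ** (-p),
                          t ** (r / (s_star - 1.0)) * t_tilde ** (-s))

c_lower = min(K1 * vartheta ** (s_star - 1.0), K2 * vartheta ** (p_star - 1.0))  # (6.36)
c_upper = K1 * vartheta ** (s_star - 1.0) + K2 * vartheta ** (p_star - 1.0)      # (6.37)

rng = np.random.default_rng(0)
ratios = []
for t, t_tilde in rng.uniform(0.1, 10.0, size=(1000, 2)):
    omega = step_size(t, t_tilde)
    ratios.append(phi(omega * t_tilde) / (omega * t ** r))
ratios = np.array(ratios)   # all values lie in [c_lower, c_upper]
```

Note that the rule only needs the two scalars $t_{n,k}$ and $\tilde{t}_{n,k}$, which are available from the current residual, and no estimate of $\|A_n\|$.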
For (6.34) in case a) we will need the step sizes $\omega_{n,k}$ to be bounded away
from zero and infinity. For this purpose, we will assume that
\[ F' \text{ and } F \text{ are uniformly bounded on } \mathcal{B}^D_\rho(x^\dagger) \tag{6.38} \]
and that
\[ r \ge s \ge p, \tag{6.39} \]
i.e., $r^* \le s^* \le p^*$. To satisfy (6.34), the choice (6.35) from case b) is modified
to
\[ \omega_{n,k} = \min\Big\{ \vartheta\, t_{n,k}^{\frac{r}{p^*-1}}\, \tilde{t}_{n,k}^{-p},\ \vartheta\, t_{n,k}^{\frac{r}{s^*-1}}\, \tilde{t}_{n,k}^{-s},\ \overline{\omega} \Big\} \tag{6.40} \]
which, due to the fact that $\psi$ is strictly monotonically increasing, obviously
still satisfies the upper bound in (6.33) with (6.37). Using (6.13) we get
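\[
\begin{aligned}
t_{n,k}^{\frac{r}{\xi^*-1}}\, \tilde{t}_{n,k}^{-\xi}
&\ge \Big( \sup_{x \in \mathcal{B}^D_\rho(x^\dagger)} \|F'(x)\| \Big)^{-\xi}\, t_{n,k}^{-r\xi\left(\frac{1}{r^*} - \frac{1}{\xi^*}\right)} \\
&\ge \underbrace{\Big( \sup_{x \in \mathcal{B}^D_\rho(x^\dagger)} \|F'(x)\| \Big)^{-\xi} \Big( (2 + 3\eta) \sup_{x \in \mathcal{B}^D_\rho(x^\dagger)} \|F(x) - F(x^\dagger)\| + \delta \Big)^{-r\xi\left(\frac{1}{r^*} - \frac{1}{\xi^*}\right)}}_{=: S(\xi)}
\end{aligned}
\]
for $\xi \in \{s, p\}$, by (6.7), provided $z_{n,k}, x_n^\delta \in \mathcal{B}^D_\rho(x^\dagger)$ (a fact which we will prove by induction
below). Hence, we also have that $\omega_{n,k}$ according to (6.40) satisfies $\omega_{n,k} \ge \underline{\omega}$
with $\underline{\omega} \ge \vartheta \min\{S(s), S(p)\}$, thus altogether (6.34).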
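The regularization parameters $\{\alpha_{n,k}\}_{n \in \mathbb{N}}$ will also be chosen differently
depending on whether the smoothness information $\nu$ in (6.4) is known or not.
In the former case we choose $\{\alpha_{n,k}\}_{n \in \mathbb{N}}$ a priori such that
\[
\begin{aligned}
& \frac{d_{0,0}^2}{\alpha_{0,0}^\theta} \le \bar\gamma, && (6.41) \\
& \alpha_{n,k}^\theta \le \frac{\rho^2}{\bar\gamma}, && (6.42) \\
& \max_{0 \le k \le k_n} \alpha_{n,k} \to 0 \quad \text{as } n \to \infty, && (6.43) \\
& \Big\{ \Big(\frac{\alpha_{n,k}}{\alpha_{n,k-1}}\Big)^\theta - 1 + (1 - c_0)\alpha_{n,k-1} \Big\} \bar\gamma
\ \ge\ c_1 \alpha_{n,k-1}^{s^*-\theta} + c_2 \alpha_{n,k-1}^{p^*-\theta} + \Big( c_3\, \underline{\omega}^{-\theta} + \frac{C_5\, \overline{\omega}}{\tau^r} \Big) \alpha_{n,k-1} && (6.44)
\end{aligned}
\]
for some $\bar\gamma > 0$ independent of $n$, $k$, where $c_0$, $c_1$, $c_2$, $c_3$, $C_5$, $\theta$, $\underline{\omega}$, $\overline{\omega}$ are as
in (6.24)–(6.30), (6.34), and $\nu \in (0, \frac{1}{2}]$ is the exponent in the variational
inequality (6.4). Moreover, when using (6.4) with $\nu > 0$, we are going to
impose the following condition on the exponents $p$, $r$, $s$
\[ 1 + \theta = \frac{r(1 + 2\nu)}{r(1 + 2\nu) - 4\nu} \le s^* \le p^*. \tag{6.45} \]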
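Well-definedness in case $k = 0$ is guaranteed by setting $\alpha_{n,-1} = \alpha_{n-1,k_{n-1}}$,
$\omega_{n,-1} = \omega_{n-1,k_{n-1}}$, which corresponds to the last line in (6.3). To satisfy
(6.41)–(6.44) for instance, we just set
\[ \bar\gamma := \frac{d_{0,0}^2}{\alpha_{0,0}^\theta}, \qquad \alpha_{n,k} = \frac{\alpha_{0,0}}{(n + 1)^\sigma} \tag{6.46} \]
with $\alpha_{0,0}^\theta \le \frac{\rho^2}{\bar\gamma}$ and $\sigma \in (0, 1]$ sufficiently small. Indeed, with this choice we
have
\[ 1 - \Big(\frac{\alpha_{n,k}}{\alpha_{n,k-1}}\Big)^\theta = \begin{cases} 1 - \dfrac{n^{\sigma\theta}}{(n+1)^{\sigma\theta}} & \text{if } k = 0, \\ 0 & \text{else.} \end{cases} \]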
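In the first case by the Mean Value Theorem we have for some $t \in [0, 1]$
\[
1 - \Big(\frac{\alpha_{n,k}}{\alpha_{n,k-1}}\Big)^\theta
= \frac{(n + 1)^{\sigma\theta} - n^{\sigma\theta}}{(n + 1)^{\sigma\theta}}
= \frac{\sigma\theta (n + t)^{\sigma\theta-1}}{(n + 1)^{\sigma\theta}}
\le \frac{\sigma\theta}{n}
= \sigma\theta \Big(\frac{\alpha_{n,k-1}}{\alpha_{0,0}}\Big)^{\frac{1}{\sigma}}
\le \frac{\sigma\theta}{\alpha_{0,0}}\, \alpha_{n,k-1}.
\]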
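Hence, provided (6.45) holds, a sufficient condition for (6.44) is
\[ 1 \ge \frac{\sigma\theta}{\alpha_{0,0}} + c_0 + \frac{1}{\bar\gamma} \Big( c_1 \alpha_{0,0}^{s^*-\theta-1} + c_2 \alpha_{0,0}^{p^*-\theta-1} + c_3\, \underline{\omega}^{-\theta} + \frac{C_5\, \overline{\omega}}{\tau^r} \Big), \]
which can be achieved by making $c_0$, $c_3$, $\alpha_{0,0}$, $\frac{\sigma\theta}{\alpha_{0,0}}$ sufficiently small and $\tau$
sufficiently large.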
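The Mean Value Theorem bound used for the schedule (6.46) is easy to confirm numerically at the transition $k = 0$ between two Newton steps. The values of $\nu$, $r$, $\sigma$ and $\alpha_{0,0}$ below are hypothetical and serve only as an illustration.

```python
import numpy as np

nu, r = 0.25, 2.0
theta = 4.0 * nu / (r * (1.0 + 2.0 * nu) - 4.0 * nu)   # (6.30), here 0.5
sigma, alpha00 = 0.5, 0.1                               # assumed parameters

n = np.arange(1, 1001)
alpha_prev = alpha00 / n ** sigma          # alpha_{n,-1} = alpha_{n-1,k_{n-1}}
alpha_curr = alpha00 / (n + 1) ** sigma    # alpha_{n,0} from (6.46)

lhs = 1.0 - (alpha_curr / alpha_prev) ** theta
rhs = sigma * theta / alpha00 * alpha_prev   # claimed upper bound
```

Within one Newton step the schedule (6.46) is constant in $k$, so the left-hand side vanishes there and only the transitions checked above matter.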
If $\nu$ is unknown or just zero, then in order to obtain weak convergence
only, we choose $\alpha_{n,k}$ a posteriori such that
\[ \alpha_{n,k} \le \min\{1,\ c\, \omega_{n,k} t_{n,k}^r\} \tag{6.47} \]
for some sufficiently small constant $c > 0$.
Also the number kn of interior Landweber iterations in the n-th Newton
step acts as a regularization parameter. We will choose it such that (6.9)
holds. In case b), i.e., when we cannot make use of a ν > 0 in (6.4), we also
require that on the other hand
\[ \mu\, \|F(x_n^\delta) - y^\delta\| \ge \|A_n(z_{n,k_n} - x_n^\delta) + F(x_n^\delta) - y^\delta\| = t_{n,k_n}. \tag{6.48} \]
While by zn,0 = xδn and µ < 1, obviously any kn ≥ 1 that is sufficiently small
will satisfy (6.9), existence of a finite kn such that (6.48) holds will have to
be proven below.
The stopping index $n_*$ of the outer iteration in case a), where $\nu > 0$ is known,
will be chosen according to
\[ n_*(\delta) = \min\Big\{ n \in \mathbb{N} : \exists\, k \in \{0, \ldots, k_n\} : \alpha_{n,k}^{\frac{1+\theta}{r}} \le \tau\delta \Big\} \tag{6.49} \]
with some fixed $\tau > 0$ independent of $\delta$, and we define our regularized solution
as $z_{n_*,k^*_{n_*}}$ with the index
\[ k^*_{n_*} = \min\Big\{ k \in \{0, \ldots, k_{n_*}\} : \alpha_{n_*,k}^{\frac{1+\theta}{r}} \le \tau\delta \Big\}. \tag{6.50} \]
Otherwise, in case b) we use a Discrepancy Principle
\[ n_*(\delta) = \min\{ n \in \mathbb{N} : \|F(x_n^\delta) - y^\delta\| \le \tau\delta \} \tag{6.51} \]
and consider $x^\delta_{n_*} = z_{n_*-1,k_{n_*-1}}$ as our regularized solution.
6.4
Weak convergence
We now consider the case in which the parameter ν in (6.4) is unknown or
ν = 0.
Using the notations of the previous sections, we recall that $\omega_{n,k}$, $\alpha_{n,k}$, $k_n$ and
$n_*(\delta)$ are chosen as follows. For fixed Newton step $n$ and Landweber step $k$
we choose the step size $\omega_{n,k} > 0$ in (6.3) such that
\[ \underline{c}_\omega \le \frac{\varphi(\omega_{n,k} \tilde{t}_{n,k})}{\omega_{n,k} t_{n,k}^r} \le \overline{c}_\omega, \tag{6.52} \]
i.e., (6.33) holds. We refer to Section 6.3 for well-definedness of such a step
size.
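Next, we select $\alpha_{n,k}$ such that
\[ \alpha_{n,k} \le \min\{1,\ \gamma_0\, \omega_{n,k} t_{n,k}^r\}, \tag{6.53} \]
where $\gamma_0 > 0$ satisfies
\[ \gamma_0 < \frac{1 - \dfrac{\eta + \frac{1+\eta}{\tau}}{\mu} - \overline{c}_\omega}{\tilde{c}_0\, D_p^{x_0}(x^\dagger, x_0)^{\frac{1}{p}} + c_1 + c_2}. \tag{6.54} \]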
The stopping index of the inner Landweber iteration is chosen such that
\[ \forall\, 0 \le k \le k_n - 1: \quad \mu\, \|b_n\| \le t_{n,k}, \tag{6.55} \]
i.e., (6.9) holds for some $\mu \in (\eta, 1)$ and on the other hand $k_n$ is maximal with
this property, i.e.
\[ \mu\, \|b_n\| \ge t_{n,k_n}. \tag{6.56} \]
The stopping index of the Newton iteration is chosen according to the
discrepancy principle (6.51)
\[ n_*(\delta) = \min\{ n \in \mathbb{N} : \|b_n\| \le \tau\delta \}. \tag{6.57} \]
In order to show weak convergence, besides the tangential cone condition
(6.7) we also assume that there is a constant $\gamma_1 > 0$ such that
\[ \|F'(x)\| \le \gamma_1 \tag{6.58} \]
for all $x \in \mathcal{B}^D_\rho(x^\dagger)$.
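We will now prove monotone decay of the Bregman distances between
the iterates and the exact solution, cf. [30, 49]. Since $n_*$ is chosen according
to the Discrepancy Principle (6.51), and by (6.9), estimate (6.31) yields
\[
\begin{aligned}
d_{n,k+1}^2 &\le (1 - \alpha_{n,k}) d_{n,k}^2 + \tilde{c}_0\, \alpha_{n,k}\, d_{n,k}^{\frac{2}{p}} + c_1 \alpha_{n,k}^{s^*} + c_2 \alpha_{n,k}^{p^*} \\
&\quad - \Big( 1 - \frac{\eta + \frac{1+\eta}{\tau}}{\mu} \Big) \omega_{n,k} t_{n,k}^r + \varphi(\omega_{n,k} \tilde{t}_{n,k}).
\end{aligned} \tag{6.59}
\]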
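From (6.59) and the definitions of $\omega_{n,k}$ and $\alpha_{n,k}$ according to (6.52), (6.53),
we infer
\[
\begin{aligned}
d_{n,k+1}^2 - d_{n,k}^2 &\le \big( \tilde{c}_0\, D_p^{x_0}(x^\dagger, x_0)^{\frac{1}{p}} + c_1 + c_2 \big) \alpha_{n,k} && (6.60) \\
&\quad - \Big( 1 - \frac{\eta + \frac{1+\eta}{\tau}}{\mu} - \overline{c}_\omega \Big) \omega_{n,k} t_{n,k}^r. && (6.61)
\end{aligned}
\]
Thus, since $\alpha_{n,k}$ is chosen smaller than $\gamma_0\, \omega_{n,k} t_{n,k}^r$, we obtain
\[ d_{n,k+1}^2 - d_{n,k}^2 \le -\gamma_2\, \omega_{n,k} t_{n,k}^r, \tag{6.62} \]
with $\gamma_2 := 1 - \dfrac{\eta + \frac{1+\eta}{\tau}}{\mu} - \overline{c}_\omega - \big( \tilde{c}_0\, D_p^{x_0}(x^\dagger, x_0)^{\frac{1}{p}} + c_1 + c_2 \big) \gamma_0 > 0$ by (6.54).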
Summing over $k = 0, \ldots, k_n - 1$ we obtain
\[ D_p^{x_0}(x^\dagger, x_n) - D_p^{x_0}(x^\dagger, x_{n+1}) = \sum_{k=0}^{k_n-1} (d_{n,k}^2 - d_{n,k+1}^2) \ge \gamma_2 \sum_{k=0}^{k_n-1} \omega_{n,k} t_{n,k}^r. \tag{6.63} \]
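Now, we use the definition of $\omega_{n,k}$ to observe that
\[ \omega_{n,k} t_{n,k}^r \ge \Phi\Big( \frac{t_{n,k}^r}{\tilde{t}_{n,k}} \Big) \ge \Phi\Big( \frac{\mu \|b_n\|}{\|A_n\|} \Big) \tag{6.64} \]
for $k \le k_n - 1$, where the strictly positive and strictly monotonically increasing function $\Phi : \mathbb{R}^+ \to \mathbb{R}$ is defined by $\Phi(\lambda) = \lambda\, \psi^{-1}(\underline{c}_\omega \lambda)$, which yields
\[ D_p^{x_0}(x^\dagger, x_n) - D_p^{x_0}(x^\dagger, x_{n+1}) \ge \gamma_2\, k_n\, \Phi\Big( \frac{\mu \|b_n\|}{\|A_n\|} \Big). \tag{6.65} \]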
Consequently, for every Newton step $n$ with $b_n \ne 0$, the stopping index $k_n$
is finite. Moreover, summing now over $n = 0, \ldots, n_*(\delta) - 1$ and using the
assumed bound on $F'$ (6.58) as well as (6.51) and $k_n \ge 1$, we deduce
\[ D_p^{x_0}(x^\dagger, x_0) \ge D_p^{x_0}(x^\dagger, x_0) - D_p^{x_0}(x^\dagger, x_{n_*(\delta)}) \ge \gamma_2\, n_*(\delta)\, \Phi\Big( \frac{\mu\tau\delta}{\gamma_1} \Big). \tag{6.66} \]
Thus, for δ > 0, n∗ (δ) is also finite, the method is well defined and we can
directly follow the lines of the proof of Theorem 3.2 in [49] to show the weak
convergence in the noisy case as stated in Theorem 6.4.1.
Besides the error estimates from Section 6.2, the key step of the proof of
strong convergence as n → ∞ in the noiseless case δ = 0 of Theorem 6.4.1
is a Cauchy sequence argument going back to the seminal paper [30]. Since
some additional terms appear in this proof due to the regularization term
in the Landweber iteration, we provide this part of the proof explicitly here.
By the identity
\[ D_p^{x_0}(x_l, x_m) = D_p^{x_0}(x^\dagger, x_m) - D_p^{x_0}(x^\dagger, x_l) + \langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0),\ x_l - x^\dagger \rangle_{X^*\times X} \tag{6.67} \]
and the fact that the monotone decrease (6.63) and boundedness from below
of the sequence $D_p^{x_0}(x^\dagger, x_m)$ implies its convergence, it suffices to prove that
the last term in (6.67) tends to zero as $m < l \to \infty$. This term can be
rewritten as
\[ \langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0),\ x_l - x^\dagger \rangle_{X^*\times X} = \sum_{n=m}^{l-1} \sum_{k=0}^{k_n-1} \langle u_{n,k+1} - u_{n,k},\ x_l - x^\dagger \rangle_{X^*\times X} \]
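where
\[
\begin{aligned}
& |\langle u_{n,k+1} - u_{n,k},\ x_l - x^\dagger \rangle_{X^*\times X}| \\
&= \big| \alpha_{n,k} \langle J_p^X(z_{n,k} - x_0),\ x_l - x^\dagger \rangle_{X^*\times X} + \omega_{n,k} \langle j_r^Y(A_n(z_{n,k} - x_n^\delta) - b_n),\ A_n(x_l - x^\dagger) \rangle_{Y^*\times Y} \big| \\
&\le \omega_{n,k} t_{n,k}^{r-1} \big( 2\bar\rho^{\,p} \gamma_0\, t_{n,k} + \|A_n(x_l - x^\dagger)\| \big)
\end{aligned}
\]
by our choice (6.53) of $\alpha_{n,k}$. Using (6.7), (6.48), it can be shown that
\[ \|F(x_{n+1}) - y\| \le \frac{\mu + \eta}{1 - \eta}\, \|F(x_n) - y\| \tag{6.68} \]
with a factor $\frac{\mu + \eta}{1 - \eta} < 1$ by our assumption $\mu < 1 - 2\eta$ (which by continuity of
$F$ implies that a limit of $x_n$, if it exists, has to solve (5.17)). Hence, using
again (6.7), as well as (6.68), we get
\[ \|A_n(x_l - x^\dagger)\| \le 2(1 + \eta)\, \|F(x_n) - y\| + (1 + \eta)\, \|F(x_l) - y\| \le \frac{3(1 + \eta)}{\mu}\, t_{n,k}, \]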
so that altogether we arrive at an estimate of the form
\[ |\langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0),\ x_l - x^\dagger \rangle_{X^*\times X}| \le C \sum_{n=m}^{l-1} \sum_{k=0}^{k_n-1} \omega_{n,k} t_{n,k}^r \le \frac{C}{\gamma_2} \big( D_p^{x_0}(x^\dagger, x_m) - D_p^{x_0}(x^\dagger, x_l) \big) \]
by (6.63) with $l \ge n$, where the right-hand side goes to zero as $l, m \to \infty$ by
the already mentioned convergence of the monotone and bounded sequence
$D_p^{x_0}(x^\dagger, x_m)$.
Altogether we have proven the following result.
Theorem 6.4.1. Assume that $X$ is smooth and $s$-convex with $s \ge p$, that $x_0$
is sufficiently close to $x^\dagger$, i.e., $x_0 \in \mathcal{B}^D_\rho(x^\dagger)$, and that $F$ satisfies (6.7) with
(6.6), that $F$ and $F'$ are continuous and uniformly bounded in $\mathcal{B}^D_\rho(x^\dagger)$. Let
$\omega_{n,k}$, $\alpha_{n,k}$, $k_n$, $n_*$ be chosen according to (6.9), (6.33), (6.47), (6.48), (6.51)
with $\eta < \frac{1}{3}$, $\eta < \mu < 1 - 2\eta$, $\tau$ sufficiently large.
Then, the iterates $z_{n,k}$ remain in $\mathcal{B}^D_\rho(x^\dagger)$ for all $n \le n_* - 1$, $k \le k_n$, hence
any subsequence of $x^\delta_{n_*} = z_{n_*-1,k_{n_*-1}}$ has a weakly convergent subsequence as
$\delta \to 0$. Moreover, the weak limit of any weakly convergent subsequence solves
(5.17). If the solution $x^\dagger$ to (5.17) is unique, then $x^\delta_{n_*}$ converges weakly to $x^\dagger$
as $\delta \to 0$.
In the noise free case $\delta = 0$, $x_n$ converges strongly to a solution of (5.17) in
$\mathcal{B}^D_\rho(x^\dagger)$.
6.5
Convergence rates with an a-priori stopping rule
We now consider the situation that ν > 0 in (6.4) is known and recall that
the parameters appearing in the methods are then chosen as follows, using
the notation of Section 6.2.
First of all, for fixed Newton step n and Landweber step k we again choose
the step size ωn,k > 0 in (6.3) such that (6.34) holds with a sufficiently small
constant cω > 0 (see (6.69) below) which is possible as explained in Section
6.3. In order to make sure that ωn,k stays bounded away from zero we assume
that (6.38), (6.39) hold.
Next, we assume (6.45) and select αn,k such that (6.41)–(6.44) hold.
Concerning the number kn of interior Landweber iterations, we only have to
make sure that (6.9) holds for some fixed µ ∈ (0, 1) independent of n and k.
The overall stopping index n∗ of the Newton iteration is chosen such that
(6.49) holds.
With this n∗ , our regularized solution is zn∗ ,kn∗ ∗ with the index according to
(6.50).
The constants µ ∈ (0, 1), τ > 0 appearing in these parameter choices are a
priori fixed, τ will have to be sufficiently large.
Moreover we assume that $\overline{c}_\omega$, $\beta$ and $\eta$ are small enough (the latter can be
achieved by smallness of the radius $\rho$ of the ball around $x^\dagger$ in which we will
show the iterates to remain), so that we can choose $\epsilon > 0$ such that
\[ c_4 + \overline{c}_\omega \le 1 \tag{6.69} \]
with $c_4$ as in (6.28).
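By the choice (6.34) of $\omega_{n,k}$, estimate (6.23) implies
\[
\begin{aligned}
d_{n,k+1}^2 &\le \big(1 - (1 - c_0)\alpha_{n,k}\big) d_{n,k}^2 + c_1 \alpha_{n,k}^{s^*} + c_2 \alpha_{n,k}^{p^*} + c_3\, \omega_{n,k}^{-\theta} \alpha_{n,k}^{1+\theta} \\
&\quad - (1 - c_4 - \overline{c}_\omega)\, \omega_{n,k} t_{n,k}^r + C_5\, \omega_{n,k} \delta^r.
\end{aligned} \tag{6.70}
\]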
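Multiplying (6.70) with $\alpha_{n,k+1}^{-\theta}$, using (6.49), and abbreviating
\[ \gamma_{n,k} := d_{n,k}^2\, \alpha_{n,k}^{-\theta}, \]
we get
\[ \gamma_{n,k+1} \le \Big(\frac{\alpha_{n,k}}{\alpha_{n,k+1}}\Big)^\theta \Big\{ \big(1 - (1 - c_0)\alpha_{n,k}\big) \gamma_{n,k} + c_1 \alpha_{n,k}^{s^*-\theta} + c_2 \alpha_{n,k}^{p^*-\theta} + \Big( c_3\, \underline{\omega}^{-\theta} + \frac{C_5\, \overline{\omega}}{\tau^r} \Big) \alpha_{n,k} \Big\}. \]
Using (6.44), this enables us to inductively show
\[ \gamma_{n,k+1} \le \bar\gamma, \]
hence by (6.42) also
\[ d_{n,k+1}^2 \le \bar\gamma\, \alpha_{n,k+1}^\theta \le \rho^2 \tag{6.71} \]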
for all $n \le n_* - 1$ and $k \le k_n - 1$ as well as for $n = n_*$ and $k \le k^*_{n_*} - 1$
according to (6.50). Inserting the upper estimate defining $k^*_{n_*}$ we therewith
get
\[ d_{n_*,k^*_{n_*}}^2 \le \bar\gamma\, \alpha_{n_*,k^*_{n_*}}^\theta \le \bar\gamma\, (\tau\delta)^{\frac{r\theta}{1+\theta}}, \]
which is the desired rate. Indeed, by (6.43), there exists a finite $k^*_{n_*} \le k_{n_*}$
such that
\[ \alpha_{n_*,k^*_{n_*}}^{\frac{1+\theta}{r}} \le \tau\delta \]
and
\[ \forall\, 0 \le k \le k^*_{n_*} - 1: \quad \mu\, \|F(x^\delta_{n_*}) - y^\delta\| \le \|A_{n_*}(z_{n_*,k} - x^\delta_{n_*}) + F(x^\delta_{n_*}) - y^\delta\|. \]
Summarizing, we arrive at
Theorem 6.5.1. Assume that $X$ is smooth and $s$-convex with $s \ge p$, that
$x_0$ is sufficiently close to $x^\dagger$, i.e., $x_0 \in \mathcal{B}^D_\rho(x^\dagger)$, that a variational inequality
(6.4) with $\nu \in (0, 1]$ and $\beta$ sufficiently small is satisfied, that $F$ satisfies (6.7)
with (6.6), that $F$ and $F'$ are continuous and uniformly bounded in $\mathcal{B}^D_\rho(x^\dagger)$,
and that (6.45), (6.39) hold. Let $\omega_{n,k}$, $\alpha_{n,k}$, $k_n$, $n_*$, $k^*_{n_*}$ be chosen according to
(6.9), (6.34), (6.41)–(6.44), (6.49), (6.50) with $\tau$ sufficiently large.
Then, the iterates $z_{n,k}$ remain in $\mathcal{B}^D_\rho(x^\dagger)$ for all $n \le n_* - 1$, $k \le k_n$ and
$n = n_*$, $k \le k^*_{n_*}$. Moreover, we obtain optimal convergence rates
\[ D_p^{x_0}(x^\dagger, z_{n_*,k^*_{n_*}}) = O\big(\delta^{\frac{4\nu}{2\nu+1}}\big), \quad \text{as } \delta \to 0, \tag{6.72} \]
as well as in the noise free case $\delta = 0$
\[ D_p^{x_0}(x^\dagger, z_{n,k}) = O\Big(\alpha_{n,k}^{\frac{4\nu}{r(2\nu+1)-4\nu}}\Big) \tag{6.73} \]
for all $n \in \mathbb{N}$.
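The exponents in (6.72) and (6.73) follow from the definition (6.30) of $\theta$ by a short computation, recorded here for convenience:
\[
1 + \theta = \frac{r(1 + 2\nu)}{r(1 + 2\nu) - 4\nu}, \qquad
\frac{r\theta}{1 + \theta} = \frac{4\nu r}{r(1 + 2\nu) - 4\nu} \cdot \frac{r(1 + 2\nu) - 4\nu}{r(1 + 2\nu)} = \frac{4\nu}{2\nu + 1}.
\]
Hence the bound $\bar\gamma\, (\tau\delta)^{\frac{r\theta}{1+\theta}}$ obtained before the theorem is of order $\delta^{\frac{4\nu}{2\nu+1}}$, and the exponent of $\alpha_{n,k}$ in (6.73) is $\theta$ itself.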
6.6
Numerical experiments
In this section we present some numerical experiments to test the method
defined in Section 6.4. We consider the estimation of the coefficient c in the
1D boundary value problem
\[ \begin{cases} -u'' + cu = f & \text{in } (0, 1), \\ u(0) = g_0, \\ u(1) = g_1 \end{cases} \tag{6.74} \]
from the measurement of $u$, where $g_0$, $g_1$ and $f \in H^{-1}([0, 1])$ are given.
Here and below, $H^{-1}([0, 1])$ is the dual space of the closure of $C_0^\infty([0, 1])$
in $H^1([0, 1])$, denoted by $H_0^1([0, 1])$, cf. e.g. [92]. We briefly recall some
important facts about this problem (cf. Section 5.1):
1. For $1 \le p \le +\infty$ there exists a positive number $\gamma_p$ such that for every
$c$ in the domain
\[ D := \{ c \in L^p([0, 1]) : \|c - \hat\vartheta\|_{L^p} \le \gamma_p,\ \hat\vartheta \ge 0 \text{ a.e.} \} \]
(6.74) has a unique solution $u = u(c) \in H^1([0, 1])$.
2. The nonlinear operator $F : D \subseteq L^p([0, 1]) \to L^r([0, 1])$ defined as
$F(c) := u(c)$ is Fréchet differentiable and
\[ F'(c)h = -A(c)^{-1}(h\, u(c)), \qquad F'(c)^* w = -u(c)\, A(c)^{-1} w, \tag{6.75} \]
where $A(c) : H^2([0, 1]) \cap H_0^1([0, 1]) \to L^2([0, 1])$ is defined by $A(c)u = -u'' + cu$.
3. For every $p \in (1, +\infty)$ the duality map $J_p : L^p([0, 1]) \to L^{p^*}([0, 1])$ is
given by
\[ J_p(c) = |c|^{p-1} \operatorname{sgn}(c), \quad c \in L^p([0, 1]). \tag{6.76} \]
For the numerical simulations we take X = Lp ([0, 1]) with 1 < p < +∞ and
Y = Lr ([0, 1]), with 1 ≤ r ≤ +∞ and identify c from noisy measurements uδ
of u. We solve all differential equations approximately by a finite difference
method by dividing the interval [0, 1] into N + 1 subintervals with equal
length 1/(N + 1); in all examples below N = 400. The Lp and Lr norms are
calculated approximately too by means of a quadrature method.
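A forward solver of this kind can be sketched in a few lines. The code below is an illustrative reconstruction and not the implementation used for the experiments: it assumes the standard second-order central difference stencil on the $N$ interior nodes $t_i = i/(N+1)$ and a dense linear solve, and it uses the data of Example 6.6.1 as a consistency check, since for the linear function $u(t) = 1 + 5t$ the stencil is exact.

```python
import numpy as np

def solve_bvp(c, f, g0, g1):
    """Finite difference solution of -u'' + c u = f, u(0) = g0, u(1) = g1.

    c and f hold the values at the N interior nodes t_i = i h, h = 1/(N+1).
    """
    N = c.size
    h = 1.0 / (N + 1)
    off = -np.ones(N - 1) / h ** 2
    A = np.diag(2.0 / h ** 2 + c) + np.diag(off, 1) + np.diag(off, -1)
    rhs = np.array(f, dtype=float)
    rhs[0] += g0 / h ** 2       # boundary values move to the right-hand side
    rhs[-1] += g1 / h ** 2
    return np.linalg.solve(A, rhs)

N = 400
t = np.arange(1, N + 1) / (N + 1)
# sparse coefficient of Example 6.6.1
c_true = np.where((t >= 0.3) & (t <= 0.4), 0.5,
                  np.where((t >= 0.6) & (t <= 0.7), 1.0, 0.0))
u_exact = 1.0 + 5.0 * t
f = u_exact * c_true
u_num = solve_bvp(c_true, f, g0=1.0, g1=6.0)   # discrete version of F(c) = u(c)
```

For larger $N$ a banded or sparse solver would be the natural replacement for the dense matrix used here.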
We have chosen the same test problems as in [49]. Moreover, we added a
variant of the example for sparsity reconstruction.
The parameters ωn,k and αn,k are chosen according to (6.35) and (6.53) and
the outer iteration is stopped according to the Discrepancy Principle (6.51).
Concerning the stopping index of the inner iteration, in addition to the conditions (6.9) and (6.48), we also require that the iteration be stopped as soon as $\|F(z_{n,k}) - y^\delta\| \le \tau\delta$. More precisely,
\[ k_n = \min\{ k \in \mathbb{Z},\ k \ge 0 \mid \|F(z_{n,k}) - y^\delta\| \le \tau\delta\ \vee\ t_{n,k} \le \mu \|b_n\| \} \tag{6.77} \]
and the regularized solution is $x^\delta_{n_*} = z_{n_*-1,k_{n_*-1}}$.
Example 6.6.1. In the first simulation we assume that the solution is sparse:
\[ c^\dagger(t) = \begin{cases} 0.5, & 0.3 \le t \le 0.4, \\ 1.0, & 0.6 \le t \le 0.7, \\ 0.0, & \text{elsewhere.} \end{cases} \tag{6.78} \]
[Figure 6.1, panels (a) p = 2, r = 2 and (b) p = 1.1, r = 2. Top row: "Sparsity reconstruction with 2 spikes: solution", c(t) against t, with legend entries c^+ and c^δ_{n*}. Bottom row: "Sparsity reconstruction with 2 spikes: relative error", on a logarithmic scale.]
Figure 6.1: Reconstructed Solutions and relative errors for Example 6.6.1.
The test problem is constructed by taking $u(t) = u(c^\dagger)(t) = 1 + 5t$, $f(t) = u(t)\, c^\dagger(t)$, $g_0 = 1$ and $g_1 = 6$. We perturb the exact data $u$ with Gaussian
white noise: the corresponding perturbed data $u^\delta$ satisfies $\|u^\delta - u\|_{L^r} = \delta$,
with $\delta = 0.1 \times 10^{-3}$. When applying the method of Section 6.4, we take
$\mu = 0.99$ and $\tau = 1.02$. The upper bound $\overline{c}_\omega$ satisfies
\[ \overline{c}_\omega = 2^{s^*-1} C_{p^*,s^*} (p\rho^2)^{1 - s^*/p^*}\, \hat\vartheta^{s^*-1} + 2^{p^*-1} C_{p^*,s^*}\, \hat\vartheta^{p^*-1}, \tag{6.79} \]
and $\hat\vartheta$ is chosen as $2^{-j^\sharp}$, where $j^\sharp$ is the first index such that $\gamma_0 := 0.99 \big( 1 - \frac{\eta}{\mu} - \frac{1+\eta}{\tau\mu} - \overline{c}_\omega \big) > 0$. In the tests, we always choose $Y = L^2([0, 1])$ and change
the values of p. In Figure 6.1 we show the results obtained by our method
with p = 2 and p = 1.1 respectively. From the plot of the solution we can
see that the reconstruction of the sparsity in the case p = 1.1 is much better
than in the case with p = 2 and the quality of the solutions is in line with
what one should expect (cf. the solutions obtained in [49]). From the plot of
the relative errors we note that a strict monotonicity of the error cannot be
observed in the case with p = 1.1. The monotonicity holds instead in the case
p = 2. We also underline that in this example the total number of the inner
iterations
Np =
∗ −1
nX
n=0
kn
216
6. A new Iteratively Regularized Newton-Landweber iteration
Sparsity reconstruction with 3 spikes: solution
Sparsity reconstruction with 3 spikes: solution
1.2
1.2
+
+
c
c
cδ
cδ
n*
1
n*
1
0.8
0.6
0.6
c(t)
c(t)
0.8
0.4
0.4
0.2
0.2
0
0
−0.2
0
0.2
0.4
0.6
0.8
1
−0.2
0
0.2
0.4
t
(a) p = 2, r = 2.
0.6
0.8
1
t
(b) p = 1.1, r = 2.
Figure 6.2: Reconstructed Solutions for Example 6.6.2, case A.
is much larger in the case p = 2 (N2 = 20141) than in the case p = 1.1
(N1.1 = 4053), thus the reconstruction with p = 1.1 is also faster.
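The exact data of Example 6.6.1 can be written down in a few lines. The following Python sketch is only an illustration of the definitions above and is not the Matlab code used for the experiments; the function names and the grid size are our own assumptions. It also shows that the values $g_0 = 1$ and $g_1 = 6$ coincide with $u(0)$ and $u(1)$.

```python
def c_exact(t):
    """Sparse exact solution c† of (6.78)."""
    if 0.3 <= t <= 0.4:
        return 0.5
    if 0.6 <= t <= 0.7:
        return 1.0
    return 0.0


def u_exact(t):
    """Exact state u(t) = 1 + 5t."""
    return 1.0 + 5.0 * t


def f_source(t):
    """Right-hand side f(t) = u(t) c†(t)."""
    return u_exact(t) * c_exact(t)


# Hypothetical uniform grid on [0, 1]; the thesis does not state the grid size here.
n_points = 101
grid = [i / (n_points - 1) for i in range(n_points)]
c_values = [c_exact(t) for t in grid]
f_values = [f_source(t) for t in grid]

# The boundary values of u reproduce g0 = 1 and g1 = 6.
g0, g1 = u_exact(0.0), u_exact(1.0)
```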
Example 6.6.2. Choosing a different exact solution does not change the results too much. In this example we only modify $c^\dagger$ into
\[
c^\dagger(t) =
\begin{cases}
0.25, & 0.1 \le t \le 0.15, \\
0.5, & 0.3 \le t \le 0.4, \\
1.0, & 0.6 \le t \le 0.7, \\
0.0, & \text{elsewhere,}
\end{cases}
\tag{6.80}
\]
and choose again $\delta = 0.1 \times 10^{-3}$.
The reconstructed solutions show that choosing p smaller than 2 improves the results, because the oscillations in the zero parts are damped significantly. Once again, in the case p = 1.1 the iteration error and the residual norms do not decrease monotonically, but only on average.
We also tested the performance of the method with the choice $\alpha_{n,k} = 0$ instead of (6.53) and summarized the results in Table 6.1. Similarly to Example 6.6.1, a small p not only gives the better reconstruction, but also saves computation time. Moreover, we notice that in this example the method with $\alpha_{n,k} > 0$ chosen according to (6.53) proved to be faster than the method with $\alpha_{n,k} = 0$, performing fewer iterations, with a gain of 17.8% in the case p = 2 and 20.5% in the case p = 1.1.

Results for Example 6.6.2
                              N_p       Rel. Err.
p = 2,   α_{n,k} > 0          21610     9.8979 × 10^{-2}
p = 2,   α_{n,k} = 0          26303     9.8938 × 10^{-2}
p = 1.1, α_{n,k} > 0          4529      4.9645 × 10^{-2}
p = 1.1, α_{n,k} = 0          5701      4.9655 × 10^{-2}
Table 6.1: Numerical results for Example 6.6.2.
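The gains quoted above follow directly from the iteration counts in Table 6.1. As a small check (a sketch with a helper name of our own choosing), the relative saving in inner iterations can be computed as follows; up to rounding it reproduces the percentages stated in the text.

```python
def relative_gain(n_without, n_with):
    """Fraction of inner iterations saved when alpha_{n,k} > 0 is used."""
    return (n_without - n_with) / n_without


# Iteration counts N_p taken from Table 6.1.
gain_p2 = relative_gain(26303, 21610)    # p = 2
gain_p11 = relative_gain(5701, 4529)     # p = 1.1

print(f"p = 2:   {100 * gain_p2:.2f}%")
print(f"p = 1.1: {100 * gain_p11:.2f}%")
```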
Example 6.6.3. Finally, we consider an example with noisy data in which a few data points, called outliers, differ remarkably from the other data points. This situation may arise from procedural measurement errors.
We suppose $c^\dagger$ to be a smooth solution
\[
c^\dagger(t) = 2 - t + 4\sin(2\pi t)
\tag{6.81}
\]
and take $u(c^\dagger)(t) = 1 - 2t$, $f(t) = (1 - 2t)(2 - t + 4\sin(2\pi t))$, $g_0 = 1$ and $g_1 = -1$ as exact data of the problem. We start the iteration from the initial guess $c_0(t) = 2 - t$, fix the parameters $\mu = 0.999$ and $\tau = 1.0015$ and choose $c_\omega$ and $\gamma_0$ as in Example 6.6.1.
Case A. At first, we assume the data are perturbed with Gaussian white noise ($\delta = 0.1 \times 10^{-2}$), fix $X = L^2([0,1])$ and take $Y = L^r([0,1])$, with r = 2 or r = 1.1. As we can see from Figure 6.3, since the data are reasonably smooth, we obtain comparable reconstructions (in the case r = 2 the relative error is equal to $2.1331 \times 10^{-1}$, whereas in the case r = 1.1 we get $2.0883 \times 10^{-1}$).
Case B. The situation is quite different if the perturbed data also contain a few outliers. We added 19 outliers to the Gaussian noise perturbed data of Case A, obtaining the new noise level $\delta = 0.0414$. In this case taking $Y = L^{1.1}([0,1])$ considerably improves the results, keeping the relative error reasonably small ($2.9388 \times 10^{-1}$ against 1.1992, cf. Figure 6.3).
[Figure 6.3, six panels: (a) Gauss data ("Example 3 gaussian noise: perturbed data", $u^\delta(t)$ against t); (b) Gauss r = 2 ("Example 3 gaussian data: p=2 r=2 solution"); (c) Gauss r = 1.1 ("Example 3 gaussian data: p=2 r=1.1 solution"); (d) Outliers data ("Data with outliers: plot"); (e) Outliers r = 2 ("Data with outliers: p=2 r=2 solution"); (f) Outliers r = 1.1 ("Data with outliers: p=2 r=1.1 solution"). The solution panels show $c^\dagger$ and $c^\delta_{n_*}$ as functions c(t) of t.]
Figure 6.3: Numerical results for Example 6.6.3: (a) and (d) are data with noise; (b) and (e) are reconstructions with $X = Y = L^2[0,1]$; (c) and (f) are reconstructions with $X = L^2[0,1]$ and $Y = L^{1.1}[0,1]$.
Concerning the total number of iterations N, in this example it does not vary remarkably.
To summarize, in all the examples above the method obtained reasonable results, proving to be reliable both in the sparsity reconstruction examples and when the data are affected by outliers. Concerning the total number of iterations, the introduction of the parameters $\alpha_{n,k}$ accelerated the method in Example 6.6.2, but the issue of the speed of the algorithms requires further investigation.
6.7 A new proposal for the choice of the parameters
With the parameter choices proposed in Section 6.3, it is rather difficult to prove strong convergence and convergence rates for the Discrepancy Principle. Moreover, the numerical experiments show that the method still requires many iterations to obtain good regularized solutions.
Indeed, the choices that have been made for $\omega_{n,k}$ and $\alpha_{n,k}$ seem to make the method not flexible enough. For this reason, we propose here a different way to select these parameters. To do this, we return to the estimates of Section 6.2. Using the same notation, we estimate the term $d^2_{n,k+1}$ as in (6.10) and proceed exactly as in Section 6.2 for estimating the terms (I), (II) and (IV). For the term (III), if $\nu > 0$, instead of using (6.9) we reconsider the estimate
\begin{align*}
\|F(z_{n,k}) - F(x^\dagger)\|
&= \|(A_n(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta) \\
&\qquad + (F(z_{n,k}) - F(x_n^\delta) - A_n(z_{n,k} - x_n^\delta)) + (y^\delta - y)\| \\
&\le t_{n,k} + \eta\|F(z_{n,k}) - F(x_n^\delta)\| + \delta \\
&\le t_{n,k} + \eta\left(\|F(z_{n,k}) - F(x^\dagger)\| + \|F(x_n^\delta) - y^\delta\|\right) + (1+\eta)\delta,
\end{align*}
to conclude
\[
\|F(z_{n,k}) - F(x^\dagger)\| \le \frac{1}{1-\eta}\left(t_{n,k} + \eta r_n + (1+\eta)\delta\right).
\tag{6.82}
\]
This together with (6.4) implies
\begin{align*}
|\alpha_{n,k}\langle J_p^X(x^\dagger - x_0), z_{n,k} - x^\dagger\rangle_{X^*\times X}|
&\le \frac{\beta}{(1-\eta)^{2\nu}}\,\alpha_{n,k}\, d_{n,k}^{1-2\nu}\,(t_{n,k} + \eta r_n + (1+\eta)\delta)^{2\nu} \\
&\le C(\tfrac12 + \nu)\,\frac{\beta}{(1-\eta)^{2\nu}}\,\alpha_{n,k}\left\{ d_{n,k}^2 + (t_{n,k} + \eta r_n + (1+\eta)\delta)^{\frac{4\nu}{1+2\nu}} \right\},
\tag{6.83}
\end{align*}
where we have used (6.18) with $C(\lambda) = \lambda^\lambda(1-\lambda)^{1-\lambda}$.
In the case $\nu = 0$, we simply use (6.21) as in Section 6.2.
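The constant $C(\lambda) = \lambda^\lambda(1-\lambda)^{1-\lambda}$ is the sharp constant in the elementary inequality $a^{1-\lambda}b^{\lambda} \le C(\lambda)(a+b)$ for $a, b \ge 0$ and $\lambda \in (0,1)$, with equality at $a = 1-\lambda$, $b = \lambda$. We assume here that this is the form in which (6.18) enters the second step of (6.83), with $a = d_{n,k}^2$ and $\lambda = \tfrac12 + \nu$; the statement of (6.18) itself is not repeated in this section. A minimal numerical check of the inequality (function names are ours):

```python
def young_constant(lam):
    """C(lambda) = lambda^lambda * (1 - lambda)^(1 - lambda), for 0 < lambda < 1."""
    return lam ** lam * (1.0 - lam) ** (1.0 - lam)


def young_gap(a, b, lam):
    """C(lambda) * (a + b) - a^(1 - lambda) * b^lambda; non-negative for a, b >= 0."""
    return young_constant(lam) * (a + b) - a ** (1.0 - lam) * b ** lam


# Check the inequality on a small grid of values.
samples = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0]
smallest_gap = min(
    young_gap(a, b, lam)
    for lam in (0.25, 0.5, 0.75)
    for a in samples
    for b in samples
)

# Equality case: a = 1 - lambda, b = lambda.
gap_at_equality = young_gap(0.25, 0.75, 0.75)
```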
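Altogether, we obtain
\begin{align*}
d^2_{n,k+1} \le{}& \left(1 - (1-c_0)\alpha_{n,k}\right) d^2_{n,k} + \tilde c_0\,\alpha_{n,k} + c_1\,\alpha_{n,k}^{s^*} + c_2\,\alpha_{n,k}^{p^*} \\
& + c_3\,\alpha_{n,k}\,(t_{n,k} + \eta r_n + (1+\eta)\delta)^{\frac{4\nu}{1+2\nu}} \\
& - \omega_{n,k} t^r_{n,k} + \omega_{n,k} t^{r-1}_{n,k}(\eta r_n + (1+\eta)\delta) + \varphi(\omega_{n,k}\tilde t_{n,k}),
\tag{6.84}
\end{align*}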
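where
\begin{align*}
c_0 &= \begin{cases} 0 & \text{if } \nu = 0, \\ \frac{\beta}{(1-\eta)^{2\nu}}\, C(\tfrac12 + \nu) & \text{if } \nu > 0, \end{cases}
\qquad
\tilde c_0 = \begin{cases} \|x^\dagger - x_0\|^{p-1}(p\rho^2)^{\frac1p} & \text{if } \nu = 0, \\ 0 & \text{if } \nu > 0, \end{cases} \\
c_1 &= C_{p^*,s^*}\,(p\rho^2)^{1-\frac{s^*}{p^*}}\,\bar\rho^{(p-1)s^*}\,2^{s^*-1}, \\
c_2 &= C_{p^*,s^*}\,\bar\rho^{\,p}\,2^{p^*-1}, \\
c_3 &= \begin{cases} c_0 & \text{in case (a) } \nu, \theta > 0, \\ \|x^\dagger - x_0\|^{p-1}(p\rho^2)^{\frac1p} & \text{in case (b) } \nu, \theta = 0, \end{cases}
\tag{6.85--6.87}
\end{align*}
\[
\theta = \frac{4\nu}{r(1+2\nu) - 4\nu}.
\tag{6.88}
\]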
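Multiplying (6.84) with $\alpha_{n,k+1}^{-\theta}$ and abbreviating
\[
\gamma_{n,k} := d^2_{n,k}\,\alpha_{n,k}^{-\theta},
\tag{6.89}
\]
we get
\begin{align*}
\gamma_{n,k+1} - \gamma_{n,k} \le \left(\frac{\alpha_{n,k}}{\alpha_{n,k+1}}\right)^{\theta} \Bigg\{ & \left(1 - (1-c_0)\alpha_{n,k} - \left(\frac{\alpha_{n,k+1}}{\alpha_{n,k}}\right)^{\theta}\right)\gamma_{n,k} \\
& + \tilde c_0\,\alpha_{n,k}^{1-\theta} + c_1\,\alpha_{n,k}^{s^*-\theta} + c_2\,\alpha_{n,k}^{p^*-\theta} + c_3\,\alpha_{n,k}^{1-\theta}\,(t_{n,k} + \eta r_n + (1+\eta)\delta)^{\frac{4\nu}{1+2\nu}} \\
& - \alpha_{n,k}^{-\theta}\left(\omega_{n,k} t^r_{n,k} - \omega_{n,k} t^{r-1}_{n,k}(\eta r_n + (1+\eta)\delta) - \varphi(\omega_{n,k}\tilde t_{n,k})\right) \Bigg\}.
\end{align*}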
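To obtain monotone decay of the sequence $\gamma_{n,k}$ with increasing k we choose

• $\omega_{n,k} \ge 0$ such that
\[
\underline{\omega} \le \omega_{n,k} \le \overline{\omega} \quad\text{and}\quad \frac{\varphi(\omega_{n,k}\tilde t_{n,k})}{\omega_{n,k} t^r_{n,k}} \le c_\omega
\tag{6.90}
\]
for some $0 < \underline{\omega} < \overline{\omega}$, $c_\omega > 0$. We will do so by setting
\[
\omega_{n,k} = \vartheta \min\left\{ t_{n,k}^{\frac{r}{p^*-1}}\,\tilde t_{n,k}^{\,-p},\; t_{n,k}^{\frac{r}{s^*-1}}\,\tilde t_{n,k}^{\,-s},\; \overline{\omega} \right\}
\tag{6.91}
\]
with $\vartheta$ sufficiently small, and assuming that
\[
r \ge s \ge p,
\tag{6.92}
\]

• $\alpha_{n,k} \ge 0$ such that
\[
\alpha_{n,k} \ge \check\alpha_{n,k} := \tilde\tau\,(t_{n,k} + \eta r_n + (1+\eta)\delta)^{\frac{r}{1+\theta}}
\tag{6.93}
\]
and
\[
c_0 + \frac{\tilde c_0}{\bar\gamma_{0,0}} + \frac{c_1}{\bar\gamma_{0,0}}\,\alpha_{n,k}^{s^*-\theta-1} + \frac{c_2}{\bar\gamma_{0,0}}\,\alpha_{n,k}^{p^*-\theta-1} + \frac{c_3}{\tilde\tau^{\theta}\,\bar\gamma_{0,0}} + \frac{1}{\tilde\tau^{1+\theta}\,\bar\gamma_{0,0}} \le q < 1
\tag{6.94}
\]
(note that in case $\theta = 0$ we have $c_0 = 0$ and vice versa, in case $\theta > 0$ we have $\tilde c_0 = 0$). The latter can be achieved by
\[
s^* \ge \theta + 1, \qquad p^* \ge \theta + 1, \qquad \alpha_{n,k} \le 1
\tag{6.95}
\]
and
\[
c_0,\ \tilde c_0,\ c_1,\ c_2,\ c_3,\ \tilde\tau^{-1} \text{ sufficiently small.}
\tag{6.96}
\]
In case (a) we additionally require
\[
\alpha_{n,k+1} \ge \hat\alpha_{n,k+1} := \alpha_{n,k}\left(1 - (1-q)\alpha_{n,k}\right)^{1/\theta}
\tag{6.97}
\]
with an upper bound $\bar\gamma_{0,0}$ for $\gamma_{0,0}$. Note that this just means $\alpha_{n,k+1} \ge 0$ in case (b) corresponding to $\nu = 0$, i.e., $\theta = 0$, thus an empty condition in case (b).
To meet conditions (6.93), (6.97) with a minimal $\alpha_{n,k+1}$ we set
\begin{align*}
\alpha_{n,k+1} &= \max\{\check\alpha_{n,k+1}, \hat\alpha_{n,k+1}\} \quad\text{for } k \ge 0, \\
\alpha_{n,0} &= \begin{cases} \alpha_{n-1,k_{n-1}} & \text{if } n \ge 1, \\ \alpha_{0,0} & \text{if } n = 0. \end{cases}
\tag{6.98}
\end{align*}

It remains to choose
• the inner stopping index $k_n$,
• the outer stopping index $n_*$,
see below.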
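Indeed, with these choices of $\omega_{n,k}$ and $\alpha_{n,k+1}$ we can inductively conclude from (6.90) that
\begin{align*}
\gamma_{n,k+1} - \gamma_{n,k}
&\le \left(\frac{\alpha_{n,k}}{\alpha_{n,k+1}}\right)^{\theta}\left\{1 - (1-q)\alpha_{n,k} - \left(\frac{\alpha_{n,k+1}}{\alpha_{n,k}}\right)^{\theta}\right\}\bar\gamma_{0,0} - \alpha_{n,k+1}^{-\theta}(1-c_\omega)\,\omega_{n,k} t^r_{n,k} \\
&\le -\alpha_{n,k+1}^{-\theta}(1-c_\omega)\,\omega_{n,k} t^r_{n,k} \le 0.
\tag{6.99}
\end{align*}
This monotonicity result holds for all $n \in \mathbb{N}$ and for all $k \in \mathbb{N}$.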
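By (6.99) and $\alpha_{n,k} \le 1$ (cf. (6.95)) it can be shown inductively that all iterates remain in $B^D_\rho(x^\dagger)$ provided
\[
\bar\gamma_{0,0} \le \rho^2.
\tag{6.100}
\]
Moreover, (6.99) implies that
\[
\sum_{n=0}^{\infty}\sum_{k=0}^{\infty} \alpha_{n,k+1}^{-\theta}\,\omega_{n,k} t^r_{n,k} \le \frac{\bar\gamma_{0,0}}{1-c_\omega} < \infty,
\tag{6.101}
\]
hence by $\alpha_{n,k+1} \le 1$, $\omega_{n,k} \ge \underline{\omega}$
\[
t_{n,k} \to 0 \text{ as } k \to \infty \text{ for all } n \in \mathbb{N}
\tag{6.102}
\]
and
\[
\sup_{k\in\mathbb{N}_0} t_{n,k} \to 0 \text{ as } n \to \infty.
\tag{6.103}
\]
Especially, since $t_{n,0} = r_n$,
\[
r_n \to 0 \text{ as } n \to \infty.
\tag{6.104}
\]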
To quantify the behavior of the sequence $\alpha_{n,k}$ according to (6.93), (6.97), (6.98) for fixed n we distinguish between two cases.
(i) There exists a $\bar k$ such that for all $k \ge \bar k$ we have $\alpha_{n,k} = \hat\alpha_{n,k}$. Considering an arbitrary accumulation point $\bar\alpha_n$ of $\alpha_{n,k}$ (which exists since $0 \le \alpha_{n,k} \le 1$) we therefore have $\bar\alpha_n = \bar\alpha_n\left(1 - (1-q)\bar\alpha_n\right)^{1/\theta}$, hence $\bar\alpha_n = 0$.
(ii) Consider the situation that (i) does not hold, i.e., there exists a subsequence $k_j$ such that for all $j \in \mathbb{N}$ we have $\alpha_{n,k_j} = \check\alpha_{n,k_j}$. Then by (6.93), (6.97), and (6.102) we have $\alpha_{n,k_j} \to \tilde\tau\,(\eta r_n + (1+\eta)\delta)^{\frac{r}{1+\theta}}$.
Altogether we have shown that
\[
\limsup_{k\to\infty} \alpha_{n,k} \le \tilde\tau\,(\eta r_n + (1+\eta)\delta)^{\frac{r}{1+\theta}} \quad\text{for all } n \in \mathbb{N}.
\tag{6.105}
\]
Since $\eta$ and $\delta$ can be assumed to be sufficiently small, this especially implies the bound $\alpha_{n,k} \le 1$ in (6.95).
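The behavior described in cases (i) and (ii) can be observed on a toy recursion. The sketch below iterates the update (6.98) in case (a) with a frozen lower bound $\check\alpha$, which is a simplifying assumption of ours (in the method $\check\alpha_{n,k}$ changes with $t_{n,k}$); the parameter values are arbitrary illustrations, not those of the experiments. The sequence decays like $\hat\alpha$ until it hits $\check\alpha$ and then stays there.

```python
def alpha_sequence(alpha0, alpha_check, q, theta, steps):
    """Iterate alpha_{k+1} = max(alpha_check, alpha_hat_{k+1}) with
    alpha_hat_{k+1} = alpha_k * (1 - (1 - q) * alpha_k) ** (1 / theta), cf. (6.97), (6.98).

    alpha_check is kept constant here (a simplification for illustration)."""
    alphas = [alpha0]
    for _ in range(steps):
        a = alphas[-1]
        alpha_hat = a * (1.0 - (1.0 - q) * a) ** (1.0 / theta)
        alphas.append(max(alpha_check, alpha_hat))
    return alphas


# Hypothetical values: alpha_{n,0} = 1, frozen lower bound 0.01, q = 0.5, theta = 1.
seq = alpha_sequence(alpha0=1.0, alpha_check=0.01, q=0.5, theta=1.0, steps=1000)
```

With these values the recursion is monotonically non-increasing, stays in [0, 1] and ends at the frozen lower bound, in agreement with (6.105).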
We consider $z_{n_*,k_*^{n_*}}$ as our regularized solution, where $n_*$, $k_*^{n_*}$ (and also $k_n$ for all $n \le n_* - 1$; note that $k_*^{n_*}$ is to be distinguished from $k_{n_*}$: actually the latter is not defined, since we only define $k_n$ for $n \le n_* - 1$) are still to be chosen appropriately, according to the requirements from the proofs of
• convergence rates in case $\nu, \theta > 0$,
• convergence for exact data $\delta = 0$,
• convergence for noisy data as $\delta \to 0$.
6.7.1 Convergence rates in case ν > 0
From (6.99) we get
\[
d^2_{n,k} \le \bar\gamma_{0,0}\,\alpha_{n,k}^{\theta} \quad\text{for all } n, k \in \mathbb{N},
\tag{6.106}
\]
hence in order to get the desired rate
\[
d^2_{n_*,k_*^{n_*}} = O\left(\delta^{\frac{r\theta}{1+\theta}}\right)
\]
in view of (6.105) (which is a sharp bound in case (ii) above) we need to have a bound
\[
r_{n_*} \le \tau\delta
\tag{6.107}
\]
for some constant $\tau > 0$, and we should choose $k_*^{n_*}$ large enough so that
\[
\alpha_{n_*,k_*^{n_*}} \le C_\alpha\,(r_{n_*} + \delta)^{\frac{r}{1+\theta}},
\tag{6.108}
\]
which is possible with a finite $k_*^{n_*}$ by (6.105) for $C_\alpha > (\tilde\tau(1+\eta))^{\frac{r}{1+\theta}}$. Note that this holds without any requirements on $k_n$.
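With $\theta$ defined as $4\nu/(r(1+2\nu) - 4\nu)$, the exponent $r\theta/(1+\theta)$ of the desired rate simplifies to $4\nu/(1+2\nu)$, independently of r. The following short check uses exact rational arithmetic; the function names and the sample values of $\nu$ and r are our own choices for illustration.

```python
from fractions import Fraction


def theta(nu, r):
    """theta = 4 nu / (r (1 + 2 nu) - 4 nu); requires a positive denominator."""
    return 4 * nu / (r * (1 + 2 * nu) - 4 * nu)


def rate_exponent(nu, r):
    """Exponent r * theta / (1 + theta) of delta in the convergence rate."""
    th = theta(nu, r)
    return r * th / (1 + th)


# Sample values (nu, r), chosen so that the denominator of theta is positive.
cases = [
    (Fraction(1, 2), Fraction(2)),
    (Fraction(1, 4), Fraction(11, 10)),
    (Fraction(1, 3), Fraction(3)),
]
exponents = [rate_exponent(nu, r) for nu, r in cases]
expected = [4 * nu / (1 + 2 * nu) for nu, _ in cases]
```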
6.7.2 Convergence as n → ∞ for exact data δ = 0
To show that $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence (following the seminal paper [30]), for arbitrary $m < j$, we choose the index $l \in \{m, \dots, j\}$ such that $r_l$ is minimal and use the identity
\begin{align*}
D_p^{x_0}(x_l, x_m) ={}& D_p^{x_0}(x^\dagger, x_m) - D_p^{x_0}(x^\dagger, x_l) \\
& + \langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0), x_l - x^\dagger\rangle_{X^*\times X}
\tag{6.109}
\end{align*}
and the fact that the monotone decrease and boundedness from below of the sequence $D_p^{x_0}(x^\dagger, x_m)$ implies its convergence, hence it suffices to prove that the last term in (6.109) tends to zero as $m < l \to \infty$ (analogously it can be shown that $D_p^{x_0}(x_l, x_j)$ tends to zero as $l < j \to \infty$). This term can be rewritten as
\[
\langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0), x_l - x^\dagger\rangle_{X^*\times X}
= \sum_{n=m}^{l-1}\sum_{k=0}^{k_n-1} \langle u_{n,k+1} - u_{n,k}, x_l - x^\dagger\rangle_{X^*\times X},
\]
where
\begin{align*}
|\langle u_{n,k+1} - u_{n,k}, x_l - x^\dagger\rangle_{X^*\times X}|
&= |\alpha_{n,k}\langle J_p^X(z_{n,k} - x_0), x_l - x^\dagger\rangle_{X^*\times X} \\
&\qquad + \omega_{n,k}\langle j_r^Y(A_n(z_{n,k} - x_n^\delta) - b_n), A_n(x_l - x^\dagger)\rangle_{Y^*\times Y}| \\
&\le 2\bar\rho^{\,p}\alpha_{n,k} + \omega_{n,k} t^{r-1}_{n,k}\|A_n(x_l - x^\dagger)\| \\
&\le 2\bar\rho^{\,p}\tilde\tau\,(t_{n,k} + \eta r_n)^r + \omega_{n,k} t^{r-1}_{n,k}(1+\eta)(2r_n + r_l) \\
&\le 2\bar\rho^{\,p}\tilde\tau\,(t_{n,k} + \eta r_n)^r + 3(1+\eta)\,\omega_{n,k} t^{r-1}_{n,k} r_n
\end{align*}
by our choice of $\alpha_{n,k} = \check\alpha_{n,k}$ (note that $\hat\alpha_{n,k} = 0$ in case $\theta = 0$), condition (6.7) and the minimality of $r_l$.
Thus we have by $\omega_{n,k} \le \overline{\omega}$ and Young's inequality that there exists $C > 0$ such that
\[
\langle J_p^X(x_l - x_0) - J_p^X(x_m - x_0), x_l - x^\dagger\rangle_{X^*\times X} \le C \sum_{n=m}^{l-1}\left( \sum_{k=0}^{k_n-1} t^r_{n,k} + k_n r^r_n \right),
\]
for which we can conclude convergence as $m, l \to \infty$ from (6.101) provided that
\[
\sum_{n=m}^{\infty} k_n r^r_n \to 0 \text{ as } m \to \infty,
\]
which we guarantee by choosing, for an a priori fixed summable sequence $(a_n)_{n\in\mathbb{N}}$,
\[
k_n := a_n r_n^{-r}.
\tag{6.110}
\]
6.7.3 Convergence with noisy data as δ → 0
In case (a) ν, θ > 0 convergence follows from the convergence rates results in
Subsection 6.7.1. Therefore it only remains to show convergence as δ → 0 in
case θ = 0.
In this section we explicitly emphasize dependence of the computed quantities on the noisy data and on the noise level by a superscript δ.
Let $\|y^{\delta_j} - y\| \le \delta_j$ with $\delta_j$ a zero sequence and $n_{*j}$ the corresponding stopping index. As usual [30] we distinguish between the two cases that (i) $n_{*j}$ has a finite accumulation point and (ii) $n_{*j}$ tends to infinity.
(i) There exists an $N \in \mathbb{N}$ and a subsequence $n_{*j_i}$ such that for all $i \in \mathbb{N}$ we have $n_{*j_i} = N$. Provided
\[
n_*(\delta) = N \text{ for all } \delta \;\Rightarrow\; \text{the mapping } \delta \mapsto x_N^\delta \text{ is continuous at } \delta = 0,
\tag{6.111}
\]
we can conclude that $x_N^{\delta_{j_i}} \to x_N^0$ as $i \to \infty$, and by taking the limit as $i \to \infty$ also in (6.107), $x_N^0$ is a solution to (5.17). Thus we may set $x^\dagger = x_N^0$ in (6.99) (with $\theta = 0$) to obtain
\[
D_p^{x_0}\big(x_N^0, z^{\delta_{j_i}}_{n_{*j_i},\,k_{*j_i}^{n_{*j_i}}}\big) = D_p^{x_0}\big(x_N^0, z^{\delta_{j_i}}_{N,\,k_{*j_i}^{N}}\big) \le D_p^{x_0}\big(x_N^0, x_N^{\delta_{j_i}}\big) \to 0 \text{ as } i \to \infty,
\]
where we have again used the continuous dependence (6.111) in the last step.
(ii) Let $n_{*j} \to \infty$ as $j \to \infty$, and let $x^\dagger$ be a solution to (5.17). For arbitrary $\epsilon > 0$, by convergence for $\delta = 0$ (see the previous subsection) we can find n such that $D_p^{x_0}(x^\dagger, x_n^0) < \frac{\epsilon}{2}$ and, by Theorem 2.60 (d) in [82], there exists $j_0$ such that for all $j \ge j_0$ we have $n_{*j} \ge n + 1$ and $|D_p^{x_0}(x^\dagger, x_n^{\delta_j}) - D_p^{x_0}(x^\dagger, x_n^0)| < \frac{\epsilon}{2}$, provided
\[
n \le n_*(\delta) - 1 \text{ for all } \delta \;\Rightarrow\; \text{the mapping } \delta \mapsto x_n^\delta \text{ is continuous at } \delta = 0.
\tag{6.112}
\]
Hence, by monotonicity of the errors we have
\begin{align*}
D_p^{x_0}\big(x^\dagger, z^{\delta_j}_{n_{*j},\,k_{*j}^{n_{*j}}}\big) &\le D_p^{x_0}(x^\dagger, x_n^{\delta_j}) \\
&\le D_p^{x_0}(x^\dagger, x_n^0) + |D_p^{x_0}(x^\dagger, x_n^{\delta_j}) - D_p^{x_0}(x^\dagger, x_n^0)| < \epsilon.
\tag{6.113}
\end{align*}
Indeed, (6.111), (6.112) can be concluded from continuity of $F$, $F'$, the definition of the method (6.3), as well as stable dependence of all parameters $\omega_{n,k}$, $\alpha_{n,k}$, $k_n$ according to (6.91), (6.93), (6.97), (6.98), (6.110) on the data $y^\delta$.
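Altogether we have derived the following algorithm.

6.7.4 Newton-Iteratively Regularized Landweber algorithm

Choose $\tau$, $\tilde\tau$, $C_\alpha$ sufficiently large, $x_0$ sufficiently close to $x^\dagger$, $\alpha_{0,0} \le 1$, $\overline{\omega} > 0$, $(a_n)_{n\in\mathbb{N}_0}$ such that $\sum_{n=0}^{\infty} a_n < \infty$.
If (6.4) with $\nu \in (0,1]$ holds, set $\theta = \frac{4\nu}{r(1+2\nu)-4\nu}$, otherwise $\theta = 0$.
For n = 0, 1, 2, ... until $r_n \le \tau\delta$ do
    $u_{n,0} = 0$
    $z_{n,0} = x_n^\delta$
    $\alpha_{n,0} = \alpha_{n-1,k_{n-1}}$ if n > 0
    For k = 0, 1, 2, ... until
        $k = k_n - 1 = a_n r_n^{-r}$   if $r_n > \tau\delta$,
        $\alpha_{n_*,k_*^{n_*}} \le C_\alpha (r_{n_*} + \delta)^{\frac{r}{1+\theta}}$   if $r_n \le \tau\delta$
    do
        $\omega_{n,k} = \vartheta \min\{ t_{n,k}^{\frac{r}{p^*-1}} \tilde t_{n,k}^{\,-p},\; t_{n,k}^{\frac{r}{s^*-1}} \tilde t_{n,k}^{\,-s},\; \overline{\omega} \}$
        $u_{n,k+1} = u_{n,k} - \alpha_{n,k} J_p^X(z_{n,k} - x_0) - \omega_{n,k} F'(x_n^\delta)^* j_r^Y(F'(x_n^\delta)(z_{n,k} - x_n^\delta) + F(x_n^\delta) - y^\delta)$
        $J_p^X(z_{n,k+1} - x_0) = J_p^X(x_n^\delta - x_0) + u_{n,k+1}$
        $\alpha_{n,k+1} = \max\{\check\alpha_{n,k+1}, \hat\alpha_{n,k+1}\}$ with $\check\alpha_{n,k+1}$, $\hat\alpha_{n,k+1}$ as in (6.93), (6.97)
    $x_{n+1}^\delta = z_{n,k_n}$.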
Note that we here deal with an a priori parameter choice: θ and therefore
ν has to be known, otherwise θ must be set to zero.
The analysis above yields the following convergence result.
Theorem 6.7.1. Assume that X is smooth and s-convex with $s \ge p$, that $x_0$ is sufficiently close to $x^\dagger$, i.e., $x_0 \in B^D_\rho(x^\dagger)$, that F satisfies (6.7) with (6.6), that $F$ and $F'$ are continuous and uniformly bounded in $B^D_\rho(x^\dagger)$, and that (6.92), (6.96) hold.
Then, the iterates $z_{n,k}$ defined by Algorithm 6.7.4 remain in $B^D_\rho(x^\dagger)$ and converge to a solution $x^\dagger$ of (5.17) subsequentially as $\delta \to 0$ (i.e., there exists a convergent subsequence and the limit of every convergent subsequence is a solution).
In case of exact data $\delta = 0$, we have subsequential convergence of $x_n$ to a solution of (5.17) as $n \to \infty$. If additionally a variational inequality (6.4) with $\nu \in (0,1]$ and $\beta$ sufficiently small is satisfied, we obtain optimal convergence rates
\[
D_p^{x_0}(x^\dagger, z_{n_*,k_*^{n_*}}) = O\left(\delta^{\frac{4\nu}{2\nu+1}}\right), \quad\text{as } \delta \to 0.
\tag{6.114}
\]
Conclusions
In this short concluding chapter, we point out the main contributions of the thesis in the area of the regularization of ill-posed problems and present some possible further developments of this work.
The three stopping rules for the Conjugate Gradient method applied to the Normal Equation presented in Chapter 3 produced very promising results in the numerical experiments. In particular, SR2 provided an important insight into the regularizing properties of this method, connecting the well-known theoretical estimates of Chapter 2 with the properties of the Truncated Singular Value Decomposition method.
In the numerical experiments presented in Chapter 4, the new stopping rules defined in Chapter 3 also produced very good numerical results. Of course, these results can only be considered the starting point of possible future work. Some further developments are the following:
• applications of the new stopping rules in combination with more sophisticated regularization methods that make use of CGNE (e.g., the
Restarted Projected CGNE described in Chapter 3);
• extension of the underlying ideas of the new stopping rules to other
regularization methods (e.g., SART, Kaczmarz,...);
• analysis of the speed of the algorithms presented for computing the
indices of the new stopping rules, to get improvements.
The theoretical results of Chapter 6, and in particular of Section 6.7, enhanced the regularization theory in Banach spaces. However, they still have to be tested in more demanding practical examples. We believe that the new ways to arrest the iteration can indeed improve the performance significantly.
Besides repeating the numerical tests of Section 6.6, two dimensional examples should also be considered, as well as a comparison of the inner-outer Newton-Landweber iteration proposed here with the classical Iteratively regularized Gauss-Newton method.
Possible extensions to the case of non-reflexive Banach spaces and further
simulations in different ill-posed problems should also be a subject of future
research.
Appendix A
Spectral theory in Hilbert spaces
We recall briefly some fundamental results of functional calculus for selfadjoint operators in Hilbert spaces. Details and proofs can be found, e.g. in
[2], [17] and [44].
Throughout this section, X will always denote a Hilbert space. The scalar
product in X will be denoted by h·, ·iX and the norm induced by this scalar
product will be denoted by k · kX .
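Definition A.0.1 (Spectral family). A family $\{E_\lambda\}_{\lambda\in\mathbb{R}}$ of orthogonal projectors in X is called a spectral family or resolution of the identity if it satisfies the following conditions:
(i) $E_\lambda E_\mu = E_{\min\{\lambda,\mu\}}$, $\lambda, \mu \in \mathbb{R}$;
(ii) $E_{-\infty} = 0$, $E_{+\infty} = I$, where $E_{\pm\infty}x := \lim_{\lambda\to\pm\infty} E_\lambda x$, $\forall\, x \in X$, and where I is the identity map on X;
(iii) $E_{\lambda-0} = E_\lambda$, where $E_{\lambda-0}x := \lim_{\epsilon\to 0^+} E_{\lambda-\epsilon}x$, $\forall\, x \in X$.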
Proposition A.0.1. Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function. Then the limit of the Riemann sum
\[
\sum_{i=1}^{n} f(\xi_i)\left(E_{\lambda_i} - E_{\lambda_{i-1}}\right)x
\]
exists in X for $|\lambda_i - \lambda_{i-1}| \to 0$, where $-\infty < a = \lambda_0 < \dots < \lambda_n = b < +\infty$, $\xi_i \in (\lambda_{i-1}, \lambda_i]$, and is denoted by
\[
\int_a^b f(\lambda)\,dE_\lambda x.
\]
Definition A.0.2. For any given $x \in X$ and any continuous function $f : \mathbb{R} \to \mathbb{R}$, the integral $\int_{-\infty}^{+\infty} f(\lambda)\,dE_\lambda x$ is defined as the limit, if it exists, of $\int_a^b f(\lambda)\,dE_\lambda x$ when $a \to -\infty$ and $b \to +\infty$.
Since condition (i) in the definition of the spectral family is equivalent to
\[
\langle E_\lambda x, x\rangle \le \langle E_\mu x, x\rangle, \quad\text{for all } x \in X \text{ and } \lambda \le \mu,
\]
the function $\lambda \mapsto \langle E_\lambda x, x\rangle = \|E_\lambda x\|^2$ is monotonically increasing and, due to condition (iii) in the definition of the spectral family, also continuous from the left. Hence it defines a measure on $\mathbb{R}$, denoted by $d\|E_\lambda x\|^2$. Then the following connection holds:
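Proposition A.0.2. For any given $x \in X$ and any continuous function $f : \mathbb{R} \to \mathbb{R}$:
\[
\int_{-\infty}^{+\infty} f(\lambda)\,dE_\lambda x \text{ exists} \iff \int_{-\infty}^{+\infty} f^2(\lambda)\,d\|E_\lambda x\|^2 < +\infty.
\]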
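Proposition A.0.3. Let A be a self-adjoint operator in X. Then there exists a unique spectral family $\{E_\lambda\}_{\lambda\in\mathbb{R}}$, called spectral decomposition of A or spectral family of A, such that
\[
D(A) = \left\{x \in X \;\Big|\; \int_{-\infty}^{+\infty} \lambda^2\,d\|E_\lambda x\|^2 < +\infty\right\}
\]
and
\[
Ax = \int_{-\infty}^{+\infty} \lambda\,dE_\lambda x, \quad x \in D(A).
\]
We write:
\[
A = \int_{-\infty}^{+\infty} \lambda\,dE_\lambda.
\]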
Definition A.0.3. Let A be a self-adjoint operator in X with spectral family $\{E_\lambda\}_{\lambda\in\mathbb{R}}$ and let f be a measurable function on $\mathbb{R}$ with respect to the measure $d\|E_\lambda x\|^2$ for all $x \in X$. Then f(A) is the operator defined by the formula
\[
f(A)x = \int_{-\infty}^{+\infty} f(\lambda)\,dE_\lambda x, \quad x \in D(f(A)),
\]
where
\[
D(f(A)) = \left\{x \in X \;\Big|\; \int_{-\infty}^{+\infty} f^2(\lambda)\,d\|E_\lambda x\|^2 < +\infty\right\}.
\]
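In finite dimensions the definitions above reduce to familiar linear algebra: for a symmetric matrix, $E_\lambda$ is the orthogonal projector onto the eigenspaces with eigenvalues strictly smaller than $\lambda$ (which matches the left-continuity convention (iii)), and $f(A) = \sum_i f(\lambda_i) P_i$. The following sketch illustrates this for a fixed 2 × 2 matrix whose eigenpairs are written down by hand; the matrix and all function names are our own illustrative choices.

```python
import math

# Symmetric matrix A = [[2, 1], [1, 2]] with eigenvalues 1 and 3
# and orthonormal eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2).
INV_SQRT2 = 1.0 / math.sqrt(2.0)
EIGENPAIRS = [
    (1.0, (INV_SQRT2, -INV_SQRT2)),
    (3.0, (INV_SQRT2, INV_SQRT2)),
]


def projector(v):
    """Orthogonal projector v v^T onto span{v}, for a unit vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]


def weighted_sum(weights):
    """Return sum_i weights[i] * P_i, where P_i projects onto the i-th eigenspace."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for w, (_, v) in zip(weights, EIGENPAIRS):
        p = projector(v)
        for i in range(2):
            for j in range(2):
                result[i][j] += w * p[i][j]
    return result


def spectral_family(lam):
    """E_lambda: projector onto eigenspaces with eigenvalue < lambda (left-continuous)."""
    return weighted_sum([1.0 if mu < lam else 0.0 for mu, _ in EIGENPAIRS])


def apply_function(f):
    """f(A) = sum_i f(lambda_i) P_i, the finite dimensional form of Definition A.0.3."""
    return weighted_sum([f(mu) for mu, _ in EIGENPAIRS])


A_rebuilt = apply_function(lambda x: x)       # should reproduce A
A_squared = apply_function(lambda x: x * x)   # should equal A*A = [[5, 4], [4, 5]]
```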
Proposition A.0.4. Let $M_0$ be the set of all measurable functions on $\mathbb{R}$ with respect to the measure $d\|E_\lambda x\|^2$ for all $x \in X$ (in particular, piecewise continuous functions lie in $M_0$). Let A be a self-adjoint operator in X with spectral family $\{E_\lambda\}_{\lambda\in\mathbb{R}}$ and let $f, g \in M_0$.
(i) If $x \in D(f(A))$ and $z \in D(g(A))$, then
\[
\langle f(A)x, g(A)z\rangle = \int_{-\infty}^{+\infty} f(\lambda)g(\lambda)\,d\langle E_\lambda x, z\rangle.
\]
(ii) If x ∈ D(f (A)), then f (A)x ∈ D(g(A)) if and only if x ∈ D((gf )(A)).
Furthermore,
g(A)f (A)x = (gf )(A)x.
(iii) If D(f (A)) is dense in X , then f (A) is self-adjoint.
(iv) f (A) commutes with Eλ for all λ ∈ R.
Proposition A.0.5. Let A be a self-adjoint operator in X with spectral family $\{E_\lambda\}_{\lambda\in\mathbb{R}}$.
(i) $\lambda_0$ lies in the spectrum of A if and only if $E_{\lambda_0} \ne E_{\lambda_0+\epsilon}$ for all $\epsilon > 0$.
(ii) $\lambda_0$ is an eigenvalue of A if and only if $E_{\lambda_0} \ne E_{\lambda_0+0} = \lim_{\epsilon\to 0} E_{\lambda_0+\epsilon}$. The corresponding eigenspace is given by $(E_{\lambda_0+0} - E_{\lambda_0})(X)$.
Finally, we observe that if A is a linear bounded operator, then the operator $A^*A$ is a linear, bounded, self-adjoint and semi-positive definite operator. Let $\{E_\lambda\}$ be the spectral family of $A^*A$ and let M be the set of all measurable functions on $\mathbb{R}$ with respect to the measure $d\|E_\lambda x\|^2$ for all $x \in X$. Then, for all $f \in M$,
\[
\int_{-\infty}^{+\infty} f(\lambda)\,dE_\lambda x = \int_0^{\|A\|^2} f(\lambda)\,dE_\lambda x = \lim_{\epsilon\to 0^+} \int_0^{\|A\|^2+\epsilon} f(\lambda)\,dE_\lambda x.
\]
Hence, the function f can be restricted to the interval $[0, \|A\|^2 + \epsilon]$ for some $\epsilon > 0$.
Appendix B
Approximation of a finite set of data with cubic B-splines
B.1 B-splines
Let [a, b] be a compact interval of $\mathbb{R}$, let
\[
\Delta = \{a = t_0 < t_1 < \dots < t_k < t_{k+1} = b\}
\tag{B.1}
\]
be a partition of [a, b] and let m be an integer, m > 1. Then the space $S_m(\Delta)$ of polynomial splines with simple knots of order m on $\Delta$ is the space of all functions $s = s(t)$ for which there exist k + 1 polynomials $s_0, s_1, \dots, s_k$ of degree $\le m - 1$ such that
(i) $s(t) = s_j(t)$ for $t_j \le t \le t_{j+1}$, $j = 0, \dots, k$;
(ii) $\frac{d^i}{dt^i}s_{j-1}(t_j) = \frac{d^i}{dt^i}s_j(t_j)$, for $i = 0, \dots, m-2$, $j = 1, \dots, k$.
The points $t_j$ are called the knots of the spline and $t_1, \dots, t_k$ are the inner knots.
It is well known (cf. e.g. [86]) that $S_m(\Delta)$ is a vector space of dimension m + k. A basis of $S_m(\Delta)$ with good computational properties is given by the normalized B-splines.
An extended partition of [a, b] associated to $S_m(\Delta)$ is a sequence of points $\Delta^* = \{\tilde t_{-m+1} \le \dots \le \tilde t_{k+m}\}$ such that $\tilde t_i = t_i$ for every $i = 0, \dots, k+1$. There are different possible choices of the extended partition $\Delta^*$. Here, we shall consider the choice
\[
\tilde t_{-m+1} = \dots = \tilde t_0 = a, \qquad \tilde t_{k+1} = \tilde t_{k+2} = \dots = \tilde t_{k+m}.
\tag{B.2}
\]
The normalized B-splines on $\Delta^*$ are the functions $\{N_{j,m}\}_{j=-m+1,\dots,k}$ defined recursively in the following way:
\[
N_{j,1}(t) =
\begin{cases}
1, & \text{for } \tilde t_j \le t \le \tilde t_{j+1}, \\
0, & \text{elsewhere;}
\end{cases}
\tag{B.3}
\]
\[
N_{j,h}(t) =
\begin{cases}
\dfrac{t - \tilde t_j}{\tilde t_{j+h-1} - \tilde t_j}\,N_{j,h-1}(t) + \dfrac{\tilde t_{j+h} - t}{\tilde t_{j+h} - \tilde t_{j+1}}\,N_{j+1,h-1}(t), & \text{for } \tilde t_j \ne \tilde t_{j+h}, \\
0, & \text{elsewhere}
\end{cases}
\tag{B.4}
\]
for $h = 2, \dots, m$. The cases 0/0 must be interpreted as 0.
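The recursion (B.3), (B.4) translates almost literally into code. The Python sketch below is not the Matlab routine used in the thesis; function names are ours and knots are stored in a list with 0-based indices, so that list index j corresponds to $\tilde t_{j-m+1}$. As an assumption that differs slightly from (B.3), the first-order splines use half-open intervals $[\tilde t_j, \tilde t_{j+1})$, closed only at the right end point b, so that exactly one of them is non-zero at every knot.

```python
def extended_knots(a, b, k, m):
    """Extended partition (B.2) with k equidistant inner knots: k + 2m knots in total."""
    inner = [a + i * (b - a) / (k + 1) for i in range(k + 2)]
    inner[-1] = b  # avoid a rounding mismatch at the right end point
    return [a] * (m - 1) + inner + [b] * (m - 1)


def bspline(knots, j, h, t):
    """Value at t of the normalized B-spline of order h starting at list index j."""
    if h == 1:
        left, right = knots[j], knots[j + 1]
        if left <= t < right:
            return 1.0
        # Close the last non-degenerate interval at the right end point.
        if t == knots[-1] and left < right == knots[-1]:
            return 1.0
        return 0.0
    d1 = knots[j + h - 1] - knots[j]
    d2 = knots[j + h] - knots[j + 1]
    # The cases 0/0 are interpreted as 0.
    term1 = (t - knots[j]) / d1 * bspline(knots, j, h - 1, t) if d1 > 0 else 0.0
    term2 = (knots[j + h] - t) / d2 * bspline(knots, j + 1, h - 1, t) if d2 > 0 else 0.0
    return term1 + term2


# Cubic B-splines (order m = 4) with k = 3 inner knots on [0, 1]: m + k = 7 basis functions.
m, k = 4, 3
knots = extended_knots(0.0, 1.0, k, m)
values_at_half = [bspline(knots, j, m, 0.5) for j in range(m + k)]
```

Summing `values_at_half` gives 1 up to rounding, which is the partition of unity property of the normalized B-splines.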
The functions $N_{j,h}$ have the following well known properties:
(1) Local support: $N_{j,m}(t) = 0$, $\forall\, t \notin [\tilde t_j, \tilde t_{j+m})$ if $\tilde t_j < \tilde t_{j+m}$;
(2) Non negativity: $N_{j,m}(t) > 0$, $\forall\, t \in (\tilde t_j, \tilde t_{j+m})$, $\tilde t_j < \tilde t_{j+m}$;
(3) Partition of unity: $\sum_{j=-m+1}^{k} N_{j,m}(t) = 1$, $\forall\, t \in [a, b]$.
B.2 Data approximation
Let now $(\lambda_1, \mu_1), \dots, (\lambda_n, \mu_n)$, $n \in \mathbb{N}$, $n \ge m + k$, $\lambda_j$ and $\mu_j \in \mathbb{R}$ such that
\[
a = \lambda_1 < \dots < \lambda_n = b
\tag{B.5}
\]
be a given set of data. We want to find a spline $s(t) \in S_m(\Delta)$,
\[
s(t) = \sum_{j=-m+1}^{k} c_j N_{j,m}(t),
\]
that minimizes the least-squares functional
\[
\sum_{l=1}^{n} |s(\lambda_l) - \mu_l|^2
\tag{B.6}
\]
on $S_m(\Delta)$. Simple calculations show that the solutions of this minimization problem are the solutions of the linear system of normal equations
\[
\sum_{j=-m+1}^{k} c_j \sum_{l=1}^{n} N_{i,m}(\lambda_l)N_{j,m}(\lambda_l) = \sum_{l=1}^{n} \mu_l N_{i,m}(\lambda_l), \quad i = -m+1, \dots, k.
\tag{B.7}
\]
Denoting with H the matrix of the normalized B-splines on the approximation points:
\[
H = \{h_{l,j}\} = \{N_{j,m}(\lambda_l)\}, \quad l = 1, \dots, n; \quad j = -m+1, \dots, k,
\tag{B.8}
\]
we can rewrite (B.7) in the form
\[
H^*Hc = H^*\mu,
\tag{B.9}
\]
where c and $\mu$ are the column vectors of the $c_j$ and of the $\mu_l$ respectively. It can be shown that the system has a unique solution if $\Delta^*$ satisfies the so called Schönberg-Whitney conditions:
Theorem B.2.1. The matrix H has maximal rank if there exists a sequence of indices $1 \le j_1 \le \dots \le j_{m+k} \le n$ such that
\[
\tilde t_i < \lambda_{j_i} < \tilde t_{i+m}, \quad i = -m+1, \dots, k,
\tag{B.10}
\]
where the $\tilde t_i$ are the knots of the extended partition $\Delta^*$.
With equidistant inner knots $t_i = a + i\frac{(b-a)}{k+1}$ and the particular choice (B.2), it is easy to see that the Schönberg-Whitney conditions are satisfied for every $k \le n - m$: for example, if $k = n - m$, $j_i = i$ for every $i = 1, \dots, n$.
Appendix C
The algorithms
In this section of the appendix we present the main algorithms used in the
thesis. All numerical experiments have been executed on a Pentium IV PC
using Matlab 7.11.0 R2010b.
C.1 Test problems from P. C. Hansen's Regularization Tools
[Figure C.1: images with the common title "blur and tomo exact solution"; both pixel axes are labelled from 10 to 100.]
Figure C.1: Exact solution of the test problems tomo and blur from P.C.
Hansen’s Regularization Tools.
Many test problems used in this thesis are taken from P. C. Hansen's Regularization Tools. This is a software package that consists of a collection of documented Matlab functions for the analysis and solution of discrete ill-posed problems. The package and the underlying theory are published in [35] and the most recent version of the package, which is updated regularly, is described in [41]. The package can be downloaded directly from the page
http://www2.imm.dtu.dk/~pch/Regutools/, (C.1)
where a complete manual is also available.
All the algorithms of the thesis referring to [35] or [41] are taken from version 4.1 (for Matlab version 7.3). Below, we briefly describe the files that have been used in the thesis.
More details on these functions, such as the synopsis, the input and output arguments, the underlying integral equation and the references, can be found in the manual of the Regularization Tools [35], available in pdf format at the web page (C.1).
We consider 10 different well-known test problems:
• baart, deriv2, foxgood, gravity, heat, i_laplace, phillips, shaw generate the square matrix A, the exact solution x and the exact right-hand side b of a discrete ill-posed problem, typically arising from a discretization of an integral equation of the first kind. The dimension N of the problem is the main input argument of these functions. In some cases it is possible to choose between 2 or 3 different exact solutions. Of course, A is always very ill-conditioned, but in some problems the eigenvalues decrease more quickly than in others.
• blur and tomo generate the square matrix A, the exact solution f and the exact right-hand side g of a 2D image reconstruction problem. In both cases, the vector f is a columnwise stacked version of a simple test image (cf. Figure C.1) with J × J pixels and J is the fundamental input argument of the function. In the blur problem, the matrix A is a symmetric $J^2 \times J^2$ doubly Toeplitz matrix, stored in sparse format, associated to an atmospheric turbulence blur and g := Af is the blurred image. By modifying the third input argument of the function, which is set by default equal to 0.7, it is possible to control the shape of the Gaussian point spread function associated to A. In the tomo problem, the matrix A arises from the discretization of a simple 2D tomography model. If no additional input arguments are used, A is a square matrix of dimension $J^2 \times J^2$ as in the case of blur.
C.2 Conjugate gradient type methods algorithms
Two different functions for the implementation of CGNE can be found in the Regularization Tools: cgls and lsqr_b. cgls is a direct implementation of Algorithm 3, lsqr_b is an equivalent implementation of the same algorithm based on Lanczos bidiagonalization (cf. [73] and [36]).
Both routines require the matrix of the system A, a data vector b and an integer $k_{MAX}$ corresponding to the number of CGNE steps to be performed and return all $k_{MAX}$ solutions, stored as columns of the matrix X. The corresponding solution norms and residual norms are returned in $\eta$ and $\rho$, respectively. If the additional parameter reorth is set equal to 1, then the routines perform a reorthogonalization of the normal equation residual vectors.
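To make the input and output conventions concrete, the following Python sketch implements the standard CGLS recursion for the normal equations $A^*Ax = A^*b$ and returns all iterates together with their residual and solution norms. It is our own illustration written with plain lists; it is not the code of the Regularization Tools, it omits the reorthogonalization option, and all names are assumptions.

```python
import math


def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Product A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgls(A, b, k_max):
    """Perform up to k_max CGLS steps starting from x = 0.

    Returns the list of iterates X, the residual norms rho and the solution norms eta."""
    x = [0.0] * len(A[0])
    r = list(b)                # residual b - A x
    s = rmatvec(A, r)          # normal equation residual A^T r
    p = list(s)                # search direction
    gamma = dot(s, s)
    X, rho, eta = [], [], []
    for _ in range(k_max):
        if gamma == 0.0:       # exact solution of the normal equations reached
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
        X.append(x)
        rho.append(math.sqrt(dot(r, r)))
        eta.append(math.sqrt(dot(x, x)))
    return X, rho, eta


# Small well-conditioned example with exact solution (1, 1).
A = [[1.0, 2.0], [3.0, 4.0]]
b = [3.0, 7.0]
X, rho, eta = cgls(A, b, 2)
```

For this 2 × 2 example the second iterate coincides with the exact solution up to rounding, as expected from the finite termination property of the conjugate gradient method.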
To compare CGNE and CGME, a new routine cgne_cgme has been generated based on Algorithm 6. This function is similar to cgls and lsqr_b, but returns also the solutions, the residual norms and the solution norms of the CGME iterates.
A modified version of cgls, cgls_deb, has been used for the image deblurring problems to avoid forming the matrix A. In the cgls_deb algorithm, the matrix A is replaced by the PSF, the data vector g is replaced by its corresponding image and all matrix-vector multiplications are replaced by the corresponding 2D convolutions as in Section 3.6, formula (3.41). As a consequence, the synopsis of this function is different from the others:
[f, rho, eta] = cgls_deb(h, g, k); (C.2)
here the input arguments are the matrices of the PSF h and of the blurred image g and the number of iterations $k_{MAX}$. The output is a 3D matrix f such that for every $k = 1, \dots, k_{MAX}$ the Matlab command
f(:, :, k)
gives the k-th iterate of the algorithm in the form of an image.
Finally, a new routine cgn2 similar to cgls and lsqr_b has been created to implement the conjugate gradient type method with parameter n = 2 (cf. Algorithm 7 from Chapter 2).
We emphasize that in the tests where a visualization of the reconstructed solutions was not necessary, all these functions were used without generating the matrix of the reconstructed solutions, but instead overwriting at each step the old iterate of CGNE with the new one, in order to save memory and time.
C.3 The routine data_approx
In the notations of Appendix B, the routine data_approx generates an approximation $\{(\lambda_1, \tilde\mu_1), \dots, (\lambda_n, \tilde\mu_n)\}$ of the data set $\{(\lambda_1, \mu_1), \dots, (\lambda_n, \mu_n)\}$ according to the following scheme (valid for $n \ge 5$):
Step 1: Fix m = 4 and the number of inner knots k according to the dimension of the problem: if $n \le 5000$ then $k = \lfloor n/4 \rfloor$, if $n > 5000$ then $k = \lfloor n/50 \rfloor$. Construct the partition $\Delta = \{t_0 < t_1 < \dots < t_k < t_{k+1}\}$ such that $t_0 = \lambda_1$, $t_{k+1} = \lambda_n$ and $t_i = \lambda_1 + i\,\frac{\lambda_n - \lambda_1}{k+1}$, then choose the extended partition $\Delta^*$ according to (B.2).
Step 2: Construct the matrix H of the normalized B-splines of order m on the approximation points $\lambda_1, \dots, \lambda_n$ relative to the extended partition $\Delta^*$ according to (B.3), (B.4) and (B.8).
Step 3: Find the unique solution c of the linear system (B.9).
Step 4: Evaluate the spline $s(t) = \sum_{i=-m+1}^{k} c_i N_{i,m}(t)$ on the approximation points $\lambda_j$, denoting with $\tilde\mu_j = s(\lambda_j)$ the corresponding results.
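The choice of k in Step 1 is easy to check against the condition $k \le n - m$ from Appendix B. The sketch below (function name ours) encodes the rule of Step 1 and verifies this inequality for a range of n; it also suggests why the scheme is stated for $n \ge 5$, since for n = 4 the rule would give k = 1 > n - m = 0.

```python
def inner_knot_count(n):
    """Number of inner knots k chosen in Step 1 for n data points (n >= 5)."""
    if n < 5:
        raise ValueError("the scheme is stated for n >= 5")
    return n // 4 if n <= 5000 else n // 50


M_ORDER = 4  # cubic B-splines, m = 4

# Check k <= n - m, the range for which Appendix B states that the
# Schönberg-Whitney conditions hold with equidistant inner knots.
condition_holds = all(
    inner_knot_count(n) <= n - M_ORDER for n in range(5, 20001)
)
```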
C.4
The routine mod− min− max
This section describes the Matlab function implemented for the computation
of the index p that divides the vector of the SVD (Fourier) coefficients $|u_i^* b^\delta|$,
$i = 1, \dots, m$, associated with the (perturbed) linear system $Ax = b^\delta$, into a
vector of low frequency components, consisting of its first p entries, and a
vector of high frequency components, consisting of its last m − p entries.
The routine, named mod_min_max, is a variation of the
Min-Max Rule proposed in [101] and requires the singular values $\lambda_1, \dots, \lambda_N$
of A.
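Suppose for simplicity m = N and let $\varphi_i$ denote the ratios $|u_i^* b^\delta|/\lambda_i$ for
$i = 1, \dots, N$. Separate the set $\Psi = \{\varphi_1, \dots, \varphi_N\}$ into 2 sets
$\Psi_1 := \{\varphi_i \mid \varphi_i > \lambda_i\}$, $\Psi_2 := \{\varphi_i \mid \varphi_i \le \lambda_i\}$    (C.3)
and let $N_1$ and $N_2$ be the number of elements in $\Psi_1$ and $\Psi_2$ respectively.
Then: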
(i) If $N_2 = 0$ or $\lambda_N < 10^{-13}$, calculate an approximation $\tilde\Psi$ of $\Psi$ by means
of cubic B-splines with the routine data_approx of Section C.3
and choose p as the index corresponding to the minimal
value in $\tilde\Psi$.
(ii) Otherwise, consider the first of the last 5 elements $\varphi_{i_j}$ in $\Psi_2$ such that
$\varphi_{i_j+1} \notin \Psi_2$ and choose p as the corresponding index.
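The splitting (C.3) and the choice between cases (i) and (ii) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Matlab routine: the singular values are assumed strictly positive, indices are 1-based as in the text, and the actual selection of p inside each case (spline smoothing in (i), the search among the last 5 elements in (ii)) is deliberately left out. The function names are hypothetical.

```python
def split_ratios(coeffs, sing_vals):
    """Return the ratios phi_i and the 1-based index sets of Psi_1 and Psi_2.

    coeffs    : the Fourier coefficients u_i^* b^delta (any sign)
    sing_vals : the singular values lambda_i, assumed strictly positive
    """
    phi = [abs(c) / s for c, s in zip(coeffs, sing_vals)]
    psi1 = [i + 1 for i, (f, s) in enumerate(zip(phi, sing_vals)) if f > s]
    psi2 = [i + 1 for i, (f, s) in enumerate(zip(phi, sing_vals)) if f <= s]
    return phi, psi1, psi2


def select_case(psi2, smallest_sing_val, tol=1e-13):
    """Return "i" if Psi_2 is empty or lambda_N < tol, and "ii" otherwise."""
    if not psi2 or smallest_sing_val < tol:
        return "i"
    return "ii"
```

With coefficients (1, 0.1, 0.1) and singular values (2, 0.5, 0.1) the ratios are (0.5, 0.2, 1), so the first two indices fall in Psi_2, the third in Psi_1, and case (ii) applies.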
When the smallest singular value is close to the machine epsilon or the set
Ψ2 is empty, then Ψ can be used to determine the regularization index. In
this case the data noise is assumed to be predominant with respect to the
model errors, so the minimum of the sequence $\varphi_i$ should correspond to the
index $i_{b^\delta}$. Moreover, the data approximation is used to avoid the presence
of possible outliers. A typical case is shown in Figure 3.5 with the shaw test
problem.
In the second case of the modified Min-Max Rule the model errors are predominant and the greatest indices in $\Psi_2$ are included in the TSVD provided
that they are contiguous (i.e. the successive element does not belong to $\Psi_1$).
This situation is shown in the picture on the right of Figure 3.5 obtained
with the phillips test problem.
C.5 Data and files for image deblurring
The experiments on image deblurring performed in Section 3.6 make use of
the following files.
• The file psfgauss is taken from the HNO Functions, a small Matlab package
that implements the image deblurring algorithms presented in [38]. The
package is available at the web page
http://www2.imm.dtu.dk/~pch/HNO/.
For a given integer J and a fixed number stdev, representing the standard deviation of the Gaussian along the vertical and horizontal directions, the
Matlab command
[h, center] = psfGauss(J, stdev);    (C.4)
generates the PSF matrix h of dimension J × J and the center of the
PSF.
• The file im_blurring generates a test problem for image deblurring. A
gray-scale image is read with the Matlab function imread. Then a
Gaussian PSF is generated by means of the function psfgauss and the
image is blurred according to the formula (3.41) from Section 3.6. At
last, Gaussian white noise is added to the blurred image to obtain the
perturbed data of the problem.
• The function fou_coeff plots the Fourier coefficients of an image deblurring test problem. Given the (perturbed) image g and the PSF h,
it returns the Fourier coefficients, the singular values of the BCCB matrix A corresponding to h and the index p computed by the function
mod_min_max of Section C.4.
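To illustrate what a call such as (C.4) produces, the following is a minimal Python sketch of a normalized, symmetric Gaussian PSF on a J × J grid. It is not the HNO code: the function name `gaussian_psf`, the grid convention (offsets from −⌊J/2⌋ to ⌈J/2⌉ − 1) and the 1-based center are assumptions made for the example.

```python
import math


def gaussian_psf(J, stdev):
    """Return a J x J Gaussian PSF (entries summing to 1) and its 1-based center.

    The same standard deviation stdev is used in both directions.
    """
    offsets = [i - J // 2 for i in range(J)]   # assumed grid: -floor(J/2), ..., ceil(J/2)-1
    h = [[math.exp(-(x * x + y * y) / (2.0 * stdev ** 2)) for x in offsets]
         for y in offsets]
    total = sum(sum(row) for row in h)
    h = [[v / total for v in row] for row in h]  # normalize so that the blur preserves mean intensity
    center = (J // 2 + 1, J // 2 + 1)            # position of the peak, counted from 1
    return h, center
```

For J = 5 the peak sits at row 3, column 3, and the entries decay symmetrically around it.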
C.6 Data and files for the tomographic problems
The numerical experiments on the tomographic problems described in Section 4.6 make use of the files paralleltomo, fanbeamtomo and seismictomo from
P.C. Hansen’s AIR Tools. This is a Matlab software package for tomographic
reconstruction (and other imaging problems) consisting of a number of algebraic iterative reconstruction methods. The package, described in the paper
[40], can be downloaded at the web page
http://www2.imm.dtu.dk/~pcha/AIRtools/.
For a fixed integer J > 0, the Matlab command
[A, g, f] = paralleltomo(J)    (C.5)
generates the exact solution f, the matrix A and the exact data g = Af of a
two dimensional tomographic test problem with parallel X-rays. The input
argument J is the size of the exact solution f of the system. Therefore, the
matrix A has $N := J^2$ columns. The number of rows of A is given by the
total number of rays for each angle, $l_0$, multiplied by the number of angles, $j_0$.
Consequently, the sinogram G corresponding to the exact data g is a matrix
with $l_0$ rows and $j_0$ columns.
The functions fanbeamtomo and seismictomo generate the test problem in
a similar way. We emphasize that for each problem the dimensions of the
sinogram are different. If these values are not specified, they are set by
default. In particular:
• paralleltomo: $l_0 = \mathrm{round}(\sqrt{2}\,J)$, $j_0 = 180$;
• fanbeamtomo: $l_0 = \mathrm{round}(\sqrt{2}\,J)$, $j_0 = 360$;
• seismictomo: $l_0 = 2J$, $j_0 = J$.
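The default dimensions listed above can be tabulated with a short sketch. This is a Python illustration of the arithmetic only, not a call to AIR Tools: the function name `default_problem_size` is hypothetical, and the row count $l_0 j_0$ assumes that no rows are discarded from A.

```python
import math

# Default (l0, j0) for each test problem, as listed in the text.
DEFAULTS = {
    "paralleltomo": lambda J: (round(math.sqrt(2) * J), 180),
    "fanbeamtomo": lambda J: (round(math.sqrt(2) * J), 360),
    "seismictomo": lambda J: (2 * J, J),
}


def default_problem_size(name, J):
    """Return the sinogram size (l0 x j0) and the size of A (rows x cols)."""
    l0, j0 = DEFAULTS[name](J)
    return {"l0": l0, "j0": j0, "rows": l0 * j0, "cols": J * J}
```

For instance, with J = 50 the parallel beam problem has a 71 × 180 sinogram, so A would have 12780 rows and 2500 columns.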
Appendix D
CGNE and rounding errors
In the literature, there are a number of mathematically equivalent implementations of CGNE and of the other methods discussed above. Many authors
suggest LSQR (cf. [73] and [36]), which is an equivalent implementation of
CGNE based on Lanczos bidiagonalization.
The principal problem with any of these methods is the loss of orthogonality
in the residuals due to finite precision arithmetic. The orthogonality can
be maintained by reorthogonalization techniques that are significantly more
expensive and require a larger number of intermediate vectors (cf. e.g., [18]).
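The extra cost of reorthogonalization can be seen in a minimal sketch. The Python function below (hypothetical name `reorthogonalize`, not the code behind the reorth option of any package) applies one modified Gram-Schmidt sweep of a new vector against all previously stored vectors, which are assumed orthonormal. Every earlier vector must be kept in memory and touched once per step.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reorthogonalize(v, basis):
    """Remove from v its components along the stored orthonormal vectors.

    One modified Gram-Schmidt sweep: the work and the storage grow
    linearly with the number of vectors kept in basis.
    """
    w = list(v)
    for q in basis:
        c = dot(w, q)                              # component still present along q
        w = [wi - c * qi for wi, qi in zip(w, q)]  # subtract it
    return w
```

Applied to a vector that has drifted slightly, for example (1e-8, -2e-8, 3) against the first two coordinate vectors, the sweep removes the spurious components and leaves a multiple of the third coordinate vector.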
In the literature the influence of round-off errors on conjugate gradient type
methods has been studied mainly for well-posed problems.
In [27] Hanke comments on the ill-posed case that the reorthogonalization
techniques did not improve the optimal accuracy in the case he considered.
Our numerical experiments confirm that the sequence of the relative errors
$\|x^\dagger - z_k^\delta\|$ does not change significantly for k smaller than the optimal stopping index.
However, even small differences in the computation of the residual norms
and of the norms of the solutions $\|z_k^\delta\|$ may seriously affect the results presented in the following sections of Chapter 3, especially in the 1D examples.
Therefore, in these sections the routine lsqr_b from [35], with the parameter
reorth = 1, was preferred to the other routines in the implementation of the
CGNE algorithm.
In the other cases we proceeded as follows:
• In the tests of Chapter 2, to compare the results obtained by CGNE
and CGME we implemented Algorithm 6, generating a new routine
cgne_cgme specific for this case;
• In the numerical experiments of Chapter 3 on image deblurring we used
the routine cgls_deb;
• In the numerical experiments of Chapter 4 we simply used cgls without
reorthogonalization.
Bibliography
[1] M. ABRAMOWITZ & I. A. STEGUN, Handbook of Mathematical Functions, Dover, New York, 1970.
[2] N.I. AKHIEZER & I.M. GLAZMAN Theory of linear operators in
Hilbert spaces, Pitman, 1981.
[3] C. T. H. BAKER, The Numerical Treatment of Integral Equations,
Clarendon Press, Oxford, UK, 1977.
[4] M. BERTERO & P. BOCCACCI, Introduction to Inverse Problems in
Imaging, IOP Publishing, Bristol, 1998.
[5] D. CALVETTI, G. LANDI, L. REICHEL & F. SGALLARI: Nonnegativity and iterative methods for ill-posed problems, Inverse Problems 20
(2004), 1747-1758.
[6] C. CLASON, B. JIN, K. KUNISCH, A semismooth Newton method for
L1 data fitting with automatic choice of regularization parameters and
noise calibration, SIAM J. Imaging Sci., 3 (2010), 199-231.
[7] F. COLONIUS & K. KUNISCH Output least squares stability in elliptic
systems, Appl. Math. Opt. 19 (1989), 33-63.
[8] J. W. DANIEL, The Conjugate Gradient Method for Linear and Nonlinear Operator Equations, SIAM J. Numer. Anal., 4 (1967), 10-26.
[9] J. L. CASTELLANOS, S. GOMEZ & V. GUERRA, The triangle method
for finding the corner of the L-curve, Appl. Numer. Math. 43 (2002), 359-373.
[10] T. F. CHAN & J. SHEN, Image Processing and Analysis: variational,
PDE, wavelet, and stochastic methods, SIAM, Philadelphia, 2005.
[11] P. E. DANIELSSON, P. EDHOLM & M. SEGER, Towards exact 3D-reconstruction for helical cone-beam scanning of long objects. A new detector arrangement and a new completeness condition, in Proceedings
of the 1997 International Meeting on Fully Three-Dimensional Image
Reconstruction in Radiology and Nuclear Medicine, edited by D. W.
Townsend & P. E. Kinahan, Pittsburgh, 1997.
[12] P. J. DAVIS, Circulant matrices, Wiley, New York, 1979.
[13] M. K. DAVISON The ill-conditioned nature of the limited angle tomography problem, SIAM J. Appl. Math. 43, 428-448.
[14] L. M. DELVES & J. L. MOHAMED, Computational Methods for Integral Equations, Cambridge University Press, Cambridge, UK, 1985.
[15] H. W. ENGL & H. GFRERER, A Posteriori Parameter Choice for General Regularization Methods for Solving Linear Ill-posed Problems, Appl.
Numer. Math. 7 (1988) 395-417.
[16] H. W. ENGL, K. KUNISCH & A. NEUBAUER, Convergence rates for
Tikhonov regularization of non-linear ill-posed problems, Inverse Problems, 5 (1989), 523-540.
[17] H. W. ENGL, M. HANKE & A. NEUBAUER, Regularization of Inverse
Problems, Kluwer Academic Publishers, 1996.
[18] G. H. GOLUB & C. F. VAN LOAN, Matrix Computations, The Johns
Hopkins University Press, Baltimore, London, 1989.
[19] F. GONZALES, Notes on Integral Geometry and Harmonic Analysis,
COE Lecture Note Vol. 24, Kyushu University 2010.
[20] C. W. GROETSCH, Elements of Applicable and Functional Analysis,
Dekker, New York, 1980.
[21] C. W. GROETSCH, Generalized Inverses of Linear Operators: Representation and Approximation, Dekker, New York, 1977.
[22] C. W. GROETSCH, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman, Boston, 1984.
[23] U. HAEMARIK & U. TAUTENHAHN, On the Monotone Error Rule for
Parameter Choice in Iterative and Continuous Regularization Methods,
BIT Numer. Math. 41,5 (2001), 1029-1038.
[24] U. HAEMARIK & R. PALM, On Rules for Stopping the Conjugate
Gradient Type Methods in Ill-posed Problems, Math. Model. Anal. 12,1
(2007) 61-70.
[25] U. HAEMARIK & R. PALM, Comparison of Stopping Rules in Conjugate Gradient Type Methods for Solving Ill-posed problems, in Proceedings of the 10-th International Conference MMA 2005 & CMAM 2,
Trakai. Technika (2005) 285-291.
[26] U. HAEMARIK, R. PALM & T. RAUS Comparison of Parameter
Choices in Regularization Algorithms in Case of Different Information
about Noise Level, Calcolo 48,1 (2011) 47-59.
[27] M. HANKE, Conjugate Gradient Type Methods for Ill-Posed Problems,
Longman House, Harlow, 1995.
[28] M. HANKE, Limitations of the L-curve method in ill-posed problems,
BIT 36 (1996), 287-301.
[29] M. HANKE & J. G. NAGY, Restoration of Atmospherically Blurred
Images by Symmetric Indefinite Conjugate Gradient Techniques, Inverse
Problems, 12 (1996), 157-173.
[30] M. HANKE, A. NEUBAUER, & O. SCHERZER, A convergence analysis of the Landweber iteration for nonlinear ill-posed problems, Numer.
Math., 72 (1995) 21-37.
[31] M. HANKE, A regularization Levenberg-Marquardt scheme, with applications to inverse groundwater filtration problems, Inverse Problems, 13
(1997), 79-95.
[32] P. C. HANSEN, The discrete Picard condition for discrete ill-posed problems, BIT 30 (1990), 658-672.
[33] P. C. HANSEN, Analysis of the discrete ill-posed problems by means of
the L-curve, SIAM Review, 34 (1992), 561-580.
[34] P. C. HANSEN & D. P. O’LEARY, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput., 14 (1993),
1487-1503.
[35] P. C. HANSEN, Regularization Tools: A Matlab Package for Analysis and Solution of Discrete Ill-posed Problems (version 4.1), Numerical
Algorithms, 6 (1994) 1-35.
[36] P.C. HANSEN, Rank Deficient and Discrete Ill-posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
[37] P. C. HANSEN, T. K. JENSEN & G. RODRIGUEZ, An adaptive pruning algorithm for the discrete L-curve criterion, J. Comput. Appl. Math.
198 (2006), 483-492.
[38] P. C. HANSEN, J. G. NAGY, D. P. O’LEARY, Deblurring Images,
Matrices, Spectra and Filtering, SIAM, Philadelphia, 2006.
[39] P. C. HANSEN, M. KILMER, R. H. KJELDSEN, Exploiting residual
information in the parameter choice for discrete ill-posed problems, BIT
Numerical Mathematics, 46,1 (2006), 41-59.
[40] P. C. HANSEN & M. SAXILD-HANSEN, AIR Tools - A MATLAB
Package of Algebraic Iterative Reconstruction Methods, Journal of Computational and Applied Mathematics, 236 (2012), 2167-2178.
[41] P. C. HANSEN, Regularization Tools Version 4.0 for Matlab 7.3, Numerical Algorithms, 46 (2007), 189-194.
[42] T. HEIN & B. HOFMANN, Approximate source conditions for nonlinear ill-posed problems – chances and limitations, Inverse Problems, 25
(2009), 035003 (16pp).
[43] S. HELGASON, The Radon Transform, 2nd. Ed., Birkhäuser Progress
in Math., 1999.
[44] G. HELMBERG, Introduction to Spectral Theory in Hilbert Spaces,
North Holland, Amsterdam, 1969.
[45] M. R. HESTENES & E. STIEFEL, Methods of Conjugate Gradients
for Solving Linear Systems, J. Research Nat. Bur. Standards 49 (1952),
409-436.
[46] B. HOFMANN, B. KALTENBACHER, C. PÖSCHL & O. SCHERZER,
A convergence rates result for Tikhonov regularization in Banach spaces
with non-smooth operators, Inverse Problems, 23,3 (2007), 987-1010.
[47] T. HOHAGE, Iterative methods in inverse obstacle scattering: regularization theory of linear and nonlinear exponentially ill-posed problems,
PhD Thesis University of Linz, Austria, 1999.
[48] P. A. JANSSON Deconvolution of Images and Spectra, San Diego, Academic 1997.
[49] Q. JIN, Inexact Newton-Landweber iteration for solving nonlinear inverse problems in Banach spaces, Inverse Problems 28 (2012) 065002,
14pp.
[50] H. HU & J. ZHANG, Exact Weighted-FBP Algorithm for Three-Orthogonal-Circular Scanning Reconstruction, Sensors, 9 (2009), 4606-4614.
[51] B. KALTENBACHER & B. HOFMANN, Convergence rates for the iteratively regularized Gauss-Newton method in Banach spaces, Inverse Problems, 26,3 (2010), 035007 (21pp).
[52] B. KALTENBACHER & A. NEUBAUER, Convergence of projected iterative regularization methods for nonlinear problems with smooth solutions, Inverse Problems, 22 (2006), 1105-1119.
[53] B. KALTENBACHER, A. NEUBAUER, & O. SCHERZER, Iterative
Regularization Methods for Nonlinear Ill-posed Problems, de Gruyter,
2007.
[54] B. KALTENBACHER & I. TOMBA, Convergence rates for an iteratively regularized Newton-Landweber iteration in Banach space, Inverse
Problems 29 (2013) 025010.
[55] B. KALTENBACHER, Convergence rates for the iteratively regularized
Landweber iteration in Banach space, Proceedings of the 25th IFIP TC7
Conference on System Modeling and Optimization, Springer, 2013, to
appear.
[56] A. KATSEVICH, Analysis of an exact inversion formula for spiral cone-beam CT, Physics in Medicine and Biology, 47 (2002), 2583-2598.
[57] A. KATSEVICH, Theoretically exact filtered backprojection-type inversion algorithm for spiral CT, SIAM Journal on Applied Mathematics,
62 (2002), 2012-2026.
[58] A. KATSEVICH, An improved exact filtered backprojection algorithm
for spiral computed tomography. Advances in Applied Mathematics, 32
(2004), 681-697.
[59] C. T. KELLEY, Iterative Methods for Linear and Nonlinear Equations,
Society for Industrial and Applied Mathematics, Philadelphia 1995.
[60] M. KILMER & G. W. STEWART, Iterative Regularization and MINRES, SIAM J. Matrix Anal. Appl., 21,2 (1999) 613-628.
[61] A. KIRSCH, An Introduction to the Mathematical Theory of Inverse
Problems, Springer-Verlag, New York, 1996.
[62] R. KRESS, Linear Integral Equations, Applied Mathematical Sciences
vol. 82, Second Edition, Springer-Verlag, New York, 1999.
[63] L. LANDWEBER, An Iteration Formula for Fredholm Integral Equation
of the First Kind, Amer. J. Math. 73 (1951), 615-624.
[64] A. K. LOUIS, Orthogonal Function Series Expansion and the Null Space
of the Radon Transform, SIAM J. Math. Anal. 15 (1984), 621-633.
[65] P. MAASS, The X-Ray Transform: Singular Value Decomposition and
Resolution, Inverse Problems 3 (1987), 729-741.
[66] T. M. MACROBERT, Spherical Harmonics: An Elementary Treatise
on Harmonic Functions with Applications, Pergamon Press, 1967.
[67] V. A. MOROZOV, On the Solution of Functional Equations by the
Method of Regularization, Soviet Math. Dokl., 7 (1966) 414-417.
[68] F. NATTERER, The Mathematics of Computerized Tomography, J. Wiley, B.G. Teubner, New York, Leipzig, 1986.
[69] F. NATTERER, and F. WÜBBELING Mathematical Methods in Image
Reconstruction, Cambridge University Press, SIAM 2001.
[70] A. S. NEMIROVSKII, The Regularization Properties of the Adjoint Gradient Method in Ill-posed Problems, USSR Comput. Math. and Math.
Phys., 26,2 (1986) 7-16.
[71] A. NEUBAUER, Tikhonov-Regularization of Ill-Posed Linear Operator
Equations on Closed Convex Sets, PhD thesis, Johannes Kepler Universität Linz November 1985, appeared in Verlag der Wissenschaftlichen
Gesellschaften Österreich, Wien, 1986.
[72] A. V. OPPENHEIM & R. W. SCHAFER, Discrete-Time Signal Processing, Prentice Hall Inc., New Jersey, 1989.
[73] C. C. PAIGE & M. A. SAUNDERS, LSQR: an algorithm for sparse
linear equations and sparse least squares, ACM trans. Math. Software,
8 (1982), 43-71.
[74] R. PLATO, Optimal algorithms for linear ill-posed problems yield regularization methods, Numer. Funct. Anal. Optim., 11 (1990), 111-118.
[75] J. RADON, Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten, Berichte Sächsische Akademie
der Wissenschaften, Math.-Phys., Kl, 69 (1917), 262-267.
[76] L. REICHEL & H. SADOK, A New L-Curve for Ill-Posed Problems, J.
Comput. Appl. Math., 219 (2008) 493-508.
[77] L. REICHEL & G. RODRIGUEZ, Old and new parameter choice rules
for discrete ill-posed problems, Num. Alg., 2012, DOI 10.1007/s11075-012-9612-8.
[78] A. RIEDER, On convergence rates of inexact Newton regularizations,
Numer. Math. 88 (2001), 347-365.
[79] A. RIEDER, Inexact Newton regularization using conjugate gradients as
inner iteration, SIAM J. Numer. Anal. 43 (2005) 604-622.
[80] W. RUDIN, Functional Analysis, Mc Graw-Hill Book Company, 1973.
[81] O. SCHERZER, A modified Landweber iteration for solving parameter
estimation problems, Appl. Math. Opt., 68 (1998), 38-45.
[82] T. SCHUSTER, B. KALTENBACHER, B. HOFMANN, & K. KAZIMIERSKI, Regularization Methods in Banach Spaces De Gruyter,
Berlin, New York, 2012.
[83] R. T. SEELEY, Spherical Harmonics, Amer. Math. Monthly 73 (1966),
115-121.
[84] L. A. SHEPP & B.F. LOGAN The Fourier Reconstruction of a Head
Section, IEEE Trans. Nuclear Sci. NS-21 (1974), 21-43.
[85] M. A. SHUBIN, Pseudodifferential Operators and Spectral Theory, Second Edition, Springer-Verlag 2001.
[86] L. L. SCHUMAKER, Spline Functions: Basic Theory, John Wiley and Sons,
1981.
[87] T. STEIHAUG, The conjugate gradient method and trust regions in large
scale optimization, SIAM J. Num. Anal., 20 (1983), 626-637.
[88] G. SZEGÖ, Orthogonal Polynomials, Amer. Math. Soc. Colloq. Publ.,
Vol. 23, Amer. Math. Soc., Providence, Rhode Island, 1975.
[89] K. C. TAM, S. SAMARASEKERA & F. SAUER, Exact cone-beam CT
with a spiral scan, Physics in Medicine and Biology, 43 (1998), 1015-1024.
[90] A. N. TIKHONOV, Regularization of Incorrectly Posed Problems, Soviet
Math. Dokl., 4 (1963) 1624-1627.
[91] A. N. TIKHONOV, Solution of Incorrectly Formulated Problems and the
Regularization Method, Soviet Math. Dokl., 4 (1963), 1035-1038.
[92] F. TREVES, Basic Linear Partial Differential Equations, Dover, New
York, 2006.
[93] H. K. TUY, An inversion formula for cone-beam reconstruction, SIAM
J. Appl. Math., 43 (1983), 546-552.
[94] C. F. VAN LOAN, Computational frameworks for the fast Fourier
Transform, SIAM, Philadelphia, 1992.
[95] C. R. VOGEL, Non Convergence of the L-curve Regularization Parameter Selection Method, Inverse Problems, 12 (1996), 535-547.
[96] C. R. VOGEL, Computational Methods for Inverse Problems, SIAM
Frontiers in Applied Mathematics, 2002.
[97] A. J. WUNDERLICH, The Katsevich Inversion Formula for Cone-Beam
Computed Tomography, Master Thesis, 2006.
[98] J. YANG, Q. KONG, T. ZHOU & M. JIANG, Cone-beam cover method:
an approach to performing backprojection in Katsevich’s exact algorithm
for spiral cone-beam CT, Journal of X-Ray Science and Technology, 12
(2004), 199-214.
[99] Z. B. XU & G. F. ROACH, Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces, Journal of Mathematical Analysis and Applications, 157,1 (1991), 189-210.
[100] S. S. YOUNG, R. G. DRIGGERS & E. L. JACOBS, Image Deblurring,
Boston, Artech House Publishers, 2008.
[101] F. ZAMA, Computation of Regularization Parameters using the
Fourier Coefficients, AMS acta, 2009, www.amsacta.unibo.it.
Acknowledgements
First of all, I would like to thank my supervisor Elena Loli Piccolomini, especially for her support and for the patience she showed in the difficult moments
and during the numerous discussions we had about this thesis. Then
I would like to thank Professor Germana Landi, whose suggestions and remarks were crucial in the central chapters of the thesis and allowed Elena
and me to improve the level of this part significantly. Special thanks go to
Professor Barbara Kaltenbacher, for the very kind hospitality in Klagenfurt
and for her illuminating guidance in the final part of the thesis. Without her
advice, that part would not have been possible.
I am also very grateful to Professor Alberto Parmeggiani, whose support was
very important to me during these three years.
I would also like to thank the Department of Mathematics of the University
of Bologna, which awarded me 3 fellowships for the Ph.D. and gave
me the opportunity to study and carry out my research in such a beautiful environment. I also thank the Alpen Adria Universität of Klagenfurt
for the hospitality in April 2012 and in October-November 2012.
In particular, a special greeting goes to the secretary Anita Wachter,
who provided me with a beautiful apartment during my second stay in
Klagenfurt.
And now, let’s turn to Italian!
In the end, I have made it through this effort too, and I still cannot believe it.
Although I am not fond of acknowledgements, this time I would like to thank
the people closest to me for the patience they have shown me during these
years of research, full of ups and downs.
First of all, my family: my sister, my father and my mother, who have
always believed in me, even when I came home angry, sad or discouraged. Needless to say, everyone's support, including that of my grandparents and my aunt
Carla, has been and always will be fundamental for me.
Thanks also to Anna and Marco, whom I can call honorary members of the
family, to my cousins Daniele and Irene, and to my aunt and uncle, Lidia and Davide.
I would then like to thank my three travelling companions in this adventure:
Gabriele Barbieri, Luca Ferrari and Giulio Tralli. In particular Giulio, with
whom I fully shared this experience, with endless discussions ranging from
mathematics to the big questions of life to plain nonsense, as well as a memorable trip together and many other experiences.
A huge thank you also to all my friends, in particular the closest ones,
Luca Fabbri, Barbara Galletti, Laura Melega, Andrea Biondi, Roberto and
Laura Costantini, Emanuele Salomoni, Elena Campieri, Anna Cavrini, Luca
Cioni and all the others.
Finally, my last thanks go to Silvia, who has simply given me
the happiness I had been seeking for so long and that I hope I have finally
found.