Image Processing – Variational and PDE Methods

Dr C.-B. Schönlieb

Mathematical Tripos Part III: Lent Term 2013/14

Contents

1 Introduction  1
  1.1 What is a digital image?  2
  1.2 Image processing tasks  2
  1.3 Image processing approaches  6
  1.4 Motivation for nonlinear PDE and variational approaches  6
2 Mathematical Representation of Images  8
  2.1 Images as elements in a function space  9
  2.2 The Mumford-Shah image model  13
3 Variational Approach to Image Processing  13
  3.1 Mathematical preliminaries  15
  3.2 Image denoising  17
  3.3 Image reconstruction in the context of inverting a linear operator  22
  3.4 Image inpainting  25
  3.5 Image segmentation with Mumford-Shah  32
4 PDEs in Imaging  38
  4.1 Perona-Malik  38
  4.2 Anisotropic diffusion filters  41
1 Introduction
These lecture notes are the result of a graduate course given at the University of Cambridge. They are heavily based on various textbooks and review articles in the mathematical imaging literature. The main sources for the creation of this material are

• Aubert and Kornprobst's book in Springer, Applied Mathematical Sciences, '06 [AK06]: Sections 3.5 and 4.

• Bredies and Lorenz's book (in German!) in Vieweg+Teubner, '11 [BL11]: Section 3.

• Chambolle et al.'s lecture notes in De Gruyter, '10 [CCCNP10]: Section 3.2.

• Chan and Shen's book in SIAM, '05 [CS05a]: Section 2.
1.1 What is a digital image?
In order to appreciate the following theory and the image processing applications, we first need to understand what a digital image really is. Roughly speaking a digital image is obtained from an analogue
image (representing the continuous world) by sampling and quantization. Basically this means that the
digital camera superimposes a regular grid on an analogue image and assigns a value, e.g., the mean
brightness in this field, to each grid element. In the terminology of digital images these grid elements
are called pixels. The image content is then described by grayvalues or colour values prescribed in each
pixel. The grayvalues are scalar values ranging between 0 (black) and 255 (white). The colour values are
vector values, e.g., (r, g, b), where each channel r, g and b represents the red, green, and blue component
of the colour and ranges, as the grayvalues, from 0 to 255.
The mathematical representation of a digital image is a so-called image function u defined on a two
dimensional (in general rectangular) image domain, the grid. This function is either scalar valued in the
case of a grayvalue image, or vector valued in the case of a colour image. Here the function value u(x, y) denotes the grayvalue (or colour value, respectively) of the image at the pixel (x, y) of the image domain. Figure 1
visualizes the connection between the digital image and its image function for the case of a grayvalue
image.
Figure 1: Digital image versus image function: on the very left, a zoom into a digital photograph where the image pixels (small squares) are clearly visible; in the middle, the grayvalues of the red selection in the digital photograph displayed in matrix form; on the very right, the image function of the digital photograph, where the grayvalue u(x, y) is plotted as the height over the (x, y)-plane.
Typical sizes of digital images range from 2000 × 2000 pixels in images taken with a simple digital
camera, to 10000 × 10000 pixels in images taken with high-resolution cameras used by professional
photographers. The size of images in medical imaging applications depends on the task at hand. PET
for example produces three dimensional image data, where a full-length body scan has a typical size of
175 × 175 × 500 pixels.
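The sampling-and-quantisation step described above can be sketched in a few lines of Python (an illustration only; the continuous "scene" function, the grid size, and the helper name `digitise` are invented for this sketch):

```python
import numpy as np

def digitise(scene, n):
    """Sample a continuous scene on an n x n grid and quantise to 0..255.

    `scene` is a function of (x, y) with values in [0, 1] modelling the
    analogue image; each pixel stores the quantised brightness at the
    centre of its grid cell."""
    centres = (np.arange(n) + 0.5) / n          # cell centres on the unit square
    x, y = np.meshgrid(centres, centres)
    samples = scene(x, y)                        # sampling
    return np.clip(np.round(samples * 255), 0, 255).astype(np.uint8)  # quantisation

# a smooth "analogue" test scene: bright in the centre, dark at the borders
scene = lambda x, y: np.exp(-8 * ((x - 0.5) ** 2 + (y - 0.5) ** 2))

img = digitise(scene, 64)
```

The resulting `img` is exactly the discrete object described above: a matrix of grayvalues between 0 (black) and 255 (white).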
1.2 Image processing tasks
Image de-noising  In most acquisition processes for digital images, wrong information is added to the image data. Even modern cameras which are able to acquire high-resolution images produce noisy outputs, cf. Figure 2. In fact, the appearance of noise is an intrinsic problem in image processing. The task is:

Identify and remove the noise while preserving the most important information and structures.

Figure 2: Bad lighting conditions may result in a noisy image. First: a digital photo which has been acquired under too little light. Second: plot of the grey values of the red channel along the one-dimensional slice marked in red in the photograph.

Figure 3: Blurred and de-blurred image using a total variation approach, see Section 3.3.

Please email all corrections and suggestions to these notes to [email protected]
Remark 1.1. For the human eye, noise is an easy problem to cope with. If the noise is not too strong we are still
able to analyse an image for its contents. However, for the computer this is not the case. This is important when
aiming for the automated analysis of an image.
Image de-blurring  Image de-blurring denotes the task of removing blur in images that can be caused by wrong focusing, shaking of the camera, atmospheric turbulence (for instance in earth-based astronomical imaging), or movement due to the breathing of the patient in medical imaging. The task here is
Identify the blur and enhance the image by removing (or reducing) the blur. The better the
blurring process is understood, the better de-blurring works.
Segmentation The goal is to segment an image into its different objects. The simplest situation is
binary, that is the segmentation into object and background. Image segmentation aims to
Segment one or more objects of interest in an image, also under the presence of noise and blur.
Image inpainting  An important task in image processing is the process of filling in missing parts of damaged images based on the information obtained from the intact part of the image. It is essentially a type of interpolation and is called inpainting or disocclusion. This imaging task is drastically different from de-noising and de-blurring: in image inpainting all the information in certain pixels is lost. The task is:

Reconstruct a guess for the original image contents inside of holes in an image using the available image information from intact parts of the image domain.

Figure 4: Image segmentation with a level set method, compare Section 3.5.

Figure 5: Damaged image (left) and its inpainted image (right), see Section 3.3.
Image reconstruction from samples of linear transformations  In medical, seismic, or biological imaging, or for some visualisation tasks in chemistry, physical imaging tools are employed to visualise the inside of the body, the earth, a cell, or chemical reactions. In such applications it is usually not the image itself that is measured but samples of, for instance, its Fourier transform or its Radon transform.
In that case, we want to
Reconstruct an approximation of the image density from (usually under-) sampled transform
data by “smoothly” inverting the transformation.
Remark 1.2. Other tasks, which we do not consider in this lecture, are image registration, edge detection, video processing and many more.
1.3 Image processing approaches
1.4 Motivation for nonlinear PDE and variational approaches
Let us motivate the consideration of such sophisticated approaches as nonlinear PDEs for the solution
of imaging tasks by a brief discussion of image denoising. Noise in an image usually constitutes a
highly-oscillatory (high frequency) component of the acquired image data. One way to think about denoising an image is to smooth that image, aiming to “smooth” away the noise. The simplest and best
investigated method for smoothing images is to apply a linear filter to them. One example of such a
filter is Gaussian smoothing.
Gaussian smoothing  For the following considerations we represent a grey scale image g as a real-valued mapping g ∈ L¹(ℝ²). Gaussian smoothing denotes the construction of a smoothed version u of g by convolving g with a Gaussian kernel, that is

\[ u(x) = (G_\sigma * g)(x) := \int_{\mathbb{R}^2} G_\sigma(x - y)\, g(y)\, dy, \tag{1.1} \]

where G_σ denotes the two-dimensional Gaussian of width σ > 0:

\[ G_\sigma(x) := \frac{1}{2\pi\sigma^2}\, e^{-|x|^2/(2\sigma^2)}. \]
This convolution smooths the image g because G_σ ∗ g ∈ C^∞(ℝ²), even if g is only absolutely integrable. To understand this smoothing process better we investigate the effect of Gaussian smoothing in the frequency domain. To do so, we define the Fourier transform F by

\[ (\mathcal{F}g)(\omega) := \int_{\mathbb{R}^2} g(x)\, e^{-i\omega\cdot x}\, dx, \]

and get

\[ (\mathcal{F}(G_\sigma * g))(\omega) = (\mathcal{F}G_\sigma)(\omega) \cdot (\mathcal{F}g)(\omega) = e^{-\sigma^2|\omega|^2/2} \cdot (\mathcal{F}g)(\omega). \]
This means that (1.1) is a low-pass filter that attenuates high frequencies in a monotone way.
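This low-pass behaviour is easy to observe numerically. The following sketch (an illustration, not part of the notes' formalism; the image is pure random noise and σ is arbitrary) implements the smoothing directly in the Fourier domain via the multiplier e^{−σ²|ω|²/2}, and checks that the oscillation of the signal is strongly reduced while the mean — the zero-frequency component — is untouched:

```python
import numpy as np

def gaussian_smooth(g, sigma):
    """Gaussian smoothing of a periodic 2-d signal, realised in the Fourier
    domain: each Fourier coefficient of g is damped by exp(-sigma^2 |w|^2 / 2),
    the Fourier transform of G_sigma -- a monotone low-pass filter."""
    omega = 2 * np.pi * np.fft.fftfreq(g.shape[0])   # discrete frequencies
    wx, wy = np.meshgrid(omega, omega)
    damp = np.exp(-sigma**2 * (wx**2 + wy**2) / 2)   # attenuation factor
    return np.real(np.fft.ifft2(np.fft.fft2(g) * damp))

rng = np.random.default_rng(0)
g = rng.standard_normal((64, 64))   # a pure "noise" image
u = gaussian_smooth(g, sigma=2.0)
```

High frequencies are damped most strongly, so the standard deviation of the filtered noise collapses while its mean is preserved exactly.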
Figure 6: Reconstruction of the Shepp-Logan phantom from 11% undersampling of the Fourier transform.
Relation to linear diffusion filtering  It is a classical result that for g ∈ C(ℝ²), bounded, the linear diffusion equation

\[ u_t = \Delta u, \qquad u(x, t=0) = g(x) \tag{1.2} \]

possesses the solution

\[ u(x,t) = \begin{cases} g(x) & t = 0 \\ \big(G_{\sqrt{2t}} * g\big)(x) & t > 0. \end{cases} \tag{1.3} \]

This solution is unique within the class of functions that satisfy

\[ |u(x,t)| \le M\, e^{a|x|^2}, \qquad M, a > 0. \]

It depends continuously on the initial image g with respect to the L^∞ norm, and it fulfils the max-min principle

\[ \inf_{\mathbb{R}^2} g \le u(x,t) \le \sup_{\mathbb{R}^2} g \quad \text{on } \mathbb{R}^2 \times [0,\infty), \]

cf. [Wei98]. Investigating (1.3) we find that the time t of the solution of the linear diffusion equation is related to the spatial width σ = √(2t), that is, the later in time the solution u(x, t) is considered the "flatter" it becomes (or the lower its frequencies). Hence, smoothing structures of order σ requires stopping the diffusion process at time

\[ T = \tfrac{1}{2}\sigma^2. \]
The problem with using Gaussian filtering, i.e. linear diffusion filtering, for image denoising is that
the smoothing is isotropic: it does not depend on the image and it is the same in all directions. In
particular, image edges are not preserved.
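The correspondence t ↔ σ = √(2t) can be checked numerically. The sketch below (an illustration with invented data; periodic boundary conditions and unit grid spacing are assumed) evolves an explicit finite-difference scheme for (1.2) up to time t and compares the result with the Gaussian smoothing G_{√(2t)} ∗ g; the max-min principle is confirmed as well:

```python
import numpy as np

def heat_flow(g, t, dt=0.2):
    """Explicit finite differences for u_t = Laplace(u), periodic boundary,
    unit grid spacing; dt <= 1/4 keeps the scheme stable."""
    u = g.astype(float).copy()
    for _ in range(int(round(t / dt))):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u += dt * lap
    return u

def gaussian_smooth(g, sigma):
    """Convolution with G_sigma, realised in the Fourier domain."""
    omega = 2 * np.pi * np.fft.fftfreq(g.shape[0])
    wx, wy = np.meshgrid(omega, omega)
    return np.real(np.fft.ifft2(np.fft.fft2(g)
                                * np.exp(-sigma**2 * (wx**2 + wy**2) / 2)))

# smooth test image: a bump in the middle of a 64 x 64 grid
x, y = np.meshgrid(np.arange(64), np.arange(64))
g = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / 50.0)

t = 2.0
u_heat = heat_flow(g, t)                       # diffuse until time t
u_gauss = gaussian_smooth(g, np.sqrt(2 * t))   # smooth with sigma = sqrt(2t)
```

For a smooth, low-frequency image the two results agree up to discretisation error.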
Nonlinear diffusion  We would like to find models for removing image noise while preserving one of the most important parts of image information, that is, image edges. To do so, it is essential to enrich the diffusion process by nonlinearity. Instead of (1.2) we consider a nonlinear diffusion equation of the form

\[ u_t = \mathrm{div}\big(c(|\nabla u|^2)\, \nabla u\big), \qquad u(x, t=0) = g(x), \tag{1.4} \]

with an appropriate diffusivity c(·) that makes the strength of the smoothing dependent on the size of the gradient of the solution. Equation (1.4) is called the Perona-Malik equation, and is just one example of a nonlinear PDE approach in image processing, cf. Section 4.

Related, but not always equivalent, are variational models for image smoothing. The role of the nonlinearity in a PDE model is here taken by the non-smoothness of the functional that is to be minimised, cf. Section 3.
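As a preview of Section 4, here is a minimal explicit scheme for (1.4) with the classical Perona-Malik diffusivity c(s²) = 1/(1 + s²/λ²) (the discretisation choices and all parameter values are invented for this sketch): diffusion is suppressed across the edge, where the gradient is large, but removes noise in the flat regions.

```python
import numpy as np

def perona_malik(g, steps=20, dt=0.2, lam=0.1):
    """Explicit Perona-Malik scheme: the diffusivity c shrinks where the
    local gradient (here: neighbour differences) is large, so edges are
    kept while flat regions are denoised."""
    c = lambda d: 1.0 / (1.0 + (d / lam) ** 2)    # c(|grad u|^2) with |grad u| ~ d
    u = g.astype(float).copy()
    for _ in range(steps):
        flux = np.zeros_like(u)
        for axis in (0, 1):
            for shift in (1, -1):
                d = np.roll(u, shift, axis) - u   # neighbour difference
                flux += c(np.abs(d)) * d          # nonlinear diffusion flux
        u += dt * flux
    return u

# noisy step edge: left half 0, right half 1, plus Gaussian noise
rng = np.random.default_rng(1)
g = np.zeros((32, 32)); g[:, 16:] = 1.0
noisy = g + 0.05 * rng.standard_normal(g.shape)
u = perona_malik(noisy)
```

Compare this with linear diffusion, which would blur the step edge at the same rate as it removes the noise.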
2 Mathematical Representation of Images
Now, since the image function is a mathematical object we can treat it as such and apply mathematical operations to it. These mathematical operations are summarized by the term image processing techniques, and range from statistical methods and morphological operations to solving a partial differential equation for the image function. We are especially interested in the last, i.e., PDE and variational methods used in imaging.
We have introduced the object digital image already in Section 1.1. There, a digital image has been introduced as a sampled and quantised version of an analog (also called physical or real) image. The higher the resolution of a digital image, the closer it is to the analog image in the real world.

While digital image processing is indeed concerned with digital images, the methods used are often motivated from considerations in the continuum, that is, methods are formulated for the analog image. In this course we take up this mathematically more challenging and analytically more beautiful position, and let our image u be a continuous object defined on a rectangular domain Ω = (a, b) × (c, d). Within this framework, there are many possibilities of how images can be modelled, compare [CS05a, Chapter 3]. For our purposes we will focus on deterministic image models only, and on three deterministic models in particular: images as elements in a function space, the level set representation of images, and the Mumford-Shah representation of images.

Throughout this section let Ω = (a, b) × (c, d) ⊂ ℝ², where a < b, c < d ∈ ℝ. Moreover, we consider grey value images u : Ω → ℝ only.
2.1 Images as elements in a function space
Images as distributions  We define the set of test functions

\[ \mathcal{D}(\Omega) = \{ \varphi \in C^\infty(\Omega) : \operatorname{supp}\varphi \subset \Omega \text{ compact} \}. \]
Example 2.1. Let B_α(m) = {|x − m| < α} be the largest ball such that B_α(m) ⊂ Ω. Then, we define

\[ \varphi_{\alpha,m}(x) = \begin{cases} e^{\alpha^2/(|x-m|^2 - \alpha^2)} & x \in B_\alpha(m) \\ 0 & \text{otherwise.} \end{cases} \]
A test function ϕ ∈ D(Ω) can be interpreted as a linear sensor for capturing image signals. More precisely, we
model an image u on Ω as a distribution, that is a continuous linear functional on D(Ω). Let us phrase this in
accurate mathematics.
Definition 2.2 (Convergence in D(Ω)). A sequence {φ_n} in D(Ω) converges to zero if the φ_n all vanish outside a common compact subset of Ω and all partial derivatives of φ_n converge to zero uniformly, that is

\[ \lim_{n\to\infty} \sup_{x\in\Omega} |D^\alpha \varphi_n(x)| = 0 \]

for all multi-indices α = (α₁, α₂) ∈ ℤ²₊.
Definition 2.3 (Distribution). A linear functional on D(Ω) (that is, a linear map from D(Ω) to ℝ) that is continuous w.r.t. convergence in D(Ω) is a distribution on Ω. We write D′(Ω) for the set of all distributions on Ω and

\[ u(\varphi) = \langle u, \varphi \rangle \]

for the image of a test function φ ∈ D(Ω) under the distribution u ∈ D′(Ω).
An image u ∈ D′(Ω) outputs a single response ⟨u, φ⟩ for any sensor φ ∈ D(Ω) that attempts to sense image features.
Example 2.4 (Examples of distributional images).

• u(x) = δ(x), the Dirac delta function, compare Figure 7 on the left. Then

\[ \langle u, \varphi \rangle = \varphi(0) \quad \text{for any sensor } \varphi \in \mathcal{D}(\Omega). \]
Figure 7: Left: The image u is a bright spot concentrated at the origin, i.e., u(x) = δ(x), where δ stands
for the Dirac delta function. Right: The image u describes a step edge from 0 to 1, i.e., u(x) = u(x1 , x2 ) =
H(x1 ), where H(t) is the Heaviside 0 − 1 step function
• u(x) = u(x, y) = H(x), the Heaviside step function, that is

\[ H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0, \end{cases} \]

compare Figure 7 on the right.
We summarise the following properties of distributional images.

• The sensing with test functions is a linear operation.

• With an additional positivity constraint the Riesz representation theorem is valid: if u is a positive distribution, then there exists a Radon measure μ on Ω (i.e. a Borel measure that is finite on any compact subset of Ω) such that for every sensor φ ∈ D(Ω)

\[ \langle u, \varphi \rangle = \int_\Omega \varphi(x)\, d\mu. \tag{2.1} \]

• Distributions have a notion of derivative: the distributional derivative v = D^α u is defined as a new distribution such that

\[ \langle v, \varphi \rangle = (-1)^{|\alpha|} \langle u, D^\alpha \varphi \rangle \quad \forall \varphi \in \mathcal{D}(\Omega), \]

where α is a multi-index.

• Sensing of distributions mimics the digital sensor devices in CCD cameras.

• Distributions constitute a very general class of generalised functions. In particular, D′(Ω) is a complete space but not a normed vector space. However, both the variational and the PDE imaging models are set within Banach spaces.
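The defining identity of the distributional derivative can be tested numerically in one dimension: for u = H, the Heaviside function from Example 2.4, we should find ⟨DH, φ⟩ = −⟨H, φ′⟩ = φ(0), i.e. DH = δ. In the sketch below a Gaussian stands in for a compactly supported test function (its tails are negligible on the chosen interval — an assumption of this illustration), and the pairing is approximated by the trapezoidal rule:

```python
import numpy as np

# smooth "sensor" phi and its derivative (Gaussian: decays fast enough to
# act as a test function on a large interval)
phi = lambda x: np.exp(-x**2)
dphi = lambda x: -2 * x * np.exp(-x**2)

x = np.linspace(-8, 8, 100001)
H = (x >= 0).astype(float)          # Heaviside image

# <DH, phi> := -<H, phi'>, computed by the trapezoidal rule
f = H * dphi(x)
pairing = -np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2
```

Up to quadrature error the pairing returns φ(0) = 1, the response of the delta "image" to the sensor φ.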
In the following we will narrow down the class of functions we consider. This leads us first to Sobolev
spaces.
Sobolev images  We start with Lebesgue integrable functions. For p ∈ [1, ∞) we define

\[ L^p(\Omega) = \Big\{ u : \int_\Omega |u(x)|^p\, dx < \infty \Big\}, \]

and L^∞(Ω) is the space of essentially bounded functions. These are Banach spaces with norms

\[ \|u\|_p := \Big( \int_\Omega |u(x)|^p\, dx \Big)^{1/p} \]

and

\[ \|u\|_\infty := \operatorname{ess\,sup}_\Omega |u| = \inf\{ C > 0 : |u(x)| \le C \text{ for a.e. } x \in \Omega \}. \]

In the case p = 2, L²(Ω) is a Hilbert space with inner product

\[ (u, v)_2 = \int_\Omega u \cdot v\, dx \quad \forall u, v \in L^2(\Omega). \]
Can we use L^p norms to quantify image contents? We will try as follows. For Ω′ ⊂ Ω with |Ω′| > 0, define

\[ \langle u \rangle_{\Omega'} := \frac{1}{|\Omega'|} \int_{\Omega'} u(x)\, dx, \]

the average of u over Ω′. Then, the average information content of u in Ω′ is defined by the so-called p-mean oscillation

\[ \left( \frac{1}{|\Omega'|} \int_{\Omega'} |u - \langle u \rangle_{\Omega'}|^p\, dx \right)^{1/p}. \]
For p = 2, this gives the canonical definition of the empirical standard deviation in statistics. However,
to describe “change” in images we need the notion of derivatives, which leads us to consider Sobolev
spaces.
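Before moving on, the p = 2 case is quickly confirmed on a discrete patch (the patch values are invented): the discrete 2-mean oscillation coincides with the population standard deviation of the grey values.

```python
import numpy as np

def p_mean_oscillation(patch, p):
    """Discrete p-mean oscillation: ( mean |u - <u>|^p )^(1/p) over a patch,
    where <u> is the patch average -- a measure of local image information."""
    avg = patch.mean()
    return (np.abs(patch - avg) ** p).mean() ** (1.0 / p)

rng = np.random.default_rng(2)
patch = rng.uniform(0, 255, size=(8, 8))   # a made-up 8 x 8 grey value patch
osc2 = p_mean_oscillation(patch, 2)
```

A flat patch carries no information in this sense: its p-mean oscillation vanishes for every p.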
Definition 2.5 (Sobolev space). For 1 ≤ p ≤ ∞, we define the Sobolev space W^{k,p}(Ω) as the space of all locally summable functions u : Ω → ℝ such that for each multi-index α with |α| ≤ k the derivative D^α u exists in the weak sense and belongs to L^p(Ω). If p = 2, we usually write H^k(Ω) = W^{k,2}(Ω).

The spaces W^{k,p}(Ω) are Banach spaces equipped with the norm

\[ \|u\|_{W^{k,p}(\Omega)} := \begin{cases} \Big( \sum_{|\alpha|\le k} \int_\Omega |D^\alpha u|^p\, dx \Big)^{1/p} & 1 \le p < \infty \\ \sum_{|\alpha|\le k} \operatorname{ess\,sup}_\Omega |D^\alpha u| & p = \infty. \end{cases} \]

In the case p = 2 the spaces H^k are in fact Hilbert spaces with inner product

\[ (u, v)_{H^k} = \sum_{|\alpha|\le k} \int_\Omega D^\alpha u \cdot D^\alpha v\, dx. \]
If we start by modelling change in an image by its first derivatives, we might consider H¹(Ω) as an appropriate space for an image, and in turn the corresponding H¹ norm as an appropriate measure for image information. However, one disadvantage of taking u ∈ H¹(Ω) is that u cannot have jumps, as the following considerations show.
Problem 2.6 (Hölder continuity of H¹-functions). We start with the one-dimensional case. Let u : (0, 1) → ℝ, u ∈ H¹(0, 1). Then, for each 0 < s < t < 1, we have

\[ u(t) - u(s) \overset{\text{formally}}{=} \int_s^t u'(r)\, dr \le \sqrt{t-s}\, \sqrt{\int_s^t |u'(r)|^2\, dr} \le \sqrt{t-s}\, \|u\|_{H^1} \]

by the Cauchy-Schwarz inequality, so that u must be 1/2-Hölder continuous. (The formal argument above can be made rigorous: it should be made on smooth functions first and then justified by a density argument for H¹(0, 1).)
In two dimensions we have for u : (0, 1)² → ℝ and u ∈ H¹((0, 1)²) that its one-dimensional restrictions are in H¹(0, 1), that is: for a.e. y ∈ (0, 1), x ↦ u(x, y) ∈ H¹(0, 1), which essentially comes from the fact that

\[ \int_0^1 \int_0^1 |D_x u(x, y)|^2\, dx\, dy \le \|u\|_{H^1}^2 < \infty. \]

It means that for a.e. y, the map x ↦ u(x, y) is 1/2-Hölder continuous in x, so that the image u certainly cannot jump across vertical boundaries.
A similar kind of regularity can be shown for any u ∈ W 1,p (Ω), 1 ≤ p ≤ ∞.
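The resulting bound |u(t) − u(s)| ≤ √(t − s) ‖u‖_{H¹} can be sanity-checked numerically for a concrete u ∈ H¹(0, 1); here u(x) = sin(2πx), chosen arbitrarily, with the integrals approximated by the trapezoidal rule:

```python
import numpy as np

x = np.linspace(0, 1, 2001)
u = np.sin(2 * np.pi * x)
du = 2 * np.pi * np.cos(2 * np.pi * x)   # u' known in closed form

def trap(f, x):
    """Trapezoidal rule for the integral of f over the grid x."""
    return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2

h1_norm = np.sqrt(trap(u**2, x) + trap(du**2, x))   # ||u||_{H^1(0,1)}

# check |u(t) - u(s)| <= sqrt(t - s) * ||u||_{H^1} on a coarse set of pairs
idx = np.arange(0, 2001, 50)
ratios = [abs(u[j] - u[i]) / np.sqrt(x[j] - x[i])
          for i in idx for j in idx if x[j] > x[i]]
max_ratio = max(ratios)
```

For this u one has ‖u‖²_{H¹} = 1/2 + 2π², and the observed Hölder quotients stay well below that bound.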
The problem with forcing an image to be continuous is that, as we have seen before with the solution to the heat equation, edges in an image (that is, jumps in the image function) cannot be represented. However, edges constitute one of the most important features in images, and whenever we are processing an image – whether we denoise, deblur or interpolate it, say – we seek methods that can represent and preserve edges in the process.

These considerations call for a less smooth space, the space of functions of bounded variation.
BV images  In the context of image reconstruction Rudin, Osher and Fatemi (ROF) introduced in 1992 the total variation as a measure for image reconstruction. The motivation was to propose a quantity that differentiates between noise and structure in the image, in particular between noise and image edges. For a function u ∈ W^{1,1}(Ω) they considered

\[ \int_\Omega |Du(x)|\, dx. \tag{2.2} \]

Well, that's confusing. In fact, W^{1,1}-functions are still not allowed to have discontinuities along horizontal and vertical directions in the image domain, see Exercises. However, the key is that the total variation can be defined for much more general functions than W^{1,1}. To see that, let's first consider a simple example.
Example 2.7 (Dirac delta). Let f(x, y) = δ(x, y) be the Dirac delta at (0, 0) ∈ Ω. Then

\[ I(\delta) = \text{``}\! \int_\Omega \delta(x, y)\, dx\, dy \,\text{''} = 1, \]

but δ is not an L¹-function in the traditional sense; it is a measure. So what is the correct definition of the integral I for δ?
More generally, let μ be a non-negative measure on all Borel subsets of Ω, with μ(K) < ∞ for all compact K ⊂ Ω (or more generally let μ be a signed Radon measure). Then, we can define

\[ I(\mu) = \int_\Omega 1\, d\mu = \mu(\Omega), \]

and with that signed Radon measures generalise the space L¹(Ω). The same thing applies to the total variation (2.2) of an image u by taking f = Du.
Example 2.8 (Total variation of the Heaviside function). In ℝ¹ we consider f(x) = δ(x). Then for u′ = δ and u(−∞) = 0 we get that

\[ u(x) = H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0. \end{cases} \]

Then, u ∉ W^{1,1}(ℝ¹), but still we can define

\[ TV(H) = \text{``}\! \int_{\mathbb{R}^1} |H'(x)|\, dx \,\text{''} = \int_{\mathbb{R}^1} 1\, d\delta = 1. \]

More generally, for μ a signed Radon measure in ℝ¹ and u its cumulative distribution function, we can define

\[ TV(u) = \int_{\mathbb{R}^1} d|\mu| = |\mu|(\mathbb{R}^1), \]

where |μ| is the total variation measure of μ, that is |μ| = μ⁺ + μ⁻, where μ⁺ and μ⁻ are the positive and negative variation of μ respectively.
With this discussion in mind we can now generalise the notion of the total variation (2.2) for functions u ∈ L¹_loc.

Definition 2.9 (Total variation). The total variation of an image is defined by duality: for u ∈ L¹_loc(Ω) it is given by

\[ TV(u) = |Du|(\Omega) = \sup\left\{ -\int_\Omega u\, \mathrm{div}\,\varphi\, dx : \varphi \in C_c^\infty(\Omega; \mathbb{R}^2),\ |\varphi(x)| \le 1\ \forall x \in \Omega \right\}, \tag{2.3} \]

where |Du| is the total variation measure of the Radon measure Du given by the Riesz representation theorem (2.1).

The space BV(Ω) of functions of bounded variation is the set of functions u ∈ L¹(Ω) such that TV(u) < ∞, endowed with the norm

\[ \|u\|_{BV} = \|u\|_1 + TV(u). \]
Example 2.10.

• For u ∈ W^{1,1}(Ω) we have that |Du|(Ω) = ‖Du‖₁.

• For u = χ_C for a subset C ⊂ Ω with smooth boundary, we have |Du|(Ω) = H¹(∂C), the perimeter of C in Ω. Here H¹ is the 1-dimensional Hausdorff measure.
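Both examples can be reproduced with a discrete total variation. The sketch below uses the standard forward-difference discretisation of (2.3) on a grid with unit spacing (a common choice, not taken from these notes): for a step image the discrete TV returns exactly the length of the jump set, in line with the second example.

```python
import numpy as np

def tv(u):
    """Isotropic discrete total variation with forward differences
    (differences across the boundary are set to zero)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return np.sum(np.sqrt(dx**2 + dy**2))

# step edge: characteristic function of the right half of a 32 x 32 grid;
# its jump set is a vertical line of length 32
step = np.zeros((32, 32)); step[:, 16:] = 1.0
```

Note that TV is positively 1-homogeneous — doubling the jump height doubles the total variation — and vanishes exactly on constant images.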
2.2 The Mumford-Shah image model

See Section 3.5.
3 Variational Approach to Image Processing
The variational approach to image processing constitutes the computation of a reconstructed image u
based on the observed image (or more generally data) g as a minimiser of a functional. The modelling
of this variational approach can be best motivated from Bayesian statistics.
Figure 8: In one space dimension the total variation of a signal u on the interval (0, 1) is just the maximal sum of the absolute differences of function values over partitions of (0, 1). Hence, in this example the total variation of u on the left and of u on the right is the same!
The Bayesian point of view on image reconstruction  For this we go back to the discrete setting for a moment. Given an image g ∈ ℝ^{N×N} we can formulate two components for solving a general inverse problem:

• Data model: g = Tu + n, where u ∈ ℝ^{N×N} is the original image (to be reconstructed), T is a linear transformation (for instance, T = Id in image denoising, T = S a sampling operator for image inpainting, T = k∗ a convolution with a kernel k in image deblurring, or T = F for reconstructing an image from its Fourier transform), and n is the noise, which for now is assumed to be Gaussian distributed with mean 0 and standard deviation σ.

• A-priori probability density: P(u) = e^{−p(u)}, encoding the a-priori information on the original image.

Then, the a-posteriori probability for u knowing g is given by Bayes' rule as

\[ P(u|g) = \frac{P(g|u)\, P(u)}{P(g)}, \]

with

\[ P(g|u) = e^{-\frac{1}{2\sigma^2} \sum_{i,j} |(Tu)_{i,j} - g_{i,j}|^2}, \qquad P(u) = e^{-p(u)}. \]

The idea of "maximum a posteriori" (MAP) image reconstruction is to find the "best" image as the one which maximises this probability or, equivalently, which solves the minimisation problem

\[ \min_u \left\{ p(u) + \frac{1}{2\sigma^2} \sum_{i,j} |g_{i,j} - (Tu)_{i,j}|^2 \right\}. \]
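For a quadratic prior the MAP problem can be solved by hand, which gives a useful sanity check of the formula above: with T = Id and p(u) = (λ/2) Σ u²_{i,j}, setting the gradient λu + (u − g)/σ² to zero yields u = g/(1 + λσ²). The sketch below (all parameter values invented) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.standard_normal((16, 16))   # observed (noisy) image
lam, sigma = 0.5, 0.8               # prior weight and noise level (made up)

# MAP objective  p(u) + 1/(2 sigma^2) sum |g - u|^2  with  p(u) = lam/2 sum u^2
def objective(u):
    return 0.5 * lam * np.sum(u**2) + np.sum((g - u)**2) / (2 * sigma**2)

u_map = g / (1 + lam * sigma**2)    # closed-form minimiser

# the gradient  lam*u + (u - g)/sigma^2  must vanish at the minimiser
grad = lam * u_map + (u_map - g) / sigma**2
```

With a non-smooth prior such as the total variation no such closed form exists, which is precisely why the analysis of the continuous model below is needed.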
The variational model  Going back to the continuous setting with g ∈ L²(Ω), where Ω ⊂ ℝ² is a rectangular image domain, and T ∈ L(L²(Ω)) being a bounded linear operator from L² into itself, a generic minimisation problem to recover u from g reads

\[ \min_{u \in L^2(\Omega)}\ \alpha\psi(u) + \frac{1}{2} \int_\Omega |Tu(x) - g(x)|^2\, dx. \tag{3.1} \]
Here, the so-called regulariser ψ corresponds to the a-priori information that we have on the reconstructed image u, and α > 0 is the so-called regularisation parameter, which balances the data model against the regulariser. One of the most successful variational approaches for image reconstruction is the total variation model, which takes ψ(u) = TV(u), cf. (2.3).
3.1 Mathematical preliminaries
Definition 3.1 (Strong, weak, and weak* convergence). A sequence (x_n) in a normed space (X, ‖·‖_X) converges

• strongly to x ∈ X if lim_{n→∞} ‖x_n − x‖_X = 0. In this case we write x_n → x.

• weakly to x ∈ X if

\[ \lim_{n\to\infty} \langle x_n, y \rangle_{X\times Y} = \langle x, y \rangle_{X\times Y} \quad \forall y \in Y = X^*, \]

the dual space of X. In this case we write x_n ⇀ x.

A sequence (y_n) ⊂ Y = X* converges weakly* to y ∈ Y if

\[ \lim_{n\to\infty} \langle y_n, x \rangle_{Y\times X} = \langle y, x \rangle_{Y\times X} \quad \forall x \in X. \]

In this case we write y_n ⇀* y.
Definition 3.2 (Weak sequential compactness). Let (X, ‖·‖_X) be a normed space. A subset U ⊂ X is weakly sequentially compact if every sequence in U has a weakly converging subsequence with limit in U.

Theorem 3.3. A normed vector space X is reflexive if and only if every bounded ball in X is weakly sequentially compact.
Example 3.4 (Reflexive Banach spaces).

• L² is reflexive but L¹ is not.

• H¹ is reflexive but W^{1,1} is not.
Hausdorff measure  Let us start with the definition of the k-dimensional Hausdorff measure H^k.

Definition 3.5. Let 0 ≤ k ≤ +∞ and S a subset of ℝ^d. The k-dimensional Hausdorff measure of S is given by

\[ \mathcal{H}^k(S) = \lim_{\rho\to 0}\, n(k) \cdot \inf\left\{ \sum_{i\in I} |\mathrm{diam}(A_i)|^k : \mathrm{diam}(A_i) \le \rho,\ S \subset \bigcup_{i\in I} A_i \right\}, \]

where n(k) is a normalisation factor and diam(A_i) denotes the diameter of the set A_i. Then, the Hausdorff dimension of a set S is defined by

\[ \mathcal{H}\text{-dim}(S) = \inf\{ k \ge 0 : \mathcal{H}^k(S) = 0 \}. \]
Remark 3.6. Let k be a positive integer less than or equal to the dimension d and let S ⊂ ℝ^d be a C¹ k-dimensional manifold in ℝ^d. Then the Hausdorff measure H^k(S) equals the classical k-dimensional area of S. Moreover, H^d equals the Lebesgue measure dx on ℝ^d.
Level sets  Let u be an integrable function on Ω ⊂ ℝ², open and bounded with Lipschitz boundary. We define the level sets Γ_λ of u as

\[ \Gamma_\lambda(u) = \{ x \in \Omega : u(x) \le \lambda \}. \tag{3.2} \]

Then, the level-set representation of u is Γ(u) = {Γ_λ(u) : λ ∈ ℝ}. Note that the definition of level sets is not unique. In particular, the above definition differs from the classical level set formulation where the level sets are defined as curves γ_λ (i.e. γ_λ = {x ∈ Ω : u(x) = λ}) rather than sets. In fact, for a continuous image function u the boundary ∂Γ_λ = γ_λ. The advantage of the set notion (3.2) is that it makes sense for non-smooth images as well.
Curvature  In Section 3.4 we will consider interpolation methods that find the interpolant with minimal curvature. To formalise this we first have to say what we mean by curvature, both in the sense of curves (associated to level sets (3.2)) as well as in the sense of curvature of a function (i.e. mean curvature of the surface defined by the image function u). To do so we first recall some facts about planar curves, their length and their curvature.

Let γ : [0, 1] → ℝ² be a simple (i.e. without self-intersections) curve parameterised from the interval [0, 1] to ℝ². Then, the length of γ is defined as

\[ \mathrm{length}(\gamma) := \sup\left\{ \sum_{i=1}^n |\gamma(t_i) - \gamma(t_{i-1})| : n \in \mathbb{N} \text{ and } 0 = t_0 < t_1 < \dots < t_n = 1 \right\}. \]

A rectifiable curve is a curve with finite length. Moreover, if γ is Lipschitz continuous on [0, 1], then the metric derivative (the speed) of the curve γ is defined by

\[ |\gamma'|(t) := \lim_{s\to 0} \frac{\|\gamma(t+s) - \gamma(t)\|}{|s|}, \]

where ‖·‖ is the Euclidean norm on ℝ². With that the length of γ is equivalently defined by

\[ \mathrm{length}(\gamma) = \int_0^1 |\gamma'|(t)\, dt. \tag{3.3} \]

Note that a generalised notion of length of a curve appears in the context of the coarea formula for functions of bounded variation, cf. Theorem 3.18. The arc length s(t) of γ is given in the same flavour as (3.3) by

\[ s(t) = \int_0^t |\gamma'|(\tilde t)\, d\tilde t, \qquad t \in [0, 1]. \]

Re-parametrising γ in terms of its arc length is called the natural parametrisation and yields the unit tangent vector

\[ \gamma'(s) = \frac{\gamma'(t)}{|\gamma'|(t)}. \]

If γ is twice continuously differentiable then the signed curvature of γ at t is given by

\[ \kappa(t) = \frac{\det(\gamma'(t), \gamma''(t))}{|\gamma'(t)|^3} \tag{3.4} \]

and

\[ |\kappa(s)| = |\gamma''(s)|. \tag{3.5} \]

We will see later in Section 3.4 that the curvature of a curve γ can be defined in a weaker sense and the regularity assumption on γ can be relaxed.
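Formula (3.4) is easy to validate on a circle of radius R, whose curvature is 1/R everywhere. In the sketch below (R and the sample count are arbitrary) the derivatives are replaced by central finite differences, and the length formula (3.3) is checked against the known circumference 2πR as well:

```python
import numpy as np

R = 2.0
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
gamma = np.stack([R * np.cos(t), R * np.sin(t)], axis=1)  # circle of radius R

# central finite-difference derivatives along the (closed) curve
dt = t[1] - t[0]
d1 = (np.roll(gamma, -1, 0) - np.roll(gamma, 1, 0)) / (2 * dt)
d2 = (np.roll(gamma, -1, 0) - 2 * gamma + np.roll(gamma, 1, 0)) / dt**2

# kappa = det(gamma', gamma'') / |gamma'|^3, cf. (3.4)
det = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
kappa = det / np.linalg.norm(d1, axis=1) ** 3

# length via (3.3): integrate the speed |gamma'|
length = np.sum(np.linalg.norm(d1, axis=1)) * dt
```

Up to the O(dt²) discretisation error, κ ≡ 1/R and length(γ) = 2πR.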
3.2 Image denoising
Total variation minimisation  Given g ∈ L²(Ω) we propose to compute a denoised version u of g as a solution of

\[ \min_{u \in L^2(\Omega)}\ \mathcal{J}(u) = \alpha|Du|(\Omega) + \frac{1}{2}\|u - g\|_2^2. \tag{ROF} \]

This is the classical ROF denoising approach, named after Rudin, Osher and Fatemi, who introduced it in 1992.
Theorem 3.7 (Existence and uniqueness for ROF). The minimisation problem (ROF) has a unique minimiser u.
For proving the existence of minimisers of this problem, we follow the direct method of the calculus of variations. For a generic minimisation problem of the form

\[ \min_{u \in X} \mathcal{J}(u), \qquad X \text{ a Banach space,} \]

we can phrase this strategy in three steps.

Method 3.8 (Direct method of the calculus of variations).

1. Show that J is bounded from below, that is inf_{u∈X} J(u) > −∞. Hence, there exists a minimising sequence (u_n) ⊂ X such that J(u_n) < ∞ for all n and lim_{n→∞} J(u_n) = inf_{u∈X} J(u).

2. Check that this sequence is contained in a set which is sequentially compact w.r.t. the topology induced on X. From that we get that there exists a subsequence (u_{n_k}) and a u* ∈ X such that lim_{k→∞} u_{n_k} = u* with convergence w.r.t. the topology in X. This u* is a candidate for a minimiser.

3. Can we now deduce that

\[ \mathcal{J}(u^*) = \lim_{k\to\infty} \mathcal{J}(u_{n_k})? \tag{3.6} \]

Unfortunately no, since in essentially all cases the functional J is not continuous with respect to the convergence induced by the topology on X. Luckily, a key observation is that we do not really need the full strength of (3.6). To prove that u* in fact is a minimiser of J we need to confirm one last property of J, namely that J is (sequentially) lower semicontinuous (l.s.c.) with respect to the topology in X. Then

\[ \inf_{u\in X} \mathcal{J}(u) \le \mathcal{J}(u^*) \le \liminf_{k\to\infty} \mathcal{J}(u_{n_k}) = \inf_{u\in X} \mathcal{J}(u), \]

and we have that u* ∈ X is a minimiser of J.
Proof. We put the cart before the horse and start with proving that the total variation is l.s.c. with
respect to weak convergence in Lp for p ∈ [1, +∞). The idea is that the total variation is the supremum
of continuous functions and as such is l.s.c. Indeed, let
    L_ϕ : u ↦ −∫_Ω u(x) div ϕ(x) dx.
If u_n ⇀ u in L^p(Ω) then L_ϕ u_n → L_ϕ u (this is due to the continuity of L_ϕ even w.r.t. very weak topologies). But then

    L_ϕ u = lim_{n→∞} L_ϕ u_n ≤ lim inf_{n→∞} TV(u_n).

Taking the supremum over all ϕ ∈ C_c^∞(Ω; R²) with |ϕ(x)| ≤ 1 we deduce

    TV(u) ≤ lim inf_{n→∞} TV(u_n),
that is T V is (sequentially) l.s.c. with respect to all the above mentioned topologies.
Now, let (un ) be a minimising sequence for J , that is limn→∞ J (un ) = inf u J (u). Since J (un ) ≤
J (0) < ∞ for n large enough (assuming that g ∈ L2 (Ω) and T V ≥ 0), then (un ) is bounded in L2 and
hence there exists a weakly converging subsequence u_{n_k} ⇀ u in L². But then, the total variation is l.s.c. with
respect to convergence in L2 and the norm in L2 is naturally l.s.c., that is
    ‖u − g‖₂ ≤ lim inf_{k→∞} ‖u_{n_k} − g‖₂.
Hence

    J(u) ≤ lim inf_{k→∞} J(u_{n_k}) = inf J,

so that u is a minimiser of J.
To prove the uniqueness of the minimiser u we observe that ‖·‖₂² is strictly convex. Moreover the total variation is convex since it is the supremum of linear (and hence convex) functions L_ϕ. Indeed, one
clearly has for any u1 , u2 and t ∈ [0, 1] that
Lϕ (tu1 + (1 − t)u2 ) = tLϕ (u1 ) + (1 − t)Lϕ (u2 ) ≤ tT V (u1 ) + (1 − t)T V (u2 ).
Taking the supremum on the left-hand side gives the result.
Hence, if u and u′ are two minimisers of J, then

    J((u + u′)/2) ≤ (α/2)(TV(u) + TV(u′)) + ½ ∫_Ω ((u + u′)/2 − g)² dx
                 = ½ (J(u) + J(u′)) − (1/8) ∫_Ω (u − u′)² dx,

which would be strictly less than inf J unless u = u′. Hence the minimiser of the ROF problem (ROF) exists and is unique.
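The strict inequality used here is just the parallelogram-type identity for the L² norm, written out below as a short supporting computation (with a = u − g and b = u′ − g):

```latex
\Bigl\|\tfrac{a+b}{2}\Bigr\|_2^2
  = \tfrac12\|a\|_2^2 + \tfrac12\|b\|_2^2 - \tfrac14\|a-b\|_2^2,
\qquad\text{so that}\qquad
\tfrac12\Bigl\|\tfrac{u+u'}{2}-g\Bigr\|_2^2
  = \tfrac14\|u-g\|_2^2 + \tfrac14\|u'-g\|_2^2 - \tfrac18\|u-u'\|_2^2 .
```

The last term is strictly negative whenever u ≠ u′, which is exactly what forces uniqueness.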
In the proof of Theorem 3.7 we have proved the existence of a minimiser u for J in (ROF). The next obvious question that arises is: what is the regularity of the minimiser u? An immediate answer is that, since |Du|(Ω) + ‖u‖₂ < ∞ (and hence also ‖u‖₁ < ∞), we have u ∈ BV(Ω), cf. Exercises. Very elegantly this property also results from the following compactness property of the space BV.
Theorem 3.9 (Rellich’s compactness theorem). Let Ω ⊂ R² be a rectangular image domain, and let (u_n) be a
sequence of functions in BV (Ω) such that supn kun kBV < +∞. Then there exists u ∈ BV (Ω) and a subsequence
(unk ) such that unk → u strongly in L1 (Ω) as k → ∞.
Remark 3.10. Note, that the space BV (Ω) is not reflexive, but that Rellich’s theorem provides you with enough
compactness to prove existence of solutions for the ROF minimisation problem taking a minimising sequence in
BV . This will become important also when considering total variation minimisers with an L1 data fidelity term.
Two numerical algorithms for total variation denoising In what follows, we discuss two approaches
to solve the total variation denoising problem numerically. The first method relies on a smoothing of
the total variation regulariser that turns it into a differentiable quantity and is based on the solution
of the corresponding Euler-Lagrange equation via a fixed-point iteration. In the second method the
total variation denoising problem is reformulated into its dual form using useful facts for the Legendre-Fenchel dual of a convex and one-homogeneous function. The solution to the dual problem turns out to
be an orthogonal projection onto a convex set and can again be computed via a fixed-point iteration.
• Method A: Lagged diffusivity: The idea of this method is to replace the minimisation problem
(ROF) by
    min_{u ∈ W^{1,1}(Ω)} J(u) = α ∫_Ω √(|∇u|² + ε) dx + ½‖u − g‖₂²,    (3.7)
where g ∈ L²(Ω) and the smoothing parameter 0 < ε ≪ 1. The well-posedness of this scheme cannot be derived via the direct method of calculus of variations because W^{1,1} is not reflexive. Even if we set the problem as a minimisation problem in L² (as before), the smoothed total variation (with ∇u representing the absolutely continuous part of the distributional derivative only) is not lower-semicontinuous. Well-posedness can still be proven via the method of relaxation. We will not discuss this here but refer the reader to [AK06, Chapter 3.2.3] for the proof.
For computing a minimiser of (3.7) we first derive the corresponding first-order optimality conditions (called Euler-Lagrange equations) in weak form. That is, for a smooth and compactly supported test function φ ∈ C_c^∞(Ω) we compute

    d/dτ (J(u + τφ))|_{τ=0} = 0,    ∀φ ∈ C_c^∞(Ω).

This gives

    d/dτ (J(u + τφ))|_{τ=0} = α ∫_Ω (∇u / √(|∇u|² + ε)) · ∇φ dx + ∫_Ω (u − g)φ dx
                            = ∫_Ω ( −α div(∇u / √(|∇u|² + ε)) + (u − g) ) φ dx,

and therefore the Euler-Lagrange equation (in the weak sense)

    −α div( ∇u / √(|∇u|² + ε) ) + (u − g) = 0.

The lagged-diffusivity method computes the minimiser of (3.7) as a fixed point iteration on the Euler-Lagrange equation above. More precisely, for an initial u^(0) (e.g. u^(0) = g) one iteratively computes u^(k+1) for k = 0, 1, . . . as a solution of

    −α div( ∇u^(k+1) / √(|∇u^(k)|² + ε) ) + (u^(k+1) − g) = 0,

which is a linear partial differential equation in u^(k+1). For more information on this scheme and convergence analysis see [VO96, CM99].
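As an illustration, the scheme above can be sketched in numpy on a unit pixel grid (forward differences for ∇ and the matching discrete divergence). Since each outer step requires a linear solve, the sketch uses a few hand-written conjugate-gradient iterations for the equation in u^(k+1); all parameter values below are illustrative choices, not prescribed by the text.

```python
import numpy as np

def grad(u):
    # discrete gradient: forward differences, zero in the last row/column
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]; d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] -= py[:, -2]
    return d

def lagged_diffusivity(g, alpha=0.25, eps=1e-4, outer=15, inner=30):
    """Fixed-point iteration on the Euler-Lagrange equation of (3.7)."""
    u = g.copy()
    for _ in range(outer):
        gx, gy = grad(u)
        w = 1.0 / np.sqrt(gx**2 + gy**2 + eps)   # diffusivity frozen at u^(k)

        def A(v):                                # v -> v - alpha * div(w * grad v)
            vx, vy = grad(v)
            return v - alpha * div(w * vx, w * vy)

        # solve the linear equation A u^(k+1) = g with a few CG steps (A is SPD)
        r = g - A(u); p = r.copy(); rs = np.sum(r * r)
        for _ in range(inner):
            if rs < 1e-12:
                break
            Ap = A(p); a = rs / np.sum(p * Ap)
            u = u + a * p; r = r - a * Ap
            rs_new = np.sum(r * r)
            p = r + (rs_new / rs) * p; rs = rs_new
    return u
```

Each outer iteration freezes the diffusivity w at the current iterate, so the inner problem is linear, symmetric and positive definite; this is why a plain conjugate-gradient loop suffices.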
Remark 3.11. If you are looking for a very easy method to implement total variation denoising, you can also attempt to compute a minimiser of (3.7) via the method of steepest descent. As above, we start with an initial u^(0) and iterate for k = 0, 1, . . .

    u^(k+1) = u^(k) + τ ( α div( ∇u^(k) / √(|∇u^(k)|² + ε) ) + g − u^(k) ),

with time step size τ > 0. Stability of this scheme is only guaranteed under a condition on the step size τ whose upper bound depends on the space discretisation and the size of ε. This condition can be rather restrictive and, for small ε, allows only very small incremental changes in time. However, the advantage of this scheme is that it is very easy to implement and each iteration is very cheap since it only requires the evaluation of the right hand side of the equation.
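The steepest-descent iteration of Remark 3.11 can be sketched in a few lines of numpy (forward-difference gradient and its adjoint divergence on a unit pixel grid; the values of τ, ε and α below are illustrative and chosen only to respect the stability restriction mentioned above):

```python
import numpy as np

def tv_denoise_descent(g, alpha=0.15, eps=1e-2, tau=0.02, iters=2000):
    """Explicit gradient descent for the smoothed TV functional (3.7)."""
    u = g.copy()
    for _ in range(iters):
        # forward-difference gradient, zero at the far boundary
        gx = np.zeros_like(u); gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        w = 1.0 / np.sqrt(gx**2 + gy**2 + eps)
        px, py = w * gx, w * gy
        # divergence (negative adjoint of the gradient above; px, py vanish
        # in the last row/column, so the compact form below is exact)
        d = np.zeros_like(u)
        d[0, :] += px[0, :]; d[1:, :] += px[1:, :] - px[:-1, :]
        d[:, 0] += py[:, 0]; d[:, 1:] += py[:, 1:] - py[:, :-1]
        u = u + tau * (alpha * d + g - u)
    return u
```

Note how the restriction on τ plays out in practice: the diffusivity w is bounded by 1/√ε, so shrinking ε forces a proportionally smaller stable step size, exactly as the remark warns.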
• Method B: Convex optimisation
For the derivation of this algorithm we change to the discrete setting. We consider images as N ×N
matrices. We denote by X the Euclidean space RN ×N . To define the discrete total variation we
introduce a discrete (linear) gradient operator. If u ∈ X, the gradient ∇u is a vector in Y = X × X
given by (∇u)_{i,j} = ((∇_x u)_{i,j}, (∇_y u)_{i,j}), with

    (∇_x u)_{i,j} = u_{i+1,j} − u_{i,j} if i < N,   0 if i = N,
    (∇_y u)_{i,j} = u_{i,j+1} − u_{i,j} if j < N,   0 if j = N,

for i, j = 1, . . . , N. Then the discrete total variation of u is defined by

    J(u) = Σ_{1≤i,j≤N} |(∇u)_{i,j}|,

with |y| := √(y₁² + y₂²) for every y = (y₁, y₂) ∈ R².
This functional J is a discretization of the standard total variation, defined in the continuous setting for a function u ∈ L¹(Ω) (Ω an open subset of R²) by (2.3). It can be shown that if a stepsize (or pixel size) h ≈ 1/N is introduced in the discrete definition of J (defining a new functional J_h equal to h times the expression in the discrete definition of J), then, as h → 0 (and the number of pixels N goes to infinity), J_h Γ-converges to the continuous J (defined on Ω = (0, 1) × (0, 1)). This means that the minimizers of the problems we are going to consider approximate correctly, if the pixel size is very small, minimizers of similar problems defined in the continuous setting, see [L11, Chapter 2.7].
With J being one-homogeneous (that is, J(λu) = λJ(u) for every u and λ > 0), it is a standard fact in convex analysis that the Legendre-Fenchel transform J*(v) = sup_u { ⟨u, v⟩_X − J(u) } (with ⟨u, v⟩_X = Σ_{i,j} u_{i,j} v_{i,j}) is the characteristic function of a closed convex set K, see [HUL93]. That is,

    J*(v) = χ_K(v) = 0 if v ∈ K,   +∞ otherwise.
Since J** = J, we recover

    J(u) = sup_{v∈K} ⟨u, v⟩_X.

In the continuous setting, one readily sees from the definition of the functional J that K is the closure of the set

    { div ξ : ξ ∈ C_c¹(Ω; R²), |ξ(x)| ≤ 1 ∀x ∈ Ω }.
Let us now find a similar characterization in the discrete setting. In Y = X × X, we use the Euclidean scalar product, defined in the standard way by

    ⟨p, q⟩_Y = Σ_{1≤i,j≤N} ( (p^x)_{i,j}(q^x)_{i,j} + (p^y)_{i,j}(q^y)_{i,j} ),

for every p = (p^x, p^y), q = (q^x, q^y) ∈ Y. Then, for every u,

    J(u) = sup_p ⟨p, ∇u⟩_Y
where the sup is taken over all p ∈ Y such that |p_{i,j}| ≤ 1 for every i, j. We introduce a discrete divergence div : Y → X defined, by analogy with the continuous setting, by div = −∇* (∇* is the adjoint of the gradient ∇). That is, for every p ∈ Y and u ∈ X, ⟨−div p, u⟩_X = ⟨p, ∇u⟩_Y. One checks easily that div is given by

    (div p)_{i,j} =  (p^x)_{i,j} − (p^x)_{i−1,j} if 1 < i < N,   (p^x)_{i,j} if i = 1,   −(p^x)_{i−1,j} if i = N,
                  +  (p^y)_{i,j} − (p^y)_{i,j−1} if 1 < j < N,   (p^y)_{i,j} if j = 1,   −(p^y)_{i,j−1} if j = N,

for every p = (p^x, p^y) ∈ Y. From this one immediately deduces that K is given by

    { div p : p ∈ Y, |p_{i,j}| ≤ 1 ∀i, j = 1, . . . , N }.
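The defining relation ⟨−div p, u⟩_X = ⟨p, ∇u⟩_Y can be checked directly in code; it holds exactly, for arbitrary p and u. A small numpy sketch with the conventions of the text (zero gradient in the last row/column, divergence defined case by case):

```python
import numpy as np

def grad(u):
    # (grad u)_{i,j} with forward differences, zero at i = N resp. j = N
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence defined case by case, exactly as in the text
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]; d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] -= py[:, -2]
    return d

# verify the adjoint relation <-div p, u>_X = <p, grad u>_Y on random data
rng = np.random.default_rng(42)
u = rng.standard_normal((8, 8))
px, py = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
gx, gy = grad(u)
lhs = np.sum(-div(px, py) * u)
rhs = np.sum(px * gx + py * gy)
assert abs(lhs - rhs) < 1e-10
```

The summation-by-parts boundary cases are exactly what make the identity exact rather than approximate; any other boundary treatment in div would break the adjoint relation.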
Based on these preliminary observations Chambolle [Ch04] proposes an algorithm to solve

    min_{u∈X} ‖u − g‖² / (2α) + J(u),

given g ∈ X and α > 0, which will be described in the sequel. Here, ‖·‖ is the Euclidean norm in X, given by ‖u‖² = ⟨u, u⟩_X.
The Euler equation for the proposed optimization problem is

    u − g + α ∂J(u) ∋ 0.
Here, ∂J is the sub-differential of J, defined by

Definition 3.12 (Sub-differential). Let H be a Hilbert space. Let φ : H → (−∞, +∞] be convex and proper (φ ≢ +∞). Then for any x ∈ H, the sub-differential of φ at x is defined as

    ∂φ(x) = { y ∈ H : ∀ξ ∈ H, φ(ξ) ≥ φ(x) + ⟨y, ξ − x⟩_H }.
The Euler equation can be rewritten as (g − u)/α ∈ ∂J(u), which is equivalent to u ∈ ∂J*((g − u)/α) (we are only quoting this convex analysis result here, which can be found in [HUL93]). Writing this as

    g/α ∈ (g − u)/α + (1/α) ∂J*((g − u)/α),

we get that w = (g − u)/α is the minimizer of

    ½ ‖w − g/α‖² + (1/α) J*(w).
Since J ∗ is the characteristic function χK , we deduce w = ΠK (g/α). Hence the solution u of the
proposed problem is simply given by
u = g − ΠαK (g).
A possible algorithm for computing u is therefore to try to compute the nonlinear projection ΠαK .
Computing the nonlinear projection Π_{αK} amounts to solving the following problem:

    min { ‖α div p − g‖² : p ∈ Y, |p_{i,j}|² ≤ 1 ∀i, j = 1, . . . , N }.

The Karush-Kuhn-Tucker conditions yield the existence of a Lagrange multiplier α_{i,j} ≥ 0, associated to each constraint in the above problem. With this we have for each i, j

    −(∇(α div p − g))_{i,j} + α_{i,j} p_{i,j} = 0,

with either α_{i,j} > 0 and |p_{i,j}| = 1, or |p_{i,j}| < 1 and α_{i,j} = 0. In the latter case, also (∇(α div p − g))_{i,j} = 0. We see that in any case α_{i,j} = |(∇(α div p − g))_{i,j}|.
We thus propose the following semi-implicit gradient descent (or fixed point) algorithm. We choose τ > 0, let p^(0) = 0 and for any n ≥ 0,

    p^(n+1)_{i,j} = p^(n)_{i,j} + τ ( (∇(div p^(n) − g/α))_{i,j} − |(∇(div p^(n) − g/α))_{i,j}| p^(n+1)_{i,j} ),

so that

    p^(n+1)_{i,j} = ( p^(n)_{i,j} + τ (∇(div p^(n) − g/α))_{i,j} ) / ( 1 + τ |(∇(div p^(n) − g/α))_{i,j}| ).

Chambolle shows the following result.

Theorem 3.13. Let τ ≤ 1/8. Then, α div p^(n) converges to Π_{αK}(g) as n → ∞.
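The complete dual algorithm is short. A numpy sketch is given below, using the discrete grad/div pair defined earlier and τ = 1/8 as in Theorem 3.13; the value of α and the iteration count are illustrative.

```python
import numpy as np

def grad(u):
    # forward differences, zero in the last row/column
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad (case by case)
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]; d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] -= py[:, -2]
    return d

def chambolle_projection(g, alpha, tau=0.125, iters=300):
    """Fixed-point iteration for Pi_{alpha K}(g); returns the ROF minimiser u."""
    px = np.zeros_like(g); py = np.zeros_like(g)
    for _ in range(iters):
        gx, gy = grad(div(px, py) - g / alpha)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return g - alpha * div(px, py)   # u = g - Pi_{alpha K}(g)
```

Note that the update automatically keeps |p_{i,j}| ≤ 1, so no explicit projection of the dual variable onto the unit ball is needed.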
Other noise models For the Gaussian noise model we have derived in the beginning of Section 3 the
variational approach (ROF). The following table should give an overview of possible other noise models
and their corresponding data fidelity terms in the generic total variation denoising approach

    min_{u∈A} J(u) = α|Du|(Ω) + d(u, g),

where A is an admissible set of functions and d(u, g) is the data fidelity term, that is, an adequate distance function between u and g, cf. Table 1.
Noise distribution      P(g|u)                                                  d(u, g)
Laplace noise           ∝ exp( −(1/σ²) Σ_{i,j} |g_{i,j} − u_{i,j}| )            ‖u − g‖₁
Poisson noise           ∏_{i,j} ( u_{i,j}^{g_{i,j}} / g_{i,j}! ) e^{−u_{i,j}}   ∫_Ω (u − g log u)
Multiplicative noise    g_{i,j} = n_{i,j} · u_{i,j}                             function that depends on g/u

Table 1: Noise models in the Bayesian framework and their corresponding data fidelity terms in the continuous variational model.
3.3 Image reconstruction in the context of inverting a linear operator
In this section we start with the general introduction of variational regularisation for inverse problems.
A general (linear) inverse problem is, given measurements/data g with
g = T u + n,
where T is a linear operator and n possible noise, reconstruct a preferably noise-free image u. In the
previous section we have derived the variational model (3.1) for reconstructing u in the discrete setting.
In the following, we will prove the well-posedness of the corresponding model in infinite dimensions.
To do so, let g ∈ L²(Ω) and let the linear operator T fulfil the following assumptions:
(A1) T : L2 (Ω) → L2 (Ω) is a linear and continuous operator.
(A2) T χΩ 6= 0.
Examples for admissible T fulfilling (A1) and (A2) are convolution operators T = k∗, T = Id and T = χ_{Ω\D}. Then, we consider the following minimisation problem for reconstructing u:

    min_u α|Du|(Ω) + ½‖Tu − g‖₂².    (3.8)
Theorem 3.14 (Existence of TV regularised reconstructions). For g ∈ L2 (Ω) and an operator T that fulfils
assumptions (A1)-(A2), there exists a solution u ∈ BV (Ω) of the minimisation problem (3.8). If T is injective,
then the minimiser is unique.
Proof. Let (u_n) be a minimising sequence for (3.8); then there exists a constant C > 0 such that

    |Du_n|(Ω) ≤ C,    ∀n ≥ 1.

Next, we prove that |∫_Ω u_n| ≤ C for all n ≥ 1. Let

    w_n = ( ∫_Ω u_n / |Ω| ) χ_Ω   and   v_n = u_n − w_n.

Then ∫_Ω v_n = 0 and Dv_n = Du_n. Hence, |Dv_n|(Ω) ≤ C. Using the Poincaré-Wirtinger inequality, that is
Lemma 3.15 (Poincaré-Wirtinger inequality for BV functions). For u ∈ BV(Ω), let

    u_Ω := (1/|Ω|) ∫_Ω u(x) dx.

Then there exists a constant C > 0 such that

    ‖u − u_Ω‖₂ ≤ C |Du|(Ω),

we obtain that

    ‖v_n‖₂ ≤ C.
We also have

    C ≥ ‖Tu_n − g‖₂² = ‖Tv_n + Tw_n − g‖₂²
      ≥ ( ‖Tv_n − g‖₂ − ‖Tw_n‖₂ )²
      ≥ ‖Tw_n‖₂ ( ‖Tw_n‖₂ − 2‖Tv_n − g‖₂ )
      ≥ ‖Tw_n‖₂ [ ‖Tw_n‖₂ − 2( ‖T‖ · ‖v_n‖₂ + ‖g‖₂ ) ].

Let x_n = ‖Tw_n‖₂ and a_n = ‖T‖ · ‖v_n‖₂ + ‖g‖₂. Then

    x_n (x_n − 2a_n) ≤ C,

with 0 ≤ a_n ≤ ‖T‖ · C + ‖g‖₂ = C′, ∀n.
Hence, we obtain

    0 ≤ x_n ≤ a_n + √(a_n² + C) ≤ C″,

which implies

    ‖Tw_n‖₂ = | ∫_Ω u_n | · ‖Tχ_Ω‖₂ / |Ω| ≤ C″,    ∀n,

and thanks to (A2), we obtain that |∫_Ω u_n| is uniformly bounded.
Again, by the Poincaré-Wirtinger inequality, we have

    ‖u_n − ∫_Ω u_n / |Ω|‖₂ ≤ const · |Du_n|(Ω) ≤ const · C.
Finally, we obtain

    ‖u_n‖₂ = ‖ u_n − ∫_Ω u_n / |Ω| + ∫_Ω u_n / |Ω| ‖₂ ≤ ‖ u_n − ∫_Ω u_n / |Ω| ‖₂ + ‖ ∫_Ω u_n / |Ω| ‖₂ ≤ C‴.
Therefore, un is bounded in L2 and in particular in L1 . Then un is bounded in BV (Ω) and there is a
subsequence (still denoted u_n), and a u ∈ BV(Ω) such that u_n ⇀ u in L². Moreover, T u_n converges
weakly to T u in L2 by assumption (A1). With the l.s.c. properties of |Du|(Ω) and kT u − gk2 w.r.t. weak
convergence in L2 we are done.
If T is injective it follows from the strict convexity of k · k22 that the map u 7→ kT u − gk22 is also strictly
convex, hence the whole functional in (3.8) is strictly convex and the minimiser is unique.
Image deblurring In the case of image deblurring the data model is given by
g = k ∗ u + n,
(3.9)
where k(x) is a suitable blurring kernel (also called point spread function (PSF)) and n possible additive
noise. One example for blur is the out-of-focus blur, which can be modelled as a convolution with a
Gaussian kernel, that is

    k(x) = G_σ(x) = (1/(2πσ²)) e^{−|x|²/(2σ²)}.    (3.10)
In the case when g is really a noisy & blurred version of the original image u then it is clear from our
discussion so far that the reconstruction of u (or better an image function that approximates the original
u) is done by solving the generic problem (3.8) with T being the convolution with the kernel k – the
regularisation with the total variation is a way to eliminate the noise in the blurred image g. However, let us assume for a moment that the image g in (3.9) does not contain any noise; it is just given by g = k ∗ u.
Then T = k∗ is a linear operator and we could consider reconstructing u by just inverting this operator.
Unfortunately, we would be very ill-advised if attempting to do that because the inversion of a blurring
process is in general ill-posed. In particular, when considering the Gaussian blur model, deblurring the
image would correspond to solving the heat equation backward in time, cf. (1.3), which we know is an
unstable process. Hence, in either case regularisation is needed. We start by specifying the generic total
variation model (3.8) for deblurring for Ω = R2 , that is
    u_α = argmin_u α|Du|(Ω) + ½‖k ∗ u − g‖₂²,

where

    (k ∗ u)(x) = ∫_{R²} k(x − y) u(y) dy.
Figure 9: Domains for the setup of the variational deblurring problem (3.11).
To formulate this for an image on a bounded domain Ω there are several approaches considered in the literature. One way, which considers bounded domains only, is as follows. Let Ω₀, Ω′, Ω ⊂ R² be open and bounded domains. The blurry image g ∈ L²(Ω₀) is defined on Ω₀ and the blurring kernel k is an L¹ function with compact support on Ω′ and ∫_{Ω′} k dx = 1. Since g has emerged from u by convolution with k, for g in Ω₀ only information from u in Ω and from k in Ω′ can be used. Hence, we have to assume that Ω is a domain such that

    Ω₀ − Ω′ = { x − y : x ∈ Ω₀, y ∈ Ω′ } ⊂ Ω.

Then the linear operator T for this problem is defined, for x ∈ Ω₀, as

    (Tu)(x) = (u ∗ k)(x) = ∫_{Ω′} u(x − y) k(y) dy.

The operator T is linear and continuous as an operator from L²(Ω) → L²(Ω₀) and maps constant functions on Ω to constant functions on Ω₀ and in particular fulfils assumptions (A1)-(A2). Hence, from Theorem 3.14 we have that there exists a minimiser for the problem

    min_{u∈L²(Ω)} α|Du|(Ω) + ½ ∫_{Ω₀} |u ∗ k − g|² dx,    (3.11)
for every α > 0. If T is additionally injective, then the minimiser of (3.11) is unique. Examples for
injective blurring kernels are the Gaussian kernel in (3.10) or simple averaging kernels.
Medical image reconstruction (MRI) The ROF-type (L² fidelity) sparse reconstruction problem with sampling operator S and Fourier transform operator F may then be written

    min_u α|Du|(Ω) + ½‖g − SFu‖₂².    (3.12)
Note that in this application the data g is complex valued and so is the reconstructed ‘image’ u.
3.4 Image inpainting
An important task in image processing is the process of filling in missing parts of damaged images based
on the information obtained from the surrounding areas. It is essentially a type of interpolation and is
called inpainting.
Figure 10: Sub-sampling patterns employed. N_σ^{q%} denotes two-dimensional Gaussian sampling with variance 128σ and q% coverage of the 256 × 256 total space. L_σ^{q%} denotes one-dimensional Gaussian sampling of lines with variance 128σ, while S^{q%} denotes a spiral pattern with q% coverage.
Let f represent some given image defined on an image domain Ω. Loosely speaking, the problem
is to reconstruct the original image u in the (damaged) domain D ⊂ Ω, called inpainting domain or a
hole/gap (cf. Figure 11).
More precisely let Ω ⊂ R2 be an open and bounded domain with Lipschitz boundary, and let B1 , B2
be two Banach spaces with B2 ⊆ B1 , f ∈ B1 denoting the given image, and D ⊂ Ω the missing domain.
A general variational approach in image inpainting is formulated mathematically as a minimization
problem for a regularized cost functional J : B2 → R,
    J(u) = R(u) + ½‖λ(f − u)‖²_{B₁} → min_{u∈B₂},    (3.13)

where R : B₂ → R and

    λ(x) = λ₀ if x ∈ Ω \ D,   0 if x ∈ D,    (3.14)
is the indicator function of Ω \ D multiplied by a constant λ₀ ≫ 1 (whose size will depend on how noisy the given image data is). This constant is the tuning parameter of the approach. As before, R(u) denotes the regularizing term and represents certain a-priori information on the image u, i.e., it determines in which space the restored image lies. In the context of image inpainting, i.e., in the setting of (3.13)-(3.14), it plays the main role of filling in the image content into the missing domain D, e.g., by diffusion and/or transport. The fidelity term ‖λ(f − u)‖²_{B₁} of the inpainting approach forces the minimizer u to stay close to the given image f outside of the inpainting domain (how close depends on the size of λ₀). In this case the operator T from the general approach equals the indicator function of Ω \ D. In general we have B₂ ⊂ B₁, which signifies the smoothing effect of the regularizing term on the minimizer u ∈ B₂(Ω).
Note that the variational approach (3.13)-(3.14) acts on the whole image domain Ω (global inpainting model), instead of posing the problem on the missing domain D only. This has the advantage of
simultaneous noise removal in the whole image and makes the approach independent of the number
and shape of the holes in the image. In this global model the boundary condition for D is superimposed
by the fidelity term.
Total variation inpainting Starting with our standard imaging model so far, we are tempted to take
the total variation as the regularising term R in (3.13). Total variation inpainting has been proposed by
Chan and Shen in [CS01a]. Within the same setting as for (3.13)-(3.14) the inpainted image u is recovered
as a minimiser of

    |Du|(Ω) + ½‖λ(u − f)‖₂²,    (3.15)
where |Du|(Ω) is the total variation of u in Ω. In the noise free case, that is if we assume that g|Ω\D is
completely intact, we can also formulate the following variational approach: assume that g ∈ BV (Ω)
and seek the inpainted image u∗ that solves
    min_{ u ∈ L²(Ω) : u|_{Ω\D} = g|_{Ω\D} } |Du|(Ω).    (3.16)
Theorem 3.16. For an original image g ∈ BV(Ω) the minimisation problem (3.16) has a minimiser u* ∈ BV(Ω).
Proof. We can rewrite the constrained problem (3.16) as the following unconstrained problem:

    u* = argmin_u { |Du|(Ω) + 1_{ {v ∈ L²(Ω) : v|_{Ω\D} = g|_{Ω\D}} }(u) },

where

    1_S(u) = 0 if u ∈ S,   +∞ otherwise.
Then we can apply the direct method of calculus of variations, noting (or proving) that the indicator
function is l.s.c. and using compactness properties in L2 as before.
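A minimal numpy sketch of total variation inpainting in the unconstrained form (3.13)-(3.15), using gradient descent on the smoothed total variation of (3.7); the parameters λ₀, ε, τ and the geometry below are illustrative, and inside the hole the data values are simply ignored:

```python
import numpy as np

def tv_inpaint(f, mask, lam=10.0, eps=1e-2, tau=0.015, iters=5000):
    """Descend |Du|_eps + (lam/2)||mask*(u - f)||^2; mask is 1 on Omega\\D, 0 on D."""
    u = f.copy()
    for _ in range(iters):
        # forward-difference gradient, zero at the far boundary
        gx = np.zeros_like(u); gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        w = 1.0 / np.sqrt(gx**2 + gy**2 + eps)
        px, py = w * gx, w * gy
        # divergence (negative adjoint of the gradient above)
        d = np.zeros_like(u)
        d[0, :] += px[0, :]; d[1:, :] += px[1:, :] - px[:-1, :]
        d[:, 0] += py[:, 0]; d[:, 1:] += py[:, 1:] - py[:, :-1]
        # diffuse everywhere, but pull towards the data only where mask == 1
        u = u + tau * (d - lam * mask * (u - f))
    return u

# example: a damaged square inside a flat region is refilled from its boundary
f = np.zeros((32, 32)); f[:, 24:] = 1.0           # image with a vertical edge
mask = np.ones((32, 32)); mask[10:20, 4:12] = 0   # inpainting domain D
f[10:20, 4:12] = 1.0                              # corrupt the data inside D
u = tv_inpaint(f, mask)
```

Because the fidelity term vanishes on D, the values there are determined purely by the regulariser, which diffuses the surrounding grey values into the hole, exactly the global-inpainting behaviour described above.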
The disadvantage of the total variation approach in inpainting is that the level lines are interpolated
linearly. This means that the direction of the level lines is not preserved, since they are connected by a
straight line across the missing domain. This is due to the penalisation of the length of the level lines
within the minimising process with a total variation regulariser, thus connecting level lines from the
boundary of the inpainting domain via the shortest distance (linear interpolation).
To see this, we take the level line point of view. We want to derive another characterisation for the
total variation in terms of the level sets of the image function u.
Definition 3.17. Let E ⊂ Ω be a measurable set in R². This set is called a set of finite perimeter if its characteristic function χ_E ∈ BV(Ω). We write

    Per(E; Ω) := |Dχ_E|(Ω)

for the perimeter of E in Ω.
With the notion of sets of finite perimeter we have the following theorem.
Theorem 3.18 (Coarea formula). Let u ∈ BV(Ω) and for s ∈ R let {u > s} denote the s-level set of u. Then, one has

    |Du|(Ω) = ∫_{−∞}^{∞} Per({u > s}; Ω) ds.
Proof. We will only present a sketch of the proof here since the full version is very complicated and
technical. The sketch is discussed in three steps.
• We start with considering affine functions u(x) = p · x, p ∈ R², on a simplex Ω = T. Then, the total variation of u in T can be easily computed to be

    |Du|(T) = sup { −∫_T p · x div ϕ dx : ϕ ∈ C_c^∞(T; R²), |ϕ(x)| ≤ 1 ∀x ∈ T } = |T||p|.

On the other hand the hypersurfaces ∂{u > s} are {p · x = s} and hence

    ∫_{−∞}^{∞} Per({u > s}; T) ds = ∫_{−∞}^{∞} H¹({p · x = s} ∩ T) ds = |T||p|.
• Now, having proved the result for affine functions on simplexes, the idea is to triangulate Ω with simplexes and approximate a general function u ∈ BV by piecewise affine functions u_n on these simplexes such that ∫_Ω |∇u_n| dx → |Du|(Ω). Indeed, we can start this process by first using the following approximation theorem for functions of bounded variation.

Theorem 3.19. Let Ω ⊂ R² be a rectangular domain and u ∈ BV(Ω). Then, there exists a sequence (u_n) ⊂ C^∞(Ω) ∩ W^{1,1}(Ω) such that
(i) u_n → u in L¹(Ω),
(ii) ∫_Ω |∇u_n| dx → |Du|(Ω) as n → ∞.

Then, we can approximate u with smooth functions and those (similar to finite element theory) with piecewise affine functions. Using the l.s.c. of the total variation and Fatou's lemma we get that

    ∫_R Per({u > s}; Ω) ds ≤ |Du|(Ω).
• The proof of the reverse inequality is straightforward by the following calculation:

    −∫_Ω u(x) div ϕ(x) dx = −∫_{{u>0}} ∫_0^{u(x)} ds div ϕ(x) dx + ∫_{{u<0}} ∫_{u(x)}^0 ds div ϕ(x) dx
    =_{Fubini} −∫_0^∞ ∫_Ω χ_{{u>s}}(x) div ϕ(x) dx ds + ∫_{−∞}^0 ∫_Ω (1 − χ_{{u>s}}(x)) div ϕ(x) dx ds
    =_{∫_Ω div ϕ = 0} −∫_{−∞}^{∞} ∫_{{u>s}} div ϕ dx ds
    ≤ ∫_{−∞}^{∞} Per({u > s}; Ω) ds.

Taking the sup over all admissible ϕ on the left finishes the proof.
Now, assume for a moment that the original image g is smooth and that the missing g|_D is nonconstant, that is, there exists a point x₀ close to D where ∇g(x₀) ≠ 0. Then (by the inverse function
theorem) level lines of g in D are well-defined and distinguishable and in particular we have that the
s-level line

    Γ_s := g^{−1}(s) = { x ∈ R² : g(x) = s }
is uniquely labeled by the level s. Combining this with the coarea formula we can derive the total
variation inpainting problem for one level line Γs which should be interpolated in D across the two
points p1 ∈ ∂D and p2 ∈ ∂D as
    min_{ γ_s : γ_s(p₁)=Γ_s(p₁), γ_s(p₂)=Γ_s(p₂) } ∫_{γ_s} ds = min_{ γ_s : γ_s(p₁)=Γ_s(p₁), γ_s(p₂)=Γ_s(p₂) } length(γ_s).
This means that level lines are interpolated with straight lines. While a straight line connection might
still be pleasant for small holes it is very unpleasant in the presence of larger gaps, even for simple
images. Another consequence of the linear interpolation is that level lines might not be connected across
large distances. A solution for this is the use of higher-order regularisation terms. One of the prototypes of higher-order variational inpainting methods is Euler's elastica inpainting, which will be the topic of
the next section.
Euler’s elastica inpainting Based on the work of Mumford, Nitzberg & Shiota in 93’ [NMS93] Masnou
and Morel in 98’ [Ma98, MM98] proposed the Euler’s elastica energy (which goes back to Euler in 1744)
for the interpolation of level lines. In particular, they proposed to connect two points Γ_s(p₁) with Γ_s(p₂) for p₁, p₂ ∈ ∂D by an elastica curve, which is the one that solves

    min_{γ_s ∈ A} ∫_{γ_s} (α + βκ²) dt,
where κ is the scalar curvature of γs , the parameters α and β are positive constants and A is the admissible set for the minimisation defined by
A = {γs : γs (p1 ) = Γs (p1 ), γs (p2 ) = Γs (p2 ), and normals n~1 and n~2 in p1 and p2 }.
The idea of this approach is to weight the length penalty against a smoothness penalty modelled by the
curvature of the interpolating curve. Now, for all grey values 0 ≤ s ≤ 1 we have corresponding elastica
curves F* = {Γ_s : 0 ≤ s ≤ 1} which minimises

    E(F) = ∫_0^1 ∫_{γ_s} (α + βκ²) dt ds

over F = {γ_s : 0 ≤ s ≤ 1, with appropriate boundary conditions}. Potential problems however are
that two level lines might intersect and that F ∗ does not have to weave the entire inpainting domain.
This motivated Chan, Kang and Shen in 2002 to formulate a corresponding functionalised model for
Euler’s elastica inpainting.
For an admissible (for the moment smooth) inpainting u on D we represent the curvature of the level line γ_s : u ≡ s by

    κ = div n⃗ = div( Du / |Du| ).

Therefore, we can write

    J(u) = E(F) = ∫_0^1 ∫_{γ_s: u=s} (α + βκ²) dt ds
         =_{ds/dl = |Du|} ∫∫ ( α + β (div(Du/|Du|))² ) |Du| dl dt
         = ∫_D ( α + β (div(Du/|Du|))² ) |Du| dx,
where dl is the length element along the normal direction ~n, which is orthogonal to dt.
Remark 3.20. Note, that this derivation from the level line formulation to the functionalised model works in the
smooth (and non-constant) case only. A result as general as the coarea formula for the total variation does not
exist for the curvature (Conjecture of De Giorgi).
From our considerations so far, the smoothness assumption on u – that we need for the curvature to
be well-defined – is not a very realistic assumption for an image function. We would rather desire to
have an inpainting approach that is defined for image functions, even if they are discontinuous. This
can be done by introducing a “weak” form of the curvature and – using this weak curvature – formulating Euler's elastica inpainting for image functions in BV.
Let g ∈ BV(Ω) with ∫_{∂D} |Dg| = 0. The latter condition can be rewritten as

    ∫_{∂D} |Dg| = ∫_{∂D} |g⁺ − g⁻| dH¹ = 0,

which means that g⁺ = g⁻ a.e. (w.r.t. H¹) along ∂D. In other words, we assume that g does not jump
across the boundary of D, that is there is no essential overlap between ∂D and image edges. Then we
consider the minimisation of
    J(u) = ∫_D ( α + β (div(Du/|Du|))² ) |Du| dx,    (3.17)
Please email all corrections and suggestions to these notes to [email protected]
30
Image Processing – Variational and PDE Methods
C.-B. Schönlieb
under the conditions

    u|_{Ω\D} = g|_{Ω\D},    ∫_{∂D} |Du| = 0,    |κ| < ∞ a.e. (in the sense of the Hausdorff measure) along ∂D.

The second and third conditions enforce a certain regularity of u on the boundary ∂D, namely that u does not have essential discontinuities across the boundary and that its curvature κ is finite.
Now, for a function u ∈ BV(Ω) the Euler elastica energy in (3.17) is still defined only formally because a general BV function lacks the necessary regularity for the curvature to be well defined. To be able to rigorously define (3.17) for functions of bounded variation we shall – in the following – introduce the weak form of curvature. For that, let us denote for a function u ∈ BV(D)

    d_uν(S) := |Du|(S),    S ⊂ D,

the total variation of u in a subset S of D, which is a Radon measure on D. Let further supp(d_uν) be the support of the total variation measure. Then, for any p ∈ supp(d_uν) we have

    d_uν(N_p) = |Du|(N_p) > 0

on any small neighbourhood N_p ⊂ D of p. Now, let

    ρ_σ(x) = (1/σ²) ρ(x/σ),    u_σ = ρ_σ ∗ u,

where ρ is a fixed, radially symmetric non-negative mollifier with compact support and unit integral (so that ρ_σ → δ₀ as σ → 0). Then, for p ∈ supp(d_uν) we define the weak absolute curvature

    κ̃(p) = lim sup_{σ→0} | div( Du_σ / |Du_σ| )(p) |,

where for those σ which give |Du_σ(p)| = 0 we define div(Du_σ/|Du_σ|) = ∞. For any point p outside the support supp(d_uν) we define κ̃(p) = 0, since u is a.e. constant in a neighbourhood of p.
With this concept of weak curvature, the functionalised elastica energy in (3.17) can now be rigorously defined for BV functions u with κ̃ ∈ L²(D, d_uν). For more properties of the weak curvature and the equivalence between classical and weak curvature over certain classes of functions see [CKS02]. In what follows, we derive the first variation of the Euler elastica energy (3.17) and in turn an interpretation of its inpainting dynamics in terms of transport and diffusion of grey value information in g.
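The mollified-curvature construction above can be sketched numerically: smooth an image with Gaussians of decreasing σ and evaluate div(∇u_σ/|∇u_σ|) by finite differences. A minimal sketch with numpy (the helper names and the test image are ours, not from the text). For the indicator function of a disc of radius r, with n = ∇u_σ/|∇u_σ| pointing into the disc, the curvature of the level lines along the edge is −1/r:

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel with unit sum."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(u, sigma):
    """Separable Gaussian convolution with reflecting boundaries."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    v = np.pad(u, pad, mode="reflect")
    v = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 0, v)
    return np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, v)

def curvature(u, sigma, eps=1e-12):
    """div( grad(u_sigma) / |grad(u_sigma)| ) by central differences."""
    us = smooth(u, sigma)
    uy, ux = np.gradient(us)
    norm = np.sqrt(ux ** 2 + uy ** 2) + eps
    return np.gradient(ux / norm, axis=1) + np.gradient(uy / norm, axis=0)

# indicator of a disc of radius r: sampled on the edge, the curvature
# field should be close to -1/r for every sigma
n, r = 128, 30.0
yy, xx = np.mgrid[:n, :n] - n / 2
u = (xx ** 2 + yy ** 2 <= r ** 2).astype(float)
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
rows = (n // 2 + r * np.sin(theta)).astype(int)
cols = (n // 2 + r * np.cos(theta)).astype(int)
for sigma in (8.0, 4.0, 2.0):
    kappa = curvature(u, sigma)
    print(sigma, kappa[rows, cols].mean())
```

This is only a discrete stand-in for the lim sup in the definition, under the assumption that central differences approximate the divergence well away from the grid boundary.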
Theorem 3.21. Let φ ∈ C_c¹(R, [0, ∞)) and define

R(u) = ∫_Ω φ(κ) · |∇u| dx

for u ∈ W^{2,1}(Ω). Then, the first variation of R over the set of C_c^∞(Ω) is given by

∇_u R = −div V,

where

V = φ(κ) · n − (t/|∇u|) ∂( φ′(κ)|∇u| )/∂t.

Here n = ∇u/|∇u| and t ⊥ n.
Proof. Let v ∈ C_c^∞(Ω) be a compactly supported test function on Ω. Then the first variation of R is defined for τ ∈ R via

∫_Ω ∇_u R · v dx = d/dτ R(u + τv) |_{τ=0}.
We first compute

d/dτ R(u + τv) = ∫_Ω ( φ(κ_{u+τv}) · ( ∇(u+τv)/|∇(u+τv)| ) · ∇v + φ′(κ_{u+τv}) ( d/dτ κ_{u+τv} ) |∇(u+τv)| ) dx.
Then, we determine the derivative of κ_{u+τv} = div( ∇(u+τv)/|∇(u+τv)| ) as

d/dτ div( ∇(u+τv)/|∇(u+τv)| )
  = div( ( ∇v |∇(u+τv)| − ∇(u+τv) (∇(u+τv))ᵀ ∇v / |∇(u+τv)| ) / |∇(u+τv)|² )
  = div( (1/|∇(u+τv)|) ( Id − ( ∇(u+τv)/|∇(u+τv)| ) ⊗ ( ∇(u+τv)/|∇(u+τv)| ) ) ∇v ),

where a ⊗ a = a aᵀ is the orthogonal projection onto the unit vector a ∈ R². Then,
d/dτ R(u + τv) |_{τ=0} = ∫_Ω ( φ(κ) n · ∇v + φ′(κ) div( (1/|∇u|) [ Id − n ⊗ n ] ∇v ) |∇u| ) dx,
where we have used n = ∇u/|∇u|. The second term in the above sum can be simplified by integration by parts to

− ∫_Ω (1/|∇u|) [ Id − n ⊗ n ] ∇( φ′(κ)|∇u| ) · ∇v dx = ∫_Ω div( (1/|∇u|) [ Id − n ⊗ n ] ∇( φ′(κ)|∇u| ) ) v dx.
we derive
Integrating the first term by parts as well and using ~n ⊗ ~n + ~t ⊗ ~t = Id and ~t ⊗ ~t∇f = ~t ∂f
∂~
t
1 ∂ (φ0 (κ)|∇u|) ~
∇u R · v dx = −
div φ(κ)~n −
t v dx
|∇u|
∂~t
Ω
ZΩ
~ · v dx,
=−
divV
Z
Z
Ω
which completes the proof.
Corollary 3.22. For the case of Euler elastica inpainting, i.e., φ(κ) = a + bκ² for nonnegative constants a, b, the above theorem gives the following expression for the first variation of the respective regularising energy R:

∇_u R = −div V,   V = (a + bκ²) n − (2b/|∇u|) ∂( κ|∇u| )/∂t · t.

3.5 Image segmentation with Mumford-Shah
Mumford and Shah [MS89] introduced in 1989 a segmentation model that is based on the idea of decomposing an image into piecewise smooth parts that are separated by an edge set Γ. As before let
Ω ⊂ R2 be a rectangular domain and f a given (possibly noisy) image. Further, define an edge set Γ to
be a relatively closed subset of Ω, with finite one-dimensional Hausdorff measure. We search for a pair
(u, Γ) minimising
E(u, Γ) = (1/2) ∫_Ω (u − f)² dx + J(u, Γ),        (3.18)

with

J(u, Γ) = (α/2) ∫_{Ω\Γ} |∇u|² dx + β H¹(Γ).        (3.19)
Here α and β are nonnegative constants and H¹(Γ) is the one-dimensional Hausdorff measure of Γ (which is the length of Γ if Γ is regular). The Mumford-Shah model is inspired by earlier statistical approaches, e.g. [GG84, BZ87]. It aims to decompose a given image into its piecewise smooth part u and its edge set Γ, where the former is measured in the H¹ seminorm and the latter by its length, or more generally by its one-dimensional Hausdorff measure H¹(Γ).
Reduced case α → +∞: To study the reduced case of (3.19) as α → +∞, we consider the minimisation of

Ẽ(u, Γ) = ∫_{Ω\Γ} (u − g)² dx + β H¹(Γ)

over

A = {(u, Γ) : Du|_{Ω\Γ} = 0, Γ is closed in Ω and H¹(Γ) < ∞}.
Then, for fixed Γ ≠ ∅, a minimiser u of Ẽ over A is piecewise constant on the connected components of Ω \ Γ. In particular, if Ω \ Γ = ∪_{i=1}^N Ω_i is the unique decomposition into connected components, then for u = u_i constant on Ω_i we have that

argmin_{u_i} ∫_{Ω_i} (g − u_i)² dx = (1/|Ω_i|) ∫_{Ω_i} g dx =: ḡ|_{Ω_i},

the least squares approximation of g on Ω_i.
Inserting this into Ẽ, the problem for the edge set Γ becomes

Ẽ(Γ|g) = ∫_{Ω\Γ} (g − ḡ|_Γ)² dx + β H¹(Γ),

where

ḡ|_Γ := Σ_{i=1}^N ḡ|_{Ω_i} 1_{Ω_i}(x),   x ∈ Ω.
This is a model that seeks object boundaries Γ such that in each connected component of Ω \ Γ the given
image g is approximately constant. This model was rediscovered by Chan & Vese in 2001 under the name "Active Contours Without Edges".
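The reduced model is easy to illustrate in one dimension, where the edge set is a finite set of breakpoints, H¹ is replaced by counting the jumps, and the optimal constants on each segment are the region means ḡ|_{Ω_i} derived above. A small sketch (signal, breakpoints and helper names are illustrative only):

```python
import random

def reduced_ms_energy(g, breaks, beta):
    """Piecewise-constant Mumford-Shah energy of a 1D signal: squared
    deviations from the region means plus beta per jump (H^0 in 1D)."""
    edges = [0] + sorted(breaks) + [len(g)]
    fidelity = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        seg = g[a:b]
        mean = sum(seg) / len(seg)        # optimal constant = region mean
        fidelity += sum((x - mean) ** 2 for x in seg)
    return fidelity + beta * len(breaks)

# piecewise constant signal with two jumps plus mild noise
random.seed(0)
g = ([1 + 0.1 * random.gauss(0, 1) for _ in range(50)]
     + [3 + 0.1 * random.gauss(0, 1) for _ in range(50)]
     + [2 + 0.1 * random.gauss(0, 1) for _ in range(50)])

E_true  = reduced_ms_energy(g, [50, 100], beta=1.0)   # correct edge set
E_none  = reduced_ms_energy(g, [], beta=1.0)          # no edges at all
E_wrong = reduced_ms_energy(g, [25, 75], beta=1.0)    # misplaced edges
print(E_true, E_none, E_wrong)
```

The energy is smallest for the edge set matching the true jumps, which is exactly what the model is designed to select.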
Well-posedness of the scheme: Going back to the general case, we attempt to prove existence of minimisers via the direct method of the calculus of variations. To do so we must find a topology that ensures both compactness of minimising sequences and lower semicontinuity of E. However, we face the following difficulty.
Theorem 3.23 (Ambrosio 1989). Let E be a Borel set of R^N with boundary ∂E. Then the map

E ↦ H^{N−1}(∂E)

is not lower semicontinuous with respect to any compact topology.
The proof of this theorem is very involved and goes beyond the scope of this course. However, let
us understand this by means of an example.
Example 3.24 (Lack of l.s.c. of the Hausdorff measure). Let (x_i) be the sequence of all rational points in R^N and let

B_i = {x ∈ R^N : |x − x_i| ≤ 2^{−i}},   E_k = ∪_{i=0}^k B_i,   E = ∪_{i=0}^∞ B_i.
Moreover, let |E| be the Lebesgue measure of E and ω_N = |B₁(0)| the Lebesgue measure of the unit ball in R^N. Then,

|E| = | ∪_{i=0}^∞ B_i | ≤ Σ_{i=0}^∞ |B_i| = Σ_{i=0}^∞ 2^{−iN} ω_N = ω_N / (1 − 2^{−N}) < +∞   (for N ≥ 1).

But then, since the rationals are dense in R^N, we have cl(E) = R^N, and since |E| < +∞ while |R^N| = +∞, the set ∂E ⊇ cl(E) \ E has infinite Lebesgue measure. Therefore also H^{N−1}(∂E) = +∞.
On the other hand,

H^{N−1}(∂E_k) = H^{N−1}( ∂ ( ∪_{i=0}^k B_i ) ) ≤ H^{N−1}( ∪_{i=0}^k ∂B_i ) = Σ_{i=0}^k N ω_N 2^{−i(N−1)} ≤ N ω_N / (1 − 2^{−(N−1)})   (for N ≥ 2).
Hence, for N ≥ 2 the sequence (H^{N−1}(∂E_k))_k is bounded and E_k → E as k → +∞ in the sense of measures, that is, χ_{E_k} → χ_E in L¹. But

H^{N−1}(∂E) = +∞ > liminf_k H^{N−1}(∂E_k).
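Both bounds in the example are geometric series and can be checked numerically, here for N = 2 with ω_N = |B₁(0)| = π, using the standard surface-measure formula H^{N−1}(∂B_r) = N ω_N r^{N−1} (a sketch; the variable names are ours):

```python
import math

N = 2
omega_N = math.pi                 # Lebesgue measure of the unit ball in R^2

vol, per = 0.0, 0.0
for i in range(60):               # 60 terms already reach machine precision
    vol += omega_N * 2.0 ** (-i * N)            # |B_i| = omega_N (2^-i)^N
    per += N * omega_N * 2.0 ** (-i * (N - 1))  # H^1(boundary of B_i)

vol_limit = omega_N / (1.0 - 2.0 ** (-N))
per_limit = N * omega_N / (1.0 - 2.0 ** (-(N - 1)))
print(vol, vol_limit, per, per_limit)
```

So while the total boundary measure of every E_k stays below 4π, the boundary of the limit set E has infinite H¹ measure.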
Hence, for getting a well-defined imaging model we shall introduce a relaxed version of (3.19) for
which existence of solutions in the space of so-called special functions of bounded variation SBV (Ω)
[AFP00], defined below, can be proven.
Let us start the well-posedness discussion of (3.18) by rephrasing the problem within a more regular setting (cf. [CS05a]) than the one of the relaxed functional we will end up with. This provides better ground to motivate the idea the relaxation is built on. Let u be piecewise smooth, i.e. u is defined on a partition Ω = ( ∪_{k=1}^K Ω_k ) ∪ Γ with u|_{Ω_k} ∈ H¹(Ω_k) for k = 1, …, K, and let the edge set Γ be piecewise C¹. Then one can define a piecewise continuous normal field n(x) at each point x ∈ Γ that is part of a C¹ component, and with that the jump of u at x as

[u](x) = lim_{ρ→0⁺} ( u(x + ρ · n(x)) − u(x − ρ · n(x)) ),

where the limit for ρ → 0⁺ is taken in the trace sense. With that we can define a vectorial measure on Γ as

J_u := [u] n dH¹,   with J_u(γ) = ∫_γ [u](x) n(x) dH¹   for all γ ∈ B(Γ).
In this setting one can easily check that the restriction of the distributional derivative Du to the edge set Γ equals J_u. Moreover, on the components Ω_k where u is H¹ one has Du = ∇u ∈ L²(Ω_k), and hence

Du = ∇u|_{Ω\Γ} + J_u|_Γ.
This is our key observation: Instead of defining the edge set Γ separately from u we rather capture it
within the jump set of u, i.e. the set in which Du is singular and one-dimensional.
The problem with assuming Γ ∈ C¹ (or even piecewise C¹) is that this class does not provide us with any compactness property, and as a consequence no existence proof. Hence, we have to loosen our
assumption on Γ and prove existence in a less restrictive (i.e. larger) function space. This space will turn
out to involve the space of functions of bounded variation BV (Ω). This space provides us with sufficient
compactness and semicontinuity properties and gives sense to one dimensional discontinuities (edges)
of functions. The latter becomes clear when recalling some facts about functions of bounded variation
such as that the distributional derivative Du of a BV -function can be written as the sum of its absolute
continuous part ∇u dx, its jump part Ju and its Cantor part Cu , i.e.
Du = ∇u dx + (u+ − u− )~nu H1 |Su +Cu ,
{z
}
|
(3.20)
Ju
cf. the next paragraph for the details of this decomposition.
Lebesgue decomposition of Du Let u ∈ BV(Ω). Then, from the general theory of the Lebesgue decomposition of measures, cf. e.g. [AFP00, p. 14, Theorem 1.28], we have that

Du = ∇u dx + D^s u,

where ∇u(x) = d(Du)/dx ∈ L¹(Ω) is the absolutely continuous part of Du and D^s u ⊥ dx is the singular part of Du. The latter can be further decomposed into a jump part J_u and a Cantor part C_u, cf. [AFP00, Section
Du. The latter can be further decomposed into a jump part Ju and a Cantor part Cu , cf. [AFP00, Section
3.9]. Before we specify what these parts are exactly, we have to introduce some additional terminology
first.
For λ ∈ R, z ∈ Ω, and a small ρ > 0, we define the following subsets of the disc B_{z,ρ} = {x : |x − z| < ρ}:

{u > λ}_{z,ρ} := {x ∈ Ω ∩ B_{z,ρ} : u(x) > λ},   {u < λ}_{z,ρ} := {x ∈ Ω ∩ B_{z,ρ} : u(x) < λ}.
Definition 3.25. We call a function u essentially not greater than λ at a point x ∈ Ω, and write u(x) ≼ λ, if

lim_{ρ→0⁺} dx({u > λ}_{x,ρ}) / dx(B_{x,ρ}) = 0,

and analogously, u is essentially not smaller than λ at x, and we write u(x) ≽ λ, if

lim_{ρ→0⁺} dx({u < λ}_{x,ρ}) / dx(B_{x,ρ}) = 0.

Then, we define the approximate upper and lower limit of a measurable function u in Ω as

u⁺(x) := inf{λ ∈ R : u(x) ≼ λ},   u⁻(x) := sup{λ ∈ R : u(x) ≽ λ},

respectively.
For a function u ∈ L¹(Ω) we have

lim_{ρ→0} (1/dx(B_{x,ρ})) ∫_{B_{x,ρ}} |u(x) − u(y)| dy = 0   for a.e. x ∈ Ω.

Points x for which the above holds are called Lebesgue points of u; they have the properties

u(x) = lim_{ρ→0} (1/dx(B_{x,ρ})) ∫_{B_{x,ρ}} u(y) dy,   u(x) = u⁺(x) = u⁻(x).
The complement of the set of Lebesgue points (up to a set of H¹ measure zero) is called the jump set S_u, i.e.

S_u = {x ∈ Ω : u⁻(x) < u⁺(x)}.

The set S_u is countably rectifiable, and for H¹-a.e. x ∈ S_u we can define a normal n_u(x).
These considerations lead us to the decomposition (3.20) of the distributional derivative Du. The idea now is, similar to the C¹ case discussed before, to identify the edge set Γ with S_u. Hence, instead of (3.18) we minimise

E(u) = (1/2) ∫_Ω (u − f)² dx + (α/2) ∫_Ω |∇u|² dx + β H¹(S_u).
Solving this would allow us to eliminate the unknown Γ from the minimisation problem. The issue, however, is that we cannot do this in BV. The objectionable element of BV is the Cantor part C_u in the decomposition (3.20). For a function u ∈ BV this part may contain pathological examples such as the Cantor-Vitali function that make the minimisation problem ill-posed, cf. [Am89]. The Cantor-Vitali function is non-constant and continuous, but with approximate differential equal to zero almost everywhere. For such a function v we would have

E(v) = (1/2) ∫_Ω (v − f)² dx ≥ inf_{u ∈ BV(Ω)} E(u) = 0,
since BV functions are dense in L2 . But this means that the infimum cannot be achieved in general. To
exclude this case, we consider the space of special functions of bounded variation SBV (Ω), which is the
space of BV -functions such that Cu = 0. Then, our new problem that replaces (3.18) reads
min{ E(u) : u ∈ SBV(Ω) ∩ L^∞(Ω) }.        (3.21)
For the relaxed problem (3.21) we have the following existence result.
Theorem 3.26. Let f ∈ L^∞(Ω). Then the minimisation problem (3.21) admits a solution.
To prove Theorem 3.26 we will use the following compactness and closure results for SBV functions, cf. [AFP00, Section 4].
Theorem 3.27 (Closure of SBV). Let Ω ⊂ R^d be open and bounded and (u_n) ⊂ SBV(Ω) with

sup_n { ∫_Ω |∇u_n|² dx + ∫_{S_{u_n}} |u_n⁺ − u_n⁻| dH^{d−1} } < ∞.        (3.22)

If (u_n) weakly* converges in BV to u, then u ∈ SBV(Ω), ∇u_n weakly converges to ∇u in [L²(Ω)]^d, and D^j u_n weakly* converges to D^j u in Ω. Moreover,

∫_Ω |∇u|² dx ≤ liminf_{n→∞} ∫_Ω |∇u_n|² dx,
∫_{S_u} |u⁺ − u⁻| dH^{d−1} ≤ liminf_{n→∞} ∫_{S_{u_n}} |u_n⁺ − u_n⁻| dH^{d−1}.        (3.23)
Theorem 3.28 (Compactness of SBV). Let Ω ⊂ R^d be open and bounded, and (u_n) ⊂ SBV(Ω). Assume that u_n satisfies (3.22) and |u_n(x)| ≤ C for a.e. x ∈ Ω, for a constant C ≥ 0 and all n ≥ 1. Then, there exists a subsequence (u_{n(k)}) weakly* converging in BV(Ω) to u ∈ SBV(Ω) with |u(x)| ≤ C for a.e. x ∈ Ω.
Proof of Theorem 3.26. Let (u_n) ⊂ SBV(Ω) ∩ L^∞(Ω) be a minimising sequence of E. First, we convince ourselves that we can restrict the minimisation problem to functions u that are essentially bounded by C = ‖f‖_{L^∞}. This is because for ũ = max(min(u, C), −C), the truncated version of u, we have S_ũ ⊂ S_u and

α ∫_Ω |∇ũ|² dx + ∫_Ω |f − ũ|² dx ≤ α ∫_Ω |∇u|² dx + ∫_Ω |f − u|² dx.
Then, for such a minimising sequence we have the uniform bound

E(u_n) = (1/2) ∫_Ω (u_n − f)² dx + (α/2) ∫_Ω |∇u_n|² dx + β H¹(S_{u_n}) ≤ C,

for a constant C ≥ 0 and all n ≥ 1. By Theorem 3.28 we can find a subsequence (u_{n(k)}) that weakly* converges in BV to a u ∈ SBV(Ω) with |u(x)| ≤ C for a.e. x ∈ Ω. Moreover, by Theorem 3.27, ∇u_{n(k)} weakly converges to ∇u in (L²(Ω))^d and D^j u_{n(k)} weakly* converges to D^j u in Ω. Applying the lower semicontinuity properties (3.23) finishes the existence proof.
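The truncation step of the proof can be checked on a discrete 1D analogue of the energy: clipping a competitor to [−C, C] with C = ‖f‖_∞ never increases the squared finite differences (clipping is 1-Lipschitz) nor the fidelity. A sketch with made-up data; all names are ours:

```python
import random

def clip(x, C):
    return max(-C, min(C, x))

random.seed(1)
f = [random.uniform(-1, 1) for _ in range(200)]   # datum
u = [random.uniform(-3, 3) for _ in range(200)]   # arbitrary competitor
C = max(abs(x) for x in f)                        # C = ||f||_inf
ut = [clip(x, C) for x in u]                      # truncated competitor

def grad2(v):   # discrete stand-in for the gradient term
    return sum((a - b) ** 2 for a, b in zip(v[1:], v[:-1]))

def fid(v):     # discrete stand-in for the fidelity term
    return sum((a - b) ** 2 for a, b in zip(f, v))

alpha = 0.5
E_u = alpha * grad2(u) + fid(u)
E_t = alpha * grad2(ut) + fid(ut)
print(E_t, E_u)
```

Since the competitor exceeds the bound C on parts of its domain, the truncated version has strictly smaller energy here.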
Having established the existence theory for the relaxed problem (3.21), the question arises of what exactly the connection between the relaxed and the original formulation (3.18) is. To answer this we make use of the following theorem from [Am89a].
Theorem 3.29. Let Γ ⊂ Ω be a closed set such that Hd−1 (Γ) < ∞, and let u ∈ H 1 (Ω \ Γ) ∩ L∞ (Ω). Then,
u ∈ SBV (Ω) and Su ⊂ Γ ∪ P with Hd−1 (P ) = 0.
Hence,

min_u E(u) ≤ inf_{(u,Γ)} E(u, Γ).
Moreover, for a minimiser of E it is proven in [DCL89, DMS92, MS95] that H^{d−1}(Ω ∩ (S̄_u \ S_u)) = 0. Then, by choosing Γ = Ω ∩ S̄_u we get a solution of the original problem and

min_u E(u) = min_u E(u, Ω ∩ S̄_u).
Following the existence theory, there is a series of works concerned with the regularity of the edge set Γ, cf. e.g. [MS89, Bo96]. In practice the edge set Γ is mostly assumed to be at least Lipschitz continuous, in which case the Hausdorff measure as a regularity measure of Γ is replaced by the length of Γ as defined in (3.3).
4 PDEs in Imaging
In section 1.4 we have already seen that for processing images with PDE methods one should turn to
a nonlinear (or anisotropic) approach. In what follows we discuss two nonlinear PDEs that have been
proposed for image enhancement: the Perona-Malik model and the anisotropic diffusion PDEs studied
by Joachim Weickert.
4.1 Perona-Malik
One of the most important and classical nonlinear PDEs for image enhancement is the Perona-Malik
model, proposed by Perona and Malik in 1990 [PM90]. The basic idea behind this model is to use a
nonlinear diffusion that reduces diffusivity at those locations in the image that are more likely to be
edges. This likelihood for edges is measured by |∇u|, the size of the gradient of an image function u.
For a given image g, the Perona-Malik filter is based on the equation

u_t = div( c(|∇u|²) ∇u ),   u(x, t = 0) = g(x),        (4.1)
and it uses diffusivities such as

c(s²) = 1 / (1 + s²/λ²),   λ > 0.        (4.2)
Here λ is a given threshold that encodes the size of edges that should be preserved or enhanced by the
diffusion process. This equation applies diffusion and edge detection in one single process.
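A minimal explicit finite-difference sketch of (4.1) with the diffusivity (4.2) can be written with four-neighbour one-sided differences and homogeneous Neumann boundaries; the step size τ, the value of λ and the test image below are our own choices, not prescribed by the text:

```python
import numpy as np

def perona_malik(g, lam=0.1, tau=0.2, steps=100):
    """Explicit scheme for u_t = div(c(|grad u|^2) grad u) with the
    diffusivity c(s^2) = 1/(1 + s^2/lam^2) and Neumann boundaries."""
    c = lambda d: 1.0 / (1.0 + (d / lam) ** 2)
    u = g.astype(float).copy()
    for _ in range(steps):
        dN = np.roll(u, 1, axis=0) - u    # differences to the 4 neighbours
        dS = np.roll(u, -1, axis=0) - u
        dW = np.roll(u, 1, axis=1) - u
        dE = np.roll(u, -1, axis=1) - u
        dN[0, :] = dS[-1, :] = 0.0        # no flux across the image border
        dW[:, 0] = dE[:, -1] = 0.0
        u = u + tau * (c(dN) * dN + c(dS) * dS + c(dW) * dW + c(dE) * dE)
    return u

# noisy step edge: the filter should remove the noise but keep the jump
rng = np.random.default_rng(0)
g = np.zeros((64, 64))
g[:, 32:] = 1.0
noisy = g + 0.05 * rng.standard_normal(g.shape)
den = perona_malik(noisy, lam=0.2)
print(np.abs(den - g).mean(), np.abs(noisy - g).mean())
```

With τ ≤ 1/4 and c ≤ 1 the explicit update is a convex combination of neighbouring values, so the scheme satisfies a discrete maximum principle. Low-contrast noise (differences below λ) is diffused away, while the high-contrast edge (difference above λ) is preserved.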
Smoothing and edge enhancement in 1D Let us start studying the behaviour of (4.1) in one space
dimension. This simplifies notation and illustrates the main behaviour, since near a straight edge a two-dimensional image approximates a function of one variable. In that case, (4.1) can be rewritten as

u_t = ( V(u_x) )_x,

where – for our chosen diffusivity (4.2) – the flux function V is given by

V(s) = s · c(s²).
Then

V′(s) = c(s²) + s ( c(s²) )′,

with

( c(s²) )′ = ( 1 / (1 + s²/λ²) )′ = − (2s/λ²) / (1 + s²/λ²)²
gives

V′(s) = ( 1 / (1 + s²/λ²) ) ( 1 − (2s²/λ²) / (1 + s²/λ²) ),

where the subtracted fraction (2s²/λ²)/(1 + s²/λ²) is ≤ 1 if |s| ≤ λ.
Hence,

V′(s) ≥ 0   if |s| ≤ λ,
V′(s) < 0   if |s| > λ.

Now, we can rewrite (4.1) as

u_t = V′(u_x) · u_xx
and observe that the Perona-Malik model is of forward parabolic type if |ux | ≤ λ and of backward
parabolic type if |ux | > λ. This means that λ plays the role of a contrast parameter that separates
forward (low contrast) from backward (high contrast) diffusion areas, see Figure 12.
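The sign change of V′ at |s| = λ can be verified directly from the closed form above (a small numerical check; the value of λ is arbitrary):

```python
def c(s2, lam):
    """Diffusivity (4.2), as a function of s^2."""
    return 1.0 / (1.0 + s2 / lam ** 2)

def V(s, lam):
    """Flux V(s) = s * c(s^2)."""
    return s * c(s * s, lam)

def Vprime(s, lam):
    """Closed form of V'(s) derived above."""
    s2 = s * s
    return c(s2, lam) * (1.0 - (2.0 * s2 / lam ** 2) / (1.0 + s2 / lam ** 2))

lam = 1.5
for s in (0.5, 1.0, 1.5, 2.0, 5.0):
    print(s, Vprime(s, lam))   # positive below lam, zero at lam, negative above
```

A central-difference derivative of V agrees with the closed form, confirming the forward/backward diffusion threshold at |s| = λ.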
Figure 12: The Perona-Malik model performs forward- and backward diffusion in areas with low- and
high contrast respectively.
Smoothing and edge enhancement in 2D A similar argument can be made in two dimensions, where the diffusion dynamics (at points where |∇u| ≠ 0) can be divided into diffusion in normal and tangential direction to the level lines of u. In that case

u_t = div( c(|∇u|²) ∇u ) = c(|∇u|²)(u_xx + u_yy) + 2c′(|∇u|²)( u_x² u_xx + 2 u_x u_y u_xy + u_y² u_yy ).

Now, for each point x where |∇u(x)| ≠ 0 we can define the normal and tangential vectors N and T, respectively, as

N(x) = ∇u(x)/|∇u(x)|,   T(x) ⊥ N(x) with |T(x)| = 1.
Then, we can rewrite (4.1) in terms of diffusion in the N and T directions as

u_t = c(|∇u|²) u_{TT} + ( c(|∇u|²) + 2|∇u|² c′(|∇u|²) ) u_{NN},        (4.3)

where

u_{TT} = Tᵗ ∇²u T = (1/|∇u|²) ( u_x² u_yy + u_y² u_xx − 2 u_x u_y u_xy ),
u_{NN} = Nᵗ ∇²u N = (1/|∇u|²) ( u_x² u_xx + u_y² u_yy + 2 u_x u_y u_xy ).
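The decomposition (4.3) can be checked pointwise: for arbitrary values of the first and second derivatives, c·u_TT + (c + 2|∇u|²c′)·u_NN must agree with the expanded divergence c·Δu + 2c′(u_x²u_xx + 2u_xu_yu_xy + u_y²u_yy). A sketch using the diffusivity (4.2), with randomly chosen derivative values:

```python
import random

def check_decomposition(lam=1.0, trials=100):
    """Verify (4.3) against the expanded div(c(|grad u|^2) grad u)."""
    random.seed(2)
    ok = True
    for _ in range(trials):
        ux, uy, uxx, uyy, uxy = (random.uniform(-2, 2) for _ in range(5))
        q = ux ** 2 + uy ** 2                    # |grad u|^2
        if q < 1e-8:
            continue
        cq = 1.0 / (1.0 + q / lam ** 2)          # c(q) from (4.2)
        cp = -(1.0 / lam ** 2) / (1.0 + q / lam ** 2) ** 2   # c'(q)
        lhs = cq * (uxx + uyy) + 2 * cp * (ux ** 2 * uxx
                                           + 2 * ux * uy * uxy
                                           + uy ** 2 * uyy)
        uTT = (ux ** 2 * uyy + uy ** 2 * uxx - 2 * ux * uy * uxy) / q
        uNN = (ux ** 2 * uxx + uy ** 2 * uyy + 2 * ux * uy * uxy) / q
        rhs = cq * uTT + (cq + 2 * q * cp) * uNN
        ok = ok and abs(lhs - rhs) < 1e-10
    return ok

print(check_decomposition())
```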
Directional smoothing in 2D From the representation of the diffusion dynamics in terms of normal and tangential directions in (4.3), it might come to mind to choose c in such a way that we smooth more in the tangential direction T (that is, along edges) and less in the normal direction N. This could be imposed by choosing c such that

lim_{s→+∞} ( c(s) + 2s c′(s) ) / c(s) = 0,   or   lim_{s→+∞} s c′(s) / c(s) = −1/2.

Restricting ourselves to functions c(s) > 0 with power growth, this limit implies

c(s) ≈ 1/√s   as s → +∞.

One example of such a c is

c(s) = 1/√(1 + s).
A comment on the well-posedness of Perona-Malik Indeed for most interesting choices of c, such as
(4.2), the Perona-Malik equation is ill-posed (due to the backward diffusion). This intrinsic ill-posedness
of the equation is what makes it work for edge enhancement. A way to derive a well-posed model from
the Perona-Malik equation is to regularise the nonlinearity in (4.1). An example for that is the work of
[CLMC92], where the authors consider
u_t = div( c(|∇(G_σ ∗ u)|²) ∇u ),   u(x, t = 0) = g(x),
and prove the following theorem.
Theorem 4.1. Let φ(s) = c(s²), with φ : R⁺ → R⁺ smooth, decreasing, φ(0) = 1, lim_{s→+∞} φ(s) = 0, and s ↦ φ(√s) smooth. If g ∈ L²(Ω), then there exists a unique function u(t, x) ∈ C([0, T]; L²(Ω)) ∩ L²(0, T; H¹(Ω)) satisfying in the distributional sense

u_t = div( φ(|∇(G_σ ∗ u)|) ∇u )   in (0, T) × Ω,
∇u · ν = 0                        on (0, T) × ∂Ω,
u(x, t = 0) = g(x)                in Ω.

Moreover,

‖u‖_{L^∞((0,T);L²(Ω))} ≤ ‖g‖_{L²},

and u ∈ C^∞((0, T) × cl(Ω)).
4.2 Anisotropic diffusion filters
In the Perona-Malik model (4.1) the diffusion is weighted by a scalar diffusivity c that only determines the strength of the diffusion but cannot change its direction. In this sense, the diffusion is nonlinear but isotropic. In this section we shall derive another image enhancement method, based on an anisotropic diffusion equation that takes into account local variations of the gradient orientation and can assign the diffusion a preferred orientation.
A natural idea is to choose as preferred smoothing direction d the one that minimises grey value fluctuations. We consider d(θ) = (cos θ, sin θ) and the fluctuation measure

F(θ) = |d(θ) · ∇u(x)|²,

which is maximal if d ∥ ∇u and minimal if d ⊥ ∇u.
Now, minimising/maximising F(θ) is equivalent to minimising/maximising the quadratic form

dᵗ ( ∇u (∇u)ᵗ ) d,

where

∇u (∇u)ᵗ = ( u_x²     u_x u_y )
            ( u_x u_y  u_y²    )

is positive semidefinite with eigenvalues

λ₁ = |∇u|²,   λ₂ = 0,

and an orthonormal basis of eigenvectors

v₁ ∥ ∇u,   v₂ ⊥ ∇u.
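The eigenstructure of ∇u(∇u)ᵗ follows from the trace/determinant formulas for 2×2 symmetric matrices, since the matrix has rank one; a quick check with random gradient values (variable names are ours):

```python
import random

random.seed(3)
ux, uy = random.uniform(-2, 2), random.uniform(-2, 2)
# entries of grad u (grad u)^T; its determinant vanishes (rank one)
a, b, d = ux * ux, ux * uy, uy * uy
tr, det = a + d, a * d - b * b
disc = (tr * tr - 4 * det) ** 0.5
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
print(lam1, lam2)   # lam1 = |grad u|^2, lam2 = 0
```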
Having this, the idea is to define at each point x in the image an orientation descriptor as a function of (∇u (∇u)ᵗ)(x). The problem with this idea, though, is that this function constitutes a pointwise estimate, which does not take into account possible information contained in a neighbourhood of x (and as such makes this descriptor very sensitive to spurious changes in the image). To tackle this shortcoming, Weickert in [Wei98] proposed introducing smoothing kernels at different scales into the orientation descriptor – the result he called the structure tensor of the image. In sketch form his approach consists of the following steps.
The structure tensor:
• To avoid false detections due to noise, the image u is first convolved with a Gaussian kernel, that is,
  u_σ(x) := (G_σ ∗ u)(x),   σ ≥ 0.
• Further, local orientation information is averaged by building the so-called structure tensor
  J_ρ(∇u_σ) := G_ρ ∗ ( ∇u_σ (∇u_σ)ᵗ ),   ρ ≥ 0.
• Interpreting the information in J_ρ in terms of its eigenvalues and eigenvectors, one derives a way of manipulating the structure tensor. Namely, with J_ρ(∇u_σ) = (j_{ij}), a pair of orthonormal eigenvectors v₁, v₂ is given by

  v₁ ∥ ( 2 j₁₂ , j₂₂ − j₁₁ + √( (j₂₂ − j₁₁)² + 4 j₁₂² ) ),   v₂ ⊥ v₁,

  with corresponding eigenvalues

  μ₁ = (1/2) ( j₁₁ + j₂₂ + √( (j₁₁ − j₂₂)² + 4 j₁₂² ) ),
  μ₂ = (1/2) ( j₁₁ + j₂₂ − √( (j₁₁ − j₂₂)² + 4 j₁₂² ) ).

  Then, the eigenvalues μ₁ and μ₂ describe the average contrast of the smoothed image function u_σ within a neighbourhood of size O(ρ), and the eigenvectors v₁ and v₂ the orientation that maximises grey value fluctuations within this neighbourhood (≈ ∥ ∇u_σ) and its orthogonal (the preferred direction of smoothing), respectively. Then, writing J_ρ(∇u_σ) = μ₁ v₁v₁ᵗ + μ₂ v₂v₂ᵗ, the eigenvalues encode the weighting of each eigendirection in J_ρ. In particular, the eigenvalues μ₁ and μ₂ convey shape information in the form

  μ₁(x) ≈ μ₂(x)        image has isotropic structure at x,
  μ₁(x) ≫ μ₂(x) ≈ 0    image has line-like structure at x,
  μ₁(x) ≥ μ₂(x) ≫ 0    an object edge forms a corner at x.
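The steps above can be sketched numerically. In the sketch below a crude box filter stands in for both Gaussians G_σ and G_ρ, and the eigenvalues are evaluated with the closed formulas; for an image of straight vertical stripes one expects μ₁ ≫ μ₂ ≈ 0 (line-like structure). All helper names are ours:

```python
import numpy as np

def box_smooth(a, r):
    """Crude local averaging, standing in for the Gaussian G_rho."""
    k = np.ones(2 * r + 1) / (2 * r + 1)
    a = np.apply_along_axis(lambda v: np.convolve(v, k, "same"), 0, a)
    return np.apply_along_axis(lambda v: np.convolve(v, k, "same"), 1, a)

def structure_tensor(u, r=3):
    """Entries j11, j12, j22 of J_rho(grad u)."""
    uy, ux = np.gradient(u)
    return box_smooth(ux * ux, r), box_smooth(ux * uy, r), box_smooth(uy * uy, r)

def eigenvalues(j11, j12, j22):
    """Closed-form eigenvalues mu1 >= mu2 of the 2x2 structure tensor."""
    s = np.sqrt((j11 - j22) ** 2 + 4 * j12 ** 2)
    return 0.5 * (j11 + j22 + s), 0.5 * (j11 + j22 - s)

# vertical stripes: a line-like structure, gradients point in x only
n = 64
u = np.tile(np.sin(0.5 * np.arange(n)), (n, 1))
j11, j12, j22 = structure_tensor(u)
mid = n // 2
mu1, mu2 = eigenvalues(j11[mid, mid], j12[mid, mid], j22[mid, mid])
coh = (mu1 - mu2) ** 2
print(mu1, mu2, coh)
```

Here μ₂ vanishes because all rows are identical (no variation along the stripes), while μ₁ picks up the averaged squared gradient across them.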
Now, using the structure descriptor, the generic form of the anisotropic, nonlinear diffusion equation reads

u_t = div( D(J_ρ(∇u_σ)) ∇u ),

where D is a diffusion tensor that depends on the structure tensor J_ρ and can be chosen with respect to the imaging task at hand. Roughly speaking, the eigenvectors of D should reflect the local image structure. Hence, a good choice is to take them to be the same orthonormal basis of eigenvectors as one gets from J_ρ. Moreover, the choice of the eigenvalues λ₁ and λ₂ of D depends on the desired goal. We will discuss one example below. For more details on image enhancement with anisotropic diffusion and other examples for D, see [Wei98].
Coherence-enhancing diffusion This type of anisotropic diffusion is designed to enhance flow-like (line-like) structures in an image function u, with the potential to even repair broken line-like structures by an appropriate choice of ρ. To describe this approach, we first define the so-called coherence of an image u as

coh := (μ₁ − μ₂)²,

where μ₁ and μ₂ are the eigenvalues of J_ρ(∇u_σ), and coh is taken as an indicator for line-like structures in u; that is, the larger coh(x), the more likely it is – we assume – that a line-like structure passes through x.
With this assumption, one chooses a diffusion tensor D with the eigenvalues

λ₁ = α,
λ₂ = α   if μ₁ = μ₂,   and   λ₂ = α + (1 − α) exp( −1/(μ₁ − μ₂)² )   otherwise,

where α ∈ (0, 1).
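The eigenvalue choice can be sketched directly; the constant 1 in the exponent and the value of α below are illustrative:

```python
import math

def d_eigenvalues(mu1, mu2, alpha=0.001):
    """Eigenvalues of the diffusion tensor D for coherence enhancement."""
    lam1 = alpha
    if mu1 == mu2:
        lam2 = alpha                   # no preferred direction
    else:
        lam2 = alpha + (1.0 - alpha) * math.exp(-1.0 / (mu1 - mu2) ** 2)
    return lam1, lam2

iso = d_eigenvalues(0.3, 0.3)    # isotropic region: hardly any diffusion
line = d_eigenvalues(5.0, 0.0)   # coherent region: strong diffusion along v2
print(iso, line)
```

In regions with high coherence λ₂ approaches 1, so the diffusion acts almost exclusively along the flow direction v₂, which is what repairs interrupted line-like structures.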
References
[Am89] L. Ambrosio, Variational problems in SBV and image segmentation, Acta Applicandae Mathematicae, 17, pp. 1–40, 1989.
[Am89a] L. Ambrosio, A compactness theorem for a new class of functions of bounded variation, Bollettino della Unione Matematica Italiana, VII (4), pp. 857–881, 1989.
[AFP00] L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems., Oxford Mathematical Monographs. Oxford: Clarendon Press. xviii, 2000.
[Al07] W. K. Allard, Total Variation Regularization for Image Denoising, I. Geometric Theory, SIAM J. Math.
Anal. 39 (2007), 1150–1190.
[ACC05] F. Alter, V. Caselles and A. Chambolle, A characterization of convex calibrable sets in RN , Math.
Ann. 332 (2005), 329–366.
[ACC05a] F. Alter, V. Caselles and A. Chambolle, Evolution of characteristic functions of convex sets in the
plane by the minimizing total variation flow, Interfaces Free Bound. 7 (2005), 29–53.
[AK06] G. Aubert, and P. Kornprobst, Mathematical Problems in Image Processing. Partial Differential Equations and the Calculus of Variations, Springer, Applied Mathematical Sciences, Vol 147, 2006.
[BSCB00] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, Image Inpainting, Siggraph 2000, Computer Graphics Proceedings, pp.417–424, 2000.
[BZ87] A. Blake, and A. Zisserman, Visual Reconstruction, MIT Press, Cambridge, MA, 1987.
[Bo96] A. Bonnet, On the regularity of the edge set of Mumford-Shah minimizers, Progress in Nonlinear
Differential Equations, 25, p. 93–103, 1996.
[BL11] K. Bredies, and D. Lorenz, Mathematische Bildverarbeitung, textbook (in German), Vieweg+Teubner, 445 pages, 2011.
[CLMC92] F. Catté, P.-L. Lions, J.-M. Morel, and T. Coll, Image Selective Smoothing and Edge Detection by
Nonlinear Diffusion, SIAM J. Numer. Anal. 29 (1), pp. 182–193, 1992.
[Ch04] A. Chambolle, An algorithm for total variation minimization and applications, Journal of Mathematical Imaging and Vision, 20(1-2), pp. 89–97, 2004.
[CCCNP10] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, An Introduction to Total Variation for Image Analysis, Chapter in Theoretical Foundations and Numerical Methods for Sparse
Recovery, Ed. Massimo Fornasier, De Gruyter, 2010.
[CKS02] T.F. Chan, S.H. Kang, and J. Shen, Euler’s elastica and curvature-based inpainting, SIAM J. Appl.
Math., Vol. 63, Nr.2, pp.564–592, 2002.
[CM99] T.F. Chan, and P. Mulet, On the convergence of the lagged diffusivity fixed point method in total
variation image restoration, SIAM Journal on Numerical Analysis, 36(2), pp. 354–367, 1999.
[CS01a] T. F. Chan and J. Shen, Mathematical models for local non-texture inpaintings, SIAM J. Appl. Math.,
62(3):1019–1043, 2001.
[CS01c] T. F. Chan and J. Shen, Non-texture inpainting by curvature driven diffusions (CDD), J. Visual Comm. Image Rep., 12(4):436–449, 2001.
[CS05a] T. F. Chan, and J. J. Shen, Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, SIAM, 2005.
[DCL89] E. De Giorgi, M. Carriero, and A. Leaci. Existence theorem for a maximum problem with a free
discontinuity set, Archive for Rational Mechanics and Analysis, 108, p. 195–218, 1989.
[DMS92] G. Dal Maso, J.-M. Morel, and S. Solimini, A variational method in image segmentation: Existence
and approximation results, Acta Math., 168, p. 89–151, 1992.
[GG84] S. Geman, and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of
images, IEEE Trans. Pattern Anal. Machine Intell. 6 (1984), pp. 721-741.
[HUL93] J.B. Hiriart-Urruty, and C. Lemaréchal, Convex Analysis and Minimization Algorithms: Part 1:
Fundamentals, Vol. 305. Springer, 1993.
[L11] A. Langer, Subspace Correction and Domain Decomposition Methods for Total Variation Minimization,
Doctoral thesis, Johannes Kepler University of Linz, July 2011. http://people.ricam.oeaw.
ac.at/a.langer/publications/PhD_thesis.pdf.
[Ma98] S. Masnou, Filtrage et désocclusion d'images par méthodes d'ensembles de niveau, Thèse de doctorat à l'Université Paris-Dauphine, 1998.
[MM98] S. Masnou and J. Morel, Level Lines based Disocclusion, 5th IEEE Int’l Conf. on Image Processing,
Chicago, IL, Oct. 4-7, 1998, pp.259–263, 1998.
[MS95] J.-M. Morel, and S. Solimini, Variational methods in image segmentation. Progress in Nonlinear
Differential Equations and Their Applications, Vol. 14, Birkhäuser, Boston, 1995.
[MS89] D. Mumford, and J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Comm. Pure Applied Math. 42, pp. 577-685, 1989.
[NMS93] M. Nitzberg, D. Mumford, and T. Shiota, Filtering, Segmentation, and Depth, Springer-Verlag,
Lecture Notes in Computer Science, 662, 1993.
[PM90] P. Perona, and J. Malik, Scale-space and Edge Detection Using Anisotropic Diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), pp.629-639, July 1990.
[RO94] L. Rudin and S. Osher, Total variation based image restoration with free local constraints, Proc. 1st
IEEE ICIP, 1, pp. 31–35, 1994.
[ROF92] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60, pp. 259-268, 1992.
[VO96] C.R. Vogel, and M.E. Oman, Iterative methods for total variation denoising, SIAM Journal on Scientific Computing, 17(1), pp. 227–238, 1996.
[Wei98] J. Weickert, Anisotropic diffusion in image processing, Teubner, Stuttgart, Germany, 1998.
```