Limiting aspects of non-convex TVϕ models
Michael Hintermüller∗, Tuomo Valkonen†, and Tao Wu‡
Abstract. Recently, non-convex regularisation models have been introduced in order to provide a better prior for gradient distributions in real images. They are based on using concave energies ϕ in the total variation type functional TV^ϕ(u) := ∫ ϕ(|∇u(x)|) dx. In this paper, it is demonstrated that for typical choices of ϕ, functionals of this type pose several difficulties when extended to the entire space of functions of bounded variation, BV(Ω). In particular, if ϕ(t) = t^q for q ∈ (0, 1) and TV^ϕ is defined directly for piecewise constant functions and extended via weak* lower semicontinuous envelopes to BV(Ω), then still TV^ϕ(u) = ∞ for u not piecewise constant. If, on the other hand, TV^ϕ is defined analogously via continuously differentiable functions, then TV^ϕ ≡ 0 (!). We study a way to remedy the models through additional multiscale regularisation and area-strict convergence, provided that the energy ϕ(t) = t^q is linearised for high values. The fact that energies of this kind actually better match reality and improve reconstructions is demonstrated by statistics and numerical experiments.
AMS subject classifications. 26B30, 49Q20, 65J20
Key words. total variation, non-convex, regularisation, area-strict convergence, multiscale analysis.
1. Introduction. Recently introduced non-convex total variation models are based on employing concave energies ϕ in discrete versions of functionals of the form

    TV^ϕ_c(u) := ∫_Ω ϕ(|∇u(x)|) dx,    (u ∈ C^1(Ω)),    (1.1)
which we call the continuous model, or
    TV^ϕ_d(u) := ∫_{J_u} ϕ(|u^+(x) − u^−(x)|) dH^{m−1}(x),    (u piecewise constant),    (1.2)
which we call the discrete model. Here Ω ⊂ R^m is our image domain, and J_u is the jump set of u, where the one-sided traces u^± from different sides of J_u differ. The typical energies include, in particular, ϕ(t) = t^q for q ∈ (0, 1). The models based on discretisations of (1.2) have
been proposed for the promotion of piecewise constant (cartoon-like) images [16, 23, 33, 34],
whereas models based on discretisations of (1.1) have been proposed for the better modelling
of gradient distributions in real-life images [27–29, 36]. To denoise an image z, one may then
solve the nonconvex Rudin-Osher-Fatemi type problem
    min_u (1/2)‖z − u‖²_{L²(Ω)} + α TV^ϕ(u)    (1.3)

∗ Institute for Mathematics, Humboldt University of Berlin, Germany. E-mail: [email protected]
† Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom (current), & Center for Mathematical Modeling, Escuela Politécnica Nacional, Quito, Ecuador. E-mail: [email protected], corresponding author.
‡ Institute for Mathematics and Scientific Computing, University of Graz, Austria. E-mail: [email protected]
for TV^ϕ = TV^ϕ_c or TV^ϕ = TV^ϕ_d. Observe that (1.1) is only defined rigorously for differentiable
functions. In contrast to (1.2), it is in particular not defined for piecewise constant discretisations, or images with discontinuities. The functional has to be extended to the whole space of
functions of bounded variation denoted by BV(Ω), see [24] for its definition, in order to obtain
a sound model in the non-discretised setting. Alternatively, we may take (1.2), defined for
piecewise constant functions, as the basis and extend it to continuous functions. We will study
the extension of both models (1.1) and (1.2) to BV(Ω). We demonstrate that (1.1) in particular has severe theoretical difficulties for typical choices of ϕ. We also demonstrate that some
of these difficulties can be overcome by altering the model to better match reality, although
we also need additional multiscale regularisation in the model for theoretical purposes.
It is worth noting that in the finite-dimensional setting that has mainly been considered in
the aforementioned previous works, such problems with well-posedness do not surface. This
can also be seen by adaptation of our results, minding that in a finite-dimensional subspace
of BV(Ω), weak* and strong convergence are equivalent. Infinite-dimensional models, while
more challenging, can be more informative, and lead to better understanding of the model—
both finite- and infinite-dimensional. Concepts such as smoothness and jumps, are, after all,
not well-defined in the discrete setting. Study of these and other properties in the infinite-dimensional setting can provide valuable information about the behaviour of the associated
image processing approaches and their solution approaches [1, 7, 10, 13, 14, 20, 46, 47].
Let us consider the specific difficulties with the discrete model TV^ϕ_d first. We assume that we have a regularly spaced grid Ω_h ⊂ Ω ∩ hZ^m, (h > 0), and a function u_h : Ω_h → R. By {e_i}_{i=1}^m we denote the canonical orthonormal basis of R^m. Then we identify u_h with a function u that is constant on each cell k + [0, h]^m, (k ∈ Ω_h). Accordingly, we have

    TV^ϕ_d(u_h) := Σ_{k∈Ω_h} Σ_{i=1}^m h^{m−1} ϕ(|u_h(k + he_i) − u_h(k)|).    (1.4)
This discrete expression with h = 1 is essentially what is studied in [16, 33, 34], although [33]
studies also more general discrete models. In the function space setting, this model has to be
extended to all of BV(Ω), in particular to smooth functions. The extension naturally has to
be lower semicontinuous in a suitable topology, in order to guarantee the existence of solutions
to (1.3). Therefore, one is naturally confronted with the question of whether such an extension can be performed meaningfully.
Let us consider a simple motivating example on Ω = (0, 1) with ϕ(t) = t^q for q ∈ (0, 1). We aim to approximate the ramp function

    u(t) = t

by piecewise constant functions. Given k > 0, we thus define

    u_k(t) = i/k,    for t ∈ [(i − 1)/k, i/k) and i ∈ {1, . . . , k}.

Clearly, u_k converges strongly to u in L^1(Ω). Using the discrete model (1.4) with h = 1/k, one has

    TV^q_d(u_k) = Σ_{i=1}^k (1/k)^0 · (1/k)^q = k^{1−q}.
We see that lim_{k→∞} TV^q_d(u_k) = ∞! This suggests that the TV^q model based on the "discrete" functional might only allow piecewise constant functions as solutions. In other words, TV^q_d would induce pronounced staircasing – a property desirable when restoring piecewise constant images, but less suitable for other applications. In §3, we will indeed demonstrate that for any u with TV^q_d(u) < ∞, either u is piecewise constant, or u ∉ BV(Ω).
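The blow-up rate k^{1−q} is easy to verify numerically. The following sketch (our own illustration, not code from the paper) evaluates the one-dimensional instance of (1.4) for the staircase approximations u_k with ϕ(t) = t^q:

```python
# One-dimensional discrete model (1.4) with m = 1, so h^{m-1} = 1:
# the staircase u_k has k jumps of height 1/k, giving
# TV^q_d(u_k) = k * (1/k)^q = k^{1-q}, which blows up for q in (0,1).

def tv_q_d_ramp(k: int, q: float) -> float:
    """Sum of phi(|jump|) over the k jumps of the staircase u_k."""
    jump = 1.0 / k
    return sum(jump ** q for _ in range(k))

for k in (10, 100, 1000):
    print(k, tv_q_d_ramp(k, q=0.5))  # grows like k^{1/2}
```

For q = 1 the same sum returns TV(u_k) = 1 for every k, recovering the convex total variation, which stays bounded.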
In order to highlight the inherent difficulties, let us then consider the continuous model TV^ϕ_c, directly given by (1.1) for differentiable functions. In particular, (1.1) also serves as a definition of TV^ϕ_c for continuous piecewise affine discretisations of u ∈ C^1(Ω). We observe that if u ∈ C^1(Ω) on a bounded domain Ω, and we set u_h(k) = u(k) for k ∈ Ω_h, then

    TV^ϕ_{c,h}(u_h) := Σ_{k∈Ω_h} h^m ϕ(|∇_h u_h(k)|),    with    [∇_h u_h(k)]_i := (u_h(k + he_i) − u_h(k))/h,    (1.5)

satisfies

    lim_{h↘0} TV^ϕ_{c,h}(u_h) = TV^ϕ_c(u).

This approximate model TV^ϕ_{c,h} with h = 1 is essentially what is considered in [27, 28, 36]. On an abstract level, it is also covered by [33]. The question now is whether the definition of TV^ϕ_c can be extended to functions of bounded variation in a meaningful manner.
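The consistency claim lim_{h↘0} TV^ϕ_{c,h}(u_h) = TV^ϕ_c(u) for smooth u can be checked numerically. The sketch below (our own construction with hypothetical names, not the paper's code) evaluates the forward-difference sum (1.5) on Ω = (0, 1)² for u(x, y) = x² and ϕ(t) = √t, and compares it with the integral ∫_Ω √(2x) dx dy = 2√2/3:

```python
# Discretisation (1.5) on Omega = (0,1)^2 with forward differences.
# For the smooth u(x,y) = x^2 and phi(t) = sqrt(t), the grid sum should
# approach the integral of sqrt(|grad u|) = sqrt(2x), i.e. 2*sqrt(2)/3.

import math

def tv_phi_ch(u, phi, h: float) -> float:
    n = int(round(1.0 / h))
    total = 0.0
    for i in range(n - 1):        # skip the last row/column so that the
        for j in range(n - 1):    # forward difference stays inside the grid
            x, y = i * h, j * h
            gx = (u(x + h, y) - u(x, y)) / h
            gy = (u(x, y + h) - u(x, y)) / h
            total += h ** 2 * phi(math.hypot(gx, gy))
    return total

u = lambda x, y: x ** 2
phi = math.sqrt
approx = tv_phi_ch(u, phi, h=0.005)
exact = 2.0 * math.sqrt(2.0) / 3.0
print(approx, exact)  # the discrepancy shrinks as h decreases
```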
To start our investigation, let us try to approximate on Ω = (−1, 1) the step function

    u(t) = 0 for t < 0,    u(t) = 1 for t ≥ 0.

Given k > 0, we define

    u^k(t) = 0 for t < −1/k,    u^k(t) = 1 for t ≥ 1/k,    u^k(t) = (kt + 1)/2 for t ∈ [−1/k, 1/k).

Then u^k → u in L^1(Ω). However, the continuous model (1.1) with ϕ(t) = t^q for q ∈ (0, 1) gives

    TV^q_c(u^k) = (2/k) · (k/2)^q = (2/k)^{1−q}.

Thus TV^q_c(u^k) → 0 as k ↗ ∞. This suggests that any extension of TV^q_c to u ∈ BV(Ω) through weak* lower semicontinuous envelopes will have TV^q_c(u) = 0, and that jumps in u are mapped to 0 in general. In §4 we will prove this rigorously and establish even more striking properties. A weak* lower semicontinuous extension will necessarily satisfy TV^q_c ≡ 0.
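The decay (2/k)^{1−q} can again be checked numerically; the sketch below (our own names, not the paper's code) evaluates both the closed form and a midpoint Riemann sum of ∫_Ω ϕ(|∇u^k(t)|) dt:

```python
# Continuous model (1.1) on the ramped step u^k: slope k/2 on an interval of
# width 2/k, zero elsewhere, so with phi(t) = t^q (and phi(0) = 0)
#   TV^q_c(u^k) = (2/k) * (k/2)^q = (2/k)^{1-q}  ->  0  as  k -> infinity.

def tv_q_c_step(k: int, q: float) -> float:
    """Closed form of the integral for the piecewise affine u^k."""
    return (2.0 / k) * (k / 2.0) ** q

def tv_q_c_step_riemann(k: int, q: float, n: int = 100000) -> float:
    """Midpoint Riemann sum of phi(|du^k/dt|) over Omega = (-1, 1)."""
    h = 2.0 / n
    total = 0.0
    for j in range(n):
        t = -1.0 + (j + 0.5) * h
        slope = k / 2.0 if -1.0 / k <= t < 1.0 / k else 0.0
        total += (slope ** q if slope > 0 else 0.0) * h
    return total

for k in (2, 20, 200):
    print(k, tv_q_c_step(k, 0.5), tv_q_c_step_riemann(k, 0.5))
```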
Despite this discouraging property, after discussing in §5 the implications of the above-mentioned results, we find appropriate remedies. Our principal approach is given in §6. It utilizes the (stronger) notion of area-strict convergence [18, 30], which – as will be shown – can be obtained using the multiscale analysis functional η from [44, 45]. In §7 we also discuss alternative remedies which are related to compact operators and the space SBV(Ω) of special functions of bounded variation. In order to keep the flow of the paper, the pertinent proofs are relegated to the Appendix.
To show existence of solutions to the fixed TV^ϕ_c model involving area-strict convergence, we require that ϕ is level coercive, i.e. lim_{t→∞} ϕ(t)/t > 0. This induces a linear penalty on
edges in the image. Based on these considerations, one arrives at the question of whether gradient
statistics, such as the ones in [29], are reliable in dictating the prior term (regularizer). Our
experiments on natural images in §8 suggest that this is not the case. In fact, the jump part
of the image appears to have different statistics from the smooth part. It seems that the
conventional TV regularization [40] provides a model for the jump part, which is superior to
the nonconvex TV-model. This statistically validates our model, which is also suitable for a
function space setting. Our rather theoretical starting point of making the TV^ϕ_c model sound in function space therefore leads to improved practical models. Finally, in §9 we study image
denoising with this model, and finish with conclusions in §10. We however begin with notation
and other preliminary matters in the following §2.
2. Notation and preliminaries. We denote the set of non-negative reals as R_{0,+} := [0, ∞). If ϕ : R_{0,+} → R_{0,+}, then we write

    ϕ_0 := lim_{t↘0} ϕ(t)/t    and    ϕ^∞ := lim_{t↗∞} ϕ(t)/t,

implicitly assuming that the (possibly infinite) limits exist.
The notation ‖x‖, without explicit specification of the space or type of norm, stands for the L^2-norm (in finite dimensions, the 2-norm).
We write the boundary of a set A as ∂A, and the closure as Ā. The open ball of radius ρ centred at x ∈ R^m is denoted by B(x, ρ).
We now introduce some measure theory. For details of the definitions, we refer to textbooks
including [4, 21, 41]. For Ω ⊂ Rm , we denote the space of (signed) Radon measures on Ω by
M(Ω), and the space of Rm -valued Radon measures by M(Ω; Rm ). We use the notation |µ|
for the total variation measure of µ = (µ1 , . . . , µm ) ∈ M(Ω; Rm ), and define the total variation
(Radon) norm of µ by

    ‖µ‖_{M(Ω;R^m)} := |µ|(Ω) := sup { Σ_{i=1}^m ∫_Ω ϕ_i(x) dµ_i(x) : ϕ = (ϕ_1, . . . , ϕ_m) ∈ C_c^∞(Ω; R^m), sup_{x∈Ω} ‖ϕ(x)‖ ≤ 1 }.    (2.1)
Here C_c^∞(Ω; R^m) denotes the set of R^m-valued smooth (infinitely differentiable) functions ϕ with compact support supp ϕ ⋐ Ω. If µ = f L^m corresponds to a function f ∈ L^1(Ω), then the definition simplifies to ‖µ‖_{M(Ω)} = ‖f‖_{L^1(Ω)}.
For a measurable set A, we denote by µ⌞A the restricted measure defined by (µ⌞A)(B) := µ(A ∩ B). The restriction of a function u to A is denoted by u|A. On any given ambient space R^m, (k ≤ m), we write H^k for the k-dimensional Hausdorff measure, and L^m for the Lebesgue measure. We also define the Dirac δ-measure at x by

    δ_x(A) := 1 if x ∈ A,    δ_x(A) := 0 if x ∉ A.
Roughly, in the case m = 2 and k = 1, which is of most interest for us, L2 measures the
area of sets in R2 , while H1 measures the length of (collections of) curves embedded in R2 .
The Dirac δ measures membership of individual points, and is different from H0 , which would
count the number of points in a set.
If J ⊂ R^m and there exist Lipschitz maps γ_i : R^{m−1} → R^m with

    H^{m−1}( J \ ⋃_{i=1}^∞ γ_i(R^{m−1}) ) = 0,

then we say that J is countably H^{m−1}-rectifiable. Again, with m = 2, this means roughly that
J is contained in a collection of Lipschitz curves, modulo a negligible set. It may, however,
happen, that J is merely a point cloud contained in the curves, and not a full curve itself.
We say that a function u : Ω → R on an open domain Ω ⊂ R^m is of bounded variation (see, e.g., [4] for a thorough introduction), denoted u ∈ BV(Ω), if u ∈ L^1(Ω), and the distributional gradient Du, given by

    Du(ϕ) := −∫_Ω u(x) div ϕ(x) dx,    (ϕ ∈ C_c^∞(Ω; R^m)),

is a Radon measure. This is equivalent to asking that

    TV(u) := |Du|(Ω) < ∞,
where the total variation of u is defined by setting µ = Du in (2.1). We can then decompose
Du into

    Du = ∇u L^m + D^j u + D^c u,

where ∇u L^m is called the absolutely continuous part, D^j u the jump part, and D^c u the Cantor part. We also denote the singular part by

    D^s u := D^j u + D^c u.
The density ∇u ∈ L1 (Ω; Rm ) corresponds to the classical gradient if u is differentiable. The
jump part may be written as

    D^j u = (u^+ − u^−) ⊗ ν_{J_u} H^{m−1}⌞J_u,

where the jump set J_u is countably H^{m−1}-rectifiable, ν_{J_u}(x) is its normal, and u^+ and u^− are one-sided traces of u on J_u. These are defined by the condition

    lim_{ρ↘0} ρ^{−m} ∫_{B^±(x,ρ,ν)} |u^±(x) − u(y)| dy = 0

being satisfied for some normal vector ν ∈ R^m and the associated half-ball

    B^±(x, ρ, ν) := {y ∈ B(x, ρ) : ±⟨y − x, ν⟩ ≥ 0}.
The remaining Cantor part Dc u of Du vanishes on any Borel set which is σ-finite with respect
to Hm−1 ; in particular |Dc u|(Ju ) = 0. We declare u an element of the space SBV(Ω) of special
functions of bounded variation, if u ∈ BV(Ω) and Dc u = 0.
We define the norm

    ‖u‖_{BV(Ω)} := ‖u‖_{L^1(Ω)} + ‖Du‖_{M(Ω;R^m)}.
We say that a sequence {u^i}_{i=1}^∞ ⊂ BV(Ω) converges weakly* to u in BV(Ω), denoted by u^i ⇀* u, if u^i → u strongly in L^1(Ω) and Du^i ⇀* Du weakly* in M(Ω; R^m). If in addition |Du^i|(Ω) → |Du|(Ω), we say that the convergence is strict. The weak* convergence of Du^i to Du may in this case be expressed as

    ∫_Ω div ϕ(x) u^i(x) dx → ∫_Ω div ϕ(x) u(x) dx,    for all ϕ ∈ C_c^∞(Ω; R^m).

It roughly means that we have convergence when we "test" the sequence by a sensor – the test function ϕ. We may, however, not have convergence in those aspects of the image u that cannot be sensed by such sensors. In particular, weak* convergence does not imply strong convergence, or even strict convergence. Strict convergence helps avoid annihilation effects. For example, the step functions u^i = χ_{[−1/i,1/i]}, with Du^i = δ_{−1/i} − δ_{1/i} and |Du^i|(Ω) = 2, converge weakly* to zero, but not strictly.
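This annihilation effect is simple to observe on a grid. In the sketch below (our own construction, with Ω = (−1, 1) chosen for concreteness), each u^i keeps total variation 2 while its L^1 norm vanishes, so the limit u = 0 has |Du|(Ω) = 0 < lim |Du^i|(Ω):

```python
# u^i = chi_{[-1/i, 1/i]} sampled on a fine grid over Omega = (-1, 1):
# the grid total variation stays 2 (one jump up, one jump down) while the
# L^1 norm 2/i tends to zero -- weak* convergence without strictness.

def step_indicator_stats(i: int, n: int = 100000):
    h = 2.0 / n
    vals = [1.0 if -1.0 / i <= -1.0 + (j + 0.5) * h < 1.0 / i else 0.0
            for j in range(n)]
    l1 = sum(v * h for v in vals)
    tv = sum(abs(vals[j + 1] - vals[j]) for j in range(n - 1))
    return l1, tv

for i in (2, 8, 32):
    print(i, step_indicator_stats(i))
```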
Finally, we say that a functional F : BV(Ω) → R is weak* lower semicontinuous if u^i ⇀* u implies F(u) ≤ lim inf_{i→∞} F(u^i). This is an essential property for showing the existence of minimisers of F: if {u^i} is an infimising sequence, F(u^i) ↘ inf F > −∞, with u^i ⇀* u along a subsequence, then u is a minimiser provided F is weak* lower semicontinuous.
3. Limiting aspects of the discrete TVϕ model. We begin by rigorously defining and
analysing the discrete TVϕ model (1.2) in BV(Ω). This model is used in the literature to
promote piecewise constant solutions to image reconstruction problems. For our analysis we
consider the following class of energies ϕ.
Definition 3.1. Define W_d as the set of increasing, lower semicontinuous, subadditive functions ϕ : R_{0,+} → R_{0,+} that satisfy ϕ(0) = 0 and ϕ_0 = ∞.
Example 3.1. Examples of ϕ ∈ W_d include ϕ(t) = t^q for q ∈ [0, 1). Note that this is the only such choice in W_d among the classes considered, for example, in [34]. The logistic penalty ϕ(t) = log(αt + 1), in particular, while subadditive, has ϕ_0 = α < ∞. The fractional penalty ϕ(t) = αt/(1 + αt) likewise has ϕ_0 = α < ∞. As we will later see, these classes are, however, admissible for the continuous model TV^ϕ_c.
Definition 3.2. Denote by pwc(Ω) the set of functions u ∈ BV(Ω) that are piecewise constant in the sense Du = D^j u. We then write |D^j u| = θ_u H^{m−1}⌞J_u, where

    θ_u(x) := |u^+(x) − u^−(x)|.
Definition 3.3. Given an energy ϕ ∈ W_d, the "discrete" non-convex total variation model is defined by

    T̃V^ϕ_d(u) := ∫_{J_u} ϕ(θ_u(x)) dH^{m−1}(x),    (u ∈ pwc(Ω)),

and extended to u ∈ BV(Ω) by defining

    TV^ϕ_d(u) := lim inf_{u^i ⇀* u, u^i ∈ pwc(Ω)} T̃V^ϕ_d(u^i),

with the convergence weakly* in BV(Ω), in order to obtain a weak* lower semicontinuous functional.
The functional T̃V^ϕ_d in particular agrees with (1.4). Our main result regarding this model is the following.
Theorem 3.4. Let ϕ ∈ W_d. Then

    TV^ϕ_d(u) = ∞    for    u ∈ BV(Ω) \ pwc(Ω).
The proof is based on the SBV compactness theorem [2]; alternatively it can be proved
via rectifiability results in the theory of currents [49], as used in the study of transportation
networks, e.g., in [37, 43].
Theorem 3.5 (SBV compactness [2]). Let Ω ⊂ R^m be open and bounded. Suppose ϕ, ψ : R_{0,+} → R_{0,+} are lower semicontinuous and increasing with ϕ^∞ = ∞ and ψ_0 = ∞. Suppose {u^i}_{i=1}^∞ ⊂ SBV(Ω) and u^i ⇀* u ∈ SBV(Ω) weakly* in BV(Ω). If

    sup_{i=1,2,3,...} ( ∫_Ω ϕ(|∇u^i(x)|) dx + ∫_{J_{u^i}} ψ(θ_{u^i}(x)) dH^{m−1}(x) ) < ∞,

then there exists a subsequence of {u^i}_{i=1}^∞, unrelabelled, such that

    u^i → u strongly in L^1(Ω),    (3.1)
    ∇u^i ⇀ ∇u weakly in L^1(Ω; R^m),    (3.2)
    D^j u^i ⇀* D^j u weakly* in M(Ω; R^m).    (3.3)

If, moreover, ψ is subadditive, then

    ∫_{J_u} ψ(θ_u(x)) dH^{m−1}(x) ≤ lim inf_{i→∞} ∫_{J_{u^i}} ψ(θ_{u^i}(x)) dH^{m−1}(x).    (3.4)
Proof. For the proof of (3.1)–(3.3), we refer to [2, 4]. As the SBV compactness theorem is typically stated, concavity of ψ is required for the lower semicontinuity result (3.4). The fact that subadditivity and ψ(0) = 0 suffice follows from [4, Theorem 5.4]; see also [2]. There we use the fact that

    β := ψ_0 = ϕ^∞

in the application of the theorem to the functional

    F(u) := ∫_Ω ϕ(|∇u(x)|) dx + ∫_{J_u} ψ(θ_u(x)) dH^{m−1}(x) + β|D^c u|(Ω).

An alternative approach for the whole proof of the SBV compactness theorem is to map BV(Ω) into a space of Cartesian currents and use [49].
Proof of Theorem 3.4. Given u ∈ BV(Ω), let u^i ∈ pwc(Ω) satisfy u^i ⇀* u weakly* in BV(Ω) with sup_i T̃V^ϕ_d(u^i) < ∞. Since ∇u^i = 0, the SBV compactness theorem (applied with ψ = ϕ) shows that ∇u = 0 and D^c u = 0. Thus u ∈ pwc(Ω); in other words, TV^ϕ_d(u) = ∞ whenever u ∈ BV(Ω) \ pwc(Ω).
Remark 3.1. The functions ϕ(t) = αt/(1 + αt) and ϕ(t) = log(αt + 1) for α > 0, considered in [34] for the reconstruction of piecewise constant images, do not have the property ϕ(t)/t → ∞ as t ↘ 0. The above result therefore does not apply, and indeed TV^ϕ_d defined using these functions will not force u with TV^ϕ_d(u) < ∞ to be piecewise constant, as the following result states.
Proposition 3.6. Let ϕ : R_{0,+} → R_{0,+} be continuously differentiable and satisfy ϕ(0) = 0. Then the following hold.
(i) If ϕ_0 < ∞ and ϕ is subadditive, then there exists a constant C > 0 such that

    TV^ϕ_d(u) ≤ C TV(u),    (u ∈ BV(Ω)).

(ii) If ϕ_0 > 0 and ϕ is increasing, then for every M > 0 there exists also a constant c = c(M) > 0 such that

    c TV(u) ≤ TV^ϕ_d(u),    (u ∈ BV(Ω), ‖u‖_{L^∞(Ω)} ≤ M).
Proof. We first prove the upper bound. To begin with, we observe that

    ϕ(t) ≤ ϕ_0 t.    (3.5)

Indeed, since ϕ is subadditive we have

    lim_{δ↘0} (ϕ(t + δ) − ϕ(t))/δ ≤ lim_{δ↘0} ϕ(δ)/δ = ϕ_0 < ∞.

Thus ϕ′(t) ≤ ϕ_0. As ϕ(0) = 0, it follows that ϕ(t) ≤ ϕ_0 t.
Now, with u ∈ BV(Ω), we pick a sequence {u^k}_{k=1}^∞ in pwc(Ω) converging to u strictly in BV(Ω); for details see [12]. Then by (3.5) we have

    T̃V^ϕ_d(u^k) ≤ ϕ_0 TV(u^k),    (k = 1, 2, . . .).

Then, by the definition of TV^ϕ_d(u) and the strict convergence,

    TV^ϕ_d(u) ≤ lim inf_{k→∞} T̃V^ϕ_d(u^k) ≤ lim inf_{k→∞} ϕ_0 TV(u^k) = ϕ_0 TV(u).

The claim in (i) follows.
Let us now prove the lower bound in (ii). First of all, we observe the existence of c > 0 with

    ϕ(t) ≥ ct,    (0 ≤ t ≤ 2M).    (3.6)

Indeed, by the definition of ϕ_0, there exists t_0 > 0 such that ϕ(t) > (ϕ_0/2)t for t ∈ (0, t_0). Since ϕ is increasing, we have ϕ(t) ≥ ϕ(t_0) ≥ (ϕ_0/2)t_0 for t ≥ t_0. This yields c = ϕ_0 t_0/(4M).
Assuming that ‖u‖_{L^∞(Ω)} ≤ M < ∞, we now let {u^k}_{k=1}^∞ ⊂ pwc(Ω) approximate u weakly* in BV(Ω). We may assume that

    ‖u^k‖_{L^∞(Ω)} ≤ 2M,    (3.7)

because if this did not hold, then we could truncate u^k, and the modified sequence {u^k_{2M}}_{k=1}^∞ would still converge to u weakly* in BV(Ω) with T̃V^ϕ_d(u^k_{2M}) ≤ T̃V^ϕ_d(u^k). Thanks to (3.6) and (3.7), we have

    c TV(u^k) ≤ T̃V^ϕ_d(u^k),    (k = 1, 2, . . .).

By the lower semicontinuity of TV(·), we obtain

    c TV(u) ≤ lim inf_{k→∞} c TV(u^k) ≤ lim inf_{k→∞} T̃V^ϕ_d(u^k).

Since the approximating sequence {u^k}_{k=1}^∞ was arbitrary, the claim follows.
4. Limiting aspects of the continuous TV^ϕ model. We now consider the continuous model (1.1) or (1.5). Both are common in works aiming to model real image statistics. We initially restrict our attention to the following energies ϕ.
Definition 4.1. We denote by W_c the class of increasing, subadditive, continuous functions ϕ : R_{0,+} → R_{0,+} with ϕ^∞ = 0.
Example 4.1. Examples of ϕ ∈ W_c include in particular ϕ(t) = t^q for q ∈ (0, 1), as well as the fractional penalty ϕ(t) = αt/(1 + αt) and the logistic penalty ϕ(t) = log(αt + 1) for α > 0.
Definition 4.2. Given an energy ϕ, we start with the C^1 model (1.1), which we now denote by

    T̃V^ϕ_c(u) := ∫_Ω ϕ(|∇u(x)|) dx,    (u ∈ C^1(Ω)).

In order to extend this to u ∈ BV(Ω), we take the weak* lower semicontinuous envelope

    TV^ϕ_c(u) := lim inf_{u^i ⇀* u, u^i ∈ C^1(Ω)} T̃V^ϕ_c(u^i).

In the definition, the convergence is weakly* in BV(Ω).
We emphasise that it is crucial to define TV^ϕ_c through this limiting process in order to obtain weak* lower semicontinuity. This is useful to show the existence of solutions to variational problems with the regulariser TV^ϕ_c in BV(Ω) – or a larger space, as there is no guarantee that TV^ϕ_c(u) < ∞ would imply u ∈ BV(Ω).
We may say the following about the TV^ϕ_c model.
Theorem 4.3. Let ϕ ∈ W_c, and suppose that Ω ⊂ R^m has a Lipschitz boundary. Then

    TV^ϕ_c(u) = 0    for    u ∈ BV(Ω).
The result is intuitively obvious when one considers the convex envelope of the energy ϕ, which has to be zero. The rigorous verification of the result will, however, demand some amount of work. We note that this could be done using classical quasi-convexification arguments, see [6, Theorem 11.3.1], using the integral sense of quasi-convexity common in variational analysis, not the simple maximum sense common in optimisation. The rigorous computation of the quasi-convex envelope of x ↦ ϕ(‖x‖), for all the energies that we consider, would appear to be nearly as much work as a more informative direct argument. We therefore provide a self-contained proof, which will also provide new quantitative estimates for other energies. These will be provided in Proposition 4.6 later in the section.
The main ingredient of the proof of Theorem 4.3 is Lemma 4.5, which will utilise the
following simple result.
Lemma 4.4. Let ϕ ∈ W_c. Then there exist a, b > 0 such that

    ϕ(t) ≤ a + bt,    (t ∈ R_{0,+}).

Proof. Since ϕ^∞ = 0, we can find t_0 > 0 such that ϕ(t)/t ≤ 1 for t ≥ t_0. Thus, because ϕ is increasing, we have ϕ(t) ≤ ϕ(t_0) + t for every t ∈ R_{0,+}.
Lemma 4.5. Let ϕ ∈ W_c, and suppose that Ω ⊂ R^m is bounded with Lipschitz boundary. Then

    TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx,    (u ∈ BV(Ω)).    (4.1)

Observe the difference between Lemma 4.5 and Theorem 3.4. The former shows that in the limit of T̃V^ϕ_c, the singular part is completely free, whereas the latter shows that in the limit of T̃V^ϕ_d, only the jump part is allowed at all!
Proof. We may assume that

    ∫_Ω ϕ(|∇u(x)|) dx < ∞,

because otherwise there is nothing to prove. We let u_0 ∈ BV(R^m) denote the zero-extension of u from Ω to R^m. Then

    Du_0 = Du − ν_∂Ω u^− H^{m−1}⌞∂Ω

for u^− the interior trace of u on ∂Ω, and ν_∂Ω the exterior normal of Ω. In fact [4, Section 3.7], there exists a constant C = C(Ω) such that

    ‖ν_∂Ω u^− H^{m−1}⌞∂Ω‖_{M(R^m;R^m)} ≤ C‖u‖_{BV(Ω)}.

We pick some ρ ∈ C_c^∞(R^m) with 0 ≤ ρ ≤ 1, ∫ρ dx = 1, and supp ρ ⊂ B(0, 1). We then define the family of mollifiers ρ_ε(x) := ε^{−m}ρ(x/ε) for ε > 0, and define by convolution and restriction of domain

    u_ε := (ρ_ε ∗ u_0)|Ω.

Then u_ε ∈ C^∞(Ω), and u_ε → u strongly in L^1(Ω) as ε ↘ 0. As |Du_ε|(Ω) ≤ |Du_0|(R^m), it follows that u_ε ⇀* u weakly* in BV(Ω); see, e.g., [4, Proposition 3.13]. Thus

    TV^ϕ_c(u) ≤ lim inf_{ε↘0} T̃V^ϕ_c(u_ε).

In order to obtain the conclusion of the theorem, we just have to calculate the right-hand side.
We have

    |∇u_ε(x)| = |∫_{R^m} ρ_ε(x − y) dDu_0(y)| ≤ ∫_{R^m} ρ_ε(x − y) d|Du_0|(y)
              ≤ ∫_{R^m} ρ_ε(x − y)|∇u_0(y)| dy + ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y).    (4.2)

We approximate the terms for the absolutely continuous and singular parts differently. Starting with the absolutely continuous part, we let K be a compact set such that Ω + B(0, 1) ⊂ K, and define

    g_0(x) := |∇u_0(x)|    and    g_ε(x) := ∫_{R^m} ρ_ε(x − y)|∇u_0(y)| dy.
Then g_ε → g_0 in L^1(K), and g_ε|(R^m \ K) = 0 for ε ∈ (0, 1). By the L^1 convergence, we can find a sequence ε_i ↘ 0 such that g_{ε_i} → g_0 almost uniformly. Consequently, given δ > 0, we may find a set E ⊂ K with L^m(K \ E) < δ and g_{ε_i} → g_0 uniformly on E. We may assume that each ε_i is small enough such that

    ‖g_{ε_i} − g_0‖_{L^1(K)} ≤ δ.    (4.3)

Lemma 4.4 provides for some a, b > 0 the estimate

    ϕ(t) ≤ a + bt.    (4.4)

From the uniform convergence on E, it follows that for large enough i, we have

    ϕ(g_{ε_i}(x)) ≤ ϕ(1 + g_0(x)) ≤ v(x) := a + b(1 + g_0(x)),    (x ∈ E).
Since E ⊂ K is bounded, v ∈ L^1(E). The reverse Fatou inequality on E gives the estimate

    lim sup_{i→∞} ∫_E ϕ(g_{ε_i}(x)) dx ≤ ∫_E lim sup_{i→∞} ϕ(g_{ε_i}(x)) dx ≤ ∫_E ϕ(g_0(x)) dx.    (4.5)
On K \ E, we obtain the estimate

    ∫_{K\E} ϕ(g_{ε_i}(x)) dx ≤ ∫_{K\E} [ϕ(g_0(x)) + ϕ(|g_{ε_i}(x) − g_0(x)|)] dx    (by subadditivity)
                             ≤ ∫_{K\E} ϕ(g_0(x)) dx + aL^m(K \ E) + b‖g_{ε_i} − g_0‖_{L^1(K)}    (by (4.4))
                             ≤ ∫_{K\E} ϕ(g_0(x)) dx + (a + b)δ.    (by (4.3))    (4.6)
Combining the estimates (4.5) and (4.6), we have

    lim sup_{i→∞} ∫_Ω ϕ(g_{ε_i}(x)) dx ≤ ∫_K ϕ(g_0(x)) dx + (a + b)δ.

Since δ > 0 was arbitrary, and we may always find an almost uniformly convergent subsequence of any subsequence of {g_ε}_{ε>0}, we conclude that

    lim sup_{ε↘0} ∫_Ω ϕ(g_ε(x)) dx ≤ ∫_K ϕ(|∇u_0(x)|) dx = ∫_Ω ϕ(|∇u(x)|) dx.    (4.7)
Let us then consider the singular part in (4.2). We observe that ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) = 0 for x ∈ R^m \ K. If we define

    f_ε(x) := ε^{−m}|D^s u_0|(B(x, ε)),    (x ∈ K),

then by Fubini's theorem

    ∫_K f_ε(x) dx = ε^{−m} ∫_K ∫_K χ_{B(x,ε)}(y) d|D^s u_0|(y) dx = ε^{−m} ∫_K ∫_K χ_{B(y,ε)}(x) dx d|D^s u_0|(y) ≤ ω_m |D^s u_0|(K).    (4.8)

Here ω_m is the volume of the unit ball in R^m. Moreover, by the Besicovitch derivation theorem (discussed, for example, in [4, 32]), we have

    lim_{ε↘0} f_ε(x) = 0,    (L^m-a.e. x ∈ K).

Because L^m(K) < ∞, Egorov's theorem shows that f_ε → 0 almost uniformly. Thus, for any δ > 0, there exists a set K_δ ⊂ K with L^m(K \ K_δ) ≤ δ and f_ε → 0 uniformly on K_δ.
Next we study K \ K_δ. We pick an arbitrary σ > 0. Because ϕ(t)/t → 0 as t → ∞, there exists t_0 > 0 such that ϕ(t) ≤ σt for t ≥ t_0. In fact, because ϕ is lower semicontinuous and ϕ(0) = 0, if we choose

    t_0 := inf{t ≥ 0 : ϕ(t) < σt},

then ϕ(t_0) = σt_0. Thus, because ϕ is increasing,

    ϕ(t) ≤ ϕ̃(t) := σ(t_0 + t),    (t ∈ R_{0,+}).    (4.9)
Choosing ε ∈ (0, 1) such that f_ε ≤ δ on K_δ, and using ρ_ε ≤ ε^{−m}χ_{B(0,ε)}, we may approximate

    ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ ∫_{R^m} ϕ(f_ε(x)) dx
        ≤ ∫_{K_δ} ϕ(f_ε(x)) dx + ∫_{K\K_δ} ϕ̃(f_ε(x)) dx
        ≤ ∫_{K_δ} ϕ(δ) dx + ∫_{K\K_δ} σ(t_0 + f_ε(x)) dx
        ≤ L^m(K)ϕ(δ) + δσt_0 + σω_m|D^s u_0|(K)    (by (4.8)).    (4.10)
Thus

    lim inf_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ L^m(K)ϕ(δ) + δσt_0 + σω_m|D^s u_0|(K).

Observe that the choices of σ and t_0 are independent of δ. Therefore, because δ > 0 was arbitrary, using the continuity of ϕ we deduce that we may set δ = 0 above. But then, because σ > 0 was also arbitrary, we deduce

    lim_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx = 0.    (4.11)
Finally, combining the estimate (4.7) for the absolutely continuous part and the estimate (4.11) for the singular part with (4.2), we deduce that

    lim sup_{ε↘0} T̃V^ϕ_c(u_ε) ≤ lim sup_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|Du_0|(y) ) dx ≤ ∫_Ω ϕ(|∇u(x)|) dx.

This concludes the proof of (4.1).
Proof of Theorem 4.3. We employ the bound (4.1) of Lemma 4.5, but still have to extend it to a possibly unbounded domain Ω. For this purpose, we let R > 0 be arbitrary, and apply the lemma to u_R := u|B(0, R). Then

    TV^ϕ_c(u_R) ≤ ∫_Ω ϕ(|∇u_R(x)|) dx ≤ ∫_Ω ϕ(|∇u(x)|) dx.

But u_R ⇀* u weakly* in BV(Ω) as R ↗ ∞; indeed, L^1 convergence is obvious, and for any ϕ ∈ C_c^∞(Ω; R^m), we have supp ϕ ⊂ B(0, R) for large enough R, so that Du_R(ϕ) = Du(ϕ). Therefore, because TV^ϕ_c is weakly* lower semicontinuous by construction, we conclude that

    TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx.    (4.12)

Given any u ∈ C^1(Ω), we may find u_h ∈ pwc(Ω), (h > 0), strictly convergent to u in BV(Ω) [12]. But (4.12) shows that

    TV^ϕ_c(u_h) = 0.

By the weak* lower semicontinuity of TV^ϕ_c we conclude

    TV^ϕ_c(u) ≤ lim inf_{h↘0} TV^ϕ_c(u_h) = 0,    (u ∈ C^1(Ω)).

Another appeal to lower semicontinuity now shows that TV^ϕ_c(u) = 0 for any u ∈ BV(Ω).
Similarly to Proposition 3.6 for TV^ϕ_d, we have the following more positive result.
Proposition 4.6. Let ϕ : R_{0,+} → R_{0,+} be lower semicontinuous and satisfy ϕ(0) = 0. Then the following hold.
(i) If ϕ_0 < ∞ and ϕ is subadditive, then there exists a constant C > 0 such that

    TV^ϕ_c(u) ≤ C TV(u),    (u ∈ BV(Ω)).

(ii) If ϕ_0, ϕ^∞ > 0 and ϕ is increasing, then there exists also a constant c > 0 such that

    c TV(u) ≤ TV^ϕ_c(u),    (u ∈ BV(Ω)).
Remark 4.1. If we assume that ϕ is concave, the condition ϕ_0 > 0 in (ii) follows from the other assumptions.
Proof. The proof of the upper bound follows exactly as the upper bound in Proposition 3.6, just replacing approximation in pwc(Ω) by C^1(Ω).
For the lower bound, first of all, we observe that there exists t_∞ > 0 such that ϕ(t) ≥ (ϕ^∞/2)t, (t ≥ t_∞). Secondly, there exists t_0 > 0 such that ϕ(t) ≥ (ϕ_0/2)t, (0 ≤ t ≤ t_0). Since ϕ is increasing, ϕ(t) ≥ ϕ(t_0) ≥ tϕ(t_0)/t_∞, (t_0 ≤ t ≤ t_∞). Consequently

    ϕ(t) ≥ ct,    (t ≥ 0),    for c := min{ϕ^∞/2, ϕ(t_0)/t_∞, ϕ_0/2}.

Therefore

    c TV(u) ≤ T̃V^ϕ_c(u),    (u ∈ C^1(Ω)).

The claim now follows from the weak* lower semicontinuity of TV as in the proof of Proposition 3.6.
In fact, in most of the interesting cases we may prove a slightly stronger result.
Theorem 4.7. Let ϕ : R_{0,+} → R_{0,+} be concave with ϕ(0) = 0 and 0 < ϕ^∞ < ∞. Suppose that Ω ⊂ R^m has a Lipschitz boundary. Then

    TV^ϕ_c(u) = ϕ^∞ TV(u),    (u ∈ BV(Ω)).    (4.13)
Proof. We first suppose that Ω is bounded. The proof of the upper bound

    TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx + ϕ^∞|D^s u|(Ω)    (4.14)

is then a modification of Lemma 4.5. The estimate (4.7) for the absolutely continuous part follows as before. For the singular part, we observe that (4.9) holds for any σ > ϕ^∞. Therefore, proceeding as before, we obtain in place of (4.11) the estimate

    lim_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ σ|D^s u|(Ω).    (4.15)

Letting σ ↘ ϕ^∞ and combining (4.7) with (4.15) we get (4.14). As in the proof of Theorem 4.3, we may extend this bound to a possibly unbounded Ω.
If u ∈ C^1(Ω), we may again approximate u strictly in BV(Ω) by piecewise constant functions {u^i}_{i=1}^∞. By the lower semicontinuity of TV^ϕ_c and (4.14), we then have

    TV^ϕ_c(u) ≤ lim inf_{i→∞} ϕ^∞|D^s u^i|(Ω) = ϕ^∞|Du|(Ω).    (4.16)

Finally, we observe that by concavity

    ϕ(t) ≥ ϕ^∞ t.

Thus T̃V^ϕ_c(u) ≥ ϕ^∞|Du|(Ω). We immediately obtain (4.13) for u ∈ C^1(Ω). By strictly convergent approximation, we then extend the result to u ∈ BV(Ω).
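Theorem 4.7 can be illustrated with the ramped steps u^k from §1. The sketch below (our own choice of energy, not from the paper) uses the concave, level-coercive ϕ(t) = t + log(1 + t), for which ϕ^∞ = 1, and shows T̃V^ϕ_c(u^k) approaching ϕ^∞ TV(u) = 1:

```python
import math

# Concave energy with phi(0) = 0 and phi^inf = 1: phi(t) = t + log(1+t).
# (phi'' = -1/(1+t)^2 < 0, and phi(t)/t -> 1 as t -> infinity.)
def phi(t: float) -> float:
    return t + math.log1p(t)

phi_inf = 1.0

def tv_phi_c_step(k: int) -> float:
    """~TV^phi_c of the ramp u^k: slope k/2 on an interval of width 2/k."""
    return (2.0 / k) * phi(k / 2.0)

for k in (10, 100, 10000):
    print(k, tv_phi_c_step(k))  # decreases towards phi_inf * |Du|(Omega) = 1
```

In accordance with (4.13), the jump of height one in the limit function is weighted exactly by ϕ^∞.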
5. Discussion. Theorem 4.3 and Theorem 4.7 show that we cannot hope to have a simple weakly* lower semicontinuous non-convex total variation model as a prior for image gradient distributions. In fact, it follows from [9], see also [4, Section 5.1] and [22, Theorem 5.14], that lower semicontinuity of the continuous TV^ϕ_c model is only possible for convex ϕ. The problem is: if ϕ^∞ t is less than ϕ(t), then image edges are always cheaper than smooth transitions. If ϕ^∞ = 0, they are so cheap that we get a zero functional in the limit for a general class of functions. If ϕ^∞ > 0 and ϕ is concave, then we get a constant multiple of TV as the result. If ϕ is not concave, we still have the upper bound (4.16); it may however be possible that some gradients are cheaper than jumps. This would in particular be the case with Huber regularisation of ϕ(t) = t. More about the jump set of solutions to Huber-regularised as well as non-convex total variation models may be read in [47].
In fact, in [27] Huber regularisation was used with ϕ(t) = t^q for q ∈ (0, 1) for algorithmic
reasons. For small γ > 0, this is defined as

    ϕ̃(t) := { t^q − ((2 − q)/2) γ^q,    t > γ,
             { (q/2) γ^(q−2) t²,        t ∈ [0, γ].    (5.1)

Then ϕ̃(t) ≤ ϕ(t), so that

    TVϕ̃c ≤ TVϕc = 0.
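To make (5.1) concrete, here is a minimal numerical sketch of the Huberised energy for ϕ(t) = t^q; the parameter values are illustrative only:

```python
def phi_huber(t, q=0.5, gamma=0.01):
    """Huber-type smoothing (5.1) of phi(t) = t^q for small gamma > 0."""
    if t > gamma:
        return t**q - (2.0 - q) / 2.0 * gamma**q
    return (q / 2.0) * gamma**(q - 2.0) * t**2
```

Both branches agree at t = γ with common value (q/2)γ^q, so ϕ̃ is continuous there, and ϕ̃(t) ≤ t^q everywhere.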
The asymptotic behaviour of the regulariser at infinity is the crucial feature here, and it also
cannot be altered without changing the edge behaviour of the regulariser. Huber-regularisation
does not alter the asymptotic behaviour, and as long as alternative smoothing strategies, such
as those considered in [15], do not, they also provide no change in the results.
In contrast to the continuous TVϕ
c model, according to Theorem 3.4, the discrete model
works correctly for ϕ(t) = tq and generally ϕ ∈ Wd , if the desire is to force piecewise constant
solutions to (1.3). As we saw in the comments preceding Proposition 3.6, it however does
not force piecewise constant solutions for some of the energies ϕ typically employed in this
context. Generally, what causes piecewise constant solutions is the property ϕ0 = ∞. If one
does not desire piecewise constant solutions, one can therefore use Huber regularisation or
linearise ϕ for t < δ. The latter employs
(
e
ϕ(t)
=
ϕ(t) − ϕ(δ) + ϕ0 (δ)δ, t > δ,
ϕ0 (δ)t,
t ≤ δ.
Then ϕ(t) ≤ Ct for some C > 0, so that TVϕ
d (u) < ∞ for every u ∈ BV(Ω). We also note that
although this approach defines a regularisation functional on all of BV(Ω), it cannot be used
for modelling the distribution of gradients in real images, the purpose of the TVϕ
c model. In
ϕ
fact, as in the TVd model we cannot control the penalisation of ∇u beyond a constant factor.
In summary, the TVϕ
d model works as intended for ϕ ∈ Wd – it enforces piecewise constant
model
however is not theoretically sound in function spaces. We will
solutions. The TVϕ
c
therefore next seek ways to fix it.
6. Multiscale regularisation and area-strict convergence. The problem with the TVϕc
model is that weak* lower semicontinuity is too strong a requirement. We need a weaker type
of lower semicontinuity, or, in other words, a stronger type of convergence. Norm convergence
in BV is too strong; it would not be possible at all to approximate edges. Strict convergence
is also still too weak, as can be seen from the proof of Lemma 4.5. Strong convergence in L²,
which we could in fact obtain from strict convergence for Ω ⊂ R² (see [31, 39]), is also not
enough, as a stronger form of gradient convergence is the important part. A suitable mode of
convergence is the so-called area-strict convergence [18, 30]. For our purposes, the following
definition is the most appropriate one.
Definition 6.1. Suppose Ω ⊂ Rⁿ with n ≥ 2. The sequence {uⁱ}∞ᵢ₌₁ ⊂ BV(Ω) converges to
u ∈ BV(Ω) area-strictly if the sequence {Uⁱ}∞ᵢ₌₁ with Uⁱ(x) := (x/|x|, uⁱ(x)) converges
strictly in BV(Ω; Rⁿ⁺¹) to U(x) := (x/|x|, u(x)).
In other words, {uⁱ}∞ᵢ₌₁ converges to u area-strictly if uⁱ → u strongly in L¹(Ω), Duⁱ ⇀* Du
weakly* in M(Ω; Rⁿ), and A(uⁱ) → A(u) for the area functional

    A(u) := ∫_Ω √(1 + |∇u(x)|²) dx + |Dˢu|(Ω).

It can be shown that area-strict convergence is stronger than strict convergence, but weaker
than norm convergence. Here we recall from §2 the definition of strict convergence and of the
singular part Dˢu of Du.
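For a smooth image sampled on a grid, the area functional reduces to its absolutely continuous part; a sketch using forward differences (the grid spacing h and the difference scheme are our own choices for the illustration):

```python
import numpy as np

def area_functional_smooth(u, h=1.0):
    """Discrete analogue of A(u) = ∫ sqrt(1 + |∇u|²) dx for a grid image u;
    for smooth u the singular part |Dˢu|(Ω) vanishes and is omitted."""
    ux = (np.roll(u, -1, axis=0) - u) / h
    uy = (np.roll(u, -1, axis=1) - u) / h
    ux[-1, :] = 0.0  # no forward difference at the far boundary
    uy[:, -1] = 0.0
    return float(np.sum(np.sqrt(1.0 + ux**2 + uy**2)) * h**2)
```

For a constant image the integrand is 1, so the value is just the area of the domain.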
In order to state a continuity result with respect to area-strict convergence, we need a few
definitions. Specifically, we denote the Sobolev conjugate

    1* := { n/(n − 1),  n > 1,
          { ∞,          n = 1,

and define

    uθ(x) := { θu⁺(x) + (1 − θ)u⁻(x),  x ∈ Ju,
             { ũ(x),                   x ∉ Su.
In [39], see also [30], the following result is proved.
Theorem 6.2. Let Ω be a bounded domain with Lipschitz boundary, p ∈ [1, 1*] if n ≥ 2 and
p ∈ [1, 1*) if n = 1. Let f ∈ C(Ω × R × Rⁿ) satisfy

    |f(x, y, A)| ≤ C(1 + |y|^p + |A|),    ((x, y, A) ∈ Ω × R × Rⁿ),

and assume the existence of f∞ ∈ C(Ω × R × Rⁿ), defined by

    f∞(x, y, A) := lim_{x′→x, y′→y, A′→A, t→∞} f(x′, y′, tA′)/t.

Then the functional

    F(u) := ∫_Ω f(x, u(x), ∇u(x)) dx + ∫_Ω ∫₀¹ f∞( x, uθ(x), (dDˢu/d|Dˢu|)(x) ) dθ d|Dˢu|(x)    (6.1)

is area-strictly continuous on BV(Ω).
We now apply this mode of convergence to non-convex total variation, restricting our
attention to the following class of functions.
Definition 6.3. We denote by Was the set of functions ϕ ∈ C(R0,+) such that ϕ∞ exists, and
for some c, C > 0 and b ≥ 0 the following estimates hold true:

    ct − b ≤ ϕ(t) ≤ C(1 + t),    (t ∈ R0,+).    (6.2)

Example 6.1. Let ϕ be any of the functions in Example 4.1. They do not satisfy the lower
bound in (6.2). If, however, we pick some cut-off M > 0, then ϕM ∈ Was for the high-value
linearisation

    ϕM(t) := { ϕ(t),                    t ≤ M,
             { ϕ(M) + ϕ′(M)(t − M),    t > M.
Corollary 6.4. Suppose ϕ ∈ Was. Then the functional

    TVϕas(u) := ∫_Ω ϕ(|∇u(x)|) dx + ϕ∞ |Dˢu|(Ω),    (u ∈ BV(Ω)),

is area-strictly continuous on BV(Ω).
Proof. Letting f(x, y, A) := ϕ(|A|), the assumptions of Theorem 6.2 are readily verified.
The claim is therefore immediate from Theorem 6.2 with p = 1. Note that we do not need
the lower bound in the definition of Was just yet.
But how could we obtain area-strict convergence of an infimising sequence of a variational
problem? In [44, 45] the following multiscale analysis functional η was introduced for scalar-
valued measures µ ∈ M(Ω). Given η0 > 0 and a family of mollifiers {ρε}ε>0 satisfying the
semigroup property ρε+δ = ρε ∗ ρδ, η can be defined as

    η(µ) := η0 Σ_{ℓ=1}^∞ ∫_{Rⁿ} (|µ| ∗ ρ_{2^(−ℓ)})(x) − |(µ ∗ ρ_{2^(−ℓ)})(x)| dx,    (µ ∈ M(Ω)).

If the sequence of measures {µⁱ}∞ᵢ₌₁ ⊂ M(Ω) satisfies sup_i η(µⁱ) < ∞ and µⁱ ⇀* µ weakly*
in M(Ω), then we have |µⁱ|(Ω) → |µ|(Ω). In essence, the functional η penalises the type of
complexity of measures, such as two approaching δ-spikes of different sign, which prohibits
strict convergence. In Appendix A, we extend the strict convergence results of [44, 45] to
vector-valued µ ∈ M(Ω; R^N), in particular to the case µ = DU for U the lifting of u as
discussed above.
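The effect of η on nearly-cancelling spikes can be illustrated on a 1-D discrete measure; the Gaussian mollifier and the grid are our own choices for this sketch:

```python
import numpy as np

def eta_term(mu, sigma):
    """One summand of η: the sum of (|mu| * rho) - |mu * rho| for a signed
    1-D discrete measure mu and a Gaussian mollifier of width sigma."""
    half = 4 * int(np.ceil(sigma))
    x = np.arange(-half, half + 1)
    rho = np.exp(-x**2 / (2.0 * sigma**2))
    rho /= rho.sum()
    return float(np.convolve(np.abs(mu), rho).sum()
                 - np.abs(np.convolve(mu, rho)).sum())
```

A single spike gives zero, while a dipole of opposite spikes gives a strictly positive value: exactly the kind of complexity that η penalises.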
In order to bound in BV(Ω) an infimising sequence of problems using TVϕas as a regulariser,
we require slightly stricter assumptions on ϕ. These can usually, and particularly in the
interesting case ϕ(t) = t^q, be easily satisfied by linearising ϕ above a cut-off point M with
respect to the function value. This forces ϕ∞ > 0, which is not required for continuity with
respect to area-strict convergence in its own right. We will later see that such a cut-off can
be justified by real gradient distributions and also argued for in numerical experiments.
Now we may prove the following result, which shows that area-strict convergence and the
multiscale analysis functional η provide a remedy for the theoretical difficulties associated
with the TVϕc model.
Theorem 6.5. Suppose Ω ⊂ Rⁿ is bounded with Lipschitz boundary, and ϕ ∈ Was. Define
U(x) := (1, u(x)). Then the functional

    F(u) := TVϕas(u) + η(DU)

is weak* lower semicontinuous on BV(Ω), and any sequence {uⁱ}∞ᵢ₌₁ ⊂ L¹(Ω) with

    sup_i F(uⁱ) < ∞

admits an area-strictly convergent subsequence.
Proof. Suppose {uⁱ}∞ᵢ₌₁ converges weakly* to u ∈ BV(Ω). Then {Uⁱ}∞ᵢ₌₁ converges weakly*
to U in BV(Ω; Rⁿ⁺¹). If lim inf_{i→∞} η(DUⁱ) = ∞, we clearly have lower semicontinuity
of F. By switching to an unrelabelled subsequence, we may therefore assume that
sup_i η(DUⁱ) < ∞. It follows from Theorem A.4 in the Appendix that |DUⁱ|(Ω) → |DU|(Ω).
In other words, {uⁱ}∞ᵢ₌₁ converges area-strictly to u. Applying Corollary 6.4 and the weak*
lower semicontinuity of η, we now see that

    F(u) ≤ lim inf_{i→∞} F(uⁱ).

Thus weak* lower semicontinuity holds true.
Next suppose that {uⁱ}∞ᵢ₌₁ ⊂ L¹(Ω) satisfies sup_i F(uⁱ) < ∞. Since ct − b ≤ ϕ(t) and Ω is
bounded, it follows that sup_i TV(uⁱ) < ∞. The sequence therefore admits a subsequence,
unrelabelled without loss of generality, which converges weakly* to some u ∈ BV(Ω). Hence,
the fact that {uⁱ}∞ᵢ₌₁ admits an area-strictly convergent subsequence now follows as in the
previous paragraph.
We immediately deduce the following corollary.
Corollary 6.6. Suppose Ω ⊂ Rⁿ is bounded with Lipschitz boundary, ϕ ∈ Was, J : BV(Ω) →
R is convex, proper, and weakly* lower semicontinuous, and J satisfies for some C > 0 the
coercivity condition

    J(u) ≥ C(‖u‖_L¹(Ω) − 1).

Then the functional

    G(u) := J(u) + α TVϕas(u) + η(DU),    (u ∈ BV(Ω)),

admits a minimiser u ∈ BV(Ω).
Remark 6.1. We can, for example, take J(u) = ½‖z − u‖²_L²(Ω).
Observe that

    η(DU) = η0 Σ_{ℓ=1}^∞ η_ℓ(DU),

where, writing ρ_ℓ := ρ_{2^(−ℓ)}, we have for ℓ > 0

    η_ℓ(DU) := ∫_{Rⁿ} (ρ_ℓ ∗ |DU|)(x) − |(ρ_ℓ ∗ DU)(x)| dx
             = |Dˢu|(Ω) + ∫_{Rⁿ} √(1 + |∇u(x)|²) − √(1 + |(ρ_ℓ ∗ Du)(x)|²) dx.    (6.3)
In particular, if u ∈ W^1,1(Ω), then we obtain with ∇_ℓ u := ρ_ℓ ∗ ∇u the expression

    η_ℓ(DU) = ∫_{Rⁿ} √(1 + |∇u(x)|²) − √(1 + |∇_ℓ u(x)|²) dx

and the estimate

    η_ℓ(DU) ≤ ∫_{Rⁿ} √(|∇u(x)|² − |∇_ℓ u(x)|²) dx.

The following proposition shows that, in infimising sequences, we may truncate the sum
defining η to finitely many terms. This justifies the associated numerical approximation.
Proposition 6.7. Suppose Ω ⊂ Rⁿ is bounded with Lipschitz boundary, ϕ ∈ Was, and that
J : BV(Ω) → R is as in Corollary 6.6. Let Kᵢ ∈ N₊ and εᵢ > 0, i = 1, 2, 3, …, satisfy

    lim_{i→∞} Kᵢ = ∞    and    lim_{i→∞} εᵢ = 0.

Suppose further that {uⁱ}∞ᵢ₌₁ ⊂ BV(Ω) satisfies

    J(uⁱ) + α TVϕas(uⁱ) + η0 Σ_{ℓ=1}^{Kᵢ} η_ℓ(DUⁱ) ≤ inf_{u∈BV(Ω)} G(u) + εᵢ,    (i = 1, 2, 3, …).

Then we can find û ∈ BV(Ω) and a subsequence of {uⁱ}∞ᵢ₌₁, unrelabelled, such that uⁱ → û
area-strictly, and û minimises G.
Proof. Let L := inf_{u∈BV(Ω)} G(u). Since there is nothing to prove if L = ∞, we may assume
L < ∞. Then we have

    c TV(uⁱ) − b Lⁿ(Ω) ≤ TVϕas(uⁱ).

This yields

    J(uⁱ) + αc TV(uⁱ) ≤ αb Lⁿ(Ω) + L + εᵢ.

It follows for a subsequence, unrelabelled, that uⁱ ⇀* û weakly* for some û ∈ BV(Ω). By the
weak* lower semicontinuity of η_ℓ, see Theorem A.4, we then have

    η0 Σ_{ℓ=1}^{Kⱼ} η_ℓ(DÛ) ≤ lim inf_{i→∞} η0 Σ_{ℓ=1}^{Kⱼ} η_ℓ(DUⁱ) ≤ L,    (j = 1, 2, 3, …).

It follows that

    η(DÛ) = η0 Σ_{ℓ=1}^∞ η_ℓ(DÛ) ≤ L.

Using Lemma A.3 with µⁱ = DUⁱ and µ = DÛ, we see that uⁱ → û area-strictly, and that

    u ↦ J(u) + α TVϕas(u) + η0 Σ_{ℓ=1}^{Kⱼ} η_ℓ(DU)

is area-strictly lower semicontinuous for each fixed j = 1, 2, 3, …. This shows that G(û) ≤ L.
As a consequence, û minimises G.
7. Remarks on alternative remedies. We now discuss two alternative approaches to make
the TVϕc model work in the limit. These are based on compactifying the differential operator
and on working in SBV(Ω), respectively. As we only intend to demonstrate alternative
possibilities, we stay brief here. Hence the proofs have been placed in the appendix.
Remark 7.1. (Compact operators) Area-strict convergence is not the only possibility to
make the TVϕc model function; another way to understand the problems with the basic TVϕc
model is that the operator ∇ is not compact. One way to obtain a compact operator is by
convolution. This is the content of the following result.
Theorem 7.1. Let {ρε}ε>0 be a family of mollifiers, Ω ⊂ Rⁿ open, and ϕ : R0,+ → R0,+
increasing, subadditive and continuous with ϕ(0) = 0. Fix ε > 0, and define D_ε : L¹(Ω) →
L¹(Rⁿ; Rⁿ) by

    D_ε u := ρ_ε ∗ Du.

Then

    TVϕ,εc(u) := ∫_{Rⁿ} ϕ(|D_ε u(x)|) dx,    (u ∈ BV(Ω)),

is lower semicontinuous with respect to weak* convergence in BV(Ω). Moreover

    lim_{ε↘0} TVϕ,εc(u) = T̃Vϕc(u),    (u ∈ C¹(Ω)).    (7.1)
We relegate the proof of this theorem to Appendix B. It should be noted that any u ∈
L¹(Ω) satisfies TVϕ,εc(u) < ∞. In particular,

    sup_i ½‖z − uⁱ‖²_L²(Ω) + α TVϕ,εc(uⁱ) < ∞

does not guarantee weak* convergence of a subsequence. For that, an additional TV(uⁱ) term
(with a small factor) is required in a TVϕ,εc based variational model in image processing.
Remark 7.2. (The space SBV(Ω) and η) If we apply the η functional of [44, 45] to a bounded
sequence of functions gⁱ ∈ L¹(Ω; Rᵐ), then we get strict convergence in this space. It remains
to find out whether we get strong convergence. Then we could regularise ∇u this way, and,
working in the space SBV(Ω), penalise the jump part separately. It turns out that this is
possible if we state the modification η̄ of η in L^p(Rⁿ; Rᵐ) for p ∈ (1, ∞). Then strict
convergence is equivalent to strong convergence.
With ε_ℓ ↘ 0, η0 > 0, and p ∈ (1, ∞), we define

    η̄(g) := η0 Σ_{ℓ=1}^∞ η̄_ℓ(g),    η̄_ℓ(g) := ‖g‖_{L^p(Rⁿ;Rᵐ)} − ‖g ∗ ρ_{ε_ℓ}‖_{L^p(Rⁿ;Rᵐ)},    (g ∈ L^p(Ω; Rᵐ)).    (7.2)
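A 1-D numerical sketch of one term of (7.2); by Young's inequality ‖g ∗ ρ‖_p ≤ ‖g‖_p‖ρ‖_1 = ‖g‖_p, so each term is non-negative (the Gaussian mollifier and grid are our own choices):

```python
import numpy as np

def eta_bar_term(g, sigma, p=2.0):
    """One summand ||g||_p - ||g * rho||_p of the L^p multiscale functional,
    for a 1-D signal g and a Gaussian mollifier of width sigma."""
    half = 4 * int(np.ceil(sigma))
    x = np.arange(-half, half + 1)
    rho = np.exp(-x**2 / (2.0 * sigma**2))
    rho /= rho.sum()
    smooth = np.convolve(g, rho)
    return float(np.sum(np.abs(g)**p)**(1.0 / p)
                 - np.sum(np.abs(smooth)**p)**(1.0 / p))
```

A sharp spike loses a large fraction of its L^p norm under mollification, which is what η̄ penalises.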
Then we have the following result, whose proof is relegated to Appendix C.
Theorem 7.2. Let Ω ⊂ Rᵐ be open and bounded. Suppose ψ ∈ Wd, and ϕ : R0,+ → R0,+ is
lower semicontinuous and increasing with ϕ∞ = ∞ and

    ‖g‖_{L^p(Ω;Rᵐ)} ≤ C( 1 + ∫_Ω ϕ(|g(x)|) dx ),    (g ∈ L^p(Ω; Rᵐ)),    (7.3)

for some C > 0, where p ∈ (1, ∞) is as in (7.2). Let

    F(u) := ∫_Ω ϕ(|∇u(x)|) dx + η̄(∇u) + ∫_{Ju} ψ(θu(x)) dH^(m−1)(x).

Then F is lower semicontinuous with respect to weak* convergence in BV(Ω). In fact, any
sequence {uⁱ}∞ᵢ₌₁ with sup_i F(uⁱ) < ∞ admits a subsequence convergent weakly* and in the
sense (3.1)–(3.3).
8. Image statistics and the jump part. Our studies in the preceding sections have pointed
us to the following question: Are the statistics of [29] valid when we split the image into smooth
and jump parts? What are the statistics for jump heights, and does splitting the gradient into
these two parts alter the distribution for the absolutely continuous part? When calculating
statistics from discrete images, we do not have the excuse that the jumps would be negligible,
i.e., that Lᵐ(Ju) = 0!
In order to gain some insight, we performed a few experiments with real photographs,
displayed in Figures 8.1–8.3. These three photographs represent images with different types
of statistics. The pier photo of Figure 8.1 is very simple, with large smooth areas and some
fine structures. The parrot test image in Figure 8.2 has a good balance of features. The
summer lake scene in Figure 8.3 is somewhat more complex, with plenty of fine features.
We split the pixels of each image into edge and smooth parts by a simple threshold on the
norm ‖∇u(k)‖ of the discrete gradient at each pixel k. Then we find optimal α and q ∈ (0, 2)
for the distribution

    P_t(t) := C_t exp(−αϕ(t))

to match the experimental distribution. This in turn gives rise to the prior

    P_u(u) = C_u exp( −α ∫_Ω ϕ(|∇u(x)|) dx ).

Both C_t, C_u > 0 are normalising factors. In practice we do the fitting of P_t to the experimental
distribution by a simple least squares fit on the logarithms of the distributions. We will
comment on the suitability of this approach later in this section. In the least squares fit we
keep C_t as a free (unnormalised) parameter, and recalculate it after the fit. Observe that the
normalisation constant does not affect the denoising problem (here in the finite-dimensional
setting)

    max_u P_{f|u}(f|u) P_u(u) ∝ max_u exp( −(σ²/2)‖z − u‖₂² − α T̃Vϕc(u) ).

Here we have the Gaussian noise distribution

    P_{f|u}(f|u) = C′ exp( −(σ²/2)‖z − u‖₂² ),

for σ the noise level. This gives the statistical interpretation of the denoising model, that of
a maximum a posteriori (MAP) estimate.
Finally, in the matter of statistics, we note that the TVϕ prior only attempts to correctly
model gradient statistics; the modelling of histogram statistics with Wasserstein distances
was recently studied in [38, 42] together with the conventional TV gradient prior. It is also
worth remarking that our approach of improving the prior based on the statistics of the
ground truth is different from recent approaches that optimise the prior based on the denoising
result [8, 11, 17, 25, 26]. These approaches can provide improved results in practice, but no
longer have the simple MAP interpretation. It is certainly possible to optimise the parameters
of the TVϕ model in this manner, but this is outside the scope of the present, already long,
manuscript.
[Figure 8.1 panels: (a) Pier photo; (b) detected edge pixels (red); (c) log-histogram with fit
q = 0.50; (d) smooth fit q = 0.36, edge fit q = 1.44; (e) linearised fits: (man) q = 0.32,
M = 30, α∞ = 0.059; (opt) q = 0.42, M = 69, α∞ = 0.050; (emp) q = 0.40, M = 15,
α∞ = 0.021.]
Figure 8.1: Pier photo: discrete gradient histogram and least squares models. The image
intensity in (a) is in the range [0, 255], and we have chosen pixels k with |∇u(k)| ≥ 30 as edges
(b). The log-histogram of |∇u(k)| with optimal fit of t 7→ tq is displayed in (c). This is done
separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off point
M = 30 (manual edge detection), M = 69 (optimal least squares fit). We moreover show the
empirically best linearised model.
Our experiments confirm the findings of [29] that some q ∈ (0, 1) is generally a good fit
for the entire distribution, as well as for the smooth part. However, the optimal q for the edge
part varies. In Figure 8.1 we actually find q = 1.44 – larger than one! We have to admit that
the number of edge pixels in this image is quite small, so statistically the result may be
considered unreliable. In Figure 8.3, with a significant proportion of edge pixels, we still have
q = 1.05.
[Figure 8.2 panels: (a) Parrot photo; (b) detected edge pixels (red); (c) log-histogram with
fit q = 0.46; (d) smooth fit q = 0.68, edge fit q = 1.18; (e) linearised fits: (man) q = 0.22,
M = 20, α∞ = 0.046; (opt) q = 0.40, M = 73, α∞ = 0.041; (emp) q = 0.50, M = 40,
α∞ = 0.025.]
Figure 8.2: Parrot photo: discrete gradient histogram and least squares models. The image
intensity in (a) is in the range [0, 255], and we have chosen pixels k with |∇u(k)| ≥ 20 as edges
(b). The log-histogram of |∇u(k)| with optimal fit of t 7→ tq is displayed in (c). This is done
separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off point
M = 20 (manual edge detection), M = 73 (optimal least squares fit). We moreover show the
empirically best linearised model.
These findings also suggest that, on average, fitting a single q ∈ (0, 1) to the entire statistic
(not split into edge and smooth parts) may be right, but there is significant variation between
images in the shape of the distribution for the edge part. The smooth part generally looks
roughly similar among our test images.
In order to suggest an improved model for image gradient statistics, in each of Figures
8.1(e)–8.3(e) we also fit the linearised distribution P_t(t) = C exp(−αϕ(t)) for

    ϕ(t) := { t^q,                           0 ≤ t ≤ M,
            { (1 − q)M^q + qM^(q−1) t,      t > M.    (8.1)
This is again done by a least squares fit on the logarithm of the distribution. For the ‘Fit
(man)’ curve, we fix the cut-off point M to a hand-picked (manual) edge threshold and
optimise (q, α). We also optimise over all of the parameters (q, α, M); this is the ‘Fit (opt)’
curve. We note that the asymptotic α, which we define as

    α∞ := lim_{t→∞} αϕ(t)/t = αqM^(q−1),

is roughly the same for both of the choices, and generally the curves are close to each other. As
α∞ describes the behaviour of the model on edges, and for total variation denoising α∞ = α,
we find α∞ to be a parameter that should indeed stay roughly constant between models
with different q and M. It, however, turns out that the α∞ obtained by the simple least
squares histogram fit is poor in practice; it gives far too strong regularisation, i.e., a too
narrow distribution. The problem is that the simple least squares fit on the logarithm
over-emphasises the tail of the distribution, for which we moreover have very little statistical
information due to the discrete nature of the data. This yields a far too high α∞, i.e., the
slope of the linear part is too steep in the figures. Developing a reliable way to obtain the
model from the data is outside the scope of the present paper, although it is definitely an
interesting subject for future studies. This is why we have also included the curve ‘Fit (emp)’,
which is based on an empirical choice of (α∞, M, q) from our numerical experiments in the
following §9. There we keep α∞ fixed as we vary M and q. We will also incorporate the noise
level σ² into α. It turns out that for the empirically good distribution, q is close to the values
found by histogram fitting above, but α∞ is very different.
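The least squares fit on the logarithm can be sketched as follows for fixed (q, M), leaving (α, C) free; the histogram data and parameter values in the example are purely illustrative:

```python
import numpy as np

def fit_alpha(t, counts, q, M):
    """Least squares fit of log P_t(t) = log C - alpha * phi(t), with phi the
    linearised energy (8.1) and (q, M) held fixed; returns (alpha, C)."""
    phi = np.where(t <= M, t**q, (1.0 - q) * M**q + q * M**(q - 1.0) * t)
    A = np.stack([np.ones_like(phi), -phi], axis=1)
    (logC, alpha), *_ = np.linalg.lstsq(A, np.log(counts), rcond=None)
    return float(alpha), float(np.exp(logC))
```

For such a fit the asymptotic slope is then α∞ = αqM^(q−1).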
9. Numerical reconstructions. Next we provide a numerical solver for the TVϕc model,
possibly including the η-terms. We note that our solver is only one option, and not the focus
of the present paper, which is on modelling and analysis. We therefore do not provide an
extensive analysis and comparison to alternative approaches of the solver itself. Compared to
first-order splitting approaches, recently analysed extensively using the Kurdyka-Łojasiewicz
property [5, 35], it can however be said that our solver can be proved to have theoretically
faster local superlinear convergence [27, 28].
In the finite-dimensional setting, we aim to solve

    min_u f(u) := Σ_{k∈Ω_d} [ αϕ(|∇u(k)|) + ½|z(k) − u(k)|²
                  + η0 Σ_{l=1}^N ( √(1 + |∇u(k)|²) − √(1 + |∇_l u(k)|²) ) ].    (9.1)

Here ϕ is given by (8.1), and α, η0 are manually chosen to balance the weights of the respective
terms. For an image u of resolution n₁-by-n₂, we set h := 1/√(n₁n₂), Ω_d := [0, 1]² ∩ (hZ²),
and discretise the gradient by forward differences, i.e.,

    ∇u(k) := ( (u(k + e₁) − u(k))/h, (u(k + e₂) − u(k))/h ),    for all k ∈ Ω_d,

with homogeneous Dirichlet boundary conditions. Each ∇_l u := ρ_l ∗ ∇u is defined through
convolution with a prescribed smoothing kernel ρ_l (l = 1, …, N). Here ρ_l is specified as a
two-dimensional Gaussian distribution of standard deviation ε_l centered at the origin.
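The forward-difference gradient can be sketched as below; we read "homogeneous Dirichlet" as taking values outside the grid to be zero, which is an assumption on our part:

```python
import numpy as np

def grad_forward(u, h):
    """Forward-difference gradient (u(k+e) - u(k))/h with zero (homogeneous
    Dirichlet) values outside the image grid."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = (u[1:, :] - u[:-1, :]) / h
    gx[-1, :] = (0.0 - u[-1, :]) / h
    gy[:, :-1] = (u[:, 1:] - u[:, :-1]) / h
    gy[:, -1] = (0.0 - u[:, -1]) / h
    return gx, gy
```

At the far boundary the "neighbour" is the zero value outside the domain, so the last difference is −u/h there.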
To cope with the kink of the non-smooth ϕ term at zero, we introduce a Huber-type local
smoothing [27, 28] by replacing ϕ in (9.1) with a continuously differentiable function ϕγ with
locally Lipschitz derivative. More specifically, let 0 < γ ≪ M be the smoothing parameter
and ϕγ : [0, ∞) → [0, ∞) be defined by

    ϕγ(t) = { ϕ(t) − (ϕ(γ) − ½ϕ′(γ)γ),    if t ≥ γ,
            { (ϕ′(γ)/(2γ)) t²,            if 0 ≤ t < γ.

Thus, the resulting Huberized TVϕγ model appears as

    min_u fγ(u) := Σ_{k∈Ω_d} [ αϕγ(|∇u(k)|) + ½|z(k) − u(k)|²
                   + η0 Σ_{l=1}^N ( √(1 + |∇u(k)|²) − √(1 + |∇_l u(k)|²) ) ].    (9.2)
For this problem, the first-order optimality condition reads:

    { α∇ᵀp + u − z + η0 Σ_{l=1}^N ( ∇ᵀ( ∇u/√(1 + |∇u|²) ) − ∇ᵀ_l( ∇_l u/√(1 + |∇_l u|²) ) ) = 0,
    { ψ(max(|∇u|, γ)) p = ∇u,    (9.3)

where p ∈ R^(2|Ω_d|) is an auxiliary variable and ψ : (0, ∞) → (0, ∞) is defined by ψ(t) :=
t/ϕ′(t). Note that ψ is locally Lipschitz and monotonically increasing, and in the following we
shall denote by ∂ψ a subdifferential of ψ. We remark that the consistency of the Huberized
stationary points induced by (9.3) with the stationary points of the original model (9.1)
was investigated in the previous works [27, 28]. Moreover, the system (9.3) is not differentiable
in the classical sense. Therefore, in the following we present a generalized Newton-type solver
for computing a stationary point satisfying (9.3).
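For the linearised energy (8.1), the function ψ can be written down directly; a sketch with illustrative (q, M):

```python
def psi(t, q=0.5, M=30.0):
    """psi(t) = t / phi'(t) for the linearised phi of (8.1), t > 0."""
    dphi = q * t**(q - 1.0) if t <= M else q * M**(q - 1.0)
    return t / dphi
```

Below M this equals t^(2−q)/q; above M it is linear in t, so ψ is indeed monotonically increasing.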
Given the current iterate (uⁱ, pⁱ), our solver relies on the following regularized linear system,
arising from differentiating (9.3) (and further straightforward manipulations; see [27, 28]):

    (Hⁱ + βⁱRⁱ) δuⁱ = −gⁱ,

with

    mⁱ := max(|∇uⁱ|, γ),

    χⁱ(k) := { 1 if |∇uⁱ(k)| ≥ γ,
             { 0 if |∇uⁱ(k)| < γ,

    Hⁱ := I + α∇ᵀ (1/ψ(mⁱ)) ( I − (χⁱ∂ψ(mⁱ)/(2mⁱ)) ( (pⁱ)(∇uⁱ)ᵀ + (∇uⁱ)(pⁱ)ᵀ ) ) ∇
          + η0 Σ_{l=1}^N [ ∇ᵀ (1/√(1 + |∇uⁱ|²)) ( I − (∇uⁱ)(∇uⁱ)ᵀ/(1 + |∇uⁱ|²) ) ∇
                           − ∇ᵀ_l (1/√(1 + |∇_l uⁱ|²)) ( I − (∇_l uⁱ)(∇_l uⁱ)ᵀ/(1 + |∇_l uⁱ|²) ) ∇_l ],

    Rⁱ := εI + α∇ᵀ (χⁱ∂ψ(mⁱ)/(2ψ(mⁱ)mⁱ)) ( (pⁱ)(∇uⁱ)ᵀ + (∇uⁱ)(pⁱ)ᵀ ) ∇
          + η0 Σ_{l=1}^N ∇ᵀ_l (1/√(1 + |∇_l uⁱ|²)) ( I − (∇_l uⁱ)(∇_l uⁱ)ᵀ/(1 + |∇_l uⁱ|²) ) ∇_l,    (9.4)

    gⁱ := ∇fγ(uⁱ) = α∇ᵀ( ∇uⁱ/ψ(mⁱ) ) + uⁱ − z
          + η0 Σ_{l=1}^N ( ∇ᵀ( ∇uⁱ/√(1 + |∇uⁱ|²) ) − ∇ᵀ_l( ∇_l uⁱ/√(1 + |∇_l uⁱ|²) ) ).
Here Hⁱ represents a (modified) generalized Hessian matrix of fγ at uⁱ, while Rⁱ, with an
arbitrarily fixed 0 < ε ≪ α, serves as a structural Hessian regularization. Note that Hⁱ
may not be positive definite at the iterate uⁱ. For this reason, the regularization weight βⁱ is
automatically tuned by a trust-region based mechanism; see steps 8–20 of Algorithm 1. Further,
whenever βⁱ = 1, the regularized Hessian Hⁱ + βⁱRⁱ is positive definite. Consequently, our
βⁱ-update scheme guarantees δuⁱ to be a descent direction for fγ at uⁱ, and thus the overall
iterative scheme can be globalised by, e.g., the Wolfe-Powell line search [19]; see step 21 in
Algorithm 1. Moreover, following the algorithm development in [27, 28], one can show that βⁱ
asymptotically vanishes as uⁱ approaches a stationary point where a certain type of second-
order sufficient optimality condition is satisfied. Thus, local superlinear convergence can be
attained. The overall algorithm is detailed in Algorithm 1 below. The following parameters
associated with Algorithm 1 are specified throughout our experiments: c = 1, ρ = 0.25,
ρ̄ = 0.75, κ = 0.5, κ̄ = 2, d = 10⁻¹⁰, τ₁ = 0.1, τ₂ = 0.9, γ = 0.001, ε_r = 0.01. Algorithm 1
is terminated once ‖∇fγ(uⁱ)‖/‖∇fγ(u⁰)‖ drops below 10⁻⁷.
We report in Figures 9.1–9.3 and Table 9.1 the results of denoising our three test images
using this algorithm with rather high artificial noise levels. We have added Gaussian noise of
standard deviation σ = 30 to all test images. We report both the conventional peak-signal-
to-noise ratio (PSNR) as well as the structural similarity measure (SSIM) of [48]. The latter
better quantifies the visual quality of images by essentially computing the PSNR in local
Algorithm 1 Superlinearly convergent Newton-type method for TVϕγ denoising
Require: c > 0, 0 < ρ ≤ ρ̄ < 1, 0 < κ < 1 < κ̄, d > 0, 0 < τ₁ < 1/2, τ₁ < τ₂ < 1, 0 < γ ≪ 1,
0 < ε_r < 1.
1: Initialize the iterate (u⁰, p⁰), the regularization weight β⁰ ≥ 0, and the trust-region radius
   r⁰ > 0. Set i := 0.
2: repeat
3:   Generate Hⁱ, Rⁱ, and gⁱ at the current iterate (uⁱ, pⁱ).
4:   Solve the linear system (Hⁱ + βⁱRⁱ)δuⁱ = −gⁱ (inexactly) for δuⁱ by the conjugate
     gradient (CG) method up to the residual tolerance ε_r, or detect the non-positive
     definiteness of Hⁱ + βⁱRⁱ during the CG iterations.
5:   if Hⁱ + βⁱRⁱ is not positive definite or −((gⁱ)ᵀδuⁱ)/(‖gⁱ‖ ‖δuⁱ‖) < d then
6:     Set βⁱ := 1, and return to step 4.
7:   end if
8:   if βⁱ = 1 and (δuⁱ)ᵀRⁱδuⁱ > (rⁱ)² then
9:     Set rⁱ := √((δuⁱ)ᵀRⁱδuⁱ), βⁱ⁺¹ := 1, and go to step 13.
10:  else
11:    Set βⁱ⁺¹ := max(min(βⁱ + c⁻¹((δuⁱ)ᵀRⁱδuⁱ − (rⁱ)²), 1), 0).
12:  end if
13:  Evaluate ρⁱ := ( fγ(uⁱ + δuⁱ) − fγ(uⁱ) ) / ( (gⁱ)ᵀδuⁱ + (δuⁱ)ᵀHⁱδuⁱ/2 ).
14:  if ρⁱ < ρ then
15:    Set rⁱ⁺¹ := κrⁱ.
16:  else if ρⁱ > ρ̄ then
17:    Set rⁱ⁺¹ := κ̄rⁱ.
18:  else
19:    Set rⁱ⁺¹ := rⁱ.
20:  end if
21:  Determine the step size aⁱ along the search direction δuⁱ such that uⁱ⁺¹ = uⁱ + aⁱδuⁱ
     satisfies the following Wolfe-Powell conditions:

         fγ(uⁱ⁺¹) ≤ fγ(uⁱ) + τ₁aⁱ(gⁱ)ᵀδuⁱ,
         ∇fγ(uⁱ⁺¹)ᵀδuⁱ ≥ τ₂(gⁱ)ᵀδuⁱ.

22:  Generate the next iterate:

         uⁱ⁺¹ := uⁱ + aⁱδuⁱ,
         pⁱ⁺¹ := (1/ψ(mⁱ)) ( ∇uⁱ + ∇δuⁱ − (χⁱ∂ψ(mⁱ)/(2mⁱ)) ( (pⁱ)(∇uⁱ)ᵀ + (∇uⁱ)(pⁱ)ᵀ ) ∇δuⁱ ).

23:  Set i := i + 1.
24: until the stopping criterion is fulfilled.
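Step 4 solves the Newton system inexactly by CG while watching for directions of non-positive curvature; a generic sketch (not the authors' implementation):

```python
import numpy as np

def cg_with_curvature_check(A, b, tol=1e-10, maxit=200):
    """Conjugate gradient solve of A x = b that flags non-positive definiteness
    of A via the curvature test d'Ad <= 0, as in step 4 of Algorithm 1."""
    x = np.zeros_like(b)
    r = b.astype(float).copy()
    d = r.copy()
    rr = r @ r
    for _ in range(maxit):
        if np.sqrt(rr) <= tol * max(np.linalg.norm(b), 1.0):
            break
        Ad = A @ d
        curvature = d @ Ad
        if curvature <= 0.0:  # A is not positive definite along d
            return x, False
        step = rr / curvature
        x = x + step * d
        r = r - step * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x, True
```

When the flag comes back False, Algorithm 1 sets βⁱ := 1 and re-solves with the regularized matrix.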
[Figure 8.3 panels: (a) Summer photo; (b) detected edge pixels (red); (c) log-histogram with
fit q = 0.87; (d) smooth fit q = 0.24, edge fit q = 1.05; (e) linearised fits: (man) q = 0.35,
M = 30, α∞ = 0.050; (opt) q = 0.40, M = 40, α∞ = 0.049; (emp) q = 0.30, M = 40,
α∞ = 0.017.]
Figure 8.3: Summer photo: discrete gradient histogram and least squares models. The
image intensity in (a) is in the range [0, 255], and we have chosen pixels k with |∇u(k)| ≥ 30
as edges (b). The log-histogram of |∇u(k)| with optimal fit of t 7→ tq is displayed in (c). This
is done separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off
point M = 30 (manual edge detection), M = 40 (optimal least squares fit). We moreover
show the empirically best linearised model.
(a) Original
(b) Noisy image
(c) M = 0
(d) M = 10 (PSNR-optimal)
(e) M = 40 (SSIM-optimal)
(f) M = ∞
Figure 9.1: Pier photo: denoising results with noise level σ = 30 (Gaussian), for varying
cut-off M , fixed q = 0.4 and fixed α∞ = 0.0207.
(a) Original
(b) Noisy image
(c) M = 0 (PSNR-optimal)
(d) M = 15 (SSIM-optimal)
(e) M = 40
(f) M = ∞
Figure 9.2: Parrot photo: denoising results with noise level σ = 30 (Gaussian), for varying
cut-off M , fixed q = 0.5 and fixed α∞ = 0.0253.
(a) Original
(b) Noisy image
(c) M = 0
(d) M = 20 (PSNR-optimal)
(e) M = 40 (SSIM-optimal)
(f) M = ∞
Figure 9.3: Summer photo: denoising results with noise level σ = 60 (Gaussian), for varying
cut-off M , fixed q = 0.3 and fixed α∞ = 0.00430.
Table 9.1: Denoising results for all three test photos. The noise level σ, exponent q,
and asymptotic regularisation α∞ are fixed; the cut-off point M varies. We report both the
PSNR and the SSIM, with the optimal values marked by an asterisk.

Parrot photo / σ = 30, q = 0.5, α∞ = 0.0253
M        0         10        15        30        40        50        60        ∞
PSNR   30.3432*  29.9156   29.5876   28.8098   28.3655   28.0398   27.7424   28.6829
SSIM    0.7914    0.7919    0.7922*   0.7906    0.7887    0.7854    0.7823    0.7552

Pier photo / σ = 30, q = 0.4, α∞ = 0.0207
M        0         10        20        30        40        50        60        ∞
PSNR   29.0019   29.3959*  29.3472   29.0918   28.8988   28.6790   28.5327   28.5836
SSIM    0.6737    0.7191    0.7477    0.7556    0.7619*   0.7608    0.7593    0.7297

Summer photo / σ = 30, q = 0.4, α∞ = 0.00430
M        0         10        20        30        40        50        60        ∞
PSNR   26.0919   26.2750   26.2851*  26.0643   25.7443   25.3627   25.0449   25.2755
SSIM    0.5890    0.6175    0.6489    0.6641    0.6644*   0.6615    0.6543    0.6175
windows and combining the results in a non-linear fashion. The range of the SSIM is [0, 1],
the higher the better.
In our computations, we keep α∞ and q fixed, and vary M (altering α as necessary).
For M = ∞, i.e., the original TV^q model, we simply take α as our chosen fixed α∞. This is
because ϕ∞ = 0, so the real asymptotic α for the model is always zero. It is quite remarkable
that in our results fine features of the images are always retained very well, although higher
M tends to increase the stair-casing effect (not applicable to M = ∞). At the optimal choice
of M by PSNR or SSIM, more noise can be seen to be removed than by TV (M = 0).
Generally, we can say that adding the cut-off M improves the results compared to the earlier
TV^q model without cut-off (M = ∞). Whether the results are better than conventional TV
denoising is open to debate. By PSNR and SSIM the results tend to favour the TVϕ model.
Visually, oscillatory effects of noise are better removed, but at the same time the stair-casing
is accentuated. The best result is in the eye of the beholder.
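The PSNR we report can be computed as below (the peak value 255 is assumed for these 8-bit images); SSIM requires the windowed computation described above and is omitted from this sketch:

```python
import numpy as np

def psnr(u, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB between image u and reference ref."""
    mse = np.mean((np.asarray(u, float) - np.asarray(ref, float)) ** 2)
    return float(10.0 * np.log10(peak**2 / mse))
```

Halving the root-mean-square error raises the PSNR by about 6 dB, which gives a feel for the differences in Table 9.1.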
We also tested on the parrot photo the effect of the multiscale regularisation term η by
including the first term η₁ of the sum, for varying weights η0 and convolution kernel widths
ε₁. The results are in Figure 9.4 and Table 9.2. Clearly a large η0 has a deteriorating effect on
both PSNR and SSIM, whereas the effect of the choice of ε₁ is less severe. Visually, a large η0
creates an almost artistic quantisation and feature-filtering effect. The latter is also controlled
by ε₁: a large ε₁ tends to remove large features. A particular feature to notice is the eye of
the parrot on the right in Figure 9.4(a) versus (b); it has disappeared altogether in the latter.
Table 9.2: Effect of the η term on the parrot photo. Only the first term η₁ of the sum is included, with varying convolution width ε₁ and weight η₀. The noise level σ = 30 (Gaussian), cut-off M = 10, exponent q = 0.5 and asymptotic regularisation α∞ = 0.0253 are fixed. Optimal SSIM and PSNR are underlined.
            ε₁ = 0.5            ε₁ = 1              ε₁ = 2
            PSNR     SSIM       PSNR     SSIM       PSNR     SSIM
η₀ = 0.1α   29.9433  0.8036     29.8023  0.8091     29.6197  0.8085
η₀ = α      29.2700  0.8072     28.2237  0.8014     27.5659  0.7970
η₀ = 10α    26.8069  0.7939     25.7874  0.7921     24.9003  0.7886
10. Conclusion. We have studied difficulties with non-convex total variation models in the function space setting. We have demonstrated that the model (1.2) continues to do what it is proposed to do in the discrete setting – to promote piecewise constant solutions – for most, but not all, energies ϕ employed in the literature. Naïve forms of the model (1.1), proposed to model real gradient distributions in images, however, suffer from much more severe difficulties. We have shown that the model can be remedied if we replace the topology of weak* convergence by that of area-strict convergence. In order to do this, we have to add multiscale regularisation in terms of the functional η to the model, and to linearise the energy ϕ for large gradients. The latter is needed to make the model BV-coercive, and to have any kind of penalisation of jumps. We have demonstrated through numerical experiments and simple statistics that this model, in fact, matches reality better than the simple energies ϕ(t) = t^q.
(a) η₀ = 7.071e−03, ε₁ = 1.0
(b) η₀ = 7.071e−03, ε₁ = 2.0
(c) η₀ = 7.071e−04, ε₁ = 1.0
(d) η₀ = 7.071e−04, ε₁ = 2.0
(e) η₀ = 7.071e−05, ε₁ = 1.0
(f) η₀ = 7.071e−05, ε₁ = 2.0
Figure 9.4: Effect of the η term on the parrot photo. Only the first term η₁ of the sum is included, with varying convolution width ε₁ and weight η₀. The noise level σ = 30 (Gaussian), cut-off M = 10, exponent q = 0.5 and asymptotic regularisation α∞ = 0.0253 are fixed.
Our purely theoretical starting point has therefore led to improved practical models. The η functional, however, remains a "theoretical artefact". It has its own regularisation effect, which naturally does not distort the results too much for small parameters (though it does for large parameters). As shown in Proposition 6.7, it can be ignored in discretisations when not passing to the function space limit.
Acknowledgements. A large part of this work was done while T. Valkonen was at the Center for Mathematical Modeling, Escuela Politécnica Nacional, Quito, Ecuador. There he was supported by a Prometeo scholarship of the Senescyt. In Cambridge, T. Valkonen has been financially supported by the King Abdullah University of Science and Technology (KAUST) Award No. KUK-I1-007-43, and the EPSRC first grant Nr. EP/J009539/1 "Sparse & Higher-order Image Restoration". M. Hintermüller and T. Wu have been supported by the Austrian FWF SFB F32 "Mathematical Optimization in Biomedical Sciences", and the START-Award Y305.
A data statement for the EPSRC. This is primarily a theoretical mathematics paper, and any data used mainly serves as a demonstration of mathematically proven results. Moreover, photographs that are, for all intents and purposes, statistically comparable to the ones used for the final experiments can easily be produced with a digital camera, or downloaded from the internet; in particular, the parrot test photo is available in the Kodak Lossless True Color Image Suite.¹ For the computations, we directly applied software developed in an earlier research program. This was funded by various non-UK agencies, whose rules govern the code.
REFERENCES
[1] W.K. Allard, Total variation regularization for image denoising, I. Geometric theory, SIAM J. Math.
Anal., 39 (2008), pp. 1150–1190.
[2] L. Ambrosio, A compactness theorem for a new class of functions of bounded variation, Boll. Un. Mat.
Ital. B, 7 (1989), pp. 857–881.
[3] L. Ambrosio, A. Coscia, and G. Dal Maso, Fine properties of functions with bounded deformation,
Arch. Ration. Mech. Anal., 139 (1997), pp. 201–238.
[4] L. Ambrosio, N. Fusco, and D. Pallara, Functions of Bounded Variation and Free Discontinuity
Problems, Oxford University Press, 2000.
[5] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame
problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,
Math. Program., 137 (2013), pp. 91–129.
[6] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, MOS-SIAM Series on Optimization, Society for Industrial and
Applied Mathematics, 2014.
[7] M. Benning and M. Burger, Ground states and singular vectors of convex variational regularization
methods, Methods and Applications of Analysis, 20 (2013), pp. 295–334.
[8] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio,
B. van Bloemen Waanders, K. Willcox, and Y. Marzouk, Large-scale inverse problems and
quantification of uncertainty, vol. 712, John Wiley & Sons, 2011.
[9] G. Bouchitté and G. Buttazzo, New lower semicontinuity results for nonconvex functionals defined
on measures, Nonlinear Anal., 15 (1990), pp. 679 – 692.
[10] K. Bredies, K. Kunisch, and T. Pock, Total generalized variation, SIAM J. Imaging Sci., 3 (2011),
pp. 492–526.
¹ At the time of writing, available at http://r0k.us/graphics/kodak/.
[11] T. Bui-Thanh, K. Willcox, and O. Ghattas, Model reduction for large-scale systems with high-dimensional parametric input space, SIAM J. Sci. Comput., 30 (2008), pp. 3270–3288.
[12] E. Casas, K. Kunisch, and C. Pola, Regularization by functions of bounded variation and applications
to image enhancement, Applied Mathematics & Optimization, 40 (1999), pp. 229–257.
[13] V. Caselles, A. Chambolle, and M. Novaga, The discontinuity set of solutions of the TV denoising
problem and some extensions, Multiscale Model. Simul., 6 (2008), pp. 879–894.
[14] T. F. Chan and S. Esedoglu, Aspects of total variation regularized L1 function approximation, SIAM
J. Appl. Math., 65 (2005), pp. 1817–1837.
[15] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134 (2012),
pp. 71–99.
[16] X. Chen and W. Zhou, Smoothing nonlinear conjugate gradient method for image restoration using
nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 3 (2010), pp. 765–790.
[17] J. C. de Los Reyes and C.-B. Schönlieb, Image denoising: Learning noise distribution via PDE-constrained optimization, Inverse Probl. Imaging, (2014), to appear.
[18] S. Delladio, Lower semicontinuity and continuity of functions of measures with respect to the strict
convergence, Proceedings of the Royal Society of Edinburgh: Section A Mathematics, 119 (1991),
pp. 265–278.
[19] J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
[20] V. Duval, J. F. Aujol, and Y. Gousseau, The TVL1 model: A geometric point of view, Multiscale
Model. Simul., 8 (2009), pp. 154–189.
[21] L. C. Evans and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, 1992.
[22] I. Fonseca and G. Leoni, Modern methods in the calculus of variations: Lp spaces, Springer Verlag,
2007.
[23] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of
images, IEEE TPAMI, (1984), pp. 721–741.
[24] E. Giusti, Minimal Surfaces and Functions of Bounded Variation, vol. 80 of Monographs in Mathematics,
Birkhäuser, 1984.
[25] E. Haber, L. Horesh, and L. Tenorio, Numerical methods for experimental design of large-scale linear
ill-posed inverse problems, Inverse Problems, 24 (2008), p. 055012.
[26] E. Haber and L. Tenorio, Learning regularization functionals – supervised training approach, Inverse
Problems, 19 (2003), p. 611.
[27] M. Hintermüller and T. Wu, Nonconvex TVq -models in image restoration: Analysis and a trust-region
regularization–based superlinearly convergent solver, SIAM J. Imaging Sci., 6 (2013), pp. 1385–1415.
[28] ———, A superlinearly convergent R-regularized Newton scheme for variational models with concave sparsity-promoting priors, Comput. Optim. Appl., 57 (2014), pp. 1–25.
[29] J. Huang and D. Mumford, Statistics of natural images and models, in IEEE CVPR, vol. 1, 1999.
[30] J. Kristensen and F. Rindler, Relaxation of signed integral functionals in BV, Calc. Var. Partial
Differential Equations, 37 (2010), pp. 29–62.
[31] P.-L. Lions, The concentration-compactness principle in the calculus of variations. The limit case, Part I, Revista Matemática Iberoamericana, 1 (1985), pp. 145–201.
[32] P. Mattila, Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability, Cambridge University Press, 1999.
[33] M. Nikolova, Minimizers of cost-functions involving nonsmooth data-fidelity terms. application to the
processing of outliers, SIAM J. Numer. Anal., 40 (2002), pp. 965–994.
[34] M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching, Efficient reconstruction of piecewise constant
images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 1 (2008), pp. 2–25.
[35] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for non-convex optimization, arXiv preprint arXiv:1404.4805, (2014).
[36] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, An iterated l1 algorithm for non-smooth non-convex
optimization in computer vision, in IEEE CVPR, 2013.
[37] E. Paolini and E. Stepanov, Optimal transportation networks as flat chains, Interfaces Free Bound., 8
(2006), pp. 393–436.
[38] J. Rabin and G. Peyré, Wasserstein regularization of imaging problem, in Image Processing (ICIP),
2011 18th IEEE International Conference on, 2011, pp. 1541–1544.
[39] F. Rindler and G. Shaw, Strictly continuous extensions of functionals with linear growth to the space
BV. preprint, 2013.
[40] L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica
D, 60 (1992), pp. 259–268.
[41] W. Rudin, Real and Complex Analysis, McGraw-Hill Book Company, 1966.
[42] P. Swoboda and C. Schnörr, Convex variational image restoration with histogram priors, SIAM J.
Imaging Sci., 6 (2013), pp. 1719–1735.
[43] T. Valkonen, Optimal transportation networks and stations, Interfaces Free Bound., 11 (2009), pp. 569–
597.
[44] ———, Transport equation and image interpolation with SBD velocity fields, J. Math. Pures Appl., 95 (2011), pp. 459–494.
[45] ———, Strong polyhedral approximation of simple jump sets, Nonlinear Anal., 75 (2012), pp. 3641–3671.
[46] ———, The jump set under geometric regularisation. Part 2: Higher-order approaches. Submitted, July 2014.
[47] ———, The jump set under geometric regularisation. Part 1: Basic technique and first-order denoising, SIAM J. Math. Anal., 47 (2015), pp. 2587–2629.
[48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: From error
visibility to structural similarity, IEEE Trans. Image Processing, 13 (2004), pp. 600–612.
[49] B. White, Rectifiability of flat chains, Ann. of Math., 150 (1999), pp. 165–184.
Appendix A. Vectorial η functional.
We now study a condition ensuring the convergence of the total variations |µ^i|(Ω) subject to the weak* convergence of the measures µ^i ∈ M(Ω; R^m), (i = 0, 1, 2, . . .). Improving a result first presented in [44, 45], we show in Theorem A.4 below that if {f_ℓ}_{ℓ=0}^∞ is a normalised nested sequence of functions as in Definition A.1 below, then it suffices to bound

    η(µ^i) := Σ_{ℓ=0}^∞ η_ℓ(µ^i),   where   η_ℓ(µ^i) := ∫ ( |µ^i|(τ_x f_ℓ) − |µ^i(τ_x f_ℓ)| ) dx,   (µ^i ∈ M(Ω; R^N)).   (A.1)

Here we employ the notation τ_x f(y) := f(y − x). Also, we write |µ^i(τ_x f_ℓ)| := ‖µ^i(τ_x f_ℓ)‖₂.
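The integrand in (A.1) is easiest to see on purely atomic measures µ = Σ_k c_k δ_{x_k}, for which |µ|(τ_x f_ℓ) = Σ_k ‖c_k‖₂ f_ℓ(x_k − x) whenever f_ℓ ≥ 0 and the atoms are distinct. The following numerical sketch (our illustration: one space dimension, a hat-shaped window standing in for f_ℓ, and a hypothetical helper eta_ell) shows that η_ℓ(µ) ≥ 0, with equality when all atoms point in the same direction:

```python
import numpy as np

def eta_ell(atoms, f, grid):
    """Approximate eta_ell(mu) from (A.1) for an atomic measure
    mu = sum_k c_k * delta_{x_k} by a Riemann sum on a 1-D grid.
    atoms: list of (x_k, c_k) with c_k a vector in R^N; f: the window f_ell."""
    dx = grid[1] - grid[0]
    total = 0.0
    for x in grid:
        # |mu|(tau_x f) = sum_k ||c_k||_2 f(x_k - x)   (f >= 0, distinct atoms)
        tv = sum(np.linalg.norm(c) * f(xk - x) for xk, c in atoms)
        # |mu(tau_x f)| = || sum_k c_k f(x_k - x) ||_2
        vec = sum(np.asarray(c, float) * f(xk - x) for xk, c in atoms)
        total += (tv - np.linalg.norm(vec)) * dx
    return total

f = lambda t: max(0.0, 1.0 - abs(t))               # simple nonnegative hat window
grid = np.linspace(-4.0, 4.0, 801)

aligned = [(0.0, [1.0, 0.0]), (0.5, [2.0, 0.0])]   # atoms share one direction
opposed = [(0.0, [1.0, 0.0]), (0.5, [-1.0, 0.0])]  # cancelling directions
```

With the aligned atoms the vectorial and scalar integrands coincide and the computed value vanishes, while the opposed pair gives a strictly positive η_ℓ: the functional measures how much mass of µ cancels under local averaging.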
Definition A.1. Let f_ℓ : R^m → R, (ℓ = 0, 1, 2, . . .), be bounded Borel functions with compact support that are continuous in R^m \ S_{f_ℓ}, i.e., the approximate discontinuity set equals the discontinuity set. Also let {ν_ℓ}_{ℓ=0}^∞ be a sequence in M(R^m) with |ν_ℓ|(R^m) = 1. The sequence {(f_ℓ, ν_ℓ)}_{ℓ=0}^∞ is then said to form a nested sequence of functions if f_ℓ(x) = ∫ f_{ℓ+1}(x − y) dν_ℓ(y) (a.e.). The sequence is said to be normalised if f_ℓ ≥ 0 and ∫ f_ℓ dx = 1.
Example A.1. Let ρ be the standard convolution mollifier such that

    ρ(x) := exp(−1/(1 − ‖x‖²)) if ‖x‖ < 1,   and   ρ(x) := 0 if ‖x‖ ≥ 1,

and define ρ_ε(x) := ε^{−m} ρ(x/ε). Since ρ_{ε+δ} = ρ_ε ∗ ρ_δ, where ∗ denotes the convolution operation, we deduce that f_ℓ := ρ_{2^{−ℓ}} and ν_ℓ := ρ_{2^{−ℓ−1}} form a normalised nested sequence.
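The scaling ρ_ε(x) := ε^{−m} ρ(x/ε) leaves the total mass ∫ ρ_ε dx unchanged, so normalising ρ once normalises every member f_ℓ = ρ_{2^{−ℓ}} of the sequence. A quick one-dimensional check (m = 1; the grid and the trapezoidal quadrature are our choices):

```python
import numpy as np

def rho(x):
    """Mollifier profile of Example A.1 (not yet normalised to unit mass)."""
    x = np.asarray(x, float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mass(eps, x):
    """Trapezoidal approximation of the integral of rho_eps = eps^{-1} rho(./eps)."""
    y = rho(x / eps) / eps
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

x = np.linspace(-2.0, 2.0, 200001)
masses = [mass(e, x) for e in (1.0, 0.5, 0.25)]
print(max(masses) - min(masses) < 1e-6)  # → True: the mass does not depend on eps
```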
We require the following basic lemma for our vectorial case.
Lemma A.2. Let ν ∈ M(Ω) be a positive Radon measure, and g ∈ L¹(Ω; R^N). Then

    ‖ ∫_Ω g(x) dν(x) ‖₂ ≤ ∫_Ω ‖g(x)‖₂ dν(x).
Proof. For any x ∈ Ω, we write g(x) = θ(x)v(x) with v(x) = (v₁(x), . . . , v_N(x)), 0 ≤ θ(x), and ‖v(x)‖₂ = 1. Then we define λ := θν. Now

    ‖ ∫_Ω g(x) dν(x) ‖₂² = Σ_{j=1}^N ( ∫_Ω v_j(x) dλ(x) )²
                         = λ(Ω)² Σ_{j=1}^N ( (1/λ(Ω)) ∫_Ω v_j(x) dλ(x) )²
                         ≤ λ(Ω) Σ_{j=1}^N ∫_Ω |v_j(x)|² dλ(x) = λ(Ω)².

Here we have used Jensen's inequality. From this we conclude

    λ(Ω) = ∫_Ω θ(x) dν(x) = ∫_Ω ‖g(x)‖₂ dν(x),

proving the claim.
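For a discrete positive measure the inequality of Lemma A.2 reads ‖Σ_j ν_j g_j‖₂ ≤ Σ_j ν_j ‖g_j‖₂, a vector-valued triangle inequality for integrals. A randomized sanity check (the dimensions and seed are arbitrary choices of ours):

```python
import numpy as np

# Discrete positive measure nu on n points and a vector field g with values in R^N.
rng = np.random.default_rng(0)
N, n = 3, 50
g = rng.standard_normal((n, N))                             # g(x_j) in R^N
nu = rng.random(n)                                          # weights nu({x_j}) >= 0
lhs = float(np.linalg.norm((nu[:, None] * g).sum(axis=0)))  # || integral of g dnu ||_2
rhs = float((nu * np.linalg.norm(g, axis=1)).sum())         # integral of ||g||_2 dnu
print(lhs <= rhs)  # → True
```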
With the help of the above lemma, Theorem A.4 below was proved in [45] exactly as in the case of scalar-valued measures (N = 1). Our proof here is, however, slightly different: we base it on the following more general lemma on partial sums, which we also need for the proof of Proposition 6.7.
Lemma A.3. Let Ω ⊂ R^m be an open and bounded set, and {(f_ℓ, ν_ℓ)}_{ℓ=0}^∞ a normalised nested sequence of functions. Let {K_i}_{i=1}^∞ ⊂ N satisfy lim_{i→∞} K_i = ∞. Suppose {µ^i}_{i=0}^∞ ⊂ M(Ω; R^N) weakly* converges to µ ∈ M(Ω; R^N) with

    sup_i ( |µ^i|(Ω) + Σ_{ℓ=1}^{K_i} η_ℓ(µ^i) ) < ∞,   (A.2)

and

    η(µ) < ∞.   (A.3)

Then

    η_ℓ(µ) ≤ lim inf_{i→∞} η_ℓ(µ^i),   (ℓ = 0, 1, 2, . . .).   (A.4)

If also |µ^i| ⇀* λ in M(Ω), then λ = |µ|. Moreover, provided the weak* convergences hold in M(R^m; R^N), resp. M(R^m), then

    η_ℓ(µ) = lim_{i→∞} η_ℓ(µ^i),   (ℓ = 0, 1, 2, . . .).   (A.5)
Proof. Let us suppose first that µ^i ⇀* µ and |µ^i| ⇀* λ weakly* in M(R^m; R^N), resp. M(R^m), rather than just within Ω. We denote by E_f the discontinuity set of f, while S_f stands for the approximate discontinuity set. Fubini's theorem and the fact that S_{f_ℓ} is an L^m-negligible Borel set imply that ∫ λ(S_{τ_x f_ℓ}) dx = 0. This and the non-negativity of λ show that λ(S_{τ_x f_ℓ}) = 0 for a.e. x ∈ R^m. Since by assumption E_f ⊂ S_f, it follows that λ(E_{τ_x f_ℓ}) = 0, so that (see, e.g., [3, Proposition 1.62]) µ^i(τ_x f_ℓ) → µ(τ_x f_ℓ) for a.e. x ∈ R^m. Likewise |µ^i|(τ_x f_ℓ) → λ(τ_x f_ℓ) for a.e. x ∈ R^m. Since sup_i |µ^i|(Ω) < ∞, and Ω is bounded, an application of the dominated convergence theorem now yields

    lim_{i→∞} ∫ |µ^i(τ_x f_ℓ)| dx = ∫ |µ(τ_x f_ℓ)| dx.   (A.6)

By the lower semicontinuity of the total variation, recalling that

    ∫ |µ^i|(τ_x f_ℓ) dx = |µ^i|(R^m),

this shows (A.4) under the assumption that the weak* convergences are in M(R^m).
Observe then that since {(f_ℓ, ν_ℓ)}_{ℓ=0}^∞ is a nested sequence of functions, {η_ℓ(µ)}_{ℓ=0}^∞ forms a decreasing sequence (for any µ ∈ M(Ω)). Indeed, as f_ℓ(x) = ∫ f_{ℓ+1}(x − y) dν_ℓ(y) and ν_ℓ(R^m) = 1 with ν_ℓ ≥ 0, using Lemma A.2 we have

    ∫ ‖µ(τ_x f_ℓ)‖₂ dx = ∫ ‖ ∫ µ(τ_{x+y} f_{ℓ+1}) dν_ℓ(y) ‖₂ dx
                       ≤ ∫∫ ‖µ(τ_{x+y} f_{ℓ+1})‖₂ dν_ℓ(y) dx = ∫ ‖µ(τ_x f_{ℓ+1})‖₂ dx.

Referring to (A.1), it follows that

    η_ℓ(µ) ≥ η_{ℓ+1}(µ).   (A.7)
To show λ = |µ|, that is |µ^i| ⇀* |µ|, we only have to show |µ^i|(Ω) → |µ|(Ω). To see the latter, we choose an arbitrary ε > 0, and write

    |µ|(Ω) − |µ^i|(Ω) = η_ℓ(µ) − η_ℓ(µ^i) + ∫ ( |µ(τ_x f_ℓ)| − |µ^i(τ_x f_ℓ)| ) dx.   (A.8)

Since η_ℓ ≥ 0, using (A.7) in (A.2) and (A.3), we now observe that taking ℓ large enough and i_ℓ such that K_{i_ℓ} ≥ ℓ, we have

    sup{ η_ℓ(µ), η_ℓ(µ^{i_ℓ}), η_ℓ(µ^{i_ℓ+1}), η_ℓ(µ^{i_ℓ+2}), . . . } ≤ ε.

Employing this in (A.8), we deduce for any large enough ℓ and all i ≥ i_ℓ that

    | |µ|(Ω) − |µ^i|(Ω) | ≤ 2ε + | ∫ ( |µ(τ_x f_ℓ)| − |µ^i(τ_x f_ℓ)| ) dx |.

The integral term tends to zero as i → ∞ by (A.6). Therefore

    lim_{i→∞} | |µ^i|(Ω) − |µ|(Ω) | ≤ 2ε.

Since ε > 0 was arbitrary, we conclude that λ = |µ|. Moreover, (A.5) now follows from (A.6). This concludes the proof of the lemma under the assumption that the weak* convergences are in M(R^m).
If this assumption does not hold, we may still switch to a subsequence for which µ^{i_k} ⇀* µ̄ weakly* in M(R^m; R^N) for some µ̄ ∈ M(R^m; R^N). But, since Ω is open, necessarily µ̄⌞Ω = µ. Moreover, an application of the triangle inequality gives

    η_ℓ(µ) = η_ℓ(µ̄⌞Ω) ≤ η_ℓ(µ̄) ≤ lim inf_{k→∞} η_ℓ(µ^{i_k}).

As this bound holds for every subsequence, we deduce (A.4). Likewise, we have |µ^{i_k}| ⇀* λ̄ weakly* in M(R^m) for a common subsequence. Again λ̄⌞Ω = λ. Since by the previous paragraphs |µ̄| = λ̄, this implies λ = |µ|.
Theorem A.4. Let Ω ⊂ R^m be an open and bounded set, and {(f_ℓ, ν_ℓ)}_{ℓ=0}^∞ a normalised nested sequence of functions. Suppose the sequence {µ^i}_{i=0}^∞ in M(Ω; R^N) converges weakly* to a measure µ ∈ M(Ω; R^N), and satisfies sup_i ( |µ^i|(Ω) + η(µ^i) ) < ∞. If also |µ^i| ⇀* λ, then λ = |µ|. Moreover, each of the functionals η and η_ℓ, (ℓ = 0, 1, 2, . . .), is lower semicontinuous with respect to the weak* convergence of {µ^i}_{i=0}^∞.
Proof. Only the lower semicontinuity of η demands a proof; the rest is immediate from Lemma A.3 with K_i = i, for instance. The lower semicontinuity of η then follows simply from Fatou's lemma and (A.4).
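The monotonicity (A.7) can also be observed numerically. Gaussians satisfy the nesting identity of Definition A.1 exactly, since a convolution of Gaussians is the Gaussian whose variance is the sum of the variances; (A.7) is then equivalent to ∫ ‖µ(τ_x f_ℓ)‖₂ dx being smaller for the coarser (wider) window. A one-dimensional sketch with an atomic measure (the Gaussian windows stand in for the paper's mollifiers; all parameter values are our choices):

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
gauss = lambda t, s: np.exp(-t ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

# Two opposing atoms: mu = c_1 delta_0 + c_2 delta_{1/2} with c_2 = -c_1.
atoms = [(0.0, np.array([1.0, 0.0])), (0.5, np.array([-1.0, 0.0]))]

def I(s):
    """Riemann sum for the integral of ||mu(tau_x f)||_2 with a unit-mass
    Gaussian window f of width s."""
    vec = sum(c[:, None] * gauss(xk - x, s) for xk, c in atoms)
    return float(np.sum(np.linalg.norm(vec, axis=0)) * dx)

# Gaussians nest: f_ell = f_{ell+1} * nu_ell when the variances add up.
sigma1, tau = 0.5, 0.5
sigma0 = float(np.hypot(sigma1, tau))   # width of the coarser window f_ell
print(I(sigma0) <= I(sigma1))  # → True, equivalently eta_ell >= eta_{ell+1}
```

More smoothing lets the two opposing atoms cancel more strongly, so the vectorial integral shrinks while |µ|(τ_x f) keeps unit total mass; this is exactly the mechanism behind (A.7).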
Appendix B. Proof of Theorem 7.1.
We prove here our result on the remedy by resorting to compact operators.
Lemma B.1. Let Ω, Ω′ be open domains, and suppose K : BV(Ω) → L¹(Ω′; R^m) is a compact linear operator. Let ϕ : R_{0,+} → R_{0,+} be lower semicontinuous. Then

    F(u) := ∫_{Ω′} ϕ(‖Ku(x)‖) dx

is lower semicontinuous with respect to weak* convergence in BV(Ω).
Proof. If {u^i}_{i=1}^∞ ⊂ BV(Ω) converges weakly* to u ∈ BV(Ω), then it is bounded in BV(Ω). Therefore, by the compactness of K, the sequence {Ku^i}_{i=1}^∞ has a subsequence, unrelabelled, which converges strongly to some v ∈ L¹(Ω′; R^m). By the continuity of K, which follows from compactness, necessarily v = Ku. Now [22, Theorem 5.9] shows that

    F(u) ≤ lim inf_{i→∞} F(u^i).
Lemma B.2. Let ρ ∈ C_c^∞(R^n) and Ω ⊂ R^n be bounded and open. Suppose µ^i → µ weakly* in M(Ω; R^m). Then ρ ∗ µ^i → ρ ∗ µ strongly in L^∞(R^n).
Proof. Let K be a compact set such that Ω + supp ρ ⊂ K. We have

    ‖ρ ∗ µ^i‖_{L^∞(K; R^m)} ≤ ‖ρ‖_{L^∞(R^n)} |µ^i|(Ω)

and

    ‖∇(ρ ∗ µ^i)‖_{L^∞(K; R^n×R^m)} = ‖(∇ρ) ∗ µ^i‖_{L^∞(K; R^n×R^m)} ≤ ‖∇ρ‖_{L^∞(R^n; R^n)} |µ^i|(Ω).

Thus {ρ ∗ µ^i}_{i=1}^∞ is uniformly bounded and equicontinuous. It follows from the Arzelà–Ascoli theorem that ρ ∗ µ^i converges uniformly (i.e., in L^∞(K; R^m)) to some v ∈ C_c(K; R^m).
Let ϕ ∈ L¹(K; R^m). Then by the weak* convergence we have

    ∫_{R^n} ⟨ϕ(x), (ρ ∗ µ^i)(x)⟩ dx = ∫_{R^n} ⟨(ϕ ∗ ρ)(x), dµ^i(x)⟩
                                   → ∫_{R^n} ⟨(ϕ ∗ ρ)(x), dµ(x)⟩ = ∫_{R^n} ⟨ϕ(x), (ρ ∗ µ)(x)⟩ dx,

as i → ∞, so that ρ ∗ µ^i → ρ ∗ µ weakly in L^∞(K; R^m). Then it holds that ρ ∗ µ = v, and the convergence is strong. Because supp w ⊂ K for w = ρ ∗ µ or w = ρ ∗ µ^i, the claim follows.
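As a concrete sanity check of Lemma B.2 (our illustration, not from the paper): for µ^i = δ_{1/i}, which converges weakly* to δ₀, we have (ρ ∗ µ^i)(x) = ρ(x − 1/i), and the sup-norm gap to ρ shrinks with the modulus of continuity of ρ:

```python
import numpy as np

def rho(x):
    """Mollifier profile of Example A.1 as a 1-D test kernel."""
    x = np.asarray(x, float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

# mu_i = delta_{1/i} converges weakly* to delta_0, and
# (rho * mu_i)(x) = rho(x - 1/i), (rho * delta_0)(x) = rho(x).
x = np.linspace(-2.0, 2.0, 20001)
gaps = [float(np.max(np.abs(rho(x - 1.0 / i) - rho(x)))) for i in (2, 10, 100)]
```

The grid supremum stands in for the true L^∞ norm; the gaps decrease monotonically towards zero as i grows.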
Proof of Theorem 7.1. Suppose {u^i}_{i=1}^∞ is a bounded sequence in BV(Ω). We may extract a subsequence, unrelabelled, such that {u^i}_{i=1}^∞ is weakly* convergent in BV(Ω) to some u ∈ BV(Ω). By Lemma B.2, then

    D_ε u^i → D_ε u strongly in L^∞(Ω; R^m).

Weak* lower semicontinuity now follows from Lemma B.1.
Let then u ∈ C¹(Ω). The estimate

    lim sup_{ε↘0} TV^{ϕ,ε}_c(u) ≤ TV^{♯ϕ}_c(u),   (u ∈ C¹(Ω)),   (B.1)

follows from the proof of Lemma 4.5. By subadditivity we also have

    TV^{ϕ,ε}_c(u) − TV^{♯ϕ}_c(u) ≤ ∫_Ω ϕ(‖(ρ_ε ∗ ∇u)(x) − ∇u(x)‖) dx.

Writing g_ε(x) := ‖(ρ_ε ∗ ∇u)(x) − ∇u(x)‖, we have g_ε → 0 in L¹(R^n). We may again proceed as in the proof of Lemma 4.5 to show

    lim sup_{ε↘0} ( TV^{ϕ,ε}_c(u) − TV^{♯ϕ}_c(u) ) ≤ 0.

This together with (B.1) proves (7.1).
Appendix C. Proof of Theorem 7.2.
We now prove our result on the remedy based on the SBV space.
Proposition C.1. Let η̄ and p ∈ (1, ∞) be as in (7.2). Suppose g^i ⇀ g weakly in L^p(Ω; R^m), and that sup_i η̄(g^i) < ∞. Then g^i → g strongly in L^p(Ω; R^m).
Proof. Let K be a compact set such that Ω + supp ρ₁ ⊂ K, and set M := sup_i η̄(g^i). We observe that η̄ is lower semicontinuous with respect to weak convergence in L^p. Therefore η̄(g) ≤ M. As in the proof of Lemma A.3, we observe that η̄_ℓ ≥ η̄_{ℓ+1}, from which it again follows that

    η̄_ℓ(h) → 0 as ℓ → ∞, uniformly for h ∈ {g, g¹, g², g³, . . .}.   (C.1)
We then observe that, as in Lemma B.2, we have

    ρ_ℓ ∗ g^i → ρ_ℓ ∗ g strongly in L^∞(K; R^m)   (C.2)

for each ℓ ∈ {1, 2, 3, . . .}. Thus, it holds that

    ‖g^i‖_{L^p(Ω;R^m)} − ‖g‖_{L^p(Ω;R^m)} ≤ η̄_ℓ(g^i) − η̄_ℓ(g) + ‖ρ_ℓ ∗ g^i − ρ_ℓ ∗ g‖_{L^p(K;R^m)}.

Given δ > 0, thanks to (C.1), we may find ℓ such that

    ‖g^i‖_{L^p(Ω;R^m)} − ‖g‖_{L^p(Ω;R^m)} ≤ 2δ + ‖ρ_ℓ ∗ g^i − ρ_ℓ ∗ g‖_{L^p(K;R^m)}.

With ℓ fixed, we thus get by (C.2) that

    lim sup_{i→∞} ( ‖g^i‖_{L^p(Ω;R^m)} − ‖g‖_{L^p(Ω;R^m)} ) ≤ 2δ.

Since δ > 0 was arbitrary, and using the weak lower semicontinuity of ‖·‖_{L^p(Ω;R^m)}, we deduce

    lim_{i→∞} ‖g^i‖_{L^p(Ω;R^m)} = ‖g‖_{L^p(Ω;R^m)}.

But for p ∈ (1, ∞), strict convergence implies strong convergence [22]. This concludes the proof.
Proof of Theorem 7.2. If {u^i}_{i=1}^∞ is weakly* convergent in BV(Ω), we may, without loss of generality, assume that sup_i F(u^i) < ∞, for otherwise lower semicontinuity is obvious. Then by the SBV Compactness Theorem 3.5, the convergences (3.1)–(3.4) hold for a subsequence. This also shows weak* convergence, if it did not hold originally. Moreover, by the same theorem, u ↦ ∫_{J_u} ψ(θ_u(x)) dH^{m−1}(x) is lower semicontinuous with respect to this convergence. By (7.3) we may further assume {∇u^i}_{i=1}^∞ weakly convergent in L^p(Ω; R^m). Proposition C.1 now shows the strong convergence of {∇u^i}_{i=1}^∞ in L^p(Ω; R^m). The functional F is lower semicontinuous with respect to all of these convergences, which yields lower semicontinuity with respect to weak* convergence in BV(Ω).