# Limiting aspects of non-convex TVϕ models

Michael Hintermüller*, Tuomo Valkonen†, and Tao Wu‡

Abstract. Recently, non-convex regularisation models have been introduced in order to provide a better prior for gradient distributions in real images. They are based on using concave energies ϕ in the total variation type functional

  TV^ϕ(u) := ∫_Ω ϕ(|∇u(x)|) dx.

In this paper, it is demonstrated that for typical choices of ϕ, functionals of this type pose several difficulties when extended to the entire space of functions of bounded variation, BV(Ω). In particular, if ϕ(t) = t^q for q ∈ (0, 1) and TV^ϕ is defined directly for piecewise constant functions and extended via weak* lower semicontinuous envelopes to BV(Ω), then still TV^ϕ(u) = ∞ for u not piecewise constant. If, on the other hand, TV^ϕ is defined analogously via continuously differentiable functions, then TV^ϕ ≡ 0. We study a way to remedy the models through additional multiscale regularisation and area-strict convergence, provided that the energy ϕ(t) = t^q is linearised for high values. That this kind of energy actually matches reality better, and improves reconstructions, is demonstrated by statistics and numerical experiments.

AMS subject classifications. 26B30, 49Q20, 65J20

Key words. total variation, non-convex, regularisation, area-strict convergence, multiscale analysis.

1. Introduction. Recently introduced non-convex total variation models are based on employing concave energies ϕ in discrete versions of functionals of the form

  TV^ϕ_c(u) := ∫_Ω ϕ(|∇u(x)|) dx,  (u ∈ C^1(Ω)),  (1.1)

which we call the continuous model, or

  TV^ϕ_d(u) := ∫_{J_u} ϕ(|u^+(x) − u^−(x)|) dH^{m−1}(x),  (u piecewise constant),  (1.2)

which we call the discrete model. Here Ω ⊂ R^m is our image domain, and J_u is the jump set of u, where the one-sided traces u^± from the two sides of J_u differ. The typical energies include, in particular, ϕ(t) = t^q for q ∈ (0, 1).
The models based on discretisations of (1.2) have been proposed for the promotion of piecewise constant (cartoon-like) images [16, 23, 33, 34], whereas models based on discretisations of (1.1) have been proposed for the better modelling of gradient distributions in real-life images [27–29, 36]. To denoise an image z, one may then solve the non-convex Rudin–Osher–Fatemi type problem

  min_u (1/2)‖z − u‖²_{L²(Ω)} + α TV^ϕ(u)  (1.3)

for TV^ϕ = TV^ϕ_c or TV^ϕ = TV^ϕ_d.

* Institute for Mathematics, Humboldt University of Berlin, Germany. E-mail: [email protected]
† Department of Applied Mathematics and Theoretical Physics, University of Cambridge, United Kingdom (current), & Center for Mathematical Modeling, Escuela Politécnica Nacional, Quito, Ecuador. E-mail: [email protected], corresponding author.
‡ Institute for Mathematics and Scientific Computing, University of Graz, Austria. E-mail: [email protected]

Observe that (1.1) is rigorously defined only for differentiable functions. In contrast to (1.2), it is in particular not defined for piecewise constant discretisations, or for images with discontinuities. In order to obtain a sound model in the non-discretised setting, the functional has to be extended to the whole space of functions of bounded variation, denoted BV(Ω); see [24] for its definition. Alternatively, we may take (1.2), defined for piecewise constant functions, as the basis and extend it to continuous functions. We will study the extension of both models (1.1) and (1.2) to BV(Ω). We demonstrate that (1.1) in particular has severe theoretical difficulties for typical choices of ϕ. We also demonstrate that some of these difficulties can be overcome by altering the model to better match reality, although we also need additional multiscale regularisation in the model for theoretical purposes.
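For intuition, a discretised instance of (1.3) in one dimension is straightforward to write down. The following sketch is our own illustration, not an implementation from the paper; the energy, the weight `alpha`, and the sample data are arbitrary choices:

```python
def rof_objective(u, z, alpha, phi):
    """Discrete 1D instance of the non-convex ROF problem (1.3):
    0.5 * ||z - u||^2 plus alpha times the sum of phi over the
    differences of neighbouring samples."""
    fidelity = 0.5 * sum((zi - ui) ** 2 for zi, ui in zip(z, u))
    penalty = alpha * sum(phi(abs(u[i + 1] - u[i])) for i in range(len(u) - 1))
    return fidelity + penalty

phi = lambda t: t ** 0.5      # concave energy t^q with q = 0.5
z = [0.0, 0.1, 0.9, 1.0]      # noisy samples of a step (made-up data)
# Candidate reconstructions u can now be compared by their objective values.
```

Minimising such non-convex objectives is itself a non-trivial task; the point here is only the form of the energy being discussed.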
It is worth noting that in the finite-dimensional setting mainly considered in the aforementioned previous works, such problems with well-posedness do not surface. This can also be seen by adapting our results, minding that in a finite-dimensional subspace of BV(Ω), weak* and strong convergence are equivalent. Infinite-dimensional models, while more challenging, can be more informative and lead to a better understanding of the model, both finite- and infinite-dimensional. Concepts such as smoothness and jumps are, after all, not well defined in the discrete setting. The study of these and other properties in the infinite-dimensional setting can provide valuable information about the behaviour of the associated image processing approaches and their solution approaches [1, 7, 10, 13, 14, 20, 46, 47].

Let us consider the specific difficulties with the discrete model TV^ϕ_d first. We assume that we have a regularly spaced grid Ωh ⊂ Ω ∩ hZ^m, (h > 0), and a function uh: Ωh → R. By {e_i}_{i=1}^m we denote the canonical orthonormal basis of R^m. We then identify uh with a function u that is constant on each cell k + [0, h]^m, (k ∈ Ωh). Accordingly, we have

  TV^ϕ_d(uh) := Σ_{k∈Ωh} Σ_{i=1}^m h^{m−1} ϕ(|uh(k + e_i) − uh(k)|).  (1.4)

This discrete expression with h = 1 is essentially what is studied in [16, 33, 34], although [33] also studies more general discrete models. In the function space setting, this model has to be extended to all of BV(Ω), in particular to smooth functions. The extension naturally has to be lower semicontinuous in a suitable topology in order to guarantee the existence of solutions to (1.3). One is therefore naturally confronted with the question whether such an extension can be performed meaningfully. Let us consider a simple motivating example on Ω = (0, 1) with ϕ(t) = t^q for q ∈ (0, 1). We aim to approximate the ramp function u(t) = t by piecewise constant functions.
Given k > 0, we thus define u^k(t) = i/k for t ∈ [(i − 1)/k, i/k) and i ∈ {1, …, k}. Clearly u^k converges strongly to u in L^1(Ω). Using the discrete model (1.4) with h = 1/k, one has

  TV^q_d(u^k) = Σ_{i=1}^k (1/k)^0 · (1/k)^q = k^{1−q}.

We see that lim_{k→∞} TV^q_d(u^k) = ∞. This suggests that the TV^q model based on the "discrete" functional might only allow piecewise constant functions as solutions. In other words, TV^q_d would induce pronounced staircasing – a property desirable when restoring piecewise constant images, but less suitable for other applications. In §3, we will indeed demonstrate that either u is piecewise constant, or u ∉ BV(Ω).

In order to highlight the inherent difficulties, let us then consider the continuous model TV^ϕ_c, directly given by (1.1) for differentiable functions. In particular, (1.1) also serves as a definition of TV^ϕ_c for continuous piecewise affine discretisations of u ∈ C^1(Ω). We observe that if u ∈ C^1(Ω) on a bounded domain Ω, and we set uh(k) = u(k) for k ∈ Ωh, then

  TV^ϕ_{c,h}(uh) := Σ_{k∈Ωh} h^m ϕ(|∇h uh(k)|),  with  [∇h uh(k)]_i := (uh(k + e_i) − uh(k))/h,  (1.5)

satisfies

  lim_{h↘0} TV^ϕ_{c,h}(uh) = TV^ϕ_c(u).

This approximate model TV^ϕ_{c,h} with h = 1 is essentially what is considered in [27, 28, 36]. On an abstract level, it is also covered by [33]. The question now is whether the definition of TV^ϕ_c can be extended to functions of bounded variation in a meaningful manner. To start our investigation, let us try to approximate on Ω = (−1, 1) the step function

  u(t) = 0 for t < 0,  and  u(t) = 1 for t ≥ 0.

Given k > 0, we define

  u^k(t) = 0 for t < −1/k,  u^k(t) = (kt + 1)/2 for t ∈ [−1/k, 1/k),  and  u^k(t) = 1 for t ≥ 1/k.

Then u^k → u in L^1(Ω). However, the continuous model (1.1) with ϕ(t) = t^q for q ∈ (0, 1) gives

  TV^q_c(u^k) = (2/k) · (k/2)^q = (k/2)^{q−1}.

Thus TV^q_c(u^k) → 0 as k → ∞.
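Both scalings are easy to check numerically. The following sketch is our own illustration, not from the paper; q = 0.5 is an arbitrary exponent in (0, 1):

```python
# Energies of the two approximating sequences from the examples above.

def tv_d_ramp(k, q):
    """Discrete model (1.4) for the staircase approximation of u(t) = t
    on (0, 1): k jumps of height 1/k, with h^(m-1) = 1 in dimension m = 1."""
    return sum((1.0 / k) ** q for _ in range(k))  # = k^(1-q)

def tv_c_step(k, q):
    """Continuous model (1.1) for the piecewise affine approximation of
    the step function: slope k/2 on an interval of length 2/k."""
    return (2.0 / k) * (k / 2.0) ** q  # = (k/2)^(q-1)

for k in (10, 100, 1000):
    print(k, tv_d_ramp(k, 0.5), tv_c_step(k, 0.5))
# tv_d_ramp grows like k^(1-q) without bound, while tv_c_step vanishes.
```

The discrete energy of the staircase blows up, while the continuous energy of the smoothed step collapses, which is precisely the dichotomy studied in §3 and §4.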
This suggests that any extension of TV^q_c to u ∈ BV(Ω) through weak* lower semicontinuous envelopes will have TV^q_c(u) = 0, and that jumps in u are, in general, mapped to 0. In §4 we will prove this rigorously and uncover even more striking properties: a weak* lower semicontinuous extension will necessarily satisfy TV^q_c ≡ 0. Despite this discouraging property, after discussing in §5 the implications of the above-mentioned results, we find appropriate remedies. Our principal approach is given in §6. It utilises the (stronger) notion of area-strict convergence [18, 30], which – as will be shown – can be obtained using the multiscale analysis functional η from [44, 45]. In §7 we also discuss alternative remedies, which are related to compact operators and the space SBV(Ω) of special functions of bounded variation. In order to keep the flow of the paper, the pertinent proofs are relegated to the Appendix.

To show existence of solutions to the fixed TV^ϕ_c model involving area-strict convergence, we require that ϕ is level coercive, i.e. lim_{t→∞} ϕ(t)/t > 0. This induces a linear penalty on edges in the image. Based on these considerations, one arrives at the question whether gradient statistics, such as the ones in [29], are reliable in dictating the prior term (regulariser). Our experiments on natural images in §8 suggest that this is not the case. In fact, the jump part of the image appears to have different statistics from the smooth part. It seems that the conventional TV regularisation [40] provides a model for the jump part which is superior to the non-convex TV model. This statistically validates our model, which is also suitable for a function space setting. Our rather theoretical starting point of making the TV^ϕ_c model sound in function space therefore leads to improved practical models. Finally, in §9 we study image denoising with this model, and finish with conclusions in §10.
We begin, however, with notation and other preliminary matters in the following §2.

2. Notation and preliminaries. We denote the set of non-negative reals by R^{0,+} := [0, ∞). If ϕ: R^{0,+} → R^{0,+}, then we write

  ϕ_0 := lim_{t↘0} ϕ(t)/t,  and  ϕ^∞ := lim_{t↗∞} ϕ(t)/t,

implicitly assuming that the (possibly infinite) limits exist. The notation ‖x‖, without explicit specification of the space or type of norm, stands for the L²-norm (in finite dimensions the 2-norm). We write the boundary of a set A as ∂A, and the closure as Ā. The open ball of radius ρ centred at x ∈ R^m is denoted by B(x, ρ).

We now introduce some measure theory. For details of the definitions, we refer to textbooks including [4, 21, 41]. For Ω ⊂ R^m, we denote the space of (signed) Radon measures on Ω by M(Ω), and the space of R^m-valued Radon measures by M(Ω; R^m). We use the notation |µ| for the total variation measure of µ = (µ_1, …, µ_m) ∈ M(Ω; R^m), and define the total variation (Radon) norm of µ by

  ‖µ‖_{M(Ω;R^m)} := |µ|(Ω) := sup { Σ_{i=1}^m ∫_Ω ϕ_i(x) dµ_i(x) : ϕ = (ϕ_1, …, ϕ_m) ∈ C_c^∞(Ω; R^m), sup_x |ϕ(x)| ≤ 1 }.  (2.1)

Here C_c^∞(Ω; R^m) denotes the set of R^m-valued smooth (infinitely differentiable) functions ϕ with compact support supp ϕ ⋐ Ω. If µ = f L^m corresponds to a function f ∈ L^1(Ω), then the definition simplifies to ‖µ‖_{M(Ω)} = ‖f‖_{L^1(Ω)}. For a measurable set A, we denote by µ⌞A the restricted measure defined by (µ⌞A)(B) := µ(A ∩ B). The restriction of a function u to A is denoted by u|A. On any given ambient space R^m, (k ≤ m), we write H^k for the k-dimensional Hausdorff measure, and L^m for the Lebesgue measure. We also define the Dirac δ-measure at x by

  δ_x(A) := 1 if x ∈ A,  and  δ_x(A) := 0 if x ∉ A.

Roughly, in the case m = 2 and k = 1, which is of most interest to us, L² measures the area of sets in R², while H¹ measures the length of (collections of) curves embedded in R². The Dirac δ measures membership of individual points, and is different from H⁰, which would count the number of points in a set.
If J ⊂ R^m and there exist Lipschitz maps γ_i: R^{m−1} → R^m with

  H^{m−1}( J \ ⋃_{i=1}^∞ γ_i(R^{m−1}) ) = 0,

then we say that J is countably H^{m−1}-rectifiable. Again, with m = 2, this means roughly that J is contained in a collection of Lipschitz curves, modulo a negligible set. It may, however, happen that J is merely a point cloud contained in the curves, and not a full curve itself.

We say that a function u: Ω → R on an open domain Ω ⊂ R^m is of bounded variation (see, e.g., [4] for a thorough introduction), denoted u ∈ BV(Ω), if u ∈ L^1(Ω) and the distributional gradient Du, given by

  Du(ϕ) := −∫_Ω u(x) div ϕ(x) dx,  (ϕ ∈ C_c^∞(Ω; R^m)),

is a Radon measure. This is equivalent to asking that TV(u) := |Du|(Ω) < ∞, where the total variation of u is obtained by setting µ = Du in (2.1). We can then decompose Du into

  Du = ∇u L^m + D^j u + D^c u,

where ∇u L^m is called the absolutely continuous part, D^j u the jump part, and D^c u the Cantor part. We also denote the singular part by D^s u := D^j u + D^c u. The density ∇u ∈ L^1(Ω; R^m) corresponds to the classical gradient if u is differentiable. The jump part may be written as

  D^j u = (u^+ − u^−) ⊗ ν_{J_u} H^{m−1}⌞J_u,

where the jump set J_u is countably H^{m−1}-rectifiable, ν_{J_u}(x) is its normal, and u^+ and u^− are the one-sided traces of u on J_u. These are defined by the condition

  lim_{ρ↘0} ρ^{−m} ∫_{B^±(x,ρ,ν)} |u^±(x) − u(y)| dy = 0

being satisfied for some normal vector ν ∈ R^m and the associated half-ball

  B^±(x, ρ, ν) := {y ∈ B(x, ρ) | ±⟨y − x, ν⟩ ≥ 0}.

The remaining Cantor part D^c u of Du vanishes on any Borel set which is σ-finite with respect to H^{m−1}; in particular |D^c u|(J_u) = 0. We declare u an element of the space SBV(Ω) of special functions of bounded variation if u ∈ BV(Ω) and D^c u = 0. We define the norm

  ‖u‖_{BV(Ω)} := ‖u‖_{L^1(Ω)} + ‖Du‖_{M(Ω;R^m)}.
We say that a sequence {u^i}_{i=1}^∞ ⊂ BV(Ω) converges weakly* to u in BV(Ω), denoted u^i ⇀* u, if u^i → u strongly in L^1(Ω) and Du^i ⇀* Du weakly* in M(Ω; R^m). If in addition |Du^i|(Ω) → |Du|(Ω), we say that the convergence is strict. The weak* convergence of Du^i to Du may in this case be expressed as

  ∫_Ω div ϕ(x) u^i(x) dx → ∫_Ω div ϕ(x) u(x) dx  for all ϕ ∈ C_c^∞(Ω; R^m).

It roughly means that we have convergence when we "test" the sequence by a sensor – the test function ϕ. We may, however, not have convergence in those aspects of the image u that cannot be sensed by such sensors. In particular, weak* convergence does not imply strong convergence, or even strict convergence. Strict convergence helps avoid annihilation effects. For example, the functions u^i = χ_{[−1/i,1/i]}, with Du^i = δ_{−1/i} − δ_{1/i} and |Du^i|(Ω) = 2, converge weakly* to zero, but not strictly.

Finally, we say that a functional F: BV(Ω) → R̄ is weak* lower semicontinuous if u^i ⇀* u implies F(u) ≤ lim inf_{i→∞} F(u^i). This is an essential property for showing the existence of minimisers of F: if {u^i} is an infimising sequence, F(u^i) ↘ inf F > −∞, with u^i ⇀* u, then u is a minimiser provided F is weak* lower semicontinuous.

3. Limiting aspects of the discrete TV^ϕ model. We begin by rigorously defining and analysing the discrete TV^ϕ model (1.2) in BV(Ω). This model is used in the literature to promote piecewise constant solutions to image reconstruction problems. For our analysis we consider the following class of energies ϕ.

Definition 3.1. Define W_d as the set of increasing, lower semicontinuous, subadditive functions ϕ: R^{0,+} → R^{0,+} that satisfy ϕ(0) = 0 and ϕ_0 = ∞.

Example 3.1. Examples of ϕ ∈ W_d include ϕ(t) = t^q for q ∈ [0, 1). Note that this choice is the only one in W_d of the classes considered, for example, in [34]. The logistic penalty ϕ(t) = log(αt + 1) in particular, while subadditive, has ϕ_0 = α < ∞. The fractional penalty ϕ(t) = αt/(1 + αt) likewise has ϕ_0 = α < ∞.
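The distinguishing quantity in Example 3.1 is ϕ_0 = lim_{t↘0} ϕ(t)/t. A small numerical sketch of ours (with α = 1 as an illustrative choice) makes the difference visible:

```python
import math

def quotient(phi, t):
    # phi(t)/t, whose limit as t -> 0 defines phi_0.
    return phi(t) / t

power      = lambda t: t ** 0.5            # phi(t) = t^q with q = 0.5
logistic   = lambda t: math.log(t + 1.0)   # phi(t) = log(alpha*t + 1), alpha = 1
fractional = lambda t: t / (1.0 + t)       # phi(t) = alpha*t/(1 + alpha*t), alpha = 1

for t in (1e-2, 1e-4, 1e-6):
    print(t, quotient(power, t), quotient(logistic, t), quotient(fractional, t))
# The power quotient grows without bound (phi_0 = infinity, so phi is in W_d);
# the logistic and fractional quotients level off at alpha = 1 (phi_0 < infinity).
```

Only the power energy satisfies the defining condition ϕ_0 = ∞ of W_d; the other two penalties fail it, which is what Remark 3.1 and Proposition 3.6 below exploit.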
As we will later see, these classes are, however, admissible for the continuous model TV^ϕ_c.

Definition 3.2. Denote by pwc(Ω) the set of functions u ∈ BV(Ω) that are piecewise constant in the sense Du = D^j u. We then write |D^j u| = θ_u H^{m−1}⌞J_u, where θ_u(x) := |u^+(x) − u^−(x)|.

Definition 3.3. Given an energy ϕ ∈ W_d, the "discrete" non-convex total variation model is defined by

  TṼ^ϕ_d(u) := ∫_{J_u} ϕ(θ_u(x)) dH^{m−1}(x),  (u ∈ pwc(Ω)),

and extended to u ∈ BV(Ω) by defining

  TV^ϕ_d(u) := lim inf_{u^i ⇀* u, u^i ∈ pwc(Ω)} TṼ^ϕ_d(u^i),

with the convergence weakly* in BV(Ω), in order to obtain a weak* lower semicontinuous functional.

The functional TṼ^ϕ_d in particular agrees with (1.4). Our main result regarding this model is the following.

Theorem 3.4. Let ϕ ∈ W_d. Then TV^ϕ_d(u) = ∞ for u ∈ BV(Ω) \ pwc(Ω).

The proof is based on the SBV compactness theorem [2]; alternatively it can be proved via rectifiability results in the theory of currents [49], as used in the study of transportation networks, e.g., in [37, 43].

Theorem 3.5 (SBV compactness [2]). Let Ω ⊂ R^m be open and bounded. Suppose ϕ, ψ: R^{0,+} → R^{0,+} are lower semicontinuous and increasing with ϕ^∞ = ∞ and ψ_0 = ∞. Suppose {u^i}_{i=1}^∞ ⊂ SBV(Ω) and u^i ⇀* u ∈ SBV(Ω) weakly* in BV(Ω). If

  sup_{i=1,2,3,…} ( ∫_Ω ϕ(|∇u^i(x)|) dx + ∫_{J_{u^i}} ψ(θ_{u^i}(x)) dH^{m−1}(x) ) < ∞,

then there exists a subsequence of {u^i}_{i=1}^∞, unrelabelled, such that

  u^i → u strongly in L^1(Ω),  (3.1)
  ∇u^i ⇀ ∇u weakly in L^1(Ω; R^m),  (3.2)
  D^j u^i ⇀* D^j u weakly* in M(Ω; R^m).  (3.3)

If, moreover, ψ is subadditive, then

  ∫_{J_u} ψ(θ_u(x)) dH^{m−1}(x) ≤ lim inf_{i→∞} ∫_{J_{u^i}} ψ(θ_{u^i}(x)) dH^{m−1}(x).  (3.4)

Proof. For the proof of (3.1)–(3.3), we refer to [2, 4]. As the SBV compactness theorem is typically stated, concavity of ψ is required for the lower semicontinuity result (3.4). The fact that subadditivity and ψ(0) = 0 suffice follows from [4, Theorem 5.4]; see also [2].
There we use the fact β := ψ_0 = ϕ^∞ in the application of the theorem to the functional

  F(u) := ∫_Ω ϕ(|∇u(x)|) dx + ∫_{J_u} ψ(θ_u(x)) dH^{m−1}(x) + β|D^c u|(Ω).

An alternative approach to the whole proof of the SBV compactness theorem is to map BV(Ω) into a space of Cartesian currents and use [49].

Proof of Theorem 3.4. Given u ∈ BV(Ω) with TV^ϕ_d(u) < ∞, let u^i ∈ pwc(Ω) satisfy u^i ⇀* u weakly* in BV(Ω) with sup_i TṼ^ϕ_d(u^i) < ∞. Since ∇u^i = 0 and D^c u^i = 0 for each i, the SBV compactness theorem shows that ∇u = 0 and D^c u = 0. Thus u ∈ pwc(Ω). ∎

Remark 3.1. The functions ϕ(t) = αt/(1 + αt) and ϕ(t) = log(αt + 1) for α > 0, considered in [34] for the reconstruction of piecewise constant images, do not have the property ϕ(t)/t → ∞ as t ↘ 0. The above result therefore does not apply, and indeed TV^ϕ_d defined using these functions will not force u with TV^ϕ_d(u) < ∞ to be piecewise constant, as the following result states.

Proposition 3.6. Let ϕ: R^{0,+} → R^{0,+} be continuously differentiable and satisfy ϕ(0) = 0. Then the following hold.

(i) If ϕ_0 < ∞ and ϕ is subadditive, then there exists a constant C > 0 such that

  TV^ϕ_d(u) ≤ C TV(u),  (u ∈ BV(Ω)).

(ii) If ϕ_0 > 0 and ϕ is increasing, then for every M > 0 there exists a constant c = c(M) > 0 such that

  c TV(u) ≤ TV^ϕ_d(u),  (u ∈ BV(Ω), ‖u‖_{L^∞(Ω)} ≤ M).

Proof. We first prove the upper bound. To begin with, we observe that

  ϕ(t) ≤ ϕ_0 t.  (3.5)

Indeed, since ϕ is subadditive, we have ϕ(t + δ) − ϕ(t) ≤ ϕ(δ), so

  ϕ′(t) = lim_{δ↘0} (ϕ(t + δ) − ϕ(t))/δ ≤ lim_{δ↘0} ϕ(δ)/δ = ϕ_0 < ∞.

Thus ϕ′(t) ≤ ϕ_0. As ϕ(0) = 0, it follows that ϕ(t) ≤ ϕ_0 t. Now, with u ∈ BV(Ω), we pick a sequence {u^k}_{k=1}^∞ in pwc(Ω) converging to u strictly in BV(Ω); for details see [12]. Then by (3.5) we have

  TṼ^ϕ_d(u^k) ≤ ϕ_0 TV(u^k),  (k = 1, 2, …).

Then, by the definition of TV^ϕ_d(u) and the strict convergence,

  TV^ϕ_d(u) ≤ lim inf_{k→∞} TṼ^ϕ_d(u^k) ≤ lim inf_{k→∞} ϕ_0 TV(u^k) = ϕ_0 TV(u).

The claim in (i) follows.

Let us now prove the lower bound in (ii). First of all, we observe the existence of c > 0 with

  ϕ(t) ≥ ct,  (0 ≤ t ≤ 2M).  (3.6)

Indeed, by the definition of ϕ_0, there exists t_0 > 0 such that ϕ(t) > (ϕ_0/2)t for t ∈ (0, t_0). Since ϕ is increasing, we have ϕ(t) ≥ ϕ(t_0) ≥ (ϕ_0/2)t_0 for t ≥ t_0. This yields (3.6) with c = ϕ_0 t_0/(4M). Assuming that ‖u‖_{L^∞(Ω)} ≤ M < ∞, we now let {u^k}_{k=1}^∞ ⊂ pwc(Ω) approximate u weakly* in BV(Ω). We may assume that

  ‖u^k‖_{L^∞(Ω)} ≤ 2M,  (3.7)

because if this did not hold, then we could truncate u^k, and the modified sequence {u^k_{2M}}_{k=1}^∞ would still converge to u weakly* in BV(Ω) with TṼ^ϕ_d(u^k_{2M}) ≤ TṼ^ϕ_d(u^k). Thanks to (3.6) and (3.7), we have

  c TV(u^k) ≤ TṼ^ϕ_d(u^k),  (k = 1, 2, …).

By the lower semicontinuity of TV(·), we obtain

  c TV(u) ≤ lim inf_{k→∞} c TV(u^k) ≤ lim inf_{k→∞} TṼ^ϕ_d(u^k).

Since the approximating sequence {u^k}_{k=1}^∞ was arbitrary, the claim follows. ∎

4. Limiting aspects of the continuous TV^ϕ model. We now consider the continuous model (1.1) or (1.5). Both are common in works aiming to model real image statistics. We initially restrict our attention to the following energies ϕ.

Definition 4.1. We denote by W_c the class of increasing, subadditive, continuous functions ϕ: R^{0,+} → R^{0,+} with ϕ^∞ = 0.

Example 4.1. Examples of ϕ ∈ W_c include in particular ϕ(t) = t^q for q ∈ (0, 1), as well as the fractional penalty ϕ(t) = αt/(1 + αt) and the logistic penalty ϕ(t) = log(αt + 1) for α > 0.

Definition 4.2. Given an energy ϕ, we start with the C^1 model (1.1), which we now denote by

  TṼ^ϕ_c(u) := ∫_Ω ϕ(|∇u(x)|) dx,  (u ∈ C^1(Ω)).

In order to extend this to u ∈ BV(Ω), we take the weak* lower semicontinuous envelope

  TV^ϕ_c(u) := lim inf_{u^i ⇀* u, u^i ∈ C^1(Ω)} TṼ^ϕ_c(u^i).

In the definition, the convergence is weakly* in BV(Ω). We emphasise that it is crucial to define TV^ϕ_c through this limiting process in order to obtain weak* lower semicontinuity.
This is useful for showing the existence of solutions to variational problems with the regulariser TV^ϕ_c in BV(Ω) – or a larger space, as there is no guarantee that TV^ϕ_c(u) < ∞ would imply u ∈ BV(Ω).

We may say the following about the TV^ϕ_c model.

Theorem 4.3. Let ϕ ∈ W_c, and suppose that Ω ⊂ R^m has a Lipschitz boundary. Then TV^ϕ_c(u) = 0 for u ∈ BV(Ω).

The result is intuitively obvious when one considers the convex envelope of the energy ϕ, which has to be zero. The rigorous verification of the result will, however, demand some amount of work. We note that this could be done using classical quasi-convexification arguments, see [6, Theorem 11.3.1], using the integral sense of quasi-convexity common in variational analysis, not the simple maximum sense common in optimisation. The rigorous computation of the quasi-convex envelope of x ↦ ϕ(‖x‖), for all the energies that we consider, would appear to be nearly as much work as a more informative direct argument. We therefore provide a self-contained proof, which will also provide new quantitative estimates for other energies. These will be provided in Proposition 4.6 later in the section.

The main ingredient of the proof of Theorem 4.3 is Lemma 4.5, which will utilise the following simple result.

Lemma 4.4. Let ϕ ∈ W_c. Then there exist a, b > 0 such that ϕ(t) ≤ a + bt, (t ∈ R^{0,+}).

Proof. Since ϕ^∞ = 0, we can find t_0 > 0 such that ϕ(t)/t ≤ 1 for t ≥ t_0. Thus, because ϕ is increasing, we have ϕ(t) ≤ ϕ(t_0) + t for every t ∈ R^{0,+}. ∎

Lemma 4.5. Let ϕ ∈ W_c, and suppose that Ω ⊂ R^m is bounded with Lipschitz boundary. Then

  TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx,  (u ∈ BV(Ω)).  (4.1)

Observe the difference between Lemma 4.5 and Theorem 3.4. The former shows that in the limit of TṼ^ϕ_c, the singular part is completely free, whereas the latter shows that in the limit of TṼ^ϕ_d, only the jump part is allowed at all!

Proof.
We may assume that ∫_Ω ϕ(|∇u(x)|) dx < ∞, because otherwise there is nothing to prove. We let u_0 ∈ BV(R^m) denote the zero-extension of u from Ω to R^m. Then

  Du_0 = Du − ν_{∂Ω} u^− H^{m−1}⌞∂Ω

for u^− the interior trace of u on ∂Ω, and ν_{∂Ω} the exterior normal of Ω. In fact [4, Section 3.7], there exists a constant C = C(Ω) such that

  ‖ν_{∂Ω} u^− H^{m−1}⌞∂Ω‖_{M(R^m;R^m)} ≤ C‖u‖_{BV(Ω)}.

We pick some ρ ∈ C_c^∞(R^m) with 0 ≤ ρ ≤ 1, ∫ρ dx = 1, and supp ρ ⊂ B(0, 1). We then define the family of mollifiers ρ_ε(x) := ε^{−m} ρ(x/ε) for ε > 0, and define by convolution and restriction of domain

  u_ε := (ρ_ε ∗ u_0)|Ω.

Then u_ε ∈ C^∞(Ω), and u_ε → u strongly in L^1(Ω) as ε ↘ 0. As |Du_ε|(Ω) ≤ |Du_0|(R^m), it follows that u_ε ⇀* u weakly* in BV(Ω); see, e.g., [4, Proposition 3.13]. Thus

  TV^ϕ_c(u) ≤ lim inf_{ε↘0} TṼ^ϕ_c(u_ε).

In order to obtain the conclusion of the theorem, we just have to estimate the right-hand side. We have

  |∇u_ε(x)| = |∫_{R^m} ρ_ε(x − y) dDu_0(y)| ≤ ∫_{R^m} ρ_ε(x − y) d|Du_0|(y)
    ≤ ∫_{R^m} ρ_ε(x − y)|∇u_0(y)| dy + ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y).  (4.2)

We approximate the terms for the absolutely continuous and singular parts differently. Starting with the absolutely continuous part, we let K be a compact set such that Ω + B(0, 1) ⊂ K, and define

  g_0(x) := |∇u_0(x)|  and  g_ε(x) := ∫_{R^m} ρ_ε(x − y)|∇u_0(y)| dy.

Then g_ε → g_0 in L^1(K), and g_ε|(R^m \ K) = 0 for ε ∈ (0, 1). By the L^1 convergence, we can find a sequence ε_i ↘ 0 such that g_{ε_i} → g_0 almost uniformly. Consequently, given δ > 0, we may find a set E ⊂ K with L^m(K \ E) < δ and g_{ε_i} → g_0 uniformly on E. We may assume that each ε_i is small enough that

  ‖g_{ε_i} − g_0‖_{L^1(K)} ≤ δ.  (4.3)

Lemma 4.4 provides for some a, b > 0 the estimate

  ϕ(t) ≤ a + bt.  (4.4)

From the uniform convergence on E, it follows that for large enough i, we have

  ϕ(g_{ε_i}(x)) ≤ ϕ(1 + g_0(x)) ≤ v(x) := a + b(1 + g_0(x)),  (x ∈ E).

Since E ⊂ K is bounded, v ∈ L^1(E).
The reverse Fatou inequality on E gives the estimate

  lim sup_{i→∞} ∫_E ϕ(g_{ε_i}(x)) dx ≤ ∫_E lim sup_{i→∞} ϕ(g_{ε_i}(x)) dx ≤ ∫_E ϕ(g_0(x)) dx.  (4.5)

On K \ E, we obtain the estimate

  ∫_{K\E} ϕ(g_{ε_i}(x)) dx ≤ ∫_{K\E} [ϕ(g_0(x)) + ϕ(|g_{ε_i}(x) − g_0(x)|)] dx  (by subadditivity)
    ≤ ∫_{K\E} ϕ(g_0(x)) dx + a L^m(K \ E) + b‖g_{ε_i} − g_0‖_{L^1(K)}  (by (4.4))
    ≤ ∫_{K\E} ϕ(g_0(x)) dx + (a + b)δ.  (by (4.3))  (4.6)

Combining the estimates (4.5) and (4.6), we have

  lim sup_{i→∞} ∫_Ω ϕ(g_{ε_i}(x)) dx ≤ ∫_K ϕ(g_0(x)) dx + (a + b)δ.

Since δ > 0 was arbitrary, and we may always find an almost uniformly convergent subsequence of any subsequence of {g_ε}_{ε>0}, we conclude that

  lim sup_{ε↘0} ∫_Ω ϕ(g_ε(x)) dx ≤ ∫_K ϕ(|∇u_0(x)|) dx = ∫_Ω ϕ(|∇u(x)|) dx.  (4.7)

Let us then consider the singular part in (4.2). We observe that ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) = 0 for x ∈ R^m \ K. If we define

  f_ε(x) := ε^{−m} |D^s u_0|(B(x, ε)),  (x ∈ K),

then by Fubini's theorem

  ∫_K f_ε(x) dx = ε^{−m} ∫_K ∫_K χ_{B(x,ε)}(y) d|D^s u_0|(y) dx = ε^{−m} ∫_K ∫_K χ_{B(y,ε)}(x) dx d|D^s u_0|(y) ≤ ω_m |D^s u_0|(K).  (4.8)

Here ω_m is the volume of the unit ball of R^m. Moreover, by the Besicovitch derivation theorem (discussed, for example, in [4, 32]), we have

  lim_{ε↘0} f_ε(x) = 0,  (L^m-a.e. x ∈ K).

Because L^m(K) < ∞, Egorov's theorem shows that f_ε → 0 almost uniformly. Thus, for any δ > 0, there exists a set K_δ ⊂ K with L^m(K \ K_δ) ≤ δ and f_ε → 0 uniformly on K_δ.

Next we study K \ K_δ. We pick an arbitrary σ > 0. Because ϕ(t)/t → 0 as t → ∞, there exists t_0 > 0 such that ϕ(t) ≤ σt for t ≥ t_0. In fact, because ϕ is lower semicontinuous and ϕ(0) = 0, if we choose t_0 := inf{t ≥ 0 | ϕ(t) < σt}, then ϕ(t_0) = σt_0. Thus, because ϕ is increasing,

  ϕ(t) ≤ ϕ̃(t) := σ(t_0 + t),  (t ∈ R^{0,+}).  (4.9)

Choosing ε ∈ (0, 1) such that f_ε ≤ δ on K_δ, and using ρ_ε ≤ ε^{−m} χ_{B(0,ε)}, we may approximate

  ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ ∫_K ϕ(f_ε(x)) dx
    ≤ ∫_{K_δ} ϕ(f_ε(x)) dx + ∫_{K\K_δ} ϕ̃(f_ε(x)) dx
    ≤ ∫_{K_δ} ϕ(δ) dx + ∫_{K\K_δ} σ(t_0 + f_ε(x)) dx
    ≤ L^m(K)ϕ(δ) + δσt_0 + σω_m|D^s u_0|(K).  (by (4.8))  (4.10)

Thus

  lim sup_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ L^m(K)ϕ(δ) + δσt_0 + σω_m|D^s u_0|(K).

Observe that the choices of σ and t_0 are independent of δ. Therefore, because δ > 0 was arbitrary, using the continuity of ϕ we deduce that we may set δ = 0 above. But then, because σ > 0 was also arbitrary, we deduce

  lim_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx = 0.  (4.11)

Finally, combining the estimate (4.7) for the absolutely continuous part and the estimate (4.11) for the singular part with (4.2), we deduce that

  lim sup_{ε↘0} TṼ^ϕ_c(u_ε) ≤ lim sup_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|Du_0|(y) ) dx ≤ ∫_Ω ϕ(|∇u(x)|) dx.

This concludes the proof of (4.1). ∎

Proof of Theorem 4.3. We employ the bound (4.1) of Lemma 4.5, but still have to extend it to a possibly unbounded domain Ω. For this purpose, we let R > 0 be arbitrary, and apply the lemma to u_R := u|B(0, R). Then

  TV^ϕ_c(u_R) ≤ ∫_Ω ϕ(|∇u_R(x)|) dx ≤ ∫_Ω ϕ(|∇u(x)|) dx.

But u_R ⇀* u weakly* in BV(Ω) as R ↗ ∞; indeed, L^1 convergence is obvious, and for any ϕ ∈ C_c^∞(Ω; R^m) we have supp ϕ ⊂ B(0, R) for large enough R, so that Du_R(ϕ) = Du(ϕ). Therefore, because TV^ϕ_c is weakly* lower semicontinuous by construction, we conclude that

  TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx.  (4.12)

Given any u ∈ C^1(Ω), we may find u_h ∈ pwc(Ω), (h > 0), strictly convergent to u in BV(Ω) [12]. But (4.12) shows that TV^ϕ_c(u_h) = 0. By the weak* lower semicontinuity of TV^ϕ_c we conclude

  TV^ϕ_c(u) ≤ lim inf_{h↘0} TV^ϕ_c(u_h) = 0,  (u ∈ C^1(Ω)).

Another appeal to lower semicontinuity now shows that TV^ϕ_c(u) = 0 for any u ∈ BV(Ω). ∎
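The mechanism behind Theorem 4.3 can be seen on a single unit jump: smoothing it into an affine transition of width w costs w·ϕ(1/w) = ϕ(1/w)/(1/w), which by definition tends to ϕ^∞ as w ↘ 0. A small numerical sketch of ours illustrates this; the level coercive energy ϕ(t) = 0.3t + √t is a hypothetical concave example chosen so that ϕ^∞ = 0.3:

```python
import math

def smoothing_cost(phi, w):
    """Cost, under the continuous model, of replacing a unit jump by an
    affine transition of width w: the slope is 1/w on a set of length w."""
    return w * phi(1.0 / w)

sqrt_energy = lambda t: math.sqrt(t)            # phi in W_c: phi_inf = 0
coercive    = lambda t: 0.3 * t + math.sqrt(t)  # hypothetical: phi_inf = 0.3

for w in (1e-2, 1e-4, 1e-6):
    print(w, smoothing_cost(sqrt_energy, w), smoothing_cost(coercive, w))
# smoothing_cost(phi, w) = phi(1/w)/(1/w) -> phi_inf as w -> 0: jumps become
# free when phi_inf = 0, and cost phi_inf per unit jump height otherwise.
```

This is exactly why ϕ^∞ = 0 collapses the envelope to zero, and why a positive ϕ^∞ (level coercivity, as required in §6) restores a linear edge penalty.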
Similarly to Proposition 3.6 for TV^ϕ_d, we have the following more positive result.

Proposition 4.6. Let ϕ: R^{0,+} → R^{0,+} be lower semicontinuous and satisfy ϕ(0) = 0. Then the following hold.

(i) If ϕ_0 < ∞ and ϕ is subadditive, then there exists a constant C > 0 such that

  TV^ϕ_c(u) ≤ C TV(u),  (u ∈ BV(Ω)).

(ii) If ϕ_0, ϕ^∞ > 0 and ϕ is increasing, then there exists also a constant c > 0 such that

  c TV(u) ≤ TV^ϕ_c(u),  (u ∈ BV(Ω)).

Remark 4.1. If we assume that ϕ is concave, the condition ϕ_0 > 0 in (ii) follows from the other assumptions.

Proof. The proof of the upper bound follows exactly as the upper bound in Proposition 3.6, just replacing approximation in pwc(Ω) by approximation in C^1(Ω). For the lower bound, first of all, we observe that there exists t_∞ > 0 such that

  ϕ(t) ≥ (ϕ^∞/2)t,  (t ≥ t_∞).

Secondly, there exists t_0 > 0 such that

  ϕ(t) ≥ (ϕ_0/2)t,  (0 ≤ t ≤ t_0).

Since ϕ is increasing,

  ϕ(t) ≥ ϕ(t_0) ≥ tϕ(t_0)/t_∞,  (t_0 ≤ t ≤ t_∞).

Consequently,

  ϕ(t) ≥ ct,  (t ≥ 0),  for c := min{ϕ^∞/2, ϕ(t_0)/t_∞, ϕ_0/2}.

Therefore

  c TV(u_0) ≤ TṼ^ϕ_c(u_0),  (u_0 ∈ C^1(Ω)).

The claim now follows from the weak* lower semicontinuity of TV, as in the proof of Proposition 3.6. ∎

In fact, in most of the interesting cases we may prove a slightly stronger result.

Theorem 4.7. Let ϕ: R^{0,+} → R^{0,+} be concave with ϕ(0) = 0 and 0 < ϕ^∞ < ∞. Suppose that Ω ⊂ R^m has a Lipschitz boundary. Then

  TV^ϕ_c(u) = ϕ^∞ TV(u),  (u ∈ BV(Ω)).  (4.13)

Proof. We first suppose that Ω is bounded. The proof of the upper bound

  TV^ϕ_c(u) ≤ ∫_Ω ϕ(|∇u(x)|) dx + ϕ^∞|D^s u|(Ω)  (4.14)

is then a modification of Lemma 4.5. The estimate (4.7) for the absolutely continuous part follows as before. For the singular part, we observe that (4.9) holds for any σ > ϕ^∞. Therefore, proceeding as before, we obtain in place of (4.11) the estimate

  lim sup_{ε↘0} ∫_{R^m} ϕ( ∫_{R^m} ρ_ε(x − y) d|D^s u_0|(y) ) dx ≤ σ|D^s u|(Ω).  (4.15)

Letting σ ↘ ϕ^∞ and combining (4.7) with (4.15), we get (4.14).
As in the proof of Theorem 4.3, we may extend this bound to a possibly unbounded Ω. If u ∈ C^1(Ω), we may again approximate u strictly in BV(Ω) by piecewise constant functions {u^i}_{i=1}^∞. By the lower semicontinuity of TV^ϕ_c and (4.14), we then have

  TV^ϕ_c(u) ≤ lim inf_{i→∞} ϕ^∞|D^s u^i|(Ω) = ϕ^∞|Du|(Ω).  (4.16)

Finally, we observe that by concavity ϕ(t) ≥ ϕ^∞ t. Thus TṼ^ϕ_c(u) ≥ ϕ^∞|Du|(Ω) for u ∈ C^1(Ω). We immediately obtain (4.13) for u ∈ C^1(Ω). By strictly convergent approximation, we then extend the result to u ∈ BV(Ω). ∎

5. Discussion. Theorems 4.3 and 4.7 show that we cannot hope to have a simple weakly* lower semicontinuous non-convex total variation model as a prior for image gradient distributions. In fact, it follows from [9], see also [4, Section 5.1] and [22, Theorem 5.14], that lower semicontinuity of the continuous TV^ϕ_c model is only possible for convex ϕ. The problem is this: if ϕ^∞ is smaller than the marginal cost ϕ′(t) of a gradient, then image edges are always cheaper than smooth transitions. If ϕ^∞ = 0, they are so cheap that we get the zero functional in the limit for a general class of functions. If ϕ^∞ > 0 and ϕ is concave, then we get a multiple of TV as the result. If ϕ is not concave, we still have the upper bound (4.16); it may, however, be possible that some gradients are cheaper than jumps. This would in particular be the case with Huber regularisation of ϕ(t) = t. More about the jump set of solutions to Huber-regularised as well as non-convex total variation models may be read in [47].

In fact, in [27] Huber regularisation was used with ϕ(t) = t^q, q ∈ (0, 1), for algorithmic reasons. For small γ > 0, the regularised energy is defined as

  ϕ̃(t) := t^q − ((2 − q)/2)γ^q,  for t > γ,
  ϕ̃(t) := (q/2)γ^{q−2} t²,  for t ∈ [0, γ].  (5.1)

Then ϕ̃(t) ≤ ϕ(t), so that TV^ϕ̃_c ≤ TV^ϕ_c = 0. The asymptotic behaviour of the regulariser at infinity is the crucial feature here, and it cannot be altered without changing the edge behaviour of the regulariser.
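The Huber-type energy (5.1) is straightforward to write down directly. The following sketch of ours (the values of q and γ are arbitrary) checks the two properties used above: ϕ̃ ≤ ϕ, and the C^1 splice at t = γ:

```python
def phi(t, q=0.5):
    # The original non-convex energy phi(t) = t^q.
    return t ** q

def phi_huber(t, q=0.5, gamma=0.01):
    """Huber-type regularisation (5.1) of phi(t) = t^q: quadratic on
    [0, gamma], a downward-shifted t^q beyond gamma; the two branches
    agree in value and derivative at t = gamma."""
    if t <= gamma:
        return 0.5 * q * gamma ** (q - 2.0) * t * t
    return t ** q - 0.5 * (2.0 - q) * gamma ** q
```

Because ϕ̃ only differs from ϕ near zero and by a constant shift elsewhere, it leaves the asymptotic slope ϕ^∞ unchanged, which is the point made in the surrounding text.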
Huber regularisation does not alter the asymptotic behaviour, and as long as alternative smoothing strategies, such as those considered in [15], do not either, they likewise provide no change in the results.

In contrast to the continuous $\mathrm{TV}^\varphi_c$ model, according to Theorem 3.4 the discrete model works correctly for $\varphi(t) = t^q$, and generally for $\varphi \in \mathcal{W}_d$, if the desire is to force piecewise constant solutions to (1.3). As we saw in the comments preceding Proposition 3.6, it does not, however, force piecewise constant solutions for some of the energies $\varphi$ typically employed in this context. Generally, what causes piecewise constant solutions is the property $\varphi^0 = \infty$. If one does not desire piecewise constant solutions, one can therefore use Huber regularisation, or linearise $\varphi$ for $t < \delta$. The latter employs
$$\tilde\varphi(t) = \begin{cases} \varphi(t) - \varphi(\delta) + \varphi'(\delta)\delta, & t > \delta, \\ \varphi'(\delta)t, & t \le \delta. \end{cases}$$
Then $\tilde\varphi(t) \le Ct$ for some $C > 0$, so that $\mathrm{TV}^{\tilde\varphi}_d(u) < \infty$ for every $u \in \mathrm{BV}(\Omega)$. We also note that although this approach defines a regularisation functional on all of $\mathrm{BV}(\Omega)$, it cannot be used for modelling the distribution of gradients in real images, the purpose of the $\mathrm{TV}^\varphi_c$ model. In fact, as in the $\mathrm{TV}^\varphi_d$ model, we cannot control the penalisation of $\nabla u$ beyond a constant factor.

In summary, the $\mathrm{TV}^\varphi_d$ model works as intended for $\varphi \in \mathcal{W}_d$ – it enforces piecewise constant solutions. The $\mathrm{TV}^\varphi_c$ model, however, is not theoretically sound in function spaces. We will therefore next seek ways to fix it.

6. Multiscale regularisation and area-strict convergence. The problem with the $\mathrm{TV}^\varphi_c$ model is that weak* lower semicontinuity is too strong a requirement. We need a weaker type of lower semicontinuity, or, in other words, a stronger type of convergence. Norm convergence in $\mathrm{BV}$ is too strong; it would not be possible at all to approximate edges. Strict convergence is still too weak, as can be seen from the proof of Lemma 4.5.
Strong convergence in $L^2$, which we could in fact obtain from strict convergence for $\Omega \subset \mathbb{R}^2$ (see [31, 39]), is also not enough, as a stronger form of gradient convergence is the important part. A suitable mode of convergence is the so-called area-strict convergence [18, 30]. For our purposes, the following definition is the most appropriate one.

Definition 6.1. Suppose $\Omega \subset \mathbb{R}^n$ with $n \ge 2$. The sequence $\{u^i\}_{i=1}^\infty \subset \mathrm{BV}(\Omega)$ converges to $u \in \mathrm{BV}(\Omega)$ area-strictly if the sequence $\{U^i\}_{i=1}^\infty$, with $U^i(x) := (x/|x|, u^i(x))$, converges strictly in $\mathrm{BV}(\Omega; \mathbb{R}^{n+1})$ to $U(x) := (x/|x|, u(x))$.

In other words, $\{u^i\}_{i=1}^\infty$ converges to $u$ area-strictly if $u^i \to u$ strongly in $L^1(\Omega)$, $Du^i \rightharpoonup^* Du$ weakly* in $\mathcal{M}(\Omega; \mathbb{R}^n)$, and $\mathcal{A}(u^i) \to \mathcal{A}(u)$ for the area functional
$$\mathcal{A}(u) := \int_\Omega \sqrt{1 + |\nabla u(x)|^2}\,dx + |D^s u|(\Omega).$$
It can be shown that area-strict convergence is stronger than strict convergence, but weaker than norm convergence. Here we recall from §2 the definition of strict convergence and of the singular part $D^s u$ of $Du$.

In order to state a continuity result with respect to area-strict convergence, we need a few definitions. Specifically, we denote the Sobolev conjugate
$$1^* := \begin{cases} n/(n-1), & n > 1, \\ \infty, & n = 1, \end{cases}$$
and define
$$u^\theta(x) := \begin{cases} \theta u^+(x) + (1-\theta)u^-(x), & x \in J_u, \\ \tilde u(x), & x \notin S_u. \end{cases}$$
In [39], see also [30], the following result is proved.

Theorem 6.2. Let $\Omega$ be a bounded domain with Lipschitz boundary, $p \in [1, 1^*]$ if $n \ge 2$ and $p \in [1, 1^*)$ if $n = 1$. Let $f \in C(\Omega \times \mathbb{R} \times \mathbb{R}^n)$ satisfy
$$|f(x, y, A)| \le C(1 + |y|^p + |A|), \qquad ((x, y, A) \in \Omega \times \mathbb{R} \times \mathbb{R}^n),$$
and assume the existence of $f^\infty \in C(\Omega \times \mathbb{R} \times \mathbb{R}^n)$, defined by
$$f^\infty(x, y, A) := \lim_{\substack{x' \to x,\ y' \to y\\ A' \to A,\ t \to \infty}} \frac{f(x', y', tA')}{t}.$$
Then the functional
$$\mathcal{F}(u) := \int_\Omega f(x, u(x), \nabla u(x))\,dx + \int_\Omega \int_0^1 f^\infty\!\left(x, u^\theta(x), \frac{dD^s u}{d|D^s u|}(x)\right) d\theta\, d|D^s u|(x) \qquad (6.1)$$
is area-strictly continuous on $\mathrm{BV}(\Omega)$.

We now apply this mode of convergence to non-convex total variation, restricting our attention to the following class of functions.
Definition 6.3. We denote by $\mathcal{W}_{as}$ the set of functions $\varphi \in C(\mathbb{R}^{0,+})$ such that $\varphi^\infty$ exists, and for some $c, C > 0$ and $b \ge 0$ the following estimates hold true:
$$ct - b \le \varphi(t) \le C(1 + t), \qquad (t \in \mathbb{R}^{0,+}). \qquad (6.2)$$

Example 6.1. Let $\varphi$ be any of the functions in Example 4.1. They do not satisfy the lower bound in (6.2). If, however, we pick some cut-off $M > 0$, then $\varphi_M \in \mathcal{W}_{as}$ for the high-value linearisation
$$\varphi_M(t) := \begin{cases} \varphi(t), & t \le M, \\ \varphi(M) + \varphi'(M)(t - M), & t > M. \end{cases}$$

Corollary 6.4. Suppose $\varphi \in \mathcal{W}_{as}$. Then the functional
$$\mathrm{TV}^\varphi_{as}(u) := \int_\Omega \varphi(|\nabla u(x)|)\,dx + \varphi^\infty |D^s u|(\Omega), \qquad (u \in \mathrm{BV}(\Omega)),$$
is area-strictly continuous on $\mathrm{BV}(\Omega)$.

Proof. Letting $f(x, y, A) := \varphi(|A|)$, we verify the conditions of Theorem 6.2 with $p = 1$; in particular, the growth bound follows from the upper bound in (6.2). The claim is therefore immediate from Theorem 6.2. Note that we do not need the lower bound in the definition of $\mathcal{W}_{as}$ just yet.

But how could we obtain area-strict convergence of an infimising sequence of a variational problem? In [44, 45] the following multiscale analysis functional $\eta$ was introduced for scalar-valued measures $\mu \in \mathcal{M}(\Omega)$. Given $\eta_0 > 0$ and a family of mollifiers $\{\rho_\varepsilon\}_{\varepsilon > 0}$ satisfying the semigroup property $\rho_{\varepsilon+\delta} = \rho_\varepsilon * \rho_\delta$, the functional $\eta$ can be defined as
$$\eta(\mu) := \eta_0 \sum_{\ell=1}^\infty \int_{\mathbb{R}^n} (|\mu| * \rho_{2^{-\ell}})(x) - |\mu * \rho_{2^{-\ell}}|(x)\,dx, \qquad (\mu \in \mathcal{M}(\Omega)).$$
If the sequence of measures $\{\mu^i\}_{i=1}^\infty \subset \mathcal{M}(\Omega)$ satisfies $\sup_i \eta(\mu^i) < \infty$ and $\mu^i \rightharpoonup^* \mu$ weakly* in $\mathcal{M}(\Omega)$, then we have $|\mu^i|(\Omega) \to |\mu|(\Omega)$. In essence, the functional $\eta$ penalises the type of complexity of measures – such as two approaching $\delta$-spikes of different sign – which prohibits strict convergence. In Appendix A, we extend the strict convergence results of [44, 45] to vector-valued $\mu \in \mathcal{M}(\Omega; \mathbb{R}^N)$, in particular the case $\mu = DU$ for $U$ the lifting of $u$ as discussed above.

In order to bound in $\mathrm{BV}(\Omega)$ an infimising sequence of problems using $\mathrm{TV}^\varphi_{as}$ as a regulariser, we require slightly stricter assumptions on $\varphi$. These can usually, and particularly in the interesting case $\varphi(t) = t^q$, be easily satisfied by linearising $\varphi$ above a cut-off point $M$ with respect to the function value.
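The high-value linearisation $\varphi_M$ of Example 6.1 is easy to implement; the sketch below (our names, $\varphi(t) = t^q$ assumed) checks the two properties the argument relies on: continuity at $M$ and linear growth with recession slope $\varphi'(M) = qM^{q-1} > 0$, so that $\varphi_M \in \mathcal{W}_{as}$:

```python
import numpy as np

def phi_M(t, q=0.5, M=30.0):
    """High-value linearisation of phi(t) = t^q above the cut-off M:
    phi_M(t) = t^q for t <= M, and phi(M) + phi'(M)(t - M) for t > M."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= M,
                    t ** q,
                    M ** q + q * M ** (q - 1.0) * (t - M))
```

Beyond $M$ the function grows exactly linearly, so the recession value $\varphi_M^\infty = qM^{q-1}$ exists and is positive, as required for the lower bound in (6.2) up to a constant shift.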
This will force $\varphi^\infty > 0$, which is not required for continuity with respect to area-strict convergence in its own right. We will later see that such a cut-off can be justified by real gradient distributions, and can also be argued for from numerical experiments.

Now we may prove the following result, which shows that area-strict convergence and the multiscale analysis functional $\eta$ provide a remedy for the theoretical difficulties associated with the $\mathrm{TV}^\varphi_c$ model.

Theorem 6.5. Suppose $\Omega \subset \mathbb{R}^n$ is bounded with Lipschitz boundary, and $\varphi \in \mathcal{W}_{as}$. Define $U(x) := (1, u(x))$. Then the functional
$$F(u) := \mathrm{TV}^\varphi_{as}(u) + \eta(DU)$$
is weak* lower semicontinuous on $\mathrm{BV}(\Omega)$, and any sequence $\{u^i\}_{i=1}^\infty \subset L^1(\Omega)$ with
$$\sup_i F(u^i) < \infty$$
admits an area-strictly convergent subsequence.

Proof. Suppose $\{u^i\}_{i=1}^\infty$ converges weakly* to $u \in \mathrm{BV}(\Omega)$. Then it follows that $\{U^i\}_{i=1}^\infty$ converges weakly* to $U \in \mathrm{BV}(\Omega; \mathbb{R}^{n+1})$. If $\liminf_{i\to\infty} \eta(DU^i) = \infty$, we clearly have lower semicontinuity of $F$. By switching to an unrelabelled subsequence, we may therefore assume that $\sup_i \eta(DU^i) < \infty$. It follows from Theorem A.4 in the Appendix that $|DU^i|(\Omega) \to |DU|(\Omega)$. In other words, $\{u^i\}_{i=1}^\infty$ converges area-strictly to $u$. Applying Corollary 6.4 and the weak* lower semicontinuity of $\eta$, we now see that
$$F(u) \le \liminf_{i\to\infty} F(u^i).$$
Thus weak* lower semicontinuity holds true.

Next suppose that $\{u^i\}_{i=1}^\infty \subset L^1(\Omega)$ with $\sup_i F(u^i) < \infty$. Since $ct - b \le \varphi(t)$ and $\Omega$ is bounded, it follows that $\sup_i \mathrm{TV}(u^i) < \infty$. The sequence therefore admits a subsequence, unrelabelled without loss of generality, which converges weakly* to some $u \in \mathrm{BV}(\Omega)$. The fact that $\{u^i\}_{i=1}^\infty$ admits an area-strictly convergent subsequence now follows as in the previous paragraph.

We immediately deduce the following corollary.

Corollary 6.6.
Suppose $\Omega \subset \mathbb{R}^n$ is bounded with Lipschitz boundary, $\varphi \in \mathcal{W}_{as}$, $J: \mathrm{BV}(\Omega) \to \mathbb{R}$ is convex, proper, and weakly* lower semicontinuous, and $J$ satisfies for some $C > 0$ the coercivity condition
$$J(u) \ge C(\|u\|_{L^1(\Omega)} - 1).$$
Then the functional
$$G(u) := J(u) + \alpha\,\mathrm{TV}^\varphi_{as}(u) + \eta(DU), \qquad (u \in \mathrm{BV}(\Omega)),$$
admits a minimiser $u \in \mathrm{BV}(\Omega)$.

Remark 6.1. We can, for example, take $J(u) = \frac{1}{2}\|z - u\|^2_{L^2(\Omega)}$. Observe that
$$\eta(DU) = \eta_0 \sum_{\ell=1}^\infty \eta_\ell(DU),$$
where, for $\ell > 0$,
$$\eta_\ell(DU) := \int_{\mathbb{R}^n} (\rho_\ell * |DU|)(x) - |(\rho_\ell * DU)(x)|\,dx = |D^s u|(\Omega) + \int_{\mathbb{R}^n} \sqrt{1 + |\nabla u(x)|^2} - \sqrt{1 + |(\rho_\ell * Du)(x)|^2}\,dx. \qquad (6.3)$$
In particular, if $u \in W^{1,1}(\Omega)$, then we obtain with $\nabla_\ell u := \rho_\ell * \nabla u$ the expression
$$\eta_\ell(DU) = \int_{\mathbb{R}^n} \sqrt{1 + |\nabla u(x)|^2} - \sqrt{1 + |\nabla_\ell u(x)|^2}\,dx$$
and the estimate
$$\eta_\ell(DU) \le \int_{\mathbb{R}^n} \sqrt{|\nabla u(x)|^2 - |\nabla_\ell u(x)|^2}\,dx.$$

The following proposition shows that, in infimising sequences, we may ignore tail terms of $\eta$. This justifies the associated numerical approximation.

Proposition 6.7. Suppose $\Omega \subset \mathbb{R}^n$ is bounded with Lipschitz boundary, $\varphi \in \mathcal{W}_{as}$, and that $J: \mathrm{BV}(\Omega) \to \mathbb{R}$ is as in Corollary 6.6. Let $K_i \in \mathbb{N}^+$ and $\epsilon_i > 0$, $i = 1, 2, 3, \ldots$, satisfy
$$\lim_{i\to\infty} K_i = \infty \qquad \text{and} \qquad \lim_{i\to\infty} \epsilon_i = 0.$$
Suppose further that $\{u^i\}_{i=1}^\infty \subset \mathrm{BV}(\Omega)$ satisfies
$$J(u^i) + \alpha\,\mathrm{TV}^\varphi_{as}(u^i) + \sum_{\ell=1}^{K_i} \eta_\ell(DU^i) \le \inf_{u \in \mathrm{BV}(\Omega)} G(u) + \epsilon_i, \qquad (i = 1, 2, 3, \ldots).$$
Then we can find $\hat u \in \mathrm{BV}(\Omega)$ and a subsequence of $\{u^i\}_{i=1}^\infty$, unrelabelled, such that $u^i \to \hat u$ area-strictly, and $\hat u$ minimises $G$.

Proof. Let $L := \inf_{u \in \mathrm{BV}(\Omega)} G(u)$. Since there is nothing to prove if $L = \infty$, we may assume $L < \infty$. Then we have
$$c\,\mathrm{TV}(u^i) - b\mathcal{L}^n(\Omega) \le \mathrm{TV}^\varphi_{as}(u^i).$$
This yields
$$J(u^i) + \alpha c\,\mathrm{TV}(u^i) \le \alpha b\mathcal{L}^n(\Omega) + L + \epsilon_i.$$
It follows for a subsequence, unrelabelled, that $u^i \rightharpoonup^* \hat u$ weakly* for some $\hat u \in \mathrm{BV}(\Omega)$. By the weak* lower semicontinuity of $\eta_\ell$, see Theorem A.4, we then have
$$\sum_{\ell=1}^{K_j} \eta_\ell(D\hat U) \le \liminf_{i\to\infty} \sum_{\ell=1}^{K_j} \eta_\ell(DU^i) \le L, \qquad (j = 1, 2, 3, \ldots).$$
It follows that
$$\eta(D\hat U) = \sum_{\ell=1}^\infty \eta_\ell(D\hat U) \le L.$$
Using Lemma A.3 with $\mu^i = DU^i$ and $\mu = D\hat U$, we see that $u^i \to \hat u$ area-strictly, and that
$$u \mapsto J(u) + \alpha\,\mathrm{TV}^\varphi_{as}(u) + \sum_{\ell=1}^{K_j} \eta_\ell(DU)$$
is area-strictly lower semicontinuous for each fixed $j = 1, 2, 3, \ldots$. This shows that $G(\hat u) \le L$. As a consequence, $\hat u$ minimises $G$.

7. Remarks on alternative remedies. We now discuss two alternative approaches to make the $\mathrm{TV}^\varphi_c$ model work in the limit. These are based on compactifying the differential operator and on working in $\mathrm{SBV}(\Omega)$, respectively. As we only intend to demonstrate alternative possibilities, we stay brief here; hence the proofs have been placed in the appendix.

Remark 7.1 (Compact operators). Area-strict convergence is not the only possibility to make the $\mathrm{TV}^\varphi_c$ model function; another way to understand the problems with the basic $\mathrm{TV}^\varphi_c$ model is that the operator $\nabla$ is not compact. One way to obtain a compact operator is by convolution. This is the content of the following result.

Theorem 7.1. Let $\{\rho_\varepsilon\}_{\varepsilon>0}$ be a family of mollifiers, $\Omega \subset \mathbb{R}^n$ open, and $\varphi: \mathbb{R}^{0,+} \to \mathbb{R}^{0,+}$ increasing, subadditive and continuous with $\varphi(0) = 0$. Fix $\varepsilon > 0$, and define $D_\varepsilon: L^1(\Omega) \to L^1(\mathbb{R}^n; \mathbb{R}^n)$ by $D_\varepsilon u := \rho_\varepsilon * Du$. Then
$$\mathrm{TV}^{\varphi,\varepsilon}_c(u) := \int_{\mathbb{R}^n} \varphi(|D_\varepsilon u(x)|)\,dx, \qquad (u \in \mathrm{BV}(\Omega)),$$
is lower semicontinuous with respect to weak* convergence in $\mathrm{BV}(\Omega)$. Moreover
$$\lim_{\varepsilon \searrow 0} \mathrm{TV}^{\varphi,\varepsilon}_c(u) = \widetilde{\mathrm{TV}}{}^\varphi_c(u), \qquad (u \in C^1(\Omega)). \qquad (7.1)$$

We relegate the proof of this theorem to Appendix B. It should be noted that any $u \in L^1(\Omega)$ satisfies $\mathrm{TV}^{\varphi,\varepsilon}_c(u) < \infty$. In particular
$$\sup_i \frac{1}{2}\|z - u^i\|^2_{L^2(\Omega)} + \alpha\,\mathrm{TV}^{\varphi,\varepsilon}_c(u^i) < \infty$$
does not guarantee weak* convergence of a subsequence. For that, an additional $\mathrm{TV}(u^i)$ term (with a small factor) is required in a $\mathrm{TV}^{\varphi,\varepsilon}_c$ based variational model in image processing.

Remark 7.2 (The space $\mathrm{SBV}(\Omega)$ and $\eta$). If we apply the $\eta$ functional of [44, 45] to a bounded sequence of functions $g^i \in L^1(\Omega; \mathbb{R}^m)$, then we get strict convergence in this space. It remains to determine whether we in fact obtain convergence.
Then we could regularise $\nabla u$ this way, and, working in the space $\mathrm{SBV}(\Omega)$, penalise the jump part separately. It turns out that this is possible if we state the modification $\bar\eta$ of $\eta$ in $L^p(\mathbb{R}^n; \mathbb{R}^m)$ for $p \in (1, \infty)$. Then strict convergence is equivalent to strong convergence. With $\varepsilon_\ell \searrow 0$, $\eta_0 > 0$, and $p \in (1, \infty)$, we define
$$\bar\eta(g) := \eta_0 \sum_{\ell=1}^\infty \bar\eta_\ell(g), \qquad \bar\eta_\ell(g) := \|g\|_{L^p(\mathbb{R}^n;\mathbb{R}^m)} - \|g * \rho_{\varepsilon_\ell}\|_{L^p(\mathbb{R}^n;\mathbb{R}^m)}, \qquad (g \in L^p(\Omega; \mathbb{R}^m)). \qquad (7.2)$$
Then we have the following result, whose proof is relegated to Appendix C.

Theorem 7.2. Let $\Omega \subset \mathbb{R}^m$ be open and bounded. Suppose $\psi \in \mathcal{W}_d$, and $\varphi: \mathbb{R}^{0,+} \to \mathbb{R}^{0,+}$ is lower semicontinuous and increasing with $\varphi^\infty = \infty$ and
$$\|g\|_{L^p(\Omega;\mathbb{R}^m)} \le C\left(1 + \int_\Omega \varphi(|g(x)|)\,dx\right), \qquad (g \in L^p(\Omega; \mathbb{R}^m)), \qquad (7.3)$$
for some $C > 0$, where $p \in (1, \infty)$ is as in (7.2). Let
$$F(u) := \int_{\mathbb{R}^n} \varphi(|\nabla u(x)|)\,dx + \bar\eta(\nabla u) + \int_{J_u} \psi(\theta_u(x))\,d\mathcal{H}^{m-1}(x).$$
Then $F$ is lower semicontinuous with respect to weak* convergence in $\mathrm{BV}(\Omega)$. In fact, any sequence $\{u^i\}_{i=1}^\infty$ with $\sup_i F(u^i) < \infty$ admits a subsequence convergent weakly* and in the sense (3.1)–(3.3).

8. Image statistics and the jump part. Our studies in the preceding sections have pointed us to the following question: Are the statistics of [29] valid when we split the image into smooth and jump parts? What are the statistics for jump heights, and does splitting the gradient into these two parts alter the distribution of the absolutely continuous part? When calculating statistics from discrete images, we do not have the excuse that the jumps would be negligible, i.e., that $\mathcal{L}^m(J_u) = 0$!

In order to gain some insight, we performed a few experiments with real photographs, displayed in Figures 8.1–8.3. These three photographs represent images with different types of statistics. The pier photo of Figure 8.1 is very simple, with large smooth areas and some fine structures. The parrot test image in Figure 8.2 has a good balance of features.
The summer lake scene in Figure 8.3 is somewhat more complex, with plenty of fine features.

We split the pixels of each image into edge and smooth parts by a simple threshold on the norm $\|\nabla u(k)\|$ of the discrete gradient at each pixel $k$. Then we find optimal $\alpha$ and $q \in (0, 2)$ for the distribution
$$P_t(t) := C_t \exp(-\alpha\varphi(t))$$
to match the experimental distribution. This in turn gives rise to the prior
$$P_u(u) = C_u \exp\left(-\alpha \int_\Omega \varphi(|\nabla u(x)|)\,dx\right).$$
Both $C_t, C_u > 0$ are normalising factors. In practice we fit $P_t$ to the experimental distribution by a simple least squares fit on the logarithms of the distributions. We will comment on the suitability of this approach later in this section. In the least squares fit we keep $C$ as a free (unnormalised) parameter, and recalculate it after the fit.

Observe that the normalisation constant does not affect the denoising problem (here in the finite-dimensional setting)
$$\max_u P_{f|u}(f|u)P_u(u) \propto \max_u \exp\left(-\frac{\sigma^2}{2}\|z - u\|_2^2 - \alpha\,\widetilde{\mathrm{TV}}{}^\varphi_c(u)\right).$$
Here we have the Gaussian noise distribution
$$P_{f|u}(f|u) = C' \exp\left(-\frac{\sigma^2}{2}\|z - u\|_2^2\right),$$
for $\sigma$ the noise level. This gives the statistical interpretation of the denoising model, that of a maximum a posteriori (MAP) estimate.

Finally, in the matter of statistics, we note that the $\mathrm{TV}^\varphi$ prior only attempts to correctly model gradient statistics; the modelling of histogram statistics with Wasserstein distances was recently studied in [38, 42] together with the conventional TV gradient prior. It is also worth remarking that our approach of improving the prior based on the statistics of the ground truth is different from recent approaches that optimise the prior based on the denoising result [8, 11, 17, 25, 26]. These approaches can provide improved results in practice, but no longer have the simple MAP interpretation.
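The least squares fit on the logarithm described above can be sketched as follows; the function name, the grid scan over $q$, and the synthetic ranges are our illustrative choices, not the paper's actual fitting code. For each candidate $q$ the model $\log P_t(t) = \log C - \alpha t^q$ is linear in $(\log C, \alpha)$, so the inner problem is an ordinary linear least squares solve:

```python
import numpy as np

def fit_tq(t, logp):
    """Fit log P(t) = log C - alpha * t**q to a log-histogram by scanning q
    on a grid and solving the inner linear least squares problem exactly.
    Returns the best (q, alpha); C is left free, as in the text."""
    best = None
    for q in np.linspace(0.05, 1.95, 191):
        A = np.stack([np.ones_like(t), -t ** q], axis=1)   # columns: log C, alpha
        coef, *_ = np.linalg.lstsq(A, logp, rcond=None)
        r = float(((A @ coef - logp) ** 2).sum())
        if best is None or r < best[0]:
            best = (r, float(q), float(coef[1]))
    return best[1], best[2]
```

On synthetic data generated from an exact $t^q$ model the grid scan recovers $(q, \alpha)$; on real log-histograms it inherits the tail over-emphasis discussed in the text.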
It is definitely possible to optimise the parameters of the $\mathrm{TV}^\varphi$ model in this manner, but this is outside the scope of the present, already long, manuscript.

Figure 8.1: Pier photo: discrete gradient histogram and least squares models. The image intensity in (a) is in the range [0, 255], and we have chosen pixels $k$ with $|\nabla u(k)| \ge 30$ as edges (b). The log-histogram of $|\nabla u(k)|$ with the optimal fit of $t \mapsto t^q$ is displayed in (c). This is done separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off point $M = 30$ (manual edge detection) and $M = 69$ (optimal least squares fit). We moreover show the empirically best linearised model. (Panels: (a) pier photo; (b) detected edge pixels (red); (c) full fit $q = 0.50$; (d) smooth fit $q = 0.36$, edge fit $q = 1.44$; (e) fit (man) $q = 0.32$, $M = 30$, $\alpha^\infty = 0.059$; fit (opt) $q = 0.42$, $M = 69$, $\alpha^\infty = 0.050$; fit (emp) $q = 0.40$, $M = 15$, $\alpha^\infty = 0.021$.)

Our experiments confirm the findings of [29] that some $q \in (0, 1)$ is generally a good fit for the entire distribution, as well as for the smooth part. However, the optimal $q$ for the edge part varies. In Figure 8.1 it is actually $q = 1.44$ – larger than one! We have to admit that the number of edge pixels in this image is quite small, so statistically the result may be considered unreliable. In Figure 8.3, with a significant proportion of edge pixels, we still have $q = 1.05$.
Figure 8.2: Parrot photo: discrete gradient histogram and least squares models. The image intensity in (a) is in the range [0, 255], and we have chosen pixels $k$ with $|\nabla u(k)| \ge 20$ as edges (b). The log-histogram of $|\nabla u(k)|$ with the optimal fit of $t \mapsto t^q$ is displayed in (c). This is done separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off point $M = 20$ (manual edge detection) and $M = 73$ (optimal least squares fit). We moreover show the empirically best linearised model. (Panels: (a) parrot photo; (b) detected edge pixels (red); (c) full fit $q = 0.46$; (d) smooth fit $q = 0.68$, edge fit $q = 1.18$; (e) fit (man) $q = 0.22$, $M = 20$, $\alpha^\infty = 0.046$; fit (opt) $q = 0.40$, $M = 73$, $\alpha^\infty = 0.041$; fit (emp) $q = 0.50$, $M = 40$, $\alpha^\infty = 0.025$.)

These findings also suggest that fitting a single $q \in (0, 1)$ to the entire statistic (not split into edge and smooth parts) may be right on average, but there is significant variation between images in the shape of the distribution for the edge part. The smooth part generally looks roughly similar among our test images.

In order to suggest an improved model for image gradient statistics, in each of Figures 8.1(e)–8.3(e) we also fit the statistics of the linearised distribution $P_t(t) = C\exp(-\alpha\varphi(t))$ for
$$\varphi(t) := \begin{cases} t^q, & 0 \le t \le M, \\ (1-q)M^q + qM^{q-1}t, & t > M. \end{cases} \qquad (8.1)$$
This is again done by a least squares fit on the logarithm of the distribution. For the 'Fit (man)' curve, we fix the cut-off point $M$ to a hand-picked (manual) edge threshold and optimise $(q, \alpha)$. We also optimise over all of the parameters $(q, \alpha, M)$; this is the 'Fit (opt)' curve.
We note that the asymptotic $\alpha$, which we define as
$$\alpha^\infty := \lim_{t\to\infty} \alpha\varphi(t)/t = \alpha q M^{q-1},$$
is roughly the same for both of these choices, and generally the curves are close to each other. As $\alpha^\infty$ describes the behaviour of the model on edges, and for total variation denoising $\alpha^\infty = \alpha$, we find $\alpha^\infty$ to be a parameter that should indeed stay roughly constant between models with different $q$ and $M$.

It, however, turns out that the $\alpha^\infty$ obtained by the simple least squares histogram fit is bad in practice; it gives far too strong regularisation, i.e., a too narrow distribution. The problem is that the simple least squares fit on the logarithm over-emphasises the tail of the distribution, on which we moreover have very few statistics due to the discrete nature of the data. This yields a far too high $\alpha^\infty$, i.e., the slope of the linear part is too steep in the figures. Developing a reliable way to obtain the model from the data is outside the scope of the present paper, although it is definitely an interesting subject for future studies. This is why we have also included the curve 'Fit (emp)', which is based on an empirical choice of $(\alpha^\infty, M, q)$ from our numerical experiments in the following §9. There we keep $\alpha^\infty$ fixed as we vary $M$ and $q$. We will also incorporate the noise level $\sigma^2$ into $\alpha$. It turns out that for the empirically good distribution, $q$ is close to the values found by histogram fitting above, but $\alpha^\infty$ is very different.

9. Numerical reconstructions. Next we provide a numerical solver for the following $\mathrm{TV}^\varphi_c$ model, possibly including the $\eta$-terms. We note that our solver is only one option, and not the focus of the present paper, whose focus is modelling and analysis. We therefore do not provide an extensive analysis of, or comparison to alternative approaches for, the solver itself.
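Keeping $\alpha^\infty$ fixed while varying $q$ and $M$, as done in the experiments of §9, amounts to inverting the relation $\alpha^\infty = \alpha q M^{q-1}$ for $\alpha$. A minimal sketch (our function name, illustrative values):

```python
def alpha_from_asymptote(alpha_inf, q, M):
    """Invert alpha_inf = alpha * q * M**(q-1) for the energy (8.1),
    so the asymptotic slope on edges stays fixed as q and M vary."""
    return alpha_inf / (q * M ** (q - 1.0))
```

For example, with $\alpha^\infty = 0.0207$, $q = 0.4$ and $M = 10$, this yields the $\alpha$ to plug into the discrete model; changing $M$ then automatically rebalances $\alpha$.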
Compared to first-order splitting approaches, recently analysed extensively using the Kurdyka–Łojasiewicz property [5, 35], it can, however, be said that our solver can be proved to have theoretically faster local superlinear convergence [27, 28].

In the finite-dimensional setting, we aim to solve
$$\min_u f(u) := \sum_{k \in \Omega_d} \left( \alpha\varphi(|\nabla u(k)|) + \frac{1}{2}|z(k) - u(k)|^2 + \eta_0 \sum_{l=1}^N \left( \sqrt{1 + |\nabla u(k)|^2} - \sqrt{1 + |\nabla_l u(k)|^2} \right) \right). \qquad (9.1)$$
Here $\varphi$ is given by (8.1), and $\alpha, \eta_0$ are manually chosen to balance the weights of the respective terms. For an image $u$ of resolution $n_1$-by-$n_2$, we set $h := 1/\sqrt{n_1 n_2}$, $\Omega_d := [0,1]^2 \cap (h\mathbb{Z}^2)$, and discretise the gradient by forward differences, i.e.,
$$\nabla u(k) := \left( (u(k + e_1) - u(k))/h,\ (u(k + e_2) - u(k))/h \right), \qquad k \in \Omega_d,$$
with homogeneous Dirichlet boundary conditions. Each $\nabla_l u := \rho_l * \nabla u$ is defined through convolution with a prescribed smoothing kernel $\rho_l$ ($l > 0$). Here $\rho_l$ is specified as a two-dimensional Gaussian distribution of standard deviation $l$ centred at the origin.

To cope with the kink of the non-smooth $\varphi$ term at zero, we introduce a Huber-type local smoothing [27, 28] by replacing $\varphi$ in (9.1) with a continuously differentiable function $\varphi_\gamma$ with locally Lipschitz derivative. More specifically, let $0 < \gamma \ll M$ be the smoothing parameter and $\varphi_\gamma: [0, \infty) \to [0, \infty)$ be defined by
$$\varphi_\gamma(t) = \begin{cases} \varphi(t) - \left(\varphi(\gamma) - \frac{1}{2}\varphi'(\gamma)\gamma\right), & t \ge \gamma, \\[1mm] \frac{\varphi'(\gamma)}{2\gamma}t^2, & 0 \le t < \gamma. \end{cases}$$
Thus, the resulting Huberised $\mathrm{TV}^{\varphi_\gamma}$ model appears as
$$\min_u f_\gamma(u) := \sum_{k \in \Omega_d} \left( \alpha\varphi_\gamma(|\nabla u(k)|) + \frac{1}{2}|z(k) - u(k)|^2 + \eta_0 \sum_{l=1}^N \left( \sqrt{1 + |\nabla u(k)|^2} - \sqrt{1 + |\nabla_l u(k)|^2} \right) \right). \qquad (9.2)$$
For this problem, the first-order optimality condition reads:
$$\alpha\nabla^\top p + u - z + \eta_0 \sum_{l=1}^N \left( \nabla^\top \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} - \nabla_l^\top \frac{\nabla_l u}{\sqrt{1 + |\nabla_l u|^2}} \right) = 0, \qquad \psi(\max(|\nabla u|, \gamma))\,p = \nabla u, \qquad (9.3)$$
where $p \in \mathbb{R}^{2|\Omega_d|}$ is an auxiliary variable and $\psi: (0, \infty) \to (0, \infty)$ is defined by $\psi(t) := t/\varphi'(t)$. Note that $\psi$ is locally Lipschitz and monotonically increasing, and in the following we shall denote by $\partial\psi$ a subdifferential of $\psi$.
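The smoothed energy $\varphi_\gamma$ built from (8.1) can be sketched as follows; the function names are ours, and the derivative helper is only what is needed to evaluate the two branches. The sketch checks that the branches match at $t = \gamma$ and that $\varphi_\gamma \le \varphi$:

```python
import numpy as np

def phi_81(t, q, M):
    """The linearised energy (8.1): t^q below M, affine with slope q*M**(q-1) above."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= M, t ** q, (1.0 - q) * M ** q + q * M ** (q - 1.0) * t)

def dphi_81(t, q, M):
    """Derivative of (8.1) away from the kink at M."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= M, q * t ** (q - 1.0), q * M ** (q - 1.0))

def phi_gamma(t, q, M, gamma):
    """C^1 Huber-type smoothing of (8.1), as used in (9.2): quadratic below gamma,
    vertically shifted phi above, with matching values and derivatives at gamma."""
    t = np.asarray(t, dtype=float)
    c = float(phi_81(gamma, q, M)) - 0.5 * float(dphi_81(gamma, q, M)) * gamma
    return np.where(t >= gamma,
                    phi_81(t, q, M) - c,
                    0.5 * dphi_81(gamma, q, M) / gamma * t ** 2)
```

Since the subtracted constant $\varphi(\gamma) - \frac{1}{2}\varphi'(\gamma)\gamma$ is positive for concave branches, $\varphi_\gamma$ lies below $\varphi$, consistent with the discussion of §5.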
We remark that the consistency of the Huberised stationary points induced by (9.3) with the stationary points of the original model (9.1) was investigated in the previous work [27, 28]. Moreover, the system (9.3) is not differentiable in the classical sense. Therefore, in the following we present a generalised Newton-type solver for computing a stationary point satisfying (9.3). Given the current iterate $(u^i, p^i)$, our solver relies on the following regularised linear system arising from differentiating (9.3) (and further straightforward manipulations; see [27, 28]):
$$(H^i + \beta^i R^i)\,\delta u^i = -g^i,$$
with
$$m^i := \max(|\nabla u^i|, \gamma), \qquad \chi^i(k) := \begin{cases} 1, & |\nabla u^i(k)| \ge \gamma, \\ 0, & |\nabla u^i(k)| < \gamma, \end{cases}$$
$$H^i := I + \alpha\nabla^\top \frac{1}{\psi(m^i)}\left(I - \frac{\chi^i\,\partial\psi(m^i)}{2m^i}\left((p^i)(\nabla u^i)^\top + (\nabla u^i)(p^i)^\top\right)\right)\nabla$$
$$\qquad + \eta_0 \sum_{l=1}^N \left( \nabla^\top \frac{1}{\sqrt{1+|\nabla u^i|^2}}\left(I - \frac{(\nabla u^i)(\nabla u^i)^\top}{1+|\nabla u^i|^2}\right)\nabla - \nabla_l^\top \frac{1}{\sqrt{1+|\nabla_l u^i|^2}}\left(I - \frac{(\nabla_l u^i)(\nabla_l u^i)^\top}{1+|\nabla_l u^i|^2}\right)\nabla_l \right),$$
$$R^i := \epsilon I + \alpha\nabla^\top \frac{\chi^i\,\partial\psi(m^i)}{2\psi(m^i)m^i}\left((p^i)(\nabla u^i)^\top + (\nabla u^i)(p^i)^\top\right)\nabla + \eta_0 \sum_{l=1}^N \nabla_l^\top \frac{1}{\sqrt{1+|\nabla_l u^i|^2}}\left(I - \frac{(\nabla_l u^i)(\nabla_l u^i)^\top}{1+|\nabla_l u^i|^2}\right)\nabla_l,$$
$$g^i := \nabla f_\gamma(u^i) = \alpha\nabla^\top \frac{\nabla u^i}{\psi(m^i)} + u^i - z + \eta_0 \sum_{l=1}^N \left( \nabla^\top \frac{\nabla u^i}{\sqrt{1+|\nabla u^i|^2}} - \nabla_l^\top \frac{\nabla_l u^i}{\sqrt{1+|\nabla_l u^i|^2}} \right). \qquad (9.4)$$
Here $H^i$ represents a (modified) generalised Hessian of $f_\gamma$ at $u^i$, while $R^i$, with an arbitrarily fixed $0 < \epsilon \ll \alpha$, serves as a structural Hessian regularisation. Note that $H^i$ may not be positive definite at the iterate $u^i$. For this reason, the regularisation weight $\beta^i$ is automatically tuned by a trust-region based mechanism; see steps 8–20 of Algorithm 1. Further, whenever $\beta^i = 1$, the regularised Hessian $H^i + \beta^i R^i$ is positive definite. Consequently, our $\beta^i$-update scheme guarantees $\delta u^i$ to be a descent direction for $f_\gamma$ at $u^i$, and thus the overall iterative scheme can be globalised by, e.g., the Wolfe–Powell line search [19]; see step 21 in Algorithm 1.
Moreover, following the algorithm development in [27, 28], one can show that $\beta^i$ asymptotically vanishes as $u^i$ approaches a stationary point where a certain type of second-order sufficient optimality condition is satisfied. Thus, local superlinear convergence can be attained. The overall algorithm is detailed in Algorithm 1 below.

The following parameters associated with Algorithm 1 are specified throughout our experiments: $c = 1$, $\underline\rho = 0.25$, $\bar\rho = 0.75$, $\underline\kappa = 0.5$, $\bar\kappa = 2$, $d = 10^{-10}$, $\tau_1 = 0.1$, $\tau_2 = 0.9$, $\gamma = 0.001$, $\epsilon_r = 0.01$. Algorithm 1 is terminated once $\|\nabla f_\gamma(u^i)\|/\|\nabla f_\gamma(u^0)\|$ drops below $10^{-7}$.

We report in Figures 9.1–9.3 and Table 9.1 the results of denoising our three test images using this algorithm with rather high artificial noise levels. We have added Gaussian noise of standard deviation $\sigma = 30$ to all test images. We report both the conventional peak signal-to-noise ratio (PSNR) and the structural similarity measure (SSIM) of [48]. The latter better quantifies the visual quality of images by essentially computing the PSNR in local

Algorithm 1: Superlinearly convergent Newton-type method for $\mathrm{TV}^{\varphi_\gamma}$ denoising
Require: $c > 0$, $0 < \underline\rho \le \bar\rho < 1$, $0 < \underline\kappa < 1 < \bar\kappa$, $d > 0$, $0 < \tau_1 < 1/2$, $\tau_1 < \tau_2 < 1$, $0 < \gamma \ll 1$, $0 < \epsilon_r < 1$.
1: Initialise the iterate $(u^0, p^0)$, the regularisation weight $\beta^0 \ge 0$, and the trust-region radius $r^0 > 0$. Set $i := 0$.
2: repeat
3:   Generate $H^i$, $R^i$, and $g^i$ at the current iterate $(u^i, p^i)$.
4:   Solve the linear system $(H^i + \beta^i R^i)\delta u^i = -g^i$ (inexactly) for $\delta u^i$ by the conjugate gradient (CG) method up to the residual tolerance $\epsilon_r$, or detect the non-positive-definiteness of $H^i + \beta^i R^i$ during the CG iterations.
5:   if $H^i + \beta^i R^i$ is not positive definite or $-((g^i)^\top \delta u^i)/(\|g^i\|\|\delta u^i\|) < d$ then
6:     Set $\beta^i := 1$, and return to step 4.
7:   end if
8:   if $\beta^i = 1$ and $(\delta u^i)^\top R^i \delta u^i > (r^i)^2$ then
9:     Set $r^i := \sqrt{(\delta u^i)^\top R^i \delta u^i}$, $\beta^{i+1} := 1$, and go to step 13.
10:  else
11:    Set $\beta^{i+1} := \max(\min(\beta^i + c^{-1}(((\delta u^i)^\top R^i \delta u^i) - (r^i)^2),\, 1),\, 0)$.
12:  end if
13:  Evaluate $\rho^i := \left(f_\gamma(u^i + \delta u^i) - f_\gamma(u^i)\right) / \left((g^i)^\top \delta u^i + (\delta u^i)^\top H^i \delta u^i/2\right)$.
14:  if $\rho^i < \underline\rho$ then
15:    Set $r^{i+1} := \underline\kappa r^i$.
16:  else if $\rho^i > \bar\rho$ then
17:    Set $r^{i+1} := \bar\kappa r^i$.
18:  else
19:    Set $r^{i+1} := r^i$.
20:  end if
21:  Determine the step size $a^i$ along the search direction $\delta u^i$ such that $u^{i+1} = u^i + a^i\delta u^i$ satisfies the following Wolfe–Powell conditions:
$$f_\gamma(u^{i+1}) \le f_\gamma(u^i) + \tau_1 a^i (g^i)^\top \delta u^i, \qquad \nabla f_\gamma(u^{i+1})^\top \delta u^i \ge \tau_2 (g^i)^\top \delta u^i.$$
22:  Generate the next iterate:
$$u^{i+1} := u^i + a^i\delta u^i, \qquad p^{i+1} := \frac{1}{\psi(m^i)}\left( \nabla u^i + \nabla\delta u^i - \frac{\chi^i\,\partial\psi(m^i)}{2m^i}\left((p^i)(\nabla u^i)^\top + (\nabla u^i)(p^i)^\top\right)\nabla\delta u^i \right).$$
23:  Set $i := i + 1$.
24: until the stopping criterion is fulfilled.

Figure 8.3: Summer photo: discrete gradient histogram and least squares models. The image intensity in (a) is in the range [0, 255], and we have chosen pixels $k$ with $|\nabla u(k)| \ge 30$ as edges (b). The log-histogram of $|\nabla u(k)|$ with the optimal fit of $t \mapsto t^q$ is displayed in (c). This is done separately for the edge pixels in (d). The linearised model is fitted in (e) for the cut-off point $M = 30$ (manual edge detection) and $M = 40$ (optimal least squares fit). We moreover show the empirically best linearised model. (Panels: (a) summer photo; (b) detected edge pixels (red); (c) full fit $q = 0.87$; (d) smooth fit $q = 0.24$, edge fit $q = 1.05$; (e) fit (man) $q = 0.35$, $M = 30$, $\alpha^\infty = 0.050$; fit (opt) $q = 0.40$, $M = 40$, $\alpha^\infty = 0.049$; fit (emp) $q = 0.30$, $M = 40$, $\alpha^\infty = 0.017$.)
windows and combining the results in a non-linear fashion. The range of the SSIM is [0, 1], the higher the better. In our computations, we keep $\alpha^\infty$ and $q$ fixed, and vary $M$ (by altering $\alpha$ as necessary).

Figure 9.1: Pier photo: denoising results with noise level $\sigma = 30$ (Gaussian), for varying cut-off $M$, fixed $q = 0.4$ and fixed $\alpha^\infty = 0.0207$. (Panels: (a) original; (b) noisy image; (c) $M = 0$; (d) $M = 10$, PSNR-optimal; (e) $M = 40$, SSIM-optimal; (f) $M = \infty$.)

Figure 9.2: Parrot photo: denoising results with noise level $\sigma = 30$ (Gaussian), for varying cut-off $M$, fixed $q = 0.5$ and fixed $\alpha^\infty = 0.0253$. (Panels: (a) original; (b) noisy image; (c) $M = 0$, PSNR-optimal; (d) $M = 15$, SSIM-optimal; (e) $M = 40$; (f) $M = \infty$.)

Figure 9.3: Summer photo: denoising results with noise level $\sigma = 60$ (Gaussian), for varying cut-off $M$, fixed $q = 0.3$ and fixed $\alpha^\infty = 0.00430$. (Panels: (a) original; (b) noisy image; (c) $M = 0$; (d) $M = 20$, PSNR-optimal; (e) $M = 40$, SSIM-optimal; (f) $M = \infty$.)

Table 9.1: Denoising results for all three test photos. The noise level $\sigma$, exponent $q$ and asymptotic regularisation $\alpha^\infty$ are fixed; the cut-off point $M$ varies. We report both the PSNR and the SSIM, with the optimal values underlined in the original.

Parrot photo / $\sigma = 30$, $q = 0.5$, $\alpha^\infty = 0.0253$
M      0        10       15       30       40       50       60       $\infty$
PSNR   30.3432  29.9156  29.5876  28.8098  28.3655  28.0398  27.7424  28.6829
SSIM   0.7914   0.7919   0.7922   0.7906   0.7887   0.7854   0.7823   0.7552

Pier photo / $\sigma = 30$, $q = 0.4$, $\alpha^\infty = 0.0207$
M      0        10       20       30       40       50       60       $\infty$
PSNR   29.0019  29.3959  29.3472  29.0918  28.8988  28.679   28.5327  28.5836
SSIM   0.6737   0.7191   0.7477   0.7556   0.7619   0.7608   0.7593   0.7297

Summer photo / $\sigma = 30$, $q = 0.4$, $\alpha^\infty = 0.00430$
M      0        10       20       30       40       50       60       $\infty$
PSNR   26.0919  26.2750  26.2851  26.0643  25.7443  25.3627  25.0449  25.2755
SSIM   0.589    0.6175   0.6489   0.6641   0.6644   0.6615   0.6543   0.6175
For $M = \infty$, i.e., the original $\mathrm{TV}^q$ model, we simply take $\alpha$ to be our chosen fixed $\alpha^\infty$. This is because $\varphi^\infty = 0$, so the real asymptotic $\alpha$ for this model is always zero. It is quite remarkable that in our results fine features of the images are always retained very well, although higher $M$ tends to increase the staircasing effect (not applicable to $M = \infty$). At the optimal choice of $M$ by PSNR or SSIM, more noise can be seen to be removed than by TV ($M = 0$). Generally, we can say that adding the cut-off $M$ improves the results compared to the earlier $\mathrm{TV}^q$ model without cut-off ($M = \infty$). Whether the results are better than conventional TV denoising is open to debate. By PSNR and SSIM the results tend to favour the $\mathrm{TV}^\varphi$ model. Visually, oscillatory effects of noise are better removed, but at the same time the staircasing is accentuated. The best result is in the eye of the beholder.

We also tested on the parrot photo the effect of the multiscale regularisation term $\eta$ by including the first term $\eta_1$ of the sum, for varying weights $\eta_0$ and convolution kernel widths $\epsilon_1$. The results are in Figure 9.4 and Table 9.2. Clearly, large $\eta_0$ has a deteriorating effect on both PSNR and SSIM, whereas the effect of the choice of $\epsilon_1$ is less severe. Visually, large $\eta_0$ creates an almost artistic quantisation and feature-filtering effect. The latter is also controlled by $\epsilon_1$: large $\epsilon_1$ tends to remove large features. A particular feature to notice is the eye of the parrot on the right in Figure 9.4(a) versus (b); it has disappeared altogether in the latter.

Table 9.2: Effect of the $\eta$ term on the parrot photo. Only the first term $\eta_1$ of the sum is included, with varying convolution width $\epsilon_1$ and weight $\eta_0$. The noise level $\sigma = 30$ (Gaussian), cut-off $M = 10$, exponent $q = 0.5$ and asymptotic regularisation $\alpha^\infty = 0.0253$ are fixed. Optimal SSIM and PSNR are underlined in the original.
          η₀ = 0.1α        η₀ = α           η₀ = 10α
          PSNR     SSIM    PSNR     SSIM    PSNR     SSIM
ε₁ = 0.5  29.9433  0.8036  29.2700  0.8072  26.8069  0.7939
ε₁ = 1    29.8023  0.8091  28.2237  0.8014  25.7874  0.7921
ε₁ = 2    29.6197  0.8085  27.5659  0.7970  24.9003  0.7886

10. Conclusion. We have studied difficulties with non-convex total variation models in the function space setting. We have demonstrated that the model (1.2) continues to do what it is proposed to do in the discrete setting – to promote piecewise constant solutions – for most, but not all, energies ϕ employed in the literature. Naïve forms of the model (1.1), proposed to model real gradient distributions in images, however, have much more severe difficulties. We have shown that the model can be remedied if we replace the topology of weak* convergence by that of area strict convergence. In order to do this, we have to add additional multiscale regularisation in terms of the functional η to the model, and to linearise the energy ϕ for large gradients. The latter is needed to make the model BV-coercive, and to have any kind of penalisation for jumps. We have demonstrated through numerical experiments and simple statistics that this model in fact better matches reality than the simple energies ϕ(t) = tq.

Figure 9.4: Effect of the η term on the parrot photo. Only the first term η₁ of the sum is included, with varying convolution width ε₁ and weight η₀. The noise level σ = 30 (Gaussian), cut-off M = 10, exponent q = 0.5 and asymptotic regularisation α∞ = 0.0253 are fixed. Panels: (a) η₀ = 7.071e−03, ε₁ = 1.0; (b) η₀ = 7.071e−03, ε₁ = 2.0; (c) η₀ = 7.071e−04, ε₁ = 1.0; (d) η₀ = 7.071e−04, ε₁ = 2.0; (e) η₀ = 7.071e−05, ε₁ = 1.0; (f) η₀ = 7.071e−05, ε₁ = 2.0.

Our purely theoretical starting point has therefore led to improved practical models. The η functional, however, remains a “theoretical artefact”.
It has its own regularisation effect which, naturally, does not distort the results too much for small parameters (though it does so for large parameters). As we showed in Proposition 6.7, it can be ignored in discretisations when not passing to the function space limit.

Acknowledgements. A large part of this work was done while T. Valkonen was at the Center for Mathematical Modeling, Escuela Politécnica Nacional, Quito, Ecuador. There he was supported by a Prometeo scholarship of the Senescyt. In Cambridge, T. Valkonen has been financially supported by the King Abdullah University of Science and Technology (KAUST) Award No. KUK-I1-007-43 and the EPSRC first grant No. EP/J009539/1 “Sparse & Higher-order Image Restoration”. M. Hintermüller and T. Wu have been supported by the Austrian FWF SFB F32 “Mathematical Optimization in Biomedical Sciences” and the START-Award Y305.

A data statement for the EPSRC. This is primarily a theoretical mathematics paper, and any data used mainly serves as a demonstration of mathematically proven results. Moreover, photographs that are for all intents and purposes statistically comparable to the ones used for the final experiments can easily be produced with a digital camera, or downloaded from the internet; in particular, the parrot test photo is available in the Kodak Lossless True Color Image Suite.¹ For the computations, we directly applied software developed in an earlier research program. This was funded by various non-UK agencies, whose rules govern the code.

REFERENCES

[1] W. K. Allard, Total variation regularization for image denoising, I. Geometric theory, SIAM J. Math. Anal., 39 (2008), pp. 1150–1190.
[2] L. Ambrosio, A compactness theorem for a new class of functions of bounded variation, Boll. Un. Mat. Ital. B, 7 (1989), pp. 857–881.
[3] L. Ambrosio, A. Coscia, and G. Dal Maso, Fine properties of functions with bounded deformation, Arch. Ration. Mech. Anal., 139 (1997), pp. 201–238.
[4] L. Ambrosio, N. Fusco, and D.
Pallara, Functions of Bounded Variation and Free Discontinuity Problems, Oxford University Press, 2000.
[5] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, Math. Program., 137 (2013), pp. 91–129.
[6] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, 2014.
[7] M. Benning and M. Burger, Ground states and singular vectors of convex variational regularization methods, Methods Appl. Anal., 20 (2013), pp. 295–334.
[8] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloemen Waanders, K. Willcox, and Y. Marzouk, Large-Scale Inverse Problems and Quantification of Uncertainty, vol. 712, John Wiley & Sons, 2011.
[9] G. Bouchitté and G. Buttazzo, New lower semicontinuity results for nonconvex functionals defined on measures, Nonlinear Anal., 15 (1990), pp. 679–692.
[10] K. Bredies, K. Kunisch, and T. Pock, Total generalized variation, SIAM J. Imaging Sci., 3 (2011), pp. 492–526.

¹ At the time of writing this, available at http://r0k.us/graphics/kodak/.

[11] T. Bui-Thanh, K. Willcox, and O. Ghattas, Model reduction for large-scale systems with high-dimensional parametric input space, SIAM J. Sci. Comput., 30 (2008), pp. 3270–3288.
[12] E. Casas, K. Kunisch, and C. Pola, Regularization by functions of bounded variation and applications to image enhancement, Appl. Math. Optim., 40 (1999), pp. 229–257.
[13] V. Caselles, A. Chambolle, and M. Novaga, The discontinuity set of solutions of the TV denoising problem and some extensions, Multiscale Model. Simul., 6 (2008), pp. 879–894.
[14] T. F. Chan and S.
Esedoglu, Aspects of total variation regularized L1 function approximation, SIAM J. Appl. Math., 65 (2005), pp. 1817–1837.
[15] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134 (2012), pp. 71–99.
[16] X. Chen and W. Zhou, Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 3 (2010), pp. 765–790.
[17] J. C. de los Reyes and C.-B. Schönlieb, Image denoising: Learning noise distribution via PDE-constrained optimization, Inverse Probl. Imaging, (2014), to appear.
[18] S. Delladio, Lower semicontinuity and continuity of functions of measures with respect to the strict convergence, Proc. Roy. Soc. Edinburgh Sect. A, 119 (1991), pp. 265–278.
[19] J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
[20] V. Duval, J.-F. Aujol, and Y. Gousseau, The TVL1 model: A geometric point of view, Multiscale Model. Simul., 8 (2009), pp. 154–189.
[21] L. C. Evans and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, 1992.
[22] I. Fonseca and G. Leoni, Modern Methods in the Calculus of Variations: Lp Spaces, Springer, 2007.
[23] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., (1984), pp. 721–741.
[24] E. Giusti, Minimal Surfaces and Functions of Bounded Variation, vol. 80 of Monographs in Mathematics, Birkhäuser, 1984.
[25] E. Haber, L. Horesh, and L. Tenorio, Numerical methods for experimental design of large-scale linear ill-posed inverse problems, Inverse Problems, 24 (2008), p. 055012.
[26] E. Haber and L. Tenorio, Learning regularization functionals – a supervised training approach, Inverse Problems, 19 (2003), p. 611.
[27] M. Hintermüller and T.
Wu, Nonconvex TVq-models in image restoration: Analysis and a trust-region regularization-based superlinearly convergent solver, SIAM J. Imaging Sci., 6 (2013), pp. 1385–1415.
[28] ——, A superlinearly convergent R-regularized Newton scheme for variational models with concave sparsity-promoting priors, Comput. Optim. Appl., 57 (2014), pp. 1–25.
[29] J. Huang and D. Mumford, Statistics of natural images and models, in IEEE CVPR, vol. 1, 1999.
[30] J. Kristensen and F. Rindler, Relaxation of signed integral functionals in BV, Calc. Var. Partial Differential Equations, 37 (2010), pp. 29–62.
[31] P.-L. Lions, The concentration-compactness principle in the calculus of variations. The limit case, Part I, Rev. Mat. Iberoam., 1 (1985), pp. 145–201.
[32] P. Mattila, Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability, Cambridge University Press, 1999.
[33] M. Nikolova, Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers, SIAM J. Numer. Anal., 40 (2002), pp. 965–994.
[34] M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 1 (2008), pp. 2–25.
[35] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for non-convex optimization, arXiv preprint arXiv:1404.4805, (2014).
[36] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, An iterated ℓ1 algorithm for non-smooth non-convex optimization in computer vision, in IEEE CVPR, 2013.
[37] E. Paolini and E. Stepanov, Optimal transportation networks as flat chains, Interfaces Free Bound., 8 (2006), pp. 393–436.
[38] J. Rabin and G. Peyré, Wasserstein regularization of imaging problem, in 2011 18th IEEE International Conference on Image Processing (ICIP), 2011, pp. 1541–1544.
[39] F. Rindler and G.
Shaw, Strictly continuous extensions of functionals with linear growth to the space BV, preprint, 2013.
[40] L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), pp. 259–268.
[41] W. Rudin, Real and Complex Analysis, McGraw-Hill, 1966.
[42] P. Swoboda and C. Schnörr, Convex variational image restoration with histogram priors, SIAM J. Imaging Sci., 6 (2013), pp. 1719–1735.
[43] T. Valkonen, Optimal transportation networks and stations, Interfaces Free Bound., 11 (2009), pp. 569–597.
[44] ——, Transport equation and image interpolation with SBD velocity fields, J. Math. Pures Appl., 95 (2011), pp. 459–494.
[45] ——, Strong polyhedral approximation of simple jump sets, Nonlinear Anal., 75 (2012), pp. 3641–3671.
[46] ——, The jump set under geometric regularisation. Part 2: Higher-order approaches, submitted, July 2014.
[47] ——, The jump set under geometric regularisation. Part 1: Basic technique and first-order denoising, SIAM J. Math. Anal., 47 (2015), pp. 2587–2629.
[48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process., 13 (2004), pp. 600–612.
[49] B. White, Rectifiability of flat chains, Ann. of Math., 150 (1999), pp. 165–184.

Appendix A. Vectorial η functional. We now study a condition ensuring the convergence of the total variation $|\mu^i|(\Omega)$ subject to the weak* convergence of the measures $\mu^i \in \mathcal{M}(\Omega; \mathbb{R}^N)$, $(i = 0, 1, 2, \ldots)$. Improving a result first presented in [44, 45], we show in Theorem A.4 below that if $\{f_\ell\}_{\ell=0}^\infty$ is a normalised nested sequence of functions as in Definition A.1 below, then it suffices to bound

$$\eta(\mu^i) := \sum_{\ell=0}^{\infty} \eta_\ell(\mu^i), \quad\text{where}\quad \eta_\ell(\mu^i) := \int |\mu^i|(\tau_x f_\ell) - |\mu^i(\tau_x f_\ell)| \,dx, \qquad (\mu^i \in \mathcal{M}(\Omega; \mathbb{R}^N)). \tag{A.1}$$

Here we employ the notation $\tau_x f(y) := f(y - x)$. Also, we write $|\mu^i(\tau_x f_\ell)| := \|\mu^i(\tau_x f_\ell)\|_2$.

Definition A.1. Let $f_\ell : \mathbb{R}^m \to \mathbb{R}$, $(\ell = 0, 1, 2, . .$
$.)$, be bounded Borel functions with compact support that are continuous in $\mathbb{R}^m \setminus S_{f_\ell}$, i.e., the approximate discontinuity set equals the discontinuity set. Also let $\{\nu_\ell\}_{\ell=0}^\infty$ be a sequence in $\mathcal{M}(\mathbb{R}^m)$ with $|\nu_\ell|(\mathbb{R}^m) = 1$. The sequence $\{(f_\ell, \nu_\ell)\}_{\ell=0}^\infty$ is then said to form a nested sequence of functions if $f_\ell(x) = \int f_{\ell+1}(x - y)\,d\nu_\ell(y)$ (a.e.). The sequence is said to be normalised if $f_\ell \geq 0$ and $\int f_\ell \,dx = 1$.

Example A.1. Let $\rho$ be the standard convolution mollifier

$$\rho(x) := \begin{cases} \exp(-1/(1 - \|x\|^2)), & \|x\| < 1,\\ 0, & \|x\| \geq 1, \end{cases}$$

and define $\rho_\varepsilon(x) := \varepsilon^{-m}\rho(x/\varepsilon)$. Since $\rho_{\varepsilon+\delta} = \rho_\varepsilon * \rho_\delta$, where $*$ denotes the convolution operation, we deduce that $f_\ell := \rho_{2^{-\ell}}$ and $\nu_\ell := \rho_{2^{-\ell-1}}$ form a normalised nested sequence.

We require the following basic lemma for our vectorial case.

Lemma A.2. Let $\nu \in \mathcal{M}(\Omega)$ be a positive Radon measure, and $g \in L^1(\Omega; \mathbb{R}^N)$. Then

$$\Big\|\int_\Omega g(x)\,d\nu(x)\Big\|_2 \leq \int_\Omega \|g(x)\|_2\,d\nu(x).$$

Proof. For any $x \in \Omega$, we write $g(x) = \theta(x)v(x)$ with $v(x) = (v_1(x), \ldots, v_N(x))$, $0 \leq \theta(x)$, and $\|v(x)\|_2 = 1$. Then we define $\lambda := \theta\nu$. Now

$$\Big\|\int_\Omega g(x)\,d\nu(x)\Big\|_2^2 = \sum_{j=1}^N \Big(\int_\Omega v_j(x)\,d\lambda(x)\Big)^2 = \sum_{j=1}^N \lambda(\Omega)^2 \Big(\frac{1}{\lambda(\Omega)}\int_\Omega v_j(x)\,d\lambda(x)\Big)^2 \leq \sum_{j=1}^N \lambda(\Omega) \int_\Omega |v_j(x)|^2\,d\lambda(x) = \lambda(\Omega)^2.$$

Here we have used Jensen's inequality. From this we conclude

$$\lambda(\Omega) = \int_\Omega \theta(x)\,d\nu(x) = \int_\Omega \|g(x)\|_2\,d\nu(x),$$

proving the claim.

With the help of the above lemma, in [45] Theorem A.4 below was proved exactly as in the case of scalar-valued measures ($N = 1$). Our proof here is, however, slightly different: we base it on the following more general lemma on partial sums, which we also need for the proof of Proposition 6.7.

Lemma A.3. Let $\Omega \subset \mathbb{R}^m$ be an open and bounded set, and $\{(f_\ell, \nu_\ell)\}_{\ell=0}^\infty$ a normalised nested sequence of functions. Let $\{K_i\}_{i=1}^\infty \subset \mathbb{N}$ satisfy $\lim_{i\to\infty} K_i = \infty$. Suppose $\{\mu^i\}_{i=0}^\infty \subset \mathcal{M}(\Omega; \mathbb{R}^N)$ weakly* converges to $\mu \in \mathcal{M}(\Omega; \mathbb{R}^N)$ with

$$\sup_i \Big(|\mu^i|(\Omega) + \sum_{\ell=1}^{K_i} \eta_\ell(\mu^i)\Big) < \infty, \tag{A.2}$$

and

$$\eta(\mu) < \infty. \tag{A.3}$$

Then

$$\eta_\ell(\mu) \leq \liminf_{i\to\infty} \eta_\ell(\mu^i), \qquad (\ell = 0, 1, 2, \ldots).$$
(A.4)

If also $|\mu^i| \rightharpoonup^* \lambda$ in $\mathcal{M}(\Omega)$, then $\lambda = |\mu|$. Moreover, provided the weak* convergences hold in $\mathcal{M}(\mathbb{R}^m; \mathbb{R}^N)$, resp. $\mathcal{M}(\mathbb{R}^m)$, then

$$\eta_\ell(\mu) = \lim_{i\to\infty} \eta_\ell(\mu^i), \qquad (\ell = 0, 1, 2, \ldots). \tag{A.5}$$

Proof. Let us suppose first that $\mu^i \rightharpoonup^* \mu$ and $|\mu^i| \rightharpoonup^* \lambda$ weakly* in $\mathcal{M}(\mathbb{R}^m; \mathbb{R}^N)$, resp. $\mathcal{M}(\mathbb{R}^m)$, rather than just within $\Omega$. We denote by $E_f$ the discontinuity set of $f$, while $S_f$ stands for the approximate discontinuity set. Fubini's theorem and the fact that $S_{f_\ell}$ is an $\mathcal{L}^m$-negligible Borel set imply that $\int \lambda(S_{\tau_x f_\ell})\,dx = 0$. This and the non-negativity of $\lambda$ show that $\lambda(S_{\tau_x f_\ell}) = 0$ for a.e. $x \in \mathbb{R}^m$. Since by assumption $E_f \subset S_f$, it follows that $\lambda(E_{\tau_x f_\ell}) = 0$, so that (see, e.g., [3, Proposition 1.62]) $\mu^i(\tau_x f_\ell) \to \mu(\tau_x f_\ell)$ for a.e. $x \in \mathbb{R}^m$. Likewise $|\mu^i|(\tau_x f_\ell) \to \lambda(\tau_x f_\ell)$ for a.e. $x \in \mathbb{R}^m$. Since $\sup_i |\mu^i|(\Omega) < \infty$, and $\Omega$ is bounded, an application of the dominated convergence theorem now yields

$$\lim_{i\to\infty} \int |\mu^i(\tau_x f_\ell)|\,dx = \int |\mu(\tau_x f_\ell)|\,dx. \tag{A.6}$$

By the lower semicontinuity of the total variation, recalling that

$$\int |\mu^i|(\tau_x f_\ell)\,dx = |\mu^i|(\mathbb{R}^m),$$

this shows (A.4) under the assumption that the weak* convergences are in $\mathcal{M}(\mathbb{R}^m)$.

Observe then that since $\{(f_\ell, \nu_\ell)\}_{\ell=0}^\infty$ is a nested sequence of functions, $\{\eta_\ell(\mu)\}_{\ell=0}^\infty$ forms a decreasing sequence (for any $\mu \in \mathcal{M}(\Omega)$). Indeed, as $f_\ell(x) = \int f_{\ell+1}(x - y)\,d\nu_\ell(y)$ and $\nu_\ell(\mathbb{R}^m) = 1$ with $\nu_\ell \geq 0$, using Lemma A.2 we have

$$\int \|\mu(\tau_x f_\ell)\|_2\,dx = \int \Big\|\int \mu(\tau_{x+y} f_{\ell+1})\,d\nu_\ell(y)\Big\|_2\,dx \leq \int\!\!\int \|\mu(\tau_{x+y} f_{\ell+1})\|_2\,d\nu_\ell(y)\,dx = \int \|\mu(\tau_x f_{\ell+1})\|_2\,dx.$$

Referring to (A.1), it follows that

$$\eta_\ell(\mu) \geq \eta_{\ell+1}(\mu). \tag{A.7}$$

To show $\lambda = |\mu|$, that is $|\mu^i| \rightharpoonup^* |\mu|$, we only have to show $|\mu^i|(\Omega) \to |\mu|(\Omega)$. To see the latter, we choose an arbitrary $\varepsilon > 0$, and write

$$|\mu|(\Omega) - |\mu^i|(\Omega) = \eta_\ell(\mu) - \eta_\ell(\mu^i) + \int |\mu(\tau_x f_\ell)| - |\mu^i(\tau_x f_\ell)|\,dx. \tag{A.8}$$

Since $\eta_\ell \geq 0$, using (A.7) in (A.2) and (A.3), we now observe that taking $\ell$ large enough and $i_\ell$ such that $K_{i_\ell} \geq \ell$, we have

$$\sup\{\eta_\ell(\mu), \eta_\ell(\mu^{i_\ell}), \eta_\ell(\mu^{i_\ell+1}), \eta_\ell(\mu^{i_\ell+2}), \ldots\} \leq \varepsilon.$$
Employing this in (A.8), we deduce for any large enough $\ell$ and all $i \geq i_\ell$ that

$$\big||\mu|(\Omega) - |\mu^i|(\Omega)\big| \leq 2\varepsilon + \Big|\int |\mu(\tau_x f_\ell)| - |\mu^i(\tau_x f_\ell)|\,dx\Big|.$$

The integral term tends to zero as $i \to \infty$ by (A.6). Therefore

$$\lim_{i\to\infty} \big||\mu^i|(\Omega) - |\mu|(\Omega)\big| \leq 2\varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, we conclude that $\lambda = |\mu|$. Moreover, (A.5) now follows from (A.6). This concludes the proof of the lemma under the assumption that the weak* convergences are in $\mathcal{M}(\mathbb{R}^m)$.

If this assumption does not hold, we may still switch to a subsequence for which $\mu^{i_k} \rightharpoonup^* \bar\mu$ weakly* in $\mathcal{M}(\mathbb{R}^m; \mathbb{R}^N)$ for some $\bar\mu \in \mathcal{M}(\mathbb{R}^m; \mathbb{R}^N)$. But, since $\Omega$ is open, necessarily $\bar\mu\llcorner\Omega = \mu$. Moreover, an application of the triangle inequality gives

$$\eta_\ell(\mu) = \eta_\ell(\bar\mu\llcorner\Omega) \leq \eta_\ell(\bar\mu) \leq \liminf_{k\to\infty} \eta_\ell(\mu^{i_k}).$$

As this bound holds for every subsequence, we deduce (A.4). Likewise, we have $|\mu^{i_k}| \rightharpoonup^* \bar\lambda$ weakly* in $\mathcal{M}(\mathbb{R}^m)$ for a common subsequence. Again $\bar\lambda\llcorner\Omega = \lambda$. Since by the previous paragraphs $|\bar\mu| = \bar\lambda$, this implies $\lambda = |\mu|$.

Theorem A.4. Let $\Omega \subset \mathbb{R}^m$ be an open and bounded set, and $\{(f_\ell, \nu_\ell)\}_{\ell=0}^\infty$ a normalised nested sequence of functions. Suppose the sequence $\{\mu^i\}_{i=0}^\infty$ in $\mathcal{M}(\Omega; \mathbb{R}^N)$ converges weakly* to a measure $\mu \in \mathcal{M}(\Omega; \mathbb{R}^N)$, and satisfies $\sup_i \big(|\mu^i|(\Omega) + \eta(\mu^i)\big) < \infty$. If also $|\mu^i| \rightharpoonup^* \lambda$, then $\lambda = |\mu|$. Moreover, each of the functionals $\eta$ and $\eta_\ell$, $(\ell = 0, 1, 2, \ldots)$, is lower semicontinuous with respect to the weak* convergence of $\{\mu^i\}_{i=0}^\infty$.

Proof. Only the lower semicontinuity of $\eta$ demands a proof; the rest is immediate from Lemma A.3 with $K_i = i$, for instance. The lower semicontinuity of $\eta$ follows simply from Fatou's lemma and (A.4).

Appendix B. Proof of Theorem 7.1. We prove here our result on the remedy by resorting to compact operators.

Lemma B.1. Let $\Omega, \Omega'$ be open domains, and suppose $K : \mathrm{BV}(\Omega) \to L^1(\Omega'; \mathbb{R}^m)$ is a compact linear operator. Let $\varphi : [0, \infty) \to [0, \infty)$ be lower semicontinuous. Then

$$F(u) := \int_{\Omega'} \varphi(\|Ku(x)\|)\,dx$$

is lower semicontinuous with respect to weak* convergence in $\mathrm{BV}(\Omega)$.

Proof.
If $\{u^i\}_{i=1}^\infty \subset \mathrm{BV}(\Omega)$ converges weakly* to $u \in \mathrm{BV}(\Omega)$, then it is bounded in $\mathrm{BV}(\Omega)$. Therefore, by the compactness of $K$, the sequence $\{Ku^i\}_{i=1}^\infty$ has a subsequence, unrelabelled, which converges strongly to some $v \in L^1(\Omega'; \mathbb{R}^m)$. By the continuity of $K$, which follows from compactness, necessarily $v = Ku$. Now [22, Theorem 5.9] shows that

$$F(u) \leq \liminf_{i\to\infty} F(u^i).$$

Lemma B.2. Let $\rho \in C_c^\infty(\mathbb{R}^n)$ and $\Omega \subset \mathbb{R}^n$ be bounded and open. Suppose $\mu^i \to \mu$ weakly* in $\mathcal{M}(\Omega; \mathbb{R}^m)$. Then $\rho * \mu^i \to \rho * \mu$ strongly in $L^\infty(\mathbb{R}^n)$.

Proof. Let $K$ be a compact set such that $\Omega + \operatorname{supp}\rho \subset K$. We have

$$\|\rho * \mu^i\|_{L^\infty(K;\mathbb{R}^m)} \leq \|\rho\|_{L^\infty(\mathbb{R}^n)}\,|\mu^i|(\Omega)$$

and

$$\|\nabla(\rho * \mu^i)\|_{L^\infty(K;\mathbb{R}^n\times\mathbb{R}^m)} = \|(\nabla\rho) * \mu^i\|_{L^\infty(K;\mathbb{R}^n\times\mathbb{R}^m)} \leq \|\nabla\rho\|_{L^\infty(\mathbb{R}^n;\mathbb{R}^n)}\,|\mu^i|(\Omega).$$

Thus $\{\rho * \mu^i\}_{i=1}^\infty$ is uniformly bounded and equicontinuous. It follows from the Arzelà–Ascoli theorem that $\rho * \mu^i$ converges uniformly (i.e., in $L^\infty(K; \mathbb{R}^m)$) to some $v \in C_c(K; \mathbb{R}^m)$. Let $\varphi \in L^1(K; \mathbb{R}^m)$. Then by the weak* convergence we have

$$\int_{\mathbb{R}^n} \langle \varphi(x), (\rho * \mu^i)(x)\rangle\,dx = \int_{\mathbb{R}^n} \langle (\varphi * \rho)(x), d\mu^i(x)\rangle \to \int_{\mathbb{R}^n} \langle (\varphi * \rho)(x), d\mu(x)\rangle = \int_{\mathbb{R}^n} \langle \varphi(x), (\rho * \mu)(x)\rangle\,dx,$$

as $i \to \infty$, so that $\rho * \mu^i \to \rho * \mu$ weakly in $L^\infty(K; \mathbb{R}^m)$. Then it holds that $\rho * \mu = v$, and the convergence is strong. Because $\operatorname{supp} w \subset K$ for $w = \rho * \mu$ or $w = \rho * \mu^i$, the claim follows.

Proof of Theorem 7.1. Suppose $\{u^i\}_{i=1}^\infty$ is a bounded sequence in $\mathrm{BV}(\Omega)$. We may extract a subsequence, unrelabelled, such that $\{u^i\}_{i=1}^\infty$ is weakly* convergent in $\mathrm{BV}(\Omega)$ to some $u \in \mathrm{BV}(\Omega)$. By Lemma B.2, then $D_\varepsilon u^i \to D_\varepsilon u$ strongly in $L^\infty(\Omega; \mathbb{R}^m)$. Weak* lower semicontinuity now follows from Lemma B.1.

Let then $u \in C^1(\Omega)$. The estimate

$$\limsup_{\varepsilon\searrow 0} \mathrm{TV}^{\varphi,\varepsilon}_c(u) \leq \mathrm{TV}^{\varphi}_c(u), \qquad (u \in C^1(\Omega)), \tag{B.1}$$

follows from the proof of Lemma 4.5. By subadditivity we also have

$$\mathrm{TV}^{\varphi,\varepsilon}_c(u) - \mathrm{TV}^{\varphi}_c(u) \leq \int_\Omega \varphi(\|(\rho_\varepsilon * \nabla u)(x) - \nabla u(x)\|)\,dx.$$

Writing $g_\varepsilon(x) := \|(\rho_\varepsilon * \nabla u)(x) - \nabla u(x)\|$, we have $g_\varepsilon \to 0$ in $L^1(\mathbb{R}^n)$. We may again proceed as in the proof of Lemma 4.5 to show

$$\limsup_{\varepsilon\searrow 0}\, \mathrm{TV}^{\varphi,\varepsilon}_c(u) - \mathrm{TV}^{\varphi}_c(u) \leq 0.$$
This together with (B.1) proves (7.1).

Appendix C. Proof of Theorem 7.2. We now prove our result on the remedy based on the SBV space.

Proposition C.1. Let $\bar\eta$ and $p \in (1, \infty)$ be as in (7.2). Suppose $g^i \rightharpoonup g$ weakly in $L^p(\Omega; \mathbb{R}^m)$, and that $\sup_i \bar\eta(g^i) < \infty$. Then $g^i \to g$ strongly in $L^p(\Omega; \mathbb{R}^m)$.

Proof. Let $K$ be a compact set such that $\Omega + \operatorname{supp}\rho_{\varepsilon_1} \subset K$, and set $M := \sup_i \bar\eta(g^i)$. We observe that $\bar\eta$ is lower semicontinuous with respect to weak convergence in $L^p$. Therefore $\bar\eta(g) \leq M$. As in the proof of Lemma A.3, we observe that $\bar\eta_\ell \geq \bar\eta_{\ell+1}$, from which it again follows that

$$\bar\eta_\ell(h) \to 0 \text{ as } \ell \to \infty, \text{ uniformly for } h \in \{g, g^1, g^2, g^3, \ldots\}. \tag{C.1}$$

We then observe that, as in Lemma B.2, we have

$$\rho_{\varepsilon_\ell} * g^i \to \rho_{\varepsilon_\ell} * g \text{ strongly in } L^\infty(K; \mathbb{R}^m) \tag{C.2}$$

for each $\ell \in \{1, 2, 3, \ldots\}$. Thus it holds that

$$\big|\|g^i\|_{L^p(\Omega;\mathbb{R}^m)} - \|g\|_{L^p(\Omega;\mathbb{R}^m)}\big| \leq \bar\eta_\ell(g^i) - \bar\eta_\ell(g) + \|\rho_{\varepsilon_\ell} * g^i - \rho_{\varepsilon_\ell} * g\|_{L^p(K;\mathbb{R}^m)}.$$

Given $\delta > 0$, thanks to (C.1), we may find $\ell$ such that

$$\big|\|g^i\|_{L^p(\Omega;\mathbb{R}^m)} - \|g\|_{L^p(\Omega;\mathbb{R}^m)}\big| \leq 2\delta + \|\rho_{\varepsilon_\ell} * g^i - \rho_{\varepsilon_\ell} * g\|_{L^p(K;\mathbb{R}^m)}.$$

With $\ell$ fixed, we thus get by (C.2) that

$$\limsup_{i\to\infty} \big|\|g^i\|_{L^p(\Omega;\mathbb{R}^m)} - \|g\|_{L^p(\Omega;\mathbb{R}^m)}\big| \leq 2\delta.$$

Since $\delta > 0$ was arbitrary, and using the weak lower semicontinuity of $\|\cdot\|_{L^p(\Omega;\mathbb{R}^m)}$, we deduce

$$\lim_{i\to\infty} \|g^i\|_{L^p(\Omega;\mathbb{R}^m)} = \|g\|_{L^p(\Omega;\mathbb{R}^m)}.$$

But for $p \in (1, \infty)$, strict convergence (weak convergence together with convergence of the norms) implies strong convergence [22]. This concludes the proof.

Proof of Theorem 7.2. If $\{u^i\}_{i=1}^\infty$ is weakly* convergent in $\mathrm{BV}(\Omega)$, we may – without loss of generality – assume that $\sup_i F(u^i) < \infty$, for otherwise lower semicontinuity is obvious. Then by the SBV Compactness Theorem 3.5, the convergences (3.1)–(3.4) hold for a subsequence. This also shows weak* convergence, if it did not hold originally. Moreover, by the same theorem, $u \mapsto \int_{J_u} \psi(\theta_u(x))\,d\mathcal{H}^{m-1}(x)$ is lower semicontinuous with respect to this convergence. By (7.3) we may further assume $\{\nabla u^i\}_{i=1}^\infty$ weakly convergent in $L^p(\Omega; \mathbb{R}^m)$.
Proposition C.1 now shows strong convergence of $\{\nabla u^i\}_{i=1}^\infty$ in $L^p(\Omega; \mathbb{R}^m)$. The functional $F$ is lower semicontinuous with respect to all of these convergences, which yields lower semicontinuity with respect to weak* convergence in $\mathrm{BV}(\Omega)$.
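To build intuition for the definition (A.1), the following small numerical sketch (our own illustration, not from the paper) evaluates $\eta_\ell$ for the signed measure $\mu = \delta_0 - \delta_1$ on the real line. For simplicity we use a Gaussian of varying width in place of the compactly supported mollifier of Example A.1; the grid and widths are our choices. Since $f \geq 0$, the integrand reduces to $2\min(f(-x), f(1-x))$: a kernel narrower than the separation of the two opposing point masses sees almost no cancellation and gives $\eta_\ell \approx 0$, while a wide kernel sees $\mu(\tau_x f) \approx 0$ against $|\mu|(\tau_x f) > 0$ and drives $\eta_\ell$ towards $|\mu|(\mathbb{R}) = 2$, so oscillation at scales finer than the kernel width is penalised.

```python
import numpy as np

def eta_ell(width, a=0.0, b=1.0):
    """eta_l from (A.1) for mu = delta_a - delta_b on R, with a Gaussian test
    function f of the given width standing in for the mollifier of Example A.1.
    Here mu(tau_x f) = f(a - x) - f(b - x) and |mu|(tau_x f) = f(a - x) + f(b - x),
    so the integrand is f(a - x) + f(b - x) - |f(a - x) - f(b - x)|."""
    x = np.linspace(-40.0, 40.0, 200001)
    dx = x[1] - x[0]
    f = lambda t: np.exp(-t**2 / (2.0 * width**2)) / (width * np.sqrt(2.0 * np.pi))
    integrand = f(a - x) + f(b - x) - np.abs(f(a - x) - f(b - x))
    return float(np.sum(integrand) * dx)  # simple Riemann sum

print(eta_ell(0.1))  # narrow kernel: nearly 0
print(eta_ell(5.0))  # wide kernel: close to the total variation 2
```

Consistently with the monotonicity (A.7), shrinking the kernel width (i.e., increasing $\ell$ in Example A.1) can only decrease the value.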
