Generalized sampling and infinite-dimensional compressed sensing
Ben Adcock
Department of Mathematics
Simon Fraser University
Burnaby, BC V5A 1S6
Canada
Anders C. Hansen
DAMTP, Centre for Mathematical Sciences
University of Cambridge
Wilberforce Rd, Cambridge CB3 0WA
United Kingdom
Abstract
We introduce and analyze an abstract framework, and corresponding method, for compressed sensing
in infinite dimensions. This extends the existing theory from signals in finite-dimensional vector spaces
to the case of separable Hilbert spaces. We explain why such a new theory is necessary, and demonstrate
that existing finite-dimensional techniques are ill-suited for solving a number of important problems.
This work stems from recent developments in generalized sampling theorems for classical (Nyquist
rate) sampling that allow for reconstructions in arbitrary bases. The main conclusion of this paper is that
one can extend these ideas to allow for both reconstructions in arbitrary bases and significant subsampling
of sparse or compressible signals.
1 Introduction
Compressed sensing (CS) has, with little doubt, been one of the great successes of applied mathematics in
the last decade [12, 15, 17, 19, 21, 22]. It allows one to sample at rates dramatically lower than conventional
wisdom suggests – namely, the Nyquist rate – provided the signal to be recovered is sparse in a particular
basis, and the sampling vectors are relatively incoherent.
However, compressed sensing is currently only a finite-dimensional theory. It concerns the recovery
of vectors in some finite-dimensional space (typically RN or CN ) whose nonzero entries with respect to a
particular basis are small in number in comparison to N . Herein lies a problem. Real-world signals are
typically analog, or continuous-time, and thus are modeled more faithfully in infinite-dimensional function
spaces [8]. Any finite-dimensional model may therefore not be well suited in practice (we give examples
of this in §2). Although this issue has been widely recognized, there have been almost no attempts made
thus far to extend the existing theory to a more realistic, infinite-dimensional model. With this in mind, the
purpose of this paper is to provide such a generalization.
The first step in this regard is a move away from the current matrix-vector model. In particular, we
consider the following more realistic scenario: a signal f is viewed as an element of a separable Hilbert
space H, and its measurements are modeled as a sequence of linear functionals ζj : H → C. This gives rise
to the countable collection
ζ1 (f ), ζ2 (f ), ζ3 (f ), . . . ,
(1.1)
of samples of f . Now suppose that the signal f is sparse in some orthonormal basis {ϕj }j∈N of H. The main
questions we answer in this paper are the following: can we recover f by subsampling from the collection (1.1), and how can this be realized by a numerical algorithm? In doing so, we obtain a full theory for so-called infinite-dimensional compressed sensing, valid for (almost) arbitrary pairs of sampling schemes {ζj }j∈N and reconstruction bases {ϕj }j∈N .
The theory we introduce in this paper stems from recent developments in classical sampling of signals. In
[1, 3, 4] a new sampling theory, known as generalized sampling, was introduced for reconstructions of signals
in arbitrary bases {ϕj }j∈N from their samples (1.1) (see §1.4 and §4 for further details). The framework we
propose in this paper is a continuation of this work, allowing one to take advantage of sparsity in order to subsample. To illustrate, one of the important results we prove in this paper is as follows:
Theorem 1.1. For f = ∑_{j=1}^{∞} αj ϕj write ∆ = {j : αj ≠ 0} and suppose that ∆ ⊂ {1, . . . , M } for some M ∈ N. Let ε > 0 be arbitrary. Then, there exists an integer N ∈ N (specific estimates will be given in §7) depending on M and |∆| only such that the following holds: if Ω ⊂ {1, . . . , N }, |Ω| = m, is chosen uniformly at random, then, with probability greater than 1 − ε, f can be recovered exactly from the samples {ζj (f ) : j ∈ Ω}, given that m is proportional to |∆| · log(ε^{−1} + 1) · log(N √(M |∆|)).
As we explain in §6 and §7, the values m, N are specified explicitly by a system of inequalities involving
|∆|, M and ε. The somewhat surprising result is that taking N = M will typically not be sufficient. As we
demonstrate in §2, there are straightforward examples where |∆| may be very small, but choosing N = M
will give disastrous results. However, choosing a particular value N > M (specific bounds will be given
later) will allow for dramatic subsampling (see §2.3).
1.1 An example
The model detailed above is particularly important in Magnetic Resonance Imaging (MRI), and this application shall serve as our primary example in this paper. In this setting f , the image, is sampled via inner
products with complex exponentials, and the infinite collection {ζj (f )}j∈N of samples corresponds to the
Fourier coefficients of f (typically H = L2 (R2 )). Suppose now that we know that f is sparse in, say, a
wavelet basis. The question is, how can we recover f exactly by subsampling from the given Fourier coefficients {ζj (f )}j∈N ? In particular, to what extent can we subsample (i.e. how does the number of samples m
we require depend on the sparsity, where m is as in Theorem 1.1), and what is the range N from which we
need to draw our samples (clearly, it makes no sense to draw samples uniformly with indices ranging over
N).
The MRI problem served as a strong motivation for much of the original development of CS [15]. Work
by Lustig et al has led to the successful application of finite-dimensional techniques in this area [33]. However, as we shall explain in §2, there are a number of significant issues in tackling the infinite-dimensional
MRI problem with the current techniques. Indeed, the current approach may well fail quite dramatically,
even in extremely simple examples (see §2.1), and any attempt to remedy it whilst remaining in the finite-dimensional setting may also not succeed (see §2.2). Fortunately, the approach we propose in this paper,
which is based on the aforementioned infinite-dimensional model, successfully overcomes these shortcomings (§2.3).
1.2 The need for a new general theory
It is natural to ask whether such a new theory is necessary or not. Given the problem described above, one
must at some stage discretize. It therefore seems plausible that finite-dimensional CS could be readily applied
once one had restricted the problem from the underlying Hilbert space H to a suitable finite-dimensional
space (e.g. CN ). In particular, if f is sparse in an, albeit countably-infinite, basis {ϕj }j∈N (i.e. it only has
finitely many nonzero coefficients in this basis), it seems plausible that the corresponding sparse recovery
problem is inherently finite-dimensional. In some limited cases this is indeed the case: one may treat the
problem solely in finite dimensions with existing CS tools (see Remark 2.3). However, as we discuss in §2,
in general this problem cannot be tackled in such a way.
Indeed, ‘discretizing’ the problem – that is, reducing the infinite amounts of information contained in the
samples {ζj (f )}j∈N – so as to make it amenable to computations, is fraught with difficulties. In particular,
the most obvious, and ultimately most naïve, discretizations may well destroy the original structure of the
problem. This means that exact recovery may well not be possible with finite-dimensional techniques, since
the key structure that allows for subsampling has been destroyed by the discretization. This issue is discussed
further in §2.1 and §2.2.
Aside from how the discretization is carried out, there is another problem with the finite-dimensional
viewpoint. To illustrate this issue, consider now a more realistic scenario, where the signal to be recovered
f = g + h is not sparse but compressible. In other words, g is sparse in {ϕj }j∈N , whereas h is not sparse
but is small in norm. This is now a fully infinite-dimensional problem: both the set of samples {ζj (f )}j∈N
and the support of f in {ϕj }j∈N have infinite cardinality. Of course, it is now no longer possible to recover
f exactly. However, it is natural to ask whether we can recover f up to some error determined only by h. In
particular, it is critically important that this error behave well as both the total number of samples used and
the range from which such samples are drawn are varied. That is, rather than exact recovery, one must also
consider the important, and fundamentally infinite-dimensional, notion of approximation. Arguably, there is
little hope of being able to do this using solely existing finite-dimensional CS tools.
To impress this point further on the reader, consider for the moment an analogy. A huge component
of the field of numerical analysis is devoted to both the design and study of numerical discretizations of
partial differential equations (PDEs). Typically the discretization takes the form of the solution of a linear
system of equations [11, 36]. However, to study the key approximation properties of the discretization
(i.e. how rapidly the method converges as the discretization parameter – step/mesh size etc – is varied), one
must also have knowledge of the underlying PDE. In particular, in the analysis of finite element methods,
for example, one establishes existence and uniqueness of finite-dimensional discretizations, as well as error
estimates, by first proving regularity results for the given infinite-dimensional PDE (coercivity conditions,
for example) [11, 18]. Moreover, reversing the argument, regularity properties of the PDE (in other words,
the infinite-dimensional structure of the problem) also inform the design of new numerical methods.
In summary, the message is as follows: the discretization cannot be studied in isolation from the underlying PDE. As it transpires, much the same is true for the sparse recovery problem: formulating the problem in
infinite dimensions is necessary both in designing a discretization, and in analyzing the resulting numerical
method.
1.3 Discretizing infinite-dimensional problems
How such a discretization is performed is a crucial issue in this paper. In general, discretising infinite-dimensional problems is a difficult and subtle task which cannot be carried out successfully without a
thorough understanding of the particular problem at hand. Unless done carefully, it is quite possible to end up
with a discretization whose properties contrast starkly with those of the original problem, and consequently
a numerical method that is neither stable nor convergent. With this in mind, our approach is based on the
following fundamental philosophy:
Keep the infinite-dimensional structure and crucial properties of the original problem when discretising.
(Ph)
By correctly following this principle we obtain a method for infinite-dimensional CS that allows significant subsampling of sparse and compressible signals, and thereby extends existing techniques to infinite-dimensional problems.
Notice that the approach we propose in this paper is a significant departure from the received
wisdom. Indeed, rather than discretising first, our initial step is to devise an appropriate infinite-dimensional
formulation of the sparse recovery problem. Truncation is then carried out in the second step, leading to a
finite-dimensional problem which can be solved numerically.
It is also worth mentioning that (Ph) is by no means unique to this particular problem. Whilst it is often
implicitly followed in numerical PDEs, most relevantly for this paper it was recently employed in [29] to
solve the long-standing computational spectral problem. A number of ideas in this article stem from [29],
and the contributions of this paper may be viewed as a continuation of this work.
1.4 Generalized sampling (GS)
The framework we propose in this paper has its direct origins in recent developments in classical, i.e. Nyquist
rate, sampling; specifically, the question of how to recover signals (not necessarily sparse or compressible)
in arbitrary bases from their samples (1.1). Although this question has long been studied in sampling theory,
there had been little effort given over to the issue of discretizations, and the key notion of approximation [41]
(see also [1, 3]).
By employing (Ph), a new approach, known as generalized sampling (GS), was developed in [1, 3, 4]. GS
is a new type of sampling theory which incorporates the critical issues of approximation and stability, leading
to the so-called stable sampling rate [3]. The resulting numerical method allows for guaranteed recovery of
any signal in an arbitrary basis from a collection of its samples in a manner which is both numerically stable
and, in a certain sense, optimal. In this paper we build on this work in the following way: we show that, in the
case that the signal to be reconstructed is sparse or compressible, reconstruction can also be performed with
significant subsampling. We refer to the corresponding method as generalized sampling with compressed
sensing (GS–CS).
One important instance of both GS and GS–CS is the recovery of a function from its Fourier samples
(the MRI problem, in particular). Although the classical Shannon Sampling Theorem [31, 41] allows for
reconstruction in terms of an infinite series of sinc functions or complex exponentials, the slow convergence
of these series, as well as the appearance of the Gibbs phenomenon [32], means that such a reconstruction
is often not practical. Consequently, Shannon’s theorem may not be used in practice [41], even for Nyquist-rate sampling. Nonetheless, in many circumstances it is well known that the given signal can be well-represented (i.e. it is sparse or has rapidly decaying coefficients) in a new basis of functions; be they splines,
wavelets, curvelets, etc [26]. GS allows one to reconstruct in such a basis in a manner that is both accurate and
numerically stable. The method we develop in this paper, GS–CS, permits one to dramatically undersample
whenever the signal is sparse or compressible.
1.5 Summary and outline
As has now been made clear to the reader, the viewpoint of this work is very much that of an infinite-dimensional (or analog) world. In other words, we consider images and signals as infinite-resolution objects,
and not as objects of fixed resolution defined uniquely by a finite number of their samples. Thus, at best, an
arbitrary image can only be approximated by a finite collection of its samples (e.g. by its Fourier series in
the case of MRI). However, a key conclusion of this paper is that one may obtain far better approximations
than, for example, the Fourier series, by reconstructing in another basis, especially if the image is sparse or
compressible in such a basis.
This stance is by no means unique to this paper, and it has long been recognized that such a viewpoint is
better suited to many problems. For example, the idea of finite rates of innovation, developed by Vetterli et
al [8, 23, 43], is very similar to this principle, although the resulting technique is completely different to that
which we propose.
The outline of the remainder of this paper is as follows. In §2 we explain why the current CS techniques
are insufficient for infinite-dimensional problems, and why a new theory is therefore necessary. In §3 we
introduce the specific types of models we consider in this paper. §4 recaps GS, and in §5 we introduce
our method for infinite-dimensional compressed sensing: namely, GS–CS. The main results of this paper
– recovery of sparse and compressible signals with GS–CS – are presented in §7, and in §8 we provide
numerical examples. In §9–§11 we give proofs of the main results.
2 Why do we need a new approach?
Consider the following very simple model problem (which will form the primary example throughout this
paper):
Problem 2.1. Suppose that f ∈ L2 (R) has support contained in [−1, 1], and let {ϕj }j∈N be the orthonormal
basis of Haar wavelets on L2 (−1, 1). Define
ζj (f ) = Ff (εj),    j ∈ Z,    (2.1)
to be the Fourier coefficients of f (this is precisely the type of sampling one encounters in MRI). Here Ff
denotes the Fourier transform of f , and ε ≤ 1/2 is arbitrary. Throughout, we shall take ε = 1/2. We will also
assume that f is sparse, or compressible, in the basis {ϕj }j∈N of Haar wavelets. Thus, the problem is to
recover f from the measurements (2.1).
We shall return to this problem throughout this paper. Recall that the classical Shannon Sampling Theorem (see §4) gives that f can be recovered exactly (in the L2 sense) from the infinite collection {ζj (f )}j∈Z .
However, since f is known to be sparse in the Haar wavelet basis {ϕj }j∈N , the question is as follows: is
there a way this additional information could be used to allow f to be recovered from only a finite number of
its samples? Moreover, if this is true, how many such samples are required, and is it possible to subsample?
These are questions we now wish to answer.
2.1 First example - the standard approach fails
Let us consider the simplest possible example: namely, let f = χ[0,1/2) − χ[1/2,1) be the classical Haar
wavelet. Note that f is extremely sparse in the Haar basis. To recover f exactly from (2.1), at some stage
one needs to discretize, so as to reduce the infinite amounts of information to something finite. The standard
Figure 1: The figure shows the rather disappointing fN,m (t) (left) and f (t) − fN,m (t) (right) against t for 2N = 256 and m = 130, where fN,m (t) = ∑_{j=1}^{2N} ξj ϕj (t) and ξ = {ξj }_{j=1}^{2N} is a minimizer of (2.3).
technique, especially in sparse MRI [33], involves two steps. First, one replaces the infinite collection of
samples with the finite vector
y = ζ(f ) = {ζj (f ) : j = −N + 1, . . . , N },
N ∈ N.
(2.2)
Second, one uses a combination of the discrete Fourier and discrete wavelet transforms (DFT and DWT
respectively) to formulate the corresponding measurement matrix. To this end, let Udf , Vdw ∈ C2N ×2N be
the matrices of these transforms. The classical discrete approximation to the problem of inverting the Fourier
transform is then given by
y = Udf x,
where x is a vector approximating pointwise values of f on an equispaced grid in [−1, 1]. Since f is sparse
in the Haar basis it is very tempting to think that
x0 = Vdw x
is also sparse, if Vdw is the discrete Haar transform. One should therefore be able to recover f perfectly with
finite-dimensional compressed sensing using only relatively few of its samples y = ζ(f ). More precisely,
if Ω ⊂ {1, . . . , 2N }, |Ω| = m < 2N is chosen uniformly at random, then the usual approach would be to
solve the convex optimization problem
min_{η∈C^{2N}} ‖η‖_{l1} subject to PΩ Udf Vdw^{−1} η = PΩ y.    (2.3)
Here PΩ : C^{2N} → C^{2N} is the orthogonal projection onto span{ej : j ∈ Ω} and {ej }_{j=1}^{2N} is the canonical basis for C^{2N}. If ξ is a minimizer of this problem, then one could hope that ξ agrees with the vector x0 with high probability, and hence we could recover x = Vdw^{−1} x0 .
To test (2.3), let us consider the case where 2N = 256 and m = 130 – i.e. we sample nearly 50% of the samples in the range −N + 1, . . . , N . Write fN,m = ∑_{j=1}^{2N} ξj ϕj , where ξ is a minimizer of (2.3). Note that fN,m takes the values of the vector Vdw^{−1} ξ at the grid points.
As Figure 1 demonstrates, the result is extremely disappointing. The function f has not been recovered anywhere near exactly, and the reconstruction fN,m computed from (2.3) commits rather large errors (especially near the jumps in f , i.e. x = −1, 0, 1/2, 1). To be more precise, even though f only has one nonzero coefficient in the Haar wavelet basis, and despite the fact that we use m = 130 Fourier samples of f , we do not get anywhere near to perfect recovery. The question is now: what went wrong, and why can we seemingly not recover f perfectly with this approach? The answer is given in the next section.
2.1.1 The DFT destroys sparsity
The source of the catastrophic failure of (2.3) is the discretization employed: namely, the DFT. The problem is that the DFT is not exact – it commits an error. To illustrate this, consider the matrix Udf^{−1}. This maps the
vector of Fourier coefficients ζ(f ) of a function f to a vector consisting of pointwise values on an equispaced
2N -grid in [−1, 1]. However, this mapping is not exact: for an arbitrary function f , the result is only an
approximation to the grid values of f . The question is, how large is the approximation error, and how does
it affect (2.3) and its solutions?
To quantify this effect, consider the vector x ∈ C2N defined by
Udf x = ζ(f ).
This vector represents grid values of an approximation to f . In fact, it is simple to see that x consists precisely
of the values of the function
fN (t) = ε ∑_{j=−N+1}^{N} Ff (εj) e^{2πiεjt} ,    ε = 1/2,    (2.4)
on this grid. This function is nothing more than the truncated version of the reconstruction given by the
Shannon Sampling Theorem (see §4). In other words, the approximation involved in the transform Udf is
equivalent to replacing a function f by its partial Fourier series fN .
Let us consider the discrete wavelet transform x0 ∈ C2N of x:
x0 = Vdw x.
The right-hand side of the equality constraint in (2.3) now reads
PΩ Udf Vdw^{−1} x0 .
Thus, for the method (2.3) to be successful we require x0 = Vdw x to be a sparse vector. However, this can
never happen. Sparsity of x0 is equivalent to stipulating that the partial Fourier series fN be sparse in the
Haar wavelet basis. The function fN consists of smooth complex exponentials, and thus cannot have a sparse
representation in a basis of piecewise smooth functions. Therefore, although it has a unitary and incoherent
measurement matrix, (2.3) is not a sparse recovery problem. Consequently there is little or no hope of
recovering the sparse vector α of Haar wavelet coefficients of f from (2.3). This explains the complete
failure witnessed in Figure 1.
From this argument, we now conclude the following: by forming the approximation (2.4), we have
destroyed the structure (i.e. the sparsity) of the original problem. In particular, we have violated the guiding
principle (Ph).
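This loss of sparsity is easy to verify numerically. The short sketch below is not the authors’ code: the grid size and the tolerance are illustrative choices. It computes the grid values of fN from (2.4) for the Haar wavelet f of §2.1 and applies an orthonormal discrete Haar transform; the result has many non-negligible entries, even though f itself has a single nonzero Haar coefficient.

# A sketch (not the paper's code): Haar transform of the grid values of f_N from (2.4).
import numpy as np

N2, eps = 256, 0.5
t = -1 + 2 * (np.arange(N2) + 0.5) / N2                 # equispaced grid on [-1, 1]

def Ff(w):
    """Exact Fourier transform of f = chi_[0,1/2) - chi_[1/2,1)."""
    w = np.where(w == 0, 1e-30, w)                      # F f(0) = 0 for this f
    seg = lambda a, b: (np.exp(-2j*np.pi*w*a) - np.exp(-2j*np.pi*w*b)) / (2j*np.pi*w)
    return seg(0.0, 0.5) - seg(0.5, 1.0)

j = np.arange(-N2 // 2 + 1, N2 // 2 + 1)
# Grid values of the partial Fourier series f_N in (2.4)
x = eps * np.sum(Ff(eps * j)[:, None] * np.exp(2j*np.pi*eps*np.outer(j, t)), axis=0)

def haar_matrix(n):
    """Orthonormal discrete Haar transform of size n (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    return np.vstack([np.kron(H, [1.0, 1.0]),
                      np.kron(np.eye(n // 2), [1.0, -1.0])]) / np.sqrt(2.0)

xprime = haar_matrix(N2) @ x
print(np.sum(np.abs(xprime) > 1e-3 * np.abs(xprime).max()))   # many entries: x' is far from sparse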
Remark 2.1 Note that this loss of sparsity is not exclusive to the Haar wavelet basis. In fact, if f is sparse
in any basis of compactly supported wavelets then, by insisting on using the Shannon Sampling Theorem,
we also witness the same problem: namely, fN can never be sparse in the same basis.
2.1.2 The DFT leads to the Gibbs phenomenon
Whilst the loss of sparsity is significant, there is another important issue with the use of the DFT. Given
that η is not sparse in Haar wavelets, suppose now, as an exercise, we forgo any subsampling. That is, we
let m = 2N . The problem (2.3) now has a unique solution η. However, by the arguments given above,
the entries of η are not the Haar wavelet coefficients of f , but rather coefficients of the approximation fN .
Thus, by solving (2.3) (both with and without subsampling) we are not actually computing Haar wavelet
coefficients of f , but those of the partial Fourier series fN instead. Thus, we cannot hope to obtain a better
(i.e. more accurate) reconstruction of f than fN .
The question is, how good an approximation is fN ? Given that f is piecewise smooth, it turns out to
be very poor. In fact, as N → ∞, fN does not converge uniformly to f , and only converges very slowly
in the weaker L2 norm. One also witnesses the unpleasant Gibbs phenomenon, with its associated O (1)
oscillations, near each jump in f . These effects are visualized in Figure 2. The fact that (2.3) leads to a Haar
wavelet approximation to fN , as opposed to f , can be observed by comparing the left panels in Figures 2
and 1 respectively.
Of course, one may attempt various remedies to this problem, such as increasing N or using the total
variation norm instead. However, the key point is, regardless of how clever we are, if we insist on performing
reconstruction via Udf , then we cannot hope to obtain anything better than the (extremely poor) approximation
fN , the partial Fourier series.
Figure 2: The figure shows fN (t) (left) and f (t) − fN (t) (right) against t for 2N = 256.
Remark 2.2 This particular issue, poor convergence of fN , is also not unique to Haar wavelets. Indeed,
any piecewise smooth (in particular, nonperiodic) function will have the same problem. Only functions f
which are smooth and periodic on [−1, 1] will be well approximated by their Fourier series fN . However, it
is typically very rare for real-world signals and images to have such properties [41].
Remark 2.3 The well-informed reader may object and ask why, if we have a signal f that is sparse in the Haar basis, we do not measure f with noiselets [20] (e.g. ζj (f ) = ⟨f, ψj ⟩, where the ψj ’s are noiselets).
This approach would give a finite-dimensional recovery problem suffering from none of the aforementioned
issues. However, the luxury to choose the measurement system is very rare in applications. In particular, in
the example that we just considered (which serves as a model for MRI) we are given {Ff (j)}j∈Z , and this
cannot be readily altered. Thus, to properly solve these types of problems, we must have a model for which
the sampling scheme {ζj }j∈N is fixed.
2.2 Second example - a common misunderstanding
The failure of (2.3) can be interpreted as a violation of the fundamental principle (Ph). In particular, the
crucial property that f is sparse in the Haar wavelet basis is destroyed when the DFT is applied. With this
in mind, it may seem to the reader that, since f is a finite sum of Haar wavelets, there is a simple remedy to
this problem: namely, replace the DFT and DWT by the measurement matrix

U_N = \begin{pmatrix} \zeta_1(\varphi_1) & \cdots & \zeta_1(\varphi_{2N}) \\ \vdots & \ddots & \vdots \\ \zeta_{2N}(\varphi_1) & \cdots & \zeta_{2N}(\varphi_{2N}) \end{pmatrix},    (2.5)
randomly sample Ω ⊂ {1, . . . , 2N } with |Ω| = m, obtain a minimizer ξ to the problem
min_{η∈C^{2N}} ‖η‖_{l1} subject to PΩ UN η = PΩ ζ(f ),    (2.6)
and form the reconstruction fN,m = ∑_{j=1}^{2N} ξj ϕj . Clearly this approach, unlike (2.3), preserves the sparsity of the original problem. In this case we have, for convenience, reindexed the Fourier samples {ζj }j∈N over N rather than Z in the natural way.
Let us consider an example of (2.6). Suppose that f (t) = ∑_{j=1}^{2N} αj ϕj (t), where 2N = 768. We have chosen a function f such that |supp(f )| = |{αj : αj ≠ 0}| = 5. In particular, the function f is very sparse in the Haar wavelet basis. In Figure 3 we display the reconstruction given by (2.6) using m = 760. As is evident, f is recovered extremely poorly by (2.6): although we have used nearly 98% of its Fourier samples in the range 1, . . . , 2N , the reconstruction error ‖f − fN,m ‖ is very large (roughly 2.43, where ‖f ‖ = 3.21).
We repeated the experiment fifty times with the same outcome.
This example is disastrous. Despite altering the standard CS approach (2.3) to ensure that sparsity is
preserved, we still obtain an extremely poor reconstruction. Indeed, since f is sparse in the Haar basis and
Figure 3: The figure shows the disastrous error f (t) − fN,m (t) against t (left) as well as the much more pleasant error f (t) − g_{Ñ,m̃} (t) (right). Note that fN,m requires m = 760 samples whereas g_{Ñ,m̃} requires only m̃ = 50 samples.
we sample using nearly all its Fourier samples in the range 1, . . . , 2N , one may have reasonably hoped to
recover f perfectly in this example. However, as seen in Figure 3, this is certainly not the case.
From a conventional CS viewpoint, this failure appears quite surprising. We have formed a measurement matrix in a standard way by taking inner products of one orthonormal basis (the complex exponentials {e^{2πiεk·}}_{k=−N+1}^{N}) with another (Haar wavelets). Surely the standard finite-dimensional results should apply [12]? However, they do not, as evidenced by Figure 3. The reason is that UN is not unitary (or even close to unitary). The unexpected nature of this failure is most likely due to the following common misconception: that a change-of-basis matrix of the form (2.5) is unitary. In fact, such a matrix is only unitary when the two bases span the same space, which is clearly not the case in (2.5), where the sampling and reconstruction bases consist of the first 2N complex exponentials and Haar wavelets respectively. The key here is the underlying infinite dimensionality. One needs infinitely many complex exponentials to span a space large enough to contain finitely many Haar wavelets (or vice versa).
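To make the lack of unitarity concrete, the following short numerical check is a sketch (not the authors’ code): the quadrature grid, the sizes and the haar_basis helper are our own illustrative choices. It builds the matrix (2.5) from Fourier samples of Haar wavelets on [−1, 1] and measures how far UN∗ UN is from the identity.

# A sketch (not the paper's code): how far is the change-of-basis matrix (2.5) from unitary?
import numpy as np

def haar_basis(n_funcs, t):
    """Values at the points t of the first n_funcs orthonormal Haar wavelets on [-1, 1]."""
    s = (t + 1.0) / 2.0                                  # map [-1, 1] to [0, 1]
    funcs = [np.ones_like(s) / np.sqrt(2.0)]             # scaling function
    j = 0
    while len(funcs) < n_funcs:
        for k in range(2 ** j):
            if len(funcs) == n_funcs:
                break
            u = 2.0 ** j * s - k
            mother = np.where((u >= 0) & (u < 0.5), 1.0, 0.0) \
                   - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0)
            funcs.append(2.0 ** (j / 2) * mother / np.sqrt(2.0))
        j += 1
    return np.array(funcs)

N2, eps = 64, 0.5
t = np.linspace(-1, 1, 2 ** 14, endpoint=False); dt = t[1] - t[0]
Phi = haar_basis(N2, t)                                  # first 2N Haar wavelets
freqs = eps * np.arange(-N2 // 2 + 1, N2 // 2 + 1)
E = np.exp(-2j * np.pi * np.outer(freqs, t))             # sampling exponentials
# The sqrt(eps) normalization (as used later in Theorem 4.2) makes the infinite matrix U an isometry
UN = np.sqrt(eps) * (E * dt) @ Phi.T                     # entries approximate zeta_i(phi_j), cf. (2.5)

print(np.linalg.norm(UN.conj().T @ UN - np.eye(N2), 2))  # noticeably nonzero: U_N is not unitary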
From this discussion we draw the following conclusion. Simply thinking (since f has only finitely many non-zero Haar wavelet coefficients) that the problem can be embedded in C^N and solved using finite-dimensional CS is incorrect. In fact the failure of UN can also be interpreted as a violation of (Ph). As the
above argument indicates, UN is not unitary, whereas the ‘infinite measurement matrix’

U = \begin{pmatrix} \zeta_1(\varphi_1) & \zeta_1(\varphi_2) & \cdots \\ \zeta_2(\varphi_1) & \zeta_2(\varphi_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix},    (2.7)
formed by combining the full countably-infinite bases does possess this property (both Haar wavelets and
complex exponentials form orthonormal bases for the infinite-dimensional Hilbert space L2 (−1, 1)). It is
precisely the loss of this structure when ‘discretising’ U via UN that is the source of the failure observed
above.
2.3 Third example - a new approach
With these examples in mind, the purpose of the remainder of this paper is to describe a new approach for
infinite-dimensional CS, known as generalized sampling with compressed sensing (GS–CS), which overcomes the aforementioned failings. This brings us to the purpose of this section, and really the essence
of this paper: namely, why infinite dimensions? Put simply, this is because the search for the coefficients
α = {α1 , α2 , . . .} of f results in an infinite system of equations. By formulating reconstruction directly in
an infinite-dimensional way, and then discretising (as opposed to discretising first), we are able to completely
avoid the pitfalls described above.
We will now present a straightforward example of this approach. Let f (t) = ∑_{j=1}^{2N} αj ϕj (t) be as in §2.2. Note that

\begin{pmatrix} \zeta_1(f) \\ \zeta_2(f) \\ \vdots \end{pmatrix} = \begin{pmatrix} u_{11} & u_{12} & \cdots \\ u_{21} & u_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \end{pmatrix},    u_{ij} = \zeta_i(\varphi_j).
It is by attacking this infinite set of equations directly that we find a successful approach. The key is to
realize that, although f has only five non-zero wavelet coefficients, it has infinitely many Fourier coefficients {ζj (f )}j∈N . Thus, recovering f from these sampled Fourier coefficients is fundamentally not a
finite-dimensional problem.
Consider now the following. Let U = {uij }i,j∈N . We will use an uneven section of this infinite matrix (this is a trick that stems from [29]). Specifically, for K ∈ N let PK denote the projection onto span{e1 , . . . , eK }, where {ek }k∈N is the canonical basis for l2 (N). Now, for Ñ ∈ N, we choose Ω ⊂ {1, . . . , Ñ } uniformly at random, with |Ω| = m̃, and numerically compute a minimizer ξ of

inf_{η∈l1 (N)} ‖η‖_{l1} subject to PΩ P_Ñ U PM η = PΩ y,    y = {ζ1 (f ), ζ2 (f ), . . .},    (2.8)

where M ∈ N. Let Ñ = 1351, m̃ = 50 and M = 768. One can now easily reconstruct the function f of §2.2. In particular, by letting g_{Ñ,m̃} = ∑_{j=1}^{M} ξj ϕj , where ξ is a minimizer of (2.8), we obtain ‖f − g_{Ñ,m̃} ‖_{L2} = 1.15 × 10^{−11} (this is the average of 50 trials). This error is displayed in Figure 3. Note the dramatic improvement over the approach of §2.2. Specifically, the error has been reduced from O(1) to O(10^{−10}). Moreover, the GS–CS reconstruction uses less than 7% of the number of sampled Fourier coefficients that were used to form the extremely poor reconstruction in (2.6).
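For readers who wish to experiment, the following sketch mirrors (2.8) at smaller, purely illustrative sizes. It is not the authors’ code: it assumes the haar_basis helper from the sketch in §2.2 is in scope, uses cvxpy as a generic l1 solver, and the support, coefficients and dimensions are arbitrary choices.

# A sketch mirroring (2.8) at small illustrative sizes; assumes haar_basis from the
# sketch in Section 2.2 and uses cvxpy as a generic l1 solver (not the authors' code).
import numpy as np
import cvxpy as cp

eps = 0.5
M, Ntilde, m = 128, 256, 60                      # reconstruction range, sampling range, samples
t = np.linspace(-1, 1, 2 ** 14, endpoint=False); dt = t[1] - t[0]
Phi = haar_basis(M, t)                           # defined in the Section 2.2 sketch
freqs = eps * np.arange(-Ntilde // 2 + 1, Ntilde // 2 + 1)
U = np.sqrt(eps) * (np.exp(-2j*np.pi*np.outer(freqs, t)) * dt) @ Phi.T   # P_Ntilde U P_M

# An (arbitrary) 5-sparse coefficient vector and its exact samples zeta(f) = U alpha
alpha = np.zeros(M); alpha[[2, 11, 30, 57, 90]] = [1.0, -0.7, 0.5, 0.4, -0.3]
y = U @ alpha

Omega = np.sort(np.random.choice(Ntilde, m, replace=False))   # uniform subsampling of {1,...,Ntilde}
eta = cp.Variable(M, complex=True)
cp.Problem(cp.Minimize(cp.norm1(eta)), [U[Omega, :] @ eta == y[Omega]]).solve()
print(np.linalg.norm(eta.value - alpha))   # typically tiny: alpha recovered from m = 60 of the first 256 samples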
Remark 2.4 The true error committed by GS–CS is most likely zero. However, the numerical error is
polluted by the limited accuracy of the solver used to solve the relevant l1 optimization problem.
So, why do we get such a good result with this new approach? After all, we used far fewer samples than in §2.2. Also, how did we choose Ñ? These questions will be answered in this paper, and the key is what
we refer to as the Balancing Property (see §6.2). We shall present further numerical examples illustrating
the effectiveness of this new approach in §8.
3 Infinite-dimensional compressed sensing
To develop a theory for infinite-dimensional CS it is useful to now introduce the types of signal models we consider in this paper, and to formally define the problem itself. Suppose that H is a separable Hilbert space with an orthonormal basis {ϕj }j∈N . Let f = ∑_{j=1}^{∞} αj ϕj be the signal we wish to recover, and suppose that we have access to the countable collection of samples
and suppose that we have access to the countable collection of samples
ζ1 (f ), ζ2 (f ), ζ3 (f ), . . . ,
(3.1)
where ζj : H → C are continuous linear functionals on H. The problem throughout this paper will be to
recover f in terms of {ϕj }j∈N from the samples (3.1). Since we consider sparse signals f it is useful to
introduce the notation supp(f ):
supp(f ) = {j ∈ N : αj ≠ 0}.
Here αj are the coefficients of f in the basis {ϕj }j∈N . When its meaning is clear, we shall also write ∆ for
supp(f ).
3.1 The semi-infinite dimensional model
We first assume that f is exactly sparse in the basis {ϕj }j∈N . In other words, there exists an M ∈ N such
that
∆ = supp(f ) ⊂ {1, . . . , M }.
(3.2)
Naturally, we do not know ∆; however, we may have information about M . We refer to this model as semi-infinite dimensional: although f has only finite support in {ϕj }j∈N , we can access samples ζj (f ) from the
countable collection (3.1).
In practice the assumption that f is perfectly sparse is often unrealistic. Thus, a more reasonable scenario
is the following:
f = g + h,
∆ = supp(g) ⊂ {1, . . . , M },
supp(h) ⊂ {1, . . . , M }.
(3.3)
In this case, one can no longer expect perfect recovery of f with subsampling. However, it is highly desirable
to have a bound on the error in reconstructing f . Specifically, we wish to show perfect recovery of f with
high probability, up to an error determined only by some appropriate norm of h.
3.2 The fully infinite-dimensional model
Whilst (3.3) is more realistic than the exact sparsity model (3.2), it is rare in practice that supp(h) is finite.
In particular, in the fully infinite-dimensional model we consider the significantly more general setting:
f = g + h,
∆ = supp(g) ⊂ {1, . . . , M },
|supp(h)| = ∞.
(3.4)
This model is termed fully infinite-dimensional since the support of f is infinite, as is its set of samples.
Again we may pose the same question: how well can we reconstruct f , and how does the error behave in
terms of h?
Let us at this stage notice that (3.3) and (3.4) are, in many senses, very different problems. Correspondingly, the theorems we present about each problem are quite different in character. Whilst the skeptical reader may still think it plausible that (3.3) could be tackled by existing finite-dimensional CS techniques (despite the discussion in §2), there is little hope of doing the same for (3.4).
4 Generalized sampling: guaranteed recovery in arbitrary bases
Before discussing how to subsample infinite-dimensional signals, it is first necessary to consider the more
basic case where no sparsity is present. The question is, how can one actually reconstruct arbitrary signals f from their measurements {ζj (f )}j∈N ? Or in other words, if f = ∑_{j=1}^{∞} αj ϕj , how do we recover the infinite vector α = {α1 , α2 , . . .} from the samples ζ1 (f ), ζ2 (f ), . . .? Only once this problem has been solved can
one properly consider the issue of subsampling. Fortunately, the technique of generalized sampling (GS)
was developed precisely to solve this problem [1, 3, 4]. We now recap this approach.
Under some assumptions on {ζj }j∈N (e.g. each ζj is continuous and {ζj (f )}j∈N ∈ l2 (N), ∀f ∈ H), we
can view the full recovery problem as the infinite-dimensional system of linear equations
U α = ζ(f ),    (4.1)
where α = {α1 , α2 , . . .}, ζ(f ) = {ζ1 (f ), ζ2 (f ), . . .} and U is the infinite measurement matrix

U = \begin{pmatrix} \zeta_1(\varphi_1) & \zeta_1(\varphi_2) & \cdots \\ \zeta_2(\varphi_1) & \zeta_2(\varphi_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.    (4.2)
Clearly, if we were able to invert U , and provided we had access to all samples of f , then we could recover
α (and hence f ) exactly. However, this is never the case in practice. Instead, we must consider truncations
of (4.1), and look to compute approximations α̃1 , . . . , α̃N to the first N coefficients of α.
At this point we make the following critical remark. Whatever strategy we use for computing such
approximate coefficients, the result α̃[N ] = {α̃1 , . . . , α̃N } ∈ CN must be a good approximation to the first
N exact coefficients α1 , . . . , αN . Recall that the whole premise for recovering f in the basis {ϕj }j∈N is that
we know that f is well represented in this basis. In other words, the coefficients {αj }j∈N decay rapidly as
j → ∞, or, in the case where f is sparse, only a finite number are nonzero. Therefore, whichever method
we use for solving (4.1), it is vital that the error
‖α − α̃^{[N]}‖_{l2} ,    (4.3)
is small (here and later, for convenience, we shall not make a distinction between the finite vector α̃^{[N]} = {α̃1 , . . . , α̃N } ∈ C^N and its embedding {α̃1 , . . . , α̃N , 0, 0, . . .} in l2 (N)). Clearly, the examples of §2 violate
this condition: specifically, the failure witnessed is a result of the error (4.3) being large.
4.1 Finite sections: a warning from spectral theory
The most obvious approach for discretising (4.1) follows from taking finite sections of U . In other words,
if PN : l2 (N) → span{ej : j = 1, . . . , N } is the orthogonal projection, we consider solutions α̃[N ] to the
N × N system of equations
PN U PN α̃[N ] = PN ζ(f ).
(4.4)
Note that

P_N U P_N = \begin{pmatrix} \zeta_1(\varphi_1) & \cdots & \zeta_1(\varphi_N) \\ \vdots & \ddots & \vdots \\ \zeta_N(\varphi_1) & \cdots & \zeta_N(\varphi_N) \end{pmatrix},
is nothing more than the leading N × N submatrix (i.e. the finite section) of U .
Finite sections [9, 10, 28] are extremely widely used in practice. However, for general operators U there
is no guarantee that α̃[N ] need either exist, or that α̃[N ] (if it exists) actually converges to α as N → ∞.
In fact, it is easy to devise pairs of bases {ϕj }j∈N and sampling schemes {ζj }j∈N for which the error
kα − α̃[N ] kl2 blows up as N → ∞, whenever α̃[N ] is the result of the finite section method [1, 2].
Another significant issue is that the finite section matrix PN U PN ∈ CN ×N may be extremely poorly
conditioned, even though U and its inverse U −1 are bounded. Examples of operators U whose finite sections
exhibit exponentially poor conditioning were given in [1]. In particular, the measurement matrix formed by
sampling in the Fourier basis and reconstructing in Haar wavelets (the principal example of this paper) suffers
from this phenomenon. As a result, the numerical method based on finite sections is not just nonconvergent,
it is also extremely unstable and highly sensitive to noise (see also [2, 3]).
The failure of the finite section method for solving (4.1) can be viewed as a violation of the principle (Ph).
Finite sections have been studied extensively from the viewpoint of computational spectral theory. Therein
one typically wishes to gain information about the spectrum of U by considering discretizations of the form
PN U PN [7, 28]. The main conclusion is that, unless U satisfies some very stringent restrictions (such as
positive self-adjointness), its finite sections PN U PN may have wildly different (spectral) properties. This
violates (Ph), and thus makes finite sections typically unsuitable for solving (4.1).
In particular, for operators U of the form (4.2) an important property is unitarity. Whenever the measurements are given by ζj (f ) = ⟨f, ψj ⟩ for some orthonormal system {ψj }j∈N (as is the case with Fourier sampling) the
matrix U is a unitary operator on l2 (N). In general, however, there is no guarantee that the finite sections
PN U PN are unitary, in violation of (Ph).
Note that the finite section PN U PN is precisely the measurement matrix (2.5) encountered in the finite-dimensional CS approach (2.6). As commented in §2.2, the loss of the unitary structure of U accounts for
the failure seen therein.
4.2 Uneven sections and generalized sampling (GS)
Fortunately, there is a simple, albeit far less common, way to overcome the failure of the finite section method,
based on taking rectangular, as opposed to square, sections of U . In [1, 4] it was proposed to replace (4.4)
with
A α̃^{[M]} = PM U∗ PN ζ(f ),    A = PM U∗ PN U PM ,    (4.5)
where M ∈ N (the number of coefficients α̃1 , . . . , α̃M computed) is appropriately chosen (typically M ≤
N ). The result is known as generalized sampling (GS). Note that A ≡ (PN U PM )∗ PN U PM , where
PN U PM is the N × M uneven section of U .
The main idea is that, by allowing M to vary independently of N , one can obtain both a numerically
stable and accurate reconstruction of the first M values α1 , . . . , αM . Note that this means that we typically
recover fewer of the coefficients α1 , α2 , . . . than in the finite section method. However, unlike the latter, it is
possible to guarantee both the stability and accuracy of this approach. In other words, by being less greedy
in the number of coefficients we seek to recover, we actually obtain a far better result.
The main theorem proved in [1, 3] is as follows:
Theorem 4.1. Let U ∈ B(l2 (N)) be an isometry and suppose that M ∈ N is given. Then there exists an
N0 ∈ N such that, for every N ≥ N0 , there is a unique solution α̃[M ] to (4.5). Furthermore, we have the
sharp bound
‖α − α̃^{[M]}‖ ≤ (1 / C_{N,M}) ‖P_M^⊥ α‖,    (4.6)
where
C_{N,M} = 1 − ‖PM − PM U∗ PN U PM ‖.    (4.7)
Specifically, N0 is the least N such that CN,M > 0.
It can be shown that the quantity CN,M → 1 as N → ∞, for any fixed M . Thus, one deduces from
(4.6) that α̃[M ] can be made arbitrarily close to PM α – the best M -term approximation to α – by varying
N suitably. Hence, a good reconstruction can always be guaranteed with this approach. Furthermore, the resulting method is also stable: the condition number of the matrix A scales like C_{N,M}^{−1} [3]. That is to say,
precisely the same quantity that ensures accuracy of the reconstruction also guarantees numerical stability.
Note that CN,M can be easily computed numerically by finding the norm of an N × N matrix. Hence,
the conditions of Theorem 4.1 can be verified numerically. Having said this, in numerous circumstances of
interest one can also obtain analytical bounds [1, 4].
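The contrast between finite and uneven sections can be checked directly. The sketch below is not the paper’s code: it assumes the haar_basis helper from the sketch in §2.2, and all sizes, the quadrature grid and the test signal are illustrative choices. It builds both sections for the Fourier–Haar pair, compares their condition numbers, and computes generalized sampling coefficients by solving (4.5) as a least-squares problem.

# A sketch (not the paper's code) contrasting finite and uneven sections of the
# Fourier-Haar matrix; assumes haar_basis from the Section 2.2 sketch.
import numpy as np

def fourier_haar_section(n_rows, n_cols, eps=0.5, grid=2 ** 13):
    """Quadrature approximation of the n_rows x n_cols section of the matrix U in (4.2)."""
    t = np.linspace(-1, 1, grid, endpoint=False); dt = t[1] - t[0]
    Phi = haar_basis(n_cols, t)                              # from the Section 2.2 sketch
    freqs = eps * np.arange(-n_rows // 2 + 1, n_rows // 2 + 1)
    E = np.exp(-2j * np.pi * np.outer(freqs, t))
    return np.sqrt(eps) * (E * dt) @ Phi.T

M = 64
square = fourier_haar_section(M, M)          # finite section (N = M), cf. Section 4.1
uneven = fourier_haar_section(2 * M, M)      # uneven section (N = 2M), cf. Section 4.2
print(np.linalg.cond(square), np.linalg.cond(uneven))    # the square section is far worse conditioned

# Generalized sampling: the least-squares solution of the rectangular system
# P_N U P_M alpha = P_N zeta(f) is exactly the solution of the normal equations (4.5).
t = np.linspace(-1, 1, 2 ** 13, endpoint=False); dt = t[1] - t[0]
f = np.abs(t) ** 0.5                                     # an arbitrary test signal
freqs = 0.5 * np.arange(-M + 1, M + 1)
zeta = np.sqrt(0.5) * (np.exp(-2j*np.pi*np.outer(freqs, t)) * dt) @ f    # first 2M samples
alpha_tilde = np.linalg.lstsq(uneven, zeta, rcond=None)[0]               # GS coefficients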
To connect generalized sampling with the principle (Ph), note that the uneven section PN U PM inherits
the structure of U , whenever M and N are chosen suitably. In fact, for fixed M ,
PM U∗ PN U PM → PM U∗ U PM = PM ,    N → ∞,
since U∗ U = I, where I : l2 (N) → l2 (N) is the identity. Thus, in the limit N → ∞, PN U PM is an isometry on the range of PM .
4.3 A generalized Shannon Sampling Theorem
One of the main instances of GS, and a central reason for its development, is the case where {ζj (f )}j∈Z
corresponds to the Fourier samples (2.1) (we replace the index set N with Z in this instance). Although
the famous Shannon Sampling Theorem ensures that both f and its Fourier transform Ff can be recovered
exactly via the infinite sums
f (·) = ε ∑_{j∈Z} Ff (εj) e^{2πiεj·} ,    Ff (t) = ∑_{j∈Z} Ff (εj) sinc((t − εj)/ε),
(note that the first converges in L2 , whereas the second converges both uniformly and in L2 ), in practice one
has to truncate these series, leading to the approximations
fN (t) = ε ∑_{j=−⌊N/2⌋}^{⌊N/2⌋} Ff (εj) e^{2πiεjt} ,    FfN (t) = ∑_{j=−⌊N/2⌋}^{⌊N/2⌋} Ff (εj) sinc((t − εj)/ε).    (4.8)
As discussed, these are typically very poor reconstructions of f and Ff respectively.
However, suppose now we know another basis {ϕj }j∈N in which f is well represented. Then we can
apply GS to obtain an improved reconstruction in this basis. This leads to the following generalization of
Shannon’s theorem:
Theorem 4.2 (Generalized Sampling Theorem [1, 4]). Let F denote the Fourier transform on R, and suppose
that {ϕj }j∈N is an orthonormal set in L2 (R) satisfying supp(ϕj ) ⊂ [−T, T ] for all j ∈ N and some T > 0.
For 0 < ε ≤ 1/(2T ) let
ζj (f ) = √ε Ff (ρ(j)ε),    j ∈ N,    f ∈ L2 (R),
where ρ : N → Z is some bijection, and suppose that U is given by (4.2). Then, for each M ∈ N there is an
N ∈ N such that there exists a unique solution α̃[M ] ∈ CM to (4.5), for any f ∈ span{ϕj }j∈N . Moreover, if
fN,M = ∑_{j=1}^{M} α̃j ϕj ,    gN,M = ∑_{j=1}^{M} α̃j Fϕj ,    (4.9)
then
‖f − fN,M ‖_{L2 (R)} ≤ (1 / C_{N,M}) ‖P_M^⊥ f ‖_{L2 (R)} ,
and
‖g − gN,M ‖_{L2 (R)} ≤ (√(2T) / C_{N,M}) ‖P_M^⊥ f ‖_{L2 (R)} ,
where g = Ff , C_{N,M} is given by (4.7) and PM denotes the projection onto span{ϕ1 , . . . , ϕM }.
Figure 4: The figure shows the disappointing error f (t) − fN (t) against t (left) and the more pleasant error f (t) −
fN,M (t) (right) for N = 51 and M = 12. Note that fN and fN,M use exactly the same samples.
Note that this theorem is just the special case of Theorem 4.1 corresponding to Fourier samples. It is
also a straightforward exercise to extend it to the multivariate setting, where F corresponds to the Fourier
transform on L2 (Rd ) [1]. This theory extends the classical Shannon Sampling Theorem as well as its many
fundamental generalizations [6, 25, 42, 41]. The key point is that, if we know that f is well-represented
in {ϕj }j∈N , then we can recover f optimally (up to a multiplicative constant) in terms of the first M basis functions ϕ1 , . . . , ϕM using only its first N Fourier coefficients.
An important issue that we shall not address in full detail in this paper is how the constant C_{N,M} behaves. In particular, for θ > 0, how large must N be, for a given M , to ensure that C_{N,M} > 1 − θ? Whilst this condition can always be verified numerically (see above), in the important case we consider in this paper (namely, reconstructions in Haar wavelets from Fourier samples) one can show that the scaling N = c(θ)M (for some c(θ) > 0) is sufficient [1]. In particular, whilst setting N = M (this corresponds to the finite
section) leads to a divergent and unstable numerical method, in many cases the scaling N = 2M results in a
stable and convergent method.
4.4 An example - the effectiveness of generalized sampling
To demonstrate the use of generalized sampling let us consider the following function
f (t) = t^5 e^{−t} ,    t ∈ [−1, 1].
Suppose we can sample the Fourier coefficients of f : in particular, we have access to {Ff (εj)}j∈Z for ε = 1/2. To reconstruct f from these samples we will try two different techniques. First, we test the truncated
Fourier series fN defined in (4.8). Due to the fact that f is not periodic we cannot expect rapid convergence
of fN to f . However, the Generalized Sampling Theorem 4.2 allows us to reconstruct in any basis. Thus,
(due to analyticity of f ) we will choose the reconstruction basis {ϕj }j∈N consisting of orthonormal Legendre
polynomials on [−1, 1]. In particular, we define fN,M as in (4.9), where ρ : N → Z is given by
ρ(1) = 0, ρ(2) = 1, ρ(3) = −1, ρ(4) = 2, ρ(5) = −2, . . . .
(4.10)
In Figure 4 we have displayed the errors f −fN and f −fN,M . Note that both reconstructions, fN and fN,M ,
use the same samples, yet the improvement of fN,M compared to fN is dramatic. In particular, we go from
an O (1) error to roughly machine precision. One question we shall not address here is how to determine the
relationship between M and N . Note that this is a case where setting M = N gives complete nonsense. We
refer to [1, 3, 4] for a complete analysis.
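A minimal version of this experiment can be reproduced with a few lines of code. The sketch below is not the authors’ implementation: it approximates the Fourier samples by quadrature on a fine grid, uses numpy’s Legendre module for the reconstruction basis, and the exact error level will depend on these choices.

# A sketch (not the authors' implementation) of the Legendre reconstruction experiment.
import numpy as np
from numpy.polynomial import legendre

eps, M, N = 0.5, 12, 51
t = np.linspace(-1, 1, 2 ** 12, endpoint=False); dt = t[1] - t[0]
f = t ** 5 * np.exp(-t)

# Orthonormal Legendre polynomials on [-1, 1]
Phi = np.array([np.sqrt((2*k + 1) / 2.0) * legendre.legval(t, np.eye(M)[k]) for k in range(M)])

# Frequencies ordered as in (4.10): 0, 1, -1, 2, -2, ...
rho = np.array([0] + [s * k for k in range(1, N) for s in (1, -1)])[:N]
E = np.exp(-2j * np.pi * np.outer(eps * rho, t))
zeta = np.sqrt(eps) * (E * dt) @ f              # approximate samples sqrt(eps) * Ff(rho(j) eps)
B = np.sqrt(eps) * (E * dt) @ Phi.T             # uneven section P_N U P_M

alpha = np.linalg.lstsq(B, zeta, rcond=None)[0]            # generalized sampling, cf. (4.5)
f_NM = alpha @ Phi
print(np.max(np.abs(f - f_NM)))                 # many orders of magnitude below the Fourier-series error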
We will repeat the experiment above with another function:
f (t) = sin(5t) + ∑_{j=1}^{L} αj ψj (t),    t ∈ [−1, 1],
Figure 5: The figure shows the large error f (t) − fN (t) against t (left) as well as the substantially smaller error
f (t) − fN,M (t) (right) for N = 2401 and M = 1750. Note that fN and fN,M use exactly the same samples.
where {ψj }j∈N are the Haar wavelets on [−1, 1], L = 1700 and the αj ’s are some arbitrarily chosen coefficients. We will assume, as above, that we can sample the Fourier coefficients {Ff (εj)}j∈Z of f (with ε = 1/2
once more). Due to the vast number of discontinuities of f we cannot expect the truncated Fourier series
fN to be a good approximation to f . However, by the Generalized Sampling Theorem 4.2 we can choose
the reconstruction basis {ϕj }j∈N to be the Haar wavelets, and construct fN,M as in (4.9). In Figure 5 we
have displayed the errors f − fN and f − fN,M . Note that both reconstructions, fN and fN,M , use the same
samples, yet the improvement of fN,M compared to fN is substantial.
5 Generalized sampling with compressed sensing (GS–CS)
An immediate consequence of Theorems 4.1 and 4.2 is that, if we know that f is sparse in {ϕj }j∈N – i.e.
supp(f ) ⊂ {1, . . . , M } – then we can recover f perfectly from its first N samples, whenever N is suitably
large. However, given that N ≥ M typically for GS to succeed (in particular, when using Haar wavelets,
one typically requires N = 2M [1]), the question is, is it possible to use the same ideas in combination
with CS techniques to attain subsampling? The answer turns out to be yes (given some minor assumptions),
and the key is to follow a similar approach, again based on uneven sections, to formulate the reconstruction
appropriately. The result is known as generalized sampling with compressed sensing (GS–CS).
Let us suppose that f = ∑_{j=1}^{∞} αj ϕj is sparse in {ϕj }j∈N and is sampled via {ζj }j∈N . As opposed to
the failed approaches of §2, which were loosely based on discretising first, the technique we now propose
involves first formulating the sparse recovery problem in infinite dimensions. To this end, let Ω ⊂ N be of
size |Ω| = m ∈ N and consider the (infinite-dimensional) optimization problem
min_{η∈l1 (N)} ‖η‖_{l1} subject to PΩ U η = PΩ ζ(f ),    (5.1)
where U is the infinite measurement matrix (4.2) and ζ(f ) = {ζ1 (f ), ζ2 (f ), . . .} is the infinite vector of
samples.
Recall that GS relies on a well-posed infinite-dimensional recovery problem (4.1) before discretization
can proceed. Seeking similar notions for (5.1), we are led to the following questions:
(i) How do we choose Ω? Obviously there is no unique choice, but it makes sense to choose Ω uniformly
at random from {1, . . . , N }, where N ∈ N. This raises the following question: how large
must N be?
(ii) Suppose that η is a minimizer of (5.1) (note that η need not be unique). How large is ‖η − α‖, where α is the infinite vector of coefficients of f in the basis {ϕj }j∈N ? In particular, how does ‖η − α‖ depend on both m (the total number of samples) and N (the range from which the samples are drawn)?
(iii) If f is exactly sparse in {ϕj }j∈N , do we recover its coefficient vector α exactly (with high probability)
from (5.1), and what conditions on m and N ensure this recovery?
Let us suppose for the moment that we have answers to these questions. Naturally we cannot solve (5.1)
numerically, hence we must discretize. For this, we follow the same ideas that lead to generalized sampling.
Thus, we introduce a parameter k ∈ N and consider the finite-dimensional optimization problem
min_{η∈C^k} ‖η‖_{l1} subject to PΩ U Pk η = PΩ ζ(f ).    (5.2)
We refer to this approach as generalized sampling with compressed sensing. This of course leads to another
set of questions:
(iv) Will (5.2) have a solution? Note that (5.2) need not have a solution for all k, since PΩ ζ(f ) need not
be in the range of PΩ U Pk (although, as we shall show, this is always the case for sufficiently large k).
Moreover, will solutions of (5.2) converge to solutions of (5.1)? We will answer this affirmatively.
(v) If f is not sparse but compressible, how large is the error kη − αk when η is a solution to (5.2) and α
is the vector of coefficients of f ? In particular, if f belongs to either of the models (3.3) or (3.4), can
kη − αk be bounded above in terms of some appropriate norm of h?
Answers to these questions will be provided in §7, where we state the main results of this paper.
6 Notation and definitions
6.1 Notation
We now introduce some additional notation that will be used in the remainder of this paper. Write H = l2 (N),
and let k·k be the standard norm on H. All other norms will be specified. Let {ej }j∈N be the natural
basis on l2 (N), and, if Γ ⊂ N, define PΓ to be the orthogonal projection onto cl(span{ej : j ∈ Γ}). If
Γ = {1, . . . , N }, then we simply write PN .
If ξ ∈ H and j ∈ N, then ξ(j) = ⟨ξ, ej ⟩ (we will also sometimes use the notation ξj ). For Γ ⊂ N, we denote the natural embedding operator by ιΓ : l2 (Γ) → H. Note that ι∗Γ η = η|Γ for η ∈ H. For any vector ξ ∈ H we write supp(ξ) = {j ∈ N : ξ(j) ≠ 0}. We also define the sign sgn(ξ) ∈ l∞ (N) of ξ ∈ l∞ (N) as
follows: sgn(ξ)(j) = ξ(j)/|ξ(j)| if ξ(j) ≠ 0, and sgn(ξ)(j) = 0 otherwise.
For an operator U ∈ B(H) we define the incoherence parameter
υ(U ) = sup_{i,j∈N} |uij |,    uij = ⟨U ej , ei ⟩,    (6.1)
i.e. the max norm of the operator U with respect to {ej }j∈N . Also, if U = {uij }_{i,j∈N} is an infinite matrix, we define the maximum row norm of U by
‖U ‖_mr = sup_{i∈N} √( ∑_{j∈N} |uij |² ).
This quantity forms a vector space norm on the vector space of all infinite matrices (although not an algebra
norm). Finally, for convenience, we will define the following crucial function that will be used frequently in
the exposition. For M ∈ N and U ∈ B(H) let ω̃M,U : {1, . . . , M } × R+ × N → N be given by
ω̃M,U (r, s, N ) = |{ i ∈ N : max_{Γ1⊂{1,...,M}, |Γ1|=r, Γ2⊂{1,...,N}} ‖P_{Γ1} U∗ P_{Γ2} U ei ‖ > s }|.    (6.2)
Observe also that the mapping s 7→ ω̃M,U (r, s, N ) is a decreasing function.
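As a small illustration (not from the paper; it assumes the fourier_haar_section helper from the sketch in §4.2, and the section sizes are arbitrary), the incoherence parameter (6.1) and the maximum row norm can be estimated on a finite section as follows.

# A sketch (not from the paper) estimating (6.1) and the maximum row norm on a finite
# section; assumes fourier_haar_section from the Section 4.2 sketch.
import numpy as np

A = fourier_haar_section(256, 128)            # P_N U P_M with N = 256, M = 128
upsilon = np.max(np.abs(A))                   # sup |u_ij| over this section, cf. (6.1)
max_row_norm = np.max(np.sqrt(np.sum(np.abs(A) ** 2, axis=1)))
print(upsilon, max_row_norm)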
6.2 Key definition
Definition 6.1. Let U ∈ B(H) be an isometry. Then N and m satisfy the weak Balancing Property with
respect to U, M and |∆| if
‖PM U∗ PN U PM − PM ‖ ≤ ( 4 √( log₂( 4N √|∆| / m ) ) )^{−1},    (6.3)

max_{|Γ|=|∆|, Γ⊂{1,...,M}} ‖PM P_Γ^⊥ U∗ PN U P_Γ ‖_mr ≤ 1 / (8√|∆|)    (6.4)
are satisfied. We say that N and m satisfy the strong Balancing Property with respect to U, M and |∆| if
(6.3) holds, as well as
max_{|Γ|=|∆|, Γ⊂{1,...,M}} ‖P_Γ^⊥ U∗ PN U P_Γ ‖_mr ≤ 1 / (8√|∆|).    (6.5)
Remark 6.1 The inequality in (6.4) is somewhat inconvenient. However, it can be replaced by the far
simpler, although weaker, condition
‖PM U∗ PN U PM − diag(PM U∗ PN U PM )‖_mr ≤ 1 / (8√|∆|).    (6.6)
Here diag(B) denotes the diagonal matrix whose entries correspond to the diagonal entries of the matrix
B. In particular, condition (6.6) is the requirement on the magnitude of the off-diagonal entries of the
matrix PM U ∗ PN U PM . In much the same manner, (6.5) can also be replaced by the much more convenient
(however much stronger) condition
‖U∗ PN U PM − diag(U∗ PN U PM )‖_mr ≤ 1 / (8√|∆|).
The following proposition establishes that the balancing property is well defined:
Proposition 6.2. If U , M and |∆| are as in Definition 6.1, then there always exist integers N and m that
satisfy the weak and strong Balancing Properties with respect to U, M and |∆|.
Proof. Note that since PN → I strongly as N → ∞ we have that PN U → U strongly. However, for any
Γ ⊂ N with |Γ| < ∞ we have by compactness that PN U PΓ → U PΓ in norm as N → ∞. The fact that U is
an isometry yields the assertion.
Remark 6.2 Note that it is through the Balancing Property that we determine the number N , the crucial number that determines the set {1, . . . , N } from which we will draw the samples (see §2.3).
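In practice the Balancing Property can be checked numerically. The sketch below is not the authors’ code: it assumes the fourier_haar_section helper from the sketch in §4.2, the right-hand sides follow (6.3) and the simplified condition (6.6) of Remark 6.1, and all parameter values are illustrative.

# A sketch (not the authors' code) checking (6.3) and the simplified condition (6.6)
# numerically; assumes fourier_haar_section from the Section 4.2 sketch.
import numpy as np

M, N, m, s = 64, 160, 40, 5                       # s plays the role of |Delta|
B = fourier_haar_section(N, M)                    # quadrature approximation of P_N U P_M
G = B.conj().T @ B                                # P_M U* P_N U P_M as an M x M matrix

lhs_63 = np.linalg.norm(G - np.eye(M), 2)
rhs_63 = 1.0 / (4.0 * np.sqrt(np.log2(4 * N * np.sqrt(s) / m)))
offdiag = G - np.diag(np.diag(G))
lhs_66 = np.max(np.sqrt(np.sum(np.abs(offdiag) ** 2, axis=1)))    # maximum row norm
rhs_66 = 1.0 / (8.0 * np.sqrt(s))
print(lhs_63 <= rhs_63, lhs_66 <= rhs_66)         # if either fails, increase N for this M and |Delta|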
7 Main results
We now present the main results on GS–CS. Proofs of these results form the content of the remainder of this
paper.
7.1 The semi-infinite dimensional model
The first results concern the semi-infinite dimensional model (see §3.1):
Theorem 7.1. Let U ∈ B(H) be an isometry, M ∈ N, ε > 0 and suppose that x0 ∈ l1 (N) with supp(x0 ) = ∆, where ∆ ⊂ {1, . . . , M }. Suppose that N and m satisfy the weak Balancing Property with respect to U, M and |∆|, and let Ω ⊂ {1, . . . , N } be chosen uniformly at random with |Ω| = m. If ζ = U x0 then, with probability exceeding 1 − ε, the problem
inf_{η∈l1 (N)} ‖η‖_{l1} subject to PΩ U PM η = PΩ ζ,    (7.1)
has a unique solution ξ and this solution coincides with x0 , given that m satisfies
m ≥ C · N · υ²(U ) · |∆| · log(ε^{−1} + 1) · log( M N √|∆| / m ),    (7.2)
for some universal constant C. Furthermore, if m = N then ξ is unique and ξ = x0 with probability 1.
The main conclusion of this theorem is as follows: a sparse signal x0 can be recovered perfectly (with
high probability) by subsampling from the coefficients ζ, provided (6.3), (6.4) and (7.2) hold. Note that this
result gives answers to the questions (i) and (iii) posed in §5. Note that Theorem 7.1 confirms Theorem 1.1
of §1.
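The finite-dimensional content of Theorem 7.1 is easy to reproduce numerically. The sketch below recovers a sparse vector from a uniformly subsampled set of its coefficients with respect to an isometry by l1 minimization, a discrete analogue of problem (7.1). The DFT matrix and the cvxpy solver are assumptions made purely for illustration; the theorem itself concerns the infinite-dimensional problem.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, sparsity, m = 128, 5, 40

# finite-dimensional stand-in for the isometry U: the unitary DFT matrix
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)

# sparse signal x0 supported on Delta
Delta = rng.choice(n, size=sparsity, replace=False)
x0 = np.zeros(n, dtype=complex)
x0[Delta] = rng.standard_normal(sparsity) + 1j * rng.standard_normal(sparsity)

zeta = U @ x0                                   # full collection of samples
Omega = rng.choice(n, size=m, replace=False)    # uniform random subsampling

# analogue of (7.1): minimize the l1 norm subject to the subsampled constraints
eta = cp.Variable(n, complex=True)
cp.Problem(cp.Minimize(cp.norm1(eta)), [U[Omega, :] @ eta == zeta[Omega]]).solve()
print(np.linalg.norm(eta.value - x0))           # small with high probability
```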
Recall that the second scenario in the semi-infinite dimensional model corresponds to signals y0 = x0 +h,
where x0 is sparse and supp(h) ⊂ {1, . . . , M }. The following theorem concerns this case:
Theorem 7.2. Let U ∈ B(H) be an isometry, M ∈ N, ε > 0 and suppose that x0, h ∈ l1(N) with
supp(x0) = ∆, where ∆ ⊂ {1, . . . , M}, and supp(h) ⊂ {1, . . . , M}. Define y0 = x0 + h. Suppose that N
and m satisfy the weak Balancing Property with respect to U, M and |∆|, and let Ω ⊂ {1, . . . , N} be chosen
uniformly at random with |Ω| = m. If ζ = U y0 and ξ ∈ H is a minimizer of (7.1) then, with probability
exceeding 1 − ε, we have that

‖ξ − y0‖ ≤ (20N/m + 11 + m/(2N)) ‖h‖_{l1},   (7.3)

given that m satisfies (7.2). If m = N then (7.3) holds with probability 1.
This theorem demonstrates recovery for compressible signals y0 = x0 + h of the form (3.3): we witness
perfect recovery, up to an error determined solely by the magnitude of the nonsparse signal component h. In
particular, this result answers part of question (v) posed previously.
Remark 7.1 It is important to notice that there need not be a unique solution to (7.1). However, this is not
an issue. Theorem 7.2, and, in particular, equation (7.3), states that all solutions to (7.1) will be close to y0
in norm.
7.2
The fully infinite-dimensional model
Recall that the semi-infinite dimensional model (3.3) places the restriction that the support of the nonsparse
term h is contained in {1, . . . , M}. As discussed in §3, this assumption rarely holds in practice, and a more
realistic setting is provided by the fully infinite-dimensional model. Here we assume that y0 = x0 + h, where
x0 is sparse and |supp(h)| is infinite.
To address this setting, it is first necessary to scrutinize an infinite-dimensional optimization problem of
the form (5.1):
Theorem 7.3. Let U ∈ B(H) be an isometry, M ∈ N, ε > 0 and suppose that x0, h ∈ l1(N) with
supp(x0) = ∆, where ∆ ⊂ {1, . . . , M}. Define y0 = x0 + h. Suppose that N and m satisfy the strong
Balancing Property with respect to U, M and |∆|, and let Ω ⊂ {1, . . . , N} be chosen uniformly at random
with |Ω| = m. If ζ = U y0 and ξ ∈ H is a minimizer of

inf_{η∈l1(N)} ‖η‖_{l1} subject to P_Ω U η = P_Ω ζ,   (7.4)

then, with probability exceeding 1 − ε, we have that

‖ξ − y0‖ ≤ (20N/m + 11 + m/(2N)) ‖h‖_{l1},   (7.5)

given that m satisfies

m ≥ C · N · υ²(U) · |∆| · (log(ε⁻¹) + 1) · log(ωN√|∆|/m),  where  ω = ω̃_{M,U}(|∆|, s, N),  s = m/(32N√|∆| log(e⁴ε⁻¹)),   (7.6)

for some universal constant C (recall ω̃_{M,U} from (6.2)). If m = N then (7.5) holds with probability 1.
Remark 7.2 The quantity ω in (7.6) can also be replaced by a much more convenient (and of course much
less sharp) estimate. In particular we have that ω ≤ M̃, where

M̃ = min{ r ∈ N : ‖P_M U*P_N‖ ‖P_N U P_r^⊥‖ ≤ m/(32N√|∆| log(e⁴ε⁻¹)) }.

Note that M̃ is finite, since ‖P_N U P_r^⊥‖ → 0 as r → ∞ for fixed N.
This theorem, much like Theorem 7.2, confirms recovery of y0 up to an error determined solely by h.
Note that it resolves questions (i)–(iii) posed in §5. However, note that the optimization problem (7.4) is
infinite-dimensional. In practice, one always replaces (7.4) with the finite-dimensional problem
inf_{η∈l1(N)} ‖η‖_{l1} subject to P_Ω U P_k η = P_Ω ζ,   (7.7)
where k ∈ N is suitably chosen. The obvious question now arises: how do solutions of (7.7) compare to
those of (7.4) as k → ∞? For this we have the following:
Proposition 7.4. Let U ∈ B(H), x0 ∈ l1(N) and P_Ω be a finite rank projection. Then, for all sufficiently
large k ∈ N, there exists a ξ_k ∈ H satisfying

‖ξ_k‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U P_k η = P_Ω U x0}.

Moreover, for every ε > 0 there is a K ∈ N such that, for all k ≥ K, we have ‖ξ_k − ξ̃_k‖_{l1} ≤ ε, where ξ̃_k
satisfies

‖ξ̃_k‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0}.   (7.8)

In particular, if there exists a unique minimiser x0 of (7.8), then ξ_k → x0 in the l1 norm.
This proposition states that the computed solutions of (7.7) will be approximate minimisers of (7.4) for all
sufficiently large k. In particular, computed solutions will approximately satisfy (7.5). Note that it resolves
question (iv) posed in §5.
Remark 7.3 The amount of subsampling depends on the incoherence parameter υ(U ). For a specific operator U this is fixed, although it can be arbitrarily small. The fact that it is fixed suggests that for large enough
M and N subsampling will not be possible – i.e. we must take m = N . However, if U has the property that
υ(U Pk⊥ ) → 0 as k → ∞, one can circumvent this problem. This is achieved via semi-random subsampling
techniques. This is not within the scope of this paper but will be treated elsewhere [5] (see also §12).
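The behaviour described in Remark 7.3 can be seen in a small computation. Assuming, as is standard, that υ(·) denotes the supremum of the moduli of the entries, the sketch below builds a discrete Fourier-to-Haar change-of-basis matrix and prints the incoherence of the tail U P_k^⊥ for increasing k; it decays once the coarse-scale (coherent) columns are removed. Both the discretization and the particular bases are illustrative assumptions.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis of R^n (n a power of 2); rows ordered coarse to fine."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    top = np.kron(H, [1.0, 1.0]) / np.sqrt(2)                     # coarser scales
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2)    # finest scale
    return np.vstack([top, bottom])

n = 256
F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
W = haar_matrix(n)
U = F @ W.T                            # u_ij = i-th Fourier row applied to j-th Haar vector

for k in (0, 2, 8, 32, 128):
    print(k, np.abs(U[:, k:]).max())   # incoherence of the tail U P_k^perp decays with k
```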
7.3
Theorems on finite-dimensional CS
As mentioned, GS–CS generalizes standard finite-dimensional CS to an infinite-dimensional setting. It is
therefore unsurprising, but important to note nonetheless, that results concerning the latter can be obtained
as straightforward corollaries of Theorems 7.1–7.3. In particular, we have
Theorem 7.5. Let U ∈ C^{n×n} be an isometry, and suppose that x0 ∈ C^n with supp(x0) = ∆. For ε > 0
suppose that m ∈ N is such that

m ≥ C · n · υ²(U) · |∆| · (log(ε⁻¹) + 1) · log n,   (7.9)

for some universal constant C, and let Ω ⊂ {1, . . . , n} be chosen uniformly at random with |Ω| = m. If
ζ = U x0 then, with probability exceeding 1 − ε, the problem

min_{η∈C^n} ‖η‖_{l1} subject to P_Ω U η = P_Ω ζ,

has a unique solution ξ and this solution coincides with x0.
Theorem 7.6. Let U ∈ C^{n×n} be an isometry, and suppose that y0 = x0 + h ∈ C^n with supp(x0) = ∆.
For ε > 0 suppose that m ∈ N satisfies (7.9), and let Ω ⊂ {1, . . . , n} be chosen uniformly at random with
|Ω| = m. If ζ = U y0 then, with probability exceeding 1 − ε, any minimizer ξ of the problem

min_{η∈C^n} ‖η‖_{l1} subject to P_Ω U η = P_Ω ζ,

satisfies

‖ξ − y0‖ ≤ (20n/m + 11 + m/(2n)) ‖h‖_{l1}.

Proof. U extends in the obvious way to a partial isometry Ũ on H. Note that (Ũ)*P_N Ũ = P_N, for N = n.
We may, in an obvious way, extend Ũ to an isometry Û on H such that υ(Û) = υ(U). Therefore, the weak
balancing property is automatically satisfied for M = N and any m ∈ N. We now apply Theorem 7.1 or
Theorem 7.2.
Remark 7.4 Similar results on finite-dimensional compressed sensing have recently been proved by Candès
& Plan [13]. However, we stress that their analysis is strictly for the finite-dimensional case. In particular, it
cannot be applied to the infinite-dimensional setting considered in this paper.
8
Numerical examples
Before presenting proofs of these theorems, it is useful to see some further examples of GS–CS. We will
demonstrate the main premise of this paper in practice – in particular, one of the original motivations for GS:
provided one knows that the function f has a good representation in terms of a different basis then one can
obtain a far better reconstruction of f than that guaranteed by the Shannon Sampling Theorem. Consider the
problem of reconstructing g = Ff and f from the samples {ζj(f)}_{j∈N} where ζj(f) = Ff(ερ(j)), ε > 0
(we will use ε = 1/2) and ρ is defined in (4.10). We now compare three methods for approximating f and g:
(i) The Shannon reconstructions fN and gN (see (4.8)).
(ii) The GS reconstructions fN,M and gN,M (see Theorem 4.2).
(iii) The GS–CS reconstructions

f_{N,m,k}(t) = Σ_{j=1}^{k} α_j ϕ_j(t),    g_{N,m,k}(t) = Σ_{j=1}^{k} α_j Fϕ_j(t),

where α = {α_1, . . . , α_k} is computed via the convex optimization problem (5.2).
Note that f_{N,M} and g_{N,M} use exactly the same samples as f_N and g_N. Moreover, f_{N,m,k} and g_{N,m,k} use a
subset of the samples used by the previous reconstructions; in particular, less sampling information is needed.
If f is sparse or has rapidly decaying coefficients in Haar wavelets, then we expect (i) to give a very poor
reconstruction. However, both the GS and GS–CS methods should give very good reconstructions, with the
latter taking advantage of the sparsity to reduce the number of Fourier coefficients sampled (recall that GS
does not exploit any sparsity – it offers guaranteed recovery for all functions f by using the full range of
Fourier coefficients).
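For the experiments below one needs the Fourier transform of each Haar wavelet at the sampled frequencies, i.e. the entries of the matrix U. A small helper is sketched next, assuming the convention Ff(ω) = ∫ f(t) e^{−2πiωt} dt; the paper's own convention, normalization and frequency ordering ρ are fixed in §4, so this is only an indicative implementation.

```python
import numpy as np

def F_haar_scaling(w):
    """Fourier transform (convention: integral of f(t) exp(-2*pi*i*w*t) dt) of the
    scaling function chi_[0,1)."""
    w = np.asarray(w, dtype=float)
    out = np.ones_like(w, dtype=complex)          # value 1 at w = 0
    nz = w != 0
    out[nz] = (1 - np.exp(-2j * np.pi * w[nz])) / (2j * np.pi * w[nz])
    return out

def F_haar_wavelet(w, l, p):
    """Fourier transform of the Haar wavelet 2^{l/2} psi(2^l t - p) supported in [0,1],
    where psi = chi_[0,1/2) - chi_[1/2,1)."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w, dtype=complex)         # the wavelet has zero mean, so 0 at w = 0
    nz = w != 0
    ws = w[nz] / 2 ** l
    out[nz] = (2 ** (-l / 2) * np.exp(-2j * np.pi * w[nz] * p / 2 ** l)
               * (1 - np.exp(-1j * np.pi * ws)) ** 2 / (2j * np.pi * ws))
    return out

# e.g. one column of U: the wavelet at scale l = 3, position p = 2, sampled at
# frequencies eps * j; a symmetric range stands in for the ordering rho of (4.10)
freqs = 0.5 * np.arange(-10, 11)
print(F_haar_wavelet(freqs, l=3, p=2))
```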
8.1
First example
As a first example, let us consider the function g = Ff, where

f(t) = Σ_{j=1}^{200} α_j ϕ_j(t) + cos(2πt) χ_{[1/2, 9/16]}(t),    t ∈ [0, 1],   (8.1)

where {ϕ_j}_{j∈N} are Haar wavelets on [0, 1] and χ_{[1/2, 9/16]} is the indicator function of the interval [1/2, 9/16]. Sup-
pose that |{j : α_j ≠ 0}| = 25, so that f can be decomposed into a sparse component and a remainder. Note
that the remainder has infinite support in the Haar wavelet basis, so this function belongs to the fully infinite-
dimensional model (see §3.2).
Figure 6: The figure displays the errors |g(t)−gN (t)| (left), |g(t)−gN,M (t)| (middle) and |g(t)−gN,m,k (t)|
(right) against t, for N = 601, M = 200, m = 230 and k = 650.
N      ‖g − g_N‖_{L∞}    ‖g − g_{N,M}‖_{L∞}              ‖g − g_{N,m,k}‖_{L∞} (avg. 20 trials)
601    1.43              4.74 × 10^{-5} (M = 200)        4.73 × 10^{-5} (m = 230, k = 550)
1201   0.85              2.36 × 10^{-5} (M = 400)        2.38 × 10^{-5} (m = 460, k = 1400)

Table 1: The table displays the errors for the reconstructions g_N, g_{N,M} and g_{N,m,k}.
In Figure 6 we display the errors committed by the approximations (i)–(iii) for this function. As expected,
the expansion in sinc functions (i) gives an extremely poor reconstruction, whereas both the GS and GS–CS
give far better approximations. Specifically, by replacing the sinc series (i) with either (ii) or (iii) one reduces
the error by a factor of roughly 10,000. Moreover, and also as expected, the GS–CS approximation attains
the same numerical error as the GS approximation using only around 38% of the Fourier samples. These
observations are confirmed in Table 1.
Whilst the GS and GS–CS methods give very similar numerical errors it is important to notice that
the reconstructions fN,M and fN,m,k are typically very different. In particular, in GS one reconstructs
approximately the first M Haar wavelet coefficients α1 , . . . , αM , where M < N typically. On the other
hand, in GS–CS one computes k such coefficients, where typically (although not always) k > N .
This discrepancy can be explained by examining the equations (4.5) and (5.2). In GS, which corresponds
to (4.5), one requires M < N to ensure invertibility of the operator A. On the other hand, unless k is taken
sufficiently large, (5.2) need not have a solution, since the right-hand side PΩ ζ(f ) may not lie in the range
of the (finite-dimensional) section
PΩ U Pk : Ck → C|Ω| .
In particular, this may well be the case whenever k < N . Fortunately, as shown in Proposition 7.4, this
cannot happen if k is sufficiently large. The effect of increasing k for the example (8.1) is illustrated in Table
2: once k is sufficiently large, the problem (5.2) has a solution, and this error drops accordingly.
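The effect just described, namely that (5.2) only acquires a solution once k is large enough, amounts to asking whether P_Ω ζ lies in the range of the section P_Ω U P_k. A quick numerical check of this range condition is sketched below on a discrete surrogate; the matrix and sizes are assumptions for illustration.

```python
import numpy as np

def constraint_feasible(U, Omega, zeta, k, tol=1e-9):
    """True if P_Omega zeta lies in the range of P_Omega U P_k, i.e. the constraint
    of the truncated problem can be satisfied."""
    A = U[np.ix_(Omega, np.arange(k))]
    rhs = zeta[Omega]
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.linalg.norm(A @ sol - rhs) < tol

rng = np.random.default_rng(1)
n, m = 256, 60
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
x0 = np.zeros(n, dtype=complex)
x0[rng.choice(n, 10, replace=False)] = 1.0
zeta = U @ x0
Omega = rng.choice(n, m, replace=False)
for k in (20, 40, 60, 120, 256):
    print(k, constraint_feasible(U, Omega, zeta, k))  # infeasible for small k, feasible once k is large
```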
8.2
Second example
Consider now the function
f(t) = Σ_{j=1}^{500} α_j ϕ_j(t),   (8.2)
where |{j : αj 6= 0}| = 100. This function is sparse in the Haar wavelet basis (and hence is an example of
the semi-infinite dimensional model). The task is to reconstruct f from its Fourier samples.
Unlike the previous example, we expect exact reconstruction of this function using both the GS and GS–
CS approaches, provided the parameters are chosen correctly. This is confirmed in Table 3. Observe that the
Fourier series of f requires over 50,000 Fourier samples to achieve four digits of accuracy. Conversely, the
GS approximation recovers f exactly using only 1501 such samples. Furthermore, the GS–CS approximation
improves over GS by a factor of three: it requires only 450 Fourier samples in total (the reader is referred to
Remark 2.4 once more here).
E_{N,m,k} = ‖g − g_{N,m,k}‖_{L∞} (avg. 20 trials)

N = 601:    E_{601,230,200} = ∞     E_{601,230,350} = ∞     E_{601,230,550} = 4.759 × 10^{-5}     E_{601,230,850} = 4.727 × 10^{-5}
N = 1201:   E_{1201,460,400} = ∞    E_{1201,460,500} = ∞    E_{1201,460,1000} = 2.384 × 10^{-5}   E_{1201,460,1300} = 2.392 × 10^{-5}

Table 2: The table shows the error ‖g − g_{N,m,k}‖_{L∞} for different values of N, m and k (the notation E_{N,m,k} =
∞ means that (5.2) does not have a solution).
N       ‖f − f_N‖_{L²}     ‖f − f_{N,M}‖_{L²}             ‖f − f_{N,m,k}‖_{L²} (avg. 20 trials)
1001    4.19               8.47 × 10^{-2} (M = 500)       5.53 × 10^{-1} (m = 450, k = 900)
1501    1.43               4.74 × 10^{-15} (M = 500)      1.06 × 10^{-10} (m = 450, k = 900)
2001    1.39               4.33 × 10^{-15} (M = 500)      1.99 × 10^{-10} (m = 450, k = 900)
3001    1.37               4.45 × 10^{-15} (M = 500)      1.98 × 10^{-10} (m = 450, k = 900)
50001   2.84 × 10^{-4}

Table 3: The table shows the error corresponding to the reconstructions f_N, f_{N,M} and f_{N,m,k} of the function
(8.2).
8.3
Interlude: outline of the remainder of the paper
In the first half of this paper we explained why conventional CS techniques are ill-suited to infinite-dimensional
problems, and introduced a new framework, GS–CS, that successfully overcomes this issue. The main results concerning GS–CS are stated in §7. In the remainder of this paper we present proofs of these results.
Some of these are quite technical in nature. The reader who is not concerned with their details may go
straight to §12.
9
Infinite-dimensional optimization and Proposition 7.4
We begin the second half of this paper with a proof of Proposition 7.4. As the informed reader will have
noticed, this is really a question of infinite-dimensional optimization: in particular, showing the existence
of minimizers of the finite-rank discretizations of an infinite-dimensional optimization problem, and their
convergence to minimizers of that problem. For this reason, we now recap some of the basics of this field.
The well-informed reader may proceed directly to Lemma 9.4. Also, some of the results below, although
new, are included only for completeness, and the reader only interested in the proof of the main theorems
may go directly to Proposition 10.4.
9.1
Infinite-dimensional optimization
The field of infinite-dimensional convex optimization is certainly not new [24, 35]. However, it is much less
standard than the more thoroughly investigated topic of finite-dimensional convex optimization. We will
now cover some of the basic tools that will subsequently prove useful.
In this paper we will consider complex vector spaces. Standard optimization theory is usually considered
over the reals, and this is also the case in [24] (the main reference we consider herein for the field of infinite-
dimensional optimization). To be able to quote [24] freely we use the standard trick and consider any
complex Banach space X as a real vector space. In particular, if X̃ is the real Banach space induced by X,
then

X̃* = {Re(x*) : x* ∈ X*}.

This follows by the observation that if x* ∈ X* and u = Re(x*) then u is a real linear functional. Also, if
u ∈ X̃* and x* : X → C is defined by x*(x) = u(x) − iu(ix), then x* ∈ X*. To avoid unnecessary clutter
we will (with slight abuse of notation) use X as the notation for X̃.
Definition 9.1. Let X be a Banach space and let F : X → R̄. The polar function F* : X* → R̄ is defined
by

F*(x*) = sup_{x∈X} {Re(x*(x)) − F(x)},

where R̄ = R ∪ {−∞, ∞}.
Definition 9.2. Let X be a Banach space, F : X → R̄ be convex and consider the following problem

(P) :  inf{F(x) : x ∈ X}.

If Y is a Banach space and Φ : X × Y → R ∪ {∞} is a convex lower semi-continuous function such that
Φ(x, 0) = F(x) for all x ∈ X, then the dual problem P* with respect to Φ is defined by

(P*) :  sup{−Φ*(0, y*) : y* ∈ Y*}.
If Φ is not specified we will say that (P ∗ ) is a dual problem for (P ).
Let X and Y be Banach spaces and suppose that T ∈ B(X, Y) and y0 ∈ Y. Consider the problem

(P1) :  inf{‖x‖ : x ∈ X, T x = y0}.

Note that (P1) can be written as the equivalent convex optimization problem:

(P̃1) :  inf{F(x) + G(T x) : x ∈ X},   (9.1)

where F(x) = ‖x‖ and G : Y → R ∪ {∞} is defined by G(z) = δ_{{0}}(z − y0). Here the function
δ_C : Y → R ∪ {∞}, where C ⊂ Y is convex, is defined by δ_C(z) = 0 if z ∈ C and δ_C(z) = ∞ if z ∉ C.
Moreover, by letting Φ : X × Y → R ∪ {∞} be defined by

Φ(x, y) = F(x) + G(T x + y),   (9.2)

and observing that

Φ*(x*, y*) = F*(x* − T′y*) + G*(y*),

where T′ : Y* → X* denotes the dual mapping, we also obtain the following dual problem with respect to
Φ:

(P1*) :  sup{−F*(−T′y*) − G*(y*) : y* ∈ Y*}.

Much like (P1) and (P̃1), the dual problem (P1*) also has an equivalent form. In fact, since F*(x*) = 0 if
‖x*‖_{X*} ≤ 1 and F*(x*) = ∞ if ‖x*‖_{X*} > 1, together with the observation that

G*(y*) = sup{Re(y*(y)) − δ_{{0}}(y − y0) : y ∈ Y} = Re(y*(y0)),

we find that

(P1*) :  sup{Re(y*(y0)) : ‖T′y*‖_{X*} ≤ 1, y* ∈ Y*}.
Using these ideas we obtain the following well-known result [24]:
Proposition 9.3. Let X and Y be Banach spaces and suppose that T ∈ B(X, Y ) and y0 ∈ Y. If T is onto,
then
inf{kxk : x ∈ X, T x = y0 } = sup{Re(y ∗ (y0 )) : kT 0 y ∗ kX ∗ ≤ 1, y ∗ ∈ Y ∗ }.
Proof. Let F, G and Φ be as in (9.1) and (9.2) respectively, and define h : Y → R ∪ {∞} by
h(y) = inf_{x∈X} Φ(x, y).
Then h is convex and, since T is onto, h is finite for all y ∈ Y . Therefore, by convexity, h is also continuous,
and, in particular continuous at zero. The result follows now by [24, Prop. 3.3.5].
9.2
Proof of Proposition 7.4
We are now in a position to prove Proposition 7.4. We first require the following:
Lemma 9.4. Let U ∈ B(H) and P be a finite rank projection. Then, for every χ ∈ Ran(P U), there exists
ξ ∈ H attaining

inf_{η∈l1(N)} ‖η‖_{l1} subject to P U η = χ.

Proof. Recall that (c0)* = l1. By weak* compactness there is a sequence {ξk} ⊂ l1 and a ξ ∈ l1 such that
P U ξk = χ, ‖ξk‖_{l1} ↘ inf{‖η‖_{l1} : P U η = χ} and ⟨ξk, ej⟩ → ⟨ξ, ej⟩ as k → ∞ for all j ∈ N. It follows that
‖ξ‖_{l1} ≤ lim_{k→∞} ‖ξk‖_{l1}. Since ξk → ξ weakly as elements in H it follows by the fact that P U is compact
(since P is of finite rank) that P U ξk → P U ξ. Hence, P U ξ = χ, as required.
We now give a proof of Proposition 7.4:
Proof of Proposition 7.4. To see the existence of ξk for large k it suffices to observe that Ran(PΩ U ) and
Ran(PΩ U Pk ) coincide for all sufficiently large k, since PΩ has finite rank.
For the second part of the proposition, it is easy to see that it suffices to show that every subsequence of
{ξk}_{k∈N} has a convergent subsequence in the l1 norm with limit ξ satisfying

‖ξ‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0}.   (9.3)
Let therefore {ξk }k∈N be a subsequence of the original sequence (we use the same notation for simplicity).
Since kξk kl1 ≥ kξk+1 kl1 for all large k it follows that {ξk } is bounded. So by weak∗ compactness of the
l1 ball we have that, by possibly passing to a subsequence, there is a ξ ∈ H such that ξk → ξ weakly (as
elements in H) as k → ∞. By compactness of PΩ U we find that PΩ U ξk → PΩ U ξ as k → ∞, and, since
PΩ U ξk = PΩ U x0 , it follows that PΩ U ξ = PΩ U x0 .
To see that ξ satisfies (9.3) we argue as follows. We claim that for any λ > 0 we have

‖ξk‖_{l1} ≤ inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0} + λ,   (9.4)

for all sufficiently large k. Let r = dim(Ran(P_Ω U)) < ∞, and let ê1, . . . , êr be coordinate vectors such
that span{P_Ω U êj}_{j=1}^{r} = Ran(P_Ω U). Then every η ∈ Ran(P_Ω U) with ‖η‖ = 1 can be written as η =
c1 P_Ω U ê1 + . . . + cr P_Ω U êr, where the cj's are bounded by, say, 1 ≤ c < ∞. Now let ξ̃ be a minimizer
of (9.3) (the existence of such a minimizer is guaranteed by Lemma 9.4), and choose k so large that
{êj}_{j=1}^{r} ⊂ Ran(Pk), ‖P_Ω U Pk^⊥ ξ̃‖ ≤ λ/(2cr) and ‖Pk^⊥ ξ̃‖ ≤ λ/2. Let c1, . . . , cr be chosen such that
P_Ω U Pk^⊥ ξ̃ / ‖P_Ω U Pk^⊥ ξ̃‖ = c1 P_Ω U ê1 + . . . + cr P_Ω U êr, and set η̃ = Pk ξ̃ + (c1 ê1 + . . . + cr êr) ‖P_Ω U Pk^⊥ ξ̃‖. It
follows that P_Ω U η̃ = P_Ω U ξ̃ = P_Ω U x0, ‖η̃‖_{l1} ≤ ‖ξ̃‖_{l1} + λ and η̃ ∈ Ran(Pk). Hence ‖ξk‖_{l1} ≤ ‖ξ̃‖_{l1} + λ
and we have shown (9.4). Now choose m ∈ N such that ‖Pm^⊥ ξ‖_{l1} ≤ λ. Then ‖ξ‖_{l1} ≤ ‖Pm ξ‖_{l1} + ‖Pm^⊥ ξ‖_{l1}.
But Pm ξk → Pm ξ and ξk satisfies (9.4), thus ‖ξ‖_{l1} ≤ inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0} + 2λ for any
λ > 0. Therefore ξ satisfies (9.3), as required.
For the final part of the proof, we are required to show that ‖ξk − ξ‖_{l1} → 0 as k → ∞. By possibly
passing to another subsequence, it follows by (9.4) that

‖ξk‖_{l1} ≤ inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0} + 1/k.   (9.5)

Note also that, for fixed m ∈ N, we have Pm(ξk − ξ) → 0 as k → ∞. But by (9.5) we also have ‖Pm ξk‖_{l1} +
‖Pm^⊥ ξk‖_{l1} ≤ ‖Pm ξ‖_{l1} + ‖Pm^⊥ ξ‖_{l1} + 1/k. So

lim_{m→∞} lim sup_{k→∞} ‖Pm^⊥ ξk‖_{l1} = 0.

It thus follows that ξk → ξ (in l1) as k → ∞, and we are done.
9.3
Existence of unique minimizers
In what follows it will be useful to have several results on the existence of unique minimizers of such
problems. The finite-dimensional version of the following proposition has become standard for showing
existence of unique minimizers for finite-dimensional problems found in CS [15]. Fortunately, the extension
to infinite dimensions is rather straightforward:
Proposition 9.5. Let U ∈ B(H) be unitary and let Ω, ∆ ⊂ N be such that |Ω|, |∆| < ∞. Suppose that
x0 ∈ H and that supp(x0) = ∆. Consider the optimization problem

inf_{η∈H} ‖η‖_{l1} subject to P_Ω U η = P_Ω U x0.   (9.6)

If x0 is the unique minimizer of (9.6), then there exists a vector ρ ∈ H such that

(i) ρ = U* P_Ω η for some η ∈ H,
(ii) ⟨ρ, ej⟩ = ⟨sgn(x0), ej⟩, j ∈ ∆,
(iii) |⟨ρ, ej⟩| < 1, j ∉ ∆.

Conversely, if (i)–(iii) are satisfied and in addition P_Ω U P_∆ : P_∆ H → P_Ω H has full rank, then x0 is the
unique minimizer of (9.6).
This proposition (for a proof, see the Appendix) may be a little hard to work with in practice. However,
a more convenient result with somewhat relaxed assumptions can also be obtained.
Proposition 9.6. Let U ∈ B(H) with ‖U‖ ≤ 1 and suppose that ∆ and Ω are finite subsets of N. Let x0 ∈ H
be such that supp(x0) = ∆. Let M ∈ N be so large that ∆ ⊂ {1, . . . , M}. Let ξ, ξM ∈ H be such that

‖ξ‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U x0},    ‖ξM‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U PM η = P_Ω U x0}.

Suppose that there is a ρ ∈ ran(U* P_Ω) and a q > 0 with the following properties:

(i) ‖q⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖ ≤ 1/2,
(ii) ‖P_∆ ρ − sgn(x0)‖ ≤ √q/4,
(iii) ‖P_∆^⊥ ρ‖_{l∞} ≤ 1/2.

Then ξ = x0. Also, if (i) and (ii) are satisfied and (iii) is replaced with ‖PM P_∆^⊥ ρ‖_{l∞} ≤ 1/2, then ξM = x0.
Proof. Let ζ = ξ − x0. We will show that ζ = 0. We begin by showing that ‖P_∆ ζ‖ ≤ √(2/q) ‖P_∆^⊥ ζ‖. This
follows from some simple observations. First note that by a small computation and (i) we have

‖P_Ω U P_∆ ζ‖² ≥ q (1 − ‖q⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖) ‖P_∆ ζ‖² ≥ (q/2) ‖P_∆ ζ‖².

Also, by assumption, we obviously have ‖P_∆^⊥ ζ‖ ≥ ‖P_Ω U P_∆^⊥ ζ‖. Thus, if ‖P_∆ ζ‖ > √(2/q) ‖P_∆^⊥ ζ‖ we get

‖P_Ω U P_∆ ζ‖ > ‖P_∆^⊥ ζ‖ ≥ ‖P_Ω U P_∆^⊥ ζ‖.

Since P_Ω U ζ = 0 this is a contradiction.
Let us now note the following: for j ∈ ∆ we have

|(x0 + ζ)(j)| = | |(x0)(j)| + ζ(j) sgn(x0)(j) | ≥ |(x0)(j)| + Re(ζ(j) sgn(x0)(j)).

Since supp(x0) = ∆ we obtain

‖x0 + ζ‖_{l1} ≥ ‖x0‖_{l1} + Re⟨ζ, sgn(x0)⟩ + Σ_{j∈∆^c} |ζ(j)|,   (9.7)

where ∆^c = N\∆. Also, by the assumption that ρ ∈ ran(U* P_Ω) and the fact that P_Ω U ζ = 0, it follows
that ζ ⊥ ρ. Thus, using (9.7) we obtain (by applying (ii), (iii), Hölder's inequality and finally the observation
‖P_∆ ζ‖ ≤ √(2/q) ‖P_∆^⊥ ζ‖)

‖x0 + ζ‖_{l1} ≥ ‖x0‖_{l1} + Re⟨ζ, sgn(x0) + P_∆^⊥ sgn(ζ) − ρ⟩
            ≥ ‖x0‖_{l1} + ‖P_∆^⊥ ζ‖_{l1} − (|⟨ζ, sgn(x0) − P_∆ ρ⟩| + |⟨ζ, P_∆^⊥ ρ⟩|)
            ≥ ‖x0‖_{l1} + ‖P_∆^⊥ ζ‖_{l1} − ( (√q/4) ‖P_∆ ζ‖ + (1/2) ‖P_∆^⊥ ζ‖_{l1} )
            ≥ ‖x0‖_{l1} + ‖P_∆^⊥ ζ‖_{l1} − ( (√2/4) ‖P_∆^⊥ ζ‖_{l1} + (1/2) ‖P_∆^⊥ ζ‖_{l1} ).   (9.8)

Thus, if ζ ≠ 0 this gives ‖x0 + ζ‖_{l1} > ‖x0‖_{l1}, contradicting the fact that ‖ξ‖_{l1} ≤ ‖x0‖_{l1}. Hence ζ = 0,
and this gives the first part of the proposition. The argument for the second part of the proposition is almost
identical. By letting ζ = ξM − x0 we may use exactly the same analysis as previously, except for the
transition from the second line in (9.8) to the third line. In that case, since ζ ∈ ran(PM), we only need the
requirement that ‖PM P_∆^⊥ ρ‖_{l∞} ≤ 1/2.
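The sufficient conditions of Proposition 9.6 are straightforward to test numerically for a concrete U, Ω, ∆ and x0. The sketch below uses the familiar least-squares dual certificate as the candidate ρ ∈ ran(U* P_Ω); this is one natural choice made for illustration, whereas the proofs later in the paper construct ρ by the golfing scheme.

```python
import numpy as np

def check_prop_9_6(U, Omega, Delta, x0, q):
    """Check conditions (i)-(iii) of Proposition 9.6 for the least-squares certificate
    rho = U^* P_Omega U P_Delta (P_Delta U^* P_Omega U P_Delta)^{-1} sgn(x0)."""
    n = U.shape[1]
    A = U[np.ix_(Omega, Delta)]                   # P_Omega U P_Delta
    G = A.conj().T @ A                            # P_Delta U^* P_Omega U P_Delta
    sgn = x0[Delta] / np.abs(x0[Delta])
    rho = U[Omega, :].conj().T @ (A @ np.linalg.solve(G, sgn))   # lies in ran(U^* P_Omega)
    cond_i = np.linalg.norm(G / q - np.eye(len(Delta)), 2) <= 0.5
    cond_ii = np.linalg.norm(rho[Delta] - sgn) <= np.sqrt(q) / 4
    off = np.setdiff1d(np.arange(n), Delta)
    cond_iii = np.abs(rho[off]).max() <= 0.5
    return cond_i, cond_ii, cond_iii

rng = np.random.default_rng(5)
n, m = 128, 60
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
Delta = rng.choice(n, 5, replace=False)
x0 = np.zeros(n, dtype=complex); x0[Delta] = 1.0
Omega = rng.choice(n, m, replace=False)
print(check_prop_9_6(U, Omega, Delta, x0, q=m / n))
```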
10
Stability analysis for infinite-dimensional convex optimization
In the previous section we established conditions that guarantee recovery of x0 ∈ l1(N) by solving

min_{η∈l1(N)} ‖η‖_{l1} subject to P_Ω U η = P_Ω U x0,   (10.1)

and its finite-dimensional approximations

min_{η∈l1(N)} ‖η‖_{l1} subject to P_Ω U P_k η = P_Ω U x0.   (10.2)
In particular, we gave a proof of Proposition 7.4.
We now consider the issue of stability in such optimization problems. In other words, we consider the
effect of replacing x0 by x0 + h, where h is small in norm, on the minimizers ξ and ξk of (10.1) and (10.2)
respectively. Note that this is the first step towards a proof of Theorems 7.2 and 7.3 concerning the recovery
of compressible signals which are described by the semi/fully infinite-dimensional models §3. However,
at this moment we do not consider either sparsity or randomness. This comes in §11, in which the results
proved in this and the previous section are applied to the sparse recovery problems (7.1) and (7.4) to yield
proofs of Theorems 7.1–7.3.
10.1
Stability
Stability turns out to be a rather subtle issue. We now illustrate why.
Definition 10.1. Let Ω, ∆ be finite subsets of N, U ∈ B(H) and let f : R+ → R+ be a continuous function
such that limt→0 f (t) = 0. If ξ ∈ H, supp(ξ) = ∆, is the unique minimizer of
inf{kηkl1 : PΩ U η = PΩ U ξ},
(10.3)
and for any ε > 0 and ζ ∈ H such that ‖ξ − ζ‖_{l1} ≤ ε, we have that

‖x − ξ‖_{l1} ≤ f(ε),

where x is a minimizer of inf{‖η‖_{l1} : P_Ω U η = P_Ω U ζ}, then we say that {U, Ω, ∆} is locally f-stable at ξ.
If f (t) = Ct for some constant C > 0 then {U, Ω, ∆} is said to be locally linearly stable at ξ. We say that
{U, Ω, ∆} is globally f -stable (linearly stable) if the above statements hold for all ξ ∈ H, supp(ξ) = ∆,
such that ξ is the unique minimizer of (10.3).
Proposition 10.2. Let U ∈ B(H) be unitary and let Ω, ∆ be finite subsets of N. Suppose that {U, Ω, ∆} is
globally f -stable. Suppose also that there exists x ∈ H, supp(x) = ∆, such that x is the unique minimizer
of inf{kηkl1 : PΩ U η = PΩ U x}. Then, if (PΩ U P∆ )∗ PΩ U P∆ |P∆ H is invertible, and y ∈ H, supp(y) = ∆,
is arbitrary, then y is the unique minimizer of inf{kηkl1 : PΩ U η = PΩ U y}.
Proposition 10.3. Let U ∈ B(H) be unitary and let Ω, ∆ be finite subsets of N. Suppose that for any
ξ ∈ H, supp(ξ) = ∆, then ξ is the unique minimizer of inf{kηkl1 : PΩ U η = PΩ U ξ}, and also that
(PΩ U P∆ )∗ PΩ U P∆ |P∆ H is invertible. Then, {U, Ω, ∆} is globally linearly stable.
These results establish the relationship between global stability and the existence of unique minimizers
(proofs are given in the Appendix). In particular, existence of unique minimizers for all y with supp(y) = ∆
is (almost) equivalent to global stability. Thus, global stability is a rather strict condition and may be difficult
to achieve. However, we will be concerned with a fixed signal to recover and hence global stability may not
be necessary. Conditions in order to establish local stability are the topic in the next section.
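Local stability in the sense of Definition 10.1 is easy to probe experimentally: perturb the data by a small h and measure how far the minimizer moves, which is precisely the quantity bounded in Proposition 10.4 below. The following sketch does this on a finite-dimensional surrogate, again assuming a DFT matrix and the cvxpy package purely for illustration.

```python
import numpy as np
import cvxpy as cp

def l1_min(U, Omega, b):
    """min ||eta||_1 subject to P_Omega U eta = b (finite-dimensional surrogate of (10.1))."""
    eta = cp.Variable(U.shape[1], complex=True)
    cp.Problem(cp.Minimize(cp.norm1(eta)), [U[Omega, :] @ eta == b]).solve()
    return eta.value

rng = np.random.default_rng(2)
n, m = 128, 50
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
Delta = rng.choice(n, 6, replace=False)
x0 = np.zeros(n, dtype=complex); x0[Delta] = 1.0
Omega = rng.choice(n, m, replace=False)

xi = l1_min(U, Omega, (U @ x0)[Omega])                 # unperturbed data
h = 1e-3 * rng.standard_normal(n)                      # small perturbation
xi_pert = l1_min(U, Omega, (U @ (x0 + h))[Omega])      # perturbed data
print(np.linalg.norm(xi - x0), np.linalg.norm(xi_pert - x0), np.linalg.norm(h, 1))
```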
10.2
The key result
The key result of this section, which will later lead to the proofs of Theorems 7.1–7.3, is the following:
Proposition 10.4. Let U ∈ B(H) with ‖U‖ ≤ 1, and suppose that ∆ and Ω are finite subsets of N. Let
x0, h ∈ H be such that supp(x0) = ∆ and ‖h‖_{l1} < ∞, and suppose that ∆ ⊂ {1, . . . , M} for some
M ∈ N. Let ξ, ξM ∈ H satisfy

‖ξ‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U η = P_Ω U (x0 + h)},   (10.4)

‖ξM‖_{l1} = inf_{η∈H} {‖η‖_{l1} : P_Ω U PM η = P_Ω U (x0 + PM h)}.

If there exists ρ ∈ ran(U* P_Ω) and q > 0 with the following properties:

(i) ‖q⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖ ≤ 1/2,
(ii) ‖P_∆ ρ − sgn(x0)‖ ≤ q/8,
(iii) ‖P_∆^⊥ ρ‖_{l∞} ≤ 1/2,

then

‖ξ − x0‖ ≤ (20/q + 10 + q/2) ‖h‖_{l1}.   (10.5)

Also, if (i) and (ii) hold and (iii) is replaced with ‖PM P_∆^⊥ ρ‖_{l∞} ≤ 1/2 then

‖ξM − x0‖ ≤ (20/q + 10 + q/2) ‖PM h‖_{l1}.   (10.6)
Proof. Note that (10.4) and (i) yield

P_Ω U (x0 − P_∆ ξ) = P_Ω U (P_∆^⊥ ξ − h)
⇒ P_∆ U* P_Ω U (x0 − P_∆ ξ) = P_∆ U* P_Ω U (P_∆^⊥ ξ − h)
⇒ x0 − P_∆ ξ = (P_∆ U* P_Ω U P_∆)⁻¹ P_∆ U* P_Ω U (P_∆^⊥ ξ − h)   (10.7)

(note that (i) implies that P_∆ U* P_Ω U P_∆ is invertible). Hence, from (i) and (10.7), and by using the fact that
‖U‖ ≤ 1, we obtain

‖x0 − P_∆ ξ‖ ≤ (2/q) ‖P_∆^⊥ ξ − h‖.   (10.8)

Thus,

‖x0 − ξ‖ ≤ (2/q) ‖P_∆^⊥ ξ − h‖ + ‖P_∆^⊥ ξ‖ ≤ (2/q + 1) ‖P_∆^⊥ ξ‖_{l1} + (2/q) ‖h‖_{l1}.   (10.9)

The rest of the proof is therefore devoted to showing that ‖P_∆^⊥ ξ‖_{l1} is bounded by a constant times ‖h‖_{l1}.
Note that the fact that ρ ∈ ran(U* P_Ω) and P_Ω U (ξ − (x0 + h)) = 0 implies that ⟨ξ, ρ⟩ = ⟨x0 + h, ρ⟩.
Thus, it follows, by appealing to (iii), that

Re(⟨x0, ρ⟩) + Re(⟨h, ρ⟩) = Re(⟨ξ, ρ⟩) ≤ Re(⟨ξ, P_∆ ρ⟩) + (1/2) Σ_{j∈∆^c} |ξ(j)|.   (10.10)

Hence, from (10.10), (ii) and (iii) we get

Re(⟨x0 − ξ, P_∆ ρ⟩) − (1 + q/8) ‖h‖_{l1} ≤ (1/2) Σ_{j∈∆^c} |ξ(j)|,

and, by some simple adding and subtracting, we obtain

Re(⟨x0 − ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1} − (1 + q/8) ‖h‖_{l1} ≤ −(1/2) ‖P_∆^⊥ ξ‖_{l1}.   (10.11)

We return to this equation, but for the meantime we will continue to investigate the quantity Re(⟨x0 −
ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1}. Note that

‖x0‖_{l1} − ‖ξ‖_{l1} + Re(⟨x0 − ξ, P_∆ ρ − sgn(x0)⟩) ≤ Re(⟨x0 − ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1}.   (10.12)

Moreover, because of (10.4) and (10.12), it follows that

−‖h‖_{l1} + Re(⟨x0 − ξ, P_∆ ρ − sgn(x0)⟩) ≤ Re(⟨x0 − ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1},   (10.13)

so by appealing to (10.13), (10.8) and (ii) we find that

−‖h‖_{l1} − (1/4) ‖P_∆^⊥ ξ − h‖ ≤ Re(⟨x0 − ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1},

and thus

−(5/4) ‖h‖_{l1} − (1/4) ‖P_∆^⊥ ξ‖_{l1} ≤ Re(⟨x0 − ξ, P_∆ ρ⟩) − ‖P_∆^⊥ ξ‖_{l1}.

By inserting the latter into (10.11) we finally obtain

‖P_∆^⊥ ξ‖_{l1} ≤ (9 + q/2) ‖h‖_{l1}.   (10.14)

Substituting (10.14) into (10.9) now yields (10.5). The proof of (10.6) is almost identical, and we omit the
details.
11
Proofs of the main results
11.1
The Idea
Before we present proofs of Theorems 7.1–7.3, we would like to sketch the key ideas. Our approach is to
use Proposition 10.4 to show the existence of some ρ ∈ ran(U* P_Ω) with the following properties:

(i) ‖θ⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖ ≤ 1/2,
(ii) ‖P_∆ ρ − sgn(x0)‖ ≤ θ/8,
(iii) ‖PM P_∆^⊥ ρ‖_{l∞} ≤ 1/2,

for some θ > 0 (recall the setup in Theorems 7.1 and 7.2).
Throughout the paper we will be concerned with randomly choosing a set Ω ⊂ {1, . . . , N }. In our
models we will choose Ω uniformly at random, however, in some of the proofs we will also use another
approach that renders the analysis possible, whilst not affecting the model unduly. We will typically take a
sequence {δ1 , . . . δN } of independent identically distributed Bernoulli random variables taking values 0 and
1 with P(δj = 1) = q for all j and let Ω = {j : δj = 1}. We will refer to this type of random selection of Ω
as the Bernoulli model and we will denote such a procedure by {N, . . . , 1} ⊃ Ω ∼ Ber(q).
We will assume that {N, . . . , 1} ⊃ Ω ∼ Ber(θ), for some finite N ∈ N. However, we will construct Ω
in an equivalent, but slightly different way. Namely, we let
Ω = Ω1 ∪ Ω2 ∪ · · · ∪ Ωµ ,
Ωj ∼ Ber(qj ),
where the specific value of µ will be determined later. Note that as long as the qj s are chosen according to θ
this is equivalent to letting Ω ∼ Ber(θ). Indeed, we have that Ω ∼ Ber(θ) is equivalent to Ωc ∼ Ber(1 − θ).
So, for k ∈ {1, . . . , N }, we have
P(k ∈ Ωc ) = (1 − θ),
where Ωc = {1, . . . , N }\Ω. But
P(k ∈ (Ω1 ∪ Ω2 ∪ · · · ∪ Ωµ )c ) = (1 − q1 )(1 − q2 ) · · · (1 − qµ ).
Thus, if we let
(1 − q1 )(1 − q2 ) · · · (1 − qµ ) = (1 − θ)
(11.1)
it is easy to see (by independence) that the two models are equivalent. Note that, obviously, there might be
overlaps between the Ωj s. This automatically gives us the following:
q1 + q2 + . . . + qµ ≥ θ.
This fact will be used several times in the arguments that follow and is a very crucial observation. We can
now present the Golfing Scheme.
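The equivalence between drawing Ω ∼ Ber(θ) directly and drawing it as a union Ω₁ ∪ · · · ∪ Ω_µ with Ω_j ∼ Ber(q_j) subject to (11.1), together with the resulting inequality q₁ + · · · + q_µ ≥ θ, can be checked with a few lines of Monte Carlo; the equal choice of the q_j's below is an assumption made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, mu, trials = 0.3, 5, 200000

# equal q_j's satisfying (1 - q)^mu = 1 - theta, i.e. equation (11.1)
q = 1 - (1 - theta) ** (1.0 / mu)

hits_direct = (rng.random(trials) < theta).mean()               # P(k in Omega), Omega ~ Ber(theta)
hits_union = (rng.random((trials, mu)) < q).any(axis=1).mean()  # P(k in Omega_1 ∪ ... ∪ Omega_mu)
print(hits_direct, hits_union)                                   # both approximately theta
print(mu * q, mu * q >= theta)                                   # q_1 + ... + q_mu >= theta
```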
11.2
The Golfing Scheme
Let U ∈ B(H) be an isometry and let {N, . . . , 1} ⊃ Ωj ∼ Ber(qj ) for j = 1, . . . , µ for some µ ∈ N where
the qj's satisfy (11.1) for some 0 < θ ≤ 1. Suppose also that x0 ∈ H. Define the operators

E_{Ωj} = U* P_{Ωj} U,    j = 1, . . . , µ.

The construction of ρ is based on the following idea. Let

ρ = Yµ,    Yi = Σ_{j=1}^{i} qj⁻¹ E_{Ωj} Z_{j−1},    Zi = sgn(x0) − P_∆ Yi,    Z0 = sgn(x0),   (11.2)
where the specific value of µ will be determined later. The construction suggested in (11.2) will be referred to
as the golfing scheme, and is a variant of the extremely useful original golfing scheme introduced in [27] by
David Gross. The actual construction will differ slightly from the one suggested here, however, this should
give the reader an idea about the approach. Before we can prove the theorems we need to establish some
results that will be crucial in the construction of ρ.
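The plain recursion (11.2) is simple to run numerically. The sketch below performs the iteration for a finite unitary matrix and reports how close P_∆ ρ is to sgn(x0) and how large the off-support entries of ρ are. The actual proofs use a refinement of this recursion that discards 'bad' iterations (Step I below), so this is only an illustration of the basic idea, with the matrix and parameters chosen arbitrarily.

```python
import numpy as np

def golfing_certificate(U, Delta, Omegas, qs, x0):
    """Plain golfing recursion (11.2):
       Z_0 = sgn(x0),  Y_i = Y_{i-1} + q_i^{-1} E_{Omega_i} Z_{i-1},  Z_i = sgn(x0) - P_Delta Y_i,
    returning rho = Y_mu."""
    n = U.shape[1]
    sgn = np.zeros(n, dtype=complex)
    sgn[Delta] = x0[Delta] / np.abs(x0[Delta])
    Z = sgn.copy()                                   # Z_0 = sgn(x0)
    Y = np.zeros(n, dtype=complex)
    for Om, q in zip(Omegas, qs):
        Y = Y + U[Om, :].conj().T @ (U[Om, :] @ Z) / q   # E_{Omega_i} Z_{i-1} = U^* P_{Omega_i} U Z_{i-1}
        Z = np.zeros(n, dtype=complex)
        Z[Delta] = sgn[Delta] - Y[Delta]             # Z_i = sgn(x0) - P_Delta Y_i
    return Y                                         # rho = Y_mu

rng = np.random.default_rng(4)
n, mu, theta = 128, 8, 0.5
q = 1 - (1 - theta) ** (1.0 / mu)
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
Delta = rng.choice(n, 5, replace=False)
x0 = np.zeros(n, dtype=complex); x0[Delta] = 1.0
Omegas = [np.flatnonzero(rng.random(n) < q) for _ in range(mu)]
rho = golfing_certificate(U, Delta, Omegas, [q] * mu, x0)
off = np.setdiff1d(np.arange(n), Delta)
print(np.abs(rho[Delta] - x0[Delta]).max())          # distance of P_Delta rho from sgn(x0)
print(np.abs(rho[off]).max())                        # size of the off-support entries of rho
```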
11.3
The Proofs
We first require the following three results. Proofs are found in the Appendix:
Proposition 11.1. Let U ∈ B(H) be an isometry. Let {N, . . . , 1} ⊃ Ω ∼ Ber(q) for some 0 < q ≤ 1, and
∆ ⊂ N with |∆| < ∞. Also, let M ∈ N be so large that ∆ ⊂ {1, . . . , M} and define E_Ω = U* P_Ω U. Then,
for η ∈ H and t, γ > 0,

P( ‖q⁻¹ PM P_∆^⊥ E_Ω P_∆ η‖_{l∞} > (t + ‖PM P_∆^⊥ U* P_N U P_∆‖_mr) ‖η‖ ) ≤ γ   (11.3)

provided

q ≥ ( 4/t² + 2√2/(3t) ) √|∆| · log( (4/γ) |∆^c ∩ {1, . . . , M}| ) · υ²(U).

Also,

P( ‖q⁻¹ P_∆^⊥ E_Ω P_∆ η‖_{l∞} > (t + ‖P_∆^⊥ U* P_N U P_∆‖_mr) ‖η‖ ) ≤ γ   (11.4)

whenever

q ≥ ( 4/t² + 2√2/(3t) ) √|∆| · log(4ω/γ) · υ²(U),

where ω = ω̃_{M,U}(|∆|, tq, N) (recall ω̃_{M,U} from (6.2)). In addition, if q = 1, the left-hand sides of (11.3) and
(11.4) are equal to zero.
Proposition 11.2. Let U ∈ B(H) be an isometry, ∆ ⊂ N with |∆| < ∞ and {N, . . . , 1} ⊃ Ω ∼ Ber(q) for
some 0 < q ≤ 1. Then, for η ∈ H and t, γ > 0, we have

P( ‖(q⁻¹ P_∆ U* P_Ω U P_∆ − P_∆) η‖ > (t + ‖P_∆ U* P_N U P_∆ − P_∆‖) ‖η‖ ) ≤ γ,

provided

q(1 − q)⁻¹ ≥ 4t⁻² · υ²(U) · |∆|,
and
t max{q −1 − 1, 1}
2K
K
−1
2
log 1 +
≥
max{q − 1, 1} · υ (U ) · |∆| · log
,
2q −1 (1 − q)
t
γ
where K is the constant in Talagrand’s Theorem 14.2.
Theorem 11.3. There exists a constant C > 0 with the following property. Suppose that U ∈ B(H) is an
isometry, ∆ a finite subset of N and {N, . . . , 1} ⊃ Ω ∼ Ber(θ) for some 0 < θ ≤ 1. Then, for ε > 0 and
γ > 0 we have that

P( ‖θ⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖ ≥ 1/γ + ‖P_∆ U* P_N U P_∆ − P_∆‖ ) ≤ ε,   (11.5)

provided that

θ ≥ C · γ · υ²(U) · |∆| · log(|∆|),
θ ≥ C · γ · υ²(U) · |∆| · log(Cε⁻¹) · ( log(1 + 1/(γ(1 − θ))) )⁻¹.   (11.6)

If θ = 1 then the left hand side of (11.5) is equal to zero.
With these results established, we can now embark on the task of proving the main theorems of this
paper.
Proof of Theorem 7.1 and Theorem 7.2. The set Ω ⊂ {1, . . . , N} is chosen uniformly at random with
|Ω| = m. By Proposition 10.4 it suffices to show that there exists a ρ ∈ ran(U* P_Ω) such that

(i) ‖θ⁻¹ P_∆ U* P_Ω U P_∆ − P_∆‖ ≤ 1/2,   (ii) ‖P_∆ ρ − sgn(x0)‖ ≤ θ/8,   (iii) ‖PM P_∆^⊥ ρ‖_{l∞} ≤ 1/2,   (11.7)
with large probability. Note that we may (without loss of generality) replace this way of choosing Ω with the
model that {N, . . . , 1} ⊃ Ω ∼ Ber(θ) for θ = m/N (θ will have this value throughout the proof). Doing
so may only change the constant C in (7.2). This trick has almost become standard in the literature and
we will thus skip the specifics (see [14, 15] for details). Note that, as discussed in Section 11.1, the model
{N, . . . , 1} ⊃ Ω ∼ Ber(θ) is equivalent to choosing Ω as
Ω = Ω1 ∪ Ω2 ∪ · · · ∪ Ωµ ,
Ωj ∼ Ber(qj ),
for some µ ∈ N with
(1 − q1 )(1 − q2 ) · · · (1 − qµ ) = (1 − θ).
(11.8)
The latter model is the one we will use throughout the proof and the specific value of µ will be chosen later.
The theorems will follow if we can show that the conditions in (11.7) occur with probability exceeding 1 − ,
and what follows is a setup to ensure this eventually. We will focus on (ii) and (iii) in (11.7) and deal with
(i) at the end of the proof. The proof will proceed in a number of steps.
Step I (The construction of ρ): Let ν be a positive number such that ν ≤ µ and let {α1 , . . . , αµ } and
{β1 , . . . , βµ } be sequences of positive numbers. The values of µ, ν, {αi }µi=1 and {βi }µi=1 will be carefully
chosen later in the proof. Consider now the following construction of ρ : let
Z0 = sgn(x0 ),
and define recursively the sequences {Zi }µi=0 ⊂ H, {Yi }µi=1 ⊂ H and {Θi }µi=1 ⊂ N as follows: first define
Zi = sgn(x0 ) − P∆ Yi ,
Yi =
i
X
qj−1 EΩj Zj−1 ,
i = 1, 2,
j=1
where EΩj = U ∗ PΩj U, and {q1 , . . . , qµ } stem from (11.8). The precise values of the qj ’s will be chosen
later. Let also Θ1 = {1} and Θ2 = {1, 2}. Then define recursively, for i ≥ 3, the following:

−1
≤ αi kZi−1 k ,

Θi−1 ∪ {i} if P∆ − qi P∆ EΩi P∆ Zi−1
⊥
Θi =
(11.9)
and qi−1 PM P∆
EΩi P∆ Zi−1 l∞ ≤ βi kZi−1 k,


Θi−1
otherwise,
(P
−1
j∈Θi qj EΩj Zj−1 if i ∈ Θi ,
Yi =
i ≥ 3,
Yi−1
otherwise,
(
sgn(x0 ) − P∆ Yi if i ∈ Θi ,
Zi =
i ≥ 3.
Zi−1
otherwise,
Now, let {Ai }2i=1 and {Bi }4i=1 denote the following events
P∆ − q −1 P∆ EΩi P∆ Zi−1 ≤ αi kZi−1 k ,
Ai :
i = 1, 2,
i
−1
⊥
q PM P∆ EΩi P∆ Zi−1 ∞ ≤ βi kZi−1 k,
Bi :
i = 1, 2,
i
l
B3 :
|Θµ | ≥ ν,
B4 :
(∩2i=1 Ai ) ∩ (∩3i=1 Bi ).
Also, let τ (j) denote the j th element in Θµ (e.g. τ (1) = 1, τ (2) = 2 etc.) and finally define ρ by
(
Yτ (ν)
if B4 occurs ,
ρ=
sgn(x0 ) otherwise.
(11.10)
Note that, clearly, ρ ∈ ran(U ∗ PΩ ) if B4 occurs. Now make the following observations: a straightforward
calculation (using the fact that Z0 = sgn(x0 )) yields
−1
−1
Zτ (i) = sgn(x0 ) − P∆ qτ−1
E
sgn(x
)
+
q
E
Z
+
.
.
.
+
q
E
Z
Ω
0
Ω
1
Ω
τ
(i−1)
τ (1)
τ (2)
τ (i)
(1)
τ (2)
τ (i)
(11.11)
−1
= (P∆ − qτ (i) P∆ EΩτ (i) P∆ )Zτ (i−1) ,
i ≤ |Θµ |.
Hence, if the event B4 occurs, we have
kP∆ ρ − sgn(x0 )k = kZτ (ν) k ≤
p
|∆|
ν
Y
ατ (i) ,
(11.12)
i=1
⊥
kPM P∆
ρkl∞ ≤
ν
X
⊥
∞
kqτ−1
(i) PM P∆ EΩτ (i) Zτ (i−1) kl
i=1
≤
ν
X
βτ (i) kZτ (i−1) k ≤
p
|∆|
ν
X
βτ (i)
i=1
i=1
i−1
Y
(11.13)
ατ (j) ,
j=1
and ρ ∈ ran(U ∗ PΩ ). We will now show that with a certain choice of parameters ν, {βj }µj=1 and {αj }µj=1
then (ii) and (iii) in (11.7) are satisfied when the event B4 occurs. We delay specifying a the value for µ until
Step IV. Let L ≥ 2, (we will give a value for L in a moment) and
α1 = α2 =
1
1/2
2 log2 (L)
1
β1 = β2 = p
,
4 |∆|
It follows that
,
αi = 1/2,
3 ≤ i ≤ µ,
p
log2 (4θ−1 |∆|)
p
βi =
,
4 |∆|
ν
Y
p
|∆|
ατ (i) =
i=1
3 ≤ i ≤ µ.
p
|∆|
.
2ν log2 (L)
Hence, if
l
p m
ν = log2 8θ−1 |∆| ,
(11.14)
then it follows by (11.12) that
kP∆ ρ − sgn(x0 )k ≤ θ/8
(recall that L ≥ 2) yielding (ii) in (11.7). Also, after inserting the values of ν, {βj }µj=1 and {αj }µj=1 into
(11.13) we get:
p
|∆|
ν
X
i=1
1
=
4
βτ (i)
i−1
Y
ατ (j)
j=1
!
p
p
p
1
1
1 log2 (4θ−1 |∆|) 1 log2 (4θ−1 |∆|)
1 log2 (4θ−1 |∆|)
1+
+
+
+ . . . + ν−1
2 log1/2 (L) 4
log2 (L)
8
log2 (L)
2
log2 (L)
2
1
≤ ,
2
if we let L = 4θ−1
p
|∆|. Thus, by (11.13) we have
⊥
kPM P∆
ρkl∞ ≤ 1/2,
yielding (iii) in (11.7). In particular, we have showed that, if ν, {βj }µj=1 and {αj }µj=1 are chosen as above,
then (ii) and (iii) are satisfied when B4 occurs.
Thus, we have now obtained a framework for proving (ii) and (iii) in (11.7) with a certain probability.
To do this, we will make a careful choice of µ and then provide bounds on P(B4c ). The way this latter step is
carried out is by giving estimates for P(Ac1 ∪ Ac2 ), P(B1c ∪ B2c ) and P(B3c ). This is the content of Steps II–IV.
Step II: We claim that, if γ > 0, then P(Ac1 ∪ Ac2 ) ≤ 2γ, provided N , q1 , q2 are chosen such that
kP∆ U ∗ PN U P∆ − P∆ k ≤
and
1
p
1/2
4 log2 (4θ−1 |∆|)
,
p q1 = q2 ≥ C · υ 2 (U ) · |∆| · log γ −1 + 1 · log θ−1 |∆| ,
(11.15)
(11.16)
for some universal constant C > 0. Also, if q1 = q2 = 1, then P(Ac1 ∪ Ac2 ) = 0.
To deduce the claim, we first observe that by Proposition 6.2 these requirements are well defined. Now
note that Proposition 11.2 gives, for i = 1, 2, t, γ > 0 and η ∈ H, that
P qi−1 P∆ U ∗ PΩi U P∆ − P∆ η > (t + kP∆ U ∗ PN U P∆ − P∆ k) kηk ≤ γ,
(11.17)
if
qi (1 − qi )−1 ≥ 4t−2 · υ 2 (U ) · |∆|,
(11.18)
and
2K
t max{qi−1 − 1, 1}
≥
log 1 +
max{qi−1 − 1, 1} · υ 2 (U ) · |∆| · log Kγ −1 ,
t
2qi−1 (1 − qi )
(11.19)
where K is the constant in Talagrand’s Theorem 14.2. Thus, by (11.17), (11.18) and (11.19) (and a small
computation using Taylor’s Theorem), we can choose t = αi /2 and deduce the first assertion in Step II. As
for the second assertion, clearly, if q1 = q2 = 1 then (11.18) and (11.19) are satisfied for all γ > 0. Thus the
left-hand side of (11.17) must be zero, and hence the last assertion follows.
Step III: We claim that, for γ > 0, then P(B1c ∪ B2c ) ≤ 2γ, if N , q1 and q2 are chosen such that
1
⊥ ∗
,
kPM P∆
U PN U P∆ kmr ≤ p
8 |∆|
(11.20)
q1 = q2 ≥ C · υ 2 (U ) · |∆| · log γ −1 M + 1 ,
(11.21)
and
for some universal constant C > 0. Also, if q1 = q2 = 1, then P(B1c ∪ B2c ) = 0.
To prove the claim, recall that Proposition 11.1 gives, for i = 1, 2 and t, γ > 0, that
⊥
⊥ ∗
EΩi P∆ η l∞ > (t + kPM P∆
U PN U P∆ k)kηk ≤ γ,
η ∈ H,
P qi−1 PM P∆
if
qj ≥
!
√
4
2 2p
4 c
+
|∆| · log
|∆ ∩ {1, . . . , M }| · υ 2 (U ).
t2
3t
γ
Choosing t = βi /2 automatically yields the first assertion in Step III. Also, the fact that
P(B1c ∪ B2c ) = 0,
q1 = q2 = 1,
follows automatically from Proposition 11.1.
Step IV: We claim that, for γ > 0, then P(B3c ) ≤ γ, if µ, (recall µ and ν from Step I) N and {q3 , . . . , qµ }
are chosen such that
µ = d4(log(γ −1 ) + ν)e,
(11.22)
kP∆ U ∗ PN U P∆ − P∆ k ≤ 1/4,
and
⊥ ∗
kPM P∆
U PN U P∆ kmr ≤
p
log2 (4θ−1 |∆|)
p
,
8 |∆|
(11.23)
(11.24)
and also q3 = q4 = . . . = qµ = q, where
2
q ≥ C · υ (U ) · |∆| ·
!
log (M )
p
+1 ,
log2 (4θ−1 |∆|)
(11.25)
for some universal constant C > 0. Also, if q3 = q4 = . . . = qµ = 1, then P(B3c ) = 0.
To prove the claim we start by determining the condition (11.22) on µ. Define the random variables
X1 , . . . Xµ−2 by
(
0 Zj+2 6= Zj+1 ,
Xj =
1 Zj+2 = Zj+1 .
We immediately observe that
P(B3c ) = P(|Θµ | < ν) = P(X1 + . . . + Xµ−2 > µ − ν).
Let P > 0 (a specific value for P will be assigned later) be such that
P ≥ P(Xj = 1),
j = 1, . . . , µ − 2,
e1 , . . . , X
eµ−2 be independent binary random variables, with P(X
ek = 1) = P and P(X
ek = 0) =
and let X
1 − P for each k. Then,
e1 + . . . + X
eµ−2 > µ − ν .
P (|Θµ | < ν) ≤ P X
Note that by the standard Chernoff bound ([34, Theorem 2.1]) it follows that, for t > 0,
e1 + . . . + X
eµ−2 ≥ (µ − 2)(t + P ) ≤ e−2(µ−2)t2 .
P X
(11.26)
If we let t = (µ − ν)/(µ − 2) − P , then (11.26) and some easy calculations give, for γ > 0, that
e1 + . . . + X
eµ−2 > µ − ν ≤ γ,
P X
whenever
µ ≥ x,
(x − 2)
x−ν
−P
x−2
2
− log(γ −1/2 ) = 0,
(11.27)
and x is the largest root satisfying the right equation of (11.27). In particular, we have shown that P(B3c ) ≤ γ
when (11.27) is satisfied. We now need to determine a P > 0 and find conditions such that the random
variables X1 , . . . , Xµ−2 have P(Xk = 1) ≤ P for each k. We choose P = 1/2. This choice of P gives (by
(11.27))
x ≤ 4(ν + log(γ −1/2 )),
hence (11.22) yields (11.27).
For the rest of the proof of Step IV we need to determine the conditions on N and {q3 , . . . , qµ } such that
P(Xk = 1) ≤ P for each k = 1, . . . , µ − 2. Note that Xk = 1 if and only if one of the following events
occur:
D1 : P∆ − qj−1 P∆ EΩj P∆ Zj−1 > αj kZj−1 k ,
j = k + 2,
−1
(11.28)
⊥
D2 : q PM P∆
EΩj P∆ Zj−1 > βj kZj−1 k,
j = k + 2.
j
l∞
Observe that we may argue exactly as in the proof of Step II (via Proposition 11.2) and deduce that P(D1 ) ≤
1/4 when N and qj , are chosen such that
kP∆ U ∗ PN U P∆ − P∆ k ≤ αj /2,
qj ≥ C · υ 2 (U ) · |∆| · αj−2 · (log (4) + 1) ,
j = k + 2,
(11.29)
for some universal constant C > 0. Observe also that we may argue exactly as in the proof of Step III (via
Proposition 11.1) and deduce that P(D2 ) ≤ 1/4 when N and qj are chosen such that
⊥ ∗
kPM P∆
U PN U P∆ kmr ≤ βj /2,
!
1
1p
2
+
|∆| · (log (4M ) + 1) ,
qj ≥ C · υ (U ) ·
βj2
βj
j = k + 2,
(11.30)
for some universal constant C > 0. Thus,
P(Xk = 1) ≤ P(D1 ∪ D2 ) ≤ P
whenever (11.29) and (11.30) are satisfied. But (11.29) and (11.30) follow from (11.23), (11.24) and (11.25)
(with a possibly different, however universal, C) and thus, the first part of the claim is proved. The fact that
if q3 = q4 = . . . = qµ = 1 then P(B3c ) = 0 follows from Propositions 11.2 and 11.1.
Step V: We claim that for γ > 0, then
⊥
P kP∆ ρ − sgn(x0 )k > θ/8 ∪ kPM P∆
ρkl∞ > 1/2 ≤ 5γ,
(11.31)
when N ∈ N and θ > 0 are chosen according to (6.3) and
p θ ≥ C · υ 2 (U ) · |∆| · log γ −1 + 1 · log M θ−1 |∆| ,
(11.32)
for some universal constant C > 0. Also, if θ = 1 then the left hand side of (11.31) is equal to zero.
To prove this, recall the events A1 , A2 , B1 , B2 , B3 , B4 from Step I. We have already established in Step
⊥
I that if the event B4 occurs then kP∆ ρ − sgn(x0 )k ≤ θ/8 and kPM P∆
ρkl∞ ≤ 1/2. It therefore suffices to
show that, given the conditions (6.3), (6.4) and (11.32), it holds that
P (B4c ) ≤ 5γ.
(11.33)
To do this we begin by making some observations. First
P (B4c ) ≤ P(Ac1 ∪ Ac2 ) + P(B1c ∪ B2c ) + P(B3c ),
(11.34)
q1 + q2 + . . . + qµ ≥ θ.
(11.35)
and second
P(Ac1
Ac2 )
Recall from Step II we have that
∪
≤ 2γ whenever (11.15) and (11.16) are satisfied. Also, by
Step III, P(B1c ∪ B2c ) ≤ 2γ whenever (11.20) and (11.21) are fulfilled. Finally, from Step IV we have that
P(B3c ) ≤ γ provided
l
l
p mm
(11.36)
µ = 4 log(γ −1 ) + log2 8θ−1 |∆|
and (11.23), (11.24) and (11.25) are satisfied. In particular, using (11.34) we find that (11.33) follows from
(11.15), (11.16), (11.20), (11.21), (11.23), (11.24) and (11.25). We must then show that these equations
follow from (6.3) and (11.32). Now let q1 = q2 = θ/4. Then, by (11.32), we have that (11.16) follows (with
a possibly different constant), and similarly (11.21) follows. Let q = q3 = . . . = qµ . By (11.35) and (11.36)
we have
l
l
p mm
8q log(γ −1 ) + log2 8θ−1 |∆|
≥ θ,
and hence (11.25) follows. The only thing left to do is to deal with the requirements on N . In particular, we
need to show that (11.15), (11.20), (11.23) and (11.24) follow when (6.3) is satisfied. Note that (11.23) and
(11.24) are weaker than (11.15) and (11.20). Thus, we only need to concentrate on (11.15) and (11.20). To
see that (6.3) and (6.4) imply (11.15) and (11.20), note that (since PM ≥ P∆ )
P∆ U ∗ PN U P∆ − P∆ = P∆ (PM U ∗ PN U PM − PM )P∆ ,
and so
kP∆ U ∗ PN U P∆ − P∆ k ≤ kPM U ∗ PN U PM − PM k.
Hence (11.15) follows from (6.3). The fact that (11.20) follows from (6.4) is clear. Also, the fact that the
left-hand side of (11.31) is equal to zero when θ = 1 follows from Steps II - IV and the fact that when θ = 1
we have q1 = . . . = qµ = 1.
Step VI: We claim that, for γ > 0,
P(kθ−1 P∆ U ∗ PΩ U P∆ − P∆ k > 1/2) ≤ γ,
when N ∈ N and θ > 0 are chosen such that
kP∆ U ∗ PN U P∆ − P∆ k ≤ 1/4,
θ ≥ C · υ 2 (U ) · |∆| · log γ −1 |∆| + 1 ,
(11.37)
for some universal constant C. Also, if θ = 1 then the left hand side of (11.37) is equal to zero.
To prove this claim note that, by Theorem 11.3, there is a K > 0 such that
1
P θ−1 (PΩ U P∆ )∗ PΩ U P∆ − P∆ ≥ + kP∆ U ∗ PN U P∆ − P∆ k ≤ γ,
4
provided
θ ≥ 4K · υ 2 (U ) · |∆| · log(|∆|),
and
θ ≥ 4K · υ 2 (U ) · |∆| · log(Cγ −1 ) · log 1 +
4
(1 − θ)
−1
.
This yields the asserted claim. The fact that the left hand side of (11.37) is equal to zero when θ = 1 is clear.
Step VII: In this final step we will patch the different parts of the proof together. Recall that our initial
goal was to show that (11.7) follows with probability exceeding 1 − . Note that in Step V we have shown
that if γ > 0, then (ii) and (iii) in (11.7) are satisfied with probability exceeding 1 − 5γ, provided (6.3) and
(11.32) is satisfied. We are thus only left to show that (i) follows with a certain probability. However, we
immediately recognize that the conditions in Step VI follow from (6.4) and (11.32), and hence (i) in (11.7)
follows with probability exceeding 1 − γ. This implies that (i), (ii) and (iii) in (11.7) hold with probability
exceeding 1 − 6γ. By choosing γ such that 6γ = we observe that (11.32) follows (with possibly a different
C) from the conditions in Theorems 7.1 and 7.2 and we have finally proved the first assertions in Theorem
7.1 and Theorem 7.2. The last assertions follow by the fact that θ = 1 when m = N , (and hence also
q1 = . . . = qµ = 1) and Step V - VI.
Proof of Theorem 7.3. We will follow the recipe from the of proof of Theorem 7.2 almost word for word,
and we will only point out where the differences lie. The first such difference is the set of conditions provided
by Proposition 10.4. In particular we must show that there exists a ρ ∈ ran(U ∗ PΩ ) such that
(i) kθ−1 P∆ U ∗ PΩ U P∆ − P∆ k ≤ 1/2,
(ii) kP∆ ρ − sgn(x0 )k ≤ θ/8
⊥
(iii) kP∆
ρkl∞ ≤ 1/2, (11.38)
is true with probability exceeding 1 − . (Note that only condition (iii) is changed from the proof of Theorem
7.2).
Step I: Almost as in the proof of Theorem 7.2, except that (11.9) should read

−1

Θi−1 ∪ {i} if P∆ − qi P∆ EΩi P∆ Zi−1 ≤ αi kZi−1 k ,
−1
⊥
Θi =
and qi P∆ EΩi P∆ Zi−1 l∞ ≤ βi kZi−1 k,


Θi−1
otherwise,
and the events B1 and B2 in (11.10) should be
−1 ⊥
q P∆ EΩj P∆ Zj−1 ≤ βj kZj−1 k,
Bj :
j
l∞
j = 1, 2.
Also, (11.13) must be changed to
⊥
kP∆
ρkl∞
≤
ν
X
⊥
∞
kqτ−1
(i) P∆ EΩτ (i) Zτ (i−1) kl
i=1
≤
ν
X
βτ (i) kZτ (i−1) k ≤
i=1
ν
i
X
Y
p
|∆|
βτ (i)
ατ (j) .
i=1
j=1
Step II: Exactly as in the proof of Theorem 7.1.
Step III: We claim that, for γ > 0, then P(B1c ∪ B2c ) ≤ 2γ, if N , q1 and q2 are chosen such that
1
⊥ ∗
kP∆
U PN U P∆ kmr ≤ p
,
8 |∆|
(11.39)
q1 = q2 ≥ C · υ 2 (U ) · |∆| · log γ −1 ω1 + 1 ,
(11.40)
and
where
p
ω1 = ω̃M,U (|∆|, q1 (8 |∆|)−1 , N ),
(recall ω̃M,U from (6.2)) for some universal constant C > 0. Also, if q1 = q2 = 1, then P(B1c ∪ B2c ) = 0.
The claim follows exactly as in the proof of Step III in the proof of Theorem 7.1 by using the last part of
Proposition 11.1.
Step IV: We claim that, for γ > 0, then P(B3c ) ≤ γ, if µ, (recall µ and ν from Step I) N and {q3 , . . . , qµ }
are chosen according to (11.22), (11.23) and
p
log2 (4θ−1 |∆|)
⊥ ∗
p
,
(11.41)
kP∆ U PN U P∆ kmr ≤
8 |∆|
and also that q3 = q4 = . . . = qµ = q, where
2
q ≥ C · υ (U ) · |∆| ·
and
ω2 = ω̃M,U
!
log (ω2 )
p
+1 ,
log2 (4θ−1 |∆|)
(11.42)
!
p
log2 (4θ−1 |∆|)
p
,N ,
|∆|, q
8 |∆|
(recall ω̃M,U from (6.2)) for some universal constant C > 0. Also, if q3 = q4 = . . . = qµ = 1, then
P(B3c ) = 0.
The proof is almost as in the proof of Theorem 7.2, except that the last part of (11.28) should read
⊥
EΩj P∆ Zj−1 l∞ > βj kZj−1 k,
D2 : qj−1 P∆
j = k + 2,
and (11.30) should be
⊥ ∗
kP∆
U PN U P∆ kmr ≤ βj /2,
!
p
1
1
qj ≥ C · υ 2 (U ) ·
+
|∆| · (log (4ω2 ) + 1) ,
βj2
βj
j = k + 2.
Step V: We claim that, for γ > 0,
⊥
P kP∆ ρ − sgn(x0 )k > θ/8 ∪ kP∆
ρkl∞ > 1/2 ≤ 5γ,
when N ∈ N and θ > 0 are chosen according to (6.3), (6.5) and
p θ ≥ C · υ 2 (U ) · |∆| · log γ −1 + 1 · log ωθ−1 |∆| ,
where
ω = ω̃M,U (|∆|, s, N ),
s=
(11.43)
(11.44)
θ
p
,
32 |∆| log(e4 γ −1 )
and ω̃M,U is defined in (6.2), for some universal constant C > 0. Also, if θ = 1 then the left hand side of
(11.43) is equal to zero.
The strategy is almost as in the proof of Step V in Theorem 7.1. In particular, we argue by using Step II
- IV that P (B4c ) ≤ 5γ when (11.15), (11.16), (11.39), (11.40), (11.23), (11.41) and (11.42) are satisfied, and
thus (11.43) follows. We then need to show that these equations follow from (6.3), (6.5) and (11.44). To do
this, let q1 = q2 = θ/4. Then, by (11.44), we have that (11.16) follows (with a possibly different constant).
To show that (11.40) is implied by (11.44) it suffices to show that ω ≥ ω1 . This will follow by the definition
(6.2) of ω̃M,U (recall that the mapping s 7→ ω̃M,U (|∆|, s, N ) is a decreasing function), and by observing that
p
−1
p
q1 (8 |∆|)−1 > s = θ 32 |∆| log(e4 γ −1 )
.
To show that (11.42) follows from (11.44) it suffices to show that ω ≥ ω2 . To do this (as argued above) it is
sufficient to prove that
p
log2 (4θ−1 |∆|)
p
q
≥ s.
(11.45)
8 |∆|
To see why the latter inequality is true, note that
q1 + q2 + . . . + qµ ≥ θ.
So, by recalling the value of µ (from (11.22)) from Step IV and noting that q = q3 = . . . = qµ , we get
l
l
p mm
8q log(γ −1 ) + log2 8θ−1 |∆|
≥ θ.
In particular, it follows that
q log2 (4θ−1
p
|∆|) ≥
θ log2 (4θ−1
p
|∆|)
θ
p
>
.
4 log(e4 γ −1 )
|∆|) + 1)
8(log(γ −1 ) + log2 (8θ−1
(11.46)
Thus, we have shown (11.45).
We are now left with the task of showing that (11.15), (11.39), (11.40), (11.23) and (11.41) follow from
(6.3) and (6.5), and this follows by arguing exactly as in the proof of Step V in the proof of Theorem 7.1
Step VI and Step VII: Exactly as in the proof of Theorem 7.1.
12
Conclusions and future work
We have presented a theoretical framework that allows for CS in infinite dimensions (see also [30, 37]
for related ideas). As a result of the theorems proved, one can reconstruct arbitrary signals in separable
Hilbert spaces from their measurements, and when the additional structure of sparsity is present, one can
also dramatically undersample. In developing this theory, we have extended current finite-dimensional CS
to an infinite-dimensional signal model.
This paper marks the first foray into a very large topic – infinite-dimensional CS – which brings with it
an abundance of new challenges and open problems. We now briefly describe a number of such issues.
Noisy data. In this paper we have assumed that the measurements ζ = U x0 are noiseless. However, in practice one always encounters noise, and therefore, rather than solving the equality-constrained
l1 -minimization problem
min_{η∈l1(N)} ‖η‖_{l1} subject to P_Ω U η = P_Ω ζ,   (12.1)
one instead considers

min_{η∈l1(N)} ‖η‖_{l1} subject to ‖P_Ω U η − P_Ω ζ‖ < ε,   (12.2)

where ε is appropriately chosen according to the noise level. Even in finite dimensions, analyzing (12.2)
is significantly more involved, and requires different techniques to those used for (12.1). In particular, in
finite dimensions the analysis of (12.2) is usually carried out via the Restricted Isometry Property (RIP)
[16]. When analyzing (12.2) in infinite dimensions we are faced with exactly the same questions as those
addressed in Theorems 7.1–7.3: namely, how large must N and m be, and, of course, how large is the error (in
terms of ε)? Unfortunately, any attempt to obtain conditions similar to those of Theorems 7.1–7.3 for (12.2)
by using an RIP framework is likely to be unsuccessful, since the RIP is a very strong condition. Fortunately,
recent developments in so-called RIPless CS in finite dimensions [13] indicate that such a condition need not
be necessary. Developing a full theory for the infinite-dimensional noisy case (12.2) is a subject of current
work.
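For reference, the inequality-constrained problem (12.2) is no harder to set up numerically than (12.1). A minimal finite-dimensional sketch, assuming the cvxpy package and a DFT matrix as the isometry, is as follows; the analysis of this formulation in infinite dimensions is, as discussed above, the subject of current work.

```python
import numpy as np
import cvxpy as cp

def l1_min_noisy(U, Omega, zeta, eps):
    """Noise-aware variant (12.2): min ||eta||_1 s.t. ||P_Omega U eta - P_Omega zeta|| < eps,
    with eps chosen according to the noise level.  Finite-dimensional surrogate only."""
    eta = cp.Variable(U.shape[1], complex=True)
    constraints = [cp.norm(U[Omega, :] @ eta - zeta[Omega], 2) <= eps]
    cp.Problem(cp.Minimize(cp.norm1(eta)), constraints).solve()
    return eta.value

rng = np.random.default_rng(7)
n, m = 128, 50
U = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
x0 = np.zeros(n, dtype=complex); x0[rng.choice(n, 5, replace=False)] = 1.0
Omega = rng.choice(n, m, replace=False)
noise = 1e-3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
zeta = U @ x0 + noise
print(np.linalg.norm(l1_min_noisy(U, Omega, zeta, eps=1e-3 * np.sqrt(m)) - x0))
```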
Incoherence and semi-random subsampling. The key results of this paper, Theorems 7.1–7.3, demonstrate
that subsampling is achievable. However, the amount of subsampling possible is limited by the incoherence
parameter υ(U ). Although in many cases of interest, this number is small, meaning that one can attain a high
degree of undersampling, this need not always be the case. Much as in finite-dimensional CS, there are cases
for which this cannot be avoided. However, there are numerous examples in infinite dimensions where the
overall incoherence υ(U) is large, but the incoherence of the ‘tail operator’ U Pk^⊥ tends to zero as k → ∞. This
is the phenomenon of local coherence: the first few columns of U are coherent, but the remainder are relatively
incoherent. This suggests a new approach, semi-random subsampling, where one carries out full sampling
on those indices corresponding to the coherent part of U , and random subsampling on the incoherent part.
This is currently work in progress, see [5] for details.
13
Acknowledgments
The authors would like to thank Akram Aldroubi, Emmanuel Candès, Holger Rauhut, Jared Tanner, Gerd
Teschke, Joel Tropp, Martin Vetterli, Christopher White and Pengchong Yan for useful discussions and
comments.
14
Appendix
The appendix contains all the proofs that have not been displayed so far. However, before we do this, there are
two results that are absolutely crucial. The first is due to Rudelson [38].
Lemma 14.1. (Rudelson) Let η1, . . . , ηM ∈ C^n and let ε1, . . . , εM be independent Bernoulli variables taking
values 1, −1 with probability 1/2. Then

E ‖ Σ_{i=1}^{M} εi η̄i ⊗ ηi ‖ ≤ (3/2) √(log(n)) · max_{i≤M} ‖ηi‖ · ( ‖ Σ_{i=1}^{M} η̄i ⊗ ηi ‖ )^{1/2}.
Note that the original lemma in [38] does not apply in this case. Actually, we need the complex version
proved in [40]. We will, however, still refer to it as Rudelson’s Lemma. One should also note that the original
lemma only has a constant C in the bound. This constant has been bounded by 3/2 by Tropp in [40].
The following theorem is also indispensable [39]:
Theorem 14.2. (Talagrand) There exists a number K with the following property. Consider n independent
random variables Xi valued in a measurable space Ω. Consider a (countable) class F of measurable
functions on Ω. Consider the random variable Z = sup_{f∈F} Σ_{i≤n} f(Xi). Consider

S = sup_{f∈F} ‖f‖_∞,    V = E( sup_{f∈F} Σ_{i≤n} f(Xi)² ).

Then for each t > 0, we have

P(|Z − E(Z)| ≥ t) ≤ K exp( −(t/(KS)) log(1 + tS/V) ).
Proof of Proposition 9.5. Suppose that x0 is the unique minimizer of (9.6). For n ∈ N let Pn be the
projection onto span{e1 , . . . , en }. Then, for all sufficiently large n, it follows that x0 is the unique minimizer
to the finite-dimensional optimization problem
inf{kxkl1 : x ∈ Pn H, PΩ U Pn x = PΩ U x0 }.
Proposition 9.5 is well known to be true in finite dimensions. It follows that there is a yn such that, for
ρn = Pn U* P_Ω yn, we have |⟨ρn, ej⟩| < 1 when j ∉ ∆ and j ≤ n, and ⟨ρn, ej⟩ = sgn(⟨x0, ej⟩) for j ∈ ∆.
We now claim that there is a constant M < ∞ such that kyn kl∞ ≤ M for all large n. First note that if we
consider U ∗ PΩ and Pn U ∗ PΩ as operators from PΩ H to c0 (the set of sequences decaying at infinity) where
both PΩ H and c0 are equipped with the l∞ norm, then kU ∗ PΩ − Pn U ∗ PΩ k → 0 as n → ∞. Also, it is
clear (recall that U is unitary) that there is a γ > 0 such that

inf_{ξ ∈ P_Ω H, ‖ξ‖_{l∞} = 1} ‖U* P_Ω ξ‖_{l∞} ≥ γ,

and therefore, by the operator norm convergence, there is a γ̃ > 0 such that

inf{ ‖P_n U* P_Ω ξ‖_{l∞} : ξ ∈ P_Ω H, ‖ξ‖_{l∞} = 1 } ≥ γ̃

for all sufficiently large n, and the claim follows. Now choose n so large that

‖P_n^⊥ U* P_Ω ξ‖_{l∞} < 1,   ∀ ξ ∈ P_Ω H, ‖ξ‖_{l∞} ≤ M.
Then, since kyn kl∞ ≤ M we can define ρ = U ∗ PΩ yn . Then ρ = ιρn + Pn⊥ U ∗ PΩ yn , where ι : Pn H → H
is the inclusion map, and thus ρ satisfies the requirements (i), (ii) and (iii).
As for the other direction, suppose that (i), (ii) and (iii) are satisfied and that PΩ U P∆ has full rank. Then
there is a ρ ∈ l^∞(N) such that ρ = U* P_Ω y for some y ∈ P_Ω H and ‖ρ‖_{l∞} ≤ 1. Also, by (ii),

Re(⟨P_Ω U P_∆ x_0, y⟩) = Re(⟨x_0, P_∆ ρ⟩) = Σ_{j∈∆} sgn(⟨x_0, e_j⟩) ⟨x_0, e_j⟩ = ‖x_0‖_{l1}.
Thus, by using duality (recall Proposition 9.3), in particular the fact that P_Ω U : H → P_Ω H is onto (this follows since U is unitary) and that

inf{ ‖x‖_{l1} : P_Ω U x = P_Ω U x_0 } = sup{ Re(⟨P_Ω U x_0, y⟩) : ‖U* P_Ω y‖_{l∞} ≤ 1 },

it follows that x_0 is a minimizer. But |⟨ρ, e_j⟩| < 1 for j ∉ ∆, so if ξ is another minimizer then supp(ξ) ⊆ ∆. However, P_Ω U P_∆ has full rank, so ξ = x_0.
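The finite-dimensional construction behind this argument can be illustrated numerically: build the least-squares dual certificate ρ = U* P_Ω y with y = P_Ω U P_∆ ((P_Ω U P_∆)* P_Ω U P_∆)^{-1} sgn(x_0), and verify that P_∆ ρ = sgn(x_0) and ‖P_{∆^c} ρ‖_{l∞} < 1. The sketch below does this for a small unitary DFT matrix; the sizes of Ω and ∆ are arbitrary illustrative choices, and for unfavourable choices the off-support bound may of course fail.

# Illustrative dual-certificate check for a finite unitary DFT matrix.
import numpy as np

rng = np.random.default_rng(2)
n = 256
U = np.fft.fft(np.eye(n)) / np.sqrt(n)                  # unitary DFT matrix
Delta = np.sort(rng.choice(n, 6, replace=False))        # support of x0
Omega = np.sort(rng.choice(n, 80, replace=False))       # sampled indices

x0 = np.zeros(n, dtype=complex)
x0[Delta] = rng.standard_normal(6) + 1j * rng.standard_normal(6)
sgn = x0[Delta] / np.abs(x0[Delta])

A = U[np.ix_(Omega, Delta)]                             # P_Omega U P_Delta as a matrix
y = A @ np.linalg.solve(A.conj().T @ A, sgn)            # y in P_Omega H
rho = U[Omega, :].conj().T @ y                          # rho = U^* P_Omega y

off = np.delete(np.arange(n), Delta)
print('P_Delta rho == sgn(x0):', np.allclose(rho[Delta], sgn))
print('||P_{Delta^c} rho||_inf =', np.abs(rho[off]).max(), '(want < 1)')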
Proof of Proposition 10.2. Let α = |∆| and also ω = {ω_j}_{j=1}^α be a sequence, where ω_j ∈ C. Now define

V_ω = I_{∆^c} ⊕ S_ω : P_∆^⊥ H ⊕ P_∆ H → P_∆^⊥ H ⊕ P_∆ H,   (14.1)

where S_ω = diag({ω_j}_{j=1}^α) on P_∆ H and I_{∆^c} is the identity on P_∆^⊥ H. Define U(ω) = U V_ω. Note that to prove our claim it suffices to show that V_ω x is the unique minimizer of inf{ ‖η‖_{l1} : P_Ω U η = P_Ω U(ω) x } for all ω, where

ω ∈ Λ = { (e^{iθ_1}, . . . , e^{iθ_α}) ∈ C^α : θ_j ∈ [0, 2π), 1 ≤ j ≤ α }.   (14.2)
Indeed, if that is the case then, by Proposition 9.5, for every ω ∈ Λ there exists ζ_ω ∈ P_Ω H such that

π_ω = U* P_Ω ζ_ω,   P_∆ π_ω = sgn(V_ω x),   ‖P_{∆^c} π_ω‖_{l∞} < 1.   (14.3)
Thus, for any y ∈ H such that supp(y) = ∆ choose ω ∈ Λ such that sgn(y) = sgn(V_ω x). Then, since (P_Ω U P_∆)* P_Ω U P_∆ |_{P_∆ H} is invertible, it follows by (14.3) and Proposition 9.5 that y is the unique minimizer of inf{ ‖η‖_{l1} : P_Ω U η = P_Ω U y }. Note also that if ω ∈ Λ then V_ω is clearly unitary and also an isometry on l^1(N). Thus, it is easy to see that V_ω ζ is a minimizer of inf{ ‖η‖_{l1} : P_Ω U η = P_Ω U(ω) x } if and only if ζ is a minimizer of inf{ ‖η‖_{l1} : P_Ω U(ω) η = P_Ω U(ω) x }. We will therefore consider the latter minimization problem and show that x is the unique minimizer for all ω ∈ Λ. To do that, it suffices, by Proposition 9.5 and the fact that U(ω) is unitary, to show that there exists a vector ρ ∈ H such that

P_{Ω^c} U(ω) ρ = 0,   P_∆ ρ = sgn(x),   ‖P_{∆^c} ρ‖_{l∞} < 1.   (14.4)
Now, for ε > 0 (we will specify the value of ε later), define the function ϕ : ∪_{a∈Λ} B(a, ε) → R_+, where B(a, ε) denotes the ε-ball around a, in the following way. Let

W = I_∆ ⊕ P_{Ω^c} U P_{∆^c} : P_∆ H ⊕ P_{∆^c} H → P_∆ H ⊕ P_{Ω^c} H,

and define

ϕ(ω) = inf{ ‖P_{∆^c} ρ‖_{l∞} : W ρ = ι*_∆ sgn(x) ⊕ (−P_{Ω^c} U(ω) P_∆ ι*_∆ sgn(x)) },

where ι_∆ : P_∆ H → H is the inclusion operator. Then (14.4) is satisfied if and only if ϕ(ω) < 1. Thus, to show (14.4) we must show that ϕ(ω) < 1 for all ω ∈ Λ.
Suppose for the moment that ε is chosen such that ϕ is defined on its domain. We will show that ϕ is continuous. It suffices to show that ϕ is continuous on B(a, ε) for a ∈ Λ. Note that, by the fact that B(a, ε) is open, it suffices to show that ϕ is convex. To see that ϕ is convex, let ω_1, ω_2 ∈ B(a, ε) and t ∈ (0, 1). Let also ξ, η ∈ H be such that

W ξ = ι*_∆ sgn(x) ⊕ (−P_{Ω^c} U(ω_1) P_∆ ι*_∆ sgn(x)),
W η = ι*_∆ sgn(x) ⊕ (−P_{Ω^c} U(ω_2) P_∆ ι*_∆ sgn(x)).

Note that the existence of such vectors is guaranteed by the assumption that ϕ is defined on its domain. Now, observe that

ϕ(tω_1 + (1 − t)ω_2) ≤ ‖P_{∆^c}(tξ + (1 − t)η)‖_{l∞} ≤ t ‖P_{∆^c} ξ‖_{l∞} + (1 − t) ‖P_{∆^c} η‖_{l∞}.
Thus, taking the infimum on the right hand side yields ϕ(tω_1 + (1 − t)ω_2) ≤ tϕ(ω_1) + (1 − t)ϕ(ω_2), and we have shown the assertion that ϕ is convex. Returning to the question of the domain of ϕ, note that if (P_Ω U P_∆)* P_Ω U P_∆ |_{P_∆ H} is invertible, then

(P_Ω U(ω) P_∆)* P_Ω U(ω) P_∆ |_{P_∆ H}

is invertible if ‖U(ω) − U(ω̃)‖ is small for some ω̃ ∈ Λ. Letting

ρ = U(ω)* P_Ω U(ω) P_∆ ((P_Ω U(ω) P_∆)* P_Ω U(ω) P_∆ |_{P_∆ H})^{-1} sgn(x)

we get

P_{Ω^c} U P_{∆^c} ρ = −P_{Ω^c} U(ω) P_∆ sgn(x).

Thus, ϕ is defined on its domain for ε small.
Let Γ denote the subset of all ω ∈ Λ such that x is the unique minimizer of inf{ ‖η‖_{l1} : P_Ω U(ω) η = P_Ω U(ω) x }. Note that Γ is closed. Indeed, if ω ∈ Λ and {ω_n} ⊂ Γ is a sequence such that ω_n → ω then ω ∈ Γ. To see that, observe that since {U, Ω, ∆} is weakly f-stable, it follows that for ξ ∈ H satisfying

‖ξ‖_{l1} = inf{ ‖η‖_{l1} : P_Ω U(ω) η = P_Ω U(ω) x }

we have

‖ξ − x‖_{l1} ≤ f(‖ω − ω_n‖_{l∞}),   ∀ n ∈ N.

Thus, ξ = x and hence ω ∈ Γ.
Note also that Γ is open. Indeed, if ω̃ ∈ Γ then there exists ρ ∈ H such that ρ satisfies (14.4) (with ω replaced by ω̃), i.e. ϕ(ω̃) < 1. But, by continuity of ϕ, it follows that ϕ is strictly less than one on a neighborhood of ω̃. Since (P_Ω U P_∆)* P_Ω U P_∆ |_{P_∆ H} is invertible, it is easy to see that (P_Ω U(ω) P_∆)* P_Ω U(ω) P_∆ |_{P_∆ H} is invertible for all ω ∈ Λ. Thus it follows by Proposition 9.5 that (14.4) is satisfied for all ω ∈ Λ in a neighborhood of ω̃, and hence Γ is open.

The fact that Γ is open and closed yields that either Γ = ∅ or Γ = Λ. The fact that (1, . . . , 1) ∈ Γ by assumption yields the proposition.
Proof of Proposition 10.3. Let Vω and Λ be defined as in (14.1) and (14.2) respectively. Suppose that y ∈ H
such that supp(y) = ∆. Then, by assumption, Vω y is the unique minimizer of inf{kηkl1 : PΩ U η =
PΩ U Vω y}. Thus, by Proposition 9.5 it follows that there exists a ρω ∈ H such that
P_{Ω^c} U ρ_ω = 0,   P_∆ ρ_ω = sgn(V_ω y),   ‖P_{∆^c} ρ_ω‖_{l∞} < 1.   (14.5)
Let β = supω∈Λ {kP∆c ρω kl∞ }. Note that β < 1, since Λ is closed. Thus, for every y ∈ H with supp(y) =
∆ there exists ρω ∈ H satisfying (14.5) where kP∆c ρω kl∞ ≤ β. It is now easy to show that (see the proof
of Lemma 2.1 in [17]) there exists a constant C > 0 (depending on β) such that, if ξ ∈ H, supp(ξ) = ∆, is
the unique minimizer of inf{kηkl1 : PΩ U η = PΩ U ξ}, ζ ∈ H and x is a minimizer of inf{kηkl1 : PΩ U η =
PΩ U ζ} then kP∆c xkl1 ≤ Ckξ − ζkl1 . Thus, since
PΩ U P∆ (x − ξ) = PΩ U (ζ − ξ) − PΩ U P∆c x,
and (PΩ U P∆ )∗ PΩ U P∆ |P∆ H is invertible, the proposition follows.
Proof of Proposition 11.1. Without loss of generality we may assume that ‖η‖ = 1. Let {δ_j}_{j=1}^N be random Bernoulli variables with P(δ_j = 1) = q. We will split the proof into two steps, where we will prove the finite-dimensional part of the proposition in Step I, and then tweak these ideas to fit the infinite-dimensional part of the proposition in Step II.
Step I: We start by noting that, clearly (by using the fact that U is an isometry), we have

q^{-1} P_M P_∆^⊥ E_Ω P_∆ η = q^{-1} Σ_{j=1}^N P_M P_∆^⊥ U* δ_j (e_j ⊗ e_j) U P_∆ η
  = q^{-1} Σ_{j=1}^N P_M P_∆^⊥ U* (δ_j − q)(e_j ⊗ e_j) U P_∆ η + P_M P_∆^⊥ U* P_N U P_∆ η.   (14.6)
Our goal is to eventually use Bernstein's inequality and the following is therefore a setup for that. Define, for 1 ≤ j ≤ N, the random variables

Y_j = q^{-1} P_M P_∆^⊥ U* (δ_j − q)(e_j ⊗ e_j) U P_∆ η,
X_j^i = ⟨ q^{-1} U* (δ_j − q)(e_j ⊗ e_j) U P_∆ η, e_i ⟩,   i ∈ ∆^c ∩ {1, . . . , M}.
Thus, by (14.6) it follows that for s > 0 we have

P( ‖ q^{-1} P_M P_∆^⊥ E_Ω P_∆ η ‖_{l∞} > s ) = P( ‖ Σ_{j=1}^N Y_j + P_M P_∆^⊥ U* P_N U P_∆ η ‖_{l∞} > s )
  ≤ Σ_{i ∈ ∆^c ∩ {1,...,M}} P( | Σ_{j=1}^N X_j^i + ⟨ P_M P_∆^⊥ U* P_N U P_∆ η, e_i ⟩ | > s )
  ≤ Σ_{i ∈ ∆^c ∩ {1,...,M}} P( | Σ_{j=1}^N X_j^i | > s − ‖ P_M P_∆^⊥ U* P_N U P_∆ ‖_{mr} ),

where we have used the fact that U is an isometry and hence

P_M P_∆^⊥ U* P_N U P_∆ = −P_M P_∆^⊥ U* P_N^⊥ U P_∆.
Thus, by choosing s = t + ‖P_M P_∆^⊥ U* P_N U P_∆‖_{mr} it follows that

P( ‖ q^{-1} P_M P_∆^⊥ E_Ω P_∆ η ‖_{l∞} > t + ‖P_M P_∆^⊥ U* P_N U P_∆‖_{mr} ) ≤ Σ_{i ∈ ∆^c ∩ {1,...,M}} P( | Σ_{j=1}^N X_j^i | > t ).   (14.7)
To get a bound on the right hand side of (14.7) we will be using Bernstein's inequality, and in order to do that we need a couple of observations. First note that, for i ∈ ∆^c ∩ {1, . . . , M},

E|X_j^i|² = q^{-2} E|⟨ U P_∆ η, (δ_j − q)(e_j ⊗ e_j) U e_i ⟩|² = q^{-2} E((δ_j − q)²) |⟨ U P_∆ η, e_j ⟩ ⟨ U e_i, e_j ⟩|² = q^{-1}(1 − q) |⟨ U P_∆ η, e_j ⟩ ⟨ U e_i, e_j ⟩|².

Thus

Σ_{j=1}^N E|X_j^i|² ≤ q^{-1}(1 − q) ‖η‖² υ²(U) = q^{-1}(1 − q) υ²(U),   i ∈ ∆^c ∩ {1, . . . , M}.   (14.8)

Also, observe that

|X_j^i| = q^{-1} |δ_j − q| |⟨ η, P_∆ U* (e_j ⊗ e_j) U e_i ⟩| ≤ max{(1 − q)/q, 1} υ²(U) √|∆|,   (14.9)

for 1 ≤ j ≤ N and i ∈ ∆^c ∩ {1, . . . , M}. Now applying Bernstein's inequality to Re(X_1^i), . . . , Re(X_N^i) and Im(X_1^i), . . . , Im(X_N^i) we get that

P( | Σ_{j=1}^N X_j^i | > t ) ≤ 4 exp( − (t²/4) / ( q^{-1}(1 − q) υ²(U) + max{q^{-1}(1 − q), 1} υ²(U) √|∆| t/(3√2) ) ),   (14.10)

for all i ∈ ∆^c ∩ {1, . . . , M}. Thus, by invoking (14.10) and (14.7) it follows that
P( ‖ q^{-1} P_M P_∆^⊥ E_Ω P_∆ η ‖_{l∞} > t + ‖P_M P_∆^⊥ U* P_N U P_∆‖_{mr} ) ≤ γ

when

q ≥ ( 4/t² + (2√2/(3t)) √|∆| ) log( 4 |∆^c ∩ {1, . . . , M}| / γ ) υ²(U),

and the first part of the proposition follows. The fact that the left hand side of (11.3) is zero when q = 1 is clear from (14.8) and (14.9).
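As a small illustration of how this condition scales, the helper below simply evaluates the sufficient sampling ratio q from the bound just derived; the function name and the numerical values in the example are ours and purely illustrative.

# Evaluate the sufficient Bernoulli sampling ratio from the bound above (illustrative helper).
import numpy as np

def sufficient_q(t, card_Delta, card_complement, gamma, upsilon_sq):
    """q >= (4/t^2 + 2*sqrt(2)*sqrt(|Delta|)/(3t)) * log(4*|Delta^c ∩ {1,...,M}|/gamma) * v^2(U)."""
    factor = 4.0 / t**2 + 2.0 * np.sqrt(2.0) * np.sqrt(card_Delta) / (3.0 * t)
    return factor * np.log(4.0 * card_complement / gamma) * upsilon_sq

# Example: |Delta| = 10, |Delta^c ∩ {1,...,M}| = 502, t = 1/2, gamma = 0.01, v^2(U) = 1/1024.
q = sufficient_q(t=0.5, card_Delta=10, card_complement=502, gamma=1e-2, upsilon_sq=1.0 / 1024)
print('sufficient q ≈', q, '-> expected number of samples q*N ≈', q * 1024)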
Step II: To prove the second part of the proposition we will use the same ideas, however, we are now faced with the problem that P_∆^⊥ E_Ω P_∆ η (contrary to P_M P_∆^⊥ E_Ω P_∆ η) actually has infinitely many components. This is an obstacle since the proof of the bound on P_M P_∆^⊥ E_Ω P_∆ η was based on bounding the probability of the deviation of every component of P_M P_∆^⊥ E_Ω P_∆ η and thus, if there are infinitely many components to take care of, the task would be impossible. To overcome this obstacle we proceed as follows.
Note that, just as argued in the previous case in Step I, we have that

q^{-1} P_∆^⊥ E_Ω P_∆ η = Σ_{j=1}^N Ỹ_j + P_∆^⊥ U* P_N U P_∆ η,   Ỹ_j = q^{-1} P_∆^⊥ U* (δ_j − q)(e_j ⊗ e_j) U P_∆ η.   (14.11)
Define (as we did above) the random variables

X_j^i = ⟨ q^{-1} U* (δ_j − q)(e_j ⊗ e_j) U P_∆ η, e_i ⟩,   i ∈ ∆^c.

Note that we now have infinitely many X_j^i's; however, suppose for a moment that for every t > 0 there exists a non-empty set Λ_t ⊂ N such that

P( sup_{i ∈ Λ_t} | Σ_{j=1}^N X_j^i | > t ) = 0,   |∆^c \ Λ_t| < ∞.   (14.12)
Then, if that was the case, we would immediately get (by arguing as in Step I and using (14.11) and the assumption that ‖η‖ = 1) that

P( ‖ q^{-1} P_∆^⊥ E_Ω P_∆ η ‖_{l∞} > t + ‖P_∆^⊥ U* P_N U P_∆‖_{mr} )
  = P( ‖ Σ_{j=1}^N Ỹ_j + P_∆^⊥ U* P_N U P_∆ η ‖_{l∞} > t + ‖P_∆^⊥ U* P_N U P_∆‖_{mr} )
  ≤ Σ_{i ∈ ∆^c \ Λ_t} P( | Σ_{j=1}^N X_j^i | > t ).
Thus, we could use the analysis provided above, via (14.10), and deduce that

P( ‖ q^{-1} P_∆^⊥ E_Ω P_∆ η ‖_{l∞} > t + ‖P_∆^⊥ U* P_N U P_∆‖_{mr} ) ≤ γ

when

q ≥ ( 4/t² + (2√2/(3t)) √|∆| ) log( 4 |∆^c \ Λ_t| / γ ) υ²(U).   (14.13)
Hence, if we could show the existence of Λ_t and provide a bound on |∆^c \ Λ_t| we could appeal to (14.11) and (14.13) and be done. To do that, define

Λ_t = { i ∉ ∆ : P( ‖ Σ_{j=1}^N P_∆ U* δ_j (e_j ⊗ e_j) U e_i ‖ ≤ t q ) = 1 }.
Note that (e_j ⊗ e_j) U e_i → 0 as i → ∞ for all j ≤ N. Thus, Λ_t ≠ ∅. Moreover, we also automatically get that |∆^c \ Λ_t| < ∞. Note also that (14.12) follows by the fact that X_j^i = ⟨ η, q^{-1} P_∆ U* δ_j (e_j ⊗ e_j) U e_i ⟩ and the Cauchy–Schwarz inequality. With the existence of Λ_t established, we now continue with the task of estimating |∆^c \ Λ_t|. Note that to estimate |∆^c \ Λ_t| we need information about the location of ∆, which is not assumed. We only assume the knowledge of some M ∈ N such that P_M ≥ P_∆. Thus (although an estimate of |∆^c \ Λ_t| would be sharper than what we will eventually obtain) we define

Λ̃_q(|∆|, M, t) = { i ∈ N : max_{Γ_1 ⊂ {1,...,M}, |Γ_1| = |∆|, Γ_2 ⊂ {1,...,N}} ‖ P_{Γ_1} U* P_{Γ_2} U e_i ‖ ≤ t q }.
Note that it is straightforward to show that Λ̃_q(|∆|, M, t) ⊂ Λ_t. Also, Λ̃_q(|∆|, M, t) depends only on known quantities. Observe that, clearly, for any Γ_1 ⊂ {1, . . . , M} and Γ_2 ⊂ {1, . . . , N} we have ‖P_{Γ_1} U* P_{Γ_2} U e_i‖ → 0 as i → ∞. Thus, |∆^c \ Λ̃_q(|∆|, M, t)| < ∞ and, since Λ̃_q(|∆|, M, t) ⊂ Λ_t, it follows that

|∆^c \ Λ_t| ≤ |{ i ∈ N : max_{Γ_1 ⊂ {1,...,M}, |Γ_1| = |∆|, Γ_2 ⊂ {1,...,N}} ‖ P_{Γ_1} U* P_{Γ_2} U e_i ‖ > t q }|,

and the second part of the proposition follows. The fact that the left hand side of (11.4) is zero when q = 1 is clear from (14.8) and (14.9).
Proof of Proposition 11.2. Without loss of generality we may assume that ‖η‖ = 1. Let {δ_j}_{j=1}^N be random Bernoulli variables with P(δ_j = 1) = q. Let also, for k ∈ N, ξ_k = (U P_∆)* e_k. Observe that, since U is an isometry,

q^{-1} (P_Ω U P_∆)* P_Ω U P_∆ = Σ_{k=1}^N q^{-1} δ_k ξ_k ⊗ ξ̄_k,   P_∆ = Σ_{k=1}^∞ ξ_k ⊗ ξ̄_k,   (14.14)
and

‖ ( q^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ) η ‖ ≤ ‖ ( Σ_{k=1}^N (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η ‖ + ‖ (P_∆ U* P_N U P_∆ − P_∆) η ‖,   (14.15)

where the infinite series in (14.14) converges in operator norm. Also, (14.15) follows directly from (14.14). To get the desired result we first focus on getting bounds on ‖( Σ_{k=1}^N (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η‖. The goal is to use Talagrand's formula, and the following is really a setup for that. In particular, let ζ ∈ H be a unit vector, and denote the mapping H ∋ ξ ↦ Re(⟨ξ, ζ⟩) by ζ̂. Also, let F be a countable collection of unit vectors such that for any ξ ∈ H we have that ‖ξ‖ = sup_{ζ∈F} ζ̂(ξ). Now define

Z = ‖X‖,   X = Σ_{k=1}^N Z_k,   Z_k = ( (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η.
Observe that the following is clear (and note how this immediately gives us the setup for Talagrand's Theorem):

Z = ‖ Σ_{k=1}^N Z_k ‖ = ‖ ( Σ_{k=1}^N (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η ‖ = sup_{ζ∈F} ζ̂( Σ_{k=1}^N Z_k ) = sup_{ζ∈F} Σ_{k=1}^N ζ̂(Z_k).
To use Talagrand's Theorem we must estimate the following quantities:

S = sup_{ζ∈F} ‖ζ̂‖_∞,   V = E( sup_{ζ∈F} Σ_{k=1}^N ζ̂(Z_k)² ),   R = E‖ Σ_{k=1}^N Z_k ‖.
Note that for V we get the following estimate:

E( sup_{ζ∈F} Σ_{k=1}^N ζ̂(Z_k)² ) ≤ E( sup_{ζ∈F} Σ_{k≤N} |q^{-1} δ_k − 1|² |⟨ξ_k, ζ⟩|² |⟨ξ_k, η⟩|² )
  ≤ q^{-1}(1 − q) Σ_{k≤N} ‖ξ_k‖² |⟨e_k, U P_∆ η⟩|²
  ≤ q^{-1}(1 − q) υ²(U) |∆|,

where we have used the fact that U is an isometry in the step going from the second to the third inequality. And S can be estimated as follows. Note that

|ζ̂(Z_k)| = |q^{-1} δ_k − 1| |⟨ξ_k, ζ⟩| |⟨ξ_k, η⟩| ≤ max{q^{-1} − 1, 1} υ²(U) |∆|,   k ≤ N,   (14.16)

thus

S ≤ max{q^{-1} − 1, 1} υ²(U) |∆|,   (14.17)
where (14.17) is a direct consequence of (14.16). Finally, we can estimate R as follows:

E‖ Σ_{k=1}^N Z_k ‖² = Σ_{k=1}^N E(‖Z_k‖²) + Σ_{k≠j} E(⟨Z_k, Z_j⟩)
  ≤ q^{-1}(1 − q) Σ_{k≤N} ‖ξ_k‖² |⟨e_k, U P_∆ η⟩|²
  ≤ q^{-1}(1 − q) υ²(U) |∆|,

again using the fact that U is an isometry. Therefore,

E‖ Σ_{k≤N} Z_k ‖ ≤ √( E‖ Σ_{k≤N} Z_k ‖² ) ≤ √( q^{-1}(1 − q) υ²(U) |∆| ).

With the estimates on V, S and R now established we may appeal to Theorem 14.2 and deduce that there is
a constant K > 0 such that for θ > 0 it follows that
P( ‖ ( Σ_{k=1}^N (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η ‖ ≥ θ + √( q^{-1}(1 − q) υ²(U) |∆| ) )
  ≤ K exp( − (θ/K) ( max{q^{-1} − 1, 1} υ²(U) |∆| )^{-1} log( 1 + θ max{q^{-1} − 1, 1} / ( q^{-1}(1 − q) ) ) ).   (14.18)
But by (14.15) it follows that for any r > 0, we have

P( ‖ ( q^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ) η ‖ ≥ r )
  ≤ P( ‖ ( Σ_{k=1}^N (q^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ) η ‖ ≥ r − ‖P_∆ U* P_N U P_∆ − P_∆‖ ).   (14.19)
Therefore, by appealing to (14.19) and (14.18) we obtain that for θ > 0

P( ‖ ( q^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ) η ‖ ≥ θ + √( q^{-1}(1 − q) υ²(U) |∆| ) + Ξ )
  ≤ K exp( − (θ/K) ( max{q^{-1} − 1, 1} υ²(U) |∆| )^{-1} log( 1 + θ max{q^{-1} − 1, 1} / ( q^{-1}(1 − q) ) ) ),

where Ξ = ‖P_∆ U* P_N U P_∆ − P_∆‖. Choosing θ = t/2 yields the proposition.
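A quick numerical experiment makes the statement tangible: for Bernoulli sampling with P(δ_j = 1) = q and a finite unitary DFT matrix (so that P_∆ U* P_N U P_∆ = P_∆ and the term Ξ above vanishes), the scaled Gram matrix q^{-1}(P_Ω U P_∆)* P_Ω U P_∆ concentrates around the identity on the support. All sizes below are arbitrary illustrative choices.

# Monte Carlo illustration of the concentration controlled in Proposition 11.2 and Theorem 11.3.
import numpy as np

rng = np.random.default_rng(3)
n, s, q, trials = 512, 8, 0.25, 200
U = np.fft.fft(np.eye(n)) / np.sqrt(n)                  # unitary DFT matrix
Delta = np.sort(rng.choice(n, s, replace=False))        # support of size s

devs = []
for _ in range(trials):
    picked = np.flatnonzero(rng.random(n) < q)          # Bernoulli selectors delta_j
    A = U[np.ix_(picked, Delta)]                        # P_Omega U P_Delta
    G = (A.conj().T @ A) / q                            # q^{-1} (P_Omega U P_Delta)^* P_Omega U P_Delta
    devs.append(np.linalg.norm(G - np.eye(s), 2))       # operator-norm deviation from P_Delta

print(f'median deviation {np.median(devs):.3f}, worst of {trials} trials {np.max(devs):.3f}')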
Proof of Theorem 11.3. The proof is quite similar to the proof of Proposition 11.2. Let {δ_j}_{j=1}^N be random Bernoulli variables with P(δ_j = 1) = θ. Note that we may argue as in (14.14) and observe that

‖ θ^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ‖ ≤ ‖ Σ_{k=1}^N (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ‖ + ‖P_∆ U* P_N U P_∆ − P_∆‖,   (14.20)

where ξ_k = (U P_∆)* e_k. To get the desired result we first focus on getting bounds on ‖ Σ_{k=1}^N (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ‖. As in the proof of Proposition 11.2 the goal is to use Talagrand's powerful inequality, and the first step is to estimate E(‖Z‖), where Z = Σ_{k=1}^N (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k.
Claim: We claim that

E(‖Z‖) ≤ 5 log(|∆|) θ^{-1} υ²(U) |∆|,   (14.21)

when

θ ≥ 3 log(|∆|) υ²(U) |∆|.
To prove the claim we simply rework the techniques used in [38]. This is now standard and has also been used in [14, 40]. Since the setup here is a little different, we write the argument for the convenience of the reader. We start by observing that, letting δ̃ = {δ̃_k}_{k=1}^N be independent copies of δ = {δ_k}_{k=1}^N, we have

E_δ(‖Z‖) = E_δ( ‖ Z − E_δ̃( Σ_{k=1}^N (θ^{-1} δ̃_k − 1) ξ_k ⊗ ξ̄_k ) ‖ )
  ≤ E_δ E_δ̃( ‖ Z − Σ_{k=1}^N (θ^{-1} δ̃_k − 1) ξ_k ⊗ ξ̄_k ‖ ),   (14.22)

by Jensen's inequality. Let ε = {ε_j}_{j=1}^N be a sequence of Bernoulli variables taking values ±1 with probability 1/2. Then, by (14.22), symmetry, Fubini's Theorem and the triangle inequality, it follows that

E_δ(‖Z‖) ≤ E_ε E_δ E_δ̃( ‖ Σ_{k=1}^N ε_k ( θ^{-1} δ_k − θ^{-1} δ̃_k ) ξ_k ⊗ ξ̄_k ‖ )
  ≤ 2 E_δ E_ε( ‖ Σ_{k=1}^N ε_k θ^{-1} δ_k ξ_k ⊗ ξ̄_k ‖ ).   (14.23)
Note that the setup in (14.23) is now ready for the use of Rudelson's Lemma (Lemma 14.1). However, as specified before, it is the complex version that is crucial here. Now, by Lemma 14.1 we get that

E_ε( ‖ Σ_{k=1}^N ε_k θ^{-1} δ_k ξ_k ⊗ ξ̄_k ‖ ) ≤ (3/2) √( log(|∆|) θ^{-1} ) max_{1≤k≤N} ‖ξ_k‖ √( ‖ Σ_{k=1}^N θ^{-1} δ_k ξ_k ⊗ ξ̄_k ‖ ).   (14.24)
And hence, by using (14.23) and (14.24), it follows that

E_δ(‖Z‖) ≤ 3 √( log(|∆|) θ^{-1} υ²(U) |∆| ) √( E_δ( ‖ Z + Σ_{k=1}^N ξ_k ⊗ ξ̄_k ‖ ) ).

Thus, by using the easy calculus fact that, for r > 0, if r ≤ √(r + 1) then r < 5/3, and the fact that U is an isometry (so that ‖ Σ_{k=1}^N ξ_k ⊗ ξ̄_k ‖ ≤ 1), it is easy to see that the claim follows.
To be able to use Talagrand's formula there are some preparations that have to be done. First write

Z = Σ_{k=1}^N Z_k,   Z_k = (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k.

Clearly, since Z is self-adjoint, we have that ‖Z‖ = sup_{η∈F} |⟨Z η, η⟩|, where F is a countable set of unit vectors. Let, for η ∈ F, the mapping B(H) ∋ T ↦ |⟨T η, η⟩| be denoted by η̂. Then, for k = 1, . . . , N we have

η̂(Z_k) = |θ^{-1} δ_k − 1| |⟨ (ξ_k ⊗ ξ̄_k) η, η ⟩| ≤ θ^{-1} ‖ξ_k‖².

Thus, after restricting η̂ to the ball of radius θ^{-1} max_{k≤N} ‖ξ_k‖², it follows that

S = sup_{η∈F} ‖η̂‖_∞ ≤ θ^{-1} max_{k≤N} ‖ξ_k‖² ≤ θ^{-1} υ²(U) |∆|.   (14.25)
Also, note that

V = E( sup_{η∈F} Σ_{k≤N} η̂(Z_k)² ) ≤ E( sup_{η∈F} Σ_{k≤N} (θ^{-1} δ_k − 1)² |⟨ξ_k, η⟩|⁴ )
  ≤ max_{k≤N} ‖ξ_k‖² (θ^{-1} − 1) sup_{η∈F} Σ_{k≤N} |⟨e_k, U P_∆ η⟩|²
  ≤ (θ^{-1} − 1) max_{k≤N} ‖ξ_k‖² ≤ (θ^{-1} − 1) υ²(U) |∆|,   (14.26)
where the third inequality follows from the fact that U is an isometry. It follows by Talagrand’s inequality
(Theorem 14.2), by using the claim, (14.25) and (14.26), that there is a constant K > 0 such that for t > 0
P( ‖ Σ_{k=1}^N (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ‖ ≥ t + 5 log(|∆|) θ^{-1} υ²(U) |∆| )
  ≤ K exp( − (t/K) ( θ^{-1} υ²(U) |∆| )^{-1} log( 1 + t θ^{-1} / (θ^{-1} − 1) ) ).   (14.27)
But by (14.20) it follows that for any r > 0, we have

P( ‖ θ^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ‖ ≥ r )
  ≤ P( ‖ Σ_{k=1}^N (θ^{-1} δ_k − 1) ξ_k ⊗ ξ̄_k ‖ ≥ r − ‖P_∆ U* P_N U P_∆ − P_∆‖ ).   (14.28)
Therefore, by appealing to (14.28) and (14.27) we obtain that for t > 0

P( ‖ θ^{-1} (P_Ω U P_∆)* P_Ω U P_∆ − P_∆ ‖ ≥ t + 5 log(|∆|) θ^{-1} υ²(U) |∆| + Ξ )
  ≤ K exp( − (t/K) ( θ^{-1} υ²(U) |∆| )^{-1} log( 1 + t θ^{-1} / (θ^{-1} − 1) ) ),   Ξ = ‖P_∆ U* P_N U P_∆ − P_∆‖.

Choosing t = 1/(2γ) yields the first part of the theorem. The last statement of the theorem is clear.
References
[1] B. Adcock and A. C. Hansen. A generalized sampling theorem for stable reconstructions in arbitrary
bases. J. Fourier Anal. Appl. (accepted), 2011.
[2] B. Adcock and A. C. Hansen. Reduced consistency sampling in Hilbert spaces. In Proceedings of the
9th International Conference on Sampling Theory and Applications, 2011.
[3] B. Adcock and A. C. Hansen. Sharp bounds and optimality for generalised sampling in Hilbert spaces.
Submitted, 2011.
[4] B. Adcock and A. C. Hansen. Stable reconstructions in Hilbert spaces and the resolution of the Gibbs
phenomenon. Appl. Comput. Harmon. Anal. (to appear), 2011.
[5] B. Adcock, A. C. Hansen, E. Herrholz, and G. Teschke. Generalized sampling: extension to frames,
ill-posed problems, and infinite-dimensional compressed sensing. Submitted, 2011.
[6] A. Aldroubi. Oblique projections in atomic spaces. Proc. Amer. Math. Soc., 124(7):2051–2060, 1996.
[7] W. Arveson. C ∗ -algebras and numerical linear algebra. J. Funct. Anal., 122(2):333–360, 1994.
[8] T. Blu, P. L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulout. Sparse sampling of signal innovations.
IEEE Signal Process. Mag., 25(2):31–40, 2008.
[9] A. Böttcher. Infinite matrices and projection methods. In Lectures on operator theory and its applications (Waterloo, ON, 1994), volume 3 of Fields Inst. Monogr., pages 1–72. Amer. Math. Soc.,
Providence, RI, 1996.
[10] A. Böttcher and B. Silbermann. Introduction to large truncated Toeplitz matrices. Universitext.
Springer-Verlag, New York, 1999.
[11] S. Brenner and R. L. Scott. The Mathematical Theory of Finite Element Methods. Springer, 2nd edition,
2005.
[12] E. J. Candès. An introduction to compressive sensing. IEEE Signal Process. Mag., 25(2):21–30, 2008.
[13] E. J. Candès and Y. Plan. A probabilistic and RIPless theory of compressed sensing. Submitted, 2010.
[14] E. J. Candès and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems,
23(3):969–985, 2007.
[15] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from
highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[16] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
[17] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding
strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006.
[18] P. G. Ciarlet. The Finite Element Method for Elliptic Problems. SIAM, 2002.
[19] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. J. Amer.
Math. Soc., 22(1):211–231, 2009.
[20] R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Appl. Comput. Harmon. Anal., 10(1):27–44, 2001.
[21] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[22] D. L. Donoho and J. Tanner. Counting faces of randomly projected polytopes when the projection
radically lowers dimension. J. Amer. Math. Soc., 22(1):1–53, 2009.
[23] P. L. Dragotti, M. Vetterli, and T. Blu. Sampling moments and reconstructing signals of finite rate of
innovation: Shannon meets Strang–Fix. IEEE Trans. Signal Process., 55(5):1741–1757, 2007.
[24] I. Ekeland and T. Turnbull. Infinite-dimensional optimization and convexity. Chicago Lectures in
Mathematics. University of Chicago Press, Chicago, IL, 1983.
[25] Y. Eldar. Sampling with arbitrary sampling and reconstruction spaces and oblique dual frame vectors.
Journal of Fourier Analysis and Applications, 9(1):77–96, 2003.
[26] Y. C. Eldar and T. Michaeli. Beyond Bandlimited Sampling. IEEE Signal Process. Mag., 26(3):48–68,
2009.
[27] D. Gross. Recovering low-rank matrices from few coefficients in any basis. Submitted, 2010.
[28] A. C. Hansen. On the approximation of spectra of linear operators on Hilbert spaces. J. Funct. Anal.,
254(8):2092–2126, 2008.
[29] A. C. Hansen. On the solvability complexity index, the n-pseudospectrum and approximations of
spectra of operators. J. Amer. Math. Soc., 24(1):81–124, 2011.
[30] E. Herrholz and G. Teschke. Compressive sensing principles and iterative sparse recovery for inverse
and ill-posed problems. Inverse Problems, 26(12):125012, 2010.
[31] A. J. Jerri. The Shannon sampling theorem – its various extensions and applications: A tutorial review.
Proc. IEEE, 65:1565–1596, 1977.
[32] T. W. Körner. Fourier Analysis. Cambridge University Press, 1988.
[33] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed Sensing MRI. Signal Processing
Magazine, IEEE, 25(2):72–82, March 2008.
[34] C. McDiarmid. Concentration. In Probabilistic methods for algorithmic discrete mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer, Berlin, 1998.
[35] S. K. Mitter. Convex optimization in infinite dimensional spaces. In Recent advances in learning and
control, volume 371 of Lecture Notes in Control and Inform. Sci., pages 161–179. Springer, London,
2008.
[36] A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equations. Springer–
Verlag, 1994.
[37] H. Rauhut and R. Ward. Sparse Legendre expansions via l1-minimization. Submitted, 2010.
[38] M. Rudelson. Random vectors in the isotropic position. J. Funct. Anal., 164(1):60–72, 1999.
[39] M. Talagrand. New concentration inequalities in product spaces. Invent. Math., 126(3):505–563, 1996.
[40] J. A. Tropp. On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal., 25(1):1–24,
2008.
[41] M. Unser. Sampling–50 years after Shannon. Proc. IEEE, 88(4):569–587, 2000.
[42] M. Unser and A. Aldroubi. A general sampling theory for nonideal acquisition devices. IEEE Trans.
Signal Process., 42(11):2915–2925, 1994.
[43] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans.
Signal Process., 50(6):1417–1428, 2002.