New Analysis of Manifold Embeddings and Signal Recovery from Compressive Measurements

Armin Eftekhari and Michael B. Wakin*
Department of Electrical Engineering and Computer Science, Colorado School of Mines
June 2013; Revised April 2014
arXiv:1306.4748v4 [cs.IT]

* Email: aeftekha,[email protected]. This work was partially supported by NSF grant DMS-0603606, DARPA grant HR0011-08-1-0078, NSF grant CCF-0830320, and NSF CAREER grant CCF-1149225.
Abstract

Compressive Sensing (CS) exploits the surprising fact that the information contained in a sparse signal can be preserved in a small number of compressive, often random linear measurements of that signal. Strong theoretical guarantees have been established concerning the embedding of a sparse signal family under a random measurement operator and on the accuracy to which sparse signals can be recovered from noisy compressive measurements. In this paper, we address similar questions in the context of a different modeling framework. Instead of sparse models, we focus on the broad class of manifold models, which can arise in both parametric and non-parametric signal families. Using tools from the theory of empirical processes, we improve upon previous results concerning the embedding of low-dimensional manifolds under random measurement operators. We also establish both deterministic and probabilistic instance-optimal bounds in ℓ2 for manifold-based signal recovery and parameter estimation from noisy compressive measurements. In line with analogous results for sparsity-based CS, we conclude that much stronger bounds are possible in the probabilistic setting. Our work supports the growing evidence that manifold-based models can be used with high accuracy in compressive signal processing.

Keywords. Manifolds, Compressive Sensing, dimensionality reduction, random projections, manifold embeddings, signal recovery, parameter estimation.

AMS Subject Classification. 53A07, 57R40, 62H12, 68P30, 94A12, 94A29.
1 Introduction

1.1 Concise signal models
A significant byproduct of the Information Age has been an explosion in the sheer quantity of raw
data demanded from sensing systems. From digital cameras to mobile devices, scientific computing
to medical imaging, and remote surveillance to signals intelligence, the size (or dimension) N of a
typical desired signal continues to increase. Naturally, the dimension N imposes a direct burden on
the various stages of the data processing pipeline, from the data acquisition itself to the subsequent
transmission, storage, and/or analysis.
Fortunately, in many cases, the information contained within a high-dimensional signal actually
obeys some sort of concise, low-dimensional model. Such a signal may be described as having just
K ≪ N degrees of freedom for some K. Periodic signals bandlimited to a certain frequency are one example; they live along a fixed K-dimensional linear subspace of R^N. Piecewise smooth signals are an example of sparse signals, which can be written as a succinct linear combination of just K elements from some basis such as a wavelet dictionary. Still other signals may live along K-dimensional submanifolds of the ambient signal space R^N; examples include collections of signals observed from multiple viewpoints in a camera or sensor network. In general, the conciseness of these models suggests the possibility for efficient processing and compression of these signals.
1.2 Compressive measurements
Recently, the conciseness of certain signal models has led to the use of compressive measurements
for simplifying the data acquisition process. Rather than designing a sensor to measure a signal
x ∈ R^N, for example, it often suffices to design a sensor that can measure a much shorter vector y = Φx, where Φ is a linear measurement operator represented as an M × N matrix, and where typically M ≪ N. As we discuss below in the context of Compressive Sensing (CS), when Φ is properly designed, the requisite number of measurements M typically scales with the information level K of the signal, rather than with its ambient dimension N.
Surprisingly, the requirements on the measurement matrix Φ can often be met by choosing
Φ randomly from an acceptable distribution. Most commonly, the entries of Φ are chosen to be
independent and identically distributed (i.i.d.) Gaussian random variables, although the use of
structured random matrices is on the rise [25, 37]. Physical architectures have been proposed for
hardware that will enable the acquisition of signals using compressive measurements [10, 22, 30, 36];
many of these collect the compressive measurements y of a signal x directly, without explicitly
computing a matrix multiplication on board. The potential benefits for data acquisition are numerous. These systems can enable simple, low-cost acquisition of a signal directly in compressed
form without requiring knowledge of the signal structure in advance. Some of the many possible
applications include distributed source coding in sensor networks [23], medical imaging [40], and
high-rate analog-to-digital conversion [10, 30, 57]. We note that, in all cases, the measurement matrix Φ must be known to any decoder that will be used to process the compressed measurement
vector y, but with suitable synchronization between the compressive measurement system and the
decoder (e.g., exchanging a seed used to initialize a random number generator), it is not necessary
for Φ to be explicitly transmitted along with y.
1.3 Signal understanding from compressive measurements
Having acquired a signal x in compressed form (in the form of a measurement vector y), there are
many questions that may then be asked of the signal. These include:
Q1. Recovery: What was the original signal x?
Q2. Parameter estimation: Supposing x was generated from a K-dimensional parametric model,
what was the original K-dimensional parameter that generated x?
Given only the measurements y (possibly corrupted by noise), solving either of the above problems
requires exploiting the concise, K-dimensional structure inherent in the signal.¹ CS addresses
questions Q1 and Q2 under the assumption that the signal x is K-sparse (or approximately so) in
some basis or dictionary; in Section 2.1 we outline some key theoretical bounds from CS regarding
the accuracy to which these questions may be answered.
¹ Other problems, such as finding the nearest neighbor to x in a large database of signals [34], can also be solved using compressive measurements and do not require assumptions about the concise structure in x.
Figure 1: (a) The articulated signal fθ(t) = g(t − θ) is defined via shifts of a primitive function g, where g is a Gaussian pulse. Each signal is sampled at N points, and as θ changes, the resulting signals trace out a 1-D manifold in R^N. (b) Projection of the manifold from R^N into R^3 via a random 3 × N matrix; the color/shading represents different values of θ ∈ [0, 1].
1.4 Manifold models for signal understanding
In this paper, we will address these questions in the context of a different modeling framework for
concise signal structure. Instead of sparse models, we focus on the broad class of manifold models,
which arise both in settings where a K-dimensional parameter θ controls the generation of the
signal and also in non-parametric settings.
As a very simple illustration, consider the articulated signal in Figure 1(a). We let g(t) be a fixed continuous-time Gaussian pulse centered at t = 0 and consider a shifted version of g denoted as the parametric signal fθ(t) := g(t − θ) with t, θ ∈ [0, 1]. We then suppose the discrete-time signal x = xθ ∈ R^N arises by sampling the continuous-time signal fθ(t) uniformly in time, i.e., xθ(n) = fθ(n/N) for n = 1, 2, . . . , N. As the parameter θ changes, the signals xθ trace out a continuous one-dimensional (1-D) curve M = {xθ : θ ∈ [0, 1]} ⊂ R^N. The conciseness of our model (in contrast with the potentially high dimension N of the signal space) is reflected in the low dimension of the path M.
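For concreteness, a small numerical sketch of this construction follows. The pulse width below is an assumed value, since the text does not fix one; the sketch simply samples xθ on a grid of θ and applies a random 3 × N Gaussian projection as in Figure 1(b).

```python
import numpy as np

N = 1024          # ambient dimension
sigma = 0.05      # assumed width of the Gaussian pulse g (not specified in the text)

def x_theta(theta):
    """Sample f_theta(t) = g(t - theta) at t = n/N, n = 1, ..., N."""
    t = np.arange(1, N + 1) / N
    return np.exp(-((t - theta) ** 2) / (2 * sigma ** 2))

# Points on the 1-D manifold M = {x_theta : theta in [0, 1]} in R^N.
thetas = np.linspace(0, 1, 200)
X = np.stack([x_theta(th) for th in thetas])          # shape (200, N)

# Random 3 x N projection with i.i.d. N(0, 1/3) entries, as in Figure 1(b).
rng = np.random.default_rng(0)
Phi = rng.normal(scale=np.sqrt(1 / 3), size=(3, N))
Y = X @ Phi.T                                         # the projected curve in R^3
print(Y.shape)                                        # (200, 3)
```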
In the real world, manifold models may arise in a variety of settings. A K-dimensional parameter
θ could reflect uncertainty about the 1-D timing of the arrival of a signal (as in Figure 1(a); see
also [24]), the 2-D orientation and position of an edge in an image, the 2-D translation of an
image under study [46], the multiple degrees of freedom in positioning a camera or sensor to
measure a scene [15], the physical degrees of freedom in an articulated robotic or sensing system,
or combinations of the above. Manifolds have also been proposed as approximate models for signal
databases such as collections of images of human faces or of handwritten digits [4, 33, 58].
Consequently, the potential applications of manifold models are numerous in signal processing.
In some applications, the signal x itself may be the object of interest, and the concise manifold
model may facilitate the acquisition or compression of that signal. Alternatively, in parametric
settings one may be interested in using a signal x = xθ to infer the parameter θ that generated
that signal. In an application known as manifold learning, one may be presented with a collection
of data {xθ1 , xθ2 , . . . , xθn } sampled from a parametric manifold and wish to discover the underlying
parameterization that generated that manifold. Multiple manifolds can also be considered simultaneously, for example in problems that require recognizing an object from one of n possible classes,
where the viewpoint of the object is uncertain during the image capture process. In this case, we
may wish to know which of n manifolds is closest to the observed image x.
While any of these questions may be answered with full knowledge of the high-dimensional signal
x ∈ RN , there is growing theoretical and experimental support that they can also be answered from
only compressive measurements y = Φx. In past work [3], we have shown that given a sufficient
number M of random measurements, one can ensure with high probability that a manifold M ⊂ R^N has a stable embedding in the measurement space R^M under the operator Φ, such that pairwise Euclidean and geodesic distances are approximately preserved on its image ΦM. We will discuss
this in more detail later in Section 3, but a key aspect is that the number of requisite measurements
M is linearly proportional to the information level of the signal, i.e., the dimension K of the
manifold. In that work, the number of measurements was also logarithmically dependent on the
ambient dimension N , although this dependence was later removed in the asymptotic case in [12]
using a different set of assumptions on the manifold.
The first contribution of this paper—presented in Section 3—is that we provide an improved
lower bound on the number of random measurements to guarantee a stable embedding of a signal
manifold. In particular, we make the same assumptions on the manifold as in our past work [3]
but provide a measurement bound that is independent of the ambient dimension N . Our bound
is non-asymptotic, and we provide explicit constants. Additionally we point out that this result
is generic in the sense that it applies to any compact and smooth submanifold of RN for which
certain geometric properties (namely volume, dimension, and condition number) are known.
In order to do this, we use tools from the theory of empirical processes (namely, the idea of
“generic chaining” [56]), which have recently been used to develop state-of-the-art RIP results
for structured measurement matrices in CS [24, 37, 48, 50–53, 57]. More elementary arguments
(e.g., involving simple concentration of measure inequalities) have previously been used in CS
(see, e.g., [2]) for deriving RIP bounds for unstructured i.i.d. measurement matrices, and we also
used such arguments in [3] to derive a manifold embedding guarantee. However, it appears that
the stronger machinery of the empirical process approach is necessary to derive stronger bounds,
both in RIP problems and in manifold embedding problems. A chaining argument was employed
in [12], and in this paper we present a chaining argument that is suitable for studying the manifold
embedding problem under our set of assumptions on the manifold. Because this chaining framework
is fairly technical, we develop it entirely in the appendices so that the body of the paper will be as
self-contained and expository as possible for someone seeking merely to understand the substance
and context of our results. (We do, however, include an example in the body of the paper to
provide insight into the machinery developed in the appendices.) We also observe that similar
results are attainable through the use of the Dudley inequality [39], but a direct argument (such as the one presented here) has the advantages of better exploiting the geometry of the model and therefore producing tighter bounds, offering improved insight into the problem, and being more amenable to future improvements of our arguments.
As a very simple illustration of the embedding phenomenon, Figure 1(b) presents an experiment where just M = 3 compressive measurements are acquired from each point xθ described in
Figure 1(a). We let N = 1024 and construct a randomly generated 3 × N matrix Φ whose entries
are i.i.d. Gaussian random variables with zero mean and variance of 1/3. Each point xθ from the original manifold M ⊂ R^1024 maps to a unique point Φxθ in R^3; the manifold embeds in the low-dimensional measurement space. Given any y = Φxθ0 for θ0 unknown, then, it is possible to infer
the value θ0 using only knowledge of the parametric model for M and the measurement operator
Φ. Moreover, as the number M of compressive measurements increases, the manifold embedding
becomes much more stable and remains highly self-avoiding.
Indeed, there is strong evidence that, as a consequence of this phenomenon, questions such as
Q1 (signal recovery) and Q2 (parameter estimation) can be accurately solved using only compressive measurements of a signal x, and that these procedures are robust to noise and to deviations of the signal x away from the manifold M [14, 54, 61]. Additional theoretical and empirical justification has followed for the manifold learning [32] and multiclass recognition problems [14] described
above. Consequently, many of the advantages of compressive measurements that are beneficial
in sparsity-based CS (low-cost sensor design, reduced transmission requirements, reduced storage
requirements, lack of need for advance knowledge of signal structure, simplified computation in the
low-dimensional space RM , etc.) may also be enjoyed in settings where manifold models capture the
concise signal structure. Moreover, the use of a manifold model can often capture the structure of a
signal in many fewer degrees of freedom K than would be required in any sparse representation, and
thus the measurement rate M can be greatly reduced compared to sparsity-based CS approaches.
The second contribution of this paper—presented in Section 4—is that we establish theoretical
bounds on the accuracy to which questions Q1 (signal recovery) and Q2 (parameter estimation)
may be answered. To do this, we rely largely on the new analytical chaining framework described
above. We consider both deterministic and probabilistic instance-optimal bounds, and we see strong similarities to analogous results that have been derived for sparsity-based CS. As with sparsity-based CS, we show for manifold-based CS that, for any fixed Φ, uniform deterministic ℓ2 recovery bounds for recovery of all x are necessarily poor. We then show that, as with sparsity-based CS, providing for any x a probabilistic bound that holds over most Φ is possible with the desired
accuracy. We consider both noise-free and noisy measurement settings and compare our bounds
with sparsity-based CS. Finally, it should be noted that our results concerning question Q1 are
independent of the parametrization of the manifold, whereas, in contrast, our results concerning
question Q2 are specific to the given parametrization of the manifold.
We feel that a third contribution of this paper comes in the form of the analytical tools we use
to study the above problems. Our chaining argument allows us to study not only the embedding
problem (as in [12]) but also Q1 and Q2. Moreover, in Appendix A, which we call the “Toolbox,”
we present a collection of implications of our assumption that the manifold has bounded condition
number (see Section 2.2 for definition). This elementary property, also known as the reach of a
manifold in the geometric measure theory literature [26], has become somewhat popular in the
analysis of manifold models for signal processing (e.g., see [3, 14, 15, 35, 43, 59, 64]). The seminal
paper [43] (also see [26]) contains a collection of implications of bounded condition number that
have been used directly or indirectly in numerous works, including [3, 14, 15, 35, 59, 64]. We restate
some of these implications in the Toolbox. Unfortunately, after very careful study we were unable
to confirm for ourselves some of the original proofs appearing in [43]. Therefore, some of the
statements and proofs in the Toolbox below differ slightly from their original counterparts in [43].
We hope that these results will provide a useful reference for the continued study of manifolds with
bounded condition number.
1.5 Paper organization
Section 2 provides the necessary background on sparsity-based CS and on manifold models to
place our work in the proper context. In Section 3, we state our improved bound regarding stable
embeddings of manifolds. In Section 4, we then formalize our criteria for answering questions Q1
and Q2 in the context of manifold models. We first confront the task of deriving deterministic
instance-optimal bounds in ℓ2 and then consider probabilistic instance-optimal bounds in ℓ2. We
conclude in Section 5 with a final discussion. The Toolbox (Appendix A) establishes a collection
of useful results in differential geometry that are frequently used throughout our technical proofs,
which appear in the remaining appendices.
2 Background

2.1 Sparsity-Based Compressive Sensing

2.1.1 Sparse models
The concise modeling framework used in CS is sparsity. Consider a signal x ∈ R^N and suppose the N × N matrix Ψ = [ψ1 ψ2 · · · ψN] forms an orthonormal basis for R^N. We say x is K-sparse in the basis Ψ if for α ∈ R^N we can write x = Ψα, where ‖α‖0 = K < N. (The ℓ0-norm notation counts the number of nonzero entries of α.) In a sparse representation, the actual information content of a signal is contained exclusively in the K < N positions and values of its nonzero coefficients.

For those signals that are approximately sparse, we may measure their proximity to sparse signals as follows. We define αK ∈ R^N to be the vector containing only the largest K entries of α in magnitude, with the remaining entries set to zero. Similarly, we let xK = ΨαK. It is then common to measure the proximity to sparseness using either ‖α − αK‖1 or ‖α − αK‖ (the latter of which equals ‖x − xK‖ because Ψ is orthonormal). Here and elsewhere in this paper, ‖ · ‖ stands for the ℓ2 norm.
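As a small illustration of these quantities, the following sketch (an illustrative example only) computes αK for an arbitrary coefficient vector and evaluates the two proximity measures ‖α − αK‖1 and ‖α − αK‖.

```python
import numpy as np

def best_k_term(alpha, K):
    """Keep the K largest-magnitude entries of alpha; zero out the rest."""
    alpha_K = np.zeros_like(alpha)
    idx = np.argsort(np.abs(alpha))[-K:]
    alpha_K[idx] = alpha[idx]
    return alpha_K

rng = np.random.default_rng(0)
alpha = rng.standard_normal(128) * (rng.random(128) < 0.1)   # a few large entries
alpha = alpha + 0.01 * rng.standard_normal(128)              # plus a small tail
alpha_K = best_k_term(alpha, K=10)
print(np.linalg.norm(alpha - alpha_K, 1), np.linalg.norm(alpha - alpha_K))
```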
2.1.2 Stable embeddings of sparse signal families

CS uses the concept of sparsity to simplify the data acquisition process. Rather than designing a sensor to measure a signal x ∈ R^N, for example, it often suffices to design a sensor that can measure a much shorter vector y = Φx, where Φ is a linear measurement operator represented as an M × N matrix, and typically M ≪ N.
The measurement matrix Φ must have certain properties in order to be suitable for CS. One desirable property (which leads to the theoretical results we mention in Section 2.1.3) is known as the Restricted Isometry Property (RIP) [6–8]. We say a matrix Φ meets the RIP of order K with respect to the basis Ψ if for some δK > 0,

    (1 − δK) ‖α‖ ≤ ‖ΦΨα‖ ≤ (1 + δK) ‖α‖

holds for all α ∈ R^N with ‖α‖0 ≤ K. Intuitively, the RIP can be viewed as guaranteeing a stable embedding of the collection of K-sparse signals within the measurement space R^M. In particular, supposing the RIP of order 2K is satisfied with respect to the basis Ψ, then for all pairs of K-sparse signals x1, x2 ∈ R^N, we have

    (1 − δ2K) ‖x1 − x2‖ ≤ ‖Φx1 − Φx2‖ ≤ (1 + δ2K) ‖x1 − x2‖.    (1)
Although deterministic constructions of matrices meeting the RIP with few rows (ideally proportional to the sparsity level K) are still a work in progress, it is known that the RIP can often be met by choosing Φ randomly from an acceptable distribution. For example, let Ψ be a fixed orthonormal basis for R^N and suppose that

    M ≥ C1 K log(N/K)    (2)

for some constant C1. Then supposing that the entries of the M × N matrix Φ are drawn as i.i.d. Gaussian random variables with mean 0 and variance 1/M, it follows that with high probability Φ meets the RIP of order K with respect to the basis Ψ. Two aspects of this construction deserve special notice: first, the number M of measurements required is linearly proportional to the information level K (and logarithmic in the ambient dimension N), and second, neither the sparse basis Ψ nor the locations of the nonzero entries of α need be known when designing the measurement operator Φ. Other random distributions for Φ may also be used, all requiring approximately the same number of measurements [25, 37, 49].
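The following sketch gives an informal empirical illustration (not a proof) of this construction, taking Ψ = I: it draws an i.i.d. Gaussian Φ with variance 1/M and records how well distances between randomly drawn K-sparse pairs are preserved, in the spirit of (1). The specific choice of M here is an arbitrary illustrative one.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1024, 10
M = int(4 * K * np.log(N / K))                  # a rough choice in the spirit of (2)
Phi = rng.normal(scale=1 / np.sqrt(M), size=(M, N))

def random_k_sparse(rng, N, K):
    x = np.zeros(N)
    x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
    return x

ratios = []
for _ in range(1000):
    x1, x2 = random_k_sparse(rng, N, K), random_k_sparse(rng, N, K)
    ratios.append(np.linalg.norm(Phi @ (x1 - x2)) / np.linalg.norm(x1 - x2))
print(min(ratios), max(ratios))                 # both close to 1 for this M
```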
2.1.3 Sparsity-based signal recovery
Although the sparse structure of a signal x need not be known when collecting measurements y = Φx, a hallmark of CS is the use of the sparse model in order to facilitate understanding from the compressive measurements. A variety of algorithms have been proposed to answer Q1 (signal recovery), where we seek to solve the apparently undercomplete set of M linear equations y = Φx for N unknowns. The canonical method [5, 8, 18] is known as ℓ1-minimization and is formulated as follows: first solve

    α̂ = argmin_{α′ ∈ R^N} ‖α′‖1 subject to y = ΦΨα′,    (3)

and then set x̂ = Ψα̂. This recovery program can also be extended to account for measurement noise. The following bound is known.
Theorem 1. [9] Suppose that Φ satisfies the RIP of order 2K with respect to Ψ and with constant δ2K < √2 − 1. Let x ∈ R^N, and suppose that y = Φx + n where ‖n‖ ≤ ε. Then let

    α̂ = argmin_{α′ ∈ R^N} ‖α′‖1 subject to ‖y − ΦΨα′‖ ≤ ε,

and set x̂ = Ψα̂. Then

    ‖x − x̂‖ = ‖α − α̂‖ ≤ C1 K^{−1/2} ‖α − αK‖1 + C2 ε    (4)

for constants C1 and C2.
This result is not unique to ℓ1 minimization; similar bounds have been established for signal recovery using greedy iterative algorithms such as OMP [16], ROMP [42], and CoSaMP [41]. Bounds
of this type are extremely encouraging for signal processing. From only M measurements, it is
possible to recover x with quality that is comparable to its proximity to the nearest K-sparse
signal, and if x itself is K-sparse and there is no measurement noise, then x can be recovered
exactly. Moreover, despite the apparent ill-conditioning of the inverse problem, the measurement
noise is not dramatically amplified in the recovery process.
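For concreteness, the noise-free program (3) (with Ψ = I) can be cast as a linear program by splitting α into positive and negative parts; the sketch below uses SciPy's generic LP solver and is meant only as a toy illustration, not as the recovery algorithm analyzed in Theorem 1.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||alpha||_1 subject to Phi @ alpha = y (here Psi = I) via the
    standard LP reformulation alpha = u - v with u, v >= 0."""
    M, N = Phi.shape
    c = np.ones(2 * N)                           # objective: sum(u) + sum(v)
    A_eq = np.hstack([Phi, -Phi])                # Phi @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(0)
N, K, M = 256, 5, 60
alpha = np.zeros(N)
alpha[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Phi = rng.normal(scale=1 / np.sqrt(M), size=(M, N))
alpha_hat = basis_pursuit(Phi, Phi @ alpha)
print(np.linalg.norm(alpha - alpha_hat))         # typically near zero (exact recovery)
```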
These bounds are known as deterministic, instance-optimal bounds because they hold deterministically for any Φ that meets the RIP, and because for a given Φ they give a guarantee for recovery of any x ∈ R^N based on its proximity to the concise model.

The use of ℓ1 as a measure for proximity to the concise model (on the right hand side of (4)) arises due to the difficulty in establishing ℓ2 bounds on the right hand side. Indeed, it is known that deterministic ℓ2 instance-optimal bounds cannot exist that are comparable to (4). In particular, for any Φ, ensuring that ‖x − x̂‖ ≤ C3 ‖x − xK‖ for all x is known [13] to require that M ≥ C4 N regardless of K.
However, it is possible to obtain an instance-optimal ℓ2 bound for sparse signal recovery in the noise-free setting by changing from a deterministic formulation to a probabilistic one [13, 17]. In particular, by considering any given x ∈ R^N, it is possible to show that for most random Φ, letting the measurements y = Φx, and recovering x̂ via ℓ1-minimization (3), it holds that

    ‖x − x̂‖ ≤ C5 ‖x − xK‖.    (5)

While the proof of this statement [17] does not involve the RIP directly, it holds for many of the same random distributions that work for RIP matrices, and it requires the same number of measurements (2) up to a constant.
Similar bounds hold for the closely related problem of "sketching," where the goal is to use the compressive measurement vector y to identify and report only approximately K expansion coefficients that best describe the original signal, i.e., a sparse approximation to αK. In the case where Ψ = I, an efficient randomized measurement process coupled with a customized recovery algorithm [29] provides signal sketches that meet a deterministic mixed-norm ℓ2/ℓ1 instance-optimal bound analogous to (4) in the noise-free setting. A desirable aspect of this construction is that the computational complexity scales with only log(N) (and is polynomial in K); this is possible because only approximately K pieces of information must be computed to describe the signal. Though at a higher computational cost, the aforementioned greedy algorithms (such as CoSaMP) for signal recovery can also be interpreted as sketching techniques in that they produce explicit sparse approximations to αK. Finally, for signals that are sparse in the Fourier domain (Ψ consists of the DFT vectors), probabilistic ℓ2/ℓ2 instance-optimal bounds have been established for a specialized sketching algorithm [27, 28] that are analogous to (5).
2.2 Manifold models and properties

2.2.1 Overview
As we have discussed in Section 1.4, there are many possible modeling frameworks for capturing
concise signal structure. Among these possibilities are the broad class of manifold models.
Manifold models arise, for example, in settings where the signals of interest vary continuously as
a function of some K-dimensional parameter. Suppose, for instance, that there exists some parameter θ that controls the generation of the signal. We let xθ ∈ RN denote the signal corresponding to
the parameter θ, and we let Θ denote the K-dimensional parameter space from which θ is drawn.
In general, Θ itself may be a K-dimensional manifold and need not be embedded in an ambient Euclidean space. For example, supposing θ describes the 1-D rotation parameter in a top-down satellite image, we have Θ = S^1.

Under certain conditions on the parameterization θ ↦ xθ, it follows that M := {xθ : θ ∈ Θ} forms a K-dimensional submanifold of R^N. An appropriate visualization is that the set M forms a nonlinear K-dimensional "surface" within the high-dimensional ambient signal space R^N. Depending on the circumstances, we may measure the distance between two points xθ1 and xθ2 on the manifold M using either the ambient Euclidean distance ‖xθ1 − xθ2‖ or the geodesic distance along the manifold, which we denote as dM(xθ1, xθ2). In the case where the geodesic distance along M equals the native distance in parameter space, i.e., when

    dM(xθ1, xθ2) = dΘ(θ1, θ2),    (6)

we say that M is isometric to Θ. The definition of the distance dΘ(θ1, θ2) depends on the appropriate metric for the parameter space Θ; supposing Θ is a convex subset of Euclidean space, then we can let dΘ(θ1, θ2) = ‖θ1 − θ2‖.
While our discussion above concentrates on the case of manifolds M generated by underlying parameterizations, we stress that manifolds have also been proposed as approximate low-dimensional
models within RN for nonparametric signal classes such as images of human faces or handwritten
digits [4, 33, 58]. These signal families may also be considered.
The results we present in this paper will make reference to certain characteristic properties of the manifold under study. These terms are originally defined in [3, 43] and are repeated here for completeness. First, our results will depend on a measure of regularity for the manifold. For this purpose, we adopt the notion of the condition number of a manifold, which is also known as the reach of a manifold in the geometric measure theory literature [26, 43].

Definition 1. [43] Let M be a compact Riemannian submanifold of R^N. The condition number is defined as 1/τ, where τ is the largest number having the following property: the open normal bundle about M of radius r is embedded in R^N for all r < τ.

The condition number 1/τ controls both local properties and global properties of the manifold. Its role is summarized in two key relationships (see the Toolbox and [43] for more detail). First, the curvature of any unit-speed geodesic path on M is bounded by 1/τ. Second, at long geodesic distances, the condition number controls how close the manifold may curve back upon itself. For example, supposing x1, x2 ∈ M with dM(x1, x2) > τ, it must hold that ‖x1 − x2‖ > τ/2.
We continue with a brief but concrete example to illustrate specific values for these quantities. Let N > 0, κ > 0, Θ = R mod 2π, and suppose xθ ∈ R^N is given by

    xθ = [κ cos(θ); κ sin(θ); 0; 0; · · · ; 0]^T.

In this case, M = {xθ : θ ∈ Θ} forms a circle of radius κ in the x(1), x(2) plane. The manifold dimension is K = 1, and the condition number is 1/τ = 1/κ. We also refer in our results to the K-dimensional volume of M, denoted by VM, which in this example corresponds to the circumference 2πκ of the circle.
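A quick numerical check of the local part of this example: for the circle of radius κ, the curvature of the unit-speed parameterization equals 1/κ, matching the condition number 1/τ = 1/κ. (This illustrates only the curvature bound; the reach also encodes global self-avoidance.)

```python
import numpy as np

kappa, N = 2.0, 10

def x_unit_speed(s):
    """Unit-speed parameterization of the circle of radius kappa in R^N."""
    v = np.zeros(N)
    v[0], v[1] = kappa * np.cos(s / kappa), kappa * np.sin(s / kappa)
    return v

# Curvature = norm of the second derivative of the unit-speed parameterization.
h, s0 = 1e-4, 0.7
second = (x_unit_speed(s0 + h) - 2 * x_unit_speed(s0) + x_unit_speed(s0 - h)) / h ** 2
print(np.linalg.norm(second), 1 / kappa)    # both approximately 1/kappa = 0.5
```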
We conclude this section with a less trivial example of computing the condition number (or,
alternatively, reach).
2.2.2 Example: Complex exponential curve
For an integer fC, set N := 2fC + 1. Let β : R → C^N denote the complex exponential curve defined as

    βt = β(t) = [ e^{−i2πfC t}, e^{−i2π(fC−1)t}, . . . , e^{i2π(fC−1)t}, e^{i2πfC t} ]^T    (7)
for t ∈ R. The following result, proved in Appendix B, gives an estimate of the condition number (which we denote here by 1/τβ) of the complex exponential curve β.² The reader may refer to [63] for related computations concerning the complex exponential curve.

Lemma 1. For the complex exponential curve β in C^N (as defined in (7)), let 1/τβ denote its condition number. Then, for some integer Nsine and (known) constant αsine < 1, the following holds if N > Nsine:

    αsine √N ≤ τβ ≤ √N.
² Unlike β, which is a subset of C^N, its real and imaginary parts live in R^N and are perhaps more consistent with the rest of this paper (which studies submanifolds of R^N). However, finding the condition number of re(β) or im(β) is far more tedious and is therefore not pursued here for the sake of clarity of exposition.
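The following sketch numerically illustrates the local ingredient behind Lemma 1. Because β has constant speed and β′ is orthogonal to β″, the curvature of its unit-speed reparameterization equals ‖β″‖/‖β′‖², which lower-bounds the condition number 1/τβ; the computed values scale like a constant times 1/√N, consistent with the √N scaling of τβ in Lemma 1. (This checks curvature only, not the global self-avoidance that the reach also controls.)

```python
import numpy as np

def curvature_of_exponential_curve(fC):
    """Curvature of the unit-speed reparameterization of beta(t) in (7):
    since ||beta'(t)|| is constant and beta' is orthogonal to beta'',
    the curvature equals ||beta''|| / ||beta'||^2."""
    k = np.arange(-fC, fC + 1)                       # frequencies -fC, ..., fC
    speed_sq = np.sum((2 * np.pi * k) ** 2)          # ||beta'||^2
    accel = np.sqrt(np.sum((2 * np.pi * k) ** 4.0))  # ||beta''||
    return accel / speed_sq

for fC in [8, 32, 128, 512]:
    N = 2 * fC + 1
    # curvature * sqrt(N) stays roughly constant, i.e., curvature ~ c / sqrt(N)
    print(N, curvature_of_exponential_curve(fC) * np.sqrt(N))
```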
3 Stable embeddings of manifolds
In cases where the signal class of interest M forms a low-dimensional submanifold of RN , we have
theoretical justification that the information necessary to distinguish and recover signals x ∈ M can
be well-preserved under a sufficient number of compressive measurements y = Φx. In particular, it
was first shown in [3] that an RIP-like property holds for families of manifold-modeled signals. The
result stated that, under a random projection onto RM , pairwise distances between the points on M
are approximately preserved with high probability, provided that M , the number of measurements,
is large enough. Mainly, M should scale linearly with the dimension K of M and logarithmically
with the ambient dimension N . The dependence on N was later removed in [12], which used a
different set of assumptions on the manifold to help derive a sharper lower bound on the requisite
number of random measurements. Unfortunately, the results given in [12] hold only as the isometry constant ε → 0 in (1), with asymptotic threshold and constants fixed but unspecified. The manifold properties assumed in [12] are arguably more complicated and less commonly used than the manifold volume and condition number which are at the heart of our results. (On the other hand, there do exist manifolds where the properties assumed in [12] clearly provide a stronger analysis.)
In this section, we establish an improved lower bound on M to ensure a stable embedding of a
manifold. We make the same assumptions on the manifold as in our past work [3] but provide a
measurement bound that is independent of the ambient dimension N. Our bound holds for every ε ≤ 1/3 and we provide explicit constants. The proof, presented in Appendix C, draws from the
ideas in generic chaining [56], which have been recently used to develop state-of-the-art RIP results
for structured measurement matrices in CS [24, 37, 48, 50–53, 57]. As in [12], we control the failure
probability of the manifold embedding by forming a so-called chain on a sequence of increasingly
finer covers on the index set of the random process [39, 56]. Aside from delivering an improved
bound (and also allowing us to study Q1 and Q2 in Section 4), we hope that our exposition in this
paper will encourage yet more researchers in the field of CS to use this powerful technique.
Theorem 2. Let M be a compact K-dimensional Riemannian submanifold of R^N having condition number 1/τ and volume VM. Conveniently assume that³

    VM / τ^K ≥ ( 21 / (2√K) )^K.    (8)

Fix 0 < ε ≤ 1/3 and 0 < ρ < 1. Let Φ be a random M × N matrix populated with i.i.d. zero-mean Gaussian random variables with variance 1/M, with

    M ≥ 18 ε^{−2} max{ 24K + 2K log( √K / (τ ε²) ) + log(2 VM), log(8/ρ) }.    (9)

Then with probability at least 1 − ρ the following statement holds: For every pair of points x1, x2 ∈ M,

    (1 − ε) ‖x1 − x2‖ ≤ ‖Φx1 − Φx2‖ ≤ (1 + ε) ‖x1 − x2‖.    (10)

³ Theorem 2 still holds, with a worse (larger) lower bound in (9), after relaxing the assumption in (8). One example of a manifold that does satisfy the assumption in (8) is the complex exponential curve described in Section 2.2.2, as long as N ≥ 7.
The proof of the above result can be found in Appendix C. In essence, manifolds with higher volume or with greater curvature have more complexity, which leads to an increased number of
measurements (9). By comparing (1) with (10), we see a strong analogy to the RIP of order 2K. This theorem establishes that, like the class of K-sparse signals, a collection of signals described by a K-dimensional manifold M ⊂ R^N can have a stable embedding in an M-dimensional measurement space. Moreover, the requisite number of random measurements M is once again almost linearly proportional to the information level (or number of degrees of freedom) K. It is important to note that in (9), the combined dependence on the manifold dimension K, condition number 1/τ, and volume VM cannot, generally speaking, be improved. In particular, consider the case where M is a K-dimensional unit ball in R^N, that is, M = B^K. Clearly, in this case, τ = 1. Additionally, from (72), we observe that VM = V_{B^K} ∝ K^{−K/2} and so log VM ∝ −(K/2) log K. As a result, plugging VM back into (9) cancels the term 2K log(√K) = K log K on the right hand side of (9). It follows that the lower bound in (9) scales with K (rather than K log K) in this case. We conclude that, in this special case, the lower bound in (9) is optimal (up to a constant factor).
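As an informal numerical companion to Theorem 2, the sketch below samples random pairs of points from the shifted-pulse manifold of Figure 1 and records the largest observed deviation of the compressed pairwise distances from the original ones; the deviation shrinks as M grows, in the spirit of (9) and (10). The parameters are illustrative choices, not those dictated by the theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 1024, 0.05                         # same illustrative pulse as before

def x_theta(theta):
    t = np.arange(1, N + 1) / N
    return np.exp(-((t - theta) ** 2) / (2 * sigma ** 2))

def worst_distortion(M, n_pairs=2000):
    """Largest observed | ||Phi(x1 - x2)|| / ||x1 - x2|| - 1 | over random pairs."""
    Phi = rng.normal(scale=1 / np.sqrt(M), size=(M, N))
    worst = 0.0
    for _ in range(n_pairs):
        th1, th2 = rng.random(2)
        x1, x2 = x_theta(th1), x_theta(th2)
        ratio = np.linalg.norm(Phi @ (x1 - x2)) / np.linalg.norm(x1 - x2)
        worst = max(worst, abs(ratio - 1))
    return worst

for M in [10, 50, 200]:
    print(M, worst_distortion(M))             # observed distortion shrinks as M grows
```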
As was the case with the RIP for sparse signal processing, this sort of result has a number
of possible implications for manifold-based signal processing. First, individual signals obeying a
manifold model can be acquired and stored efficiently using compressive measurements, and it is
unnecessary to employ the manifold model itself as part of the compression process. Rather, the
model needs only to be used for signal understanding from the compressive measurements. Second,
problems such as Q1 (signal recovery) and Q2 (parameter estimation) can be addressed, as we
discuss in Section 4. Aside from this theoretical analysis, we have reported promising experimental recovery/estimation results with various classes of parametric signals [14, 61]. Also, taking a
different analytical perspective (a statistical one, assuming additive white Gaussian measurement
noise), estimation-theoretic quantities such as the Cramer-Rao Lower Bound (for a specialized set
of parametric problems) have been shown to be preserved in the compressive measurement space
as a consequence of the stable embedding [47]. Third, the stable embedding results can also be extended to the case of multiple manifolds that are simultaneously embedded [14]; this allows both the
classification of an observed object to one of several possible models (different manifolds) and the
estimation of a parameter within that class (position on a manifold). Fourth, collections of signals
obeying a manifold model (such as multiple images of a scene photographed from different perspectives) can be acquired using compressive measurements, and the resulting manifold structure will be
preserved among the suite of measurement vectors in RM [15,46]. Fifth, we have provided empirical
and theoretical support for the use of manifold learning in the reduced-dimensional space [32]; this
can dramatically simplify the computational and storage demands on a system for processing large
databases of signals.
Before presenting an application of Theorem 2 in the next section, we would like to outline
the chaining argument used in its proof through an example. Suppose that M is the unit circle
embedded in R^N and that we observe this manifold through a measurement operator Φ ∈ R^{M×N}.
To study the quality of embedding, we first need to identify the set of all secants connecting
two points in M. In this example, the set of all normalized secants of M (which we denote by
U (M)) also forms a unit circle and equals M, i.e., U (M) = M. Let {Cj } be a sequence of
increasingly finer covers on M (or equivalently on U (M)). Constructing a sequence of covers for
the secants of a manifold in general is studied in Appendix C.1. For an arbitrary normalized secant
y = (x1 − x2)/‖x1 − x2‖ with x1, x2 ∈ M, let πj(y) represent the nearest point to y on the jth
cover (for every j ≥ 0). This construction is illustrated in Figure 2. We can use the telescoping
sum
    y = π0(y) + Σ_{j≥1} ( πj(y) − πj−1(y) )
Figure 2: A sequence of increasingly finer covers for the unit circle. Also shown is an arbitrary point y and
its projection πj (y) onto each cover.
to write that

    P{ sup_{x1,x2 ∈ M} ‖Φx1 − Φx2‖ / ‖x1 − x2‖ > 1 + ε }
        = P{ sup_{y ∈ U(M)} ‖Φy‖ > 1 + ε }
        = P{ sup_{y ∈ M} ‖Φy‖ > 1 + ε }
        ≤ P{ sup_{y ∈ M} ‖Φπ0(y)‖ + Σ_{j≥1} sup_{y ∈ M} ‖Φ(πj(y) − πj−1(y))‖ > 1 + Σ_{j≥0} εj }
        ≤ P{ max_{p ∈ C0} ‖Φp‖ + Σ_{j≥1} max_{(p,q) ∈ Cj×Cj−1} ‖Φ(p − q)‖ > 1 + Σ_{j≥0} εj }
        ≤ #C0 · max_{p ∈ C0} P{ ‖Φp‖ > 1 + ε0 } + Σ_{j≥1} #Cj · #Cj−1 · max_{(p,q) ∈ Cj×Cj−1} P{ ‖Φ(p − q)‖ > εj },    (11)

where {εj} is an exponentially-fast decreasing sequence of constants such that Σ_j εj = ε. The third line uses the fact that U(M) = M here. The last line uses the union bound. Therefore the failure probability of obtaining a stable embedding of M is controlled by an infinite series that only involves the sequence of covers constructed earlier. As we will see in more detail later, given enough measurements, the (exponentially growing) size of the covers #Cj can be balanced by the (exponentially decreasing) failure probabilities in the last line of (11) to guarantee that the overall failure probability is exponentially small. A more general version of this chaining argument is detailed in Appendix C.2.
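To make the cover construction concrete, here is a small sketch (an illustration only, not the construction of Appendix C.1) that builds dyadic covers Cj of the unit circle, projects a point y onto each cover, and verifies that the telescoping increments ‖πj(y) − πj−1(y)‖ shrink with j, so that the telescoping sum converges to y.

```python
import numpy as np

def circle_cover(j, n0=8):
    """Cover C_j: n0 * 2^j equally spaced points on the unit circle in R^2."""
    m = n0 * 2 ** j
    angles = 2 * np.pi * np.arange(m) / m
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def project(y, cover):
    """Nearest point pi_j(y) to y in the given cover."""
    return cover[np.argmin(np.linalg.norm(cover - y, axis=1))]

y = np.array([np.cos(1.234), np.sin(1.234)])      # an arbitrary point on the circle
levels = 8
pis = [project(y, circle_cover(j)) for j in range(levels)]

increments = [np.linalg.norm(pis[j] - pis[j - 1]) for j in range(1, levels)]
print(increments)                                  # bounded by the (halving) cover resolution
print(np.linalg.norm(y - pis[-1]))                 # pi_0(y) plus the increments is close to y
```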
4 Manifold-based signal recovery and parameter estimation
In this section, we establish theoretical bounds on the accuracy to which problems Q1 (signal recovery) and Q2 (parameter estimation) can be solved under a manifold model. To be specific, let us consider a length-N signal x that is not necessarily K-sparse, but rather that we assume lives on or near some known K-dimensional manifold M ⊂ R^N. From a collection of measurements

    y = Φx + n,

where Φ is a random M × N matrix and n ∈ R^M is an additive noise vector, we would like to recover either x or a parameter θ that generates x.

For the signal recovery problem, we will consider the following as a method for estimating x. Solve the program

    min_{x′ ∈ M} ‖y − Φx′‖,    (12)

and let x̂, as an estimate of x, be a solution of the above program. We also let x∗ be a solution of the program

    min_{x′ ∈ M} ‖x − x′‖    (13)

and, therefore, an optimal nearest neighbor to x on M. To consider signal recovery successful, we would like to guarantee that ‖x − x̂‖ is not much larger than ‖x − x∗‖.

For the parameter estimation problem, where we presume x ≈ xθ for some θ ∈ Θ, we propose a similar method for estimating θ from the compressive measurements. Solve the program

    min_{θ′ ∈ Θ} ‖y − Φxθ′‖,    (14)

and let θ̂, as an estimate of θ, be a solution of the above program. Also let θ∗ be a solution of the program

    min_{θ′ ∈ Θ} ‖x − xθ′‖.    (15)

Here, θ∗ is an optimal estimate that could be obtained using the full data x ∈ R^N. (If x = xθ exactly for some θ, then θ∗ = θ; otherwise this formulation allows us to consider signals x that are not precisely on the manifold M in R^N. This generalization has practical relevance; a local image block, for example, may only approximately resemble a straight edge, which has a simple parameterization.) To consider parameter estimation successful, we would like to guarantee that dΘ(θ̂, θ∗) is small.
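As a toy illustration of the estimators (12) and (14), the following sketch recovers the shift parameter of the pulse manifold from Figure 1 by brute-force search over a fine grid of θ; the grid search is a generic stand-in for the more sophisticated solvers discussed later in this section, and the grid resolution and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma, M = 1024, 0.05, 30

def x_theta(theta):
    t = np.arange(1, N + 1) / N
    return np.exp(-((t - theta) ** 2) / (2 * sigma ** 2))

Phi = rng.normal(scale=1 / np.sqrt(M), size=(M, N))
theta_true = 0.42
y = Phi @ x_theta(theta_true) + 0.01 * rng.standard_normal(M)   # noisy measurements

# Program (14) by exhaustive search over a fine grid of the parameter space;
# x_hat below then solves (12) restricted to that grid.
grid = np.linspace(0, 1, 2001)
residuals = [np.linalg.norm(y - Phi @ x_theta(th)) for th in grid]
theta_hat = grid[int(np.argmin(residuals))]
x_hat = x_theta(theta_hat)
print(theta_hat, np.linalg.norm(x_theta(theta_true) - x_hat))
```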
As we will see, bounds pertaining to accurate signal recovery can often be extended to imply accurate parameter estimation as well. However, the relationships between the distance dΘ in parameter space and the distances dM and ‖ · ‖ in the signal space can vary depending on the parametric signal model under study. Thus, for the parameter estimation problem, our ability to provide generic bounds on dΘ(θ̂, θ∗) will be restricted. In this paper we focus primarily on the signal recovery problem and provide preliminary results for the parameter estimation problem that pertain most strongly to the case of isometric parameterizations.
In this paper, we do not confront in depth the question of how a recovery program such as (12)
can be efficiently solved. Some efforts in this direction have recently appeared in [31,35,54]. In [54],
the authors guarantee the success of a gradient-projection algorithm for recovering a signal that
lives exactly on the manifold from noisy compressive measurements. The keys to the success of this
method are a stable embedding of the manifold (as is guaranteed by [3] or our Theorem 2) and the
knowledge of the projection operator onto the manifold within RN . In [35], the authors construct
a geometric multi-resolution approximation of a manifold using a collection of affine subspaces.
A major contribution of that work is a recovery algorithm that works by assigning a measured
signal to the closest projected affine subspace in the compressed domain. Two recovery results
are presented. In the first of these, the number of measurements is independent of the ambient
dimension and the recovery error holds for any given signal in the ambient space. All of this is
analogous to our Theorem 4 (a probabilistic instance-optimal bound in `2 ), but the recovery is
guaranteed for a particular algorithm. Unlike that result, however, our Theorem 4 includes explicit
constants, allows for the consideration of measurement noise, and falls nearly for free out of our
novel analytical framework based on chaining. A second result appearing in [35] provides a special
type of deterministic instance-optimal bound for signal recovery and involves embedding arguments
that extend those in [3]. It would be interesting to see if our improved embedding guarantees
in the present paper could now be used to remove the dependence on the ambient dimension
appearing in that result. In [11], the authors provide a Bayesian treatment of the signal recovery
problem using a mixture of (low-rank) Gaussians for approximating the manifold. Furthermore,
some discussion of signal recovery is provided in [3], with application-specific examples provided
in [14, 61]. Unfortunately, it is difficult to propose a single general-purpose algorithm for solving
(12) in RM , as even the problem (13) in RN may be difficult to solve depending on certain nuances
(such as topology) of the individual manifold. Additional complications arise when the manifold
M is non-differentiable, as may happen when the signals x represent 2-D images. However, just as
a multiscale regularization can be incorporated into Newton’s method for solving (13) (see [62]),
an analogous regularization can be incorporated into a compressive measurement operator Φ to
facilitate Newton’s method for solving (12) (see [19,21,61]). For manifolds that lack differentiability,
additional care must be taken when applying results such as Theorem 2. We therefore expect that
the research on signal recovery and approximation based on low-dimensional manifold models will
witness even more growth in the future.
It is also crucial to study the theoretical limits and guarantees in this problem; in what follows,
we will consider both deterministic and probabilistic instance-optimal bounds for signal recovery
and parameter estimation, and we will draw comparisons to the sparsity-based CS results of Section 2.1.3. Our bounds are formulated in terms of generic properties of the manifold (as mentioned
in Section 2.2), which will vary from signal model to signal model. In some cases, calculating
these may be possible, whereas in other cases it may not. Nonetheless, we feel the results in this
paper highlight the relative importance of these properties in determining the requisite number of
measurements.
4.1 A deterministic instance-optimal bound in ℓ2
We begin by seeking an instance-optimal bound. That is, for a measurement matrix Φ that meets (10) for all x1, x2 ∈ M, we seek an upper bound for the relative reconstruction error

    ‖x − x̂‖ / ‖x − x∗‖

that holds uniformly for all x ∈ R^N. We would also like this bound to account for noise in the measurements. In this section we consider only the signal recovery problem; however, similar bounds would apply to parameter estimation. We have the following result, which is proved in Appendix E.
Theorem 3. Fix 0 < ε ≤ 1/3 and 0 < ρ < 1. Let M be as described in Theorem 2. Assume that Φ satisfies (10) for all pairs of points x1, x2 ∈ M. Take x ∈ R^N, let y = Φx + n, and let the recovered estimate x̂ and an optimal estimate x∗ be as defined in (12) and (13). Then the following holds:

    ‖x − x̂‖ ≤ (1 + 2ε) (2σM(Φ) + 1) ‖x − x∗‖ + (2 + 4ε) ‖n‖,    (16)

where σM(Φ) is the largest singular value of Φ.
In particular, it is interesting to consider the case where Φ is a random Gaussian matrix as described in Theorem 2. It is well known, e.g., [60, Corollary 5.35], that the nonzero singular values of Φ satisfy the following:

    P{ σM(Φ) > √(N/M) + 1 + t } ≤ e^{−t²M/2},    (17)

    P{ σm(Φ) < √(N/M) − 1 − t } ≤ e^{−t²M/2},    (18)

for t > 0. Here, σM(Φ) and σm(Φ) are the largest and smallest (nonzero) singular values of Φ, respectively. Suppose that M satisfies (9) so that the promises of Theorem 2 hold except with a probability of at most ρ. Set t = 1 in (17). Now since e^{−M/2} ≤ ρ, we have that

    σM(Φ) ≤ √(N/M) + 2,

except with a probability of at most ρ. In combination with Theorem 3, it finally follows that, except with a probability of at most 2ρ, Φ satisfies (10) for every pair of points on the manifold and that

    ‖x − x̂‖ ≤ (1 + 2ε) ( 2√(N/M) + 5 ) ‖x − x∗‖ + (2 + 4ε) ‖n‖,    (19)

for every x ∈ R^N. Here, x̂ and x∗ are as defined in (12) and (13). In the noise-free case (‖n‖ = 0) and as M/N → 0, the bound on the right hand side of (19) grows as (2 + 4ε) √(N/M). Unfortunately, this is not desirable for signal recovery. Supposing, for example, that we wish to ensure ‖x − x̂‖ ≤ C6 ‖x − x∗‖ for all x ∈ R^N (assuming no measurement noise), then using the bound (19) we would require that M ≥ C7 N regardless of the dimension K of the manifold.
M ≥ C7 N regardless of the dimension K of the manifold.
The weakness of this bound is a geometric necessity; indeed, the bound itself is quite tight in
general, as the following simple example illustrates. The proof can be found in Appendix F.
Proposition 1. Fix 0 < ε ≤ 1/3. Let M denote the line segment in R^N joining the origin and e1 := [1, 0, 0, . . . , 0]^T. Suppose that Φ satisfies (10) for all x1, x2 ∈ M and that σm(Φ) ≥ 8/3. Then, there exists a point x ∈ R^N such that if y = Φx (with no measurement noise), and if x̂ and x∗ are defined in (12) and (13), then

    ‖x − x̂‖ / ‖x − x∗‖ ≥ σm(Φ) / (2(1 + ε)).

In particular, consider the case where Φ is a random Gaussian matrix as described in Theorem 2. According to (76) and (18) (with t = 1), the following two statements are valid except with a probability of at most 2e^{−Mε²/6} + 2e^{−M/2} ≤ 4e^{−Mε²/6}. First, (10) holds for every x1, x2 ∈ M. Second,

    σm(Φ) ≥ √(N/M) − 2.    (20)

If we assume that N/M ≥ (14/3)², we can conclude, using Proposition 1, that Φ satisfies (10) for every x1, x2 ∈ M and yet there exists x ∈ R^N such that

    ‖x − x̂‖ / ‖x − x∗‖ ≥ ( 1 / (4(1 + ε)) ) √(N/M),

except with a probability of at most 4e^{−Mε²/6} on the choice of Φ.
It is worth recalling that, as we discussed in Section 2.1.3, similar difficulties arise in sparsity-based CS when attempting to establish a deterministic ℓ2 instance-optimal bound. In particular, to ensure that ‖x − x̂‖ ≤ C3 ‖x − xK‖ for all x ∈ R^N, it is known [13] that this requires M ≥ C4 N regardless of the sparsity level K.
In sparsity-based CS, there have been at least two types of alternative approaches. The first are the deterministic "mixed-norm" results of the type given in (4). These involve the use of an alternative norm such as the ℓ1 norm to measure the distance from the coefficient vector α to its best K-term approximation αK. While it may be possible to pursue similar directions for manifold-modeled signals, we feel this is undesirable as a general approach because when sparsity is no longer part of the modeling framework, the ℓ1 norm has less of a natural meaning. Instead, we prefer to seek bounds using ℓ2, as that is the most conventional norm used in signal processing to measure energy and error.

Thus, the second type of alternative bounds in sparsity-based CS have involved ℓ2 bounds in probability, as we discussed in Section 2.1.3. Indeed, the performance of both sparsity-based and manifold-based CS is often much better in practice than a deterministic ℓ2 instance-optimal bound might indicate. The reason is that, for any Φ, such bounds consider the worst case signal over all possible x ∈ R^N. Fortunately, this worst case is not typical. As a result, it is possible to derive much stronger results that consider any given signal x ∈ R^N and establish that for most random Φ, the recovery error of that signal x will be small.
4.2 Probabilistic instance-optimal bounds in ℓ2
For a given measurement operator Φ, our bound in Theorem 3 applies uniformly to any signal in R^N. However, a much sharper bound can be obtained by relaxing the deterministic requirement.

4.2.1 Signal recovery
Our first bound applies to the signal recovery problem. The proof of this result is provided in
Appendix G and, like that of Theorem 2, involves a generic chaining argument.
Theorem 4. Suppose x ∈ R^N. Let M be a compact K-dimensional Riemannian submanifold of R^N having condition number 1/τ and volume VM. Conveniently assume that⁴

    VM / τ^K ≥ ( 21 / √K )^K.    (21)

Fix 0 < ε ≤ 1/3 and 0 < ρ < 1. Let Φ be an M × N random matrix populated with i.i.d. zero-mean Gaussian random variables with variance 1/M, chosen independently of x, with M satisfying (9). Let n ∈ R^M, let y = Φx + n, and let the recovered estimate x̂ and an optimal estimate x∗ be as defined in (12) and (13). Then with a probability of at least 1 − 4ρ, the following statement holds:

    ‖x − x̂‖ ≤ min{ (1 + 3ε) ‖x − x∗‖ + τ/40 , (1 + 2ε) ( 2√(N/M) + 5 ) ‖x − x∗‖ } + (2 + 4ε) ‖n‖.    (22)

⁴ Theorem 4 still holds, with worse (larger) constants, after relaxing this assumption.
Roughly speaking, one can discern two different operating regimes in (22):
• When x is sufficiently far from the manifold (‖x − x∗‖ ≫ τ), then (22) roughly reads

    ‖x − x̂‖ ≤ (1 + 3ε) ‖x − x∗‖ + (2 + 4ε) ‖n‖.

  In particular, by setting ‖n‖ = 0 in the bound above (which corresponds to the noise-free setup), we obtain a multiplicative error bound: the recovery error from compressive measurements ‖x − x̂‖ is no larger than twice the recovery error from a full set of measurements ‖x − x∗‖.

• On the other hand, when x is close to the manifold (‖x − x∗‖ ≪ τ √(M/N)), then (22) becomes

    ‖x − x̂‖ ≤ (1 + 2ε) ( 2√(N/M) + 5 ) ‖x − x∗‖ + (2 + 4ε) ‖n‖.

  When ‖n‖ = 0, we still obtain a multiplicative error bound but with a larger factor in front of ‖x − x∗‖.
Let us also compare and contrast our bound with the analogous results for sparsity-based CS. Like Theorem 1, we consider the problem of signal recovery in the presence of additive measurement noise. Both bounds relate the recovery error ‖x − x̂‖ to the proximity of x to its nearest neighbor in the concise model class (either xK or x∗ depending on the model), and both bounds relate the recovery error ‖x − x̂‖ to the amount ‖n‖ of additive measurement noise. However, Theorem 1 is a deterministic bound whereas Theorem 4 is probabilistic, and our bound (22) measures proximity to the concise model in the ℓ2 norm, whereas (4) uses the ℓ1 norm.

Our bound can also be compared with (5), as both are instance-optimal bounds in probability, and both use the ℓ2 norm to measure proximity to the concise model. However, we note that unlike (5), our bound (22) allows the consideration of measurement noise.
4.2.2 Parameter estimation
Above we have derived a bound for the signal recovery problem, with an error metric that measures the discrepancy between the recovered signal x̂ and the original signal x. However, in some applications it may be the case that the original signal x ≈ xθ∗, where θ∗ ∈ Θ is a parameter of interest. In this case we may be interested in using the compressive measurements y = Φx + n to solve the problem (14) and recover an estimate θ̂ of the underlying parameter.

Of course, these two problems are closely related. However, we should emphasize that guaranteeing ‖x − x̂‖ ≈ ‖x − x∗‖ does not automatically guarantee that dM(xθ̂, xθ∗) is small (and therefore does not ensure that dΘ(θ̂, θ∗) is small). If the manifold is shaped like a horseshoe, for example, then it could be the case that xθ∗ sits at the end of one arm but xθ̂ sits at the end of the opposing arm. These two points would be much closer in a Euclidean metric than in a geodesic one.

Consequently, in order to establish bounds relevant for parameter estimation, our concern focuses on guaranteeing that the geodesic distance dM(xθ̂, xθ∗) is itself small. Our next result is proved in Appendix H.
Theorem 5. Suppose x ∈ R^N, and fix 0 < ε ≤ 1/3 and 0 < ρ < 1. Let M and Φ be as described in Theorem 4, assuming that M satisfies (9) and that the convenient assumption (21) holds. Let n ∈ R^M, let y = Φx + n, and let the recovered estimate x̂ and an optimal estimate x∗ be as defined in (12) and (13). If ‖x − x∗‖ + (10/9) ‖n‖ ≤ 0.163 τ, then with probability at least 1 − 4ρ the following statement holds:

    dM(x̂, x∗) ≤ min{ (4 + 6ε) ‖x − x∗‖ + τ/20 , ( (4 + 8ε) √(N/M) + 12 + 20ε ) ‖x − x∗‖ } + (4 + 8ε) ‖n‖.    (23)
In several ways, this bound is similar to (22). Both bounds relate the recovery error to the proximity of x to its nearest neighbor x∗ on the manifold and to the amount ‖n‖ of additive measurement noise. Both bounds also have an additive term on the right hand side that is small in relation to the condition number τ.

In contrast, (23) guarantees that the recovered estimate x̂ is near to the optimal estimate x∗ in terms of geodesic distance along the manifold. Establishing this condition required the additional assumption that ‖x − x∗‖ + (10/9) ‖n‖ ≤ 0.163 τ. Because τ relates to the degree to which the manifold can curve back upon itself at long geodesic distances, this assumption prevents exactly the type of "horseshoe" problem that was mentioned above, where it may happen that dM(x̂, x∗) ≫ ‖x̂ − x∗‖. Suppose, for example, it were to happen that ‖x − x∗‖ ≈ τ and x was approximately equidistant from both ends of the horseshoe; a small distortion of distances under Φ could then lead to an estimate x̂ for which ‖x − x̂‖ ≈ ‖x − x∗‖ but dM(x̂, x∗) ≫ 0. Similarly, additive noise could cause a similar problem of "crossing over" in the measurement space. Although our bound provides no guarantee in these situations, we stress that under these circumstances, accurate parameter estimation would be difficult (or perhaps even unimportant) in the original signal space R^N.
Finally, we revisit the situation where the original signal x ≈ xθ∗ for some θ∗ ∈ Θ (with θ∗ satisfying (15)), where the measurements y = Φx + n, and where the recovered estimate θ̂ satisfies (14). We consider the question of whether (23) can be translated into a bound on dΘ(θ̂, θ∗). As described in Section 2.2, in signal models where M is isometric to Θ, this is automatic: we have simply that

    dM(xθ̂, xθ∗) = dΘ(θ̂, θ∗).

Such signal models do exist. Work by Donoho and Grimes [20], for example, has characterized a variety of articulated image classes for which (6) holds or for which dM(xθ1, xθ2) = C8 dΘ(θ1, θ2) for some constant C8 > 0. In other models it may hold that

    C9 dM(xθ1, xθ2) ≤ dΘ(θ1, θ2) ≤ C10 dM(xθ1, xθ2)

for constants C9, C10 > 0. Each of these relationships may be incorporated into the bound (23).
5
Conclusions
In this paper, we have provided an improved and non-asymptotic lower bound on the number of
requisite measurements to ensure a stable embedding of a manifold under a random linear measurement operator. We have also considered the tasks of signal recovery and parameter estimation
using compressive measurements of a manifold-modeled signal, and we have established theoretical
bounds on the accuracy to which these questions may be answered. Although these problems differ
substantially from the mechanics of sparsity-based CS, we have seen a number of similarities that
arise due to the low-dimensional geometry of the each of the concise models. First, we have seen
that a sufficient number of compressive measurements can guarantee a stable embedding of either
type of signal family, and the requisite number of measurements scales essentially linearly with the
18
information level of the signal. Second, we have seen that deterministic instance-optimal bounds in
`2 are necessarily weak for both problems. Third, we have seen that probabilistic instance-optimal
bounds in `2 can be derived that give the optimal scaling with respect to the signal proximity to
the concise model and with respect to the amount of measurement noise. Thus, our work supports
the growing evidence that manifold-based models can be used with high accuracy in compressive
signal processing.
Most of our analysis in this paper rests on a new analytical framework for studying manifold
embeddings that uses tools from the theory of empirical processes (namely, the idea of generic
chaining). While such tools are becoming more widely adopted in the analysis of sparsity-based CS
problems, we believe they are also very promising for studying the interactions of nonlinear signal
families (such as manifolds) with random, compressive measurement operators. We hope that the
chaining argument in this paper will be useful for future investigations along these lines.
Acknowledgements
M.B.W. is grateful to Rich Baraniuk and the Rice CS research team for many stimulating discussions. A.E. thanks Justin Romberg for introducing him to the generic chaining and other topics in
the theory of empirical processes, Han Lun Yap for his valuable contributions to an early version
of the proof of Theorem 2 and many productive discussions about the topic, and Alejandro Weinstein for helpful discussions. Finally, both authors would like to acknowledge the tremendous and
positive influence that the late Partha Niyogi has had on our work.
A
Toolbox
We begin by introducing some notation that will be used throughout the rest of the appendices.
In this paper, N stands for the set of nonnegative integers. The tangent space of M at p ∈ M
is denoted Tp . The orthogonal projection operator onto this linear subspace is denoted by ↓p . We
let ∠ [·, ·] represent the angle between two vectors after being shifted to the same starting point.
Throughout this paper, dM (·, ·) measures the geodesic distance between two points on M. By
r-ball we refer to a Euclidean (open) ball of radius r > 0. In addition, with BN we denote the unit
ball in RN with volume VBN and we reserve BN (p, r) to represent an N -dimensional r-ball centered
at p in RN . For r > 0, let AM (p, r) := M ∩ BN (p, r) − p denote a (relatively) open neighborhood
of p on M after being shifted to the origin. Here the subtraction is in the Minkowski sense. The
K-dimensional ball of radius r in Tp will be denoted by BTp ; this ball is centered at the origin, as Tp
is a linear subspace. Unless otherwise stated, all distances are measured in the Euclidean metric.
A collection of N -dimensional r-balls that covers M is called an r-cover for M, with their
centers forming a so-called r-net for M. Notice that in general we do not require a net for M
to be a subset of M. However, we define the covering number of M at scale r, NM (r), to be
the cardinality of a minimal r-net for M among all subsets of M. (In other words, NM (r) is the
smallest number of r-balls centered on M that it takes to cover M.) A maximal r-separated subset
of M is called an r-packing for M. The packing number of M at scale r > 0, denoted by PM (r),
is the cardinality of such a set. It can be easily verified that an r-packing for M is also an r-cover
for M, so
NM (r) ≤ PM (r).
(24)
The concept of (principal) angle between subspaces will later come in handy. The (principal) angle between two linear subspaces Tp and Tq is defined such that cos (∠ [Tp , Tq ]) = minu maxv |hu, vi|,
19
where the unit vectors u and v belong to Tp and Tq , respectively. It is known that
k ↓p (·)− ↓q (·)k2,2 = sin (∠ [Tp , Tq ]) ,
(25)
where, as defined above, ↓p and ↓q are linear orthogonal projectors onto the tangent subspaces Tp
and Tq , respectively [55, Theorem 2.5].5 The norm above is the spectral norm, namely the operator
norm from RN equipped with `2 to itself.
We will also use the following conventions to clarify the exposition. For x1 6= x2 ∈ RN , define
U (x1 , x2 ) :=
x2 − x1
.
kx2 − x1 k
Additionally, we let U (S1 , S2 ) denote the set of directions of all the chords connecting two sets
S1 , S2 ⊆ RN , namely
U (S1 , S2 ) := {U (x1 , x2 ) : x1 ∈ S1 , x2 ∈ S2 , x1 6= x2 } .
Clearly, U (S1 , S2 ) ⊆ SN −1 , where SN −1 is the unit sphere in RN . Whenever possible, we also
simplify our notation by using U (S) := U (S, S).
Below we list a few useful results (mostly from differential geometry) which are used throughout
the rest of the paper. We begin with a well-known bound on the covering number of Euclidean
balls, e.g., [60, Lemma 5.2].6
Lemma 2. A K-dimensional unit ball can be covered by at most (3/r)K r-balls with r ≤ 1.
We now recall several results from Sections 5 and 6 in [43]. Unfortunately we were unable to
confirm for ourselves some of the original proofs appearing in [43]. Therefore, some of the statements
and proofs below differ slightly from their original counterparts. The first result is closely related
to Lemma 5.3 in [43].
Lemma 3. Fix p, q ∈ M, such that kq − pk < 2τ . Then ∠ [q − p, ↓p (q − p)] ≤ sin−1 (kq − pk/2τ ).
Proof. Consider the unit vector v along (q − p)− ↓p (q − p) ⊥ Tp and the point z := p + τ · v.
Observe that z − p is orthogonal to the manifold at p. By definition of the condition number, the
distance from z to the manifold is minimized at p and we must therefore have kz − qk ≥ τ .7 Now
consider the triangle formed by the points p, q, z and the line l passing through z and perpendicular
to q − p. Let z 0 denote the intersection of l with the line passing through p and q. (See Figure 3.)
5
In fact, (25) holds for any two linear subspaces (not only tangent subspaces of a manifold).
Lemma 5.2 in [60] concerns the unit sphere in RK , but the result still holds for the unit Euclidean ball using
essentially the same argument.
7
To see this, consider a sequence of points zn := p + (τ − 1/n) · v for integer values of n. For each n, kzn − pk =
τ − 1/n < τ , and zn − p is orthogonal to the manifold at p. Therefore, by the definition of the condition number, no
point q 0 ∈ M can satisfy kzn − q 0 k < kzn − pk. Thus, the distance from zn to the manifold equals τ − 1/n. Taking
the limit as n → ∞ and using the continuity of the distance function, we conclude that the distance from z to M
equals τ . So, no point q 0 ∈ M can satisfy kz − q 0 k < kz − pk = τ .
6
20
Figure 3: See proof of Lemma 3.
It is clear that ∠[q − p, z − p] ≤ π/2. Also since kz − qk ≥ kz − pk = τ , we have ∠[z − q, p − q] ≤
∠[z − p, q − p] ≤ π/2. Therefore, z 0 is indeed between p and q. The angle between l and the line
passing through p and z equals the angle between q − p and ↓p (q − p), that is
∠[p − z, z 0 − z] = ∠[q − p, ↓p (q − p)] =: θp,q .
(26)
To obtain an upper bound for θp,q , we again note that kz − qk ≥ kz − pk and therefore kz 0 − qk ≥
kz 0 − pk, or kz 0 − pk ≤ 12 kq − pk. So, θp,q is bounded by sin−1 (kq − pk/2τ ). This completes the
proof of Lemma 3.
Lemma 4. [43, Lemma 5.4]
AM (p, τ /2).
For p ∈ M, the derivative of ↓p : RN → Tp is nonsingular on
Lemma 5. [43, Proposition 6.1] Let γ(·) denote a smooth unit-speed geodesic curve on M defined
on an interval I ⊂ R. For every t ∈ I, the following holds.
kγ 00 (t)k ≤ 1/τ.
Lemma 6. [43, Proposition 6.2] Fix p, q ∈ M. The angle between Tp and Tq , ∠[Tp , Tq ], satisfies
cos(∠[Tp , Tq ]) ≥ 1 − dM (p, q) /τ .
The next lemma guarantees that two points separated by a small Euclidean distance are also
separated by a small geodesic distance, and so the manifold does not “curve back” upon itself.
Lemma 7. [43, Proposition 6.3] For p, q ∈ M with kq − pk ≤ τ /2, we have
r
2
dM (p, q) ≤ τ − τ 1 − kq − pk.
τ
(27)
Proof. The first part of the proof of Proposition 6.3 in [43] establishes that for any p, q ∈ M,
kq − pk ≥ dM (p, q) −
(dM (p, q))2
,
2τ
(28)
which is satisfied only if (27) is satisfied or if
r
dM (p, q) ≥ τ + τ
2
1 − kq − pk
τ
is satisfied. We provide the following argument to complete the proof.
21
(29)
For fixed p ∈ M, let us consider
qb := arg
kq − pk.
min
q∈M,dM (p,q)≥τ
We know the minimizer qb exists because we are minimizing a continuous function over a compact
set. We consider two cases. First, if dM (p, qb) = τ , then by (28), we will have kb
q − pk ≥ τ /2.
Second, if dM (p, qb) > τ , then there must exist an open neighborhood of qb on M over which the
distance to p is minimized at qb. This implies that p − qb will be normal to M at qb, which by the
definition of condition number (and the fact that p ∈ M) means that we must have kb
q − pk ≥ 2τ .
Now, for any p, q ∈ M such that kq − pk < τ /2, (27) would imply that dM (p, q) < τ and (29)
would imply that dM (p, q) > τ . From the paragraph above, we see that if dM (p, q) ≥ τ , then
kq − pk ≥ kb
q − pk ≥ τ /2, and so we can rule out the possibility that (29) is true. Thus, (27) must
hold for any p, q ∈ M with kq − pk < τ /2.
For any p, q ∈ M such that kq − pk = τ /2, (27) would imply that dM (p, q) ≤ τ and (29) would
imply that dM (p, q) ≥ τ . From the paragraph above involving qb, we see that any point q ∈ M
satisfying both dM (p, q) ≥ τ and kq − pk = τ /2 would have to be a local minimizer of kq − pk on
the convex set and in fact would have to fall into the first case, implying that dM (p, q) = τ exactly.
It follows that (27) must hold for any p, q ∈ M with kq − pk = τ /2.
The next lemma concerns the invertibility of ↓p within the neighborhood of p and is closely
related to Lemma 5.3 in [43].
Lemma 8. For p ∈ M, ↓p is invertible on AM (p, τ /4).
Proof. Lemma 4 states that the derivative of ↓p is nonsingular on AM (p, τ /2). The inverse function
theorem then implies that there exists an r > 0 such that ↓p is invertible on AM (p, rτ ); without
loss of generality assume that r < 1/4 (otherwise we are done). Now, suppose that there exists
c > 0 and distinct points q, z ∈ M such that kq − pk = cτ , kz − pk ≤ cτ , and ↓p (q − p) =↓p (z − p).
In particular, this implies that
z − q ⊥ Tp .
(30)
That is, for any unit vector v ∈ Tp , we have
hz − q, vi = 0.
(31)
Our goal is to show that c > 1/4. Suppose, in contradiction that indeed c ≤ 1/4. Let γ(·) be the
unit-speed geodesic curve connecting q to z, such that γ(0) = q and γ(dM (q, z)) = z. By applying
the fundamental theorem of calculus twice, we realize that
z − q = γ (dM (q, z)) − γ(0)
Z dM (q,z)
=
γ 0 (α) dα
0
Z dM (q,z) Z
0
=
γ (0) +
0
0
Z
α
00
γ (β) dβ
0
dM (q,z) Z α
= γ (0) · dM (q, z) +
0
22
0
dα
γ 00 (β) dβdα.
Invoking Lemma 5, we can write that
Z
0
dM (q,z) Z α
k(z − q) − γ (0) · dM (q, z)k ≤
0
kγ 00 (β)k dβdα
0
Z
Z
1 dM (q,z) α
≤
dβdα
τ 0
0
d2 (q, z)
= M
.
2τ
Meanwhile, having kz − qk ≤ 2cτ implies, via Lemma 7, that
√
dM (q, z) ≤ τ − τ 1 − 4c,
which, after plugging back into (32), yields
z−q
1 1√
0
dM (q, z) − γ (0) ≤ 2 − 2 1 − 4c.
So, for any unit vector v ∈ Tp , we have
0
z−q
γ (0), v ≤ γ 0 (0) − z − q , v + , v dM (q, z)
dM (q, z)
0
z−q
, v = γ (0) −
dM (q, z)
0
z−q ≤ γ (0) −
dM (q, z) 1 1√
≤ −
1 − 4c,
2 2
(32)
(33)
(34)
(35)
where the first line follows from the triangle inequality, and the second line uses (31). The last line
uses (34). To reiterate, (35) is valid for any unit vector v ∈ Tp .
On the other hand, Lemma 6 implies that
1
cos (∠[Tq , Tp ]) ≥ 1 − dM (p, q)
r τ
2
≥ 1 − kq − pk
τ
√
= 1 − 2c,
(36)
where the second line follows from Lemma 7, and the last line uses kq − pk = cτ . By the definition
of the angle between subspaces, (36) implies that there exists a unit vector v0 ∈ Tp such that
√
v0 , γ 0 (0) ≥ 1 − 2c
(37)
because γ 0 (0) ∈ Tq . Combining this bound with (35) for v = v0 , we realize that
√
1 − 2c ≤
1 1√
−
1 − 4c.
2 2
This inequality is not met for any c ≤ 1/4. Thus, indeed c > 1/4. In particular, this means that
↓p is invertible on AM (p, τ /4).
23
The next three lemmas are of importance when approximating the long and short chords on M
with, respectively, nearby long chords and vectors on the nearby tangent planes.
Lemma 9. [12, Implicit in Lemma 4.1] Consider two pair of points a1 , a2 and b1 , b2 , all in
√
RN , such that ka1 − b1 k , ka2 − b2 k ≤ r, and that ka1 − a2 k ≥ κ r, for some r, κ > 0. Then
√
kU (a1 , a2 ) − U (b1 , b2 )k ≤ 4κ−1 r.
Lemma 10. For a, b ∈ M with ka − bk ≤ l1 < τ /2, it holds true that
r
2l1
,
k ↓a v− ↓b vk ≤
τ
for every unit vector v ∈ RN .
Proof. It follows from (25) that
k ↓a v− ↓b vk ≤ k(↓a − ↓b )(·)k2,2 = sin(∠[Ta , Tb ]).
(38)
On the other hand, since ka − bk ≤ l1 < τ /2, Lemma 7 implies that
r
2l1
dM (a, b) ≤ τ − τ 1 −
,
τ
and thus, using Lemma 6, we arrive at
r
cos (∠ [Ta , Tb ]) ≥
1−
2l1
.
τ
Plugging back the estimate above into (38), we conclude that k ↓a v− ↓b vk ≤
p
2l1 /τ , as claimed.
Lemma 11. Fix p ∈ M, and take two points x1 , x2 ∈ M such that kx1 −pk ≤ l1 and kx2 −x1 k ≤ l2 ,
l1 , l2 < τ /2. Then, we have that
r
2l1
l2
kU (x1 , x2 )− ↓p U (x1 , x2 )k ≤
+ .
τ
2τ
Proof. The triangle inequality implies that
kU (x1 , x2 )− ↓p U (x1 , x2 )k ≤ kU (x1 , x2 )− ↓x1 U (x1 , x2 )k + k ↓x1 U (x1 , x2 )− ↓p U (x1 , x2 )k. (39)
Since kx2 − x1 k ≤ l2 < 2τ , Lemma 3 is the right tool to bound the first term on the right hand
side of (39):
kU (x1 , x2 ) − ↓x1 U (x1 , x2 )k = sin (∠ [U (x1 , x2 ) , ↓x1 U (x1 , x2 )])
= sin (∠ [x2 − x1 , ↓x1 (x2 − x1 )])
l2
≤
.
2τ
(40)
Since kx1 − pk ≤ l1 < τ /2, a bound on the second term follows from an application of Lemma 10:
r
2l1
k ↓x1 U (x1 , x2 )− ↓p U (x1 , x2 )k ≤
.
(41)
τ
Combining (40) and (41) immediately proves our claim.
24
We will also need the following result regarding the local properties of M, which is closely
related to Lemma 5.3 in [43].
Lemma 12. Let p ∈ M and r ≤ τ /4. Then the following holds:
volK (AM (p, r)) ≥
r2
1− 2
4τ
K2
rK VBK ,
where volK (·) measures the K-dimensional volume.
Proof. As in the proof of Lemma 5.3 in [43], we will show that for some r0 > 0 to be defined below,
BTp (r0 ) ⊂↓p (AM (p, r)),
as our claim follows directly from the inclusion above. To show the above inclusion, we use the
following argument. Let us denote the inverse of ↓p on AM (p, τ /4) with g(·).
From Lemma 8, ↓p is invertible on AM (p, r) and therefore ↓p (AM (p, r)) is an open set. Thus
there exists s > 0 such that BTp (s) ⊂↓p (AM (p, r)). We can keep increasing s until at s = s∗ we
reach a point y on the boundary of the closure of BTp (s∗ ) such that y ∈↓
/ p (AM (p, r)). Consider
∗
a sequence {yi } ⊂ BTp (s ) ⊂↓p (AM (p, r)) such that yi → y when i → ∞. Note that {g(yi )} ⊂
AM (p, r) and, because every sequence in a compact space contains a convergent subsequence, there
exist a convergent sebsequence {g(yik )} and x in the closure of AM (p, r) such that g(yik ) → x.
Since ↓p is continuous, ↓p x = y. Therefore y =↓p x ∈↓
/ p (AM (p, r)), and x ∈
/ AM (p, r) and thus x
is on the boundary of the closure of AM (p, r) and kxk = r. Now we invoke Lemma 3 with q = x + p
to obtain that
r
r2
cos(∠[x, y]) ≥ 1 − 2 .
4τ
It follows that
s∗ = kyk
= cos(∠[x, y]) · r
r
r2
≥ 1 − 2 · r =: r0 ,
4τ
and thus BTp (r0 ) ⊂↓p (AM (p, r)). This completes the proof of Lemma 12 since
volK (AM (p, r)) ≥ volK (↓p (AM (p, r))) ≥ volK (BTp (r0 )) = (r0 )K vol(BK ),
where the first inequality holds because projection onto a subspace is non-expansive.
We close this section with a list of properties of the Dirichlet kernel which are later used in the
proof of Lemma 1 (about the condition number of the complex exponential curve).
Lemma 13. (Dirichlet kernel) For z ∈ [−1/2, 1/2], the Dirichlet kernel takes z to
DN (z) :=
sin(πN z)
.
sin(πz)
If |z| > 2/N , then it holds that
|DN (z)| ≤ α1 N,
25
with α1 ≈ 0.23. Moreover, there exists some α2 > 0 and N2 := N2 (α2 ), such that the following
holds for every N > N2 :
(N πz)2
|DN (z)| ≤ N 1 −
+ α2 N z 2
40
for all |z| ≤ 2/N .
Proof. According to [45, Table 7.2], the relative peak side-lobe amplitude of the Dirichlet kernel is
(approximately) −13 decibels. That is, the peak side-lobe of the Dirichlet kernel is no larger than
α1 N with α1 ≈ 0.23. It is also easily verified that this peak does not occur further than 2/N away
from the origin. To summarize,
|DN (z)| ≤ α1 N,
as long as |z| > 2/N . This completes the proof of the first inequality in Lemma 13. To prove the
second inequality, assume that |z| ≤ 2/N . As N → ∞, any z ∈ [−2/N, 2/N ] approaches zero and
we may replace the sine in the denominator of the Dirichlet kernel with its argument. That is, as
N → ∞, z → 0 and
|DN (z)|
| sin(N πz)|
=
N
N | sin(πz)|
| sin(N πz)|
=
N π|z|(1 + O(z 2 ))
| sin(N πz)|
≤
(1 + O(z 2 ))
N π|z|
1
|(N πz) − 40
(N πz)3 |
≤
(1 + O(z 2 ))
N π|z|
(N πz)2
(1 + O(z 2 )),
= 1−
40
where the third line uses the fact that 1/(1 − a) ≤ 1 + 2a for all 0 ≤ a ≤ 1/2. The second to last line
holds because sin a ≤ a − a3 /40 for all 0 ≤ a ≤ 2π. As a result, for some α2 > 0 and N2 = N2 (α2 ),
the following holds for every N > N2 :
|DN (z)|
(N πz)2
(N πz)2
2
≤ 1−
(1 + O(z )) ≤ 1 −
+ α2 z 2 ,
(42)
N
40
40
which, to reiterate, holds as long as |z| ≤ 2/N . This completes the proof of Lemma 13.
B
Proof of Lemma 1
Here, τβ stands for the reach of the complex exponential curve, i.e., the inverse of its condition
number. Note that the reach of the complex exponential curve is defined as the largest d ≥ 0 such
that every point within an `2 distance less than d from β has a unique nearest point (in the `2
sense) on β. In the rest of the proof, (i) we first find a unit-speed parametrization of β, (ii) we
then derive some basic properties of the reparametrized curve, and (iii) finally, we estimate τβ by
studying the long and short chords on the reparametrized curve separately.
26
B.1
Unit-speed geodesic on β
Let γ : R → CN be a unit-speed geodesic obtained by appropriately normalizing β. For every s ∈ R,
there must exist t = t(s) ∈ R such that γs = βt . In particular, we note that β is a constant-speed
curve with

1/2
fC
X
dβt 1
2π
3/2

= O(fC ),
(2πn)2  = √ (fC (fC + 1) (2fC + 1))1/2 =:
dt =
v
3
N
n=−f
C
and therefore we can simply take t(s) = vN s. This gives

e−i2πfC vN s
 −i2π(fC −1)vN s
 e

..
γs = βt(s) = βvN s = 
.

 i2π(fC −1)vN s
 e
ei2πfC vN s




dγs
= vN 

ds


−i2πfC · e−i2πfC vN s
−i2π(fC − 1) · e−i2π(fC −1)vN s
..
.
i2π(fC − 1) · ei2π(fC −1)vN s
i2πf · ei2πfC vN s




,



(43)




 , and



(44)
C




d2 γs
2 
= −vN 
2
ds


(2πfC )2 · e−i2πfC vN s
(2π(fC − 1))2 · e−i2π(fC −1)vN s
..
.
2
(2π(fC − 1)) · ei2π(fC −1)vN s
(2πf )2 · ei2πfC vN s




.



(45)
C
To reiterate, (43) and (44) represent γ (a unit-speed parametrization of β) and its tangent vector.
In addition, the curvature at any point can be computed as the magnitude of the second derivative
in (45). That is,

1/2 
−1 
1/2
2 fC
fC
fC
X
X
X
d γs −1/2
2 
(2πn)4  = 
(2πn)2  
(2πn)4  =: wN = O(fC ),
ds2 = vN
−fC
−fC
(46)
−fC
√
where we used (45). Observe that the curvature is constant and scales like 1/ N for large N .
Since γ is periodic, we will use t1 t2 to denote subtraction modulo 1 for any t1 , t2 ∈ R so that
t1 − t2 = bt1 − t2 c + (t1 t2 ).
(Equivalently, represents the natural subtraction on the unit circle.) We continue by recording
a few simple facts about the reparametrized complex exponential curve γ.
27
B.2
Some observations about γ
Note that γs as a zero-padded sequence in `2 (Z) can be interpreted as the (reversed) sequence of
Fourier series coefficients of the signal in time that, at t ∈ R, takes the value
γ̌s (t) =
sin(πN (t vN s))
=: DN (t vN s),
sin(π(t vN s))
where DN (·) is the Dirichlet kernel of width ∼ 2/N . The Dirichlet kernel is known to decay rapidly
outside of an interval of width ∼ 2/N centered at the origin as studied in Lemma 13 in the Toolbox.
One immediate consequence of Lemma 13 is that
|hDN (· t1 ), DN (· t2 )i| = |DN (t1 t2 )| ≤ α1 N
if t1 t2 ∈ [2/N, 1 − 2/N ].
(47)
The first identity above holds because circular convolution of the Dirichlet kernel with itself produces
the Dirichlet kernel again. Now, for any pair s1 , s2 ∈ R, consider the following correlation:
|hγs1 , γs2 i| = |hγ̌s1 , γ̌s2 i| = |hDN (· vN s1 ), DN (· vN s2 )i| = |DN (vN s1 vN s2 )| ,
(48)
where we used the Plancherel identity above. Then it follows from (47) that
|hγs1 , γs2 i| ≤ α1 N
if vN s1 vN s2 ∈ [2/N, 1 − 2/N ].
(49)
In words, (49) captures the long-distance correlations on γ. We now turn our attention to shortdistance correlations. According to Lemma 13, for some α2 > 0 and N2 = N2 (α2 ), the following
holds for every N > N2 :
N 2π2
2
|hγs1 , γs2 i| ≤ N 1 −
(vN s1 vN s2 ) + α2 N (vN s1 vN s2 )2
(50)
40
if vN s1 vN s2 ∈ [0, 2/N ], where we used (48) again. If vN s1 vN s2 ∈ [1 − 2/N, 1], then (50)
holds with vN s1 vN s2 replaced by 1 − (vN s1 vN s2 ) = vN s2 vN s1 . The conclusion in (50) is a
direct consequence of the vanishing derivative of the Dirichlet kernel at the origin. We are now in
a position to estimate the reach of the complex exponential curve.
B.3
Estimating τβ
Consider a point γs ∈ CN on the complex exponential curve for an arbitrary s ∈ R. We deviate
from γs by χ to obtain x = γs + χ where χ is assumed to be normal to the complex exponential
curve at γs , that is
dγs
χ,
= 0,
(51)
ds
where dγs /ds is the tangent vector at γs (which was computed in (44)).
We seek the largest d > 0 such that for all χ with kχk < d and satisfying (51), γs is the unique
nearest point to x = γs + χ on the complex exponential curve. For γs to be the unique nearest
point to x, it must hold that
kχk = kx − γs k < kx − γs0 k = kχ + γs − γs0 k,
∀vN s0 vN s 6= 0.
(52)
Now (52) is equivalent to
Re[hχ, γs0 − γs i] < N − Re[hγs , γs0 i] = N − hγs , γs0 i ,
∀vN s0 vN s 6= 0,
where we used the fact that the complex exponential curve lives on a sphere of radius
We consider two separate cases:
28
(53)
√
N in CN .
Long distances vN s0 vN s ∈ [2/N, 1 − 2/N ]:
In this case, it follows from (49) that
|hγs , γs0 i| ≤ α1 N,
with α ≈ 0.23 as in Lemma 13. As a result, for the inequality in (53) to hold for long distances, it
suffices that the following holds:
√
kχk · (kγs k + kγs0 k) = kχk · 2 N < (1 − α1 ) N,
∀vN s0 vN s ∈ [2/N, 1 − 2/N ],
which is guaranteed as long as
kχk <
1 − α1 √
(1 − α1 ) N
√
=
N.
2
2 N
(54)
Short distances vN s0 vN s ∈ [0, 2/N ] ∪ [1 − 2/N, 1]:
Without loss of generality assume that
vN s0 − vN s ∈ [0, 2/N ]. In this case, we first note that
* Z 0
+
s
dγη
|hχ, γs0 − γs i| = χ,
dη ds
s
* Z 0
+
Z s0 Z η 2
s
d γξ
dγs
dη +
dξdη = χ,
2
ds
ds
s
s
s
*
+
Z s0 Z η 2
d γξ
dγs
0
+
dξdη
= χ, (s − s)
2
ds
s
s ds
* Z 0 Z
+
s
η 2
d γξ
= χ,
dξdη
2
s
s ds
Z s0 Z η 2 d γξ ≤ kχk ·
ds2 dξdη
s
s
Z s0 Z η
= wN kχk
dξdη
s
s
− s|2
2
(vN s0 vN s)2
= wN kχk ·
,
2
2vN
= wN kχk ·
|s0
where we used the fundamental theorem of calculus twice. The fourth line above uses the fact that
χ is normal to the tangent of γ at s, namely hχ, dγs /dsi = 0. The sixth line uses the fact that
curvature of γ is constant and was calculated in (46).
Recall that vN s0 vN s ≤ 2/N . Therefore, for a fixed α2 > 0 and N > N2 = N2 (α2 ), (50)
dictates that
2
2
N 2π2
0
0
vN s vN s
+ α2 N vN s0 vN s .
hγs , γs i ≤ N 1 −
40
As a result, for the inequality (53) to hold for short distances, it suffices that the following statement
holds:
wN kχk ·
2
2
(vN s0 vN s)2
N 3π2
<
vN s0 vN s − α2 N vN s0 vN s ,
2
40
2vN
29
∀vN s0 − vN s ∈ [0, 2/N ],
which in turn holds if
kχk <
2
2
N vN
π 2 N 3 vN
− 2α2 ·
.
·
20
wN
wN
(55)
A lower bound on the reach: From (55) and (54), we overall observe that if
√ 2 !!
√ 2
√
N vN
1 − α1
π 2 N 5/2 vN
kχk < min
− 2α2 ·
N =O
N ,
,
·
2
20
wN
wN
then (53) holds uniformly regardless of the value of s. Therefore, we find the following lower bound
on the reach of the complex exponential curve:
√ 2 !!
2
√
N vN
1 − α1
π 2 N 5/2 vN
τβ ≥ min
− 2α2 ·
N,
,
·
2
20
wN
wN
which, to reiterate, holds for some α2 and every N > N2 . Because the factor multiplying α2
scales with N −2 whereas the factor multiplying π 2 /20 scales with 1, the following holds for every
N > Nsine for some Nsine :
!
√ 2
√
1 − α1 π 2 N 5/2 vN
τβ ≥ min
,
·
N =O
N .
(56)
2
40
wN
An upper bound on the reach: Since the complex exponential curve lives on the unit sphere
in CN , γs is normal to γ at any arbitrary s. This immediately implies the following upper bound
on the reach:
√
τβ ≤ N .
(57)
Together, (56) and (57) complete the proof of Lemma 1.
C
Proof of Theorem 2
It is easily verified that our objective is to find an upper bound for
(
)
P
sup |kΦyk − 1| > ,
y∈U (M)
when ≤ 1/3.
The remainder of this section is divided to two parts. In the first part, we construct a sequence
of increasingly finer nets for M. This is in turn used to construct a sequence of covers for the set
of all (normalized) secants in M. In the second part, we apply a chaining argument that utilizes
this later sequence of covers to prove Theorem 2.
C.1
Sequence of covers for U (M)
For η > 0, let C0 (η) ⊂ M denote a minimal η-net for M over all η-nets that are a subset of M.
Upper and lower bounds for #C0 (η) = NM (η) are known for sufficiently small η [43], where #C0 (η)
denotes the cardinality of C0 (η). Since the claim below slightly differs from the one in [43], the
proof is included here.
30
Lemma 14. When η ≤ τ /2, it holds that
#C0 (η) ≤
where θ (α) :=
√
2
θ (η/4τ ) η
K
VM
=: c0 (η),
VBK
(58)
1 − α2 for |α| ≤ 1.
Proof. Using (24) and a simple volume comparison argument, we observe that
#C0 (η) = NM (η) ≤ PM (η) ≤
VM
.
inf p∈M volK (AM (p, η/2))
Since η/2 ≤ τ /4, we can apply Lemma 12 from the Toolbox (with r = η/2) and obtain that
#C0 (η) ≤ 1−
=
η2
VM
K
2
16τ 2
2
θ(η/4τ )η
K
η K
2
VBK
VM
,
VBK
This completes the proof of Lemma 14.
By replacing η with 4−j η, we can construct a sequence of increasingly finer nets for M, {Cj (η)},
such that Cj (η) ⊂ M is a (4−j η)-net for M, for every j ∈ N. In light of Lemma 14, we have that
#Cj (η) ≤ 4jK · c0 (η).
(59)
Construction of a sequence of covers for U (M) demands the following setup. For η 0 > 0 and
j ∈ N, let C j (η 0 ) denote a minimal (2−j η 0 )-net for BK . For p ∈ Cj (η), we can naturally map C j (η 0 )
to live in the K-dimensional unit ball along Tp (and anchored at the origin). We represent this set
of vectors by C j,p (η 0 ) and define
[
Cj0 (η, η 0 ) :=
C j,p (η 0 ),
p∈Cj (η)
which forms a (2−j η 0 )-net for the unit balls along the tangent spaces at every point in Cj (η) ⊂ M.
For δ > 0, let us specify η p
and η 0 as functions of δ. For C2 , C3 > 0 to be set later, take η = η(δ) =
2
2
0
0
C2 τ δ and η = η (δ) = C3 η/τ = C2 C3 δ. Now, for every j ∈ N, simply set
Tj (δ) := U (Cj (η)) ∪ Cj0 (η, η 0 ).
It turns out that U (Cj (η)), the set of all directions in Cj (η), provides a net for the directions of
long chords on M. In contrast, Cj0 (η, η 0 ) forms a net for the directions in U (M) that correspond
to the short chords on M. It is therefore not surprising that {Tj (δ)} proves to be a sequence of
increasingly finer covers for U (M). This discussion is formalized in the next lemma. We remark
that Lemma 15 holds more generally for all constants C2 , C3 that satisfy the conditions listed in
the proof.
√
Lemma 15. Set C2 = 0.4 and C3 = 1.7 − 2. For every j ∈ N, Tj (δ), as constructed above, is a
(2−j δ)-net for U (M), when δ ≤ 1/2. Under the mild assumption that
VM
21 K
√
≥
,
(60)
τK
2 K
31
it also holds that
#Tj (δ) ≤ 2 · 42jK
√ !2K VM 2
6.12 K
=: tj (δ).
δ2
τK
(61)
Proof. Consider two arbitrary but distinct points x1 , x2 ∈ M. For C4 > 0 to be p
set later in the
proof, we separate the treatment of long and short chords, i.e., kx2 − x1 k/τ > C4 η/τ =: γ and
kx2 − x1 k/τ ≤ γ, and in this strategy we follow [3, 12]. Short chords are distinct in that, as we will
see later, they have to be approximated with nearby tangent vectors. For convenience, let us also
define
Uγl (M) := {U (z1 , z2 ) : kz2 − z1 k > γτ, z1 , z2 ∈ M},
Uγs (M) := {U (z1 , z2 ) : 0 < kz2 − z1 k ≤ γτ, z1 , z2 ∈ M}.
Of course, Uγl (M) ∪ Uγs (M) = U (M), although their intersection might not be empty.
p
Suppose that kx2 − x1 k/τ > γ = C4 η/τ so that U (x1 , x2 ) ∈ Uγl (M). Since C0 (η) is an η-net
for M, there exist p1 and p2 in C0 (η) such that kx1 − p1 k, kx2 − p2 k ≤ η. It then follows from
Lemma 9 (with a1 = x1 , a2 = x2 , b1 = p1 , and b2 = p2 ) that
r
4
η
4C2 δ
kU (x1 , x2 ) − U (p1 , p2 )k ≤
=
.
(62)
C4 τ
C4
Now, assuming that
4C2 = C4 ,
(63)
and leveraging the fact that the choice of x1 , x2 ∈ M was arbitrary, we conclude that U (C0 (η)) is
a δ-net for Uγl (M).
p
p
On the other hand, suppose that 0 < kx2 − x1 k/τ ≤ γ = C4 η/τ = 4C2 η/τ so that
U (x1 , x2 ) ∈ Uγs (M). We assume that
η
= C22 δ 2 < min
τ
1
1
,
2
64C2 2
,
(64)
so that, in particular, kx2 −x1 k < τ /2. Since C0 (η) is an η-net for M, there exists a point p ∈ C0 (η)
√
such that kx1 − pk ≤ η < τ /2. Lemma 11 (with l1 = η and l2 = 4C2 τ η) then implies that the
direction of the chord connecting x1 to x2 can be approximated with a tangent vector in Tp , that is
r
r
2η
η √
+ 2C2
=
2 + 2C2 C2 δ.
(65)
kU (x1 , x2 )− ↓p U (x1 , x2 )k ≤
τ
τ
Recall that C 0,p (η 0 ) is an η 0 -net for the unit ball centered at p and along Tp . So, there also exists
a vector v ∈ C 0,p (η 0 ) such that k ↓p U (x1 , x2 ) − vk ≤ η 0 = C2 C3 δ. Using the triangle inequality, we
therefore arrive at
√
kU (x1 , x2 ) − vk ≤
2 + 2C2 + C3 C2 δ.
(66)
Assuming that
√
2 + 2C2 + C3 = C2−1 ,
(67)
and leveraging the fact that the choice of x1 , x2 ∈ M was arbitrary, we conclude that C00 (η, η 0 ) is
a δ-net for Uγs (M). Overall, under (63), (64), and (67), T0 (δ) = U (C0 (η)) ∪ C00 (η, η 0 ) is a δ-net
for U (M). By repeating the argument above (with η, δ, η 0 , γ replaced with η/4j , δ/2j , η 0 /2j , γ/2j )
we observe that Tj (δ) is a (2−j δ)-net for U (M), for every j ∈ N. In particular, the choice of
32
√
C2 = 0.4, C3 = 1.7 − 2, C4 = 1.6 satisfies the conditions above for every δ ≤ 1/2 and completes
the proof of the first statement in Lemma 15.
In order to bound the cardinality of Tj (δ), we begin with estimating #Cj0 (η, η 0 ). According to
Lemma 2, we can write that
K
3 · 2j
0
0
#Cj η, η ≤
· #Cj (η) ,
(68)
η0
which holds assuming that η 0 ≤ 1, i.e., C2 C3 δ ≤ 1. (Our choice of C2 , C3 above satisfies this
condition.) It is possible now to write that
#Tj (δ) ≤ (#Cj (η))2 + #Cj0 (η, η 0 )
K
3 · 2j
2
≤ (#Cj (η)) +
· #Cj (η)
η0
K !
3 · 2j
jK
· 4jK c0 (η),
≤ 2 max 4 c0 (η),
η0
(69)
where we used (68) in the second line and (59) in the last line. To guarantee that the first term
dominates the maximum in (69), it suffices (according to the definition of c0 (η) in (58)) to enforce
that
K
K
2
VM
3
≥
,
θ(η/4τ )η
VBK
η0
which, after plugging in for η and η 0 in terms of δ and using the hypothesis that δ ≤ 1, is satisfied
under the mild assumption that
VM
3C2 K
K
≥ 2.5 VBK ≥
VBK .
(70)
τK
2C3
The assumption in (70) allows us to simplify (69) and obtain that
#Tj (δ) ≤ 2 · 42jK c20 (η).
(71)
It follows from (71) and the definition of c0 (η) in (58) that
2K 2
VM 2
12.52 2K VM 2
2jK
2jK
#Tj (δ) ≤ 2 · 4
≤2·4
,
VBK
τ δ2
VBK
θ(C22 /4)C22 τ δ 2
where we used the fact that δ ≤ 1. We remind the reader that
K/2
4π
π K/2
2eπ K/2
≤
≤ VBK =
,
K +2
K +2
Γ K2 + 1
(72)
where the inequalities follow from the fact that (K/e)K−1 ≤ Γ (K) ≤ (K/2)K−1 for K ∈ N [44].
Here Γ(·) denotes the Gamma function. The above inequality leads us to
√ !2K 6.12
K
VM 2
2jK
#Tj (δ) ≤ 2 · 4
,
δ2
τK
√
which holds under the mild assumption that VM /τ K ≥ (21/ K)K . Indeed, this assumption is
obtained by plugging our choice of C2 , C3 into (70). This completes the proof of Lemma 15.
33
C.2
Applying the chaining argument
Every y ∈ U (M) can be represented with a chain of points in {Tj (δ)}. Let πj (y) be the nearest
point to y in Tj (δ). This way we obtain a sequence {πj (y)} that represents y via an almost surely
convergent telescoping sum, that is
X
y = π0 (y) +
(πj+1 (y) − πj (y)) .
(73)
j∈N
Note that, for every j ∈ N and every y ∈ M, the length of the chord connecting πj (y) to πj+1 (y)
is no longer than 2−j+1 δ. We are now ready to state a generic chaining argument that allows us
to bound the failure probability of obtaining a stable embedding of M in terms of its geometrical
properties. The interested reader is referred to [56] for more information about the generic chaining.
Lemma 16. Fix 0 < δ < 1 < ≤ 1/3, and 2 > 0 such that 1 + 2 = . Choose C5 , C6 > 0 so
1+C5
that 1 /δ ≥ 1−C
and 2 /δ ≥ C6 . Then, under (60), we have that
5
(
)
sup |kΦyk − 1| > P
≤ 2t0 (δ) · max P {|kΦt0 k − kt0 k| > C5 1 kt0 k}
t0 ∈T0 (δ)
y∈U (M)
+2
X
t2j+1 (δ) ·
j∈N
max
(tj ,sj )∈Qj (δ)
P kΦsj − Φtj k > 8−1 C6 (j + 1)ksj − tj k ,
(74)
where {tj (δ)} were previously defined in Lemma 15. For j ∈ N, Qj (δ) is defined as
Qj (δ) := {(tj , sj ) : πj (y) = tj and πj+1 (y) = sj for some y ∈ U (M)} .
Proof. For notational convenience, let us denote the infinite sum in (73) by Σ(y). Then, using the
triangle inequality, we observe that
(
)
sup kΦyk > 1 + = P sup kΦπ0 (y) + ΦΣ(y)k > 1 + 1 + 2
P
y
y∈U (M)
≤ P sup kΦπ0 (y)k + sup kΦΣ(y)k > 1 + 1 + 2
y
y
≤ P sup kΦπ0 (y)k − 1 > 1 + P sup kΦΣ(y)k > 2 ,
y
y
and similarly,
P
inf
kΦyk < 1 − = P inf kΦπ0 (y) + ΦΣ(y)k < 1 − 1 − 2
y
y∈U (M)
≤ P inf kΦπ0 (y)k − sup kΦΣ(y)k < 1 − 1 − 2
y
y
≤ P sup 1 − kΦπ0 (y)k > 1 + P sup kΦΣ(y)k > 2 .
y
We can therefore argue that
(
)
P
sup |kΦyk − 1| > y∈U (M)
y
≤ P sup kΦyk > 1 + + P inf kΦyk < 1 − y
y
≤ 2 P sup |kΦπ0 (y)k − 1| > 1
y
34
+ 2 P sup kΦΣ(y)k > 2 .
y
(75)
Consider the first probability on the last line of (75):
)
(
P
sup |kΦπ0 (y)k − 1| > 1 ≤ P sup |kΦπ0 (y)k − kπ0 (y)k| + sup |kπ0 (y)k − 1| > 1
y
y∈U (M)
y
≤ P sup |kΦπ0 (y)k − kπ0 (y)k| > 1 − δ
y
|kΦπ0 (y)k − kπ0 (y)k|
1 − δ
≤ P sup
>
kπ0 (y)k
1+δ
y
|kΦπ0 (y)k − kπ0 (y)k|
≤ P sup
> C5 1
kπ0 (y)k
y
|kΦt0 k − kt0 k|
≤ P max
> C5 1
kt0 k
t0 ∈T0 (δ)
≤ #T0 (δ) · max P {|kΦt0 k − kt0 k| > C5 1 kt0 k}
t0 ∈T0 (δ)
where the first line uses the triangle inequality. The second and third lines hold on account of T0 (δ)
being a net for a subset of SN −1 , namely U (M). An application of the union bound gives the last
line above.
Now consider the second probability on the last line of (75). By the definition of Σ(y), we
observe that
(
)
sup kΦΣ(y)k > 2
P
y∈U (M)
X
Φπj+1 (y) − Φπj (y) > 2
= P sup y



X
≤P
max
kΦsj − Φtj k > C6 δ


(tj ,sj )∈Qj (δ)
j


X

X
=P
max
kΦsj − Φtj k > C6
(j + 1)2−j−2 δ


(tj ,sj )∈Qj (δ)
j
j
X
≤
P
max
kΦsj − Φtj k > C6 (j + 1)2−j−2 δ
(tj ,sj )∈Qj (δ)
j
≤
X
j
≤
X
j
P
max
(tj ,sj )∈Qj (δ)
2
#Tj+1
(δ)
kΦsj − Φtj k > 8
max
(tj ,sj )∈Qj (δ)
−1
C6 (j + 1)ksj − tj k
P kΦsj − Φtj k > 8−1 C6 (j + 1)ksj − tj k .
The third line above uses the triangle inequality and the assumption on 2 , while the fifth and last
lines use the union bound. It can be easily verified that the infinite sum on the right hand side
of the inequality in the fourth line equals one. In the sixth line, we used the observation that
(tj , sj ) ∈ Qj (δ) implies that ksj − tj k ≤ 2−j δ + 2−j−1 δ ≤ 2−j+1 δ. Having upper bounds for both
35
terms on the last line of (75), we overall arrive at
(
)
P
sup |kΦyk − 1| > ≤ 2#T0 (δ) · max P {|kΦt0 k − kt0 k| > C5 1 kt0 k}
t0 ∈T0 (δ)
y∈U (M)
+2
X
2
#Tj+1
(δ)
j
max
(tj ,sj )∈Qj (δ)
P kΦsj − Φtj k > 8−1 C6 (j + 1)ksj − tj k .
From Lemma 15, #Tj (δ) ≤ tj (δ). This establishes Lemma 16.
There are two type of probabilities involved in the upper bound above. One controls the large
deviations of kΦt0 k from its expectation, and the other corresponds to very large (one sided)
deviations of kΦsj − Φtj k from its expectation. As claimed in the next lemma and proved in
Appendix D, both of these probabilities are exponentially small when M is large enough.
Lemma 17. Fix 0 ≤ λ ≤ 1/3 and λ0 ≥ 1/5. Then, for fixed y ∈ RN , we have
M λ2
P {|kΦyk − kyk| > λkyk} ≤ 2e− 6
M λ0
P kΦyk > 1 + λ0 kyk ≤ e− 7 .
(76)
(77)
p
Now fix ≤ 1/3 and set 1 = 9/10. Taking C5 = 6/7, C6 = 16, δ = /160 and finally
assuming (60) guarantees that Lemma 15 is in force. Under this setup, note that an upper bound
for the first term on the right hand side of (74) can be found by applying (76) (after plugging in
for C5 ):
(
)
r
M 2
6
1
2t0 (δ) · max P |kΦt0 k − kt0 k| >
1 kt0 k ≤ 2t0 (δ) · 2e− 7 ,
7
t0 ∈T0 (δ)
and, assuming that
M ≥ 14−2
1 log t0 (δ),
(78)
we arrive at
(
r
2t0 (δ) · max P |kΦt0 k − kt0 k| >
t0 ∈T0 (δ)
)
M 2
6
1
1 kt0 k ≤ 4e− 14 .
7
(79)
In order to bound the second term on the right hand side of (74), we proceed as follows. Consider
the maximum inside the summation. After plugging in for C6 and applying (77), we can bound
this maximum as
max
(tj ,sj )∈Qj (δ)
P {kΦsj − Φtj k > 2(j + 1)ksj − tj k} ≤ e−
(2j+1)M
7
.
Using the estimate above and Lemma 15, we get an upper bound for the second term on the right
hand side of (74):
X
2
t2j+1 (δ) ·
max
P {kΦsj − Φtj k > 2(j + 1)ksj − tj k}
(tj ,sj )∈Qj (δ)
j∈N
M
≤ 2t20 (δ)e− 7 44K
X
2
44jK e− 7 jM .
j∈N
36
(80)
Assuming that
M ≥ max(32 log t0 (δ), 310K)
(81)
allows us to continue simplifying (80), therefore arriving at
X
2
t2j+1 (δ) ·
max
P {kΦsj − Φtj k > 2(j + 1)ksj − tj k}
(tj ,sj )∈Qj (δ)
≤ 4e
−M
17
.
(82)
We can now combine (79) and (82) to obtain
M 2
M 2
M
1
1
P sup |kΦyk − 1| > ≤ 4e− 14 + 4e− 17 ≤ 8e− 14 ,
y
where the second inequality follows since 1 ≤ 1/3 and thus 21 /14 ≤ 1/17. In particular, to achieve
a failure probability of at most ρ ≤ 1, we need
M ≥ 14−2
1 log (8/ρ) .
(83)
Assuming that (60) holds and that ≤ 1/3, we verify that (81) may be absorbed into (78) (i.e.,
(78) implies (81)). We are now left with (78) and (83), which are in turn lumped into a single lower
bound on M (after plugging in for δ), that is
√ !
!
K
8
2
M ≥ 18−2 max log(2VM
) + 24K + 2K log
, log
2
τ
ρ

 

!
√
2K
6.12 K
8 
−2
2



≥ 18 max log 2VM
, log
2
τδ
ρ
= 18−2 max (log t0 (δ), log (8/ρ))
≥ 14−2
1 max (log t0 (δ), log (8/ρ)) .
(84)
Therefore, we proved that
(
)
sup |kΦyk − 1| > P
≤ ρ,
y∈U (M)
provided that M satisfies (84). This completes the proof of Theorem 2.
D
Proof of Lemma 17
The proof is elementary. It is easily verified that E kΦyk2 = kyk2 , and we then note that
P {|kΦyk − kyk| > λkyk} = P {kΦyk > (1 + λ)kyk} + P {kΦyk < (1 − λ)kyk}
n
o
n
o
≤ P kΦyk2 > (1 + λ)kyk2 + P kΦyk2 < (1 − λ)kyk2
−M
2
≤ 2e
3
λ2
− λ3
2
≤ 2e−
M λ2
2
( 12 − 19 )
≤ 2e−
M λ2
6
,
37
where the third line uses a well-known concentration bound [1]. The fourth line holds because
λ ≤ 1/3. This establishes the first inequality in Lemma 17. For the second inequality, assume,
without loss of generality, that kyk = 1. We begin by observing that
P kΦyk > 1 + λ0 kyk = P kΦyk > 1 + λ0
n
o
≤ P kΦyk2 > 1 + 2λ0
(
)
M
X
= P M −1
n2i − 1 > 2λ0
i=1
=P
(M
X
)
n2i − M > 2λ0 M
,
(85)
i=1
where n1 , n2 , · · · , nM are zero-mean and unit-variance Gaussian random variables. The third line
above follows since the entries of the vector Φy are distributed as i.i.d. zero-mean Gaussians with
variance of 1/M . We now recall Lemma 1 in [38], which states that
)
(M
X
√
2
(86)
P
ni − M > 2 M α + 2α ≤ e−α ,
i=1
for α > 0. Comparing the last line in (85) to the inequality above, we observe that taking
α=
2
M √
1 + 4λ0 − 1
4
allows us to continue simplifying (85) to obtain that
(M
)
X
√
0
2
P kΦyk > 1 + λ
≤P
ni − M > 2 M α + 2α ≤ e−α .
(87)
i=1
It is easily verified that
√
1 + 4λ0 − 1 ≥ (3 −
α≥
and consequently,
√ √
5) λ0 when λ0 ≥ 1/5. It follows that
√
M
· (3 − 5)2 λ0 ≥ M λ0 /7,
4
M λ0
P kΦyk > 1 + λ0 kyk ≤ e− 7 ,
as claimed. This establishes the second inequality in Lemma 17 and completes the proof.
E
Proof of Theorem 3
Fix α ∈ [1 − , 1 + ]. We consider any two points wa , wb ∈ M such that
kΦwa − Φwb k
= α,
kwa − wb k
and supposing that x is closer to wa , i.e.,
kx − wa k ≤ kx − wb k ,
38
(88)
but y = Φx + n is closer to Φwb , i.e.,
ky − Φwb k ≤ ky − Φwa k ,
we seek the maximum value that kx − wb k may take. In other words, we wish to bound the worst
possible “mistake” (according to our error criterion) between two candidate points on the manifold
whose distance is scaled by the factor α.
This can be posed in the form of an optimization problem
max
x∈RN ,wa ,wb ∈M
kx − wb k s.t. kx − wa k ≤ kx − wb k ,
ky − Φwb k ≤ ky − Φwa k ,
kΦwa − Φwb k
= α.
kwa − wb k
For simplicity, we may expand the constraint set to include all wa , wb ∈ RN ; the solution to this
larger problem is an upper bound for the solution to the case where wa , wb ∈ M. This leaves
max
x,wa ,wb ∈RN
kx − wb k s.t. ky − Φwb k ≤ ky − Φwa k ,
kΦwb − Φwa k
= α.
kwb − wa k
where we also ignored the first constraint (because of its relation to the objective function). Under
the constraints above, the objective satisfies
kx − wb k ≤ kx − wa k + kwb − wa k
= kx − wa k + kΦwb − Φwa k/α
≤ kx − wa k + 2ky − Φwa k/α
≤ kx − wa k + 2(kΦx − Φwa k + knk)/α
2σM (Φ)
2knk
≤ kx − wa k +
kx − wa k +
1−
1−
2knk
1
≤
.
(2σM (Φ) + 1) kx − wa k +
1−
1−
The first line follows from the triangle inequality. The first identity above uses the first constraint.
The first constraint (via the triangle inequality) implies that kΦwb − Φwa k ≤ 2ky − Φwa k and the
third line thus follows. The fourth line uses the triangle inequality one more time. The fifth line
follows after considering the possible range of α. To reiterate, the above conclusion holds for any
observation x that could be mistakenly paired with wb instead of wa (under a Φ that scales the
distance kwa − wb k by α). This completes the proof of Theorem 3 after noting that (1−)−1 ≤ 1+2
when ≤ 1/2.
F
Proof of Proposition 1
Set
δ := (1 + ) (σm (Φ))−1 ,
and let
x = e1 + δu,
39
where kuk ≤ 1 belongs to the row span of Φ and satisfies Φx = Φ(e1 + δu) = 0. Finding such u is
possible because
kΦe1 k ≤ (1 + )ke1 k = 1 + = δ · σm (Φ) = δ · σm (Φ)kvk ≤ δkΦvk,
for every unit vector v in the row span of Φ. The first inequality holds because e1 , 0 ∈ M and
Φ stably embeds M. The second equality holds by our choice of δ, and the last inequality holds
because v belongs to the row span of Φ. With our choice of x above, we have Φx = 0 and therefore
x
b = 0. On the other hand,
kx − x∗ k ≤ kx − e1 k ≤ δ.
It follows that
kx − x
bk
kx − x
bk
kxk
1−δ
1
1
≥
≥
≥
≥
=
σm (Φ).
∗
kx − x k
kx − e1 k
δ
δ
2δ
2(1 + )
Indeed, one can verify that δ ≤ 1/2 because (by hypothesis) ≤ 1/3 and σm (Φ) ≥ 8/3. This
immediately implies the second to last inequality above. This completes the proof of Proposition 1.
G
Proof of Theorem 4
Our success in stably embedding M via random linear measurements (and what distinguished
Theorem 2 from embedding of a point cloud) relied on the smoothness of the manifold. This
assumption enabled us to control the behavior of short chords on M. However, x does not generally
belong to the manifold and hence, in general, we cannot control the direction of short chords
connecting x to M. To deal with this issue, we proceed as follows. For fixed γ > 0 to be specified
later, define
Mγ := {z ∈ M : kz − xk > γτ } ,
and let MC
γ := M\Mγ , i.e., the complement of Mγ in M. Note that one of the two sets may be
empty. Our first step towards a proof is to show that, for every z ∈ Mγ with an appropriately
chosen γ, we have
(1 − ) kz − xk ≤ kΦz − Φxk ≤ (1 + ) kz − xk .
(89)
In other words, we first study the stable embedding of the directions of all the chords connecting
x to Mγ , namely U (Mγ , x), for an appropriate γ. This is addressed next.
Lemma 18. Choose 0 < ≤ 1/3 and 0 < ρ < 1. Conveniently assume that (21) holds. If
√ !
!
K
8
M ≥ 18−2 max 11K + K log
+ log VM , log
,
2
τ
ρ
(90)
then, except with a probability of at most ρ, (89) holds for every z ∈ M/40 .
Proof. The proof strategy is identical to that in Appendix C. We will prove that (89) holds for
every z ∈ M/40 , with high probability and provided that M is large enough. As before, this is
achieved by finding an upper bound on
(
)
P
|kΦyk − 1| > ,
sup
y∈U (M/40 ,x)
for ≤ 1/3.
40
(91)
We begin again by constructing a sequence of increasingly finer covers for U (Mγ , x), with γ
to be set later. √
We denote this sequence by {Lj (δ)}—each Lj (δ) is a (2−j δ)-net for U (Mγ , x).
For 0 < δ ≤ 1/ 2, set η = δ 2 τ and γ = 4δ. We form {Lj (δ)} from {Cj (η)}, the sequence of
covers for M constructed in Appendix C.1. Indeed, the same argument in that section proves that
U (Cj (η), x) is a (2−j δ)-cover for U (Mγ , x). It also holds that
√ !K
K
2
VM
K
jK
jK
#Lj (δ) ≤ #Cj (η) ≤ 4
≤4
VM =: lj (δ).
(92)
2
2
2
θ(δ /4)δ τ
VBK
δ τ
As before, we can represent every y ∈ U (Mγ , x) with an infinite chain of points from the sequence
of covers {Lj (δ)}. After setting δ = /160, using the same argument as the one in Appendix C.2,
and exploiting the estimates above, one can verify that the failure probability in (91) is at most ρ,
provided that (90) holds.
We now combine Lemma 18 and an elementary argument to complete the proof of Theorem 4.
It is possible to recognize two different cases: when x
b ∈ MC
b ∈ M/40 . Clearly,
/40 and when x
kx − x∗ k ≤ kx − x
bk ≤
τ
,
40
when x
b ∈ MC
/40 .
(93)
If, however, x
b ∈ M/40 , then a more detailed analysis is required. An application of Lemma 17
implies that (89) holds for z = x∗ , except with a probability of at most ρ and provided that
M ≥ 6−2 log(2/ρ). Suppose the assumptions in Lemma 18 are met. Therefore, (89) holds for
every z ∈ M/40 ∪ {x∗ }, except with a probability of at most 2ρ. Also, by the definition of x∗ and
x
b, it holds true that
kx − x∗ k ≤ kx − x
bk
and
k(Φx + n) − Φb
xk ≤ k(Φx + n) − Φx∗ k .
Now, combining all these bounds and using several applications of the triangle inequality we obtain
that
kx − x
bk ≤ (1 − )−1 kΦx − Φb
xk
≤ (1 − )−1 k(Φx + n) − Φb
xk + (1 − )−1 knk
≤ (1 − )−1 k(Φx + n) − Φx∗ k + (1 − )−1 knk
≤ (1 − )−1 kΦx − Φx∗ k + 2 (1 − )−1 knk
1+
≤
kx − x∗ k + 2 (1 − )−1 knk .
1−
Since ≤ 1/3, one can easily check that
(1 − )−1 ≤ 1 + 2 and
1+
≤ 1 + 3.
1−
Consequently, we obtain that
kx − x
bk ≤ (1 + 3) kx − x∗ k + (2 + 4) knk ,
when x
b ∈ M/40 .
Combining (93) and (94), we overall obtain that
τ
kx − x
bk ≤ max
, (1 + 3) kx − x∗ k + (2 + 4) knk
40
τ
≤ (1 + 3) kx − x∗ k +
+ (2 + 4) knk ,
40
41
(94)
(95)
which, to emphasize, is valid under the assumptions of Lemma 18 and except for a probability of
at most 2ρ.
On the other hand, according to Theorem 3 and the remarks that followed it (see (19)), it holds
that
!
r
N
kx − x
bk ≤ (1 + 2) 2
+ 5 kx − x∗ k + (2 + 4)knk,
(96)
M
except for a probability of at most 2ρ and as long as both (8) and (9) hold. From (95) and (96),
we conclude that
!
!
r
τ
N
∗
∗
kx − x
bk ≤ min (1 + 3) kx − x k +
, (1 + 2) 2
+ 5 kx − x k + (2 + 4) knk ,
40
M
except for a probability of at most 4ρ and as long as both (9) and (21) hold. This completes the
proof of Theorem 4.
H
Proof of Theorem 5
Using the triangle inequality and (22), we have
kb
x − x∗ k ≤ kx − x
bk + kx − x∗ k
τ
≤ min (2 + 3) kx − x∗ k +
,
40
r
(2 + 4)
!
!
N
+ 6 + 10 kx − x∗ k + (2 + 4) knk .
M
(97)
Now, since both x
b and x∗ belong to M, we can invoke Lemma 7 from the Toolbox, which states
that if kb
x − x∗ k ≤ τ /2, then
p
x − x∗ k /τ .
(98)
dM (b
x, x∗ ) ≤ τ − τ 1 − 2 kb
To apply this lemma, it is sufficient to know that
(2 + 3) kx − x∗ k +
τ
+ (2 + 4) knk ≤ τ /2,
40
i.e., that
τ
1 + 2
kx − x k +
knk ≤
1 + 3/2
4
∗
1 − /20
1 + 3/2
.
For the sake of neatness, we may tighten this condition to kx − x∗ k+ 10
9 knk ≤ 0.163τ , which implies
the sufficient condition above (since ≤ 1/3). Thus, if kx − x∗ k and knk are sufficiently small (on
the order of the condition number τ ), then we may combine (97) and (98), giving
dM (b
x, x∗ )
v
!
!
!
r
u
u
2
τ
N
≤ τ − τ t1 −
min (2 + 3) kx − x∗ k +
, (2 + 4)
+ 6 + 10 kx − x∗ k + (2 + 4) knk
τ
40
M
v
!
!
r
u
u
N
4
+
6
4 + 8
= τ − τ t1 − min
kx − x∗ k +
, τ −1 (4 + 8)
+ 12 + 20 kx − x∗ k −
knk.
τ
20
M
τ
(99)
42
Under the assumption that kx − x∗ k + 10
9 knk ≤ 0.163τ , it follows that the term inside the square
root in the last line above must be nonnegative, and therefore (23) holds. This completes the proof
of Theorem 5.
References
[1] D. Achlioptas. Database-friendly random projections. In Proc. Symp. on Principles of Database Systems (PODS
’01), pages 274–281. ACM, 2001.
[2] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the Restricted Isometry Property for
random matrices. Constr. Approx., 28(3):253–263, 2008.
[3] R. G. Baraniuk and M. B. Wakin. Random projections of smooth manifolds. Found. Comput. Math., 9(1):51–77,
2009.
[4] D. S. Broomhead and M. J. Kirby. The Whitney Reduction Network: A method for computing autoassociative
graphs. Neural Comput., 13(11):2595–2616, November 2001.
[5] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly
incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, February 2006.
[6] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements.
Comm. Pure Appl. Math., 59(8):1207–1223, August 2006.
[7] E. Candès and T. Tao. Decoding via linear programming. IEEE Trans. Inform. Theory, 51(12):4203–4215,
December 2005.
[8] E. Candès and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies?
IEEE Trans. Inform. Theory, 52(12):5406–5425, December 2006.
[9] E. J. Candès. The Restricted Isometry Property and its implications for compressed sensing. Compte Rendus
de l’Academie des Sciences, Paris, 346:589–592, 2008.
[10] E. J. Candès and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Proc. Mag., 25(2):21–30,
2008.
[11] M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin. Compressive sensing on manifolds using
a nonparametric mixture of factor analyzers: Algorithm and performance bounds. IEEE Trans. Signal Proc.,
58(12):6140–6155, 2010.
[12] K. L. Clarkson. Tighter bounds for random projections of manifolds. In Proc. Symp. Comput. Geom., pages
39–48. ACM, 2008.
[13] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. J. Amer. Math.
Soc., 22(1):211–231, 2009.
[14] M. A. Davenport, M. F. Duarte, M. B. Wakin, J. N. Laska, D. Takhar, K. F. Kelly, and R. G. Baraniuk.
The smashed filter for compressive classification and target recognition. In Proc. Comput. Imaging V at SPIE
Electronic Imaging, January 2007.
[15] M. A. Davenport, C. Hegde, M. F. Duarte, and R. G. Baraniuk. Joint manifolds for data fusion. IEEE Trans.
Image Proc., 19(10):2580–2594, 2010.
[16] M. A. Davenport and M. B. Wakin. Analysis of orthogonal matching pursuit using the restricted isometry
property. IEEE Trans. Inform. Theory, 56(9):4395–4401, 2010.
[17] R. DeVore, G. Petrova, and P. Wojtaszczyk. Instance-optimality in probability with an `1 -minimization decoder.
Appl. Comput. Harmonic Anal., 27(3):275–288, 2009.
[18] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4), April 2006.
[19] D. Donoho and Y. Tsaig. Extensions of compressed sensing. Signal Proc., 86(3):533–548, March 2006.
[20] D. L. Donoho and C. Grimes. Image manifolds which are isometric to Euclidean space. J. Math. Imaging Comp.
Vision, 23(1):5–24, July 2005.
[21] M. Duarte, M. Davenport, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk. Multiscale random
projections for compressive classification. In Proc. IEEE Conf. Image Proc. (ICIP), September 2007.
[22] M. F. Duarte, M. A. Davenport, D. Takbar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk. Single-pixel
imaging via compressive sampling. IEEE Signal Proc. Mag., 25(2):83–91, 2008.
43
[23] M. F. Duarte, M. B. Wakin, D. Baron, S. Sarvotham, and R. G. Baraniuk. Measurement bounds for sparse
signal ensembles via graphical models. IEEE Trans. Inform. Theory, 59(7):4280–4289, July 2013.
[24] A. Eftekhari, J. Romberg, and M. B. Wakin. Matched filtering from limited frequency samples. IEEE Trans.
Inform. Theory, 59(6):3475–3496, June 2013.
[25] A. Eftekhari, H. L. Yap, C. J. Rozell, and M. B. Wakin. The Restricted Isometry Property for random block
diagonal matrices. arXiv preprint arXiv:1210.3395, 2012.
[26] H. Federer. Curvature measures. Trans. Amer. Math. Soc, 93(3):418–491, 1959.
[27] A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss. Improved time bounds for near-optimal sparse Fourier
representations. In Proc. SPIE Wavelets XI, 2005.
[28] A. C. Gilbert, M. J. Strauss, and J. A. Tropp. A tutorial on fast Fourier sampling. IEEE Signal Proc. Mag.,
25(2):57–66, 2008.
[29] A.C. Gilbert, M.J. Strauss, J.A. Tropp, and R. Vershynin. One sketch for all: Fast algorithms for compressed
sensing. In Proc. Symp. Theory Comput. (STOC), 2007.
[30] D. Healy and D. J. Brady. Compression at the physical interface. IEEE Signal Proc. Mag., 25(2):67–71, 2008.
[31] C. Hegde and R. G. Baraniuk.
58(12):7204–7214, 2012.
Signal recovery on incoherent manifolds.
IEEE Trans. Inform. Theory,
[32] C. Hegde, M. B. Wakin, and R. G. Baraniuk. Random projections for manifold learning. In Proc. Neural Inform.
Proc. Sys. (NIPS), December 2007.
[33] G. E. Hinton, P. Dayan, and M. Revow. Modeling the manifolds of images of handwritten digits. IEEE Trans.
Neural Networks, 8(1):65–74, January 1997.
[34] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimenstionality. In
Proc. Symp. Theory Comput. (STOC), 1998.
[35] M. A. Iwen and M. Maggioni. Approximation of points on low-dimensional manifolds via random linear projections. Information and Inference, 2013.
[36] S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. Baraniuk. Analog-toinformation conversion via random demodulation. In Proc. IEEE Dallas Circuits Systems Workshop (DCAS),
October 2006.
[37] F. Krahmer, S. Mendelson, and H. Rauhut. Suprema of chaos processes and the Restricted Isometry Property.
arXiv preprint arXiv:1207.0235, 2012.
[38] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Ann. Stat.,
28(5):1302–1338, 2000.
[39] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der
Mathematik Und Ihrer Grenzgebiete. Springer, 2011.
[40] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing MRI. IEEE Signal Proc. Mag.,
25(2):72–82, 2008.
[41] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl.
Comput. Harmonic Anal., 26(3):301–321, 2009.
[42] D. Needell and R. Vershynin. Signal recovery from incomplete and inaccurate measurements via regularized
orthogonal matching pursuit. IEEE J. Sel. Topics Signal Proc., 4(2):310–316, 2010.
[43] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random
samples. Discrete Comput. Geom., 39(1):419–441, 2008.
[44] F. W. J. Olver and National Institute of Standards and Technology (U.S.). NIST Handbook of Mathematical
Functions. Cambridge University Press, 2010.
[45] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall Signal Processing Series.
Pearson Education, 2011.
[46] J. Y. Park and M. B. Wakin. A geometric approach to multi-view compressive imaging. EURASIP J. Advances
Signal Proc., 2012(1):37, 2012.
[47] D. Ramasamy, S. Venkateswaran, and U. Madhow. Compressive estimation in AWGN: General observations
and a case study. In Proc. Asilomar Conf. Signals, Systems, and Computers, 2012.
[48] H. Rauhut. Circulant and Toeplitz matrices in compressed sensing. In Proc. SPARS’09, Saint-Malo, France,
2009.
44
[49] H. Rauhut. Compressive sensing and structured random matrices. In M. Fornasier, editor, Theoretical foundations and numerical methods for sparse recovery, pages 1–92. De Gruyter, 2010.
[50] H. Rauhut, J. Romberg, and J. Tropp. Restricted isometries for partial random circulant matrices. Appl.
Comput. Harmonic Anal., 32(2):242–254, 2012.
[51] J. Romberg. A uniform uncertainty principle for Gaussian circulant matrices. In Proc. Int. Conf. Digital Signal
Proc., 2009.
[52] J. Romberg and R. Neelamani. Sparse channel separation using random probes. Inverse Problems, 26(11):115015,
2010.
[53] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Comm.
Pure Appl. Math., 61(8):1025–1045, 2008.
[54] P. Shah and V. Chandrasekaran. Iterative projections for signal identification on manifolds: Global recovery
guarantees. In Allerton Conf. Comm., Control, Comput. (Allerton), pages 760–767, 2011.
[55] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM
Rev., 15(4):727–764, 1973.
[56] M. Talagrand. The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Monographs
in Mathematics. Springer, 2005.
[57] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk. Beyond Nyquist: Efficient sampling
of sparse bandlimited signals. IEEE Trans. Inform. Theory, 56(1):520–544, 2010.
[58] M. Turk and A. Pentland. Eigenfaces for recognition. J. Cogn. Neurosci., 3(1):71–83, 1991.
[59] N. Verma. Distance preserving embeddings for general n-dimensional manifolds. J. Machine Learning Research,
14(Aug):2415–2448, 2013.
[60] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y.C. Eldar and G. Kutyniok,
editors, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[61] M. B. Wakin. The Geometry of Low-Dimensional Signal Models. PhD thesis, Department of Electrical and
Computer Engineering, Rice University, Houston, TX, 2006.
[62] M. B. Wakin, D. L. Donoho, H. Choi, and R. G. Baraniuk. The multiscale structure of non-differentiable image
manifolds. In Proc. Wavelets XI at SPIE Optics and Photonics, August 2005.
[63] H. L. Yap, M. B. Wakin, and C. J. Rozell. Some geometric properties of sampled sinusoids. Technical report,
School of Elect. and Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA, June 2013.
[64] H. L. Yap, M. B. Wakin, and C.J. Rozell. Stable manifold embeddings with structured random matrices. IEEE
J. Sel. Topics Signal Proc., 7(4):720–730, August 2013.
45
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement