Adaptive compressed image sensing using dictionaries

© 2012 Society for Industrial and Applied Mathematics
Vol. 5, No. 1, pp. 57–89
Adaptive Compressed Image Sensing Using Dictionaries∗
Amir Averbuch†, Shai Dekel‡, and Shay Deutsch§
Abstract. In recent years, the theory of compressed sensing has emerged as an alternative to the Shannon
sampling theorem, suggesting that compressible signals can be reconstructed from far fewer samples
than required by the Shannon sampling theorem. In fact the theory advocates that nonadaptive,
“random” functionals are in some sense optimal for this task. However, in practice, compressed sensing is very difficult to implement for large data sets, particularly because the recovery algorithms
require significant computational resources. In this work, we present a new alternative method for
simultaneous image acquisition and compression called adaptive compressed sampling. We exploit
wavelet tree structures found in natural images to replace the “universal” acquisition of incoherent
measurements with a direct and fast method for adaptive wavelet tree acquisition. The main advantages of this direct approach are that no complex recovery algorithm is in fact needed and that
it allows more control over the compressed image quality, in particular, the sharpness of edges. Our
experimental results show, by way of software simulations, that our adaptive algorithms perform
better than existing nonadaptive methods in terms of image quality and speed.
Key words. compressed sensing, adaptive approximation, nonlinear approximation, wavelet trees
AMS subject classifications. 65T60, 65Y10, 68U10
DOI. 10.1137/110820579
1. Introduction. The Shannon sampling theory is at the core of nearly all signal acquisition protocols. The classic Shannon theorem states that the sampling rate, i.e., the Nyquist
rate, must be at least twice the maximum frequency present in the signal (see, e.g., [52]).
However, the Nyquist rate is often so high that it results in a massive data acquisition that must be
compressed in order to be either stored or transmitted. In addition, there are important and
emerging applications in which high sampling rates are very expensive or limited by physical
or physiological constraints.
In recent years, a novel approach for signal sampling, known as compressed sensing (CS)
[1], [9], [18], [14], [35], has emerged. CS is an alternative to the Nyquist rate for the acquisition
of “sparse” signals. Instead of uniformly sampling the signal at Nyquist rate, in the CS
paradigm one is given a budget of n independent “questions” to ask about the signal $x \in \mathbb{R}^N$,
with $n \ll N$. These questions take the form of n linear functionals $\langle \phi_i, \cdot \rangle$, asking for the values
$y_i = \langle \phi_i, x \rangle$, $1 \le i \le n$. The vectors $\phi_i \in \mathbb{R}^N$ form the sensing matrix $\Phi = (\phi_1, \phi_2, \ldots, \phi_n)$,
such that $y = \Phi x$, where the dimension of Φ is $n \times N$. In the next stage, a numerical
optimization method is used to reconstruct the full-length signal from the small amount of
Received by the editors January 10, 2011; accepted for publication (in revised form) October 19, 2011; published
electronically January 24, 2012.
School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel ([email protected]).
GE Healthcare, Herzeliya 46733, Israel ([email protected]).
Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978,
Israel ([email protected]).
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
collected data. The CS paradigm relies on two key properties: sparsity and incoherency.
A signal $x \in \mathbb{R}^N$ is k-sparse in a transform basis $\Psi = [\psi_1, \psi_2, \ldots, \psi_N]$ if the number of
nonzero coefficients $\{\theta_i = \langle x, \psi_i \rangle\}$ is relatively small, $k \ll N$. It is compressible if the number
of significant (e.g., absolute value above some threshold) coefficients is relatively small. The
basis can be, for example, a wavelet basis, a Fourier basis, or a local Fourier basis (see, e.g.,
[31], [52]), depending on the application. There are also the tight frames such as curvelets
[10] or contourlets [62] that incorporate directional information, thereby allowing a sparser
representation of curve singularities. Incoherency, which is concerned with the acquisition
modality, states that the sensing waveforms should have an extremely dense representation in Ψ.
Based on these two properties, one can design sampling protocols that capture useful
information content embedded in a sparse signal and condense it into a small amount of data.
A related condition to incoherency, which also has proved to be very useful in studying the
general robustness of CS, is the so-called restricted isometry property (RIP) [14], [17]. Given
a measurement matrix Φ (and a basis Ψ), the RIP can provide a sufficient condition for a
stable solution for both k-sparse and compressible (in the basis Ψ) signals. In other words, the
matrix Φ preserves the information in a sparse or compressible signal if it satisfies the RIP
condition. In general, constructing a measurement matrix that satisfies the RIP condition
for large k and N is a hard combinatorial problem [56]. However, the RIP condition can be
achieved for a large class of randomized matrices with a high probability [14], [36].
In scenarios where there is no noise present, the recovery of an unknown signal x from the
measurement vector y would ideally be achieved by searching for the $\ell_0$-sparsest representation
that agrees with the measurements
$$\hat{\theta} = \arg\min \|\theta\|_0 \quad \text{subject to} \quad y = \Phi\Psi^{-1}\theta. \tag{1.1}$$
It is known [26] that if x is a k-sparse vector, then with Ψ = I, one can design a sampling
matrix Φ of only n = 2k measurements and recover x via l0 minimization. Unfortunately, the
l0 minimization problem leads to a daunting NP-hard combinatorial optimization problem.
Instead of solving the l0 minimization problem, nonadaptive CS theory seeks to solve the
“closest possible” tractable minimization problem, i.e., the $\ell_1$ minimization
$$\hat{\theta} = \arg\min \|\theta\|_1 \quad \text{subject to} \quad y = \Phi\Psi^{-1}\theta. \tag{1.2}$$
This modification, known as Basis Pursuit [22], leads to a much simpler convex problem,
which approximates the original problem and can be solved by various classical optimization
techniques. But at the same time, minimizing the l1 norm finds a sparse approximation
because it prevents diffusion of the signal’s energy over a large number of coefficients. It
typically requires a number of $n = O(k \log(N/k))$ measurements to robustly recover k-sparse
and compressible signals. One can adjust (1.2) to deal with cases where x or y is corrupted
with additive noise by replacing the equality $y = \Phi\Psi^{-1}\theta$ with an approximation constraint
[22], [16]. Another such algorithm is the minimization of the total variation with quadratic
constraints that solves the following problem:
$$\min \|I\|_{TV} \quad \text{subject to} \quad \|\Phi x_I - y\|_2 \le \varepsilon, \tag{1.3}$$
where
$$\|I\|_{TV} = \sum_{i,j} \sqrt{(I(i+1,j) - I(i,j))^2 + (I(i,j+1) - I(i,j))^2},$$
$x_I$ is the vector created from a digital image I by concatenating its rows, and ε depends on
the noise level.
In general, solving the total variation problem (1.3) provides a better image quality for
image reconstruction at the expense of greater time complexity.
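As a concrete reading of the total variation norm in (1.3), the following minimal Python sketch evaluates it for a small image stored as a list of rows. The function name `total_variation` and the boundary rule (forward differences that would reach outside the image are taken as zero) are assumptions made for illustration; the text does not specify a boundary convention.

```python
import math

def total_variation(image):
    """Isotropic total variation: sum over pixels of the discrete gradient magnitude."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            # Forward differences; assumed to be zero beyond the image border.
            d_i = image[i + 1][j] - image[i][j] if i + 1 < rows else 0.0
            d_j = image[i][j + 1] - image[i][j] if j + 1 < cols else 0.0
            tv += math.sqrt(d_i ** 2 + d_j ** 2)
    return tv

flat = [[3.0] * 4 for _ in range(4)]                 # constant image
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]      # one straight unit-height edge
tv_flat = total_variation(flat)
tv_edge = total_variation(edge)
```

Under these assumptions a constant image has zero total variation, while a single straight edge of unit height contributes one unit per pixel row it crosses, which is why piecewise constant images have a small total-variation norm.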
Although different convex optimization algorithms such as Basis Pursuit are optimal in
the sense that they provide a global minimum, they involve expensive computations when
applied to large-scale signals. Therefore a second approach, using iterative greedy methods,
has been proposed. Examples include Matching Pursuit [39], Orthogonal Matching Pursuit
[72], Tree Matching Pursuit (TMP) [48], StOMP [37], Iterative Hard Thresholding Pursuit
[5], Subspace Pursuit [30], and CoSaMP [57].
Let us focus on one of the most promising implications of CS derivatives: the enabling and
design of new types of compressed imaging (CI). One such device, which implements optical
CS, is the single-pixel camera [38] inspiring a wide interest in CI applications. The single-pixel
camera relies on the use of a digital micromirror device (DMD), which is composed of a grid
of mirrors, where each mirror of the array is suspended above an individual static random
access memory (SRAM) cell (see section 2.2). We also note that there are experimental CI
algorithms, designed for measurements taken in the frequency domain in medical imaging [50].
However, the design of an efficient and optimal measurement basis in such a system is still
a challenging problem. Although theoretically very powerful, there is still a huge gap between
the CS theory and imaging applications. We note a few of these key difficulties:
1. Control over image quality of the compressed image. From the theory of wavelet approximation, it is clear that the error of n-term wavelet approximation is characterized
by the “weak-type smoothness” of the image as a function in a Besov space [34]. In
terms of image representation, more wavelet coefficients are needed to achieve a specified level of quality if the image contains higher visual activity such as edges and
texture parts. Typically, however, existing CS architectures are not adaptive, and the
number of measurements is determined before the acquisition process begins, with no
feedback during the acquisition process on the improved quality. Yet, there exists work
on a “sequential CS” paradigm [51], where the sampling procedure is composed of iterations of acquisition of random projections, reconstruction, testing for convergence,
and then acquisition of more random projections if needed. The main challenge with
this scheme is that the recovery algorithm needs to be applied at each iteration; thus
the total complexity of the acquisition algorithm cannot be of order O (n).
2. Computationally intensive sampling process. Constructing an efficient and fast sampling operator for nonadaptive CS remains a challenging problem. On the one hand,
the CS theory shows that random measurement (i.e., unstructured) systems are in
some sense an optimal strategy for acquiring sparse signals. However, sampling operators of random binary patterns are by nature dense matrices that require huge memory
allocations and intensive computations for large values of N and n. It is impractical
even for an image of size 256 × 256. Some of these practical challenges have already
been somewhat addressed. The Scrambled Fourier Ensemble [12] was proposed for an
MRI application and is often employed in CS research because of its fast implementation. Other researchers [4] developed binary sparse measurement matrices with low
complexity and near-optimal performance. However, these operators are incoherent
only with identity matrices. The Block Hadamard Ensemble [40] was designed to offer
a practical measurement operator for implementations.
3. Computationally intensive reconstruction algorithm. Although the algorithms designed
to solve the l1 minimization (1.2) are proved to be theoretically efficient, i.e., with
polynomial complexity bounds, whenever the problem has a large scale, it requires
heavy computations.
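As a rough illustration of the storage cost mentioned in item 2 above, the following Python sketch counts the bytes needed to hold a dense n × N sensing matrix for a 256 × 256 image. The measurement budget n = N/4 and the two storage formats are assumptions chosen only to make the arithmetic concrete; they are not taken from the text.

```python
def dense_matrix_bytes(n, N, bytes_per_entry):
    """Storage, in bytes, of a dense n-by-N sensing matrix."""
    return n * N * bytes_per_entry

N = 256 * 256        # number of pixels in a 256 x 256 image
n = N // 4           # assumed measurement budget (25% of N), for illustration only

as_float64 = dense_matrix_bytes(n, N, 8)        # 8 bytes per real-valued entry
as_bits = dense_matrix_bytes(n, N, 1) // 8      # binary entries packed 8 per byte

print(as_float64 / 2**30, "GiB as float64")
print(as_bits / 2**20, "MiB as packed bits")
```

With these assumed values the matrix has 2^30 entries, that is, 8 GiB in double precision or 128 MiB as packed bits, and every application of Φ costs on the order of 10^9 operations.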
In this work, we introduce an architecture that, in accordance with the CS paradigm, aims
at providing an acquisition process that captures n measurements with $n \ll N$ and $n = O(k)$,
where N is the dimension of the full high-resolution image which is assumed to be k-sparse.
However, in practice, it improves upon the existing CS approach by additionally achieving
the following:
1. Adaptivity to the image content. The acquisition algorithm is still of complexity O (n),
but it adaptively takes more measurements if needed to achieve some compressed image
target quality.
2. A fast and efficient computational framework. The algorithm does not require the
computationally intensive CS recovery algorithm. The decoding process has the computational complexity of existing algorithms in use today such as JPEG or JPEG2000.
The algorithms described in this paper were inspired by the CS idea of a sensing process that acquires a relatively small number of samples from which a sparse signal can be
reconstructed. However, they depart fundamentally from the classic (nonadaptive) CS mathematical framework. Instead of acquiring the visual data using functionals that are random
in nature and incoherent with the sparse basis, e.g., wavelets, we suggest a simple, direct,
adaptive, and fast method for the acquisition of transform coefficients. We show that it is
possible to use devices such as the DMD array in a single-pixel camera for the computation
of arbitrary transform coefficients, where the computation of each coefficient requires a fixed
number of measurements (see section 2.2). With this tool in hand, instead of asking n universal, nonadaptive questions about the signal in a form of linear functionals, our method takes
linear measurements, directly in a sparse and structured basis, with each measurement based
on the previous measurements. This allows us to take advantage of the geometric structure of
the sparse representation explicitly, such as imposing the modeling of wavelet tree structures
over image edges to gather its significant information.
The sampling process is initialized by acquiring a very small number of low-resolution measurements. Then, based on this initial data, by utilizing specific properties of the wavelet structure, our algorithm adaptively extracts the significant information by using a near-minimal
number of measurements $n \ll N$, $n = O(k)$. This significant information, i.e., wavelet coefficients which correspond to image edges, is gathered based only on the subset of wavelet
coefficients that were previously sampled and on an established set of rules. Therefore, our
method is called Adaptive Compressed Sampling (ACS). Since the ACS algorithm takes only
linear sampling directly in the wavelet domain, there is no need for the standard CS recovery
process from “pseudorandom” measurements.
The outline of the paper is as follows: in section 2 we recall the basic properties of wavelets
and explain how the DMD architecture can be potentially used to acquire any transform
coefficients in O (1). In section 3 we provide the ACS version of an exact reconstruction
result for cartoon images using a wavelet domain sampling scheme. Using basic tools from
differential geometry, we prove that our ACS scheme “efficiently” samples all the nonzero
wavelet coefficients of cartoon images, which correspond to the edges, thereby providing exact
recovery with “optimal” (up to a constant) sampling.
In section 4 we review the Stat-ACS, an adaptive sampling algorithm that applies well-known statistical models of wavelet representations of images. Modeling of wavelet trees over
image edges has already been used in the context of CS [2], [44], [48] but in a different and
implicit way to speed up a reconstruction algorithm (1.2), as well as to speed up the numeric
solution and to reduce the number of measurements. In particular in [2] it was shown that
tree-sparse signals can be sampled using traditional CS with O (k) samples. In contrast, we
use adaptively statistical tree models in a direct and explicit way, namely, at the time of the
acquisition. The Stat-ACS algorithm is of higher complexity than the algorithm we presented
in [33], but is more efficient in predicting significant wavelet coefficients and hence provides
a better rate/distortion performance. We also provide comparison results with nonadaptive
CS methods. Finally, in section 5, we present a more advanced algorithm, the Tex-ACS,
which addresses the model of images as a mix of cartoon and local texture patches. The
Tex-ACS algorithm uses a dictionary composed of local Fourier and wavelet basis functions.
Experimental results presented for both Stat-ACS and Tex-ACS clearly show that the adaptive
form of CS provides state-of-the-art results in terms of higher reconstruction quality for the
same number of samples.
2. Direct sampling of transform coefficients.
2.1. The wavelet tree structure.
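Recall that the univariate wavelet system (e.g., [31],
[52]) is a family of real functions $\{\psi_{j,k} : (j,k) \in \mathbb{Z}^2\}$ in $L_2(\mathbb{R})$, built by dilating and translating
a unique mother wavelet function $\psi(x)$,
$$\psi_{j,k}(x) = 2^{-j/2}\,\psi(2^{-j}x - k),$$
where the mother wavelet satisfies
$$\int_{\mathbb{R}} \psi(x)\,dx = 0.$$
The wavelet system is a basis of $L_2(\mathbb{R})$, where for each $f \in L_2(\mathbb{R})$,
$$f = \sum_{j,k} \langle f, \tilde{\psi}_{j,k} \rangle\, \psi_{j,k},$$
where $\tilde{\psi}$ is a dual of ψ. For special choices of ψ, the set $\{\psi_{j,k}\}$ forms an orthonormal basis
for $L_2(\mathbb{R})$, and then $\psi = \tilde{\psi}$. Wavelets are usually constructed from a multiresolution analysis
(MRA), which is a sequence of closed subspaces of $L_2(\mathbb{R})$,
$$\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots, \qquad \bigcap_j V_j = \{0\}, \qquad \overline{\bigcup_j V_j} = L_2(\mathbb{R}).$$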
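Usually, one starts the construction from a scaling function (generator) $\varphi \in L_2(\mathbb{R})$ that
satisfies a two-scale equation
$$\varphi = \sum_k a_k\, \varphi(2\,\cdot - k).$$
In the orthonormal case, one then sets $V_j = \operatorname{span}\{\varphi_{j,k} := 2^{-j/2}\varphi(2^{-j}\cdot - k) : k \in \mathbb{Z}\}$ and
constructs ψ such that $W_j := \operatorname{span}\{\psi_{j,k} : k \in \mathbb{Z}\}$, with $V_{j+1} \oplus W_{j+1} = V_j$. A classic example
for an orthonormal system is the Haar scaling function and wavelet:
$$\varphi(x) := \begin{cases} 1, & x \in [0,1], \\ 0, & \text{else}, \end{cases} \qquad \psi(x) := \begin{cases} 1, & x \in [0, \tfrac{1}{2}), \\ -1, & x \in [\tfrac{1}{2}, 1], \\ 0, & \text{else}. \end{cases} \tag{2.4}$$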
To obtain smooth symmetric wavelets of compact support one needs to construct biorthogonal
wavelets, where $\psi \ne \tilde{\psi}$. This is achieved by using a dual set of scaling functions and MRAs.
The wavelet model can be easily generalized to any dimension [31], [52] via a tensor
product of the wavelet and the scaling functions. Assume that the univariate wavelet ψ and
the scaling function ϕ are given. Then, a bivariate separable basis is constructed using three
basic wavelets
$$\psi^1(x_1, x_2) = \varphi(x_1)\psi(x_2), \qquad \psi^2(x_1, x_2) = \psi(x_1)\varphi(x_2), \qquad \psi^3(x_1, x_2) = \psi(x_1)\psi(x_2).$$
The bivariate discrete transform represents an image $f \in L_2(\mathbb{R}^2)$ in terms of the three wavelet
functions $\psi^1, \psi^2, \psi^3$:
$$f = \sum_{e=1}^{3} \sum_{j,k} w^e_{j,k}\, \psi^e_{j,k},$$
where $\{w^e_{j,k}\}$ are coefficients. The wavelet decomposition can thus be interpreted as a signal
decomposition in a set of three spatially oriented frequency subbands: LH (e = 1) detects
horizontal edges, HL (e = 2) detects vertical edges, and HH (e = 3) detects diagonal edges.
A wavelet coefficient $w^e_{j,k}$ at a scale j represents the information about the image in the
spatial region in the neighborhood of $2^j k$, $k \in \mathbb{Z}^2$. At the next finer scale j − 1, the information
about this region is represented by four wavelet coefficients, which are described as the children
of $w^e_{j,k}$. This leads to a natural tree structure organized in a quad tree structure of each of the
three subbands as shown in Figure 1. Note that for each finer scale, the coefficients represent
a smaller spatial area of the image at higher frequencies. As j decreases, the child coefficients
add finer and finer details into the spatial regions occupied by their ancestors.
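The parent/child relation of the quad tree can be written down as index arithmetic. The Python sketch below assumes the usual dyadic indexing, in which the coefficient (e, j, k) with k = (k1, k2) covers the region near 2^j k, so that its four children at scale j − 1 carry the indices 2k + (a, b) with a, b ∈ {0, 1}. The function names `children` and `parent` are hypothetical and serve only this illustration.

```python
def children(e, j, k):
    """Four children, at the finer scale j - 1, of the coefficient (e, j, k)."""
    k1, k2 = k
    return [(e, j - 1, (2 * k1 + a, 2 * k2 + b)) for a in (0, 1) for b in (0, 1)]

def parent(e, j, k):
    """Parent, at the coarser scale j + 1, of the coefficient (e, j, k)."""
    k1, k2 = k
    # Floor division also handles negative location indices correctly.
    return (e, j + 1, (k1 // 2, k2 // 2))

node = (1, 3, (2, 5))            # subband LH, scale 3, location (2, 5)
kids = children(*node)
```

Each child stays in the subband e of its parent, which is why the three subbands give three separate quad trees.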
Wavelet representations are considered very efficient for image compression: the edges
constitute a small portion of a typical image, and a wavelet coefficient is large only if edges
are present within the support of the wavelet. Consequently, the image can be approximated
well using a few large wavelet coefficients. A significant statistical structure also follows:
large/small values of wavelet coefficients tend to propagate through the scales of the quad
trees (see section 4.1 for details). As an example, a sparse wavelet representation of the
512 × 512 image Lena and a compressed version of this image are shown in Figure 2, where
the compression algorithm was based on the sparse representation. The figure clearly depicts
that the significant wavelet coefficients (coefficients with relatively large absolute value) are
located on strong edges of the image.
Figure 1. Wavelet coefficient tree structure across the subbands (MRA decomposition).
Figure 2. (a) A sparse wavelet representation of an image with 10,179 significant wavelet coefficients. Black: significant coefficient; white: insignificant coefficient. (b) Compressed JPEG2000 image (1:23) based on the sparse representation of (a).
2.2. Direct sampling of transform coefficients using digital micromirror arrays. Today’s
digital camera devices are in the megapixel range thanks to the introduction of charge-coupled
devices (CCDs) and complementary metal oxide semiconductor (CMOS) digital technology.
Assume that I is a digital image consisting of N pixels that we wish to acquire. Some digital
cameras perform the acquisition using an array of N CCDs after exposure to the optical image.
The charges on its pixels are read row by row, are fed to an amplifier, and then undergo an
analog-to-digital conversion. CMOS technology allows today’s professional cameras to acquire
about 12 million pixels (3000× 4000). Once the digital image I has been acquired, it is usually
compressed using standard compression algorithms such as JPEG and JPEG2000. In most
digital cameras, the user has the capability to control the compression/quality tradeoff through
the compression algorithm. We now describe the digital micromirror device (DMD) hardware
architecture, depicted in Figure 3, that was also used for the single-pixel camera of [38].
The DMD consists of an array of electrostatically actuated micromirrors, where each
mirror of the array is suspended above an individual SRAM cell. Each mirror rotates about a
hinge and can be positioned in one of two states: +10 or −10 degrees from horizontal. Using
lenses, all the reflections from the micromirrors at a given state (e.g., the light reflected by
mirrors in +10 degree state) are focused and collected by a single photodiode to yield an
absolute voltage. The output of the photodiode is amplified through an op-amp circuit and
then digitized by an analog-to-digital converter. This value should be interpreted as
$$v = \sum_{i=1}^{N} x_i\, 1_{\{\theta_i = +10\}} + \text{DC offset},$$
where $x_I = (x_1, \ldots, x_N)$ is the vector formulated by concatenating the rows of the digital
image I, the indicator function $1_{\{\theta_i = +10\}}$ obtains the value 1 if the ith micromirror is at the
state +10 degrees and 0 otherwise, and the DC offset is the value outputted when all the
micromirrors are set to −10 degrees.
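The measurement model above can be simulated in a few lines. The Python sketch below is only a software stand-in for the photodiode reading: the function name `photodiode_reading`, the toy pixel values, and the offset value 0.5 are assumptions. It shows how subtracting the all-mirrors-off reading recovers the inner product of the image with a binary mask.

```python
def photodiode_reading(pixels, mirror_on, dc_offset):
    """Sum of the pixels whose mirror is in the +10 degree state, plus the DC offset."""
    return sum(x for x, on in zip(pixels, mirror_on) if on) + dc_offset

pixels = [4.0, 1.0, 3.0, 2.0]                  # x_I: image rows concatenated
mask = [True, False, True, False]              # mirror states (True means +10 degrees)

# Reading with every mirror at -10 degrees gives the DC offset alone.
dc = photodiode_reading(pixels, [False] * len(pixels), dc_offset=0.5)
raw = photodiode_reading(pixels, mask, dc_offset=0.5)
inner_product = raw - dc                       # <x_I, mask> = 4.0 + 3.0
```

When two readings are subtracted from each other, as in the Haar example later in this section, the DC offset cancels without a separate calibration reading.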
Figure 3. The architecture of the DMD.
In [38], the rows of the CS sampling matrix Φ are in fact a sequence of n pseudorandom
binary masks, where each mask is actually a scrambled configuration of the DMD array (see
also [3]). Thus, the measurement vector y is composed of inner-products of the digital image x
with pseudorandom masks. In [38] the measurements are collected into a compressed bitstream
with possible lossy or lossless compression applied to the elements of y. At the core of the
decoding process, which takes place at the viewing device, there is the minimization algorithm
(1.2). Once a solution is computed, one obtains from it an approximate “reconstructed” image
by applying the transform $\Psi^{-1}$ to the coefficients. In [38], the authors choose the transform
Ψ to be a wavelet transform, but their sampling process has a universal property. This
means that any transform T which is incoherent with a process of pseudorandom masks
Φ (almost all reasonable transforms have this property) and in which the image has a sparse
representation will lead to a reconstruction with good approximation of the original image.
Recall that our proposed approach, although inspired by the CS concept of acquisition
processes of simultaneous sensing and compression of the image, departs fundamentally in
the mathematical framework. Instead of acquiring the visual data using a representation that
is incoherent with wavelets, we sample directly in the wavelet domain. At a glance, this
might seem to be a paradox, since computing the fast wavelet transform of an N-pixel image
requires O(N) computations, whereas we want to take only n measurements with $n \ll N$.
Furthermore, computing even one low-frequency wavelet coefficient requires calculating an
integral over a significant portion of the image pixels, which requires O(N) computations. As
we shall see, this paradox is solved by using the DMD in a different way than in [38].
There exist DMD arrays with the functionality that a micromirror can “reflect” a grayscale
value (contemporary DMD can produce 1024 grayscale values). In theory, one can use these
devices to compute arbitrary real-valued functionals acting on the data, since any functional
g can be represented as a difference of two nonnegative functionals $g_+$, $g_-$, such that $g = g_+ - g_-$
($g_+, g_- \ge 0$). This allows us to compute any transform coefficient using only two
measurements. Also, we may leverage the “feedback” architecture of the DMD and make
decisions on future measurements based on existing value measurements. As we shall see, this
adaptive sampling process relies on a well-known modeling of image edges using a wavelet
coefficient tree structure. Then, decisions on which wavelet coefficients should be sampled
next are based on the values of the wavelet coefficients obtained so far.
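The two-measurement idea can be sketched as follows: split the weights of a real-valued functional g into its positive and negative parts, take one nonnegative measurement with each, and subtract. This Python sketch is an idealization. The names `split_functional` and `measure` are hypothetical, and the quantization of mirror weights to a finite number of grayscale levels is ignored.

```python
def split_functional(g):
    """Return nonnegative parts (g_plus, g_minus) with g = g_plus - g_minus."""
    g_plus = [max(w, 0.0) for w in g]
    g_minus = [max(-w, 0.0) for w in g]
    return g_plus, g_minus

def measure(pixels, weights):
    """One idealized grayscale-DMD measurement; weights must be nonnegative."""
    assert all(w >= 0.0 for w in weights)
    return sum(x * w for x, w in zip(pixels, weights))

pixels = [4.0, 1.0, 3.0, 2.0]
g = [0.5, -0.5, 0.25, -1.0]                    # arbitrary real-valued functional
g_plus, g_minus = split_functional(g)
value = measure(pixels, g_plus) - measure(pixels, g_minus)
direct = sum(x * w for x, w in zip(pixels, g)) # reference: <x, g> computed directly
```

The difference of the two physical measurements agrees with the directly computed inner product, so the cost per coefficient is fixed at two measurements regardless of the support size of g.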
Let us now give the example of the acquisition of an arbitrary Haar wavelet coefficient,
computed by using the even simpler binary DMD with only two measurements. The wavelet
coefficient associated with a Haar wavelet in subband e = 1 is given by
$$\langle f, \psi^1_{j,k} \rangle = 2^{-j} \left( \int_{2^j k_1}^{2^j(k_1+1)} \int_{2^j k_2}^{2^j(k_2+1/2)} f(x_1, x_2)\, dx_1\, dx_2 - \int_{2^j k_1}^{2^j(k_1+1)} \int_{2^j(k_2+1/2)}^{2^j(k_2+1)} f(x_1, x_2)\, dx_1\, dx_2 \right). \tag{2.6}$$
This implies that for the purpose of computing a wavelet coefficient of the first type using a
binary DMD array, we need to rotate and collect twice, into the photodiode, responses from
two subsets of micromirrors, each supported over neighboring rectangular regions corresponding to the scale j and location k. From (2.6), the value of the wavelet coefficient we wish to
acquire is the difference of these two outputs multiplied by $2^{-j}$.
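A discrete version of (2.6) can be simulated directly. In the Python sketch below, the integrals become pixel sums over two neighboring rectangles, each acquired as one binary-mask measurement. The indexing convention `image[x1][x2]`, the restriction to integer scales j ≥ 1, and all function names are assumptions made for illustration.

```python
def haar_masks(j, k):
    """Pixel index sets of the two binary masks for the subband e = 1 Haar wavelet."""
    side = 2 ** j
    k1, k2 = k
    x1_range = range(side * k1, side * (k1 + 1))
    mid = side * k2 + side // 2                       # 2^j (k2 + 1/2)
    plus = [(x1, x2) for x1 in x1_range for x2 in range(side * k2, mid)]
    minus = [(x1, x2) for x1 in x1_range for x2 in range(mid, side * (k2 + 1))]
    return plus, minus

def binary_measurement(image, mask):
    """One binary DMD measurement: sum of the pixels whose mirrors are switched on."""
    return sum(image[x1][x2] for x1, x2 in mask)

def haar_coefficient(image, j, k):
    """Difference of the two mask measurements, scaled by 2^{-j} as in (2.6)."""
    plus, minus = haar_masks(j, k)
    return 2.0 ** (-j) * (binary_measurement(image, plus) - binary_measurement(image, minus))

# 4 x 4 test image with an edge between x2 = 1 and x2 = 2.
image = [[1.0 if x2 < 2 else 0.0 for x2 in range(4)] for x1 in range(4)]
coarse = haar_coefficient(image, 2, (0, 0))           # wavelet straddles the edge
fine = haar_coefficient(image, 1, (1, 0))             # wavelet lies in a flat region
```

The coarse coefficient is nonzero because its support contains the edge, while the fine coefficient over a flat region vanishes. This is the behavior the adaptive scheme exploits when deciding which coefficients to acquire next.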
We emphasize that there is ongoing research of compressed image acquisition applications
where DMD arrays might have an advantage over conventional CCD architectures. One
of these areas is hyperspectral imaging, where the acquisition device samples an image in
multiple bands by using configurable optics [47]. In [61], the authors show promising results
in hyperspectral imaging using the DMD array. The adaptive approach presented in this work
can potentially improve their nonadaptive scheme.
Finally, as pointed out by the referees, we note that in this work we do not investigate the
influence of DMD accuracy on the quality of reconstruction. Typically, when noncompressive
sensing is followed by image compression, transform coefficients are first calculated in relatively high precision and then quantized for the purpose of compression. When one acquires
transform coefficients directly using the DMD, they are contaminated with noise, due to the
relatively low precision of the DMD hardware. We postulate that this has a smaller effect
when the goal is low bit-rate compression and transform coefficients are quantized significantly
with quantization steps that are larger than the noise level of the hardware system. Obviously,
this approach will become even more valid as DMD technology matures and more accurate
systems enter the market.
3. Exact reconstruction of cartoon images using ACS. The theory of CS contains results
on the ability to achieve exact recovery of a sparse signal from a limited number of linear
measurements [14], [17], [35]. The prototype “sparse image” used in CS is the piecewise
constant “cartoon” Shepp–Logan image (e.g., [14]), since such images are sparse in the wavelet
domain and also have “small” total-variation norm. In this section we present a simple
theoretical result on exact reconstruction via critical sampling that serves as a motivation for
our ACS algorithms. We prove that if a function is a typical cartoon image, i.e., a piecewise
constant function over domains with smooth boundaries, then it is possible to reconstruct it
exactly by adaptive wavelet coefficient sampling, using an optimal number of measurements
(up to a constant). In the following a “sample” is the evaluation of the given function by a
linear functional, specifically a wavelet coefficient. In this section, where the given functions
are not discrete, we use the convention as in [31] that scales are finer as the index j is larger
(which is the opposite of the usual notation in the discrete setting).
Theorem 3.1. Let $f(x) = c_1 + c_2 1_\Omega(x)$, where the boundary ∂Ω is a simple closed $C^2$ (twice
continuously differentiable) curve. Then, there exists a scale $T \in \mathbb{Z}$, such that if all the nonzero
Haar wavelet (2.4) coefficients $\langle f, \psi^e_{j,k} \rangle$ are known for $j \le T$, then all nonzero coefficients
$\langle f, \psi^e_{j,k} \rangle$, $j > T$, can be acquired by an ACS process, where for each resolution j,
$$\#\{\text{required samples at resolution } j\} \le 12\, \#\{\langle f, \psi^e_{j-1,k} \rangle \ne 0 : e = 1, 2, 3,\ k \in \mathbb{Z}^2\}.$$
Remark. Observe that the above theoretical result is of a different nature than previous
“exact reconstruction” results in CS. Most importantly, the setting is not discrete. The
simple claim is that, from fine enough resolution, the location of nonzero wavelet coefficients
at some resolution is completely predictable from the location of nonzero coefficients at the
lower resolution. This motivates the Stat-ACS adaptive sampling algorithm (section 4) that is
based on wavelet trees and predicts the location of significant coefficients by using information
derived from coefficients already sampled at lower scales. However, since images typically are
modeled as the combination of cartoon and texture models, the Tex-ACS algorithm (section
5) is needed as a complement.
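The predictability claim in the remark can be illustrated in one dimension. The Python sketch below is a simplified analogue and not the ACS algorithm of this paper: it samples orthonormal Haar coefficients of a discrete signal with a single jump and descends to the two children of a coefficient only when that coefficient is nonzero. The sampling rule, the tolerance `tol`, and all function names are assumptions made for illustration.

```python
import math

def haar_sample(x, start, length):
    """One direct measurement: the orthonormal Haar coefficient on x[start:start+length]."""
    half = length // 2
    plus = sum(x[start:start + half])
    minus = sum(x[start + half:start + length])
    return (plus - minus) / math.sqrt(length)

def adaptive_acquire(x, tol=1e-12):
    """Sample children only below nonzero parents; return the data and the sample count."""
    n = len(x)                           # assumed to be a power of two
    scaling = sum(x) / math.sqrt(n)      # single coarse (low-resolution) measurement
    samples = 1
    coeffs = {}
    pending = [(0, n)]                   # blocks (start, length) still to be measured
    while pending:
        start, length = pending.pop()
        c = haar_sample(x, start, length)
        samples += 1
        if abs(c) > tol:
            coeffs[(start, length)] = c
            if length > 2:               # descend only below a nonzero parent
                half = length // 2
                pending += [(start, half), (start + half, half)]
    return scaling, coeffs, samples

def reconstruct(scaling, coeffs, n):
    """Invert the Haar expansion using only the coefficients that were kept."""
    x_hat = [scaling / math.sqrt(n)] * n
    for (start, length), c in coeffs.items():
        half = length // 2
        amp = c / math.sqrt(length)
        for i in range(start, start + half):
            x_hat[i] += amp
        for i in range(start + half, start + length):
            x_hat[i] -= amp
    return x_hat

signal = [2.0] * 21 + [5.0] * 43         # piecewise constant, one jump
scaling, coeffs, samples = adaptive_acquire(signal)
recovered = reconstruct(scaling, coeffs, len(signal))
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
```

For this signal of length 64 the sketch takes 12 measurements and recovers the signal up to floating-point rounding, because a Haar coefficient is nonzero only if the jump lies strictly inside its support, and that support is contained in the support of its parent.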
We begin with some definitions from differential geometry [43].
Definition 3.2. Given three points x = γ(s), y = γ(σ), z = γ(τ) on a curve γ, we define the
local radius of curvature ρ(x) at each point x of γ by
$$\rho(x) := \lim_{\sigma, \tau \to s} r(x, y, z),$$
where r(x, y, z) is the radius of the unique circle which passes through the three points x, y, z.
In the case where a line passes through the points we say that r(x, y, z) = ∞. Assuming a
natural parameterization, the curvature k(s) at a point γ(s) is
$$k(s) := \|\gamma''(s)\| = \frac{1}{\rho(\gamma(s))}.$$
The minimal local radius of curvature of the curve γ is
$$\rho(\gamma) := \min_{x \in \gamma} \rho(x).$$
Definition 3.3. Given a simple closed $C^2$ curve γ, we define the global radius of curvature
$\rho_G(x)$ at each point x of γ by
$$\rho_G(x) := \min_{y, z \in \gamma} r(x, y, z),$$
and the global radius of curvature of γ by
$$\rho_G(\gamma) := \min_{x \in \gamma} \rho_G(x).$$
It is clear that the global radius of curvature is bounded by the local radius of curvature,
$$0 < \rho_G(x) \le \rho(x) \quad \forall x \in \gamma.$$
In Figure 4 we see an example that shows the difference between the global radius of
curvature and the local radius of curvature. In contrast to the local radius of curvature,
the global radius of curvature contains information about nonlocal parts of the curve.
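The quantity r(x, y, z) in Definitions 3.2 and 3.3 is the circumradius of a triangle, which makes a discrete approximation of the global radius of curvature easy to compute for a sampled curve. The Python sketch below uses the standard formula R = abc/(4 · area). The function names and the choice of minimizing over all triples of sample points are assumptions for illustration, and the result only approximates the continuous definition.

```python
import itertools
import math

def circumradius(p, q, r):
    """Radius of the circle through three points; infinity if they are collinear."""
    a = math.dist(p, q)
    b = math.dist(q, r)
    c = math.dist(r, p)
    # The cross product equals twice the signed area of the triangle.
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    if cross == 0:
        return math.inf
    return a * b * c / (2.0 * abs(cross))

def discrete_global_radius(points):
    """Smallest circumradius over all triples of sampled curve points."""
    return min(circumradius(p, q, r) for p, q, r in itertools.combinations(points, 3))

# Twelve samples of a circle of radius 2: every triple has circumradius 2.
circle = [(2.0 * math.cos(2 * math.pi * t / 12), 2.0 * math.sin(2 * math.pi * t / 12))
          for t in range(12)]
radius = discrete_global_radius(circle)
```

For a circle the local and global radii coincide. For a curve in which two distant arcs pass close to each other, triples that mix points from both arcs produce small circles, so the minimum over all triples drops below the local radius of curvature.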
Figure 4. Global radius of curvature (left circle) and local radius of curvature (right circle).
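Let Ω be a domain with a simple and closed $C^2$ boundary ∂Ω. Assume that the boundary
is given in the natural parametric form by
$$\gamma(s) = (x(s), y(s)), \quad s \in [0, L],$$
where L is the length of the curve,
$$\|\gamma''\|_\infty = \max_{s \in [0, L]} \|\gamma''(s)\| \le M, \tag{3.9}$$
$$0 < \rho_G(\gamma) = M_G < \infty.$$
Let $Q_{j,k}$ denote a dyadic square in $\mathbb{R}^2$ (recall that in this section, the scales are finer as
$j \to \infty$):
$$Q_{j,k} := \left[\frac{k_1}{2^j}, \frac{k_1 + 1}{2^j}\right] \times \left[\frac{k_2}{2^j}, \frac{k_2 + 1}{2^j}\right], \quad j \in \mathbb{Z},\ k = (k_1, k_2) \in \mathbb{Z}^2.$$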
Lemma 3.4. Let γ be a smooth, closed, and simple curve. If for $j \in \mathbb{Z}$
$$2^{-j} \le \rho_G(\gamma), \tag{3.12}$$
then the circumscribed circle $O_{j,k}$ of any square $Q_{j,k}$, $k \in \mathbb{Z}^2$, contains at most one connected
segment of the curve γ.
Proof. Since γ is a non-self-intersecting curve, we have $\rho_G(\gamma) > 0$. Assume, in contrast,
that $O_{j,k}$ contains two segments of γ, $\gamma_1 \equiv \gamma|_{[s_1, s_2]}$ and $\gamma_2 \equiv \gamma|_{[s_3, s_4]}$, with $s_1 < s_2 < s_3 < s_4$.
Thus, the radius R of the circle passing through $\gamma(s_1)$, $\gamma(s_2)$, and $\gamma(s_3)$ satisfies
$$\rho_G(\gamma) \le R = \frac{\sqrt{2}}{2}\, 2^{-j} < 2^{-j},$$
which contradicts (3.12). We conclude that $O_{j,k}$ may contain at most one connected component of γ.
Thus, we may assume that if j satisfies (3.12), then either γ ∩Oj,k = ∅ or γ ∩Oj,k is a single
connected segment. Let us assume now the second case, where γ ∩ Oj,k is a single segment,
and let the entry and exit points of $\gamma$ in $O_{j,k}$ be $\gamma(s_1)$ and $\gamma(s_2)$, respectively. Define the
following line:
\[ \tilde\gamma(s) = \gamma(s_1) + (s - s_1)\gamma'(s_1). \tag{3.14} \]
Lemma 3.5. If $j$ satisfies, for $\alpha \ge 1$,
\[ 2\alpha\pi^2 M \le 2^j, \tag{3.15} \]
with $M$ given in (3.9), then the distance between $\gamma$ and the segment $\tilde\gamma$, with $\tilde\gamma(s) = (\tilde x(s), \tilde y(s))$,
given in (3.14) obeys
\[ |\gamma(s) - \tilde\gamma(s)| \le \frac{2^{-j}}{\alpha}, \quad s \in [s_1, s_2]. \tag{3.16} \]
Proof. Since $2M \le 2^j$, the radius of curvature of $\gamma$ is bigger than the radius of $O_{j,k}$.
Therefore, the maximal length of the single connected curve segment contained in $O_{j,k}$ is
bounded by the circumference of $O_{j,k}$:
\[ \operatorname{length}(\gamma \cap O_{j,k}) = s_2 - s_1 \le \sqrt{2}\,\pi\, 2^{-j}. \tag{3.17} \]
Using the differentiability properties of $\gamma(s) = (x(s), y(s))$, we obtain
\[ |x(s) - \tilde x(s)| \le \|x''\|_\infty (s - s_1)^2, \quad |y(s) - \tilde y(s)| \le \|y''\|_\infty (s - s_1)^2 \quad \forall s \in [s_1, s_2]. \]
Applying (3.15) and (3.17) for the parametric distance $|\gamma(s) - \tilde\gamma(s)|$ yields
\[ |\gamma(s) - \tilde\gamma(s)| = \sqrt{(x(s) - \tilde x(s))^2 + (y(s) - \tilde y(s))^2} \le M (s_2 - s_1)^2 \le M \left(\sqrt{2}\,\pi\, 2^{-j}\right)^2 = 2\pi^2 M\, 2^{-2j} \le \frac{2^{-j}}{\alpha}. \]
Since the geometric distance is always smaller than the parametric distance, we obtain (3.16).
Let $j$ be a sufficiently fine (large) scale which obeys (3.12) and (3.15) for $\alpha = 16$. As
depicted in Figure 5, let $I$ be the strip bounded by two linear segments $\tilde\gamma_-$ and $\tilde\gamma_+$ that are
parallel to $\tilde\gamma$ and obtained by shifts of $\tilde\gamma$ by $-2^{-j}/16$ and $+2^{-j}/16$, respectively, in the direction
of the normal of $\tilde\gamma$. By construction of $\tilde\gamma_-$, $\tilde\gamma_+$ and by (3.16), the curve $\gamma$ is confined to the
strip $I$ in $Q_{j,k}$. Define the width of $I$, $w(I)$, to be the Hausdorff distance between $\tilde\gamma_-$ and $\tilde\gamma_+$.
Obviously, by construction
\[ w(I) = 2^{-j}/8. \tag{3.18} \]
Recall that f (x) = c1 + c2 1Ω (x). For i = 1, . . . , 4, let Si = area(Q̃i ∩ Ω), where {Q̃i },
i = 1, . . . , 4, are the four quadrants of Qj,k , i.e., the four dyadic cubes at level j + 1 that are
contained in Qj,k . Also, let S(I) = area (I ∩ Qj,k ).
In Figure 5, we see an example of a curve component which is contained in a dyadic cube
$Q_{j,k}$ and bounded by a strip $I$. Our proof rests on the fact that we can make the width of the
strip $I$ sufficiently small, relative to the cube, that the equality $S_1 = S_2 = S_3 = S_4$ cannot
hold except in the trivial cases described in the following lemma.
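Lemma 3.6. Assume that $j$ satisfies (3.15) for $\alpha \ge 16$. The equality $S_1 = S_2 = S_3 = S_4$
holds for $Q_{j,k}$ only if
\[ S_i = 0, \quad i \in \{1, 2, 3, 4\}, \tag{3.19} \]
or
\[ S_i = \operatorname{area}(\tilde Q_i) = 2^{-2(j+1)}, \quad i \in \{1, 2, 3, 4\}. \tag{3.20} \]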
Proof. Denote the center of $Q_{j,k}$ by $o$. If $o \in I^c$, then $\gamma$ does not intersect at least
one of the quadrants $\tilde Q_i$ and therefore $S_1 = S_2 = S_3 = S_4$ can hold only if $Q_{j,k} \cap \Omega = \emptyset$,
corresponding to (3.19), or $Q_{j,k} \cap \Omega^c = \emptyset$, corresponding to (3.20). Otherwise $o \in I$ (as
depicted in Figure 5). By (3.18),
\[ S(I) \le w(I) \cdot \sqrt{2} \cdot \operatorname{side}(Q_{j,k}) = \frac{2^{-j}}{8} \cdot \sqrt{2} \cdot 2^{-j} = \frac{\sqrt{2}}{8}\, 2^{-2j}. \]
From a simple combinatorial argument, there exists a quadrant $\tilde Q_i$ for which the intersection
with $I$ is equal to or smaller than a quarter of $S(I)$:
\[ S(\tilde Q_i \cap I) \le \frac{\sqrt{2}}{32}\, 2^{-2j}. \tag{3.21} \]
Without loss of generality, assume that (3.21) holds for $i = 1$ and that $S_1 > 0$. This implies
\[ S_1 \ge \operatorname{area}(\tilde Q_1) - S(\tilde Q_1 \cap I) \ge \frac{8 - \sqrt{2}}{32}\, 2^{-2j} \ge \frac{6}{32} \cdot 2^{-2j}. \]
On the other hand, for at least one of the other quadrants, $\tilde Q_i$, $i \in \{2, 3, 4\}$,
\[ S_i \le S(\tilde Q_i \cap I) \le S(I) \le \frac{\sqrt{2}}{8} \cdot 2^{-2j}. \]
Thus, $S_i < S_1$ for one of $i = 2, 3, 4$, and therefore, $o \in I$ is not possible. This completes the
proof of the lemma.
Figure 5. A curve component bounded by the strip $I$ in $Q_{j,k}$.
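Proof of Theorem 3.1. Let $f(x) = c_1 + c_2 1_\Omega(x)$, and let $\gamma = \partial\Omega$. Let $Q_{j,k}$ be a dyadic
cube. If $Q_{j,k} \cap \gamma = \emptyset$, then $w^e_{j,k} = \langle f, \psi^e_{j,k} \rangle = 0$ for $e = 1, 2, 3$. We claim that if $Q_{j,k} \cap \gamma \ne \emptyset$,
then for sufficiently large $j$ it is impossible that $w^1_{j,k} = w^2_{j,k} = w^3_{j,k} = 0$ simultaneously. Indeed,
assume $Q_{j,k}$, $j \ge T$, where for $\alpha \ge 16$
\[ T := \max\left\{0,\ \log_2(2\alpha\pi^2 M),\ -\log_2(\rho_G(\gamma))\right\}. \tag{3.24} \]
Assume, by contradiction, that $Q_{j,k} \cap \gamma \ne \emptyset$ and that $w^1_{j,k} = w^2_{j,k} = w^3_{j,k} = 0$. Observe that,
using the notation where the scales are finer as $j \to \infty$,
\[ w^1_{j,(k_1,k_2)} = 2^j \left( \int_{2^{-j}k_1}^{2^{-j}(k_1+1)} \int_{2^{-j}k_2}^{2^{-j}(k_2+1/2)} f(x, y)\,dy\,dx - \int_{2^{-j}k_1}^{2^{-j}(k_1+1)} \int_{2^{-j}(k_2+1/2)}^{2^{-j}(k_2+1)} f(x, y)\,dy\,dx \right) = 0, \tag{3.25} \]
\[ w^2_{j,(k_1,k_2)} = 2^j \left( \int_{2^{-j}k_1}^{2^{-j}(k_1+1/2)} \int_{2^{-j}k_2}^{2^{-j}(k_2+1)} f(x, y)\,dy\,dx - \int_{2^{-j}(k_1+1/2)}^{2^{-j}(k_1+1)} \int_{2^{-j}k_2}^{2^{-j}(k_2+1)} f(x, y)\,dy\,dx \right) = 0, \tag{3.26} \]
\[ \begin{aligned} w^3_{j,(k_1,k_2)} = 2^j \Bigg( & \int_{2^{-j}k_1}^{2^{-j}(k_1+1/2)} \int_{2^{-j}k_2}^{2^{-j}(k_2+1/2)} f(x, y)\,dy\,dx + \int_{2^{-j}(k_1+1/2)}^{2^{-j}(k_1+1)} \int_{2^{-j}(k_2+1/2)}^{2^{-j}(k_2+1)} f(x, y)\,dy\,dx \\ & - \int_{2^{-j}k_1}^{2^{-j}(k_1+1/2)} \int_{2^{-j}(k_2+1/2)}^{2^{-j}(k_2+1)} f(x, y)\,dy\,dx - \int_{2^{-j}(k_1+1/2)}^{2^{-j}(k_1+1)} \int_{2^{-j}k_2}^{2^{-j}(k_2+1/2)} f(x, y)\,dy\,dx \Bigg) = 0. \end{aligned} \tag{3.27} \]
With the notation of $\{S_i\}_{i=1}^4$ given above and a suitable ordering of the quadrants, the solution of (3.25)–(3.27) is obtained by solving
the linear algebraic system
\[ c_2 \begin{pmatrix} -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \end{pmatrix} = 0. \tag{3.28} \]
Since we may assume $c_2 \ne 0$, we obtain the set of solutions
\[ S_1 = S_2 = S_3 = S_4. \tag{3.29} \]
As $T$ satisfies (3.24) for $\alpha \ge 16$, by Lemma 3.6 we obtain that (3.29) is satisfied only if (3.19)
or (3.20) holds. But this is a contradiction to our assumption that $Q_{j,k} \cap \gamma \ne \emptyset$.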
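The step from the three vanishing coefficients to equal quadrant areas can be checked numerically. The sketch below assumes NumPy and uses the 3 × 4 sign matrix of the linear system in the proof; the names `A` and `details_vanish` are illustrative only. The matrix has rank 3, so its null space is one-dimensional and spanned by the all-ones vector, which is exactly the statement that the four areas coincide.

```python
import numpy as np

# Sign pattern of the three Haar-type detail functionals acting on the
# quadrant areas (S1, S2, S3, S4), as in the linear system of the proof.
A = np.array([[-1.0, -1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0, -1.0],
              [-1.0, 1.0, 1.0, -1.0]])

def details_vanish(areas, tol=1e-12):
    """True if all three detail functionals vanish for the given quadrant areas."""
    return bool(np.all(np.abs(A @ np.asarray(areas, dtype=float)) < tol))

rank = int(np.linalg.matrix_rank(A))   # rank 3 leaves a one-dimensional null space
```

For instance, `details_vanish([0.1, 0.1, 0.1, 0.1])` holds, while perturbing a single area breaks the equality.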
The proof of the theorem can now be completed by a simple induction: at any resolution
$j \ge T$ assume that we have all nonzero wavelet coefficients $\{w^e_{j,k}\}$. For any such nonzero
coefficient $w^e_{j,k}$, we sample the 12 wavelet coefficients
\[ \left\{ w^e_{j+1,l} \mid e = 1, 2, 3,\ l \in \{(2k_1, 2k_2), (2k_1 + 1, 2k_2), (2k_1, 2k_2 + 1), (2k_1 + 1, 2k_2 + 1)\} \right\}. \]
If for the cube $Q_{j+1,l}$ we have $w^e_{j+1,l} = 0$ for $e = 1, 2, 3$, then we know $Q_{j+1,l} \cap \gamma = \emptyset$, and
therefore we do not sample any wavelet coefficients supported in this cube. Otherwise our
ACS process continues to sample wavelet coefficients in $Q_{j+1,l}$.
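The inductive sampling rule can be illustrated on a discrete cartoon image. The following is a simplified sketch rather than the authors' implementation: it assumes a square image whose side is a power of two, uses unnormalized Haar details computed from quadrant sums, and starts the descent at a hypothetical level `j_start` at which all cubes are sampled. The names `haar_details` and `adaptive_descent` are illustrative.

```python
import numpy as np

def haar_details(img, j, k1, k2):
    """Unnormalized Haar details of the dyadic cube (j, k1, k2) from its quadrant sums."""
    side = img.shape[0] >> j                      # cube side length in pixels
    h = side // 2
    blk = img[k1 * side:(k1 + 1) * side, k2 * side:(k2 + 1) * side]
    s1, s2 = blk[:h, :h].sum(), blk[:h, h:].sum()
    s3, s4 = blk[h:, :h].sum(), blk[h:, h:].sum()
    return (-s1 - s2 + s3 + s4, s1 - s2 + s3 - s4, -s1 + s2 + s3 - s4)

def adaptive_descent(img, j_start):
    """Sample all cubes at level j_start, then descend only below nonzero details."""
    n = img.shape[0]                              # assumed to be a power of two
    j_max = n.bit_length() - 2                    # finest level: quadrants are single pixels
    stack = [(j_start, a, b) for a in range(2 ** j_start) for b in range(2 ** j_start)]
    sampled = {}
    while stack:
        j, k1, k2 = stack.pop()
        d = haar_details(img, j, k1, k2)
        sampled[(j, k1, k2)] = d
        # A nonzero detail means the cube meets the edge: sample its 4 children
        # (12 coefficients in total, three per child).
        if j < j_max and any(abs(v) > 1e-9 for v in d):
            stack.extend((j + 1, 2 * k1 + a, 2 * k2 + b) for a in (0, 1) for b in (0, 1))
    return sampled

# Hypothetical cartoon image: f = c1 + c2 * indicator of a disk, on a 64 x 64 grid.
yy, xx = np.mgrid[0:64, 0:64]
img = 1.0 + 2.0 * ((xx - 20.3) ** 2 + (yy - 27.6) ** 2 < 15.0 ** 2)
sampled = adaptive_descent(img, 2)
```

Only cubes whose ancestors meet the disk boundary are visited, so the number of sampled cubes stays well below the total number of cubes between levels 2 and 5.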
4. Stat-ACS: An ACS algorithm based on statistical modeling in the wavelet domain.
In [33], we presented a rather simple ACS algorithm, in which wavelet coefficients were sampled
based on local Lipschitz smoothness estimates using their ancestors [23], [52]. In this work
we elaborate on a more advanced statistics-based ACS algorithm, the Stat-ACS. We use
information-theoretic analysis tools to estimate statistical dependencies between a wavelet
coefficient and related coefficients at adjacent spatial locations and scales. The information
obtained from previously sampled coefficients is summarized into a linear predictor which
predicts the significance of a coefficient that has not yet been sampled. Since the Stat-ACS
algorithm operates on a relatively complex dependence network, we employ hash tables to
construct efficient dynamic dictionaries. Roughly speaking, the algorithm, of order O(n),
captures n measurements, with n ≪ N, where N is the number of pixels. The goal of the
algorithm is that n = O(k), where the image is assumed to be well approximated by k wavelet
terms. While this is motivated by the cartoon model of section 3, it is certainly not guaranteed.
The Stat-ACS algorithm is more complex than the adaptive algorithm in [33]; its scheme
increases the probability that significant wavelet coefficients will not be “missed” and will be
sampled. Thus, the Stat-ACS algorithm suggests a tradeoff between a higher complexity of
the algorithm and a higher efficiency in recovering edges and important features in the image.
4.1. Wavelet statistical models. Our ACS algorithm relies heavily on statistical modeling
of images, as is used in tasks such as compression [8], [25], [29], [49], denoising [29], [71], and
classification [24]. In most of these works, wavelet coefficients of natural images are modeled
by generalized Gaussian marginal statistics. The probability density function (PDF) of the
generalized Gaussian density is defined as
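\[ p(x; \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-(|x|/\alpha)^{\beta}}, \]
where α models the width of the PDF peak, β is inversely proportional to the decreasing rate
of the peak, and $\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\,dt$, $z > 0$, is the Gamma function.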
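As a quick illustration of the density above, here is a minimal sketch using only the Python standard library; the function name `ggd_pdf` is an assumption made for this example. Setting β = 2 recovers a Gaussian shape and β = 1 a Laplacian one.

```python
import math

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density p(x; alpha, beta) with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))

# beta = 2 with alpha = sqrt(2) is the standard normal density; beta = 1 is Laplacian.
gauss_peak = ggd_pdf(0.0, math.sqrt(2.0), 2.0)
laplace_peak = ggd_pdf(0.0, 1.0, 1.0)
```

Smaller values of β give a sharper peak and heavier tails, which is the regime usually reported for wavelet coefficients of natural images.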
Also, these works reveal that for images, wavelet coefficients are not statistically independent mainly due to the geometric properties of edge singularities. Thus, at least for natural
images, there are significant dependencies between the magnitudes of pairs of wavelet coefficients which correspond to adjacent spatial locations, orientation, and scales.
There are various statistical models which have been developed to capture the dependencies between wavelet coefficients. Image compression schemes that use spatial and scale-to-scale dependencies have proved to be extremely effective where a large magnitude wavelet
coefficient may indicate that a neighbor wavelet coefficient at the same subband or at the
next finer scale also has a large magnitude. The conditional distribution of a coefficient, given
its coarser scale ancestors, was captured by the embedded zero-tree wavelet (EZW) coder [69]
to encode entire trees of insignificant coefficients with a single symbol. A predictive method
was used in [66] to provide high quality zero-tree coding results. Several studies [60], [59],
[63] predict blocks of fine coefficients from blocks of coarse coefficients by using vectorized
look-up tables. Adaptive entropy coding was used in [67] to capture conditional statistics of
coefficients that are based on the most significant bits of each of the eight spatial neighbors
and the coefficients at the coarser scale. A switch between multiple probability methods that
depend on the values of neighboring coefficients was done in [25]. The EZW coder was extended in [74] to use local coefficient contexts. The encoding of classification maps of regions
of coefficients based upon the classes of the left and parent regions was described in [46]. The
hidden Markov tree (HMT) model [29] also captures the dependencies between a parent and
its children. In this work, the statistical correlation of an unsampled wavelet coefficient with
neighbors that were already sampled is at the heart of our adaptive sensing algorithms.
We highlight some important wavelet coefficient dependencies as illustrated in Figure 6.
X is a wavelet coefficient, P is its parent at the coarser scale, and NX is a predefined set of
adjacent coefficients of X at the same subband.
Figure 6. Set of conditioning candidates for a given coefficient X (panels: coarser scale, level j + 1; finer scale, level j).
Recall that the mutual information between two random variables X and Y with joint
density p(x, y) and marginal densities p(x) and p(y) is defined in [68] to be
\[ I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy. \]
The mutual information I(X; Y ) can be interpreted as the amount of information Y conveys
about X and vice versa. A higher value of I(X; Y ) implies a stronger dependency between X
and Y , and thus, it is easier to estimate X from Y . In practice, the random variables are discrete, and in our case the PDFs of p(x, y), p(x), and p(y) are unknown and must be estimated
from the empirical data. Studies of this problem resort to nonparametric histogram estimation of the PDFs [8], [58], [70]. The histograms are usually constructed as follows: the ranges
of X and Y are partitioned into NX and NY intervals, respectively. The histogram of (X, Y )
obtained from the empirical data is denoted as {pX,Y (i, j)| for 1 ≤ i ≤ NX , 1 ≤ j ≤ NY } and
yields an approximation to the PDF p(x, y). Similarly, the marginal distributions p(x), p(y)
are estimated.
Consider a pair of wavelet coefficients X and Y. Based on the estimated discrete probability distribution, we use the following estimator for the mutual information [55]:
\[ \hat I(X; Y) = \sum_{i,j} p_{X,Y}(i, j) \log \frac{p_{X,Y}(i, j)}{p_X(i)\,p_Y(j)} = \sum_{i,j} \frac{k_{ij}}{N} \log \frac{k_{ij} N}{k_i k_j}, \]
such that $k_{ij}/N$ replaces $p_{ij}$, where $k_{ij}$ is the number of coefficient pairs in the joint histogram
cell $(i, j)$, $N$ is here the total number of coefficient pairs, and $k_i = \sum_j k_{ij}$ and $k_j = \sum_i k_{ij}$ are the marginal distribution histogram estimates.
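The histogram estimator can be sketched in a few lines. The version below assumes NumPy, equal-width bins, and base-2 logarithms (the base is not specified in the text); the function name `mutual_information` and the default bin count are illustrative choices.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in estimate of I(X; Y) in bits from a joint histogram of paired samples."""
    k, _, _ = np.histogram2d(x, y, bins=bins)     # k[i, j]: pairs falling in cell (i, j)
    n = k.sum()                                   # total number of pairs
    ki = k.sum(axis=1, keepdims=True)             # marginal counts of X, shape (bins, 1)
    kj = k.sum(axis=0, keepdims=True)             # marginal counts of Y, shape (1, bins)
    nz = k > 0                                    # empty cells contribute nothing
    ratio = k[nz] * n / (ki @ kj)[nz]             # k_ij * N / (k_i * k_j)
    return float(np.sum(k[nz] / n * np.log2(ratio)))
```

In an experiment such as the one reported in Tables 1–3, `x` would hold the values of a coefficient and `y` those of one chosen neighbor or of the parent, collected over a subband.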
Tables 1–3 show averaged mutual information results for a wavelet coefficient (we use
the (9,7) wavelet basis), modeled to be dependent on a single other coefficient. The index
refers to a relative location of a neighbor with respect to the location of X. For example,
NX_{−1,0} and NX_{0,−1} correspond to the immediate upward and left-hand neighbors of X, respectively. We
see that similar results are obtained on a variety of different test images. We measured and
compared the mutual information with a single neighbor (scale, location, subband). Roughly
speaking, across the three subbands, the parent and the horizontal and vertical neighbors
exhibit relatively higher mutual information than the diagonal neighbors.
We conclude this section by noting that recent studies have adopted the mutual information estimation for an efficient characterization of statistical dependencies in an overcomplete
wavelet representation [45] and multiscale directional transforms such as contourlets [62] and
curvelets [7].
Table 1. Mutual information Î for scale = 3, horizontal subband.
Table 2. Mutual information Î for scale = 3, vertical subband.
Table 3. Mutual information Î for scale = 3, diagonal subband.
Each table lists Î(X; NX_{a,b}) for the eight neighbors NX_{−1,−1}, NX_{−1,0}, NX_{−1,1}, NX_{0,−1}, NX_{0,1}, NX_{1,−1}, NX_{1,0}, NX_{1,1}.
4.2. The Stat-ACS Algorithm.
The linear predictor. To impose the statistics of wavelet coefficients exhibited by the
mutual information into the Stat-ACS algorithm, we utilize a predictor for the absolute value
of a coefficient that has not yet been sampled, using the absolute value of coefficients that
have already been sampled (or have been determined to be insignificant and not sampled). In
our experiments we used the linear predictor
\[ \mathrm{Predicted}(|X|) = \begin{cases} 0.5\,|P| + 0.28\,|NX_{0,-1}| + 0.22\,|NX_{-1,0}|, & X \text{ in horizontal or diagonal subband}, \\ 0.5\,|P| + 0.28\,|NX_{-1,0}| + 0.22\,|NX_{0,-1}|, & X \text{ in vertical subband}, \end{cases} \tag{4.4} \]
where NX_{0,−1} and NX_{−1,0} are the left-hand and upward neighbors, respectively, and P is the parent of the
unsampled coefficient X. The fixed set of weights in (4.4) was “learned” from a collection of
standard test images via a simple least square error minimization.
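The predictor (4.4) translates directly into code. The sketch below uses the weights quoted above; the function name `predict_abs` and the string labels for the subbands are assumptions made for illustration.

```python
def predict_abs(parent, left, up, subband):
    """Linear prediction of |X| from its parent and its left and upper neighbors, as in (4.4)."""
    if subband in ("horizontal", "diagonal"):
        w_left, w_up = 0.28, 0.22      # the left neighbor carries the larger weight
    elif subband == "vertical":
        w_left, w_up = 0.22, 0.28      # the upper neighbor carries the larger weight
    else:
        raise ValueError("unknown subband: " + repr(subband))
    return 0.5 * abs(parent) + w_left * abs(left) + w_up * abs(up)
```

A coefficient would then be declared potentially significant when the returned value reaches the threshold, with neighbors that were never sampled entering as zeros.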
We now review in detail the way in which the joint statistical model is used in the framework of our Stat-ACS algorithm. We begin with preliminary notation: given an arbitrary
wavelet coefficient corresponding to the index (e, j + 1, (k1 , k2 )) and label “P” (as for parent), its four children at the next finer resolution correspond to the indices (e, j, 2k1 , 2k2 ),
(e, j, 2k1 + 1, 2k2 ), (e, j, 2k1 , 2k2 + 1), (e, j, 2k1 + 1, 2k2 + 1), where we label the indices
(e, j, 2k1 , 2k2 + 1) and (e, j, 2k1 + 1, 2k2 ) as “L” (as for left) and “U” (as for up), respectively. In this algorithm, the indices of the sampled coefficients found to be significant are
placed in a processing queue, together with their labels parent, left, up, or right. The labels
given to each coefficient refer to the relation between the input and output indices of the linear
predictor: a coefficient with label “P” indicates that the linear predictor attempts to estimate
the significance of its four children at the next higher resolution; a label of “L” indicates that
the linear predictor attempts to estimate the right-hand neighbor of that coefficient; and a
label of “U” indicates that the linear predictor attempts to estimate the down spatial neighbor of that coefficient. Note that a coefficient can have more than one label. The index and
value of the coefficient at the head of the processing queue are passed as input to the linear
predictor given in (4.4), which attempts to assess whether the related unsampled coefficients
are potentially significant.
In a manner similar to that of existing wavelet compression and denoising algorithms, the
significance of a (predicted) coefficient is determined by comparing its (predicted) absolute
value with a predetermined threshold. Thus, the threshold parameter is critical, since it controls
the rate/distortion tradeoff of the algorithm: a smaller threshold will lead to more samples, revealing more significant coefficients. Once the prediction asserts that an unsampled coefficient
is significant with high probability, it is sampled by the sensing apparatus.
The algorithm scans each subband such that the most informative spatial neighbor could
be used in the processing queue. Thus, in the horizontal subband, where the left neighbor
conveys the highest information, the rows are scanned first (left-to-right order), and in the
vertical and diagonal subbands, the columns are scanned first (top-to-bottom order) since the up
coefficient is the most informative.
The hash table. Incorporating a contextual model into a Stat-ACS algorithm, which
operates simultaneously on the information of several coefficients (parent, left, up neighbors),
requires the use of a “sparse” data structure which supports fast “find” and “add” operations.
This is essential in ensuring that our ACS acquisition algorithm is of complexity O(n), where
n is the number of measurements, with n ≪ N (N is the number of pixels in the image). To
this end we use a hash table [28] that on average performs find and add operations in O (1)
time. During the adaptive sampling process, we store in the hash table the coefficients that
were sampled, and we later retrieve their values for the purpose of predicting the significance
of unsampled coefficients. We summarize the algorithm in the pseudocode below.
Stat-ACS acquisition algorithm.
1. Acquire the values of all low-resolution coefficients up to a certain low-resolution J;
each computation is completed using a fixed number of DMD array measurements as
in (2.6). The initial resolution J can be selected as log2 N /2 − const. In any case, J
should be bigger if the image is bigger. For images of size 512 × 512, we used J = 4.
Note that the total number of coefficients at resolutions ≥ J is $2^{2(1-J)} N$, which is a
small fraction of N .
2. Initialize a processing queue, P (q), containing the indices of each of the wavelet coefficients at resolution J, with absolute value bigger than a given threshold Tthresh , and
label as a parent, “P.”
3. Until the queue P (q) is exhausted, pop out the coefficient at the head of the queue
and process as follows:
(a) If labeled “P” with index (e, j, k) and children coefficients have not been sampled
(i) Acquire them now.
(ii) If the child with index (e, j − 1, (2k1 , 2k2 + 1)) is significant, then add its index
to the head of P (q) labeled as “L.”
(b) If in the horizontal subband and labeled “L” as left of a coefficient X, find and
read from the hash table the values of “P,” the parent of X, and of “U,” the
upper neighbor of X. The values are set to zero if coefficients are not in the
hash table. Fix the input vector $\vec{Q}$ = (value(“P”), value(“L”), value(“U”)) for
the linear predictor (4.4). If |predict (|X|)| ≥ Tthresh , then do the following:
(i) Sample the group of 4 coefficients to which X belongs, i.e., having the same
parent (e, j, k) .
(ii) If a coefficient is significant and at resolution j > 1, add its index to the end
of the queue, labeled as “P.”
(iii) If the coefficient with index (e, j − 1, 2k1 , 2k2 + 1) is significant, then add to
the head of the queue labeled as “L.”
(This step is similar for coefficients from the other 2 subbands.)
(c) Store the values of any sampled coefficient in the hash table.
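The acquisition loop above can be sketched for a single horizontal subband as follows. This is a simplified simulation under stated assumptions, not the authors' implementation: the coefficients are precomputed and held in a hypothetical dictionary `coeffs` mapping each level (J coarsest, 1 finest) to a 2D array, sensing is simulated by reading from those arrays, a Python `dict` plays the role of the hash table, and only the “P” and “L” labels are handled.

```python
from collections import deque
import numpy as np

def stat_acs_subband(coeffs, J, thresh):
    """Simplified Stat-ACS loop for one horizontal subband; returns the hash table."""
    table = {}                                     # hash table: (j, r, c) -> sampled value
    queue = deque()

    def value(j, r, c):
        return table.get((j, r, c), 0.0)           # unsampled coefficients count as zero

    def sample_group(j, pr, pc):
        """Sample the four level-j children of the level-(j+1) parent (pr, pc)."""
        for a in (0, 1):
            for b in (0, 1):
                r, c = 2 * pr + a, 2 * pc + b
                if (j, r, c) in table:
                    continue
                v = float(coeffs[j][r, c])          # simulated measurement
                table[(j, r, c)] = v
                if abs(v) >= thresh:
                    if j > 1:
                        queue.append(("P", j, r, c))        # process children later
                    if (a, b) == (0, 1):
                        queue.appendleft(("L", j, r, c))    # predict right neighbor next

    # Steps 1 and 2: sample the coarsest level and enqueue its significant coefficients.
    for (r, c), v in np.ndenumerate(coeffs[J]):
        table[(J, r, c)] = float(v)
        if abs(v) >= thresh and J > 1:
            queue.append(("P", J, r, c))

    # Step 3: process the queue until it is exhausted.
    while queue:
        label, j, r, c = queue.popleft()
        if label == "P":
            sample_group(j - 1, r, c)
            continue
        xr, xc = r, c + 1                          # X: right-hand neighbor of the "L" coefficient
        if xc >= coeffs[j].shape[1] or (j, xr, xc) in table:
            continue
        pred = (0.5 * abs(value(j + 1, xr // 2, xc // 2))   # parent of X
                + 0.28 * abs(value(j, r, c))                 # left neighbor of X
                + 0.22 * abs(value(j, xr - 1, xc)))          # upper neighbor of X
        if pred >= thresh:
            sample_group(j, xr // 2, xc // 2)
    return table
```

Because every group of four siblings is sampled at most once and queue entries are created only when a new coefficient is sampled, the work in this sketch is proportional to the number of measurements, in line with the O(n) claim.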
4.3. Stat-ACS: Experimental results. A good method for evaluating the effectiveness
of our approach is to benchmark it using the optimal n-term wavelet approximation. It
is well known [34] that for a given image with N pixels, the optimal orthonormal wavelet
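approximation using only n coefficients is obtained using the n largest coefficients:
\[ \left|\langle f, \psi^{e_1}_{j_1,k_1} \rangle\right| \ge \left|\langle f, \psi^{e_2}_{j_2,k_2} \rangle\right| \ge \left|\langle f, \psi^{e_3}_{j_3,k_3} \rangle\right| \ge \cdots, \]
\[ \left\| f - \sum_{i=1}^{n} \langle f, \psi^{e_i}_{j_i,k_i} \rangle\, \psi^{e_i}_{j_i,k_i} \right\|_{L_2(\mathbb{R}^2)} = \min_{\#\Lambda = n} \left\| f - \sum_{(e,j,k) \in \Lambda} \langle f, \psi^{e}_{j,k} \rangle\, \psi^{e}_{j,k} \right\|_{L_2(\mathbb{R}^2)}. \]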
For biorthogonal wavelets this “greedy” approach gives a near-best result, i.e., within a constant factor of the optimal n-term approximation. Obviously, to find the near-best n-term
approximation one needs to compute all the wavelet coefficients and then select from them
the n coefficients with largest absolute value. Any other threshold method would also require
computing each and every coefficient and testing whether its absolute value is above a certain
threshold. Thus, any threshold method requires order N computations, whereas our ACS
algorithm is output sensitive and requires only order n computations.
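For comparison, the benchmark n-term selection can be sketched as follows, assuming NumPy and a full array of precomputed coefficients; the names `keep_n_largest` and `n_term_error` are illustrative. Note that the function must look at every one of the N coefficients, which is the cost the adaptive scheme is designed to avoid.

```python
import numpy as np

def keep_n_largest(coeffs, n):
    """Zero out all but the n coefficients of largest absolute value."""
    coeffs = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(coeffs)
    if n <= 0:
        return out
    mags = np.abs(coeffs).ravel()
    if n >= mags.size:
        return coeffs.copy()
    # Indices of the n largest magnitudes (order among them is irrelevant).
    idx = np.argpartition(mags, mags.size - n)[mags.size - n:]
    out.flat[idx] = coeffs.flat[idx]
    return out

def n_term_error(coeffs, n):
    """Euclidean norm of the discarded coefficients (the L2 error for an orthonormal basis)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(np.linalg.norm(coeffs - keep_n_largest(coeffs, n)))
```

For a biorthogonal basis such as the (9,7) wavelet, the discarded-coefficient norm is only equivalent to the image-domain error up to constants, so the second function should be read as a proxy in that case.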
To simulate our algorithm, we first precompute the entire wavelet transform of a given
image. However, we strictly follow the recipe of our ACS algorithm and extract a wavelet
coefficient from the precomputed coefficient matrix only if its index was added to the adaptive
sampling queue.
In Figure 7(a) we see a “benchmark” near-best n = 7000 term approximation of the image
Lena using a biorthogonal (9,7) wavelet, which achieves a peak signal-to-noise ratio (PSNR)
of 31.6. In Figure 7(b) we see our Stat-ACS with n = 14000 and PSNR = 30.26. Here, J = 4
and the thresholds Tthresh used were 0.28 for j = 1, 0.24 for j = 2, and 0.2 for j = 3. We see
that we have sampled twice as many coefficients as in the sparse representation to which we compare.
Typically, in an application where the output is the compressed image, the 14000 sampled
wavelet coefficients (Figure 7(b)) should be pruned, with insignificant coefficients removed
with a minimal impact on PSNR and the remaining coefficients compressed using a tree-based
wavelet compression algorithm [66], [69]. Indeed, in Figures 8(a) and 8(b) we see a 5000-term and an 8000-term approximation, with PSNR = 29.35 and PSNR = 30.1, respectively,
extracted from the Stat-ACS approximation obtained in Figure 7(b).
(a) Benchmark, n = 262144, 7000-term, PSNR=31.6. (b) Stat-ACS, n = 14000, 14000-term, PSNR=30.26.
Figure 7. Comparison of the Stat-ACS algorithm with optimal benchmark on Lena.
(a) Stat-ACS, n = 14000, 5000-term, PSNR=29.35.
(b) Stat-ACS, n = 14000, 8000-term, PSNR=30.1.
Figure 8. Adaptive sampling with Stat-ACS with postprocessing pruning of “insignificant” coefficients.
(a) Benchmark, n = 262144, 7000-term, PSNR=31.6. (b) Stat-ACS, n = 14000, 14000-term, PSNR=30.26.
Figure 9. Zoom-in from Figure 7 on Lena’s shoulder.
Figure 9 shows zooming in on specific edge regions from Figure 7. As Figure 9(b) illustrates, the edges were recovered quite well and are comparable to the near-best 7000-term
approximation. In Figure 10(a) we see a benchmark 7000-term biorthogonal (9,7) wavelet
approximation of the Peppers image, with PSNR = 30.8. In Figure 10(b) we see our Stat-ACS with n = 14000, using an 8000-term approximation extracted from the 14000 adaptively
sampled wavelet coefficients with PSNR = 29.17.
4.4. Stat-ACS: Rate/distortion comparison to nonadaptive CS algorithms. In this section we compare the performance of our Stat-ACS algorithm with that of some of the prominent nonadaptive CS measurement schemes such as the Scrambled Fourier Ensemble (SFE)
[11] and the Scrambled Block Hadamard Ensemble (SBHE) [40]. The performances of the
SFE and the SBHE have been evaluated using the following reconstruction algorithms: min-TV, l1 optimization solver, and iterative thresholding. Table 4 tabulates the PSNR values
for three 256 × 256 natural images, Lena, Cameraman, and Peppers. Also, results from [13]
are included, where random Fourier sampling matrices were applied directly in the wavelet
domain. The best result for each image and each number of measurements is given in boldface.
Model-based CS methods [2], [44], [48] use wavelet representation modeling in an implicit
way. The CS acquisition process is still pseudorandom, but the recovery algorithms try to
integrate the assumptions on wavelet structure, with the goal of faster and more accurate
recovery in the presence of noise. Next, we compare the performances of our Stat-ACS method
and these methods. In Table 5 we see recovery results for the 128 × 128 test image Peppers
(used in [2]) from 5000 samples. We run the Stat-ACS algorithm twice, with the Daubechies
ortho-basis D8 that was used in [2] and with our preferred basis the (9,7). The results are
reported in root mean square error (RMSE) to match the reported results in the references.
(a) Benchmark, n = 262144, 7000-term, PSNR=30.8. (b) Stat-ACS, n = 14000, 8000-term, PSNR=29.17.
Figure 10. Comparison of the Stat-ACS algorithm with optimal benchmark on Peppers.
Table 4. Comparison of Stat-ACS with nonadaptive CS algorithms (PSNR for Lena, Cameraman, and Peppers, each 256 × 256; reconstruction methods include l1 optimization and iterative thresholding).
5. Tex-ACS: An ACS algorithm based on a cartoon/texture model. In this section we
elaborate on an ACS algorithm that relies on a model of images as a mix of cartoon and local
texture patches. While the edges are somewhat structured, capturing the textural parts is
more difficult and elusive, though easily visually perceived. In general, texture is regarded as
a function of the spatial variation in pixel intensities [21].
In the wavelet representation, the significant information of texture typically manifests
at the higher frequency scales, i.e., finer resolutions with a smaller index j. The Stat-ACS
algorithm of the previous section relies on a cartoon model of the image and is not equipped to
predict significant high-frequency wavelet coefficients which correspond to texture information
from the coefficients at lower resolutions. Thus, our aim is to identify the local texture areas
and to sample wavelet coefficients of corresponding frequencies that are supported in these
areas.
Table 5. Comparison of Stat-ACS with nonadaptive model-based methods for 128 × 128 Peppers. Columns: CoSaMP [57]; model-based [2] (D8 wavelet basis); Stat-ACS (D8 wavelet basis); Stat-ACS ((9,7) wavelet basis). RMSE for 5000 samples: 22.8 for CoSaMP [57].
We address this problem by sampling from a dictionary composed of wavelets and local
Fourier-window components over tiles of the image. The role of the local Fourier sampling is
to identify the texture information. Once we identify textural areas, we use the correlation
between Fourier frequency components and wavelet coefficients with coinciding frequencies to
subsequently sample correctly the wavelet coefficients whose support overlaps these areas. In
the experimental results, the suggested approach yields significant improvement in extracting
the texture parts.
5.1. Texture and wavelets in the frequency domain. We very roughly sketch the properties of wavelets from a Fourier side perspective (for more detailed analysis, see, e.g., [31],
[52]). In particular we focus on the (9,7) biorthogonal wavelet used in our experiments. Let
ϕ, ψ and ϕ̃, ψ̃ be two dual pairs of scaling functions and wavelets that generate biorthogonal
wavelet bases of L2 (R). Recall that the bivariate wavelet bases we use are created by a tensor
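product, yielding three types of dual wavelets:
\[ \tilde\psi^1(x) = \tilde\varphi(x_1)\tilde\psi(x_2), \quad \tilde\psi^2(x) = \tilde\psi(x_1)\tilde\varphi(x_2), \quad \tilde\psi^3(x) = \tilde\psi(x_1)\tilde\psi(x_2), \tag{5.1} \]
\[ \tilde\psi^e_{j,k}(x) = 2^{-j}\,\tilde\psi^e\left(2^{-j}x - k\right), \quad e = 1, 2, 3,\ j \in \mathbb{Z},\ k \in \mathbb{Z}^2. \]
In the Fourier domain, the separable wavelets from (5.1) become
\[ \hat{\tilde\psi}^1(\omega_1, \omega_2) = \hat{\tilde\varphi}(\omega_1)\hat{\tilde\psi}(\omega_2), \quad \hat{\tilde\psi}^2(\omega_1, \omega_2) = \hat{\tilde\psi}(\omega_1)\hat{\tilde\varphi}(\omega_2), \quad \hat{\tilde\psi}^3(\omega_1, \omega_2) = \hat{\tilde\psi}(\omega_1)\hat{\tilde\psi}(\omega_2). \tag{5.3} \]
Roughly speaking, $|\hat{\tilde\varphi}|$ is concentrated around the origin and $|\hat{\tilde\psi}|$
away from the origin. Therefore, by (5.3), $|\hat{\tilde\psi}^1(\omega_1, \omega_2)|$ is large at low horizontal frequencies $\omega_1$ and high vertical frequencies
$\omega_2$, $|\hat{\tilde\psi}^2(\omega_1, \omega_2)|$ is large at high horizontal frequencies $\omega_1$ and low vertical frequencies $\omega_2$, and $|\hat{\tilde\psi}^3(\omega_1, \omega_2)|$
is large at high horizontal and vertical frequencies. According to the dominant frequencies
computed for the wavelet in scale j = 0, one can obtain the corresponding dominant frequencies at scale j by the relation (up to a phase factor that depends on k)
\[ \hat\psi^e_{j,k}(\omega) = 2^j\,\hat\psi^e\left(2^j\omega\right). \tag{5.4} \]
Recall the Plancherel formula
\[ \langle f, \tilde\psi^e_{j,k} \rangle = \frac{1}{(2\pi)^2}\, \langle \hat f, \hat{\tilde\psi}^e_{j,k} \rangle. \]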
We wish to examine the amount of oscillatory activity exhibited in the image f over an
arbitrary local area defined by a cube Q:
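\[ f_Q(x_1, x_2) = \begin{cases} f(x_1, x_2) & \text{if } (x_1, x_2) \in Q, \\ 0 & \text{otherwise}. \end{cases} \]
Intensive oscillatory activities may indicate that the support of Q contains texture and other
important features at fine scales. If the support of the wavelet $\tilde\psi^e_{j,k}$ satisfies
$\operatorname{support}(\tilde\psi^e_{j,k}) \subseteq Q$, then
\[ \langle f, \tilde\psi^e_{j,k} \rangle = \langle f_Q, \tilde\psi^e_{j,k} \rangle = \frac{1}{(2\pi)^2}\, \langle \hat f_Q, \hat{\tilde\psi}^e_{j,k} \rangle. \]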
Although the (9,7) wavelet is not compactly supported in the frequency domain, it is “essentially” compactly supported in the frequency domain, and by (5.4) its dilates have “peaks”
at the corresponding frequencies. Thus, if, locally, f contains corresponding frequencies, then
we should expect some of the coefficients of wavelets that have essential support around that
frequency to have large absolute value.
In practice, we estimate the “energy” of the significant local frequencies by sampling
discrete cosine transform (DCT) coefficients. The DCT decomposes a local block of an image
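over $[x_1, x_1 + \tilde N] \times [x_2, x_2 + \tilde N]$ in the frequency domain
\[ \hat f(k_1, k_2) = \sum_{n_1=0}^{\tilde N - 1} \sum_{n_2=0}^{\tilde N - 1} f(x_1 + n_1, x_2 + n_2) \cos\left[\frac{\pi}{\tilde N}\left(n_1 + \frac{1}{2}\right)k_1\right] \cos\left[\frac{\pi}{\tilde N}\left(n_2 + \frac{1}{2}\right)k_2\right]. \tag{5.8} \]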
Since the DCT is a linear, real-valued transform, we can compute each DCT coefficient as a
combination of two positive functionals which correspond to two DMD measurements.
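A single local DCT coefficient, and its split into two nonnegative measurement patterns, can be sketched as follows. The code assumes NumPy and an unnormalized DCT-II form of (5.8); the names `dct_coefficient` and `dct_masks` are illustrative, and the masks are only a model of what a DMD array would realize.

```python
import numpy as np

def dct_coefficient(block, k1, k2):
    """Unnormalized DCT-II coefficient (k1, k2) of a square block, in the form of (5.8)."""
    n = block.shape[0]
    idx = np.arange(n) + 0.5
    c1 = np.cos(np.pi * idx * k1 / n)              # cosine along the first index n1
    c2 = np.cos(np.pi * idx * k2 / n)              # cosine along the second index n2
    return float(c1 @ block @ c2)

def dct_masks(n, k1, k2):
    """Positive and negative parts of the DCT basis function, both nonnegative."""
    idx = np.arange(n) + 0.5
    basis = np.outer(np.cos(np.pi * idx * k1 / n), np.cos(np.pi * idx * k2 / n))
    return np.maximum(basis, 0.0), np.maximum(-basis, 0.0)
```

The coefficient equals the measurement taken with the first mask minus the measurement taken with the second, which is the sense in which two positive functionals suffice.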
5.2. The Tex-ACS algorithm. This algorithm is a second step that is applied after the
Stat-ACS. For simplicity, we assume that the input image to be processed is of dyadic dimensions, $N_1 \times N_1 = N$. We subdivide the image into blocks of 16 × 16 pixels each, $\{\Omega_{i,l}\}$,
$1 \le i, l \le N_1/16$. Over each patch $\Omega_{i,l}$, we compute using (5.8), with $\tilde N = 16$, the three
DCT coefficients $\hat f_{\Omega_{i,l}}(1, 8)$, $\hat f_{\Omega_{i,l}}(8, 1)$, $\hat f_{\Omega_{i,l}}(8, 8)$, which correspond to the dominant frequencies
of the horizontal, vertical, and diagonal wavelet coefficients, respectively, at scale j = 2. This
implies that the second step is initialized by taking about $0.012 \times N_1^2$ local frequency samples (three samples per block of 256 pixels).
If any of the three DCT coefficients $\hat f_{\Omega_{i,l}}(k_1, k_2)$ satisfies $|\hat f_{\Omega_{i,l}}(k_1, k_2)| \ge t_j$ for a large threshold
$t_j$, then we sample the coefficients of wavelets at the corresponding scale and subband whose
support significantly overlaps the block $\Omega_{i,l}$. Also, if the frequency component $\hat f_{\Omega_{i,l}}(k_1, k_2)$ is
significant, it is likely that significant texture information exists at adjacent scales. Thus
in such a case we sample the frequency components $\hat f_{\Omega_{i,l}}(k_1/2, k_2/2)$ and $\hat f_{\Omega_{i,l}}(2k_1, 2k_2)$ to
test oscillatory activities in the adjacent frequency domains. If $|\hat f_{\Omega_{i,l}}(k_1/2, k_2/2)| \ge t_j$ or
$|\hat f_{\Omega_{i,l}}(2k_1, 2k_2)| \ge t_j$, for a scale dependent threshold $t_j$, then we subsequently also sample the
wavelet coefficients of the corresponding scale, support, and subband.
The Tex-ACS algorithm is implemented through two separate stages. The first stage,
based on the Stat-ACS algorithm, acquires wavelet coefficients which correspond to the cartoon areas. The second stage identifies texture areas using DCT sampling and subsequently
acquires fine-scale wavelet coefficients at locations that with high probability contain textures.
The Tex-ACS algorithm can be summarized as follows.
Tex-ACS algorithm. Adaptive sampling for a cartoon-texture image based
Sampling of the cartoon parts. Acquire significant wavelet coefficients according to
the Stat-ACS algorithm from section 4.2.
Sampling of the texture parts. Process each block $\Omega_{i,l}$, $1 \le i \le N_1/16$, $1 \le l \le N_1/16$,
as follows:
1. Compute the DCT frequency components $\hat{f}_{\Omega_{i,l}}(k_1, k_2)$, $(k_1, k_2) \in \{(1, 8), (8, 1), (8, 8)\}$,
to detect oscillatory activities at the three subbands of scale $j = 2$.
2. If $|\hat{f}_{\Omega_{i,l}}(k_1, k_2)| \ge t_j$, then
(a) Sample the wavelet coefficients at the corresponding compact support and subband.
(b) Compute the frequency components $\hat{f}_{\Omega_{i,l}}(2k_1, 2k_2)$ and $\hat{f}_{\Omega_{i,l}}(k_1/2, k_2/2)$ in order to detect oscillatory activities at scales $j = 1$ and $j = 3$, respectively. If
$|\hat{f}_{\Omega_{i,l}}(2k_1, 2k_2)| \ge t_j$ or $|\hat{f}_{\Omega_{i,l}}(k_1/2, k_2/2)| \ge t_j$, then sample the wavelet coefficients in the corresponding scale, compact support, and subband.
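The detection part of the texture stage (step 1 and the test in step 2) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code. The DCT coefficient is taken to be an unnormalized DCT-II inner product, with k1 indexing rows and k2 indexing columns, because (5.8) is not reproduced here. A single threshold is used. The function names are hypothetical. The follow-up test at adjacent scales in step 2(b) and the wavelet sampling itself are omitted.

```python
import numpy as np

# Probe frequencies and the subbands they are associated with in the text.
PRIMARY = {"horizontal": (1, 8), "vertical": (8, 1), "diagonal": (8, 8)}

def dct_coefficient(block, k1, k2):
    """Single unnormalized 2-D DCT-II coefficient of a square block (assumed form)."""
    n = block.shape[0]
    x = np.arange(n)
    c1 = np.cos(np.pi * (2 * x + 1) * k1 / (2 * n))
    c2 = np.cos(np.pi * (2 * x + 1) * k2 / (2 * n))
    return float(c1 @ block @ c2)

def detect_texture_blocks(image, threshold, block_size=16):
    """Return (i, l, subband) for every block whose probe coefficient is large."""
    hits = []
    n_rows, n_cols = image.shape
    for i in range(n_rows // block_size):
        for l in range(n_cols // block_size):
            block = image[i * block_size:(i + 1) * block_size,
                          l * block_size:(l + 1) * block_size]
            for subband, (k1, k2) in PRIMARY.items():
                if abs(dct_coefficient(block, k1, k2)) >= threshold:
                    # The wavelet coefficients of this subband whose support
                    # overlaps the block would be sampled at this point.
                    hits.append((i, l, subband))
    return hits
```

Each entry of the returned list marks one block and one subband in which fine-scale wavelet coefficients would subsequently be acquired.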
5.3. Tex-ACS: Experimental results. As in the case of the Stat-ACS algorithm, we use
the near-best n-term wavelet approximation as a benchmark. In addition, we compare the
efficiency of the Tex-ACS algorithm with our Stat-ACS algorithm from section 4. We test
the Tex-ACS algorithm on the Barbara test image, which is an excellent representative of an
image that exhibits a mix of cartoon and texture patches. It is evident (illustrated in Figure
12) that the Stat-ACS algorithm based on the cartoon model does not yield good results with
regard to the recovery of the texture parts in the image. Moreover, increasing the number
of measurements does not yield significant improvement visually or in terms of PSNR. The
outcome of the Tex-ACS algorithm (illustrated in Figure 13) shows a significant improvement
in terms of PSNR and succeeds in reconstructing texture patches in the image.
In Figure 11(a) we see the Barbara input image. Figure 11(b) depicts a benchmark near-best 12419-term biorthogonal (9,7) wavelet approximation, extracted from the full wavelet representation with PSNR = 27.76. In Figure 12 we show the results of the Stat-ACS algorithm
with 12500 terms and PSNR = 24.55, obtained after pruning the insignificant coefficients
from 26000 samples. In Figure 13 we see the results using our Tex-ACS algorithm with 12500
terms and PSNR = 25.11, obtained after pruning from 26000 measurements. Note that when
we state 26000 samples, this total is composed of both local DCT and wavelet coefficients. In
Figures 14 and 15 we zoom in on some textural areas of the result in Figure 13.
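The two operations behind the reported numbers, pruning the acquired coefficients to the most significant terms and measuring PSNR, can be sketched as follows. This is an illustrative sketch, not the code used for the experiments. The helper names are hypothetical, and the peak value of 255 assumes 8-bit grayscale images, which the text does not state.

```python
import numpy as np

def keep_largest(coeffs, n_terms):
    """Zero all but the n_terms largest-magnitude coefficients."""
    flat = coeffs.ravel()
    if n_terms >= flat.size:
        return coeffs.copy()
    pruned = np.zeros_like(flat)
    if n_terms > 0:
        # Indices of the n_terms entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), flat.size - n_terms)[-n_terms:]
        pruned[idx] = flat[idx]
    return pruned.reshape(coeffs.shape)

def psnr(reference, approximation, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 assumes 8-bit images)."""
    diff = reference.astype(float) - approximation.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

In the setting above, `keep_largest` would be applied to the acquired wavelet coefficients with `n_terms = 12500`, and `psnr` would compare the image reconstructed from the pruned coefficients with the original.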
We note that the Tex-ACS algorithm works well on Barbara since the texture parts are
composed of stripes and checkers that are relatively well modeled by local DCT coefficients.
Indeed, it is well known that local Fourier methods usually capture texture in this manner.
Possibly, in other cases, we will need to create a different sampling dictionary composed of
systems in which we believe the signal will be sparse.
Figure 11. Benchmark wavelet approximation of Barbara using 12419 terms. (a) Barbara test image. (b) Benchmark, N = n = 262144, 12419-term, PSNR = 27.76.
Figure 12. n = 26000, Stat-ACS (cartoon model based), approximation using 12500 most significant wavelet
coefficients, PSNR = 24.55.
Figure 13. n = 26000, Tex-ACS, approximation using 12500 most significant wavelet coefficients, PSNR = 25.11.
Figure 14. Results from Figure 13. (a) Zoom in on the chair. (b) Zoom in on the scarf and legs.
Figure 15. Results from Figure 13. (a) Zoom in on the scarf. (b) Zoom in on the tablecloth.
6. Conclusions. In this work we proposed a direct and fast method for adaptive transform
coefficient acquisition, which relies on statistical modeling of images. While our algorithm
uses the same order of measurements as nonadaptive CS methods, the main advantages of
our direct approach are that no complex recovery algorithm is needed and that it allows more
control over the compressed image quality. Our experimental results show, by way of software
simulations, that our adaptive algorithms perform well in terms of image quality and speed.
One of the possible hardware architectures on which such a scheme could be implemented is
the DMD, although more work is required to fully analyze the influence of the DMD accuracy
on the proposed method.
Acknowledgment. We sincerely thank all the referees for their patience with us and their
numerous comments that have significantly improved the quality of the paper.
[1] R. Baraniuk, Compressive sensing, IEEE Signal Processing Mag., 24 (2007), pp. 118–120.
[2] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, Model-based compressive sensing, IEEE Trans.
Inform. Theory, 56 (2010), pp. 1982–2001.
[3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry
property for random matrices, Constr. Approx., 28 (2008), pp. 253–263.
[4] R. Berinde and P. Indik, Sparse Recovery Using Sparse Random Matrices, Technical report, Computer
Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, 2008.
[5] T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Appl. Comput.
Harmon. Anal., 27 (2009), pp. 265–274.
[6] J. Bobin, J.-L. Starck, and R. Ottensamer, Compressed sensing in astronomy, IEEE J. Sel. Topics
Signal Process., 2 (2008), pp. 718–726.
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
[7] L. Boubchir and M. Fadili, Multivariate statistical modeling of images with the curvelet transform,
in Proceedings of the 8th IEEE International Conference on Signal Processing and Its Applications,
2005, pp. 747–750.
[8] R. W. Buccigrossi and E. P. Simoncelli, Image compression via joint statistical characterization in
the wavelet domain, IEEE Trans. Image Process., 8 (1999), pp. 1688–1701.
[9] E. Candès, Compressive sampling, in Proceedings of the International Congress of Mathematics, 2006,
pp. 1433–1452.
[10] E. Candès and D. Donoho, New tight frames of curvelets and optimal representations of objects with
piecewise singularities, Comm. Pure Appl. Math., 57 (2004), pp. 219–266.
[11] E. Candès and J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Problems, 23
(2007), pp. 969–985.
[12] E. Candès and J. Romberg, Robust signal recovery from incomplete observations, in Proceedings of the
IEEE International Conference on Image Processing, 2006, pp. 1281–1284.
[13] E. Candès and J. Romberg, Practical signal recovery from random projections, in Wavelet Applications
in Signal and Image Processing XI, Proceedings of the SPIE 5914, 2004.
[14] E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from
highly incomplete frequency information, IEEE Trans. Inform. Theory, 52 (2006), pp. 489–509.
[15] E. Candès, J. Romberg, and T. Tao, Signal recovery from incomplete and inaccurate measurements,
Comm. Pure Appl. Math., 59 (2005), pp. 1207–1223.
[16] E. Candès and T. Tao, The Dantzig selector: Statistical estimation when p is much larger than n, Ann.
Statist., 35 (2007), pp. 2313–2351.
[17] E. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory, 51 (2005),
pp. 4203–4215.
[18] E. Candès and M. Wakin, An introduction to compressive sampling, IEEE Signal Processing Mag., 25
(2008), pp. 21–30.
[19] W. Carey, D. Chuang, and S. Hemami, Regularity-preserving interpolation, IEEE Trans. Image Process., 8 (1999), pp. 1293–1297.
[20] S. G. Chang, Z. Cvetkovi’c, and M. Vetterli, Resolution enhancement of images using wavelet
transform extrema extrapolation, in Proceedings of the IEEE International Conference on Acoustics,
Speech, and Signal Processing, 1995, pp. 2379–2382.
[21] C. Chen, L. Pau, and P. Wang, Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993.
[22] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J.
Sci. Comput., 20 (1998), pp. 33–61.
[23] Z. Chen and M. A. Karim, Forest representation of wavelet transform and feature detection, Opt. Eng.,
39 (2000), pp. 1194–1202.
[24] H. Choi and R. Baraniuk, Multiscale texture segmentation using wavelet-domain hidden Markov models,
in Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers,
Vol. 2, Pacific Grove, CA, 1998, pp. 1692–1697.
[25] C. Chrysafis and A. Ortega, Efficient context-based lossy wavelet image coding, in Proceedings of the
Conference on Data Compression, Snowbird, UT, IEEE Computer Society, Washington, DC, 1997.
[26] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term, J. Amer. Math. Soc.,
22 (2009), pp. 211–231.
[27] R. Coifman, F. Geshwind, and Y. Meyer, Noiselets, Appl. Comput. Harmon. Anal., 10 (2001),
pp. 27–44.
[28] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed., MIT Press,
Cambridge, MA, 2009.
[29] M. Crouse, R. Nowak, and R. Baraniuk, Wavelet-based statistical signal processing using hidden
Markov models, IEEE Trans. Signal Process., 46 (1998), pp. 886–902.
[30] W. Dai and O. Milenkovic, Subspace pursuit for compressive sensing: Closing the gap between performance and complexity, IEEE Trans. Inform. Theory, 55 (2009), pp. 2230–2249.
[31] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[32] S. Dekel, Adaptive Compressed Image Sensing Based on Wavelet-Trees, GE Healthcare technical report,
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
[33] S. Deutsch, A. Averbuch, and S. Dekel, Adaptive compressed image sensing based on wavelet modeling and direct sampling, in Proceedings of the 8th International Conference on Sampling Theory and
Applications, 2009.
[34] R. DeVore, Nonlinear approximation, in Acta Numerica, Acta Numer. 7, Cambridge University Press,
Cambridge, UK, 1998, pp. 51–150.
[35] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289–1306.
[36] D. Donoho, For most large underdetermined systems of linear equations the minimal 1-norm solution is
also the sparsest solution, Comm. Pure Appl. Math., 59 (2006), pp. 797–829.
[37] D. Donoho, I. Drori, Y. Tsaig, and J. L. Starck, Sparse Solution of Underdetermined Linear
Equations by Stagewise Orthogonal Matching Pursuit, preprint, Department of Statistics, Stanford
University, Stanford, CA, 2006.
[38] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, Single-pixel
imaging via compressive sampling, IEEE Signal Processing Mag., 25 (2008), pp. 83–91.
[39] M. Duarte, M. Wakin, and R. Baraniuk, Fast reconstruction of piecewise smooth signals from random
projections, in Proceedings of the Workshop on Signal Processing with Adaptive Sparse Structured
Representations, Rennes, France, 2005.
[40] L. Gan, T. Do, and T. Tran, Fast compressive imaging using scrambled Hadamard transform ensemble,
in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008), 2008.
[41] O. Gonzalez and J. Maddocks, Global curvature, thickness, and the ideal shapes of knots, Proc. Natl.
Acad. Sci. USA, 96 (1999), pp. 4769–4773.
[42] O. Gonzalez, J. Maddocks, and J. Smutny, Curves, circles, and spheres, in Physical Knots: Knotting,
Linking, and Folding Geometric Objects in R3 , AMS, Providence, RI, 2002, pp. 195–215.
[43] H. W. Guggenheimer, Differential Geometry, Dover, New York, 1977.
[44] L. He and L. Carin, Exploiting structure in wavelet-based Bayesian compressive sensing, IEEE Trans.
Signal Process., 57 (2009), pp. 3488–3497.
[45] K. Huang and S. Aviyente, Mutual information based subband selection for wavelet packet based image
classification, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, Vol. 2, Philadelphia, PA, 2005, pp. 241–244.
[46] R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and
R. H. Bamberger, Comparison of different methods of classification in subband coding of images,
IEEE Trans. Image Process., 6 (1997), pp. 1473–1486.
[47] R. Kawakami, J. Wright, Y. Tai, Y. Matsushita, M. Ben-Ezra, and K. Ikeuchi, High-resolution
hyperspectral imaging via matrix factorization, in Proceedings of the Conference on Computer Vision
and Pattern Recognition (CVPR), 2011, pp. 2329–2336.
[48] C. La and M. Do, Signal reconstruction using sparse tree representation, in Wavelets XI, Proceedings
of SPIE, Vol. 5914, San Diego, CA, 2005.
[49] J. Liu and P. Moulin, Information-theoretic analysis of interscale and intrascale dependencies between
image wavelet coefficients, IEEE Trans. Image Process., 10 (2001), pp. 1647–1658.
[50] M. Lustig, D. Donoho, and J. Pauly, Sparse MRI: The application of compressed sensing for rapid
MR imaging, Magn. Reson. Med., 58 (2007), pp. 1182–1195.
[51] D. Malioutov, S. Sanghavi, and A. Willsky, Sequential compressed sensing, IEEE J. Sel. Topics
Signal Process., 4 (2010), pp. 435–444.
[52] S. Mallat, A Wavelet Tour of Signal Processing. The Sparse Way, 3rd ed., Academic Press, Burlington,
MA, 2009.
[53] S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans.
Pattern Anal. Mach. Intell., 11 (1989), pp. 674–693.
[54] S. Mallat and W. L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inform.
Theory, 38 (1992), pp. 617–642.
[55] R. Moddemeijer, On estimation of entropy and mutual information of continuous distributions, Signal
Process., 16 (1989), pp. 233–248.
[56] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput., 24 (1995), pp. 227–
[57] D. Needell and J. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,
Appl. Comput. Harmon. Anal., 26 (2009), pp. 301–321.
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
[58] L. Paninski, Estimation of entropy and mutual information, Neural Comput., 15 (2003), pp. 1191–1253.
[59] A. Pentland and B. Horowitz, A practical approach to fractal-based image compression, in Digital
Images and Human Vision, A. B. Watson, ed., MIT Press, Cambridge, MA, 1993, pp. 53–59.
[60] A. Pentland, E. Simoncelli, and T. Stephenson, Fractal-Based Image Compression and Interpolation, U.S. Patent 5, 148, 497, Sept. 15, 1992.
[61] Y. Pfeffer and M. Zibulevsky, A Micro-Mirror Array Based System for Compressive Sensing of
Hyperspectral Data, Technical report CS-2010-01, Department of Computer Science, Technion-Israel
Institute of Technology, Haifa, Israel, 2010.
[62] D. D.-Y. Po and M. N. Do, Directional multiscale modeling of images using the contourlet transform,
IEEE Trans. Image Process., 15 (2006), pp. 1610–1620.
[63] R. Rinaldo and G. Calvagno, Image coding by block prediction of multiresolution subimages, IEEE
Trans. Image Process., 4 (1995), pp. 909–920.
[64] J. Romberg, H. Choi, and R. Baraniuk, Bayesian tree-structured image modeling using waveletdomain hidden Markov models, IEEE Trans. Image Process., 10 (2001), pp. 1056–1068.
[65] F. Rooms, A. Pizurica, and W. Philips, Estimating image blur in the wavelet domain, in Proceedings
of the IEEE Benelux Signal Processing Symposium (SPS), 2002.
[66] A. Said and W. Pearlman, A new fast and efficient image codec based on set partitioning in hierarchical
trees, IEEE Trans. Circuits Syst. Video Technol., 6 (1996), pp. 243–250.
[67] E. Schwartz, A. Zandi, and M. Boliek, Implementation of compression with reversible embedded
wavelets, in Proceedings of SPIE 2564, 1995.
[68] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J., 27 (1948), pp. 379–423,
[69] J. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Process.,
41 (1993), pp. 3445–3462.
[70] E. Simoncelli, Modeling the joint statistics of images in the wavelet domain, in Proceedings of SPIE
3813, 1999, pp. 188–195.
[71] K. Timmermann and R. Nowak, Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging, IEEE Trans. Inform. Theory, 45 (1999), pp. 846–862.
[72] J. Tropp and A. Gilbert, Signal recovery from partial information via orthogonal matching pursuit,
IEEE Trans. Inform. Theory, 53 (2007), pp. 4655–4666.
[73] Y. Tsaig and D. Donoho, Extensions of compressed sensing, Signal Process., 86 (2006), pp. 549–571.
[74] X. Wu and J. Chen, Context modeling and entropy coding of wavelet coefficients for image compression,
in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing,
Munich, Germany, 1997, pp. 3097–3100.
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.