Adaptive compressed image sensing using dictionaries

© 2012 Society for Industrial and Applied Mathematics. SIAM J. IMAGING SCIENCES, Vol. 5, No. 1, pp. 57–89

Adaptive Compressed Image Sensing Using Dictionaries∗

Amir Averbuch†, Shai Dekel‡, and Shay Deutsch§

Abstract. In recent years, the theory of compressed sensing has emerged as an alternative to the Shannon sampling theorem, suggesting that compressible signals can be reconstructed from far fewer samples than the Shannon sampling theorem requires. In fact, the theory advocates that nonadaptive, “random” functionals are in some sense optimal for this task. However, in practice, compressed sensing is very difficult to implement for large data sets, particularly because the recovery algorithms require significant computational resources. In this work, we present a new alternative method for simultaneous image acquisition and compression called adaptive compressed sampling. We exploit wavelet tree structures found in natural images to replace the “universal” acquisition of incoherent measurements with a direct and fast method for adaptive wavelet tree acquisition. The main advantages of this direct approach are that no complex recovery algorithm is needed and that it allows more control over the compressed image quality, in particular, the sharpness of edges. Our experimental results show, by way of software simulations, that our adaptive algorithms perform better than existing nonadaptive methods in terms of image quality and speed.

Key words. compressed sensing, adaptive approximation, nonlinear approximation, wavelet trees

AMS subject classifications. 65T60, 65Y10, 68U10

DOI. 10.1137/110820579

1. Introduction. The Shannon sampling theory is at the core of nearly all signal acquisition protocols. The classic Shannon theorem states that the sampling rate must be at least twice the maximum frequency present in the signal, i.e., at least the Nyquist rate (see, e.g., [52]).
However, sampling at the Nyquist rate often results in massive data acquisition that must be compressed before it can be stored or transmitted. In addition, there are important and emerging applications in which high sampling rates are very expensive or limited by physical or physiological constraints. In recent years, a novel approach to signal sampling, known as compressed sensing (CS) [1], [9], [18], [14], [35], has emerged. CS is an alternative to Nyquist-rate acquisition for “sparse” signals. Instead of uniformly sampling the signal at the Nyquist rate, in the CS paradigm one is given a budget of n independent “questions” to ask about the signal $x \in \mathbb{R}^N$, with $n \ll N$. These questions take the form of n linear functionals $\langle \phi_i, \cdot \rangle$, asking for the values $y_i = \langle \phi_i, x \rangle$, $1 \le i \le n$. The vectors $\phi_i \in \mathbb{R}^N$ form the rows of the sensing matrix $\Phi = (\phi_1, \phi_2, \ldots, \phi_n)$, such that $y = \Phi x$, where Φ has dimensions $n \times N$. In the next stage, a numerical optimization method is used to reconstruct the full-length signal from the small amount of collected data.

∗ Received by the editors January 10, 2011; accepted for publication (in revised form) October 19, 2011; published electronically January 24, 2012. http://www.siam.org/journals/siims/5-1/82057.html
† School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel ([email protected]).
‡ GE Healthcare, Herzeliya 46733, Israel ([email protected]).
§ Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel ([email protected]).
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

The CS paradigm relies on two key properties: sparsity and incoherency. A signal $x \in \mathbb{R}^N$ is k-sparse in a transform basis $\Psi = [\psi_1, \psi_2, \ldots, \psi_N]$ if the number k of nonzero coefficients $\{\theta_i = \langle x, \psi_i \rangle\}$ is relatively small, $k \ll N$.
It is compressible if the number of significant coefficients (e.g., those whose absolute value is above some threshold) is relatively small. The basis can be, for example, a wavelet basis, a Fourier basis, or a local Fourier basis (see, e.g., [31], [52]), depending on the application. There are also tight frames such as curvelets [10] or contourlets [62] that incorporate directional information, thereby allowing a sparser representation of curve singularities. Incoherency, which concerns the acquisition modality, states that the sensing waveforms should have an extremely dense representation in Ψ. Based on these two properties, one can design sampling protocols that capture the useful information content embedded in a sparse signal and condense it into a small amount of data. A condition related to incoherency, which has also proved to be very useful in studying the general robustness of CS, is the so-called restricted isometry property (RIP) [14], [17]. Given a measurement matrix Φ (and a basis Ψ), the RIP can provide a sufficient condition for a stable solution for both k-sparse and compressible (in the basis Ψ) signals. In other words, the matrix Φ preserves the information in a sparse or compressible signal if it satisfies the RIP. In general, constructing a measurement matrix that satisfies the RIP for large k and N is a hard combinatorial problem [56]. However, the RIP holds with high probability for a large class of random matrices [14], [36]. In noise-free scenarios, the recovery of an unknown signal x from the measurement vector y would ideally be achieved by searching for the $l_0$-sparsest representation that agrees with the measurements:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_0 \quad \text{subject to} \quad y = \Phi\Psi^{-1}\theta. \tag{1.1}$$

It is known [26] that if x is a k-sparse vector, then with Ψ = I one can design a sampling matrix Φ with only n = 2k measurements and recover x via $l_0$ minimization.
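To make the 2k-measurement claim concrete, the following minimal Python sketch solves (1.1) with Ψ = I by exhaustive search on a toy problem. It assumes an illustrative Vandermonde-type sensing matrix with entries $(j+1)^i$ (our choice, not necessarily the construction of [26]): any 2k of its columns are linearly independent, so a k-sparse x is the unique sparsest solution of $y = \Phi x$. Exact rational arithmetic (`fractions.Fraction`) avoids round-off. The helper names are hypothetical. The loop over every support of size at most k is exactly the combinatorial cost discussed next.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde_sensing(n, N):
    """Illustrative n x N sensing matrix with entries (j + 1)**i.

    Any n of its columns form a nonsingular Vandermonde matrix, so with
    n = 2k measurements a k-sparse vector is the unique sparsest solution."""
    return [[Fraction(j + 1) ** i for j in range(N)] for i in range(n)]


def solve_square(A, b):
    """Gauss-Jordan elimination in exact arithmetic; None if A is singular."""
    m = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        pivot = next((r for r in range(c, m) if M[r][c] != 0), None)
        if pivot is None:
            return None
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]


def l0_recover(Phi, y, k):
    """Solve (1.1) with Psi = I by exhaustive search over supports of size <= k."""
    n, N = len(Phi), len(Phi[0])
    if all(v == 0 for v in y):
        return [Fraction(0)] * N
    for size in range(1, k + 1):
        for support in combinations(range(N), size):
            # Fit the candidate support on the first `size` equations ...
            A = [[Phi[i][j] for j in support] for i in range(size)]
            coef = solve_square(A, y[:size])
            if coef is None:
                continue
            x = [Fraction(0)] * N
            for j, c in zip(support, coef):
                x[j] = c
            # ... and accept it only if all n measurements agree.
            if all(sum(Phi[i][j] * x[j] for j in range(N)) == y[i] for i in range(n)):
                return x
    return None


# Usage: N = 8, k = 2, so only n = 2k = 4 measurements are taken.
N, k = 8, 2
Phi = vandermonde_sensing(2 * k, N)
x_true = [Fraction(0)] * N
x_true[2], x_true[6] = Fraction(3), Fraction(-1)
y = [sum(row[j] * x_true[j] for j in range(N)) for row in Phi]
x_hat = l0_recover(Phi, y, k)
```

Even for this toy size the search visits up to 36 candidate supports; the count grows combinatorially with N and k.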
Unfortunately, the $l_0$ minimization problem is a daunting NP-complete combinatorial optimization problem [56]. Instead of solving the $l_0$ minimization problem, nonadaptive CS theory seeks to solve the “closest possible” tractable minimization problem, i.e., the $l_1$ minimization

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_1 \quad \text{subject to} \quad y = \Phi\Psi^{-1}\theta. \tag{1.2}$$

This modification, known as Basis Pursuit [22], leads to a much simpler convex problem, which approximates the original problem and can be solved by various classical optimization techniques. At the same time, minimizing the $l_1$ norm still yields a sparse approximation because it prevents the diffusion of the signal's energy over a large number of coefficients. Robust recovery of k-sparse and compressible signals typically requires $n = O(k \log(N/k))$ measurements. One can adjust (1.2) to handle cases where x or y is corrupted by additive noise by replacing the equality $y = \Phi\Psi^{-1}\theta$ with an approximation constraint [22], [16]. Another such algorithm is total-variation minimization with quadratic constraints, which solves the following problem:

$$\min_{I} \|I\|_{TV} \quad \text{subject to} \quad \|\Phi x_I - y\|_2 \le \varepsilon, \tag{1.3}$$

where

$$\|I\|_{TV} = \sum_{i,j} \sqrt{\big(I(i+1,j) - I(i,j)\big)^2 + \big(I(i,j+1) - I(i,j)\big)^2},$$

$x_I$ is the vector created from a digital image I by concatenating its rows, and ε depends on the noise level. In general, solving the total variation problem (1.3) provides better image quality for image reconstruction at the expense of greater time complexity. Although convex optimization algorithms such as Basis Pursuit are optimal in the sense that they provide a global minimum, they involve expensive computations when applied to large-scale signals. Therefore a second approach, using iterative greedy methods, has been proposed.
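The simplest greedy method, plain Matching Pursuit [39], fits in a few lines. The sketch below is a minimal illustration under our own assumptions (a small explicit dictionary stored as Python lists, a fixed iteration budget and tolerance), not the implementation of [39]: at each step it selects the atom with the largest normalized correlation with the current residual and subtracts the residual's projection onto that atom.

```python
import math


def matching_pursuit(atoms, y, max_iter=50, tol=1e-12):
    """Plain Matching Pursuit: greedily explain y with atoms from a dictionary.

    atoms is a list of vectors (lists of floats) of the same length as y.
    Returns the accumulated coefficient of each atom and the final residual."""
    norms = [math.sqrt(sum(a * a for a in atom)) for atom in atoms]
    coef = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(max_iter):
        # Normalized correlation of every atom with the current residual.
        corr = [sum(a * r for a, r in zip(atom, residual)) / nrm
                for atom, nrm in zip(atoms, norms)]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[best]) <= tol:
            break
        # Subtract the projection <r, d>/||d||^2 * d of the residual onto the chosen atom.
        step = corr[best] / norms[best]
        coef[best] += step
        residual = [r - step * a for r, a in zip(residual, atoms[best])]
    return coef, residual


# Usage: an orthonormal 4-point Haar basis and a 2-sparse signal.
h = 1 / math.sqrt(2)
haar = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5], [h, -h, 0.0, 0.0], [0.0, 0.0, h, -h]]
signal = [3 * a - 2 * b for a, b in zip(haar[1], haar[3])]
coef, residual = matching_pursuit(haar, signal)
```

For an orthonormal dictionary the method stops after k steps with the exact coefficients; for redundant dictionaries it may revisit atoms, which is what Orthogonal Matching Pursuit and its successors address.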
Examples of such greedy methods include Matching Pursuit [39], Orthogonal Matching Pursuit [72], Tree Matching Pursuit (TMP) [48], StOMP [37], Iterative Hard Thresholding Pursuit [5], Subspace Pursuit [30], and CoSaMP [57].

Let us focus on one of the most promising implications of CS: enabling the design of new types of compressed imaging (CI) devices. One such device, which implements optical CS, is the single-pixel camera [38], which has inspired wide interest in CI applications. The single-pixel camera relies on a digital micromirror device (DMD), which is composed of a grid of mirrors, where each mirror of the array is suspended above an individual static random access memory (SRAM) cell (see section 2.2). We also note that there are experimental CI algorithms designed for measurements taken in the frequency domain in medical imaging [50]. However, the design of an efficient and optimal measurement basis in such a system is still a challenging problem.

Although CS is theoretically very powerful, there is still a huge gap between CS theory and imaging applications. We note a few of the key difficulties:

1. Control over the quality of the compressed image. From the theory of wavelet approximation, it is clear that the error of n-term wavelet approximation is characterized by the “weak-type smoothness” of the image as a function in a Besov space [34]. In terms of image representation, more wavelet coefficients are needed to achieve a specified level of quality if the image contains higher visual activity, such as edges and textured regions. Typically, however, existing CS architectures are not adaptive: the number of measurements is determined before the acquisition process begins, with no feedback during acquisition on the quality achieved.
Yet there exists work on a “sequential CS” paradigm [51], where the sampling procedure is composed of iterations of acquisition of random projections, reconstruction, testing for convergence, and then acquisition of more random projections if needed. The main challenge with this scheme is that the recovery algorithm needs to be applied at each iteration; thus the total complexity of the acquisition algorithm cannot be of order O(n).

2. Computationally intensive sampling process. Constructing an efficient and fast sampling operator for nonadaptive CS remains a challenging problem. On the one hand, CS theory shows that random (i.e., unstructured) measurement systems are in some sense an optimal strategy for acquiring sparse signals. On the other hand, sampling operators of random binary patterns are by nature dense matrices that require huge memory allocations and intensive computations for large values of N and n. This is impractical even for an image of size 256 × 256. Some of these practical challenges have already been partially addressed. The Scrambled Fourier Ensemble [12] was proposed for an MRI application and is often employed in CS research because of its fast implementation. Other researchers [4] developed binary sparse measurement matrices with low complexity and near-optimal performance. However, these operators are incoherent only with identity matrices. The Block Hadamard Ensemble [40] was designed to offer a practical measurement operator for implementations.

3. Computationally intensive reconstruction algorithm. Although the algorithms designed to solve the $l_1$ minimization (1.2) are provably efficient in theory, i.e., they have polynomial complexity bounds, they require heavy computation for large-scale problems.
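A back-of-the-envelope computation shows why a dense random sensing matrix is impractical even at 256 × 256. The sketch below assumes, hypothetically, $n = N/4$ measurements and 8-byte floating-point storage per entry; both choices are ours and serve only to fix the orders of magnitude.

```python
def dense_sensing_cost(side, ratio=0.25, bytes_per_entry=8):
    """Size of a dense n x N sensing matrix for a side x side image.

    ratio = n / N and bytes_per_entry are illustrative assumptions."""
    N = side * side              # number of pixels
    n = int(N * ratio)           # number of measurements
    entries = n * N              # matrix entries, also multiply-adds per product y = Phi x
    return N, n, entries, entries * bytes_per_entry


N, n, entries, nbytes = dense_sensing_cost(256)
print(N, n, entries, nbytes / 2**30, "GiB")                              # 65536 16384 1073741824 8.0 GiB
print(dense_sensing_cost(256, bytes_per_entry=1 / 8)[3] / 2**20, "MiB")  # 128.0 MiB at 1 bit per entry
```

Even with bit-packed binary patterns the matrix occupies about 128 MiB, and every application of Φ (or of its transpose inside an iterative solver) costs about $10^9$ operations.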
In this work, we introduce an architecture that, in accordance with the CS paradigm, aims at providing an acquisition process that captures n measurements with $n \ll N$ and $n = O(k)$, where N is the dimension of the full high-resolution image, which is assumed to be k-sparse. However, in practice, it improves upon the existing CS approach by additionally achieving the following:

1. Adaptivity to the image content. The acquisition algorithm still has complexity O(n), but it adaptively takes more measurements if needed to achieve a target quality for the compressed image.
2. A fast and efficient computational framework. The algorithm does not require the computationally intensive CS recovery algorithm. The decoding process has the computational complexity of algorithms in use today, such as JPEG or JPEG2000 decoding.

The algorithms described in this paper were inspired by the CS idea of a sensing process that acquires a relatively small number of samples from which a sparse signal can be reconstructed. However, they depart fundamentally from the classic (nonadaptive) CS mathematical framework. Instead of acquiring the visual data using functionals that are random in nature and incoherent with the sparse basis (e.g., wavelets), we suggest a simple, direct, adaptive, and fast method for the acquisition of transform coefficients. We show that it is possible to use devices such as the DMD array in a single-pixel camera to compute arbitrary transform coefficients, where the computation of each coefficient requires a fixed number of measurements (see section 2.2). With this tool in hand, instead of asking n universal, nonadaptive questions about the signal in the form of linear functionals, our method takes linear measurements directly in a sparse and structured basis, with each measurement based on the previous measurements.
Basing each measurement on the previous ones allows us to exploit the geometric structure of the sparse representation explicitly, for example, by modeling wavelet trees over image edges to gather their significant information. The sampling process is initialized by acquiring a very small number of low-resolution measurements. Then, based on this initial data and on specific properties of the wavelet structure, our algorithm adaptively extracts the significant information using a near-minimal number of measurements, $n \ll N$, $n = O(k)$. This significant information, i.e., the wavelet coefficients that correspond to image edges, is gathered based only on the subset of wavelet coefficients that were previously sampled and on an established set of rules. We therefore call our method Adaptive Compressed Sampling (ACS). Since the ACS algorithm takes linear samples directly in the wavelet domain, there is no need for the standard CS recovery process from “pseudorandom” measurements.

The outline of the paper is as follows. In section 2 we recall the basic properties of wavelets and explain how the DMD architecture can potentially be used to acquire any transform coefficient with O(1) measurements. In section 3 we provide the ACS version of an exact reconstruction result for cartoon images using a wavelet-domain sampling scheme. Using basic tools from differential geometry, we prove that our ACS scheme “efficiently” samples all the nonzero wavelet coefficients of cartoon images, which correspond to the edges, thereby providing exact recovery with “optimal” (up to a constant) sampling. In section 4 we review Stat-ACS, an adaptive sampling algorithm that applies well-known statistical models of wavelet representations of images.
Modeling of wavelet trees over image edges has already been used in the context of CS [2], [44], [48], but in a different, implicit way: to speed up a reconstruction algorithm for (1.2), to speed up the numerical solution, and to reduce the number of measurements. In particular, in [2] it was shown that tree-sparse signals can be sampled using traditional CS with O(k) samples. In contrast, we use statistical tree models adaptively, in a direct and explicit way, namely, at the time of acquisition. The Stat-ACS algorithm has higher complexity than the algorithm we presented in [33], but it is more efficient in predicting significant wavelet coefficients and hence provides better rate/distortion performance. We also provide comparison results with nonadaptive CS methods. Finally, in section 5, we present a more advanced algorithm, Tex-ACS, which addresses the model of images as a mix of cartoon and local texture patches. The Tex-ACS algorithm uses a dictionary composed of local Fourier and wavelet basis functions. Experimental results presented for both Stat-ACS and Tex-ACS clearly show that the adaptive form of CS provides state-of-the-art results in terms of higher reconstruction quality for the same number of samples.

2. Direct sampling of transform coefficients.

2.1. The wavelet tree structure. Recall that the univariate wavelet system (e.g., [31], [52]) is a family of real functions $\{\psi_{j,k} : (j,k) \in \mathbb{Z}^2\}$ in $L_2(\mathbb{R})$, built by dilating and translating a unique mother wavelet function ψ(x):

$$\psi_{j,k}(x) = 2^{-j/2}\,\psi(2^{-j}x - k), \tag{2.1}$$

where the mother wavelet satisfies

$$\int_{\mathbb{R}} \psi(x)\,dx = 0. \tag{2.2}$$

The wavelet system is a basis of $L_2(\mathbb{R})$, where for each $f \in L_2(\mathbb{R})$,

$$f = \sum_{j,k} \langle f, \tilde{\psi}_{j,k} \rangle\, \psi_{j,k}, \tag{2.3}$$

where $\tilde{\psi}$ is a dual of ψ. For special choices of ψ, the set $\{\psi_{j,k}\}$ forms an orthonormal basis for $L_2(\mathbb{R})$, and then $\psi = \tilde{\psi}$. Wavelets are usually constructed from a multiresolution analysis (MRA), which is a sequence of closed subspaces of $L_2(\mathbb{R})$,

$$\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots, \qquad \bigcap_j V_j = \{0\}, \qquad \overline{\bigcup_j V_j} = L_2(\mathbb{R}).$$

Usually, one starts the construction from a scaling function (generator) $\varphi \in L_2(\mathbb{R})$ that satisfies a two-scale equation

$$\varphi = \sum_k a_k\, \varphi(2\,\cdot - k).$$

In the orthonormal case, one then sets $V_j = \overline{\operatorname{span}}\{\varphi_{j,k} := 2^{-j/2}\varphi(2^{-j}\cdot - k) : k \in \mathbb{Z}\}$ and constructs ψ such that $W_j := \overline{\operatorname{span}}\{\psi_{j,k} : k \in \mathbb{Z}\}$, with $V_{j+1} \oplus W_{j+1} = V_j$. A classic example of an orthonormal system is the Haar scaling function and wavelet:

$$\varphi(x) := \begin{cases} 1, & x \in [0,1], \\ 0, & \text{else}, \end{cases} \qquad \psi(x) := \begin{cases} 1, & x \in [0, \tfrac12), \\ -1, & x \in [\tfrac12, 1), \\ 0, & \text{otherwise}. \end{cases} \tag{2.4}$$

To obtain smooth symmetric wavelets of compact support, one needs to construct biorthogonal wavelets, where $\psi \neq \tilde{\psi}$. This is achieved by using a dual set of scaling functions and MRAs [31].

The wavelet model can easily be generalized to any dimension [31], [52] via a tensor product of the wavelet and the scaling functions. Assume that the univariate wavelet ψ and the scaling function φ are given. Then a bivariate separable basis is constructed using three basic wavelets:

$$\psi^1(x_1,x_2) = \varphi(x_1)\psi(x_2), \qquad \psi^2(x_1,x_2) = \psi(x_1)\varphi(x_2), \qquad \psi^3(x_1,x_2) = \psi(x_1)\psi(x_2).$$

The bivariate discrete transform represents an image $f \in L_2(\mathbb{R}^2)$ in terms of the three wavelet functions $\psi^1, \psi^2, \psi^3$:

$$f = \sum_{j \in \mathbb{Z},\ e = 1,2,3}\ \sum_{k \in \mathbb{Z}^2} w^e_{j,k}\, \psi^e_{j,k}, \tag{2.5}$$

where $\{w^e_{j,k}\}$ are the wavelet coefficients. The wavelet decomposition can thus be interpreted as a signal decomposition in a set of three spatially oriented frequency subbands: LH (e = 1) detects horizontal edges, HL (e = 2) detects vertical edges, and HH (e = 3) detects diagonal edges. A wavelet coefficient $w^e_{j,k}$ at scale j represents the information about the image in the spatial region in the neighborhood of $2^j k$, $k \in \mathbb{Z}^2$. At the next finer scale j − 1, the information about this region is represented by four wavelet coefficients, which are described as the children of $w^e_{j,k}$. This leads to a natural tree structure, organized as a quad tree in each of the three subbands, as shown in Figure 1. Note that at each finer scale, the coefficients represent a smaller spatial area of the image at higher frequencies. As j decreases, the child coefficients add finer and finer details to the spatial regions occupied by their ancestors.

Wavelet representations are considered very efficient for image compression: edges constitute a small portion of a typical image, and a wavelet coefficient is large only if edges are present within the support of the wavelet. Consequently, the image can be approximated well using a few large wavelet coefficients. A significant statistical structure also follows: large/small values of wavelet coefficients tend to propagate through the scales of the quad trees (see section 4.1 for details). As an example, a sparse wavelet representation of the 512 × 512 image Lena and a compressed version of this image are shown in Figure 2, where the compression algorithm was based on the sparse representation. The figure clearly shows that the significant wavelet coefficients (coefficients with relatively large absolute value) are located on strong edges of the image.

Figure 1. Wavelet coefficient tree structure across the subbands (MRA decomposition).

Figure 2. (a) A sparse wavelet representation of an image: 10,179 significant wavelet coefficients (black: significant coefficient; white: insignificant coefficient). (b) Compressed JPEG2000 image (1:23) based on the sparse representation of (a).

2.2. Direct sampling of transform coefficients using digital micromirror arrays. Today's digital camera devices are in the megapixel range thanks to the introduction of charge-coupled devices (CCDs) and complementary metal oxide semiconductor (CMOS) digital technology.
Assume that I is a digital image consisting of N pixels that we wish to acquire. Some digital cameras perform the acquisition using an array of N CCDs after exposure to the optical image. The charges on the pixels are read row by row, fed to an amplifier, and then undergo analog-to-digital conversion. CMOS technology allows today's professional cameras to acquire about 12 million pixels (3000 × 4000). Once the digital image I has been acquired, it is usually compressed using standard compression algorithms such as JPEG and JPEG2000. In most digital cameras, the user can control the compression/quality tradeoff through the compression algorithm.

We now describe the digital micromirror device (DMD) hardware architecture, depicted in Figure 3, which was also used in the single-pixel camera of [38]. The DMD consists of an array of electrostatically actuated micromirrors, where each mirror of the array is suspended above an individual SRAM cell. Each mirror rotates about a hinge and can be positioned in one of two states: +10 or −10 degrees from horizontal. Using lenses, all the reflections from the micromirrors in a given state (e.g., the light reflected by mirrors in the +10 degree state) are focused and collected by a single photodiode to yield an absolute voltage. The output of the photodiode is amplified through an op-amp circuit and then digitized by an analog-to-digital converter. This value should be interpreted as

$$v = \sum_{i=1}^{N} x_i\, \mathbf{1}_{\theta_i = +10} + \text{DC offset},$$

where $x_I = (x_1, \ldots, x_N)$ is the vector formed by concatenating the rows of the digital image I, the indicator function $\mathbf{1}_{\theta_i = +10}$ takes the value 1 if the ith micromirror is in the +10 degree state and 0 otherwise, and the DC offset is the value output when all the micromirrors are set to −10 degrees.

Figure 3. The architecture of the DMD.
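The measurement model above can be simulated in a few lines to show how a signed transform coefficient arises from two nonnegative readings. The sketch below is a hypothetical software model (the names `dmd_measure` and `haar_e1_coefficient` are ours): the image is a list of pixel rows, a binary mask marks the mirrors in the +10 degree state, and a discrete analogue of a subband e = 1 Haar coefficient on a $2^j \times 2^j$ pixel block is the difference of two readings scaled by $2^{-j}$ (orthonormal discrete normalization), so the DC offset cancels. It assumes j ≥ 1 and treats the column index as the second coordinate $x_2$.

```python
def dmd_measure(image, mask, dc_offset=0.0):
    """Simulated binary DMD reading: sum of pixels whose mirror is at +10 degrees, plus offset."""
    return sum(p for row, mrow in zip(image, mask) for p, on in zip(row, mrow) if on) + dc_offset


def haar_e1_coefficient(image, j, k1, k2, dc_offset=0.0):
    """Discrete analogue of a subband-1 Haar coefficient from two binary readings (j >= 1).

    The 2^j x 2^j block at (k1, k2) is split along the column index (x_2);
    the coefficient is 2^-j times (first-half reading - second-half reading)."""
    size, half = 2 ** j, 2 ** (j - 1)
    r0, c0 = size * k1, size * k2

    def mask(c_lo, c_hi):
        # Mirrors at +10 degrees exactly on rows [r0, r0 + size) x columns [c_lo, c_hi).
        return [[r0 <= r < r0 + size and c_lo <= c < c_hi for c in range(len(image[0]))]
                for r in range(len(image))]

    v_first = dmd_measure(image, mask(c0, c0 + half), dc_offset)
    v_second = dmd_measure(image, mask(c0 + half, c0 + size), dc_offset)
    return (v_first - v_second) / size   # the DC offset cancels in the difference


# Usage: a 4 x 4 image whose pixel value equals its column index.
img = [[c for c in range(4)] for _ in range(4)]
print(haar_e1_coefficient(img, 1, 0, 0), haar_e1_coefficient(img, 1, 0, 0, dc_offset=5.0))  # -1.0 -1.0
```

The number of readings per coefficient is fixed (two) regardless of the block size, which is the property the ACS approach relies on; the cost of summing pixels is carried by the optics, not by computation.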
In [38], the rows of the CS sampling matrix Φ are in fact a sequence of n pseudorandom binary masks, where each mask is a scrambled configuration of the DMD array (see also [3]). Thus, the measurement vector y is composed of inner products of the digital image x with pseudorandom masks. In [38], the measurements are collected into a compressed bitstream, with possible lossy or lossless compression applied to the elements of y. At the core of the decoding process, which takes place at the viewing device, is the minimization algorithm (1.2). Once a solution is computed, one obtains from it an approximate “reconstructed” image by applying the inverse transform $\Psi^{-1}$ to the coefficients. In [38], the authors choose Ψ to be a wavelet transform, but their sampling process has a universal property. This means that any transform T that is incoherent with the pseudorandom masks Φ (almost all reasonable transforms have this property) and in which the image has a sparse representation will lead to a reconstruction that approximates the original image well.

Recall that our proposed approach, although inspired by the CS concept of acquisition processes that simultaneously sense and compress the image, departs fundamentally in its mathematical framework. Instead of acquiring the visual data using a representation that is incoherent with wavelets, we sample directly in the wavelet domain. At first glance, this might seem to be a paradox, since computing the fast wavelet transform of an N-pixel image requires O(N) computations, whereas we want to take only n measurements with $n \ll N$. Furthermore, computing even one low-frequency wavelet coefficient requires calculating an integral over a significant portion of the image pixels, which requires O(N) computations.
As we shall see, this paradox is resolved by using the DMD in a different way than in [38]. There exist DMD arrays with the functionality that a micromirror can “reflect” a grayscale value (contemporary DMDs can produce 1024 grayscale values). In theory, one can use these devices to compute arbitrary real-valued functionals acting on the data, since any functional g can be represented as a difference of two nonnegative functionals $g_+, g_-$, such that $g = g_+ - g_-$ ($g_+, g_- \ge 0$). This allows us to compute any transform coefficient using only two measurements. We may also leverage the “feedback” architecture of the DMD and make decisions on future measurements based on the values measured so far. As we shall see, this adaptive sampling process relies on a well-known model of image edges using a wavelet coefficient tree structure. Decisions on which wavelet coefficients should be sampled next are then based on the values of the wavelet coefficients obtained so far.

Let us now give an example: the acquisition of an arbitrary Haar wavelet coefficient, computed by using the even simpler binary DMD with only two measurements. The wavelet coefficient associated with a Haar wavelet in subband e = 1 is given by

$$\langle f, \psi^1_{j,k} \rangle = 2^{-j}\left( \int_{2^j k_1}^{2^j (k_1+1)} \int_{2^j k_2}^{2^j (k_2+1/2)} f(x_1,x_2)\,dx_2\,dx_1 - \int_{2^j k_1}^{2^j (k_1+1)} \int_{2^j (k_2+1/2)}^{2^j (k_2+1)} f(x_1,x_2)\,dx_2\,dx_1 \right). \tag{2.6}$$

This implies that to compute a wavelet coefficient of the first type using a binary DMD array, we need to rotate the mirrors and collect into the photodiode, twice, the responses from two subsets of micromirrors, each supported over one of two neighboring rectangular regions corresponding to the scale j and location k. From (2.6), the value of the wavelet coefficient we wish to acquire is the difference of these two outputs multiplied by $2^{-j}$. We emphasize that there is ongoing research on compressed image acquisition applications where DMD arrays might have an advantage over conventional CCD architectures.
One of these areas is hyperspectral imaging, where the acquisition device samples an image in multiple bands by using configurable optics [47]. In [61], the authors show promising results in hyperspectral imaging using the DMD array. The adaptive approach presented in this work could potentially improve their nonadaptive scheme.

Finally, as pointed out by the referees, we note that in this work we do not investigate the influence of DMD accuracy on the quality of reconstruction. Typically, when noncompressive sensing is followed by image compression, transform coefficients are first calculated in relatively high precision and then quantized for the purpose of compression. When one acquires transform coefficients directly using the DMD, they are contaminated with noise due to the relatively low precision of the DMD hardware. We postulate that this has a smaller effect when the goal is low bit-rate compression and the transform coefficients are quantized significantly, with quantization steps larger than the noise level of the hardware system. We expect this approach to become even more valid as DMD technology matures and more accurate systems enter the market.

3. Exact reconstruction of cartoon images using ACS. The theory of CS contains results on the ability to achieve exact recovery of a sparse signal from a limited number of linear measurements [14], [17], [35]. The prototype “sparse image” used in CS is the piecewise constant “cartoon” Shepp–Logan image (e.g., [14]), since such images are sparse in the wavelet domain and also have a “small” total-variation norm. In this section we present a simple theoretical result on exact reconstruction via critical sampling that serves as a motivation for our ACS algorithms.
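To illustrate adaptive sampling along wavelet trees in the discrete setting, here is a minimal sketch of a much-simplified rule of our own (not Stat-ACS, and not the exact procedure behind Theorem 3.1): sample all three subbands at the coarsest level, and at each finer level sample the three subbands of the four children of every location where some coefficient was nonzero, so that each nonzero coefficient triggers at most 12 new samples. The function `haar_coeff` is a hypothetical stand-in for one direct (e.g., DMD-based) measurement; it computes orthonormal discrete 2-D Haar detail coefficients from quadrant sums of a $2^J \times 2^J$ image, with rows as $x_1$ and columns as $x_2$, and indexes levels by the block-size exponent s (coarse to fine as s decreases).

```python
def block_sum(img, r0, c0, size):
    """Sum of the size x size pixel block with top-left corner (r0, c0)."""
    return sum(img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size))


def haar_coeff(img, s, e, k1, k2):
    """Orthonormal 2-D Haar detail coefficient, subband e, on the 2^s x 2^s block (k1, k2).

    Hypothetical stand-in for one direct (e.g., DMD-based) measurement."""
    m, h = 2 ** s, 2 ** (s - 1)
    r0, c0 = m * k1, m * k2
    a = block_sum(img, r0, c0, h)           # top-left quadrant
    b = block_sum(img, r0, c0 + h, h)       # top-right
    c = block_sum(img, r0 + h, c0, h)       # bottom-left
    d = block_sum(img, r0 + h, c0 + h, h)   # bottom-right
    signed = {1: a + c - b - d, 2: a + b - c - d, 3: a + d - b - c}[e]
    return signed / m


def adaptive_haar_sampling(img, tol=1e-12):
    """Sample the coarsest level, then only the children of nonzero coefficients.

    img is a 2^J x 2^J list of rows. Returns the nonzero coefficients found,
    keyed by (s, e, k1, k2), and the total number of samples taken."""
    J = len(img).bit_length() - 1
    found, n_samples = {}, 0
    locations = {(0, 0)}                     # block positions to sample at level s
    for s in range(J, 0, -1):                # from coarse (s = J) to fine (s = 1)
        children = set()
        for k1, k2 in sorted(locations):
            for e in (1, 2, 3):
                w = haar_coeff(img, s, e, k1, k2)
                n_samples += 1
                if abs(w) > tol:
                    found[(s, e, k1, k2)] = w
                    # Four children of (k1, k2) at the next finer level s - 1.
                    children.update((2 * k1 + p, 2 * k2 + q) for p in (0, 1) for q in (0, 1))
        locations = children
    return found, n_samples


# Usage: an 8 x 8 image with a single bright pixel.
img = [[0] * 8 for _ in range(8)]
img[3][5] = 1
found, n_samples = adaptive_haar_sampling(img)
print(len(found), n_samples)   # 9 27  (a full detail transform has 63 coefficients)
```

For this image the rule finds all 9 nonzero detail coefficients with 27 samples. For general images a coarse coefficient can vanish while its children do not, which is why the theorem requires a sufficiently fine starting scale T and why the practical algorithms use statistical prediction instead.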
We prove that if a function is a typical cartoon image, i.e., a piecewise constant function over domains with smooth boundaries, then it is possible to reconstruct it exactly by adaptive wavelet coefficient sampling, using an optimal number of measurements (up to a constant). In the following, a “sample” is the evaluation of the given function by a linear functional, specifically a wavelet coefficient. In this section, where the given functions are not discrete, we use the convention of [31] that scales become finer as the index j becomes larger (the opposite of the usual notation in the discrete setting).

Theorem 3.1. Let $f(x) = c_1 + c_2 \mathbf{1}_{\Omega}(x)$, where the boundary ∂Ω is a simple closed $C^2$ (twice continuously differentiable) curve. Then there exists a scale $T \in \mathbb{Z}$ such that if all the nonzero Haar wavelet (2.4) coefficients $\langle f, \psi^e_{j,k} \rangle$ are known for $j \le T$, then all nonzero coefficients $\langle f, \psi^e_{j,k} \rangle$, $j > T$, can be acquired by an ACS process, where for each resolution

$$\#\{\text{required samples at resolution } j\} \le 12\, \#\left\{ \langle f, \psi^e_{j-1,k} \rangle \neq 0 : e = 1,2,3,\ k \in \mathbb{Z}^2 \right\}. \tag{3.1}$$

Remark. Observe that the above theoretical result is of a different nature than previous “exact reconstruction” results in CS. Most importantly, the setting is not discrete. The simple claim is that, from a fine enough resolution on, the locations of the nonzero wavelet coefficients at some resolution are completely predictable from the locations of the nonzero coefficients at the lower resolution. This motivates the Stat-ACS adaptive sampling algorithm (section 4), which is based on wavelet trees and predicts the locations of significant coefficients by using information derived from coefficients already sampled at lower scales. However, since images are typically modeled as a combination of cartoon and texture components, the Tex-ACS algorithm (section 5) is needed as a complement.

We begin with some definitions from differential geometry [43].

Definition 3.2.
Given three points x = γ(s), y = γ(σ), z = γ(τ) on a curve γ, we define the local radius of curvature ρ(x) at each point x of γ by

$$\rho(x) := \lim_{\tau,\sigma \to s} r(x,y,z), \tag{3.2}$$

where r(x, y, z) is the radius of the unique circle that passes through the three points x, y, z. In the case where a line passes through the points, we say that r(x, y, z) = ∞. Assuming a natural parameterization, the curvature k(s) at a point γ(s) is

$$k(s) := \|\gamma''(s)\| = \frac{1}{\rho(x)}. \tag{3.3}$$

The minimal local radius of curvature of the curve γ is

$$\rho(\gamma) := \min_{x \in \gamma} \rho(x). \tag{3.4}$$

Definition 3.3. Given a simple closed $C^2$ curve γ, we define the global radius of curvature $\rho_G(x)$ at each point x of γ by

$$\rho_G(x) = \min_{y,z \in \gamma} r(x,y,z), \tag{3.5}$$

$$\rho_G(\gamma) := \min_{x \in \gamma} \rho_G(x). \tag{3.6}$$

Clearly, the global radius of curvature is bounded by the local radius of curvature, i.e.,

$$0 < \rho_G(x) \le \rho(x) \quad \forall x \in \gamma. \tag{3.7}$$

Figure 4 shows an example of the difference between the global radius of curvature and the local radius of curvature. In contrast to the local radius of curvature, the global radius of curvature contains information about nonlocal parts of the curve.

Figure 4. Global radius of curvature (left circle) and local radius of curvature (right circle).

Let Ω be a domain with a simple closed $C^2$ boundary ∂Ω. Assume that the boundary is given in the natural parametric form

$$\gamma(s) = (x(s), y(s)), \quad s \in [0, L], \tag{3.8}$$

where L is the length of the curve,

$$\|\gamma''\|_\infty = \max_{0 \le s \le L} \|\gamma''(s)\| \le M, \tag{3.9}$$

and

$$0 < \rho_G(\gamma) = M_G < \infty. \tag{3.10}$$

Let $Q_{j,k}$ denote a dyadic square in $\mathbb{R}^2$ (recall that in this section, the scales are finer as j → ∞):

$$Q_{j,k} := \left[\frac{k_1}{2^j}, \frac{k_1+1}{2^j}\right) \times \left[\frac{k_2}{2^j}, \frac{k_2+1}{2^j}\right), \quad j \in \mathbb{Z},\ k = (k_1,k_2) \in \mathbb{Z}^2. \tag{3.11}$$

Lemma 3.4. Let γ be a smooth, closed, and simple curve.
If, for $j \in \mathbb{Z}$,

$$2^{-j} \le \rho_G(\gamma), \tag{3.12}$$

then the circumscribed circle $O_{j,k}$ of any cube $Q_{j,k}$, $k \in \mathbb{Z}^2$, contains at most one connected segment of the curve $\gamma$.

Proof. Since $\gamma$ is a non-self-intersecting curve, we have $\rho_G(\gamma) > 0$. Assume, to the contrary, that $O_{j,k}$ contains two segments of $\gamma$, $\gamma_1 \equiv \gamma|_{[s_1,s_2]}$ and $\gamma_2 \equiv \gamma|_{[s_3,s_4]}$, with $s_1 < s_2 < s_3 < s_4$. The points $\gamma(s_1)$, $\gamma(s_2)$, and $\gamma(s_3)$ are entry or exit points of $\gamma$ and therefore lie on the boundary of $O_{j,k}$, so the circle passing through them is that boundary itself. Its radius $R$ thus satisfies

$$\rho_G(\gamma) \le R = \frac{\sqrt{2}}{2}\,2^{-j} < 2^{-j}, \tag{3.13}$$

which contradicts (3.12). We conclude that $O_{j,k}$ contains at most one connected component of $\gamma$.

Thus, if $j$ satisfies (3.12), then either $\gamma \cap O_{j,k} = \emptyset$ or $\gamma \cap O_{j,k}$ is a single connected segment. Consider now the second case, and let the entry and exit points of $\gamma$ in $O_{j,k}$ be $\gamma(s_1)$ and $\gamma(s_2)$, respectively. Define the line

$$\tilde\gamma(s) = \gamma(s_1) + (s - s_1)\gamma'(s_1). \tag{3.14}$$

Lemma 3.5. If, for $\alpha \ge 1$, $j$ satisfies

$$2\alpha\pi^2 M \le 2^j, \tag{3.15}$$

with $M$ given in (3.9), then the distance between $\gamma$ and the segment $\tilde\gamma(s) = (\tilde x(s), \tilde y(s))$ given in (3.14) obeys

$$\min_{s'}\left|\gamma(s) - \tilde\gamma(s')\right| \le \frac{2^{-j}}{\alpha}, \quad s \in [s_1, s_2]. \tag{3.16}$$

Proof. Since $2M \le 2^j$ (which follows from (3.15), as $\alpha\pi^2 \ge 1$), the radius of curvature of $\gamma$, which is at least $1/M \ge 2\cdot 2^{-j}$, is bigger than the radius of $O_{j,k}$. Therefore, the length of the single connected curve segment contained in $O_{j,k}$ is bounded by the circumference of $O_{j,k}$:

$$\operatorname{length}(\gamma \cap O_{j,k}) = s_2 - s_1 \le \sqrt{2}\,\pi\,2^{-j}. \tag{3.17}$$

Using the differentiability of $\gamma(s) = (x(s), y(s))$, we obtain

$$|x(s) - \tilde x(s)| \le \|x''\|_\infty (s - s_1)^2, \quad |y(s) - \tilde y(s)| \le \|y''\|_\infty (s - s_1)^2 \quad \forall s \in [s_1, s_2].$$

Applying (3.15) and (3.17) to the parametric distance $|\gamma(s) - \tilde\gamma(s)|$ yields

$$|\gamma(s) - \tilde\gamma(s)| = \sqrt{(x(s) - \tilde x(s))^2 + (y(s) - \tilde y(s))^2} \le M(s_2 - s_1)^2 \le M\left(\sqrt{2}\,\pi\,2^{-j}\right)^2 = 2\pi^2 M\,2^{-2j} \le \frac{2^{-j}}{\alpha}.$$

Since the geometric distance is never larger than the parametric distance, we obtain (3.16).
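The scale conditions (3.12) and (3.15) depend only on two geometric constants of the boundary, $\rho_G(\gamma)$ and $M$. The following Python sketch is an illustration only, and its function names are hypothetical. It estimates $\rho_G(\gamma)$ for a closed curve given by finitely many sample points, taking the minimum of the circumradius $r(x,y,z)$ of Definition 3.3 over all sampled triples. It then finds the smallest integer $j \ge 0$ that satisfies both conditions (cf. (3.24)). Because only sampled triples are examined, the estimate can only overestimate the true $\rho_G(\gamma)$.

```python
import math
from itertools import combinations


def circumradius(p, q, r):
    """Radius r(x, y, z) of the circle through three points; inf if they are collinear."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    # Twice the triangle area, from the 2D cross product.
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    if cross == 0.0:
        return math.inf
    return a * b * c / (2.0 * abs(cross))


def global_radius_estimate(points):
    """Estimate rho_G(gamma) of (3.5)-(3.6) from sample points of a closed curve.

    min over x of min over (y, z) equals the min over all triples, so each
    triple is visited once. The result is an upper estimate of rho_G.
    """
    return min(circumradius(*t) for t in combinations(points, 3))


def min_admissible_scale(rho_g, m, alpha=16.0):
    """Smallest j >= 0 with 2**-j <= rho_g (3.12) and 2*alpha*pi**2*m <= 2**j (3.15)."""
    j = 0
    while 2.0 ** -j > rho_g or 2.0 * alpha * math.pi ** 2 * m > 2.0 ** j:
        j += 1
    return j
```

For the unit circle ($\rho_G = 1$, $M = 1$) and $\alpha = 16$, `min_admissible_scale(1.0, 1.0)` returns 9, since $32\pi^2 \approx 315.8 \le 2^9$. For an ellipse with semi-axes 2 and 1 ($\rho_G = 1/2$, $M = 2$) it returns 10. The curvature bound $M$, not $\rho_G$, dominates in both cases.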
Let $j$ be a sufficiently fine (large) scale that obeys (3.12) and (3.15) for $\alpha = 16$. As depicted in Figure 5, let $I$ be the strip bounded by two linear segments $\tilde\gamma_-$ and $\tilde\gamma_+$ that are parallel to $\tilde\gamma$ and are obtained by shifting $\tilde\gamma$ by $-2^{-j}/16$ and $+2^{-j}/16$, respectively, in the direction of the normal of $\tilde\gamma$. By construction of $\tilde\gamma_-$ and $\tilde\gamma_+$, and by (3.16), the curve $\gamma$ is confined to the strip $I$ in $Q_{j,k}$. Define the width of $I$, $w(I)$, to be the Hausdorff distance between $\tilde\gamma_-$ and $\tilde\gamma_+$. By construction,

$$w(I) = \frac{2^{-j}}{8}. \tag{3.18}$$

Recall that $f(x) = c_1 + c_2\mathbf{1}_\Omega(x)$. For $i = 1,\dots,4$, let $S_i = \operatorname{area}(\tilde Q_i \cap \Omega)$, where $\{\tilde Q_i\}_{i=1}^4$ are the four quadrants of $Q_{j,k}$, i.e., the four dyadic cubes at level $j+1$ contained in $Q_{j,k}$. Also, let $S(I) = \operatorname{area}(I \cap Q_{j,k})$. Figure 5 shows an example of a curve component that is contained in a dyadic cube $Q_{j,k}$ and bounded by a strip $I$. Our proof rests on the fact that we can make the width of the strip $I$ small enough that the equality $S_1 = S_2 = S_3 = S_4$ cannot hold.

Lemma 3.6. Assume that $j$ satisfies (3.15) for $\alpha \ge 16$. The equality $S_1 = S_2 = S_3 = S_4$ holds for $Q_{j,k}$ only if

$$S_i = 0, \quad i \in \{1,2,3,4\}, \tag{3.19}$$

or

$$S_i = \frac{2^{-2j}}{4}, \quad i \in \{1,2,3,4\}. \tag{3.20}$$

Proof. Denote the center of $Q_{j,k}$ by $o$. If $o \in I^c$, then $\gamma$ does not intersect at least one of the quadrants $\tilde Q_i$, and therefore $S_1 = S_2 = S_3 = S_4$ can hold only if $Q_{j,k} \cap \Omega = \emptyset$, corresponding to (3.19), or $Q_{j,k} \cap \Omega^c = \emptyset$, corresponding to (3.20). Otherwise, $o \in I$ (as depicted in Figure 5). By (3.18),

$$S(I) \le w(I)\cdot\sqrt{2}\cdot\operatorname{side}(Q_{j,k}) = \frac{2^{-j}}{8}\cdot\sqrt{2}\cdot 2^{-j} = \frac{\sqrt{2}}{8}\,2^{-2j}.$$

Figure 5. A curve component bounded by the strip $I$ in $Q_{j,k}$.

By a simple combinatorial argument, there exists a quadrant $\tilde Q_i$ whose intersection with $I$ is at most a quarter of $S(I)$:

$$S(\tilde Q_i \cap I) \le \frac{\sqrt{2}}{32}\,2^{-2j}. \tag{3.21}$$

Without loss of generality, assume that (3.21) holds for $i = 1$ and that $S_1 > 0$. This implies that

$$S_1 \ge \operatorname{area}(\tilde Q_1) - S(\tilde Q_1 \cap I) \ge \frac{2^{-2j}}{4} - \frac{\sqrt{2}\,2^{-2j}}{32} \ge \frac{6\cdot 2^{-2j}}{32}. \tag{3.22}$$

On the other hand, for at least one of the other quadrants $\tilde Q_i$, $i \in \{2,3,4\}$,

$$S_i \le S(\tilde Q_i \cap I) \le S(I) \le \frac{\sqrt{2}}{8}\,2^{-2j}. \tag{3.23}$$

Since $\sqrt{2}/8 < 6/32$, we have $S_i < S_1$ for one of $i = 2,3,4$, and therefore $o \in I$ is not possible. This completes the proof of the lemma.

Proof of Theorem 3.1. Let $f(x) = c_1 + c_2\mathbf{1}_\Omega(x)$, and let $\gamma = \partial\Omega$. Let $Q_{j,k}$ be a dyadic cube. If $Q_{j,k} \cap \gamma = \emptyset$, then $w^e_{j,k} = \langle f, \psi^e_{j,k}\rangle = 0$ for $e = 1,2,3$. We claim that if $Q_{j,k} \cap \gamma \neq \emptyset$, then for sufficiently large $j$ it is impossible that $w^1_{j,k} = w^2_{j,k} = w^3_{j,k} = 0$ simultaneously. Indeed, consider $Q_{j,k}$ with $j \ge T$, where, for $\alpha \ge 16$,

$$T := \max\left\{0,\ \left\lceil\log_2\left(2\alpha\pi^2 M\right)\right\rceil,\ \left\lceil\log_2\left(\rho_G(\gamma)^{-1}\right)\right\rceil\right\}, \tag{3.24}$$

so that $j$ satisfies both (3.12) and (3.15). Assume, to the contrary, that $Q_{j,k} \cap \gamma \neq \emptyset$ and that $w^1_{j,k} = w^2_{j,k} = w^3_{j,k} = 0$. Observe that, using the notation in which the scales become finer as $j \to \infty$, and writing $a_m := 2^{-j}(k_1 + m/2)$ and $b_m := 2^{-j}(k_2 + m/2)$ for $m = 0,1,2$,

$$w^1_{j,(k_1,k_2)} = 2^j\left(\iint_{[a_0,a_2]\times[b_0,b_1]} f(x,y)\,dx\,dy - \iint_{[a_0,a_2]\times[b_1,b_2]} f(x,y)\,dx\,dy\right) = 0, \tag{3.25}$$

$$w^2_{j,(k_1,k_2)} = 2^j\left(\iint_{[a_0,a_1]\times[b_0,b_2]} f(x,y)\,dx\,dy - \iint_{[a_1,a_2]\times[b_0,b_2]} f(x,y)\,dx\,dy\right) = 0, \tag{3.26}$$

$$w^3_{j,(k_1,k_2)} = 2^j\left(\iint_{[a_0,a_1]\times[b_0,b_1]} f + \iint_{[a_1,a_2]\times[b_1,b_2]} f - \iint_{[a_0,a_1]\times[b_1,b_2]} f - \iint_{[a_1,a_2]\times[b_0,b_1]} f\right) = 0, \tag{3.27}$$

where each integral in (3.27) is taken with respect to $dx\,dy$. With the notation $\{S_i\}_{i=1}^4$ introduced above (the constant $c_1$ integrates to zero against each Haar wavelet), the solution of (3.25)–(3.27) is obtained by solving the linear algebraic system

$$c_2\begin{pmatrix} -1 & -1 & 1 & 1\\ 1 & -1 & 1 & -1\\ -1 & 1 & 1 & -1 \end{pmatrix}\begin{pmatrix} S_1\\ S_2\\ S_3\\ S_4\end{pmatrix} = 0. \tag{3.28}$$

Since we may assume $c_2 \neq 0$, and the three rows of the matrix are mutually orthogonal (so the matrix has rank 3 and its null space is spanned by $(1,1,1,1)^T$), we obtain the set of solutions

$$S_1 = S_2 = S_3 = S_4. \tag{3.29}$$
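The argument above rests on two finite computations: the null space of the 3 × 4 sign matrix in (3.28), and the constants in (3.22)–(3.23). The short Python check below is a sanity check, not a substitute for the proof. It uses exact rational arithmetic for the matrix and floating point for the constants involving $\sqrt{2}$. The areas are in units where the common factor $c_2 2^j$ is dropped.

```python
import math
from fractions import Fraction

# Rows of the matrix in (3.28): the Haar sign patterns acting on (S1, S2, S3, S4).
HAAR_ROWS = [
    [-1, -1, 1, 1],   # w^1
    [1, -1, 1, -1],   # w^2
    [-1, 1, 1, -1],   # w^3
]


def dot(u, v):
    """Exact inner product of two integer or rational vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def haar_details_from_areas(areas):
    """(w1, w2, w3) up to the common factor c2 * 2**j, given (S1, S2, S3, S4)."""
    return [dot(row, areas) for row in HAAR_ROWS]


# Mutually orthogonal nonzero rows => rank 3 => one-dimensional null space.
rows_orthogonal = all(
    dot(HAAR_ROWS[a], HAAR_ROWS[b]) == 0 for a in range(3) for b in range(a + 1, 3)
)
ones_in_null_space = haar_details_from_areas([1, 1, 1, 1]) == [0, 0, 0]

# Constants of Lemma 3.6, in units of 2**(-2j).
lower_bound_s1 = 1 / 4 - math.sqrt(2) / 32   # (3.22) before rounding down to 6/32
upper_bound_si = math.sqrt(2) / 8            # (3.23)
strict_gap = lower_bound_s1 >= 6 / 32 > upper_bound_si
```

All three flags evaluate to `True`. Consequently, the only area vectors annihilated by the three Haar patterns are the constant ones, and the quadrant areas in Lemma 3.6 cannot all be equal when the center $o$ lies in the strip.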
As $T$ satisfies (3.24) for $\alpha \ge 16$, Lemma 3.6 shows that (3.29) is satisfied only if (3.19) or (3.20) holds. But this contradicts our assumption that $Q_{j,k} \cap \gamma \neq \emptyset$.

The proof of the theorem can now be completed by a simple induction. At any resolution $j \ge T$, assume that we have all nonzero wavelet coefficients $\{w^e_{j,k}\}$. For any such nonzero coefficient $w^e_{j,k}$, we sample the 12 wavelet coefficients

$$\left\{ w^e_{j+1,l} \;\middle|\; e = 1,2,3,\ l \in \{(2k_1,2k_2), (2k_1+1,2k_2), (2k_1,2k_2+1), (2k_1+1,2k_2+1)\}\right\}.$$

If, for the cube $Q_{j+1,l}$, we have $w^e_{j+1,l} = 0$ for $e = 1,2,3$, then we know that $Q_{j+1,l} \cap \gamma = \emptyset$, and therefore we do not sample any wavelet coefficients supported in this cube. Otherwise, our ACS process continues to sample wavelet coefficients in $Q_{j+1,l}$.

4. Stat-ACS: An ACS algorithm based on statistical modeling in the wavelet domain. In [33], we presented a rather simple ACS algorithm in which wavelet coefficients were sampled based on local Lipschitz smoothness estimates computed from their ancestors [23], [52]. In this work we elaborate on a more advanced, statistics-based ACS algorithm: Stat-ACS. We use information-theoretic analysis tools to estimate the statistical dependencies between a wavelet coefficient and related coefficients at adjacent spatial locations and scales. The information obtained from previously sampled coefficients is summarized in a linear predictor that predicts the significance of a coefficient that has not yet been sampled. Since the Stat-ACS algorithm operates on a relatively complex dependence network, we employ hash tables to construct efficient dynamic dictionaries. Roughly speaking, the algorithm, of order $O(n)$, captures $n$ measurements, with $n \ll N$, where $N$ is the number of pixels. The goal of the algorithm is that $n = O(k)$, where the image is assumed to be well approximated by $k$ wavelet terms.
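The goal $n = O(k)$ is modeled on the acquisition rule used to complete the proof of Theorem 3.1. That rule samples the 12 children of every cube with a nonzero coefficient and stops below cubes whose three coefficients vanish. The Python sketch below simulates the rule on a discrete image. It is an illustrative discrete analogue, not the continuous setting of the theorem, and its function names are hypothetical. It uses unnormalized Haar details built from quadrant sums with the sign patterns of (3.25)–(3.27), reads them from the image on demand, and compares the nonzero coefficients it finds with those of the full transform. For an image with one straight vertical edge, every cube crossing the edge has a nonzero $w^2$ (left minus right), so the adaptive process finds every nonzero coefficient.

```python
def haar_details(img, x0, y0, size):
    """Unnormalized Haar details (w1, w2, w3) of the square [x0, x0+size) x [y0, y0+size).

    img[y][x] holds pixel values. Sign patterns follow (3.25)-(3.27):
    w1 = lower minus upper half (in y), w2 = left minus right half, w3 = diagonal pattern.
    """
    h = size // 2

    def quad(xs, ys):
        return sum(img[y][x] for y in range(ys, ys + h) for x in range(xs, xs + h))

    ll, lr = quad(x0, y0), quad(x0 + h, y0)          # lower-left, lower-right
    ul, ur = quad(x0, y0 + h), quad(x0 + h, y0 + h)  # upper-left, upper-right
    return (ll + lr - ul - ur, ll + ul - lr - ur, ll + ur - ul - lr)


def full_haar(img):
    """Details of every square of side n, n/2, ..., 2 (the nonadaptive benchmark)."""
    n, out, size = len(img), {}, len(img)
    while size >= 2:
        for y0 in range(0, n, size):
            for x0 in range(0, n, size):
                out[(size, x0, y0)] = haar_details(img, x0, y0, size)
        size //= 2
    return out


def adaptive_haar(img):
    """ACS process from the proof of Theorem 3.1: descend only below nonzero cubes.

    Each visited child cube costs 3 samples, so a nonzero parent triggers 12 samples.
    """
    n = len(img)
    root = (n, 0, 0)
    sampled = {root: haar_details(img, 0, 0, n)}
    frontier = [root] if any(sampled[root]) else []
    while frontier:
        next_frontier = []
        for size, x0, y0 in frontier:
            h = size // 2
            if h < 2:  # children would be single pixels, which carry no details
                continue
            for dy in (0, h):
                for dx in (0, h):
                    key = (h, x0 + dx, y0 + dy)
                    sampled[key] = haar_details(img, x0 + dx, y0 + dy, h)
                    if any(sampled[key]):
                        next_frontier.append(key)
        frontier = next_frontier
    return sampled


# A 32 x 32 "cartoon": value 0 left of column 13 and 1 from column 13 on.
edge_img = [[1 if x >= 13 else 0 for x in range(32)] for _ in range(32)]
full = full_haar(edge_img)
adaptive = adaptive_haar(edge_img)
```

Here the adaptive process visits 61 of the 341 cubes (3 coefficients each) and still recovers all 31 cubes with nonzero coefficients. For curved edges, this discrete analogue carries no such guarantee. Theorem 3.1 asserts it only from scale $T$ on, and only in the continuous setting.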
While this goal is motivated by the cartoon model of section 3, it is certainly not guaranteed. The Stat-ACS algorithm is more complex than the adaptive algorithm in [33]; its scheme increases the probability that significant wavelet coefficients are sampled rather than "missed." Thus, the Stat-ACS algorithm offers a tradeoff between higher algorithmic complexity and higher efficiency in recovering edges and other important features of the image.

4.1. Wavelet statistical models. Our ACS algorithm relies heavily on statistical modeling of images, as used in tasks such as compression [8], [25], [29], [49], denoising [29], [71], and classification [24]. In most of these works, the wavelet coefficients of natural images are modeled by generalized Gaussian marginal statistics. The probability density function (PDF) of the generalized Gaussian density is defined as

$$p(x;\alpha,\beta) = \frac{\beta}{2\alpha\Gamma(1/\beta)}\,e^{-(|x|/\alpha)^\beta}, \tag{4.1}$$

where $\alpha$ models the width of the PDF peak, $\beta$ is inversely proportional to the decreasing rate of the peak, and $\Gamma(z) = \int_0^\infty e^{-t}t^{z-1}\,dt$, $z > 0$, is the Gamma function. These works also reveal that the wavelet coefficients of images are not statistically independent, mainly because of the geometric properties of edge singularities. Thus, at least for natural images, there are significant dependencies between the magnitudes of pairs of wavelet coefficients that correspond to adjacent spatial locations, orientations, and scales.

Various statistical models have been developed to capture the dependencies between wavelet coefficients. Image compression schemes that exploit spatial and scale-to-scale dependencies have proved extremely effective: a large-magnitude wavelet coefficient may indicate that a neighboring coefficient in the same subband or at the next finer scale also has a large magnitude.
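Model (4.1) can be evaluated directly with the Gamma function from Python's standard library. The sketch below uses hypothetical helper names. It evaluates the generalized Gaussian density and checks its normalization numerically with a composite trapezoidal rule. It also illustrates two special cases: $\beta = 2$ gives a Gaussian with variance $\alpha^2/2$, and $\beta = 1$ gives a Laplacian.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density p(x; alpha, beta) of (4.1)."""
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-(abs(x) / alpha) ** beta)


def trapezoid_integral(f, a, b, steps):
    """Composite trapezoidal rule for f on [a, b] with equally spaced nodes."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


def normal_pdf(x, sigma):
    """Zero-mean Gaussian density, used to check the beta = 2 special case."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
```

For example, `ggd_pdf(x, math.sqrt(2.0), 2.0)` coincides with the standard normal density, and `ggd_pdf(0.0, 1.0, 1.0)` equals 0.5, the peak of the unit Laplacian.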
The conditional distribution of a coefficient, given its coarser-scale ancestors, was exploited by the embedded zero-tree wavelet (EZW) coder [69] to encode entire trees of insignificant coefficients with a single symbol. A predictive method was used in [66] to obtain high-quality zero-tree coding results. Several studies [60], [59], [63] predict blocks of fine coefficients from blocks of coarse coefficients using vectorized look-up tables. Adaptive entropy coding was used in [67] to capture the conditional statistics of coefficients, based on the most significant bits of each of the eight spatial neighbors and of the coefficients at the coarser scale. Switching between multiple probability models, depending on the values of neighboring coefficients, was used in [25]. The EZW coder was extended in [74] to use local coefficient contexts. The encoding of classification maps of regions of coefficients, based on the classes of the left and parent regions, was described in [46]. The hidden Markov tree (HMT) model [29] also captures the dependencies between a parent and its children.

In this work, the statistical correlation between an unsampled wavelet coefficient and neighbors that have already been sampled is at the heart of our adaptive sensing algorithms. We highlight some important wavelet coefficient dependencies in Figure 6: $X$ is a wavelet coefficient, $P$ is its parent at the coarser scale, and $NX$ is a predefined set of adjacent coefficients of $X$ in the same subband.

Figure 6. Set of conditioning candidates for a given coefficient X (panels: coarser scale, level j + 1; finer scale, level j).

Recall that the mutual information between two random variables $X$ and $Y$ with joint density $p(x,y)$ and marginal densities $p(x)$ and $p(y)$ is defined [68] as

$$I(X;Y) = \iint p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy. \tag{4.2}$$

The mutual information $I(X;Y)$ can be interpreted as the amount of information $Y$ conveys about $X$, and vice versa. A higher value of $I(X;Y)$ implies a stronger dependency between $X$ and $Y$, and thus it is easier to estimate $X$ from $Y$. In practice, the random variables are discrete, and in our case the PDFs $p(x,y)$, $p(x)$, and $p(y)$ are unknown and must be estimated from empirical data. Studies of this problem resort to nonparametric histogram estimation of the PDFs [8], [58], [70]. The histograms are usually constructed as follows: the ranges of $X$ and $Y$ are partitioned into $N_X$ and $N_Y$ intervals, respectively. The histogram of $(X,Y)$ obtained from the empirical data is denoted by $\{p_{X,Y}(i,j) \mid 1 \le i \le N_X,\ 1 \le j \le N_Y\}$ and approximates the PDF $p(x,y)$. The marginal distributions $p(x)$ and $p(y)$ are estimated similarly.

Consider a pair of wavelet coefficients $X$ and $Y$. Based on the estimated discrete probability distribution, we use the following estimator for the mutual information [55]:

$$\hat I(X;Y) = \sum_i\sum_j p_{X,Y}(i,j)\log\frac{p_{X,Y}(i,j)}{p_X(i)\,p_Y(j)} = \sum_{i,j}\frac{k_{ij}}{N}\log\frac{k_{ij}\,N}{k_i\,k_j}, \tag{4.3}$$

where $k_{ij}/N$ replaces $p_{X,Y}(i,j)$, $k_{ij}$ is the number of coefficient pairs in the joint histogram cell $(i,j)$, $N$ here denotes the total number of coefficient pairs, and $k_i = \sum_j k_{ij}$ and $k_j = \sum_i k_{ij}$ are the marginal histogram estimates.

Tables 1–3 show averaged mutual information results for a wavelet coefficient (we use the (9,7) wavelet basis) modeled as dependent on a single other coefficient. The subscript gives the location of a neighbor relative to $X$. For example, $NX_{-1,0}$ and $NX_{0,-1}$ correspond to the immediate upper and left-hand neighbors of $X$. Similar results are obtained on a variety of different test images. We measured and compared the mutual information with a single neighbor (scale, location, subband).
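A minimal Python sketch of the histogram estimator (4.3) follows. Two choices here are assumptions, since the text fixes neither: equal-width binning over the observed range, and the natural logarithm. The variable `total` plays the role of $N$ in (4.3), the number of coefficient pairs. In the setting of Tables 1–3, `xs` would hold the values of coefficient $X$ and `ys` those of one fixed relative neighbor, e.g., the parent $P$.

```python
import math
from collections import Counter


def bin_index(v, lo, hi, nbins):
    """Equal-width bin of v in [lo, hi]; the maximum value falls into the last bin."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * nbins), nbins - 1)


def mutual_information(xs, ys, nx=16, ny=16):
    """Histogram estimate (4.3): sum over cells of k_ij/N * log(k_ij * N / (k_i * k_j))."""
    total = len(xs)
    lox, hix = min(xs), max(xs)
    loy, hiy = min(ys), max(ys)
    cells = Counter(
        (bin_index(x, lox, hix, nx), bin_index(y, loy, hiy, ny)) for x, y in zip(xs, ys)
    )
    k_i, k_j = Counter(), Counter()  # marginal counts
    for (i, j), k in cells.items():
        k_i[i] += k
        k_j[j] += k
    return sum(
        k / total * math.log(k * total / (k_i[i] * k_j[j])) for (i, j), k in cells.items()
    )
```

Two limiting cases illustrate the scale. Two identical, equiprobable binary sequences give $\log 2$. Balanced, independent counts give 0.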
Roughly speaking, across the three subbands, the parent and the horizontal and vertical neighbors exhibit higher mutual information than the diagonal neighbors. We conclude this section by noting that recent studies have adopted mutual information estimation for an efficient characterization of statistical dependencies in an overcomplete wavelet representation [45] and in multiscale directional transforms such as contourlets [62] and curvelets [7].

Table 1
Mutual information for scale = 3, horizontal subband.

| $\hat I$ (mutual information) | Lena | Peppers | Barbara |
|---|---|---|---|
| $\hat I(X; P)$ | 0.109 | 0.12 | 0.08 |
| $\hat I(X; NX_{-1,-1})$ | 0.07 | 0.05 | 0.08 |
| $\hat I(X; NX_{-1,0})$ | 0.103 | 0.08 | 0.123 |
| $\hat I(X; NX_{-1,1})$ | 0.09 | 0.05 | 0.095 |
| $\hat I(X; NX_{0,-1})$ | 0.119 | 0.115 | 0.132 |
| $\hat I(X; NX_{0,1})$ | 0.1208 | 0.114 | 0.13 |
| $\hat I(X; NX_{1,-1})$ | 0.09 | 0.04 | 0.09 |
| $\hat I(X; NX_{1,0})$ | 0.1 | 0.07 | 0.11 |
| $\hat I(X; NX_{1,1})$ | 0.07 | 0.12 | 0.07 |

Table 2
Mutual information for scale = 3, vertical subband.

| $\hat I$ (mutual information) | Lena | Peppers | Barbara |
|---|---|---|---|
| $\hat I(X; P)$ | 0.127 | 0.141 | 0.07 |
| $\hat I(X; NX_{-1,-1})$ | 0.088 | 0.05 | 0.1 |
| $\hat I(X; NX_{-1,0})$ | 0.197 | 0.139 | 0.22 |
| $\hat I(X; NX_{-1,1})$ | 0.106 | 0.07 | 0.11 |
| $\hat I(X; NX_{0,-1})$ | 0.109 | 0.08 | 0.12 |
| $\hat I(X; NX_{0,1})$ | 0.107 | 0.08 | 0.125 |
| $\hat I(X; NX_{1,-1})$ | 0.109 | 0.06 | 0.11 |
| $\hat I(X; NX_{1,0})$ | 0.199 | 0.14 | 0.22 |
| $\hat I(X; NX_{1,1})$ | 0.087 | 0.05 | 0.1 |

Table 3
Mutual information for scale = 3, diagonal subband.

| $\hat I$ (mutual information) | Lena | Peppers | Barbara |
|---|---|---|---|
| $\hat I(X; P)$ | 0.107 | 0.07 | 0.06 |
| $\hat I(X; NX_{-1,-1})$ | 0.075 | 0.02 | 0.12 |
| $\hat I(X; NX_{-1,0})$ | 0.12 | 0.05 | 0.18 |
| $\hat I(X; NX_{-1,1})$ | 0.10 | 0.03 | 0.124 |
| $\hat I(X; NX_{0,-1})$ | 0.101 | 0.05 | 0.147 |
| $\hat I(X; NX_{0,1})$ | 0.101 | 0.05 | 0.145 |
| $\hat I(X; NX_{1,-1})$ | 0.108 | 0.03 | 0.125 |
| $\hat I(X; NX_{1,0})$ | 0.122 | 0.04 | 0.176 |
| $\hat I(X; NX_{1,1})$ | 0.07 | 0.03 | 0.124 |

4.2. The Stat-ACS Algorithm. The linear predictor.
To build the statistics of wavelet coefficients revealed by the mutual information into the Stat-ACS algorithm, we use a predictor for the absolute value of a coefficient that has not yet been sampled. It is based on the absolute values of coefficients that have already been sampled (or have been determined to be insignificant and not sampled). In our experiments we used the linear predictor

$$\operatorname{Predicted}(|X|) = \begin{cases} 0.5|P| + 0.28|NX_{0,-1}| + 0.22|NX_{-1,0}|, & X \text{ in the horizontal or diagonal subband},\\ 0.5|P| + 0.28|NX_{-1,0}| + 0.22|NX_{0,-1}|, & X \text{ in the vertical subband}, \end{cases} \tag{4.4}$$

where $NX_{0,-1}$ and $NX_{-1,0}$ are the left-hand and upper neighbors and $P$ is the parent of the unsampled coefficient $X$. The fixed set of weights in (4.4) was "learned" from a collection of standard test images via a simple least squares minimization.

We now describe in detail how the joint statistical model is used in the Stat-ACS algorithm. We begin with some notation. Given an arbitrary wavelet coefficient with index $(e, j+1, (k_1,k_2))$ and label "P" (for parent), its four children at the next finer resolution have the indices $(e, j, (2k_1, 2k_2))$, $(e, j, (2k_1+1, 2k_2))$, $(e, j, (2k_1, 2k_2+1))$, and $(e, j, (2k_1+1, 2k_2+1))$. We label the indices $(e, j, (2k_1, 2k_2+1))$ and $(e, j, (2k_1+1, 2k_2))$ as "L" (for left) and "U" (for up), respectively. In this algorithm, the indices of the sampled coefficients found to be significant are placed in a processing queue together with their labels: parent, left, up, or right.
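Predictor (4.4) is a fixed linear combination of three magnitudes, with the roles of the left and upper neighbors swapped in the vertical subband. A direct Python transcription follows. The subband strings and the helper names are hypothetical, and so is the convention that a neighbor not yet sampled is passed as 0.0 (mirroring step 3(b) of the Stat-ACS pseudocode, where missing values are set to zero).

```python
# Weights of (4.4): parent, most informative neighbor, other neighbor.
W_PARENT, W_FIRST, W_SECOND = 0.5, 0.28, 0.22


def predicted_magnitude(parent, left, up, subband):
    """Predicted |X| from already sampled values, per (4.4).

    parent, left, up: values of P, NX_{0,-1}, NX_{-1,0} (pass 0.0 if not sampled).
    subband: "horizontal", "vertical", or "diagonal".
    """
    if subband in ("horizontal", "diagonal"):
        first, second = left, up
    elif subband == "vertical":
        first, second = up, left
    else:
        raise ValueError(f"unknown subband: {subband!r}")
    return W_PARENT * abs(parent) + W_FIRST * abs(first) + W_SECOND * abs(second)


def is_predicted_significant(parent, left, up, subband, threshold):
    """Sample X only if its predicted magnitude reaches the threshold."""
    return predicted_magnitude(parent, left, up, subband) >= threshold
```

For instance, with $|P| = 1$, a left neighbor of magnitude 1, and no upper neighbor, the prediction is 0.78 in the horizontal subband but 0.72 in the vertical subband.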
The labels given to each coefficient describe the relation between the input and output indices of the linear predictor. A label "P" indicates that the linear predictor estimates the significance of the coefficient's four children at the next finer resolution. A label "L" indicates that it estimates the right-hand neighbor of that coefficient, and a label "U" indicates that it estimates the lower spatial neighbor of that coefficient. Note that a coefficient can have more than one label. The index and value of the coefficient at the head of the processing queue are passed as input to the linear predictor (4.4), which assesses whether the related unsampled coefficients are potentially significant. As in existing wavelet compression and denoising algorithms, the significance of a (predicted) coefficient is determined by comparing its (predicted) absolute value with a predetermined threshold. The threshold parameter is therefore critical, since it controls the rate/distortion tradeoff of the algorithm: a smaller threshold leads to more samples, revealing more significant coefficients. Once the prediction asserts that an unsampled coefficient is significant with high probability, the coefficient is sampled by the sensing apparatus. The algorithm scans each subband so that the most informative spatial neighbor can be used in the processing queue. Thus, in the horizontal subband, where the left neighbor conveys the most information, the rows are scanned first (left-to-right order). In the vertical and diagonal subbands, the columns are scanned first (top-to-bottom order), since the upper coefficient is the most informative.

The hash table.
Incorporating a contextual model into the Stat-ACS algorithm, which operates simultaneously on the information of several coefficients (parent, left, and up neighbors), requires a "sparse" data structure that supports fast "find" and "add" operations. This is essential for ensuring that our ACS acquisition algorithm has complexity $O(n)$, where $n$ is the number of measurements, with $n \ll N$ ($N$ is the number of pixels in the image). To this end we use a hash table [28], which performs find and add operations in $O(1)$ average time. During the adaptive sampling process, we store the sampled coefficients in the hash table and later retrieve their values to predict the significance of unsampled coefficients. We summarize the algorithm in the pseudocode below.

Stat-ACS acquisition algorithm.
1. Acquire the values of all low-resolution coefficients up to a certain low resolution $J$; each coefficient is computed using a fixed number of DMD array measurements as in (2.6). The initial resolution $J$ can be selected as $\log_2(N)/2 - \text{const}$; in any case, $J$ should be larger for larger images. For images of size 512 × 512, we used $J = 4$. Note that the total number of coefficients at resolutions $\ge J$ is $2^{2(1-J)}N$, which is a small fraction of $N$.
2. Initialize a processing queue $P(q)$ containing the indices of all wavelet coefficients at resolution $J$ whose absolute value is bigger than a given threshold $T_{\text{thresh}}$, and label them as parents, "P."
3. Until the queue $P(q)$ is exhausted, pop the coefficient at the head of the queue and process it as follows:
   (a) If it is labeled "P" with index $(e,j,k)$ and its children have not been sampled yet:
      (i) Acquire them now.
      (ii) If the child with index $(e, j-1, (2k_1, 2k_2+1))$ is significant, add its index to the head of $P(q)$, labeled "L."
   (b) If it is in the horizontal subband and labeled "L" as the left neighbor of a coefficient $X$, find and read from the hash table the values of "P," the parent of $X$, and of "U," the upper neighbor of $X$; these values are set to zero if the coefficients are not in the hash table. Fix the input vector $Q = (\text{value}(\text{"P"}), \text{value}(\text{"L"}), \text{value}(\text{"U"}))$ for the linear predictor (4.4). If $|\operatorname{Predicted}(|X|)| \ge T_{\text{thresh}}$, do the following:
      (i) Sample the group of 4 coefficients to which $X$ belongs, i.e., those having the same parent $(e, j, k)$.
      (ii) If a coefficient is significant and at resolution $j > 1$, add its index to the end of the queue, labeled "P."
      (iii) If the coefficient with index $(e, j-1, (2k_1, 2k_2+1))$ is significant, add it to the head of the queue, labeled "L."
      (This step is similar for coefficients from the other 2 subbands.)
   (c) Store the values of all sampled coefficients in the hash table.

4.3. Stat-ACS: Experimental results. A good way to evaluate the effectiveness of our approach is to benchmark it against the optimal $n$-term wavelet approximation. It is well known [34] that, for a given image with $N$ pixels, the optimal orthonormal wavelet approximation using only $n$ coefficients is obtained from the $n$ largest coefficients:

$$\left|\langle f, \psi^{e_1}_{j_1,k_1}\rangle\right| \ge \left|\langle f, \psi^{e_2}_{j_2,k_2}\rangle\right| \ge \left|\langle f, \psi^{e_3}_{j_3,k_3}\rangle\right| \ge \cdots,$$

$$\left\| f - \sum_{i=1}^{n}\langle f, \psi^{e_i}_{j_i,k_i}\rangle\,\psi^{e_i}_{j_i,k_i}\right\|_{L_2(\mathbb{R}^2)} = \min_{\#\Lambda = n}\left\| f - \sum_{(e,j,k)\in\Lambda}\langle f, \psi^e_{j,k}\rangle\,\psi^e_{j,k}\right\|_{L_2(\mathbb{R}^2)}.$$

For biorthogonal wavelets this "greedy" approach gives a near-best result, i.e., one within a constant factor of the optimal $n$-term approximation. Obviously, to find the near-best $n$-term approximation one needs to compute all the wavelet coefficients and then select the $n$ coefficients with the largest absolute values.
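The contrast between the Stat-ACS pseudocode and the $n$-term benchmark can be made concrete with a simplified Python sketch. It follows the simulation protocol described in this section: coefficients are precomputed but read only when requested. This is an illustration under stated assumptions, not the authors' implementation:
- a single subband, with indices `(j, k1, k2)` and `j = 1` the finest level;
- a Python `dict` as the hash table and a `collections.deque` as the processing queue;
- the horizontal-subband form of (4.4) as the predictor;
- significant children of a "P" coefficient are also enqueued as "P" so that the process descends the tree.

The last point is an assumption of this sketch; step 3(a) does not state it. `best_n_term` stands for the benchmark, which must read every coefficient.

```python
import heapq
from collections import deque

WEIGHTS = (0.5, 0.28, 0.22)  # parent, left, up: horizontal-subband form of (4.4)


def stat_acs_sketch(coeffs, coarsest, thresh):
    """Simplified single-subband Stat-ACS acquisition.

    coeffs: dict (j, k1, k2) -> precomputed coefficient, j = 1 (finest) .. coarsest.
    Returns the hash table {index: value} of the coefficients actually read.
    """
    table = {}
    queue = deque()

    def sample(idx):
        # One "measurement": read a precomputed coefficient only when requested.
        table[idx] = coeffs[idx]
        return table[idx]

    def acquire_group(j, p1, p2):
        # Sample the 4 coefficients at level j that share the parent (j + 1, p1, p2).
        for a in (0, 1):
            for b in (0, 1):
                child = (j, 2 * p1 + a, 2 * p2 + b)
                if child in table or child not in coeffs:
                    continue
                if abs(sample(child)) >= thresh:
                    if j > 1:
                        queue.append((child, "P"))      # assumption: descend later
                    if b == 1:
                        queue.appendleft((child, "L"))  # predict its right neighbor next

    # Steps 1-2: read the coarsest level and queue its significant coefficients as "P".
    for idx in sorted(i for i in coeffs if i[0] == coarsest):
        if abs(sample(idx)) >= thresh:
            queue.append((idx, "P"))

    # Step 3: process the queue until it is exhausted.
    while queue:
        (j, k1, k2), label = queue.popleft()
        if label == "P" and j > 1:
            acquire_group(j - 1, k1, k2)
        elif label == "L":
            x = (j, k1, k2 + 1)  # right-hand neighbor of the "L" coefficient
            if x in coeffs and x not in table:
                wp, wl, wu = WEIGHTS
                pred = (wp * abs(table.get((j + 1, k1 // 2, (k2 + 1) // 2), 0.0))
                        + wl * abs(table[(j, k1, k2)])
                        + wu * abs(table.get((j, k1 - 1, k2 + 1), 0.0)))
                if pred >= thresh:
                    acquire_group(j, k1 // 2, (k2 + 1) // 2)
    return table


def best_n_term(coeffs, n):
    """Near-best n-term benchmark: it must read every coefficient (order N work)."""
    return dict(heapq.nlargest(n, coeffs.items(), key=lambda kv: abs(kv[1])))


# Synthetic 3-level subband (sides 2, 4, 8), zero except for a few entries.
demo = {(j, k1, k2): 0.0 for j in (1, 2, 3)
        for k1 in range(2 ** (4 - j)) for k2 in range(2 ** (4 - j))}
demo.update({(3, 0, 0): 1.0, (3, 0, 1): 0.45, (2, 0, 1): 1.0, (2, 0, 2): 1.0})
sampled = stat_acs_sketch(demo, coarsest=3, thresh=0.5)
```

On this toy example the sketch reads 20 of the 84 coefficients. The entry (2, 0, 2) has an insignificant parent (3, 0, 1), yet it is still found through the "L" prediction $0.5\cdot 0.45 + 0.28\cdot 1.0 = 0.505$.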
Any other thresholding method would likewise require computing every coefficient and testing whether its absolute value exceeds a certain threshold. Thus, any thresholding method requires order $N$ computations, whereas our ACS algorithm is output sensitive and requires only order $n$ computations. To simulate our algorithm, we first precompute the entire wavelet transform of a given image. However, we strictly follow the recipe of our ACS algorithm and extract a wavelet coefficient from the precomputed coefficient matrix only if its index was added to the adaptive sampling queue.

In Figure 7(a) we see a "benchmark" near-best 7000-term approximation of the image Lena using a biorthogonal (9,7) wavelet, which achieves a peak signal-to-noise ratio (PSNR) of 31.6. In Figure 7(b) we see our Stat-ACS result with $n = 14000$ and PSNR = 30.26. Here, $J = 4$, and the thresholds $T_{\text{thresh}}$ were 0.28 for $j = 1$, 0.24 for $j = 2$, and 0.2 for $j = 3$. Thus we sampled twice as many coefficients as the sparse representation to which we compare. Typically, in an application where the output is the compressed image, the 14000 sampled wavelet coefficients (Figure 7(b)) would be pruned: insignificant coefficients are removed with minimal impact on the PSNR, and the remaining coefficients are compressed using a tree-based wavelet compression algorithm [66], [69]. Indeed, in Figures 8(a) and 8(b) we see a 5000-term and an 8000-term approximation, with PSNR = 29.35 and PSNR = 30.1, respectively, extracted from the Stat-ACS approximation of Figure 7(b).

(a) Benchmark, n = 262144, 7000-term, PSNR = 31.6. (b) Stat-ACS, n = 14000, 14000-term, PSNR = 30.26.
Figure 7. Comparison of the Stat-ACS algorithm with the optimal benchmark on Lena.

(a) Stat-ACS, n = 14000, 5000-term, PSNR = 29.35. (b) Stat-ACS, n = 14000, 8000-term, PSNR = 30.1.
Figure 8.
Adaptive sampling with Stat-ACS with postprocessing pruning of "insignificant" coefficients.

(a) Benchmark, n = 262144, 7000-term, PSNR = 31.6. (b) Stat-ACS, n = 14000, 14000-term, PSNR = 30.26.
Figure 9. Zoom-in from Figure 7 on Lena's shoulder.

Figure 9 shows zoomed-in edge regions from Figure 7. As Figure 9(b) illustrates, the edges were recovered quite well and are comparable to those of the near-best 7000-term approximation. In Figure 10(a) we see a benchmark 7000-term biorthogonal (9,7) wavelet approximation of the Peppers image, with PSNR = 30.8. In Figure 10(b) we see our Stat-ACS result with $n = 14000$, using an 8000-term approximation extracted from the 14000 adaptively sampled wavelet coefficients, with PSNR = 29.17.

4.4. Stat-ACS: Rate/distortion comparison to nonadaptive CS algorithms. In this section we compare the performance of our Stat-ACS algorithm with that of some prominent nonadaptive CS measurement schemes, such as the Scrambled Fourier Ensemble (SFE) [11] and the Scrambled Block Hadamard Ensemble (SBHE) [40]. The performance of the SFE and the SBHE was evaluated using the following reconstruction algorithms: min-TV, an $\ell_1$ optimization solver, and iterative thresholding. Table 4 lists the PSNR values for three 256 × 256 natural images: Lena, Cameraman, and Peppers. Results from [13], where random Fourier sampling matrices were applied directly in the wavelet domain, are also included. The best result for each image and each number of measurements is given in boldface.

Model-based CS methods [2], [44], [48] use wavelet representation modeling in an implicit way. The CS acquisition process is still pseudorandom, but the recovery algorithms integrate assumptions on the wavelet structure, with the goal of faster and more accurate recovery in the presence of noise.
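The two error measures quoted in this section can be computed as follows. This sketch assumes the standard definitions and 8-bit images with peak value 255; the text does not state the peak value used for PSNR. Images are plain lists of rows, and the function names are illustrative.

```python
import math


def rmse(reference, approx):
    """Root mean square error between two equally sized images (lists of rows)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, approx) for a, b in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


def psnr(reference, approx, peak=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(peak / RMSE)."""
    err = rmse(reference, approx)
    return math.inf if err == 0 else 20.0 * math.log10(peak / err)
```

Since $20\log_{10}(\text{peak}/\text{RMSE}) = 10\log_{10}(\text{peak}^2/\text{MSE})$, either form may be used. An RMSE equal to 1% of the peak corresponds to 40 dB.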
Next, we compare the performance of our Stat-ACS method with that of these methods. Table 5 shows recovery results for the 128 × 128 test image Peppers (used in [2]) from 5000 samples. We ran the Stat-ACS algorithm twice: with the Daubechies orthonormal basis D8 used in [2], and with our preferred (9,7) basis. The results are reported as root mean square error (RMSE) to match the results reported in the references.

(a) Benchmark, n = 262144, 7000-term, PSNR = 30.8. (b) Stat-ACS, n = 14000, 8000-term, PSNR = 29.17.
Figure 10. Comparison of the Stat-ACS algorithm with the optimal benchmark on Peppers.

Table 4
Comparison of Stat-ACS with nonadaptive CS algorithms (PSNR values).

| Image | n | [13] | ℓ1 optimization, SFE | ℓ1 optimization, SBHE | min-TV optimization, SFE | min-TV optimization, SBHE | Iterative thresholding, SFE | Iterative thresholding, SBHE | Stat-ACS |
|---|---|---|---|---|---|---|---|---|---|
| Lena (256 × 256) | 10000 | 26.5 | 21.5 | 21.1 | 27.5 | 28.0 | 27.2 | 27.7 | **30.83** |
| Lena (256 × 256) | 15000 | 28.7 | 23.9 | 23.7 | 29.7 | 30.0 | 30.4 | 30.6 | **33.9** |
| Cameraman (256 × 256) | 10000 | 26.2 | 20.9 | 20.7 | 27.0 | 27.1 | 26.5 | 26.8 | **28.57** |
| Cameraman (256 × 256) | 15000 | 28.7 | 23.2 | 23.0 | 29.3 | 29.5 | 29.5 | 29.7 | **31.22** |
| Peppers (256 × 256) | 10000 | 26.7 | 21.6 | 21.4 | 28.6 | 29.0 | 28.1 | 28.2 | **30.75** |
| Peppers (256 × 256) | 15000 | 25.3 | 22.7 | 22.6 | 31.2 | 31.7 | 31.2 | 31.6 | **34.14** |

5. Tex-ACS: An ACS algorithm based on a cartoon/texture model. In this section we elaborate on an ACS algorithm that relies on a model of images as a mix of cartoon and local texture patches. While edges are somewhat structured, the textural parts are more difficult and elusive to capture, though they are easily perceived visually. In general, texture is regarded as a function of the spatial variation in pixel intensities [21]. In the wavelet representation, the significant information of texture typically manifests at the higher-frequency scales, i.e., the finer resolutions with a smaller index $j$.
The Stat-ACS algorithm of the previous section relies on a cartoon model of the image and is not equipped to predict, from the coefficients at lower resolutions, the significant high-frequency wavelet coefficients that correspond to texture information. Thus, our aim is to identify the local texture areas and to sample the wavelet coefficients of corresponding frequencies that are supported in these domains. We address this problem by sampling from a dictionary composed of wavelets and local Fourier-window components over tiles of the image. The role of the local Fourier sampling is to identify the texture information. Once we identify textural areas, we use the correlation between Fourier frequency components and wavelet coefficients with coinciding frequencies to subsequently sample the wavelet coefficients whose support overlaps these areas. In the experimental results, the suggested approach yields a significant improvement in extracting the texture parts.

Table 5
Comparison of Stat-ACS with nonadaptive model-based methods for 128 × 128 Peppers.

| Method | RMSE for 5000 samples |
|---|---|
| CoSaMP [57] | 22.8 |
| Model-based [2] (D8 wavelet basis) | 11.1 |
| Stat-ACS (D8 wavelet basis) | 9.4 |
| Stat-ACS ((9,7) wavelet basis) | 7.38 |

5.1. Texture and wavelets in the frequency domain. We sketch, very roughly, the properties of wavelets from a Fourier-side perspective (for a more detailed analysis, see, e.g., [31], [52]). In particular, we focus on the (9,7) biorthogonal wavelet used in our experiments. Let $\varphi, \psi$ and $\tilde\varphi, \tilde\psi$ be two dual pairs of scaling functions and wavelets that generate biorthogonal wavelet bases of $L_2(\mathbb{R})$. Recall that the bivariate wavelet bases we use are created by a tensor product, yielding three types of dual wavelets:

$$\tilde\psi^1(x) = \tilde\varphi(x_1)\tilde\psi(x_2), \quad \tilde\psi^2(x) = \tilde\psi(x_1)\tilde\varphi(x_2), \quad \tilde\psi^3(x) = \tilde\psi(x_1)\tilde\psi(x_2). \tag{5.1}$$

Denote

$$\tilde\psi^e_{j,k}(x) = 2^{-j}\,\tilde\psi^e\left(2^{-j}x - k\right), \quad e = 1,2,3,\ j \in \mathbb{Z},\ k \in \mathbb{Z}^2. \tag{5.2}$$

In the Fourier domain, the separable wavelets from (5.1) become

$$\hat{\tilde\psi}{}^1(\omega_1,\omega_2) = \hat{\tilde\varphi}(\omega_1)\hat{\tilde\psi}(\omega_2), \quad \hat{\tilde\psi}{}^2(\omega_1,\omega_2) = \hat{\tilde\psi}(\omega_1)\hat{\tilde\varphi}(\omega_2), \quad \hat{\tilde\psi}{}^3(\omega_1,\omega_2) = \hat{\tilde\psi}(\omega_1)\hat{\tilde\psi}(\omega_2). \tag{5.3}$$

Roughly speaking, $|\hat{\tilde\varphi}|$ is concentrated around the origin and $|\hat{\tilde\psi}|$ away from the origin. Therefore, by (5.3):
- $|\hat{\tilde\psi}{}^1(\omega_1,\omega_2)|$ is large at low horizontal frequencies $\omega_1$ and high vertical frequencies $\omega_2$;
- $|\hat{\tilde\psi}{}^2(\omega_1,\omega_2)|$ is large at high horizontal frequencies $\omega_1$ and low vertical frequencies $\omega_2$;
- $|\hat{\tilde\psi}{}^3(\omega_1,\omega_2)|$ is large at high horizontal and vertical frequencies.

From the dominant frequencies of the wavelets at scale $j = 0$, one obtains the corresponding dominant frequencies at scale $j$ through the relation

$$\left|\hat\psi^e_{j,k}(\omega)\right| = 2^j\left|\hat\psi^e(2^j\omega)\right|. \tag{5.4}$$

Recall the Plancherel formula

$$\langle f, \tilde\psi^e_{j,k}\rangle = \frac{1}{2\pi}\left\langle \hat f, \hat{\tilde\psi}{}^e_{j,k}\right\rangle. \tag{5.5}$$

We wish to examine the amount of oscillatory activity exhibited by the image $f$ over an arbitrary local area defined by a cube $Q$:

$$f_Q(x_1,x_2) = \begin{cases} f(x_1,x_2), & (x_1,x_2) \in Q,\\ 0, & \text{otherwise}. \end{cases} \tag{5.6}$$

Intensive oscillatory activity may indicate that $Q$ contains texture and other important features at fine scales. If the support of the wavelet $\tilde\psi^e_{j,k}$ satisfies $\operatorname{supp}(\tilde\psi^e_{j,k}) \subseteq Q$, then

$$\langle f, \tilde\psi^e_{j,k}\rangle = \langle f_Q, \tilde\psi^e_{j,k}\rangle = \frac{1}{2\pi}\left\langle \hat f_Q, \hat{\tilde\psi}{}^e_{j,k}\right\rangle. \tag{5.7}$$

Although the (9,7) wavelet is not compactly supported in the frequency domain, it is "essentially" compactly supported there, and by (5.4) its dilates have "peaks" at the corresponding frequencies. Thus, if $f$ locally contains the corresponding frequencies, we should expect some of the coefficients of wavelets whose essential frequency support lies around those frequencies to have large absolute values. In practice, we estimate the "energy" of the significant local frequencies by sampling discrete cosine transform (DCT) coefficients.
The DCT decomposes a local $\tilde N\times\tilde N$ block of an image, $[x_1,x_1+\tilde N)\times[x_2,x_2+\tilde N)$, in the frequency domain:

$$\hat f(k_1,k_2)=\sum_{n_1=0}^{\tilde N-1}\sum_{n_2=0}^{\tilde N-1} f(x_1+n_1,x_2+n_2)\cos\!\left[\frac{\pi}{\tilde N}\left(n_1+\frac12\right)k_1\right]\cos\!\left[\frac{\pi}{\tilde N}\left(n_2+\frac12\right)k_2\right]. \tag{5.8}$$

Since the DCT is a linear, real-valued transform, each DCT coefficient can be computed as a combination of two positive functionals, which correspond to two DMD measurements, as in (2.6).

5.2. The Tex-ACS algorithm. This algorithm is a second step, applied after Stat-ACS. For simplicity, we assume that the input image has dyadic dimensions, $N_1\times N_1=N$. We subdivide the image into blocks of $16\times16$ pixels, $\{\Omega_{i,l}\}$, $1\le i,l\le N_1/16$. Over each block $\Omega_{i,l}$ we compute, using (5.8) with $\tilde N=16$, the three DCT coefficients $\hat f_{\Omega_{i,l}}(1,8)$, $\hat f_{\Omega_{i,l}}(8,1)$, and $\hat f_{\Omega_{i,l}}(8,8)$, which correspond to the dominant frequencies of the horizontal, vertical, and diagonal wavelet coefficients, respectively, at scale $j=2$. Since this amounts to three samples per 256 pixels, the second step is initialized with $3N_1^2/256\approx 0.012\,N_1^2$ local frequency samples. If any of the three DCT coefficients satisfies $|\hat f_{\Omega_{i,l}}(k_1,k_2)|\ge t_j$ for a large threshold $t_j$, we sample the coefficients of the wavelets at the corresponding scale and subband whose supports significantly overlap the block $\Omega_{i,l}$. Moreover, if the frequency component $\hat f_{\Omega_{i,l}}(k_1,k_2)$ is significant, significant texture information is also likely to exist at adjacent scales. In that case we also sample the frequency components $\hat f_{\Omega_{i,l}}(k_1/2,k_2/2)$ and $\hat f_{\Omega_{i,l}}(2k_1,2k_2)$ to test for oscillatory activity in the adjacent frequency bands. If $|\hat f_{\Omega_{i,l}}(k_1/2,k_2/2)|\ge t_j$ or $|\hat f_{\Omega_{i,l}}(2k_1,2k_2)|\ge t_j$, for a scale-dependent threshold $t_j$, we then also sample the wavelet coefficients of the corresponding scale, compact support, and subband.
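The DCT sampling and the texture-detection step described above can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the DCT is the unnormalized sum (5.8); the “two positive functionals” are taken to be the positive and negative parts of the cosine kernel (one natural split; the construction in (2.6) may differ, and realizing graded masks on a binary DMD is not modeled); $k/2$ is rounded down; the thresholds $t_j$ are hypothetical inputs keyed by scale; and the function returns which wavelet groups would be sampled instead of acquiring them. With $\tilde N=16$ the doubled indices $(2,16)$, $(16,2)$, $(16,16)$ fall outside the range $0,\dots,15$ of (5.8); the text does not say how this is handled, so the sketch simply skips them. All names are hypothetical.

```python
import math

BLOCK = 16  # block size N-tilde used in section 5.2
# Seed frequencies at scale j = 2 and the subband each one probes.
SEED_FREQS = {(1, 8): "horizontal", (8, 1): "vertical", (8, 8): "diagonal"}


def cos_vector(k, size=BLOCK):
    """1D factor cos(pi/size * (n + 1/2) * k), n = 0..size-1, from (5.8)."""
    return [math.cos(math.pi / size * (n + 0.5) * k) for n in range(size)]


def dct_coeff(block, k1, k2):
    """Unnormalized DCT coefficient (5.8); block[n1][n2] = f(x1 + n1, x2 + n2)."""
    size = len(block)
    c1, c2 = cos_vector(k1, size), cos_vector(k2, size)
    return sum(c1[a] * c2[b] * block[a][b] for a in range(size) for b in range(size))


def dct_from_two_masks(block, k1, k2):
    """Same coefficient as the difference of two nonnegative-mask measurements."""
    size = len(block)
    c1, c2 = cos_vector(k1, size), cos_vector(k2, size)
    plus = minus = 0.0
    for a, row in enumerate(block):
        for b, value in enumerate(row):
            w = c1[a] * c2[b]
            plus += max(w, 0.0) * value    # mask 1: positive part of the kernel
            minus += max(-w, 0.0) * value  # mask 2: negative part of the kernel
    return plus - minus


def texture_requests(image, thresholds):
    """(i, l, subband, j) for each wavelet group the texture stage would sample.

    image is a list of N1 rows; thresholds maps j in {1, 2, 3} to a hypothetical t_j.
    """
    nb = len(image) // BLOCK
    requests = []
    for i in range(nb):
        for l in range(nb):
            blk = [row[l * BLOCK:(l + 1) * BLOCK]
                   for row in image[i * BLOCK:(i + 1) * BLOCK]]
            for (k1, k2), band in SEED_FREQS.items():
                if abs(dct_coeff(blk, k1, k2)) < thresholds[2]:
                    continue
                requests.append((i, l, band, 2))  # step 2(a)
                # Step 2(b): (2k1, 2k2) probes j = 1, (k1 // 2, k2 // 2) probes j = 3.
                for (m1, m2), j in (((2 * k1, 2 * k2), 1), ((k1 // 2, k2 // 2), 3)):
                    if max(m1, m2) >= BLOCK:
                        continue  # assumption: indices beyond 15 are skipped
                    if abs(dct_coeff(blk, m1, m2)) >= thresholds[j]:
                        requests.append((i, l, band, j))
    return requests


if __name__ == "__main__":
    # Hypothetical 32 x 32 image: block (0, 0) holds the (8,8) and (4,4) DCT patterns.
    c4, c8 = cos_vector(4), cos_vector(8)
    image = [[0.0] * 32 for _ in range(32)]
    for a in range(BLOCK):
        for b in range(BLOCK):
            image[a][b] = c8[a] * c8[b] + c4[a] * c4[b]
    print(texture_requests(image, {1: 10.0, 2: 10.0, 3: 10.0}))
    print(3 * (512 // BLOCK) ** 2)  # seed DCT samples for a 512 x 512 image
```

On the synthetic image, the (8,8) pattern triggers the diagonal subband at $j=2$, and the (4,4) pattern then triggers the adjacent coarser scale $j=3$. The last line reproduces the seed count: for a $512\times512$ image, $3\cdot 32^2=3072$ DCT samples, i.e., $3/256\approx 0.0117$ of $N$.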
The Tex-ACS algorithm is implemented in two separate stages. The first stage, based on the Stat-ACS algorithm, acquires the wavelet coefficients that correspond to the cartoon areas. The second stage identifies texture areas using DCT sampling and then acquires fine-scale wavelet coefficients at locations that contain texture with high probability. The Tex-ACS algorithm can be summarized as follows.

Tex-ACS algorithm. Adaptive sampling for a cartoon-texture image model.

Sampling of the cartoon parts. Acquire significant wavelet coefficients according to the Stat-ACS algorithm from section 4.2.

Sampling of the texture parts. Process each block $\Omega_{i,l}$, $1\le i\le N_1/16$, $1\le l\le N_1/16$, as follows:
1. Compute the DCT frequency components $\hat f_{\Omega_{i,l}}(k_1,k_2)$, $(k_1,k_2)\in\{(1,8),(8,1),(8,8)\}$, to detect oscillatory activity in the three subbands of scale $j=2$.
2. If $|\hat f_{\Omega_{i,l}}(k_1,k_2)|\ge t_j$, then
   (a) sample the wavelet coefficients of the corresponding compact support and subband;
   (b) compute the frequency components $\hat f_{\Omega_{i,l}}(2k_1,2k_2)$ and $\hat f_{\Omega_{i,l}}(k_1/2,k_2/2)$ to detect oscillatory activity at scales $j=1$ and $j=3$, respectively. If $|\hat f_{\Omega_{i,l}}(2k_1,2k_2)|\ge t_j$ or $|\hat f_{\Omega_{i,l}}(k_1/2,k_2/2)|\ge t_j$, sample the wavelet coefficients of the corresponding scale, compact support, and subband.

5.3. Tex-ACS: Experimental results. As for the Stat-ACS algorithm, we use the near-best $n$-term wavelet approximation as a benchmark. In addition, we compare the efficiency of the Tex-ACS algorithm with that of our Stat-ACS algorithm from section 4. We test Tex-ACS on the Barbara test image, an excellent representative of images that mix cartoon and texture patches. As Figure 12 illustrates, the Stat-ACS algorithm, being based on the cartoon model, does not recover the texture parts of the image well.
Moreover, increasing the number of measurements does not yield a significant improvement, either visually or in terms of PSNR. The output of the Tex-ACS algorithm (Figure 13) shows a significant improvement in PSNR and succeeds in reconstructing the texture patches in the image. Figure 11(a) shows the Barbara input image. Figure 11(b) depicts a benchmark near-best 12419-term biorthogonal (9,7) wavelet approximation, extracted from the full wavelet representation, with PSNR = 27.76. Figure 12 shows the result of the Stat-ACS algorithm with 12500 terms and PSNR = 24.55, obtained after pruning the insignificant coefficients from 26000 samples. Figure 13 shows the result of our Tex-ACS algorithm with 12500 terms and PSNR = 25.11, also obtained after pruning from 26000 measurements. Note that the total of 26000 samples includes both local DCT coefficients and wavelet coefficients. Figures 14 and 15 zoom in on some textural areas of the result in Figure 13.

The Tex-ACS algorithm works well on Barbara because its texture parts are composed of stripes and checkers that are relatively well modeled by local DCT coefficients; indeed, local Fourier methods are well known to capture such textures efficiently. For other images, a different sampling dictionary, composed of systems in which the signal is expected to be sparse, may be needed.

Figure 11. Benchmark wavelet approximation of Barbara using 12419 terms. (a) Barbara test image. (b) Benchmark, N = n = 262144, 12419 terms, PSNR = 27.76.

Figure 12. n = 26000, Stat-ACS (cartoon-model based), approximation using the 12500 most significant wavelet coefficients, PSNR = 24.55.

Figure 13.
n = 26000, Tex-ACS, approximation using the 12500 most significant wavelet coefficients, PSNR = 25.11.

Figure 14. Results from Figure 13. (a) Zoom in on the chair. (b) Zoom in on the scarf and legs.

Figure 15. Results from Figure 13. (a) Zoom in on the scarf. (b) Zoom in on the tablecloth.

6. Conclusions. In this work we proposed a direct and fast method for adaptive transform-coefficient acquisition that relies on statistical modeling of images. While our algorithms use a number of measurements of the same order as nonadaptive CS methods, the main advantages of our direct approach are that no complex recovery algorithm is needed and that it allows more control over the compressed image quality. Our experimental results show, by way of software simulations, that our adaptive algorithms perform well in terms of image quality and speed. One possible hardware architecture on which such a scheme could be implemented is the DMD, although more work is required to fully analyze the influence of DMD accuracy on the proposed method.

Acknowledgment. We sincerely thank all the referees for their patience and their numerous comments, which have significantly improved the quality of the paper.

REFERENCES
[1] R. Baraniuk, Compressive sensing, IEEE Signal Processing Mag., 24 (2007), pp. 118–120.
[2] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, Model-based compressive sensing, IEEE Trans. Inform. Theory, 56 (2010), pp. 1982–2001.
[3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx., 28 (2008), pp. 253–263.
[4] R. Berinde and P. Indyk, Sparse Recovery Using Sparse Random Matrices, Technical report, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, 2008.
[5] T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal., 27 (2009), pp. 265–274.
[6] J. Bobin, J.-L. Starck, and R. Ottensamer, Compressed sensing in astronomy, IEEE J. Sel. Topics Signal Process., 2 (2008), pp. 718–726.
[7] L. Boubchir and M. Fadili, Multivariate statistical modeling of images with the curvelet transform, in Proceedings of the 8th IEEE International Conference on Signal Processing and Its Applications, 2005, pp. 747–750.
[8] R. W. Buccigrossi and E. P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain, IEEE Trans. Image Process., 8 (1999), pp. 1688–1701.
[9] E. Candès, Compressive sampling, in Proceedings of the International Congress of Mathematicians, 2006, pp. 1433–1452.
[10] E. Candès and D. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise C² singularities, Comm. Pure Appl. Math., 57 (2004), pp. 219–266.
[11] E. Candès and J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Problems, 23 (2007), pp. 969–985.
[12] E. Candès and J. Romberg, Robust signal recovery from incomplete observations, in Proceedings of the IEEE International Conference on Image Processing, 2006, pp. 1281–1284.
[13] E. Candès and J. Romberg, Practical signal recovery from random projections, in Wavelet Applications in Signal and Image Processing XI, Proceedings of the SPIE 5914, 2004.
[14] E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory, 52 (2006), pp. 489–509.
[15] E. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math., 59 (2006), pp. 1207–1223.
[16] E. Candès and T. Tao, The Dantzig selector: Statistical estimation when p is much larger than n, Ann. Statist., 35 (2007), pp. 2313–2351.
[17] E. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory, 51 (2005), pp. 4203–4215.
[18] E. Candès and M. Wakin, An introduction to compressive sampling, IEEE Signal Processing Mag., 25 (2008), pp. 21–30.
[19] W. Carey, D. Chuang, and S. Hemami, Regularity-preserving interpolation, IEEE Trans. Image Process., 8 (1999), pp. 1293–1297.
[20] S. G. Chang, Z. Cvetković, and M. Vetterli, Resolution enhancement of images using wavelet transform extrema extrapolation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1995, pp. 2379–2382.
[21] C. Chen, L. Pau, and P. Wang, Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993.
[22] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20 (1998), pp. 33–61.
[23] Z. Chen and M. A. Karim, Forest representation of wavelet transform and feature detection, Opt. Eng., 39 (2000), pp. 1194–1202.
[24] H. Choi and R. Baraniuk, Multiscale texture segmentation using wavelet-domain hidden Markov models, in Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, Vol. 2, Pacific Grove, CA, 1998, pp. 1692–1697.
[25] C. Chrysafis and A. Ortega, Efficient context-based lossy wavelet image coding, in Proceedings of the Conference on Data Compression, Snowbird, UT, IEEE Computer Society, Washington, DC, 1997.
[26] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, J. Amer. Math. Soc., 22 (2009), pp. 211–231.
[27] R. Coifman, F. Geshwind, and Y. Meyer, Noiselets, Appl. Comput. Harmon. Anal., 10 (2001), pp. 27–44.
[28] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed., MIT Press, Cambridge, MA, 2009.
[29] M. Crouse, R. Nowak, and R. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Trans. Signal Process., 46 (1998), pp. 886–902.
[30] W. Dai and O. Milenkovic, Subspace pursuit for compressive sensing: Closing the gap between performance and complexity, IEEE Trans. Inform. Theory, 55 (2009), pp. 2230–2249.
[31] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[32] S. Dekel, Adaptive Compressed Image Sensing Based on Wavelet-Trees, GE Healthcare technical report, 2008.
[33] S. Deutsch, A. Averbuch, and S. Dekel, Adaptive compressed image sensing based on wavelet modeling and direct sampling, in Proceedings of the 8th International Conference on Sampling Theory and Applications, 2009.
[34] R. DeVore, Nonlinear approximation, in Acta Numerica, Acta Numer. 7, Cambridge University Press, Cambridge, UK, 1998, pp. 51–150.
[35] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289–1306.
[36] D. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, Comm. Pure Appl. Math., 59 (2006), pp. 797–829.
[37] D. Donoho, I. Drori, Y. Tsaig, and J. L. Starck, Sparse Solution of Underdetermined Linear Equations by Stagewise Orthogonal Matching Pursuit, preprint, Department of Statistics, Stanford University, Stanford, CA, 2006.
[38] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, Single-pixel imaging via compressive sampling, IEEE Signal Processing Mag., 25 (2008), pp. 83–91.
[39] M. Duarte, M. Wakin, and R. Baraniuk, Fast reconstruction of piecewise smooth signals from random projections, in Proceedings of the Workshop on Signal Processing with Adaptive Sparse Structured Representations, Rennes, France, 2005.
[40] L. Gan, T. Do, and T. Tran, Fast compressive imaging using scrambled Hadamard transform ensemble, in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008), 2008.
[41] O. Gonzalez and J. Maddocks, Global curvature, thickness, and the ideal shapes of knots, Proc. Natl. Acad. Sci. USA, 96 (1999), pp. 4769–4773.
[42] O. Gonzalez, J. Maddocks, and J. Smutny, Curves, circles, and spheres, in Physical Knots: Knotting, Linking, and Folding Geometric Objects in ℝ³, AMS, Providence, RI, 2002, pp. 195–215.
[43] H. W. Guggenheimer, Differential Geometry, Dover, New York, 1977.
[44] L. He and L. Carin, Exploiting structure in wavelet-based Bayesian compressive sensing, IEEE Trans. Signal Process., 57 (2009), pp. 3488–3497.
[45] K. Huang and S. Aviyente, Mutual information based subband selection for wavelet packet based image classification, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, Philadelphia, PA, 2005, pp. 241–244.
[46] R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and R. H. Bamberger, Comparison of different methods of classification in subband coding of images, IEEE Trans. Image Process., 6 (1997), pp. 1473–1486.
[47] R. Kawakami, J. Wright, Y. Tai, Y. Matsushita, M. Ben-Ezra, and K. Ikeuchi, High-resolution hyperspectral imaging via matrix factorization, in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2329–2336.
[48] C. La and M. Do, Signal reconstruction using sparse tree representation, in Wavelets XI, Proceedings of SPIE, Vol. 5914, San Diego, CA, 2005.
[49] J. Liu and P. Moulin, Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients, IEEE Trans. Image Process., 10 (2001), pp. 1647–1658.
[50] M. Lustig, D. Donoho, and J. Pauly, Sparse MRI: The application of compressed sensing for rapid MR imaging, Magn. Reson. Med., 58 (2007), pp. 1182–1195.
[51] D. Malioutov, S. Sanghavi, and A. Willsky, Sequential compressed sensing, IEEE J. Sel. Topics Signal Process., 4 (2010), pp. 435–444.
[52] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed., Academic Press, Burlington, MA, 2009.
[53] S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell., 11 (1989), pp. 674–693.
[54] S. Mallat and W. L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inform. Theory, 38 (1992), pp. 617–642.
[55] R. Moddemeijer, On estimation of entropy and mutual information of continuous distributions, Signal Process., 16 (1989), pp. 233–248.
[56] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput., 24 (1995), pp. 227–234.
[57] D. Needell and J. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal., 26 (2009), pp. 301–321.
[58] L. Paninski, Estimation of entropy and mutual information, Neural Comput., 15 (2003), pp. 1191–1253.
[59] A. Pentland and B. Horowitz, A practical approach to fractal-based image compression, in Digital Images and Human Vision, A. B. Watson, ed., MIT Press, Cambridge, MA, 1993, pp. 53–59.
[60] A. Pentland, E. Simoncelli, and T. Stephenson, Fractal-Based Image Compression and Interpolation, U.S. Patent 5,148,497, Sept. 15, 1992.
[61] Y. Pfeffer and M. Zibulevsky, A Micro-Mirror Array Based System for Compressive Sensing of Hyperspectral Data, Technical report CS-2010-01, Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel, 2010.
[62] D. D.-Y. Po and M. N. Do, Directional multiscale modeling of images using the contourlet transform, IEEE Trans. Image Process., 15 (2006), pp. 1610–1620.
[63] R. Rinaldo and G. Calvagno, Image coding by block prediction of multiresolution subimages, IEEE Trans. Image Process., 4 (1995), pp. 909–920.
[64] J. Romberg, H. Choi, and R. Baraniuk, Bayesian tree-structured image modeling using wavelet-domain hidden Markov models, IEEE Trans. Image Process., 10 (2001), pp. 1056–1068.
[65] F. Rooms, A. Pizurica, and W. Philips, Estimating image blur in the wavelet domain, in Proceedings of the IEEE Benelux Signal Processing Symposium (SPS), 2002.
[66] A. Said and W. Pearlman, A new fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Technol., 6 (1996), pp. 243–250.
[67] E. Schwartz, A. Zandi, and M. Boliek, Implementation of compression with reversible embedded wavelets, in Proceedings of SPIE 2564, 1995.
[68] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J., 27 (1948), pp. 379–423, 623–656.
[69] J. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Process., 41 (1993), pp. 3445–3462.
[70] E. Simoncelli, Modeling the joint statistics of images in the wavelet domain, in Proceedings of SPIE 3813, 1999, pp. 188–195.
[71] K. Timmermann and R. Nowak, Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging, IEEE Trans. Inform. Theory, 45 (1999), pp. 846–862.
[72] J. Tropp and A. Gilbert, Signal recovery from partial information via orthogonal matching pursuit, IEEE Trans. Inform. Theory, 53 (2007), pp. 4655–4666.
[73] Y. Tsaig and D. Donoho, Extensions of compressed sensing, Signal Process., 86 (2006), pp. 549–571.
[74] X. Wu and J. Chen, Context modeling and entropy coding of wavelet coefficients for image compression, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, 1997, pp. 3097–3100.