A Parallel Compressive Imaging Architecture for One-Shot Acquisition

Tomas Björklund, Enrico Magli
Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
arXiv:1311.0646v1 [cs.CV] 4 Nov 2013

Abstract: A limitation of many compressive imaging architectures lies in the sequential nature of the sensing process, which leads to long sensing times. In this paper we present a novel architecture that uses fewer detectors than the number of reconstructed pixels and is able to acquire the image in a single acquisition. This paves the way for the development of video architectures that acquire several frames per second. We specifically address the diffraction problem, showing that the deconvolution normally used to recover diffraction blur can be replaced by convolution of the sensing matrix, and how measurements of a 0/1 physical sensing matrix can be converted to those of a -1/1 compressive sensing matrix without any extra acquisitions. Simulations of our architecture show that the image quality is comparable to that of a classic Compressive Imaging camera, whereas the proposed architecture avoids the long acquisition times caused by sequential sensing. This one-shot procedure also allows the use of a fixed sensing matrix instead of a complex device such as a Digital Micromirror array or Spatial Light Modulator. It also enables imaging at bandwidths where these devices are not efficient.

I. INTRODUCTION

Compressed Sensing (CS) [1][2] is a novel framework for acquisition of compressible data at sub-Nyquist sampling rates, moving computational complexity from the sensing phase to reconstruction. In Compressive Imaging (CI), CS is applied to reconstruct images from fewer measurements than the number of image pixels, under the condition that the image is sparse or at least compressible in some domain. The success of JPEG indicates that most natural images are highly compressible with only small losses of image quality.

Seminal works on CI include the single-pixel camera [3] and the single-pixel terahertz imaging system [4], which acquire the image through sequential measurements from a single sensor while changing random sensing patterns in front of it. Compressed coded aperture imaging [5] uses a coded aperture to project overlapping coded copies of the image onto a detector array to obtain superresolution using CS. Similarly, CMOS compressive imagers [6][7] use detector arrays that perform combinations of analog measurements before converting them into fewer digital compressed measurements. This significantly decreases power consumption.

A limitation of architectures based on the single-pixel camera lies in the sequential, and hence slow, acquisition process. To some extent this can be addressed by block-based CS [9][10], which can in part parallelize the sensing process.

In this paper we present a new CI framework which allows faster acquisition than [3][4][9][10], for which the total acquisition time is linearly proportional to the number of measurements. As in [5] we use fewer detectors than [6][7], but we also demonstrate that our architecture can be used even if diffraction from the sensing pattern is prominent; this enables smaller camera dimensions and the use of lower energy radiation for imaging. Besides shortening the total acquisition time, the proposed architecture also weakens the requirement on the modulation and acquisition rate of the sensing matrix and the detector array, allowing a cheaper and simpler construction, and paving the way for compressive video capture in real time.

II. BACKGROUND: COMPRESSED SENSING

Consider the sensing process

$$y = Ax, \tag{1}$$

where $x$ is the signal of interest, $y$ are the measurements and $A$ is an $r \times c$ sensing matrix with $r < c$. If $x$ is sparse in some domain and $A$ satisfies the Restricted Isometry Property (RIP) [8], $x$ can be recovered with very high probability by solving the minimization problem

$$\min_{\tilde{x} \in \mathbb{R}^{c}} \|\tilde{x}\|_1 \quad \text{subject to} \quad y = A\tilde{x}. \tag{2}$$

For images, a common approach is to instead minimize the total variation norm [11]

$$TV(x) = \sum_{i,j} \sqrt{|x_{i+1,j} - x_{i,j}|^2 + |x_{i,j+1} - x_{i,j}|^2},$$

which assumes the image gradient to be sparse. This is the method used to recover images in this paper.

In CI, the sensing matrix can take the form of a physical filter. The filter modulates the light of each image pixel before it reaches the detector(s). Each measurement uses a different modulation pattern. The filter can be realized by a Digital Micromirror Device (DMD) as in [3], by a Spatial Light Modulator (SLM), or by plates with multiple fixed interchangeable filters with patterns of holes or transparencies [4].

III. PROPOSED PARALLEL COMPRESSIVE IMAGING ARCHITECTURE

The architecture we propose to parallelize the sensing process is illustrated in Fig. 1. To simultaneously acquire measurements of multiple sensing patterns, the image is not focused when projected onto the sensing matrix. The unfocused projection can be seen as shifted copies of the image. These copies receive different encodings, which allows for parallel acquisition of measurements without needing to update the sensing matrix between measurements. Shifting a longer sensing pattern in one direction has been shown to work for CS with little impact on the reconstruction [12]. We will show later that the sensing matrix has a block-Toeplitz structure when shifting in two directions; this structure has been shown to satisfy the RIP [5][13].

Fig. 1 shows an overview of the optical setup. After the target image, a first lens is located at distance $s_o$. This lens focuses an image at the distance $s_i$ such that $\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f_1}$, where $f_1$ is the focal length. At distance $s_i$ from the lens, a diaphragm is placed with an aperture for the focused image, to prevent objects outside the target image region from interfering with the measurements.
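
The lens equation above can be checked numerically. The following Python sketch solves it for the image distance $s_i$; the function name `image_distance` and the example values (an object 500 mm away, a focal length of 50 mm) are illustrative assumptions and are not the dimensions used in this paper.

```python
def image_distance(object_distance: float, focal_length: float) -> float:
    """Solve 1/s_o + 1/s_i = 1/f_1 for s_i (all lengths in the same unit)."""
    if object_distance <= focal_length:
        # No real image is formed behind the lens in this case.
        raise ValueError("object must be farther away than the focal length")
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


# Hypothetical values, chosen only to illustrate the relation.
s_o = 500.0  # object distance in mm (assumed)
f_1 = 50.0   # focal length in mm (assumed)
s_i = image_distance(s_o, f_1)
print(f"the diaphragm would sit {s_i:.2f} mm behind the first lens")
```

Under these assumed values the focused image, and hence the diaphragm, lies slightly beyond the focal length of the first lens.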
A two-dimensional SLM is placed out of focus such that different sets of parallel beams of the image hit the modulator in a shifted and overlapping manner. Through the modulator each set of parallel beams (i.e. each shifted image) receives a different encoding pattern. The distance from the aperture depends on the size and the diameter of the first lens such that all shifts are projected onto it. A second lens is positioned after the modulator to focus all parallel beams at one point in the focal plane. The position of the focal point for each set of parallel beams depends on the incident angle. Finally, a detector array is positioned in the focal plane of the second lens. In this manner each pixel detects the focused light from a set of originally parallel beams, which corresponds to the sum of all image pixels, uniquely modulated by a specific shift of the pattern on the modulator. Moreover, Fig. 1 illustrates how a set of parallel beams at an angle $\alpha$ is modulated by the lowermost shift of the modulator and focused onto the lowest detector pixel, while the beam set parallel to the optical axis is modulated by the center shift and detected by the center pixel.

A. Design of the Sensing Matrix

The light paths in the one-dimensional case are illustrated in Fig. 1. The acquisition process is described by the following set of equations for 5 image points and 3 detectors:

$$\begin{aligned}
D_1 &= M_1 I_1 + M_2 I_2 + M_3 I_3 + M_4 I_4 + M_5 I_5 \\
D_2 &= M_2 I_1 + M_3 I_2 + M_4 I_3 + M_5 I_4 + M_6 I_5 \\
D_3 &= M_3 I_1 + M_4 I_2 + M_5 I_3 + M_6 I_4 + M_7 I_5
\end{aligned}$$

where $I_i$ is the irradiance of image point $i$, $M_j$ is the transmittance of modulator pixel $j$ and $D_k$ is the total irradiance at detector $k$. The equation system is linear and can be formulated as

$$D = MI, \tag{3}$$

where $M$ is the sensing matrix, $I$ the image vector (not to be confused with the identity matrix) and $D$ the vector of detector measurements. In a more realistic scenario the image, the modulator and the detector are all two-dimensional.
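
The one-dimensional system (3) can be sketched in a few lines of Python with NumPy. The helper name `sensing_matrix_1d` and the particular modulator and image values are assumptions made for illustration; only the shifted-window structure is taken from the equations above.

```python
import numpy as np


def sensing_matrix_1d(modulator, num_image_points):
    """Stack the shifted windows of a 1D modulator pattern as matrix rows.

    Row k holds M_{k+1}, ..., M_{k+num_image_points}, matching D_1..D_3 above.
    """
    modulator = np.asarray(modulator, dtype=float)
    num_detectors = modulator.size - num_image_points + 1
    return np.array(
        [modulator[k:k + num_image_points] for k in range(num_detectors)]
    )


# Hypothetical 0/1 transmittances M_1..M_7 and irradiances I_1..I_5.
modulator = np.array([1, 0, 1, 1, 0, 0, 1], dtype=float)
image = np.array([0.2, 0.4, 0.6, 0.8, 1.0])

M = sensing_matrix_1d(modulator, image.size)  # shape (3, 5)
D = M @ image                                 # detector readings D_1..D_3
print(M)
print(D)
```

With 7 modulator pixels and 5 image points the helper yields 3 rows, one per detector, and each row is the previous one shifted by one modulator pixel.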
To maintain a 2D form of the full sensing matrix, the image and sensor are row-wise wrapped into column vectors, and each $m \times n$ measurement submatrix of the modulator matrix is vectorized row-wise into one row of the full sensing matrix. Let $D$ be the $k \times l$ detector matrix and $I$ the $m \times n$ image, and let $D = [D_{1,1}, \ldots, D_{1,l}, D_{2,1}, \ldots, D_{k-1,l}, D_{k,1}, \ldots, D_{k,l}]^T$ and $I = [I_{1,1}, \ldots, I_{1,n}, I_{2,1}, \ldots, I_{m-1,n}, I_{m,1}, \ldots, I_{m,n}]^T$ be their vectorized forms.

[Fig. 1 labels: Image ($I_1$ to $I_5$), Lens 1, Aperture, Modulator ($M_1$ to $M_7$), Lens 2, Detector ($D_1$ to $D_3$); distances $s_o$, $s_i$, $f_2$; beam angle $\alpha$.]
Fig. 1. Overview of the optics setup. The highlighted beams after the aperture illustrate how two sets of parallel beams at different incident angles at the modulator are modulated by shifted patterns. Each set of parallel beams is focused at a unique point at the detector.

An undersampling of 1/4 can be achieved by using a matrix $D$ of dimensions $k = m/2$ and $l = n/2$; this suggests that a modulator matrix $M$ of size $(k-1+m) \times (l-1+n)$ is required to allow for $k$ vertical and $l$ horizontal shifts of $I$. However, early experiments using this strategy were unsuccessful; we conjecture that this is due to too high a correlation between the measurements, because neighboring pixels are strongly correlated in typical natural images. Instead we use $D$ of dimensions $k = m$ and $l = n$ and $M$ of size $(k+m) \times (l+n)$, and afterwards downsample $D$ and $M$ to fit the real detector dimensions, as described later. The sensing matrix $M$ of row-wise vectorized shifts then has the form

$$M = \begin{bmatrix}
M_{1,1} & M_{1,2} & \cdots & M_{1,n} & M_{2,1} & \cdots & M_{2,n} & \cdots & M_{m,1} & \cdots & M_{m,n} \\
M_{1,2} & M_{1,3} & \cdots & M_{1,n+1} & M_{2,2} & \cdots & M_{2,n+1} & \cdots & M_{m,2} & \cdots & M_{m,n+1} \\
\vdots \\
M_{1,l} & M_{1,l+1} & \cdots & M_{1,l+n-1} & M_{2,l} & \cdots & M_{2,l+n-1} & \cdots & M_{m,l} & \cdots & M_{m,l+n-1} \\
M_{2,1} & M_{2,2} & \cdots & M_{2,n} & M_{3,1} & \cdots & M_{3,n} & \cdots & M_{m+1,1} & \cdots & M_{m+1,n} \\
\vdots \\
M_{2,l} & M_{2,l+1} & \cdots & M_{2,l+n-1} & M_{3,l} & \cdots & M_{3,l+n-1} & \cdots & M_{m+1,l} & \cdots & M_{m+1,l+n-1} \\
\vdots \\
M_{k,1} & M_{k,2} & \cdots & M_{k,n} & M_{k+1,1} & \cdots & M_{k+1,n} & \cdots & M_{k+m-1,1} & \cdots & M_{k+m-1,n} \\
\vdots \\
M_{k,l} & M_{k,l+1} & \cdots & M_{k,l+n-1} & M_{k+1,l} & \cdots & M_{k+1,l+n-1} & \cdots & M_{k+m-1,l} & \cdots & M_{k+m-1,l+n-1}
\end{bmatrix} \tag{4}$$

After the vectorization of $D$, $I$ and $M$ the acquisition process can again be described by the matrix multiplication (3). The size of the full sensing matrix $M$ is $(k \cdot l) \times (m \cdot n)$. Note that if $k = m$ and $l = n$ the matrix has a (left-shifted) blockwise Toeplitz form. We will consider two different approaches to undersampled acquisition using $m \cdot n / 4$ measurements, both based on (4) for $k = m$ and $l = n$:

(A) Double horizontal and vertical shifting, obtained by discarding the measurements $D_{i,j}$ where at least one of $i$ or $j$ is even (and likewise each row of $M$ whose first element $M_{i,j}$ has an even $i$ or $j$; this is the same downsampling used in [5]). This corresponds to a detector array with a fill factor of at most 25%, such that the discarded measurements are projected onto the dead space between the real detector units.

(B) Grouping of the measurements as $D_{i,j} + D_{i+1,j} + D_{i,j+1} + D_{i+1,j+1}$, where $i$ and $j$ are odd. This corresponds to a detector array with half the number of pixels both vertically and horizontally, in which every pixel is twice as large.

B. Conversion of physical measurements to CS

We limit the scope of this paper to a random sensing matrix whose elements take the values $-1$ and $1$ with probability 0.5 each. In the physical matrix we use 0 and 100% transmittance to represent $-1$ and $1$ respectively, since a negative transmittance is not possible. This requires a mathematical correction of the measurements by $D = 2D_{raw} - I_{total}$, where $D_{raw}$ is the vector of raw detector measurements and $I_{total}$ is the total irradiance of all image pixels without modulation (100% transmittance). $I_{total}$ can be determined using an extra acquisition with all sensing pixels open, as in [3].
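
The correction $D = 2D_{raw} - I_{total}$ can be verified with a small NumPy experiment. This is only a sketch under stated assumptions: the matrix sizes, the random seed and the variable names are hypothetical, and the sensing matrix is a generic random matrix rather than the structured matrix (4).

```python
import numpy as np

rng = np.random.default_rng(1)
num_measurements, num_pixels = 16, 64  # assumed sizes, for illustration only

# Ideal -1/+1 CS matrix and its physical 0/1 transmittance counterpart.
signs = rng.choice([-1.0, 1.0], size=(num_measurements, num_pixels))
mask = (signs + 1.0) / 2.0  # -1 -> 0 (closed), +1 -> 1 (open)

image = rng.random(num_pixels)  # non-negative irradiances

d_raw = mask @ image     # what the detectors actually record
i_total = image.sum()    # extra acquisition with all sensing pixels open
d = 2.0 * d_raw - i_total

# The corrected values equal the measurements of the -1/+1 matrix.
print(np.allclose(d, signs @ image))
```

The identity holds because `signs` equals `2 * mask - 1` element by element, so multiplying by the image gives twice the raw measurement minus the sum of all pixels.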
However, by constructing $M$ such that $M_{i,j} = M_{m+i,j} = M_{i,n+j} = M_{m+i,n+j}$ with $1 \le i \le m$, $1 \le j \le n$, $k = m$ and $l = n$, each image pixel is sensed equally over all measurements, because either $M_{i,j}$ or one of its repeated twins is shifted over each image pixel exactly once. Then $I_{total}$ can be calculated from the raw detector measurements as

$$I_{total} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{l} (D_{raw})_{i,j}}{\sum_{i=1}^{m} \sum_{j=1}^{n} M_{i,j}}.$$

With this limitation $M$ has the same block-Toeplitz form as in [5], which satisfies the RIP. Since architecture A discards measurements in $D$, we still need a second acquisition with all sensing pixels open to measure $I_{total}$; since architecture B only sums the measurements of $D$, this method can still be applied.

In this manner we can acquire all CS measurements in parallel and hence reduce the acquisition time to as little as $1/N$ of that of a sequential acquisition process, where $N$ is the number of measurements. Acquiring all measurements simultaneously also enables the use of a fixed sensing matrix (e.g. an opaque membrane with holes). This not only simplifies the construction but also significantly reduces the number of random values to be stored or generated for the sensing matrix, from $N \cdot m \cdot n$ down to $4 \cdot m \cdot n$. However, this comes at a price: since the image is focused neither at the sensing matrix nor at the detector, diffraction at the apertures of the sensing matrix needs to be considered.

C. Diffraction Compensation of the Sensing Matrix

The ray representation of light used in the previous sections is only accurate for large scales. At small scales, close to the wavelength of the radiation, diffraction becomes a prominent phenomenon [14]. In our architecture, diffraction will mostly be noticeable in the modulator pixels, since all other optics involved require an aperture large enough to cover all modulator pixels; their diffraction can safely be neglected in comparison to that of the aperture of a modulator pixel.
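
The derivation of $I_{total}$ from the measurements in Section III-B relies on the periodic construction of the modulator pattern. The sketch below illustrates this numerically for a small 2D case; the sizes, the seed and all variable names are assumptions for illustration, and the sums are taken over the raw 0/1 measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 6  # assumed image size; the detector grid is k = m, l = n

base = rng.integers(0, 2, size=(m, n)).astype(float)  # one period, 0/1
base[0, 0] = 1.0                # guard against an all-closed pattern
tiled = np.tile(base, (2, 2))   # M[i,j] = M[m+i,j] = M[i,n+j] = M[m+i,n+j]

image = rng.random((m, n))

# windows[i, j] is the m x n part of the modulator seen by detector (i, j).
windows = np.lib.stride_tricks.sliding_window_view(tiled, (m, n))[:m, :n]
d_raw = np.einsum("ijab,ab->ij", windows, image)

# Every image pixel meets each pattern element exactly once over all shifts.
i_total_est = d_raw.sum() / base.sum()

# Architecture B only sums 2 x 2 groups, so the total is unchanged.
grouped = d_raw.reshape(m // 2, 2, n // 2, 2).sum(axis=(1, 3))
i_total_est_b = grouped.sum() / base.sum()

print(i_total_est, i_total_est_b, image.sum())
```

Discarding measurements as in architecture A would remove terms from the numerator, which is why that variant still needs a separate acquisition of $I_{total}$.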
The point spread function (PSF) describing the diffraction of a modulator pixel is estimated using Fourier optics to calculate the expected image of a point with an incoherent imaging system, as described in section 7.3.3 of [15]. In a real system, however, the effects of diffraction can be measured more accurately by acquiring the response of single image points on the entire modulator pattern. Fig. 2(a) illustrates the behaviour without diffraction and Fig. 2(b) with diffraction. Using this model, the diffraction is a convolution ($*$) of our measurements from the linear projection model (3) with the PSF $h$. Since convolution is commutative and associative we can write

$$D_{diff} = D * h = (MI) * h = (M * h) I.$$

(Note that this notation is simplified, as the convolution needs to be applied to the non-vectorized forms.) The right-hand side provides a method to solve the deconvolution problem directly in the CS reconstruction stage, by using the sensing matrix $A = M * h$ and $y = D_{diff}$ in (1) to recover the original image. This is illustrated in Fig. 2(c). Instead of showing that the RIP condition still holds, we have simulated the acquisition and reconstruction of multiple test images.

TABLE I
COMPARISON OF NORMALIZED RECONSTRUCTION ERROR AS MSE/MSE_{128×128}, MEAN (STANDARD DEVIATION)

Image      Classic 64×64   CI 4096         A 64×64         B 64×64
R          2.014 (0)       1.692 (0.003)   1.581 (0.007)   1.576 (0.083)
Lena       2.120 (0)       2.088 (0.030)   1.792 (0.038)   1.750 (0.046)
Birds      2.064 (0)       1.849 (0.017)   1.809 (0.026)   1.833 (0.060)
Monarch    2.160 (0)       2.201 (0.039)   1.785 (0.034)   1.763 (0.036)
Boat       1.886 (0)       2.079 (0.026)   1.782 (0.023)   1.760 (0.020)
Peppers    2.139 (0)       1.930 (0.020)   1.696 (0.034)   1.682 (0.034)
Goldhill   1.833 (0)       2.282 (0.030)   1.847 (0.032)   1.765 (0.023)
Couple     1.730 (0)       2.151 (0.020)   1.848 (0.028)   1.779 (0.017)

IV. EXPERIMENTS OF ACQUISITION AND RECONSTRUCTION

Our simulations are based on a system with realistic dimension limitations on the size of available lenses, SLMs and sensor arrays.
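
The identity $(MI) * h = (M * h) I$ from Section III-C can be checked numerically in the one-dimensional case. The following Python sketch does so under several assumptions: the PSF `h` is a hypothetical three-tap kernel (not the PSF computed in this paper), the pattern length and seed are arbitrary, and boundary samples are dropped with `mode="valid"` so that both sides cover the same detector positions.

```python
import numpy as np


def shift_matrix(pattern, num_image_points):
    """Rows are consecutive windows of the 1D modulator pattern."""
    return np.lib.stride_tricks.sliding_window_view(pattern, num_image_points)


rng = np.random.default_rng(0)
num_image_points = 5
pattern = rng.integers(0, 2, size=20).astype(float)  # 0/1 modulator pixels
image = rng.random(num_image_points)
h = np.array([0.25, 0.5, 0.25])  # hypothetical PSF, sums to 1

# Left-hand side: ideal measurements blurred across the detector.
d_ideal = shift_matrix(pattern, num_image_points) @ image
d_diff = np.convolve(d_ideal, h, mode="valid")

# Right-hand side: blur the modulator pattern, then sense without blur.
blurred_pattern = np.convolve(pattern, h, mode="valid")
d_model = shift_matrix(blurred_pattern, num_image_points) @ image

print(np.allclose(d_diff, d_model))
```

Because the blurred pattern defines an ordinary linear sensing matrix, the diffracted measurements can be passed to the CS solver unchanged, with that matrix taking the place of $A$ in (1).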
Because of the ability to recover (to an unknown degree) the unconvolved image despite a large PSF, different dimensions were tested rather than basing our design on minimizing the PSF. After an extensive comparison of different alternatives in terms of SLM resolution and size as well as projection distance, an SLM of size 25.6 mm × 25.6 mm with 0.1 mm pixels at a projection distance of 60 mm proved to give the best results. The following simulations are based on these dimensions. We also limit the simulations to recovery with 1/4 of the measurements classically required at the Nyquist rate. For an image resolution of 128×128 pixels, this requires a detector resolution of 64×64 pixels. Both architectures use a 0/1 sensing matrix and convert the measurements, either by acquiring $I_{total}$ separately (A) or by deriving $I_{total}$ from the measurements (B). The PSF was calculated for the dimensions given above, considering incoherent light at a wavelength of 400 nm; the resulting PSF is shown in Fig. 2(d).

As a comparison we simulate a normal digital camera with a resolution of 128×128 pixels, one with 64×64 pixels, and a CI camera using independent measurements, such as a single-pixel camera. All these cameras are considered to have a negligible PSF. Image acquisition with the classic cameras is simulated by averaging all original image pixels within the regions of the cameras' larger pixels (4×4 and 8×8 respectively; all original images are 512×512 pixels). The same procedure is performed on the sensing matrix for the CI cameras, and the PSF was applied to the measurements of (A) and (B). To recover the images of the CI reference camera and our parallel variants, TVAL3 v1.0 [11] was used. The simulations are programmed in MATLAB and all cameras are simulated without noise.

[Fig. 2 panels: (a) diffraction-free scenario, $D = MI$; (b) diffraction, $D_{diff} = (MI) * h$; (c) diffraction model, $D_{diff} = (M * h) I$; (d) PSF; (e) PSF matrix.]
Fig. 2. (a)-(c) Illustration of how diffraction by the sensing pattern is modelled as a linear acquisition process through convolution on the CS matrix. (a) Without diffraction the light transmitted by $M_{i,j}$ is all projected on the correct detector pixel. (b) The transmitted light is diffracted and spread as a PSF over multiple detector pixels. (c) The diffraction is modelled as a PSF on the modulator instead of on the detector; this results in a linear projection model as in the ideal case but with the measurements as acquired in (b). (d) PSF calculated over an area corresponding to 23×23 pixels, based on the dimensions of the presented architecture. (e) Distribution in percent of total irradiance in the central part of the PSF (rounded for display purposes only).

Fig. 3. Reconstructed images. Column 1: Classic camera 128×128. 2: Classic camera 64×64. 3: Sequential CI camera, 4096 measurements. 4: Parallel CI, 4096 measurements, double shifts. 5: Parallel CI, 4096 measurements, double detector pixel size. MSE/MSE_{128×128} is indicated above each image.

We have compared the image quality of 8 test images based on the mean square error (MSE) with respect to the original images. Fig. 3 shows the reconstruction of two of the test images after simulated acquisition with all architectures, and Tab. I shows the reconstruction errors of the 8 test images, normalized by the errors of the reference camera with 128×128 pixels. Presented values are averages of 25 reconstructions using different random generations of $M$, with the standard deviation in parentheses. The first column shows the error of a digital camera using the same number of measurements (pixels) as the CI cameras. The CI cameras all show comparable results, and architectures A and B both outperform the digital camera on most images. The sequential CI camera sometimes shows slightly worse results but still shows a significant improvement on "R".

V. CONCLUSION

In this paper we show that image reconstruction is possible through parallel acquisition of measurements subjected to diffraction, with results comparable to those of a CI camera with independent, sequential measurements, which requires a significantly longer acquisition time. We are currently assembling a hardware prototype of this architecture, and we will report experimental results in a future paper.

VI. ACKNOWLEDGMENT

This work is supported by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement n. 279848.

REFERENCES

[1] E. J. Candès, "Compressive Sampling", Proc. Int. Congress of Mathematics, 2006.
[2] D. Donoho, "Compressed sensing", IEEE Trans. on Information Theory, 52(4), pp. 1289-1306, April 2006.
[3] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, R. G. Baraniuk, "Single pixel imaging via compressive sampling", IEEE Sig. Proc. Mag., 25, pp. 83-91, March 2008.
[4] W. L. Chan, M. L. Moravec, R. G. Baraniuk, D. M. Mittleman, "Terahertz imaging with compressed sensing and phase retrieval", Optics Letters, 33, pp. 974-976, 2008.
[5] R. F. Marcia, Z. T. Harmany, R. M. Willett, "Compressive Coded Aperture Imaging", Proc. SPIE Electronic Imaging, 2009.
[6] L. Jacques, P. Vandergheynst, A. Bibet, V. Majidzadeh, A. Schmid, Y. Leblebici, "CMOS compressed imaging by random convolution", Proc. ICASSP, 2009.
[7] Y. Oike and A. El Gamal, "A 256x256 CMOS Image Sensor with Delta-Sigma-Based Single-Shot Compressed Sensing", Proc. ISSCC, 2012.
[8] E. J. Candès and T. Tao, "Decoding by linear programming", IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203-4215, 2005.
[9] G. Coluccia, D. Valsesia, E. Magli, "Smoothness-Constrained Image Recovery from Block-Based Random Projections", to appear in Proc. MMSP, 2013.
[10] H. Fang, S. A. Vorobyov, H. Jiang, O. Taheri, "2D Signal Compression via Parallel Compressed Sensing with Permutations", in Proc. Asilomar SSC, 2012.
[11] C. Li, "An Efficient Algorithm For Total Variation Regularization with Applications to the Single Pixel Camera and Compressive Sensing", MSc Thesis, 2009.
[12] A. Heidari and D. Saeedkia, "A 2D Camera Design with a Single-pixel Detector", Proc. Int. Conf. on Infrared, Millimeter and Terahertz Waves, 2009.
[13] W. U. Bajwa, J. D. Haupt, G. M. Raz, S. J. Wright, R. D. Nowak, "Toeplitz-Structured Compressed Sensing Matrices", Proc. of Stat. Sig. Proc. Workshop, 2007.
[14] J. W. Goodman, Introduction to Fourier Optics, 3rd ed., Roberts & Company Publishers, 2005.
[15] D. Voelz, Computational Fourier Optics, SPIE, 2011.
