A Parallel Compressive Imaging Architecture for One-Shot Acquisition

Tomas Björklund, Enrico Magli
Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy

arXiv:1311.0646v1 [cs.CV] 4 Nov 2013
Abstract—A limitation of many compressive imaging architectures lies in the sequential nature of the sensing process, which leads to long sensing times. In this paper we present a novel architecture that uses fewer detectors than the number of reconstructed pixels and is able to acquire the image in a single acquisition. This paves the way for the development of video architectures that acquire several frames per second. We specifically address the diffraction problem, showing that the deconvolution normally used to remove diffraction blur can be replaced by a convolution of the sensing matrix, and that measurements taken with a 0/1 physical sensing matrix can be converted to those of a -1/1 compressive sensing matrix without any extra acquisitions. Simulations of our architecture show that the image quality is comparable to that of a classic Compressive Imaging camera, whereas the proposed architecture avoids the long acquisition times caused by sequential sensing. This one-shot procedure also makes it possible to employ a fixed sensing matrix instead of a complex device such as a Digital Micromirror Device or Spatial Light Modulator, and enables imaging at bandwidths where these devices are not efficient.
I. INTRODUCTION
Compressed Sensing (CS) [1][2] is a novel framework
for acquisition of compressible data at sub-Nyquist sampling
rates, moving computational complexity from the sensing phase to reconstruction. In Compressive Imaging (CI), CS is applied to
reconstruct images from fewer measurements than the number
of image pixels, under the condition that the image is sparse
or at least compressible in some domain. The success of JPEG
indicates that most natural images are highly compressible
with only small losses of image quality. Seminal works on CI
include the single-pixel camera [3] and single-pixel terahertz
imaging system [4], which acquire the image through sequential measurements from a single sensor while changing random
sensing patterns in front of it. Compressed coded aperture
imaging [5] uses a coded aperture to project overlapping coded
copies of the image onto a detector array to obtain super-resolution using CS. Similarly, CMOS compressive imagers [6][7] use detector arrays that combine analog measurements before converting them into fewer digital compressed measurements, which allows a significant decrease in power consumption.
A limitation of architectures based on the single-pixel camera lies in the sequential, and hence slow, acquisition process. To some extent this can be addressed by block-based CS [9][10], which can partly parallelize the sensing process. In this paper we present a new CI framework that allows faster acquisition than [3][4][9][10], in which the total acquisition time is linearly proportional to the number of measurements. As in [5]
we use fewer detectors than [6][7], but we also demonstrate
that our architecture can be used even if diffraction from
the sensing pattern is prominent; this enables smaller camera
dimensions and the use of lower energy radiation for imaging.
Besides shortening the total acquisition time, the proposed
architecture also weakens the requirement on the modulation
and acquisition rate of the sensing matrix and the detector
array, allowing a cheaper and simpler construction, and paving
the way for compressive video capture in real-time.
II. BACKGROUND: COMPRESSED SENSING
Consider the sensing process

$$y = Ax, \qquad (1)$$

where $x$ is the signal of interest, $y$ are the measurements and $A$ is an $r \times c$ sensing matrix with $r \ll c$. If $x$ is sparse in some domain and $A$ satisfies the Restricted Isometry Property (RIP) [8], $x$ can be recovered with very high probability by solving the minimization problem

$$\min_{\tilde{x} \in \mathbb{R}^{c}} \|\tilde{x}\|_{1} \quad \text{subject to} \quad y = A\tilde{x}. \qquad (2)$$
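For orientation, here is a minimal Python/NumPy sketch of an iterative soft-thresholding (ISTA) solver for the Lagrangian (penalized) form of (2); this is a generic illustration under our own choice of solver and parameters, not the method used in this paper:

```python
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    """ISTA for min_x 0.5*||A x - y||_2^2 + lam*||x||_1, a penalized
    relaxation of problem (2); lam and iters are illustrative choices."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - y) / L          # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x
```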
For images, a common approach is to instead minimize the total variation norm [11]

$$TV(x) = \sum_{i,j} \sqrt{|x_{i+1,j} - x_{i,j}|^{2} + |x_{i,j+1} - x_{i,j}|^{2}},$$

which assumes the image gradient to be sparse. This is the method used to recover images in this paper.
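For concreteness, a minimal NumPy sketch of this isotropic TV norm, using a replicated-edge boundary convention (one common choice; the paper does not specify its boundary handling):

```python
import numpy as np

def total_variation(x):
    """Isotropic TV of a 2-D image via forward differences; the last
    row/column gradients are zero under edge replication."""
    dv = np.diff(x, axis=0, append=x[-1:, :])   # x[i+1,j] - x[i,j]
    dh = np.diff(x, axis=1, append=x[:, -1:])   # x[i,j+1] - x[i,j]
    return np.sum(np.sqrt(dv ** 2 + dh ** 2))
```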
In CI, the sensing matrix can take the form of a physical
filter. The filter modulates the light of each image pixel before
it reaches the detector(s). Each measurement uses a different
modulation pattern. The filter can be realized by a Digital Micromirror Device (DMD) as in [3], by a Spatial Light Modulator (SLM), or by plates with multiple fixed, interchangeable filters containing patterns of holes or transparencies [4].
III. PROPOSED PARALLEL COMPRESSIVE IMAGING ARCHITECTURE
The architecture we propose to parallelize the sensing
process is illustrated in Fig. 1. To simultaneously acquire
measurements of multiple sensing patterns, the image is
not focused when projected onto the sensing matrix. The
unfocused projection can be seen as shifted copies of the
image. These copies receive different encodings, which allows
for parallel acquisition of measurements, without needing to
update the sensing matrix between each single measurement.
Shifting a longer sensing pattern in one direction has been
shown to work for CS reconstruction with little impact on
the reconstruction [12]. We will show later that the sensing
matrix has a block-Toeplitz structure when shifting in two
directions; this structure has been shown to satisfy the RIP
property [5][13]. Fig. 1 shows an overview of the optical
setup. After the target image, a first lens is located at distance s_o. This lens focuses an image at the distance s_i such that

$$\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f_1},$$

where f_1 is the focal length. At distance s_i from the lens, a diaphragm is placed with an aperture
for the focused image to prevent objects outside the target
image region from interfering with the measurements. A two
dimensional SLM is placed out of focus such that different sets
of parallel beams of the image hit the modulator in a shifted
and overlapping manner. Through the modulator each set of
parallel beams (i.e. each shifted image) receives a different
encoding pattern. The distance of the modulator from the aperture depends on the modulator size and the diameter of the first lens, such that all shifts are projected onto the modulator. A second lens is positioned after the
modulator to focus all parallel beams at one point in the focal
plane. The position of the focal point for each set of parallel
beams depends on the incident angle. Finally, a detector array
is positioned in the focal plane of the second lens. In this
manner each pixel detects the focused light from a set of
originally parallel beams, which corresponds to the sum of
all image pixels, uniquely modulated by a specific shift of
the pattern on the modulator. Moreover, Fig. 1 illustrates how
a set of parallel beams at an angle α is modulated by the
lowermost shift of the modulator and focused onto the lowest
detector pixel, while the beamset parallel to the optical axis is
modulated by the center shift and detected by the center pixel.
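As a quick numeric illustration of the thin-lens relation above (the values below are assumptions for illustration, not the design parameters of Section IV):

```python
def image_distance(s_o, f):
    """Solve 1/s_o + 1/s_i = 1/f for the image distance s_i."""
    return 1.0 / (1.0 / f - 1.0 / s_o)

# An object 500 mm in front of a 50 mm lens focuses at ~55.6 mm:
print(image_distance(500.0, 50.0))  # 55.55...
```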
A. Design of the Sensing Matrix
The light paths in the one-dimensional case are illustrated in
Fig. 1. The acquisition process is described by the following
set of equations for 5 image points and 3 detectors:

$$\begin{cases} D_1 = M_1 I_1 + M_2 I_2 + M_3 I_3 + M_4 I_4 + M_5 I_5 \\ D_2 = M_2 I_1 + M_3 I_2 + M_4 I_3 + M_5 I_4 + M_6 I_5 \\ D_3 = M_3 I_1 + M_4 I_2 + M_5 I_3 + M_6 I_4 + M_7 I_5 \end{cases}$$

where I_i is the irradiance of image point i, M_j is the transmittance of modulator pixel j and D_k is the total irradiance at detector k. The equation system is linear and can be formulated as

$$D = MI, \qquad (3)$$
where M is the sensing matrix, I the image vector (not to be
confused with the identity matrix) and D the vector of detector
measurements.
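A minimal NumPy sketch of this toy one-dimensional system, with a random 0/1 modulator (the random values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
modulator = rng.integers(0, 2, size=7).astype(float)  # transmittances M1..M7
image = rng.random(5)                                  # irradiances I1..I5

# Detector k sees the image through the k-th shift of the modulator:
M = np.stack([modulator[k:k + 5] for k in range(3)])   # 3x5 sensing matrix
D = M @ image                                          # D = M I, as in (3)
```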
In a more realistic scenario the image, the modulator and the detector are all two-dimensional. To maintain a 2D form of the full sensing matrix, the image and sensor are wrapped row-wise into column vectors, and each measurement submatrix of the modulator is wrapped, in the corresponding order, into a row of the full sensing matrix. Let D be the k × l detector matrix and I the m × n image, and let D = [D_{1,1}, ..., D_{1,l}, D_{2,1}, ..., D_{k,l}]^T and I = [I_{1,1}, ..., I_{1,n}, I_{2,1}, ..., I_{m,n}]^T be their vectorized forms.
[Fig. 1: optical layout with image points I_1–I_5, Lens 1 at distance s_o, aperture at distance s_i, modulator pixels M_1–M_7, Lens 2 with focal length f_2, and detector pixels D_1–D_3.]

Fig. 1. Overview of the optics setup. The highlighted beams after the aperture illustrate how two sets of parallel beams with different incident angles at the modulator are modulated by shifted patterns. Each set of parallel beams is focused at a unique point on the detector.
An undersampling of 1/4 can be achieved by using a matrix D of dimensions k = m/2 and l = n/2, which suggests that a modulator matrix M of size (k−1+m) × (l−1+n) is required to allow for k vertical and l horizontal shifts of I. However, early experiments using this strategy were unsuccessful; we conjecture this is due to too high correlation between the measurements, because neighboring pixels are strongly correlated in typical natural images. Instead we use D of dimensions k = m and l = n and M of size (k+m) × (l+n), and afterwards downsample D and M to fit the real detector dimensions, as described later. The sensing matrix M of row-wise vectorized shifts then has the form

$$M = \begin{bmatrix}
M_{1,1} & \cdots & M_{1,n} & M_{2,1} & \cdots & M_{2,n} & \cdots & M_{m,1} & \cdots & M_{m,n} \\
M_{1,2} & \cdots & M_{1,n+1} & M_{2,2} & \cdots & M_{2,n+1} & \cdots & M_{m,2} & \cdots & M_{m,n+1} \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
M_{1,l} & \cdots & M_{1,l+n-1} & M_{2,l} & \cdots & M_{2,l+n-1} & \cdots & M_{m,l} & \cdots & M_{m,l+n-1} \\
M_{2,1} & \cdots & M_{2,n} & M_{3,1} & \cdots & M_{3,n} & \cdots & M_{m+1,1} & \cdots & M_{m+1,n} \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
M_{k,1} & \cdots & M_{k,n} & M_{k+1,1} & \cdots & M_{k+1,n} & \cdots & M_{k+m-1,1} & \cdots & M_{k+m-1,n} \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
M_{k,l} & \cdots & M_{k,l+n-1} & M_{k+1,l} & \cdots & M_{k+1,l+n-1} & \cdots & M_{k+m-1,l} & \cdots & M_{k+m-1,l+n-1}
\end{bmatrix} \qquad (4)$$

where the row associated with shift (i, j), with rows ordered row-wise over the k × l shifts, contains the vectorized m × n window of the modulator whose top-left element is M_{i,j}.
After the vectorization of D, I and M, the acquisition process can again be described by the matrix multiplication (3). The size of the full sensing matrix M is (k·l) × (m·n). Note that if k = m and l = n the matrix has a (left-shifted) block-wise Toeplitz form.
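A sketch of how the full sensing matrix of (4) can be assembled from a modulator pattern, assuming the same (row-major) vectorization order for the image and for each shifted window:

```python
import numpy as np

def full_sensing_matrix(mod, m, n, k, l):
    """Stack the k*l vectorized m x n windows of the (k+m) x (l+n)
    modulator 'mod' into a (k*l) x (m*n) sensing matrix, row (i,j)
    holding the window whose top-left element is mod[i, j]."""
    rows = [mod[i:i + m, j:j + n].ravel()
            for i in range(k) for j in range(l)]
    return np.array(rows)
```

With this convention, the vectorized image must be raveled in the same row-major order so that D = MI reproduces the physical measurements.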
We will consider two different approaches to undersampled acquisition using (m·n)/4 measurements, based on (4) with k = m and l = n (a sketch of both reductions follows below):

A. Double horizontal and vertical shifting, obtained by discarding measurements D_{i,j} where at least one of i or j is even (and likewise for each row in M based on its first element M_{i,j}; this is the same downsampling used in [5]). This corresponds to a detector array with a fill factor of ≤ 25%, such that the discarded measurements are projected onto the dead space between the real detector units.

B. Grouping all measurements D_{i,j} + D_{i+1,j} + D_{i,j+1} + D_{i+1,j+1}, where i and j are odd. This corresponds to a detector array with half the number of pixels both vertically and horizontally, in which every pixel is twice as large.
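A sketch of the two measurement reductions on a k × l detector array D; 0-based NumPy indexing replaces the 1-based indices of the text, and the matching operation on the rows of the full sensing matrix would mirror these index patterns:

```python
import numpy as np

def reduce_A(D):
    """Architecture A: keep only measurements whose (1-based) row and
    column indices are both odd, i.e. discard 3 out of 4 samples."""
    return D[0::2, 0::2]

def reduce_B(D):
    """Architecture B: sum each non-overlapping 2x2 block, emulating
    detector pixels that are twice as large in each dimension."""
    k, l = D.shape
    return D.reshape(k // 2, 2, l // 2, 2).sum(axis=(1, 3))
```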
B. Conversion of physical measurements to CS
We limit the scope of this paper to a random sensing matrix whose elements take the values −1 and 1, each with probability 0.5. In the physical matrix we use 0% and 100% transmittance to represent −1 and 1, respectively, since a negative transmittance is not possible. This requires a mathematical correction of the measurements by D = 2D_raw − I_total, where D_raw is the vector of raw detector measurements and I_total is the total irradiance of all image pixels without modulation (100% transmittance). I_total can be determined using an extra acquisition with all sensing pixels open, as in [3]. However, by constructing M such that M_{i,j} = M_{m+i,j} = M_{i,n+j} = M_{m+i,n+j} for 1 ≤ i ≤ m, 1 ≤ j ≤ n, k = m and l = n, each image pixel is sensed equally over all measurements, because either M_{i,j} or one of its repeated twins is shifted over each image pixel exactly once. Then I_total can be calculated as
$$I_{total} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{l} D_{i,j}}{\sum_{i=1}^{m}\sum_{j=1}^{n} M_{i,j}}.$$
With this limitation, M has the same block-Toeplitz form as in [5], which satisfies the RIP. Since architecture A discards measurements in D, we still need a second acquisition with all sensing pixels open to measure I_total; but since architecture B only sums the measurements of D, this method can still be applied.
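A sketch of this correction for architecture B, where the tiled construction of M allows I_total to be derived from the measurements themselves (M_tile, a name introduced here for illustration, is the m × n base tile of the modulator):

```python
import numpy as np

def to_pm1_measurements(D_raw, M_tile):
    """Convert 0/1 physical measurements to the -1/+1 CS model via
    D = 2*D_raw - I_total, with I_total = sum(D_raw) / sum(M_tile)."""
    I_total = D_raw.sum() / M_tile.sum()
    return 2.0 * D_raw - I_total
```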
In this manner we can acquire all CS measurements in parallel and hence reduce the acquisition time by a factor of up to 1/N, where N is the number of measurements, compared to a sequential acquisition process. Acquiring all measurements simultaneously also enables the use of a fixed sensing matrix (e.g. an opaque membrane with holes); this not only simplifies the construction but also significantly reduces the number of random values to be stored or generated for the sensing matrix, from N·m·n down to 4·m·n. However, this comes at a price: by not focusing the image at the sensing matrix nor at the detector, diffraction at the apertures of the sensing matrix needs to be considered.
C. Diffraction Compensation of the Sensing Matrix
The ray representation of light used in the previous sections
is only accurate for large scales. At small scales, close to the
wavelength of the radiation, diffraction becomes a prominent
phenomenon [14]. In our architecture, diffraction will mostly be noticeable at the modulator pixels, since all other optics involved require an aperture large enough to cover all modulator pixels; their diffraction can safely be neglected compared to that of the aperture of a modulator pixel. The point spread function (PSF)
describing the diffraction of a modulator pixel is estimated
using Fourier optics to calculate the expected image of a
point with an incoherent imaging system, as described in
section 7.3.3 of [15]. In a real system, however, the effects
of diffraction can be measured more accurately by acquiring
the response of single image points on the entire modulator
pattern. Fig. 2(a) illustrates the behaviour without diffraction
and 2(b) with diffraction.
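As a rough stand-in for the Fourier-optics computation in [15] (which we do not reproduce here), a Fraunhofer-regime sketch of an incoherent PSF from a pupil function; zero-padding sets the sampling of the result, and the mapping to physical detector coordinates is omitted:

```python
import numpy as np

def incoherent_psf(aperture, pad_to=256):
    """Squared magnitude of the Fourier transform of the aperture,
    normalized to unit total irradiance (incoherent imaging)."""
    field = np.fft.fftshift(np.fft.fft2(aperture, s=(pad_to, pad_to)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()
```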
Using this model, diffraction amounts to a convolution (∗) of our measurements from the linear projection model (3) with the PSF h. Since convolution is commutative and associative, we can write D_diff = D ∗ h = (MI) ∗ h = (M ∗ h)I (note that this notation is simplified, as the convolution needs to be applied to the non-vectorized forms). The right-hand side provides a method to solve the deconvolution problem directly in the CS reconstruction stage, by using the sensing matrix A = M ∗ h and y = D_diff in (1) to recover the original image. This is illustrated in Fig. 2(c). Instead of showing that the RIP condition still holds, we have simulated the acquisition and reconstruction of multiple test images.
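A sketch of the sensing-side compensation: blur the modulator pattern with the PSF once, then build the rows of A = M ∗ h with the same windowing as in (4) (reusing the windowing idea of the illustrative helper defined earlier); boundary handling via mode="same" is a simplification of the physical setup:

```python
import numpy as np
from scipy.signal import fftconvolve

def diffracted_sensing_matrix(mod, psf, m, n, k, l):
    """Rows of (M * h): convolve the modulator with the PSF, then
    extract and vectorize the same k*l shifted windows as in the
    ideal, diffraction-free case."""
    mod_blurred = fftconvolve(mod, psf, mode="same")
    rows = [mod_blurred[i:i + m, j:j + n].ravel()
            for i in range(k) for j in range(l)]
    return np.array(rows)
```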
TABLE I
COMPARISON OF NORMALIZED RECONSTRUCTION ERROR AS MSE/MSE_{128×128}, MEAN (STANDARD DEVIATION)

Image     | Classic 64×64 | Sequential CI 4096 | A 64×64       | B 64×64
R         | 2.014 (0)     | 1.692 (0.003)      | 1.581 (0.007) | 1.576 (0.083)
Lena      | 2.120 (0)     | 2.088 (0.030)      | 1.792 (0.038) | 1.750 (0.046)
Birds     | 2.064 (0)     | 1.849 (0.017)      | 1.809 (0.026) | 1.833 (0.060)
Monarch   | 2.160 (0)     | 2.201 (0.039)      | 1.785 (0.034) | 1.763 (0.036)
Boat      | 1.886 (0)     | 2.079 (0.026)      | 1.782 (0.023) | 1.760 (0.020)
Peppers   | 2.139 (0)     | 1.930 (0.020)      | 1.696 (0.034) | 1.682 (0.034)
Goldhill  | 1.833 (0)     | 2.282 (0.030)      | 1.847 (0.032) | 1.765 (0.023)
Couple    | 1.730 (0)     | 2.151 (0.020)      | 1.848 (0.028) | 1.779 (0.017)
IV. EXPERIMENTS ON ACQUISITION AND RECONSTRUCTION
Our simulations are based on a system with realistic limitations on the dimensions of available lenses, SLMs and sensor arrays. Because of the ability to recover (to an unknown degree) the unconvolved image despite a large PSF, different dimensions were tested rather than basing our design on minimizing the PSF. After an extensive comparison of different alternatives in terms of SLM resolution and size as well as projection distance, an SLM of size 25.6 mm × 25.6 mm with 0.1 mm pixels at a projection distance of 60 mm proved to give the best results. The following simulations are based on these dimensions. We also limit the simulations to recovery with 1/4 of the measurements classically required by the Nyquist rate. For an image resolution of 128×128 pixels, this requires a detector resolution of 64×64 pixels. Both architectures use a 0/1 sensing matrix and convert the measurements, either by acquiring I_total separately (A) or by deriving I_total from the measurements (B). The PSF was calculated for the dimensions given above, considering incoherent light at a wavelength of 400 nm; the resulting PSF is shown in Fig. 2(d). As a comparison we simulate a normal digital camera with a resolution of 128×128 pixels, one with 64×64 pixels, and a CI camera using independent measurements, such as a single-pixel camera. All these cameras are considered to have a negligible PSF. Image acquisition with the classic cameras is simulated by averaging all original image pixels within the regions of the cameras' larger pixels (4×4 and 8×8, respectively; all original images are 512×512 pixels). The same procedure is performed on the sensing matrix for the CI cameras, and the PSF was applied to the measurements of (A) and (B).
[Fig. 2: panels (a) D = MI, (b) D_diff = (MI) ∗ h, (c) D_diff = (M ∗ h)I, (d) PSF, (e) PSF matrix.]

Fig. 2. (a)-(c) Illustration of how diffraction by the sensing pattern is modelled as a linear acquisition process through convolution on the CS matrix. (a) Without diffraction, the light transmitted by M_{i,j} is all projected onto the correct detector pixel. (b) The transmitted light is diffracted and spread as a PSF over multiple detector pixels. (c) The diffraction is modelled as a PSF on the modulator instead of on the detector; this results in a linear projection model as in the ideal case, but with the measurements as acquired in (b). (d) PSF calculated over an area corresponding to 23×23 pixels, based on the dimensions of the presented architecture. (e) Distribution in percent of total irradiance in the central part of the PSF (rounded for display purposes only).
Fig. 3. Reconstructed images. Column 1: Classic camera 128×128. 2: Classic camera 64×64. 3: Sequential CI camera, 4096 measurements. 4: Parallel CI, 4096 measurements, double shifts. 5: Parallel CI, 4096 measurements, double detector pixel size. MSE/MSE_{128×128} is indicated above each image.
To recover the images of the CI reference camera and our parallel variants, TVAL3 v1.0 [11] was used. The simulations are programmed in MATLAB and all cameras are simulated without noise.
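The block-averaging used to simulate the classic cameras (and to downsample the sensing patterns) can be sketched as follows, with b = 4 for 512→128 and b = 8 for 512→64:

```python
import numpy as np

def block_average(img, b):
    """Average non-overlapping b x b blocks of an (h x w) image,
    h and w assumed divisible by b."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
```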
We have compared the image quality of 8 test images based on the mean square error (MSE) with respect to the original images. Fig. 3 shows the reconstruction of two of the test images after simulated acquisition with all architectures, and Tab. I shows the reconstruction errors of the 8 test images, normalized by the errors of the reference camera with 128×128 pixels. Presented values are averages of 25 reconstructions using different random generations of M, with the standard deviation in parentheses. The first column shows the error of a digital camera using the same number of measurements (pixels) as the CI cameras; the CI cameras all show comparable results, and architectures A and B both outperform the digital camera on most images. The sequential CI camera sometimes shows slightly worse results, but still shows a significant improvement on "R".
V. CONCLUSION
In this paper we have shown that image reconstruction is possible through parallel acquisition of measurements subject to diffraction, with results comparable to a CI camera with independent, sequential measurements, which requires a significantly longer acquisition time. We are currently assembling a hardware prototype of this architecture, and we will report experimental results in a future paper.
VI. ACKNOWLEDGMENT
This work is supported by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement n. 279848.
REFERENCES
[1] E. J. Candès, "Compressive Sampling", Proc. Int. Congress of Mathematics, 2006.
[2] D. Donoho, "Compressed sensing", IEEE Trans. on Information Theory, 52(4), pp. 1289-1306, April 2006.
[3] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, R. G. Baraniuk, "Single pixel imaging via compressive sampling", IEEE Sig. Proc. Mag., 25, pp. 83-91, March 2008.
[4] W. L. Chan, M. L. Moravec, R. G. Baraniuk, D. M. Mittleman, "Terahertz imaging with compressed sensing and phase retrieval", Optics Letters, 33, pp. 974-976, 2008.
[5] R. F. Marcia, Z. T. Harmany, R. M. Willett, "Compressive Coded Aperture Imaging", Proc. SPIE Electronic Imaging, 2009.
[6] L. Jacques, P. Vandergheynst, A. Bibet, V. Majidzadeh, A. Schmid, Y. Leblebici, "CMOS compressed imaging by random convolution", Proc. ICASSP, 2009.
[7] Y. Oike and A. El Gamal, "A 256×256 CMOS Image Sensor with Delta-Sigma-Based Single-Shot Compressed Sensing", Proc. ISSCC, 2012.
[8] E. J. Candès and T. Tao, "Decoding by linear programming", IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203-4215, 2005.
[9] G. Coluccia, D. Valsesia, E. Magli, "Smoothness-Constrained Image Recovery from Block-Based Random Projections", to appear in Proc. MMSP, 2013.
[10] H. Fang, S. A. Vorobyov, H. Jiang, O. Taheri, "2D Signal Compression via Parallel Compressed Sensing with Permutations", in Proc. Asilomar SSC, 2012.
[11] C. Li, "An Efficient Algorithm For Total Variation Regularization with Applications to the Single Pixel Camera and Compressive Sensing", MSc Thesis, 2009.
[12] A. Heidari and D. Saeedkia, "A 2D Camera Design with a Single-pixel Detector", Proc. Int. Conf. on Infrared, Millimeter and Terahertz Waves, 2009.
[13] W. U. Bajwa, J. D. Haupt, G. M. Raz, S. J. Wright, R. D. Nowak, "Toeplitz-Structured Compressed Sensing Matrices", Proc. of Stat. Sig. Proc. Workshop, 2007.
[14] J. W. Goodman, Introduction to Fourier Optics, 3rd ed., Roberts & Company Publishers, 2005.
[15] D. Voelz, Computational Fourier Optics, SPIE, 2011.