Institutionen för systemteknik
Department of Electrical Engineering
Examensarbete
Compressed Sensing for 3D Laser Radar
Master's thesis in signal and image processing
at the Institute of Technology, Linköpings universitet
by
Erik Fall
LiTH-ISY-EX–14/4767–SE
Linköping 2014
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden
Supervisors:
Christina Grönwall, FOI
Henrik Petersson, FOI
Hannes Ovrén, ISY, Linköpings universitet
Examiner:
Maria Magnusson, ISY, Linköpings universitet
Linköping, June 9, 2014
Division, Department: Department of Electrical Engineering, SE-581 83 Linköping
Date: 2014-06-09
Language: English
Report category: Examensarbete (Master's thesis)
ISBN: —
ISRN: LiTH-ISY-EX–14/4767–SE
Series title and numbering, ISSN: —
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-107195
Title (Swedish): Compressed Sensing för 3D Laserradar
Title: Compressed Sensing for 3D Laser Radar
Author: Erik Fall
Keywords: compressed sensing, compressed sampling, TCSPC, TV minimization, l1 minimization, DMD
Abstract

High resolution 3D images are of high interest in military operations, where data can be used to classify and identify targets. The Swedish Defence Research Agency (FOI) is interested in the latest research and technologies in this area. A drawback of normal 3D-laser systems is their lack of high resolution for long range measurements. One technique for laser radar with high resolution at long range is based on time correlated single photon counting (TCSPC). By repetitively sending out short laser pulses and measuring the time of flight (TOF) of single reflected photons, extremely accurate range measurements can be made. A drawback with this method is that it is hard to create single photon detectors with many pixels and high temporal resolution; hence a single detector is used. Scanning an entire scene with one detector is very time consuming. Instead, as this thesis explores, the entire scene can be measured with fewer measurements than the number of pixels. To do this, a technique called compressed sensing (CS) is introduced. CS exploits the fact that signals normally are compressible and can be represented sparsely in some basis. CS places different requirements on the sampling than the classical Shannon-Nyquist sampling theorem. With a digital micromirror device (DMD), linear combinations of the scene can be reflected onto the single photon detector, creating scalar intensity values as measurements. This means that fewer DMD-patterns than the number of pixels suffice to reconstruct the entire 3D-scene. In this thesis a computer model of the laser system helps to evaluate different CS reconstruction methods for different scenarios of the laser system and the scene. The results show how many measurements are required to reconstruct scenes properly and how the DMD-patterns affect the results. CS proves to enable a great reduction, 85–95 %, of the required measurements compared to a pixel-by-pixel scanning system. Total variation minimization proves to be the best choice of reconstruction method.
Sammanfattning

High resolution 3D images are of great interest in military operations, where data can be used for classification and identification of targets. The Swedish Defence Research Agency (FOI) has a strong interest in investigating the latest techniques in this area. A major problem with common 3D-laser systems is that they lack high resolution at long measurement ranges. One technique with high range resolution is time correlated single photon counting, which can count individual photons with extremely good accuracy. Such a system illuminates a scene with laser light, measures the reflection time of individual photons and can thereby measure range. The problem with this method is performing detection over many pixels when only one detector can be used. Scanning an entire scene with one detector takes a very long time; instead, this thesis is about making fewer measurements than the number of pixels while still reconstructing the entire 3D-scene. To accomplish this, a new technique called compressed sensing (CS) is used. CS exploits the fact that measurement data normally is compressible and differs from the traditional Shannon-Nyquist sampling requirements. With the help of a digital micromirror device (DMD), linear combinations of the scene can be reflected onto the single photon detector, and the entire 3D-scene can be reconstructed with fewer DMD-patterns than the number of pixels. Using a laser model developed in this work, different CS reconstruction methods and different scenarios of the laser system are evaluated. The work shows that the basis representation determines how many measurements are needed and how different designs of the DMD-patterns affect the result. CS turns out to enable imaging of entire 3D-scenes with 85–95 % fewer measurements than the number of pixels. Total variation minimization turns out to be the best choice of reconstruction method.
Contents

1 Introduction
   1.1 Background
   1.2 Problem formulation
   1.3 Objectives
   1.4 Limitations
   1.5 Related work
   1.6 Thesis Outline

2 The Laser System
   2.1 TOF Laser system
   2.2 System characterization
   2.3 Detection noise
   2.4 Digital micromirror device

3 Compressed Sensing
   3.1 Sparsity and compressible signals
       3.1.1 Transform definitions
   3.2 The compressed sensing problem
       3.2.1 Designing the measurement matrix
       3.2.2 Designing a reconstruction algorithm
       3.2.3 The lp-norm
       3.2.4 Total variation minimization
       3.2.5 The single-pixel camera approach
       3.2.6 3D-compressed sensing

4 Method
   4.1 Feasibility study
   4.2 Computer model
       4.2.1 Configuration
       4.2.2 Laser calculations
       4.2.3 Image reconstruction
       4.2.4 3D-reconstruction
   4.3 Evaluation

5 Simulations and Results
   5.1 Sparse signal representation
       5.1.1 Intensity image reconstruction results
       5.1.2 Sparsity comparison
   5.2 Signal strength and noise
   5.3 DMD-array choice
   5.4 Optimization methods
       5.4.1 Reconstruction speed
       5.4.2 Reconstruction quality
       5.4.3 Occlusion
       5.4.4 Reconstructing trees
   5.5 Other 3D-reconstruction methods

6 Discussion
   6.1 Results
       6.1.1 Sparse signal representation
       6.1.2 Signal strength and noise
       6.1.3 DMD-array choice
       6.1.4 Optimization methods
       6.1.5 Other 3D-reconstruction methods
   6.2 Analysis method

7 Conclusions and Future Work
   7.1 Conclusions
   7.2 Future work

Bibliography
1 Introduction

1.1 Background
In recent years the Swedish Defence Research Agency (FOI) has researched laser technology for high resolution 3D-range images at long range distances. High resolution 3D-laser radar is of interest in target classification, especially for partly hidden targets. To achieve the best result the range resolution needs to be better, i.e. finer, than the distance between camouflage and target. High resolution at far distances is of interest since more details mean better opportunities for object classification or even identification. In military and police operations, classification at large distances gives more time to perform target-specific actions, compared to close range classification. At the same time, short scanning times are of high value, minimizing the risk of oneself being detected.

One of the technologies in 3D-laser radar is based on time correlated single-photon counting (TCSPC). By repetitively sending out short laser pulses and measuring the time of flight (TOF) of single reflected photons, extremely accurate range measurements can be made. A drawback with this method is that it is hard to create single photon detectors with many pixels and high temporal resolution; therefore a single detector is used.
1.2 Problem formulation
To capture a scene with a single TCSPC detector, a pixel-by-pixel scanning system can be used. The drawback with this method is that scanning the scene pixel-by-pixel is too slow and too mechanically sensitive. Scanning every pixel can be avoided by using compressed sensing (CS), also referred to as compressed sampling. CS utilizes the fact that measurement data from a scene normally contain redundant information. It is based on theory that differs from the traditional Shannon-Nyquist sampling theorem. In the measurements, basis functions are utilized to sparsely represent the data. With the help of a digital micromirror device (DMD) and a single detector element, a single-pixel camera with TCSPC technology can scan a 3D-scene with far fewer measurements than a pixel-by-pixel scanning system. The challenge is how this can be done properly, and the question is how images can be reconstructed from fewer measurements than the number of pixels. To solve this problem, knowledge is needed about how a TCSPC laser system is characterized and how CS techniques can reconstruct laser intensity images.
An overview of the imaging system is shown in Figure 1.1.
Figure 1.1: Overview of the system. (Block diagram: a laser source emits laser pulses toward a scene at some distance; an N×N DMD reflects the returning light onto the detector; signal processing and reconstruction yield a reconstructed N×N depth map.)
1.3 Objectives
The objectives of this thesis are:

• Develop a computer model of a TCSPC laser system, see Figure 1.1, and a CS reconstruction scheme for 3D-scenes. This model enables easy testing and evaluation of different 3D-scenes, system settings and CS algorithms.

• Determine how signal basis representation, undersampling ratio, choice of DMD and other critical factors affect the reconstructions.

• Determine how much faster CS can make the acquisition of the scene compared to a pixel-by-pixel scanning system.
1.4 Limitations
Limitations and restrictions of the simulations are:

• Static scenes. The scenes in this thesis are all static, which means that everything is assumed to be completely still during all simulated measurements.

• Objects and scene complexities are restricted to simple cases. Simulated objects are boxes, spheres, boats and trees, which all have constant surface properties. Objects are only placed inside the measured depth intervals.

• System resolution is limited to 128 × 128 pixels. Higher resolution would have increased the required time for all simulations. Resolutions above 200 × 200 also require more system memory than available on the computers used for the simulations.
1.5 Related work
There are many articles that utilize compressed sensing when acquiring images using a single detector element, both in 2D-imaging and 3D-range imaging. Some of these are:
• Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor [11] by Kirmani et al., 2011. This article describes the usage of a spatial light modulator (SLM) that illuminates the scene with different binary patterns. First a fully transparent SLM is used to estimate different depth ranges. This gives information about the depth range where objects are present, which can be utilized in the 3D-reconstruction. The reconstruction uses assumptions about planar facets, and the main limitation of this work is its inapplicability to scenes with curvilinear objects. [3] and [12] are two shorter or slightly modified versions of this article.
• Gated viewing laser imaging with compressive sensing [15] by Li et al.,
2012. This article presents a prototype of gated viewing imaging with compressive sensing. Total variation minimization promotes sparsity using the
TVAL3 algorithm and the 3D scene is reconstructed using the mean time of
flight for photons.
• Photon-counting compressive sensing laser radar for 3D imaging [10] by
Howland et al., 2011. This article uses a TCSPC system and shows good
reconstruction quality of images using Haar wavelets as the sparse basis
representation of the signal.
• Single-pixel imaging via compressive sampling: Building simpler, smaller, and less-expensive digital cameras [5] by Duarte et al., 2008. This article describes the theory behind the single-pixel camera and compressed sensing in detail. Theoretical and practical performance are evaluated and compared to conventional cameras based on pixel arrays and raster scanning. Results show that compressed sensing enables faster image acquisition than both raster scan and basis scan.
• Photon counting compressive depth mapping [9] by Howland et al., 2013.
This article demonstrates a compressed sensing photon counting lidar system based on the single-pixel camera. The method presented recovers both
depth and intensity information. The article also demonstrates the advantages of extra sparsity when reconstructing the difference between two instances of a scene.
• A Compressive Sensing Photoacoustic Imaging System Based on Digital
Micromirror Device [16] by Naizhang et al., 2012. A photoacoustic compressed sensing imaging scheme is described in this article. Compressed
sensing is used to speed up acquisition time using a digital micromirror
device (DMD) as illumination mask.
• Compressive confocal microscopy: 3D-reconstruction algorithms [23] by Ye et al., 2009. In this article compressed sensing is used for image acquisition in confocal microscopy. A DMD is used together with a single photon detector to generate images. Three different 3D-reconstruction methods are evaluated; the two most interesting of them are not possible to implement in this master's thesis, since their experimental setups differ too much.
1.6 Thesis Outline
In this report, theoretical backgrounds to the 3D-laser system and CS are given
in Chapter 2 and Chapter 3, respectively. The analysis method together with the
developed computer model are described in Chapter 4. Simulations and results
are presented in Chapter 5 followed by discussion in Chapter 6 and conclusions
and future work in Chapter 7.
2 The Laser System
The laser system is the central part of the entire range imaging system, which is illustrated in Figure 1.1. The laser system includes the laser source, the noise sources affecting the laser light, and the detection. The signal processing and image reconstruction are covered in Chapter 3.
2.1 TOF Laser system
Laser pulses are emitted onto a scene and, by measuring the time of flight (TOF) of the reflected photons, range measurements are created. Knowing the TOF t, the distance to a target is calculated as

R = \frac{ct}{2},    (2.1)
where c is the speed of light. The light travels the distance ct, but since it is
a reflection the range is given by half that distance [17, 19]. Generally when a
laser pulse hits a surface it is reflected in different directions, depending on the
surface properties and the angle of the incoming light. The returning power can
be expressed by a simplified radar range equation,
P_R = \frac{P_T \eta_T}{\Omega_T R^2} \cdot A_4 \cdot \frac{A_R \eta_R}{R^2} \cdot e^{-2\alpha R},    (2.2)
where P_R is the received power, P_T the transmitted power, and \eta_T and \eta_R are the efficiencies of the transmitter and the receiver. The laser beam divergence (angular measure of the increase in beam diameter with distance) is denoted \Omega_T, A_4 is the target cross section and A_R is the aperture area of the receiver. Atmospheric attenuation is denoted \alpha and R is the distance given in (2.1) [17, 19].
When measuring the 3D-contents of a scene, the task is to determine the range value for each pixel, where the round trip time is t_r, given by (2.1). By deciding when to measure after transmitting the laser pulse, the range to the measured depth interval can be chosen.
TCSPC is a method for range profiling with high resolution. One laser pulse equals one measurement, and a range profile (histogram) is created by summarizing many measurements. The TCSPC system emits short laser pulses with a high repetition rate. The detector is able to detect single photons, each of which creates an electrical signal. The detection probability of a transmitted pulse is in the range of 1-10 %. Thanks to the many pulse cycles, enough photons are measured that a histogram can be generated, provided that the TOF of each detected photon is registered. The histogram's bins correspond to when, in time, the photons were detected [17, 20]. The time bins are then translated to range bins by using (2.1). Examples of histograms from a scene are shown in Figure 2.1.
Figure 2.1: Histograms of the received, noise-free, signals from all pixels, where each colored curve corresponds to one pixel. In this case there are two objects at a distance of around 10.15 m.
The detector is always active, and the quantization of time intervals defines the time bins. A time bin is defined as t_i = t_0 + i \cdot \Delta t, where t_0 is the initial delay, which corresponds to how far away the measured depth interval is, \Delta t is the bin length, and i = 1, 2, ..., N_{bins}, where N_{bins} is the number of time slices in the depth interval. The pulse repetition time determines the number of time bins, since the total time of all time bins is the time between two pulses. A longer pulse repetition time means that the measured depth interval will be longer. In this thesis the time bins are assumed to be 8 ps long and 400 in total, which means a bin resolution of approximately 1.2 mm and a total depth range interval of approximately 0.48 m. Because the detector is always active, objects outside a measured depth range interval are aliased into the measurements: the transmitted laser light also hits objects further away than the intended range interval, and the reflected light will not have a synced TOF.

The laser system in this thesis is a combination of the CS single-pixel camera technique and the described TCSPC technique.
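The bin arithmetic above can be checked directly: with \Delta t = 8 ps and 400 bins, eq. (2.1) gives a bin resolution of about 1.2 mm and a measured depth interval of about 0.48 m. A small sketch:

```python
C_LIGHT = 3.0e8     # speed of light [m/s]
DT = 8e-12          # bin length: 8 ps, as assumed in the thesis
N_BINS = 400        # number of time bins, as assumed in the thesis

def bin_to_range(i, t0=0.0):
    """Range corresponding to time bin i, using t_i = t0 + i * dt and eq. (2.1)."""
    return C_LIGHT * (t0 + i * DT) / 2.0

bin_resolution = C_LIGHT * DT / 2.0        # ~1.2 mm per bin
depth_interval = N_BINS * bin_resolution   # ~0.48 m total depth range interval
```

The initial delay t0 selects where the depth interval starts, matching the description of choosing when to measure after transmitting a pulse.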
2.2 System characterization
The histogram will, for a reflective target at a certain distance, contain photons detected in multiple time bins, which is caused by random variations in time called time jitter. The temporal accuracy is determined by this uncertainty. The system time jitter can be described by the following expression,

\Delta t_{system} = \sqrt{\Delta t_{laser}^2 + \Delta t_{detector}^2 + \Delta t_{electronics}^2},    (2.3)

where \Delta t_{laser}, \Delta t_{detector} and \Delta t_{electronics} are the jitter caused by the laser, the detector and the electronics, respectively [19]. The characteristics of the system time jitter distribution are described by the instrumental response function (IRF), in signal theory referred to as the point spread function, where the width of the IRF commonly is used as the approximation of \Delta t_{system}. The IRF (unpublished work; courtesy of FOI) of the system described in this thesis is as follows:
y = \begin{cases}
e^{-t_1^2/(2s^2)} \cdot e^{(t - t_1)/T_0}, & t < t_1 \\
e^{-t^2/(2s^2)}, & t_1 < t < t_2 \\
G \cdot \left( a \cdot e^{-(t - t_2)/T_1} + b \cdot e^{-(t - t_2)/T_2} + q \cdot e^{-(t - t_2)/T_3} \right), & t > t_2
\end{cases}    (2.4)
where y is the IRF and G is defined as

G = e^{-t_2^2/(2s^2)},    (2.5)

q is defined as

q = 1 - a - b,    (2.6)

and T_3 is defined as

T_3 = \frac{q}{t_2/s^2 - a/T_1 - b/T_2},    (2.7)
where T_0, T_1, T_2, t_1, t_2, a, b and s are all system parameters specific to the TCSPC system at FOI, and t is the time interval over which the IRF is calculated. The IRF in (2.4) is used in the simulated laser model. In Figure 2.2 an example of the IRF is shown. Note the exponentially decreasing tail; this will cause traces of objects in the scene at distances further away from where the objects actually are, which is discussed later.
Figure 2.2: An example of the IRF for a single point target at time 100 ps.
Here one time bin equals 4 ps.
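A minimal sketch of the piecewise IRF in (2.4)-(2.7). The parameter values below are invented for illustration only, since the actual FOI system parameters are unpublished; the sketch just demonstrates the structure (a rising exponential before t1, a Gaussian peak region, and a three-term exponential tail) and that the pieces join continuously at t1 and t2.

```python
import math

# Hypothetical system parameters (the real FOI values are unpublished)
s, t1, t2 = 10.0, -5.0, 8.0
T0, T1, T2 = 4.0, 20.0, 60.0
a, b = 0.5, 0.3

q = 1.0 - a - b                              # eq. (2.6)
G = math.exp(-t2**2 / (2 * s**2))            # eq. (2.5)
T3 = q / (t2 / s**2 - a / T1 - b / T2)       # eq. (2.7)

def irf(t):
    """Instrumental response function, eq. (2.4)."""
    if t < t1:     # rising edge before the peak region
        return math.exp(-t1**2 / (2 * s**2)) * math.exp((t - t1) / T0)
    if t <= t2:    # Gaussian peak region
        return math.exp(-t**2 / (2 * s**2))
    # exponentially decreasing tail with three time constants
    return G * (a * math.exp(-(t - t2) / T1)
                + b * math.exp(-(t - t2) / T2)
                + q * math.exp(-(t - t2) / T3))
```

The long tail of the third branch is what smears object responses toward larger distances, as noted in the text.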
2.3 Detection noise
There are three main noise sources that are taken into consideration in the laser system model: shot noise, background illumination and dark counts.

The sampling of photons suffers from shot noise, which is associated with the particle nature of light. The detection is modeled as a Poisson process [7]. Measuring a certain number K of photons during the time interval [t, t + \tau] has the following probability density function:
P(K; t, t + \tau) = P(K) = \frac{\bar{K}^K}{K!} e^{-\bar{K}},    (2.8)
where K is the number of detected photons and \bar{K} is the expected value of K in the time interval. For large \bar{K} the Poisson distribution approaches a normal distribution. The standard deviation is the square root of \bar{K}, which means that for large \bar{K} the relative influence of the shot noise is small.
Background illumination is detected light which does not originate from the laser transmitter, but from other illuminators such as lamps and the sun. Their illumination of the scene means that unwanted photons hit the detector. The background illumination can be simulated by adding a constant to every pixel measurement according to reference values, which is later sampled according to the Poisson process in (2.8) [18].
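The shot-noise behaviour described above can be simulated by drawing counts from the Poisson distribution in (2.8); the relative fluctuation shrinks like 1/sqrt(K̄), matching the remark that large expected counts suffer little from shot noise. A sketch using numpy (assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)

def detect(k_bar, n_trials):
    """Draw photon counts K ~ Poisson(K_bar), the shot-noise model in eq. (2.8)."""
    return rng.poisson(k_bar, size=n_trials)

for k_bar in (4, 400):
    counts = detect(k_bar, 100_000)
    rel_std = counts.std() / counts.mean()   # relative shot noise, ~ 1 / sqrt(k_bar)
```

Background illumination and dark counts can be folded into the same model by adding their expected contributions to k_bar before sampling.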
In the detector, on its semiconductor material, electron-hole pairs are generated by thermal energy. This causes dark counts, which add false photon detections to the measurements.
2.4 Digital micromirror device
A digital micromirror device (DMD) is a precise light switch which modulates light using an array of small mirrors. A single photon detector element, together with different DMD-patterns, creates unique measurements of the same scene. The different array elements can be directed to reflect light towards the detector or away from it. This means that every mirror element in the array corresponds to a pixel (limiting the system resolution).

The principles of the single element detector were described earlier in this chapter. Mathematically, one DMD-pattern can be represented as a matrix with values 0 or 1; see the example in Figure 2.3.
Figure 2.3: A DMD-pattern with resolution 128 × 128 where the probability
of a pixel being 0 or 1 is set to 0.5. Black pixels are 1 and white pixels are 0.
Using compressed sensing, described in Chapter 3, the DMD-array provides the possibility of generating complete depth images, of the same resolution as the DMD-array, using fewer samples than there are pixels.
3 Compressed Sensing
Compressed sensing is a method for acquiring and reconstructing signals by finding solutions to underdetermined linear systems, using knowledge of the signals' compressibility and sparseness. The underdetermined systems are solved by adding the constraint of sparsity, which only allows solutions with a small number of non-zero coefficients. To perform compressed sensing, the signal needs to be very sparse in some basis, while its measurement matrix is kept incoherent with the sparse transform.
3.1 Sparsity and compressible signals
A signal can often be expressed concisely in some appropriate basis. Natural images can be expressed by e.g. the discrete cosine transform (DCT), where a few large coefficients capture almost all information. This means that a lot of small coefficients can be discarded without losing much information; consequently, this is one of the basic ideas in image compression. Mathematically, the image can be described as a vector f \in R^n (where n is the number of pixels of the image). This can be represented in another basis \Psi, e.g. the DCT, as
f = \sum_{i=1}^{n} x_i \psi_i = \Psi x,    (3.1)
where the x_i are the coefficients in the sparse basis representation of the signal f [2]. Note that certain types of images can be sparse already in the pixel basis, which means that \Psi equals the n \times n identity matrix. Finding the most appropriate basis for reconstructing a scene is evaluated in Section 5.1.
By keeping the S largest values of |x| in the expansion in (3.1), f can be approximated as f_s = \Psi x_s, where x_s is x with its n - S smallest values set to zero. If this representation can be made without much perceptual loss, the signal is considered to be S-sparse in that basis representation [2].
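The S-sparse approximation x_s can be sketched directly: keep the S largest-magnitude coefficients and zero the rest. The coefficient vector below is a made-up compressible example, not data from the thesis.

```python
import numpy as np

def s_sparse_approx(x, S):
    """x_s: keep the S largest-magnitude entries of x, zero the rest."""
    x_s = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-S:]    # indices of the S largest |x_i|
    x_s[keep] = x[keep]
    return x_s

# A compressible coefficient vector: two dominant entries, the rest tiny
x = np.array([5.0, -3.0, 0.01, 0.02, -0.005, 0.001])
x_s = s_sparse_approx(x, 2)
# x_s retains almost all of the signal energy with only 2 non-zero entries
```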
3.1.1 Transform definitions
In image compression it is well known that natural images have sparse representations using the DCT and wavelets, which are used in e.g. JPEG and JPEG2000. In [9] discrete Haar wavelets (DHWT) were used as the sparse basis; higher order Daubechies wavelets were tested but did not significantly improve the results. Three different transforms are used in this thesis: the DCT, the DHWT and the Hadamard transform. In (3.1) their inverse transform matrices correspond to \Psi.
The orthonormal DCT of the signal f is defined as

F[j] = a[j] \sum_{k=0}^{n-1} f[k] \cos\left( \frac{(2k+1) j \pi}{2n} \right) = \sum_{k=0}^{n-1} f[k] \, c[j, k],    (3.2)

where F[j] is element j of the transformed signal f and a[j] is defined as

a[j] = \begin{cases} \sqrt{1/n} & \text{if } j = 0 \\ \sqrt{2/n} & \text{if } j = 1, 2, ..., n-1 \end{cases}.    (3.3)

The n \times n cosine transform matrix C, with entries c[j, k], is then defined as

C = \begin{pmatrix} \cdots & \cdots & \cdots \\ \vdots & c[j, k] & \vdots \\ \cdots & \cdots & \cdots \end{pmatrix},    (3.4)
where C is orthogonal, which means that F = Cf and f = C^T F.
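The DCT matrix C in (3.2)-(3.4) can be built and checked for orthogonality with a few lines of numpy (assumed available):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT matrix with entries c[j, k] = a[j] * cos((2k + 1) j pi / (2n))."""
    C = np.zeros((n, n))
    for j in range(n):
        a_j = np.sqrt(1.0 / n) if j == 0 else np.sqrt(2.0 / n)   # eq. (3.3)
        for k in range(n):
            C[j, k] = a_j * np.cos((2 * k + 1) * j * np.pi / (2 * n))
    return C

C = dct_matrix(8)
# C @ C.T is the identity, so F = C @ f and f = C.T @ F, as stated in the text
```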
The Hadamard transform belongs to a generalized class of Fourier transforms and computes a transform similar to the discrete Fourier transform, although the matrix is purely real. The Hadamard transform matrix is defined as

H_n = \frac{1}{\sqrt{2}} \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix},    (3.5)
where H0 = 1, from which the matrix Hn can be generated iteratively.
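The iterative construction of H_n in (3.5) can be sketched as:

```python
import numpy as np

def hadamard(levels):
    """Build H_n from H_0 = [1] by applying the block recursion in (3.5) `levels` times."""
    H = np.array([[1.0]])
    for _ in range(levels):
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2.0)
    return H

H3 = hadamard(3)   # 8 x 8 orthonormal Hadamard matrix
```

The 1/sqrt(2) factor at each level keeps the matrix orthonormal, so the inverse transform is simply the transpose.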
The Haar wavelet is the simplest possible wavelet; its full definition will not be covered here, instead the structure of its transform matrix is described. The general n \times n Haar wavelet transform matrix used in this thesis is

H_n = \frac{1}{\sqrt{2}} \begin{pmatrix} H_{n/2} \otimes [1, 1] \\ I_{n/2} \otimes [1, -1] \end{pmatrix},    (3.6)
where H_1 = 1, I is the identity matrix and \otimes is the Kronecker product. The normalization constant \frac{1}{\sqrt{2}} ensures that H_n^T H_n = I. The Haar transform of the length-n signal f is then F = H_n f, with the inverse transform f = H_n^T F.
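Likewise, the Kronecker recursion in (3.6) translates directly into code, and the resulting matrix satisfies H_n^T H_n = I as stated:

```python
import numpy as np

def haar(n):
    """n x n orthonormal Haar matrix from the recursion in (3.6); n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    H_half = haar(n // 2)
    top = np.kron(H_half, [1.0, 1.0])              # H_{n/2} (x) [1, 1]: averaging rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # I_{n/2} (x) [1, -1]: detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

H8 = haar(8)
# H8.T @ H8 equals the identity, so f = H8.T @ (H8 @ f)
```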
3.2 The compressed sensing problem
Consider a general linear measurement process that takes m < n inner products between f \in R^n and a collection of vectors \phi_j, where j = 1, 2, ..., m. The measurements y_j form the m \times 1 vector y,

y = \Phi f = \Phi \Psi x = A x,    (3.7)

where f is substituted according to (3.1) and A = \Phi \Psi is an m \times n matrix. The equation in (3.7) is ill-conditioned and has an infinite number of solutions.
The CS problem consists of solving the equation in (3.7), which implies:

• Designing a reconstruction algorithm to recover x.

• Designing a stable measurement matrix \Phi such that the important information in a general S-sparse or compressible signal is not damaged by reducing the dimensionality from R^n to R^m.

• Finding a basis where the signal is sparse, i.e. designing \Psi.
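The three design tasks above fit together as y = ΦΨx = Ax in (3.7). A minimal dimensional sketch, using a random Gaussian Φ and, purely for illustration, the identity as Ψ (i.e. a signal assumed sparse in the pixel basis):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, S = 256, 64, 5                     # signal length, measurements, sparsity (illustrative)

Psi = np.eye(n)                          # sparse already in the pixel basis (illustration)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix

x = np.zeros(n)
x[rng.choice(n, S, replace=False)] = rng.standard_normal(S)   # S-sparse coefficients

A = Phi @ Psi                            # m x n combined matrix
y = A @ x                                # m measurements, m << n
```

Recovering x from y is the reconstruction problem treated in Sections 3.2.2-3.2.3.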
3.2.1 Designing the measurement matrix
Since the number of measurements m is smaller than n, the problem of recovering the signal f is ill-conditioned. If the signal f is S-sparse in some domain and we know which elements are non-zero, the problem can be solved if m \ge S. For this problem to be well conditioned it is required, for all S-sparse vectors \hat{x} and for each S = 1, 2, ..., that
1 - \delta_S \le \frac{\|A\hat{x}\|_2}{\|\hat{x}\|_2} \le 1 + \delta_S,    (3.8)
where \delta_S is some small error, not too close to one, and \|\cdot\|_2 is the Euclidean norm. This simply means that the matrix A must preserve the lengths of the vectors, to within a certain limit. This is referred to as the restricted isometry property (RIP). Generally there is no knowledge about which elements are non-zero. The goal then is to find a matrix A that satisfies the RIP without knowing which elements in \hat{x} are non-zero [2, 6].
Another condition that needs to be satisfied is incoherence between the measurement matrix \Phi and the basis matrix \Psi. In other words, the rows \phi_j of \Phi must not sparsely represent the columns \psi_j of \Psi [2]. Fortunately, random matrices are incoherent with any fixed basis matrix \Psi, which means that choosing the measurement matrices randomly is a very good idea.
A random measurement matrix Φ can be shown to fulfill the RIP with high probability using only m ≥ C · S · log(n/S) << n measurements, where C is some small constant and n is the length of the signal f [2]. This is the reason why the DMD patterns, explained in Section 2.4, will be randomly generated.
3.2.2
Designing a reconstruction algorithm
The inverse problem that needs to be solved is formulated as
y = ΦΨx = Ax,
(3.9)
which intuitively is solved by minimizing the Euclidean norm ‖Ax − y‖_2, which has the closed-form solution x̂ = A^T (AA^T)^{-1} y. This approach is ill-conditioned and does not give sparse solutions.
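The closed-form least-norm solution above can be computed directly, illustrating that it matches the measurements but is not sparse. A sketch with an assumed random Gaussian A and a hand-picked sparse ground truth:

```python
import numpy as np

# The minimum-l2-norm solution x_hat = A^T (A A^T)^{-1} y from the text,
# computed for an underdetermined system with a sparse ground truth.
rng = np.random.default_rng(2)
n, m = 100, 30
A = rng.standard_normal((m, n))

x_true = np.zeros(n)
x_true[[3, 40, 77]] = [1.0, -2.0, 0.5]     # 3-sparse ground truth
y = A @ x_true

x_hat = A.T @ np.linalg.solve(A @ A.T, y)  # least-norm (closed-form) solution

print(np.allclose(A @ x_hat, y))           # True: consistent with the measurements
print(np.count_nonzero(np.abs(x_hat) > 1e-8))  # essentially all entries non-zero
```

The recovered x̂ reproduces y exactly, yet spreads its energy over nearly every coefficient instead of the three true non-zeros, which is why the l2 approach is abandoned in favor of sparsity-promoting norms.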
Another way of expressing the problem in (3.9) is to define the lp norm and rewrite the expression as

x̂ = argmin ‖x‖_lp subject to Ax = y,   (3.10)
where the lp norm should promote sparsity while being convex.
3.2.3
The lp norm
The lp norm of the vector x, for p ≥ 1, is defined as

‖x‖_lp = ( Σ_{i=1}^{n} |x_i|^p )^{1/p},   (3.11)
where n is the dimension of x. The l0 norm counts the number of non-zero elements of a vector, but minimizing using the l0 norm is a non-convex problem, as are all lp norms for p < 1, and therefore not computationally tractable or numerically stable. This is the reason why the l0 norm is not used in the compressed sensing problem.
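The effect of p on sparsity can be illustrated by comparing a sparse and a dense vector with the same l2 norm. This is a sketch; for p < 1 the expression in (3.11) is not a true norm:

```python
import numpy as np

# l_p "norms" of a sparse and a dense vector with equal l2 norm (both 1),
# computed as in (3.11). Small p rewards sparsity: the sparse vector has
# the smaller value for every p < 2.
def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

sparse = np.array([1.0, 0.0, 0.0, 0.0])
dense = np.full(4, 0.5)

for p in (0.5, 1.0, 2.0):
    print(p, lp_norm(sparse, p), lp_norm(dense, p))
```

For p = 1 the sparse vector scores 1.0 against 2.0 for the dense one, while for p = 2 both score 1.0, which is the algebraic counterpart of the ball geometry discussed next.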
We can get an intuition of why the l1 norm is a great substitute for sparsity by studying the sketches in Figure 3.1, where x ∈ R^2. The red lines represent the subspace H = {x : ΦΨx = y}, which corresponds to the noiseless case in (3.15). The minimal norm where each lp-ball (all points where ‖x‖_lp is the same constant) intersects H is marked as x∗.
Figure 3.1b shows the l1-ball when it intersects the subspace H. Note that it is spiky along the axes, as is the non-convex l1/2-ball in Figure 3.1a, whereas the l2-ball in Figure 3.1c is not. This is commonly expressed as the l1- and l1/2-balls being anisotropic, whereas the standard Euclidean l2-ball is isotropic. Larger p spreads the norm more evenly along two coefficients, whereas smaller p gives norms where the coefficients are more unevenly distributed and sparse. This also generalizes to higher signal dimensions n, where x ∈ R^n [4].
Figure 3.1: Three different lp norms, (a) the l1/2 norm, (b) the l1 norm and (c) the l2 norm, plotted at the minimal intersection, x∗, with the subspace H = {x : ΦΨx = y}. Coordinate axes are marked in green.
3.2.4
Total variation minimization
Total variation (TV) minimization is another approach compared to l1 minimization. Instead of assuming that the signal is sparse in some basis, it assumes that the gradient of the signal is sparse. In other words, TV minimization seeks the solution with the sparsest gradient. Advantages of using TV minimization are good preservation of edges and good recovery of dense staircase signals or piecewise constant images [14]. The TV compressed sensing problem is defined as

argmin Σ_{i,j} ‖D_{i,j} f‖_l2 subject to Af = y,   (3.12)

where f is the signal, in this case an image, D_{i,j} f ∈ R^2 is the gradient in pixel (i, j), A the measurement matrix (A = Φ) and y the measurements [14].
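The TV functional being minimized in (3.12) can be sketched with forward differences. The edge replication at the image boundary is an assumption; only the summed gradient magnitude is computed here, not the minimization itself:

```python
import numpy as np

# Discrete (isotropic) total variation of an image: the sum over pixels of
# the l2 norm of the gradient D_{i,j} f, as in (3.12).
def total_variation(img):
    dx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal differences
    dy = np.diff(img, axis=0, append=img[-1:, :])  # vertical differences
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

flat = np.ones((8, 8))                  # constant image
step = np.ones((8, 8))
step[:, 4:] = 2.0                       # piecewise constant image with one edge

print(total_variation(flat))            # 0.0: constant image has zero TV
print(total_variation(step))            # only the edge column contributes
```

A piecewise constant image has a small TV value dominated by its edges, which is exactly why TV minimization favors the edge-preserving solutions described above.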
3.2.5
The single-pixel camera approach
To understand the sampling process of the TCSPC in combination with CS, the simpler 2D single-pixel camera is first described. The single-pixel camera is a digital camera with one single sensor element, a photodiode. Using a DMD, the single detector element can sample the entire image of a scene. A simplified illustration of the single-pixel camera is shown in Figure 3.2.
Figure 3.2: Simplified illustration of the single-pixel camera, where N × N is the number of pixels n. The objects marked as 1 and 2 are placed somewhere in a 3D scene, reflecting light from the surroundings onto a lens, which focuses the light onto the current DMD-array. The DMD-array creates a linear combination of the light, which is focused onto the single photodiode by a lens. The CS scheme then reconstructs the image from the measured data.
Sampling the original image signal f with a DMD-matrix can be described as the inner product of f and the measurement function φ_i ∈ R^n, generating a single measurement y_i ∈ R. Taking the detection error ε_i, which for the TCSPC system is described in Section 2.3, into consideration, y_i can be formulated as

y_i = ⟨f, φ_i⟩ + ε_i.   (3.13)
If CS is not used, the image is sampled using one active mirror element for each DMD-pattern, which requires as many DMD-patterns as there are pixels, n. Undersampling the signal f is represented by taking m < n measurements, writing Φ as the m × n measurement matrix, where each row φ_i corresponds to one DMD pattern, and expressing the measurements as

y = Φf + ε = ΦΨx + ε,   (3.14)

where Ψx is the sparse representation of f and ε the detection error. Assume that the signal is compressible in some domain and that x is S-sparse. Then the problem can be formulated as
(3.10), but with noise taken into consideration, as

argmin ‖x‖_l1 subject to ‖y − ΦΨx‖_l2 ≤ ε.   (3.15)

This particular formulation is referred to as basis pursuit denoise (BPDN) [2]. TV
minimization, described in Section 3.2.4, is another approach.
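The noisy measurement model in (3.14) can be sketched as follows. The binary pattern probability, the Gaussian error term and the test scene are illustrative assumptions, not the TCSPC detector model:

```python
import numpy as np

# Single-pixel measurement model: each row of Phi is one binary DMD
# pattern, and each measurement is <f, phi_i> plus a detection error.
rng = np.random.default_rng(3)
N = 16                      # image side length, n = N * N pixels
n, m = N * N, 64            # m < n compressed measurements

f = np.zeros((N, N))
f[4:12, 4:12] = 1.0         # simple bright square as the scene
f_vec = f.ravel()           # 1D vector representation of the image

Phi = (rng.random((m, n)) < 0.1).astype(float)  # random binary DMD patterns
eps = 0.01 * rng.standard_normal(m)             # small detection error
y = Phi @ f_vec + eps                           # m noisy measurements, (3.14)

print(y.shape)              # (64,)
```

The vector y produced here is exactly the kind of input that a BPDN solver such as (3.15) would take, together with Φ and a sparsifying Ψ.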
3.2.6
3D-compressed sensing
The single-pixel camera gives a good intuition of how one intensity image, which corresponds to a specific time bin, can be reconstructed. Imagine the single-pixel camera taking images of 2D slices out of the 3D scene in Figure 3.2; by combining all 2D slices, an image of the scene can be created. Using this approach for laser intensity images, a range image can be created.
When measuring with the TCSPC system, each slice corresponds to an intensity image, where the intensity values are how many photons hit the detector in that specific TOF interval. This means that if all the intensity images can be reconstructed, a depth map can be created.
Mathematically the measurements can be described as

Y = ΦF,   (3.16)

where Y is the m × bins matrix containing all the measurements and F is the n × bins signal matrix.
Figure 3.3: Illustration of how the intensity images are stacked in 3D space. In this figure, the entire scene is considered to be covered in 8 bins (k = 1, ..., 8). These intensity images from different bins generate the range image. Here, the front sails have their maximum intensity values in bin 4, which means that their corresponding range image pixels will have a range corresponding to bin 4.
The measurement process is done for one DMD-array at a time, for the entire range, filling one row of Y for each DMD-array. The data used for reconstructing
slice j is then in column j in Y . Reconstructing the total signal F is performed
using e.g. (3.15) or TV minimization for each slice fj .
A range image is created by setting the range in each pixel corresponding to its
bin with the maximum intensity value, i.e. finding the maximum intensity in all
the intensity images. An illustration of how the intensity images are stacked in
3D-space is shown in Figure 3.3.
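The range image construction described above can be sketched as follows, with hypothetical bin distances and a synthetic intensity stack:

```python
import numpy as np

# Build a range image from a stack of intensity images: threshold, then
# assign each pixel the bin of its maximum intensity; pixels with all
# bins zeroed get the maximum range of the measured depth interval.
rng = np.random.default_rng(4)
n_bins, N = 8, 4
bin_range = np.linspace(10.0, 10.3, n_bins)     # hypothetical bin distances [m]

stack = 0.05 * rng.random((n_bins, N, N))       # low-level noise in every bin
stack[3, :2, :] = 1.0                           # object A peaks in bin 3
stack[6, 2:, :2] = 1.0                          # object B peaks in bin 6

threshold = 0.2
stack = np.where(stack < threshold, 0.0, stack) # hard thresholding
depth = bin_range[np.argmax(stack, axis=0)]     # bin of maximum intensity

empty = ~np.any(stack > 0, axis=0)              # thresholding zeroed all bins
depth[empty] = bin_range[-1]                    # assign maximum range

print(depth.shape)                              # (4, 4)
```

Pixels covered by the two synthetic objects map to the distances of bins 3 and 6, while the purely noisy pixels fall back to the maximum range, as described in the text.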
Noise is often present in the reconstructed images, normally taking negative values or smaller values than those originating from the laser. This means that the simple operation of thresholding can often remove most or all of the noise. Other image filtering techniques, e.g. median filtering, can be used to remove noise, but like thresholding they come with the risk of removing information.
4
Method
4.1
Feasibility study
A pre-study was initially performed to find related work covering compressed sensing and laser systems. Related articles and publications were found using web search and the article database services provided by LiTH and FOI. Another pre-study, already done by FOI, was also used as inspiration in the early stages [8]. The related works are summarized in Section 1.5, and ideas that were good and possible to implement were taken from them.
4.2
Computer model
To evaluate different compressed sensing and 3D-reconstruction schemes, a computer model of the laser system was needed. The developed model simulates
the TCSPC laser system, described in Chapter 2. The computer model was implemented in Matlab® . It is based on a general laser simulator developed by
FOI. In Figure 4.1 the processing steps, implemented in the computer model, are
described using a block scheme.
[Block scheme with the blocks: Start; Scene, detector & laser configuration; Ray tracing: Laser - Objects - Detector; Create randomized DMD-arrays; Simulate IRF; Simulate sampling: ray-traced photons and DMD-arrays; Noise; Reconstruct intensity slices; Thresholding; Build depth/range image.]
Figure 4.1: Block scheme of the simulation process steps. The blue blocks
are implemented based on the laser simulator developed by FOI. The green
blocks are directly related to CS. The red blocks are related to the characteristics of the laser detector. The gray blocks are neither directly related to the
CS nor to the laser simulations.
A scene is configured with objects of different sizes, shapes and surface properties. Ray tracing of the laser light is performed and returns the amount of laser light that hits the detector. A model of the detector is applied to this light and in the end gives a measurement of photons in different time bins and DMD-patterns. Intensity images of specific bins are then reconstructed using compressed sensing, and finally a depth image is created using the intensity images.
4.2.1
Configuration
The configuration of the laser model and scene includes different types of objects with different surface parameters, laser settings and detector settings. The object settings are parameters controlling diffuse and specular reflectance. The laser source settings are pulse energy, pulse width, focus range and divergence. The only parameter that is varied here is the pulse energy, which results in different received signal strengths. The aperture settings are resolution, detector size, distance, height and orientation. The detector also has settings; the ones used were: the number of bins, time resolution, system efficiency, bandwidth and aperture radius.
4.2.2
Laser calculations
The ray tracing calculates how much of the transmitted laser light hits the detector and when it hits the detector. In Figure 4.2 the concept of ray tracing is illustrated. The laser model calculates reflections depending on the surfaces' diffuse and specular properties. Specularity makes the reflected light dependent on the reflection angle to the observer.
Figure 4.2: Illustration of light being scattered in different directions when hitting a surface (sketch labels: light source, laser ray, surface, reflected light, DMD, detector). The figure illustrates diffuse reflections, which reflect equal amounts of light in all directions, independent of the angle of the incoming light.
The instrumental response function, described in Section 2.2, is combined with
the ray traced data using 2D convolution. This creates the characteristics of the
TCSPC laser detector. Background illumination noise is added here.
Simulation of the DMD is done by sampling the returned light from the ray tracing and IRF with different randomized matrices, see the example in Figure 2.3. The matrices consist of the values 0 and 1, each element having a user-selected probability of being 1; p = 0.1 is used in the simulations in Chapter 5. The elements in the matrix are multiplied with the corresponding pixel values and the returned measure is a scalar value, equal to the total amount of photons hitting the DMD-pixel elements that have value one.
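This sampling step can be sketched as follows; the synthetic Poisson photon counts and the dimensions are assumptions:

```python
import numpy as np

# Simulated DMD sampling: a binary mask (elements 1 with probability
# p = 0.1, as in the simulations) is applied to the per-bin photon images,
# and the returned measure per bin is the total photon count over the
# mirror elements that are "on".
rng = np.random.default_rng(5)
N, n_bins = 32, 10
photons = rng.poisson(3.0, size=(n_bins, N, N)).astype(float)  # photons per bin

p = 0.1
mask = (rng.random((N, N)) < p).astype(float)   # one randomized DMD-array

# Element-wise product with the mask, summed over pixels: one scalar per
# bin, i.e. one row of the measurement matrix Y in (3.16).
row = (photons * mask).sum(axis=(1, 2))
print(row.shape)                                # (10,)
```

Repeating this for m different masks fills the m rows of Y, one DMD-array at a time, exactly as described for the measurement process in Section 3.2.6.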
The detector's shot noise is added after the simulated DMD-sampling, according to (2.8). Finally, dark counts are added, where the total amount of dark counts increases linearly with the number of measurements.
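A minimal sketch of this noise step, assuming a hypothetical dark-count rate (the text refers to (2.8) for the actual detector model):

```python
import numpy as np

# Shot noise: the photon measurements are replaced by Poisson draws, so the
# standard deviation is the square root of the expected value. A dark-count
# term, one Poisson draw per measurement, is then added on top.
rng = np.random.default_rng(6)

clean = np.array([120.0, 40.0, 5.0, 0.5])   # expected photon counts per measurement
shot = rng.poisson(clean).astype(float)     # shot noise

dark_rate = 0.05                            # hypothetical dark counts per measurement
dark = rng.poisson(dark_rate, size=clean.shape)
noisy = shot + dark

print(noisy.shape)                          # (4,)
```

Because the shot-noise standard deviation is √(expected value), the relative error is largest for the weakest measurements, which is why low signal strengths are the hardest cases in Section 5.2.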
4.2.3
Image reconstruction
The CS solvers were downloaded from the web and were configured, tuned and tweaked to work for multiple intensity images. The image reconstructions are done using one of the software packages ProPPa, SPGL1 or TVAL3.
The basis pursuit problem solved by ProPPa is formulated as

argmin ‖x‖_l1 subject to Ax = y, x ∈ R^n,   (4.1)

where x is the sparse representation of the signal, A = ΦΨ, y the measurements and n the number of pixels.
The l1-regularized least squares problem solved by ProPPa is formulated as

argmin ( ‖Ax − y‖_l2 + β‖x‖_l1 ), x ∈ R^n,   (4.2)
where β is some constant and the rest of the variables are as in (4.1). Although
noise is not included in the optimization formulations in (4.1) and (4.2), ProPPa
works for noisy signals too.
SPGL1 solves the BPDN problem described in (3.15) or the LASSO problem, defined as

argmin ‖Ax − y‖_l2 subject to ‖x‖_l1 ≤ τ,   (4.3)

where τ is some constant limiting the norm of x [22].
TVAL3 is a software package that solves the TV minimization problem described in Section 3.2.4 [14].
4.2.4
3D-reconstruction
The range image construction is done by first reconstructing all intensity images, one for each range bin. After that, the intensity images are thresholded to remove noise. The threshold level is chosen to remove intensities that are so low that they most likely are noise.
After the thresholding, the range image is constructed by finding, for each pixel, the bin with the maximum intensity value. Each found maximum corresponds to a certain depth, which becomes the range value of that specific pixel. If no maximum is found, the thresholding has zeroed all bins in that particular pixel. That range pixel is then assigned the maximum range in the measured depth interval.
4.3
Evaluation
Evaluations of the different methods were done by running simulations in Matlab®, applying different parameter settings and scene setups. Every test was developed for its specific type of evaluation, and basic parameters for solvers and settings were generally set to chosen default values. The reason for this was to have consistency in the results, making them comparable across different evaluations.
All simulations were performed with Matlab® 2012b, running on a 64-bit PC with the following hardware: an Intel Xeon W3530 CPU and 12 GB of RAM.
When deciding the best CS solvers, the choices were based on the following criteria:
• The solver should work on the provided computer hardware.
• The solver should handle large problems. The length of the 1D vector representation of an image scales quadratically with the image side length. This means the signal vectors have the dimensions f ∈ R^(N1·N2), where N1 and N2 are the image side lengths. Hence f becomes very large when N1 and N2 grow.
• The solver should be able to solve the optimization problems within reasonable times (minutes instead of hours).
• The solver should work for different types of intensity images (different
scene setups and signal strengths) without individual pre-tuning of parameters.
• The solver should preserve geometry, shapes and distances.
• Equally fast solvers are differentiated by their quality of the signal reconstruction, which is compared with reference data.
Simulated image sizes were 128 × 128 pixels with a total of 400 range bins per simulated scene. Mainly three reference scenes were used in the evaluations, which are described in Figure 5.1. Another scene containing trees was also used, see Figure 5.22a.
The measures when evaluating different scenarios are either based on visual inspection or on the SNR of the reconstruction. The SNR, given in dB, is defined as

SNR = 20 log ( ‖X‖_F / ‖X − X̂‖_F ),   (4.4)

where X is the reference image, X̂ is the reconstructed image and ‖·‖_F the Frobenius norm, defined as

‖A‖_F = ( Σ_{i,j=1}^{n} |a_ij|^2 )^{1/2},   (4.5)

where a_ij is the matrix element (i, j).
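The SNR measure in (4.4), with the Frobenius norm of (4.5), can be computed directly; the test images below are illustrative:

```python
import numpy as np

# Reconstruction SNR in dB as defined in (4.4), using the Frobenius
# norm from (4.5).
def reconstruction_snr(X, X_hat):
    return 20.0 * np.log10(np.linalg.norm(X, "fro") / np.linalg.norm(X - X_hat, "fro"))

X = np.ones((4, 4))                 # reference image
X_noisy = X + 0.1 * np.eye(4)       # small perturbation on the diagonal

print(round(reconstruction_snr(X, X_noisy), 1))  # 26.0
```

Larger SNR values mean a reconstruction closer to the reference; this is the quantity plotted against the undersampling ratio in Chapter 5.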
Undersampling ratio, q, is defined as the number of measurements m divided by
the number of pixels n, q = m/n. This means that the undersampling ratio is a
direct measure of how much faster the measuring of the scene is, compared to a
pixel-by-pixel scanning system.
The evaluation was divided into the following test scheme:
• Evaluation of the signal basis representation, to determine the best basis or bases for reconstructing the intensity images.
• This basis or these bases are then used in the following evaluations, which start with testing the importance of signal strength and noise.
– The probability of the DMD-elements being 1 is evaluated.
– Evaluation of optimization methods is performed, where reconstruction speed and reconstruction quality are tested.
– Reconstruction of occluded objects and trees is tested, with focus on geometry-preserving properties.
• Other approaches than the slice-by-slice intensity image reconstruction are
briefly tested.
All the results are presented in Chapter 5.
5
Simulations and Results
In this chapter, parameters and methods related to the CS reconstruction are evaluated. The objective of the simulations is to give answers to the questions in Section 1.3. An important limitation is that most simulations were very time consuming, which limits the number of data sets and the resolution in plots.
Four different simulated scenes were used as test scenes. The scenes were chosen with unique settings covering the cases: objects at different distances, objects at the same distance, different object types and obscured objects. The reason for this was to get as reliable results as possible, since in reality a scene can contain almost anything. The four scenes are shown in Figure 5.1 and Figure 5.22a.
Figure 5.2 shows the variances of the scenes' intensity images, which gives an intuition of how the pixel values in each intensity image are spread. The variance σ^2 is defined as σ^2 = (1/n) Σ_i Σ_j (x_ij − x̄)^2, where n is the number of pixels, i and j are the pixel coordinates of x and x̄ is its mean. Here, x_ij has values distributed from 0 to approximately 160. The plots show that the boat scene has more bins with object content than the other scenes.
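The per-bin variance curves of Figure 5.2 can be computed from an intensity image stack as follows; the stack here is synthetic:

```python
import numpy as np

# For each range bin, compute the variance of the pixel values in that
# bin's intensity image, sigma^2 = (1/n) * sum_{i,j} (x_ij - mean)^2.
rng = np.random.default_rng(7)
stack = rng.random((5, 16, 16))              # 5 bins of 16 x 16 intensity images
stack[2] *= 10.0                             # one bin with strong object content

variances = stack.reshape(5, -1).var(axis=1) # population variance per bin
print(int(np.argmax(variances)))             # 2: the bin with the object stands out
```

Bins containing object content produce spread-out pixel values and therefore a high variance, which is what the peaks in Figure 5.2 indicate.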
5.1
Sparse signal representation
Representing the sampled signal in a sparse basis is essential in compressed sensing, and the chosen basis representation should be the one that represents the signal as sparsely as possible. As mentioned in Section 3.1, a signal can be transformed to another basis using a matrix multiplication with Ψ, and this discrete representation of the transform is required when using the chosen solvers.
The acquired signals, which should be represented in a sparse basis, are intensity
Figure 5.1: The three different test scenes, where x, y and z are in meters and the detector is placed at (x, y, z) = (0, 0, 0). (a) Simulated scene with two squares at different distances, Reference scene 1. (b) Simulated scene with a sphere in front of a square background, Reference scene 2. (c) Simulated scene with two boats at the same distance, Reference scene 3.
images from different depth bins. These intensity images may contain a lot of information, with many non-zero pixel values, or less information, with many pixel values that are zero. Generally, the intensity images closest to the detector contain more zero-valued pixel elements than those further away. This is because of traces in the measurements due to the tail of the IRF, described in Section 2.2.
Figure 5.2: The variances, for different distances from the detector, of the reference scenes in Figure 5.1. (a) Intensity variance of Reference scene 1 in Figure 5.1a. (b) Intensity variance of Reference scene 2 in Figure 5.1b. (c) Intensity variance of Reference scene 3 in Figure 5.1c.
This means that intensity images can be sparse in the pixel basis in a measured depth interval close to the detector, while not in the other depth intervals. To be able to reconstruct the signal at all depths, a good basis that represents the signal sparsely at all depths needs to be determined.
5.1.1
Intensity image reconstruction results
Four different basis representations were tested by reconstructing intensity images from the three different reference scenes in Figure 5.1. The signals were reconstructed and evaluated using the discrete cosine transform (DCT), Haar wavelets (DHWT), the Hadamard transform, the normal pixel basis representation and TV minimization. TV is not a specific basis transformation, but is evaluated together with the different bases because it is widely used in the CS literature, and the question is whether it suits this case. The following results are the SNR from reconstructing
slices out of the three test scenes with four different bases and TV minimization
for different undersampling ratios.
The plots in Figures 5.3 and 5.4 show that Reference scene 1 was best reconstructed using TV minimization. When considering the basis choice, both the DCT and the DHWT outperformed the pixel basis representation and the Hadamard transform.
Results for Reference scene 2 are shown in Figures 5.5 and 5.6. It is clear that
TV minimization performed best and it is notable that, together with the DCT
case, it performed well even for very low undersampling ratios. The pixel basis
performed much worse than all other basis representations for all undersampling
ratios, which makes it a bad choice for Reference scene 2. This is expected, since
in this case the intensity images are not sparse in pixel basis.
Reconstruction results for Reference scene 3 are shown in Figures 5.7-5.9. The best reconstruction performance was achieved using TV minimization or when choosing the DHWT or DCT as basis representation. TV minimization handled all three investigated slices of the scene well. The scene had an increasing level of non-zero pixel values at larger distances, and as a result the pixel basis representation performed worse compared to the others. This was expected.
Figure 5.3: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 1 for the approximate depth 10.1 m (first peak in Figure 5.2a).
Figure 5.4: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 1 for the approximate depth 10.16 m (second peak in Figure 5.2a).
TV minimization generally performed better or much better than the DCT and DHWT. There is no clear winner when deciding which basis has the best overall performance. The DCT and the DHWT perform well in almost all cases, which separates them from the Hadamard transform and the pixel basis. Therefore, the bases used in the remaining evaluations are the DCT and the DHWT.
Figure 5.5: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 2 for the approximate depth 10.09 m (first peak in Figure 5.2b).
Figure 5.6: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 2 for the approximate depth 10.16 m (second peak in Figure 5.2b).
Figure 5.7: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 3 for the approximate depth 10.13 m (beginning of the peak in Figure 5.2c).
Figure 5.8: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 3 for the approximate depth 10.14 m (middle of the peak in Figure 5.2c).
Figure 5.9: (a) Intensity image and (b) reconstruction SNR [dB] versus undersampling ratio, for the DHWT, DCT, Hadamard, pixel basis and TV, from Reference scene 3 for the approximate depth 10.16 m (end of the top level of the peak in Figure 5.2c).
5.1.2
Sparsity comparison
In this section, the correlation between sparse signal representation and reconstruction results is evaluated. The reconstruction results of two different intensity images are correlated with the number of non-zero (or close to zero) coefficients for different basis representations. The histograms in this section are normalized, meaning that the values have been scaled so that they vary between 0 and 1.
The histograms in Figure 5.10a show that the image in Figure 5.7 represented in the DCT or Hadamard transform has more non-zero coefficients compared to the DHWT or pixel basis. Note that reconstructing using the DCT or Hadamard transform gives bad results for this image, see Figure 5.7b.
The histograms in Figure 5.10b show that the image in Figure 5.5 represented in the different bases has approximately the same amount of non-zero coefficients in each. Note that when reconstructing this image, the results were more similar between the bases than when reconstructing Figure 5.7b.
Figure 5.10: Normalized histograms of (a) the image in Figure 5.7 and (b) the image in Figure 5.5, represented in the pixel basis, DHWT, DCT and Hadamard bases. The vertical axes show the number of occurrences of a certain image value (the horizontal axis). The entire interval of values (0 − 1) is not included.
5.2
Signal strength and noise
To evaluate different noise levels in the measurements, the signal strength was varied in five different simulations. The Poisson noise has a bigger impact on small values, because the standard deviation of the Poisson noise is the square root of the expected value. The amount of dark counts depends on the number of measurements and also has a bigger impact on small signal values. In these simulations there are about 50 dark counts per intensity image.
Five different signal strengths were evaluated, in each case reconstructing an intensity image using the DCT or DHWT as sparse basis or using TV minimization. The different signal levels are found in Table 5.1. The reconstructions were performed for different undersampling ratios, and the results are shown in Figure 5.11.
Figure 5.11: The reconstruction SNR [dB] versus undersampling ratio for the five different signal strengths in Table 5.1, using (a) the DCT as sparse basis, (b) the DHWT as sparse basis and (c) TV minimization.
Table 5.1: Signal cases with their approximate maximum received signal strength. Signal case 2 is the same signal strength as used in Section 5.1.

Signal   Approx. max photons/pixel   Signal SNR [dB]
1        2000                        102
2        160                         75
3        10                          48
4        2                           4
5        0.2                         9
The reconstruction SNR drops significantly for signal cases 3, 4 and 5 when using the DCT or DHWT, whereas TV minimization handles case 3 better. The reconstruction performance in signal cases 4 and 5, which have the lowest signal strengths, is almost constant for all three reconstruction methods. The other signal cases have improved reconstruction results at higher undersampling ratios. In Figure 5.13, reconstructed images are shown. Note that even with a reconstruction SNR close to 0, as in the case of Figures 5.13b and 5.13d, the boats are visible, although not clearly.
The images reconstructed from Signal case 3 are very noisy, and the question is whether noisy images like those can produce accurate depth maps. The most important thing is that the high intensities are at the correct positions in the reconstructed images, whereas other, lower intensities can be seen as traces left by the IRF (which can be thresholded). Signal cases 2 and 3 are therefore tested for reconstruction of the entire scene (generating depth maps), see the reference in Figure 5.12. The results are shown in Figure 5.14.
The results in Figure 5.14 show that the two signal cases generate depth maps with different noise levels. In Figures 5.14b and 5.14d it is clear that depth information is missing, especially for the sails, and there are many small artifacts. The reconstructed depth maps in Figures 5.14a and 5.14c look much better, with only small range deviations on the boats and very few single-pixel artifacts. The DCT shows better results than the DHWT in both signal cases. TV minimization reconstructs the range image from Signal case 2 almost perfectly, with nearly no artifacts at all, and the range image from Signal case 3 is reconstructed much better than with the other two approaches. TV minimization is clearly best at handling low signal SNR.
Figure 5.12: Reference depth map (distance in meters).
Figure 5.13: Reconstructed intensity images, all reconstructed using an undersampling ratio of 0.4: (a) Signal case 2 using the DHWT as sparse basis, (b) Signal case 3 using the DHWT as sparse basis, (c) Signal case 2 using the DCT as sparse basis, (d) Signal case 3 using the DCT as sparse basis, (e) Signal case 2 using TV minimization, (f) Signal case 3 using TV minimization. A reference intensity image is shown in Figure 5.9a. The reconstruction SNRs of the intensity images are found in Figure 5.11 and their signal strengths in Table 5.1.
Figure 5.14: Reconstructed depth maps; the reference depth map is shown in Figure 5.12. The shown depth interval is set to maximize the contrast on the boats. The depth maps are reconstructed using DHWT, DCT or TV minimization with an undersampling ratio of 0.4. No image filtering, apart from hard thresholding, was used when generating the depth maps from the intensity images. Panels (axes in pixels, colorbars in meters): (a) Signal case 2 using DHWT; (b) Signal case 3 using DHWT; (c) Signal case 2 using DCT; (d) Signal case 3 using DCT; (e) Signal case 2 using TV minimization; (f) Signal case 3 using TV minimization.
5.3 DMD-array choice
The choice of how to form the DMD-arrays has limitations. The array elements can either reflect photons towards the detector or away from it, which mathematically means that the elements can only take the values 0 or 1.
When designing the DMD-array patterns, incoherence with the signal basis representation is important, as discussed in Subsection 3.2.1. In [2] the idea of using noiselets as measurement basis is presented, with the statement that this gives very low coherence with the Haar wavelet basis. Unfortunately, noiselets have complex values, and such measurement bases are not implementable in this sensor system.
Uniformly random measurement matrices are incoherent with almost any basis, but an unanswered question is how the randomness itself affects the reconstruction results. In [3], the uniformly random DMD-arrays are chosen with elements being 0 or 1 with equal probability p = 0.5, where p is the probability of each DMD-array element being 1, i.e. the ratio of mirror elements which reflect light to the detector. Changing this probability may change the reconstruction results. This is studied here and the results are shown in Figure 5.15.
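This measurement model can be illustrated as follows: each binary pattern is drawn element-wise with probability p, and each measurement is the inner product of a pattern with the flattened intensity image. This is a hedged Python sketch (the thesis simulations were done in MATLAB); sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dmd_patterns(m, n, p):
    """Draw m binary DMD patterns for an n-pixel scene: each mirror
    element reflects towards the detector (value 1) with probability p."""
    return (rng.random((m, n)) < p).astype(float)

def measure(patterns, x):
    """Each measurement is the summed intensity of the reflected pixels."""
    return patterns @ x

n = 32 * 32                    # number of pixels (illustrative size)
m = int(0.4 * n)               # undersampling ratio 0.4
A = dmd_patterns(m, n, p=0.1)  # p = fraction of mirrors reflecting light
x = rng.random(n)              # stand-in flattened intensity image
y = measure(A, x)              # m scalar detector values
print(y.shape)
```

Varying the argument p in `dmd_patterns` is exactly the experiment reported in Figure 5.15.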
Figure 5.15: Reconstruction performance for different probabilities p for two different intensity images. The undersampling ratio was 0.35. The two blue lines around the TV curve mark the standard deviation of the TV results and the solid line marks the mean of the results (average of 10 evaluations). Panels (p on a logarithmic axis from 10^-5 to 10^0, reconstruction SNR in dB; curves for TV, DHWT and DCT): (a) reconstructions for Reference scene 2, same slice as in Figure 5.6; (b) reconstructions for Reference scene 3, same slice as in Figure 5.9.
From the results in Figure 5.15 it is clear that it is better to choose a p around 0.1, rather than 0.5 as in [3]. When p is close to zero the performance drops significantly, which makes it inappropriate to pick a too small p. Patterns generated with very small p, each with around 1–3 non-zero elements per pattern, were not able to reconstruct the intensity images at all using DCT and DHWT. TV could reconstruct the images with less than 1 non-zero element per DMD-array (on average over all patterns). TV failed when p was larger than 0.55 and performed best around p = 0.1. This could be because the tuning of the TV-solver was done for settings with p = 0.1; the tuning of the solvers using DCT and DHWT was, however, definitely not dependent on a specific p. In all other simulations in Chapter 5, p was set to 0.1.
5.4 Optimization methods
This section presents results regarding properties of different optimization methods and solvers. Three solvers were used: SPGL1 and ProPPa utilize the l1 norm to solve the CS problem, while TVAL3 uses TV minimization.
5.4.1 Reconstruction speed
The methods were tested by varying the number of allowed iterations, measuring the elapsed time and calculating the corresponding SNR.
Figure 5.16: SNR plotted against the number of allowed iterations for DHWT and DCT using different solvers (ProPPa BP, ProPPa l1, SPGL1 BPDN and SPGL1 Lasso). The performance using TV is also included. The evaluation is performed on the same slice as in Figure 5.9a and the undersampling ratio was set to 0.35. Panels: (a) reconstruction performance using DHWT, together with the performance using TV; (b) reconstruction performance using DCT, together with the performance using TV.
Figure 5.17: SNR plotted against the reconstruction time for DHWT and DCT using different solvers (ProPPa BP, ProPPa l1, SPGL1 BPDN and SPGL1 Lasso). The performance using TV is also included. The evaluation is performed on the same slice as in Figure 5.9a and the undersampling ratio was set to 0.35. Panels: (a) reconstruction performance using DHWT, together with the performance using TV; (b) reconstruction performance using DCT, together with the performance using TV.
Figure 5.18: Reconstruction time plotted against the number of allowed iterations (curves for ProPPa BP, ProPPa l1, SPGL1 BPDN, SPGL1 Lasso and TV).
In Figure 5.16a and Figure 5.16b it is clear that both ProPPa solvers reach their maximum reconstruction SNR after very few iterations (≈ 20). The BP solution and the l1 regression converge to the same SNR, and most likely to the same solution. The SPGL1 BPDN solver converges after approximately 60 iterations and reaches the same SNR as the ProPPa solvers. The SPGL1 LASSO solver needs more iterations for convergence in the DHWT case, and when using the DCT it does not reach 20 dB even with 300 iterations.
The TV minimization solver reached 20 dB after approximately 35 iterations. Unlike the other solvers, it does not converge at 20 dB, and reaches a maximum of approximately 36 dB using 300 iterations. Increasing the number of iterations beyond 300 does not give reconstruction SNR above 36 dB (not shown).
All solvers' reconstruction times, shown in Figure 5.18, increase linearly with the number of allowed iterations, where the ProPPa BP solver and the TV solver demand the least time per iteration. TV performs best if allowed 300 iterations, which takes about 40 seconds (Figure 5.17), where it reaches an unmatched SNR level. If an SNR of 20 dB is good enough, the ProPPa solvers are the fastest ones and require approximately 20 iterations, which only takes a couple of seconds per reconstruction.
5.4.2 Reconstruction quality
To determine how much faster the acquisition of a scene can be made, different scenes need to be reconstructed at different undersampling ratios. The undersampling ratio is directly related to how much faster the scene is measured compared to a full scan: an undersampling ratio of 0.2 equals a measuring time of only 20 % of a full scan.
The SNR of the reconstructed depth images is calculated as in (4.4), but with the value 10.05 m subtracted from the images, since that is the start of the range interval.
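Subtracting the range-interval start keeps the large common offset (about 10 m) from dominating the signal energy in the SNR. A sketch of this calculation, assuming (4.4) is the usual ratio of signal energy to error energy in decibels (the exact form of (4.4) is not repeated here, so this is an illustrative assumption):

```python
import numpy as np

def depth_snr_db(reference, reconstruction, range_start=10.05):
    """SNR of a reconstructed depth image in dB, with the start of the
    range interval subtracted so only the depth variation counts as signal."""
    ref = reference - range_start          # remove the common ~10 m offset
    err = reconstruction - reference       # reconstruction error
    return 10.0 * np.log10(np.sum(ref**2) / np.sum(err**2))

# Toy example: millimeter-level errors on depths around 10.15 m.
ref = np.array([10.15, 10.16, 10.14])
rec = ref + np.array([0.001, -0.001, 0.0005])
print(depth_snr_db(ref, rec))
```

Without the subtraction, the 10.05 m offset would inflate the signal energy by several orders of magnitude and make all reconstructions look deceptively good.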
Figure 5.19: Range image reconstruction performance for different levels of undersampling ratio. The three reference scenes are reconstructed using TV, DHWT or DCT. Panels (undersampling ratio 0.05–0.4 on the x-axis, depth range image SNR in dB on the y-axis): (a) range image reconstruction performance of Reference scene 1; (b) of Reference scene 2; (c) of Reference scene 3.
The results in Figure 5.19 show that the best reconstruction is achieved using either DCT or TV minimization. DHWT is worse than both the other methods for all undersampling ratios. According to the SNR, the difference between DCT and TV minimization is small. For low undersampling ratios, where reconstruction artifacts are present, the characteristics of the noise are important: the noise should preferably not prevent operations such as object characterization. In Figure 5.20 range images are shown for low undersampling ratios, which gives an intuition of the noise characteristics.
Figure 5.20: Depth range images of Reference scene 3 for different levels of undersampling ratio. Panels (axes in pixels, colorbars in meters): (a) reconstructed range image using DCT with an undersampling ratio of 0.05; (b) using TV minimization with an undersampling ratio of 0.05; (c) using DCT with an undersampling ratio of 0.15; (d) using TV minimization with an undersampling ratio of 0.15.
In Figure 5.20a and Figure 5.20b the type of noise differs a lot. With the DCT approach there is more noise around the boats, compared to the TV case, and the boats' shapes are not as clear. TV minimization results in a discontinuous look of the ranges on the boats, and the artifacts are more clustered. An undersampling ratio of 0.15 gives range images with a small amount of noise, which is shown in Figure 5.20c and Figure 5.20d; DCT gives somewhat higher noise levels on and around the boats than TV minimization.
5.4.3 Occlusion
An important property in target recognition is how well the reconstruction handles occlusion, e.g. when scanning for vehicles hidden behind vegetation. TV minimization and l1 minimization using DCT and DHWT were tested for reconstruction of a scene occluded by a checkered pattern. The checkered pattern increases the TV of the intensity images by a mean of 273 %. The range image reconstruction results are shown in Figure 5.21c and two reconstructed example images are shown in Figure 5.21a and Figure 5.21b.
Figure 5.21: Results showing reconstruction performance using TV minimization and l1 minimization with DCT and DHWT. The SNR is calculated against a reference range image with the same checkered pattern. Panels: (a) reconstructed range image using TV minimization with an undersampling ratio of 0.25; (b) reconstructed range image using l1 minimization with DCT and an undersampling ratio of 0.25; (c) range image reconstruction performance (undersampling ratio 0.05–0.4, depth range image SNR in dB; curves for DHWT, DCT and TV).
The SNR curves for TV and DCT develop differently with increasing undersampling ratio. TV minimization performs worse than DCT according to the SNR. DHWT performed worst of all, but did not fill the holes. The difference between the reconstruction results in Figure 5.21a and Figure 5.21b is that TV minimization fills some of the holes created by the checkered pattern. Increasing the undersampling ratio causes fewer holes to be filled by the TV minimization.
5.4.4 Reconstructing trees
An attempt to simulate a more natural scene than the three reference scenes was made using three tree models. The scene is illustrated in Figure 5.22. The trees were placed at different distances and reconstructed using TV minimization or l1 minimization with DCT and DHWT. The reconstruction results are shown in Figure 5.23.
Figure 5.22: The simulated scene with the three trees and the corresponding reference range image. Panels: (a) the simulated scene with the three trees, where x, y and z are in meters and the detector is placed at (x, y, z) = (0, 0, 0); (b) reference range image (axes in pixels, colorbar in meters, approximately 10.12–10.165 m).
TV minimization reconstructs the tree scene best. All three methods have problems reconstructing the small details of the trees, such as leaves and branches, even for higher undersampling ratios. DCT performed better than DHWT, and both methods again show the same reconstruction noise characteristics as before. The reconstructed range images for higher undersampling ratios are not included in this report.
Figure 5.23: Reconstructed range images of the scene in Figure 5.22 and the reconstruction results for different undersampling ratios. Panels: (a) reconstructed range image using DCT with an undersampling ratio of 0.15; (b) reconstructed range image using TV minimization with an undersampling ratio of 0.15; (c) depth range image reconstruction performance (undersampling ratio 0.05–0.4, SNR in dB; curves for DHWT, DCT and TV).
5.5 Other 3D-reconstruction methods
In all results in earlier sections, the 3D-range data was reconstructed using intensity images corresponding to certain bins. The fact that the data is all part of a common 3D-volume is not utilized in the CS reconstruction. Here, different approaches are presented and studied.
By considering the entire 3D-volume as one signal, which could be sparser than the individual intensity images, the reconstruction results could be improved. This approach has an obvious problem: the dimension of the 3D-volume signal is very large. Tests were performed by folding out four images and reconstructing them together as one signal. Matlab® could not handle the dimension increase, so no reconstruction results were obtained.
Another approach utilizes the fact that intensity images for neighboring bins are similar, so the differences between intensity image slices can be reconstructed iteratively. This is modeled according to
y_k − y_{k−1} = A(x_k − x_{k−1}) = Ax_{Δk},   (5.1)
where x_k is the intensity image corresponding to bin k. Each intensity image is given by x_k = x_{Δk} + x_{k−1}. These differences, x_{Δk}, are possibly sparser than the intensity images themselves. The problem with this method is that it requires interlacing images (normally reconstructed ones). The noise from each reconstruction is not removed in each iterative step, which means that noise propagates. Figure 5.24 shows reconstructions of the tree scene using no interlacing images, interlacing images every fourth x_k, and normal reconstruction. The result in Figure 5.24a shows that more noise is present using this method compared to the normal method shown in Figure 5.24c. Using an interlacing image every fourth reconstruction, see Figure 5.24b, gives results similar to the normal method. In this test there were 400 bins, which means that about 300 differences and 100 normal intensity images were reconstructed.
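The bookkeeping of this scheme can be sketched as follows: every K-th slice is reconstructed normally (an interlacing image), and the slices in between are obtained by reconstructing the differences in (5.1) and accumulating them. The `reconstruct` callable stands in for any of the CS solvers; this is an illustrative Python sketch assuming a generic solver interface, not the thesis code.

```python
import numpy as np

def reconstruct_volume(Y, A, reconstruct, interlace_every=4):
    """Y: (n_bins, m) measurement vectors, one per time bin, all taken with
    the same sensing matrix A. `reconstruct(y, A)` returns an estimate x
    with A @ x ~= y. Returns the stack of estimated intensity images."""
    X = []
    for k in range(Y.shape[0]):
        if k % interlace_every == 0:
            X.append(reconstruct(Y[k], A))        # interlacing image
        else:
            dx = reconstruct(Y[k] - Y[k - 1], A)  # difference, as in (5.1)
            X.append(X[-1] + dx)                  # x_k = x_{k-1} + x_dk
    return np.stack(X)

# Tiny check with a trivial "solver" (least squares) on a consistent system.
rng = np.random.default_rng(1)
A = rng.random((40, 20))
X_true = rng.random((8, 20))
Y = X_true @ A.T
lsq = lambda y, A: np.linalg.lstsq(A, y, rcond=None)[0]
X_hat = reconstruct_volume(Y, A, lsq, interlace_every=4)
print(np.allclose(X_hat, X_true))
```

With a real, noisy CS solver the accumulation in `X[-1] + dx` is exactly where the error propagation described above occurs; the interlacing images reset it every `interlace_every` bins.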
Figure 5.24: Reconstructed range images of the scene in Figure 5.22 using the method in (5.1). Panels: (a) reconstructed range image using no interlacing images, l1 minimization with DCT at an undersampling ratio of 0.3; (b) using interlacing images every fourth reconstruction, l1 minimization with DCT at an undersampling ratio of 0.3; (c) using the normal method, l1 minimization with DCT at an undersampling ratio of 0.3.
6 Discussion
In this chapter the results are discussed and related to the theory. Then the analysis method is discussed, especially the reliability and replicability of the chosen methods.
6.1 Results
6.1.1 Sparse signal representation
The results show that the DCT gives the best reconstruction results of the different basis representations when using l1 norm minimization. The DHWT did not perform as well as the DCT, but other wavelets may perform better. Sparsity in a basis depends on the content of the image. The simulated images may not look like real images at all, which makes the simulation results somewhat uncertain.
The sparsity and the reconstruction result are closely related, as Figure 5.10a and Figure 5.10b show. The image in Figure 5.7 was better reconstructed using the two basis representations (pixel basis and DHWT) which clearly represent the image more sparsely. These results are two examples which happen to correlate nicely with the theory of CS and sparseness. Generating the same sort of histogram plots from the other images gave similar correspondences between sparseness and reconstruction quality, but they are not included in the report.
TV minimization outperforms all the basis representations, which means that the total variation is the best sparsity promoter for the simulated intensity images. A weakness regarding TV is shown in Section 5.4.3, where TV minimization proves to have decreased performance for increased variation in the intensity images.
6.1.2 Signal strength and noise
The best results, this time regarding noise, are again obtained using TV minimization, which handles low signal SNR better than DCT and DHWT. TV minimization seems to give a more stepwise, blurred type of noise, while the two others produce spiky noise in areas where there should not be any intensities. That TV minimization does not produce the spiky noise is a great advantage when reconstructing the range images, since high-valued spikes are not removed by thresholding.
6.1.3 DMD-array choice
The results for designing the DMD suggest picking the probability p of a DMD-element being 1 at about 0.05–0.1. This is quite interesting: the CS theory only states that a random sensing matrix will be incoherent with any basis representation and therefore work, not how the randomness itself affects the result.
When the reconstruction does not work for very small p, it is probably because the DMD-arrays need to sample information (DMD-element = 1) from every pixel at least once. If p is low enough, e.g. 1/n, where n is the number of pixels, the m < n DMD-arrays will not reflect photons from all pixels. This means that there would be pixels from which no information was sampled, resulting in bad reconstructions.
For large p the reconstruction also fails. A large p means that when sampling information with a DMD-pattern, the certainty about where the information comes from is smaller than for a smaller p. A larger p could also mean larger coherence between the measurement matrix and the basis matrix, since the different DMD-patterns become more alike.
If p = 0.5, we know that the produced scalar value comes from a combination of 50 % of the pixels. Some of these pixels do not add intensity to the scalar value while some do, and it is impossible to know which do and which do not (with only one measurement). If the scalar value turns out to be e.g. 0, we know that all pixels corresponding to the DMD-elements which are one (reflecting to the sensor) should equal 0. By lowering p to e.g. 2/n, where n is the number of pixels, the scalar value will be produced by an average of two mirror elements per DMD-pattern. The scalar value is now more likely to be 0 than for larger p, and if so, two pixel elements can be connected to the value 0. Likewise, if the scalar value is e.g. 1 there are only two possibilities: either one of the pixel elements is 1 and the other 0, or vice versa. Imagine instead getting the scalar value 1 with p = 4/n. This means that on average there are four active mirror elements and therefore four combinations of pixels giving the scalar value 1 for that specific pattern. More measurements (more patterns) are thus needed to gain certainty about the information for larger p.
The best choice of p proved to be approximately 0.05–0.1, which can be seen as a kind of equilibrium where there is a high probability of always getting information from all pixels while not losing track of where the information comes from.
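The coverage argument can be made quantitative: with m independent Bernoulli(p) patterns, the probability that every one of n pixels is reflected at least once is (1 − (1 − p)^m)^n. A quick numeric check of that expression (the sizes are illustrative, chosen to match the 0.35 undersampling ratio used in Figure 5.15):

```python
def coverage_probability(n, m, p):
    """Probability that each of n pixels is selected (element = 1) by at
    least one of m independent Bernoulli(p) DMD patterns."""
    return (1.0 - (1.0 - p) ** m) ** n

n = 128 * 128       # number of pixels (illustrative)
m = int(0.35 * n)   # undersampling ratio 0.35
for p in (1.0 / n, 0.01, 0.1):
    print(p, coverage_probability(n, m, p))
```

For p = 1/n the coverage probability is essentially zero, while already at p = 0.01 it is indistinguishable from one, consistent with the observed failure of very small p.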
6.1.4 Optimization methods
The speed of the reconstruction is somewhat important, but the measuring time is considered much more important and costly. All calculations were done in Matlab®, which may be slow compared to reconstruction schemes implemented in e.g. C++ or on GPU [1]. All three solvers used in this thesis, together with a few others, fulfilled the criterion of reasonable computation times.
The solver ProPPa was used when evaluating different scenarios because the creators of ProPPa claim that their package is faster and better than other well-known, fast optimization packages such as SPGL1, YALL, NESTA and FISTA [13]. Their claims, together with some confirming tests, were the reasons why this package was chosen. SPGL1 and ProPPa were tested against each other to confirm these statements.
ProPPa proved to be very fast, but TV minimization using the TVAL3 software package gave better reconstruction results and better-looking range images overall. TVAL3's only drawback is that it required more time to converge to its optimal solutions. If the speed of the reconstruction is neglected, TV minimization handles lower undersampling ratios and lower signal SNR, and creates less noise in the reconstructed signals. In Figure 5.20, TV minimization shows better performance and produces images more suitable for automatic detection and classification. Figure 5.20a has noise that interferes with the shapes of the boats; with no prior knowledge about the objects, this noise would make it almost impossible for a human observer or an image processing algorithm to classify the boats. TV minimization, however, performs better, and those results show that accurate range images can be obtained using undersampling ratios somewhere between 0.05 and 0.15.
The question is whether the scenes used to evaluate these optimization methods suit TV minimization better than real-world intensity images do. The simulated scenes have quite few edges, which means low total variation, and perhaps a scene with more edginess would suit l1 minimization using DCT better. This is contradicted by the results in [15], where TV minimization performs very well on natural scenes with e.g. trees, which have many edges. The results in Section 5.4.4 also show that TV minimization handles trees better than the other methods. There is a loss of small details on the reconstructed trees, but that is common for all three tested methods.
The results in Section 5.4.3 show that TV minimization does not perform as well as l1 minimization with DCT when the TV of the intensity images increases. In this case the checkered pattern increases the TV by 273 % compared to no checkered pattern. A very interesting result is that TV minimization fills holes created by the checkered pattern. This is most probably a result of the TV minimization algorithm forcing the solution to have low TV. The question is whether this is a preferred behavior or not: if the scene has natural holes that are considered important information, this is a very bad behavior.
6.1.5 Other 3D-reconstruction methods
Reconstructing each intensity image slice separately and then building the range image is not the only method. Reconstructing differences between neighboring intensity image slices shows promising results. Results similar to the normal method are achieved using an interlacing image every fourth reconstruction. Without interlacing images, noise propagates through the reconstructions. If some intensity images need to be reconstructed fully, the benefit of the method is questionable. The differences between images could be sparser and hence give better results, but they become dependent on the reconstruction results of the interlacing images. One benefit could be to let the interlacing images be reconstructed with more iterations than the differences, thus saving reconstruction time. This works only if the reconstructions of the differences require fewer iterations than the normal intensity images, for equal quality.
Too large signal dimensions limit the approach of considering the entire 3D-scene as one signal. Increasing the signal dimension makes the transform matrices too big. As an example, reconstructing a 256 × 256 image requires 256² × 256² transform matrices, using the problem setup from Chapter 3. Other approaches using function handles in the CS solvers could work. Dedicated hardware could also be a solution for handling larger problems.
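The size problem is easy to quantify: a dense transform matrix for an N1 × N2 image has (N1 · N2)² entries. A back-of-the-envelope check (assuming 8-byte double-precision entries) shows why matrix-free function handles are attractive:

```python
def explicit_matrix_gib(n1, n2, bytes_per_entry=8):
    """Memory for a dense (n1*n2) x (n1*n2) transform matrix, in GiB."""
    n = n1 * n2
    return n * n * bytes_per_entry / 2**30

print(explicit_matrix_gib(128, 128))  # e.g. a 128 x 128 image
print(explicit_matrix_gib(256, 256))  # the 256 x 256 example above
```

A 256 × 256 image already needs a 32 GiB explicit matrix, whereas a function handle only evaluates the transform of a vector and stores nothing of that size.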
6.2 Analysis method
Gathering all the results required at least two different solvers: TV minimization required one software package, whereas l1 minimization required another. This means that there are uncertainties in the evaluations, since the different results could be a consequence not of the methods themselves, but of tuning parameters or different skill levels of the programmers behind the solvers. Possibly this means that better results could have been achieved using l1 minimization with another software package than ProPPa or SPGL1.
Using SNR as a measure gives a mathematical estimate of the reconstruction quality. The image quality is often closely related to the SNR; lower SNR means worse image quality. Two reconstructed images of the same scene can, however, have the same SNR while containing different kinds of noise. An example of this is seen in Figure 5.13b and Figure 5.13d: the SNR is the same for both images, but the images have different types of noise, and the result in Figure 5.13d suits range image construction better.
The evaluations could have been done for a wider variety of settings, which would have given more general results. What limited this was that many simulations were very time consuming and required a lot of development time. With more computational power, simulations would have been faster and perhaps larger images could have been tested. Higher-resolution images were not evaluated since the signal dimension is N1 · N2, where N1 is the width and N2 the height of the image; a small increase in image size greatly increases the problem complexity and computational cost.
Originally, the simulated model was supposed to be compared to a real TCSPC lab system setup at FOI. This would most probably have meant some changes in the simulation settings and would of course have increased the reliability of the simulated laser data. A real functional system would also have meant that real objects and natural scenes easily could have been tested. The simulated laser model also lacks some aspects in its theoretical implementation; e.g. atmospheric turbulence is not taken into consideration. There is also a question of whether the background illumination affects the results or not, since it just adds a constant intensity value to all pixels.
The replicability of the results depends on how well similar laser data can be generated. The simulated laser data are based on software provided by FOI, which is not open to the public. The CS implementations used in the simulations are all based on open source packages and should therefore be replicable, provided proper laser data.
Related work and other references are all articles or publications with detailed explanations and high reliability. Most articles and publications have many reliable references to others. The methods are well described theoretically, but most articles lack repeatability in implementation because important information is not described; for example, p for the DMD-arrays is almost never specified. It is not always clear that a specific method will work, or work as well, for other cases or scenes than the specific cases in the articles.
7 Conclusions and Future Work
7.1 Conclusions
A laser model of a TCSPC laser system was developed, which enabled easy simulations. The computer model has an advantage over a real lab system in that it can be configured in many ways, which simplifies testing and development of image reconstruction algorithms. The model also gives a good intuition of how to configure the real system for optimal results.
Results show that CS enables laser intensity image acquisition using fewer measurements than the number of pixels. Accurate range images can be obtained using undersampling ratios somewhere between 0.05 and 0.15, as shown in Figure 5.20. Neglecting the reconstruction time, the undersampling ratio directly translates to how much faster CS can make the image acquisition. This means that using CS, 85–95 % fewer measurements are required compared to a pixel-by-pixel scanning TCSPC laser system.
Some critical factors have been identified. Too low signal levels result in too low signal SNR, which proves to be a critical factor affecting the 3D-reconstruction. Spiky noise is present in the reconstructed intensity images when using l1 minimization, which also affects the reconstructed range images. For TV minimization a critical factor is a checkered pattern placed in front of the scene, which essentially increases the total variation of the intensity images.
TV minimization gives the best reconstruction results in most cases, but requires more reconstruction time to reach its optimal solutions. Computational time is not considered a significant cost, which makes TV minimization the best reconstruction method. Using l1 minimization, DCT was the best basis.
Designing the DMD-arrays is not straightforward, and assuming that the best p is 0.5 is not correct. Results show that an optimal choice of p is probably in the interval 0.05–0.1. Further evaluations are needed.
There are other methods for reconstructing the 3D-data. Increasing the signal dimensions causes computational problems. Reconstructing differences of intensity images works and could speed up reconstruction times.
7.2 Future work
Based on discoveries during this thesis work, the following problem areas have been identified for future work:
• A real lab system should be developed, which would make the simulated
model more realistic by enabling model validation.
• The system's capability of handling noise needs more testing.
This is related to having a working lab system, where the real noise levels
can easily be set in the simulated model. Simulations with object aliasing
should also be performed.
• The current CS solvers should be implemented in e.g. C++ or on a GPU,
which would give faster reconstruction performance.
• More studies and theoretical analysis are needed to determine whether the
best choice of p generally is 0.05–0.1 or whether it is scene dependent. More
scenarios need testing, including other scene setups for different undersampling
ratios.
• Find and test other basis representations, for example other wavelets. A
better basis representation could mean that l1 minimization outperforms
the TV minimization.
• More studies of how the entire 3D scene can be reconstructed in a smarter
way are needed. Instead of reconstructing 2D slices separately, the whole
3D volume should be considered.
• Deconvolution could increase sparsity in the measured data. The tail of
the IRF could be removed from the measurements y by deconvolution.
This should be investigated further.
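The deconvolution idea in the last point can be sketched as follows. The exponential-tail IRF, the time-bin sizes, and the noise level are illustrative assumptions, not measured system parameters:

```python
import numpy as np

def wiener_deconvolve(y, irf, noise_power=1e-3):
    """Deconvolve y by irf in the Fourier domain with Wiener regularization."""
    H = np.fft.fft(irf, n=len(y))
    Y = np.fft.fft(y)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + noise_power)
    return np.real(np.fft.ifft(X))

n = 256
t = np.arange(n)
irf = np.exp(-t / 8.0)            # IRF with an exponential tail (illustrative)
irf /= irf.sum()
x_true = np.zeros(n)
x_true[50] = 1.0                  # a single surface return at time bin 50
y = np.convolve(x_true, irf)[:n]  # measured, tail-smeared histogram
x_hat = wiener_deconvolve(y, irf)
print("recovered peak at bin", int(np.argmax(x_hat)))
```

The regularization term keeps the division stable where the IRF spectrum is weak; with real photon-counting noise its value would have to be tuned to the actual noise power.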
Bibliography
[1] J.D. Blanchard and J. Tanner. GPU accelerated greedy algorithms for compressed sensing. Mathematical Programming Computation, 5(3):267–304, 2013. ISSN 1867-2949. Cited on page 51.
[2] E.J. Candes and M.B. Wakin. An introduction to compressive sampling. Signal Processing Magazine, IEEE, 25(2):21–30, March 2008. ISSN 1053-5888.
doi: 10.1109/MSP.2007.914731. Cited on pages 11, 12, 13, 14, 17, and 38.
[3] A. Colaço, A. Kirmani, G.A. Howland, J.C. Howell, and V.K. Goyal. Compressive depth map acquisition using a single photon-counting detector: Parametric signal processing meets sparsity. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 96–102, June 2012. doi: 10.1109/CVPR.2012.6247663. Cited on pages 3 and 38.
[4] M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok. Introduction to compressed sensing. In Compressed Sensing: Theory and Applications, pages 1–55. Cambridge University Press, 2011. Cited on page 14.
[5] M. F. Duarte, M. A. Davenport, D. Takbar, J. N. Laska, T. Sun, K. F. Kelly, and
R. G. Baraniuk. Single-pixel imaging via compressive sampling: Building
simpler, smaller, and less-expensive digital cameras. IEEE Signal Processing
Magazine, 25(2):83–91, 2008. URL www.scopus.com. Cited on page 3.
[6] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive
Sensing. Birkhäuser, 2013. ISBN 9780817649470. Cited on page 13.
[7] J. W. Goodman. Statistical Optics, volume 1. Wiley, 2000. Cited on page 8.
[8] Markus Henriksson, Lars Sjöqvist, and Lars Allard. Compressive sensing 3d laser radar: literature study and model experiments. FOI-D–0561–SE, 2013. Cited on page 19.
[9] G. A. Howland, D. J. Lum, M. R. Ware, and J. C. Howell. Photon counting
compressive depth mapping. Opt. Express, 21(20):23822–23837, Oct 2013.
doi: 10.1364/OE.21.023822. URL http://www.opticsexpress.org/
abstract.cfm?URI=oe-21-20-23822. Cited on pages 4 and 12.
[10] G.A. Howland, P.B. Dixon, and J.C. Howell. Photon-counting compressive sensing laser radar for 3d imaging. Applied Optics, 50(31):5917–5920, 2011. ISSN 0003-6935. Cited on page 3.
[11] A. Kirmani, A. Colaço, F.N.C. Wong, and V.K. Goyal. Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor. Opt. Express, 19(22):21485–21507, Oct 2011. doi: 10.1364/OE.19.021485. URL http://www.opticsexpress.org/abstract.cfm?URI=oe-19-22-21485. Cited on page 3.
[12] A. Kirmani, A. Colaço, F.N.C. Wong, and V.K. Goyal. CoDAC: A compressive depth acquisition camera framework. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pages 5425–5428, Research Laboratory of Electronics, Massachusetts Institute of Technology, 2012. Cited on page 3.
[13] Ranch Y. Q. Lai and Pong C. Yuen. Proppa: A fast algorithm for l1 minimization and low-rank matrix completion. CoRR, abs/1205.0088, 2012. Cited
on page 51.
[14] C. Li. Compressive sensing for 3d data processing tasks: Applications, models and algorithms. 2011. URL http://www.caam.rice.edu/~optimization/L1/TVAL3/. Cited on pages 15 and 22.
[15] L. Li, L. Wu, X. Wang, and E. Dang. Gated viewing laser imaging with
compressive sensing. Appl. Opt., 51(14):2706–2712, May 2012. doi: 10.
1364/AO.51.002706. URL http://ao.osa.org/abstract.cfm?URI=
ao-51-14-2706. Cited on pages 3 and 51.
[16] F. Naizhang, S. Mingjian, and M. Liyong. A compressive sensing photoacoustic imaging system based on digital micromirror device. International Proceedings of Computer Science and Information Technology, 48:23, 2012. ISSN 2010-460X. Cited on page 4.
[17] T. Neimert-Andersson. 3d imaging using time-correlated single photon
counting. Master’s thesis, University of Uppsala, 2010. Cited on pages 5
and 6.
[18] R. D. Richmond and S. C. Cain. Direct-detection LADAR Systems. SPIE,
2010. ISBN 978-0-8194-8072-9. Cited on page 8.
[19] L. Sjöqvist, M. Henriksson, P. Jonsson, and O. Steinvall. Time-of-flight range profiling using time-correlated single-photon counting. In Proceedings of SPIE - The International Society for Optical Engineering, volume 6738, Dept. of Optronic Systems, Swedish Defence Research Agency, FOI, 2007. URL https://lt.ltag.bibl.liu.se/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edselc&AN=edselc.2-52.0-42149159479&site=eds-live. Cited on pages 5, 6, and 7.
[20] O. Steinvall, L. Sjöqvist, and M. Henriksson. Photon counting ladar work at FOI, Sweden. Proc. SPIE, 8375:83750C–83750C–14, 2012. doi: 10.1117/12.920294. Cited on page 6.
[21] E. Van Den Berg and M. Friedlander. Sparse optimization with least-squares
constraints. SIAM Journal on Optimization, 21(4):1201–1229, 2011. doi: 10.
1137/100785028. URL http://epubs.siam.org/doi/abs/10.1137/
100785028. Not cited.
[22] E. Van Den Berg and M. P. Friedlander. SPGL1: A solver for large-scale
sparse reconstruction, June 2007. http://www.cs.ubc.ca/labs/scl/spgl1.
Cited on page 22.
[23] P. Ye, J.L. Paredes, Y. Wu, C. Chen, G.R. Arce, and D.W. Prather. Compressive
confocal microscopy: 3d reconstruction algorithms. In Proceedings of the
SPIE - The International Society for Optical Engineering, volume 7210, page
(12 pp.), University of Delaware, Department of Electrical and Computer
Engineering, Newark, DE, 19716, USA, 2009. Cited on page 4.
Copyright
The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for his/her own use and
to use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity, please
refer to its www home page: http://www.ep.liu.se/
© Erik Fall