Dissertation
submitted to the
Combined Faculties for the
Natural Sciences and for Mathematics
of the Ruperto-Carola University of Heidelberg, Germany
for the degree of
Doctor of Natural Sciences
put forward by
Diplom-Physiker Martin Otmar Schmidt
born in Nürnberg
Oral examination: 5. November 2008
Spatiotemporal Analysis of
Range Imagery
Referees:
Prof. Dr. Bernd Jähne
Prof. Dr. Ulrich Platt
Zusammenfassung
This thesis addresses the question of how the corresponding three-dimensional motion field can be determined from a sequence of range images. We investigate the signal of range cameras that are based on the time-of-flight principle and employ a novel optoelectronic component, the photonic mixer device (PMD). In addition to the range, the PMD also provides information about the mean radiant flux and its modulation amplitude. We discuss how this extended information content can be exploited.
The reconstruction of a motion field from an image sequence is an ill-posed inverse problem and cannot be solved in general. Moreover, the spatiotemporal signal of a PMD camera contains various, partly very specific, systematic and statistical errors with an explicitly spatial and temporal dependence (e.g. motion artifacts).
We analyze the different errors and develop a method for correcting systematic errors of the range signal. With a novel two-state channel smoothing we improve range maps corrupted by noise and outliers. We extend the structure tensor approach in order to exploit, for the first time, the extended information content of PMD cameras to improve the motion estimate and to provide statements about its quality. The developed algorithms were designed such that their computational complexity does not preclude their use in embedded systems. The algorithms are verified on synthetic and real single images as well as image sequences.
Abstract
The present thesis addresses the question of how to determine the three-dimensional motion field from a corresponding sequence of range images. We investigate the signal of range cameras that are based on the time-of-flight principle and employ the novel optoelectronic photonic mixer device (PMD). Its signal comprises information about the range, the mean radiant flux and its modulation amplitude. We discuss how to take advantage of this wealth of information.
The estimation of a motion field from image sequences is an ill-posed inverse problem which cannot be solved in general. Moreover, the spatiotemporal signal of a PMD camera is corrupted by several kinds of errors, some of them rather specific, of systematic and statistical nature, depending explicitly on time and space (e.g. motion artifacts).
We analyze those errors and develop a method to correct for systematic errors in the range signal. By means of a novel two-state channel smoothing we improve range images corrupted by noise and outliers. We use and extend the structure tensor approach to obtain, for the first time, an improved motion estimate that exploits the full PMD signal and provides an inherent confidence measure. The presented algorithms were developed under the premise that their computational complexity must not preclude an application within an embedded system. They are tested on synthetic and real images and image sequences.
Acknowledgments
I gratefully acknowledge the support of many people who contributed in various ways
to the completion of this thesis.
First of all I would like to thank Prof. Dr. Bernd Jähne for giving me the opportunity
to work on various interesting topics of computer vision and for supervising my thesis.
I am grateful for his kind support in both scientific and organizational issues. I thank
Prof. Dr. Ulrich Platt for agreeing to act as the second referee.
Thanks go to the staff of the IWR and HCI, who do an excellent job in keeping things running, especially to Barbara Werner and Karin Kubessa-Nasri for making bureaucracy less painful, and to Dr. Hermann Lauer, Markus Riedinger and Dr. Ole Hansen for letting the data streams flow right where they should.
I am grateful to Pavel Pavlov, the most suave person I know, for giving work at the office a congenial feel and for being an inexhaustible source of mathematical knowledge.
I would like to thank Dr. Michael Klar for giving me an introduction to camera
calibration and support in various related algorithmic issues. A big thank-you goes
to PD Dr. Ullrich Köthe, who always took the time to answer my questions on various
image processing topics. Thanks to PD Dr. Christoph Garbe for giving suggestions and tips on various topics.
For proof-reading and comments on the thesis I am deeply grateful to Dr. Achim
Falkenroth, Roland Rocholz, Claudia Kondermann, Zhuang Lin, Andreas Schmidt
and Marion Zuber.
I enjoyed working at the lab, which I blame mostly on the Windis, and in particular Dr. Kai Degreif for introducing me to the small wind-wave flume, Dr. Achim Falkenroth and Roland Rocholz for the various discussions on water-wave measurements, Dr. Uwe Schimpf, Alexandra Herzog for always offering some tea, Kerstin Richter, Florian Huhn, Rene Winter, Steffen Haschler, and last but not least Dr. Günther Balschbach for giving excellent administrative support - thank you all.
With respect to the research done for the PMD-cameras I would like to thank Holger Rapp for all his work on the experimental setup, Mario Frank for giving me access to his range measurements, Matthias Plaue for discussions about the PMD's working principle, Dr. Markus Jehle for experimental and theoretical support, Michael Erz for the demodulation measurements and Dr. Hagen Spies for advice on range flow algorithms.
Many thanks go to Dr. Björn Menze and Dr. Michael Kelm (telling me what the prior
does in the monastery and why he ROCks under the trees of a random forest), Dr. Linus Görlitz, Daniel Kondermann (helping Charon over the Styx), Dr. Ralf Küsters,
Christoph Sommer, Dr. Nikolaos Gianniotis, Dr. Marc Kirchner, Prof. Dr. Fred
A. Hamprecht, Björn Andres, Bernhard Renard, Michael Hanselmann, Frederik
Kaster, Sebastian Boppel, Bjoern Voss, Jörg Greis, Stephan Kassemeyer, Lars Feistner and Natalie Müller.
I would like to thank all the people of the IWR, HCI and IUP that gave me a cheerful
time in Heidelberg, particularly the members of DIP, MIP and IPA group.
Along my time at the IWR I worked together with numerous external collaborators
on various projects whom I would like to thank as well.
Thanks to all members of the LOCOMOTOR team for interesting discussions and
friendly collaboration; especially Dr. Ingo Stuke for giving me support with his multiple motion algorithms, Dr. Hanno Scharr for illuminating talks and Dr. Kai Krajsek
for a crash course in Kriging.
I enjoyed working together with the members of the Bosch corporate research team
in Hildesheim (CR/AEM5), in particular Henning Voelz; very likely this was the first and last time in my life that I could say my job was to drive a BMW through sun, rain, and snow around Heidelberg.
I appreciate the support of PD Dr. Michael Felsberg, who gave me access to his channel smoothing algorithm.
I would like to thank all collaborators within the Smartvision project, in particular
Hermann Hoepken for the fruitful and pleasant cooperation on the demonstrator.
Working within the LYNKEUS project was a pleasant experience. Thanks to all collaborators and in particular to Sandra Stecher for her friendly and straightforward cooperation, Prof. Dr. Andreas Kolb and Maik Keller for giving me support with the TOF-Simulator and for trying to find solutions for my application-specific problems, and Stefan Fuchs for the egomotion sequences. I gratefully acknowledge the financial support of the BMBF within the project LYNKEUS (promotional reference: 16SV2296).
Last but not least, I would like to thank my parents, my brothers and my friends (in
particular the Kumperla) for their words of encouragement and emotional support.
Contents

1 Introduction
  1.1 Motivation
  1.2 Outline
  1.3 Contribution

I Theory

2 Range Data and Time-of-Flight Measuring Principle
  2.1 Optical Range Measurement Techniques
    2.1.1 Triangulation
    2.1.2 Time-of-Flight Based Methods
    2.1.3 Interferometry
  2.2 The Photonic Mixer Device
    2.2.1 Demodulation
      2.2.1.1 Sinusoidal Modulation
      2.2.1.2 Rectangular Modulation
      2.2.1.3 Demodulation Contrast
    2.2.2 Error Analysis
      2.2.2.1 Systematic Errors
        Periodic Phase Error
        Fourier Approximation
        Constant Phase Error per Pixel
        Overexposure and Saturation
        Exposure Time / Amplitude Dependent Phase Deviation
      2.2.2.2 Random Errors

3 Image Processing and Filters
  3.1 Basics
    3.1.1 Discretization and Sampling
      Derivatives and Gradient
    3.1.2 Fourier Transform
    3.1.3 Interpolation
    3.1.4 Convolution, Point Spread Function and Transfer Function
      Filter Design and Optimization
    3.1.5 Normalized Averaging
      Band Enlarging Operators
  3.2 Edge Preserving Smoothing
    3.2.1 Robust Estimators
    3.2.2 Bilateral and Diffusion Filtering
  3.3 Two State Channel Smoothing

4 Motion Estimation
  4.1 Optical Flow and Range Flow
    4.1.1 Optical Flow and Motion Field
      4.1.1.1 Barber's pole illusion and complex motion
    4.1.2 Brightness Change Constraint Equation
    4.1.3 Aperture Problem
    4.1.4 Range Flow Constraint Equation
    4.1.5 Aperture Problem Revisited
    4.1.6 Local and Global Flow Estimation
      4.1.6.1 Local Total Least Squares Estimation
        Gradient Based Weighting
        Minimum Norm Solutions
      4.1.6.2 Regularization of Local Flow Estimates
      4.1.6.3 Performance Issues
    4.1.7 Confidence and Type Measure
    4.1.8 Combining Range and Intensity Data
    4.1.9 Equilibration
  4.2 Motion Artifacts

II Experiments and Applications

5 Testbench Measurements
  5.1 Experimental Setup
    5.1.1 Power Budget
  5.2 Depth and Amplitude Analysis
    5.2.1 Fixed Pattern Noise
    5.2.2 Range Calibration
    5.2.3 Interdependence of Amplitude and Range

6 Applications
  6.1 Still Image Processing
  6.2 Synthetic Test Sequences
      Tabular Result Scheme
      Algorithms and Performance Issues
    6.2.1 Motion of a plane
    6.2.2 Motion of a sphere
  6.3 Real World Sequences
  6.4 Summary

7 Conclusion and Outlook
  7.1 Summary
  7.2 Evaluation and Outlook

III Appendices
  Acronyms and Notation
  Bibliography
Chapter 1
Introduction
1.1 Motivation
There is a vast number of tasks in science and industry that involve the analysis of image data. Generally, these image data are two- or three-dimensional images, i.e. projections of features of some physical object or scene onto a two- or three-dimensional (regular) grid. These features are physical properties like the spectral reflectance of a surface (i.e. its color) or the absorbance of body tissue. The data is acquired with some imaging device like a (digital) camera, microscope, computed tomography scanner or magnetic resonance scanner, to name only a few. The sampled information can be scalar, vector-valued (e.g. multispectral imaging) or even of tensorial kind.
Sometimes it is sufficient to analyze single images. But with the advances in computing power and storage capacity, one increasingly wants to utilize the additional information embedded in the temporal domain. Put differently, some problems can be tackled properly only if their inherent dynamic nature is captured in image sequences.
Often the motion of single or multiple objects is of interest, for example for time-to-collision estimation in the automotive industry. Sometimes the temporal evolution of (non-rigid, deformable) surfaces is studied, e.g. the motion of waves on the water surface. Even if the dynamics themselves are not of interest, they still might need to be accounted for, because they introduce perturbations that need to be corrected. For instance, image registration techniques in medical imaging try to compensate for the motion of a patient during the (lengthy) acquisition of an image series. Particularly for many scientific applications an exact measurement of the motion field is of major importance (e.g. the calculation of growth rates of plant leaves).
The calculation of a three-dimensional physical motion field from a corresponding (2D) image sequence is, however, not an easy problem to solve, as we will discuss in the context of this thesis. In fact it cannot be solved in general.
The recent development of the so-called photonic mixer device (PMD) marks substantial progress towards the goal of reconstructing the physical motion field from an image sequence. With such a sensor, not only the irradiance information (known as the gray value of conventional image sensors) but additionally the distance to the observed (object) surface can be acquired at the same time.
A PMD-sensor is an integrated circuit of an array of single PMD-pixels. It measures the distance based on the time-of-flight principle using an active illumination.
The observed scene is illuminated by infrared, modulated, incoherent light which is
reflected and gathered in an array of solid-state image sensors comparable to those
used in common digital (CMOS/APS) cameras. The major benefits in comparison
with range measurement techniques like stereo vision or laser scanners are
• real-time range- and brightness-image acquisition
• no correspondence problems and low algorithmic effort
• inexpensive (because conventional) manufacturing process
• compact and robust "off-the-shelf" cameras are available
Before we continue, we need to say a few words about some terms we use throughout this thesis. If we talk of the PMD-technique or simply the PMD, we mean sensors or cameras that are based on the principle of optoelectronic modulation-based time-of-flight measurement, which we present in section 2.2. While the acronym PMD (photonic mixer device) relates to a specific realization of this technique protected by patent (held by PMDTechnologies GmbH), we still use it for all similar realizations, as there is no other common acronym for this technique. With gray value sensors we denote conventional imaging devices (digital cameras) that measure the irradiance on the sensor pixels. By 3D motion field we mean a two-dimensional field of three-dimensional velocity vectors that corresponds to the physical motion of surface patches projected onto a two-dimensional array (the sensor) by some optical receiver (an objective lens).
The image signal of the PMD-sensor comprises information about the range, the mean radiant flux and its modulation amplitude. In the present thesis we show how to utilize this extended information content for the improvement of the range measurement itself and for the estimation of an exact 3D motion field corresponding to an acquired image sequence. We illuminate the PMD measurement technique in the context of image processing, particularly motion estimation.
The signal of a current PMD-sensor bears several kinds of errors. Some of them are rather specific to the PMD, for example motion artifacts or a bias in the range measurement whose magnitude is modulated with increasing distance. The error types are of systematic and statistical nature and depend explicitly on time and space. We analyze those errors and develop a method to correct for the bias in the range signal.
The work presented in this thesis partially evolved from collaboration within the LYNKEUS project, funded by the Federal Ministry of Education and Research (BMBF). The project addresses problem solutions by means of 3D vision for applications in automation engineering and robotics, autonomous vehicles, man-machine interaction, safety engineering, medical engineering and quality control. Therefore computational efficiency was an important constraint for the algorithms to be developed. For the solution of tasks that might be implemented on an embedded system (like e.g. motion estimation), we tried to come up with algorithms of not more than linear complexity in time and space, and we avoided iterative algorithms whenever possible.
1.2 Outline
The single chapters of this thesis are not self-contained; the subject is simply too complex for that to be possible. However, we have tried to always refer to the respective sections of the thesis when a concept or term is used for the first time within another chapter and we do not believe it to be common knowledge. A reader who just wants to have a look at a specific section and is not going to read the thesis from the beginning is advised to use the electronic version of the thesis, as the references can easily be followed via hyperlinks. For the introduction to motion estimation in chapter 4 we recommend the electronic version too, because an illustrating animation is embedded there that (naturally) can be viewed only in the electronic version (with Acrobat Reader).
The thesis is split into two parts: Theory and Experiments and Applications. In the former we lay the theoretical foundation for the proposed methods and analyze errors under simplified, idealized assumptions. In the latter we present and discuss results of the experimental analysis regarding data acquisition, still image processing and motion estimation. The content of the chapters is the following:
Chapter 2 We give a short overview of common range measurement techniques and
describe the relevant details of the PMD measuring principle for image processing. We specify the features of the PMD, both the advantageous and the
problematic ones. We analyze its errors of common and specific nature and
present a new method to correct for a specific kind of systematic error in the
range measurement.
Chapter 3 We describe image processing particularly in the context of filtering. We
explain the basics of linear filtering and introduce more advanced concepts of
non-linear kind, like robust estimation. We show how the PMD-signal can be
exploited by means of non-linear filtering and present an extension to B-spline
channel smoothing, which we named two-state channel smoothing, to improve
range imagery in the presence of noise and outliers.
Chapter 4 We give an introduction to motion estimation and illuminate the general
problem that it poses. We describe how a three dimensional motion field can be
calculated from the PMD-signal, by extending the structure tensor approach to
motion estimation to account for the special characteristics of an active vision system like the PMD. We describe a specific error of the PMD range measurement that occurs in the presence of motion, the so-called motion artifacts.
Chapter 5 We describe the experimental setup that was used to acquire image sequences and calibration data. We exemplify diverse errors of the PMD and
correct for fixed pattern noise and systematic range deviations. In particular, we study the interdependence of range and amplitude, which is important for motion
estimation.
Chapter 6 We present and discuss results for still image processing and motion
estimation on synthetic and real-world data. We systematically investigate
by various examples the basic difficulties in 3D motion estimation from range
imagery. We give a proof-of-concept of the method, demonstrate its successful
application on real-world data and discuss its limitations.
Chapter 7 We give a résumé of the achieved results and an evaluation of the PMD-technology w.r.t. motion estimation. We try to highlight important topics that should be addressed regarding the sensor technology as well as algorithmic aspects of motion estimation.
In the appendices (Part III) the reader may find a list of acronyms and abbreviations and a description of the notation used throughout the thesis. The author hereby apologizes if he should have failed to adhere to this notation style anywhere and thereby diminished the comprehensibility of the text.
1.3 Contribution
The following is a list of what the author believes to be the novel contributions of this thesis:
• a largely self-contained presentation of 3D motion field estimation using the structure tensor approach extended for the PMD-technology
• a concise overview of the most prominent error types of PMD-sensors, particularly those introducing interdependencies in the range and amplitude measurement
• a novel method to correct for a specific kind of systematic error in the range measurement
• an extension of B-spline channel smoothing to improve range imagery in the presence of noise and outliers, which exploits the technicalities of the PMD-signal
• an extension of the structure tensor approach to motion estimation that takes advantage of the wealth of information embedded in the PMD-signal, achieving better motion estimates w.r.t. the aperture problem and in the presence of noise
• the first application of the structure tensor approach to 3D motion estimation on PMD-data
Part I
Theory
Chapter 2
Range Data and Time-of-Flight
Measuring Principle
2.1 Optical Range Measurement Techniques
There is an abundance of techniques for measuring distances or ranges∗ to objects. Objects can be punctiform like a star in the sky or spread out like the ground of the sea below a ship. What kind of range or object is of interest depends on the specific application, as does the method that is suitable for the measurement. Range measurements can be contactless (via sound or light) or tactile. Tactile measuring devices can be very simple like a plumb line or fancy like an atomic force microscope.
Computer vision implies the use of contactless optical techniques for range measurement, in most cases used to find an accurate description of surfaces. The sheer wealth of available methods demonstrates the importance of range measurement for all kinds of applications.
The most important optical range measurement techniques can be divided into three
categories: Triangulation, interferometry and time-of-flight based methods.
Figure 2.1: A stereo camera setup (from [Jäh02])
2.1.1 Triangulation
Triangulation is based on the fact that a scene is seen under a different viewing angle
if the viewing position changes. The difference in the viewing angle causes a shift
in the projected image, from which the distance to the elements of the scene can be
inferred, if specific conditions are met.
In Stereoscopy, which is a specific realization of the triangulation principle, depicted
schematically in figure 2.1, a point X is projected onto two different positions on
the image plane of two different cameras with parallel optical axes, separated by a
distance b, the stereoscopic basis. The difference in the position is called disparity
or parallax p and is inversely proportional to the range X3, as can be derived from
geometrical optics:
X1 + b/2 X1 − b/2
d0
p = rx1 − lx1 = d0
−
=b
(2.1)
X3
X3
X3
Assuming the noise in the parallax measurement to be Gaussian, with a standard
deviation σp , and applying the laws of error propagation, we deduce that the absolute
precision of a range estimate decreases with the squared distance.
$$z \equiv X_3 = \frac{b\,d'}{p}\,; \qquad \sigma_z = \frac{b\,d'}{p^2}\,\sigma_p = \frac{z^2}{b\,d'}\,\sigma_p \qquad\text{(2.2)}$$

∗ The commonly used term range can be somewhat confusing, as it is also a synonym for domain; in passages where the meaning might be ambiguous, we will use the terms depth range and distance synonymously with range. Besides, from a physical point of view, a distance measurement is, because of Heisenberg's uncertainty principle, always associated with a range of distances (or rather a probability distribution) and never with a precise single value.
Another variant of triangulation is Active Triangulation, which replaces one of the cameras by a light projector.
Triangulation suffers from the fact that the observed object needs to be textured, because corresponding points in the two images need to be identified. If this is not possible, it is typically due to the so-called aperture problem, which cannot be solved (see section 4.1.3 for a discussion of the problem in the context of motion estimation); then the parallax, which is essential for triangulation-based range estimation, cannot be determined. Moreover, one needs at least two, preferably calibrated, optical devices if the observed objects are moving.† Such systems are typically neither cheap nor easy to maintain, due to the necessary calibrations, if not operated in an ideal environment.
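To make the scaling of equation (2.2) concrete, the following minimal Python sketch evaluates parallax and range uncertainty for a few object distances; the stereo basis, image distance and parallax uncertainty are illustrative values, not taken from the text. It shows the quadratic growth of the range error with distance.

    import numpy as np

    # Illustrative stereo parameters (assumed values, not from the text)
    b = 0.12          # stereo basis [m]
    d_img = 0.008     # distance d' from lens to image plane [m]
    sigma_p = 2e-6    # parallax uncertainty [m]

    for z in (1.0, 2.0, 4.0, 8.0):                 # object distances [m]
        p = b * d_img / z                          # parallax, eq. (2.1)
        sigma_z = z**2 / (b * d_img) * sigma_p     # error propagation, eq. (2.2)
        print(f"z = {z:4.1f} m   p = {p*1e6:7.1f} um   sigma_z = {sigma_z*100:5.2f} cm")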
2.1.2 Time-of-Flight Based Methods
The basic idea of time-of-flight based methods (TOF) is to determine the delay of a signal (typically an electromagnetic carrier wave) induced by the time it needs to travel a certain distance. This delay is directly given as the time τ needed to cover twice the distance z between detector and object: a light pulse travels to the object, is (diffusely) reflected from it and is then detected at the same position from which it was sent:
$$\tau = \frac{2z}{c}\,, \qquad \text{where } c \text{ is the speed of light.} \qquad\text{(2.3)}$$
From the above equation for an ideal time-of-flight measurement we infer that, in contrast to triangulation, the precision of the distance measurement is independent of the distance and directly proportional to the precision of the time measurement τ:

$$z = \frac{c}{2}\,\tau\,; \qquad \sigma_z = \frac{c}{2}\,\sigma_\tau. \qquad\text{(2.4)}$$
Using a light pulse or pulses (i.e. pulse modulation) and measuring the time delay directly is rather demanding, given the speed of light and the typical distances to be measured, because frequencies on the order of GHz and above need to be handled properly; a range resolution of 1 cm, for instance, requires a timing precision of about 67 ps.
† If the objects are not moving while the image acquisition takes place, one could also move the camera to generate the parallax.
Using continuous wave modulation (CW), the carrier wave is modulated periodically
with a frequency ν. Here, not a time delay but a phase shift φ between outgoing and
incoming signal is determined:
$$z = \frac{c}{4\pi\,\nu}\,\varphi\,; \qquad \sigma_z = \frac{c}{4\pi\,\nu}\,\sigma_\varphi. \qquad\text{(2.5)}$$
The phase shift can be determined by correlating the signals in time. This is less
demanding and more robust with respect to tolerances of the used measuring device
components compared to pulse modulation.
The chosen modulation frequency ν determines the distance range ∆z = c/2ν that can be measured uniquely and which is known as the unambiguity range. Because the phase shift φ used to calculate z is a cyclic variable, distances z′ above ∆z yield an erroneous range z = z′ mod ∆z.
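The relation between phase shift, range and unambiguity range can be illustrated with a few lines of Python; the modulation frequency of 20 MHz is a typical value (cf. section 2.2), while the target distance is an arbitrary example.

    import numpy as np

    c = 299_792_458.0           # speed of light [m/s]
    nu = 20e6                   # modulation frequency [Hz]
    delta_z = c / (2 * nu)      # unambiguity range, here about 7.5 m

    def range_from_phase(phi):
        """Range from the measured phase shift, eq. (2.5)."""
        return c / (4 * np.pi * nu) * phi

    # A target at 9 m lies beyond delta_z; the measured phase is wrapped
    z_true = 9.0
    phi = (4 * np.pi * nu * z_true / c) % (2 * np.pi)
    print(delta_z, range_from_phase(phi))   # -> ~7.5 and ~1.5 (= 9 mod 7.5)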
2.1.3 Interferometry
Interferometry can be regarded as a special case of time-of-flight using continuous
wave modulation, where the modulation is given by the frequency of the electromagnetic wave itself. The radiation needs to be coherent, as otherwise the cross-correlation of out- and incoming signals cannot be used to determine the phase shift, i.e. the waves do not interfere. For an electromagnetic wave, c = ν·λ. Substitution in equation (2.5) yields:
$$z = \frac{\lambda}{4\pi}\,\varphi\,; \qquad \sigma_z \sim \frac{\lambda}{4\pi}\,\sigma_\varphi \qquad\text{and}\qquad \Delta z = \frac{\lambda}{2} \qquad\text{(2.6)}$$
Detailed analysis shows that for classical interferometry, as realized in a Michelson interferometer, σ_z is proportional to the inverse of the distance or the aperture of the observation, as it features an optical averaging over the micro-topology of the object under investigation: σ_z ∼ z⁻¹ ([Häu+99]).
To overcome the limitations given by the small, wavelength-determined unambiguity range ∆z, multiwavelength interferometry can be used. Due to the speckle noise that occurs when coherent light is reflected from a rough surface, classical interferometry is well suited only for smooth surfaces.
White light interferometry, more precisely coherency radar, uses radiation that has a coherence length of only a few wavelengths; thus interference patterns arise only for distances within this coherence length. The range is found by scanning over a distance range and looking for the interference of maximum contrast. As the interference contrast is used as the basis for the range measurement, unlike the phase information in classical interferometry, speckle noise has no influence and rough surfaces can be measured.
There are numerous interferometry methods for all kinds of applications reaching the
highest possible depth-resolution and competitive measuring ranges of several meters,
at the cost of a sensitive, expensive, highly specialized and therefore inflexible setup.
2.2 The Photonic Mixer Device
The following section is a condensed description of the photonic mixer device (PMD)
based on the work of Lange [Lan00]; Heinol [Hei01]; Justen [Jus01]; Schneider [Sch03]
and our theoretical and experimental findings (see also chapter 5). The focus is on
highlighting aspects that are relevant for the tasks of image processing, especially
motion estimation, avoiding the very details of technical realization. Furthermore,
we describe a new way to correct for systematic errors in the phase measurement and
give a new formula for calculating the phase shift of purely rectangular modulated
signals.
The photonic mixer device realizes TOF-range measurement by continuous wave
modulation. Compared to more conventional systems, which perform the necessary detection of the optical signal and its cross-correlation with the reference signal separately from each other, the PMD integrates both in one semiconductor-based circuit. By
moving the process of mixing and correlation of the signals from a separate electronic component to the integrated optoelectronic interface, the major sources of
errors of conventional systems are avoided. Moreover, the parallelization of range
point measurements on a sensor matrix, as it is essential for a 3D-camera system, is
significantly simplified [see Sch+98]. Figure 2.2 illustrates the principle of concurrent
detection and mixing, i.e. the simultaneous generation of photoelectrons and mixing
with the electronic reference signal, which is the essential new feature of the PMD.
Today's PMD-sensors combine the functionality of the CCD principle with the benefits of a realization in a CMOS process (see [Lan00]). The CCD principle offers an almost noise-free addition of optically generated charge carriers and a defined local and temporal charge separation in the charge domain, which is crucial for an optimal performance of a PMD.

Figure 2.2: Principle of concurrent detection and mixing in the PMD (based on [Sch03])

As the PMD successively adds short-time integrated sampling
points, a reasonable signal-to-noise ratio (SNR) can only be achieved if this addition works practically free of noise. Furthermore, only the repeated addition of the short-time samples (i.e. the cross-correlation) suppresses the higher-order and non-harmonic frequencies in the modulation signal (which are unavoidable due to technical constraints) and lets the pixels act as lock-in pixels, insensitive to frequencies other than the modulation frequency.
CMOS processes are widely available and as such relatively cheap. They allow the implementation of the PMD as an active pixel sensor (APS), which means pixels with an active stage [Fos93]. An APS permits random access to pixels (and therefore application-specific performance enhancements) and an improved SNR. The CCD principle can be realized not only within a CCD process, but also within a CMOS process. The maximization of the charge transfer efficiency (CTE), which is one of the major benefits of the CCD process, is of minor importance with respect to the PMD, as only a few charge transfers are needed (for details see [Lan00, chap. 5]). Using CMOS circuitry, first signal processing steps (like the temporal sampling, i.e. demodulation) can already be realized in the pixel, while maintaining reasonable fill factors. The fill factor is of major importance, because the performance of the PMD-camera depends on the modulated signal from an active illumination. As the intensity of back-scattered light decreases with the square of the target's distance to the camera, the active illumination again accounts for the need for a high dynamic range of the PMD-pixels.
Figure 2.3: PMD-pixel's working principle: (a) schematic structure of the circuit; (b) sampling of the cross-correlation function (adapted from [Alb07])

The pixel's working principle is illustrated exemplarily in figure 2.3a: the potential gradient in the semiconductor is controlled by applying proper gate voltages (umod,A
and umod,B ) to the photogates A and B. The potential gradient is changed synchronously with the modulation of the incoming light, so the optically generated
charge carriers are either driven into the left or the right integration gate (under
the readout diodes A and B), each for half of the modulation period. For a typical
modulation frequency of e.g. 20MHz and an integration time of 5ms these short time
integrations repeat a hundred thousand times.
It is worth noting that within one short-time integration (for the above example, half a period is 25 ns) only a few electrons (or none, depending on the optical power) are generated and transported to the integration gates. As the accumulation of single electrons under the integration gate is essentially noise-free, this is actually an advantage, because self-induced drift‡, which would lower the charge transfer to the integration gates, has no influence on the PMD [Lan00].

‡ Self-induced drift is caused by the Coulomb forces between free charge carriers of the same polarity, making them repel one another. It is of practical significance only if many free charge carriers are located close to each other, and as such depends heavily on the number of generated electrons.

The accumulated charge in the capacity under one integration (or readout) gate results in a measurable voltage u_out at the readout diodes. This voltage approximates a cross-correlation of the optically generated input signal S(t) with the phase-shifted sampling signal R(t), the phase shift θ being the correlation variable. Figure 2.3b illustrates the sampling for three different phase shifts θ at 0°, 90° and 180° and the resulting voltages at the readout gates A and B. The electrooptical input signal S(t),
indicated by the red bars, is always the same (because the scene does not change).
Only the sampling (or reference) signal is phase shifted. If the readout gates A and B
are identical w.r.t. the manufacturing process then uout,A (θ + 0◦ ) = uout,B (θ + 180◦ ).
The output signal is only an approximation of the correlation, because it is not the sampling signal itself but some, not necessarily linear, function of the sampling signal and of properties of the semiconductor circuit that describes which fraction of the photoelectrons is deposited in the integration capacity. For example, a generated photoelectron may recombine with its hole before it reaches the capacity, because it was generated too far from it. In the ideal case, the fraction would be one as soon as the potential drives the electrons to the corresponding integration gate, and zero otherwise. In this case the correlation would be that of a square wave (with range {0, 1}) with the electrooptical signal of the same period. The phase of the square wave is shifted by 180° for the two different gates. For the ideal case of a sinusoidally modulated electrooptical signal, the correlate is sinusoidal too: $\int_0^{2\pi} \sin(x + \varphi)\cdot H(\sin(x + \theta))\,dx = 2\cos(\varphi - \theta)$, with $H$ being the Heaviside step function.
2.2.1 Demodulation
The described process of demodulation shall now be put into a mathematical formalism, making some idealizing assumptions.
The irradiance E(t) seen by the PMD-sensor may be modeled as

$$E(t) = G_0' + A'\cdot M(t), \qquad \text{with } M(t) = M(t+T). \qquad\text{(2.7)}$$

M(t) is a periodic function with (time-) period T (and angular frequency ω = 2π/T) that originates from the modulated illumination of a spatial scenery. G₀′ is a constant irradiance offset and A′ is the amplitude associated with the (normalized) modulation M(t). If we assume the photosensitive semiconductor to have a linear responsivity, the induced electrooptical signal S(t), which is proportional to the generated charge carriers, is

$$S(t) = G_0 + A\cdot M(t), \qquad\text{(2.8)}$$

with G₀ and A just being linearly scaled versions of G₀′ and A′. Of course this is only an approximation, as both G₀ and A are influenced by various optical and
electrooptical properties of the system (e.g. transmission of optical filters or quantum
efficiency of the PMD). The details of these properties are discussed in [Lan00; Lua01;
Jus01; Sch03] and are out of the scope of this thesis.
Because of the time it takes for the light to travel to the object and back again, the modulation M(t) is delayed by a phase difference ϕ with respect to a reference modulation M′: M(t) = M′(t − ϕ/ω). Under ideal conditions, the PMD demodulates the signal S(t) by correlating it with a discretized version of M′, namely R(t) = H(M′(t)), H being the Heaviside step function. The cross-correlation function I(θ) reads

$$I(\theta) = \int_0^{mT} S(t)\cdot R\!\left(t + \frac{\theta}{\omega}\right) dt. \qquad\text{(2.9)}$$
2.2.1.1 Sinusoidal Modulation
Assuming the modulation to be sinusoidal, M′(t) = sin(ωt), we find for a correlation range of m (∈ ℕ⁺) periods

$$\begin{aligned}
I(\theta) &= \int_0^{mT} \big(G_0 + A\sin(\omega t - \varphi)\big)\cdot H\big(\sin(\omega t + \theta)\big)\,dt && \text{(2.10a)}\\
 &= \int_0^{mT} \big(G_0 + A\sin(\omega t - \varphi - \theta)\big)\cdot H\big(\sin(\omega t)\big)\,dt \\
 &= m\int_0^{T/2} \big(G_0 + A\sin(\omega t - \varphi - \theta)\big)\,dt \\
 &= m\int_{\frac{\varphi+\theta}{\omega}}^{\frac{T}{2}+\frac{\varphi+\theta}{\omega}} \big(G_0 + A\sin(\omega t)\big)\,dt && \text{(2.10b)}\\
 &= mT\left(\frac{A}{\pi}\cos(\varphi + \theta) + \frac{G_0}{2}\right), && \text{(2.10c)}
\end{aligned}$$
considering that H(sin(ωt)) is one for half a period [0, T /2] and zero otherwise.
Equation (2.10c) describes the idealized demodulation for a sinusoidal modulation
that is done by the PMD. A generalized version of it is given by Plaue [Pla06] for
modulations M (t) that can be approximated by a Fourier series. I(θ) corresponds
to the output of the PMD, and θ is the variable of the cross-correlation function
that the PMD implements. Choosing a specific θ corresponds to sampling the cross-correlation function. According to equation (2.9), we may select a specific θ by phase-shifting the reference signal R(t) accordingly.
With respect to a TOF-camera, we are interested in the unknowns ϕ, A and G0 .
ϕ corresponds to the range r by ϕ = 2ωr/c, c being the speed of light (see equation (2.5)). A is proportional to the amplitude of the modulation of the active illumination of the camera. G0 sums up background illumination and the DC-component
of the active illumination.
As we have three independent unknowns, we need at least 3 equations to find a solution. By taking samples of I(θ) for several θ in [0, 2π], we obtain those equations, one equation (2.10c) for each sample. As the output of the PMD is subject to noise, it is reasonable to take more samples to find the optimal solution in a least squares sense; in any case, the implementation of the PMD described above returns, at its two output gates, two samples shifted by 180° in one shot. Taking another shot with 90° (and therefore also 270°) phase shift gives us enough information to solve for the unknowns of equation (2.10c) optimally in a least squares sense. Of course, one can take more than 4 samples to improve the variable estimation with respect to noise. For N equidistant sampling points, the optimal solution in a least squares sense is given by:
"N −1
#
X
ϕ = arg
In e−ıθn
n=0
N −1
2π X
−ıθn A=
In e
N G0 =
18
2
N
n=0
N
−1
X
In
n=0
with
I(θn )
,
mT
n
θn = 2π
.
N
In =
(2.11)
A proof of an equivalent formulation of equation (2.11) can be found in [Pla06] or [Xu99].
It simplifies for N = 4 to:
$$\varphi = \arg\big[(I_0 - I_2) + \imath(I_3 - I_1)\big], \qquad A = \frac{\pi}{2}\sqrt{(I_0 - I_2)^2 + (I_3 - I_1)^2}, \qquad G_0 = \frac{I_0 + I_1 + I_2 + I_3}{2},$$
$$\text{with } I_n = \frac{I(\theta_n)}{mT},\quad \theta_n = n\,\frac{\pi}{2}. \qquad\text{(2.12)}$$
Most PMD-based TOF-cameras use essentially these equations to determine phase and amplitude. However, the amplitude A calculated here is that of the electrooptical input signal, while formulas given in the literature (e.g. [LS01]) often calculate the amplitude of the correlation signal. It is important to be aware of this difference when it comes to interpreting A as a physical property of the optical signal: equation (2.12) compensates for the so-called demodulation contrast. In particular, if the demodulation contrast depends on the measurement itself (i.e. it is not a constant), the difference is of relevance, as we will discuss in section 2.2.1.3.
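As a minimal sketch of equations (2.11) and (2.12), the following Python snippet computes phase, amplitude and offset from N equidistant correlation samples and checks the result against synthetic samples generated with the ideal sinusoidal model (2.10c); the numerical values are arbitrary.

    import numpy as np

    def demodulate(I):
        """Phase, amplitude, offset from N equidistant samples I_n = I(theta_n)/(mT), eq. (2.11)."""
        I = np.asarray(I, dtype=float)
        N = len(I)
        theta = 2 * np.pi * np.arange(N) / N
        z = np.sum(I * np.exp(-1j * theta))
        phi = np.angle(z) % (2 * np.pi)
        A = 2 * np.pi / N * np.abs(z)
        G0 = 2 / N * np.sum(I)
        return phi, A, G0

    # Synthetic samples from the ideal model (2.10c), here for N = 4 (i.e. eq. (2.12))
    phi_true, A_true, G0_true = 1.2, 0.4, 1.0
    theta_n = np.pi / 2 * np.arange(4)
    I_n = A_true / np.pi * np.cos(phi_true + theta_n) + G0_true / 2
    print(demodulate(I_n))    # recovers (1.2, 0.4, 1.0) up to rounding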
2.2.1.2 Rectangular Modulation
Now let us suppose that the modulation is not sinusoidal but rectangular. In equation (2.10b) we then have to replace sin by sgn ◦ sin, i.e. a square wave with range
{−1, 1}. The resulting correlation function is a triangle wave:
$$I(\theta) = m\int_{\frac{\varphi+\theta}{\omega}}^{\frac{T}{2}+\frac{\varphi+\theta}{\omega}} \big(G_0 + A\cdot\operatorname{sgn}(\sin(\omega t))\big)\,dt = mT\left(\frac{A}{\pi}\,\operatorname{tri}(\varphi + \theta) + \frac{G_0}{2}\right), \qquad\text{(2.13)}$$

where $\operatorname{tri}(\phi) = \frac{\pi}{2} - \arccos(\cos(\phi))$, i.e. a triangle wave with range $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $\operatorname{tri}(0) = \frac{\pi}{2}$.
If we apply equation (2.12) to correlation samples (2.13) that result from a rectangular modulation, there is a systematic error in the phase estimation, as the model
assumption of a sinusoidal modulation does not apply. We will discuss this in detail
in section 2.2.2. Here we just want to state an exact solution for the unknowns in
equation (2.13) given 4 equidistant sampling points:
$$A = \max\big[\,\left|(I_0 - I_2) + (I_1 - I_3)\right|,\ \left|(I_0 - I_2) - (I_1 - I_3)\right|\,\big]$$
$$\varphi = \operatorname{sgn}(I_1 - I_3)\cdot\left(\frac{I_0 - I_2}{4A} + \frac{1}{4}\right) + \frac{1}{2}$$
$$G_0 = \frac{I_0 + I_1 + I_2 + I_3}{2}, \qquad \text{with } I_n = \frac{I(\theta_n)}{mT},\ \theta_n = n\,\frac{\pi}{2}. \qquad\text{(2.14)}$$
The author found the solution by analyzing the symmetries in the 4 correlation signals I(θn, ϕ) (2.13) and verified it by inserting these into the solution (2.14) using a computer algebra system. However, he is not sure whether it is really new, as the problem seems to be quite a common one; but he could find nothing similar in the PMD-related literature.
2.2.1.3 Demodulation Contrast
In optics, the modulation transfer function MTF is of major importance for the
description of an optical system. It is based on the modulation (or modulation
contrast) M , which typically refers to a spatial pattern:
$$M = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}} \qquad\text{(2.15)}$$
$$\text{MTF} = \frac{M_{\text{image}}}{M_{\text{object}}}. \qquad\text{(2.16)}$$

L is the radiance (or luminance) and M_object and M_image the modulation of an object and its image. If the object has a modulation of 100% (i.e. if L_min = 0), the MTF reduces to M_image. Equation (2.16) is correct only if we assume there is only a single frequency (of e.g. a spatial pattern), as the MTF depends on the frequency. More precisely, the MTF is the magnitude of the (optical) transfer function of an (optical) linear system (see section 3.1.4), and describes the response of an optical system to an image decomposed into sine waves.
The demodulation contrast is defined similarly but refers to a modulation in time:

$$C_{\text{demod}} = \frac{\text{measured amplitude}}{\text{measured offset}} \qquad\text{(2.17)}$$

It quantifies the PMD's performance of charge separation. The amplitude of the sampled correlation I(θ) (2.10c), assuming a perfect charge separation, is with respect
to the integration time m T attenuated by a factor of 1/π relative to the original
modulation of S(t); the offset is G0/2. If we assume a modulation contrast of 100%
of the electrooptical signal S(t) then G0 = A. So the demodulation contrast of this
idealized PMD is:
$$C_{\text{demod}} = \frac{A/\pi}{G_0/2} = \frac{A/\pi}{A/2} = \frac{2}{\pi} \approx 64\%.$$
The various realizations of the PMD, however, neither have a demodulation contrast of this magnitude nor is it constant; it tends to be below 40%. Moreover, C_demod depends on the modulation frequency and on the radiant energy deposited on a PMD-pixel during the exposure (and on other hardwired system features that are of no interest within the scope of this thesis) [Lan00; Lua01].
While the dependence on frequency is negligible for motion estimation, as typically the modulation frequency is not changed during image acquisition, the dependence on radiant energy is of major importance: we want to use the amplitude of equations (2.12) or (2.14) as a measure for the radiance emitted from a (moving) object patch in the scene. We can then use this information in a physically motivated model of the scene for motion estimation, as will be discussed in section 4.1. If, however, the demodulation contrast itself depends on the radiance, we have to compensate for this, as (2.12) and (2.14) only apply for constant C_demod.
This can be achieved by performing a calibration measurement of C_demod as a function of G_0 and using it to correct the measured amplitude for the varying contrast. Then, for example, the amplitude calculated for a sinusoidal modulation in (2.12) becomes:

$$A \sim \frac{\sqrt{(I_0 - I_2)^2 + (I_3 - I_1)^2}}{C_{\text{demod}}(G_0)}. \qquad\text{(2.18)}$$
For the sake of simplicity we dropped any constants of proportionality, because they are marginal within the context of motion estimation. Furthermore, equation (2.18)
only holds true if there is no background illumination during calibration and image
acquisition, as the demodulation contrast depends on background illumination itself.
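The following is a minimal sketch of the amplitude correction (2.18); the calibration curve C_demod(G_0) used here is purely hypothetical and merely stands in for a measured one, and constants of proportionality are dropped as in the text.

    import numpy as np

    def corrected_amplitude(I0, I1, I2, I3, c_demod):
        """Amplitude corrected for a radiance-dependent demodulation contrast, cf. eq. (2.18)."""
        G0 = (I0 + I1 + I2 + I3) / 2
        return np.sqrt((I0 - I2)**2 + (I3 - I1)**2) / c_demod(G0)

    # Hypothetical calibration curve: contrast decreasing mildly with the offset G0
    c_demod = lambda g0: 0.40 - 0.05 * np.tanh(g0)

    print(corrected_amplitude(1.2, 0.9, 0.4, 0.7, c_demod))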
2.2.2 Error Analysis
Current PMD camera types show various errors — systematic as well as random
ones — in their range signal. We first investigate the systematic errors, which result
from deviations of the technical realization from the model assumptions that were
used to derive equation (2.11). Most of them may be corrected by an appropriate
calibration. To do a proper calibration, we need at least a model for the errors
and a way to measure them. Hence, in the following we will describe the errors,
model them, and show ways to correct them; how the errors are measured is part of
chapter 5.
2.2.2.1 Systematic Errors
The investigated systematic errors are
• a periodic, sinusoidal deviation of the phase measurement over the unambiguity
range
• a constant phase deviation per pixel, which corresponds to the dark-response
nonuniformity (DRNU) of classical CMOS sensors (often laxly called "fixed pattern noise") but has quite different causes
• overexposure of individual pixels
• an exposure time dependent constant phase deviation
There are some more known systematic errors that will not be addressed here but
are nevertheless of some importance:
• Phase drift due to thermal effects
• Near field errors due to the extended, non-punctiform, non-radial (with respect to the objective lens) modulated illumination
Periodic Phase Error If we assume a periodic modulation of the illumination, which induces a modulation of the number of emitted photoelectrons, and assume the reference signal to be modulated with the same frequency, then the samples the PMD-sensor returns are those of a periodic signal too. The reason for this is that the PMD performs an operation that corresponds to cross-correlating the light and reference signal, as discussed in section 2.2.1. The sampled signal, however, is not necessarily sinusoidal. If for example the light and reference signal are assumed to be rectangular, then the cross-correlation is a triangle wave (see equation (2.13)).
Applying

$$\varphi_{\sin}(\vec{I}\,) = \arg\big[(I_0 - I_2) + \imath(I_3 - I_1)\big]$$

to the vector $\vec{I}_{\mathrm{tri}}$ of cross-correlation samples

$$I_{\mathrm{tri}}(\theta_n) \sim \frac{A}{\pi}\,\operatorname{tri}(\varphi + \theta_n) + \frac{G_0}{2},$$

yields a periodic error in the phase estimation:

$$E_\varphi(\varphi) = \varphi_{\sin}(\vec{I}_{\mathrm{tri}}) - \varphi \sim \arg\big[\arcsin(\cos\varphi) + \imath\,\arcsin(\sin\varphi)\big] - \varphi \qquad\text{(2.19)}$$
$$\approx \left(\tfrac{3\pi}{8} - \arctan(3)\right)\sin(4\varphi) \approx -\tfrac{1}{14}\sin(4\varphi) \qquad\text{(2.20)}$$
Thus the maximum absolute error is (1/14)/(2π) ≈ 1.14% of the unambiguity range. The error of the approximation using sin(4ϕ) is less than 10% of the phase error. Because motion estimation deals with derivatives of range measurements, more important than the absolute error is the relative error with respect to the slope of ϕ_sin:
$$E_{\mathrm{rel}} = \frac{\partial_\varphi E_\varphi}{\partial_\varphi \varphi} = \partial_\varphi E_\varphi \approx -\frac{4}{14}\cos(4\varphi).$$
This means that we introduce a maximum relative error of 4/14 ≈ 29% if we calculate range slopes based on measurements that assume the modulation to be sinusoidal while it is actually rectangular.
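The size of this error can be checked numerically: the following Python sketch generates correlation samples from the triangle-wave model (2.13) and feeds them to the phase formula (2.12), which assumes a sinusoidal modulation; the resulting maximum deviation is about 1/14, in agreement with equation (2.20). Amplitude and offset are arbitrary.

    import numpy as np

    def tri(x):
        """Triangle wave of eq. (2.13): tri(x) = pi/2 - arccos(cos(x))."""
        return np.pi / 2 - np.arccos(np.cos(x))

    A, G0 = 0.4, 1.0
    theta_n = np.pi / 2 * np.arange(4)
    phi = np.linspace(0, 2 * np.pi, 1000, endpoint=False)

    # Samples of the triangle-wave correlation, eq. (2.13)
    I = A / np.pi * tri(phi[:, None] + theta_n) + G0 / 2

    # Phase estimate that wrongly assumes a sinusoidal modulation, eq. (2.12)
    phi_sin = np.angle((I[:, 0] - I[:, 2]) + 1j * (I[:, 3] - I[:, 1])) % (2 * np.pi)

    err = (phi_sin - phi + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    print(np.abs(err).max())    # ~0.071 rad, i.e. about 1/14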
Fourier Approximation If we have a look at the testbench range measurements
(see section 5.2.2), we find that the systematic error has indeed a major component
going with sin(4ϕ). However, there are additional smaller components of higher and
lower harmonics of the angular base frequency n = 1. Therefore, we may approximate
the error by a Fourier series:
$$E'(\varphi) = \text{offset} + \sum_{n=1}^{k} a_n \sin(n\varphi + \theta_n) \qquad\text{(2.21)}$$
The Fourier coefficients can be found numerically by a least squares fit. If data can be acquired over the whole phase range of 2π, one may also use an FFT. The erroneous phase measurement ϕ_err is then described by

$$\varphi_{\text{err}}(\varphi) = \varphi + \text{offset} + E(\varphi), \qquad\text{where}\quad E(\varphi) = E'(\varphi) - \text{offset}, \qquad\text{(2.22)}$$
and by defining φ(ϕ) = ϕerr (ϕ) − offset, we get rid of the constant offset error:
φ(ϕ) = ϕ + E(ϕ).
We need to find the inverse function ϕ(φ) if we want to correct the measured data φ to recover the true value ϕ. As there is no exact analytic solution for the inverse function, we may take E(ϕ) to be a small perturbation and approximate the inverse function as the inverse of its Taylor polynomial.
The first order Taylor series of φ(ϕ) at ϕ = ϕ₀ is

$$\phi(\varphi) = \varphi_0 + E(\varphi_0) + \big(1 + \partial_\varphi E(\varphi_0)\big)\cdot(\varphi - \varphi_0) + \mathcal{O}\!\left((\varphi - \varphi_0)^2\right).$$
The inverse of the Taylor series (given by Mathematica) is

$$\phi^{-1}(\phi(\varphi)) = \varphi(\phi) = \varphi_0 - \frac{\varphi_0 - \phi + E(\varphi_0)}{\partial_\varphi E(\varphi_0) + 1} + \mathcal{O}\!\left((\varphi_0 - \phi + E(\varphi_0))^2\right)$$

Choosing ϕ₀ to be φ we obtain§

$$\varphi(\phi) = \phi - \frac{\phi - \phi + E(\phi)}{\partial_\varphi E(\phi) + 1} + \mathcal{O}\!\left(E(\phi)^2\right) \approx \phi - \frac{E(\phi)}{\partial_\varphi E(\phi) + 1}. \qquad\text{(2.23)}$$

§ More precisely, in the mathematical sense we take ϕ(ϕ₀) in the limit ϕ₀ → φ: $\lim_{\varphi_0\to\phi}\varphi(\varphi_0)$.
The inverse Taylor polynomial (2.23) is a good approximation for ϕ(φ) if |E(φ)| ≪ 1 and E(φ) ≈ E(ϕ + ∂ϕE(ϕ)·E(ϕ)) ≈ E(ϕ) (which requires ∂ϕE(ϕ) to be small too), because only then the remainder indicated by O(E(φ)²) is small compared to E(ϕ) (the error that needs to be corrected). Additionally, φ(ϕ) is invertible only if it is monotonic, implying that ∂ϕE > −1. All requirements are fulfilled if the Fourier coefficients a_n and the number of modes k needed to describe the error are small, which is in good agreement with the measurements.
Equation (2.23) is a compact, analytic solution for the problem of correcting a phase error that can be described by equation (2.21). If necessary, an approximation by a
higher order Taylor polynomial can be derived with ease, at the cost of increasing the
resources needed to calculate the inversion. The necessary data for calculating the
Fourier coefficients has to be acquired during calibration of the camera. Typically
k = 4 Fourier modes are sufficient to suppress the error considerably (see chapter 5
for results). Compared to a lookup table or B-spline approximations [LK06; KRI06]
(in extreme cases for every pixel) this is very efficient with respect to needed memory
resources and acceptable regarding processing time.
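A minimal sketch of the correction (2.23) in Python; the Fourier coefficients below are made up for illustration and would in practice come from the calibration described in chapter 5.

    import numpy as np

    # Assumed Fourier description of the error, eq. (2.21) without the offset:
    # E(phi) = sum_n a_n * sin(n*phi + theta_n), n = 1..k
    a = np.array([0.01, 0.005, 0.002, 0.07])       # dominant sin(4*phi) term
    th = np.zeros(4)
    n = np.arange(1, 5)

    E = lambda phi: np.sum(a * np.sin(np.outer(phi, n) + th), axis=-1)
    dE = lambda phi: np.sum(a * n * np.cos(np.outer(phi, n) + th), axis=-1)

    def correct(phi_meas):
        """First-order inverse of phi_err = phi + E(phi), eq. (2.23)."""
        return phi_meas - E(phi_meas) / (dE(phi_meas) + 1)

    phi_true = np.linspace(0, 2 * np.pi, 500)
    phi_meas = phi_true + E(phi_true)               # simulated measurement, offset removed
    print(np.abs(phi_meas - phi_true).max(),        # raw error
          np.abs(correct(phi_meas) - phi_true).max())  # considerably smaller residual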
Constant Phase Error per Pixel Typical images of conventional CCD- or CMOS-cameras show two types of pixel-specific systematic errors (both being more prominent for CMOS sensors): dark-response nonuniformity (DRNU) and photo-response nonuniformity (PRNU), which can be directly related to the pixels' varying offset and gain (due to e.g. variations in oxide thickness and doping concentrations over the sensor). Sometimes these (with respect to the measurement, systematic) errors are somewhat sloppily called fixed pattern noise. Range imagery appears to have the same kind of nonuniformity errors. But differences in offset and gain (assuming a linear gain) should actually have no influence on the phase measurement, because offset and gain simply cancel out of equation (2.12).
The explanation for the pixel nonuniformity is that the reference signal R(t) (of equation (2.9)) connected to each pixel receives a phase delay due to the slightly varying
capacitance of the individual pixels and other hardware-design and -processing related reasons that may affect the phase of R. As the error is constant over time it
can be corrected using appropriate calibration methods.
Let K denote the matrix of fixed pattern phase errors (where the matrix elements correspond to image pixels) and N the (temporal) noise corresponding to a field of random variables, such that the expectation value ⟨N⟩ is 0. Then $\overline{N}^{\,t} \approx 0$, the bar denoting the (temporal) arithmetic mean over a sequence of acquired data. With the true range at each pixel given by T, we may model an acquired range (or phase) image R as:

$$R = T + K + N. \qquad\text{(2.24)}$$

Taking the arithmetic mean over a sequence of frames of a fixed view we get:

$$\overline{R}^{\,t} = \overline{T}^{\,t} + \overline{K}^{\,t} + \overline{N}^{\,t} \approx T + K.$$

If we are able to create a homogeneous incident illumination with respect to phase (and preferably irradiance), i.e. T = I · T, the estimation of K is easy; we regard the
fixed pattern error K as a sample (taken once during manufacturing of the sensor) of a field of i.i.d. random variables that have an expectation value of zero; then the spatial average over K is approximately zero and we just have to subtract the spatiotemporal average $\overline{R}^{\,st}$ from $\overline{R}^{\,t}$:

$$\overline{R}^{\,t} - \overline{R}^{\,st} \approx (T + K) - \big(\overline{T}^{\,st} + \overline{K}^{\,st} + \overline{N}^{\,st}\big) \approx (T + K) - T = K. \qquad\text{(2.25)}$$
A homogeneous phase may be achieved by modifications to the camera hardware
using a telecentric illumination, but involves a complex and somewhat expensive
experimental setup. Also the assumption of i.i.d. may be violated as the variations
in the manufacturing process are not necessarily spatially uncorrelated.
If we use a simple whiteboard-like target for the calibration, a paraboloid-like phase
pattern is irradiated on the sensor. If we remove the lens from the camera this
improves the homogeneity of the data, and neighborhoods of the image pixels may be
approximated by planar surface patches. The average over a symmetric neighborhood
of a central pixel then is the value of the central pixel itself. So we may estimate K
via
R̄ᵗ − B R̄ᵗ ≈ (T + K) − (B T + B K) ≈ (T + K) − T = K.   (2.26)
B denotes binomial convolution of an appropriate mask size, which is large enough
to let BK ≈ 0, while small enough that the approximation of a pixel neighborhood
as a planar surface patch is still valid and BT ≈ T . Image borders may be treated
with respect to convolution by mirroring the data at the image borders. For results
we refer to section 5.2.1.
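The estimation of equation (2.26) amounts to subtracting a binomially smoothed version of the temporally averaged range image from itself. The following is a minimal sketch in Python/NumPy under the stated assumptions (a stack of raw phase frames of a static, approximately planar target; all array and function names are illustrative and not part of any camera API):

```python
import numpy as np
from scipy.ndimage import convolve

def binomial_kernel(order=4):
    """1D binomial mask, e.g. order=4 -> [1, 4, 6, 4, 1] / 16."""
    k = np.array([1.0])
    for _ in range(order):
        k = np.convolve(k, [0.5, 0.5])
    return k

def fixed_pattern_phase_error(frames, order=8):
    """Estimate the per-pixel phase offset K as in eq. (2.26).

    frames : ndarray of shape (T, H, W), raw range/phase images of a
             static, approximately planar calibration target.
    """
    r_mean = frames.mean(axis=0)              # temporal average, cf. eq. (2.24)/(2.25)
    k1d = binomial_kernel(order)
    # separable binomial smoothing B, image borders mirrored as described in the text
    smoothed = convolve(r_mean, k1d[None, :], mode='mirror')
    smoothed = convolve(smoothed, k1d[:, None], mode='mirror')
    return r_mean - smoothed                   # approximation of K

# usage: K = fixed_pattern_phase_error(frames); corrected = raw_range - K
```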
The demodulation images In (2.11), from which R is calculated, are subject to DRNU
and PRNU just like conventional image sensors. The data may be corrected for both errors by doing a calibration analogous to conventional systems, if the modulated illumination is replaced by an unmodulated one, as then the PMD acts essentially like an irradiance sensor. The channels In may be corrected individually in a preprocessing step and the result used for calculating an improved phase and amplitude estimate.
As explained before, offset and gain variations are of minor importance for the phase
estimation, but the amplitude estimate is essentially influenced by gain. Looking at
equation (2.11) we find that if the single demodulation images depend on a common
image of gain factors α, the same is true for the amplitude estimate:
if In ∼ α then, by (2.11), also A ∼ α.
For a description of methods like photon-transfer technique, flat-fielding or statistical
approaches and their application to correct for DRNU and PRNU we refer to [MF81;
Fow+98; TRK01; Wag03; Grö03; Kla05].
Overexposure and Saturation For a gray value sensor overexposure occurs if the
full well capacity (i.e. the saturation level) of a sensor pixel is exceeded. Then one can no longer map the measured signal onto the irradiance or the physical property of interest (in our case the distance to an object), even if the sensor response is known.
If we want to correct for overexposure we first have to detect it.
A simple but somewhat unreliable method to detect heavy overexposure, even if one
does not have access to the cross-correlation samples In , but only to the calculated
amplitude A (2.11), is to check if A is zero; because for heavy or total overexposure, all
capacities are saturated and all In are equal and thus A calculates to zero. However,
this only works if really all samples In are saturated, which is not the case for partial overexposure, as we shall see. Furthermore, A = 0 is ambiguous w.r.t. underexposure, which may lead to an amplitude of zero too. So both the false positive and the false negative rate with respect to the detection of overexposure may be high.
Overexposure of a PMD-pixel depends on both radiance and distance of an imaged
object. Using equation (2.10c) and the inverse-square law for a punctiform light
source, we come to a simplistic approximation of the demodulation signals In dependent on the distance r of an object of constant reflectivity:
In(r) ∼ A(r)/(2 Cmod) · cos(r·2π/R + n·π/2) + A(r)/(2 Cmod) ∼ [cos(r·2π/R + n·π/2) + 1] / r².   (2.27)
We assume the modulation contrast Cmod of the light source to be 100% and the unambiguity range R = 7.5 m (≡ fmod = 20 MHz). Normalizing the demodulation signals by I0 yields figure 2.4. Taking e.g. I2(4 m) to be exactly the saturation level of a sensor pixel, we find that the other demodulation samples are not yet in saturation. Vice versa, only if I0(4 m) is at the saturation level can we be sure to detect overexposure by testing whether all samples are equal. We may denote these two kinds of overexposure specific to the PMD as partial and total overexposure. Furthermore, a maximum signal ratio of more than 4 indicates that the irradiance needs to be quite high before an overexposure can be detected by testing on A = 0.
If the raw data is technically accessible, it is more reliable to check whether any In is saturated and, if so, to flag the measurement as overexposed. Detected single pixels or pixel regions may then be corrected (under specific assumptions about the neighborhood) using techniques like inpainting (see e.g. [Tsc06] for an elaborate example) or simpler interpolation techniques, or may be excluded from further processing if possible. Overexposure may lead to effects like blooming (see [Jäh04]), so that the confidence in the information content of the neighboring pixels should be reduced.

Figure 2.4: Demodulation signal ratios In(r)/I0(r), n = 0…3, against distance [m]
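To illustrate the distinction between partial and total overexposure, the ratios of figure 2.4 follow directly from the simplified point-source model of equation (2.27). The following is a small sketch under those assumptions; the unambiguity range and the saturation level chosen below are illustrative values only:

```python
import numpy as np

R_UNAMB = 7.5   # unambiguity range [m] for f_mod = 20 MHz (assumed)

def demod_signal(r, n):
    """Simplified demodulation sample I_n(r) of eq. (2.27), up to a constant."""
    return (np.cos(r * 2 * np.pi / R_UNAMB + n * np.pi / 2) + 1) / r**2

r = np.linspace(0.5, 7.0, 200)
I = np.stack([demod_signal(r, n) for n in range(4)])   # shape (4, len(r))

ratios = I / I[0]                                       # the curves of figure 2.4

# a pixel is only *totally* overexposed if all four samples saturate; if only
# some samples exceed the (illustrative) saturation level, the overexposure is
# partial and the A = 0 test is not a reliable indicator.
saturation = demod_signal(4.0, 2)                       # example: I_2(4 m) as level
partial = np.any(I > saturation, axis=0) & ~np.all(I > saturation, axis=0)
total = np.all(I > saturation, axis=0)
```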
Exposure Time / Amplitude Dependent Phase Deviation Experimental data
shows that current PMD-camera types bear a systematic error in phase measurement
which is constant for a specific exposure time (or integration time with respect to
the process of cross-correlation) and independent of the measured range. Figure 2.5
gives an overview of the error for some PMD-type range cameras.
The simplistic models discussed so far cannot explain this dependency. A probable explanation could be that the electronics that control the phase shifting of the reference signal are somehow correlated with the integration time circuit. We are not fully convinced that the error explicitly (and exclusively) depends on the integration time. [Rap07] argues that the error is not related to the amplitude, as otherwise it would change with depth (while the measurements show that it is constant). However, we observed a similar offset that stays roughly constant with increasing depth and only depends on the reflectivity of the observed surface
(see section 5.2.3). Therefore exposure time, amplitude A and also offset G0 (the latter two depending on the reflectivity) are candidates for being the source of the error. The author suspects some kind of bias in the phase-calculation formula that shows up if the idealized model assumptions are not met.

Figure 2.5: Range error against integration time (at r = 2.5 m) for (a) PMD 19k, (b) SR-3000, (c) IFM O3D [Rap07]
An exposure-time-dependent error is of special interest for cameras that use adaptive or multiple exposure times (like the IFM O3D) to increase the effective measurable range. Then an offset correction specific to every exposure time needs to be applied. For details regarding the experimental investigation we refer to [Rap07]. One may subsume this error under the fixed pattern error if the latter is extended for exposure dependence.
2.2.2.2 Random Errors
The essential noise sources for PMD-sensors are the same as for conventional CCD- or CMOS-image sensors. These are electronic and photon shot noise, thermal noise, reset noise, 1/f noise and quantization noise (for details see [JGH99]). For a detailed analysis of quantization noise (in the context of a TOF-camera system), which is of some importance in weak illumination environments, we refer to [Fra07]. Except for photon shot noise, all of these random errors can be significantly reduced or eliminated by respective signal processing or hardware related techniques (like cooling) [LS01].
Therefore, the influence of shot noise on the range measurement resolution shall be
analyzed.
Shot noise is a fundamental property of the quantum nature of light and arises from
statistical fluctuations in the number of photons emitted from a light source. The
same is true for the generation process of electron-hole pairs, which is discrete as
well. Shot noise is unavoidable and always present in imaging systems. In terms of
signal-to-noise ratio, the best a detector can do is to approach the shot noise limit.
The pseudo signal X produced by shot noise can be described by Poisson statistics, for which Var(X) = ⟨X⟩ = rate of charge carrier generation holds.
From the basic law of error propagation we know that the uncertainty in the phase
calculation (2.11) is given by:
"
#
N
−1 X
∂ϕ 2
Var(ϕ) =
Var(In ) .
(2.28)
∂In
n=0
Assuming In (measured in units of electrons) to be a Poisson distributed random
variable, Var(In ) = In applies. Using the 4 sample algorithm (2.12) and (2.10c) we
find the variance to be:
Var(ϕ) = Σ_{n=0}^{3} (∂ϕ/∂In)² In = (π²/4) · G0/(mT A²) ∼ (π²/4) · G0/A²   (2.29)
We dropped the integration time mT (assumed to be constant) in the proportionality. For the amplitude and offset calculation we determine by analogous reasoning:
Var(A) ∼ (π²/4) G0,   Var(G0) ∼ (1/2) G0
If we drop the assumption of an optimal demodulation contrast of 2/π and express
equation (2.29) using modulation contrast and demodulation contrast and G0 by
DC + B, B being the background illumination and DC the DC-component of the
modulated illumination, we find:
Var(ϕ) ∼ (DC + B) / (Cmod Cdemod DC)²
The additional noise sources, 1/f -, reset- and thermal noise, may be summarized
as dark noise and modeled by an additional number of electrons D that contribute
exclusively to the constant background illumination as this noise does not correlate
with the modulation. The standard deviation of the range measurement error is then
given by
σϕ ∼ √(DC + B + D) / (Cmod Cdemod DC)   (2.30)
Figure 2.6 shows qualitatively how the uncertainty in the range measurement depends on various parameters. Only the denoted parameter was changed while the others were kept constant: mild background illumination and dark noise equivalent to 30000 e generated during the exposure time, an electrooptical signal of 2·10⁵ e, a modulation contrast of 90% and a demodulation contrast of 50%. For the range curve the inverse square law of irradiation was applied and 10⁵ e were assumed to be integrated at a distance of 1 m. These values are in magnitude those of a realistic setup (see [Lan00, chap. 4.2]). The range curve was cut at 50 cm, because the pixel would go into saturation below this limit (and the linear model employed for demodulation would no longer be valid).

Figure 2.6: σϕ given in percent of the unambiguity range in dependence of optical power (DC), modulation contrast, range and background illumination
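The qualitative behavior shown in figure 2.6 follows directly from equation (2.30). The following minimal sketch evaluates the relative uncertainty for parameter values mirroring those quoted in the text; the numbers are illustrative only and the result is valid up to a constant factor:

```python
import numpy as np

def sigma_phi(DC, B_plus_D, C_mod=0.9, C_demod=0.5):
    """Relative range uncertainty according to eq. (2.30), up to a constant."""
    return np.sqrt(DC + B_plus_D) / (C_mod * C_demod * DC)

# baseline: 2e5 electrons signal, 3e4 electrons background + dark noise
print(sigma_phi(DC=2e5, B_plus_D=3e4))

# range dependence via the inverse square law: 1e5 electrons at 1 m
r = np.linspace(0.5, 7.5, 100)
DC_r = 1e5 / r**2
sigma_r = sigma_phi(DC_r, B_plus_D=3e4)   # grows roughly with r^2 for larger r
```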
We find that the range dependence has the most prominent impact on the measurement accuracy. Due to the limited capacitance of a PMD-pixel it is not possible to achieve a decent range accuracy over the entire unambiguity range at constant exposure time (or optical power) and constant reflectivity. Multiple exposure times or adaptive illumination are needed to achieve a good accuracy.
Once image acquisition is completed, we can improve the range measurement with respect to noise by making assumptions about the smoothness of the range data in a local neighborhood; we may apply simple linear smoothing filters, which will however blur image features at edges. Using more advanced (nonlinear and/or robust) filters, we are able to preserve edges in range imagery both at intensity and range edges. It is important to note in this context that we can use equation (2.29) to determine an uncertainty (or confidence) measure from the measurements A and G0, which appropriate filters can take advantage of to improve their denoising performance (see section 3.1.5 and section 3.2).
Chapter 3
Image Processing and Filters
3.1 Basics
3.1.1 Discretization and Sampling
Dealing with PMD-imagery and motion estimation involves handling several kinds of discretization: discretization in space due to the sensor grid, discretization in time, as we can only take a finite number of image frames during a specific time slice, and discretization in the range of the image data, i.e. quantization at sensor level.
Discretization is closely related to the term sampling, which denotes the reduction
of a continuous signal to a discrete signal. And a sample refers to a value or set
of values at a point in time and/or space. For signal and image processing the
Nyquist–Shannon sampling theorem is of utmost importance. It states that if a
function f(t) contains no frequencies higher than or equal to ν, then it is completely determined by giving its ordinates at a series of points tn spaced 1/(2ν) apart.
For the different kinds of discretization different models apply, or rather the continuous physical models, which are the basis for the interpretation of the data, need to
be transformed into their discrete counterparts.
Derivatives and Gradient For example, in the context of motion estimation, derivatives of signals are of major importance, because they are basic to compound operators like the gradient, which is involved in estimating orientation (or, equivalently, velocity) in an image sequence. Unfortunately, derivatives are defined for continuous functions only, not for discrete signals.
One might try to approximate the derivative operator by a discrete counterpart, e.g. a finite difference. Finite differences are successfully applied in finite difference methods for solving (partial or ordinary) differential equations on a digital computer (on an analog computer things are treated quite differently). Fornberg [For98] gives a very compact, analytic solution for calculating derivatives of any order, approximated to any level of accuracy on equispaced grids. In the context of finite difference methods, the consistency of an operator is of major importance, i.e. that the discrete approximation converges towards the continuous operator for vanishing difference h between grid points. Well known examples of these approximations are the forward, backward and central difference quotients.
However, for image processing these operators, though applicable, are not the first choice. The basic reason is that typically the grid of the data samples is fixed, while it is adaptable with respect to the solution of a differential equation. So even though consistency of the operator is given, it is not necessarily a good measure for the performance of the operator with respect to image processing.
For image processing other features are of practical relevance for various reasons. For instance, the separability of an operator is relevant w.r.t. its speed, i.e. the number of necessary floating point operations. The isotropy of the gradient (or rather the
viewpoint invariance of its result in the sense of a tensor) is important to an unbiased
orientation estimate. Isotropy w.r.t. a vectorial quantity like the gradient has two
aspects: isotropy in magnitude and in direction. For motion estimation the isotropy
in direction is of major importance as the flow field can be calculated from the
spatiotemporal orientation of structures in an image sequence.
Given a plane wave

w(k) = A exp(ı(kx + θ)) = A exp(ı([m cos(φ), m sin(φ)]ᵀ · x + θ)) =: w(m, φ)   (3.1)

and the gradient ∇x = [∂x1 ∂x2]ᵀ in two dimensions, the anisotropy in magnitude of the gradient w.r.t. the wavenumber k may be described by

Δ(m) := |∇x w(m, φ)| − |∇x w(m, 0)|   (3.2)
and the anisotropy in direction by

Δ(φ) := arctan( ∂x2 w(m, φ) / ∂x1 w(m, φ) ) − φ   (3.3)
which are zero for the continuous gradient applied at coordinates of equal phase, e.g.
x = 0.
An approach that allows a detailed analysis of an operator w.r.t. anisotropy and other features is to interpolate a continuous signal from the discrete samples, perform the continuous operation and sample the result at the grid points. As we will see, these tasks can be realized by means of digital filters without leaving the discrete domain. However, for the analysis of digital filters we need to change from the spatial domain to the frequency domain, or rather project the spatial data into the Hilbert space of plane waves (3.1), which corresponds to the Fourier transformation that we introduce in the following.
3.1.2 Fourier Transform
A function g : Rⁿ ↦ C is called Fourier transformable if the Cauchy principal value

ĝ(k) := F(g(x)) := (1/(2π)ⁿ) ∫_{−∞}^{∞} exp(−ıkx) g(x) dⁿx   (3.4)

exists for all k. ĝ(k) is called the (multidimensional, forward) Fourier transform (FT) of g (adapted from [MV06]). The inverse Fourier transform of ĝ : Rⁿ ↦ C is

F⁻¹(ĝ(k)) := ∫_{−∞}^{∞} exp(ıkx) ĝ(k) dⁿk   ( = g(x) )   (3.5)

if the integral exists as a Cauchy principal value. g(x) and ĝ(k) are called a Fourier transform pair, denoted abridged as g(x) ∘—• ĝ(k). Due to the finite energy and range and the continuity of physical processes the Cauchy principal value always exists in the context of image processing.
There are several other common and equivalent definitions of the Fourier transform, which differ in the kind of frequency domain onto which the spatial domain is mapped and how the factor (1/2π)ⁿ is distributed among the (forward) Fourier transform and its inverse. We have chosen the frequency domain to be the vectorial wavenumber k := 2π/λ. One needs to be very careful about which definition was used when looking up formulas in this context in textbooks, and how these shall be applied in one's own calculations, especially because various textbooks show inconsistencies with their own definitions throughout the text.
Features of the Fourier Transform Some important features of the (multidimensional) Fourier transform that are used throughout the thesis are depicted here via their respective Fourier transform pairs:
Linearity:      a g(x) + b h(x)  ∘—•  a ĝ(k) + b ĥ(k)   (3.6)
Separability:   ∏_{p=1}^{n} f(x_p)  ∘—•  ∏_{p=1}^{n} f̂(k_p),   where f : R ↦ C   (3.7)
Similarity:     g(s x)  ∘—•  ĝ(k/s)/|s|,   g(A x)  ∘—•  ĝ((Aᵀ)⁻¹ k)/|A|   (3.8)
Rotation:       g(U x)  ∘—•  ĝ(U k),   U is unitary   (3.9)
Convolution:    (g ∗ h)(x)  ∘—•  (2π)ⁿ (ĝ · ĥ)(k)   (3.10)
Multiplication: (g · h)(x)  ∘—•  (2π)ⁿ (ĝ ∗ ĥ)(k)   (3.11)
Translation:    g(x − x₀)  ∘—•  ĝ(k) exp(−ıkx₀),   g(x) exp(ık₀x)  ∘—•  ĝ(k − k₀)   (3.12)
Derivatives:    ∂_{x_p} g(x)  ∘—•  ık_p ĝ(k)   (3.13)
δ-impulse:      δ(x)  ∘—•  (1/2π)ⁿ   (3.14)
δ-comb:         Σ_m δ(x − mΔx)  ∘—•  (2π/Δx)ⁿ Σ_n δ(k − n·2π/Δx)   (3.15)
Gaussian:       exp(−xᵀx/(2σ²))  ∘—•  (σ/√(2π))ⁿ exp(−σ²kᵀk/2)   (3.16)
Box:            ∏_{p=1}^{n} H(w − |x_p|)  ∘—•  ∏_{p=1}^{n} sin(w k_p)/(π k_p) = ∏_{p=1}^{n} (w/π) sinc((w/π) k_p),   where sinc(x) := sin(πx)/(πx)   (3.17)
The features given also embed those of the discrete Fourier transform (DFT), which is typically implemented on computers via a Fast Fourier Transform (FFT). The DFT
is the FT of a function with a finite extension and a finite bandwidth (FEF). Finite extension functions can always be extended to a periodic function by concatenating it to infinity with the range-values of its domain (we may call this process periodization). According to the Nyquist–Shannon sampling theorem such a function may be represented without loss of information by a set of samples of this function, if these are taken as ideal samples (i.e. via convolution with a Dirac delta function) and the sampling frequency is larger than two times the highest frequency in the function-signal (i.e. the one-sided baseband bandwidth B): freqsamp > 2B. Vice versa, a signal that is ideally sampled with a sampling distance Δx must not contain any frequency equal to or above the Nyquist frequency kny, if the sampling shall neither lose information nor introduce artifacts (called Moiré patterns for 2D-imagery or, in general, aliasing):
kmax < kny = π/Δx = π freqsamp.   (3.18)
We denote the wavenumber which is scaled to the Nyquist frequency as k̃:
k̃ := k/kny = kΔx/π = 2Δx/λ   ⟺   k = ±kny ≡ k̃ = ±1.   (3.19)
The DFT corresponds to the FT applied to a FEF function that is periodized (i.e. convolved with a delta-comb with a spacing as large as the domain of the function) and sampled (multiplied with a delta-comb of spacing smaller than half of the smallest wavelength of the periodized signal). In the Fourier domain this corresponds to a sampling of the frequency space (multiplication with a delta-comb of spacing inverse to the extension of the function's domain) and a periodization of it with a period of freqsamp, due to equations (3.10) and (3.15).
3.1.3 Interpolation
If we want to interpolate a sampled FEF function (e.g. an image), we may do this by reversing the effect of sampling in the Fourier domain, which is the periodization; we can achieve this by multiplication with a box-function of half width w = π freqsamp. In the spatial domain this corresponds to a convolution with a scaled sinc function (3.17). While theoretically the sinc function is the ideal interpolation function, from a practical perspective it does not help too much. First of all the support of sinc is unbounded (i.e. it has no compact support) and the function decreases only linearly
with its variable, which makes it a poor candidate to be used in a numerical convolution, even if small errors are acceptable. Furthermore it is not direction isotropic
(or rotational-invariant), e.g. in a multidimensional image the interpolation result
depends on the orientation of structures in the image.
Another candidate for interpolation are properly scaled Gaussian functions: What
we need to do for a proper interpolation, is to set the signal samples in Fourier space
to zero outside the signal’s original baseband-bandwidth. Or in analogy to solid-state
physics: All but the first Brillouin zone have to be zero. Additionally for a lossless
reconstruction the signal must not be suppressed within the first Brillouin zone. A
properly scaled Gaussian multiplied with the Fourier transform approximates those
requirements sufficiently well, and corresponds to a convolution with a Gaussian of
inverse width (3.16). Moreover the multidimensional Gaussian is the only function
that is both separable and rotation-invariant.
Interpolation using Gaussian functions, while not ideal, still works well for digital
imagery that complies to the sampling theorem, because real world image data typically has a low signal to noise ratio for wavenumbers near to the Nyquist frequency.
So even though the Gaussian becomes approximately zero close to the Nyquist frequency, it tends to suppress more noise than signal. Furthermore the sampling of a
digital camera is not ideal but influenced by the MTF (see section 3.1.4) of the optics
involved and typically acts as a low pass filter which narrows the effective bandwidth
of the signal additionally. Thus the missing flank of the Gaussian (compared to the box-function) is less problematic.
Interpolation of a continuous function gc from sampled data g(xn ) on a grid xn is
realized via a discrete convolution with a continuous function h(x):
gc(x) = Σ_n g(xn) h(x − xn).   (3.20)

The interpolation function h(x) needs to fulfill the interpolation condition:

h(x) = { 1 for x = 0;   0 for x = xm − xn where m ≠ n }.   (3.21)
Applying a partial derivative along the grid dimensions xp yields, due to the linearity of the derivative:

g'c(x) := ∂xp gc(x) = ∂xp Σ_n g(xn) h(x − xn) = Σ_n g(xn) ∂xp h(x − xn) = Σ_n g(xn) h'(x − xn)

Resampling at the original grid positions xm equals:

g'(xm) := g'c(xm) = Σ_n g(xn) h'(xm − xn)   (3.22)
So for calculating the derivatives at the original grid points we only need to do a
discrete convolution of the signal with a sampled version of the (partial) derivative
of the interpolation function. If the interpolation function is chosen to be a Gaussian, h0 (xm − xn ) may be approximated by a filter mask H with a finite number of
filter coefficients. It is only an approximation, because h0 has no compact support,
and we need to truncate it at its ends. But as h0 decreases rapidly with increasing
|xm − xn |, the error introduced is only a small one. The filter or convolution mask
H is completely independent of the signal and the position that it is applied on.
The operation of applying such a mask via convolution belongs to the class of linear shift-invariant filters, which we introduce more formally in the following section. For the sake of simplicity we will stick to a formulation for 2D scalar images; however, things may be generalized to multidimensional, spatiotemporal (image-)data or tensor-valued data; see [Big06, chap. 3.6].
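The following is a minimal sketch of this idea, sampling the derivative of a Gaussian interpolation function and applying it as a discrete convolution mask (cf. eq. 3.22). The kernel radius and σ are illustrative choices; these are plain Derivatives of Gaussian, not the optimized filters used later in the thesis:

```python
import numpy as np
from scipy.ndimage import convolve1d

def derivative_of_gaussian_mask(sigma=1.0, radius=4):
    """Sampled h'(x) for a Gaussian interpolation function (cf. eq. 3.22)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    dg = -x / sigma**2 * g
    # normalize so that the response to a unit ramp is exactly 1
    return dg / np.abs(np.sum(x * dg))

def dx(image, sigma=1.0):
    """Derivative along axis 1 (x) by discrete convolution with h'."""
    mask = derivative_of_gaussian_mask(sigma)
    return convolve1d(image, mask, axis=1, mode='mirror')

# usage: gx = dx(range_image); the derivative along axis 0 is analogous
```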
3.1.4 Convolution, Point Spread Function and Transfer Function
Filtering in the context of image processing is realized for the class of linear shift-invariant (LSI) filters as convolution. Convolution of a 2-dimensional image G of size M × N with a square convolution mask H of (2R + 1)² elements hmn is given by
g'mn = Σ_{m'=−R}^{R} Σ_{n'=−R}^{R} hm'n' gm−m',n−n' =: [H∗G]mn
The filter is by definition linear and shift invariant, as it has the properties of a
linear operator and does not depend on the position (m, n) at which it is applied.
The point spread function (PSF) is defined as the filter-response on a point image P
(pm,n = {1 for m = n = 0, 0 otherwise}) and is identical to the convolution mask H
PSFmn := Σ_{m'=−R}^{R} Σ_{n'=−R}^{R} hm'n' pm−m',n−n' = hmn = [H∗P]mn   (3.23)
It fully describes a LSI filter, as its response to an arbitrary image is just a linear
combination of shifted PSFs, with the coefficients being the pixel values of the image.
The optical transfer function (OTF) is defined as the Fourier transform of the PSF.∗
It is the wavelength dependent multiplication factor of a LSI filter in the Fourier
domain. This is easy to see, regarding the convolution theorem (3.10) that states,
that convolution in the spatial domain corresponds to a multiplication in the Fourier
domain (and vice versa). The discrete delta peak image P of equation (3.23) transforms due to equation (3.14) to a constant value for all Fourier domain pixels. Thus,
the filter is, not surprisingly, described completely by Ĥ. The magnitude of the optical transfer function is called the modulation transfer function (MTF) and describes
the attenuation of the sinusoidal waveforms as a function of their spatial frequency.
OTF := F(PSF) = F(H∗P) = Ĥ · P̂ = Ĥ · const.   (3.24)
MTFmn := |F(PSF)mn| = |Ĥmn| · const.   (3.25)
Eventually, all a LSI filter does, is to attenuate sinusoidal waves and to translate their
positions (where the translation vector is encoded in the argument of the respective
complex Fourier transform entry Ĥ mn ). The (continuous) transfer function ĥ of a
LSI-filter H on an orthogonal grid is
T
R
R
X
X
n
ĥ(k̃) =
hmn exp(−π ı
· k̃)
(3.26)
m
m=−R n=−R
From Euler’s formula exp(ıx) = cos(x) + ı sin(x) we derive for filter masks of even
symmetry
ĥ(k̃) = h0 +
R
X
2 hn cos(π n k̃),
(3.27)
n=1
and for odd filter masks
ĥ(k̃) = ı
R
X
2 hn sin(π n k̃).
(3.28)
n=1
∗ The term transfer function has a more general definition, but is sometimes used synonymously with OTF.
Due to the separability of the Fourier transform (3.7), the transfer function of a
multidimensional, separable filter - composed by convolving one dimensional filters
(of specific symmetry) - is just the product of the individual filters.
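Equations (3.27) and (3.28) make it easy to inspect a one-dimensional mask in the Fourier domain. The following small sketch evaluates them for two standard textbook masks (a binomial smoother and the central difference quotient); the coefficients are illustrative and not the optimized filters discussed below:

```python
import numpy as np

def transfer_even(h, k_tilde):
    """Eq. (3.27): transfer function of an even mask, h = [h_0, h_1, ..., h_R]."""
    return h[0] + sum(2 * hn * np.cos(np.pi * n * k_tilde)
                      for n, hn in enumerate(h[1:], start=1))

def transfer_odd(h, k_tilde):
    """Eq. (3.28): imaginary part of the transfer function of an odd mask,
    h = [h_1, ..., h_R] (full mask is [-h_R, ..., -h_1, 0, h_1, ..., h_R])."""
    return sum(2 * hn * np.sin(np.pi * n * k_tilde)
               for n, hn in enumerate(h, start=1))

k = np.linspace(0, 1, 101)                   # scaled wavenumber k~ in [0, 1]
smoothing = transfer_even([0.5, 0.25], k)    # binomial mask [1 2 1]/4
derivative = transfer_odd([0.5], k)          # central difference [-1 0 1]/2
ideal = np.pi * k                            # small-wavenumber limit of the derivative response
# the deviation between `derivative` and `ideal` grows toward k~ = 1
```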
Filter Design and Optimization Now that we have introduced LSI filtering by means of convolution, we come back to discrete operators. We have shown in section 3.1.3 that interpolating the discrete data, taking the derivative and resampling on the original grid can be done approximately by applying a single discrete filter by means of convolution. If we do as proposed we arrive at the filter family of so-called Derivatives of Gaussian. The anisotropy of a gradient operator composed of these filters is, w.r.t. its magnitude (3.2), definitely lower than that of the central difference quotient, but identical w.r.t. the direction estimation (3.3). A very well known derivative filter is the Sobel operator. Compared to the previously mentioned filters its anisotropy is more than a factor of 2 lower for the angle estimate and also smaller w.r.t. the magnitude of the gradient; for details see [JH00, chap. 9.7]. This is achieved by introducing an asymmetry in the width of the interpolating Gaussians in the direction of the derivative and normal to it. But the maximum angle anisotropy is still around 20° and is independent of the filter size (which improves the magnitude anisotropy only).
To further reduce anisotropy one can treat the filter design as an optimization problem. This means that we look for a filter that differs from the ideal (continuous) filter as little as possible under given constraints, arising from the discrete and finite extension of the applicable filter mask. The measure for the difference between the ideal and the optimized filter, and how the ideal filter actually should look, is based on the problem and its specific requirements. E.g. a discrete derivative filter can, due to its finite extension, never have a transfer function that is both ideal w.r.t. equation (3.13), such that ĥ(k̃) = ık̃ for k̃ within ]−1,1[, and zero outside, as a discontinuity in Fourier space would require an infinite extension of the filter. Therefore, it is a matter of design in which frequency band the so-called reference function should best approximate the ideal, and which of the desired features (like isotropy) may be violated at which cost within the optimization. A suitable optimization strategy then returns the filter coefficients minimizing the cost or error between the ansatz function (basically equation (3.26)) and the reference function, in compliance with the given constraints. For details we refer to [JSK99]; in closing we would like to point out that the derivative filters used in the context of motion estimation for this thesis were optimized w.r.t. a maximum precision in orientation estimation as described by Scharr [Sch00].
3.1.5 Normalized Averaging
The PMD-data we are dealing with is affected by errors, statistical and systematic
ones. Here we show a simple method to improve the range data with respect to the
statistical errors.
As we have seen in section 2.2.2.2, the amplitude and offset of the PMD signal give us a measure for the reliability of the range measurement. With equation (2.29) we find the variance of the range signal to be proportional to G0/A². However, most
of the PMD-camera models we know do not give direct access to the offset G0 in
their standard configuration. The PMD19k for example returns besides range R and
amplitude A also a third channel denoted as intensity I. But this intensity is not
the DC-offset of the electrooptical signal but the amplitude signal weighted by the
distance in some (unknown) manner.†
Only if we have access to all raw channels can we calculate G0 via equation (2.12). If this is not possible, one might approximate G0 as proportional to A if we assume no background illumination. Then the variance in the range measurement is approximated by var(R) ∼ A⁻¹.
If we want to denoise the data correctly by averaging over a specific neighborhood, we
know from elementary statistics that appropriate averaging requires the weighting of
each data value with the inverse of the variance, i.e. using the upper approximation
we just need to multiply by A.
As it is well known that box filters do not have very good properties from a signal
processing point of view (due to their infinite and slowly decreasing transfer function),
the neighborhood itself needs to be weighted too. So we need to incorporate another
set of weights in the averaging procedure by using a filter such as a Binomial. Both
weightings can be achieved with a technique that is known as normalized averaging
[GK95].
† The relative error of I is even higher than that of A and R: suppose I = A·R^b, then error propagation leads to σI/Ī = √((σA/Ā)² + (b σR/R̄)²).
Normalized averaging is a special case of a more general filtering technique called
normalized convolution that is described in detail by [KW93; Wes94; Far03]. The
employed filter (e.g. a Binomial) is called the applicability B. If the measurement
data are denoted with R and the weighting image with W (e.g. the amplitude
image A) normalized averaging reads:
R' = B ∗ (W · R) / (B ∗ W).   (3.29)
The weighting image is not necessarily associated with an error. It can be used to
exclude or amplify pixels with certain features. In this way, normalized averaging
becomes a versatile operator that was used for various tasks in the context of this thesis.
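A minimal sketch of normalized averaging (eq. 3.29) with a binomial applicability and the amplitude image as weights is given below; array names, the mask order and the small regularization constant are illustrative choices:

```python
import numpy as np
from scipy.ndimage import convolve

def binomial2d(order=4):
    """2D binomial applicability mask, e.g. order=4 -> outer([1 4 6 4 1]/16)."""
    k = np.array([1.0])
    for _ in range(order):
        k = np.convolve(k, [0.5, 0.5])
    return np.outer(k, k)

def normalized_averaging(range_img, weight_img, order=4, eps=1e-12):
    """R' = B*(W.R) / B*W (eq. 3.29); weight_img could be the amplitude A."""
    B = binomial2d(order)
    num = convolve(weight_img * range_img, B, mode='mirror')
    den = convolve(weight_img, B, mode='mirror')
    return num / (den + eps)

# usage: R_denoised = normalized_averaging(R, A)
```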
However, it should be noted that applying normalized averaging as described leads to a bias toward smaller range values at depth-edges, as the confidence measure (or weighting image) is correlated with the (physical) quantity to denoise, i.e. the amplitude decreases with increasing depth: In the neighborhood of a depth edge, surface patches of the same reflectivity near to the camera (denoted in the following as near surfaces) will be weighted more strongly than those far away. This leads to an anisotropic, biased blurring of image features, such that the near surfaces tend to grow while those farther away shrink. So normalized averaging should not be applied at surface edges.
Band Enlarging Operators Normalized averaging is a potentially band enlarging
operation, because it involves multiplication of two images W · R, which corresponds
to a convolution in the wavenumber domain (3.11). If the sum of the bandwidths
of Ŵ and R̂ is larger than k̃ = 1 in any dimension, aliasing occurs. Thus, it is
important to adapt the bandwidth of the images w.r.t. the Nyquist wavenumber
before multiplying the images, either by upsampling the images, which is a lossless
but somewhat expensive operation (w.r.t. processing time and memory consumption)
or by pre-smoothing with e.g. a binomial filter, which is fast but potentially lossy.
The same rules apply for operations where rotations (3.9) are involved, as a FT-image
that is rotated exceeds the Nyquist borders - the corner areas lie outside the first
Brillouin zone - and if the corresponding Fourier coefficients are not zero, aliasing
will occur.
Köthe [Köt03] points out that the influence of band enlarging operators has frequently been neglected in the computer vision literature in conjunction with more complex operators like e.g. the Canny edge detector and the structure tensor. With modern high resolution image sensors of several million pixels, however, the aspect is of less importance, because typically the camera's optics act as a low-pass filter with respect to the sensor's resolution, especially in the field of consumer market products. This is not the case, however, for current PMD-sensors, as they still have a low sensor resolution compared to the resolution of the optics.
3.2 Edge Preserving Smoothing
The methods to denoise range imagery discussed so far all lack the ability to denoise or smooth the data without blurring image features like edges or corners. This is due to the fact that the models they are based on assume a planar neighborhood, or at least one with a very specific symmetry, and thus are violated around the mentioned features. There are basically two ways to handle this problem. One is to extend the model to explain the data better in a specific neighborhood. The other is to improve the estimate of the model parameters by means of a robust estimator, i.e. one that gives a correct estimate in the presence of a minority of data points that do not fit the model, so-called outliers. Both approaches may be combined and the transitions between them are smooth. Applying robust methods of statistics to the field of computer vision is not trivial. Take for example the simple case of a corner of a cube seen from atop with a range camera: In the vicinity of the corner there are 3 planes, thus the majority of the pixels in a neighborhood of any pixel near the corner will violate a single planar model, and therefore they cannot be treated as classical outliers w.r.t. this model.
3.2.1 Robust Estimators
Robust estimation is concerned with the accurate estimation of model parameters in the presence of data that violates the model for which the parameters shall be determined and/or the assumptions about the errors the measurements show (i.e. the employed noise model): the data may contain classical, gross outliers that are not consistent with an assumed data model exposed to e.g. Gaussian noise. For a low-level model of PMD-data this might stem from specular reflections of the modulated illumination, leading to saturation of the capacities and in turn to a completely arbitrary depth measurement. Defective pixels or interreflection of light from multiple surfaces are other sources of outliers. Another class of outliers consists of pixels belonging to a minority of the data, a population that is compatible with a different‡, potentially unknown data model; e.g. in the case of a planar surface model every step- or roof-edge of an object or partial occlusion will give rise to these kinds of outliers. With respect to image processing the same pixel can be either an outlier or an inlier, depending on the position of the model to be estimated.
As each pixel measurement is subject to small-scale random variations, the parameter estimation problem is heavily overconstrained (for both low-level and high-level
models), which suggests that a maximum likelihood estimation technique should be
employed to solve the problem. Under the assumption of normal (Gaussian) distributed, additive noise, least squares estimation is a maximum likelihood estimator
[LP02, chap. 20.2.6], i.e. the probability for the observed measurements is maximal
for the estimated parameters.
Let yi be the measurements at the independent (or control or explanatory) variables
xi , e.g. the sensor grid coordinates, of the model m for which the (vector of) parameters p are to be estimated, e.g. the surface normal and intercept of a planar surface
model. Then the ordinary least squares (OLS) estimate p̂ is given as
p̂ = argmin
p
= argmin
p
X yi − m(xi , p) 2
i
σi
X r(yi , xi , p) 2
i
σi
,
leading to the more general formulation known as M-estimator :
X ri,p p̂ = argmin
ρ
.
σi
p
(3.30)
(3.31)
(3.32)
i
The expression to be minimized is called the objective function, and ρ(r) is known as
the loss function (or error norm), which is ρ(r) = r2 for the least squares estimate.
The residual function r describes the (error) distance between a measurement and
the model determined by p (and xi ). The residuals need to be normalized to the
scale (noise level) σi associated with the measurements yi ; in the simplest case the
measurements are i.i.d. thus σi = σ̂ ∀ i, which in turn can be neglected for least
squares estimation.
‡ different means either an instance of the same model with different parameters or a completely different model
An advantage of the least squares estimation problem (3.30) is, that it can be solved
efficiently for models m, which are linear in their parameters p (but possibly nonlinear w.r.t. the independent variables), by means of the LSI-filters introduced in the
previous section (for details see [JHG99]). However, most real world problems cannot
be described sufficiently by a single model under a Gaussian noise assumption, and because the least squares loss function grows unboundedly with increasing |r|, a single outlier can seriously corrupt the estimation; this is why the breakdown point of least
squares is 0. The breakdown point is the minimum fraction of outlying data that can
cause an estimate to diverge arbitrarily far from the sought value. The theoretical
maximum breakdown point of any ”general purpose” estimator is 0.5, because with
more than 50% outliers, these can be arranged in a way that, in terms of regression
analysis, a fit through them will minimize the objective function.
The breakdown point of an estimator does not say anything about its efficiency,
which is defined as the minimum possible variance for an (unbiased) estimator divided
by its actual variance, with the minimum possible variance being determined by a
target distribution (e.g. a Gaussian one). Typically robust estimators with a high
breakdown point tend to have a low efficiency, thus the estimates have a high variance and require a large number of measurements to gain a reasonable (statistical) precision.
The least squares estimator belongs to the class of M-estimators (”M” for ”maximum
likelihood type” [Hub81, page 43]). These are of the form (3.32), with ρ(r) being
a function of even symmetry (ρ(r) = ρ(−r)) with a unique minimum at zero and
monotonically increasing for r > 0. The robustness against outliers is achieved by
a loss function that grows subquadratically. This becomes clearer if we look at the
solution of equation (3.32) which is determined by the root of its derivative w.r.t.
p. If ∇p denotes the vector of partial derivative operators (∂/∂pn ) of the n = 1 . . . N
parameters of m, then we find a system of N equations for the vanishing derivatives
at the minimum of the objective function:
∇p Σ_i ρ(ri,p/σi) = Σ_i (1/σi) ψ(ri,p/σi) ∇p ri,p = 0,   where ψ(r) = ∂r ρ(r)   (3.33)

For a model m linear in its parameters p, ∇p ri,p simplifies to xi:

Σ_i ψ(ri,p/σi) (xi/σi) = 0   (3.34)

and for the simplest model m = 1·p it is

Σ_i ψ(ri,p/σi) (1/σi) = 0.   (3.35)
If the model m is not linear in x then xi in equation (3.34) may denote a vectorial
function dependent on the explanatory variables only. The derivative of the loss
function is known as the influence function ψ. The name is reasonable as it is plain
to see from system (3.34) that ψ directly determines the influence of a single residual
on each constraint equation to become zero. For least squares the influence function
is identical to the normalized residual and thus, its absolute value can become arbitrarily large. Robust estimators use an influence function that is bounded above and
below and which may become zero for large residuals. Influence functions tending to
zero most quickly (known as hard redescenders) permit the most aggressive rejection
of outliers. This feature is of major importance if the outliers have small residuals
in the range of 4 to 10 σ [Ste99]. Redescending influence functions, however, make Σ_i ρ(ri,p/σi) nonconvex, such that solvers for equation (3.33) may converge to local minima if the initial guess is not close to the optimum.
Iteratively reweighted least squares (IRLS) is such a solver, which is deduced from equation (3.33) by substituting ψ with w(r)·r, where w(r) = ψ(r)/r is known as the weight function:

Σ_i (1/σi²) w(ri,p/σi) ri,p ∇p ri,p = 0   (3.36)
This can be iteratively solved by means of common weighted least squares solvers (e.g. SVD or Gaussian elimination for m linear in p, and Gauss–Newton or Levenberg–Marquardt for a non-linear model), if for each iteration the weights w(r) are calculated for the current guess of p and then held fixed for the least squares solver; a least squares solver is applicable because the term ri,p ∇p ri,p in (3.36) is just the derivative of the least squares problem (3.31), while the other terms are kept constant.
Black and Rangarajan [BR96] give a survey of the various influence/loss functions proposed in the statistics and computer-vision literature in the light of related outlier processes; one example of a redescending influence function is the Leclerc function, depicted in Figure 3.1:
ρη(r) := 1 − exp(−r²/η²),   ψη(r) := ∂r ρη(r) = (2r/η²) exp(−r²/η²)   (3.37)

and wη(r) := ψη(r)/r = (2/η²) exp(−r²/η²)
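For the simplest model of a locally constant value (eq. 3.35), IRLS with the Leclerc weight function reduces to iterated weighted averaging. The following is a small sketch under the assumption of i.i.d. measurements with known scale σ; all parameter values are illustrative:

```python
import numpy as np

def leclerc_weight(r, eta):
    """w_eta(r) of eq. (3.37)."""
    return 2.0 / eta**2 * np.exp(-r**2 / eta**2)

def irls_constant(y, sigma=1.0, eta=np.sqrt(2.0), n_iter=20):
    """Robust estimate of a constant value from samples y via IRLS (eq. 3.36)."""
    p = np.median(y)                       # robust initial guess
    for _ in range(n_iter):
        r = (y - p) / sigma                # normalized residuals
        w = leclerc_weight(r, eta)
        p = np.sum(w * y) / np.sum(w)      # weighted least squares update
    return p

# example: two populations plus gross outliers
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(10, 1, 80), rng.normal(20, 1, 15), [250, 300]])
print(irls_constant(y))                    # close to 10, unlike y.mean()
```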
A second look at equation (3.34) tells us that standard M-estimators still have a
breakdown point of zero, because an erroneous measurement at a point xi , which is
Figure 3.1: Robust and nonrobust M-estimators: (a) the robust Leclerc functions (loss ρ, influence ψ, weight w) for η² = 2 and (b) the respective functions for least squares (ψ(r) := r)
far away from the bulk of the data may still corrupt the whole measurement. An
alternative to M-estimators is the least median of squares (LMS) estimator, which
has the maximum possible breakdown point of 0.5; compared to least squares the
objective function is not the sum, but the median of the squared residuals:
p̂ = argmin_p median_i ( ri,p / σi )²   (3.38)
For a simple linear regression model the LMS solution corresponds to the "narrowest strip covering half of the observations" [RL87]. LMS buys its excellent robustness against outliers at the cost of a less efficient (random sampling) search technique, because the median is not differentiable and thus gradient descent or Newton's method are not applicable; moreover, LMS has an abnormally slow convergence rate [RL87]. A robust estimator between OLS and LMS is least trimmed squares (LTS), which, like OLS, minimizes the sum of squared residuals, but excludes (at most) the 50% of the residuals of largest magnitude from the summation, which leads to an improved convergence rate while maintaining a high breakdown point.
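The LMS estimate can be computed with a simple random-sampling search, since the median objective is not differentiable. The following is a minimal sketch for the straight-line model of eq. (3.38); the number of trials is an illustrative choice:

```python
import numpy as np

def lms_line(x, y, n_trials=500, rng=None):
    """Least-median-of-squares fit of y = a*x + b via random sampling (eq. 3.38)."""
    rng = np.random.default_rng(rng)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        # candidate line through a random pair of data points
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        med = np.median((y - (a * x + b)) ** 2)
        if med < best_med:
            best, best_med = (a, b), med   # keep the smallest median squared residual
    return best
```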
Figure 3.2 illustrates some of the properties of robust estimators for a simple linear model (a straight line with unknown slope and intercept). While the LMS estimate finds the majority population model independent of the initial guess, the Leclerc M-estimator finds local minima which might belong to the majority population (the increasing line) or the minority (the decreasing line) or be completely wrong. The LMS estimate tends to be worse than the M-estimate if both succeed and use the same initial guesses and the same convergence tolerance (the limiting difference in the objective function of two succeeding guesses to stop the minimization), indicating that LMS has a lower convergence rate. All optimizations were done with
a nonlinear conjugate gradient solver. η² was chosen to be 2 for the Leclerc function and the residuals were scaled to the noise level (σ = 5) of the Gaussian i.i.d. model populations. The gross outliers stem from a uniform distribution in the range [250, 300].

Figure 3.2: Illustration of the sensitivity of M-estimators to local minima and of the slow but robust convergence of the LMS estimator. (a) two populations with fractions of 56% and 33%, plus 11% gross outliers; M-est (1) and (2) differ only in the initial guess. (b) same as (a) but with different initial guesses.
Figure 3.3 illustrates a problem more typical for image processing in the context of robust estimators: a step edge. The model is the same as for figure 3.2, but the data contain no gross outliers and the two populations (constant lines with different offsets) hardly overlap. Again Leclerc finds local minima and LMS gives a worse estimate. Furthermore, we see that the robust estimators break down if the residuals of the outliers w.r.t. the larger population only have a magnitude of a few σ (the step of the edge for figure (b) is only 4σ).
3.2.2 Bilateral and Diffusion Filtering
Bilateral and diffusion filtering are very popular image processing methods for the
task of denoising image data. Black et al. [Bla+98] show that anisotropic diffusion
as introduced by Perona and Malik [PM90] may be regarded as a robust estimator. Durand and Dorsey [DD02] point out that bilateral filtering as introduced by Tomasi and Manduchi [TM98] and what they call 0-order anisotropic diffusion
(while inhomogeneous diffusion would be more appropriate) belong to the same family of robust estimators, the major difference being that inhomogeneous diffusion filtering is energy preserving, while bilateral filtering is not (due to an asymmetric normalization term w.r.t. single pixels); energy preserving for a gray value image means that the arithmetic mean of its pixels' gray values does not change due to the applied filter.

Figure 3.3: Illustration of the breakdown of robust estimators with decreasing difference in magnitude between the residuals of outliers and of model samples, at a step edge. (a) the outliers (minority population) have a distance of more than 20σ from the majority; (b) the distance is only 4σ and all estimators fail, leading to a "bridging" estimate between the two population models.
Diffusion filtering is motivated by a physical observation expressed by Fick’s law
j = −D ∇u,
(3.39)
which states that a concentration (or temperature) gradient ∇u causes a flux j
that tries to compensate the gradient, in a way that is determined by the diffusion
tensor D, which is a positive definite, symmetric matrix. If j and ∇u are parallel we
speak of isotropic diffusion. Then the tensor D degenerates to a scalar D, the diffusivity. If D depends
on (local) features of the field u and therefore is not constant, we are speaking of
inhomogeneous (but isotropic) diffusion. Only if j is in general not parallel to the
gradient this shall be called anisotropic diffusion.
For a closed system, mass (or heat) does not vanish, i.e. du(x, y, t)/dt = 0. Applying the chain rule and identifying j as u·(∂t x, ∂t y)ᵀ gives us the continuity equation

∂t u + ∇j = 0.   (3.40)
Substituting j from Fick’s law (3.39) yields the diffusion equation
∂t u = ∇(D ∇u).
(3.41)
For image processing the local concentration may be replaced by e.g. the gray value
of an image pixel, implying a discretization of the diffusion equation in space. Using
a constant diffusivity is only appropriate, if we assume a constant gray value, such
that the inhomogeneity introduced by (Gaussian) noise in the data is distributed
and therefore leveled out by the homogeneous diffusion. At an edge in an image this
assumption is clearly not fulfilled and homogeneous diffusion introduces errors (e.g.
a blurring of the gray value edge). If image structures shall not become corrupted,
the diffusion tensor needs to depend on the local structure in the evolving image;
if so, the time dependence leads to a feedback, which indicates the nonlinearity of
such a diffusion filter. Discretization in time and approximation of the derivatives by
finite differences leads to an iterative solver. Perona and Malik [PM90] proposed a
discretization for an inhomogeneous diffusion (which they called, somewhat sloppily, anisotropic):
Is^{t+1} = Is^t + (λ/|n|) Σ_{p∈n} w(∇Is,p) ∇Is,p ≈ Is^t + (λ/|n|) Σ_{p∈n} w(Ip − Is^t)(Ip − Is^t),   (3.42)
where w(x) was proposed to be exp(−x²/σ²). Is^t denotes the (gray) value of a sampled image pixel s at time step (or rather iteration) t, and n a neighborhood of |n| pixels p around s. ∇Is,p indicates the directional derivative of I at s in the direction of p. If we look back at the definition of the M-estimator in equation (3.32) we may identify the model m as that of a constant gray value p = Is, the measurements yi as the pixel values Ip, and w(x) as the Leclerc weight function (3.37). Comparing equation (3.42) with (3.36), while remembering that ∇p ri,p = 1 for m = p, we find that equation (3.42) is just the gradient descent solution of (3.36) (i.e. IRLS).
Thus, inhomogeneous isotropic diffusion is a robust M-estimator for the very simple
model of a constant neighborhood. If one extends equation (3.42) for a weighting of
the addends by their distance (e.g. w(p − s)) this corresponds to a generalized M-estimator (GM). While the solution of the homogeneous isotropic diffusion equation
converges to an image of a single constant value (where the iteration steps are well approximated by Gaussian or binomial filtering), inhomogeneous diffusion converges to segments of constant value if hard redescenders are used as influence functions (the number and size of the segments depend on the noise level and structure of the image as well as on the chosen influence function). Anisotropic diffusion allows smoothing along edges but not perpendicular to them, achieving edge-enhancing smoothing. A thorough discussion of (anisotropic) diffusion filtering, its relation to curvature-preserving PDEs and their application is given in [Tsc02; Tsc06].
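A minimal sketch of the Perona–Malik iteration (3.42) on a 2D image with a 4-neighborhood and the exponential/Leclerc-type weight is given below; λ, σ and the number of iterations are illustrative parameters, and the periodic border handling via np.roll is used only for brevity:

```python
import numpy as np

def perona_malik(img, n_iter=50, lam=0.25, sigma=10.0):
    """Inhomogeneous (isotropic) diffusion, a discretization of eq. (3.42)."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # differences to the four nearest neighbours
        neighbours = [np.roll(u, 1, axis=0), np.roll(u, -1, axis=0),
                      np.roll(u, 1, axis=1), np.roll(u, -1, axis=1)]
        update = np.zeros_like(u)
        for d in neighbours:
            grad = d - u
            update += np.exp(-(grad / sigma) ** 2) * grad   # w(x) = exp(-x^2/sigma^2)
        u += lam / 4.0 * update                             # lambda / |n|, |n| = 4
    return u
```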
A similar reasoning as above is possible for bilateral filtering, which is motivated
by introducing a weighting of the addends not only w.r.t. their spatial distance (as
Gaussian filtering does) but also w.r.t. their distance in range to the pixel s:
Is := (1/norm(s)) Σ_{p∈n} ws(p − s) wr(Ip − Is) Ip   (3.43)

with the normalization term norm(s) := Σ_{p∈n} ws(p − s) wr(Ip − Is).
For details regarding the relation to robust estimators and diffusion filtering we refer to [DD02] and just want to note that the formal similarity to equation (3.42) already suggests their close relation and that the performance of the methods heavily depends on the chosen weight or influence function. Jones, Durand, and Desbrun [JDD03] developed an interesting extension of bilateral filtering to (3D) surface meshes that can be used to estimate the position of mesh vertices in a robust manner. The extension introduces the concept of predictors, which incorporate shape information in the filtering process by means of normals on the (non-robustly) smoothed surface.
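A compact brute-force sketch of the bilateral filter (3.43) with Gaussian spatial and range weights is given below; the σ values and window radius are illustrative, and a production implementation would typically use further optimizations:

```python
import numpy as np

def bilateral(img, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Brute-force bilateral filtering of a 2D image (cf. eq. 3.43)."""
    H, W = img.shape
    pad = np.pad(img.astype(float), radius, mode='reflect')
    out = np.zeros((H, W))
    norm = np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            w = (np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2)) *      # spatial weight w_s
                 np.exp(-((shifted - img)**2) / (2 * sigma_r**2)))   # range weight w_r
            out += w * shifted
            norm += w
    return out / norm
```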
We implemented an optional bilateral filtering, dependent on the range information, for the robust regularization of the structure tensor used in range flow estimation. For the task of denoising single PMD-frames, however, we employed another robust estimation technique, an extended version of channel smoothing; it has the advantage of being computationally very efficient compared to bilateral and (even more so) anisotropic diffusion filtering and is less sensitive to its exact parametrization.
3.3 Two State Channel Smoothing
Channel smoothing, or more precisely w.r.t. this work B-spline channel smoothing, is a technique introduced by Forssén, Granlund, and Wiklund [FGW02] and thoroughly discussed in [FSF02; FFS06], that allows robust smoothing of low-level signal features without the main drawback of conventional robust smoothing concerning its applicability in image processing: the high computational complexity arising from the typically employed iterative solvers for finding the (local) minimum of the objective function.
Channel smoothing uses a channel representation [NGK94] of the signal to be smoothed. The channel representation is closely related or analogous to concepts in other fields of research, e.g. population coding (computational neurobiology), radial basis functions (neural networks) or fuzzy membership functions (control theory). From the viewpoint of classical statistics, averaging of the channel representation can be regarded as a regularized sampling of the probability density function (pdf) of the signal measurements by means of a kernel density estimator (for a detailed discussion see [For04, chap. 4]).
A (nonlinear) decoding of the averaged channel representation allows extracting the modes of the distribution, i.e. its local maxima. The modes correspond, in terms of section 3.2.1, to the different model instances or populations comprised in the signal. It is essentially the decoding step that makes channel smoothing a robust estimator. We extend regular channel smoothing for PMD-range data by applying a weighting of the single channel vectors w.r.t. the confidence in the single pixel-measurements and by using a new smoothing technique that differentiates between pixels for which the weighting is taken into account and those that use the unweighted channel entries.
The steps involved in the application of our extended B-spline channel smoothing to
PMD-data are:
Encoding creation of the B-spline channel representation from the PMD-range data
Two State Smoothing smoothing the channels with a technique we named two state
smoothing, which allows to weight the range measurement w.r.t. some confidence measure, without the tendency to enlarge the near surfaces as observed
for common normalized averaging
Decoding extracting the mode that approximates maximum likelihood from the
averaged channel representation, yielding a robust estimate of the surface distance
In section 6.1 you can find an application of this novel extension to B-spline channel
smoothing, that we will describe in the following paragraphs in detail.
Encoding The range signal is transformed to the B-spline channel representation,
i.e. (sparse) vectors of B-spline values at every pixel position. The channel representation for a bounded signal f (x) ∈ [1.5, N − 0.5] is given by an encoding into N
channels at pixel positions x
cn (x) = B2 (f (x) − n),
n = 1 . . . N,
(3.44)
where the quadratic B-spline B2 (f ) is given by convolving the rectangle function
Π(x) = H(1/2 − |x|) two times with itself, yielding the explicit piecewise definition
B2 (f ) :=





3/4
1/2 |f |
− f2
|f | < 1/2
− 3/4 for
1/2
≤ |f | < 3/2 .
0
3/2
≤ |f |
(3.45)
As the signal needs to be bounded between [1.5, N − 0.5] we have to scale the range
data accordingly. If r(x) is the range signal bounded to [A, B], it may be transformed
as
f (x) =
N −2
(r(x) − A) + 1.5.
B−A
(3.46)
As the depth information of a PMD-sensor is based on a phase measurement, implying a specific unambiguity depth-range, and phase corresponds to a periodic domain,
one might think about adapting channel representation to this circular topology.
Felsberg, Forssén, and Scharr [FFS06] show that this is easily done, by adding the
54
3.3. Two State Channel Smoothing
lower two and upper two channels into two single channels (as they are the same
for a periodic domain). However, for PMD-data this would not be of much help,
because while phase is periodic, range is originally not, i.e. the periodicity of the
PMD-sensor’s depth-range is only a technical shortcoming.
Two State Smoothing We want to weight the range data according to the confidence measure we derived for the PMD-signal. As described above the (averaged)
channel representation may be interpreted as an estimate of the pdf . Multiplying
the individual channel vectors by the respective pixel confidence, does not change the
depth value but only the weighting of the vector with respect to the pdf -estimate,
similar to the weighting done by a GM-estimator. The mathematical proof that
such a weighting is sound from a statistical point of view w.r.t. the validity of the
pdf is outstanding, but experimental results show a good performance of the proposed
method w.r.t. robustness and noise suppression.
Lets suppose the weight image is w(x), then the weighted channels vectors (c0n ) are
given by
c0n (x) = cn (x) w(x)
We need to average the data in each channel to come to a reasonable estimate for
the pdf w.r.t. the range of the signal. With a model that assumes local constancy
(or smoothness) this can be achieved by Gaussian (or binomial) convolution, as it
respects the locality of the model by weighting distant pixels less. However, a pure
binomial filtering tends to bias the estimate toward nearer values, if the channel vector are weighted, similar to the case of normalized convolution in section 3.1.5. In the
resulting image near surfaces tend to grow, but different to normalized convolution
the edges are not blurred but sharp.
The reason for this is that binomial smoothing of the channels creates a nonzero
probability estimate for zero-value channel pixels (and the respective value range), if
the neighboring channel-pixels are nonzero. Furthermore a nonzero channel pixel will
be diminished if it is in the neighborhood of zero-valued pixels, i.e. an edge. Because
the weighting has a bias to weight the near surfaces more than those far away, the
probability estimate for the near value, which has been zero before smoothing, tends
to become bigger than that of a channel pixel farther away.
Thus, we propose a new smoothing algorithm for a channel representation of range
data that is going to be weighted and which differentiates between zero and nonzero
55
Chapter 3. Image Processing and Filters
channel pixels, and therefore was named two state channel smoothing. For zero valued
pixels we use the unweighted (w.r.t. the confidence measure) neighborhood to find a
pdf -estimate, while for the nonzero pixels we use the weighted neighborhood. The
nonzero pixel estimates need to be normalized w.r.t. the weighting to be comparable
with the zero pixel estimates. Eventually, we calculate the estimate c0 (x) of the pdf
for pixel x as follows :
B ∗ (W · Cn )
B∗W
Cz,n = B ∗ Cn
(
cnz,n (x)
cn (x) 6= 0
0
cn (x) =
if
,
cz,n (x)
cn (x) = 0
Cnz,n =
(3.47)
(3.48)
(3.49)
where small letter variables denote the functional representation of the corresponding
matrices C, with indices z for zero-pixel, nz for nonzero-pixel and n indicating the
channel number. Equation (3.47) is normalized averaging with a binomial applicability B, while (3.48) is just plain binomial smoothing.
Decoding The decoding of the encoded signal cn (x) = B2 (f (x) − n) can be
achieved by the linear interpolation
f (x) =
N
X
n cn (x) .
(3.50)
n=1
This is a result from describing a function P (f ) by a B-Spline approximation
X
P (f ) =
αn B2 (f − n),
n
and requiring that P (f ) = f , i.e. the identity function. For this case one obtains the
approximation coefficients to be αn = n [FSF02].
If we interpret f (x) as a random variable and c0 as a kernel density estimate of its
pdf , we come to an estimate of the first moment of f , by replacing c with c0 in (3.50).
Thus, (3.50) gives us an estimate of the expectation of f , because the first moment
about zero of a probability distribution is the expectation value of the corresponding
random variable. We may reformulate (3.50) as the first central moment, which is
zero for a pdf
X
(n − fˆ) c0n = 0 ,
(3.51)
n
56
3.3. Two State Channel Smoothing
where fˆ is an estimate of the unperturbed signal. This formulation corresponds to the
X
constraint equation (3.35), but
is hidden in the channel vector. To understand
i
this, we take a look at the continuous formulation of the optimization problem we
are dealing with in the limit of an infinite number of measurements:
fˆ = argmin E(f0 ) ,
(3.52)
f0
Z
where
E(f0 ) :=
ρ(f − f0 ) pdf(f ) df = (ρ ∗ pdf)(f0 ) .
(3.53)
The condition of a vanishing derivative at the minimum gives us the continuous
formulation of (3.35):
Z
0 = ∂f0 E(f0 )|f0 =fˆ = − ρ0 (f − fˆ) pdf(f )df
= (ψ ∗ pdf)(fˆ) ,
where ρ0 (r) = ∂r ρ(r) = ψ(r) .
(3.54)
Looking back at equation (3.51) and comparing it with equation (3.54), we may
identify the first central moment as a discrete convolution at fˆ of the sampled identity
function (n) with the sampled pdf (c0n ), and we conclude that the influence function
ψ of channel smoothing with linear decoding (3.50) is the identity function ψ(r) = r.
As we know, this corresponds to a least squares estimate and is therefore not robust.
We need to make the decoding robust, as the pdf estimated by c0 may be multimodal
or contain outliers. This can be achieved by making ψ a hard redescender, doing
a windowed reconstruction about the mode of c0 . The window size is chosen to be
three, because we need to keep the window size as small as possible, to achieve a
minimum computational effort; and three channels are the minimum to reconstruct
a measurement f encoded via (3.44) without errors. Instead of changing the width
of the influence function (corresponding to the decoding window size), the degree
of robustness of channel smoothing can be controlled by adjusting the number of
encoding channels N. This is because the robustness is determined by the width of
the influence function relative to the number of channels N .
The appropriate number of encoding channels N depends on the (Gaussian) noise
level σf of the signal f (x). In order to reject not more than 5 percent of the inlier
samples, the distance between two channels must be greater than 4σf [FFS06].
57
Chapter 3. Image Processing and Filters
The robust reconstruction reads
fˆn0 (x) =
nX
0 +1
c0
(x) − c0n0 −1 (x)
1
n c0n (x) = n0 + n0 +1
,
E(n0 )
E(n0 )
(3.55)
n=n0 −1
where E(n0 (x)) = c0n0 +1 (x) − c0n0 −1 (x) is the probability for the estimate fˆ to be
within the value-range of the corresponding decoding window about n0 . The channel
window center n0 should be near to the mode of pdf , i.e. the location of its global
maximum, such that the decoded signal value becomes a maximum likelihood estimate of the unperturbed signal. There are several possibilities to choose n0 with the
given kernel density estimate c0 . We decided for the computational efficient, but not
necessarily best method to choose n0 = argmaxn0 (E(n0 )), such that the determined
channel window has the largest sum of channel values. For a multimodal pdf the
modes may be located near to each other, such that n0 might be chosen to lie in
between the modes, what leads to a wrong estimate.
Based on the windowed reconstruction of the signal value equation (3.55), taking
into account the definition of the channel vector entries cn (3.44) and assuming an
infinite number of samples, the effective influence function of channel smoothing can
be calculated analytically [FFS06]:
ψ(∆f ) = B2 (∆f − 1) − B2 (∆f + 1), where ∆f := f − n0 .
(3.56)
The depicted function ψ(∆f ) in figure 3.4 is however not precisely the influence
1
0.5
−3
−2
−1
0
− 0.5
1
2
3
loss function ρ(Δ f)
influence function ψ(Δ f)
−1
Figure 3.4: Influence and loss function of robust channel smoothing
function, because therefore ∆f had to be the residual of the measurement, which
is defined as f − fˆ. Thus, the true influence function is shifted for the rounding
difference n0 − fˆ. Therefore, the influence function is no longer ideal w.r.t. the
58
3.3. Two State Channel Smoothing
broken (odd) symmetry about zero, what introduces a (minor) quantization error for
all estimates fˆ that are no integer values.
Felsberg, Forssén, and Scharr [FFS06] propose a method called virtual shift decoding
that resolves this problem of channel smoothing. However, this method is somewhat
involved and expensive w.r.t. computation time. We compared the results of both
methods on PMD-data and found no significant improvements, given a high number
N of encoding channels, as it is appropriate for PMD-data. For signals that have a
high noise variance (low SNR), implying a low number of encoding channels, virtual
shift decoding is of more interest, because its computational effort scales with the
number of channels and the quantization errors are the more prominent the smaller
the number of channels is.
59
Chapter 3. Image Processing and Filters
60
Chapter 4
Motion Estimation
Motion estimation has become an important discipline in computer vision. There
is hardly any complex computer vision task, that has not to deal with motion. In
various industrial and scientific applications the movement within a scene needs to
be accounted for. There is a multitude of tasks that are obviously related to motion
estimation, like time-to-collision estimation or pedestrian detection in automotive
industry. Another example is particle tracking for the visualization of flow fields
of liquids or gases or more general the analysis of dynamical processes in scientific
applications. The calculation of displacement fields for motion-based compression of
video sequences involves motion estimation too. For many other tasks, however, the
link to motion estimation is not so obvious, although it is still inherent. For instance
image registration or disparity estimation in stereo vision. Even for still image processing the visual systems of mammals employ the motion analysis pathways of the
brain. For example, while humans are looking at a picture, their eyes perform socalled (micro-)saccades for the analysis of the scene. These micro saccades introduce
artificial motion on the retina which is then processed by parts of the visual cortex
sensitive to the direction of motion and spatial structures (see [Big06]).
Typically the motion estimate is not the actual target of real-world applications.
Most times the specific motion estimation algorithm is only one link in a process
chain or chains. Therefore, the input and output, as well as the computational efficiency and qualitative performance are subject to various constraints and limitations.
This might be one reason why there is such a vast number of different approaches,
algorithms and specific implementations for motion estimation. Another reason is
that motion estimation from image sequences, which is the topic of this chapter, is
in general an ill-posed inverse problem. We will see in the following sections, that
various assumptions have to be made in order to make the problem a well-posed one.
61
Chapter 4. Motion Estimation
This is also the reason why we are talking of an estimate. It is only a guess that
is true (or rather approximately correct) only if the various necessary assumptions
are met. Basically the different assumptions that are made lead to the various algorithms proposed in computer vision literature. Often it turns out that one concept is
equivalent to the other or just a special case, formulated in a different manner; this is
no wonder, since computer vision is an interdisciplinary research field incorporating
the jargon and concepts of various scientific disciplines.
4.1 Optical Flow and Range Flow
4.1.1 Optical Flow and Motion Field
Before we start discussing our approach of motion estimation, we first need to clarify
in which kind of motion estimate we are interest and what a motion estimate is. We
are interested in the motion of objects or rather their surfaces in three dimensional
space. This physical motion is partially captured by an optical device, e.g. a camera,
by taking several images of it over time. Taking an image, typically means projection
of the 3D scenery on a 2D plane. Thus the physical 3D vector field of velocity vectors
associated with the motion, is projected to the image plane and becomes a 2D vector
field known as the motion field. The basic motion estimation algorithms try to
estimate this motion field from the sequence of images. Horn [Hor87] showed that
the reconstruction of the physical 3D motion field from the 2D motion field is possible
in most cases if the optical characteristic and the external parameters of the setup
(especially the parameters of the projection) are known. However, what the camera
(or the human eye) sees is not necessarily as closely related to the motion field as
one might think.
The apparent motion at the image plane that is based on the visual perception
is known as the optical flow or image flow. An extreme example of the potential
disagreement between optical flow and motion field is given by Horn [Hor86] and
depicted in figure 4.1. The figure shows an ideal sphere with an uniform surface.
It may rotate around any axes through its center of gravity without any apparent
motion. Therefore the optical flow field of the rotating Horn sphere is zero everywhere. In contrast, a moving light source that illuminates the sphere will change
the brightness distribution on the sphere over time, inducing an apparent motion.
62
4.1. Optical Flow and Range Flow
a
b
Figure 4.1: Disagreement of physical motion field and optical flow field: a a spinning
ideal sphere with fixed illumination shows no apparent motion; b a moving
illumination causes an apparent motion of the brightness distribution at
no motion of the sphere (images from [JH00])
Therefore we find a nonzero optical flow field while actually the sphere might be at
rest.
4.1.1.1 Barber’s pole illusion and complex motion
Because human vision is very good in motion estimation, one might presume that
the problems described so far are somewhat academic and only problematic because
of technical weaknesses or a lack of intelligence in the employed algorithms. We want
to stress that, while additional intelligence might help to come to a better motion
estimate or rather to a guess that has a higher probability to be correct, it does not
solve the general problems. Therefore, we want to present another less academic
example. It demonstrates the weaknesses in human vision and illuminates some
further problems of associating the optical flow with the motion field. Moreover, it
demonstrates that the optical flow is potentially ambiguous because of the so-called
aperture problem.
63
Chapter 4. Motion Estimation
The barber’s pole on the righta is a good example for a situation that appears to be simple to analyze but features various
aspects of complex motion that are demanding w.r.t. motion
estimation and only to handle if specific, rather restrictive,
assumptions are made. The barber’s pole is a cylinder with
diagonal running colored stripes (of a single orientation) rotating around its axis of symmetry. A human observer has the
illusion of a motion upward despite the fact that he knows that
the cylinder is spinning to the left. This phenomenon is a manifestation of the aperture problem. To put it simply (we will
discuss the details later), it describes the following: If we have
access only to a limited field of view (the aperture) of a moving
object and this object has a texture that exhibits only a single
orientation, then we can not say anything about motion along
this orientation.
a
The electronic version (PDF with JavaScript enabled) of this thesis is
needed to see the described optical flow estimation phenomena; zoom
in to focus easier on the different apertures; click on the pole to temporarily stop the animation
Figure 4.2: The
barber’s pole
illusion
demonstrates the
general ambiguity
of optical flow
Thus the motion estimate becomes ambiguous, as only the vector component of
the motion normal to the orientation can be determined. Hildreth [Hil82] found an
elegant rule that the human vision system applies to come to an unique estimate:
The constructed motion field is that which is compliant with the apparent brightness
changes and of maximum uniformity within the aperture. Focusing on one of the
small rectangular apertures on the right side of figure 4.2, we construct a flow field
pointing from right to left, because it is more uniform: there is a discontinuity
in the direction of flow only at the shorter vertical edges, while along the longer
horizontal there is no discontinuity in the direction of the flow field. If we focus on a
larger aperture centered between the pole and the small clippings, our visual system
constructs a mixture of both motions which runs normal to the orientation of the
stripes. This type of flow field corresponds to a motion estimate known as normal
flow which we will discuss in section 4.1.3.
It is important to understand that this ambiguity is not a problem of human vision
only and not specific to the rather seldom cases of optical illusions. With only the
sequence given on the current page one just can not be sure which motion is the
true one. Also physical assumptions about possible motions do not help. While a
rotation is a good explanation for the horizontal motion, the same is true for the
64
4.1. Optical Flow and Range Flow
vertical one, if one assumes a striped, colored ribbon moving like a belt drive. While
the single orientation of the barber’s pole is rather unnatural, weak textured regions
and step edges in images occur quite often, and the scale and size of the analyzed
region determines if an aperture problem exists, given the unavoidable uncertainty
(i.e. noise) in the measurements.
The barber’s pole features some other typical problems of motion estimation. There
are specular reflections on the cylinder. In the upper part of the pole they are of a
magnitude such that the color of the stripes is occluded. Within a small neighborhood
along the occlusion boundary there exist two motions. The one of the stripes and
that of the fixed specular highlight (a zero-motion). A similar situation we have in
the lower part of the pole. The specular reflections are of less magnitude and are
transparent superimposed on the moving stripes. This time the two motions are not
along the boundary but spread over the area of specular reflection. Remembering
the discussion about robust estimation, we realize that this could be handled by a
two motion model or a robust estimator. However, both approaches will fail for 3
motions: Imagine the pole protected by some glass tube. The reflections on it might
transparently superimpose an additional layer of motion in the surrounding scenery.
Another problem is that we might not be interested in the rotation of the pole but
only in translations of the pole’s position. For example a computer vision system
installed on an automobile, might not know anything about barber poles. How to
decide if the apparent motion is of relevance or not, if all information accessible is a
sequence of gray or color valued images?
Now that we have illustrated some of the problems in recovering the physical motion field from optical flow we will show how the PMD-signal can be used to come
to a motion estimate that is more robust with respect to the motion field we are
interested in. We will use an optical flow based approach to motion estimation. The
advantage of optical flow compared to correspondence based methods is that they
are inherent continuous. Continuous problems can be tackled in a profound way with
the mathematical apparatus of analysis and in particular calculus.
4.1.2 Brightness Change Constraint Equation
How to describe optical flow mathematically? Let the point p belong to a small image
patch in the 2D-image plane g. If the patch moves at constant velocity f = uv along
65
Chapter 4. Motion Estimation
a line and does not rotate, i.e. does a translation, then the motion of p is described
by
"
#
x(t)
p(t) :=
= p(0) + f t .
(4.1)
y(t)
If the motion is different, i.e. in some way accelerated (like for a rotational or curved
motion), p(t) shall be a first order approximation of the true motion, valid for small
t. Lets assume that the image patch just changes its position over time but not its
appearance, i.e. its gray values (if g is a gray value image). The constancy of the
gray value texture along with the motion of the image patch may be stated as
g(p(t), t) = const .
(4.2)
If we take the (total) derivative in time of equation (4.2), applying the chain rule
and considering (4.1) we find:
∂g dx ∂g dy ∂g dt
d
g(p(t), t) =
+
+
= gx u + gy v + gt = 0
dt
∂x dt
∂y dt
∂t dt
−→
or
(∇g)T · f + gt = 0
T
(4.3)
T
(∇st g) · f̃ = 0 , where f̃ = [u v 1]
.
(4.4)
Equations (4.3) and (4.4) are equivalent formulations of the well known brightness
change constraint equation (BCCE). f is the velocity of a point p(t0 ) in the image
patch, ∇g and ∇st g are the spatial and spatiotemporal image gradient at p(t0 ) at
time t0 , and gt the partial derivative in time at the same point. The BCCE is valid
only inside the patch: At the borders there is a velocity discontinuity as well as
a potential discontinuity in the image signals; the spatial and temporal derivatives
on g are not defined on both sides of the border. This is of relevance, because the
continuous formulation of the BCCE is going to be discretized and the patch border
in general does not coincide with a pixel border and thus has a ”spatial extension”.
The BCCE belongs to the inverse problem of finding the model parameters f for given
data ∇st g. It relates the spatiotemporal image structure with the sought optical flow
velocity vector f , which consists of two unknown scalar values. One equation is not
enough to solve the problem uniquely, but it constrains the solution to a line in the
(u, v) flow space. However, if the image is constant in the neighborhood of p, ∇st g
is a null vector, such that the BCCE is fulfilled for arbitrary f and therefore gives
no constraint at all. To come to an unique solution we may take additional points
66
4.1. Optical Flow and Range Flow
pn of the moving patch into account, each one related to a BCCE (4.4), yielding a
system of linear equations
G∇ · f = 0 ,
(4.5)
where the matrix G∇ contains in its row vectors the respective spatiotemporal derivatives of g at the points pn . It depends on the signal g, if we succeed in finding an
unique solution: if ∇g is linear dependent on gt , i.e. there is only a single orientation
in the data, all constraining equations are equivalent and the system is underdetermined. Only if there are exactly two linear independent equations in the system,
the solution will be unique. Due to noise this will never be the case for real world
data, and typically the system is overconstrained; we may compensate for the given
uncertainty in the data by writing G∇ · f ≈ 0. A solution can then be found only
from a probabilistic point of view, trying to minimize the error in the estimate f̂ of
f , by e.g. a (total) least squares approach (we will discuss this in section 4.1.6).
We realize that in general, motion estimation from image sequences is an ill-posed
inverse problem, i.e. a problem that does not fulfill the postulates of Hadamard
[Had02] about well-posedness: it might not have a solution in the strict sense (i.e.
only a probabilistic estimate can be given), the solutions might not be unique and/or
might not depend continuously on the data, in some reasonable topology. This is
both true for estimation of optical flow and even more for the estimation of the
motion field. Only if one can assure that the various implied assumption are true
(like the rigidity of observed objects, which move in conformance to a specific motion model) either by a specific experimental setup or by a sophisticated analysis
of the image content, the problem may become partially well-posed. In general any
quantitative motion estimate is associated with an uncertainty and therefore motion
estimation algorithms should supply a confidence measure, describing the accuracy
of the estimate.
In the following we will not address the additional problems involved with noise
explicitly, but always keep in mind that there is noise and that all real world flow
estimates are subject to noise. For example if we speak of two equal BCCE (4.4) for
two different points, equality is to be understood as relative to the SNR of the data.
4.1.3 Aperture Problem
We have noted that the BCCE (4.4) is underdetermined. The system (4.5) is underdetermined too, if the single equations correspond to samples of an image patch g of
67
Chapter 4. Motion Estimation
a single orientation. To clarify this we look at the time evolution of such an image
patch, which corresponds to a rank one signal : Given the vector n = nn12 normal
(knk2 = 1) to the single oriented texture in the image patch g, and the differentiable
function s : R 7−→ R, we define
g 1 (x, y, t) := s([x y t] · ñ) = s(l) ,
where ñ ∈ R3 is n extended for the temporal dimension. We find ñ3 by substituting
g 1 for g in the BCCE (4.3)
T
(∇g 1 ) f + gt1 = n1 s0 u + n2 s0 v + ñ3 s0 = s0 (nT f + ñ3 ) = 0
(4.6)
ds
and solve for ñ3 = −nT f = −fn , if
dl
s0 is not zero, i.e. s(l) must neither be constant nor at an extremum.
such that we may cancel the derivative s0 =
All equations for all points at all times within the image patch are of the form (4.6)
and therefore equivalent. This is the aperture problem of optical flow. We can only
determine the flow component fn normal to the orientation of the image, i.e. in the
direction of n. We may express the raw normal flow vector fn = fn n in terms of
the spatiotemporal image derivatives by the following reasoning
ñ3 s0 = − fn s0 = gt1
1
∇g = s0 n = s0 knk = s0
2
2
2
=⇒
fn = −
gt1
k∇g 1 k2
,
n=
;
;
∇g 1
k∇g 1 k2
g1
fn = − t0
s 1
0
s = ∇g 2
and fn = −
gt1 ∇g 1
k∇g 1 k22
.
(4.7)
If the patch is not of single orientation we might find the full flow f by taking other
points into account. If we take other points into account we have to be sure that they
are inside the image patch and not on or beyond the motion boundary, as otherwise
the BCCE is not valid anymore. Thus we can not just blindly extend the region
around our point of interest, to solve for the aperture problem.
The problem that we need to take additional points into account, but are restricted
to a region to choose this points from, which is in general unknown and depends on
the flow itself, is referred to as generalized aperture problem. And while we would
like to take as many data points into account as possible, to achieve a good estimate
w.r.t. noise in the data, we are restricted in doing so for the same reasons. There are
several ways to deal with the generalized aperture problem, and to find the best flow
68
4.1. Optical Flow and Range Flow
estimate under random, or more precisely, generic conditions is a topic of ongoing
research [Pap+06; Tel+06; Gov06; Xia+06].
Equation (4.4) is the simplest formulation of the BCCE, which may be extended
for more complex motion models like affine flow (see e.g. [BA96]) and for multiple
motions (see [MSB01; Bar+03; Stu+03]). So far we totally neglected the specific
properties of the PMD-signal. For the amplitude-signal of the PMD, the BCCE is a
rather bad approximation. A PMD-camera uses an active illumination and therefore
motion in depth will involve a major change in the optical power irradiated on each
sensor pixel and therefore violates the BCCE. We will discuss later how this can be
handled, but first show how to use the range-signal of the PMD, because this also
helps to understand how to deal with the amplitude-signal.
4.1.4 Range Flow Constraint Equation
A time varying surface may be viewed as a depth function Z(X, Y, t), with the
Cartesian coordinates (X, Y, Z). The coordinate of a point of interest on this surface
may be described by
P (t) = [X(t), Y (t), Z(X(t), Y (t), t)]T .
The function Z(t) := Z(X(t), Y (t), t) with one argument is the Z-coordinate of the
moving point on the surface, while the function Z with three arguments describes
the time evolution of the surface.
If we take the derivative of P (t) in time and assume a pure translation with constant
velocity

 

X(t)
X0 + U ·t

 

 Y (t)  =  Y0 + V ·t  = P (0) + f t , where f is the velocity vector of P ,
Z(t)
Z0 + W ·t
we yield by applying the chain rule on Z(t):

 

U
U
d

  
P = f = V  = 
V
 ,
dt
W
U ZX + V ZY + Zt
69
Chapter 4. Motion Estimation
where ZX and ZY are the partial derivatives of Z(X, Y, t). The herein embedded
equation
W = U ZX + V ZY + Zt
(4.8)
is called the range flow constraint equation (RFCE) [Yam+93]; an analogon to the
BCCE, that deals with range instead of brightness values.
It constrains the sought solution f for a given range-data-set Z(X, Y, t) at (Xt0 , Yt0 , t0 ).
However, the constraint may only be applied if the surface is smooth with respect to
the spatial resolution of the data set, as otherwise the partial derivatives of Z can
not be calculated properly (on depth-edges they are not defined at all). Furthermore,
with respect to the temporal resolution of the data set, the motion of the surface
patch must be well approximated by a translation.
In order to evaluate the RFCE, the partial derivatives of the depth function Z(X, Y, t)
with respect to world coordinates X and Y have to be computed. Range data,
as delivered by the PMD-camera, is given in sensor coordinates r(x, y, t), with r
being the radial distance |poscamera − possurf ace | and x, y being the sensor-pixel
coordinates∗ . After applying the transformation from sensor to world coordinate
system the range-data is unevenly sampled.
Thus computing the derivatives is no longer straight forward. One can apply TLS
(total least squares) or OLS (ordinary least squares) estimation from a local firstorder approximation of the surface or resample the data on a Cartesian grid [SG02].
However, both methods have the disadvantage of being rather slow and in the case
of resampling the necessary interpolation may introduce additional errors.
In the following we will employ fast derivative filters to compute the world coordinate
derivatives; Spies and Barron [SG02] showed that these have competitive accuracy
when applied to real-world range data sequences. Because derivative filters are applied via convolution, which implicitly assumes an evenly sampled grid, we have to
find a way to compensate for the deviation due to uneven sampling.
Here the objects of interest are 2D surfaces in the 3D world Z = Z(X, Y, t). The
data points are sampled at locations on the sensor array, which in turn depend on
∗
this is a simplified description of the sensor coordinates, for a more precise description see [Jus01,
pp. 61 ff]
70
4.1. Optical Flow and Range Flow
the 3D data points observed: x = x(X, Y, Z); y = y(X, Y, Z). The transformation
from r(x, y, t) to world coordinates yields one data set for each of X, Y and Z on a
sampling grid (x, y, t) (e.g. X = X(x, y, t)).
For the total differential of the three data sets we obtain
dX = Xx dx + Xy dy + Xt dt,
dY = Yx dx + Yy dy + Yt dt,
dZ = Zx dx + Zy dy + Zt dt .
(4.9)
Eliminating dx and dy from equation (4.9) results in:
dZ
=
1
− ∂(Y,X)
∂(x,y)
∂(X, Z)
∂(X, Y, Z)
∂(Z, Y )
dX +
dY +
∂(x, y)
∂(x, y)
∂(x, y, t)
.
1 ,··· ,An )
The expressions of type ∂(A
∂(a1 ,··· ,an ) denote the determinant of the Jacobian matrix of
the functions A1 , · · · , An with respect to their arguments a1 , · · · , an , which we may
abbreviate as the Jacobian hereinafter:


∂A1 ∂A1
···
 ∂a1
∂an 


∂(A1 , · · · , An )
.
.
.


.
.
.
:=  .
.
.

∂(a1 , · · · , an )
 ∂A
∂An 
n
·
·
·
∂a1
∂an
Differentiation in time and rearranging
0 =
dZ
dt
= W yields
∂(Z, Y )
∂(X, Z)
∂(Y, X)
∂(X, Y, Z)
U+
V +
W+
∂(x, y)
∂(x, y)
∂(x, y)
∂(x, y, t)
(4.10)
This is the RFCE using derivatives on the (evenly sampled) sensor coordinates x, y
(and time t) and thus can be evaluated by convolving the range-data with derivative
kernels. Using equation (4.10) implies having transformed the radial range data
r(x, y, t) to Cartesian world coordinates.
Instead of applying the filters on the transformed data, it is possible to substitute
X, Y and Z with the analytic expressions from the sensor model, so that the derivative filters are applied directly on r(x, y, t).
71
Chapter 4. Motion Estimation
We use a pinhole camera model, thus
X(x, y, t) =
y r(x, y, t)
f r(x, y, t)
x r(x, y, t)
√
√
√
, Y (x, y, t) =
, Z(x, y, t) =
e
e
e
e := x2 + y 2 + f 2
with
and f being the focal length.
(4.11)
After substituting X, Y and Z in equation (4.10) we obtain a somewhat bulky expression, which we simplify by further substitutions and rearrangements to
0 =
where
d =
U (r x − rx e) + V (r y − ry e) + W d − r rt
e (rx x + ry y)
fr+
f
√
e
(4.12)
This new variant of the RFCE reduces the number of necessary filter operations and
simplifies error analysis regarding noise in r(x, y, t) and systematic errors introduced
by the derivative kernels.
4.1.5 Aperture Problem Revisited
The RFCE poses only one constraint in three unknowns. It describes a plane C
in (U, V, W )-space with surface normal [ZX ZY 1]T . The best solution, given this
constraint, is the minimal vector fr between the (U, V, W )-space-origin and C (see
figure 4.4a). The raw normal flow for range data is analogous to that of the BCCE
equation (4.7)
fr =
ZX
2
−Zt
Zt ∇Z
,
[ZX ZY 1]T = −
2
+ ZY + 1
k∇Zk22
(4.13)
where ∇Z denotes the 3D spatial gradient of the range measurement Z at the surface
point of interest in Cartesian coordinates. Different to the BCCE however a constant
neighborhood (or rank null signal) is already sufficient to determine a normal flow,
because the RFCE (4.8) contains the additional velocity term W .
Three characteristic types of neighborhoods for range data are illustrated in figure 4.3. Depending on the neighborhood only a specific flow type can be estimated:
plane flow If the neighborhood is of planar structure all constraints are linearly
dependent and only the plane flow can be calculated (see figure 4.4a). This
72
4.1. Optical Flow and Range Flow
Plane Flow
Line Flow
Full Flow
Figure 4.3: Illustration of the three characteristic types of neighborhoods encountered
in range data and the corresponding flow types that can be estimated in
the respective neighborhood.
a
b
C
f
¢
c
f
f
Figure 4.4: The number of independent constraint planes in velocity-space (U, V, W )
associated with the RFCE’s within an aperture determine the type of flow
that can be estimated: a plane flow, b line flow or c full flow.
means that if the considered field of view (the aperture) shows a pure plane
(with no additional structure), only the movement perpendicular to this plane
can be detected.
line flow Linear structures in position space such as intersecting planes correspond
to two distinct classes of constraint planes in the examined aperture (see figure 4.4b). The point on the common line closest to the origin gives the appropriate line flow. This line flow lies in the plane perpendicular to the linear
structure. Any movement along the direction of the structure, e.g. an edge,
can not be resolved by a local analysis; this corresponds to a rank one signals
(that we introduced with the aperture problem for optical flow), where s0 (l) is
not constant.
73
Chapter 4. Motion Estimation
full flow On corner- or peak-like structures clearly all three components of the movement can be determined locally. In such a neighborhood three linearly independent constraint equations can be found. These correspond to three mutually distinct, i.e. non-parallel, constraint planes in the velocity space (see
figure 4.4c). The full 3D-flow is readily computed by the intersection of the
constraint planes, assuming that the flow is constant in the neighborhood.
The existence of plane and line flow, being the only possible local estimates within
specific neighborhoods, is the manifestation of the aperture problem for range flow.
4.1.6 Local and Global Flow Estimation
4.1.6.1 Local Total Least Squares Estimation
In the following we will describe how a local estimate is obtained by means of a total
least squares (TLS) technique. We use TLS because it can be computed efficiently
and is the appropriate choice if not only the measurement vector b but also the model
(or ”data”) matrix M of an overdetermined (linear) system of equations M x = b is
contaminated by noise.
Ordinary least squares (see section 3.2.1) minimizes the residual r = M x − b
argmin kM x − bk2 ,
x
which is appropriate if only b is subject to noise; in other words OLS corresponds
to perturbing the measurements b by a minimum amount r such that b + r can be
explained by M for the model parameters x.
Total least squares minimizes
argmin kDpk2 ,
subject to pT p = 1 and where D = [M , b] ,
p
which corresponds to perturbing both M and b. This is the appropriate problem
description for both optical and range flow because, as we will see, the entries of D
correspond to the single Jacobians in the RFCE (4.10); the Jacobians, while they
dependent on the explanatory variables (x, y, t), are also subject to noise because
they are based on the PMD-signal. For a detailed study of TLS we refer to [VHV91]
or to [GL80] for a less extensive discussion.
74
4.1. Optical Flow and Range Flow
Assuming constant flow in a neighborhood of the point of interest, we get n constraint
equations (4.10) if we take n neighboring samples into account. These can be written
as
dT f˜ = 0 , k = 1 . . . n
T ∂(X,
Z)
∂(Y,
X)
∂(X,
Y,
Z)
∂(Z,
Y
)
k
where
d=
∂(x, y) ∂(x, y) ∂(x, y) ∂(x, y, t)
(xk ,yk ,tk )
T T
T
˜
and
f = f 1 = [U V W 1]
k
(4.14)
Stacking up all equations in the (spatiotemporal) neighborhood gives analogous to
equation (4.5)
D f˜ = 0
where data matrix D = [1d, . . . , nd]T
(4.15)
For real world data the rank of D is due to noise typically greater three; at least if
more than three samples (n > 3) were taken to build the system of equations. It
follows that the system (4.15) is overdetermined and there exists no exact solution.
One way to deal with this problem is to recast it to an optimization problem and
find a solution f˜ in a total least squares sense, i.e.:
T
T
˜
D
f
(4.16)
= f˜ D T D f˜ = f˜ S f˜ −→ min.
2
Be aware that f˜4 = 1 imposes a constraint. As the more generic (but equivalent)
constraint pT p = 1 is easier to handle, we replace f˜ by the generic parameter vector
p. Restating the upper minimization problem in a continuous form gives
Z∞
p̂ = argmin
p
w(x − x0 , t − t0 )(dT p)2 dx0 dt0
subject to pT p = 1 .
(4.17)
−∞
The weighting function w(x − x0 , t − t0 ) defines the data points d = d(x0 , t0 ) that
are taken into account for the estimate and allows to weight these according to
their position relative to the point of interest (x, t). A common choice for w is
a three dimensional Gaussian function or rather Gaussian pdf , if we require the
weighting to correspond to a probability distribution, such that w is normalized, i.e.
R∞
0 0
−∞ w dx dt = 1. A Gaussian weighting pays tribute to the generalized aperture
problem, in that it implies the reasonable assumption that far from the point of
interest the probability to find the same flow is lower than near to it. From a signal
75
Chapter 4. Motion Estimation
processing point of view it additionally has the advantage, in contrast to a box filter,
that it will not introduce any aliasing because of its limited bandwidth. Nevertheless,
it is important that the sampling theorem was not violated by prior signal processing
steps like pixel-wise multiplication, as explained in the previous chapter in the context
of band enlarging operators.
The requirement pT p = 1 can be incorporated by means of a Lagrangian multiplier
λ in the objective or energy function E we need to minimize:
"
#
Z∞
n
X
E=
w(x − x0 , t − t0 ) (dT p)2 + λ(1 −
p2i ) dx0 dt0 .
(4.18)
i=1
−∞
Taking the derivatives with respect to the parameter vector p we find the constraint
equations for a minimum of E to be
∂E
= 2
∂pi
Z∞
!
w(x − x0 , t − t0 ) di (dT p) − λpi dx0 dt0 = 0 ∀ i = 1 . . . 4 . (4.19)
−∞
We may take the pi out of the integral, as they are assumed to be constant:
Z∞
p1
0
Z∞
0
w di d1 dx dt + . . . + p4
−∞
w di dn dx0 dt0 = λ pi
∀ i = 1...4 .
(4.20)
−∞
The right hand side follows if we require the weight function to be normalized.
The 4 equations (4.20) can be written in matrix form:
Z∞
S p = λp
where
S=
w (x − x0 , t − t0 ) (d dT ) dx0 dt0 .
(4.21)
−∞
The real symmetric (and positive semidefinite) 4 × 4 matrix S is an extension of
the structure tensor [HS99] for range flow introduced by Spies et al. [Spi+99]. Equation (4.21) is the eigenvalue equation for S. Each of the 4 eigenvectors corresponds
to an extremum of the objective function E, and the (always positive) value of the
corresponding eigenvalue is a measure for how close to zero the extremum is (the
smaller the closer).
In a discrete implementation, the components of S can be computed using standard
image processing operations:
S i,j = hdi dj i , i, j = 1 . . . 4 ,
76
4.1. Optical Flow and Range Flow
where h· · · i denotes an averaging operator like normalized averaging or plain binomial
smoothing.
Gradient Based Weighting While binomial (or Gaussian) averaging is an optimal
choice for a smooth flow field and data terms subject to i.i.d. noise, it basically
ignores motion boundaries. Because a large spatial gradient in range data typically
coincides with a motion boundary (at the border of an object) the author proposes
an additional weighting dependent on the magnitude of the spatial gradient to reduce
the influence of occlusion on the estimate:
2 m
,
(4.22)
wo (m, σ, µ) = exp −
−µ
σ
p
where m is the magnitude of the spatial range gradient ( d21 + d22 ), while σ and
µ control width and center of the function. While the weighting does not solve the
aperture problem, it can exclude data points near to an edge, which are likely to bear
information that contradicts the motion model due to occlusion. If the aperture of
the spatial weighting is small, probability is high that no data points corresponding
to a different motion are integrated.
wo( m, 1 , 0)
1
weight
wo( m, 2 , 0)
wo( m, 1 , 0.5)
0.5
wo( m, 2 , 0.5)
wo( m, 1 , 1)
0
1
2
magnitude of spatial gradient
m
3
Figure 4.5: Weight function to suppress influence of data at a motion boundary
Furthermore, if the parameters are chosen correspondingly (see figure 4.5), wo can
attenuate the influence of data points with a small gradient, which typically are
more critical to integrate w.r.t. the noise in data. To understand this, let us look at
the BCCE: (∇g)T · f + gt = 0. If ∇g is small and gt is large and both are due to
noise rather bad measurements, then the BCCE implies an incommensurable large
and erroneous normal flow, i.e. an outlier. The same is true for range flow. So it
is reasonable to attenuate data points with a small gradient. If the neighborhood
77
Chapter 4. Motion Estimation
contains only gradients of similar magnitude, the attenuation will not influence the
estimate. On simulated test data we achieved improved flow estimates in the vicinity of occlusion boundaries using the above weighting. However, the parameters σ
and µ were tuned manually and a detailed analysis based on real world data is outstanding. We also used 1-step bilateral filtering for weighting the rows of the data
matrix D based on the the difference in depth between central pixel and neighbors
(analog to equation (3.43), whereas we replaced wr (Ip − Is ) by wr (Zp − Zs )) but
achieved no satisfactory results. The resulting flow estimates were very sensitive to
the parametrization of the weighting function wr and the local structure of the data.
Minimum Norm Solutions The structure tensor S contains all necessary information to determine the local spatiotemporal structure of the data. The estimate p̂ is
found as the eigenvector to the smallest (or vanishing) eigenvalue λ4 of S, if and only
if the aperture problem was solved by pooling over an adequate neighborhood, i.e.
one that corresponds to a full flow. The sought solution is then given by fi = p̂i /p̂4
with i = 1 . . . 3 or equivalently by
 
e41
1  
ff =
(4.23)
e42  .
e44
e43
The emn are the entries of the matrix of eigenvectors of S:


e11 . . . e41


E S = [e1 · · · e4 ] =  ... . . . ...  ,
e14 . . . e44
where the eigenvector en belongs to the eigenvalue λn and the eigenvalues are sorted
in descending order, i.e. λ1 ≥ λ2 ≥ λ3 ≥ λ4 (≈ 0, if model assumption were met).
In the ideal case λ4 is zero. Deviations from this are either contributed to noise in the
data or, if the eigenvalue is large relative to the noise, indicate a violation of the model
assumptions of a local constant flow, like in the case of occlusion (corresponding to
at least two motions) or multiple transparent motion. However, if the aperture
problem was not solved by integrating over a set of data points, because for example
the surface we are looking at is a pure plane, multiple eigenvalues will be rather
small. For this case only the plane flow or the line flow can be estimated locally,
as explained in the section 4.1.5. The normal flow is determined by taking the
78
4.1. Optical Flow and Range Flow
eigenvectors of either all vanishing eigenvalues or of all non-vanishing eigenvalues
into account.
For line flow two eigenvalues vanish and the flow estimate calculated from the eigenvectors of the two non-vanishing eigenvalues is
 
  
e21
e11
1
 
  
(4.24)
fl =
e14 e12  + e24 e22  .
1 − e214 − e224
e23
e13
For plane flow three eigenvalues vanish and the flow estimate calculated from the
eigenvector of the single non-vanishing eigenvalue is
 
 
e11
e11
e14  
e14
 
fp =
=
(4.25)
e
 12 
e12  .
1 − e214
e211 + e212 + e213
e13
e13
The equality 1 − e214 = e211 + e212 + e213 above holds true, because we optimized under
the requirement pT p = 1, which every eigenvector needs to fulfill.
Spies [Spi01] derives the general formula for minimum norm solutions† of TLS range
flow estimates which is given by
T
T
Pn
Pq
i=q+1 ein ei1 . . . ei(n−1)
i=1 ein ei1 . . . ei(n−1)
P
Pn
=
,
(4.26)
f=
2
1 − qi=1 e2in
i=q+1 ein
where q is the number of non-vanishing eigenvalues of a n × n structure tensor
S. The left expression calculates the flow based on the vanishing eigenvalues (and
corresponding eigenvectors), while the right one does it based on the non-vanishing
eigenvalues.
4.1.6.2 Regularization of Local Flow Estimates
If we want to estimate the full flow for a neighborhood that allows only plane or
line flow to be determined locally, we needs to make further assumptions, which give
further constrains, in a global sense. This is typically done in a variational framework
†
With respect to TLS also the full flow estimate is a minimum norm solution as the structure tensor
is rank deficient for all flow types. Only if the model assumptions are violated the structure tensor
is of full rank.
79
Chapter 4. Motion Estimation
that uses a data and a smoothness term, which together make up an energy that
is to be minimized globally. The data term derives itself from constraints (or vice
versa) of a kind like those we used for the local TLS estimation. The smoothness
term is motivated by assumptions of global nature, like for example that the motion
of a rigid object is smooth.
Spies and Garbe [SG02] present a variational approach that is based on the local
TLS-estimates. Restated for our problem, the regularized motion vector v̂ is found
as the solution to the following minimization problem
v̂ = argmin
Z v
ωc (P v − f )2 + α
A
3
X
(∇vi )2 dxdy
(4.27)
i=1
The projection matrix P projects the sought parameter vector p on the solutionsubspace to which the minimum norm solution f of the local TLS estimate belongs,
such that solutions v that are distant to f w.r.t. this subspace increase the cost
within the objective function. The parameter ωc describes the confidence in the local
TLS-solution, and will be defined in the next section. The parameter α controls the
overall smoothness of the regularized flow field v̂(x, y); it controls the influence of the
smoothness term which penalizes flow fields that have large gradients in the vector
components, i.e. are not smooth.
The projection matrix P is calculated from the reduced eigenvectors bi of S (which
are a basis of the minimum norm solution (4.26)):
 
ek1
1
 
T
(4.28)
P = Bq Bq , Bq = [b1 . . . bq ] , bk = qP
ek2  ,
3
2
e
i=1 ki ek3
where q is the number of non-vanishing eigenvalues. For further details on such
(TLS) regularization techniques we refer to the work of [SG02; GHO99].
4.1.6.3 Performance Issues
As S is real and symmetric, the eigenvalues and -vectors can easily be computed
using the Jacobi eigenvalue algorithm (Jacobi rotations), which has the advantage to
be inherently parallel (and therefore a parallel implementation is possible [GL96]).
However, the method calculates all 4 eigenvalues and corresponding eigenvectors of
80
4.1. Optical Flow and Range Flow
the 4×4 matrix, while for a full flow it would be sufficient to calculate the eigenvector
to the smallest eigenvalue. This implies that there is potential for a reduction of the
computational effort.
Barth [Bar00] shows that the minors of the structure tensor of optical flow (a 3 × 3
matrix) can be used to calculate a flow estimate with only a fifth of flops needed
for conventional structure tensor analysis. However, its questionable if the algorithm
can be efficiently extended for 4 × 4 matrices. Moreover, no normal flows may be
calculated with this method (as it assumes only a single eigenvalue to vanish).
If not a complete eigenvalues analysis is of interest, then partial total least squares
(PTLS) [VHV91] may be used to directly calculate the minimum norm solution.
Depending on the structure of the data a performance increase of up to 50% compared
to a complete eigenvalue analysis of the structure tensor seems achievable.
Computational most expensive however, is the regularization of the local flows with
a variational method, if there is an aperture problem in the local neighborhood. The
aperture problem can partially be solved by taking the amplitude information of the
PMD-signal into account, as will be discussed in section 4.1.8. This increases the
density of full flow estimates that can be achieved by the local method. If the further
processing depends on a dense flow field, one may calculate the regularized flow field
on a (spatially) downsampled grid, because the spatial resolution of a flow estimate
is due to the employed aperture (to overcome the aperture problem) always lower
than the original resolution of the data. The original resolution may be regained
after regularization, by a cheap bicubic or B-spline interpolation on the original grid.
A further speed improvement could be achieved by using multigrid methods similar
to those Bruhn et al. [Bru+06] employed for accelerating various variational optical
flow techniques.
4.1.7 Confidence and Type Measure
So far we explained how to calculate a flow estimate appropriate to the specific
neighborhood. But we also need to decide which neighborhood exists so we may
choose the right flow type. Furthermore, we would like to get a measure for the
likelihood of the estimate. For this we may exploit the spectrum of the structure
tensor S.
81
Chapter 4. Motion Estimation
The smallest eigenvalue λ4 of S corresponds to the residual of the TLS estimate.
Therefore, if it is large relative to the noise level of the data samples kd of equation (4.14), it indicates a violation of the model assumptions and we need to reject
the estimate. This may be accomplished by introducing a threshold τ which λ4 must
not surpass if the flow estimate shall be accepted. Spies, Jähne, and Barron [SJB00]
propose a confidence measure based on λ4 :

0
if λ4 > τ (or tr(S) < η)


2
ωc =
.
(4.29)
τ − λ4


otherwise
τ + λ4
Since the trace of the structure tensor S is essentially the sum over the squared
magnitude of the spatiotemporal gradients in the neighborhood and the trace of a
symmetric matrix is rotation invariant, tr(S) is a measure for structure in the data
independent of its orientation. This is why in [SJB00] pixels are excluded from
a further eigenvalue analysis if the trace falls below a specific threshold. While
this is reasonable for the optical flow, this is not necessarily so for range data; the
plane flow can be calculated for any kind of neighborhood, as depth information is
an unambiguous feature of an observed object (compared to brightness information
that is somehow fuzzy w.r.t. its significance to describe features of an object).
The author proposes to use the amplitude information A of the PMD-signal to exclude pixels from a local flow estimate, because only if the range information itself
is not reliable a flow estimate seems unreasonable.

if λ4 > τ or A < κ
0


.
(4.30)
ωc (λ4 , τ ) =
τ − λ4 2


otherwise
τ + λ4
Only if we are not interested in the plane flow, we might use additionally or alternatively the trace of S to reject flow estimation. If the trace is small or equal compared
to the noise level in the data, one might use it, without further analysis, to assume
a plane flow which is approximately zero parallel to the optical axis (i.e. Z), such
that the subspace of a probable flow vectors is the plane spanned by velocity vectors
perpendicular to the optical axis (i.e. (U, V, W = 0)).
The confidence measure is 1 if there is no residual in the estimate and quickly decreases to zero towards λ4 = τ as depicted in figure 4.6. Because least squares
estimation is a maximum likelihood estimator if the model assumptions (Gaussian
noise, constant flow) are met, the residual (or λ4 ) corresponds to the width of the
82
4.1. Optical Flow and Range Flow
0.8
(
)
0.6
(
)
0.4
ωc λ 4 , τ
ωt λ q, τ
0.2
0
τ
λ 4, λ q
3τ
Figure 4.6: Confidence and type measure for range flow
likelihood function and is therefore also a measure for the likelihood of the estimate.
However, because a likelihood function is generally not normalized, the author doubts
that ωc is a consistent measure of the likelihood over the various structures that an
image sequence may exhibit.
The type measure [SG02] allows to measure how well the dimensionality of the
nullspace of S, i.e. the space spanned by the eigenvectors of vanishing eigenvalues, has been determined. This can be done by examining how much the smallest
non-vanishing eigenvalue (λp > τ ) is above the noise-level dependent threshold τ . A
normalized measure for each encountered type then is
λq − τ 2
ωt (λq , τ ) =
,
(4.31)
λq
The type measure ωt depicted in figure 4.6 increases slowly from ωt (τ, τ ) = 0 converging to 1 in the limit of λq → ∞.
4.1.8 Combining Range and Intensity Data
Apart from the range information discussed so far, the PMD sensor also returns
light intensity information that is proportional to the amplitude of the backscattered
modulated illumination. Do not confuse this amplitude intensity with the intensity information of an ambient illumination, as is typical for conventional video systems: this intensity signal is directly related to the distance between sensor and reflecting surface.
As for common optical flow we can derive a constraint on the solution from this
kind of intensity information. This intensity constraint deviates from the classical BCCE (4.3) in that it depends on the depth coordinate. Haußecker and Fleet [HF01] showed that the classical BCCE can be augmented such that the brightness can change according to a parametrized time-dependent function h:
g(p(t), t) = h(g0 , t, l) ,
where g0 denotes the gray value at time t = 0 and l represents the parametrization.
p(t) is the point of interest as defined in equation (4.1), but we substitute v for f
to differentiate between the optical flow v and the Cartesian 3D-flow f . The total
derivative in time on both sides yields the generalized brightness change constraint
equation:
$$\frac{d}{dt}\, g(p(t), t) = g_x u + g_y v + g_t = (\nabla g)^T \cdot v + g_t = \frac{d}{dt}\, h(g_0, t, l).$$
Substituting the flow vector f for l and modeling the dependence of brightness on
distance according to a power law we find
$$h(g_0, t, f) = g_0 \cdot \left( \frac{|r_0|}{|r(t, f)|} \right)^{a} , \qquad (4.32)$$
where $r(t, f) = r_0 + f\, t$ is the point of interest in Cartesian camera coordinates and $r = |r|$ the radial distance measured by a PMD-camera. For a = 2, for example, equation (4.32) corresponds to the inverse square distance law of a point light source.
Differentiation with respect to t and approximation by a first order Taylor series
valid for small t yields:
$$\frac{dh}{dt} \approx -a\, g_0\, \frac{r_0^T}{r_0^2}\, f ,$$
where $r_0 = [X\; Y\; Z]^T\big|_{t=0}$.
Taking into account the necessary coordinate transformation from sensor to camera coordinates and renaming g0 to A(x, y) (the measured amplitude at a specific pixel [x, y]), the extended BCCE is found, analogously to the RFCE (4.10) and (4.14), as
$$\left[\; \frac{\partial(A, Y)}{\partial(x, y)} + \frac{a\, A\, X}{r^2},\;\; \frac{\partial(X, A)}{\partial(x, y)} + \frac{a\, A\, Y}{r^2},\;\; \frac{\partial(Y, X)}{\partial(x, y)} + \frac{a\, A\, Z}{r^2},\;\; \frac{\partial(X, Y, A)}{\partial(x, y, t)} \;\right] \cdot \tilde{f} = 0 \qquad (4.33)$$
This equation, in contrast to the classical BCCE, constrains all three velocity components and takes into account the brightness change due to a change in the radial
distance. Analogous to equation (4.12), it may be transformed to a formulation where
the derivative filters are applied directly on the measured range values r(x, y) using
the pinhole camera model (4.11)
$$\begin{aligned}
0 = A_t &- \frac{r_t\, e\,(A_x x + A_y y)}{f^2 r + e\,(r_x x + r_y y)} \\
&+ U \cdot \left( x\, d + \frac{A_x (f^2 + x^2) + A_y\, x\, y + \frac{y\, b\, e}{r}}{c} \right) \\
&+ V \cdot \left( y\, d + \frac{A_y (f^2 + y^2) + A_x\, x\, y - \frac{x\, b\, e}{r}}{c} \right) \\
&+ W \cdot (f \cdot d)
\end{aligned} \qquad (4.34)$$
where
$$b := A_x r_y - A_y r_x , \qquad d := \frac{A\, a}{r \sqrt{e}} , \qquad c := \sqrt{e}\,(r_x x + r_y y) + \frac{f^2 r}{\sqrt{e}} , \qquad e := x^2 + y^2 + f^2 ,$$
with power law exponent a and focal length f.
The exponent a may depend on r itself. While we would expect it to be constant, a = 2, for a Lambertian scatterer, radiometric analysis of the PMD19k showed that it depends on depth (see section 5.2.3). Therefore, if the observed depth range is large, it may be necessary to replace a by a(r). The function can be determined by a radiometric calibration of the specific camera.
While the above equation looks rather complicated and computationally costly, it contains various terms, like for instance e or f² + x², that are constant over time and therefore need to be calculated only once. And while we need to calculate 9 different derivatives by convolution (expensive compared to multiplication and addition) for the Jacobians in equation (4.33), only 6 are needed for equation (4.34). Furthermore, if we are interested in the motion field only, an explicit calculation of the Cartesian range coordinates is not necessary anymore.
The outer form and the sought flow f of equation (4.33) are identical to those of equation (4.14). Thus we can combine both information channels to estimate f in the manner of equation (4.16):
$$\tilde{f}^{T} (S_R + \beta\, S_A)\, \tilde{f} \;\longrightarrow\; \min \qquad \Longrightarrow \qquad S\, p = \lambda\, p, \quad \text{with } S = S_R + \beta\, S_A , \qquad (4.35)$$
where the subscripts R and A denote the range and amplitude based extended structure tensors. β is used to weight the two data channels with respect to possible differences in signal-to-noise ratio or other reasons that affect confidence in the data. This way both data
channels can be joined in a single, combined structure tensor on which the eigenvalue
analysis is applied. Thus we can increase the accuracy of the flow estimate, because
we take more samples into account for a single estimate. And, what is even more important, we increase the probability of locally estimating a full flow, because the additional structure of the intensity channel may solve the aperture problem.
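A minimal sketch of this combination, assuming the per-pixel 4×4 structure tensors S_R and S_A have already been accumulated (the array layout, the default β and the use of numpy's symmetric eigensolver are illustrative assumptions, not the thesis implementation):

import numpy as np

def combined_tls_flow(S_R, S_A, beta=0.3):
    # Joint eigenvalue analysis of S = S_R + beta * S_A (eq. 4.35).
    # eigh returns eigenvalues in ascending order; the eigenvector belonging
    # to the smallest eigenvalue is the TLS parameter vector p (up to scale).
    S = S_R + beta * S_A
    lam, vec = np.linalg.eigh(S)
    p = vec[..., :, 0]
    return lam, p

# A full flow (U, V, W) follows from the homogeneous parameter vector
# p = (p1, p2, p3, p4) as f = p[:3] / p[3], provided p[3] is not close to zero.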
4.1.9 Equilibration
So far we were a little bit sloppy about the noise in the data. We assumed it to be
i.i.d. in the components of the data vector d. If we assume the di (i = 1 . . . 4) to be
i.i.d. random variables with variance σ², then their covariance matrix is simply
$$\mathrm{Cov}(d) = \sigma^2 I$$
and the structure tensor ⁿS subject to noise is approximately given by‡
$${}^{n}S \approx S + \mathrm{Cov}(d) = S + \sigma^2 I ,$$
where S is the ideal structure tensor given there is no noise in the data matrix D.
Under this premise the determined eigenvectors of ⁿS are identical to those of S, because the addition of the scaled identity matrix only affects the eigenvalues but not the eigenvectors of the matrix. Therefore a structure tensor corrupted by i.i.d. noise
yields an unbiased flow estimate if the model assumptions are met.
Unfortunately, in general the noise in the data vector entries is neither independently nor identically distributed, but correlated and of different variances, as a glance at
equations (4.10) and (4.33) suggests and as it was discussed in [FPA99]. We will not
handle the case of correlated noise in the data vector elements but rather stick to
the less involved case of different variances and refer to [MM01; Müh04] for details
on this topic.
If the errors in the data entries are uncorrelated and of zero mean but different
variance, then the covariance matrix is given by
$$\mathrm{Cov}(d) = \operatorname{diag}(\sigma_i^2) \qquad \text{and} \qquad {}^{n}S \approx S + \operatorname{diag}(\sigma_i^2) .$$
The eigenvectors of ⁿS are no longer identical to those of S but biased by the noise variance (for details see [VHV91]). We can correct for this if we multiply the data
‡ Only in the limit of an infinite number of data samples does the relation become an identity.
matrix D by a right-hand equilibration matrix W, which is just the square root of the inverse covariance matrix, i.e.
$$W\, W^{T} = \mathrm{Cov}(d)^{-1} = \operatorname{diag}(1/\sigma_i^2) .$$
The equilibrated data matrix is then given as
$${}^{e}D = D\, W = D\, \operatorname{diag}(1/\sigma_i) ,$$
i.e. the columns of the data matrix are weighted by the inverse of the respective standard deviation. The covariance matrix of the respective data vector ᵉd is then the identity matrix. This is easy to see if we model the equilibrated data vector entries as
$${}^{e}d_i = \frac{d_i + n_i}{\sigma_i} ,$$
where the dᵢ are the unperturbed entries and nᵢ a noise term of variance σᵢ². The expectation values of the structure tensor entries Sᵢⱼ are then
$$\langle {}^{e}d_i \cdot {}^{e}d_j \rangle = \frac{\langle d_i d_j + n_i n_j + d_i n_j + n_i d_j \rangle}{\sigma_i \sigma_j} = \frac{\langle d_i d_j \rangle + \langle n_i n_j \rangle + \langle d_i n_j \rangle + \langle n_i d_j \rangle}{\sigma_i \sigma_j} .$$
Because the noise terms nᵢ are of zero mean and uncorrelated, all expectation values containing a noise term nᵢ vanish except for ⟨nᵢnᵢ⟩ = σᵢ², and therefore the covariance is just the identity matrix.
Thus we come to an unbiased eigenvector estimate. However, we need to scale the found parameter vector entries ᵉpᵢ of the equilibrated structure tensor to find the parameter vector that belongs to the original (unequilibrated) problem:
$$p_i = {}^{e}p_i / \sigma_i .$$
A full flow vector is then given by $f_i = {}^{e}p_i\, \sigma_4 / ({}^{e}p_4\, \sigma_i)$, for i = 1…3.
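As a sketch of the equilibration for a single neighborhood, assuming the data matrix D (one row per sample, one column per data vector component) and the noise standard deviations sigma are known (all names are illustrative, not the thesis implementation):

import numpy as np

def equilibrated_full_flow(D, sigma):
    # Right-hand equilibration: scale the columns of D by 1/sigma_i, build the
    # equilibrated structure tensor, solve the TLS problem, undo the scaling.
    De = D / sigma                 # equilibrated data matrix eD = D diag(1/sigma_i)
    S = De.T @ De                  # equilibrated structure tensor
    lam, vec = np.linalg.eigh(S)
    pe = vec[:, 0]                 # eigenvector to the smallest eigenvalue
    p = pe / sigma                 # back-scaling: p_i = ep_i / sigma_i
    return p[:3] / p[3]            # full flow f_i = p_i / p_4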
Finding the correct σᵢ for the data vector is not trivial. For example, the 4 data vector entries of the extended BCCE (4.34) correspond to 4 nonlinear scalar functions dᵢ(v) in the random variables v, being A, r and their derivatives. It should be possible to calculate the variance of the resulting random variables from (see [Jäh04, chap. 7.3.2])
$$\sigma_i^2 \approx (\nabla d_i)^{T}\, \mathrm{Cov}(v)\, \nabla d_i ,$$
or, if we take d as a vector valued function d(v),
$$\mathrm{Cov}(d) \approx J\, \mathrm{Cov}(v)\, J^{T} ,$$
where the Jacobian matrix J (of d w.r.t. v) is to be taken at the expectation value of
d. The covariance matrix Cov(v) needs to be determined from the filter-masks of the
employed derivative filters and an estimate about the noise level of the PMD-signals
A and r.
However, to speak of an expectation value of d within a necessarily structured and therefore not constant spatiotemporal neighborhood is dubious. The underlying pdf's are multimodal. Because (σᵢ) needs to be defined only up to a scaling factor, a normalization of the involved functions might help to solve the problem. However, the author is doubtful about how to handle this topic and a final examination is still outstanding.
Therefore we determined the equilibration factors empirically. We used a test pattern subject to Gaussian noise of a specific variance σ² as input and determined the resulting noise σᵢ in the data elements dᵢ of the RFCE. The constant equilibration factors are then 1/σᵢ, because the input noise is only a scaling factor that can be ignored. This approach is, however, dependent on the specific test pattern and does not consider the local structure of real data. Therefore the results presented in this thesis could still be improved w.r.t. equilibration.
4.2 Motion Artifacts
Motion artifacts are a PMD-specific error that occurs around moving reflectivity or distance edges. In figure 4.7 we see two cylinders rotating about their symmetry axis. The cylinders are painted in black and white such that regions of distinctly different reflectivity lie side by side. Around these moving reflectivity edges we identify severe errors in both the amplitude and the range image. The artifact regions have a clear border; it is not blurred as in the case of motion blur of conventional cameras. The faulty distance measurements span the whole unambiguity range of the sensor.
Motion artifacts are a technical weakness of current PMD sensors. The four cross-correlation samples In = I(θn) (2.9), which are used to calculate the range estimate (in a least squares sense) based on equation (2.12) (for sinusoidal modulation), are not acquired in parallel (i.e. at the same time) but serially. The technical details of the various camera models differ, but no commercially available camera model known to the author acquires all 4 samples at once. The PMD19k and the O3D
Figure 4.7: Motion artifacts at an irradiance edge. The rotating cylinders show distinct errors around the step edges in reflectivity: a amplitude image, b range image.
(see figure 5.4) take 2 samples at a time§, while the SR3000 is documented to take only 1 sample at a time. Therefore, the sensor pixels take cross-correlation samples of different surface patches if the observed surface is moving. Depending on the reflectivity texture and the spatial structure of the surface, the calculation (2.12) might yield results that are very different from an averaged measurement (an average would be the "best" possible measurement in such a situation).
Starting from equation (2.10c)
$$I(\theta) = m\, T \left( \frac{A}{\pi} \cos(\varphi + \theta) + \frac{G_0}{2} \right),$$
we might model the correlation samples In as being proportional to
$$I_n \sim \mathrm{mag}_n \left( C \cos(\varphi_n + \theta_n) + 1 \right), \qquad (4.36)$$
where magₙ is proportional to the irradiance at the pixel and C is the modulation contrast of the illumination. If we assume that two samples of the cross correlation, In and In+2, are taken at the same time, then the phase φₙ and magnitude magₙ are identical for the samples at θn = θ + nπ/2 and θn+2 (the addition in the index n is meant modulo 4).
With the above equation we can easily calculate the results for different kinds of edges (reflectivity and/or distance edges) if we substitute magₙ and φₙ with values of our choice.
§ The two samples correspond to the outputs A and B of figure 2.3, which are phase shifted by 180°. Ultimately, the sensor needs to take a total of 8 samples (i.e. 4 snapshots) to compensate for manufacturing variations in the capacities of gates A and B.
The results can be calculated from equation (2.12) or equally by doing a least squares fit of I(mag, φ, C, θ) = mag (C cos(φ + θ) + 1) to the sampling points In at θ = θn (as this is exactly what (2.12) corresponds to).
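The mechanism can be reproduced with a few lines; the sketch below assumes the sample model (4.36), that the 0°/180° samples stem from one surface patch and the 90°/270° samples from another (as for a two-tap sensor), and that the least squares solution for sinusoidal modulation reduces to the usual four-phase arctangent; the modulation contrast C = 0.5 is an arbitrary illustrative value:

import numpy as np

def artifact_phase(phi1, mag1, phi2, mag2, C=0.5):
    # Correlation samples according to eq. (4.36) at theta_n = n * pi/2:
    # samples 0 and 2 from population (phi1, mag1), samples 1 and 3 from
    # (phi2, mag2), as when two taps are read out at a time.
    theta = np.arange(4) * np.pi / 2.0
    phi = np.array([phi1, phi2, phi1, phi2])
    mag = np.array([mag1, mag2, mag1, mag2])
    I = mag * (C * np.cos(phi + theta) + 1.0)
    # four-phase demodulation (the least squares solution for sinusoidal modulation)
    return np.arctan2(I[3] - I[1], I[0] - I[2]) % (2.0 * np.pi)

# reflectivity edge: phi1 == phi2 but mag1 != mag2 already biases the result;
# distance edge: additionally phi1 != phi2, with magnitudes following 1/r^2.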
[Plots: a cross-correlation samples (0°/180° and 90°/270°) and least squares fit at a reflectivity edge (1:3) at 3m distance, with fitted phase τ_fit and magnitude mag_fit; b the same at a distance edge (1m/3m), inducing an irradiance change of 9:1, additionally showing τ_mean.]
Figure 4.8: Illustration of how motion artifacts occur due to a least squares fit to
sampling points of different populations at a a reflectivity edge and b at a
distance edge (where the irradiance was modeled according to the inverse
square power law and constant reflectivity)
Figure 4.8 shows how the least squares fit produces the motion artifacts. τ1 in 4.8a is the phase corresponding to the distance of 3m at the reflectivity edge. τ_fit (≡ φ) is the phase calculated from the fit. Notice that a positive phase/distance τ1 induces a shift of the correlation function to the left, such that the maximum of the correlation function is not at τ1 but at 2π − τ1. Figure 4.8b shows how arbitrary such a least squares fit can be if the used samples stem from two different populations. The calculated value τ1 (near zero) is far from the mean value at 2m.
Figure 4.9 gives insight into the quite interesting structure of the motion artifact errors
at reflectivity edges: a plots the calculated range against the phase (or distance) and
the ratio of the two irradiance magnitudes at the edge. b shows the resulting error in
meters. c and d are analogous but show the calculated amplitude. Most interesting
is that there are distances where hardly any error occurs.
Figure 4.9: Insight into the structure of motion artifacts at reflectivity edges: a phase at a gray value edge, b error in range, c amplitude at a gray value edge, d relative error in amplitude
Figure 4.10 shows the even more complex structure of the errors at distance edges. Again the irradiance was modeled according to the inverse square power law, while the reflectance of the surface was assumed to be constant. The relative error depicted in figure 4.10b is defined as the quotient |φ_fit − φ_mean| / |φ1 − φ2|, where φ1, φ2 are the phases (or distances) at the edge and φ_mean their mean value. Also at distance edges we can find phase regions where the motion artifact error is less prominent.
Figure 4.10: Insight into the structure of motion artifacts at distance edges: a calculated range, b relative error in range
We showed the plots above because they may be of interest for the planning of experimental setups that incorporate PMD-cameras. If there is a degree of freedom in the choice of the distance or reflectivity range that one plans to use for the setup, one might reduce the errors introduced by motion artifacts by choosing the distance and/or reflectivity according to the regions in the above (or analogous) plots where the errors are less dominant.
If we have a sensor that is rather ideal in its correlation measurements, we might detect the presence of motion artifacts by exploiting the symmetry of the correlation signal: for an ideal correlation signal (2.9) the differences in the correlation samples D1 = I0 − I1 and D2 = I3 − I2 should be identical. Therefore we might just test whether D = D1 − D2 < τ, where τ is a noise dependent threshold. If D is above the threshold, it indicates a motion artifact, and we might process the respective pixel in an appropriate way.
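A sketch of this test on the raw correlation images (I0–I3 as numpy arrays; taking the absolute difference and the threshold name are illustrative choices):

import numpy as np

def motion_artifact_mask(I0, I1, I2, I3, tau):
    # For an ideal correlation signal D1 = I0 - I1 and D2 = I3 - I2 are equal;
    # a large absolute difference indicates samples from different populations,
    # i.e. a motion artifact candidate.
    return np.abs((I0 - I1) - (I3 - I2)) > tau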
So far, we have not found a way to correct for the motion artifact errors properly. Actually, we think that the best way to solve this problem is technologically within the sensor, simply by taking all 4 correlation samples at once. While we know that this introduces other problems (like varying capacities of the necessary 4 readout gates) and might conflict with constraints on the manufacturing process, we think that it is essential for a robust and uncomplicated processing of sequences that image processes of (highly) dynamic content, e.g. structured objects that move rapidly.
Part II
Experiments and Applications
Chapter 5
Testbench Measurements
5.1 Experimental Setup
Most of the sequences we analyzed in the context of this thesis were acquired with an experimental setup that we will describe in the following and to which we will refer as testbench. The setup consists of two motor-driven linear positioner tables mounted on top of standard industry tables. The industry tables are mobile and can be locked with a spoke. Both positioner tables hold object carriers. One carrier holds a TOF-camera, the other a target.
Between camera and target a zig-zag shader of black photo pasteboard has been installed to avoid spurious reflections of the modulated IR light at the glossy positioner table rails; light emitted from the modulated illumination but reflected by other surfaces before it is reflected back by the target to the sensor pixels has to travel a longer distance than light of direct illumination. The phase information of the irradiated light on the sensor pixels would no longer be consistent and the range measurement would become corrupted.
We used two arrangements of the tables to acquire the image sequences: a linear arrangement and a T-shaped one. The linear arrangement depicted in figure 5.1 was chiefly used for the calibration measurements as it allows capturing a depth range of 6m, since each positioner has a range of 3m. The depth range acquired was approximately from 30cm to 630cm. Besides calibration measurements, a purely translatory motion in depth can also be acquired with this setup.
The second, T-shaped arrangement depicted in figure 5.2 was used for motion measurements. It allows acquiring sequences of motion in the plane within an area equal to that of an isosceles triangle with a 3m base and variable height (depending on the basic distance between the two perpendicular tables).

Figure 5.1: Linear arrangement of the positioner tables for calibration measurements

Figure 5.2: T-shaped arrangement of the positioner tables for motion measurements
Both camera and positioner tables are controlled by a standard PC. This allows a
full automation of the experiments (besides the changing of targets and cameras on
the carriers). The motor-driven tables can be moved stepwise or continuously. The position accuracy is at least 1mm. We used both stepwise and continuous mode to acquire motion sequences. The stepwise mode allows acquiring several images at one position and therefore doing a statistical analysis of the temporal noise in
the camera-signal and to separate it from the fixed pattern noise. Images may be
denoised by averaging over a number of snapshots. Moreover, motion sequences
taken in step mode do not suffer from PMD-specific motion-artifacts and motion
blur. Sequences taken in continuous mode are therefore more realistic because they
show these artifacts. By noting down the basic distance between target and camera,
which we defined as the shortest distance between a point on the camera casing and
a point on the target, we have ground truth information about the relative position
of the target to the camera.
We used three different targets shown in figure 5.3. Two of them, whiteboard and
checkerboard target, have a plane surface and were used mostly for calibration. The
whiteboard target consists of 8 photo-cards of a specified reflectivity of 84%. The
checkerboard target is made of patches of photo-cards of 4 different reflectivities:
84%, 50%, 25% and 12.5%. For a more detailed description of the hardware used in
the context of calibration we refer to [Rap07].
Figure 5.3: Targets used for the experiments: a whiteboard target of high-reflectivity
b checkerboard target and c pyramid target
The third target is a pyramidal one. It consists of five square wooden plates, stacked centered on top of each other. The bottom plate has a width of 20cm, the top one of 4cm; in between, the width decreases by 4cm per plate. The depth of each plate is 2cm. The pyramid target was used for motion estimation.
We acquired sequences with three different TOF-camera models. An overview of their technical specifications is given in figure 5.4. All three models are based on the principles of optoelectronic modulation-based time-of-flight measurement that we presented in section 2.2, and to which we refer as PMD-technique. While the acronym PMD (photonic mixer device) relates to a specific realization of this technique protected by patent (held by PMDTechnologies GmbH), we still use it for all similar realizations as there is no other common acronym for this technique. The PMD[vision]® 19k and O3D both use sensors from PMDTec, while the SR3000 is a development of CSEM/MESA. Little is known about the details of the SR3000 sensor; the main difference seems to be that the SR3000 is a single-tap sensor, i.e. it has only one storage site (i.e. capacity) per pixel, while the sensors of PMDTec have 2 storage sites per pixel. For various reasons, mainly the PMD19k data was used in the context of this thesis. While the PMD19k is the oldest model, it has a reasonable
                          PMD19k (PMDTec)      SR3000 (MESA)       O3D (IFM)
Resolution [Pixels]       160 x 120            176 x 144           64 x 50
Pixel Dimensions [µm]     40 x 40              40 x 40             100 x 100
Focal Length [mm]         12.0                 8.0                 8.6
Light-Source              2 LED Arrays         1 LED Array         1x LED Array, Laser
λ [nm]                    870                  850                 850
ν [MHz]                   20                   20, variable        20, variable
Frame rate [FPS]          1-15                 10-20               1-50
Connection                FireWire, Ethernet   USB 2               Ethernet
Dimensions LxWxH [mm]     220x210x55           60x50x65            55x45x85

Figure 5.4: Overview of the technical specifications of the used TOF-camera models
resolution compared to the O3D. Furthermore, we needed to get access to the raw (or correlation) data of the cameras, and not only to the depth, amplitude and offset data that have been calculated by the driver software of the camera. While there is good documentation on how to access this data for the PMD19k and the O3D, nothing similar exists for the SR3000. While we could get access to the raw data of the SR3000, we found that a depth map calculated from this data shows major systematic errors, which could not be corrected without further knowledge about the internal details of the camera. Because MESA would not give us any information about how to process the data, we worked preferentially with the PMD19k.
5.1.1 Power Budget
Before we discuss the testbench measurements of the PMD-signal, we need to introduce the relevant radiometric terms and state the power budget for the PMD-camera
system. We will give a concise description of results and refer for details on this topic
to [Sch03, chap. 2.1.1] and [Lan00, chap. 4.1].
We are interested in the incident radiant energy Q(r) on a PMD-pixel within the exposure time T, as a function of the observed object obj, or rather the object surface patch, at a distance r from the camera. The active illumination i (the transmitting optic, e.g. a LED-array) is modeled as a light source emitting a radiant flux Φi in a solid
e.g. a LED-array) is modeled as a light source emitting a radiant flux Φi in a solid
angle Ωi . The light is reflected from an object obj with a surface that is assumed
to be Lambertian. Specular reflection is neglected. The reflected light is seen by
a sensor pixel p of size Ap . The PMD-camera’s receiving optic that projects the
reflected light of the surface patch on the pixel is characterized by the corresponding
solid angle Ωp of the pixel.
Given the above model assumptions, the following equations hold true and relate illumination i, surface patch obj and sensor pixel p to each other:
$$E_i(r) = \frac{\Phi_i}{\Omega_i\, r^2} \cos(\alpha_i)\, \exp(-k_a r), \qquad E_{obj}(r) = E_i(r) + E_b \qquad (5.1)$$
$$\Phi(r) = E_{obj}(r)\, A_{vir}(r)\, \eta(r)\, \varrho = E_{obj}(r)\, \frac{\Omega_p\, r^2}{\cos(\alpha_r)}\, \eta(r)\, \varrho, \qquad E_p(r) = \Phi(r)\, \frac{\cos(\alpha_r)}{\pi r^2} = \left(E_i(r) + E_b\right) \frac{\Omega_p\, \varrho\, \eta(r)}{\pi} \qquad (5.2)$$
$$Q(r) = E_p(r)\, T\, A_p \qquad (5.3)$$
where the used physical quantities are
Ei(r)     irradiance of active illumination on the scene/object at distance r [W m⁻²]
Φi        radiant flux of illumination / sending optic [W]
Ωi        solid angle of emitted radiant flux, characterizing the sending optic [sr]
αi        illumination angle: angle of the object's surface normal against the direction of illumination [rad]
ka        absorption coefficient (Lambert–Beer law) [1/m]
Eobj(r)   overall irradiance on the object, including background irradiance Eb [W m⁻²]
Avir(r)   area of a virtual sensor pixel (i.e. its projection/image on the object surface)
Φ(r)      reflected radiant flux of a virtual pixel [W]
αr        angle of reflection w.r.t. the sensor pixel [rad]
ϱ         diffuse reflectance of the object surface, i.e. 1 − absorption coefficient − specular reflectance
η(r)      fraction of exposed pixel-area (< 1 if a surface patch is smaller than a virtual pixel; corresponds to small/distant objects or object edges)
Ωp        solid angle of pixel, characterizing the receiving optic [sr]
Ep        irradiance seen by a sensor pixel [W m⁻²]
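For orientation, the power budget can be evaluated numerically; the sketch below implements equations (5.1)–(5.3) as reconstructed above, with purely illustrative parameter values (radiant flux, solid angles, reflectance, exposure time, pixel area are not taken from the thesis):

import numpy as np

def deposited_energy(r, Phi_i=1.0, Omega_i=0.5, alpha_i=0.0, k_a=0.0,
                     E_b=0.0, Omega_p=1e-9, rho=0.84, eta=1.0,
                     T=5e-3, A_p=(40e-6) ** 2):
    # (5.1) irradiance of the active illumination on the object
    E_i = Phi_i / (Omega_i * r ** 2) * np.cos(alpha_i) * np.exp(-k_a * r)
    E_obj = E_i + E_b
    # (5.2) irradiance seen by the sensor pixel (independent of r except via E_i)
    E_p = E_obj * Omega_p * rho * eta / np.pi
    # (5.3) radiant energy deposited on the pixel within the exposure time
    return E_p * T * A_p

# With E_b = 0 the energy scales with 1/r^2 through E_i only:
# deposited_energy(2.0) / deposited_energy(4.0) is approximately 4.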
We learn from equation (5.2) that the deposited radiant energy Q(r) does not depend on the distance from object surface to camera but only on the distance from object to light source, through Ei(r) (5.1). Because camera and illumination distance are approximated to be equal, Q still depends on the camera distance. Naïvely one might assume that Q(r) decreases not with r² but with (2r)². This is not the case, because the solid angle Ωp seen by a pixel is independent of r. Only if the observed surface patch is smaller than the virtual pixel size Avir(r), which is determined by Ωp, does the irradiance additionally decrease with η(r). Actually, this is quite obvious if we think of a scene with ambient (homogeneous) illumination (e.g. a landscape on a cloudy day): we will not perceive an object moving away as becoming darker until it is very small and begins to vanish (except for dust or foggy air, corresponding to a high ka).
Furthermore we note that Q is independent of the angle of reflection αr and only depends on the illumination angle αi, through cos(αi). If the illumination angle is approximately zero (i.e. the object surface is roughly perpendicular to the straight line illumination–object), then small changes in the illumination angle (due to e.g. a translatory motion) will change Q only marginally, because then cos(αi) ≈ 1 − αi²/2 ≈ 1.
So the radiant power or energy of the active modulated illumination that can be used to determine the distance r decreases with 1/r², while the irradiance Eb of the background illumination (e.g. from sunlight) stays constant. This implies rather high signal dynamics. Taking into account the dependence of the noise in the range signal on the background illumination, as discussed in section 2.2.2.2, we find here the most serious problems regarding the applicability of PMD-cameras.
5.2 Depth and Amplitude Analysis
The TLS motion estimation technique presented in chapter 4 depends on range as well as amplitude data of the PMD-camera. Because the flow estimate is essentially based on derivatives of the input data, it is not so much the absolute range value that is of major importance but rather the linearity of the range signal in depth (although there is also a dependence on the absolute range value in equations (4.14) and (4.34)). Therefore we need to do a calibration of the camera to arrive at a range measurement that is as linear in depth as possible. Rapp [Rap07] discussed some of the most important errors of PMD-cameras. We will only present results that are new compared to those in Rapp [Rap07].
5.2.1 Fixed Pattern Noise
The first thing we did was to correct for the temporally constant depth (or phase) error in each pixel, known as fixed pattern noise. We described the approach in section 2.2.2.1 and used equation (2.26) to calculate a depth image E of the fixed pattern noise of
our PMD19k model. Figure 5.5a and b show the noise image and the corresponding histogram of the errors in depth. The histogram is centered at zero and has a standard deviation of approx. 0.425cm. Thus approx. 95% of the pixel errors lie within a range of [−0.85, 0.85] cm, assuming a normal distribution of the errors.

Figure 5.5: Correction of fixed pattern noise: a noise image, b histogram of noise, c original and d corrected image of pyramid target at a distance of 86cm, e & f the same at 134cm. g–j same distances but for input images with a higher level of temporal noise.
The images 5.5c–f show how the subtraction of E from two depth maps of the pyramid target, taken at different distances (c: 86cm and e: 134cm), improves the depth accuracy (images d and f). The input images c and e were temporally denoised by averaging over a set of 100 images taken with an exposure time of 5ms. The images 5.5g–j show the same as c–f, but the input images g and i are single shots (not averaged) taken with 5ms exposure time. The images shown are cropped and scaled (in size and color range) versions of the original images to reveal the changes in detail.
While the fixed pattern noise correction improves the data in all cases, it is clear to see that for bad illumination conditions (or too short exposure times), as for g and even more for i, the temporal noise dominates and the improvements due to the correction for fixed pattern noise are marginal.
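A sketch of the correction step (the estimation of E via equation (2.26) is not reproduced here; the simple stand-in below assumes reference depth maps of a flat target are available and takes E as the mean per-pixel deviation from them):

import numpy as np

def fixed_pattern_image(depth_stack, reference_stack):
    # depth_stack, reference_stack: arrays of shape (n_frames, H, W);
    # illustrative stand-in for the calibration of eq. (2.26).
    return np.mean(np.asarray(depth_stack) - np.asarray(reference_stack), axis=0)

def correct_fpn(depth_frame, E):
    # Per-pixel subtraction of the fixed pattern (phase/depth offset) image E.
    return depth_frame - E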
5.2.2 Range Calibration
The sequences for range calibration were taken in step mode. Step mode means that we acquired, at a specific distance [target–camera], a subsequence of a specific number of frames (here 100) of the steady scene and then moved the target (and/or the camera) by a specified distance to take another subsequence. From each subsequence a mean image was calculated by taking the arithmetic mean in each pixel. Furthermore the variance in each pixel was calculated, giving a variance image.
For range calibration we took several sequences at different exposure times in a depth
range from 0.26m–6.22m, at 150 positions with a step-size of 40mm. The different
exposure times were necessary to have, for all distances, images that are neither under- nor overexposed, and to keep the noise level for the calibration rather low and constant. We used the mean images as input to the calibration.
Figure 5.6 shows the error in range measurement calculated from the ground truth
data and the whiteboard sequences. We tracked eight positions on the target over
the measured depth range using a pinhole camera model (4.11), with a focal length
of 12mm, as specified for the PMD19k. The positions are marked as magenta points
in figure 5.7. We interpolated the depth values at the tracking positions (these
are not necessarily on the pixel grid) using cubic interpolation. The ground truth
distance, denoted as range in the following plots, is the radial distance calculated
from the pinhole model using the known position of the targets. The difference
between tracked range values and ground truth distance is the range error shown in
figure 5.6 as crosses. The range error offset of approx. 0.62m is so large because the offset-calibration of the camera was erroneous (possibly due to a firmware upgrade).
Figure 5.6: Range error (crosses x) and approximation (line —) of range error by 3
modes of a Fourier series for 8 positions on the whiteboard target
Figure 5.7: Range image with the tracked positions (magenta dots) on the whiteboard
Based on the range errors we fitted a Fourier series equation (2.21) with only the
first, second and fourth harmonic. The first harmonic has a wavelength of 7.5m,
the unambiguity range of the camera. The approximation, the continuous lines in
figure 5.6, fits the error quite well. We used the approximative inverse function of
the range error (2.23) to correct for the periodic error in the depth measurements.
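A sketch of such a calibration with the first, second and fourth harmonic over the unambiguity range; the least squares fit via a design matrix and the first-order inversion (subtracting the error predicted at the measured range) are illustrative simplifications of the approximative inverse (2.23):

import numpy as np

L_UNAMB = 7.5                      # unambiguity range of the camera [m]
HARMONICS = (1, 2, 4)

def _design(r):
    cols = [np.ones_like(r)]
    for k in HARMONICS:
        w = 2.0 * np.pi * k * r / L_UNAMB
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)   # 7 coefficients: offset + 3 harmonics

def fit_range_error(r_measured, r_true):
    # Least squares fit of the periodic range error model to the measured errors.
    coeffs, *_ = np.linalg.lstsq(_design(r_measured), r_measured - r_true, rcond=None)
    return coeffs

def calibrate(r_measured, coeffs):
    # First-order inversion: subtract the error predicted at the measured range.
    return r_measured - _design(r_measured) @ coeffs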
Figure 5.8a shows the remaining error if the calibration is applied to the same data set from which the calibration coefficients were calculated. We notice that the approximation errors, indicated by the circles (o) all lying near zero, are marginal; the approximation error is the difference between the error in the Fourier fit (indicated by crosses (x)) and the error in the corrected (or calibrated) range measurement. Therefore the remaining error of the calibration (line —) is approximately equal to the difference between the Fourier fit and the original errors.
Figure 5.8: Remaining errors for 4 different tracked target positions after a applying
the calibration coefficients on the same position they were calculated from
and b using a single set of coefficients for the positions
Figure 5.8b shows the errors of the calibration, where we used a single set of Fourier
coefficients for the different tracked target positions. The errors were corrected for positions 1, 3, 5 and 7, two in the center of the target and two more peripheral, while the calibration coefficient set was calculated from position 2. The errors are clearly increased but still small compared to the original errors: the standard deviation σ for the set of 4 positions is 3.3mm at a mean error of 4.7mm, while σ of the uncalibrated errors is 15.1mm. The errors introduced by the approximative inversion are again rather small.
Actually, a calibration based on data from tracked points on the target is not optimal: we intermix the errors of various pixels and cannot completely separate the influence of a change in depth from that of a change in irradiance (even by compiling sequences of different exposure times). A more advanced experimental setup would allow changing the depth information at all pixels over the whole unambiguity range. Thus the need for tracking individual target positions would cease to exist, and the calibration, as well as the correction of the range errors, could be done per pixel. For each pixel an individual set of calibration coefficients could be calculated. As we need only 7 real numbers (1 offset and 3 complex Fourier coefficients) per pixel, the memory requirements would be rather low even for an embedded system.
5.2.3 Interdependence of Amplitude and Range
For the amplitude analysis and its interdependence with the range measurement we used the checkerboard target. Similar to above, we tracked 8 positions on the target, but this time we used a single sequence at an exposure time of 12ms. Figure 5.9 shows a range and an amplitude image of the checkerboard target and the tracked positions on it (the magenta dots).
Figure 5.9: Tracked positions (magenta dots) on the checkerboard:
a range image, b amplitude image
The most notable difference between the range image 5.9a of the checkerboard and the whiteboard target (figure 5.7) is the range variation of several cm within the smooth plane of the checkerboard target. Another interesting, though not intended, feature is the bright white spot between the 84% and 25% patch. It corresponds to a (partial) overexposure due to a defect in the checkerboard target: there is a small gap between the reflectivity patches, and the underlying metal plate reflects the light, in contrast to the patches, not (approximately) Lambertian but specular. While the irradiance is not so strong that the PMD-pixel capacities are all saturated (otherwise the amplitude would be zero), the partial saturation leads to a lower amplitude and a higher range measurement compared to the 84% patch above. To the right of the bright spot there is a similar defect in the target (between the 50% and 12.5% patch), but the gap is smaller and the specular reflection does not cause overexposure. Here both the range and amplitude measurements are increased relative to the neighborhood.
Looking at the range-error plot figure 5.10 we easily identify the correspondence
between reflectivity differences and differences in the range measurements, while
the ground truth distance has not changed. A lower reflectivity coincides with a
smaller range measurement. Between the 84% and 12.5% reflectivity patch there is a
difference of about 5cm. The reflectivity dependent range difference (RDRD) is quite
constant over the acquired depth range. However, near the camera and far away from
it, there are obvious variations in the RDRD, which we want to investigate a little
further.
Figure 5.10: Range measurement errors for various reflectivities and positions
Figure 5.11: Measured range over ground truth range for the tracked positions
For large distances the statistical errors in the range measurement for the low reflectivity patches increase particularly strongly due to underexposure, with an evident bias toward too small distance values. While we are not sure about the reason for the bias, which is even clearer to see in the range plot 5.11, we know that the increase in statistical error is due to the noise dependence of the PMD-signal that we described by equations (2.29) and (2.30) and illustrated in figure 2.6.
The variations in the RDRD near the camera can be exemplified best for the 50% center and border positions (light and dark green lines in figure 5.10). With increasing ground truth range, we find these positions to converge to a common measured range value (at approx. 2.4m) and to keep a constant RDRD relative to the 84% positions, which behave similarly.
Thus we not only have RDRD errors but also an explicit dependence of the range on the position (central or border). The reason for this are near-field errors of the camera, because its illumination system is neither point-like nor rotationally symmetric. Generally speaking, the assumptions we made for finding equation (5.3) are only approximately valid in the far field of an extended light source.
The PMD19k has two LED-arrays for illumination that are aligned along the horizontal axis (see the small picture in figure 5.4). Therefore, in the near field of the camera the illumination cannot be approximated as a point light source: while a border position pixel is irradiated by a mixture of light that traveled (essentially) two different distances, a central pixel sees light of only one distance. The demodulation is ideally a linear process, therefore the resulting distance is the arithmetic mean of
the, simply speaking, "phase information" of the generated photo electrons. Because the irradiance Ei (5.1) depends on the distance from the light source, there will always be more electrons in the mix corresponding to photons that took a shorter path. Therefore the border position pixels are biased toward too short distances in the near field of the camera. This is exactly what we see in figure 5.10, most pronounced for the 50% border position, which has a too small range (compared to the central positions) up to at least 2.2m.
Another reason for the RDRD variations is overexposure. While, for example, the dark green line (50% center position) is no longer in total overexposure at 1m, some of the correlation samples may still belong to (nearly) saturated capacities that corrupt the linearity of the demodulation, as we have explained in section 2.2.2.1 on page 27. Therefore the dark green line at approx. 1m or the dark blue line at 1.4m show too high range measurements.
This partial overexposure can be identified even better in the amplitude signal A(r) shown in figure 5.12. While total overexposure corresponds to zero amplitude, partial overexposure shows a varying amplitude. We marked the region of partial overexposure by reddish ellipses in figure 5.12. Looking at the log-log plot 5.12b, we see that the amplitude for the high reflectivity patches reaches a maximum, after which it decreases first slowly (small negative slope) and then faster for larger distances.
Figure 5.12: Amplitude signal A of the PMD19k: a amplitude over radial distance,
b log2-log2 plot of a. The reddish ellipses indicate the range of partial
overexposure.
The RDRD errors (including near-field errors and partial overexposure) are of the same magnitude as the periodic systematic distance dependent errors that we corrected in the previous section. Since they are constant only within a limited distance range, neither too near to nor too far from the camera, it is an open question whether and how they may be corrected.
The bluish ellipse marks a region where we can observe an unusual increase of the amplitude. However, this is only due to a weakness in the experimental setup and an imperfect tracking of the patches: the tracked positions run into the wrong patches, and the manufacturing defects in the target (shown in figure 5.9) induce, due to the specular reflection, an increased irradiance in the surrounding pixels. Hence we can ignore the measurements in the ellipse as corrupted.
Figure 5.13: The scaling exponent for the amplitude modeled locally by a power law
In figure 5.13 we plotted the scaling exponent a of the power law model (4.32) that we would like to use for motion estimation. We calculated the exponent as the first derivative of the log-log data by resampling the (binomially smoothed) logarithmic amplitude over range data (of the log-log plot 5.12b) using spline interpolation at evenly spaced sampling positions in the logarithmic domain. The derivative may then be calculated as the forward difference of the resampled amplitude measurements multiplied by the sampling frequency. This is necessary to compensate for the increased noise in the measurements at farther distances and to adapt to the scaling behavior w.r.t. the power law.
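The procedure can be sketched as follows (a generic cubic spline stands in for the spline interpolation actually used; the smoothing of the input data is assumed to have been done beforehand, and r must be strictly increasing):

import numpy as np
from scipy.interpolate import CubicSpline

def loglog_slope(r, A, n=200):
    # Resample log(A) over log(r) at evenly spaced positions in the log domain
    # and take forward differences divided by the step size. For A ~ Q ~ 1/r^2
    # this slope is -2, i.e. a = 2 in the convention of the power law (4.32).
    log_r, log_A = np.log(r), np.log(A)
    grid = np.linspace(log_r.min(), log_r.max(), n)
    resampled = CubicSpline(log_r, log_A)(grid)
    slope = np.diff(resampled) / np.diff(grid)
    return np.exp(grid[:-1]), slope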
Unfortunately the exponent is not constant, not even beyond the overexposure range in the far field of the camera. Nor is a approximately 2, as Q(r) (5.3) would suggest. The pale green diagonals in figure 5.12b indicate the resulting scaling exponent if we assume the measured amplitude A(r) to be proportional to Q(r). We suppose that the deviation is partially due to the demodulation contrast (see section 2.2.1.3), which is not independent of the deposited radiant energy Q(r), as discussed in [Lan00, chap. 5.2.4]. Therefore A(r) is not proportional to Q(r) (see equation (2.18)). Furthermore, if one excludes the range of (partial and total) overexposure, one might identify a kind of periodic modulation of the scaling exponent, similar to the periodic range error; but this is very speculative. A final evaluation does not seem possible within the limitations of the current experimental setup.
For similar reflectivities and not too severe under- or overexposure the scaling exponents are similar too (compare the blue (84% central), green (50% central) and magenta (84% border) lines within a range from 2.7–5.2m in figure 5.13). However, the uncertainty interval is rather high (the standard deviation is approx. 0.2). Therefore it is questionable whether a distance dependent scaling exponent a(r) (calculated from the data for a specific reflectivity or averaged over a reflectivity range) can essentially improve the results of motion estimation. So far, we have used only constant scaling exponents for motion estimation.
Chapter 6
Applications
6.1 Still Image Processing
We applied the new two state smoothing that we described in section 3.3 on various real world scenes and test patterns. A final evaluation is, however, still outstanding. Nevertheless, we want to present some intermediate results because they look promising.
In figure 6.1 you can see a (partial) GUI-snapshot of the image processing software
Heurisko we used to implement the two state smoothing. It shows different data sets
of a scene with a medium level of noise acquired by the PMD19k at 20ms exposure
time. The upper left image is the range map of the scene which is rich in fine
structures. There are errors in the distance data that are much higher in magnitude
than the regular noise, i.e. outliers (some of them are marked by the red boxes with
round edges). The observed distance range is from about 2.0–5.5m.
The image on the right shows an average over 10 frames that we used as a kind of "ground truth". However, while the noise is strongly reduced, the outliers are still present. They typically occur either due to under- or overexposure. For example, the wooden pillar in the very front of the scene reflects the light of the PMD partially specularly, which leads to overexposure and therefore wrong range estimates. In the
lower right of figure 6.1 you find the weight image we used for two state channel
smoothing. It is the square root of the amplitude image. We found the square root
(or alternatively the logarithmized amplitude) to yield better results than weighting
directly with the amplitude.
The lower left of figure 6.1 shows the result of channel smoothing using 50 channels
and a binomial 3 × 3 filter mask. Actually, the results look similar using 25 channels;
however, the very fine details of the image are then partially blurred. We notice that a large part of the outliers has been removed, the image is denoised, and the fine details of the image are preserved. The algorithm could, however, not (completely) remove the big faulty area on the wooden pillar. It is just too big to be treated as an outlier using a filter mask of size 3 × 3.

Figure 6.1: Two-State-Channel smoothing applied: original range image (dImg), arithmetic average over 10 frames (dMean), two state smoothed range image (dDec) and the used weighting image (dScAmp).
Simple binomial smoothing, depicted in figure 6.2a, is, besides the blurring of the object features, also quite problematic, because it also blurs the outliers and therefore introduces new errors in their neighborhood. Conventional B-spline channel smoothing, shown in figure 6.2b, removes fewer outliers because it is missing the additional confidence information from the amplitude data.
Figure 6.2: a Simple binomial smoothing introduces additional errors around outliers and blurs image features/details. b Conventional B-spline channel smoothing removes fewer outliers.
In figure 6.3 we depict a detail of the scene (marked by the central red box in figure 6.1) denoised with different methods. Particularly notice the thin line coming from the top (a cable): channel smoothing preserves all the details while binomial smoothing destroys them. The averaged range image b and the channel smoothing result d are very similar, but in d only 3 outlier pixels are left, while b is essentially identical to a w.r.t. outliers.
Figure 6.3: Details of the scene: a original with noise and outliers, b temporal average
over 10 frames, binomial smoothing (c) and channel smoothing with 50
channels (d) applied to a
Using the mean image as ground truth and the amplitude data as a confidence
measure we calculate a quality measure Q:
$$q_{ch} = \operatorname{clip}\!\left( \log \frac{\max\!\left((r - m)^2,\, 2.0\cdot 10^{-6}\right)}{(s_{ch} - m)^2},\ -3,\ 3 \right), \qquad Q_{ch} = \langle q_{ch} \rangle , \qquad (6.1)$$
where r, m and s are the pixel values of the original range image R, the mean image M and the channel smoothed image S. ⟨q⟩ is the arithmetic mean over all pixels that have an amplitude of more than 0.3, and clip(x, a, b) limits the summands to values in the range [a, b]. s_ch depends on the number of channels ch used for smoothing and therefore Q depends on ch too.
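A direct transcription of this measure (array names are illustrative; pixels where the smoothed image equals the mean image are not specially treated in this sketch):

import numpy as np

def quality_measure(R, M, S, amp, amp_min=0.3, floor=2.0e-6, lim=3.0):
    # Eq. (6.1): log ratio of the squared deviation of the original range image R
    # from the mean image M to that of the smoothed image S, clipped to [-3, 3]
    # and averaged over pixels with amplitude above amp_min.
    q = np.log(np.maximum((R - M) ** 2, floor) / (S - M) ** 2)
    q = np.clip(q, -lim, lim)
    return q[amp > amp_min].mean()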
Figure 6.4 shows how the quality measure Q grows with increasing channel number and reaches a constant level at approx. 60 channels. For larger filter mask sizes the quality measure looks similar. However, if one uses the same scene but with a higher noise level (because it was acquired with a shorter exposure time), the quality reaches a maximum somewhere around 55 channels and then slowly decreases. The quality for conventional channel smoothing is always below that of two state channel smoothing. The quality measure for binomial smoothing is equal to Q1 and less than 0.02.
Figure 6.4: Quality measure used to find optimal number of channels.
However, Q only partially captures the quality of the denoising, because the "ground truth" we have is not a real ground truth. Moreover, Q is somewhat heuristic, and the simple average (6.1) cannot really measure how well the details of the image have been preserved. Therefore the above analysis is preliminary.
6.2 Synthetic Test Sequences
In the following we will present results on synthetic image sequences, which correspond to our model assumption of a single, translatory motion. The simulated optical imaging is that of a pinhole camera. The range measurement is a perfect radial distance (except for noise we add to the data) to the simulated object surface. The amplitude information follows exactly a power law. We use the sequences as a proof of concept of our method. We illustrate its advantages and show what the requirements for its successful application are. We discuss possible extensions and limits.
Tabular Result Scheme We discuss the method using examples that try to capture the typical problematic situations in the context of 3D motion estimation. All examples are given in a fixed tabular scheme identical to that of figure 6.5. The content is as follows:
a, b two amplitude image frames of the sequence with frame index 3 and 5, to give
an idea of the motion. a contains a smaller subframe showing the range-image
(that reveals not that much information).
c,d,e show the errors in the velocity components U,V,W of the estimated motion
field. U,V,W are the velocity components in the direction of the Cartesian
coordinate axes X,Y,Z. The error is given relative to the single ground truth
velocity components. The value range covered is shown in a small colormap on
the left of the images.
f, g are the confidence and type measure of the corresponding pixels. The value
range is from zero to one.
Parameters the table describes the noise levels of the synthetic data and the parameters of the motion estimation algorithm:
• nlr and nlg are the noise level of the range and amplitude data. nlr is the
standard deviation σr of the added normal noise in cm. nlg is σa relative
to the contrast of the amplitude-signal.
• ps is the pre-smoothing level. The pre-smoothing for range data is a
normalized averaging with a binomial applicability of (2ps+1)×(2ps+1)
pixels. For the amplitude image standard binomial smoothing with the
same mask size was applied. No smoothing in time was done (this typically
decreases the quality of the estimate).
• ws is the size of the binomial integration window (ws×ws) of the structure
tensor.
• τ is the threshold of the confidence measure (4.30).
• τ2 is the minimum value that the second smallest eigenvalue must have
such that the estimate is assumed to be a full flow. It corresponds to a
threshold on the magnitude of the type measure (4.31).
• β is the weighting factor for the amplitude structure tensor equation (4.35),
i.e. the squared, global weighting factor for the rows of the data matrix
of the extended BCCE (4.33).
• wσ and wµ are the weights of the gradient based weighting (4.22) (only
given if applied)
Error analysis The table shows the density d of the estimated motion field and
statistics (mean, standard deviation σ, minimum and maximum) to the errors
of the estimate. The density d is defined as the number of pixels for which
a full flow could be estimated and which have a confidence of at least 0.5
divided by the number of pixels for which a ground truth flow exists, i.e. non
moving regions are excluded from the statistic. With the estimated flow vector
h
iT
f̂ = Û , V̂ , Ŵ
and ground truth flow f = [U, V, W ]T the given error types
are
• relative magnitude error: f̂ − f / |f |
f̂ h ·f h
, where f̂ h , f h are the homogeneous flow vec|f̂ h ||f h |
tors, i.e. the vectors extended for the temporal dimension, f h = [U, V, W, 1]T
(because the ”change” in time is by definition always 1)
• angular error: arccos
• directional error: arccos
f̂ ·f
|f̂ ||f h |
• absolute magnitude error: f̂ − f 116
6.2. Synthetic Test Sequences
• bias of estimate: (e · f )/ |f |, where e = f̂ − f . Thus this is the projection
of the error vector on the ground truth flow vector. It indicates a bias
(positive or negative) of the error in direction of the true flow.
Description to the right of the error analysis table there is a short description and
discussion of the most relevant features of the example.
Algorithms and Performance Issues All results presented in the following sections use derivative filters optimized for orientation estimation along edges (i.e. optimized Sobel filters, see also section 3.1.4 on page 41 and [Sch00]). This is reasonable
because motion estimation is basically spatiotemporal orientation estimation, as we
have shown before. For the spatial derivatives we use 5 × 5 filters (i.e. no smoothing
in time is applied), while we use for temporal derivatives a 3 × 3 × 3 filter, which
smoothes within the spatial domain.
The results were calculated using local TLS-motion-estimation as described in section 4.1.6. The local flow estimates were not regularized (see section 4.1.6.2). While
we also used the subspace regularization scheme developed by Spies and Garbe
[SG02], we found the improvements in quality compared to the increase in computational costs disappointing. The regularization of a flow field, with a full-flow-density
of about 10%, takes about 4–5 times longer than the calculation of the basic flow
field. Because regularization is implemented by iterative algorithms and may converge slowly (or not at all) the needed time also depends on the maximum number
of iterations allowed (we used 500). Moreover, we found the regularization results to
be very sensitive to parametrization and the structure of the processed sequences.
This is also the reason why we do not discuss the results for the estimated plane and line flow. These are only of interest if used within a regularization scheme. We
achieved reasonable densities in many cases without regularization by choosing the
thresholds τ and τ2 appropriately.
We achieved about 20 flow-field frames/sec using a single-threaded implementation on a 2.4GHz Intel Core 2 processor. The frames had a size of 160 × 120. The implementation is not optimized for avoiding unnecessary operations but rather to be as flexible as possible (for large parts of the implementation the high-level image-processing language and library Heurisko was used). This is why the author expects a possible increase in frame rate of about a factor 3 just by avoiding unnecessary operations that are trivial to detect. An elaborate analysis of the mathematical structure of the calculations might reveal further optimization possibilities (the author thinks that a multigrid (and/or multiscale) implementation could boost computational performance by at least one order of magnitude). Most of the used algorithms are standard implementations (like Jacobi rotations for the eigenvalue analysis). Using more efficient but equivalent algorithms and a multithreaded/parallelized implementation (for multi-core machines) should increase the frame rate significantly.
6.2.1 Motion of a plane
We start with the most basic kind of motion, a (linear) translatory motion of a plane. It is the basis of our motion model, because we approximate the observed surface in motion as conjoined small planar surface patches. Therefore the motion field which our algorithm estimates should be optimal compared to scenes where the surface is curved or has discontinuities; the results are a kind of upper limit on the motion field quality achievable with the proposed method.
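Purely for illustration, a synthetic frame of such a sequence might be generated as sketched below: a fronto-parallel plane rendered under a pinhole model with radial range measurement and a plaid texture of two superimposed planar waves. The focal length, texture frequencies and velocity units are illustrative assumptions, not the settings used for the figures that follow.

import numpy as np

def plane_frame(t, Z0=3.0, U=3.0, V=-6.0, W=9.0, size=(120, 160), f=200.0):
    # Pixel viewing directions of a pinhole camera with focal length f (pixels).
    h, w = size
    y, x = np.mgrid[0:h, 0:w]
    xd = (x - w / 2.0) / f
    yd = (y - h / 2.0) / f
    # Plane translates by (U, V, W) (here taken as mm/frame) from depth Z0 (metres).
    Z = Z0 + 1e-3 * W * t
    # Radial distance to the plane point seen by each pixel: a fronto-parallel
    # plane therefore appears paraboloid-like in the range image.
    rng = Z * np.sqrt(1.0 + xd**2 + yd**2)
    # Texture coordinates on the plane, shifted against the lateral motion.
    X = xd * Z - 1e-3 * U * t
    Y = yd * Z - 1e-3 * V * t
    amp = 0.5 + 0.25 * np.sin(40.0 * X) + 0.25 * np.sin(40.0 * (0.3 * X + Y))
    return rng, amp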
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = -6.00, W = 9.00; density d = 94.3 %.
              mean  sigma    min    max
relative [%]  0.20   0.04   0.00   0.34
dir'nal [°]   0.21   0.13   0.00   0.62
angular [°]   0.21   0.13   0.00   0.61
magnitude     0.05   0.02   0.01   0.12
bias         -0.01   0.00  -0.02   0.01
This is the most ideal constellation, fitting perfectly to our model assumption. The sampling theorem is completely fulfilled, both in space and time. The only noise is due to (single-precision) floating point calculation roundoff errors. The density of the motion field is maximal (only the border pixels decrease the density). Maximum errors are less than 1 % in magnitude and 1° in direction and there is no bias. The plane has a paraboloid-like shape in the range image, because of the radial distance measurement.
Figure 6.5: Plane perpendicular to the optical axis at distance Z = 3 m. Two superimposed planar wave patterns of different orientation yield a plaid-like texture.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00; density d = 94.3 %.
              mean  sigma    min    max
relative [%]  1.71   0.63   0.42   3.15
dir'nal [°]   0.83   0.22   0.39   1.33
angular [°]   0.83   0.22   0.39   1.33
magnitude     0.25   0.08   0.09   0.43
bias         -0.06   0.00  -0.06  -0.02
The plane covers a distance range from 2.6 m–3.9 m. The approximative character of the derivative filters introduces minor errors for the farther regions where the texture gets smaller, i.e. the curvature increases. There is a minor bias toward too small velocities.
Figure 6.6: Plane at the same distance but tilted, with the surface normal having a zenith θ and an azimuth φ of 30° each.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00; density d = 89.0 %.
              mean  sigma    min    max
relative [%]  0.82   1.30   0.00  13.72
dir'nal [°]   2.01   2.55   0.20  35.07
angular [°]   2.01   2.55   0.20  34.91
magnitude     0.40   0.50   0.04   6.47
bias         -0.36   0.46  -5.41  -0.02
The flow estimate begins to break down for very far distances (the range is 2 m–7 m). For far distances the spatial sampling theorem is hardly fulfilled anymore. The derivative filters introduce large errors and the flow estimate is biased toward too small values. The density decreases. Notice that increasing the pattern size would improve the results (not shown). The confidence measure successfully excludes the worst region and correctly indicates where the measurements are less trustworthy.
Figure 6.7: The plane is strongly tilted, with surface normal θ = 60°, φ = −30°.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.0003, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00; density d = 86.7 %.
              mean  sigma    min    max
relative [%]  0.27   0.28   0.00   5.69
dir'nal [°]   0.57   0.35   0.03   4.65
angular [°]   0.57   0.35   0.03   4.64
magnitude     0.12   0.07   0.02   1.06
bias         -0.09   0.07  -0.77   0.34
The radial-wave-like pattern has less local structure than the plaid pattern. Around the contour lines of small (brightness) gradient magnitude, i.e. the crests and troughs of the radial wave pattern, there is an aperture problem; only line flow can be determined. Moreover, the motion estimate using this pattern is more sensitive to noise than the results for the plaid pattern (not shown).
Figure 6.8: Plane with a radial-wave-like pattern, tilted by θ = 45°, φ = −30°.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 1.00, nlg = 1.50, τ = 0.0005, τ2 = 0.1, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 12.00; density d = 93.7 %.
              mean  sigma    min    max
relative [%]  9.04   6.86   0.00  54.26
dir'nal [°]   4.74   3.34   0.04  36.33
angular [°]   4.75   3.33   0.07  36.29
magnitude     1.79   1.06   0.05  13.00
bias          0.02   1.55  -5.97   5.63
Adding weak noise to both data channels deteriorates the motion estimate. However, the density is still about maximal. The noise in range of σr = 1 cm is about the noise of a PMD-sensor under good illumination conditions. Notice that increasing the velocity would improve the relative errors of the estimate. The error decreases approximately inversely proportional to the velocity, until the temporal sampling theorem is violated (then the motion estimate fails).
Figure 6.9: Plane (θ = 25°, φ = 15°, rc = 3.5 m) with weak normal i.i.d. noise added to both amplitude and range data.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 3.00, nlg = 3.00, τ = 0.0005, τ2 = 0.1, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 12.00; density d = 18.7 %.
               mean  sigma     min     max
relative [%]  23.00  16.97    0.02  105.60
dir'nal [°]   11.31  10.03    0.12   94.22
angular [°]   11.34   9.99    0.31   93.68
magnitude      4.25   2.65    0.25   16.61
bias           0.13   4.04  -14.47   14.26
A high noise level clearly decreases the density of the motion field. The remaining velocity field is roughly of correct direction and magnitude. The confidence measure values indicate that the estimates are not too trustworthy. The type measure correctly indicates a full flow estimate. The estimate can be improved by pre-smoothing the data, as the next figure demonstrates.
Figure 6.10: Same plane as before but with the noise increased by about a factor of 3.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 3.00, nlg = 3.00, τ = 0.0005, τ2 = 0.1, β = 0.3, ps = 3, ws = 7, U = 3.00, V = 6.00, W = 12.00; density d = 91.1 %.
               mean  sigma     min    max
relative [%]  16.71  12.38    0.00  89.15
dir'nal [°]    7.69   6.53    0.03  78.70
angular [°]    7.71   6.52    0.08  78.32
magnitude      3.00   1.98    0.10  14.78
bias          -0.32   2.98  -14.06  13.33
By pre-smoothing the data with normalized averaging using a 7 × 7 binomial applicability and using a larger integration window (of the same size) we recovered nearly maximal density. Both magnitude and directional errors are improved. While maximum errors are still high, the width of the error distribution was improved.
Figure 6.11: Same plane and noise level as before but pre-smoothed.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 3.00, nlg = 3.00, τ = 3, τ2 = 10, β = 0.3, ps = 3, ws = 7, U = 3.00, V = 6.00, W = 12.00; density d = 90.4 %.
               mean  sigma    min     max
relative [%]  16.85  13.97   0.00  129.36
dir'nal [°]    5.68   3.67   0.00   32.69
angular [°]    5.71   3.66   0.11   32.72
magnitude      2.85   1.98   0.03   21.30
bias           1.23   2.52  -5.88   15.31
Here the same pre-smoothing and integration window size was used, but the structure tensor was equilibrated. While the error in magnitude has not changed significantly, the directional errors were clearly improved (particularly σ is much smaller). However, the equilibrated estimate is, strangely, more biased than the non-equilibrated one. Because of the equilibration the thresholds (τ, τ2 for the confidence and type measures) need to be adapted.
Figure 6.12: Flow estimation using an equilibrated structure tensor.
The presented results for a plane undergoing pure translatory motion are in good agreement with our expectations. The method is somewhat sensitive to noise. For weak noise the results are still good. For medium-level noise the results are acceptable if a pre-smoothing filter is applied that is about the same size as the integration window (here 7 × 7). Most important for a satisfactory result is that the sampling theorem is not violated. We also saw that if there is not enough structure in the neighborhood, a full flow cannot be estimated and the density of the motion field decreases.
6.2.2 Motion of a sphere
We now present results for a sphere in motion. This introduces additional problems
compared to the plane motion, because a sphere has a curved surface and is of limited
extension. There are motion boundaries present that are not explicitly modeled.
The texture on the surface is of varying curvature and has discontinuities that are
in conflict with the spatial sampling theorem, which is rather problematic for the
employed derivative filters.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 7.00, V = 4.00, W = 13.00; density d = 70.9 %.
              mean  sigma    min    max
relative [%]  0.36   0.46   0.00   4.69
dir'nal [°]   0.47   0.39   0.00   3.94
angular [°]   0.47   0.38   0.02   3.93
magnitude     0.14   0.12   0.00   1.21
bias         -0.01   0.15  -0.53   1.20
The errors are comparable to those of the noise-free plane with similar texture. But the density of the motion field is decreased, because in the center and on the central-right side of the sphere the discontinuities in the texture violate the spatiotemporal sampling theorem. Also, at the edge of the sphere no motion estimation is possible due to the motion boundary. The size of the missing border is about the size of the integration window.
Figure 6.13: A sphere with a radius of 50 cm at a distance of 2.2 m, having a sinusoidal plaid texture. Motion boundaries and the violation of the sampling theorem decrease the density of the motion field.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 3, ws = 5, U = 7.00, V = 4.00, W = 13.00; density d = 77.8 %.
              mean  sigma    min    max
relative [%]  0.41   0.58   0.00   5.58
dir'nal [°]   0.52   0.48   0.00   5.36
angular [°]   0.52   0.48   0.00   5.36
magnitude     0.16   0.15   0.01   1.70
bias         -0.00   0.17  -1.05   1.69
The pre-smoothing removes the very fine structures and discontinuities. Therefore the sampling theorem is no longer violated and the density of the motion field is increased.
Figure 6.14: Same sphere as above but with pre-smoothing of range and intensity data.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 3, ws = 5, U = 14.00, V = 10.00, W = 50.00; density d = 32.5 %.
              mean  sigma    min    max
relative [%]  0.49   0.55   0.00   3.00
dir'nal [°]   0.72   0.73   0.00   3.48
angular [°]   0.72   0.73   0.00   3.48
magnitude     0.73   0.72   0.04   3.56
bias          0.33   0.68  -0.48   3.04
Increasing the velocity leads to a violation of the temporal sampling theorem around finely structured regions. The motion field density decreases, but the estimated velocity vectors are still very good, i.e. the relative and directional errors show no significant difference to the results of the sphere with lower speed.
Figure 6.15: Same sphere but with increased velocity. The motion estimate breaks down around the finely textured structures.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 1.00, nlg = 1.50, τ = 0.001, τ2 = 0.2, β = 0.3, ps = 0, ws = 7, U = 5.00, V = 7.00, W = 9.00; density d = 73.8 %.
              mean  sigma    min    max
relative [%]  8.46   6.58   0.01  43.64
dir'nal [°]   5.78   4.08   0.08  33.68
angular [°]   5.79   4.06   0.15  33.57
magnitude     1.75   1.01   0.05   8.09
bias         -0.15   1.34  -5.50   6.37
The integration window size was increased to 7 × 7 to compensate for the increased noise. The errors in the motion estimate are of the same magnitude as those for the plane subject to noise of the same (weak) level. Only the density is smaller. No pre-smoothing was applied; thus we find an unestimated region in the center of the sphere.
Figure 6.16: The same sphere at a farther distance, subject to weak noise, with no pre-smoothing applied.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: nlr = 1.00, nlg = 1.50, wσ = 2.00, τ = 0.001, τ2 = 0.2, β = 0.3, ps = 0, ws = 7, wµ = 0.60, U = 5.00, V = 7.00, W = 9.00; density d = 81.3 %.
              mean  sigma    min    max
relative [%]  9.06   6.77   0.00  53.97
dir'nal [°]   6.16   3.90   0.09  36.56
angular [°]   6.17   3.88   0.09  36.43
magnitude     1.85   0.99   0.08   7.77
bias         -0.09   1.35  -5.53   4.89
Here we demonstrate how the proposed gradient-based weighting of the data terms integrated by the structure tensor can increase the density of the estimate without reducing its quality. The errors are comparable to the prior example but the density is increased.
Figure 6.17: Identical to the prior example but with gradient-based weighting applied. The density of the estimated motion field is increased.
[Panels a–g not reproduced; color scales ±50 % and 0–1.]
Parameters: nlr = 3.00, nlg = 3.00, wσ = 0.50, τ = 0.0009, τ2 = 0.1, β = 0.3, ps = 3, ws = 9, wµ = 0.00, U = 10.00, V = 15.00, W = 18.00; density d = 81.0 %.
               mean  sigma     min     max
relative [%]  13.50  10.64    0.00  115.92
dir'nal [°]    9.53   6.97    0.07   53.38
angular [°]    9.53   6.97    0.09   53.32
magnitude      5.78   3.80    0.18   38.67
bias           3.12   4.28  -10.59   33.47
Here we increased the integration window size to 9 × 9 and applied pre-smoothing. Gradient-based weighting improves the density here too, at a small loss of quality in the estimate (2 % in relative magnitude, insignificant in angular error) compared to the binomially weighted structure tensor (not shown).
Figure 6.18: Sphere corrupted by noise of the same level as used in figures 6.10 and 6.12. Pre-smoothing and gradient-based weighting yield an acceptable quality at a high density of the motion field.
The estimated motion fields presented for the sphere are of similar quality to those for the plane motion. Motion boundaries do not increase the errors in the velocity field but are excluded by the confidence measure and therefore reduce the density of the motion field. Gradient-based weighting can increase the density of the motion field at only a small loss of quality.
It is important to note that the ratio of density to error of the motion field is not fixed but depends on the chosen parameters. Particularly the thresholds τ and τ2 on the confidence and type measures regulate whether an estimate is a full flow or not. Typically increasing τ increases the density at a cost in quality, and the right choice of the threshold depends on the noise level of the signal, which may vary over the image. While weighting the input data during pre-smoothing according to a confidence measure (as discussed in section 3.1.5) is a first step, an automatic adaptation to the noise level would be desirable. However, the author is not sure how this could best be done. Some type of analysis in the manner of a Wiener filter may be appropriate, but a detailed analysis of this topic is still outstanding.
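As a concrete illustration of the confidence-weighted pre-smoothing mentioned above, the following sketch implements normalized averaging with a binomial applicability (here 7 × 7, as used for the noisy examples); the confidence map w stands for whatever per-pixel certainty measure is available, and the function names are illustrative.

import numpy as np
from scipy.ndimage import convolve

def binomial_kernel(n=7):
    b = np.array([1.0])
    for _ in range(n - 1):
        b = np.convolve(b, [1.0, 1.0])
    b /= b.sum()
    return np.outer(b, b)

def normalized_average(f, w, n=7, eps=1e-12):
    # Confidence-weighted smoothing: conv(w * f) / conv(w).
    a = binomial_kernel(n)
    num = convolve(f * w, a, mode='nearest')
    den = convolve(w, a, mode='nearest')
    return num / np.maximum(den, eps)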
6.3 Real World Sequences
After the proof of concept on simulated data, we demonstrate the performance of the motion estimation algorithm on real world sequences. We used sequences of the pyramid target moving in horizontal direction and parallel to the optical axis. The input data are mean images (from the step mode acquisition described in section 5.2.2), which have a lower noise level and show no motion artifacts compared to live sequences. The results are presented in the same tabular scheme as for the synthetic sequences. The meaning of the error colormaps has changed slightly: while in the previous figures it was the percentage error w.r.t. the individual ground-truth velocity components U, V and W, it is now the percentage error w.r.t. the ground-truth velocity magnitude, i.e. all three error images have the same scale. The definition of the density d has changed slightly too (because we have no ground truth velocity field for the whole scene, but only the ground truth velocity of the target). It is defined as the number of pixels with a full flow estimate and a confidence of at least 0.5, divided by the number of pixels that have an amplitude higher than a specific threshold (which is 20 for all pyramid sequences).
All motion fields were calculated using a fixed scaling exponent a = 3 (see equation (4.32)). This is a sensible compromise for the distance range of 1.8–4.8 m present in the used sequences, given the varying and not fully understood dependence of a on the distance, as depicted in figure 5.13 and discussed in section 5.2.3.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: ps = 0, wσ = 2.00, τ = 5e-006, τ2 = 0.1, β = 0.3, ws = 7, wµ = 0.50, U = 0.00, V = 0.00, W = -6.00; density d = 45.9 %.
              mean  sigma    min    max
relative [%]  2.92   2.18   0.00  11.30
dir'nal [°]   4.15   2.32   0.12  19.02
angular [°]   4.11   2.28   0.24  18.79
magnitude     0.49   0.24   0.03   2.10
bias         -0.01   0.22  -0.60   0.67
The pyramid target performs a pure translation in depth. The errors are on the order of the spatial resolution of the sensor. This is not trivial, as the range sequence appears divergent. The estimate is not biased for this frame, but generally there is a bias for uncalibrated camera data that corresponds to the periodic phase errors shown in figure 5.6.
Figure 6.19: Translation of the pyramid target in Z-direction at the central position. The density and quality of the motion field are good.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.1, β = 0.3, ws = 7, wµ = 0.50, U = 0.00, V = 0.00, W = -6.00; density d = 18.8 %.
              mean  sigma    min    max
relative [%]  6.64   4.21   0.03  26.67
dir'nal [°]   3.88   2.10   0.04  13.80
angular [°]   3.91   2.02   0.52  13.62
magnitude     0.62   0.25   0.07   1.68
bias          0.36   0.28  -0.33   1.58
The target is no longer in a central position. The confidence of the left and right triangles of the pyramid is about zero. While in the central position (of the prior example) the triangles have a reduced confidence too, this is much more pronounced in the border positions. The estimation errors are increased only in magnitude, not in direction.
Figure 6.20: Translation of the pyramid target in Z-direction at the left position. The density of the estimate is clearly reduced.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.1, β = 0.3, ws = 7, wµ = 0.50, U = 0.00, V = 0.00, W = -6.00; density d = 19.8 %.
              mean  sigma    min    max
relative [%]  6.32   4.15   0.01  24.86
dir'nal [°]   3.98   2.71   0.02  33.32
angular [°]   4.00   2.65   0.30  32.99
magnitude     0.61   0.32   0.04   4.10
bias          0.35   0.25  -0.25   1.47
The fixed pattern noise was removed by calibration. While the range and amplitude data look much better, the motion estimate is not significantly improved. The standard deviation of the angular error even got worse. Only the magnitude error shows a minor improvement.
Figure 6.21: Same as before but with the fixed pattern noise removed. Interestingly, the estimate is not improved.
[Panels a–g not reproduced; color scales ±30 % and 0–1.]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.05, β = 0, ws = 7, wµ = 0.50, U = 0.00, V = 0.00, W = -6.00; density d = 18.1 %.
              mean  sigma    min    max
relative [%]  9.96   5.90   0.16  37.29
dir'nal [°]   7.75   5.20   0.15  26.48
angular [°]   7.73   5.09   0.61  26.17
magnitude     1.10   0.56   0.19   3.00
bias          0.51   0.38  -0.59   2.12
Here we demonstrate that the motion estimation fails in determining a full flow if it does not take advantage of the amplitude information via the extended BCCE (4.33). While the confidence in the estimate is increased, only line or plane flows can be estimated (not shown). The errors are clearly increased.
Figure 6.22: No amplitude information was used for the motion estimate (β = 0). The aperture problem allows an estimate only at the diagonal edges of the pyramid.
[Panels a–g not reproduced; color scales ±100 % and 0–1.]
Parameters: ps = 0, wσ = 2.00, τ = 6e-006, τ2 = 0.3, β = 0.3, ws = 7, wµ = 0.50, U = -1.00, V = 0.00, W = 0.00; density d = 20.3 %.
               mean  sigma    min     max
relative [%]  47.65  29.47   0.08  181.15
dir'nal [°]   41.90  41.83   0.41  175.36
angular [°]   28.75  16.80   1.11   92.92
magnitude      0.70   0.37   0.04    2.18
bias           0.49   0.53  -1.77    2.11
A horizontal translation of the pyramid target is rather problematic for the motion estimation. Besides missing information in the left and right triangles, there is a clear bias in the estimate of the upper and lower triangles. The bias accounts for about 70 % of the magnitude error. The angular error is large. Pre-smoothing the data will not help to solve the problem; depending on the chosen threshold it will either decrease the density or increase the errors.
Figure 6.23: Horizontal translation of the pyramid target. There is a strong bias in the motion estimate.
The motion estimates for the pyramid sequences 6.19 to 6.23 show rather heterogeneous results. While pure motions in Z-direction could be estimated well, the results for estimates in X-direction are rather bad. The problem is not due to noise in the data, as this is really low. Interestingly, removing the fixed pattern noise increased the standard deviation of the angular errors! The author thinks that the pyramid target is a quite difficult object for motion estimation. While it looks rather optimal, as it has structure in both the range and the amplitude information, a second look reveals that the texture information is of a kind that is suboptimal for our algorithm. The texture in the amplitude is due to shadows and varying angles of reflectance. This kind of texture is not modeled, as we assume spatially but not temporally varying reflectivity on the object surface, i.e. the reflectivity texture on the surface is constant in time. Moreover, the step edges in the range data of the pyramid are hard to handle properly by the derivative filters. At each edge there is a discontinuity and on each step we find an aperture problem. This might be addressed by pre-smoothing, but it seems that the integration window size is too small to come to a unique estimate. Possibly a multiscale approach could help, but we are not sure about this. An open question in this context is whether the simple pinhole camera model introduces significant errors. Therefore a geometric calibration of the camera is desirable.
Figure 6.22 showed the benefit of taking both information channels of the PMD-signal into account. If only the range information is used, the errors at similar density increase by about 50 % in magnitude and 100 % in direction.
We remarked in connection with figure 6.19 that the motion estimates are generally biased if the camera is not properly calibrated in range, with a method similar to the one exemplified in section 5.2.2. Figure 6.24 shows the motion estimation results for an uncalibrated sequence of the checkerboard target performing a motion along the optical axis.
Besides the decreased density due to the low-reflectivity patches (pixels of low amplitude are excluded from the motion field), we see the bias of the motion estimate that varies with the range, in correspondence to the measurements in figures 5.6 and 5.10.
[Panels a–d not reproduced; panel titles X-velocity, Y-velocity, Z-velocity, axes frame vs. diagonal-pixels.]
Figure 6.24: Checkerboard target translation of 60 mm/frame in Z-direction from 1.2 m–6.0 m: a, b and c show the velocity components U, V and W [mm/frame] at the pixel positions along the (magenta) diagonal, shown in the amplitude image d (at 2.2 m). The variation of W with the distance corresponds to the periodic phase error of the camera.
The results for the pyramid target are not satisfactory. Therefore, we have a look at a real world sequence in a more natural setup, which is depicted in figure 6.25. While there is no ground truth motion data for this kind of sequence, we can at least check whether the results are consistent with the apparent motion.
[Panels a–g not reproduced; color scales ±100 % and 0–1.]
Parameters: ps = 1, wσ = 2.00, τ = 0.001, τ2 = 0.1, β = 0.3, ws = 7, wµ = 0.50, U = 7.00, V = 3.00, W = -25.00; density d = 53.3 %.
               mean  sigma     min    max
relative [%]  35.26  25.73    0.05  99.14
dir'nal [°]   13.38   9.07    0.20  52.26
angular [°]   13.56   9.35    0.33  78.63
magnitude     10.88   6.28    0.22  26.54
bias          -3.73   9.58  -21.57  22.08
The errors given in the table are a very rough approximation. We took the approximate mean velocity of the person and used it as ground truth for the sequence. Because the robot moves much slower than the person, it appears red in the W-error map (e). We find the motion field to be consistent with the apparent motion. Moreover, it has a high density (except for the missing head and the motion boundaries).
Figure 6.25: Real world sequence with a person moving toward the camera and a robot that moves in the same direction but much slower.
We used the average velocity of the moving person, calculated from the motion field, as "ground truth". Because there are at least two major motions in the sequence, the mean errors are not too meaningful. More interesting are the standard deviations, as they should be an upper limit of the true standard deviation because we falsely treat two motions as one w.r.t. the error statistic. Nevertheless, sigma is relatively small, which indicates that our estimate is consistent w.r.t. the apparent major motions. The similar color of the velocity components within each of the two moving objects indicates this too.
6.4. Summary
6.4 Summary
We have shown that the proposed local motion estimation method yields correct
motion fields for synthetic range sequences that show translatory motion. Weak to medium-level i.i.d. noise in the range and amplitude data can be compensated for by an appropriate pre-smoothing, if necessary at all. The method is numerically stable, i.e.
(small) errors in the input are not magnified in the output. Therefore the quality of
the motion field might be reduced due to noise, but only relative to the level of noise
introduced.
The method, however, is sensitive to a correct spatiotemporal sampling of the input data. Motions that are large compared to the spatial frequency of the range or amplitude data (i.e. its range structure or reflectivity texture) cannot be handled directly, as they violate the temporal sampling theorem. An extension of the method to a
multiscale implementation similar to the coarse-to-fine strategy developed by Black
and Anandan [BA96] would be necessary to overcome this problem. More advanced
techniques including a regularization of the motion field as introduced by Bruhn,
Weickert, and Schnoerr [BWS05] seem applicable too. Very fine (spatial) textures
are problematic w.r.t. the employed derivative filters, independent of the magnitude
of the motion, and need to be addressed by pre-smoothing the data.
The results for the real world sequences are heterogeneous. Most likely due to the discussed problematic nature of the pyramid target w.r.t. our model assumptions, we could not achieve satisfactory results for a translation in horizontal direction, while the results for a motion in depth are in very good agreement with the true motion
field. Analysis of a motion sequence in a natural setup showed the consistency of
the estimated motion field with the apparent motion. The density of the locally
determined motion field is pleasantly high.
The aperture problem was successfully addressed by using both the range and the amplitude signal of the PMD-sensor. Combining both data channels yields in most cases a higher quality and density of the determined motion field.
Chapter 7
Conclusion and Outlook
In the previous chapters we have presented and discussed methods from a large number of research fields, with the final goal of extending the possibilities of image processing w.r.t. the analysis of dynamic processes captured in image sequences. In particular we want to estimate correct, physical, three-dimensional motion fields. The image sequences of interest are acquired with range cameras based on the TOF distance measuring principle realized in PMD-technology.
In the following we give a summary of the major topics we worked on and of how well we met our goal. The author states his personal evaluation of PMD-technology w.r.t. motion estimation and highlights important topics that should be addressed regarding the sensor technology as well as algorithmic aspects of motion estimation.
7.1 Summary
To optimally exploit the extended information content that a PMD-camera features, we need to understand where this information originates from. Therefore we analyzed the basic measuring principle and showed possible sources of errors that can be avoided if specific details about the technical realization are known; particularly the blind application of the common formula (2.12), which is actually valid only for sinusoidal modulation (and even harmonics of higher order), can lead to pronounced systematic errors if the exact modulation of the emitted (infrared) light is more rectangular than sinusoidal. We presented formula (2.14) to calculate the correct range and amplitude for a rectangularly modulated optical signal. The producers of the cameras should be aware of this problem and correct, or better, avoid such errors. However, for current camera models this is not yet the case.
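For orientation, the sketch below shows the standard four-phase demodulation that formula (2.12) is assumed to correspond to; it is exact only for sinusoidal modulation, which is precisely the issue discussed above. A0–A3 denote the correlation samples at phase offsets of 0°, 90°, 180° and 270°; the corrected formula (2.14) for rectangular modulation is not reproduced here.

import numpy as np

def demodulate_sinusoidal(A0, A1, A2, A3, f_mod=20e6, c=299792458.0):
    phase = np.mod(np.arctan2(A3 - A1, A0 - A2), 2.0 * np.pi)  # modulation phase
    amplitude = 0.5 * np.hypot(A3 - A1, A0 - A2)               # modulation amplitude
    offset = 0.25 * (A0 + A1 + A2 + A3)                        # mean radiant flux
    rng = c * phase / (4.0 * np.pi * f_mod)                    # range from the phase delay
    return rng, amplitude, offset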
We have shown a new way to correct for one of the most prominent systematic errors by means of a calibration based on Fourier approximation. Once the calibration has been performed, the correction of the error can be calculated easily and from very little data. Therefore an implementation of the correction algorithm directly on the camera should be unproblematic.
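A hedged sketch of how such a correction could be applied after calibration is given below. It assumes that the systematic range error is modeled as periodic over the unambiguous range L and that the calibration yields a few Fourier coefficients (a_k, b_k) of that error; the exact parametrization used in this work is not reproduced here.

import numpy as np

def correct_range(r, a, b, L=7.5):
    # r: measured range (metres); a, b: cosine/sine coefficients for harmonics 1..K;
    # L: assumed period of the error (the unambiguous range, 7.5 m at 20 MHz).
    r = np.asarray(r, dtype=float)
    k = np.arange(1, len(a) + 1)
    arg = 2.0 * np.pi * np.multiply.outer(r, k) / L
    err = np.cos(arg) @ np.asarray(a, dtype=float) + np.sin(arg) @ np.asarray(b, dtype=float)
    return r - err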
We also discussed several further errors of the sensor and derived, based on the
statistical errors, an uncertainty measure for the acquired range data, which can be
used in further image processing steps. Thus, we might increase the accuracy of
results or at least come to a better estimate about the magnitude of the possible
errors, i.e. a confidence measure. We successfully used the uncertainty measure within a novel extension to B-spline channel smoothing, which we named two-state smoothing.
We introduced the necessary theoretical foundation for motion estimation from image sequences. We use the range flow constraint equation in a novel formulation that embeds the used pinhole camera model analytically and reduces the number of necessary filter operations. To improve motion estimation results w.r.t. the unavoidable aperture problem, we extended the brightness change constraint equation for the PMD-camera, assuming the irradiance to follow an (inverse) power law. Thus we can take both BCCE and RFCE into account and use both in a combined structure tensor approach that is computationally efficient.
On synthetic and real world data we applied our method successfully, achieving
rather dense motion fields without the need for regularization. However, the method still has weaknesses that need to be addressed: the algorithm fails to estimate a satisfactory motion field under specific conditions for which, within the limits of the method, an estimate should be possible.
Within the project Lynkeus, funded by the Federal Ministry of Education and Research (BMBF), we implemented an interface to our algorithm that allows it to be used within a complex runtime environment. Figure 7.1 shows the range flow estimation module within a simple configuration of the Lynkeus runtime environment (RTE)∗.
Figure 7.1: A prototype of the range flow module developed for the Lynkeus runtime environment.
Furthermore, we developed a method for motion artifact detection and utilized it in conjunction with two-state channel smoothing for the successful realization of a demonstrator within the project Smartvision. Figure 7.2 shows the GUI of our software and the associated experimental setup (from the collaboration partner Schmersal), which demonstrates the application of PMD technology in the field of safety engineering.
Figure 7.2: GUI and experimental setup, demonstrating the application of PMD technology in the field of safety engineering.
∗ The Lynkeus RTE is developed by the collaboration partner Elektrobit.
7.2 Evaluation and Outlook
First of all, the author would like to state that he considers PMD technology to be a promising new method for acquiring range maps of the close surroundings quickly and with rather small effort. The ease of use and installation may be one of the major advantages of this technology.
However, the method still has several limitations that will hopefully be addressed in the near future. The most important ones from the author's point of view are:
framerate the current frame rates of at most 20 frames/sec are rather problematic for the high-accuracy motion estimates that our method aims at.
dynamic range due to the dependence of the pixel irradiance on the object distance, the dependence of the range measurement accuracy on the irradiance, and the limited capacitance of the pixels' storage sites (see section 2.2.2.2), the dynamic range of a PMD-sensor is critical w.r.t. its applicability in real world tasks. Techniques like multiple exposure and adaptive illumination should be exploited to overcome the shortcomings of current sensor models in this respect.
systematic errors a better calibration of the camera is possible and should be done on the producer's side. While some problems, like the interdependence of range and amplitude/reflectivity, are not well understood yet, there are still several errors that can be corrected easily.
motion artifacts the problem of motion artifacts should be addressed technically by taking the necessary correlation samples at the same time. While prototypical PMDs of this kind were realized as 4-tap lock-in pixels by Lange [Lan00], their realization seems to contradict other optimization criteria of the CMOS process, like for instance the fill factor. However, for the sake of a simple and robust analysis of dynamic image data, the problem should be addressed nevertheless.
objective the optical elements used for some camera models seem to be partially suboptimal. Scattered light can influence the range measurements very badly.
temperature the range measurement is not stable over time but depends clearly on the temperature of the sensor (however, this problem is more serious for the SR3000 than for the O3D and PMD19k).
documentation a better technical documentation of specific features of the cameras would be of great benefit, particularly regarding the demodulation contrast and the modulation of the light source.
With respect to our own algorithms there is room for various extensions and improvements:
• So far, the thresholds necessary for the eigenvalue analysis of the structure tensor (and hence for motion estimation) need to be tuned manually. While the same thresholds can be used for various kinds of sequences, the need to adapt to local and global noise levels still exists.
A detailed analysis of the novel constraint equations w.r.t. error propagation and particularly equilibration is still outstanding. An automatic adaptation of the thresholds based on this analysis seems possible, particularly when we consider the uncertainty measure that can be derived from the PMD-signal. An additional local analysis in the manner of a Wiener filter might be helpful for noise estimation; a local analysis of the variation of the eigenvalues of the structure tensor might also be appropriate, but is computationally costly.
• We used the subspace regularization scheme presented in [SG02], but found the improvements in quality compared to the increase in computational costs disappointing. Meanwhile there exist more advanced concepts of global methods incorporating local estimates. An extension of our method towards the concepts and techniques presented in [BWS05; Pap+06; Bru+06] seems attractive. This would allow the estimation of dense flow fields in the presence of locally extended aperture problems and in situations where the temporal sampling theorem is partially violated. At the moment our method is limited to rather "well behaving" image data, i.e. data that complies with the sampling theorem spatially and temporally. With the current limitations of the cameras in frame rate and resolution and the given (rather high) noise level, such sequences can only be acquired for a limited number of real world applications.
• An additional analysis of the method on more realistic simulated test data is advisable. Originally we wanted to test the algorithms also on sequences created with the TOF-simulator developed by Keller et al. [Kel+07]. Unfortunately the simulator was not yet capable of simulating textures on surfaces, which however are essential to our method.
• Finally, a more thorough study of the interdependence of the range and amplitude/reflectivity measurements is of interest for an exact motion estimate. A largely uncertain property in this context is the amplitude/irradiance dependent demodulation contrast of the various sensors. It could be essential for an optimal modeling of the irradiance used in the extended BCCE.
Part III
Appendices
Acronyms and Notation
Acronyms and Abbreviations
APS       Active Pixel Sensor
CCD       Charge Coupled Device
CMOS      Complementary Metal–Oxide–Semiconductor
CTE       Charge Transfer Efficiency
DFT       Discrete Fourier Transform
DRNU      Dark Response NonUniformity
FFT       Fast Fourier Transform
FT        Fourier Transform
i.i.d.    Independent and Identically Distributed
MTF       Modulation Transfer Function
NLS       Nonlinear Least Squares
OLS       Ordinary Least Squares
pdf       Probability Density Function
PMD       Photonic Mixer Device
PRNU      Photo Response NonUniformity
PSF       Point Spread Function
PTLS      Partial Total Least Squares
SNR       Signal-to-Noise Ratio
SVD       Singular Value Decomposition
TLS       Total Least Squares
TOF       Time Of Flight
w.l.o.g.  Without Loss Of Generality

General notation

⟨a, b⟩      Standard dot product between vectors a and b
⟨v⟩         Expectation value or average w.r.t. random variable or set v
⟨⟨A, B⟩⟩    Dot product between matrices A and B
v̂           Estimate of a (random) variable v in a statistical sense
ı           Imaginary unit √−1
M           m × n matrix
diag a      A diagonal matrix with vector a on its diagonal
I           Identity matrix
tr A        The trace of matrix A
1l          Unit matrix of ones
B           Binomial convolution operator or averaging operator
O, P, Q, …  Calligraphic letters indicate a representation-independent operator
v           Column vector
m̄           Statistical mean (either arithmetic mean or population mean) over a set of measurements m
v̂           Normalized or unit vector
(vi)        Vector v with components vi
aₙ          The nth vector of a sequence of vectors
vᵀ, Mᵀ      Transposed (column) vector (i.e. row vector) or matrix

Greek Symbols

ϕ, θ        Phase of a periodic signal
σ           Standard deviation of a normal distribution

Latin Symbols

C           Complex numbers
N           Natural numbers
R           Real numbers
Rⁿ          n-dimensional vector space over R
Bibliography
[Alb07] Martin Albrecht. "Untersuchung von Photogate-PMD-Sensoren hinsichtlich qualifizierender Charakterisierungsparameter und -methoden". PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2007.
[BA96] M. J. Black and P. Anandan. "The robust estimation of multiple motions: parametric and piecewise-smooth flow fields". In: Computer Vision and Image Understanding 63 (1996), pp. 75–104.
[Bar00] E. Barth. "The Minors of the Structure Tensor". In: DAGM. Kiel, Germany, 2000, pp. 221–228.
[Bar+03] E. Barth et al. "Spatio-temporal Motion Estimation for Transparency and Occlusions". In: Proceedings of the IEEE International Conference on Image Processing. 2003.
[Big06] Josef Bigun. Vision with Direction. Berlin: Springer Verlag, 2006. URL: http://www2.hh.se/staff/josef/.
[Bla+98] M. J. Black et al. "Robust anisotropic diffusion". In: IEEE Transactions on Image Processing 7.3 (1998), pp. 412–432.
[BR96] Michael J. Black and Anand Rangarajan. "On the Unification of Line Processes, Outlier Rejection and Robust Statistics with Applications in Early Vision". In: International Journal of Computer Vision 19.1 (1996), pp. 57–92.
[Bru+06] A. Bruhn et al. "A Multigrid Platform for Real-Time Motion Computation with Discontinuity-Preserving Variational Methods". In: International Journal of Computer Vision 70.3 (2006), pp. 257–277. DOI: 10.1007/s11263-006-6616-7.
[BWS05] A. Bruhn, J. Weickert, and C. Schnoerr. "Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods". In: International Journal of Computer Vision 61.3 (2005), pp. 211–231.
[DD02] Frédo Durand and Julie Dorsey. "Fast bilateral filtering for the display of high-dynamic-range images". In: SIGGRAPH. Ed. by Tom Appolloni. ACM, 2002, pp. 257–266. ISBN: 1-58113-521-1. URL: http://dblp.uni-trier.de/db/conf/siggraph/siggraph2002.html#DurandD02.
[Far03] Gunnar Farnebäck. "Two-Frame Motion Estimation Based on Polynomial Expansion". In: Proceedings of the 13th Scandinavian Conference on Image Analysis. LNCS 2749. Gothenburg, Sweden, 2003, pp. 363–370.
[FFS06] Michael Felsberg, Per-Erik Forssén, and Hanno Scharr. "Channel Smoothing: Efficient Robust Smoothing of Low-Level Signal Features". In: IEEE Trans. Pattern Anal. Mach. Intell. 28.2 (2006), pp. 209–222. URL: http://dblp.uni-trier.de/db/journals/pami/pami28.html#FelsbergFS06.
[FGW02] Per-Erik Forssén, G. H. Granlund, and J. Wiklund. Channel Representation of Colour Images. Technical Report LiTH-ISY-R-2418. Dept. of Electrical Eng., Linköping Univ., 2002.
[For04] Per-Erik Forssén. "Low and Medium Level Vision using Channel Representations". Dissertation No. 858, ISBN 91-7373-876-X. PhD thesis. SE-581 83 Linköping, Sweden: Linköping University, 2004.
[For98] Bengt Fornberg. "Calculation of Weights in Finite Difference Formulas". In: SIAM Review 40.3 (1998), pp. 685–691.
[Fos93] E. R. Fossum. "Active pixel sensors: are CCDs dinosaurs?" In: Charge-Coupled Devices and Solid State Optical Sensors III. Ed. by M. M. Blouke. Vol. 1900. Presented at the SPIE Conference. 1993, pp. 2–14.
[Fow+98] B. Fowler et al. "A Method for Estimating Quantum Efficiency for CMOS Image Sensors". In: Proc. SPIE 3301 (1998), pp. 178–185.
[FPA99] C. Fermueller, R. Pless, and J. Aloimonos. "Statistical Biases in Optic Flow". In: CVPR'99. Fort Collins, Colorado, 1999.
[Fra07] Mario Frank. "Investigation of a 3D Camera". MA thesis. Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, 2007.
[FSF02] Michael Felsberg, Hanno Scharr, and Per-Erik Forssén. The B-Spline Channel Representation: Channel Algebra and Channel Based Diffusion Filtering. Technical Report LiTH-ISY-R-2461. Dept. of Electrical Eng., Linköping Univ., 2002.
[GHO99] G. H. Golub, P. C. Hansen, and D. P. O'Leary. "Tikhonov Regularization and Total Least Squares". In: SIAM Journal on Matrix Analysis and Applications 21.1 (1999), pp. 185–194.
[GK95] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Dordrecht, The Netherlands: Kluwer Academic, 1995.
[GL80] G. H. Golub and C. F. van Loan. "An Analysis of the Total Least Squares Problem". In: SIAM Journal on Numerical Analysis 17.6 (1980), pp. 883–893.
[GL96] G. H. Golub and C. F. van Loan. Matrix Computations. 3rd ed. Baltimore and London: The Johns Hopkins University Press, 1996.
[Gov06] V. M. Govindu. "Revisiting the Brightness Constraint: Probabilistic Formulation and Algorithms". In: ECCV. 2006, III: 177–188.
[Grö03] Hermann Gröning. "Radiometrische Kalibrierung und Charakterisierung von CCD- und CMOS-Bild-Sensoren und Monokulares 3D-Tracking in Echtzeit". PhD thesis. University of Heidelberg, 2003. URL: http://www.ub.uni-heidelberg.de/archiv/3589.
[Had02] J. Hadamard. "Sur les problèmes aux dérivées partielles et leur signification physique". In: Princeton University Bulletin (1902), pp. 49–52.
[Hei01] Horst G. Heinol. "Untersuchung und Entwicklung von modulationslaufzeitbasierten 3D-Sichtsystemen". German. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001.
[HF01] H. Haußecker and D. J. Fleet. "Computing Optical Flow with Physical Models of Brightness Variation". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 23.6 (2001), pp. 661–673.
[Hil82] E. C. Hildreth. "The Integration of Motion Information along Contours". In: IEEE Workshop on Computer Vision, Representation and Control. 1982, pp. 83–91.
[Hor86] B. K. P. Horn. Robot Vision. Cambridge, MA: MIT Press, 1986.
[Hor87] B. K. P. Horn. "Motion fields are hardly ever ambiguous". In: International Journal of Computer Vision 1 (1987), pp. 259–274.
[HS99] Horst Haußecker and Hagen Spies. "Motion". In: Handbook of Computer Vision and Applications. Ed. by Bernd Jähne, Peter Geißler, and Horst Haußecker. Vol. 2: Signal Processing and Pattern Recognition. Academic Press, 1999. Chap. 13.
[Häu+99] G. Häusler et al. "Three-Dimensional Sensors – Potentials and Limitations". In: Handbook of Computer Vision and Applications. Vol. 1. Academic Press, 1999, pp. 485–506.
[Hub81] P. J. Huber. Robust Statistics. New York: John Wiley and Sons, 1981.
[JDD03] Thouis R. Jones, Frédo Durand, and Mathieu Desbrun. "Non-iterative, feature-preserving mesh smoothing". In: ACM Trans. Graph. 22.3 (2003), pp. 943–949. URL: http://dblp.uni-trier.de/db/journals/tog/tog22.html#JonesDD03.
[JGH99] Bernd Jähne, Peter Geißler, and Horst Haußecker. Handbook of Computer Vision and Applications. San Diego: Academic Press, 1999.
[JH00] Bernd Jähne and Horst Haußecker. Computer Vision and Applications: A Guide for Students and Practitioners. Academic Press, 2000.
[Jäh02] B. Jähne. Digital Image Processing. 5th ed. Berlin, Germany: Springer, 2002.
[Jäh04] Bernd Jähne. Practical Handbook on Image Processing for Scientific and Technical Applications. 2nd ed. Boca Raton, London, New York, Washington, D.C.: CRC Press, 2004.
[JHG99] B. Jähne, H. Haußecker, and P. Geißler. "Neighborhood Operators". In: Handbook of Computer Vision and Applications. Vol. 2. Academic Press, 1999, pp. 93–124.
[JSK99] B. Jähne, H. Scharr, and S. Körkel. "Principles of Filter Design". In: Handbook of Computer Vision and Applications. Ed. by B. Jähne, H. Haußecker, and P. Geißler. Vol. 2. Academic Press, 1999, pp. 125–151.
[Jus01] Detlef Justen. "Untersuchung eines neuartigen 2D-gestützten 3D-PMD-Bildverarbeitungssystems". German. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001.
[Kel+07] M. Keller et al. "A Simulation Framework for Time-Of-Flight Sensors". In: International Symposium on Signals, Circuits and Systems (ISSCS). Vol. 1. Iasi, Romania: IEEE CAS Society, 2007, pp. 125–128.
[Kla05] M. Klar. "Design of an endoscopic 3-D Particle-Tracking Velocimetry system and its application in flow measurements within a gravel layer". PhD thesis. University of Heidelberg, 2005. URL: http://archiv.ub.uni-heidelberg.de/volltextserver/volltexte/2005/5961/pdf/klar_PHD2005.pdf.
[KRI06] T. Kahlmann, F. Remondino, and H. Ingensand. "Calibration for increased accuracy of the range imaging camera SwissRanger". In: International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVI.5 (2006), pp. 136–141.
[Köt03] U. Köthe. "Edge and junction detection with an improved structure tensor". In: Proc. of the 25th DAGM Symposium. Ed. by B. Michaelis and G. Krell. Vol. 2781. Lecture Notes in Computer Science. Magdeburg, 2003, pp. 25–32. URL: http://kogs-www.informatik.uni-hamburg.de/~koethe/papers/structureTensor.pdf.
[KW93] H. Knutsson and C. F. Westin. "Normalized and Differential Convolution: Methods for Interpolation and Filtering of Incomplete and Uncertain Data". In: CVPR. New York City, 1993, pp. 515–516.
[Lan00] Robert Lange. "Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors in CMOS/CCD-Technology". English. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2000.
[LK06] M. Lindner and A. Kolb. "Lateral and Depth Calibration of PMD-Distance Sensors". In: International Symposium on Visual Computing (ISVC06). Vol. 2. Lake Tahoe, Nevada: Springer, 2006, pp. 524–533. ISBN: 978-3-540-48626-8.
[LP02] Wolfgang von der Linden and Alexander Prüll. Wahrscheinlichkeitstheorie, Statistik und Datenanalyse. Course script, Institute of Theoretical and Computational Physics, TU Graz, 2002. URL: http://itp.tugraz.at/LV/wvl/Statistik/A_WS_pdf.pdf.
[LS01] R. Lange and P. Seitz. "Solid-state time-of-flight range camera". In: IEEE Journal of Quantum Electronics 37.3 (2001), pp. 390–397.
[Lua01] Xuming Luan. "Experimental Investigation of Photonic Mixer Device and Development of TOF 3D Ranging Systems Based on PMD Technology". English. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001.
[MF81] L. Mortara and A. Fowler. "Evaluations of charge-coupled device (CCD) performance for astronomical use". In: Proc. SPIE 290 (1981), pp. 28–33.
[Müh04] Matthias Mühlich. "Estimation in Projective Spaces and Applications in Computer Vision". PhD thesis. Johann Wolfgang Goethe Universität Frankfurt am Main, 2004.
[MM01] Matthias Mühlich and Rudolf Mester. "Subspace Methods and Equilibration in Computer Vision". In: Proceedings of the Scandinavian Conference on Image Analysis SCIA 2001, Bergen. 2001.
[MSB01] C. Mota, I. Stuke, and E. Barth. "Analytic solutions for multiple motions". In: Proc. of the International Conference on Image Processing. Vol. 2. 2001, pp. 917–920.
[MV06] Kurt Meyberg and Peter Vachenauer. Differentialgleichungen, Funktionentheorie, Fourier-Analysis, Variationsrechnung. Vol. 2. Höhere Mathematik. Berlin; Heidelberg: Springer, 2006. ISBN: 3-540-41851-2, 978-3-540-41851-1.
[NGK94] Klas Nordberg, Gösta H. Granlund, and Hans Knutsson. "Representation and Learning of Invariance". In: ICIP (2). 1994, pp. 585–589. URL: http://dblp.uni-trier.de/db/conf/icip/icip1994-2.html#NordbergGK94.
[Pap+06] Nils Papenberg et al. "Highly Accurate Optic Flow Computation with Theoretically Justified Warping". In: International Journal of Computer Vision 67.2 (2006), pp. 141–158.
[Pla06] Matthias Plaue. Analysis of the PMD Imaging System. Tech. rep. Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, 2006.
[PM90] P. Perona and J. Malik. "Scale Space and Edge Detection using Anisotropic Diffusion". In: PAMI 12 (1990), pp. 629–639.
[Rap07] Holger Rapp. "Experimental and Theoretical Investigation of Correlating TOF-Camera Systems". MA thesis. Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, 2007.
[RL87] Peter J. Rousseeuw and Annick M. Leroy. Robust Regression and Outlier Detection. New York: Wiley & Sons, 1987.
[Sch00] H. Scharr. "Optimale Operatoren in der Digitalen Bildverarbeitung". PhD thesis. Heidelberg, Germany: University of Heidelberg, 2000.
[Sch03] Bernd Schneider. "Der Photomischdetektor zur schnellen 3D-Vermessung für Sicherheitssysteme und zur Informationsübertragung im Automobil". PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2003.
[Sch+98] R. Schwarte et al. "Novel 3D-vision systems based on layout optimized PMD-structures". German. In: Technisches Messen 65.7-8 (1998), pp. 264–271. ISSN: 0171-8096.
[SG02] H. Spies and C. S. Garbe. "Dense Parameter Fields from Total Least Squares". In: Pattern Recognition. Ed. by L. Van Gool. LNCS 2449. Zurich, CH: Springer-Verlag, 2002, pp. 379–386. URL: http://books.google.com/books?id=0xcL1dIafSUC.
[SJB00] H. Spies, B. Jähne, and J. L. Barron. "Regularised Range Flow". In: ECCV. Ed. by D. Vernon. Vol. 2. Lecture Notes in Computer Science 1843. Dublin, Ireland: Springer, 2000, pp. 785–799.
[Spi01] H. Spies. "Analysing Dynamic Processes in Range Data Sequences". PhD thesis. Heidelberg, Germany: University of Heidelberg, 2001.
[Spi+99] Hagen Spies et al. "Differential Range Flow Estimation". In: DAGM-Symposium. 1999, pp. 309–316.
[Ste99] Charles V. Stewart. "Robust Parameter Estimation in Computer Vision". In: SIAM Review 41.3 (1999), pp. 513–537.
[Stu+03] I. Stuke et al. "Estimation of multiple motions: regularization and performance evaluation". In: Image and Video Communication and Processing, Proceedings of SPIE. Vol. 5022. 2003, pp. 75–86.
[Tel+06] Alexandru Telea et al. "A Variational Approach to Joint Denoising, Edge Detection and Motion Estimation". In: DAGM-Symposium. 2006, pp. 525–535.
[TM98] Carlo Tomasi and Roberto Manduchi. "Bilateral Filtering for Gray and Color Images". In: ICCV. 1998, pp. 839–846. URL: http://dblp.uni-trier.de/db/conf/iccv/iccv1998.html#TomasiM98.
[TRK01] Yanghai Tsin, Visvanathan Ramesh, and Takeo Kanade. "Statistical Calibration of CCD Imaging Process". In: IEEE International Conference on Computer Vision. 2001.
[Tsc02] D. Tschumperle. "PDE's based regularization of multivalued images and applications". PhD thesis. Université de Nice-Sophia, 2002.
[Tsc06] David Tschumperlé. "Fast Anisotropic Smoothing of Multi-Valued Images using Curvature-Preserving PDE's". In: Int. J. Comput. Vision 68.1 (2006), pp. 65–82. ISSN: 0920-5691. DOI: 10.1007/s11263-006-5631-z.
[VHV91] S. Van Huffel and J. Vandewalle. The Total Least Squares Problem: Computational Aspects and Analysis. Philadelphia: Society for Industrial and Applied Mathematics, 1991. URL: http://www.netlib.org/vanhuffel/.
[Wag03] C. Wagner. "Informationstheoretische Grenzen optischer 3D-Sensoren". PhD thesis. Universität Erlangen-Nürnberg, 2003.
[Wes94] C. F. Westin. "A Tensor Framework for Multidimensional Signal Processing". PhD thesis. Linköping, Sweden: Linköping University, 1994.
[Xia+06] J. Xiao et al. "Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection". In: ECCV06. 2006, I: 211–224.
[Xu99] Zhanping Xu. Investigation of 3D-Imaging Systems Based on Modulated Light and Optical RF-Interferometry (ORFI). English. Vol. 14. ZESS Forschungsberichte. Aachen, Germany: Shaker Verlag, 1999. ISBN: 3-8265-6736-6. URL: http://www.shaker.de/Online-Gesamtkatalog/Booklist.idc?Reihe=102.
[Yam+93] F. Yamamoto et al. "Three-dimensional PTV based on binary cross-correlation method". In: JSME International Journal, Series B 36.2 (1993), pp. 279–284.