Dissertation
submitted to the
Combined Faculties for the Natural Sciences and for Mathematics
of the Ruperto-Carola University of Heidelberg, Germany
for the degree of
Doctor of Natural Sciences

put forward by
Diplom-Physiker Martin Otmar Schmidt
born in Nürnberg

Oral examination: 5 November 2008

Spatiotemporal Analysis of Range Imagery

Referees:
Prof. Dr. Bernd Jähne
Prof. Dr. Ulrich Platt

Summary

This thesis deals with the question of how the corresponding three-dimensional motion field can be determined from a sequence of range images. We investigate the signal of range cameras that are based on the time-of-flight principle and make use of a novel optoelectronic component, the photonic mixer device (PMD). In addition to the range, it provides information about the mean radiant flux and its modulation amplitude. We discuss how this extended information content can be used. The reconstruction of a motion field from an image sequence is an ill-posed inverse problem and cannot be solved in general. Moreover, the spatiotemporal signal of a PMD-camera contains various, in part very specific, systematic and statistical errors with an explicit spatial as well as temporal dependence (e.g. motion artifacts). We analyze the different errors and develop a method for correcting systematic errors in the range signal. With a novel two-state channel smoothing we improve range maps corrupted by noise and outliers. We extend the structure tensor method in order to use, for the first time, the extended information content of PMD-cameras to improve the motion estimate and to allow statements about the quality of the estimate. In developing the algorithms, care was taken that their computational complexity does not rule out their use in embedded systems. The algorithms are verified on synthetic and real single images as well as image sequences.

Abstract

The present thesis addresses the question of how to determine the three-dimensional motion field from a corresponding sequence of range images. We investigate signals from range cameras that are based on the time-of-flight principle, for which they employ the novel optoelectronic photonic-mixer-device (PMD). Its signal comprises information about the range, the mean radiant flux and its modulation amplitude. We discuss how to take advantage of this wealth of information. The estimation of a motion field from image sequences is an ill-posed inverse problem which cannot be solved in general. Moreover, the spatiotemporal signal of a PMD-camera is corrupted by several kinds of errors, some of them rather specific, that are of systematic and statistical nature and depend explicitly on time and space (e.g. motion artifacts). We analyze these errors and develop a method to correct for systematic errors in the range signal. By means of a novel two-state channel smoothing we improve range images corrupted by noise and outliers. We use and extend the structure tensor approach to arrive, for the first time, at an improved motion estimate that exploits the PMD-signal and provides an inherent measure of its confidence. The presented algorithms were developed under the premise that their computational complexity must not preclude their application within an embedded system. They are tested on synthetic and real images and image sequences.

Acknowledgments

I gratefully acknowledge the support of many people who contributed in various ways to the completion of this thesis. First of all I would like to thank Prof. Dr. Bernd Jähne for giving me the opportunity to work on various interesting topics of computer vision and for supervising my thesis. I am grateful for his kind support in both scientific and organizational issues. I thank Prof. Dr.
Ulrich Platt for agreeing to act as the second referee. Thanks go to the staff of the IWR and HCI, who do an excellent job in keeping things running, especially to Barbara Werner and Karin Kubessa-Nasri for making bureaucracy less painful and to Dr. Hermann Lauer, Markus Riedinger and Dr. Ole Hansen for letting the data streams flow right where they should. I am grateful to Pavel Pavlov, the most suave person I know, for giving work at the office a congenial feel and for being an inexhaustible source of mathematical knowledge. I would like to thank Dr. Michael Klar for giving me an introduction to camera calibration and for support in various related algorithmic issues. A big thank-you goes to PD Dr. Ullrich Köthe, who always took the time to answer my questions on various image processing topics. Thanks to PD Dr. Christoph Garbe for giving suggestions and tips on various topics. For proof-reading and comments on the thesis I am deeply grateful to Dr. Achim Falkenroth, Roland Rocholz, Claudia Kondermann, Zhuang Lin, Andreas Schmidt and Marion Zuber.

I enjoyed working at the lab, for which I mostly blame the Windis, in particular Dr. Kai Degreif for introducing me to the small wind-wave flume, Dr. Achim Falkenroth and Roland Rocholz for the various discussions on water-wave measurements, Dr. Uwe Schimpf, Alexandra Herzog for always offering some tea, Kerstin Richter, Florian Huhn, Rene Winter, Steffen Haschler, and last but not least Dr. Günther Balschbach for giving excellent administrative support. Thank you all.

With respect to the research done for the PMD-cameras I would like to thank Holger Rapp for all his work with the experimental setup, Mario Frank for giving me access to his range measurements, Matthias Plaue for discussions about the PMD's working principle, Dr. Markus Jehle for experimental and theoretical support, Michael Erz for the demodulation measurements and Dr. Hagen Spies for advice on range flow algorithms. Many thanks go to Dr. Björn Menze and Dr.
Michael Kelm (telling me what the prior does in the monastery and why he ROCks under the trees of a random forest), Dr. Linus Görlitz, Daniel Kondermann (helping Charon over the Styx), Dr. Ralf Küsters, Christoph Sommer, Dr. Nikolaos Gianniotis, Dr. Marc Kirchner, Prof. Dr. Fred A. Hamprecht, Björn Andres, Bernhard Renard, Michael Hanselmann, Frederik Kaster, Sebastian Boppel, Bjoern Voss, Jörg Greis, Stephan Kassemeyer, Lars Feistner and Natalie Müller. I would like to thank all the people of the IWR, HCI and IUP who gave me a cheerful time in Heidelberg, particularly the members of the DIP, MIP and IPA groups.

During my time at the IWR I worked on various projects with numerous external collaborators, whom I would like to thank as well. Thanks to all members of the LOCOMOTOR team for interesting discussions and friendly collaboration, especially Dr. Ingo Stuke for giving me support with his multiple motion algorithms, Dr. Hanno Scharr for illuminating talks and Dr. Kai Krajsek for a crash course in Kriging. I enjoyed working together with the members of the Bosch corporate research team in Hildesheim (CR/AEM5), in particular Henning Voelz; very likely this was the first and last time in my life that I could say my job was to drive a BMW through sun, rain and snow around Heidelberg. I appreciate the support of PD Dr. Michael Felsberg in giving me access to his channel smoothing algorithm. I would like to thank all collaborators within the Smartvision project, in particular Hermann Hoepken for the fruitful and pleasant cooperation on the demonstrator. Working within the LYNKEUS project was a pleasant experience. Thanks to all collaborators and in particular to Sandra Stecher for her friendly and straightforward cooperation, to Prof. Dr. Andreas Kolb and Maik Keller for giving me support for the TOFSimulator and trying to find solutions for my application-specific problems, and to Stefan Fuchs for the egomotion sequences.
I gratefully acknowledge the financial support of the BMBF within the project LYNKEUS (promotional reference: 16SV2296). Last but not least, I would like to thank my parents, my brothers and my friends (in particular the Kumperla) for their words of encouragement and emotional support.

Contents

1 Introduction
  1.1 Motivation
  1.2 Outline
  1.3 Contribution

I Theory

2 Range Data and Time-of-Flight Measuring Principle
  2.1 Optical Range Measurement Techniques
    2.1.1 Triangulation
    2.1.2 Time-of-Flight Based Methods
    2.1.3 Interferometry
  2.2 The Photonic Mixer Device
    2.2.1 Demodulation
      2.2.1.1 Sinusoidal Modulation
      2.2.1.2 Rectangular Modulation
      2.2.1.3 Demodulation Contrast
    2.2.2 Error Analysis
      2.2.2.1 Systematic Errors
        Periodic Phase Error
        Fourier Approximation
        Constant Phase Error per Pixel
        Overexposure and Saturation
        Exposure Time / Amplitude Dependent Phase Deviation
      2.2.2.2 Random Errors

3 Image Processing and Filters
  3.1 Basics
    3.1.1 Discretization and Sampling
        Derivatives and Gradient
    3.1.2 Fourier Transform
    3.1.3 Interpolation
    3.1.4 Convolution, Point Spread Function and Transfer Function
        Filter Design and Optimization
    3.1.5 Normalized Averaging
        Band Enlarging Operators
  3.2 Edge Preserving Smoothing
    3.2.1 Robust Estimators
    3.2.2 Bilateral and Diffusion Filtering
  3.3 Two State Channel Smoothing

4 Motion Estimation
  4.1 Optical Flow and Range Flow
    4.1.1 Optical Flow and Motion Field
      4.1.1.1 Barber's pole illusion and complex motion
    4.1.2 Brightness Change Constraint Equation
    4.1.3 Aperture Problem
    4.1.4 Range Flow Constraint Equation
    4.1.5 Aperture Problem Revisited
    4.1.6 Local and Global Flow Estimation
      4.1.6.1 Local Total Least Squares Estimation
        Gradient Based Weighting
        Minimum Norm Solutions
      4.1.6.2 Regularization of Local Flow Estimates
      4.1.6.3 Performance Issues
    4.1.7 Confidence and Type Measure
    4.1.8 Combining Range and Intensity Data
    4.1.9 Equilibration
  4.2 Motion Artifacts

II Experiments and Applications

5 Testbench Measurements
  5.1 Experimental Setup
    5.1.1 Power Budget
  5.2 Depth and Amplitude Analysis
    5.2.1 Fixed Pattern Noise
    5.2.2 Range Calibration
    5.2.3 Interdependence of Amplitude and Range

6 Applications
  6.1 Still Image Processing
  6.2 Synthetic Test Sequences
        Tabular Result Scheme
        Algorithms and Performance Issues
    6.2.1 Motion of a plane
    6.2.2 Motion of a sphere
  6.3 Real World Sequences
  6.4 Summary

7 Conclusion and Outlook
  7.1 Summary
  7.2 Evaluation and Outlook

III Appendices
  Acronyms and Notation
  Bibliography

Chapter 1
Introduction

1.1 Motivation

There is a vast number of tasks in science and industry that involve the analysis of image data. Generally, these data are two- or three-dimensional images, i.e. projections of features of some physical object or scene onto a two- or three-dimensional (regular) grid. These features are physical properties like the spectral reflectance of a surface (i.e. its color) or the absorbance of body tissue.
The data is acquired with some imaging device such as a (digital) camera, a microscope, a computed tomography scanner or a magnetic resonance scanner, to name only a few. The sampled information can be scalar, vector valued (e.g. multispectral imaging) or even tensor valued.

Sometimes it is sufficient to analyze single images. But with the advances in computing power and storage capacity, one increasingly wants to utilize the additional information embedded in the temporal domain. Put differently, some problems can be tackled properly only if their inherently dynamic nature is captured in image sequences. Often the motion of single or multiple objects is of interest, as in time-to-collision estimation in the automotive industry. Sometimes the temporal evolution of (non-rigid, deformable) surfaces is studied, e.g. the motion of waves on a water surface. Even if the dynamics themselves are not of interest, they might still need to be accounted for, because they introduce perturbations that need to be corrected. For instance, image registration techniques in medical imaging try to compensate for the motion of a patient during the (lengthy) acquisition of an image series. Particularly for many scientific applications, an exact measurement of the motion field is of major importance (e.g. the calculation of growth rates of plant leaves).

The calculation of a three-dimensional physical motion field from a corresponding (2D) image sequence is, however, not an easy problem to solve, as we will discuss in this thesis. In fact, it cannot be solved in general. The recent development of the so-called photonic-mixer-device (PMD) marks substantial progress towards the goal of reconstructing the physical motion field from an image sequence, because with it not only the irradiance information (known as the gray value of conventional image sensors) but also the distance to the observed (object) surface can be acquired at the same time.
A PMD-sensor is an integrated circuit comprising an array of single PMD-pixels. It measures the distance based on the time-of-flight principle using an active illumination. The observed scene is illuminated by modulated, incoherent infrared light, which is reflected and gathered by an array of solid-state image sensors comparable to those used in common digital (CMOS/APS) cameras. The major benefits in comparison with range measurement techniques like stereo vision or laser scanners are

• real-time range- and brightness-image acquisition
• no correspondence problems and low algorithmic effort
• inexpensive (because conventional) manufacturing process
• compact and robust "off-the-shelf" cameras are available

Before we continue, we need to say a few words about the meaning of some terms we use throughout this thesis. When we speak of the PMD-technique or simply the PMD, we mean sensors or cameras that are based on the principles of optoelectronic, modulation-based time-of-flight measurement that we present in section 2.2. While the acronym PMD (photonic-mixer-device) relates to a specific realization of this technique protected by patent (held by PMDTechnologies GmbH), we still use it for all similar realizations, as there is no other common acronym for this technique. By gray value sensors we denote conventional imaging devices (digital cameras) that measure the irradiance on the sensor pixels. By 3D motion field we mean a two-dimensional field of three-dimensional velocity vectors that corresponds to the physical motion of surface patches projected onto a two-dimensional array (the sensor) by some optical receiver (an objective).

The image signal of the PMD-sensor comprises information about the range, the mean radiant flux and its modulation amplitude.
In the present thesis we show how to utilize this extended information content for the improvement of the range measurement itself and for the estimation of an exact 3D motion field corresponding to an acquired image sequence. We examine the PMD measurement technique in the context of image processing, particularly motion estimation.

The signal of a current PMD-sensor bears several kinds of errors. Some of them are rather specific to the PMD, for example motion artifacts or a bias in the range measurement whose magnitude is modulated with increasing distance. The error types are of systematic and statistical nature and depend explicitly on time and space. We analyze these errors and develop a method to correct for the bias in the range signal.

The work presented in this thesis partially evolved in collaboration within the LYNKEUS project, funded by the Federal Ministry of Education and Research (BMBF). The project addresses solutions by means of 3D vision for applications in automation engineering and robotics, autonomous vehicles, man-machine interaction, safety engineering, medical engineering and quality control. Computational efficiency was therefore an important constraint for the algorithms to be developed. For tasks that might be implemented on an embedded system (e.g. motion estimation), we tried to come up with algorithms of at most linear complexity in time and space, and we avoided iterative algorithms whenever possible.

1.2 Outline

The individual chapters of this thesis are not self-contained. The subject is just too complex for that to be possible. However, whenever a concept or term that we do not believe to be common knowledge is used for the first time within another chapter, we tried to refer to the respective sections of the thesis.
A reader who just wants to look at a specific section and does not intend to read the thesis from the beginning is advised to use the electronic version of the thesis, where the references can easily be followed via hyperlinks. For the introduction to motion estimation in chapter 4 we recommend the electronic version too, because it contains an embedded illustrative animation that (naturally) can be viewed only in the electronic version (with Acrobat Reader).

The thesis is split into two parts: Theory, and Experiments and Applications. In the former we lay the theoretical foundation for the proposed methods and analyze errors under simplified, idealized assumptions. In the latter we present and discuss results of the experimental analysis regarding data acquisition, still image processing and motion estimation. The content of the chapters is as follows:

Chapter 2: We give a short overview of common range measurement techniques and describe the details of the PMD measuring principle that are relevant for image processing. We specify the features of the PMD, both the advantageous and the problematic ones. We analyze its errors of common and specific nature and present a new method to correct for a specific kind of systematic error in the range measurement.

Chapter 3: We describe image processing, particularly in the context of filtering. We explain the basics of linear filtering and introduce more advanced non-linear concepts, like robust estimation. We show how the PMD-signal can be exploited by means of non-linear filtering and present an extension to B-spline channel smoothing, which we named two state channel smoothing, to improve range imagery in the presence of noise and outliers.

Chapter 4: We give an introduction to motion estimation and illuminate the general problem that it poses. We describe how a three-dimensional motion field can be calculated from the PMD-signal by extending the structure tensor approach to motion estimation for the particularities of an active vision system like the PMD. We describe a specific error of PMD range measurement that occurs in the presence of motion, so-called motion artifacts.

Chapter 5: We describe the experimental setup that was used to acquire image sequences and calibration data. We exemplify diverse errors of the PMD and correct for fixed-pattern noise and systematic range deviations. In particular, we study the interdependence of range and amplitude, which is important for motion estimation.

Chapter 6: We present and discuss results for still image processing and motion estimation on synthetic and real-world data. Using various examples, we systematically investigate the basic difficulties in 3D motion estimation from range imagery. We give a proof of concept of the method, demonstrate its successful application on real-world data and discuss its limitations.

Chapter 7: We give a résumé of the achieved results and an evaluation of the PMD-technology with respect to motion estimation. We try to highlight important topics that should be addressed regarding the sensor technology as well as algorithmic aspects of motion estimation.

In the appendices (part III) the reader may find a list of acronyms and abbreviations and a description of the notation used throughout the thesis. The author hereby apologizes if he should have failed to adhere to this notation style anywhere and thereby diminished the comprehensibility of the text.
1.3 Contribution

The following is a list of what the author believes to be the novel contributions of this thesis:

• a largely self-contained presentation of 3D motion field estimation using the structure tensor approach extended for PMD-technology
• a concise overview of the most prominent error types of PMD-sensors, particularly those introducing interdependencies in the range and amplitude measurement
• a novel method to correct for a specific kind of systematic error in the range measurement
• an extension of B-spline channel smoothing to improve range imagery in the presence of noise and outliers, which exploits the technicalities of the PMD-signal
• an extension of the structure tensor approach to motion estimation that takes advantage of the wealth of information embedded in the PMD-signal, achieving better motion estimates with respect to the aperture problem and in the presence of noise
• first-time application of the structure tensor approach to 3D motion estimation on PMD-data

Part I
Theory

Chapter 2
Range Data and Time-of-Flight Measuring Principle

2.1 Optical Range Measurement Techniques

There is an abundance of techniques for measuring distances or ranges* to objects. Objects can be point-like, such as a star in the sky, or extended, such as the sea floor below a ship. Which kind of range or object is of interest depends on the specific application, as does the method that is suitable for the measurement. Range measurements can be contactless (via sound or light) or tactile. Tactile measuring devices can be very simple, like a plumb line, or sophisticated, like an atomic force microscope. Computer vision implies the use of contactless optical techniques for range measurement, in most cases to obtain an accurate description of surfaces. The wealth of available methods demonstrates the importance of range measurement for all kinds of applications.
The most important optical range measurement techniques can be divided into three categories: triangulation, interferometry and time-of-flight based methods.

Figure 2.1: A stereo camera setup (from [Jäh02])

2.1.1 Triangulation

Triangulation is based on the fact that a scene is seen under a different viewing angle if the viewing position changes. The difference in the viewing angle causes a shift in the projected image, from which the distance to the elements of the scene can be inferred if specific conditions are met. In stereoscopy, a specific realization of the triangulation principle depicted schematically in figure 2.1, a point $X$ is projected onto two different positions on the image planes of two cameras with parallel optical axes, separated by a distance $b$, the stereoscopic basis. The difference in position is called disparity or parallax $p$ and is inversely proportional to the range $X_3$, as can be derived from geometrical optics:

\[ p = {}^{r}x_1 - {}^{l}x_1 = d' \left( \frac{X_1 + b/2}{X_3} - \frac{X_1 - b/2}{X_3} \right) = b\,\frac{d'}{X_3} \tag{2.1} \]

Assuming the noise in the parallax measurement to be Gaussian with a standard deviation $\sigma_p$ and applying the laws of error propagation, we deduce that the absolute uncertainty of a range estimate grows with the square of the distance, i.e. the precision deteriorates quadratically:

\[ z \equiv X_3 = \frac{b\,d'}{p}\,; \qquad \sigma_z = \frac{b\,d'}{p^2}\,\sigma_p = \frac{z^2}{b\,d'}\,\sigma_p \tag{2.2} \]

* The commonly used term range can be somewhat confusing, as it is also a synonym for domain; in passages where the meaning might be ambiguous, we will use the terms depth range and distance synonymously with range to clarify. Besides, from a physical point of view, a distance measurement is, because of Heisenberg's uncertainty principle, always associated with a range of distances (or rather a probability distribution) and never with a precise single value.

Another variant of triangulation is active triangulation, which replaces one of the cameras by a light projector.

Triangulation suffers from the fact that the observed object needs to be textured, because corresponding points in two images need to be identified. Where this is not possible, typically because of the so-called aperture problem (see section 4.1.3 for a discussion of the problem in the context of motion estimation), the parallax, which is essential for triangulation-based range estimation, cannot be determined. Moreover, one needs at least two, preferably calibrated, optical devices if the observed objects are moving.† Such systems are typically neither cheap nor easy to maintain, owing to the necessary calibrations, unless they are operated in an ideal environment.

2.1.2 Time-of-Flight Based Methods

The basic idea of time-of-flight (TOF) based methods is to determine the delay of a signal (typically on an electromagnetic carrier wave) induced by the time it needs to travel a certain distance. For a light pulse that travels to an object, is (diffusely) reflected from it and is then detected at the same position from which it was sent, this delay is the time $\tau$ needed to cover twice the distance $z$ between the detector and the object:

\[ \tau = \frac{2z}{c}, \quad \text{where } c \text{ is the speed of light.} \tag{2.3} \]

From this equation for an ideal time-of-flight measurement we infer that, in contrast to triangulation, the precision of the distance measurement is independent of the distance and directly proportional to the precision of the time measurement:

\[ z = \frac{c}{2}\,\tau\,; \qquad \sigma_z = \frac{c}{2}\,\sigma_\tau . \tag{2.4} \]

Using a light pulse or pulses (i.e. pulse modulation) and measuring the time delay is rather demanding given the speed of light and the typical distances to be measured, because frequencies on the order of GHz and above need to be handled properly.
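The different scaling behaviour of equations (2.2) and (2.4) can be made concrete with a few lines of Python. The following is only a minimal sketch: the function names are hypothetical, and the stereo basis, image distance, parallax noise and timing jitter are assumed illustrative values, not parameters taken from this thesis.

```python
# Sketch: range uncertainty of stereo triangulation (eq. 2.2) versus
# pulsed time-of-flight (eq. 2.4). All numeric values below are assumed.

C = 299_792_458.0  # speed of light in m/s


def stereo_sigma_z(z, b, d_img, sigma_p):
    """Eq. (2.2): sigma_z = z^2 / (b * d') * sigma_p."""
    return z ** 2 / (b * d_img) * sigma_p


def tof_sigma_z(sigma_tau):
    """Eq. (2.4): sigma_z = c / 2 * sigma_tau, independent of z."""
    return 0.5 * C * sigma_tau


def tof_delay(z):
    """Eq. (2.3): round-trip delay tau = 2 z / c."""
    return 2.0 * z / C


if __name__ == "__main__":
    # Assumed setup: 10 cm stereo basis, d' = 8 mm, 1 micrometer parallax noise.
    b, d_img, sigma_p = 0.10, 8e-3, 1e-6
    sigma_tau = 50e-12  # assumed timing jitter of 50 ps
    for z in (1.0, 2.0, 4.0):
        s_stereo = stereo_sigma_z(z, b, d_img, sigma_p) * 1e3
        s_tof = tof_sigma_z(sigma_tau) * 1e3
        print(f"z = {z:3.1f} m: stereo {s_stereo:6.2f} mm, "
              f"TOF {s_tof:5.2f} mm, delay {tof_delay(z) * 1e9:5.2f} ns")
```

Under these assumptions, doubling the distance quadruples the stereo uncertainty, while the time-of-flight uncertainty stays constant. Solving equation (2.4) for $\sigma_\tau$ also shows that a range resolution of 1 cm corresponds to a timing resolution of roughly 67 ps, which illustrates why direct pulse timing is demanding.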
† If the objects are not moving while the image acquisition takes place, one could also move the camera to generate the parallax.

Using continuous wave (CW) modulation, the carrier wave is modulated periodically with a frequency $\nu$. Here, not a time delay but a phase shift $\varphi$ between the outgoing and the incoming signal is determined:

\[ z = \frac{c}{4\pi\nu}\,\varphi\,; \qquad \sigma_z = \frac{c}{4\pi\nu}\,\sigma_\varphi . \tag{2.5} \]

The phase shift can be determined by correlating the signals in time. Compared to pulse modulation, this is less demanding and more robust with respect to tolerances of the components of the measuring device. The chosen modulation frequency $\nu$ determines the distance range $\Delta z = c/(2\nu)$ that can be measured uniquely, which is known as the unambiguity range. Because the phase shift $\varphi$ used to calculate $z$ is a cyclic variable, distances $z'$ above $\Delta z$ yield an erroneous range $z = z' \bmod \Delta z$.

2.1.3 Interferometry

Interferometry can be regarded as a special case of time-of-flight measurement using continuous wave modulation, where the modulation is given by the frequency of the electromagnetic wave itself. The radiation needs to be coherent, as otherwise the cross-correlation of the outgoing and incoming signals cannot be used to determine the phase shift, i.e. the waves do not interfere. For an electromagnetic wave, $c = \nu\lambda$. Substitution in equation (2.5) yields:

\[ z = \frac{\lambda}{4\pi}\,\varphi\,; \qquad \sigma_z \sim \frac{\lambda}{4\pi}\,\sigma_\varphi \quad \text{and} \quad \Delta z = \frac{\lambda}{2} \tag{2.6} \]

Detailed analysis shows that for classical interferometry, as realized in a Michelson interferometer, $\sigma_z$ is proportional to the inverse of the distance or of the aperture of the observation, as it features an optical averaging over the micro-topology of the object under investigation: $\sigma_z \sim z^{-1}$ ([Häu+99]). To overcome the limitations imposed by the small, wavelength-determined unambiguity range $\Delta z$, multiwavelength interferometry can be used.
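The role of the unambiguity range in equations (2.5) and (2.6) can be illustrated numerically. The sketch below is an illustration only: the helper names are hypothetical, and the modulation frequency of 20 MHz and the wavelength of 633 nm are assumed example values.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguity_range(nu):
    """Delta z = c / (2 nu) for a modulation frequency nu in Hz."""
    return C / (2.0 * nu)


def phase_from_distance(z_true, nu):
    """Phase shift accumulated over the round trip, wrapped to [0, 2 pi)."""
    return (4.0 * math.pi * nu * z_true / C) % (2.0 * math.pi)


def range_from_phase(phi, nu):
    """Eq. (2.5): z = c / (4 pi nu) * phi."""
    return C / (4.0 * math.pi * nu) * phi


if __name__ == "__main__":
    nu = 20e6  # assumed modulation frequency of 20 MHz
    print(f"unambiguity range: {unambiguity_range(nu):.3f} m")
    for z_true in (2.0, 7.0, 9.0):
        z_meas = range_from_phase(phase_from_distance(z_true, nu), nu)
        print(f"true {z_true:4.1f} m -> measured {z_meas:6.3f} m")
    # Interferometry, eq. (2.6): the optical wavelength replaces c / nu.
    lam = 633e-9  # assumed wavelength
    print(f"interferometric unambiguity range: {lam / 2 * 1e9:.1f} nm")
```

With the assumed 20 MHz modulation the unambiguity range is about 7.5 m, so a target at 9 m is reported at about 1.5 m. For the assumed optical wavelength the unambiguity range shrinks to a few hundred nanometers, which is the limitation that multiwavelength interferometry addresses.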
Due to speckle noise, which occurs when coherent light is reflected from a rough surface, classical interferometry is well suited only for smooth surfaces.

White light interferometry, more precisely coherence radar, uses radiation that has a coherence length of only a few wavelengths, so that interference patterns arise only for distances within this coherence length. The range is found by scanning over a distance range and looking for the interference of maximum contrast. As the interference contrast, rather than the phase information used in classical interferometry, serves as the basis for the range measurement, speckle noise has no influence and rough surfaces can be measured.

There are numerous interferometry methods for all kinds of applications, reaching the highest possible depth resolution and competitive measuring ranges of several meters, at the cost of a sensitive, expensive, highly specialized and therefore inflexible setup.

2.2 The Photonic Mixer Device

The following section is a condensed description of the photonic mixer device (PMD) based on the work of Lange [Lan00], Heinol [Hei01], Justen [Jus01] and Schneider [Sch03] and on our own theoretical and experimental findings (see also chapter 5). The focus is on aspects that are relevant for image processing, especially motion estimation, while avoiding the fine details of the technical realization. Furthermore, we describe a new way to correct for systematic errors in the phase measurement and give a new formula for calculating the phase shift of purely rectangular modulated signals.

The photonic mixer device realizes TOF range measurement by continuous wave modulation. Compared to more conventional systems, which perform the necessary detection of the optical signal and its cross-correlation with the reference signal separately, the PMD integrates both in one semiconductor-based circuit.
By moving the mixing and correlation of the signals from a separate electronic component into the integrated optoelectronic interface, the major error sources of conventional systems are avoided. Moreover, the parallelization of range point measurements on a sensor matrix, which is essential for a 3D camera system, is significantly simplified (see [Sch+98]). Figure 2.2 illustrates the principle of concurrent detection and mixing, i.e. the simultaneous generation of photoelectrons and mixing with the electronic reference signal, which is the essential new feature of the PMD.

Figure 2.2: Principle of concurrent detection and mixing in the PMD (based on [Sch03]).

Today's PMD sensors combine the functionality of the CCD principle with the benefits of a realization in a CMOS process (see [Lan00]). The CCD principle offers an almost noise-free addition of optically generated charge carriers and a defined local and temporal charge separation in the charge domain, both of which are crucial for an optimal performance of a PMD. As the PMD successively adds short-time integrated sampling points, a reasonable signal-to-noise ratio (SNR) can only be achieved if this addition works practically free of noise. Furthermore, only the addition of numerous short-time samples (i.e. the cross-correlation) suppresses the higher-order and non-harmonic frequencies in the modulation signal (which are unavoidable due to technical constraints) and allows the pixels to act as lock-in pixels that are insensitive to frequencies other than the modulation frequency.

CMOS processes are widely available and as such relatively cheap. They allow the implementation of the PMD as an active pixel sensor (APS), that is, with pixels that contain an active stage [Fos93].
An APS permits random access to pixels (and therefore application-specific performance enhancements) and an improved SNR. The CCD principle can be realized not only within a CCD process, but also within a CMOS process. The maximization of the charge transfer efficiency (CTE), which is one of the major benefits of the CCD process, is of minor importance for the PMD, as only a few charge transfers are needed (for details see [Lan00, chap. 5]). Using CMOS circuitry, first signal processing steps (like the temporal sampling, i.e. demodulation) can already be realized in the pixel while maintaining reasonable fill factors. The fill factor is of major importance, because the performance of the PMD camera depends on the modulated signal from an active illumination. The active illumination also accounts for the need for a high dynamic range of the PMD pixels, as the intensity of the back-scattered light decreases with the square of the target's distance to the camera.

Figure 2.3: Working principle of the PMD pixel: (a) schematic structure of the circuit, (b) sampling of the cross-correlation function (adapted from [Alb07]).

The working principle of the pixel is illustrated by way of example in figure 2.3a: the potential gradient in the semiconductor is controlled by applying proper gate voltages ($u_{\mathrm{mod},A}$ and $u_{\mathrm{mod},B}$) to the photogates A and B. The potential gradient is changed synchronously with the modulation of the incoming light, so the optically generated charge carriers are driven either into the left or into the right integration gate (under the readout diodes A and B), each for half of the modulation period.
For a typical modulation frequency of e.g. 20 MHz and an integration time of 5 ms, these short-time integrations repeat a hundred thousand times. It is worth noting that within one short-time integration (in the example above, half a period is 25 ns) only a few electrons (or none, depending on the optical power) are generated and transported to the integration gates. As the accumulation of single electrons under the integration gate is essentially noise free, this is actually an advantage, because self-induced drift ‡, which would lower the charge transfer to the integration gates, has no influence for the PMD [Lan00].

‡ Self-induced drift is induced by the Coulomb forces between free charge carriers of the same polarity, which make them repel one another. It is of practical significance only if many free charge carriers are located close to each other, and as such depends heavily on the number of generated electrons.

The charge accumulated in the capacitance under one integration (or readout) gate results in a measurable voltage $u_\mathrm{out}$ at the readout diodes. This voltage approximates a cross-correlation of the optically generated input signal $S(t)$ with the phase-shifted sampling signal $R(t)$, the phase shift $\theta$ being the correlation variable. Figure 2.3b illustrates the sampling for three different phase shifts $\theta$ of 0°, 90° and 180° and the resulting voltages at the readout gates A and B. The electrooptical input signal $S(t)$, indicated by the red bars, is always the same (because the scene does not change). Only the sampling (or reference) signal is phase shifted. If the readout gates A and B are identical with respect to the manufacturing process, then $u_{\mathrm{out},A}(\theta + 0^\circ) = u_{\mathrm{out},B}(\theta + 180^\circ)$.
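The orders of magnitude quoted above for the short-time integrations follow from simple arithmetic, sketched here in Python. The modulation frequency and the integration time are the values from the text. The total number of photoelectrons per exposure is an assumed, purely illustrative value.

```python
f_mod = 20e6   # modulation frequency in Hz (value from the text)
t_int = 5e-3   # integration time in s (value from the text)

period = 1.0 / f_mod          # one modulation period: 50 ns
half_period = period / 2.0    # one short-time integration per gate: 25 ns
n_periods = t_int * f_mod     # number of short-time integrations per exposure

# Assumption for illustration only: total photoelectrons collected per exposure.
n_electrons = 2.0e5
electrons_per_period = n_electrons / n_periods   # only a few electrons per period
```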
The output signal is only an approximation of the correlation, because it is not the sampling signal itself but some not necessarily linear function, dependent on the sampling signal and on properties of the semiconductor circuit, that describes what fraction of the photoelectrons is deposited in the integration capacitance. For example, a generated photoelectron may recombine with its hole before it reaches the capacitance, because it was generated too far from it. In the ideal case, the fraction would be one as soon as the potential drives the electrons to the corresponding integration gate, and zero otherwise. The correlation would then be that of a square wave (with range $\{0, 1\}$) with the electrooptical signal of the same period. The phase of the square wave is shifted by 180° between the two gates. For the ideal case of a sinusoidally modulated electrooptical signal, the correlation is sinusoidal too:

$$\int_0^{2\pi} \sin(x + \varphi)\cdot H(\sin(x + \theta))\,dx = 2\cos(\varphi - \theta),$$

with $H$ being the Heaviside step function.

2.2.1 Demodulation

The described process of demodulation shall now be put into mathematical formalism, making some idealizing assumptions. The irradiance $E(t)$ seen by the PMD sensor may be modeled as

$$E(t) = G_0' + A'\cdot M(t), \quad\text{with } M(t) = M(t + T). \tag{2.7}$$

$M(t)$ is a periodic function with (time) period $T$ (and angular frequency $\omega = 2\pi/T$) that originates from the modulated illumination of a scene. $G_0'$ is a constant irradiance offset and $A'$ is the amplitude associated with the (normalized) modulation $M(t)$. If we assume the photosensitive semiconductor to have a linear responsivity, the induced electrooptical signal $S(t)$, which is proportional to the number of generated charge carriers, is

$$S(t) = G_0 + A\cdot M(t), \tag{2.8}$$

with $G_0$ and $A$ being just linearly scaled versions of $G_0'$ and $A'$. Of course this is only an approximation, as both $G_0$ and $A$ are influenced by various optical and electrooptical properties of the system (e.g.
transmission of optical filters or quantum efficiency of the PMD). The details of these properties are discussed in [Lan00; Lua01; Jus01; Sch03] and are beyond the scope of this thesis.

Because of the time it takes for the light to travel to the object and back again, the modulation $M(t)$ is delayed by a phase difference $\varphi$ with respect to a reference modulation $M'$: $M(t) = M'(t - \varphi/\omega)$. Under ideal conditions, the PMD demodulates the signal $S(t)$ by correlating it with a discretized version of $M'$, namely $R(t) = H(M'(t))$, $H$ being the Heaviside step function. The cross-correlation function $I(\theta)$ reads

$$I(\theta) = \int_0^{mT} S(t)\cdot R\!\left(t + \frac{\theta}{\omega}\right) dt. \tag{2.9}$$

2.2.1.1 Sinusoidal Modulation

Assuming the modulation to be sinusoidal, $M'(t) = \sin(\omega t)$, we find for a correlation range of $m \in \mathbb{N}^+$ periods

$$
\begin{aligned}
I(\theta) &= \int_0^{mT} \bigl(G_0 + A\sin(\omega t - \varphi)\bigr)\cdot H(\sin(\omega t + \theta))\,dt &\text{(2.10a)}\\
&= \int_0^{mT} \bigl(G_0 + A\sin(\omega t - \varphi - \theta)\bigr)\cdot H(\sin(\omega t))\,dt \\
&= m\int_0^{T/2} \bigl(G_0 + A\sin(\omega t - \varphi - \theta)\bigr)\,dt \\
&= m\int_{\frac{\varphi+\theta}{\omega}}^{\frac{T}{2}+\frac{\varphi+\theta}{\omega}} \bigl(G_0 + A\sin(\omega t)\bigr)\,dt &\text{(2.10b)}\\
&= mT\left(\frac{A}{\pi}\cos(\varphi + \theta) + \frac{G_0}{2}\right), &\text{(2.10c)}
\end{aligned}
$$

considering that $H(\sin(\omega t))$ is one for half a period $[0, T/2]$ and zero otherwise.

Equation (2.10c) describes the idealized demodulation performed by the PMD for a sinusoidal modulation. A generalized version for modulations $M(t)$ that can be approximated by a Fourier series is given by Plaue [Pla06]. $I(\theta)$ corresponds to the output of the PMD, and $\theta$ is the variable of the cross-correlation function that the PMD implements. Choosing a specific $\theta$ corresponds to sampling the cross-correlation function. According to equation (2.9), we may select a specific $\theta$ by phase-shifting the reference signal $R(t)$ accordingly.

With respect to a TOF camera, we are interested in the unknowns $\varphi$, $A$ and $G_0$. $\varphi$ corresponds to the range $r$ by $\varphi = 2\omega r/c$, $c$ being the speed of light (see equation (2.5)). $A$ is proportional to the amplitude of the modulation of the active illumination of the camera.
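As a brief numerical aside, the closed form (2.10c) can be cross-checked against a direct numerical evaluation of the correlation integral (2.9). The following sketch uses NumPy and a midpoint rule. The function names, the number of integration steps and the test values of $G_0$, $A$, $\varphi$ and $\theta$ are assumptions chosen for illustration.

```python
import numpy as np


def correlation_numeric(theta, phi, G0, A, T=1.0, m=1, n=200_000):
    """Midpoint-rule evaluation of eq. (2.9) for a sinusoidal modulation."""
    omega = 2.0 * np.pi / T
    dt = m * T / n
    t = (np.arange(n) + 0.5) * dt
    S = G0 + A * np.sin(omega * t - phi)                  # electrooptical signal S(t)
    R = (np.sin(omega * t + theta) > 0).astype(float)     # R(t + theta/omega) = H(sin(wt + theta))
    return np.sum(S * R) * dt


def correlation_closed_form(theta, phi, G0, A, T=1.0, m=1):
    """Closed form of eq. (2.10c)."""
    return m * T * (A / np.pi * np.cos(phi + theta) + G0 / 2.0)


G0, A, phi = 2.0, 1.0, 0.8   # assumed example values
deviations = [
    abs(correlation_numeric(th, phi, G0, A) - correlation_closed_form(th, phi, G0, A))
    for th in np.pi / 2.0 * np.arange(4)
]
```

The remaining deviations are of the order of the integration step, since the step function makes the integrand discontinuous at the gate edges.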
$G_0$ sums up the background illumination and the DC component of the active illumination. As we have three independent unknowns, we need at least three equations to find a solution. By taking samples of $I(\theta)$ for several $\theta$ in $[0, 2\pi]$, we obtain those equations, one equation (2.10c) for each sample. As the output of the PMD is subject to noise, it is reasonable to take more samples and to find the optimal solution in a least squares sense. In any case, the implementation of the PMD described above returns, in one shot, two samples shifted by 180° at its two output gates. Taking another shot with a phase shift of 90° (and therefore also 270°) gives us enough information to solve for the unknowns of equation (2.10c) optimally in a least squares sense. Of course, one can take more than four samples to improve the estimation with respect to noise. For $N$ equidistant sampling points, the optimal solution in a least squares sense is given by:

$$
\varphi = \arg\left[\sum_{n=0}^{N-1} I_n e^{-\imath\theta_n}\right], \qquad
A = \frac{2\pi}{N}\left|\sum_{n=0}^{N-1} I_n e^{-\imath\theta_n}\right|, \qquad
G_0 = \frac{2}{N}\sum_{n=0}^{N-1} I_n,
\qquad\text{with } I_n = \frac{I(\theta_n)}{mT},\quad \theta_n = 2\pi\frac{n}{N}. \tag{2.11}
$$

A proof for an equivalent formulation of equation (2.11) can be found in [Pla06] or [Xu99]. For $N = 4$ it simplifies to:

$$
\varphi = \arg\bigl[(I_0 - I_2) + \imath(I_3 - I_1)\bigr], \qquad
A = \frac{\pi}{2}\sqrt{(I_0 - I_2)^2 + (I_3 - I_1)^2}, \qquad
G_0 = \frac{I_0 + I_1 + I_2 + I_3}{2},
\qquad\text{with } I_n = \frac{I(\theta_n)}{mT},\quad \theta_n = n\frac{\pi}{2}. \tag{2.12}
$$

Most PMD-based TOF cameras use essentially these equations to determine phase and amplitude. However, the amplitude $A$ calculated here is that of the electrooptical input signal, while formulas given in the literature (e.g. [LS01]) often calculate the amplitude of the correlation signal. It is important to be aware of the difference when it comes to the interpretation of $A$ as a physical property of the optical signal: equation (2.12) compensates for the so-called demodulation contrast. In particular, if the demodulation contrast depends on the measurement itself (i.e.
it is not a constant), the difference is of relevance, as we will discuss in section 2.2.1.3.

2.2.1.2 Rectangular Modulation

Now let us suppose that the modulation is not sinusoidal but rectangular. In equation (2.10b) we then have to replace $\sin$ by $\operatorname{sgn}\circ\sin$, i.e. a square wave with range $\{-1, 1\}$. The resulting correlation function is a triangle wave:

$$
I(\theta) = m\int_{\frac{\varphi+\theta}{\omega}}^{\frac{T}{2}+\frac{\varphi+\theta}{\omega}} \bigl(G_0 + A\cdot\operatorname{sgn}(\sin(\omega t))\bigr)\,dt
= mT\left(\frac{A}{\pi}\operatorname{tri}(\varphi + \theta) + \frac{G_0}{2}\right), \tag{2.13}
$$

where $\operatorname{tri}(\phi) = \frac{\pi}{2} - \arccos(\cos(\phi))$, i.e. a triangle wave with range $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $\operatorname{tri}(0) = \frac{\pi}{2}$.

If we apply equation (2.12) to correlation samples (2.13) that result from a rectangular modulation, there is a systematic error in the phase estimation, as the model assumption of a sinusoidal modulation does not apply. We will discuss this in detail in section 2.2.2. Here we just want to state an exact solution for the unknowns in equation (2.13), given four equidistant sampling points:

$$
\begin{aligned}
A &= \max\bigl[\,|(I_0 - I_2) + (I_1 - I_3)|,\; |(I_0 - I_2) - (I_1 - I_3)|\,\bigr],\\
\varphi &= \operatorname{sgn}(I_1 - I_3)\cdot\left(\frac{I_0 - I_2}{4A} + \frac{1}{4}\right) + \frac{1}{2},\\
G_0 &= \frac{I_0 + I_1 + I_2 + I_3}{2},
\end{aligned}
\qquad\text{with } I_n = \frac{I(\theta_n)}{mT},\quad \theta_n = n\frac{\pi}{2}. \tag{2.14}
$$

Note that the phase in (2.14) is obtained as a fraction of the full period, i.e. in units of $2\pi$: inserting the samples (2.13) yields the value $\varphi/2\pi \in (0, 1)$.

The author found this solution by analyzing the symmetries in the four correlation signals $I(\theta_n, \varphi)$ of (2.13) and checked it by inserting them into the solution (2.14) using a computer algebra system. He is not sure whether it is really new, as the problem seems to be quite a common one, but he could find nothing similar in the PMD-related literature.

2.2.1.3 Demodulation Contrast

In optics, the modulation transfer function (MTF) is of major importance for the description of an optical system. It is based on the modulation (or modulation contrast) $M$, which typically refers to a spatial pattern:

$$M = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}, \tag{2.15}$$

$$\mathrm{MTF} = \frac{M_\mathrm{image}}{M_\mathrm{object}}. \tag{2.16}$$

$L$ is the radiance (or luminance), and $M_\mathrm{object}$ and $M_\mathrm{image}$ are the modulation of an object and of its image.
If the object has a modulation of 100% (i.e. $L_{\min} = 0$), the MTF reduces to $M_\mathrm{image}$. Equation (2.16) is correct only if we assume a single frequency (of e.g. a spatial pattern), as the MTF depends on the frequency. More precisely, the MTF is the magnitude of the (optical) transfer function of an (optical) linear system (see section 3.1.4) and describes the response of an optical system to an image decomposed into sine waves. The demodulation contrast is defined similarly but refers to a modulation in time:

$$C_\mathrm{demod} = \frac{\text{measured amplitude}}{\text{measured offset}}. \tag{2.17}$$

It quantifies the PMD's performance of charge separation. Assuming a perfect charge separation, the amplitude of the sampled correlation $I(\theta)$ of (2.10c), taken relative to the integration time $mT$, is attenuated by a factor of $1/\pi$ with respect to the original modulation of $S(t)$; the offset is $G_0/2$. If we assume a modulation contrast of 100% of the electrooptical signal $S(t)$, then $G_0 = A$. So the demodulation contrast of this idealized PMD is:

$$C_\mathrm{demod} = \frac{A/\pi}{G_0/2} = \frac{A/\pi}{A/2} = \frac{2}{\pi} \approx 64\%.$$

The various realizations of the PMD, however, neither reach a demodulation contrast of this magnitude, nor is it constant. It tends to be below 40%. Moreover, $C_\mathrm{demod}$ depends on the modulation frequency and on the radiant energy deposited on a PMD pixel during the exposure (and on other hardwired system features that are of no interest in the scope of this thesis) [Lan00; Lua01]. While the dependence on frequency is negligible for motion estimation, as the modulation frequency is typically not changed during image acquisition, the dependence on radiant energy is of major importance: we want to use the amplitude of equations (2.12) or (2.14) as a measure for the radiance emitted from a (moving) object patch in the scene. We can then use this information in a physically motivated model of the scene for motion estimation, as will be discussed in section 4.1.
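The ideal value of $2/\pi$ can be reproduced from four synthetic correlation samples. The sketch below assumes the idealized sample model of (2.10c) with $G_0 = A$. The function names are hypothetical, and the contrast is computed directly from definition (2.17) using the amplitude and offset of the correlation signal (not the compensated amplitude of (2.12)).

```python
import numpy as np


def ideal_samples(phi, G0, A, N=4):
    """Normalized samples I_n = A/pi * cos(phi + theta_n) + G0/2, following (2.10c)."""
    theta = 2.0 * np.pi * np.arange(N) / N
    return A / np.pi * np.cos(phi + theta) + G0 / 2.0


def demodulation_contrast(I):
    """Amplitude over offset of the sampled correlation signal (four samples), eq. (2.17)."""
    amplitude = 0.5 * np.hypot(I[0] - I[2], I[3] - I[1])   # amplitude of the correlation signal
    offset = np.mean(I)                                    # offset of the correlation signal
    return amplitude / offset


# 100% modulation contrast of the electrooptical signal: G0 = A
c_ideal = demodulation_contrast(ideal_samples(phi=0.7, G0=1.0, A=1.0))
```

With a larger offset, for example `G0 = 2 * A` due to background light, the same function returns half of that value, which illustrates why the contrast is not a property of the sensor alone.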
If, however, the demodulation contrast itself depends on the radiance, we have to compensate for this, as (2.12) and (2.14) only apply for a constant $C_\mathrm{demod}$. This can be achieved by measuring $C_\mathrm{demod}$ as a function of $G_0$ in a calibration and using the result to correct the measured amplitude for the varying contrast. Then, for example, the amplitude calculated for a sinusoidal modulation in (2.12) becomes:

$$A \sim \frac{\sqrt{(I_0 - I_2)^2 + (I_3 - I_1)^2}}{C_\mathrm{demod}(G_0)}. \tag{2.18}$$

For the sake of simplicity we dropped any constants of proportionality, because they are of marginal importance in the context of motion estimation. Furthermore, equation (2.18) only holds true if there is no background illumination during calibration and image acquisition, as the demodulation contrast itself depends on the background illumination.

2.2.2 Error Analysis

Current PMD camera types show various errors, systematic as well as random ones, in their range signal. We first investigate the systematic errors, which result from deviations of the technical realization from the model assumptions that were used to derive equation (2.11). Most of them may be corrected by an appropriate calibration. To do a proper calibration, we need at least a model for the errors and a way to measure them. Hence, in the following we will describe the errors, model them, and show ways to correct them; how the errors are measured is part of chapter 5.
2.2.2.1 Systematic Errors

The investigated systematic errors are

• a periodic, sinusoidal deviation of the phase measurement over the unambiguity range
• a constant phase deviation per pixel, which corresponds to the dark-response nonuniformity (DRNU) of classical CMOS sensors (often loosely called "fixed pattern noise") but has quite different causes
• overexposure of individual pixels
• an exposure time dependent constant phase deviation

There are some more known systematic errors that will not be addressed here but are nevertheless of some importance:

• phase drift due to thermal effects
• near field errors due to the extended, non-point-like, non-radial (with respect to the objective lens) modulated illumination

Periodic Phase Error

If we assume a periodic modulation of the illumination, which induces a modulation of the number of emitted photoelectrons, and assume the reference signal to be modulated with the same frequency, then the samples returned by the PMD sensor are those of a periodic signal too. The reason for this is that the PMD performs an operation that corresponds to cross-correlating the light signal and the reference signal, as discussed in section 2.2.1. The sampled signal, however, is not necessarily sinusoidal. If, for example, the light and reference signal are assumed to be rectangular, then the cross-correlation is a triangle wave (see equation (2.13)). Applying

$$\varphi_{\sin}(\vec{I}) = \arg\bigl[(I_0 - I_2) + \imath(I_3 - I_1)\bigr]$$

to the vector $\vec{I}_\mathrm{tri}$ of cross-correlation samples

$$I_\mathrm{tri}(\theta_n) \sim \frac{A}{\pi}\operatorname{tri}(\varphi + \theta_n) + \frac{G_0}{2}$$

yields a periodic error in the phase estimation:

$$E_\varphi(\varphi) = \varphi_{\sin}(\vec{I}_\mathrm{tri}) - \varphi \sim \arg\bigl[\arcsin(\cos(\varphi)) + \imath\arcsin(\sin(\varphi))\bigr] - \varphi \tag{2.19}$$

$$\approx \left(\frac{3\pi}{8} - \arctan(3)\right)\sin(4\varphi) \approx -\frac{1}{14}\sin(4\varphi). \tag{2.20}$$

The coefficient $\frac{3\pi}{8} - \arctan(3) \approx -0.071$ is negative, so the estimated phase lags behind the true phase in the first eighth of the period. The maximum absolute error is thus $\frac{1}{14}/2\pi \approx 1.14\%$ of the unambiguity range. The error of the approximation using $\sin(4\varphi)$ is less than 10% of the phase error.
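The periodic error can be reproduced numerically by feeding triangle-wave samples (2.13) into the sinusoidal four-sample formula (2.12). The sketch below also evaluates the rectangular solution (2.14) on the same samples for comparison. Function names and the values of $G_0$ and $A$ are assumptions. The result of (2.14) is multiplied by $2\pi$ because the formula returns the phase as a fraction of the period, and the sign convention for $\sin(4\varphi)$ follows from evaluating (2.19) directly.

```python
import numpy as np


def tri(x):
    """Triangle wave with range [-pi/2, pi/2] and tri(0) = pi/2."""
    return np.pi / 2.0 - np.arccos(np.cos(x))


def samples_rect(phi, G0=2.0, A=1.0):
    """Four normalized correlation samples for rectangular modulation, eq. (2.13)."""
    theta = np.pi / 2.0 * np.arange(4)
    return A / np.pi * tri(phi[..., None] + theta) + G0 / 2.0


def phase_sin(I):
    """Phase estimate of eq. (2.12), wrapped to [0, 2*pi)."""
    return np.arctan2(I[..., 3] - I[..., 1], I[..., 0] - I[..., 2]) % (2.0 * np.pi)


def phase_rect(I):
    """Phase estimate of eq. (2.14), converted from fractions of a period to radians.
    Note: exactly at phi = 0 the sign term vanishes and the formula is ambiguous."""
    d02 = I[..., 0] - I[..., 2]
    d13 = I[..., 1] - I[..., 3]
    A = np.maximum(np.abs(d02 + d13), np.abs(d02 - d13))
    frac = np.sign(d13) * (d02 / (4.0 * A) + 0.25) + 0.5
    return 2.0 * np.pi * frac


phi = np.linspace(0.01, 2.0 * np.pi - 0.01, 2000)
I = samples_rect(phi)
err_sin = np.angle(np.exp(1j * (phase_sin(I) - phi)))    # wrapped error of (2.12)
err_rect = phase_rect(I) - phi                           # error of (2.14)
residual = err_sin + np.sin(4.0 * phi) / 14.0            # deviation from -sin(4 phi)/14
```

In this idealized setting `err_sin` oscillates with four periods over the unambiguity range, while `err_rect` vanishes up to rounding.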
Because motion estimation deals with derivatives of range measurements, the relative error with respect to the slope of $\varphi_{\sin}$ is more important than the absolute error:

$$E_\mathrm{rel} = \frac{\partial_\varphi E_\varphi}{\partial_\varphi \varphi} = \partial_\varphi E_\varphi, \qquad |E_\mathrm{rel}| \approx \frac{4}{14}\,|\cos(4\varphi)|.$$

This means that we introduce a maximum error of $4/14 \approx 29\%$ if we calculate range slopes based on measurements that assume a sinusoidal modulation while it is actually rectangular.

Fourier Approximation

If we have a look at the testbench range measurements (see section 5.2.2), we find that the systematic error indeed has a major component going with $\sin(4\varphi)$. However, there are additional smaller components of higher and lower harmonics of the angular base frequency $n = 1$. Therefore, we may approximate the error by a Fourier series:

$$E'(\varphi) = \text{offset} + \sum_{n=1}^{k} a_n \sin(n\varphi + \theta_n). \tag{2.21}$$

The Fourier coefficients can be found numerically by a least squares fit. If data can be acquired for the whole phase range of $2\pi$, one may also use an FFT. The erroneous phase measurement $\varphi_\mathrm{err}$ is then described by

$$\varphi_\mathrm{err}(\varphi) = \varphi + \text{offset} + E(\varphi), \qquad\text{where } E(\varphi) = E'(\varphi) - \text{offset}, \tag{2.22}$$

and by defining $\phi(\varphi) = \varphi_\mathrm{err}(\varphi) - \text{offset}$, we get rid of the constant offset error: $\phi(\varphi) = \varphi + E(\varphi)$. We need to find the inverse function $\varphi(\phi)$ if we want to correct the measured data $\phi$ to obtain the true value $\varphi$. As there is no exact analytic solution for the inverse function, we may take $E(\varphi)$ to be a small perturbation and approximate the inverse function by the inverse of its Taylor polynomial. The first-order Taylor series of $\phi(\varphi)$ at $\varphi = \varphi_0$ is

$$\phi(\varphi) = \varphi_0 + E(\varphi_0) + \bigl(\partial_\varphi E(\varphi_0) + 1\bigr)\cdot(\varphi - \varphi_0) + O\bigl((\varphi - \varphi_0)^2\bigr).$$

The inverse of the Taylor series (given by Mathematica) is

$$\phi^{-1}(\phi(\varphi)) = \varphi(\phi) = \varphi_0 - \frac{\varphi_0 - \phi + E(\varphi_0)}{\partial_\varphi E(\varphi_0) + 1} + O\bigl((\varphi_0 - \phi + E(\varphi_0))^2\bigr).$$

Choosing $\varphi_0$ to be $\phi$ § we obtain

$$\varphi(\phi) = \phi - \frac{\phi - \phi + E(\phi)}{\partial_\varphi E(\phi) + 1} + O\bigl(E(\phi)^2\bigr) \approx \phi - \frac{E(\phi)}{\partial_\varphi E(\phi) + 1}. \tag{2.23}$$

§ Mathematically more precisely, we take $\varphi(\varphi_0)$ in the limit of $\phi$: $\lim_{\varphi_0 \to \phi}\varphi(\varphi_0)$.

The inverse Taylor polynomial (2.23) is a good approximation for $\varphi(\phi)$ if $|E(\phi)| \ll 1$ and if we may require $E(\phi) = E(\varphi + E(\varphi)) \approx E(\varphi) + \partial_\varphi E(\varphi)\cdot E(\varphi) \approx E(\varphi)$ (which requires $\partial_\varphi E(\varphi)$ to be small too), because only then is the remainder indicated by $O(E(\phi)^2)$ small compared to $E(\varphi)$, the error that needs to be corrected. Additionally, $\phi(\varphi)$ is invertible only if it is monotonic, implying that $\partial_\varphi E > -1$. All requirements are fulfilled if the Fourier coefficients $a_n$ and the number of modes $k$ needed to describe the error are small, which is in good agreement with the measurements.

Equation (2.23) is a compact, analytic solution for the problem of correcting a phase error that can be described by equation (2.21). If necessary, an approximation by a higher order Taylor polynomial can be derived with ease, at the cost of increasing the resources needed to calculate the inversion. The data needed for calculating the Fourier coefficients has to be acquired during calibration of the camera. Typically $k = 4$ Fourier modes are sufficient to suppress the error considerably (see chapter 5 for results). Compared to a lookup table or B-spline approximations [LK06; KRI06] (in extreme cases for every pixel), this is very efficient with respect to the memory needed and acceptable regarding processing time.

Constant Phase Error per Pixel

Typical images of conventional CCD or CMOS cameras show two types of pixel-specific systematic errors (both being more prominent for CMOS sensors): dark-response nonuniformity (DRNU) and photo-response nonuniformity (PRNU), which can be directly related to the pixels' varying offset and gain (due to e.g. variations in oxide thickness and doping concentrations over the sensor). Sometimes these errors, which are systematic with respect to the measurement, are somewhat sloppily called fixed pattern noise.
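Returning briefly to the Fourier correction of equations (2.21) to (2.23): the following sketch fits the coefficients by linear least squares and applies the first-order inverse. All names and the synthetic coefficients are hypothetical. In particular, the "calibration data" are generated from the model itself rather than taken from the testbench measurements of chapter 5.

```python
import numpy as np


def fit_error_model(phi, err, k):
    """Least-squares fit of offset + sum_n a_n sin(n phi + theta_n), eq. (2.21).
    Uses the linear form s_n sin(n phi) + c_n cos(n phi) with s_n = a_n cos(theta_n)
    and c_n = a_n sin(theta_n)."""
    n = np.arange(1, k + 1)
    M = np.hstack([np.ones((phi.size, 1)),
                   np.sin(n * phi[:, None]),
                   np.cos(n * phi[:, None])])
    coef, *_ = np.linalg.lstsq(M, err, rcond=None)
    s, c = coef[1:k + 1], coef[k + 1:]
    return coef[0], np.hypot(s, c), np.arctan2(c, s)   # offset, a_n, theta_n


def error_model(phi, a, theta):
    """E(phi) without the constant offset."""
    n = np.arange(1, len(a) + 1)
    return np.sum(a * np.sin(n * phi[..., None] + theta), axis=-1)


def error_model_derivative(phi, a, theta):
    n = np.arange(1, len(a) + 1)
    return np.sum(a * n * np.cos(n * phi[..., None] + theta), axis=-1)


def correct_phase(phi_meas, a, theta):
    """First-order inverse of eq. (2.23), applied to offset-free measurements."""
    E = error_model(phi_meas, a, theta)
    dE = error_model_derivative(phi_meas, a, theta)
    return phi_meas - E / (1.0 + dE)


# Hypothetical calibration: k = 4 modes with a dominant fourth harmonic.
a_true = np.array([0.005, 0.002, 0.001, 0.07])
theta_true = np.array([0.3, 1.0, 2.0, 0.0])
offset_true = 0.02
phi_true = np.linspace(0.0, 2.0 * np.pi, 1000)
phi_err = phi_true + offset_true + error_model(phi_true, a_true, theta_true)

offset, a, theta = fit_error_model(phi_true, phi_err - phi_true, k=4)
phi_meas = phi_err - offset                 # remove the constant offset first
phi_corr = correct_phase(phi_meas, a, theta)
```

Only the $2k + 1$ fitted numbers have to be stored, which is the memory advantage over a lookup table mentioned above. The residual after correction is of second order in $E$.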
Range imagery appears to have the same kind of nonuniformity errors. But differences in offset and gain (assuming a linear gain) should actually have no influence on the phase measurement, because offset and gain just cancel out of equation (2.12). The explanation for the pixel nonuniformity is that the reference signal $R(t)$ (of equation (2.9)) connected to each pixel receives a phase delay due to the slightly varying capacitance of the individual pixels and due to other hardware design and processing related reasons that may affect the phase of $R$. As the error is constant over time, it can be corrected using appropriate calibration methods.

Let $K$ denote the matrix of fixed pattern phase errors (where the matrix elements correspond to image pixels) and $N$ the (temporal) noise corresponding to a field of random variables, such that the expectation value $\langle N \rangle$ is 0. Then $\overline{N}^{t} \approx 0$, the bar denoting the (temporal) arithmetic mean over a sequence of acquired data. With the true range at each pixel given by $T$, we may model an acquired range (or phase) image $R$ as:

$$R = T + K + N. \tag{2.24}$$

Taking the arithmetic mean over a sequence of frames of a fixed view, we get:

$$\overline{R}^{t} = \overline{T}^{t} + \overline{K}^{t} + \overline{N}^{t} \approx T + K.$$

If we are able to create a homogeneous incident illumination with respect to phase (and preferably irradiance), i.e. $T = \mathbb{1}\cdot\overline{T}$ with every pixel seeing the same value, the estimation of $K$ is easy: we regard the fixed pattern error $K$ as a sample (taken once during manufacturing of the sensor) of a field of i.i.d. random variables that have an expectation value of zero. Then the spatial average over $K$ is approximately zero, and we just have to subtract the spatiotemporal average $\overline{R}^{st}$ from $\overline{R}^{t}$:

$$\overline{R}^{t} - \overline{R}^{st} \approx (T + K) - \bigl(\overline{T}^{st} + \overline{K}^{st} + \overline{N}^{st}\bigr) \approx (T + K) - T = K. \tag{2.25}$$

A homogeneous phase may be achieved by modifying the camera hardware to use a telecentric illumination, but this involves a complex and somewhat expensive experimental setup. Also the assumption of i.i.d.
may be violated, as the variations in the manufacturing process are not necessarily spatially uncorrelated. If we use a simple whiteboard-like target for the calibration, a paraboloid-like phase pattern is irradiated on the sensor. Removing the lens from the camera improves the homogeneity of the data, and neighborhoods of the image pixels may be approximated by planar surface patches. The average over a symmetric neighborhood of a central pixel is then the value of the central pixel itself. So we may estimate $K$ via

$$\overline{R}^{t} - B\overline{R}^{t} \approx (T + K) - (BT + BK) \approx (T + K) - T = K. \tag{2.26}$$

$B$ denotes a binomial convolution with an appropriate mask size, which is large enough to let $BK \approx 0$, yet small enough that the approximation of a pixel neighborhood as a planar surface patch is still valid and $BT \approx T$. For the convolution, image borders may be treated by mirroring the data at the borders. For results we refer to section 5.2.1.

The demodulation images $I_n$ of (2.11), from which $R$ is calculated, are subject to DRNU and PRNU just like the images of conventional image sensors. The data may be corrected for both errors by a calibration analogous to that of conventional systems if the modulated illumination is replaced by an unmodulated one, as the PMD then acts essentially like an irradiance sensor. The channels $I_n$ may be corrected individually in a preprocessing step and the result used for calculating an improved phase and amplitude estimate. As explained before, offset and gain variations are of minor importance for the phase estimation, but the amplitude estimate is essentially influenced by gain. Looking at equation (2.11), we find that if the single demodulation images depend on a common image of gain factors $\alpha$, the same is true for the amplitude estimate:

$$\text{if } I_n \sim \alpha \;\xrightarrow{(2.11)}\; A \sim \alpha.$$

For a description of methods like the photon-transfer technique, flat-fielding or statistical approaches and their application to correct for DRNU and PRNU we refer to [MF81; Fow+98; TRK01; Wag03; Grö03; Kla05].

Overexposure and Saturation

For a gray value sensor, overexposure occurs if the full well capacity (i.e. the saturation level) of a sensor pixel is exceeded. One can then no longer map from the measured signal to the irradiance or to the physical property of interest (in our case the distance to an object), even if the sensor response is known. If we want to correct for overexposure, we first have to detect it. A simple but somewhat unreliable method to detect heavy overexposure, which works even if one has no access to the cross-correlation samples $I_n$ but only to the calculated amplitude $A$ of (2.11), is to check whether $A$ is zero: for heavy or total overexposure, all capacities are saturated, all $I_n$ are equal, and thus $A$ evaluates to zero. However, this only works if really all samples $I_n$ are saturated, which is not the case for partial overexposure, as we will see. Furthermore, $A = 0$ is ambiguous with respect to underexposure, which may lead to an amplitude of zero too. So both the false positive and the false negative rate of this overexposure detection may be high.

Overexposure of a PMD pixel depends on both the radiance and the distance of an imaged object. Using equation (2.10c) and the inverse-square law for a point-like light source, we arrive at a simplistic approximation of the demodulation signals $I_n$ as a function of the distance $r$ of an object of constant reflectivity:

$$
I_n(r) \sim \frac{A(r)}{\pi}\cos\!\left(r\frac{2\pi}{R} + n\frac{\pi}{2}\right) + \frac{A(r)}{2C_\mathrm{mod}}
\sim \frac{\frac{2C_\mathrm{mod}}{\pi}\cos\!\left(r\frac{2\pi}{R} + n\frac{\pi}{2}\right) + 1}{r^2}. \tag{2.27}
$$

We assume the modulation contrast $C_\mathrm{mod}$ of the light source to be 100% and the unambiguity range to be $R = 7.5\,\mathrm{m}$ (corresponding to $f_\mathrm{mod} = 20\,\mathrm{MHz}$). Normalizing the demodulation signals by $I_0$ yields figure 2.4. Taking e.g.
$I_2(4\,\mathrm{m})$ to be exactly the saturation level of a sensor pixel, we find that the other demodulation samples are not in saturation yet. Conversely, only if $I_0(4\,\mathrm{m})$ is at saturation level can we be sure to detect overexposure by testing whether all samples are equal. We may denote these two kinds of overexposure, which are specific to the PMD, as partial and total overexposure. Furthermore, a maximum signal ratio of more than 4 indicates that the irradiance needs to be high before an overexposure can be detected by testing for $A = 0$.

Figure 2.4: Demodulation signal ratio $I_n(r)/I_0(r)$, $n = 0, \dots, 3$, against distance [m].

If the raw data is technically accessible, it is more reliable to check whether any $I_n$ is saturated and, if so, to flag the measurement as overexposed. Detected single pixels or pixel regions may then be corrected (under specific assumptions about the neighborhood) using techniques like inpainting (see e.g. [Tsc06] for an elaborate example) or simpler interpolation techniques, or they may be excluded from further processing if possible. Overexposure may lead to effects like blooming (see [Jäh04]), so the confidence in the information content of the neighboring pixels should be reduced.

Exposure Time / Amplitude Dependent Phase Deviation

Experimental data shows that current PMD camera types bear a systematic error in the phase measurement which is constant for a specific exposure time (or integration time with respect to the process of cross-correlation) and independent of the measured range. Figure 2.5 gives an overview of the error for some PMD-type range cameras. The simplistic models discussed so far cannot explain this dependency.
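As a short aside on the overexposure model (2.27) discussed above, the following sketch evaluates the demodulation signal ratios and contrasts the two detection strategies for a partially overexposed pixel. The function name, the distance grid and the exposure gain of 1.3 are assumptions made for illustration; $R = 7.5\,\mathrm{m}$ and $C_\mathrm{mod} = 1$ are the values from the text.

```python
import numpy as np


def demod_signal(r, n, R=7.5, c_mod=1.0):
    """Relative demodulation sample I_n(r) of eq. (2.27), up to a constant factor."""
    phase = 2.0 * np.pi * r / R + n * np.pi / 2.0
    return (2.0 * c_mod / np.pi * np.cos(phase) + 1.0) / r**2


# Signal ratios I_n(r) / I_0(r) over the unambiguity range
r = np.linspace(0.5, 7.5, 1401)
ratios = np.stack([demod_signal(r, n) / demod_signal(r, 0) for n in range(4)])
max_ratio = ratios.max()

# Partial overexposure at r = 4 m: let I_2 define the saturation level.
I_4m = np.array([demod_signal(4.0, n) for n in range(4)])
saturation = I_4m[2]
at_saturation = I_4m >= saturation            # only I_2 reaches the level

# Assumed 30% higher exposure: I_2 is clipped, the other samples are not.
gain = 1.3
clipped = np.minimum(gain * I_4m, saturation)
amplitude_clipped = np.hypot(clipped[0] - clipped[2], clipped[3] - clipped[1])
flag_by_amplitude = amplitude_clipped == 0.0              # test on A = 0
flag_by_samples = bool((gain * I_4m >= saturation).any()) # test on raw samples
```

In this example the amplitude test misses the overexposure, because the clipped samples still differ, while the per-sample test on the raw data flags it.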
A probable explanation could be that the electronics that control the phase shifting of the reference signal are somehow correlated to the integration time circuit. We are not fully convinced that the error depends explicitly (and exclusively) on the integration time. [Rap07] argues that the error is not related to amplitude, as otherwise it would change with depth (while the measurements show that it is constant). However, we observed a similar offset that is rather constant with increasing depth and only depends on the reflectivity of the observed surface (see section 5.2.3). Therefore exposure time, amplitude $A$ and also offset $G_0$ (the latter two depending on the reflectivity) are candidates for being the source of the error. The author thinks that there is some kind of bias in the phase calculation formula that turns up if the idealized model assumptions are not met.

Figure 2.5: Range error (depth offset [mm]) against integration time [ms] at $r = 2.5\,\mathrm{m}$ for (a) PMD 19k, (b) SR-3000, (c) IFM O3D [Rap07].

An exposure time dependent error is of special interest for cameras that use adaptive or multiple exposure times (like the IFM O3D) to increase the effective measurable range. In that case a specific offset correction needs to be applied for every exposure time. For details regarding the experimental investigation we refer to [Rap07]. One may subsume this error under the fixed pattern error if the latter is extended to include a dependence on exposure.

2.2.2.2 Random Errors

The essential noise sources for PMD sensors are the same as for conventional CCD or CMOS image sensors.
These are electronic and photon shot noise, thermal noise, reset noise, 1/f noise and quantization noise (for details see [JGH99]). For a detailed analysis of quantization noise (in the context of a TOF camera system), which is of some importance in weakly illuminated environments, we refer to [Fra07]. Except for photon shot noise, all of these random errors can be significantly reduced or eliminated by suitable signal processing or hardware-related techniques (like cooling) [LS01]. Therefore, the influence of shot noise on the range measurement resolution shall be analyzed. Shot noise is a fundamental property of the quantum nature of light and arises from statistical fluctuations in the number of photons emitted from a light source. The same is true for the generation process of electron-hole pairs, which is discrete as well. Shot noise is unavoidable and always present in imaging systems. In terms of signal-to-noise ratio, the best a detector can do is to approach the shot noise limit. The pseudo signal $X$ produced by shot noise can be described by Poisson statistics, for which $\operatorname{Var}(X) = \langle X \rangle$ = rate of charge carrier generation applies. From the basic law of error propagation we know that the uncertainty in the phase calculation (2.11) is given by:

$$\operatorname{Var}(\varphi) = \sum_{n=0}^{N-1} \left( \frac{\partial \varphi}{\partial I_n} \right)^2 \operatorname{Var}(I_n). \tag{2.28}$$

Assuming $I_n$ (measured in units of electrons) to be a Poisson distributed random variable, $\operatorname{Var}(I_n) = I_n$ applies. Using the 4 sample algorithm (2.12) and (2.10c) we find the variance to be:

$$\operatorname{Var}(\varphi) = \sum_{n=0}^{3} \left( \frac{\partial \varphi}{\partial I_n} \right)^2 I_n = \frac{\pi^2 G_0}{4\, mT\, A^2} \sim \frac{\pi^2}{4} \frac{G_0}{A^2} \tag{2.29}$$

In the proportionality we dropped the integration time $mT$ (assumed to be constant).
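The scaling $\operatorname{Var}(\varphi) \sim G_0/A^2$ can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses the textbook four-sample model $I_n = G + A\cos(\varphi + n\pi/2)$ rather than the definitions (2.10c) and (2.12) of this thesis, so the constant prefactor differs (error propagation gives $G/(2A^2)$ for this model), and it approximates Poisson noise by Gaussian noise whose variance equals the mean. All function names are hypothetical.

```python
import math
import random

def four_phase_samples(g, a, phi):
    """Ideal samples I_n = G + A*cos(phi + n*pi/2), n = 0..3 (assumed model)."""
    return [g + a * math.cos(phi + n * math.pi / 2) for n in range(4)]

def phase_from_samples(i0, i1, i2, i3):
    """Phase estimate that inverts the assumed sample model above."""
    return math.atan2(i3 - i1, i0 - i2)

def simulated_phase_variance(g, a, phi, trials=20000, seed=0):
    """Sample variance of the phase estimate under shot-noise-like noise."""
    rng = random.Random(seed)
    means = four_phase_samples(g, a, phi)
    estimates = []
    for _ in range(trials):
        # Gaussian approximation of Poisson noise: variance equals the mean
        noisy = [rng.gauss(mu, math.sqrt(mu)) for mu in means]
        estimates.append(phase_from_samples(*noisy))
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / (trials - 1)

g, a, phi = 20000.0, 5000.0, 0.7     # offset and amplitude in electrons
predicted = g / (2 * a * a)          # error propagation for the assumed model
simulated = simulated_phase_variance(g, a, phi)
print(predicted, simulated)
```

Halving the amplitude at fixed offset should roughly quadruple both values, which is the dependence expressed by (2.29).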
For the amplitude and offset calculation we find by analogous reasoning:

$$\operatorname{Var}(A) \sim \frac{\pi^2}{4}\, G_0, \qquad \operatorname{Var}(G_0) \sim \frac{1}{2}\, G_0$$

If we drop the assumption of an optimal demodulation contrast of $2/\pi$, express equation (2.29) in terms of the modulation contrast $C_{mod}$ and the demodulation contrast $C_{demod}$, and write $G_0$ as $DC + B$, with $B$ being the background illumination and $DC$ the DC component of the modulated illumination, we find:

$$\operatorname{Var}(\varphi) \sim \frac{DC + B}{(C_{mod}\, C_{demod}\, DC)^2}$$

The additional noise sources, 1/f noise, reset noise and thermal noise, may be summarized as dark noise and modeled by an additional number of electrons $D$ that contribute exclusively to the constant background illumination, as this noise does not correlate with the modulation. The standard deviation of the range measurement error is then given by

$$\sigma_\varphi \sim \frac{\sqrt{DC + B + D}}{C_{mod}\, C_{demod}\, DC} \tag{2.30}$$

Figure 2.6 shows qualitatively how the uncertainty in the range measurements depends on various parameters. Only the denoted parameter was changed while the others were kept constant: mild background illumination and dark noise equivalent to $30000\,e^-$ generated during the exposure time, an electrooptical signal of $2 \cdot 10^5\,e^-$, a modulation contrast of 90% and a demodulation contrast of 50%. For the range curve the inverse square law of irradiation was applied and $10^5\,e^-$ were assumed to be integrated at a distance of 1 m. These values are of the same order of magnitude as those of a realistic setup (see [Lan00, chap. 4.2]). The range curve was cut at 50 cm, because the pixel would go into saturation below this limit (and the linear model employed for demodulation would no longer be valid).
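To make the dependencies in (2.30) tangible, the following Python sketch evaluates its right-hand side for the parameter values quoted above. Since (2.30) is only a proportionality, the absolute numbers carry no unit and only ratios between results are meaningful. The function names and the way background and dark electrons are lumped into one argument are assumptions made for this illustration.

```python
import math

def relative_sigma_phi(dc, background_plus_dark, c_mod, c_demod):
    """Right-hand side of (2.30); only ratios between results are meaningful."""
    return math.sqrt(dc + background_plus_dark) / (c_mod * c_demod * dc)

def dc_at_range(r_m, dc_at_1m=1e5):
    """Inverse square law for the electrons collected from the modulated signal."""
    return dc_at_1m / r_m ** 2

# Values quoted in the text: 2e5 signal electrons, 3e4 background and dark
# electrons, 90% modulation contrast, 50% demodulation contrast.
reference = relative_sigma_phi(2e5, 3e4, 0.9, 0.5)
near = relative_sigma_phi(dc_at_range(1.0), 3e4, 0.9, 0.5)
far = relative_sigma_phi(dc_at_range(4.0), 3e4, 0.9, 0.5)
print(reference, near, far, far / near)
```

In this sketch a fourfold increase in range raises the relative uncertainty by more than a factor of four, because the signal drops with the square of the distance while the background stays constant.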
[Figure 2.6: $\sigma_\varphi$ given in percent of the unambiguity range in dependence of optical power (DC), modulation contrast, range and background illumination. Axes: uncertainty [percent of unambiguity range] against parameter; curves: DC [1000e], $C_{mod}$ [‰], range [cm], B+D [2000e].]

We find that the range dependence has the most prominent impact on the measurement accuracy. Due to the limited capacitance of a PMD pixel it is not possible to achieve a decent range accuracy over the entire unambiguity range at constant exposure time (or optical power) and constant reflectivity. Multiple exposure times or adaptive illumination are needed to achieve a good accuracy. Once image acquisition is completed, we can improve the range measurement with respect to noise by making assumptions about the smoothness of the range data in a local neighborhood; we may apply simple linear smoothing filters, which will, however, blur image features at edges. Using more advanced (nonlinear and/or robust) filters, we are able to preserve edges in range imagery, both at intensity edges and at range edges. It is important to note in this context that we can use equation (2.29) to determine an uncertainty (or confidence) measure from the measurements $A$ and $G_0$, which appropriate filters can exploit to improve their denoising performance (see section 3.1.5 and section 3.2).

Chapter 3

Image Processing and Filters

3.1 Basics

3.1.1 Discretization and Sampling

Dealing with PMD imagery and motion estimation involves handling several kinds of discretization. We have discretization in space due to the sensor grid, and discretization in time, as we can only take a finite number of image frames during a specific time slice.
In addition, there is a discretization in the range of the image data, i.e. quantization, which takes place at sensor level. Discretization is closely related to the term sampling, which denotes the reduction of a continuous signal to a discrete signal; a sample refers to a value or set of values at a point in time and/or space. For signal and image processing the Nyquist–Shannon sampling theorem is of utmost importance. It states that if a function $f(t)$ contains no frequencies higher than or equal to $\nu$, then it is completely determined by giving its ordinates at a series of points $t_n$ spaced $\frac{1}{2\nu}$ apart. For the different kinds of discretization different models apply, or rather the continuous physical models, which are the basis for the interpretation of the data, need to be transformed into their discrete counterparts.

Derivatives and Gradient

In the context of motion estimation, for example, derivatives of signals are of major importance, because they are the basis of compound operators like the gradient, which is involved in estimating orientation (or equivalently velocity) in an image sequence. Unfortunately, derivatives are defined exactly for continuous functions only and not for discrete signals. One might try to approximate the derivative operator by a discrete counterpart, e.g. a finite difference. Finite differences are successfully applied in finite difference methods for solving (partial or ordinary) differential equations on a digital computer (on an analog computer things are treated quite differently). Fornberg [For98] gives a very compact, analytic solution for calculating derivatives of any order, approximated to any level of accuracy on equispaced grids. In the context of finite difference methods, the consistency of an operator is of major importance, i.e. that the discrete approximation converges towards the continuous operator for vanishing difference $h$ between grid points.
Well-known examples of these approximations are the forward, backward and central difference quotients. However, for image processing these operators, though applicable, are not the first choice. The basic reason is that the grid of the data samples is typically fixed, whereas it can be adapted when solving a differential equation. So even though consistency of the operator is given, it is not necessarily a good measure for the performance of the operator with respect to image processing. For image processing other features are of practical relevance for various reasons. For instance, the separability of an operator is relevant w.r.t. its speed, i.e. the number of floating point operations required. The isotropy of the gradient (or rather the viewpoint invariance of its result in the sense of a tensor) is important for an unbiased orientation estimate. Isotropy w.r.t. a vectorial quantity like the gradient has two aspects: isotropy in magnitude and isotropy in direction. For motion estimation the isotropy in direction is of major importance, as the flow field can be calculated from the spatiotemporal orientation of structures in an image sequence. Given a plane wave

$$w(\mathbf{k}) = A \exp\big(\imath(\mathbf{k}\mathbf{x} + \theta)\big) = A \exp\left(\imath\left(\begin{bmatrix} m\cos(\phi) \\ m\sin(\phi) \end{bmatrix}^T \cdot \mathbf{x} + \theta\right)\right) =: w(m, \phi) \tag{3.1}$$

and the gradient $\nabla_x = [\partial_{x_1}\ \partial_{x_2}]^T$ in two dimensions, the anisotropy in magnitude of the gradient w.r.t. the wavenumber $k$ may be described by

$$\Delta(m) := |\nabla_x w(m, \phi)| - |\nabla_x w(m, 0)| \tag{3.2}$$

and the anisotropy in direction by

$$\Delta(\phi) := \arctan\left(\frac{\partial_{x_2} w(m, \phi)}{\partial_{x_1} w(m, \phi)}\right) - \phi \tag{3.3}$$

which are zero for the continuous gradient applied at coordinates of equal phase, e.g. $\mathbf{x} = 0$. An approach that allows a detailed analysis of an operator w.r.t. anisotropy and other features is to interpolate a continuous signal from the discrete samples, perform the continuous operation and sample the result at the grid points. As we will see, these tasks can be realized by means of digital filters without leaving the discrete domain.
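The direction anisotropy (3.3) can be evaluated in closed form for the central difference quotient, because applying $(w_{n+1} - w_{n-1})/2$ to a sampled plane wave multiplies it by $\imath \sin(k)$ for unit grid spacing. The following Python sketch uses this fact. It assumes unit grid spacing and expresses the wavenumber magnitude as a fraction `k_rel` of the Nyquist wavenumber $\pi$; the function name and its parameters are illustrative and not taken from the thesis.

```python
import math

def central_difference_angle_error(k_rel, phi):
    """Direction error (3.3) of a central-difference gradient on a plane wave.

    k_rel: wavenumber magnitude as a fraction of the Nyquist wavenumber (0..1)
    phi:   true orientation of the wave vector in radians, within [0, pi/2]
    """
    kx = math.pi * k_rel * math.cos(phi)   # grid spacing assumed to be 1
    ky = math.pi * k_rel * math.sin(phi)
    # (w[n+1] - w[n-1]) / 2 applied to exp(i*k*n) yields i*sin(k)*exp(i*k*n),
    # so the estimated direction is the angle of (sin(kx), sin(ky)).
    return math.atan2(math.sin(ky), math.sin(kx)) - phi

for k_rel in (0.05, 0.25, 0.5):
    err = central_difference_angle_error(k_rel, math.pi / 8)
    print(k_rel, math.degrees(err))
```

The error vanishes along the grid axes and the diagonal and grows with the wavenumber in between, which is why consistency for vanishing $h$ says little about the behavior on a fixed grid.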
However, for the analysis of digital filters we need to change from the spatial domain to the frequency domain, or rather project the spatial data into the Hilbert space of plane waves (3.1), which corresponds to the Fourier transformation that we introduce in the following.

3.1.2 Fourier Transform

A function $g : \mathbb{R}^n \longmapsto \mathbb{C}$ is called Fourier transformable if the Cauchy principal value

$$\hat{g}(\mathbf{k}) := \mathcal{F}(g(\mathbf{x})) := \frac{1}{(2\pi)^n} \int_{-\infty}^{\infty} \exp(-\imath \mathbf{k}\mathbf{x})\, g(\mathbf{x})\, d^n x \tag{3.4}$$

exists for all $\mathbf{k}$. $\hat{g}(\mathbf{k})$ is called the (multidimensional, forward) Fourier transform (FT) of $g$ (adapted from [MV06]). The inverse Fourier transform of $\hat{g} : \mathbb{R}^n \longmapsto \mathbb{C}$ is

$$\mathcal{F}^{-1}(\hat{g}(\mathbf{k})) := \int_{-\infty}^{\infty} \exp(\imath \mathbf{k}\mathbf{x})\, \hat{g}(\mathbf{k})\, d^n k \quad (= g(\mathbf{x})) \tag{3.5}$$

if the integral exists as a Cauchy principal value. $g(\mathbf{x})$ and $\hat{g}(\mathbf{k})$ are called a Fourier transform pair, abbreviated as $g(\mathbf{x}) \;\circ\!-\!\bullet\; \hat{g}(\mathbf{k})$. Due to the finite energy and range and the continuity of physical processes, the Cauchy principal value always exists in the context of image processing. There are several other common and equivalent definitions of the Fourier transform, which differ in the kind of frequency domain the spatial domain is mapped to and in how the factor $(1/2\pi)^n$ is distributed among the (forward) Fourier transform and its inverse. We have chosen the frequency domain to be the vectorial wavenumber $k := 2\pi/\lambda$. One needs to be very careful about which definition was used when looking up formulas in textbooks, and about how these are to be applied in one's own calculations, especially because various textbooks are inconsistent with their own definitions throughout the text.
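Features of the Fourier Transform

Some important features of the (multidimensional) Fourier transform that are used throughout the thesis are listed here via their respective Fourier transform pairs:

Linearity: $a\,g(\mathbf{x}) + b\,h(\mathbf{x}) \;\circ\!-\!\bullet\; a\,\hat{g}(\mathbf{k}) + b\,\hat{h}(\mathbf{k})$ (3.6)
Separability: $\prod_{p=1}^{n} f(x_p) \;\circ\!-\!\bullet\; \prod_{p=1}^{n} \hat{f}(k_p)$, where $f : \mathbb{R} \mapsto \mathbb{C}$ (3.7)
Similarity: $g(s\,\mathbf{x}) \;\circ\!-\!\bullet\; \hat{g}(\mathbf{k}/s)/|s|$ and $g(A\mathbf{x}) \;\circ\!-\!\bullet\; \hat{g}\big((A^T)^{-1}\mathbf{k}\big)/|A|$ (3.8)
Rotation: $g(U\mathbf{x}) \;\circ\!-\!\bullet\; \hat{g}(U\mathbf{k})$, $U$ is unitary (3.9)
Convolution: $(g * h)(\mathbf{x}) \;\circ\!-\!\bullet\; (2\pi)^n\, (\hat{g} \cdot \hat{h})(\mathbf{k})$ (3.10)
Multiplication: $(g \cdot h)(\mathbf{x}) \;\circ\!-\!\bullet\; (2\pi)^n\, (\hat{g} * \hat{h})(\mathbf{k})$ (3.11)
Translation: $g(\mathbf{x} - \mathbf{x}_0) \;\circ\!-\!\bullet\; \hat{g}(\mathbf{k}) \exp(-\imath \mathbf{k}\mathbf{x}_0)$ and $g(\mathbf{x}) \exp(\imath \mathbf{k}_0 \mathbf{x}) \;\circ\!-\!\bullet\; \hat{g}(\mathbf{k} - \mathbf{k}_0)$ (3.12)
Derivatives: $\partial_{x_p} g(\mathbf{x}) \;\circ\!-\!\bullet\; \imath k_p\, \hat{g}(\mathbf{k})$ (3.13)
δ-impulse: $\delta(\mathbf{x}) \;\circ\!-\!\bullet\; (1/2\pi)^n$ (3.14)
δ-comb: $\sum_m \delta(x - m\Delta x) \;\circ\!-\!\bullet\; \frac{2\pi}{\Delta x} \sum_n \delta\!\left(k - n \frac{2\pi}{\Delta x}\right)$ (3.15)
Gaussian: $\exp\!\left(-\frac{\mathbf{x}^T\mathbf{x}}{2\sigma^2}\right) \;\circ\!-\!\bullet\; \left(\frac{\sigma}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{\sigma^2\, \mathbf{k}^T\mathbf{k}}{2}\right)$ (3.16)
Box: $\prod_{p=1}^{n} H(w - |x_p|) \;\circ\!-\!\bullet\; \prod_{p=1}^{n} \frac{\sin(w k_p)}{\pi k_p} = \prod_{p=1}^{n} \frac{w}{\pi}\, \mathrm{sinc}\!\left(\frac{w}{\pi} k_p\right)$, where $\mathrm{sinc}(x) := \sin(\pi x)/(\pi x)$ (3.17)

The features given also cover those of the discrete Fourier transform (DFT), which is typically implemented on computers via a Fast Fourier Transform (FFT). The DFT is the FT of a function with a finite extension and a finite bandwidth (FEF). A finite extension function can always be extended to a periodic function by concatenating it to infinity with the range values of its domain (we may call this process periodization). According to the Nyquist–Shannon sampling theorem such a function may be represented without loss of information by a set of samples of this function, if these are taken as ideal samples (i.e. via convolution with a Dirac delta function) and the sampling frequency is more than twice the highest frequency in the signal (i.e. the one-sided baseband bandwidth $B$): $\mathrm{freq}_{samp} > 2B$.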
Vice versa, a signal that is ideally sampled with a sampling distance $\Delta x$ must not contain any frequency equal to or above the Nyquist frequency $k_{ny}$, if the sampling is neither to lose information nor to introduce artifacts (called Moiré patterns for 2D imagery, or aliasing in general):

$$k_{max} \overset{!}{<} k_{ny} = \frac{\pi}{\Delta x} = \pi\, \mathrm{freq}_{samp}. \tag{3.18}$$

We denote the wavenumber which is scaled to the Nyquist frequency as $\tilde{k}$:

$$\tilde{k} := \frac{k}{k_{ny}} = \frac{k\, \Delta x}{\pi} = \frac{2 \Delta x}{\lambda} \iff k = \pm k_{ny} \equiv \tilde{k} = \pm 1. \tag{3.19}$$

The DFT corresponds to the FT applied to a FEF function that is periodized (i.e. convolved with a delta comb with a spacing as large as the domain of the function) and sampled (multiplied with a delta comb of spacing smaller than half of the smallest wavelength of the periodized signal). In the Fourier domain this corresponds to a sampling of the frequency space (multiplication with a delta comb of spacing inverse to the extension of the function's domain) and its periodization with a period of $\mathrm{freq}_{samp}$, due to equations (3.10) and (3.15).

3.1.3 Interpolation

If we want to interpolate a sampled FEF function (e.g. an image), we may do this by reversing the effect of sampling in the Fourier domain, which is the periodization; we can achieve this by multiplication with a box function of half width $w = \pi\, \mathrm{freq}_{samp}$. In the spatial domain this corresponds to a convolution with a scaled sinc function (3.17). While the sinc function is theoretically the ideal interpolation function, it is of limited use from a practical perspective. First of all, the support of sinc is unbounded (i.e. it has no compact support) and the function decays only with the inverse of its argument, which makes it a poor candidate for use in a numerical convolution, even if small errors are acceptable. Furthermore, it is not direction isotropic (or rotation-invariant), i.e. in a multidimensional image the interpolation result depends on the orientation of structures in the image.
Another candidate for interpolation are properly scaled Gaussian functions. What we need to do for a proper interpolation is to set the signal samples in Fourier space to zero outside the signal's original baseband bandwidth. Or, in analogy to solid-state physics: all but the first Brillouin zone have to be zero. Additionally, for a lossless reconstruction the signal must not be suppressed within the first Brillouin zone. A properly scaled Gaussian multiplied with the Fourier transform approximates these requirements sufficiently well, and corresponds to a convolution with a Gaussian of inverse width (3.16). Moreover, the multidimensional Gaussian is the only function that is both separable and rotation-invariant. Interpolation using Gaussian functions, while not ideal, still works well for digital imagery that complies with the sampling theorem, because real-world image data typically has a low signal-to-noise ratio for wavenumbers near the Nyquist frequency. So even though the Gaussian becomes approximately zero close to the Nyquist frequency, it tends to suppress more noise than signal. Furthermore, the sampling of a digital camera is not ideal but influenced by the MTF (see section 3.1.4) of the optics involved, which typically acts as a low-pass filter and narrows the effective bandwidth of the signal additionally. Thus the missing steep flank of the Gaussian (compared to the box function) is less problematic. Interpolation of a continuous function $g_c$ from sampled data $g(x_n)$ on a grid $x_n$ is realized via a discrete convolution with a continuous function $h(x)$:

$$g_c(x) = \sum_n g(x_n)\, h(x - x_n). \tag{3.20}$$

The interpolation function $h(x)$ needs to fulfill the interpolation condition:

$$h(x) = \begin{cases} 1 & \text{for } x = 0 \\ 0 & \text{for } x = x_m - x_n \text{ where } m \neq n \end{cases}. \tag{3.21}$$
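The interpolation sum (3.20) and the condition (3.21) can be illustrated with a short Python sketch. It assumes a one-dimensional grid $x_n = n\,\Delta x$ and uses the sinc kernel from (3.17); the function names, the sample values and the Gaussian width are hypothetical choices for this example. Note that a Gaussian kernel is nonzero at the neighboring grid points, so it meets (3.21) only approximately.

```python
import math

def sinc(x):
    """sinc as defined in (3.17): sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def gaussian_kernel(x, sigma=0.5):
    """Unnormalized Gaussian; it does not vanish at the other grid points."""
    return math.exp(-x * x / (2 * sigma * sigma))

def interpolate(samples, x, kernel=sinc, dx=1.0):
    """Evaluate (3.20): sum of g(x_n) * h(x - x_n) on the grid x_n = n * dx."""
    return sum(g * kernel((x - n * dx) / dx) for n, g in enumerate(samples))

samples = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
on_grid = interpolate(samples, 3.0)      # reproduces the sample at x_3
between = interpolate(samples, 3.5)      # value between two grid points
print(on_grid, between, gaussian_kernel(1.0))
```

With the sinc kernel the samples themselves are reproduced on the grid, which is exactly what the interpolation condition guarantees.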
Applying a partial derivative along the grid dimension $x_p$ yields, due to the linearity of the derivative:

$$g_c'(x) := \partial_{x_p} g_c(x) = \partial_{x_p} \sum_n g(x_n)\, h(x - x_n) = \sum_n g(x_n)\, \partial_{x_p} h(x - x_n) = \sum_n g(x_n)\, h'(x - x_n)$$

Resampling at the original grid positions $x_m$ gives:

$$g'(x_m) := g_c'(x_m) = \sum_n g(x_n)\, h'(x_m - x_n) \tag{3.22}$$

So for calculating the derivatives at the original grid points we only need to perform a discrete convolution of the signal with a sampled version of the (partial) derivative of the interpolation function. If the interpolation function is chosen to be a Gaussian, $h'(x_m - x_n)$ may be approximated by a filter mask $H$ with a finite number of filter coefficients. It is only an approximation, because $h'$ has no compact support and we need to truncate it at its ends. But as $h'$ decreases rapidly with increasing $|x_m - x_n|$, the error introduced is small. The filter or convolution mask $H$ is completely independent of the signal and of the position at which it is applied. The operation of applying such a mask via convolution belongs to the class of linear shift-invariant filters, which we introduce more formally in the following section. For the sake of simplicity we will stick to a formulation for 2D scalar images; the concepts may, however, be generalized to multidimensional, spatiotemporal (image) data or tensor-valued data, see [Big06, chap. 3.6].

3.1.4 Convolution, Point Spread Function and Transfer Function

Filtering in the context of image processing is realized as convolution for the class of linear shift-invariant (LSI) filters. Convolution of a 2-dimensional image $G$ of size $M \times N$ with a square convolution mask $H$ of $(2R + 1)^2$ elements $h_{mn}$ is given by

$$g'_{mn} = \sum_{m'=-R}^{R} \sum_{n'=-R}^{R} h_{m'n'}\, g_{m-m',\,n-n'} =: [H * G]_{mn}$$

The filter is by definition linear and shift invariant, as it has the properties of a linear operator and does not depend on the position $(m, n)$ at which it is applied.
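The discrete derivative (3.22) with a Gaussian interpolation function can be sketched in a few lines of Python. This is a one-dimensional illustration under assumptions that are not part of the thesis: unit grid spacing, a mask truncated at four standard deviations, and a sampled sine wave as test signal. According to (3.13) and (3.16), the result should match the analytic derivative attenuated by the factor $\exp(-\sigma^2 k^2 / 2)$.

```python
import math

def gaussian_derivative_mask(sigma, radius):
    """Sampled first derivative of a normalized Gaussian, truncated at +-radius."""
    taps = range(-radius, radius + 1)
    gauss = [math.exp(-t * t / (2 * sigma * sigma)) for t in taps]
    norm = sum(gauss)
    # d/dt exp(-t^2 / (2 sigma^2)) = -t / sigma^2 * exp(-t^2 / (2 sigma^2))
    return [-t / (sigma * sigma) * g / norm for t, g in zip(taps, gauss)]

def convolve_valid(signal, mask):
    """Discrete convolution as in (3.22), only where the mask fits entirely."""
    radius = len(mask) // 2
    return [sum(mask[j + radius] * signal[m - j] for j in range(-radius, radius + 1))
            for m in range(radius, len(signal) - radius)]

k, sigma, radius = 0.2, 1.5, 6
signal = [math.sin(k * n) for n in range(200)]
derivative = convolve_valid(signal, gaussian_derivative_mask(sigma, radius))

# Analytic derivative of the Gaussian-smoothed sine at the same grid points
expected = [k * math.exp(-sigma ** 2 * k ** 2 / 2) * math.cos(k * m)
            for m in range(radius, len(signal) - radius)]
max_error = max(abs(d - e) for d, e in zip(derivative, expected))
print(max_error)
```

The remaining deviation stems from truncating the mask, which is the approximation mentioned in the text.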
The point spread function (PSF) is defined as the filter response to a point image $P$ ($p_{m,n} = 1$ for $m = n = 0$, and $0$ otherwise) and is identical to the convolution mask $H$:

$$\mathrm{PSF}_{mn} := \sum_{m'=-R}^{R} \sum_{n'=-R}^{R} h_{m'n'}\, p_{m-m',\,n-n'} = h_{mn} = [H * P]_{mn} \tag{3.23}$$

It fully describes an LSI filter, as the filter response to an arbitrary image is just a linear combination of shifted PSFs, with the coefficients being the pixel values of the image. The optical transfer function (OTF) is defined as the Fourier transform of the PSF.∗ It is the wavelength-dependent multiplication factor of an LSI filter in the Fourier domain. This is easy to see from the convolution theorem (3.10), which states that convolution in the spatial domain corresponds to a multiplication in the Fourier domain (and vice versa). The discrete delta peak image $P$ of equation (3.23) transforms, due to equation (3.14), to a constant value for all Fourier domain pixels. Thus the filter is, not surprisingly, described completely by $\hat{H}$. The magnitude of the optical transfer function is called the modulation transfer function (MTF) and describes the attenuation of the sinusoidal waveforms as a function of their spatial frequency.

∗ The term transfer function has a more general definition, but is sometimes used synonymously with OTF.

$$\mathrm{OTF} := \mathcal{F}(\mathrm{PSF}) = \mathcal{F}(H * P) = \hat{H} \cdot \hat{P} = \hat{H} \cdot \mathrm{const.} \tag{3.24}$$

$$\mathrm{MTF}_{mn} := |\mathcal{F}(\mathrm{PSF})_{mn}| = |\hat{H}_{mn}| \cdot \mathrm{const.} \tag{3.25}$$

Eventually, all an LSI filter does is attenuate sinusoidal waves and translate their positions (where the translation vector is encoded in the argument of the respective complex Fourier transform entry $\hat{H}_{mn}$). The (continuous) transfer function $\hat{h}$ of an LSI filter $H$ on an orthogonal grid is

$$\hat{h}(\tilde{\mathbf{k}}) = \sum_{m=-R}^{R} \sum_{n=-R}^{R} h_{mn} \exp\left(-\pi \imath \begin{bmatrix} n \\ m \end{bmatrix}^T \cdot \tilde{\mathbf{k}}\right) \tag{3.26}$$

From Euler's formula $\exp(\imath x) = \cos(x) + \imath \sin(x)$ we derive for filter masks of even symmetry

$$\hat{h}(\tilde{k}) = h_0 + \sum_{n=1}^{R} 2\, h_n \cos(\pi n \tilde{k}), \tag{3.27}$$

and for odd filter masks

$$\hat{h}(\tilde{k}) = \imath \sum_{n=1}^{R} 2\, h_n \sin(\pi n \tilde{k}). \tag{3.28}$$
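As a small worked example, the one-dimensional case of (3.26) can be evaluated directly for two common masks. The Python sketch below assumes that the mask is stored in the order $h_{-R}, \dots, h_{R}$ as used by the convolution sum; with a different ordering convention the sign of the odd case flips. The function name and the chosen masks are illustrative only.

```python
import cmath
import math

def transfer_function(mask, k_tilde):
    """One-dimensional case of (3.26) for a mask with taps h_{-R} ... h_{R}."""
    radius = len(mask) // 2
    return sum(h * cmath.exp(-1j * math.pi * n * k_tilde)
               for n, h in zip(range(-radius, radius + 1), mask))

binomial = [0.25, 0.5, 0.25]            # even symmetry, smoothing
central_difference = [0.5, 0.0, -0.5]   # odd symmetry: (g[m+1] - g[m-1]) / 2

for k_tilde in (0.1, 0.5, 0.9):
    smooth = transfer_function(binomial, k_tilde)
    deriv = transfer_function(central_difference, k_tilde)
    # Compare with cos^2(pi k/2) and i sin(pi k), respectively
    print(k_tilde, smooth.real, math.cos(math.pi * k_tilde / 2) ** 2,
          deriv.imag, math.sin(math.pi * k_tilde))
```

The binomial mask yields the purely real transfer function $\cos^2(\pi \tilde{k}/2)$, in line with (3.27), while the central difference yields the purely imaginary $\imath \sin(\pi \tilde{k})$, which approaches $\imath \pi \tilde{k}$ only for small $\tilde{k}$.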
Due to the separability of the Fourier transform (3.7), the transfer function of a multidimensional, separable filter, composed by convolving one-dimensional filters (of specific symmetry), is just the product of the transfer functions of the individual filters.

Filter Design and Optimization

Now that we have introduced LSI filtering by means of convolution, we come back to discrete operators. We have shown in section 3.1.3 that interpolating the discrete data, taking the derivative and resampling on the original grid can be done approximately by applying a single discrete filter by means of convolution. If we proceed as proposed, we arrive at the filter family of so-called Derivatives of Gaussian. The anisotropy of a gradient operator composed of these filters is definitely lower than that of the central difference quotient w.r.t. its magnitude (3.2), but identical w.r.t. the direction estimation (3.3). A very well-known derivative filter is the Sobel operator. Compared to the previously mentioned filters, its anisotropy is more than a factor of 2 lower for the angle estimate and also smaller w.r.t. the magnitude of the gradient; for details see [JH00, chap. 9.7]. This is achieved by introducing an asymmetry in the width of the interpolating Gaussians in the direction of the derivative and normal to it. But the maximum angle anisotropy is still around 20° and is independent of the filter size (which improves magnitude anisotropy only). To further reduce anisotropy one can treat the filter design as an optimization problem. This means that we look for a filter that differs from the ideal (continuous) filter as little as possible under given constraints, which arise from the discrete and finite extension of the applicable filter mask. The measure for the difference between the ideal and the optimized filter, and what the ideal filter should actually look like, depend on the problem and its specific requirements. For example,
a discrete derivative filter can, due to its finite extension, never have a transfer function that is both ideal w.r.t. equation (3.13), such that $\hat{h}(\tilde{k}) = \imath \tilde{k}$ for $\tilde{k}$ within $]-1, 1[$, and zero outside, as a discontinuity in Fourier space would require an infinite extension of the filter. Therefore, it is a matter of design in which frequency band the so-called reference function should approximate the ideal best, and which of the desired features (like isotropy) may be violated at which cost within the optimization. A suitable optimization strategy then returns the filter coefficients that minimize the cost or error between the ansatz function (basically equation (3.26)) and the reference function, in compliance with the given constraints. For details we refer to [JSK99], and in closing we would like to point out that the derivative filters used for motion estimation in this thesis were optimized w.r.t. a maximum precision in orientation estimation as described by Scharr [Sch00].

3.1.5 Normalized Averaging

The PMD data we are dealing with is affected by errors, both statistical and systematic ones. Here we show a simple method to improve the range data with respect to the statistical errors. As we have seen in section 2.2.2.2, the amplitude and offset of the PMD signal give us a measure for the reliability of the range measurement. With equation (2.29) we find the variance of the range signal to be proportional to $G_0/A^2$. However, most of the PMD camera models we know of do not give direct access to the offset $G_0$ in their standard configuration. The PMD19k, for example, returns, besides range $R$ and amplitude $A$, a third channel denoted as intensity $I$. But this intensity is not the DC offset of the electrooptical signal but the amplitude signal weighted by the distance in some (unknown) manner.† Only if we have access to all raw channels can we calculate $G_0$ by equation (2.12).
If this is not possible, one might approximate $G_0$ as proportional to $A$, assuming no background illumination. Then the variance in the range measurement is approximated by $\operatorname{Var}(R) \sim A^{-1}$. If we want to denoise the data correctly by averaging over a specific neighborhood, we know from elementary statistics that appropriate averaging requires weighting each data value with the inverse of its variance, i.e. using the above approximation we just need to multiply by $A$. As it is well known that box filters do not have very good properties from a signal processing point of view (due to their infinite and slowly decreasing transfer function), the neighborhood itself needs to be weighted too. So we need to incorporate another set of weights in the averaging procedure by using a filter such as a Binomial. Both weightings can be achieved with a technique that is known as normalized averaging [GK95].

† The relative error of $I$ is even higher than that of $A$ and $R$: suppose $I = A R^b$, then error propagation leads to $\sigma_I/\bar{I} = \sqrt{(\sigma_A/\bar{A})^2 + (b\, \sigma_R/\bar{R})^2}$.

Normalized averaging is a special case of a more general filtering technique called normalized convolution, which is described in detail by [KW93; Wes94; Far03]. The employed filter (e.g. a Binomial) is called the applicability $B$. If the measurement data are denoted by $R$ and the weighting image by $W$ (e.g. the amplitude image $A$), normalized averaging reads:

$$R' = \frac{B * (W \cdot R)}{B * W}. \tag{3.29}$$

The weighting image is not necessarily associated with an error. It can be used to exclude or amplify pixels with certain features. In this way, normalized averaging becomes a versatile operator that was used for various tasks in the context of this thesis. However, it should be noted that applying normalized averaging as described leads to a bias toward smaller range values at depth edges, as the confidence measure (or weighting image) is correlated with the (physical) quantity to denoise, i.e.
the amplitude decreases with increasing depth: in the neighborhood of a depth edge, surface patches of the same reflectivity near the camera (denoted in the following as near surfaces) will be weighted more strongly than those far away. This leads to an anisotropic, biased blurring of image features, such that the near surfaces tend to grow while those farther away shrink. So normalized averaging should not be applied at surface edges.

Band Enlarging Operators

Normalized averaging is a potentially band enlarging operation, because it involves the multiplication of two images $W \cdot R$, which corresponds to a convolution in the wavenumber domain (3.11). If the sum of the bandwidths of $\hat{W}$ and $\hat{R}$ is larger than $\tilde{k} = 1$ in any dimension, aliasing occurs. Thus, it is important to adapt the bandwidth of the images w.r.t. the Nyquist wavenumber before multiplying them, either by upsampling the images, which is a lossless but somewhat expensive operation (w.r.t. processing time and memory consumption), or by pre-smoothing with e.g. a binomial filter, which is fast but potentially lossy. The same rules apply to operations that involve rotations (3.9), as a rotated FT image exceeds the Nyquist borders (the corner areas lie outside the first Brillouin zone), and if the corresponding Fourier coefficients are not zero, aliasing will occur. Köthe [Köt03] points out that the influence of band enlarging operators was frequently neglected in the computer vision literature in conjunction with more complex operators like the Canny edge detector and the structure tensor. With modern high-resolution image sensors of several million pixels, however, this aspect is of less importance, because the camera's optics typically act as a low-pass filter with respect to the sensor's resolution, especially in the field of consumer market products.
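Returning to equation (3.29), normalized averaging can be sketched in one dimension as follows. The Python code is a minimal illustration, assuming a 3-tap binomial applicability and zero weight beyond the signal borders; the function names and the test data are hypothetical and do not reproduce the implementation used in the thesis.

```python
def convolve_same(signal, mask):
    """Convolution with an odd-length mask; samples beyond the border count as zero."""
    radius = len(mask) // 2
    out = []
    for m in range(len(signal)):
        acc = 0.0
        for j in range(-radius, radius + 1):
            if 0 <= m - j < len(signal):
                acc += mask[j + radius] * signal[m - j]
        out.append(acc)
    return out

def normalized_average(values, weights, applicability=(0.25, 0.5, 0.25)):
    """Equation (3.29): R' = B * (W . R) / (B * W), evaluated per sample."""
    weighted = [w * r for w, r in zip(weights, values)]
    numerator = convolve_same(weighted, applicability)
    denominator = convolve_same(list(weights), applicability)
    # Where no neighbor carries any weight the result is undefined
    return [n / d if d > 0 else float("nan") for n, d in zip(numerator, denominator)]

ranges = [2.0, 2.0, 100.0, 2.0, 2.0]    # one gross outlier in the range data
confidence = [1.0, 1.0, 0.0, 1.0, 1.0]  # e.g. an amplitude image; outlier gets zero
print(normalized_average(ranges, confidence))
```

Because the same applicability is applied to numerator and denominator, samples with zero weight, including the implicit zeros beyond the border, drop out of the average instead of biasing it.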
For current PMD sensors, however, this does not hold, as their sensor resolution is still low compared to the resolution of the optics.

3.2 Edge Preserving Smoothing

The methods for denoising range imagery discussed so far all lack the ability to denoise or smooth the data without blurring image features like edges or corners. This is due to the fact that the models they are based on assume a planar neighborhood, or at least one with a very specific symmetry, and are thus violated around the mentioned features. There are basically two ways to handle this problem. One is to extend the model so that it explains the data better in a specific neighborhood. The other is to improve the estimate of the model parameters by means of a robust estimator, i.e. one that gives a correct estimate in the presence of a minority of data points that do not fit the model, so-called outliers. Both approaches may be combined, and the transitions between them are smooth. Applying robust statistical methods to the field of computer vision is not trivial. Take for example the simple case of a corner of a cube seen from above with a range camera: in the vicinity of the corner there are 3 planes, thus the majority of the pixels in a neighborhood of any pixel near the corner will violate a single planar model and therefore cannot be treated as classical outliers w.r.t. this model.

3.2.1 Robust Estimators

Robust estimation is concerned with the accurate estimation of model parameters in the presence of data that violates the model for which the parameters are to be determined and/or the assumptions about the errors in the measurements (i.e. the employed noise model): the data may contain classical, gross outliers that are not consistent with an assumed data model exposed to e.g. Gaussian noise. For a low-level model of PMD data these might stem from specular reflections of the modulated illumination, leading to saturation of the capacities and in turn to a completely arbitrary depth measurement.
Defective pixels or interreflection of light from multiple surfaces are other sources of outliers. Another class of outliers consists of pixels belonging to a minority of the data, a population that is compatible with a different‡, potentially unknown data model; e.g. in the case of a planar surface model every step or roof edge of an object or partial occlusion will give rise to this kind of outlier. With respect to image processing the same pixel can be either an outlier or an inlier, depending on the position of the model to be estimated. As each pixel measurement is subject to small-scale random variations, the parameter estimation problem is heavily overconstrained (for both low-level and high-level models), which suggests that a maximum likelihood estimation technique should be employed to solve the problem. Under the assumption of normally (Gaussian) distributed, additive noise, least squares estimation is a maximum likelihood estimator [LP02, chap. 20.2.6], i.e. the probability of the observed measurements is maximal for the estimated parameters. Let $y_i$ be the measurements at the independent (or control or explanatory) variables $x_i$, e.g. the sensor grid coordinates, of the model $m$ for which the (vector of) parameters $p$ are to be estimated, e.g. the surface normal and intercept of a planar surface model. Then the ordinary least squares (OLS) estimate $\hat{p}$ is given as

$$\hat{p} = \operatorname*{argmin}_{p} \sum_i \left( \frac{y_i - m(x_i, p)}{\sigma_i} \right)^2 \tag{3.30}$$

$$\phantom{\hat{p}} = \operatorname*{argmin}_{p} \sum_i \left( \frac{r(y_i, x_i, p)}{\sigma_i} \right)^2, \tag{3.31}$$

leading to the more general formulation known as M-estimator:

$$\hat{p} = \operatorname*{argmin}_{p} \sum_i \rho\!\left( \frac{r_{i,p}}{\sigma_i} \right). \tag{3.32}$$

The expression to be minimized is called the objective function, and $\rho(r)$ is known as the loss function (or error norm), which is $\rho(r) = r^2$ for the least squares estimate. The residual function $r$ describes the (error) distance between a measurement and the model determined by $p$ (and $x_i$).
The residuals need to be normalized to the scale (noise level) $\sigma_i$ associated with the measurements $y_i$; in the simplest case the measurements are i.i.d., thus $\sigma_i = \hat{\sigma}\ \forall\, i$, which in turn can be neglected for least squares estimation.

‡ Different means either an instance of the same model with different parameters or a completely different model.

An advantage of the least squares estimation problem (3.30) is that it can be solved efficiently for models $m$ which are linear in their parameters $p$ (but possibly nonlinear w.r.t. the independent variables) by means of the LSI filters introduced in the previous section (for details see [JHG99]). However, most real-world problems cannot be described sufficiently by a single model under a Gaussian noise assumption, and because the least squares loss function grows without bound with increasing $|r|$, a single outlier can corrupt the estimation seriously; this is why the breakdown point of least squares is 0. The breakdown point is the minimum fraction of outlying data that can cause an estimate to diverge arbitrarily far from the sought value. The theoretical maximum breakdown point of any "general purpose" estimator is 0.5, because with more than 50% outliers these can be arranged in such a way that, in terms of regression analysis, a fit through them will minimize the objective function. The breakdown point of an estimator does not say anything about its efficiency, which is defined as the minimum possible variance for an (unbiased) estimator divided by its actual variance, with the minimum possible variance being determined by a target distribution (e.g. a Gaussian one). Typically, robust estimators with a high breakdown point tend to have a low efficiency, thus the estimates have a high variance and require a large number of measurements to gain a reasonable (statistical) precision.
The least squares estimator belongs to the class of M-estimators ("M" for "maximum likelihood type" [Hub81, page 43]). These are of the form (3.32), with $\rho(r)$ being a function of even symmetry ($\rho(r) = \rho(-r)$) with a unique minimum at zero and monotonically increasing for $r > 0$. The robustness against outliers is achieved by a loss function that grows subquadratically. This becomes clearer if we look at the solution of equation (3.32), which is determined by the root of its derivative w.r.t. $p$. If $\nabla_p$ denotes the vector of partial derivative operators ($\partial / \partial p_n$) of the $n = 1 \dots N$ parameters of $m$, then we find a system of $N$ equations for the vanishing derivatives at the minimum of the objective function:

\[ \sum_i \nabla_p\, \rho\!\left( \frac{r_{i,p}}{\sigma_i} \right) = \sum_i \frac{1}{\sigma_i}\, \psi\!\left( \frac{r_{i,p}}{\sigma_i} \right) \nabla_p r_{i,p} \overset{!}{=} 0, \quad \text{where } \psi(r) = \partial_r \rho(r) . \tag{3.33} \]

For a model $m$ linear in its parameters $p$, $\nabla_p r_{i,p}$ simplifies (up to sign) to $x_i$:

\[ \sum_i \frac{x_i}{\sigma_i}\, \psi\!\left( \frac{r_{i,p}}{\sigma_i} \right) = 0 \tag{3.34} \]

and for the simplest model $m = 1 \cdot p$ it is

\[ \sum_i \frac{1}{\sigma_i}\, \psi\!\left( \frac{r_{i,p}}{\sigma_i} \right) = 0 . \tag{3.35} \]

If the model $m$ is not linear in $x$, then $x_i$ in equation (3.34) may denote a vectorial function that depends on the explanatory variables only.

The derivative of the loss function is known as the influence function $\psi$. The name is reasonable, as it is plain to see from system (3.34) that $\psi$ directly determines the influence of a single residual on each constraint equation. For least squares the influence function is identical to the normalized residual, and thus its absolute value can become arbitrarily large. Robust estimators use an influence function that is bounded above and below and which may become zero for large residuals. Influence functions tending to zero most quickly (known as hard redescenders) permit the most aggressive rejection of outliers. This feature is of major importance if the outliers have small residuals in the range of 4 to 10 $\sigma$ [Ste99].
Redescending influence functions, however, make $\sum_i \rho(r_{i,p} / \sigma_i)$ nonconvex, such that solvers for equation (3.33) may converge to local minima if the initial guess is not close to the optimum. Iteratively reweighted least squares (IRLS) is such a solver. It follows from equation (3.33) when $\psi(r)$ is substituted by $w(r)\, r$, with $w(r) = \psi(r)/r$ known as the weight function:

\[ \sum_i \frac{1}{\sigma_i^2}\, w\!\left( \frac{r_{i,p}}{\sigma_i} \right) r_{i,p}\, \nabla_p r_{i,p} = 0 . \tag{3.36} \]

This can be solved iteratively by means of common weighted least squares solvers (e.g. SVD or Gaussian elimination for $m$ linear in $p$, and Gauss-Newton or Levenberg-Marquardt for a nonlinear model), if for each iteration the weights $w(r)$ are calculated for the current guess of $p$ and then fixed for the least squares solver; a least squares solver is applicable because the term $r_{i,p} \nabla_p r_{i,p}$ in (3.36) is just the derivative of the least squares problem (3.31), while the other terms are kept constant.

Black and Rangarajan [BR96] give a survey of the various influence and loss functions proposed in the statistics and computer vision literature, in the light of related outlier processes; one example of a redescending influence function is the Leclerc function, depicted in Figure 3.1:

\begin{align}
\rho_\eta(r) &:= 1 - \exp\!\left( -\frac{r^2}{\eta^2} \right), \notag \\
\psi_\eta(r) &:= \partial_r \rho_\eta(r) = \frac{2r}{\eta^2} \exp\!\left( -\frac{r^2}{\eta^2} \right) \tag{3.37} \\
\text{and} \quad w_\eta(r) &:= \frac{\psi_\eta(r)}{r} = \frac{2}{\eta^2} \exp\!\left( -\frac{r^2}{\eta^2} \right) . \notag
\end{align}

A second look at equation (3.34) tells us that standard M-estimators still have a breakdown point of zero, because an erroneous measurement at a point $x_i$ which is far away from the bulk of the data may still corrupt the whole estimate.

Figure 3.1: Robust and nonrobust M-estimators: a the robust Leclerc functions (loss function $\rho$, influence function $\psi$, weight function $w$) for $\eta^2 = 2$ and b the respective functions for least squares ($\psi(r) := r$).
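To illustrate the IRLS scheme (3.36) with the Leclerc weight function (3.37), the following minimal sketch treats the constant model $m = p$, for which each weighted least squares step reduces to a weighted mean. It is an illustration under stated assumptions, not the implementation used for the figures: the data, the common noise level `sigma` and the function names are hypothetical, and for a common $\sigma$ the factor $1/\sigma^2$ cancels.

```python
import math

def leclerc_weight(r, eta2=2.0):
    """Leclerc weight function w(r) = (2 / eta^2) * exp(-r^2 / eta^2), cf. (3.37)."""
    return 2.0 / eta2 * math.exp(-r * r / eta2)

def irls_location(y, sigma, p0, eta2=2.0, iterations=50, tol=1e-9):
    """IRLS for the constant model m = p with a common noise level sigma."""
    p = p0
    for _ in range(iterations):
        # Weights are computed for the current guess of p and then kept fixed.
        w = [leclerc_weight((yi - p) / sigma, eta2) for yi in y]
        total = sum(w)
        if total == 0.0:  # all samples rejected numerically: keep the current guess
            break
        # Weighted least squares solution for fixed weights: a weighted mean.
        p_new = sum(wi * yi for wi, yi in zip(w, y)) / total
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p

# Hypothetical data: a majority population around 10, a minority around 30.
y = [9.0, 9.5, 10.0, 10.5, 11.0, 10.2, 9.8, 30.0, 30.5, 29.5]
for start in (10.0, 30.0):
    print(f"initial guess {start}: estimate {irls_location(y, 1.0, start):.3f}")
print(f"least squares (mean): {sum(y) / len(y):.3f}")
```

Started near the majority the iteration settles there, while a start near the minority ends in the local minimum belonging to the minority; this is the dependence on the initial guess discussed above.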
An alternative to M-estimators is the least median of squares (LMS) estimator, which has the maximum possible breakdown point of 0.5; compared to least squares the objective function is not the sum, but the median of the squared residuals:

\[ \hat{p} = \operatorname*{argmin}_{p}\; \operatorname*{median}_{i} \left( \frac{r_{i,p}}{\sigma_i} \right)^2 . \tag{3.38} \]

For a simple linear regression model the LMS solution corresponds to the "narrowest strip covering half of the observations" [RL87]. LMS buys its excellent robustness against outliers at the cost of a less efficient (random sampling) search technique, because the median is not differentiable and thus gradient descent or Newton's method are not applicable; moreover, LMS has an abnormally slow convergence rate [RL87]. A robust estimator between OLS and LMS is least trimmed squares (LTS), which like OLS minimizes the sum of squared residuals, but excludes (at most) 50% of the residuals of larger magnitude from the summation. This leads to an improved convergence rate while maintaining a high breakdown point.

Figure 3.2 illustrates some of the properties of robust estimators for a simple linear model (a straight line with unknown slope and intercept). While the LMS estimate finds the majority population model independent of the initial guess, the Leclerc M-estimator finds local minima which might belong to the majority population (the increasing straight line) or the minority (the decreasing straight line), or which are completely wrong. The LMS estimate tends to be worse than the M-estimate if both succeed and use the same initial guesses and the same convergence tolerance (the limiting difference in the objective function of two succeeding guesses at which the minimization stops), indicating that LMS has a lower convergence rate. All optimizations were done with
a nonlinear conjugate gradient solver. $\eta^2$ was chosen to be 2 for the Leclerc function, and the residuals were scaled to the noise level ($\sigma = 5$) of the Gaussian i.i.d. model populations. The gross outliers stem from a uniform distribution in the range [250, 300].

Figure 3.2: Illustration of the sensitivity of M-estimators to local minima and the slow but robust convergence of the LMS estimator (plotted: data, Leclerc M-est (1), Leclerc M-est (2), least squares fit, least median). a two populations with fractions of 56% and 33%, and 11% gross outliers; M-est (1) and (2) just differ in the initial guess. b same as a but with different initial guesses.

Figure 3.3 illustrates a problem more typical for image processing in the context of robust estimators: a step edge. The model is the same as for figure 3.2, but the data contain no gross outliers and the two populations (constant lines with different offset) hardly overlap. Again Leclerc finds local minima and LMS gives a worse estimate. Furthermore, we see that the robust estimators break down if the residuals of the outliers w.r.t. the larger population only have a magnitude of a few $\sigma$ (the step of the edge in figure b is only $4\sigma$).

3.2.2 Bilateral and Diffusion Filtering

Bilateral and diffusion filtering are very popular image processing methods for the task of denoising image data. Black et al. [Bla+98] show that anisotropic diffusion as introduced by Perona and Malik [PM90] may be regarded as a robust estimator. Durand and Dorsey [DD02] point out that bilateral filtering as introduced by Tomasi and Manduchi [TM98] and what they call 0-order anisotropic diffusion
(while inhomogeneous diffusion would be more appropriate) belong to the same family of robust estimators, the major difference being that inhomogeneous diffusion filtering is energy preserving, while bilateral filtering is not (due to an asymmetric normalization term w.r.t. single pixels); energy preserving for a gray value image means that the arithmetic mean of its pixels' gray values does not change due to the applied filter.

Figure 3.3: Illustration of the breakdown of robust estimators with decreasing difference in magnitude between the residuals of outliers and model samples, at a step edge (plotted: data, Leclerc M-est (1), Leclerc M-est (2), least squares fit, least median). a the outliers (minority population) have a distance of more than $20\sigma$ from the majority; b the distance is only $4\sigma$ and all estimators fail, leading to a "bridging" estimate between the two population models.

Diffusion filtering is motivated by a physical observation expressed by Fick's law

\[ \mathbf{j} = -\mathbf{D}\, \nabla u, \tag{3.39} \]

which states that a concentration (or temperature) gradient $\nabla u$ causes a flux $\mathbf{j}$ that tries to compensate the gradient, in a way that is determined by the diffusion tensor $\mathbf{D}$, which is a positive definite, symmetric matrix. If $\mathbf{j}$ and $\nabla u$ are parallel we speak of isotropic diffusion. Then $\mathbf{D}$ degenerates to the scalar $D$, the diffusivity. If $D$ depends on (local) features of the field $u$ and therefore is not constant, we speak of inhomogeneous (but isotropic) diffusion. Only if $\mathbf{j}$ is in general not parallel to the gradient shall this be called anisotropic diffusion.

For a closed system, mass (or heat) does not vanish, i.e. $\frac{\mathrm{d}u(x, y, t)}{\mathrm{d}t} = 0$. Applying the chain rule and identifying $\mathbf{j}$ as $u \cdot (\partial_t x,\, \partial_t y)^T$ gives us the continuity equation

\[ \partial_t u + \nabla \cdot \mathbf{j} = 0 . \tag{3.40} \]

Substituting $\mathbf{j}$ from Fick's law (3.39) yields the diffusion equation

\[ \partial_t u = \nabla \cdot (\mathbf{D}\, \nabla u) . \tag{3.41} \]

For image processing the local concentration may be replaced by e.g. the gray value of an image pixel, implying a discretization of the diffusion equation in space. Using a constant diffusivity is only appropriate if we assume a constant gray value, such that the inhomogeneity introduced by (Gaussian) noise in the data is distributed and therefore leveled out by the homogeneous diffusion. At an edge in an image this assumption is clearly not fulfilled, and homogeneous diffusion introduces errors (e.g. a blurring of the gray value edge). If image structures are not to be corrupted, the diffusion tensor needs to depend on the local structure in the evolving image; if so, the time dependence leads to a feedback, which indicates the nonlinearity of such a diffusion filter.

Discretization in time and approximation of the derivatives by finite differences leads to an iterative solver. Perona and Malik [PM90] proposed a discretization for an inhomogeneous diffusion (which they called, somewhat sloppily, anisotropic):

\[ I_s^{t+1} = I_s^t + \frac{\lambda}{|n|} \sum_{p \in n} w(\nabla I_{s,p})\, \nabla I_{s,p} \approx I_s^t + \frac{\lambda}{|n|} \sum_{p \in n} w(I_p - I_s^t)\,(I_p - I_s^t), \tag{3.42} \]

where $w(x)$ was proposed to be $\exp(-x^2/\sigma^2)$. $I_s^t$ denotes the (gray) value of a sampled image pixel $s$ at time step (or rather iteration) $t$, and $n$ a neighborhood of $|n|$ pixels $p$ around $s$. $\nabla I_{s,p}$ indicates the directional derivative of $I$ at $s$ in the direction of $p$.

If we look back at the definition of the M-estimator, equation (3.32), we may identify the model $m$ as that of a constant gray value $p = I_s$, the measurements $y_i$ as the pixel values $I_p$, and $w(x)$ as the Leclerc weight function (3.37) (up to a constant factor). Comparing equation (3.42) with (3.36) while remembering that $\nabla_p r_{i,p} = 1$ for $m = p$, we find that equation (3.42) is just the gradient descent solution of (3.36) (i.e. IRLS).
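The iteration (3.42) can be sketched for a 1D signal with the two direct neighbors as neighborhood $n$. This is a minimal illustration, not the implementation of [PM90]: the step size `lam`, the scale `sigma`, the replicated borders and the sample signal are assumptions made for this example.

```python
import math

def perona_malik_step(signal, lam=0.25, sigma=2.0):
    """One iteration of (3.42) on a 1D signal with w(x) = exp(-x^2 / sigma^2)."""
    last = len(signal) - 1
    out = []
    for s, value in enumerate(signal):
        # Neighborhood n: left and right neighbor, borders replicated (assumption).
        neighbors = [signal[max(s - 1, 0)], signal[min(s + 1, last)]]
        update = 0.0
        for neighbor in neighbors:
            diff = neighbor - value
            update += math.exp(-(diff * diff) / (sigma * sigma)) * diff
        out.append(value + lam / len(neighbors) * update)
    return out

def perona_malik(signal, iterations=50, lam=0.25, sigma=2.0):
    """Repeat the update step; every step uses the values of the previous iteration."""
    current = list(signal)
    for _ in range(iterations):
        current = perona_malik_step(current, lam, sigma)
    return current

# Hypothetical noisy step edge: gray values near 0 on the left, near 20 on the right.
noisy = [0.3, -0.2, 0.1, -0.1, 0.2, 20.1, 19.8, 20.2, 19.9, 20.0]
smoothed = perona_malik(noisy)
print([round(v, 3) for v in smoothed])
print("mean before:", sum(noisy) / len(noisy), "mean after:", sum(smoothed) / len(smoothed))
```

In this example the weight across the step is practically zero, so the noise within each plateau is leveled out while the edge remains. The mean gray value stays the same up to rounding, in line with the energy preservation mentioned above.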
Thus, inhomogeneous isotropic diffusion is a robust M-estimator for the very simple model of a constant neighborhood. If one extends equation (3.42) by a weighting of the addends by their distance (e.g. $w(p - s)$), this corresponds to a generalized M-estimator (GM). While the solution of the homogeneous isotropic diffusion equation converges to an image of a single constant value (where the iteration steps are well approximated by Gaussian or binomial filtering), inhomogeneous diffusion converges to segments of constant value if hard redescenders are used as influence functions (the number and size of the segments depend on the noise level and structure of the image as well as on the chosen influence function). Anisotropic diffusion allows smoothing along edges but not perpendicular to them, achieving edge-enhancing smoothing. A thorough discussion of (anisotropic) diffusion filtering, its relation to curvature-preserving PDEs and their application is given in [Tsc02; Tsc06].

A similar reasoning as above is possible for bilateral filtering, which is motivated by introducing a weighting of the addends not only w.r.t. their spatial distance (as Gaussian filtering does) but also w.r.t. their distance in range to the pixel $s$:

\[ I_s := \frac{1}{\operatorname{norm}(s)} \sum_{p \in n} w_s(p - s)\, w_r(I_p - I_s)\, I_p \tag{3.43} \]

with the normalization term $\operatorname{norm}(s) := \sum_{p \in n} w_s(p - s)\, w_r(I_p - I_s)$. For details regarding the relation to robust estimators and diffusion filtering we refer to [DD02] and just note that the formal similarity to equation (3.42) already suggests their close relation, and that the performance of the methods depends heavily on the chosen weight or influence function, respectively.

Jones, Durand, and Desbrun [JDD03] developed an interesting extension of bilateral filtering to (3D) surface meshes that can be used to estimate the position of mesh vertices in a robust manner.
The extension introduces the concept of predictors, which incorporate shape information in the filtering process by means of normals on the (non-robustly) smoothed surface.

We realized an optional bilateral filtering, dependent on the range information, for the robust regularization of the structure tensor used in range flow estimation. For the task of denoising single PMD-frames, however, we employed another robust estimation technique, an extended version of channel smoothing; it has the advantage of being computationally very efficient compared to bilateral and (even more so) anisotropic diffusion filtering, and it is less sensitive to its exact parametrization.

3.3 Two State Channel Smoothing

Channel smoothing, or more precisely w.r.t. this work B-spline channel smoothing, is a technique introduced by Forssén, Granlund, and Wiklund [FGW02] and thoroughly discussed in [FSF02; FFS06]. It allows robust smoothing of low-level signal features without the main drawback of conventional robust smoothing concerning its applicability in image processing: the high computational complexity arising from the typically employed iterative solvers for finding the (local) minimum of the objective function.

Channel smoothing uses a channel representation [NGK94] of the signal to be smoothed. Channel representation is closely related or analogous to concepts in other fields of research, e.g. population coding (computational neurobiology), radial basis functions (neural networks) or fuzzy membership functions (control theory). From the viewpoint of classical statistics, averaging of the channel representation can be regarded as a regularized sampling of the probability density function (pdf) of the signal measurements, by means of a kernel density estimator (for a detailed discussion see [For04, chap. 4]). A (nonlinear) decoding of the averaged channel representation allows the extraction of the modes of the distribution, i.e.
the local maxima of the distribution. In terms of section 3.2.1 the modes correspond to the different model instances or populations comprised in the signal. It is essentially the decoding step that makes channel smoothing a robust estimator.

We extend regular channel smoothing for PMD-range data by applying a weighting of the single channel vectors w.r.t. the confidence in the single pixel measurements, and by using a new smoothing technique that differentiates between pixels for which the weighting is taken into account and those that use the unweighted channel entries.

The steps involved in the application of our extended B-spline channel smoothing to PMD-data are:

Encoding: creation of the B-spline channel representation from the PMD-range data

Two State Smoothing: smoothing the channels with a technique we named two state smoothing, which allows the range measurement to be weighted w.r.t. some confidence measure, without the tendency to enlarge the near surfaces as observed for common normalized averaging

Decoding: extracting the mode that approximates maximum likelihood from the averaged channel representation, yielding a robust estimate of the surface distance

Section 6.1 presents an application of this novel extension to B-spline channel smoothing, which we describe in detail in the following paragraphs.

Encoding

The range signal is transformed to the B-spline channel representation, i.e. (sparse) vectors of B-spline values at every pixel position. The channel representation for a bounded signal $f(x) \in [1.5, N - 0.5]$ is given by an encoding into $N$ channels at pixel positions $x$

\[ c_n(x) = B_2(f(x) - n), \quad n = 1 \dots N, \tag{3.44} \]

where the quadratic B-spline $B_2(f)$ is given by convolving the rectangle function $\Pi(x) = H(1/2 - |x|)$ two times with itself, yielding the explicit piecewise definition

\[ B_2(f) := \begin{cases} 3/4 - f^2 & |f| < 1/2 \\ \tfrac{1}{2} \left( |f| - 3/2 \right)^2 & 1/2 \le |f| < 3/2 \\ 0 & 3/2 \le |f| \end{cases} . \tag{3.45} \]

As the signal needs to be bounded to $[1.5, N - 0.5]$, we have to scale the range data accordingly. If $r(x)$ is the range signal bounded to $[A, B]$, it may be transformed as

\[ f(x) = \frac{N - 2}{B - A}\, (r(x) - A) + 1.5 . \tag{3.46} \]

As the depth information of a PMD-sensor is based on a phase measurement, implying a specific unambiguous depth range, and phase corresponds to a periodic domain, one might think about adapting the channel representation to this circular topology. Felsberg, Forssén, and Scharr [FFS06] show that this is easily done by adding the lower two and upper two channels into two single channels (as they are the same for a periodic domain). However, for PMD-data this would not be of much help, because while phase is periodic, range is originally not, i.e. the periodicity of the PMD-sensor's depth range is only a technical shortcoming.

Two State Smoothing

We want to weight the range data according to the confidence measure we derived for the PMD-signal. As described above, the (averaged) channel representation may be interpreted as an estimate of the pdf. Multiplying the individual channel vectors by the respective pixel confidence does not change the depth value but only the weighting of the vector with respect to the pdf-estimate, similar to the weighting done by a GM-estimator. A mathematical proof that such a weighting is sound from a statistical point of view w.r.t. the validity of the pdf is still outstanding, but experimental results show a good performance of the proposed method w.r.t. robustness and noise suppression.

Let us suppose the weight image is $w(x)$; then the weighted channel vectors ($c'_n$) are given by

\[ c'_n(x) = c_n(x)\, w(x) . \]

We need to average the data in each channel to arrive at a reasonable estimate for the pdf w.r.t. the range of the signal.
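The encoding (3.44) to (3.46) and the confidence weighting can be sketched as follows. This is a minimal illustration under assumptions: the number of channels, the range interval $[A, B]$, the sample range value and the confidence are hypothetical, as are the function names.

```python
def b2(f):
    """Quadratic B-spline, cf. (3.45)."""
    a = abs(f)
    if a < 0.5:
        return 0.75 - a * a
    if a < 1.5:
        return 0.5 * (a - 1.5) ** 2
    return 0.0

def scale_range(r, a, b, n_channels):
    """Map a range value r in [a, b] to f in [1.5, N - 0.5], cf. (3.46)."""
    return (n_channels - 2) / (b - a) * (r - a) + 1.5

def encode(f, n_channels):
    """Channel vector c_n = B2(f - n) for n = 1..N, cf. (3.44)."""
    return [b2(f - n) for n in range(1, n_channels + 1)]

N = 8                      # hypothetical number of channels
A, B = 0.0, 7.5            # hypothetical range interval in metres
f = scale_range(3.2, A, B, N)
c = encode(f, N)
confidence = 0.5           # hypothetical pixel confidence w(x)
c_weighted = [confidence * cn for cn in c]

print("f =", f)
print("channel vector:", [round(cn, 4) for cn in c])
print("sum of channels:", sum(c))
print("first moment:", sum(n * cn for n, cn in enumerate(c, start=1)))
```

At most three neighboring channels are nonzero, the entries sum to one, and the first moment of the channel vector reproduces $f$. The weighting scales the whole vector without changing the encoded value.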
With a model that assumes local constancy (or smoothness) this can be achieved by Gaussian (or binomial) convolution, as it respects the locality of the model by weighting distant pixels less. However, pure binomial filtering tends to bias the estimate toward nearer values if the channel vectors are weighted, similar to the case of normalized convolution in section 3.1.5. In the resulting image near surfaces tend to grow, but in contrast to normalized convolution the edges are not blurred but sharp. The reason for this is that binomial smoothing of the channels creates a nonzero probability estimate for zero-valued channel pixels (and the respective value range) if the neighboring channel pixels are nonzero. Furthermore, a nonzero channel pixel will be diminished if it is in the neighborhood of zero-valued pixels, i.e. at an edge. Because the weighting has a bias to weight the near surfaces more than those far away, the probability estimate for the near value, which has been zero before smoothing, tends to become bigger than that of a channel pixel farther away.

Thus, we propose a new smoothing algorithm for a channel representation of range data that is going to be weighted. It differentiates between zero and nonzero channel pixels and was therefore named two state channel smoothing. For zero-valued pixels we use the unweighted (w.r.t. the confidence measure) neighborhood to find a pdf-estimate, while for the nonzero pixels we use the weighted neighborhood. The nonzero pixel estimates need to be normalized w.r.t. the weighting to be comparable with the zero pixel estimates.
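The scheme just described can be sketched for a single channel along a 1D row of pixels. This is a minimal illustration, not the implementation used in this work: the binomial mask $[1, 2, 1]/4$, the replicated borders, the fallback for a vanishing weight sum and the sample row across a depth edge are all assumptions of this example.

```python
def binomial(values):
    """Binomial smoothing with the mask [1, 2, 1] / 4 and replicated borders."""
    last = len(values) - 1
    return [0.25 * values[max(i - 1, 0)] + 0.5 * values[i]
            + 0.25 * values[min(i + 1, last)] for i in range(len(values))]

def two_state_smooth(channel, weights):
    """Smooth one channel: weighted and normalized at nonzero pixels, plain at zero pixels."""
    weighted = binomial([w * c for w, c in zip(weights, channel)])
    norm = binomial(weights)
    plain = binomial(channel)
    result = []
    for c, num, den, unweighted in zip(channel, weighted, norm, plain):
        if c != 0.0 and den > 0.0:
            result.append(num / den)      # nonzero pixel: normalized averaging
        else:
            # zero pixel (or no confidence in the neighborhood, an assumption
            # of this sketch): plain binomial smoothing
            result.append(unweighted)
    return result

# Hypothetical row across a depth edge: pixels 0-4 see a near surface,
# pixels 5-9 a far surface; the confidence favors the near surface.
near_channel = [0.75] * 5 + [0.0] * 5
far_channel = [0.0] * 5 + [0.75] * 5
confidence = [1.0] * 5 + [0.2] * 5

naive_near = binomial([w * c for w, c in zip(confidence, near_channel)])
naive_far = binomial([w * c for w, c in zip(confidence, far_channel)])
two_state_near = two_state_smooth(near_channel, confidence)
two_state_far = two_state_smooth(far_channel, confidence)

edge = 5  # first pixel of the far surface
print("weighted binomial:", naive_near[edge], naive_far[edge])
print("two state:        ", two_state_near[edge], two_state_far[edge])
```

With plain binomial smoothing of the weighted channels the near channel dominates at the first far pixel, i.e. the near surface grows by one pixel; with the two state scheme the far channel keeps the larger value there.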
Finally, we calculate the estimate $c'(x)$ of the pdf for pixel $x$ as follows:

\begin{align}
C_{nz,n} &= \frac{B * (W \cdot C_n)}{B * W} \tag{3.47} \\
C_{z,n} &= B * C_n \tag{3.48} \\
c'_n(x) &= \begin{cases} c_{nz,n}(x) & \text{if } c_n(x) \neq 0 \\ c_{z,n}(x) & \text{if } c_n(x) = 0 \end{cases} , \tag{3.49}
\end{align}

where lowercase variables denote the functional representation of the corresponding matrices $C$, with indices $z$ for zero-pixel, $nz$ for nonzero-pixel and $n$ indicating the channel number. Equation (3.47) is normalized averaging with a binomial applicability $B$, while (3.48) is just plain binomial smoothing.

Decoding

The decoding of the encoded signal $c_n(x) = B_2(f(x) - n)$ can be achieved by the linear interpolation

\[ f(x) = \sum_{n=1}^{N} n\, c_n(x) . \tag{3.50} \]

This is a result of describing a function $P(f)$ by a B-spline approximation

\[ P(f) = \sum_n \alpha_n B_2(f - n), \]

and requiring that $P(f) = f$, i.e. the identity function. For this case one obtains the approximation coefficients $\alpha_n = n$ [FSF02].

If we interpret $f(x)$ as a random variable and $c'$ as a kernel density estimate of its pdf, we arrive at an estimate of the first moment of $f$ by replacing $c$ with $c'$ in (3.50). Thus, (3.50) gives us an estimate of the expectation of $f$, because the first moment about zero of a probability distribution is the expectation value of the corresponding random variable. We may reformulate (3.50) as the first central moment, which is zero for a pdf:

\[ \sum_n (n - \hat{f})\, c'_n = 0 , \tag{3.51} \]

where $\hat{f}$ is an estimate of the unperturbed signal. This formulation corresponds to the constraint equation (3.35), but the sum $\sum_i$ over the measurements is hidden in the channel vector. To understand this, we take a look at the continuous formulation of the optimization problem we are dealing with, in the limit of an infinite number of measurements:

\begin{align}
\hat{f} &= \operatorname*{argmin}_{f_0} E(f_0) , \tag{3.52} \\
\text{where } E(f_0) &:= \int \rho(f - f_0)\, \mathrm{pdf}(f)\, \mathrm{d}f = (\rho * \mathrm{pdf})(f_0) . \tag{3.53}
\end{align}

The condition of a vanishing derivative at the minimum gives us the continuous formulation of (3.35):

\[ 0 = \partial_{f_0} E(f_0) \big|_{f_0 = \hat{f}} = -\int \rho'(f - \hat{f})\, \mathrm{pdf}(f)\, \mathrm{d}f = (\psi * \mathrm{pdf})(\hat{f}) , \quad \text{where } \rho'(r) = \partial_r \rho(r) = \psi(r) . \tag{3.54} \]

Looking back at equation (3.51) and comparing it with equation (3.54), we may identify the first central moment as a discrete convolution at $\hat{f}$ of the sampled identity function ($n$) with the sampled pdf ($c'_n$), and we conclude that the influence function $\psi$ of channel smoothing with linear decoding (3.50) is the identity function $\psi(r) = r$. As we know, this corresponds to a least squares estimate and is therefore not robust.

We need to make the decoding robust, as the pdf estimated by $c'$ may be multimodal or contain outliers. This can be achieved by making $\psi$ a hard redescender, doing a windowed reconstruction about the mode of $c'$. The window size is chosen to be three, because we need to keep the window as small as possible to minimize the computational effort, and three channels are the minimum needed to reconstruct a measurement $f$ encoded via (3.44) without errors.

Instead of changing the width of the influence function (corresponding to the decoding window size), the degree of robustness of channel smoothing can be controlled by adjusting the number of encoding channels $N$. This is because the robustness is determined by the width of the influence function relative to the number of channels $N$. The appropriate number of encoding channels $N$ depends on the (Gaussian) noise level $\sigma_f$ of the signal $f(x)$. In order to reject no more than 5 percent of the inlier samples, the distance between two channels must be greater than $4\sigma_f$ [FFS06].
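The difference between linear decoding (3.50) and a windowed reconstruction over three channels can be illustrated with an averaged channel vector that contains one outlier. The sketch below is a minimal example under assumptions: the sample values and $N = 10$ are hypothetical, and the window is simply centered on the three neighboring channels with the largest sum, which is one of several possible choices.

```python
def b2(f):
    """Quadratic B-spline, cf. (3.45)."""
    a = abs(f)
    if a < 0.5:
        return 0.75 - a * a
    if a < 1.5:
        return 0.5 * (a - 1.5) ** 2
    return 0.0

def encode(f, n_channels):
    """Channel vector c_n = B2(f - n) for n = 1..N, cf. (3.44)."""
    return [b2(f - n) for n in range(1, n_channels + 1)]

def decode_linear(c):
    """Linear decoding (3.50): first moment of the channel vector."""
    return sum(n * cn for n, cn in enumerate(c, start=1))

def decode_windowed(c):
    """Three-channel reconstruction about the window with the largest channel sum."""
    best_n0, best_sum = None, -1.0
    for i in range(1, len(c) - 1):           # i is the 0-based index of channel n0 = i + 1
        window_sum = c[i - 1] + c[i] + c[i + 1]
        if window_sum > best_sum:
            best_n0, best_sum = i + 1, window_sum
    i = best_n0 - 1
    return best_n0 + (c[i + 1] - c[i - 1]) / best_sum

# Hypothetical samples in channel units: four inliers near 3.2 and one outlier at 8.0.
samples = [3.2, 3.3, 3.1, 3.25, 8.0]
N = 10
encoded = [encode(f, N) for f in samples]
averaged = [sum(column) / len(samples) for column in zip(*encoded)]

print("linear decoding:  ", decode_linear(averaged))
print("windowed decoding:", decode_windowed(averaged))
```

Linear decoding returns the mean of all five samples, whereas the windowed reconstruction returns the mean of the four inliers, because the channels activated by the outlier lie outside the decoding window.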
The robust reconstruction reads

\[ \hat{f}_{n_0}(x) = \frac{1}{E(n_0)} \sum_{n = n_0 - 1}^{n_0 + 1} n\, c'_n(x) = n_0 + \frac{c'_{n_0+1}(x) - c'_{n_0-1}(x)}{E(n_0)} , \tag{3.55} \]

where $E(n_0(x)) = c'_{n_0-1}(x) + c'_{n_0}(x) + c'_{n_0+1}(x)$ is the probability for the estimate $\hat{f}$ to be within the value range of the corresponding decoding window about $n_0$. The channel window center $n_0$ should be near the mode of the pdf, i.e. the location of its global maximum, such that the decoded signal value becomes a maximum likelihood estimate of the unperturbed signal. There are several possibilities to choose $n_0$ with the given kernel density estimate $c'$. We decided on the computationally efficient, but not necessarily best, method of choosing $n_0 = \operatorname{argmax}_{n_0} E(n_0)$, such that the determined channel window has the largest sum of channel values. For a multimodal pdf the modes may be located close to each other, such that $n_0$ might be chosen to lie in between the modes, which leads to a wrong estimate.

Based on the windowed reconstruction of the signal value, equation (3.55), taking into account the definition of the channel vector entries $c_n$ (3.44) and assuming an infinite number of samples, the effective influence function of channel smoothing can be calculated analytically [FFS06]:

\[ \psi(\Delta f) = B_2(\Delta f - 1) - B_2(\Delta f + 1), \quad \text{where } \Delta f := f - n_0 . \tag{3.56} \]

Figure 3.4: Influence function $\psi(\Delta f)$ and loss function $\rho(\Delta f)$ of robust channel smoothing.

The function $\psi(\Delta f)$ depicted in figure 3.4 is, however, not precisely the influence function, because for that $\Delta f$ would have to be the residual of the measurement, which is defined as $f - \hat{f}$. Thus, the true influence function is shifted by the rounding difference $n_0 - \hat{f}$. Therefore, the influence function is no longer ideal because its (odd) symmetry about zero is broken, which introduces a (minor) quantization error for all estimates $\hat{f}$ that are not integer values.
Felsberg, Forssén, and Scharr [FFS06] propose a method called virtual shift decoding that resolves this problem of channel smoothing. However, this method is somewhat involved and expensive w.r.t. computation time. We compared the results of both methods on PMD-data and found no significant improvements, given a high number $N$ of encoding channels, as is appropriate for PMD-data. For signals that have a high noise variance (low SNR), implying a low number of encoding channels, virtual shift decoding is of more interest, because its computational effort scales with the number of channels and the quantization errors become more prominent the smaller the number of channels is.

Chapter 4

Motion Estimation

Motion estimation has become an important discipline in computer vision. There is hardly any complex computer vision task that does not have to deal with motion. In various industrial and scientific applications the movement within a scene needs to be accounted for. There is a multitude of tasks that are obviously related to motion estimation, like time-to-collision estimation or pedestrian detection in the automotive industry. Another example is particle tracking for the visualization of flow fields of liquids or gases or, more generally, the analysis of dynamical processes in scientific applications. The calculation of displacement fields for motion-based compression of video sequences involves motion estimation too. For many other tasks, however, the link to motion estimation is not so obvious, although it is still inherent, for instance in image registration or disparity estimation in stereo vision. Even for still image processing the visual systems of mammals employ the motion analysis pathways of the brain. For example, while humans are looking at a picture, their eyes perform so-called (micro-)saccades for the analysis of the scene.
These microsaccades introduce artificial motion on the retina, which is then processed by parts of the visual cortex sensitive to the direction of motion and spatial structures (see [Big06]).

Typically the motion estimate is not the actual target of real-world applications. Most of the time the specific motion estimation algorithm is only one link in a processing chain or in several chains. Therefore, the input and output, as well as the computational efficiency and qualitative performance, are subject to various constraints and limitations. This might be one reason why there is such a vast number of different approaches, algorithms and specific implementations for motion estimation. Another reason is that motion estimation from image sequences, which is the topic of this chapter, is in general an ill-posed inverse problem. We will see in the following sections that various assumptions have to be made in order to make the problem a well-posed one. This is also the reason why we are talking of an estimate. It is only a guess that is true (or rather approximately correct) only if the various necessary assumptions are met. Basically, the different assumptions that are made lead to the various algorithms proposed in the computer vision literature. Often it turns out that one concept is equivalent to another or just a special case of it, formulated in a different manner; this is no wonder, since computer vision is an interdisciplinary research field incorporating the jargon and concepts of various scientific disciplines.

4.1 Optical Flow and Range Flow

4.1.1 Optical Flow and Motion Field

Before we start discussing our approach to motion estimation, we first need to clarify which kind of motion estimate we are interested in and what a motion estimate is. We are interested in the motion of objects, or rather their surfaces, in three dimensional space. This physical motion is partially captured by an optical device, e.g. a camera, by taking several images of it over time.
Taking an image typically means projecting the 3D scene onto a 2D plane. Thus the physical 3D vector field of velocity vectors associated with the motion is projected onto the image plane and becomes a 2D vector field known as the motion field. The basic motion estimation algorithms try to estimate this motion field from the sequence of images. Horn [Hor87] showed that the reconstruction of the physical 3D motion field from the 2D motion field is possible in most cases if the optical characteristics and the external parameters of the setup (especially the parameters of the projection) are known.

However, what the camera (or the human eye) sees is not necessarily as closely related to the motion field as one might think. The apparent motion at the image plane that is based on the visual perception is known as the optical flow or image flow. An extreme example of the potential disagreement between optical flow and motion field is given by Horn [Hor86] and depicted in figure 4.1. The figure shows an ideal sphere with a uniform surface. It may rotate around any axis through its center of gravity without any apparent motion. Therefore the optical flow field of the rotating Horn sphere is zero everywhere. In contrast, a moving light source that illuminates the sphere will change the brightness distribution on the sphere over time, inducing an apparent motion. Therefore we find a nonzero optical flow field while the sphere might actually be at rest.

Figure 4.1: Disagreement of physical motion field and optical flow field: a a spinning ideal sphere with fixed illumination shows no apparent motion; b a moving illumination causes an apparent motion of the brightness distribution although the sphere does not move (images from [JH00])
4.1.1.1 Barber's pole illusion and complex motion

Because human vision is very good at motion estimation, one might presume that the problems described so far are somewhat academic and only problematic because of technical weaknesses or a lack of intelligence in the employed algorithms. We want to stress that, while additional intelligence might help to arrive at a better motion estimate, or rather at a guess that has a higher probability of being correct, it does not solve the general problems. Therefore, we want to present another, less academic example. It demonstrates the weaknesses of human vision and illuminates some further problems of associating the optical flow with the motion field. Moreover, it demonstrates that the optical flow is potentially ambiguous because of the so-called aperture problem.

The barber's pole shown in figure 4.2 is a good example of a situation that appears to be simple to analyze but features various aspects of complex motion that are demanding w.r.t. motion estimation and can only be handled if specific, rather restrictive, assumptions are made. The barber's pole is a cylinder with diagonally running colored stripes (of a single orientation) rotating around its axis of symmetry. A human observer has the illusion of an upward motion despite knowing that the cylinder is spinning to the left. This phenomenon is a manifestation of the aperture problem. To put it simply (we will discuss the details later), it describes the following: If we have access only to a limited field of view (the aperture) of a moving object and this object has a texture that exhibits only a single orientation, then we cannot say anything about motion along this orientation.
Thus the motion estimate becomes ambiguous, as only the vector component of the motion normal to the orientation can be determined. Hildreth [Hil82] found an elegant rule that the human vision system applies to arrive at a unique estimate: The constructed motion field is the one that is compliant with the apparent brightness changes and of maximum uniformity within the aperture. Focusing on one of the small rectangular apertures on the right side of figure 4.2, we construct a flow field pointing from right to left, because it is more uniform: there is a discontinuity in the direction of flow only at the shorter vertical edges, while along the longer horizontal edges there is no discontinuity in the direction of the flow field. If we focus on a larger aperture centered between the pole and the small clippings, our visual system constructs a mixture of both motions which runs normal to the orientation of the stripes. This type of flow field corresponds to a motion estimate known as normal flow, which we will discuss in section 4.1.3.

It is important to understand that this ambiguity is not a problem of human vision only and not specific to the rather rare cases of optical illusions. With only the sequence given on the current page one simply cannot be sure which motion is the true one. Physical assumptions about possible motions do not help either. While a rotation is a good explanation for the horizontal motion, the same is true for the vertical one, if one assumes a striped, colored ribbon moving like a belt drive.

Figure 4.2: The barber’s pole illusion demonstrates the general ambiguity of optical flow

Footnote a: The electronic version (PDF with JavaScript enabled) of this thesis is needed to see the described optical flow estimation phenomena; zoom in to focus more easily on the different apertures; click on the pole to temporarily stop the animation.
While the single orientation of the barber’s pole is rather unnatural, weakly textured regions and step edges occur quite often in images, and the scale and size of the analyzed region determine whether an aperture problem exists, given the unavoidable uncertainty (i.e. noise) in the measurements.

The barber’s pole features some other typical problems of motion estimation. There are specular reflections on the cylinder. In the upper part of the pole they are of such a magnitude that the color of the stripes is occluded. Within a small neighborhood along the occlusion boundary there exist two motions: that of the stripes and that of the fixed specular highlight (a zero motion). We have a similar situation in the lower part of the pole. There the specular reflections are of smaller magnitude and are transparently superimposed on the moving stripes. This time the two motions are not located along the boundary but spread over the area of specular reflection. Remembering the discussion about robust estimation, we realize that this could be handled by a two-motion model or a robust estimator. However, both approaches will fail for three motions: Imagine the pole protected by a glass tube. The reflections on it might transparently superimpose an additional layer of motion from the surrounding scenery.

Another problem is that we might not be interested in the rotation of the pole but only in translations of the pole’s position. For example, a computer vision system installed on an automobile might not know anything about barber poles. How can it decide whether the apparent motion is of relevance or not, if all the information accessible is a sequence of gray or color valued images?

Now that we have illustrated some of the problems in recovering the physical motion field from optical flow, we will show how the PMD-signal can be used to arrive at a motion estimate that is more robust with respect to the motion field we are interested in. We will use an optical flow based approach to motion estimation.
The advantage of optical flow based methods over correspondence based methods is that they are inherently continuous. Continuous problems can be tackled in a profound way with the mathematical apparatus of analysis and in particular calculus.

4.1.2 Brightness Change Constraint Equation

How can optical flow be described mathematically? Let the point $p$ belong to a small image patch in the 2D image plane $g$. If the patch moves at constant velocity $f = [u\ v]^T$ along a line and does not rotate, i.e. performs a translation, then the motion of $p$ is described by

\[ p(t) := \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = p(0) + f\,t . \tag{4.1} \]

If the motion is different, i.e. in some way accelerated (as for a rotational or curved motion), $p(t)$ shall be a first order approximation of the true motion, valid for small $t$. Let us assume that the image patch just changes its position over time but not its appearance, i.e. its gray values (if $g$ is a gray value image). The constancy of the gray value texture along the motion of the image patch may be stated as

\[ g(p(t), t) = \mathrm{const} . \tag{4.2} \]

If we take the (total) derivative in time of equation (4.2), applying the chain rule and considering (4.1), we find:

\[ \frac{d}{dt}\, g(p(t), t) = \frac{\partial g}{\partial x}\frac{dx}{dt} + \frac{\partial g}{\partial y}\frac{dy}{dt} + \frac{\partial g}{\partial t}\frac{dt}{dt} = g_x u + g_y v + g_t = 0 \]
\[ \longrightarrow \quad (\nabla g)^T \cdot f + g_t = 0 \tag{4.3} \]
\[ \text{or} \quad (\nabla_{st}\, g)^T \cdot \tilde{f} = 0 , \quad \text{where } \tilde{f} = [u\ v\ 1]^T . \tag{4.4} \]

Equations (4.3) and (4.4) are equivalent formulations of the well known brightness change constraint equation (BCCE). $f$ is the velocity of a point $p(t_0)$ in the image patch, $\nabla g$ and $\nabla_{st}\, g$ are the spatial and spatiotemporal image gradients at $p(t_0)$ at time $t_0$, and $g_t$ is the partial derivative in time at the same point. The BCCE is valid only inside the patch: At the borders there is a velocity discontinuity as well as a potential discontinuity in the image signals; the spatial and temporal derivatives of $g$ are not defined on both sides of the border.
This is of relevance because the continuous formulation of the BCCE is going to be discretized, and the patch border in general does not coincide with a pixel border and thus has a "spatial extension".

The BCCE belongs to the inverse problem of finding the model parameters $f$ for given data $\nabla_{st}\, g$. It relates the spatiotemporal image structure to the sought optical flow velocity vector $f$, which consists of two unknown scalar values. One equation is not enough to solve the problem uniquely, but it constrains the solution to a line in the $(u, v)$ flow space. However, if the image is constant in the neighborhood of $p$, $\nabla_{st}\, g$ is a null vector, such that the BCCE is fulfilled for arbitrary $f$ and therefore gives no constraint at all. To arrive at a unique solution we may take additional points $p_n$ of the moving patch into account, each one related to a BCCE (4.4), yielding a system of linear equations

\[ G_\nabla \cdot \tilde{f} = 0 , \tag{4.5} \]

where the matrix $G_\nabla$ contains in its row vectors the respective spatiotemporal derivatives of $g$ at the points $p_n$. It depends on the signal $g$ whether we succeed in finding a unique solution: if $\nabla g$ is linearly dependent on $g_t$, i.e. there is only a single orientation in the data, all constraining equations are equivalent and the system is underdetermined. Only if there are exactly two linearly independent equations in the system will the solution be unique. Due to noise this will never be the case for real world data, and typically the system is overconstrained; we may compensate for the given uncertainty in the data by writing $G_\nabla \cdot \tilde{f} \approx 0$. A solution can then be found only from a probabilistic point of view, trying to minimize the error in the estimate $\hat{f}$ of $f$ by, e.g., a (total) least squares approach (we will discuss this in section 4.1.6).

We realize that in general, motion estimation from image sequences is an ill-posed inverse problem, i.e.
a problem that does not fulfill the postulates of Hadamard [Had02] about well-posedness: it might not have a solution in the strict sense (i.e. only a probabilistic estimate can be given), the solutions might not be unique and/or might not depend continuously on the data, in some reasonable topology. This is true both for the estimation of optical flow and even more so for the estimation of the motion field. Only if one can assure that the various implied assumptions are true (like the rigidity of observed objects, which move in conformance with a specific motion model), either by a specific experimental setup or by a sophisticated analysis of the image content, may the problem become partially well-posed. In general any quantitative motion estimate is associated with an uncertainty, and therefore motion estimation algorithms should supply a confidence measure describing the accuracy of the estimate.

In the following we will not address the additional problems involved with noise explicitly, but always keep in mind that there is noise and that all real world flow estimates are subject to noise. For example, if we speak of two equal BCCEs (4.4) for two different points, equality is to be understood as relative to the SNR of the data.

4.1.3 Aperture Problem

We have noted that the BCCE (4.4) is underdetermined. The system (4.5) is underdetermined too, if the single equations correspond to samples of an image patch $g$ of a single orientation. To clarify this we look at the time evolution of such an image patch, which corresponds to a rank one signal: Given the vector $n = [n_1\ n_2]^T$ normal ($\|n\|_2 = 1$) to the singly oriented texture in the image patch $g$, and the differentiable function $s : \mathbb{R} \to \mathbb{R}$, we define

\[ g^1(x, y, t) := s([x\ y\ t] \cdot \tilde{n}) = s(l) , \]

where $\tilde{n} \in \mathbb{R}^3$ is $n$ extended for the temporal dimension.
We find $\tilde{n}_3$ by substituting $g^1$ for $g$ in the BCCE (4.3),

\[ (\nabla g^1)^T f + g^1_t = n_1 s' u + n_2 s' v + \tilde{n}_3 s' = s' (n^T f + \tilde{n}_3) = 0 , \tag{4.6} \]

and solve for $\tilde{n}_3 = -n^T f = -f_n$. This requires that the derivative $s' = ds/dl$ is not zero, such that we may cancel it, i.e. $s(l)$ must neither be constant nor at an extremum. All equations for all points at all times within the image patch are of the form (4.6) and therefore equivalent. This is the aperture problem of optical flow. We can only determine the flow component $f_n$ normal to the orientation of the image, i.e. in the direction of $n$. We may express the raw normal flow vector $\mathbf{f}_n = f_n\, n$ in terms of the spatiotemporal image derivatives by the following reasoning (choosing the sign of $n$ such that $s' > 0$):

\[ \tilde{n}_3 s' = -f_n s' = g^1_t \;\Rightarrow\; f_n = -\frac{g^1_t}{s'} ; \qquad \|\nabla g^1\|_2 = \|s' n\|_2 = s' \|n\|_2 = s' \]
\[ \Longrightarrow \quad f_n = -\frac{g^1_t}{\|\nabla g^1\|_2} , \quad n = \frac{\nabla g^1}{\|\nabla g^1\|_2} \quad \text{and} \quad \mathbf{f}_n = -\frac{g^1_t\, \nabla g^1}{\|\nabla g^1\|_2^2} . \tag{4.7} \]

If the patch is not of single orientation we might find the full flow $f$ by taking other points into account. If we do so, we have to be sure that these points are inside the image patch and not on or beyond the motion boundary, as otherwise the BCCE is no longer valid. Thus we cannot just blindly extend the region around our point of interest to resolve the aperture problem. The problem that we need to take additional points into account, but are restricted to a region to choose these points from, which is in general unknown and depends on the flow itself, is referred to as the generalized aperture problem. And while we would like to take as many data points into account as possible, to achieve a good estimate w.r.t. noise in the data, we are restricted in doing so for the same reasons. There are several ways to deal with the generalized aperture problem, and finding the best flow estimate under random, or more precisely, generic conditions is a topic of ongoing research [Pap+06; Tel+06; Gov06; Xia+06].
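As a numerical illustration of equation (4.7), the following minimal sketch builds a synthetic rank one signal and recovers its raw normal flow from finite-difference derivatives. The function names, the test signal $s(l) = \sin(l)$, the normal `N`, the velocity `F_TRUE` and the finite-difference step are assumptions made for this example only; they are not taken from this work.

```python
import math

def normal_flow(gx, gy, gt, eps=1e-12):
    """Raw normal flow f_n = -g_t * grad(g) / |grad(g)|^2, cf. equation (4.7)."""
    mag2 = gx * gx + gy * gy
    if mag2 < eps:
        # Constant neighborhood: the BCCE gives no constraint at all.
        return None
    scale = -gt / mag2
    return (scale * gx, scale * gy)

# Hypothetical rank one signal g(x, y, t) = s(n1*x + n2*y + n3*t) with s = sin.
N = (0.6, 0.8)        # unit normal to the single orientation
F_TRUE = (2.0, 1.0)   # true patch velocity (u, v)

def g(x, y, t):
    n3 = -(N[0] * F_TRUE[0] + N[1] * F_TRUE[1])   # n3 = -n^T f
    return math.sin(N[0] * x + N[1] * y + n3 * t)

def derivatives(x, y, t, h=1e-5):
    """Central differences for g_x, g_y and g_t."""
    gx = (g(x + h, y, t) - g(x - h, y, t)) / (2 * h)
    gy = (g(x, y + h, t) - g(x, y - h, t)) / (2 * h)
    gt = (g(x, y, t + h) - g(x, y, t - h)) / (2 * h)
    return gx, gy, gt

fn = normal_flow(*derivatives(0.1, 0.2, 0.0))

# The component of F_TRUE along the stripes is what the aperture problem hides.
lost = (F_TRUE[0] - fn[0], F_TRUE[1] - fn[1])
print("normal flow:", fn)
print("unobservable part:", lost)
```

The recovered vector is close to $(n^T f)\,n = (1.2, 1.6)$, while the remaining part of `F_TRUE` is perpendicular to $n$ and cannot be observed from this patch.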
Equation (4.4) is the simplest formulation of the BCCE, which may be extended to more complex motion models like affine flow (see e.g. [BA96]) and to multiple motions (see [MSB01; Bar+03; Stu+03]). So far we have totally neglected the specific properties of the PMD-signal. For the amplitude signal of the PMD, the BCCE is a rather bad approximation. A PMD-camera uses an active illumination, and therefore motion in depth involves a major change in the optical power irradiated on each sensor pixel, which violates the BCCE. We will discuss later how this can be handled, but first show how to use the range signal of the PMD, because this also helps to understand how to deal with the amplitude signal.

4.1.4 Range Flow Constraint Equation

A time varying surface may be viewed as a depth function $Z(X, Y, t)$, with the Cartesian coordinates $(X, Y, Z)$. The coordinate of a point of interest on this surface may be described by $P(t) = [X(t),\ Y(t),\ Z(X(t), Y(t), t)]^T$. The function $Z(t) := Z(X(t), Y(t), t)$ with one argument is the Z-coordinate of the moving point on the surface, while the function $Z$ with three arguments describes the time evolution of the surface. If we take the derivative of $P(t)$ in time and assume a pure translation with constant velocity,

\[ \begin{bmatrix} X(t) \\ Y(t) \\ Z(t) \end{bmatrix} = \begin{bmatrix} X_0 + U \cdot t \\ Y_0 + V \cdot t \\ Z_0 + W \cdot t \end{bmatrix} = P(0) + f\,t , \quad \text{where } f \text{ is the velocity vector of } P , \]

we obtain by applying the chain rule to $Z(t)$:

\[ \frac{d}{dt} P = f = \begin{bmatrix} U \\ V \\ W \end{bmatrix} = \begin{bmatrix} U \\ V \\ U Z_X + V Z_Y + Z_t \end{bmatrix} , \]

where $Z_X$ and $Z_Y$ are the partial derivatives of $Z(X, Y, t)$. The equation embedded herein,

\[ W = U Z_X + V Z_Y + Z_t , \tag{4.8} \]

is called the range flow constraint equation (RFCE) [Yam+93]; it is an analogue of the BCCE that deals with range instead of brightness values. It constrains the sought solution $f$ for a given range data set $Z(X, Y, t)$ at $(X_{t_0}, Y_{t_0}, t_0)$.
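The RFCE (4.8) can be checked numerically on a synthetic example. The sketch below is only an illustration under assumptions of its own: the surface `z0`, the velocity `F_TRUE`, the evaluation point and the finite-difference step are hypothetical and not part of this work.

```python
import math

F_TRUE = (0.3, -0.2, 0.5)   # hypothetical velocity (U, V, W)

def z0(X, Y):
    """Hypothetical smooth surface at t = 0."""
    return math.sin(X) + 0.5 * Y * Y

def Z(X, Y, t):
    """Depth function of the surface z0 translated rigidly by F_TRUE * t."""
    U, V, W = F_TRUE
    return z0(X - U * t, Y - V * t) + W * t

def partials(X, Y, t, h=1e-5):
    """Central differences for Z_X, Z_Y and Z_t."""
    ZX = (Z(X + h, Y, t) - Z(X - h, Y, t)) / (2 * h)
    ZY = (Z(X, Y + h, t) - Z(X, Y - h, t)) / (2 * h)
    Zt = (Z(X, Y, t + h) - Z(X, Y, t - h)) / (2 * h)
    return ZX, ZY, Zt

def rfce_residual(flow, ZX, ZY, Zt):
    """Residual of equation (4.8): U*Z_X + V*Z_Y + Z_t - W."""
    U, V, W = flow
    return U * ZX + V * ZY + Zt - W

ZX, ZY, Zt = partials(0.4, 1.1, 0.0)
res_true = rfce_residual(F_TRUE, ZX, ZY, Zt)

# A different velocity that fulfills the same single constraint:
# adding (1, 0, Z_X) moves the solution within the constraint plane.
other = (F_TRUE[0] + 1.0, F_TRUE[1], F_TRUE[2] + ZX)
res_other = rfce_residual(other, ZX, ZY, Zt)

print("residual for the true velocity:", res_true)
print("residual for another velocity: ", res_other)
```

Both residuals vanish up to the finite-difference error, which illustrates that a single RFCE poses one constraint on three unknowns.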
However, the constraint may only be applied if the surface is smooth with respect to the spatial resolution of the data set, as otherwise the partial derivatives of $Z$ cannot be calculated properly (on depth edges they are not defined at all). Furthermore, with respect to the temporal resolution of the data set, the motion of the surface patch must be well approximated by a translation.

In order to evaluate the RFCE, the partial derivatives of the depth function $Z(X, Y, t)$ with respect to the world coordinates $X$ and $Y$ have to be computed. Range data, as delivered by the PMD-camera, is given in sensor coordinates $r(x, y, t)$, with $r$ being the radial distance $|\mathrm{pos}_{\mathrm{camera}} - \mathrm{pos}_{\mathrm{surface}}|$ and $x$, $y$ being the sensor-pixel coordinates (see footnote ∗). After applying the transformation from the sensor to the world coordinate system the range data is unevenly sampled. Thus computing the derivatives is no longer straightforward. One can apply TLS (total least squares) or OLS (ordinary least squares) estimation from a local first-order approximation of the surface or resample the data on a Cartesian grid [SG02]. However, both methods have the disadvantage of being rather slow, and in the case of resampling the necessary interpolation may introduce additional errors. In the following we will employ fast derivative filters to compute the world coordinate derivatives; Spies and Barron [SG02] showed that these have competitive accuracy when applied to real-world range data sequences. Because derivative filters are applied via convolution, which implicitly assumes an evenly sampled grid, we have to find a way to compensate for the deviation due to uneven sampling.

Footnote ∗: This is a simplified description of the sensor coordinates; for a more precise description see [Jus01, pp. 61 ff].

Here the objects of interest are 2D surfaces in the 3D world, $Z = Z(X, Y, t)$. The data points are sampled at locations on the sensor array, which in turn depend on the 3D data points observed: $x = x(X, Y, Z)$; $y = y(X, Y, Z)$. The transformation from $r(x, y, t)$ to world coordinates yields one data set for each of $X$, $Y$ and $Z$ on a sampling grid $(x, y, t)$ (e.g. $X = X(x, y, t)$). For the total differentials of the three data sets we obtain

\[ dX = X_x\,dx + X_y\,dy + X_t\,dt , \qquad dY = Y_x\,dx + Y_y\,dy + Y_t\,dt , \qquad dZ = Z_x\,dx + Z_y\,dy + Z_t\,dt . \tag{4.9} \]

Eliminating $dx$ and $dy$ from equation (4.9) results in:

\[ dZ = \frac{1}{-\frac{\partial(Y, X)}{\partial(x, y)}} \left[ \frac{\partial(Z, Y)}{\partial(x, y)}\,dX + \frac{\partial(X, Z)}{\partial(x, y)}\,dY + \frac{\partial(X, Y, Z)}{\partial(x, y, t)}\,dt \right] . \]

The expressions of type $\frac{\partial(A_1, \cdots, A_n)}{\partial(a_1, \cdots, a_n)}$ denote the determinant of the Jacobian matrix of the functions $A_1, \cdots, A_n$ with respect to their arguments $a_1, \cdots, a_n$, which we may abbreviate as the Jacobian hereinafter:

\[ \frac{\partial(A_1, \cdots, A_n)}{\partial(a_1, \cdots, a_n)} := \begin{vmatrix} \frac{\partial A_1}{\partial a_1} & \cdots & \frac{\partial A_1}{\partial a_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial A_n}{\partial a_1} & \cdots & \frac{\partial A_n}{\partial a_n} \end{vmatrix} . \]

Differentiation in time ($\frac{dZ}{dt} = W$) and rearranging yields

\[ 0 = \frac{\partial(Z, Y)}{\partial(x, y)}\,U + \frac{\partial(X, Z)}{\partial(x, y)}\,V + \frac{\partial(Y, X)}{\partial(x, y)}\,W + \frac{\partial(X, Y, Z)}{\partial(x, y, t)} . \tag{4.10} \]

This is the RFCE using derivatives on the (evenly sampled) sensor coordinates $x$, $y$ (and time $t$), and thus it can be evaluated by convolving the range data with derivative kernels.

Using equation (4.10) implies having transformed the radial range data $r(x, y, t)$ to Cartesian world coordinates. Instead of applying the filters to the transformed data, it is possible to substitute $X$, $Y$ and $Z$ with the analytic expressions from the sensor model, so that the derivative filters are applied directly to $r(x, y, t)$. We use a pinhole camera model, thus

\[ X(x, y, t) = \frac{x\, r(x, y, t)}{\sqrt{e}} , \quad Y(x, y, t) = \frac{y\, r(x, y, t)}{\sqrt{e}} , \quad Z(x, y, t) = \frac{f\, r(x, y, t)}{\sqrt{e}} , \tag{4.11} \]

with $e := x^2 + y^2 + f^2$ and $f$ being the focal length. After substituting $X$, $Y$ and $Z$ in equation (4.10) we obtain a somewhat bulky expression, which we simplify by further substitutions and rearrangements to

\[ 0 = U\,(r\,x - r_x\,e) + V\,(r\,y - r_y\,e) + W\,d - r\,r_t\,\sqrt{e} , \quad \text{where } d = f\,r + \frac{e\,(r_x\,x + r_y\,y)}{f} . \tag{4.12} \]

This new variant of the RFCE reduces the number of necessary filter operations and simplifies error analysis regarding noise in $r(x, y, t)$ and systematic errors introduced by the derivative kernels.

4.1.5 Aperture Problem Revisited

The RFCE poses only one constraint on three unknowns. Written as $U Z_X + V Z_Y - W + Z_t = 0$, equation (4.8) describes a plane $C$ in $(U, V, W)$-space with surface normal $[Z_X\ Z_Y\ {-1}]^T$. The best solution, given this constraint, is the minimal vector $f_r$ between the origin of the $(U, V, W)$-space and $C$ (see figure 4.4a). The raw normal flow for range data is analogous to that of the BCCE in equation (4.7):

\[ f_r = \frac{-Z_t}{Z_X^2 + Z_Y^2 + 1}\, [Z_X\ Z_Y\ {-1}]^T = -\frac{Z_t\, \nabla Z}{\|\nabla Z\|_2^2} , \tag{4.13} \]

where $\nabla Z = [Z_X\ Z_Y\ {-1}]^T$ denotes the 3D spatial gradient of the range measurement at the surface point of interest in Cartesian coordinates, i.e. the normal of the constraint plane. In contrast to the BCCE, however, a constant neighborhood (or rank zero signal) is already sufficient to determine a normal flow, because the RFCE (4.8) contains the additional velocity term $W$.

Three characteristic types of neighborhoods for range data are illustrated in figure 4.3. Depending on the neighborhood only a specific flow type can be estimated:

plane flow: If the neighborhood is of planar structure, all constraints are linearly dependent and only the plane flow can be calculated (see figure 4.4a). This means that if the considered field of view (the aperture) shows a pure plane (with no additional structure), only the movement perpendicular to this plane can be detected.

line flow: Linear structures in position space such as intersecting planes correspond to two distinct classes of constraint planes in the examined aperture (see figure 4.4b). The point on the common line closest to the origin gives the appropriate line flow. This line flow lies in the plane perpendicular to the linear structure. Any movement along the direction of the structure, e.g. an edge, cannot be resolved by a local analysis; this corresponds to a rank one signal (which we introduced with the aperture problem for optical flow), where $s'(l)$ is not constant.

full flow: On corner- or peak-like structures all three components of the movement can clearly be determined locally. In such a neighborhood three linearly independent constraint equations can be found. These correspond to three mutually distinct, i.e. non-parallel, constraint planes in the velocity space (see figure 4.4c). The full 3D flow is readily computed from the intersection of the constraint planes, assuming that the flow is constant in the neighborhood.

Figure 4.3: Illustration of the three characteristic types of neighborhoods encountered in range data (plane flow, line flow, full flow) and the corresponding flow types that can be estimated in the respective neighborhood.

Figure 4.4: The number of independent constraint planes in velocity space $(U, V, W)$ associated with the RFCEs within an aperture determines the type of flow that can be estimated: (a) plane flow, (b) line flow or (c) full flow.

The existence of plane and line flow, being the only possible local estimates within specific neighborhoods, is the manifestation of the aperture problem for range flow.

4.1.6 Local and Global Flow Estimation

4.1.6.1 Local Total Least Squares Estimation

In the following we will describe how a local estimate is obtained by means of a total least squares (TLS) technique.
We use TLS because it can be computed efficiently and is the appropriate choice if not only the measurement vector $b$ but also the model (or "data") matrix $M$ of an overdetermined (linear) system of equations $M x = b$ is contaminated by noise. Ordinary least squares (see section 3.2.1) minimizes the residual $r = M x - b$,

\[ \operatorname*{argmin}_{x} \|M x - b\|_2 , \]

which is appropriate if only $b$ is subject to noise; in other words, OLS corresponds to perturbing the measurements $b$ by a minimum amount $r$ such that $b + r$ can be explained by $M$ for the model parameters $x$. Total least squares minimizes

\[ \operatorname*{argmin}_{p} \|D p\|_2 , \quad \text{subject to } p^T p = 1 \text{ and where } D = [M, b] , \]

which corresponds to perturbing both $M$ and $b$. This is the appropriate problem description for both optical and range flow because, as we will see, the entries of $D$ correspond to the single Jacobians in the RFCE (4.10); the Jacobians, while they depend on the explanatory variables $(x, y, t)$, are also subject to noise because they are based on the PMD-signal. For a detailed study of TLS we refer to [VHV91], or to [GL80] for a less extensive discussion.

Assuming constant flow in a neighborhood of the point of interest, we get $n$ constraint equations (4.10) if we take $n$ neighboring samples into account. These can be written as

\[ {}^{k}d^T \tilde{f} = 0 , \quad k = 1 \dots n , \]
\[ \text{where} \quad {}^{k}d = \left[ \frac{\partial(Z, Y)}{\partial(x, y)} \;\; \frac{\partial(X, Z)}{\partial(x, y)} \;\; \frac{\partial(Y, X)}{\partial(x, y)} \;\; \frac{\partial(X, Y, Z)}{\partial(x, y, t)} \right]^T_{(x_k, y_k, t_k)} \quad \text{and} \quad \tilde{f} = [f^T\ 1]^T = [U\ V\ W\ 1]^T . \tag{4.14} \]

Stacking up all equations in the (spatiotemporal) neighborhood gives, analogous to equation (4.5),

\[ D \tilde{f} = 0 , \quad \text{where the data matrix is } D = [{}^{1}d, \dots, {}^{n}d]^T . \tag{4.15} \]

For real world data the rank of $D$ is, due to noise, typically greater than three; at least if more than three samples ($n > 3$) were taken to build the system of equations. It follows that the system (4.15) is overdetermined and there exists no exact solution.
One way to deal with this problem is to recast it as an optimization problem and find a solution $\tilde{f}$ in a total least squares sense, i.e.:

\[ \|D \tilde{f}\|_2^2 = \tilde{f}^T D^T D \tilde{f} = \tilde{f}^T S \tilde{f} \longrightarrow \min . \tag{4.16} \]

Be aware that $\tilde{f}_4 = 1$ imposes a constraint. As the more generic (but equivalent) constraint $p^T p = 1$ is easier to handle, we replace $\tilde{f}$ by the generic parameter vector $p$. Restating the above minimization problem in a continuous form gives

\[ \hat{p} = \operatorname*{argmin}_{p} \int_{-\infty}^{\infty} w(\mathbf{x} - \mathbf{x}', t - t')\,(d^T p)^2 \, d\mathbf{x}'\, dt' \quad \text{subject to } p^T p = 1 . \tag{4.17} \]

The weighting function $w(\mathbf{x} - \mathbf{x}', t - t')$ defines the data points $d = d(\mathbf{x}', t')$ that are taken into account for the estimate and allows them to be weighted according to their position relative to the point of interest $(\mathbf{x}, t)$. A common choice for $w$ is a three dimensional Gaussian function, or rather a Gaussian pdf if we require the weighting to correspond to a probability distribution, such that $w$ is normalized, i.e. $\int_{-\infty}^{\infty} w \, d\mathbf{x}'\, dt' = 1$. A Gaussian weighting takes the generalized aperture problem into account, in that it implies the reasonable assumption that far from the point of interest the probability of finding the same flow is lower than near to it. From a signal processing point of view it additionally has the advantage, in contrast to a box filter, that it will not introduce any aliasing because of its limited bandwidth. Nevertheless, it is important that the sampling theorem was not violated by prior signal processing steps like pixel-wise multiplication, as explained in the previous chapter in the context of band enlarging operators.

The requirement $p^T p = 1$ can be incorporated by means of a Lagrangian multiplier $\lambda$ in the objective or energy function $E$ that we need to minimize:

\[ E = \int_{-\infty}^{\infty} w(\mathbf{x} - \mathbf{x}', t - t') \left[ (d^T p)^2 + \lambda \Big( 1 - \sum_{i=1}^{n} p_i^2 \Big) \right] d\mathbf{x}'\, dt' . \tag{4.18} \]

Taking the derivatives with respect to the parameter vector $p$, we find the constraint equations for a minimum of $E$ to be

\[ \frac{\partial E}{\partial p_i} = 2 \int_{-\infty}^{\infty} w(\mathbf{x} - \mathbf{x}', t - t') \left( d_i\,(d^T p) - \lambda p_i \right) d\mathbf{x}'\, dt' \overset{!}{=} 0 \quad \forall\, i = 1 \dots 4 . \tag{4.19} \]

We may take the $p_i$ out of the integral, as they are assumed to be constant:

\[ p_1 \int_{-\infty}^{\infty} w\, d_i\, d_1 \, d\mathbf{x}'\, dt' + \dots + p_4 \int_{-\infty}^{\infty} w\, d_i\, d_4 \, d\mathbf{x}'\, dt' = \lambda\, p_i \quad \forall\, i = 1 \dots 4 . \tag{4.20} \]

The right hand side follows if we require the weight function to be normalized. The 4 equations (4.20) can be written in matrix form:

\[ S\, p = \lambda\, p \quad \text{where} \quad S = \int_{-\infty}^{\infty} w(\mathbf{x} - \mathbf{x}', t - t')\,(d\, d^T) \, d\mathbf{x}'\, dt' . \tag{4.21} \]

The real symmetric (and positive semidefinite) $4 \times 4$ matrix $S$ is an extension of the structure tensor [HS99] for range flow introduced by Spies et al. [Spi+99]. Equation (4.21) is the eigenvalue equation for $S$. Each of the 4 eigenvectors corresponds to an extremum of the objective function $E$, and the (always non-negative) value of the corresponding eigenvalue is a measure of how close to zero the extremum is (the smaller the closer).

In a discrete implementation, the components of $S$ can be computed using standard image processing operations:

\[ S_{i,j} = \langle d_i\, d_j \rangle , \quad i, j = 1 \dots 4 , \]

where $\langle \cdots \rangle$ denotes an averaging operator like normalized averaging or plain binomial smoothing.

Gradient Based Weighting

While binomial (or Gaussian) averaging is an optimal choice for a smooth flow field and data terms subject to i.i.d. noise, it basically ignores motion boundaries. Because a large spatial gradient in range data typically coincides with a motion boundary (at the border of an object), the author proposes an additional weighting dependent on the magnitude of the spatial gradient to reduce the influence of occlusion on the estimate:

\[ w_o(m, \sigma, \mu) = \exp\left( -\left( \frac{m}{\sigma} - \mu \right)^2 \right) , \tag{4.22} \]

where $m$ is the magnitude of the spatial range gradient ($\sqrt{d_1^2 + d_2^2}$), while $\sigma$ and $\mu$ control the width and center of the function. While the weighting does not solve the aperture problem, it can exclude data points near an edge, which are likely to bear information that contradicts the motion model due to occlusion.
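The weighting of equation (4.22) can be sketched in a few lines. The sketch assumes that (4.22) reads $w_o = \exp(-(m/\sigma - \mu)^2)$; the function names and the sample gradient values are hypothetical and serve only as an illustration.

```python
import math

def gradient_magnitude(d1, d2):
    """Magnitude m of the spatial range gradient from the first two data terms."""
    return math.hypot(d1, d2)

def w_o(m, sigma, mu):
    """Gradient based weight, assuming w_o = exp(-(m / sigma - mu)^2)."""
    return math.exp(-((m / sigma - mu) ** 2))

# Hypothetical data terms (d1, d2): two smooth surface points and one point
# at a depth edge with a large spatial gradient.
samples = [(0.1, 0.2), (0.3, 0.1), (2.5, 1.8)]

for d1, d2 in samples:
    m = gradient_magnitude(d1, d2)
    print("m = %.3f  w_o(m, 1, 0) = %.4f  w_o(m, 1, 1) = %.4f"
          % (m, w_o(m, 1.0, 0.0), w_o(m, 1.0, 1.0)))
```

With $\mu = 0$ the weight decreases monotonically with the gradient magnitude, so the edge sample is suppressed most. With $\mu > 0$ the maximum moves to $m = \sigma \mu$, so very small gradients are attenuated as well.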
If the aperture of the spatial weighting is small, the probability is high that no data points corresponding to a different motion are integrated. Furthermore, if the parameters are chosen correspondingly (see figure 4.5), $w_o$ can attenuate the influence of data points with a small gradient, which typically are more critical to integrate w.r.t. the noise in the data. To understand this, let us look at the BCCE: $(\nabla g)^T \cdot f + g_t = 0$. If $\nabla g$ is small and $g_t$ is large, and both are rather bad measurements due to noise, then the BCCE implies a disproportionately large and erroneous normal flow, i.e. an outlier. The same is true for range flow. So it is reasonable to attenuate data points with a small gradient. If the neighborhood contains only gradients of similar magnitude, the attenuation will not influence the estimate.

Figure 4.5: Weight function to suppress influence of data at a motion boundary (weight plotted over the magnitude of the spatial gradient $m$ for $w_o(m, 1, 0)$, $w_o(m, 2, 0)$, $w_o(m, 1, 0.5)$, $w_o(m, 2, 0.5)$ and $w_o(m, 1, 1)$)

On simulated test data we achieved improved flow estimates in the vicinity of occlusion boundaries using the above weighting. However, the parameters $\sigma$ and $\mu$ were tuned manually, and a detailed analysis based on real world data is still pending. We also used 1-step bilateral filtering for weighting the rows of the data matrix $D$ based on the difference in depth between the central pixel and its neighbors (analogous to equation (3.43), where we replaced $w_r(I_p - I_s)$ by $w_r(Z_p - Z_s)$) but achieved no satisfactory results. The resulting flow estimates were very sensitive to the parametrization of the weighting function $w_r$ and the local structure of the data.

Minimum Norm Solutions

The structure tensor $S$ contains all necessary information to determine the local spatiotemporal structure of the data.
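How $S$ is assembled in the discrete case, $S_{i,j} = \langle d_i d_j \rangle$, and how the eigenvector to its smallest eigenvalue relates to the flow can be sketched as follows. The data vectors, the velocity `F_TRUE` and all function names are assumptions of this example. The eigenvector is obtained by a plain power iteration on the shifted matrix $\mathrm{tr}(S)\,I - S$, which is chosen here only to keep the sketch free of external libraries; it is not the eigenvalue method used in this work.

```python
import math

def structure_tensor(data_vectors, weights):
    """S[i][j] = sum_k w_k * d_k[i] * d_k[j], with weights normalized to sum 1."""
    total = sum(weights)
    n = len(data_vectors[0])
    S = [[0.0] * n for _ in range(n)]
    for d, w in zip(data_vectors, weights):
        for i in range(n):
            for j in range(n):
                S[i][j] += (w / total) * d[i] * d[j]
    return S

def mat_vec(S, p):
    return [sum(S[i][j] * p[j] for j in range(len(p))) for i in range(len(S))]

def smallest_eigenvector(S, iterations=500):
    """Power iteration on tr(S)*I - S (assumes S is positive semidefinite, S != 0)."""
    n = len(S)
    shift = sum(S[i][i] for i in range(n))   # trace >= largest eigenvalue of S
    p = [0.0] * (n - 1) + [1.0]              # start with the homogeneous component
    for _ in range(iterations):
        Sp = mat_vec(S, p)
        q = [shift * p[i] - Sp[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    return p

# Hypothetical noise-free data: every d fulfills d^T [U V W 1]^T = 0.
F_TRUE = (0.5, -0.2, 0.1)
normals = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
data = [(a, b, c, -(a * F_TRUE[0] + b * F_TRUE[1] + c * F_TRUE[2]))
        for a, b, c in normals]

S = structure_tensor(data, [1.0] * len(data))
p = smallest_eigenvector(S)
flow = [p[i] / p[3] for i in range(3)]       # f_i = p_i / p_4
print("estimated flow:", flow)
```

Because the six constraint planes of this example are not parallel, the neighborhood corresponds to a full flow and the estimate reproduces `F_TRUE`. For planar or linear neighborhoods several eigenvalues vanish and a single eigenvector no longer determines the flow.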
The estimate $\hat{p}$ is found as the eigenvector to the smallest (or vanishing) eigenvalue $\lambda_4$ of $S$, if and only if the aperture problem was solved by pooling over an adequate neighborhood, i.e. one that corresponds to a full flow. The sought solution is then given by $f_i = \hat{p}_i / \hat{p}_4$ with $i = 1 \dots 3$, or equivalently by

\[ f_f = \frac{1}{e_{44}} \begin{bmatrix} e_{41} \\ e_{42} \\ e_{43} \end{bmatrix} . \tag{4.23} \]

The $e_{mn}$ are the entries of the matrix of eigenvectors of $S$:

\[ E_S = [e_1 \cdots e_4] = \begin{bmatrix} e_{11} & \dots & e_{41} \\ \vdots & \ddots & \vdots \\ e_{14} & \dots & e_{44} \end{bmatrix} , \]

where the eigenvector $e_n$ belongs to the eigenvalue $\lambda_n$ and the eigenvalues are sorted in descending order, i.e. $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4$ ($\approx 0$, if the model assumptions were met). In the ideal case $\lambda_4$ is zero. Deviations from this are either attributed to noise in the data or, if the eigenvalue is large relative to the noise, indicate a violation of the model assumption of a locally constant flow, as in the case of occlusion (corresponding to at least two motions) or multiple transparent motions.

However, if the aperture problem was not solved by integrating over a set of data points, because for example the surface we are looking at is a pure plane, multiple eigenvalues will be rather small. In this case only the plane flow or the line flow can be estimated locally, as explained in section 4.1.5. The normal flow is determined by taking the eigenvectors of either all vanishing eigenvalues or of all non-vanishing eigenvalues into account. For line flow two eigenvalues vanish, and the flow estimate calculated from the eigenvectors of the two non-vanishing eigenvalues is

\[ f_l = \frac{-1}{1 - e_{14}^2 - e_{24}^2} \left( e_{14} \begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \end{bmatrix} + e_{24} \begin{bmatrix} e_{21} \\ e_{22} \\ e_{23} \end{bmatrix} \right) . \tag{4.24} \]

For plane flow three eigenvalues vanish, and the flow estimate calculated from the eigenvector of the single non-vanishing eigenvalue is

\[ f_p = \frac{-e_{14}}{1 - e_{14}^2} \begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \end{bmatrix} = \frac{-e_{14}}{e_{11}^2 + e_{12}^2 + e_{13}^2} \begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \end{bmatrix} . \tag{4.25} \]

The equality $1 - e_{14}^2 = e_{11}^2 + e_{12}^2 + e_{13}^2$ above holds true because we optimized under the requirement $p^T p = 1$, which every eigenvector needs to fulfill. The leading minus sign in (4.24) and (4.25) follows from the orthonormality of the eigenvectors: the rows of $E_S$ are orthonormal as well, so $\sum_{i=1}^{4} e_{i4}\, e_{ij} = 0$ for $j = 1 \dots 3$, and a sum over the non-vanishing eigenvalues equals the negative of the corresponding sum over the vanishing ones. Spies [Spi01] derives the general formula for minimum norm solutions (see footnote †) of TLS range flow estimates, which is given by

\[ f = \frac{\sum_{i=q+1}^{n} e_{in}\, [e_{i1} \dots e_{i(n-1)}]^T}{\sum_{i=q+1}^{n} e_{in}^2} = \frac{-\sum_{i=1}^{q} e_{in}\, [e_{i1} \dots e_{i(n-1)}]^T}{1 - \sum_{i=1}^{q} e_{in}^2} , \tag{4.26} \]

where $q$ is the number of non-vanishing eigenvalues of an $n \times n$ structure tensor $S$. The left expression calculates the flow based on the vanishing eigenvalues (and corresponding eigenvectors), while the right one does so based on the non-vanishing eigenvalues.

Footnote †: With respect to TLS, the full flow estimate is also a minimum norm solution, as the structure tensor is rank deficient for all flow types. Only if the model assumptions are violated is the structure tensor of full rank.

4.1.6.2 Regularization of Local Flow Estimates

If we want to estimate the full flow for a neighborhood that allows only plane or line flow to be determined locally, we need to make further assumptions, which give further constraints, in a global sense. This is typically done in a variational framework that uses a data and a smoothness term, which together make up an energy that is to be minimized globally. The data term is derived from constraints (or vice versa) of a kind like those we used for the local TLS estimation. The smoothness term is motivated by assumptions of a global nature, for example that the motion of a rigid object is smooth. Spies and Garbe [SG02] present a variational approach that is based on the local TLS-estimates.
Restated for our problem, the regularized motion vector $\hat{v}$ is found as the solution to the following minimization problem:

\[ \hat{v} = \operatorname*{argmin}_{v} \int_A \left[ \omega_c\, (P v - f)^2 + \alpha \sum_{i=1}^{3} (\nabla v_i)^2 \right] dx\, dy . \tag{4.27} \]

The projection matrix $P$ projects the sought vector $v$ onto the solution subspace to which the minimum norm solution $f$ of the local TLS estimate belongs, such that solutions $v$ that are distant from $f$ w.r.t. this subspace increase the cost within the objective function. The parameter $\omega_c$ describes the confidence in the local TLS-solution and will be defined in the next section. The parameter $\alpha$ controls the overall smoothness of the regularized flow field $\hat{v}(x, y)$; it controls the influence of the smoothness term, which penalizes flow fields that have large gradients in the vector components, i.e. are not smooth. The projection matrix $P$ is calculated from the reduced eigenvectors $b_i$ of $S$ (which are a basis of the minimum norm solution (4.26)):

\[ P = B_q B_q^T , \quad B_q = [b_1 \dots b_q] , \quad b_k = \frac{1}{\sqrt{\sum_{i=1}^{3} e_{ki}^2}} \begin{bmatrix} e_{k1} \\ e_{k2} \\ e_{k3} \end{bmatrix} , \tag{4.28} \]

where $q$ is the number of non-vanishing eigenvalues. For further details on such (TLS) regularization techniques we refer to [SG02; GHO99].

4.1.6.3 Performance Issues

As $S$ is real and symmetric, the eigenvalues and eigenvectors can easily be computed using the Jacobi eigenvalue algorithm (Jacobi rotations), which has the advantage of being inherently parallel (and therefore a parallel implementation is possible [GL96]). However, the method calculates all 4 eigenvalues and corresponding eigenvectors of the $4 \times 4$ matrix, while for a full flow it would be sufficient to calculate the eigenvector to the smallest eigenvalue. This implies that there is potential for a reduction of the computational effort. Barth [Bar00] shows that the minors of the structure tensor of optical flow (a $3 \times 3$ matrix) can be used to calculate a flow estimate with only a fifth of the flops needed for conventional structure tensor analysis.
However, it is questionable whether the algorithm can be efficiently extended to 4 × 4 matrices. Moreover, no normal flows can be calculated with this method (as it assumes that only a single eigenvalue vanishes). If a complete eigenvalue analysis is not of interest, then partial total least squares (PTLS) [VHV91] may be used to calculate the minimum norm solution directly. Depending on the structure of the data, a performance increase of up to 50% compared to a complete eigenvalue analysis of the structure tensor seems achievable.

Computationally most expensive, however, is the regularization of the local flows with a variational method, which is needed if there is an aperture problem in the local neighborhood. The aperture problem can partially be solved by taking the amplitude information of the PMD-signal into account, as will be discussed in section 4.1.8. This increases the density of full flow estimates that can be achieved by the local method. If the further processing depends on a dense flow field, one may calculate the regularized flow field on a (spatially) downsampled grid, because the spatial resolution of a flow estimate is, due to the employed aperture (to overcome the aperture problem), always lower than the original resolution of the data. The original resolution may be regained after regularization by a cheap bicubic or B-spline interpolation on the original grid. A further speed improvement could be achieved by using multigrid methods similar to those Bruhn et al. [Bru+06] employed for accelerating various variational optical flow techniques.

4.1.7 Confidence and Type Measure

So far we explained how to calculate a flow estimate appropriate to the specific neighborhood. But we also need to decide which kind of neighborhood is present, so that we can choose the right flow type. Furthermore, we would like to get a measure for the likelihood of the estimate. For this we may exploit the spectrum of the structure tensor $S$.
The smallest eigenvalue $\lambda_4$ of $S$ corresponds to the residual of the TLS estimate. Therefore, if it is large relative to the noise level of the data samples ${}^{k}d$ of equation (4.14), it indicates a violation of the model assumptions and we need to reject the estimate. This may be accomplished by introducing a threshold $\tau$ which $\lambda_4$ must not surpass if the flow estimate is to be accepted. Spies, Jähne, and Barron [SJB00] propose a confidence measure based on $\lambda_4$:

\[
\omega_c =
\begin{cases}
0 & \text{if } \lambda_4 > \tau \ (\text{or } \operatorname{tr}(S) < \eta) \\[4pt]
\left( \dfrac{\tau - \lambda_4}{\tau + \lambda_4} \right)^{2} & \text{otherwise}
\end{cases} \tag{4.29}
\]

Since the trace of the structure tensor $S$ is essentially the sum over the squared magnitude of the spatiotemporal gradients in the neighborhood, and the trace of a symmetric matrix is rotation invariant, $\operatorname{tr}(S)$ is a measure for structure in the data independent of its orientation. This is why in [SJB00] pixels are excluded from a further eigenvalue analysis if the trace falls below a specific threshold. While this is reasonable for the optical flow, it is not necessarily so for range data; the plane flow can be calculated for any kind of neighborhood, as depth information is an unambiguous feature of an observed object (compared to brightness information, which is somewhat fuzzy w.r.t. its significance for describing features of an object). The author proposes to use the amplitude information $A$ of the PMD-signal to exclude pixels from a local flow estimate, because a flow estimate seems unreasonable only if the range information itself is not reliable:

\[
\omega_c(\lambda_4, \tau) =
\begin{cases}
0 & \text{if } \lambda_4 > \tau \text{ or } A < \kappa \\[4pt]
\left( \dfrac{\tau - \lambda_4}{\tau + \lambda_4} \right)^{2} & \text{otherwise}
\end{cases} \tag{4.30}
\]

Only if we are not interested in the plane flow might we additionally or alternatively use the trace of $S$ to reject a flow estimate. If the trace is smaller than or comparable to the noise level in the data, one might use it, without further analysis, to assume a plane flow which is approximately zero parallel to the optical axis (i.e.
Z), such that the subspace of probable flow vectors is the plane spanned by velocity vectors perpendicular to the optical axis (i.e. $(U, V, W = 0)$).

The confidence measure is 1 if there is no residual in the estimate and quickly decreases to zero towards $\lambda_4 = \tau$, as depicted in figure 4.6. Because least squares estimation is a maximum likelihood estimator if the model assumptions (Gaussian noise, constant flow) are met, the residual (or $\lambda_4$) corresponds to the width of the likelihood function and is therefore also a measure for the likelihood of the estimate. However, because a likelihood function is generally not normalized, the author doubts that $\omega_c$ is a consistent measure of the likelihood over the various structures that an image sequence may exhibit.

[Figure 4.6: Confidence and type measure for range flow. The plot shows $\omega_c(\lambda_4, \tau)$ and $\omega_t(\lambda_q, \tau)$ over $\lambda_4$, $\lambda_q$ in the range from 0 to $3\tau$.]

The type measure [SG02] quantifies how well the dimensionality of the nullspace of $S$, i.e. the space spanned by the eigenvectors of vanishing eigenvalues, has been determined. This can be done by examining how far the smallest non-vanishing eigenvalue ($\lambda_q > \tau$) lies above the noise-level dependent threshold $\tau$. A normalized measure for each encountered type then is

\[
\omega_t(\lambda_q, \tau) = \left( \frac{\lambda_q - \tau}{\lambda_q} \right)^{2} . \tag{4.31}
\]

The type measure $\omega_t$ depicted in figure 4.6 increases slowly from $\omega_t(\tau, \tau) = 0$ and converges to 1 in the limit $\lambda_q \to \infty$.

4.1.8 Combining Range and Intensity Data

Apart from the range information discussed so far, the PMD sensor also returns light intensity information that is proportional to the amplitude of the backscattered modulated illumination. This amplitude intensity must not be confused with the intensity information of an ambient illumination, as is typical for conventional video systems, because this intensity signal is directly related to the distance between sensor and reflecting surface.
As for common optical flow, we can derive a constraint on the solution from this kind of intensity information. This intensity constraint deviates from the classical BCCE (4.3) in that it depends on the depth coordinate. Haußecker and Fleet [HF01] showed that the classical BCCE can be augmented such that the brightness can change according to a parametrized time-dependent function $h$:

\[
g(p(t), t) = h(g_0, t, l) ,
\]

where $g_0$ denotes the gray value at time $t = 0$ and $l$ represents the parametrization. $p(t)$ is the point of interest as defined in equation (4.1), but we substitute $v$ for $f$ to differentiate between the optical flow $v$ and the Cartesian 3D-flow $f$. The total derivative in time on both sides yields the generalized brightness change constraint equation:

\[
\frac{d}{dt}\, g(p(t), t) = g_x u + g_y v + g_t = (\nabla g)^T \cdot v + g_t = \frac{d}{dt}\, h(g_0, t, l) .
\]

Substituting the flow vector $f$ for $l$ and modeling the dependence of brightness on distance according to a power law, we find

\[
h(g_0, t, f) = g_0 \cdot \left( \frac{|r_0|}{|r(t, f)|} \right)^{a} , \tag{4.32}
\]

where $r(t, f) = r_0 + f\,t$ is the point of interest in Cartesian camera coordinates and $r = |r|$ the radial distance measured by a PMD-camera. For $a = 2$, for example, equation (4.32) corresponds to the inverse square distance law of a point light source. Differentiation with respect to $t$ and approximation by a first order Taylor series valid for small $t$ yields:

\[
\frac{dh}{dt} \approx -a\, g_0\, \frac{r_0^T}{r_0^2}\, f , \qquad \text{where } r_0 = [X \; Y \; Z]^T \big|_{t=0} .
\]

Taking into account the necessary coordinate transformation from sensor to camera coordinates and renaming $g_0$ to $A(x, y)$ (for the measured amplitude at a specific pixel $[x, y]$), the extended BCCE is found analogously to the RFCE (4.10) and (4.14) as

\[
\left[ \frac{\partial(A, Y)}{\partial(x, y)} + \frac{a A X}{r^2} , \;\;
       \frac{\partial(X, A)}{\partial(x, y)} + \frac{a A Y}{r^2} , \;\;
       \frac{\partial(Y, X)}{\partial(x, y)} + \frac{a A Z}{r^2} , \;\;
       \frac{\partial(X, Y, A)}{\partial(x, y, t)} \right] \cdot \tilde{f} = 0 \tag{4.33}
\]

This equation, in contrast to the classical BCCE, constrains all three velocity components and takes into account the brightness change due to a change in the radial distance.
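The first order approximation of the power law (4.32) can be checked numerically. The following is only a minimal sketch: the function names and the numeric values for $g_0$, $r_0$ and $f$ are hypothetical and chosen for illustration, and the comparison uses a plain central difference.

```python
import math

def brightness(g0, r0, f, t, a=2.0):
    """Power-law model (4.32): h = g0 * (|r0| / |r0 + f t|)**a."""
    r = [r0_i + f_i * t for r0_i, f_i in zip(r0, f)]
    norm_r0 = math.sqrt(sum(c * c for c in r0))
    norm_r = math.sqrt(sum(c * c for c in r))
    return g0 * (norm_r0 / norm_r) ** a

def brightness_rate(g0, r0, f, a=2.0):
    """First order approximation: dh/dt ~ -a * g0 * (r0 . f) / |r0|**2."""
    dot = sum(r0_i * f_i for r0_i, f_i in zip(r0, f))
    return -a * g0 * dot / sum(c * c for c in r0)

# Hypothetical values, for illustration only
g0 = 100.0                 # amplitude at t = 0
r0 = (0.3, -0.2, 2.0)      # point [X, Y, Z] in metres
f = (0.05, 0.0, -0.4)      # 3D flow in metres per second (approaching the camera)

dt = 1e-6
numeric = (brightness(g0, r0, f, dt) - brightness(g0, r0, f, -dt)) / (2.0 * dt)
analytic = brightness_rate(g0, r0, f)
print(numeric, analytic)
```

For a point approaching the camera the scalar product $r_0^T f$ is negative, so the modeled amplitude increases, as expected for an active illumination.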
Analogously to equation (4.12), it may be transformed to a formulation where the derivative filters are applied directly to the measured range values $r(x, y)$ using the pinhole camera model (4.11):

\[
\begin{aligned}
0 ={}& A_t - r_t\, \frac{e \cdot (A_x x + A_y y)}{f^2 r + e \cdot (r_x x + r_y y)} \\
 &+ U \cdot \left( x \cdot d + \frac{A_x (f^2 + x^2) + A_y\, x y + y \cdot b \cdot e / r}{c} \right) \\
 &+ V \cdot \left( y \cdot d + \frac{A_y (f^2 + y^2) + A_x\, x y - x \cdot b \cdot e / r}{c} \right) \\
 &+ W \cdot (f \cdot d)
\end{aligned} \tag{4.34}
\]

where

\[
b := A_x r_y - A_y r_x , \qquad
d := \frac{a A}{r \sqrt{e}} , \qquad
c := \frac{f^2 r}{\sqrt{e}} + \sqrt{e}\, (r_x x + r_y y) , \qquad
e := x^2 + y^2 + f^2 ,
\]

with power law exponent $a$ and focal length $f$. The exponent $a$ may depend on $r$ itself. While we would expect it to be constant, $a = 2$, for a Lambertian scatterer, radiometric analysis for the PMD19k showed that it depends on depth (see section 5.2.3). Therefore, if the observed depth range is large, it may be necessary to replace $a$ by $a(r)$. The function can be determined by a radiometric calibration of the specific camera.

While the equation above looks rather complicated and computationally costly, it contains various terms, for instance $e$ or $f^2 + x^2$, that are constant over time and therefore need to be calculated only once. And while we need to calculate 9 different derivatives by convolution (expensive compared to multiplication and addition) for the Jacobians in equation (4.33), only 6 are needed for equation (4.34). Furthermore, if we are interested in the motion field only, an explicit calculation of the Cartesian range coordinates is no longer necessary.

The outer form and the sought flow $f$ of equation (4.33) are identical to those of equation (4.14). Thus we can combine both information channels to estimate $f$ in the manner of equation (4.16):

\[
\tilde{f}^T (S_R + \beta\, S_A)\, \tilde{f} \longrightarrow \min \quad \Longrightarrow \quad S\, p = \lambda\, p , \quad \text{with } S = S_R + \beta\, S_A , \tag{4.35}
\]

where the subscripts $R$ and $A$ denote the range and amplitude based extended structure tensors. $\beta$ is used to weight the two data channels with respect to possible differences in signal-to-noise ratio or other reasons that affect confidence in the data. This way both data channels can be joined in a single, combined structure tensor to which the eigenvalue analysis is applied. Thus we can increase the accuracy of the flow estimate, because we take more samples into account for a single estimate. And, what is even more important, we increase the probability of estimating a full flow locally, because the additional structure of the intensity channel may solve the aperture problem.

4.1.9 Equilibration

So far we were a little bit sloppy about the noise in the data. We assumed it to be i.i.d. in the components of the data vector $d$. If we assume the $d_i$ ($i = 1 \ldots 4$) to be i.i.d. random variables with variance $\sigma^2$, then their covariance matrix is simply $\operatorname{Cov}(d) = \sigma^2 I$ and the structure tensor ${}^{n}S$ subject to noise is approximately given by‡

\[
{}^{n}S \approx S + \operatorname{Cov}(d) = S + \sigma^2 I ,
\]

where $S$ is the ideal structure tensor, given there is no noise in the data matrix $D$. Under this premise the determined eigenvectors of ${}^{n}S$ are identical to those of $S$, because the addition of the scaled identity matrix only affects the eigenvalues but not the eigenvectors of the matrix. Therefore a structure tensor corrupted by i.i.d. noise yields an unbiased flow estimate if the model assumptions are met.

Unfortunately, in general the noise in the data vector entries is neither independent nor identically distributed, but correlated and of different variances, as a glance at equations (4.10) and (4.33) suggests and as was discussed in [FPA99]. We will not handle the case of correlated noise in the data vector elements but rather stick to the less involved case of different variances, and refer to [MM01; Müh04] for details on this topic.
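The statement about the eigenvectors in the i.i.d. case can be verified in one line. For any eigenpair $(\lambda_k, e_k)$ of the ideal tensor $S$,

\[
(S + \sigma^2 I)\, e_k = S\, e_k + \sigma^2 e_k = (\lambda_k + \sigma^2)\, e_k ,
\]

so every eigenvector of $S$ is also an eigenvector of $S + \sigma^2 I$, and each eigenvalue is raised by $\sigma^2$. In particular, an eigenvalue that vanishes for ideal data is approximately $\sigma^2$ for noisy data, which is consistent with choosing thresholds such as $\tau$ relative to the noise level.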
If the errors in the data entries are uncorrelated and of zero mean but different variance, then the covariance matrix is given by

\[
\operatorname{Cov}(d) = \operatorname{diag}\big( (\sigma_i^2) \big) \quad \text{and} \quad {}^{n}S \approx S + \operatorname{diag}\big( (\sigma_i^2) \big) .
\]

The eigenvectors of ${}^{n}S$ are no longer identical to those of $S$ but biased by the noise variance (for details see [VHV91]). We can correct for this if we multiply the data matrix $D$ by a right hand equilibration matrix $W$, which is just the square root of the inverse covariance matrix, i.e.

\[
W W^T = \operatorname{Cov}(d)^{-1} = \operatorname{diag}\big( (1/\sigma_i^2) \big) .
\]

‡ Only in the limit of an infinite number of data samples does the relation become an identity.

The equilibrated data matrix is then given as

\[
{}^{e}D = D\, W = D \operatorname{diag}\big( (1/\sigma_i) \big) ,
\]

i.e. the columns of the data matrix are weighted by the inverse of the respective standard deviation. The covariance matrix of the respective data vector ${}^{e}d$ is then the identity matrix. This is easy to see if we model the equilibrated data vector entries as

\[
{}^{e}d_i = \frac{d_i + n_i}{\sigma_i} ,
\]

where the $d_i$ are the unperturbed entries and $n_i$ is a noise term of variance $\sigma_i^2$. The expectation values of the structure tensor entries $S_{ij}$ are then

\[
\langle {}^{e}d_i \cdot {}^{e}d_j \rangle
= \frac{\langle d_i d_j + n_i n_j + d_i n_j + n_i d_j \rangle}{\sigma_i \sigma_j}
= \frac{\langle d_i d_j \rangle + \langle n_i n_j \rangle + \langle d_i n_j \rangle + \langle n_i d_j \rangle}{\sigma_i \sigma_j} .
\]

Because the noise terms $n_i$ are of zero mean and uncorrelated, all expectation values with a noise term $n_i$ are zero except for $\langle n_i n_i \rangle = \sigma_i^2$, and therefore the covariance is just the identity matrix. Thus we come to an unbiased eigenvector estimate. However, we need to scale the found parameter vector entries ${}^{e}p_i$ of the equilibrated structure tensor to find the parameter vector that belongs to the original (unequilibrated) problem:

\[
p_i = {}^{e}p_i / \sigma_i .
\]

A flow vector for a full flow is then given by $f_i = {}^{e}p_i\, \sigma_4 / ({}^{e}p_4\, \sigma_i)$, $\forall\, i = 1 \ldots 3$. Finding the correct $\sigma_i$ for the data vector is not trivial.
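Assuming for the moment that the $\sigma_i$ are known, the effect of equilibration can be illustrated on a toy problem. The sketch below is deliberately reduced to two dimensions (the thesis works with 4D data vectors), uses the expected noisy tensor $S + \operatorname{diag}((\sigma_i^2))$ instead of random samples, and relies on a closed-form eigenvector of a symmetric 2 × 2 matrix; all names and numeric values are hypothetical.

```python
import math

def smallest_eigvec_2x2(a, b, c):
    """Unit eigenvector to the smallest eigenvalue of [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)    # angle of the principal axis
    return (-math.sin(theta), math.cos(theta))  # direction perpendicular to it

# Toy 2D problem: the noise-free data vectors are all orthogonal to p_true
p_true = (0.6, 0.8)
u = (p_true[1], -p_true[0])   # noise-free data direction, u . p_true = 0
energy = 4.0                  # hypothetical signal energy
sigma = (0.5, 1.5)            # hypothetical noise standard deviations

# Expected noisy tensor: nS = S + diag(sigma_i^2)
nS = [[energy * u[i] * u[j] + (sigma[i] ** 2 if i == j else 0.0)
       for j in range(2)] for i in range(2)]

# Estimate taken directly from nS (biased by the unequal variances)
p_plain = smallest_eigvec_2x2(nS[0][0], nS[0][1], nS[1][1])

# Equilibration: eS = W nS W with W = diag(1 / sigma_i)
eS = [[nS[i][j] / (sigma[i] * sigma[j]) for j in range(2)] for i in range(2)]
ep = smallest_eigvec_2x2(eS[0][0], eS[0][1], eS[1][1])

# Back-scaling to the original problem: p_i = ep_i / sigma_i
p_eq = tuple(ep[i] / sigma[i] for i in range(2))

ratio_true = p_true[0] / p_true[1]
ratio_plain = p_plain[0] / p_plain[1]
ratio_eq = p_eq[0] / p_eq[1]
print(ratio_true, ratio_plain, ratio_eq)
```

Only the ratio of the parameter vector entries is compared, because the flow is obtained from such ratios and the sign and scale of an eigenvector are arbitrary.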
For example, the 4 data vector entries of the extended BCCE (4.34) correspond to 4 nonlinear scalar functions $d_i(v)$ in the random variables $v$, these being $A$, $r$ and their derivatives. It should be possible to calculate the variance of the resulting random variables from (see [Jäh04, chap. 7.3.2])

\[
\sigma_i^2 \approx (\nabla d_i)^T \operatorname{Cov}(v)\, \nabla d_i ,
\]

or, if we take $d$ as a vector valued function $d(v)$,

\[
\operatorname{Cov}(d) \approx J \operatorname{Cov}(v)\, J^T ,
\]

where the Jacobian matrix $J$ (of $d$ w.r.t. $v$) is to be taken at the expectation value of $d$. The covariance matrix $\operatorname{Cov}(v)$ needs to be determined from the filter masks of the employed derivative filters and an estimate of the noise level of the PMD-signals $A$ and $r$. However, to speak of an expectation value of $d$ within a necessarily structured and therefore not constant spatiotemporal neighborhood is dubious. The underlying pdfs are multimodal. Because $(\sigma_i)$ needs to be defined only up to a scaling factor, a normalization of the involved functions might help to solve the problem. However, the author is doubtful about how to handle this topic, and a final examination is outstanding.

Therefore we determined the equilibration factors empirically. We used a test pattern subject to Gaussian noise of specific variance $\sigma^2$ as input and determined the resulting noise $\sigma_i$ in the data elements $d_i$ of the RFCE. The constant equilibration factors are then $1/\sigma_i$, because the input noise is only a scaling factor that can be ignored. This approach, however, depends on the specific test pattern and does not consider the local structure of real data. Therefore the results presented in this thesis can be improved w.r.t. equilibration.

4.2 Motion Artifacts

Motion artifacts are a PMD-specific error that occurs around moving reflectivity or distance edges. In figure 4.7 we see two cylinders rotating about their symmetry axes. The cylinders are painted in black and white such that regions of distinctly different reflectivity are side by side.
Around these moving reflectivity edges we identify heavy errors in both the amplitude and the range image. The artifact regions have a clear border, which is not blurred as in the case of motion blur of conventional cameras. The faulty distance measurements span the whole unambiguity range of the sensor.

[Figure 4.7: Motion artifacts at an irradiance edge. The rotating cylinders show distinct errors around the step edges in reflectivity: a amplitude image, b range image.]

Motion artifacts are a technical weakness of current PMD-sensors. The four cross-correlation samples $I_n = I(\theta_n)$ (2.9), which are used to calculate the range estimate (in a least squares sense) based on equation (2.12) (for sinusoidal modulation), are not acquired in parallel (i.e. at the same time) but serially. The technical details of the various camera models differ, but no commercially available camera model known to the author acquires all 4 samples at once. The PMD19k and the O3D (see figure 5.4) take 2 samples at one time§, while the SR3000 is documented to take only 1 sample at a time. Therefore, the sensor pixels take cross-correlation samples of different surface patches if the observed surface is moving. Depending on the reflectivity texture and the spatial structure of the surface, the calculation (2.12) might yield results that are very different from an averaged measurement (an average would be the "best" possible measurement in such a situation). Starting from equation (2.10c),

\[
I(\theta) = m\, T \left( \frac{A}{\pi} \cos(\varphi + \theta) + \frac{G_0}{2} \right) ,
\]

we might model the correlation samples $I_n$ as being proportional to

\[
I_n \sim mag_n \left( C \cos(\varphi_n + \theta_n) + 1 \right) , \tag{4.36}
\]

where $mag_n$ is proportional to the irradiance at the pixel and $C$ is the modulation contrast of the illumination.
If we assume that two samples of the cross correlation, $I_n$ and $I_{n+2}$, are taken at the same time, then the phase $\varphi_n$ and magnitude $mag_n$ are identical for the samples at $\theta_n = \theta + n\pi/2$ and $\theta_{n+2}$ (the addition in the index $n$ is to be understood modulo 4). With the equation above we can easily calculate the results for different kinds of edges (reflectivity and/or distance edges) if we substitute $mag_n$ and $\varphi_n$ by values of our choice. The results can be calculated from equation (2.12) or, equivalently, by doing a least squares fit of

\[
I(mag, \varphi, C, \theta) = mag \left( C \cos(\varphi + \theta) + 1 \right)
\]

to the sampling points $I_n$ at $\theta = \theta_n$ (as this is exactly what (2.12) corresponds to).

§ The two samples correspond to the outputs A and B of figure 2.3, which are phase shifted by 180°. Ultimately, the sensor needs to take 8 samples in total (i.e. 4 snapshots) to compensate for manufacturing variations in the capacitances of gates A and B.

[Figure 4.8: Illustration of how motion artifacts occur due to a least squares fit to sampling points of different populations a at a reflectivity edge and b at a distance edge (where the irradiance was modeled according to the inverse square power law and constant reflectivity). Panel a: error at a reflectivity edge (1:3) at 3 m distance. Panel b: error at a distance edge (1 m / 3 m), inducing an irradiance change of 9:1. Both panels plot the cross correlation over the phase and show the correlation functions of the two populations, the samples at 0°, 180° and at 90°, 270°, and the least squares fit with its phase $\tau_{fit}$ and magnitude $mag_{fit}$.]

Figure 4.8 shows how the least squares fit produces the motion artifacts. $\tau_1$ in figure 4.8a is the phase corresponding to the distance of 3 m at the reflectivity edge. $\tau_{fit}$ ($\equiv \varphi$) is the phase calculated from the fit. Notice that a positive phase/distance $\tau_1$ induces a
shift of the correlation function to the left, such that the maximum of the correlation function is not at $\tau_1$ but at $2\pi - \tau_1$. Figure 4.8b shows how arbitrary such a least squares fit can be if the used samples are from two different populations. The calculated value $\tau_{fit}$ (near zero) is far from the mean value at 2 m.

Figure 4.9 gives insight into the quite interesting structure of the motion artifact errors at reflectivity edges: a plots the calculated range against the phase (or distance) and the ratio of the two irradiance magnitudes at the edge. b shows the resulting error in meters. c and d are analogous but show the calculated amplitude. Most interesting is that there are distances where hardly any error occurs.

[Figure 4.9: Insight into the structure of motion artifacts at reflectivity edges. Panels: a phase at a gray value edge, b error in range, c amplitude at a gray value edge, d relative error in amplitude.]

Figure 4.10 shows the even more complex structure of the errors at distance edges. Again the irradiance was modeled according to the inverse square power law, while the reflectance of the surface was assumed to be constant. The relative error depicted in figure 4.10b is defined as the quotient $|\varphi_{fit} - \varphi_{mean}| \,/\, |\varphi_1 - \varphi_2|$, where $\varphi_1$, $\varphi_2$ are the phases (or distances) at the edge and $\varphi_{mean}$ is their mean value. At distance edges, too, we can find phase regions where the motion artifact error is less prominent.

[Figure 4.10: Insight into the structure of motion artifacts at distance edges: a calculated range (phase at a distance edge), b relative error in range.]

We showed the plots above because they may be of interest for the planning of experimental setups that incorporate PMD-cameras.
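The mechanism behind these plots can be reproduced with a few lines of code. The following sketch uses the sample model (4.36) and the standard four-phase estimator for samples at $\theta_n = n\pi/2$, which is assumed here to be equivalent to equation (2.12); the function names and numeric values are hypothetical.

```python
import math

def correlation_sample(mag, phi, theta, contrast=1.0):
    """Sample model (4.36): I = mag * (C * cos(phi + theta) + 1)."""
    return mag * (contrast * math.cos(phi + theta) + 1.0)

def four_phase_estimate(samples):
    """Phase and amplitude from four samples at theta_n = n * pi / 2.

    Standard four-phase formula, assumed here to correspond to (2.12).
    """
    i0, i1, i2, i3 = samples
    phase = math.atan2(i3 - i1, i0 - i2) % (2.0 * math.pi)
    amplitude = 0.5 * math.hypot(i3 - i1, i0 - i2)
    return phase, amplitude

def edge_samples(mag_a, phi_a, mag_b, phi_b, contrast=1.0):
    """Samples 0 and 2 see population a, samples 1 and 3 see population b."""
    populations = [(mag_a, phi_a), (mag_b, phi_b)]
    return [correlation_sample(*populations[n % 2], n * math.pi / 2.0, contrast)
            for n in range(4)]

phi_true = 2.5  # hypothetical phase in radians

# Static scene: both sample pairs see the same surface patch
static_phase, static_amp = four_phase_estimate(edge_samples(1.0, phi_true, 1.0, phi_true))

# Moving reflectivity edge (1:3): same phase, different magnitudes
edge_phase, edge_amp = four_phase_estimate(edge_samples(1.0, phi_true, 3.0, phi_true))

print(static_phase, edge_phase)
```

In this model the estimate at a reflectivity edge is $\operatorname{atan2}(mag_b \sin\varphi,\; mag_a \cos\varphi)$, so it is exact wherever $\varphi$ is a multiple of $\pi/2$ and deviates in between, which agrees with the observation that there are distances where hardly any error occurs.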
If there is a degree of freedom in the choice of the distance or reflectivity range that one plans to use for the setup, one might reduce the errors introduced by motion artifacts by choosing the distance and/or reflectivity according to the regions in the plots above (or in corresponding ones) where the errors are less dominant.

If we have a sensor that is rather ideal in its correlation measurements, we might detect the presence of motion artifacts by exploiting the symmetry of the correlation signal: for an ideal correlation signal (2.9) the differences in the correlation samples $D_1 = I_0 - I_1$ and $D_2 = I_3 - I_2$ should be identical. Therefore we might just test whether $|D| = |D_1 - D_2| < \tau$, where $\tau$ is a noise dependent threshold. If $|D|$ is above the threshold, this indicates a motion artifact, and we might process the respective pixel in an appropriate way.

So far, we have not found a way to correct for the motion artifact errors properly. We actually think that the best way to solve this problem is technologically, within the sensor, simply by taking all 4 correlation samples at once. While we know that this introduces other problems (like varying capacitances of the necessary 4 readout gates) and might be in conflict with constraints on the manufacturing process, we think that it is essential for a robust and uncomplicated processing of sequences that image processes of (highly) dynamic content, e.g. structured objects that move rapidly.

Part II Experiments and Applications

Chapter 5 Testbench Measurements

5.1 Experimental Setup

Most of the sequences we analyzed in the context of this thesis were acquired with an experimental setup that we describe in the following and to which we refer as the testbench. The setup consists of two motor-driven linear positioner tables mounted on top of standard industry tables. The industry tables are mobile and can be locked with a spoke. Both positioner tables hold object carriers. One carrier holds a TOF-camera, the other a target.
Between camera and target a zig-zag shader of black photo pasteboard has been installed to avoid spurious reflections of the modulated IR light at the glossy positioner table rails. Light that is emitted from the modulated illumination but reflected by other surfaces before it is reflected back by the target to the sensor pixels has to travel a longer distance than light of direct illumination. The phase information of the light incident on the sensor pixels would then no longer be consistent and the range measurement would become corrupted.

We used two arrangements of the tables to acquire the image sequences: a linear arrangement and a T-shaped one. The linear arrangement depicted in figure 5.1 was chiefly used for the calibration measurements, as it allows a depth range of 6 m to be captured, each positioner having a range of 3 m. The depth range acquired was approximately from 30 cm to 630 cm. Besides calibration measurements, a purely translatory motion in depth can also be acquired with this setup.

[Figure 5.1: Linear arrangement of the positioner tables for calibration measurements. Labels: target, camera, zig-zag shaders, spurious reflections, linear positioner tables, mobile industry tables with spoke.]

The second, T-shaped arrangement depicted in figure 5.2 was used for motion measurements. It allows sequences of motion in the plane to be acquired within an area equal to that of an isosceles triangle with 3 m base and variable height (depending on the basic distance between the tables, which are perpendicular to each other).

[Figure 5.2: T-shaped arrangement of the positioner tables for motion measurements. Labels: target, camera, zig-zag shaders.]

Both camera and positioner tables are controlled by a standard PC. This allows a full automation of the experiments (apart from the changing of targets and cameras on the carriers). The motor-driven tables can be moved stepwise or continuously. The position accuracy is at least 1 mm. We used both stepwise and continuous mode to acquire motion sequences.
The stepwise mode allows several images to be acquired at one position, and therefore a statistical analysis of the temporal noise in the camera signal and its separation from the fixed pattern noise. Images may be denoised by averaging over a number of snapshots. Moreover, motion sequences taken in step mode do not suffer from PMD-specific motion artifacts and motion blur. Sequences taken in continuous mode show these artifacts and are therefore more realistic. By noting down the basic distance between target and camera, which we defined as the shortest distance between a point on the camera casing and a point on the target, we have ground truth information about the position of the target relative to the camera.

We used three different targets, shown in figure 5.3. Two of them, the whiteboard and the checkerboard target, have a plane surface and were used mostly for calibration. The whiteboard target consists of 8 photo-cards of a specified reflectivity of 84%. The checkerboard target is made of patches of photo-cards of 4 different reflectivities: 84%, 50%, 25% and 12.5%. For a more detailed description of the hardware used in the context of calibration we refer to [Rap07].

[Figure 5.3: Targets used for the experiments: a whiteboard target of high reflectivity, b checkerboard target and c pyramid target.]

The third target is a pyramidal one. It consists of five square wooden plates, stacked centered on top of each other. The bottom plate has a width of 20 cm and the top one a width of 4 cm; in between the width decreases by 4 cm per plate. The depth of each plate is 2 cm. The pyramid target was used for motion estimation.

We acquired sequences with three different TOF-camera models. An overview of their technical specifications is given in figure 5.4. All three models are based on the principles of optoelectronic modulation-based time-of-flight measurement that we presented in section 2.2, and to which we refer as PMD-technique.
While the acronym PMD (photonic mixer device) relates to a specific realization of this technique protected by patent (held by PMDTechnologies GmbH), we still use it for all similar realizations, as there is no other common acronym for this technique. The PMD[vision]® 19k and the O3D both use sensors from PMDTec, while the SR3000 is a development of CSEM/MESA. Little is known about the details of the SR3000 sensor; the main difference seems to be that the SR3000 is a single tap sensor, i.e. it has only one storage site (i.e. capacity) per pixel, while the sensors of PMDTec have 2 storage sites per pixel.

For different reasons, mainly the PMD19k data was used in the context of this thesis. While the PMD19k is the oldest model, it has a reasonable resolution compared to the O3D. Furthermore, we needed to get access to the raw (or correlation) data of the cameras, and not only to the depth, amplitude and offset data that have been calculated by the driver software of the camera. While there is good documentation on how to access this data for the PMD19k and the O3D, nothing similar exists for the SR3000. While we could get access to the raw data of the SR3000, we found that a depth map calculated from this data shows major systematic errors, which could not be corrected without further knowledge about the internal details of the camera.

                           PMD19k (PMDTec)      SR3000 (MESA)    O3D (IFM)
Resolution [pixels]        160 x 120            176 x 144        64 x 50
Pixel dimensions [µm]      40 x 40              40 x 40          100 x 100
Focal length [mm]          12.0                 8.0              8.6
Light source               2 LED arrays         1 LED array      1x LED array, laser
Light source λ [nm]        870                  850              850
Light source ν [MHz]       20                   20               20
Frame rate [FPS]           1-15                 10-20            1-50
Connection                 FireWire, Ethernet   USB 2            Ethernet
Dimensions LxWxH [mm]      220x210x55           60x50x65         55x45x85

(The modulation frequency is listed as variable for two of the models.)

Figure 5.4: Overview of the technical specifications of the used TOF-camera models
Because MESA would not give us any information about how to process the data, we worked preferentially with the PMD19k.

5.1.1 Power Budget

Before we discuss the testbench measurements of the PMD-signal, we need to introduce the relevant radiometric terms and state the power budget for the PMD-camera system. We will give a concise description of the results and refer for details on this topic to [Sch03, chap. 2.1.1] and [Lan00, chap. 4.1].

We are interested in the incident radiant energy $Q(r)$ on a PMD-pixel within the exposure time $T$, as a function of the observed object $obj$, or rather object surface patch, at a distance $r$ from the camera. The active illumination $i$ (the transmitting optic, e.g. an LED array) is modeled as a light source emitting a radiant flux $\Phi_i$ into a solid angle $\Omega_i$. The light is reflected from an object $obj$ with a surface that is assumed to be Lambertian. Specular reflection is neglected. The reflected light is seen by a sensor pixel $p$ of size $A_p$. The PMD-camera's receiving optic, which projects the reflected light of the surface patch onto the pixel, is characterized by the corresponding solid angle $\Omega_p$ of the pixel.
Given the above model assumptions, the following equations hold true and relate illumination $i$, surface patch $obj$ and sensor pixel $p$ to each other:

\[
E_i(r) = \frac{\Phi_i}{\Omega_i\, r^2} \cos(\alpha_i) \exp(-k_a r) \tag{5.1}
\]
\[
E_{obj}(r) = E_i(r) + E_b
\]
\[
\Phi(r) = E_{obj}(r)\, A_{vir}(r)\, \eta(r)\, \varrho = E_{obj}(r)\, \frac{\Omega_p\, r^2}{\cos(\alpha_r)}\, \eta(r)\, \varrho
\]
\[
E_p(r) = \Phi(r)\, \frac{\cos(\alpha_r)}{\pi r^2} = \big( E_i(r) + E_b \big)\, \frac{\Omega_p\, \varrho\, \eta(r)}{\pi} \tag{5.2}
\]
\[
Q(r) = E_p(r)\, T\, A_p \tag{5.3}
\]

where the used physical quantities are

$E_i(r)$: irradiance of the active illumination on the scene/object at distance $r$ [W m$^{-2}$]
$\Phi_i$: radiant flux of the illumination / sending optic [W]
$\Omega_i$: solid angle of the emitted radiant flux, characterizing the sending optic [sr]
$\alpha_i$: illumination angle, i.e. the angle of the object's surface normal against the direction of illumination [rad]
$k_a$: absorption coefficient (Lambert–Beer law) [1/m]
$E_{obj}(r)$: overall irradiance on the object, including the background irradiance $E_b$ [W m$^{-2}$]
$A_{vir}(r)$: area of a virtual sensor pixel (i.e. its projection/image on the object surface)
$\Phi(r)$: reflected radiant flux of a virtual pixel [W]
$\alpha_r$: angle of reflection w.r.t. the sensor pixel [rad]
$\varrho$: diffuse reflectance of the object surface, i.e. 1 − absorption coefficient − specular reflectance
$\eta(r)$: fraction of exposed pixel area (< 1 if a surface patch is smaller than a virtual pixel; corresponds to small/distant objects or object edges)
$\Omega_p$: solid angle of the pixel, characterizing the receiving optic [sr]
$E_p$: irradiance seen by a sensor pixel [W m$^{-2}$]

We learn from equation (5.2) that the deposited radiant energy $Q(r)$ does not depend on the distance from object surface to camera but only on the distance from object to light source, via $E_i(r)$ (5.1). Because camera and illumination distance are approximated to be equal, $Q$ still depends on the camera distance. Naively one might assume that $Q(r)$ decreases not with $r^2$ but with $(2r)^2$. This is not the case, because the solid angle $\Omega_p$ seen by a pixel is independent of $r$.
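As a rough numerical illustration of equations (5.1) to (5.3), the following sketch evaluates the deposited energy for the active illumination and for the background separately. The function names are hypothetical, and so are all numeric values except the reflectivity of 84%, the pixel size of 40 µm and the exposure time of 5 ms, which are taken from the text.

```python
import math

def irradiance_active(phi_i, omega_i, r, alpha_i=0.0, k_a=0.0):
    """(5.1): E_i = Phi_i / (Omega_i r^2) * cos(alpha_i) * exp(-k_a r)."""
    return phi_i / (omega_i * r * r) * math.cos(alpha_i) * math.exp(-k_a * r)

def pixel_energy(r, phi_i, omega_i, omega_p, rho, area_p, t_exp,
                 e_b=0.0, eta=1.0, alpha_i=0.0, k_a=0.0):
    """(5.2) and (5.3): Q = (E_i + E_b) * Omega_p * rho * eta / pi * T * A_p."""
    e_obj = irradiance_active(phi_i, omega_i, r, alpha_i, k_a) + e_b
    e_p = e_obj * omega_p * rho * eta / math.pi
    return e_p * t_exp * area_p

# Hypothetical camera and scene parameters
camera = dict(omega_i=0.5,            # solid angle of the illumination [sr]
              omega_p=1e-5,           # solid angle of a pixel [sr]
              rho=0.84,               # diffuse reflectance (whiteboard target)
              area_p=(40e-6) ** 2,    # pixel area [m^2]
              t_exp=5e-3)             # exposure time [s]

# Active illumination only: the energy drops with 1 / r^2
q_active_1m = pixel_energy(1.0, phi_i=1.0, **camera)
q_active_2m = pixel_energy(2.0, phi_i=1.0, **camera)

# Background illumination only: the energy does not depend on r
q_back_1m = pixel_energy(1.0, phi_i=0.0, e_b=50.0, **camera)
q_back_5m = pixel_energy(5.0, phi_i=0.0, e_b=50.0, **camera)

print(q_active_2m / q_active_1m, q_back_5m / q_back_1m)
```

The two ratios make the signal dynamics of the model explicit: doubling the distance quarters the usable modulated energy, while the background contribution stays the same.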
Only if the observed surface patch is smaller than the virtual pixel size $A_{vir}(r)$, which is determined by $\Omega_p$, does the irradiance decrease additionally with $\eta(r)$. Actually, this is quite obvious if we think of a scene with ambient (homogeneous) illumination (e.g. a landscape on a cloudy day): we will not perceive an object moving away as becoming darker until it is very small and begins to vanish (except for dust or foggy air, corresponding to a high $k_a$).

Furthermore, we note that $Q$ is independent of the angle of reflection $\alpha_r$ and depends only on the illumination angle $\alpha_i$, via $\cos(\alpha_i)$. If the illumination angle is approximately zero (i.e. the object surface is roughly perpendicular to the straight line from illumination to object), then small changes in the illumination angle (due to e.g. a translatory motion) will change $Q$ only marginally, because then $\cos(\alpha_i) \approx 1 - \alpha_i^2/2 \approx 1$.

So the radiant power or energy of the active modulated illumination, which can be used to determine the distance $r$, decreases with $1/r^2$, while the irradiance $E_b$ of the background illumination (e.g. from sunlight) stays constant. This implies rather high signal dynamics. Taking into account the dependence of the noise in the range signal on the background illumination, as discussed in section 2.2.2.2, we find here the most serious problems regarding the applicability of PMD-cameras.

5.2 Depth and Amplitude Analysis

The TLS motion estimation technique presented in chapter 4 depends on range as well as amplitude data of the PMD-camera. Because the flow estimate is essentially based on derivatives of the input data, it is not so much the absolute value of the range information that is of major importance but rather the linearity of the range signal in depth (although there is also a dependence on the absolute range value in equations (4.14) and (4.34)). Therefore we need to calibrate the camera to obtain a range measurement that is as linear in depth as possible.
Rapp [Rap07] discussed some of the most important errors of PMD-cameras. We will present only results that are new compared to those in Rapp [Rap07].

5.2.1 Fixed Pattern Noise

The first thing we did was to correct for the (temporally) constant depth (or phase) error in each pixel, known as fixed pattern noise. We described the approach in section 2.2.2.1 and used equation (2.26) to calculate a depth image E of the fixed pattern noise of our PMD19k model. Figures 5.5a and b show the noise image and the corresponding histogram of the errors in depth. The histogram is centered at zero and has a standard deviation of approx. 0.425cm. Thus approx. 95% of the pixel errors lie within a range of [−0.85, 0.85] cm, assuming a normal distribution of the errors.

Figure 5.5: Correction of fixed pattern noise: a noise image, b histogram of noise, c original and d corrected image of pyramid target at a distance of 86cm, e & f the same at 134cm. g-j same distances but for input images with a higher level of temporal noise.

The images 5.5c-f show how the subtraction of E from two depth maps of the pyramid target taken at different distances (c: 86cm and e: 134cm) improves the depth accuracy (images d and f). The input images c and e were temporally denoised by averaging over a set of 100 images taken with an exposure time of 5ms. The images 5.5g-j show the same as c-f, but the input images g and i are single shots (not averaged) taken with 5ms exposure time. The images shown are cropped and scaled (in size and color range) versions of the original images to reveal the changes in detail. While the fixed pattern noise correction improves the data in all cases, it is clear to see that for bad illumination conditions (or too short exposure times), as for g and even more for i, the temporal noise dominates and the improvements due to the correction for fixed pattern noise are marginal.
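The correction step itself is a per-pixel subtraction of the noise image E from a depth map. The sketch below illustrates that subtraction together with the two-sigma interval quoted above; it does not reproduce equation (2.26), and the nested-list image layout and function names are assumptions made for illustration.

```python
import statistics

def subtract_fixed_pattern(depth, fpn):
    """Return depth - E, pixel by pixel (both given as lists of rows)."""
    return [[d - e for d, e in zip(d_row, e_row)]
            for d_row, e_row in zip(depth, fpn)]

def two_sigma_interval(fpn):
    """Interval mean +/- 2 sigma of the fixed pattern noise image.

    For normally distributed errors this covers roughly 95% of the pixels.
    """
    values = [e for row in fpn for e in row]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu - 2.0 * sigma, mu + 2.0 * sigma

# Toy example: a flat target at 1 m disturbed by a known pattern (made-up values).
pattern = [[0.01, -0.01], [0.0, 0.0]]
measured = [[1.0 + e for e in row] for row in pattern]
corrected = subtract_fixed_pattern(measured, pattern)
```

For a standard deviation of 0.425cm the interval evaluates to the [−0.85, 0.85] cm range given in the text.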
5.2.2 Range Calibration

The sequences for range calibration were taken in step mode. Step mode means that at a specific target–camera distance we acquired a subsequence of a specific number of frames (here 100) of the steady scene and then moved the target (and/or the camera) by a specified distance to take another subsequence. From each subsequence a mean image was calculated by taking the arithmetic mean in each pixel. Furthermore, the variance in each pixel was calculated, giving a variance image. For range calibration we took several sequences at different exposure times in a depth range of 0.26m–6.22m, at 150 positions with a step size of 40mm. The different exposure times were necessary to obtain, for all distances, images that are neither underexposed nor overexposed, and to keep the noise level for the calibration rather low and constant. We used the mean images as input to the calibration.

Figure 5.6 shows the error in the range measurement calculated from the ground truth data and the whiteboard sequences. We tracked eight positions on the target over the measured depth range using a pinhole camera model (4.11) with a focal length of 12mm, as specified for the PMD19k. The positions are marked as magenta points in figure 5.7. We interpolated the depth values at the tracking positions (these are not necessarily on the pixel grid) using cubic interpolation. The ground truth distance, denoted as range in the following plots, is the radial distance calculated from the pinhole model using the known positions of the targets. The difference between the tracked range values and the ground truth distance is the range error, shown in figure 5.6 as crosses. The range error offset of approx. 0.62m is this large because the offset calibration of the camera was erroneous (possibly due to a firmware upgrade).
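The per-pixel statistics of a step-mode subsequence and the radial ground truth distance can be sketched as follows. The code assumes frames stored as lists of rows and uses the sample variance (division by n − 1); the text does not state which variance estimator was used, so this choice is an assumption.

```python
import math
import statistics

def mean_and_variance_images(frames):
    """Per-pixel arithmetic mean and sample variance over a subsequence of frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    mean_img = [[0.0] * cols for _ in range(rows)]
    var_img = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            samples = [frame[y][x] for frame in frames]
            mean_img[y][x] = statistics.fmean(samples)
            var_img[y][x] = statistics.variance(samples)
    return mean_img, var_img

def radial_distance(x, y, z):
    """Radial distance of a target point (camera coordinates, origin at the pinhole)."""
    return math.sqrt(x * x + y * y + z * z)

def project_pinhole(x, y, z, focal_length=12e-3):
    """Image-plane position of a target point for a pinhole camera (f = 12 mm)."""
    return focal_length * x / z, focal_length * y / z

# Two 1x1 "frames" as a toy subsequence.
mean_img, var_img = mean_and_variance_images([[[1.0]], [[3.0]]])
```

A tracked target point with known coordinates thus yields both its position in the image (where the measured depth is interpolated) and its radial ground truth distance.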
Figure 5.6: Range error (crosses x) and approximation (solid line) of the range error by 3 modes of a Fourier series for 8 positions (pos1 to pos8) on the whiteboard target (error in range [m] over range [m])

Figure 5.7: Range image with the tracked positions (magenta dots) on the whiteboard

Based on the range errors we fitted a Fourier series, equation (2.21), with only the first, second and fourth harmonic. The first harmonic has a wavelength of 7.5m, the unambiguity range of the camera. The approximation, shown as continuous lines in figure 5.6, fits the error quite well. We used the approximative inverse function of the range error (2.23) to correct for the periodic error in the depth measurements. Figure 5.8a shows the remaining error if the calibration is applied to the same data set from which the calibration coefficients were calculated. We notice that the approximation errors, indicated by the circles (o), are all near zero and hence marginal; the approximation error is the difference between the error in the Fourier fit (indicated by crosses (x)) and the error in the corrected (or calibrated) range measurement. Therefore the remaining error of the calibration (solid line) is approximately equal to the difference between the Fourier fit and the original errors.
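A fit of this kind can be sketched as an ordinary least-squares problem with an offset and the first, second and fourth harmonic of the 7.5m unambiguity range. The sketch below is not the implementation behind equations (2.21) and (2.23), which are not reproduced here: it solves the normal equations by Gaussian elimination and inverts the error with a simple fixed-point iteration as a stand-in for the approximative inverse function.

```python
import math

HARMONICS = (1, 2, 4)
UNAMBIGUITY_RANGE = 7.5  # wavelength of the first harmonic in m

def design_row(r):
    """Basis functions [1, cos, sin, ...] evaluated at range r."""
    row = [1.0]
    for k in HARMONICS:
        w = 2.0 * math.pi * k * r / UNAMBIGUITY_RANGE
        row += [math.cos(w), math.sin(w)]
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_range_error(ranges, errors):
    """Least-squares coefficients (1 offset + 3 cos/sin pairs = 7 numbers)."""
    rows = [design_row(r) for r in ranges]
    n = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * e for row, e in zip(rows, errors)) for i in range(n)]
    return solve(ata, atb)

def range_error(coeffs, r):
    """Fitted range error at (true) range r."""
    return sum(c * v for c, v in zip(coeffs, design_row(r)))

def correct_range(coeffs, r_measured, iterations=5):
    """Fixed-point inversion of r_measured = r + error(r); a stand-in for (2.23)."""
    r = r_measured
    for _ in range(iterations):
        r = r_measured - range_error(coeffs, r)
    return r
```

The fixed-point iteration converges quickly as long as the slope of the fitted error curve is much smaller than one, which holds for errors in the centimeter range over a 7.5m period.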
Figure 5.8: Remaining errors for 4 different tracked target positions after a applying the calibration coefficients on the same position they were calculated from and b using a single set of coefficients for the positions (error [mm] over range [m]; curves: remaining error, fit error, inversion error)

Figure 5.8b shows the errors of the calibration where we used a single set of Fourier coefficients for the different tracked target positions. The errors were corrected for the positions 1, 3, 5 and 7, two in the center of the target and two more peripheral ones, while the calibration coefficient set was calculated from position 2. The errors are clearly increased but still small compared to the original errors: the standard deviation σ for the set of 4 positions is 3.3mm at a mean error of 4.7mm, while σ of the uncalibrated errors is 15.1mm. The errors introduced by the approximative inversion are again rather small.

Actually, a calibration based on data from tracked points on the target is not optimal: we intermix the errors of various pixels and cannot completely separate the influence of a change in depth from that of a change in irradiance (even by compiling sequences of different exposure times). A more advanced experimental setup would allow us to change the depth information at all pixels over the whole unambiguity range. Thus the need for tracking individual target positions would cease to exist, and the calibration as well as the correction of the range errors could be done per pixel. For each pixel an individual set of calibration coefficients could be calculated. As we need only 7 real numbers (1 offset and 3 complex Fourier coefficients) per pixel, the memory requirements would be rather low even for an embedded system.
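The memory estimate for a per-pixel calibration is a simple product. As a rough sketch, assuming the 160 × 120 pixel frame size of the PMD19k and single-precision (4 byte) storage, where the storage format is our assumption:

```python
def calibration_memory_bytes(width, height, numbers_per_pixel=7, bytes_per_number=4):
    """Memory needed to store one set of calibration coefficients per pixel."""
    return width * height * numbers_per_pixel * bytes_per_number

# 1 offset + 3 complex Fourier coefficients = 7 real numbers per pixel.
total_bytes = calibration_memory_bytes(160, 120)
```

Under these assumptions the complete coefficient set occupies about half a megabyte.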
5.2.3 Interdependence of Amplitude and Range

For the amplitude analysis and its interdependence with the range measurement we used the checkerboard target. Similar to above, we tracked 8 positions on the target. But this time we used a single sequence at an exposure time of 12ms. Figure 5.9 shows a range and an amplitude image of the checkerboard target and the tracked positions on it (the magenta dots).

Figure 5.9: Tracked positions (magenta dots) on the checkerboard: a range image, b amplitude image (annotations in the images: central positions, border positions, overexposure, and the reflectivity patches 84%, 50%, 25% and 12.5%)

The most notable difference between the range image 5.9a of the checkerboard and that of the whiteboard target (figure 5.7) are the range variations of several cm within the smooth plane of the checkerboard target. Another interesting, although unintended, feature is the bright white spot between the 84% and the 25% patch. It corresponds to a (partial) overexposure due to a defect in the checkerboard target: there is a small gap between the reflectivity patches, and the underlying metal plate reflects the light, in contrast to the patches, not (approximately) Lambertian but specular. While the irradiance is not so strong that the PMD-pixel capacities are all saturated (otherwise the amplitude would be zero), the partial saturation leads to a lower amplitude and a higher range measurement compared to the 84% patch above. To the right of the bright spot there is a similar defect in the target (between the 50% and the 12.5% patch), but the gap is smaller and the specular reflection does not cause overexposure. Here both range and amplitude measurement are increased relative to the neighborhood.
Looking at the range-error plot in figure 5.10, we easily identify the correspondence between reflectivity differences and differences in the range measurements, while the ground truth distance has not changed. A lower reflectivity coincides with a smaller range measurement. Between the 84% and the 12.5% reflectivity patch there is a difference of about 5cm. The reflectivity dependent range difference (RDRD) is quite constant over the acquired depth range. However, near the camera and far away from it, there are obvious variations in the RDRD, which we want to investigate a little further.

Figure 5.10: Range measurement errors for various reflectivities and positions (error [m] over range [m]; curves for the 84%, 50%, 25% and 12.5% patches at central and border (brdr) positions)

Figure 5.11: Measured range over ground truth range for the tracked positions (measured range [m] over range [m]; same curves as in figure 5.10)

For large distances the statistical errors in the range measurement for the low reflectivity patches increase particularly strongly due to underexposure, with an evident bias toward too small distance values. While we are not sure about the reason for the bias, which is seen even more clearly in the range plot 5.11, we know that the increase in statistical error is due to the noise dependence of the PMD-signal, which we described by equations (2.29) and (2.30) and illustrated in figure 2.6. The variations in the RDRD near the camera can be exemplified best for the 50% center and border positions (light and dark green lines in figure 5.10). We find that, with increasing ground truth range, these positions converge to a common measured range value (at approx. 2.4m) and then keep a constant RDRD to the 84% positions, which behave similarly.
Thus we not only have RDRD errors but also an explicit dependence of the range on the position (central or border). The reason for this are near field errors of the camera, because its illumination system is neither point-like nor rotationally symmetric. Generally speaking, the assumptions we made to arrive at equation (5.3) are only approximately valid in the far field of an extended light source. The PMD19k has two LED arrays for illumination that are aligned along the horizontal axis (see the small picture in figure 5.4). Therefore, in the near field of the camera the illumination cannot be approximated as a point light source: while a border position pixel is irradiated by a mixture of light that traveled (essentially) two different distances, a central pixel sees light of only one distance. The demodulation is ideally a linear process; therefore the resulting distance is the arithmetic mean of the, simply put, "phase information" of the generated photo electrons. Because the irradiance E_i (5.1) depends on the distance from the light source, there will always be more electrons in the mix that correspond to photons that took a shorter path. Therefore the border position pixels are biased toward too short distances in the near field of the camera. This is exactly what we see in figure 5.10, most pronounced for the 50% border position, which has a too small range (compared to the central positions) up to at least 2.2m.

Another reason for the RDRD variations is overexposure. While, for example, the dark green line (50% center position) is no longer in total overexposure at 1m, some of the correlation samples may still belong to (nearly) saturated capacities that corrupt the linearity of the demodulation, as we have explained in section 2.2.2.1 on page 27. Therefore the dark green line at approx. 1m or the dark blue line at 1.4m show too high range measurements.
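The direction of the near field bias can be illustrated with a strongly simplified model: a pixel receives light from two sources at distances d1 and d2, the number of photo electrons from each source is taken to be proportional to 1/d², and the measured distance is the electron-weighted mean of the two. This is only a sketch under these assumptions; it ignores the actual phasor addition in the demodulation and all geometry other than the two path lengths.

```python
def mixed_distance(d1, d2):
    """Electron-weighted mean of two path distances, weights proportional to 1/d^2."""
    w1, w2 = 1.0 / d1 ** 2, 1.0 / d2 ** 2
    return (w1 * d1 + w2 * d2) / (w1 + w2)

# A border pixel seeing the two LED arrays at slightly different distances
# (hypothetical values in m).
biased = mixed_distance(1.0, 1.2)
unweighted = (1.0 + 1.2) / 2.0
```

The weighted mean is always at most the unweighted mean of the two distances, which corresponds to the bias toward too short distances described above; for d1 = d2 the bias vanishes.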
This partial overexposure can be identified even better in the amplitude signal A(r) shown in figure 5.12. While total overexposure corresponds to zero amplitude, partial overexposure shows varying amplitude. We marked the region of partial overexposure by reddish ellipses in figure 5.12. Looking at the log-log plot 5.12b, we see that the amplitude for the high reflectivity patches reaches a maximum, after which it decreases slowly at first (small negative slope) and then faster for larger distances.

Figure 5.12: Amplitude signal A of the PMD19k: a amplitude over radial distance, b log2-log2 plot of a. The reddish ellipses indicate the range of partial overexposure. (Curves for the 84%, 50%, 25% and 12.5% patches at central and border (brdr) positions; panel b contains reference diagonals labeled a = 2.)

The RDRD (including near field errors and partial overexposure) are of the same magnitude as the periodic systematic distance dependent errors that we corrected in the previous section. Since they are constant only within a limited distance range, not too near to and not too far from the camera, it is an open question if and how they may be corrected.

The bluish ellipse marks a region where we can observe an unusual increase of the amplitude. However, this is only due to a weakness in the experimental setup and an imperfect tracking of the patches: the tracked positions run into the wrong patches, and the manufacturing defects in the target (shown in figure 5.9) induce an increased irradiance in the surrounding pixels due to the specular reflection. Hence we can ignore the measurements in the ellipse as corrupted.
Figure 5.13: The scaling exponent for the amplitude modeled locally by a power law (D[log(amp(log(r)))] over range r; curves for the tracked reflectivity patches)

In figure 5.13 we plotted the scaling exponent a of the power law model (4.32) that we would like to use for motion estimation. We calculated the exponent as the first derivative of the log-log data, by resampling the (binomially smoothed) logarithmic amplitude-range data (of the log-log plot 5.12b) using spline interpolation at evenly spaced sampling positions in the logarithmic domain. The derivative may then be calculated as the forward difference of the resampled amplitude measurements multiplied by the sampling frequency. This is necessary to compensate for the increased noise in the measurements at farther distances and to adapt to the scaling behavior w.r.t. the power law. Unfortunately the exponent is not constant, not even beyond the overexposure range in the far field of the camera. Nor is a approx. 2, as Q(r) (5.3) would suggest. The pale green diagonals in figure 5.12b indicate the resulting scaling exponent if we assume the measured amplitude A(r) to be proportional to Q(r). We suppose that the deviation is partially due to the demodulation contrast (see section 2.2.1.3), which is not independent of the deposited radiant energy Q(r), as discussed in [Lan00, chap. 5.2.4]. Therefore A(r) is not proportional to Q(r) (see equation (2.18)). Furthermore, if one excludes the range of (partial and total) overexposure, one might identify a kind of periodic modulation of the scaling exponent, similar to the periodic range error; but this is very speculative. A final evaluation does not seem possible within the limitations of the current experimental setup.
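The derivative estimate described above can be sketched in a few lines. Note the simplifications: the sketch uses linear instead of spline interpolation for the resampling and omits the binomial pre-smoothing, so it illustrates only the resampling in the logarithmic domain and the forward difference. The function names are hypothetical.

```python
import math

def resample_even(xs, ys, n):
    """Linearly interpolate (xs, ys) at n evenly spaced positions in [xs[0], xs[-1]]."""
    step = (xs[-1] - xs[0]) / (n - 1)
    out = []
    j = 0
    for i in range(n):
        x = xs[0] + i * step
        while j < len(xs) - 2 and xs[j + 1] < x:
            j += 1
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        out.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return out, step

def loglog_slope(ranges, amplitudes, n=32):
    """Forward difference of log(A) over log(r), i.e. the local power-law slope."""
    log_r = [math.log(r) for r in ranges]
    log_a = [math.log(a) for a in amplitudes]
    resampled, step = resample_even(log_r, log_a, n)
    # Multiplying by the sampling frequency is the same as dividing by the step.
    return [(resampled[i + 1] - resampled[i]) / step for i in range(n - 1)]

# Synthetic check: an amplitude that follows A = c * r^-2 exactly.
test_ranges = [1.0 + 0.1 * i for i in range(51)]
test_amplitudes = [100.0 * r ** -2 for r in test_ranges]
slopes = loglog_slope(test_ranges, test_amplitudes)
```

For an exact 1/r² amplitude the slope in the log-log domain is −2 everywhere; deviations from a constant value in measured data then directly show where the power law model breaks down.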
For similar reflectivities and not too severe under- or overexposure the scaling exponents are similar too (compare the blue (84% central), green (50% central) and magenta (84% border) lines within a range of 2.7–5.2m in figure 5.13). However, the uncertainty interval is rather large (the standard deviation is approx. 0.2). Therefore it is questionable whether a distance dependent scaling exponent a(r) (calculated from the data for a specific reflectivity or averaged over a reflectivity range) can essentially improve the results of motion estimation. So far, we have used only constant scaling exponents for motion estimation.

Chapter 6 Applications

6.1 Still Image Processing

We applied the new two state smoothing, which we described in section 3.3, to various real world scenes and test patterns. A final evaluation is, however, still pending. Nevertheless, we want to present some intermediate results because they look promising.

In figure 6.1 you can see a (partial) GUI snapshot of the image processing software Heurisko that we used to implement the two state smoothing. It shows different data sets of a scene with a medium level of noise acquired by the PMD19k at 20ms exposure time. The upper left image is the range map of the scene, which is rich in fine structures. There are errors in the distance data that are much higher in magnitude than the regular noise, i.e. outliers (some of them are marked by the red boxes with rounded corners). The observed distance range is from about 2.0–5.5m. The image on the right shows an average over 10 frames that we used as a kind of "ground truth". However, while the noise is strongly reduced, the outliers are still present. They typically occur due to either under- or overexposure. For example, the wooden pillar in the very front of the scene reflects the light of the PMD partially specularly, which leads to overexposure and therefore to wrong range estimates. In the lower right of figure 6.1 you find the weight image we used for two state channel smoothing.
It is the square root of the amplitude image. We found the square root (or alternatively the logarithmized amplitude) to yield better results than weighting directly with the amplitude. The lower left of figure 6.1 shows the result of channel smoothing using 50 channels and a binomial 3 × 3 filter mask. Actually, the results look similar using 25 channels; however, the finest details of the image are partially blurred. We notice that a large part of the outliers has been removed, the image is denoised and the fine details of the image still exist. The algorithm could, however, not (completely) remove the big faulty area on the wooden pillar. It is just too big to be treated as an outlier using a filter mask size of 3 × 3.

Figure 6.1: Two-State-Channel smoothing applied: original range image (dImg), arithmetic average over 10 frames (dMean), two state smoothed range image (dDec) and the used weighting image (dScAmp).

Simple binomial smoothing, depicted in figure 6.2a, is quite problematic: besides blurring the object features, it also blurs the outliers and therefore introduces new errors in their neighborhood. Conventional B-spline channel smoothing, shown in figure 6.2b, removes fewer outliers because it lacks the additional confidence information from the amplitude data.

Figure 6.2: a Simple binomial smoothing introduces additional errors around outliers and blurs image features/details. b Conventional B-spline channel smoothing removes fewer outliers.

In figure 6.3 we depicted a detail of the scene (marked by the central red box in figure 6.1) denoised with different methods. Particularly notice the thin line coming from the top (a cable): channel smoothing conserves all the details while binomial smoothing destroys them. The averaged range image b and the channel smoothing result d are very similar, but in d only 3 outlier pixels are left, while b is rather identical to a w.r.t.
outliers.

Figure 6.3: Details of the scene: a original with noise and outliers, b temporal average over 10 frames, binomial smoothing (c) and channel smoothing with 50 channels (d) applied to a

Using the mean image as ground truth and the amplitude data as a confidence measure, we calculate a quality measure Q:

\begin{equation}
q_{ch} = \mathrm{clip}\left( \log \frac{\max\left( (r - m)^2,\ 2.0 \cdot 10^{-6} \right)}{(s_{ch} - m)^2},\ -3,\ 3 \right), \qquad Q_{ch} = \langle q_{ch} \rangle, \tag{6.1}
\end{equation}

where r, m and s are the pixel values of the original range image R, the mean image M and the channel smoothed image S. ⟨q⟩ is the arithmetic mean over all pixels that have an amplitude of more than 0.3, and clip(x, a, b) limits the summands to values in the range [a, b]. s_ch depends on the number of channels ch used for smoothing, and therefore Q depends on ch too.

Figure 6.4 shows how the quality measure Q grows with increasing channel number and reaches a constant level at approx. 60 channels. For larger filter mask sizes the quality measure looks similar. However, if one uses the same scene but with a higher noise level (because it was acquired with a shorter exposure time), the quality reaches a maximum somewhere around 55 channels and then slowly decreases. The quality for conventional channel smoothing is always below that of two state channel smoothing. The quality measure for binomial smoothing is equal to Q_1 and less than 0.02.

Figure 6.4: Quality measure used to find optimal number of channels. (Quality over channel number for normal channel smoothing, two state channel smoothing and binomial smoothing)

However, Q only partially captures the quality of the denoising, because the "ground truth" we have is not a real ground truth. Moreover, Q is somewhat heuristic, and the simple average (6.1) cannot really measure how well the details of the image have been conserved. Therefore the above analysis is preliminary.
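A quality measure of the form (6.1) can be sketched as follows. Several details are assumptions on our side and not taken from the text: the logarithm is taken to be the natural logarithm, the images are passed as flat lists of pixel values, and a pixel whose smoothed value coincides exactly with the mean image is assigned the upper clipping value.

```python
import math

def clip(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def quality(original, mean, smoothed, amplitude, amp_threshold=0.3, floor=2.0e-6):
    """Mean clipped log-ratio of squared errors before and after smoothing.

    Only pixels with an amplitude above amp_threshold contribute.
    """
    terms = []
    for r, m, s, a in zip(original, mean, smoothed, amplitude):
        if a <= amp_threshold:
            continue
        numerator = max((r - m) ** 2, floor)
        denominator = (s - m) ** 2
        if denominator == 0.0:
            terms.append(3.0)  # assumption: perfect agreement gets the upper clip value
        else:
            terms.append(clip(math.log(numerator / denominator), -3.0, 3.0))
    if not terms:
        raise ValueError("no pixel exceeds the amplitude threshold")
    return sum(terms) / len(terms)
```

Positive values indicate that the smoothed image is closer to the mean image than the original; a smoothing that changes nothing yields zero for every pixel whose squared error exceeds the floor value.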
6.2 Synthetic Test Sequences

In the following we will present results on synthetic image sequences, which correspond to our model assumption of a single, translatory motion. The simulated optical imaging is that of a pinhole camera. The range measurement is a perfect radial distance (except for the noise we add to the data) to the simulated object surface. The amplitude information follows exactly a power law. We use the sequences for a proof of concept of our method. We highlight its advantages and show what the requirements for its successful application are. We discuss possible extensions and limits.

Tabular Result Scheme

We discuss the method using examples that try to capture the typical problematic situations in the context of 3D motion estimation. All examples are given in a fixed tabular scheme identical to that of figure 6.5. The content is as follows:

a, b  two amplitude image frames of the sequence, with frame index 3 and 5, to give an idea of the motion. a contains a smaller subframe showing the range image (which does not reveal that much information).
c, d, e  show the errors in the velocity components U, V, W of the estimated motion field. U, V, W are the velocity components in the direction of the Cartesian coordinate axes X, Y, Z. The error is given relative to the single ground truth velocity components. The value range covered is shown in a small colormap on the left of the images.
f, g  are the confidence and type measure of the corresponding pixels. The value range is from zero to one.
Parameters  the table describes the noise levels of the synthetic data and the parameters of the motion estimation algorithm:
• nlr and nlg are the noise levels of the range and amplitude data. nlr is the standard deviation σ_r of the added normal noise in cm. nlg is σ_a relative to the contrast of the amplitude signal.
• ps is the pre-smoothing level.
The pre-smoothing for range data is a normalized averaging with a binomial applicability of (2ps+1)×(2ps+1) pixels. For the amplitude image standard binomial smoothing with the same mask size was applied. No smoothing in time was done (this typically decreases the quality of the estimate).
• ws is the size of the binomial integration window (ws×ws) of the structure tensor.
• τ is the threshold of the confidence measure (4.30).
• τ_2 is the minimum value that the second smallest eigenvalue must have such that the estimate is assumed to be a full flow. It corresponds to a threshold on the magnitude of the type measure (4.31).
• β is the weighting factor for the amplitude structure tensor, equation (4.35), i.e. the squared, global weighting factor for the rows of the data matrix of the extended BCCE (4.33).
• w_σ and w_µ are the weights of the gradient based weighting (4.22) (only given if applied)
Error analysis  The table shows the density d of the estimated motion field and statistics (mean, standard deviation σ, minimum and maximum) of the errors of the estimate. The density d is defined as the number of pixels for which a full flow could be estimated and which have a confidence of at least 0.5, divided by the number of pixels for which a ground truth flow exists, i.e. non-moving regions are excluded from the statistics. With the estimated flow vector $\hat{f} = [\hat{U}, \hat{V}, \hat{W}]^T$ and the ground truth flow $f = [U, V, W]^T$, the given error types are
• relative magnitude error: $|\hat{f} - f| \,/\, |f|$
• angular error: $\arccos\left( \frac{\hat{f}_h \cdot f_h}{|\hat{f}_h|\,|f_h|} \right)$, where $\hat{f}_h$, $f_h$ are the homogeneous flow vectors, i.e. the vectors extended by the temporal dimension, $f_h = [U, V, W, 1]^T$ (because the "change" in time is by definition always 1)
• directional error: $\arccos\left( \frac{\hat{f} \cdot f}{|\hat{f}|\,|f|} \right)$
• absolute magnitude error: $|\hat{f} - f|$
• bias of estimate: $(e \cdot f) / |f|$, where $e = \hat{f} - f$. Thus this is the projection of the error vector on the ground truth flow vector.
It indicates a bias (positive or negative) of the error in the direction of the true flow.
Description  to the right of the error analysis table there is a short description and discussion of the most relevant features of the example.

Algorithms and Performance Issues

All results presented in the following sections use derivative filters optimized for orientation estimation along edges (i.e. optimized Sobel filters, see also section 3.1.4 on page 41 and [Sch00]). This is reasonable because motion estimation is basically spatiotemporal orientation estimation, as we have shown before. For the spatial derivatives we use 5 × 5 filters (i.e. no smoothing in time is applied), while for the temporal derivatives we use a 3 × 3 × 3 filter, which smoothes within the spatial domain. The results were calculated using local TLS motion estimation as described in section 4.1.6.

The local flow estimates were not regularized (see section 4.1.6.2). While we also used the subspace regularization scheme developed by Spies and Garbe [SG02], we found the improvements in quality disappointing compared to the increase in computational costs. The regularization of a flow field with a full-flow density of about 10% takes about 4–5 times longer than the calculation of the basic flow field. Because regularization is implemented by iterative algorithms and may converge slowly (or not at all), the time needed also depends on the maximum number of iterations allowed (we used 500). Moreover, we found the regularization results to be very sensitive to the parametrization and the structure of the processed sequences. This is also the reason why we do not discuss the results for the estimated plane and line flow. These are only of interest if used within a regularization scheme. We achieved reasonable densities in many cases without regularization by choosing the thresholds τ and τ_2 appropriately.
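The error measures listed in the error analysis of the tabular result scheme can be sketched for a single pixel as below. The sketch assumes that both magnitude errors are based on the Euclidean norm of the difference vector and that the directional error is normalized by the norms of the two three-dimensional flow vectors; the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def angle_deg(a, b):
    """Angle between two vectors in degrees (cosine clamped against round-off)."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def flow_errors(estimate, truth):
    """Error measures for one estimated flow vector [U, V, W] against ground truth."""
    e = [a - b for a, b in zip(estimate, truth)]
    return {
        "relative": norm(e) / norm(truth),
        # Homogeneous vectors: extended by the temporal component 1.
        "angular": angle_deg(list(estimate) + [1.0], list(truth) + [1.0]),
        "directional": angle_deg(estimate, truth),
        "absolute": norm(e),
        "bias": dot(e, truth) / norm(truth),
    }

# Example: an estimate that is 10% too long but has the correct direction.
errors = flow_errors([3.3, 6.6, 9.9], [3.0, 6.0, 9.0])
```

For an estimate that only differs in length, the directional error vanishes and the bias equals the absolute magnitude error; the angular error is still nonzero because the temporal component of the homogeneous vectors is not scaled.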
We achieved about 20 flow-field frames per second using a single threaded implementation on a 2.4GHz Intel Core 2 processor. The frames had a size of 160 × 120. The implementation is not optimized for avoiding unnecessary operations but rather to be as flexible as possible (for large parts of the implementation the high-level image processing language and library Heurisko was used). This is why the author expects a possible increase in frame rate of about a factor of 3, only by avoiding unnecessary operations that are trivial to detect. An elaborate analysis of the mathematical structure of the calculations might reveal further optimization possibilities (the author thinks that a multigrid (and/or multiscale) implementation could boost computational performance by at least about one order of magnitude). Most of the algorithms used are standard implementations (like Jacobi rotations for eigenvalue analysis). Using more efficient, but equivalent algorithms and a multithreaded/parallelized implementation (for multi-core machines) should increase the frame rate significantly.

6.2.1 Motion of a plane

We start with the most basic kind of motion, a (linear) translatory motion of a plane. It is the basis of our motion model, because we approximate the observed surface in motion as conjoined small planar surface patches. Therefore the motion field which our algorithm estimates should be optimal compared to scenes where the surface is curved or has discontinuities; the results are a kind of upper limit in motion field quality for the proposed method.
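The synthetic input for such a plane can be sketched from the imaging assumptions stated for the test sequences: a pinhole camera, a perfect radial distance measurement and an amplitude that follows a power law. The frame size of 160 × 120 and the focal length of 12mm are taken from the text; the pixel pitch of 40µm and the power law parameters are hypothetical values chosen for illustration.

```python
import math

def plane_range_image(width=160, height=120, z0=3.0, focal=12e-3, pitch=40e-6):
    """Radial distance to a plane Z = z0 perpendicular to the optical axis.

    pitch is an assumed pixel pitch; the principal point is the image center.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    image = []
    for v in range(height):
        row = []
        for u in range(width):
            x, y = (u - cx) * pitch, (v - cy) * pitch
            # The viewing ray (x, y, focal) hits the plane at scale z0 / focal.
            row.append(z0 * math.sqrt(x * x + y * y + focal * focal) / focal)
        image.append(row)
    return image

def power_law_amplitude(r, c=1.0, a=2.0):
    """Amplitude following A = c * r^-a (c and a are illustration values)."""
    return c * r ** (-a)

range_img = plane_range_image()
```

Because the radial distance grows with the distance of a pixel from the image center, a plane perpendicular to the optical axis appears bowl shaped in the range image even though its Z coordinate is constant.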
[Image panels a-g as described in the tabular result scheme; error colormap from -30% to 30%, confidence and type measure from 0 to 1]

Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ_2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = -6.00, W = 9.00

Error analysis (d = 94.3 %):
                  mean    sigma      min      max
relative [%]      0.20     0.04     0.00     0.34
dir'nal [°]       0.21     0.13     0.00     0.62
angular [°]       0.21     0.13     0.00     0.61
magnitude         0.05     0.02     0.01     0.12
bias             -0.01     0.00    -0.02     0.01

Description: This is the most ideal constellation, fitting perfectly to our model assumption. The sampling theorem is completely fulfilled, both in space and time. The only noise is due to (single-precision) floating point roundoff errors. The density of the motion field is maximal (only the border pixels decrease the density). Maximum errors are less than 1% in magnitude and 1° in direction and there is no bias. The plane has a paraboloid-like shape in the range image because of the radial distance measurement.

Figure 6.5: Plane perpendicular to optical axis at distance Z=3m. Two superimposed planar wave patterns of different orientation yield a plaid like texture.

[Image panels a-g; error colormap from -30% to 30%, confidence and type measure from 0 to 1]

Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ_2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00

Error analysis (d = 94.3 %):
                  mean    sigma      min      max
relative [%]      1.71     0.63     0.42     3.15
dir'nal [°]       0.83     0.22     0.39     1.33
angular [°]       0.83     0.22     0.39     1.33
magnitude         0.25     0.08     0.09     0.43
bias             -0.06     0.00    -0.06    -0.02

Description: The plane covers a distance range from 2.6m–3.9m. The approximative character of the derivative filters introduces minor errors for the farther regions where the texture gets smaller, i.e. the curvature increases. There is a minor bias toward too small velocities.

Figure 6.6: Plane at same distance but tilted with surface normal having zenith θ and azimuth φ of both 30°
[Image panels a-g; error colormap from -30% to 30%, confidence and type measure from 0 to 1]

Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ_2 = 0.001, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00

Error analysis (d = 89.0 %):
                  mean    sigma      min      max
relative [%]      0.82     1.30     0.00    13.72
dir'nal [°]       2.01     2.55     0.20    35.07
angular [°]       2.01     2.55     0.20    34.91
magnitude         0.40     0.50     0.04     6.47
bias             -0.36     0.46    -5.41    -0.02

Description: The flow estimate begins to break down for very far distances (the range is 2m–7m). For far distances the spatial sampling theorem is hardly fulfilled anymore. The derivative filters introduce large errors and the flow estimate is biased toward too small estimates. The density decreases. Notice that increasing the pattern size would improve the results (not shown). The confidence measure successfully excludes the worst region and correctly indicates where the measurements are less trustworthy.

Figure 6.7: The plane is heavily tilted with a surface normal θ = 60°, φ = −30°

[Image panels a-g; error colormap from -30% to 30%, confidence and type measure from 0 to 1]

Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ_2 = 0.0003, β = 0.3, ps = 0, ws = 5, U = 3.00, V = 6.00, W = 9.00

Error analysis (d = 86.7 %):
                  mean    sigma      min      max
relative [%]      0.27     0.28     0.00     5.69
dir'nal [°]       0.57     0.35     0.03     4.65
angular [°]       0.57     0.35     0.03     4.64
magnitude         0.12     0.07     0.02     1.06
bias             -0.09     0.07    -0.77     0.34

Description: The radial-wave-like pattern has less local structure than the plaid pattern. Around the contour lines of small (brightness) gradient magnitude, i.e. the crests and troughs of the radial wave pattern, there is an aperture problem: only line flow can be determined. Moreover, the motion estimate using this pattern is more sensitive to noise compared to the results for the plaid pattern (not shown).

Figure 6.8: Plane with a radial-wave-like pattern tilted by θ = 45°, φ = −30°
Synthetic Test Sequences a 30% c d b -30% 1 f g e Parameters nlr = 1.00 nlg = 1.50 τ = 0.0005 τ2 = 0.1 β = 0.3 ps = 0 ws = 5 U = 3.00 V = 6.00 W = 12.00 0 d=93.7 % relative [%] dir’nal [◦ ] angular [◦ ] magnitude bias mean sigma 9.04 4.74 4.75 1.79 0.02 min max 6.86 0.00 3.34 0.04 3.33 0.07 1.06 0.05 1.55 -5.97 54.26 36.33 36.29 13.00 5.63 Adding a weak noise to both data channels deteriorates the motion estimate. However, the density is still about maximum. The noise in range of σr = 1cm is about the noise of a PMD-sensor under good illumination conditions. Notice that increasing the velocity would improve the relative errors of the estimate. The error decreases approx. inverse proportional to the velocity, until the temporal sampling theorem is violated (then the motion estimate fails). Figure 6.9: Plane (θ = 25◦ , φ = 15◦ , rc = 3.5m) with weak normal i.i.d. noise added to both amplitude and range data a 30% c d b -30% 1 f g e Parameters nlr = 3.00 nlg = 3.00 τ = 0.0005 τ2 = 0.1 β = 0.3 ps = 0 ws = 5 U = 3.00 V = 6.00 W = 12.00 0 d=18.7 % mean sigma relative [%] dir’nal [◦ ] angular [◦ ] magnitude bias 23.00 11.31 11.34 4.25 0.13 min max 16.97 0.02 105.60 10.03 0.12 94.22 9.99 0.31 93.68 2.65 0.25 16.61 4.04 -14.47 14.26 A high noise level decreases the density of the motion field clearly. The remaining velocity field is roughly of correct direction and magnitude. The confidence measure values indicate that the estimates are not too trustworthy. The type measure correctly indicates a full flow estimate. The estimate can be improved by pre-smoothing the data, as the next figure demonstrates. Figure 6.10: Same plane as before but noise is increased about a factor 3 121 Chapter 6. 
Applications a 30% c d b -30% 1 f g e Parameters nlr = 3.00 nlg = 3.00 τ = 0.0005 τ2 = 0.1 β = 0.3 ps = 3 ws = 7 U = 3.00 V = 6.00 W = 12.00 0 d=91.1 % mean sigma relative [%] dir’nal [◦ ] angular [◦ ] magnitude bias 16.71 7.69 7.71 3.00 -0.32 min max 12.38 0.00 6.53 0.03 6.52 0.08 1.98 0.10 2.98 -14.06 89.15 78.70 78.32 14.78 13.33 By pre-smoothing the data by normalized averaging with a 7 × 7 binomial applicability and using a larger integration windows size (of the same size) we recovered nearly maximum density. Both magnitude and directional errors are improved. While maximum errors are still high the width of the error distribution was improved. Figure 6.11: Same plane and noise level as before but pre-smoothed a 30% c d b -30% 1 f g e Parameters nlr = 3.00 nlg = 3.00 τ =3 τ2 = 10 β = 0.3 ps = 3 ws = 7 U = 3.00 V = 6.00 W = 12.00 0 d=90.4 % mean sigma relative [%] dir’nal [◦ ] angular [◦ ] magnitude bias 16.85 5.68 5.71 2.85 1.23 min max 13.97 0.00 129.36 3.67 0.00 32.69 3.66 0.11 32.72 1.98 0.03 21.30 2.52 -5.88 15.31 Here the same pre-smoothing and integration window size was used, but the structure tensor was equilibrated. While the error in magnitude has not changed significantly the directional errors were improved clearly (particularly σ is much smaller). However, the equilibration is strangely more biased then the not equilibrated estimate. Because of the equilibration the thresholds (τ, τ2 for confidence and type measure) need to be adapted. Figure 6.12: Flow estimation using an equilibrated structure tensor 122 6.2. Synthetic Test Sequences The presented results for a plane at pure translatory motion are in good agreement with our expectations. The method is somewhat sensitive to noise. For weak noise the results are still good. For medium level noise the results are acceptable if a pre-smoothing filter is applied that is about the same size as the integration window (here 7 × 7). 
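The pre-smoothing by normalized averaging mentioned above can be sketched as follows. This is a minimal illustration and not the implementation used for the experiments: the function names, the zero-padded border handling and the certainty map are assumptions made for this example. Data and certainty are both smoothed with a separable binomial kernel, and the smoothed product is divided by the smoothed certainty, so that pixels of zero certainty (e.g. outliers) do not contribute.

```python
import numpy as np

def binomial_kernel(size):
    """1D binomial kernel of the given length, normalized to sum 1."""
    k = np.array([1.0])
    for _ in range(size - 1):
        k = np.convolve(k, [1.0, 1.0])
    return k / k.sum()

def smooth_separable(img, kernel):
    """Convolve all rows, then all columns, with zero padding (same size)."""
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def normalized_average(data, certainty, size=7, eps=1e-12):
    """Certainty-weighted binomial smoothing (normalized averaging).

    data, certainty: 2D arrays of equal shape, each side >= size.
    The certainty map is a hypothetical input, e.g. 0 for rejected pixels.
    """
    k = binomial_kernel(size)
    num = smooth_separable(data * certainty, k)
    den = smooth_separable(certainty, k)
    return num / np.maximum(den, eps)
```

Because numerator and denominator are padded in the same way, a constant image stays constant up to the border, which plain zero-padded smoothing would not achieve.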
Most important for a satisfactory result is that the sampling theorem is not violated. We also saw that if there is not enough structure in the neighborhood, a full flow cannot be estimated and the density of the motion field decreases.

6.2.2 Motion of a sphere

We now present results for a sphere in motion. This introduces additional problems compared to the plane motion, because a sphere has a curved surface and is of limited extent. There are motion boundaries present that are not explicitly modeled. The texture on the surface is of varying curvature and has discontinuities that are in conflict with the spatial sampling theorem, which is rather problematic for the employed derivative filters.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 0, ws = 5; U = 7.00, V = 4.00, W = 13.00; d = 70.9 %
quantity             mean    sigma      min      max
relative [%]         0.36     0.46     0.00     4.69
directional [°]      0.47     0.39     0.00     3.94
angular [°]          0.47     0.38     0.02     3.93
magnitude            0.14     0.12     0.00     1.21
bias                -0.01     0.15    -0.53     1.20
The errors are comparable to those of the noise-free plane of similar texture. But the density of the motion field is decreased, because in the center and on the central-right side of the sphere the discontinuities in the texture violate the spatiotemporal sampling theorem. Also, at the edge of the sphere no motion estimation is possible due to the motion boundary. The width of the missing border is about the size of the integration window.
Figure 6.13: A sphere with a radius of 50cm at a distance of 2.2m, having a sinusoidal plaid texture. Motion boundaries and the violation of the sampling theorem decrease the density of the motion field.
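As a rough guide to when the temporal sampling theorem is violated, consider an idealized texture that consists of a single sinusoid of wavelength λ (in pixels) and is displaced by u pixels per frame. Its temporal frequency is u/λ cycles per frame, which has to stay below 0.5, i.e. |u| < λ/2. The following sketch uses hypothetical helper names and a one-dimensional signal. It shows that a shift beyond this limit produces exactly the same frames as a smaller shift in the opposite direction. The additional margin required by real derivative filters is not modeled here.

```python
import numpy as np

def max_unaliased_displacement(wavelength_px):
    """Upper bound (exclusive) on the per-frame shift of a sinusoid, in pixels."""
    return wavelength_px / 2.0

def is_temporally_aliased(displacement_px, wavelength_px):
    """True if the shift per frame reaches or exceeds half the wavelength."""
    return abs(displacement_px) >= max_unaliased_displacement(wavelength_px)

def shifted_pattern(x, displacement_px, wavelength_px):
    """Sinusoidal texture sampled at positions x after one frame of motion."""
    return np.sin(2.0 * np.pi * (x - displacement_px) / wavelength_px)

x = np.arange(64)
wavelength = 8.0
fast = shifted_pattern(x, 5.0, wavelength)    # 5 px/frame, beyond the limit of 4
alias = shifted_pattern(x, -3.0, wavelength)  # 3 px/frame, opposite direction
same_frames = np.allclose(fast, alias)        # both motions yield the same frame
```

For fine textures the admissible displacement is therefore small, which is consistent with the observation that the estimate breaks down first around finely structured regions.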
[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 3, ws = 5; U = 7.00, V = 4.00, W = 13.00; d = 77.8 %
quantity             mean    sigma      min      max
relative [%]         0.41     0.58     0.00     5.58
directional [°]      0.52     0.48     0.00     5.36
angular [°]          0.52     0.48     0.00     5.36
magnitude            0.16     0.15     0.01     1.70
bias                -0.00     0.17    -1.05     1.69
The pre-smoothing removes the very fine structures and discontinuities. Therefore the sampling theorem is no longer violated and the density of the motion field is increased.
Figure 6.14: Same sphere as above, but with pre-smoothing of range and intensity data.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: nlr = 0.00, nlg = 0.00, τ = 1e-006, τ2 = 0.001, β = 0.3, ps = 3, ws = 5; U = 14.00, V = 10.00, W = 50.00; d = 32.5 %
quantity             mean    sigma      min      max
relative [%]         0.49     0.55     0.00     3.00
directional [°]      0.72     0.73     0.00     3.48
angular [°]          0.72     0.73     0.00     3.48
magnitude            0.73     0.72     0.04     3.56
bias                 0.33     0.68    -0.48     3.04
Increasing the velocity leads to a violation of the temporal sampling theorem around finely structured regions. The motion field density decreases, but the estimated velocity vectors are still very good, i.e. the relative and directional errors show no significant difference to the results for the sphere at lower speed.
Figure 6.15: Same sphere, but with increased velocity. The motion estimate breaks down around the finely textured structures.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: nlr = 1.00, nlg = 1.50, τ = 0.001, τ2 = 0.2, β = 0.3, ps = 0, ws = 7; U = 5.00, V = 7.00, W = 9.00; d = 73.8 %
quantity             mean    sigma      min      max
relative [%]         8.46     6.58     0.01    43.64
directional [°]      5.78     4.08     0.08    33.68
angular [°]          5.79     4.06     0.15    33.57
magnitude            1.75     1.01     0.05     8.09
bias                -0.15     1.34    -5.50     6.37
The integration window size was increased to 7 × 7 to compensate for the increased noise. The errors in the motion estimate are of the same magnitude as those for the plane subject to the same (weak) noise level. Only the density is smaller. No pre-smoothing was applied.
Thus we find an unestimated region in the center of the sphere.
Figure 6.16: The same sphere at a farther distance subject to weak noise, with no pre-smoothing applied.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: nlr = 1.00, nlg = 1.50, wσ = 2.00, τ = 0.001, τ2 = 0.2, β = 0.3, ps = 0, ws = 7, wμ = 0.60; U = 5.00, V = 7.00, W = 9.00; d = 81.3 %
quantity             mean    sigma      min      max
relative [%]         9.06     6.77     0.00    53.97
directional [°]      6.16     3.90     0.09    36.56
angular [°]          6.17     3.88     0.09    36.43
magnitude            1.85     0.99     0.08     7.77
bias                -0.09     1.35    -5.53     4.89
Here we demonstrate how the proposed gradient based weighting of the data terms integrated by the structure tensor can increase the density of the estimate without reducing its quality. The errors are comparable to the prior example, but the density is increased.
Figure 6.17: Identical to the prior example, but with gradient based weighting applied. The density of the estimated motion field is increased.

[Figure panels a to g; color scales from -50% to 50% and from 0 to 1]
Parameters: nlr = 3.00, nlg = 3.00, wσ = 0.50, τ = 0.0009, τ2 = 0.1, β = 0.3, ps = 3, ws = 9, wμ = 0.00; U = 10.00, V = 15.00, W = 18.00; d = 81.0 %
quantity             mean    sigma      min      max
relative [%]        13.50    10.64     0.00   115.92
directional [°]      9.53     6.97     0.07    53.38
angular [°]          9.53     6.97     0.09    53.32
magnitude            5.78     3.80     0.18    38.67
bias                 3.12     4.28   -10.59    33.47
Here we increased the integration window size to 9 × 9 and applied pre-smoothing. Gradient based weighting improves the density here too, at a small loss of quality in the estimate (2% in relative magnitude, insignificant in angular error) compared to the binomially weighted structure tensor (not shown).
Figure 6.18: Sphere corrupted by noise of the same level as used in figures 6.10 and 6.12. Pre-smoothing and gradient based weighting yield an acceptable quality at a high density of the motion field.

The estimated motion fields for the sphere we presented are of similar quality as those for the plane motion.
Motion boundaries do not increase the errors in the velocity field, but they are excluded by the confidence measure and therefore reduce the density of the motion field. Gradient based weighting can increase the density of the motion field at only a small loss of quality. It is important to notice that the ratio of density to error of the motion field is not fixed but depends on the chosen parameters. In particular, the thresholds τ and τ2 on confidence and type measure regulate whether an estimate is a full flow or not. Typically, increasing τ increases the density at a cost of quality. The right choice of the threshold depends on the noise level of the signal, which may vary over the image. While weighting the input data during pre-smoothing according to a confidence measure (as discussed in section 3.1.5) is a first step, an automatic adaption to the noise level would be desirable. However, the author is not sure how this could be done best. Some analysis along the lines of a Wiener filter may be appropriate, but a detailed analysis of this topic is still outstanding.

6.3 Real World Sequences

After the proof of concept on simulated data, we demonstrate the performance of the motion estimation algorithm on real world sequences. We used sequences of the pyramid target moving in horizontal direction and parallel to the optical axis. The input data are mean images (from the step mode acquisition described in section 5.2.2) which, compared to live sequences, have a lower noise level and show no motion artifacts. The results are presented in the same tabular scheme as for the synthetic sequences. The meaning of the error colormaps has changed slightly. While in the previous figures it was the percentage error w.r.t. the single ground-truth velocity components U, V and W, it is now the percentage error w.r.t. the ground-truth velocity magnitude, i.e. all three error images have the same scale.
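The difference between the two error scalings can be illustrated by a short sketch. The function names, the (H, W, 3) array layout and the signed error convention are assumptions made for this example and are not taken from the implementation behind the figures.

```python
import numpy as np

def percent_error_per_component(est, gt):
    """Signed error of each component relative to its own ground-truth value.

    est: (H, W, 3) array of estimated velocities (U, V, W) per pixel.
    gt:  length-3 ground-truth velocity shared by all pixels.
    """
    gt = np.asarray(gt, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return 100.0 * (est - gt) / gt

def percent_error_vs_magnitude(est, gt):
    """Signed error of each component relative to the ground-truth magnitude."""
    gt = np.asarray(gt, dtype=float)
    return 100.0 * (est - gt) / np.linalg.norm(gt)

# Example: pure motion in depth, as in the pyramid sequences (U = V = 0).
gt = [0.0, 0.0, -6.0]
est = np.zeros((2, 2, 3)) + np.array([0.3, 0.0, -6.6])
err = percent_error_vs_magnitude(est, gt)
```

With the magnitude-based scaling all three error images share one scale. For a ground truth with vanishing components, such as (0, 0, -6), the per-component variant is undefined for U and V, whereas the magnitude-based variant remains finite.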
The definition of the density d has changed slightly too (because we have no ground truth velocity field of the whole scene, but only the ground truth velocity of the target). It is defined as the number of pixels with a full flow estimate and a confidence of at least 0.5, divided by the number of pixels that have an amplitude higher than a specific threshold (which is 20 for all pyramid sequences). All motion fields were calculated using a fixed scaling exponent a = 3 (see equation (4.32)). This is a sane compromise for the distance range of 1.8–4.8m present in the used sequences, w.r.t. the varying and not fully understood dependence of a on the distance as depicted in figure 5.13 and discussed in section 5.2.3.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: ps = 0, wσ = 2.00, τ = 5e-006, τ2 = 0.1, β = 0.3, ws = 7, wμ = 0.50; U = 0.00, V = 0.00, W = -6.00; d = 45.9 %
quantity             mean    sigma      min      max
relative [%]         2.92     2.18     0.00    11.30
directional [°]      4.15     2.32     0.12    19.02
angular [°]          4.11     2.28     0.24    18.79
magnitude            0.49     0.24     0.03     2.10
bias                -0.01     0.22    -0.60     0.67
The pyramid target performs a pure translation in depth. The errors are of the order of the spatial resolution of the sensor. This is not trivial, as the range sequence appears divergent. The estimate is not biased for this frame, but generally there is a bias for uncalibrated camera data that corresponds to the periodic phase errors shown in figure 5.6.
Figure 6.19: Translation of the pyramid target in Z-direction at central position. The density and quality of the motion field are good.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.1, β = 0.3, ws = 7, wμ = 0.50; U = 0.00, V = 0.00, W = -6.00; d = 18.8 %
quantity             mean    sigma      min      max
relative [%]         6.64     4.21     0.03    26.67
directional [°]      3.88     2.10     0.04    13.80
angular [°]          3.91     2.02     0.52    13.62
magnitude            0.62     0.25     0.07     1.68
bias                 0.36     0.28    -0.33     1.58
The target is no longer in a central position.
The confidence of the left and right triangle of the pyramid is about zero. While in the central position (of the prior example) the triangles have a reduced confidence too, it is much more pronounced in the border positions. The errors of the estimate are increased only in magnitude, not in direction.
Figure 6.20: Translation of the pyramid target in Z-direction at left position. The density of the estimate is clearly reduced.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.1, β = 0.3, ws = 7, wμ = 0.50; U = 0.00, V = 0.00, W = -6.00; d = 19.8 %
quantity             mean    sigma      min      max
relative [%]         6.32     4.15     0.01    24.86
directional [°]      3.98     2.71     0.02    33.32
angular [°]          4.00     2.65     0.30    32.99
magnitude            0.61     0.32     0.04     4.10
bias                 0.35     0.25    -0.25     1.47
The fixed pattern noise was removed by calibration. While the range and amplitude data look much better, the motion estimate is not significantly improved. Sigma of the angular error even got worse. Only the magnitude error shows a minor improvement.
Figure 6.21: Same as before, but the fixed pattern noise was removed. Interestingly, the estimate is not improved.

[Figure panels a to g; color scales from -30% to 30% and from 0 to 1]
Parameters: ps = 0, wσ = 2.00, τ = 1e-005, τ2 = 0.05, β = 0, ws = 7, wμ = 0.50; U = 0.00, V = 0.00, W = -6.00; d = 18.1 %
quantity             mean    sigma      min      max
relative [%]         9.96     5.90     0.16    37.29
directional [°]      7.75     5.20     0.15    26.48
angular [°]          7.73     5.09     0.61    26.17
magnitude            1.10     0.56     0.19     3.00
bias                 0.51     0.38    -0.59     2.12
Here we demonstrate that a motion estimation fails to determine a full flow if it does not take advantage of the amplitude information using the extended BCCE (4.33). While the confidence in the estimate is increased, only line or plane flows can be estimated (not shown). The errors are clearly increased.
Figure 6.22: No amplitude information was used for the motion estimate (β = 0). The aperture problem allows an estimate only at the diagonal edges of the pyramid.

[Figure panels a to g; color scales from -100% to 100% and from 0 to 1]
Parameters: ps = 0, wσ = 2.00, τ = 6e-006, τ2 = 0.3, β = 0.3, ws = 7, wμ = 0.50; U = -1.00, V = 0.00, W = 0.00; d = 20.3 %
quantity             mean    sigma      min      max
relative [%]        47.65    29.47     0.08   181.15
directional [°]     41.90    41.83     0.41   175.36
angular [°]         28.75    16.80     1.11    92.92
magnitude            0.70     0.37     0.04     2.18
bias                 0.49     0.53    -1.77     2.11
A horizontal translation of the pyramid target is rather problematic for the motion estimation. Besides the missing information in the left and right triangle, there is a clear bias in the estimate of the upper and lower triangle. The bias contributes about 70% of the magnitude error. The angular error is large. Pre-smoothing the data will not help to solve the problem. Depending on the chosen threshold it will either decrease the density or increase the errors.
Figure 6.23: Horizontal translation of the pyramid target. There is a strong bias in the motion estimate.

The motion estimates for the pyramid sequences (figures 6.19 to 6.23) show rather heterogeneous results. While pure motions in Z-direction could be estimated well, the results for the estimates in X-direction are rather poor. The problem is not due to noise in the data, as this is really low. Interestingly, removing the fixed pattern noise increased the standard deviation of the angular errors! The author thinks that the pyramid target is a quite difficult object for motion estimation. While it looks rather optimal at first, as it has structure in both range and amplitude information, a second look reveals that the texture information is of a kind that is suboptimal for our algorithm. The texture in the amplitude is due to shadows and varying angles of reflectance. This kind of texture is not modeled, as we assume a spatially but not temporally varying reflectivity on the object surface, i.e. the reflectivity texture on the surface is constant in time. Moreover, the step edges in the range data of the pyramid are hard to handle properly by the derivative filters.
At each edge there is a discontinuity and on each step we find an aperture problem. This might be addressed by pre-smoothing, but it seems that the integration window size is too small to arrive at a unique estimate. Possibly a multiscale approach could help, but we are not sure about this. An open question in this context is whether the simple pinhole camera model introduces significant errors. Therefore a geometric calibration of the camera is desirable.

Figure 6.22 showed the benefit of taking both information channels of the PMD-signal into account. If only the range information is used, the errors at similar density increase by about 50% in magnitude and 100% in direction.

We remarked with figure 6.19 that the motion estimates are generally biased if the camera is not properly calibrated in range, with a method similar to the one exemplified in section 5.2.2. Figure 6.24 shows the motion estimation results for an uncalibrated sequence of the checkerboard target performing a motion along the optical axis. Besides the decreased density due to the low reflectivity patches (pixels of low amplitude are excluded from the motion field), we see the bias of the motion estimate that varies with the range in correspondence to the measurements 5.6 and 5.10.

[Figure panels a, b, c: X-velocity, Y-velocity, Z-velocity; axes: frame and diagonal-pixels; panel d: amplitude image]
Figure 6.24: Checkerboard target translation of 60mm/frame in Z-direction from 1.2m–6.0m: a, b and c velocity components U, V and W [mm/frame] at the pixel positions along the (magenta) diagonal, shown in the amplitude image d (at 2.2m). The variation of W with the distance corresponds to the periodic phase error of the camera.

The results for the pyramid target are not satisfactory. Therefore, we have a look at a real world sequence in a more natural setup, which is depicted in figure 6.25. While there is no ground truth motion data for this kind of sequence, we can at least check whether the results are consistent with the apparent motion. We used the average velocity of the moving person calculated from the motion field as "ground truth". Because there are at least two major motions in the sequence, the mean errors are not too meaningful. More interesting are the standard deviations, as they should be an upper limit of the true standard deviation because we falsely treat two motions as one w.r.t. the error statistic. Nevertheless sigma is relatively small, which indicates that our estimate is consistent w.r.t. the apparent major motions. The similar color in the velocity components of each of the two moving objects indicates this too.

[Figure panels a to g; color scales from -100% to 100% and from 0 to 1]
Parameters: ps = 1, wσ = 2.00, τ = 0.001, τ2 = 0.1, β = 0.3, ws = 7, wμ = 0.50; U = 7.00, V = 3.00, W = -25.00; d = 53.3 %
quantity             mean    sigma      min      max
relative [%]        35.26    25.73     0.05    99.14
directional [°]     13.38     9.07     0.20    52.26
angular [°]         13.56     9.35     0.33    78.63
magnitude           10.88     6.28     0.22    26.54
bias                -3.73     9.58   -21.57    22.08
The errors given here are a very rough approximation. We took the approximate mean velocity of the person and used it as ground truth for the sequence. Because the robot moves much slower than the person, it is of red color in the W-error-map (e). We find the motion field to be consistent with the apparent motion. Moreover, it has a high density (except for the missing head and the motion boundaries).
Figure 6.25: Real world sequence with a person moving toward the camera and a robot that moves in the same direction but much slower.

6.4 Summary

We have shown that the proposed local motion estimation method yields correct motion fields for synthetic range sequences that show translatory motion.
Weak to medium level i.i.d. noise in the range and amplitude data can be compensated by appropriate pre-smoothing, if necessary at all. The method is numerically stable, i.e. (small) errors in the input are not magnified in the output. Therefore the quality of the motion field might be reduced due to noise, but only relative to the level of noise introduced. The method is, however, sensitive to a correct spatiotemporal sampling of the input data. Motions that are large relative to the spatial wavelength of the range or amplitude data (i.e. its range-structure or reflectivity-texture) cannot be handled directly, as they violate the temporal sampling theorem. An extension of the method to a multiscale implementation similar to the coarse-to-fine strategy developed by Black and Anandan [BA96] would be necessary to overcome this problem. More advanced techniques including a regularization of the motion field as introduced by Bruhn, Weickert, and Schnoerr [BWS05] seem applicable too. Very fine (spatial) textures are problematic w.r.t. the employed derivative filters, independent of the magnitude of the motion, and need to be addressed by pre-smoothing the data.

The results for the real world sequences are heterogeneous. Most likely due to the discussed problematic nature of the pyramid target w.r.t. our model assumptions, we could not obtain satisfactory results for a translation in horizontal direction, while the results for a motion in depth are in very good agreement with the true motion field. Analysis of a motion sequence in a natural setup showed the consistency of the estimated motion field with the apparent motion. The density of the locally determined motion field is pleasantly high.

The aperture problem was successfully addressed by using both the range and the amplitude signal of the PMD-sensor. Combining both data channels yields, in most cases, a higher quality and density of the determined motion field.

Chapter 7 Conclusion and Outlook

In the previous chapters we have presented and discussed methods from a large number of research fields, with the final goal of extending the possibilities of image processing w.r.t. the analysis of dynamic processes captured in image sequences. In particular we want to estimate correct, physical, three dimensional motion fields. The image sequences of interest are acquired with range cameras based on the TOF distance measuring principle realized in PMD-technology. In the following we give a summary of the major topics we worked on and of how well we met our goal. The author states his personal evaluation of PMD-technology w.r.t. motion estimation and highlights important topics that should be addressed regarding the sensor technology as well as algorithmic aspects of motion estimation.

7.1 Summary

To optimally exploit the extended information content that a PMD-camera features, we need to understand where this information originates. Therefore we analyzed the basic measuring principle and showed possible sources of errors that can be avoided if specific details about the technical realization are known; in particular, the blind application of the common formula (2.12), which is actually valid only for sinusoidal modulation (and even harmonics of higher order), can lead to pronounced systematic errors if the exact modulation of the emitted (infrared) light is more rectangular than sinusoidal. We presented formula (2.14) to calculate the correct range and amplitude for a rectangularly modulated optical signal. Actually, the producers of the cameras should be aware of these problems and correct or, better, avoid them. However, for current camera models this is not the case yet.

We have shown a new way to correct for one of the most prominent systematic errors by means of a calibration based on Fourier approximation.
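A generic version of such a calibration can be sketched as a least-squares fit of a truncated Fourier series to the range error. This is an illustration under stated assumptions and does not reproduce the calibration procedure of this thesis: the function names, the error model as a function of the measured range, the `period` parameter (e.g. the unambiguous range of the camera) and the series order are all hypothetical choices.

```python
import numpy as np

def fourier_design(r, period, order):
    """Design matrix with columns 1, cos(k w), sin(k w) for k = 1..order."""
    w = 2.0 * np.pi * np.asarray(r, dtype=float) / period
    cols = [np.ones_like(w)]
    for k in range(1, order + 1):
        cols += [np.cos(k * w), np.sin(k * w)]
    return np.stack(cols, axis=1)

def fit_range_error(r_measured, r_reference, period, order=4):
    """Least-squares Fourier coefficients of the error r_measured - r_reference."""
    r_measured = np.asarray(r_measured, dtype=float)
    error = r_measured - np.asarray(r_reference, dtype=float)
    design = fourier_design(r_measured, period, order)
    coeffs, *_ = np.linalg.lstsq(design, error, rcond=None)
    return coeffs

def correct_range(r_measured, coeffs, period):
    """Subtract the modeled periodic error from the measured range."""
    order = (len(coeffs) - 1) // 2
    r_measured = np.asarray(r_measured, dtype=float)
    return r_measured - fourier_design(r_measured, period, order) @ coeffs
```

In this sketch the calibration result consists of only 2 · order + 1 coefficients, and applying the correction requires one evaluation of the series per range value.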
The correction of the error, after calibration has been done, can be calculated easily and from very few data. Therefore an implementation of the correction algorithm directly on the camera should be unproblematic. We also discussed several further errors of the sensor and derived, based on the statistical errors, an uncertainty measure for the acquired range data, which can be used in further image processing steps. Thus, we might increase the accuracy of results or at least arrive at a better estimate of the magnitude of the possible errors, i.e. a confidence measure. We successfully used the uncertainty measure within a novel extension to B-spline channel smoothing that we named two state smoothing.

We introduced the necessary theoretical foundation for motion estimation from image sequences. We use the range flow constraint equation in a novel formulation that embeds the used pinhole camera model analytically and reduces the number of necessary filter operations. To improve motion estimation results w.r.t. the unavoidable aperture problem, we extended the brightness change constraint equation for the PMD-camera, assuming the irradiance to follow an (inverse) power law. Thus we can take both BCCE and RFCE into account and use both in a combined structure tensor approach that is computationally efficient. We applied our method successfully to synthetic and real world data, achieving rather dense motion fields without the need for regularization. However, the method still has weaknesses that need to be addressed: the algorithm fails to estimate a satisfactory motion field under specific conditions for which, within the limits of the method, an estimate should be possible.

Within the project Lynkeus, funded by the Federal Ministry of Education and Research (BMBF), we implemented an interface to our algorithm that allows it to be used within a complex runtime environment.
Figure 7.1 shows the range flow estimation module within a simple configuration of the Lynkeus runtime environment (RTE)∗.

∗ The Lynkeus RTE is developed by the collaboration partner Elektrobit.

Figure 7.1: A prototype of the range flow module developed for the Lynkeus runtime environment.

Furthermore, we developed a method for motion artifact detection and utilized it in conjunction with two state channel smoothing for the successful realization of a demonstrator within the project Smartvision. Figure 7.2 shows the GUI to our software and the associated experimental setup (from the collaboration partner Schmersal), which demonstrates the application of PMD technology in the field of safety engineering.

Figure 7.2: GUI and experimental setup, demonstrating the application of PMD technology in the field of safety engineering.

7.2 Evaluation and Outlook

First of all, the author would like to state that he considers PMD technology a promising new method to acquire range maps of the close surroundings quickly and with rather little effort. The ease of use and installation may be one of the major advantages of this technology.

However, the method still has several limitations that hopefully will be addressed in the near future. The most important from the author's point of view are:

framerate: the current frame rates of at most 20 frames/sec are rather problematic for the high accuracy motion estimates that our method aims at.

dynamic range: due to the dependence of pixel irradiance on object distance, the dependence of range measurement accuracy on irradiance, and the limited capacitance of the pixels' storage sites (see section 2.2.2.2), the dynamic range of a PMD-sensor is critical w.r.t. its applicability in real world tasks. Techniques like multiple exposure and adaptive illumination should be exploited to overcome the shortcomings of current sensor models w.r.t. this topic.
systematic errors: a better calibration of the camera is possible and should be done on the producers' side. While some problems like the interdependence of range and amplitude/reflectivity are not well understood yet, there are still several errors that can be corrected easily.

motion artifacts: the problem of motion artifacts should be addressed technically by taking the necessary correlation samples at the same time. While prototype PMDs of this kind were realized as 4-tap lock-in pixels by Lange [Lan00], their realization seems to contradict other optimization criteria of the CMOS-process, for instance the fill factor. However, for the sake of a simple and robust analysis of dynamic image data, the problem should be addressed nevertheless.

objective: the optical elements used for some camera models seem to be partially suboptimal. Scattered light can influence the range measurements very badly.

temperature: the range measurement is not stable over time, but depends clearly on the temperature of the sensor (however, this problem is more serious for the SR3000 than for the O3D and PMD19k).

documentation: a better technical documentation of specific features of the camera would be of great benefit, particularly regarding the demodulation contrast and the modulation of the light source.

With respect to our own algorithms there is room for various extensions and improvements:

• So far, the thresholds necessary for the eigenvalue analysis of the structure tensor (and hence for motion estimation) need to be tuned manually. While the same thresholds can be used for various kinds of sequences, the need to adapt to local and global noise levels remains. A detailed analysis of the novel constraint equations w.r.t. error propagation and particularly equilibration is still outstanding.
An automatic adaption of the thresholds based on this analysis seems possible, particularly if we consider the uncertainty measure that can be derived from the PMD-signal. An additional local analysis in the manner of a Wiener filter might be helpful for noise estimation; a local analysis of the variation of the eigenvalues of the structure tensor might also be appropriate, but is computationally costly.

• We used the subspace regularization scheme presented in [SG02], but found the improvements in quality disappointing compared to the increase in computational costs. Meanwhile there exist more advanced concepts of global methods incorporating local estimates. An extension of our method towards the concepts and techniques presented in [BWS05; Pap+06; Bru+06] seems attractive. This would allow the estimation of dense flow fields in the presence of locally extended aperture problems and in situations where the temporal sampling theorem is partially violated. At the moment our method is limited to rather "well-behaved" image data, i.e. data that complies with the sampling theorem spatially and temporally. With the current limitations of the cameras in frame rate and resolution and the given (rather high) noise level, such sequences can only be acquired for a limited number of real world applications.

• An additional analysis of the method on more realistic simulated test data is advised. Originally we also wanted to test the algorithms on sequences created with the TOF-simulator developed by Keller et al. [Kel+07]. Unfortunately the simulator was not yet capable of simulating textures on surfaces, which are essential to our method.

• Finally, a more thorough study of the interdependence of range and amplitude/reflectivity measurement is of interest for an exact motion estimate. A largely uncertain property in this context is the amplitude/irradiance dependent demodulation contrast of the various sensors.
It could be essential for an optimal modeling of the irradiance used in the extended BCCE.

Part III Appendices

Acronyms and Notation

Acronyms and Abbreviations

APS       Active Pixel Sensor
CCD       Charge Coupled Device
CMOS      Complementary Metal–Oxide–Semiconductor
CTE       Charge Transfer Efficiency
DFT       Discrete Fourier Transform
DRNU      Dark Response NonUniformity
FFT       Fast Fourier Transform
FT        Fourier Transform
i.i.d.    Independent and Identically Distributed
MTF       Modulation Transfer Function
NLS       Nonlinear Least Squares
OLS       Ordinary Least Squares
pdf       Probability Density Function
PMD       Photonic Mixer Device
PRNU      Photo Response NonUniformity
PSF       Point Spread Function
PTLS      Partial Total Least Squares
SNR       Signal-to-Noise Ratio
SVD       Singular Value Decomposition
TLS       Total Least Squares
TOF       Time Of Flight
w.l.o.g.  Without Loss Of Generality

General notation

⟨a, b⟩        Standard dot product between vectors a and b
⟨v⟩           Expectation value or average w.r.t. random variable or set v
⟨⟨A, B⟩⟩      Dot product between matrices A and B
v̂             Estimate of a (random) variable v in a statistical sense
i             Imaginary unit √−1
M             m × n matrix
diag a        A diagonal matrix with vector a on its diagonal
I             Identity matrix
tr A          The trace of matrix A
𝟙             Unit matrix of ones
B             Binomial convolution operator or averaging operator
O, P, Q, ...  Calligraphic letters indicate a representation-independent operator
v             Column vector
m̄             Statistical mean (either arithmetic mean or population mean) over a set of measurements m
v̂             Normalized or unit vector
(vᵢ)          Vector v with components vᵢ
aₙ            The nth vector of a sequence of vectors
vᵀ, Mᵀ        Transposed (column) vector (i.e. row vector) or matrix

Greek Symbols

ϕ, θ  Phase of a periodic signal
σ     Standard deviation of a normal distribution

Latin Symbols

ℂ   Complex numbers
ℕ   Natural numbers
ℝ   Real numbers
ℝⁿ  n-dimensional vector space over ℝ

Bibliography

[Alb07] Martin Albrecht. “Untersuchung von Photogate-PMD-Sensoren hinsichtlich qualifizierender Charakterisierungsparameter und -methoden”. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2007.
[BA96] M. J. Black and P. Anandan. “The robust estimation of multiple motions: parametric and piecewise-smooth flow fields”. In: Computer Vision and Image Understanding 63 (1996). Pp. 75–104.
[Bar00] E. Barth. “The Minors of the Structure Tensor”. In: DAGM. Kiel, Germany 2000. Pp. 221–228.
[Bar+03] E. Barth et al. “Spatio-temporal Motion Estimation for Transparency and Occlusions”. In: Proceedings of IEEE International Conference on Image Processing. 2003.
[Big06] Josef Bigun. Vision with direction. Berlin: Springer Verlag, 2006. URL: http://www2.hh.se/staff/josef/.
[Bla+98] M. J. Black et al. “Robust anisotropic diffusion”. In: IEEE Transactions on Image Processing 7.3 (Mar. 1998). Pp. 412–432.
[BR96] Michael J. Black and Anand Rangarajan. “On the Unification of Line Processes, Outlier Rejection and Robust Statistics with Applications in Early Vision”. In: International Journal of Computer Vision 19.1 (July 1996). Pp. 57–92.
[Bru+06] A. Bruhn et al. “A Multigrid Platform for Real-Time Motion Computation with Discontinuity-Preserving Variational Methods”. In: International Journal of Computer Vision 70.3 (2006). Pp. 257–277. DOI: 10.1007/s11263-006-6616-7.
[BWS05] A. Bruhn, J. Weickert, and C. Schnoerr. “Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods”. In: International Journal of Computer Vision 61.3 (2005). Pp. 211–231.
[DD02] Frédo Durand and Julie Dorsey. “Fast bilateral filtering for the display of high-dynamic-range images.” In: SIGGRAPH. Ed. by Tom Appolloni. ACM, 2002. Pp. 257–266. ISBN: 1-58113-521-1. URL: http://dblp.uni-trier.de/db/conf/siggraph/siggraph2002.html#DurandD02.
[Far03] Gunnar Farnebäck. “Two-Frame Motion Estimation Based on Polynomial Expansion”. In: Proceedings of the 13th Scandinavian Conference on Image Analysis. LNCS 2749. Gothenburg, Sweden 2003. Pp. 363–370.
[FFS06] Michael Felsberg, Per-Erik Forssén, and Hanno Scharr. “Channel Smoothing: Efficient Robust Smoothing of Low-Level Signal Features.” In: IEEE Trans. Pattern Anal. Mach. Intell. 28.2 (Aug. 23, 2006). Pp. 209–222. URL: http://dblp.uni-trier.de/db/journals/pami/pami28.html#FelsbergFS06.
[FGW02] Per-Erik Forssén, G. H. Granlund, and J. Wiklund. Channel Representation of Colour Images. Technical Report LiTH-ISY-R-2418. Dept. of Electrical Eng., Linköping Univ., 2002.
[For04] Per-Erik Forssén. “Low and Medium Level Vision using Channel Representations”. Dissertation No. 858, ISBN 91-7373-876-X. PhD thesis. SE-581 83 Linköping, Sweden: Linköping University, Sweden, 2004.
[For98] Bengt Fornberg. “Calculation of Weights in Finite Difference Formulas”. In: SIAM Review 40.3 (1998). Pp. 685–691.
[Fos93] E. R. Fossum. “Active pixel sensors: are CCDs dinosaurs?” In: Charge-Coupled Devices and Solid State, Optical Sensors III. Ed. by M. M. Blouke. Vol. 1900. Presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference. 1993. Pp. 2–14.
[Fow+98] B. Fowler et al. “A Method for Estimating Quantum Efficiency for CMOS Image Sensors”. In: Proc. SPIE 3301 (1998). Pp. 178–185.
[FPA99] C. Fermueller, R. Pless, and J. Aloimonos. “Statistical Biases in Optic Flow”. In: CVPR’99. Fort Collins, Colorado 1999.
[Fra07] Mario Frank. “Investigation of a 3D Camera”. MA thesis. Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, 2007.
[FSF02] Michael Felsberg, Hanno Scharr, and Per-Erik Forssén. The B-Spline Channel Representation: Channel Algebra and Channel Based Diffusion Filtering. Tech. Report LiTH-ISY-R-2461. Dept. of Electrical Eng., Linköping Univ., 2002.
[GHO99] G. H. Golub, P. C. Hansen, and D. P. O’Leary. “Tikhonov Regularization and Total Least Squares”. In: SIAM Journal on Matrix Analysis and Applications 21.1 (1999). Pp. 185–194.
[GK95] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Dordrecht, The Netherlands: Kluwer Academic, 1995.
[GL80] G. H. Golub and C. F. van Loan. “An Analysis of the Total Least Squares Problem”. In: SIAM Journal on Numerical Analysis 17.6 (Dec. 1980). Pp. 883–893.
[GL96] G. H. Golub and C. F. van Loan. Matrix Computations. 3rd ed. Baltimore and London: The Johns Hopkins University Press, 1996.
[Gov06] V. M. Govindu. “Revisiting the Brightness Constraint: Probabilistic Formulation and Algorithms”. In: ECCV. 2006. III: 177–188.
[Grö03] Hermann Gröning. “Radiometrische Kalibrierung und Charakterisierung von CCD- und CMOS Bild-Sensoren und Monokulares 3D-Tracking in Echtzeit”. PhD thesis. University of Heidelberg, 2003. URL: http://www.ub.uni-heidelberg.de/archiv/3589.
[Had02] J. Hadamard. “Sur les problèmes aux dérivées partielles et leur signification physique”. In: Princeton University Bulletin (1902). Pp. 49–52.
[Hei01] Horst G. Heinol. “Untersuchung und Entwicklung von modulationslaufzeitbasierten 3D-Sichtsystemen”. German. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001. P. 157.
[HF01] H. Haußecker and D. J. Fleet. “Computing Optical Flow with Physical Models of Brightness Variation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 23.6 (June 2001). Pp. 661–673.
[Hil82] E. C. Hildreth. “The Integration of Motion Information along Contours”. In: IEEE Workshop on Computer Vision, Representation and Control. 1982. Pp. 83–91.
[Hor86] B. K. P. Horn. Robot Vision. Cambridge, MA: MIT Press, 1986.
[Hor87] B. K. P. Horn. “Motion fields are hardly ever ambiguous”. In: Int. J. of Computer Vision 1 (1987). Pp. 259–274.
[HS99] Horst Haußecker and Hagen Spies. “Motion”. In: Handbook of Computer Vision and Applications. Ed. by Bernd Jähne, Peter Geißler, and Horst Haußecker. Vol. 2: Signal Processing and Pattern Recognition. Academic Press, 1999. Chap. 13.
[Häu+99] G. Häusler et al. “Three-Dimensional Sensors - Potentials and Limitations”. In: Handbook of Computer Vision and Applications. 1. Academic Press, 1999. Pp. 485–506.
[Hub81] P. J. Huber. Robust Statistics. New York: John Wiley and Sons, 1981.
[JDD03] Thouis R. Jones, Frédo Durand, and Mathieu Desbrun. “Non-iterative, feature-preserving mesh smoothing.” In: ACM Trans. Graph. 22.3 (Feb. 9, 2003). Pp. 943–949. URL: http://dblp.uni-trier.de/db/journals/tog/tog22.html#JonesDD03.
[JGH99] Bernd Jähne, Peter Geißler, and Horst Haußecker. Handbook of Computer Vision and Applications. San Diego: Academic Press, 1999.
[JH00] Bernd Jähne and Horst Haußecker. Computer Vision and Applications: A Guide for Students and Practitioners. Academic Press, 2000.
[Jäh02] B. Jähne. Digital Image Processing. 5th ed. Berlin, Germany: Springer, 2002.
[Jäh04] Bernd Jähne. Practical Handbook on Image Processing for Scientific and Technical Applications. 2nd ed. Boca Raton, London, New York, Washington, D.C.: CRC Press, 2004.
[JHG99] B. Jähne, H. Haußecker, and P. Geißler. “Neighborhood Operators”. In: Handbook of Computer Vision and Applications. 5 2. Academic Press, 1999. Pp. 93–124.
[JSK99] B. Jähne, H. Scharr, and S. Körkel. “Principles of Filter Design”. In: Handbook of Computer Vision and Applications. Ed. by B. Jähne, H. Haußecker, and P. Geißler. Vol. 2. Academic Press, 1999. Pp. 125–151.
[Jus01] Detlef Justen. “Untersuchung eines neuartigen 2D-gestützten 3D-PMD-Bildverarbeitungssystems”. German. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001.
[Kel+07] M. Keller et al. “A Simulation Framework for Time-Of-Flight Sensors”. In: International Symposium on Signals, Circuits and Systems (ISSCS). Vol. 1. Iasi, Romania: IEEE CAS Society, 2007. Pp. 125–128.
[Kla05] M. Klar. “Design of an endoscopic 3-D Particle-Tracking Velocimetry system and its application in flow measurements within a gravel layer”. PhD thesis. University of Heidelberg, 2005. URL: http://archiv.ub.uni-heidelberg.de/volltextserver/volltexte/2005/5961/pdf/klar_PHD2005.pdf.
[KRI06] T. Kahlmann, F. Remondino, and H. Ingensand. “Calibration for increased accuracy of the range imaging camera SwissRanger”. In: International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVI.5 (2006). Pp. 136–141.
[Köt03] U. Köthe. “Edge and junction detection with an improved structure tensor”. In: Proc. of 25th DAGM Symposium. Ed. by B. Michaelis and G. Krell. Vol. 2781. Lecture Notes in Computer Science. DAGM. Magdeburg 2003. Pp. 25–32. URL: http://kogs-www.informatik.uni-hamburg.de/~koethe/papers/structureTensor.pdf.
[KW93] H. Knutsson and C. F. Westin. “Normalized and Differential Convolution: Methods for Interpolation and Filtering of Incomplete and Uncertain Data”. In: CVPR. New York City 1993. Pp. 515–516.
[Lan00] Robert Lange. “Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors in CMOS/CCD-Technology”. English. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2000.
[LK06] M. Lindner and A. Kolb. “Lateral and Depth Calibration of PMD-Distance Sensors”. In: International Symposium on Visual Computing (ISVC06). Vol. 2. Lake Tahoe, Nevada: Springer, 2006. Pp. 524–533. ISBN: 978-3-540-48626-8.
[LP02] Prof. Dr. Wolfgang von der Linden and DI Alexander Prüll. Wahrscheinlichkeitstheorie, Statistik und Datenanalyse. Course-Script, Institute of Theoretical and Computational Physics, TU Graz, 2002. URL: http://itp.tugraz.at/LV/wvl/Statistik/A_WS_pdf.pdf.
[LS01] R. Lange and P. Seitz. “Solid-state time-of-flight range camera”. In: Quantum Electronics, IEEE Journal of 37.3 (2001). Pp. 390–397.
[Lua01] Xuming Luan. “Experimental Investigation of Photonic Mixer Device and Development of TOF 3D Ranging Systems Based on PMD Technology”. English. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2001.
[MF81] L. Mortara and A. Fowler. “Evaluations of charge-coupled device (CCD) performance for astronomical use”. In: Proc. SPIE 290 (1981). Pp. 28–33.
[Müh04] Matthias Mühlich. “Estimation in Projective Spaces and Applications in Computer Vision”. PhD thesis. Johann Wolfgang Goethe Universität in Frankfurt am Main, 2004.
[MM01] Matthias Mühlich and Rudolf Mester. “Subspace Methods and Equilibration in Computer Vision”. In: Proceedings of Scandinavian Conference on Image Analysis SCIA 2001 Bergen. 2001.
[MSB01] C. Mota, I. Stuke, and E. Barth. “Analytic solutions for multiple motions”. In: Proc. of International Conference on Image Processing. Vol. 2. 2001. Pp. 917–920.
[MV06] Kurt Meyberg and Peter Vachenauer. Differentialgleichungen, Funktionentheorie, Fourier-Analysis, Variationsrechnung. Vol. 2. Höhere Mathematik. Berlin; Heidelberg: Springer, 2006. XIII, 457 S. ISBN: 3-540-41851-2, 978-3-540-41851-1.
[NGK94] Klas Nordberg, Gösta H. Granlund, and Hans Knutsson. “Representation and Learning of Invariance”. In: ICIP (2). 1994. Pp. 585–589. URL: http://dblp.uni-trier.de/db/conf/icip/icip1994-2.html#NordbergGK94.
[Pap+06] Nils Papenberg et al. “Highly Accurate Optic Flow Computation with Theoretically Justified Warping.” In: International Journal of Computer Vision 67.2 (2006). Pp. 141–158.
[Pla06] Matthias Plaue. Analysis of the PMD Imaging System. Tech. rep. Interdisciplinary Center for Scientific Computing (IWR), Univ. of Heidelberg, 2006.
[PM90] P. Perona and J. Malik. “Scale Space and Edge Detection using Anisotropic Diffusion”. In: PAMI 12 (July 1990). Pp. 629–639.
[Rap07] Holger Rapp. “Experimental and Theoretical Investigation of Correlating TOF-Camera Systems”. MA thesis. Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, 2007.
[RL87] Peter J. Rousseeuw and Annik M. Leroy. Robust Regression and Outlier Detection. Wiley & Sons, New York, 1987.
[Sch00] H. Scharr. “Optimale Operatoren in der Digitalen Bildverarbeitung”. PhD thesis. Heidelberg, Germany: University of Heidelberg, 2000.
[Sch03] Bernd Schneider. “Der Photomischdetektor zur schnellen 3D-Vermessung für Sicherheitssysteme und zur Informationsübertragung im Automobil”. PhD thesis. Siegen, Germany: Department of Electrical Engineering and Computer Science, 2003.
[Sch+98] R. Schwarte et al. “Novel 3D-vision systems based on layout optimized PMD-structures”. German. In: Technisches Messen 65.7-8 (1998). Pp. 264–271. ISSN: 0171-8096.
[SG02] H. Spies and C. S. Garbe. “Dense Parameter Fields from Total Least Squares”. In: Pattern Recognition. Ed. by L. Van Gool. Vol. LNCS 2449. Lecture Notes in Computer Science. Zurich, CH: Springer-Verlag, 2002. Pp. 379–386. URL: http://books.google.com/books?id=0xcL1dIafSUC.
[SJB00] H. Spies, B. Jähne, and J. L. Barron. “Regularised Range Flow”. In: ECCV. Ed. by D. Vernon. Vol. 2. Lecture Notes in Computer Science 1843. Dublin, Ireland: Springer, 2000. Pp. 785–799.
[Spi01] H. Spies. “Analysing Dynamic Processes in Range Data Sequences”. PhD thesis. Heidelberg, Germany: University of Heidelberg, 2001.
[Spi+99] Hagen Spies et al. “Differential Range Flow Estimation”. In: DAGM-Symposium. 1999. Pp. 309–316.
[Ste99] Charles V. Stewart. “Robust Parameter Estimation in Computer Vision”. In: Society for Industrial and Applied Mathematics, SIAM 41.3 (1999). Pp. 513–537.
[Stu+03] I. Stuke et al. “Estimation of multiple motions: regularization and performance evaluation”. In: Image and Video Communication and Processing, Proceedings of SPIE. Vol. 5022. 2003. Pp. 75–86.
[Tel+06] Alexandru Telea et al. “A Variational Approach to Joint Denoising, Edge Detection and Motion Estimation.” In: DAGM-Symposium. 2006. Pp. 525–535.
[TM98] Carlo Tomasi and Roberto Manduchi. “Bilateral Filtering for Gray and Color Images.” In: ICCV. 1998. Pp. 839–846. URL: http://dblp.uni-trier.de/db/conf/iccv/iccv1998.html#TomasiM98.
[TRK01] Yanghai Tsin, Visvanathan Ramesh, and Takeo Kanade. “Statistical Calibration of CCD Imaging Process”. In: IEEE International Conference on Computer Vision. 2001.
[Tsc02] D. Tschumperle. “PDE’s based regularization of multivalued images and applications”. PhD thesis. Université de Nice-Sophia, 2002.
[Tsc06] David Tschumperlé. “Fast Anisotropic Smoothing of Multi-Valued Images using Curvature-Preserving PDE’s”. In: Int. J. Comput. Vision 68.1 (2006). Pp. 65–82. ISSN: 0920-5691. DOI: http://dx.doi.org/10.1007/s11263-006-5631-z.
[VHV91] S. Van Huffel and J. Vandewalle. The Total Least Squares Problem: Computational Aspects and Analysis. http://www.netlib.org/vanhuffel/. Philadelphia: Society for Industrial and Applied Mathematics, 1991.
[Wag03] C. Wagner. “Informationstheoretische Grenzen optischer 3D-Sensoren”. PhD thesis. Universität Erlangen-Nürnberg, 2003.
[Wes94] C. F. Westin. “A Tensor Framework for Multidimensional Signal Processing”. PhD thesis. Linköping, Sweden: Linköping University, 1994.
[Xia+06] J. Xiao et al. “Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection”. In: ECCV06. 2006. I: 211–224.
[Xu99] Zhanping Xu. Investigation of 3D-Imaging Systems Based on Modulated Light and Optical RF-Interferometry (ORFI). English. Vol. 14. ZESS Forschungsberichte. Aachen, Germany: Shaker Verlag, 1999. ISBN: 3-8265-6736-6. URL: http://www.shaker.de/Online-Gesamtkatalog/Booklist.idc?Reihe=102.
[Yam+93] F. Yamamoto et al. “Three-dimensional PTV based on binary cross-correlation method”. In: JSME International Journal, Series B 36.2 (1993). Pp. 279–284.
